1

I have the following dataset in df_1, which I want to convert into the format of df_2. In df_2, the columns of df_1 (excluding UserId and Date) have been converted to rows. I looked for similar answers, but the solutions they offer are rather complex. Is there a simple way to do this?

df_1

   UserId       Date                   -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
    87      2011-05-10 18:38:55.030     0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
    487     2011-11-29 14:46:12.080     0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
    21      2012-03-02 14:35:06.867     0   1   0   1   2   0   2   2   0   1   2   2   1   3   1

df_2

day | count
-7   0
-7   0
-7   0
-6   0
-6   0
-6   1
-5   0
-5   1
-5   0 
.    .
.    .(Similarly for other columns in between)
.    .
6   0    
6   0
6   3
7   0
7   0
7   1

3 Answers

1

Pandas provides a built-in method, df.melt(), for exactly this purpose. It is roughly the reverse of df.pivot() or df.pivot_table(). (I'm not sure why it isn't named the more intuitive "unpivot".)
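To make the "reverse of pivot" relationship concrete, here is a minimal round-trip sketch. The frame and its column names (id, day, count) are made up for illustration and are not the question's data.

```python
import pandas as pd

# Hypothetical long-format data: one row per (id, day) pair.
long_df = pd.DataFrame(
    {"id": [1, 1, 2, 2], "day": [-1, 0, -1, 0], "count": [3, 4, 5, 6]}
)

# pivot: long -> wide (one column per day).
wide_df = long_df.pivot(index="id", columns="day", values="count")

# melt: wide -> long again.
back = wide_df.reset_index().melt(
    id_vars="id", var_name="day", value_name="count"
)
back["day"] = back["day"].astype(int)  # column labels come back as generic objects
back = back.sort_values(["id", "day"]).reset_index(drop=True)
print(back)
```

After sorting, back holds the same rows as long_df, which is what "unpivot" means in practice.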

The advantages of this solution:

  • No reinventing the wheel: an easily understood, generally applicable df.transpose() -> df.melt() pipeline.
  • No concatenating columns or appending datasets.

Code

# 1. preparation: get the "day" column in place.
# Note: The column names were strings ('-7', '-6', ...) as copy-pasted.
col_names = [str(i) for i in range(-7, 8)]
df_tr = df_1[col_names].transpose().reset_index()
df_tr.rename(columns={"index": "day"}, inplace=True)
df_tr["day"] = df_tr["day"].astype(int)  # str to int

# 2. unpivoting (melting)
df_2_unpivot = df_tr.melt(id_vars="day", var_name="col", value_name="count")
df_2 = df_2_unpivot.sort_values(by=["day", "col"])

# 3. cleanup
del df_2["col"]
df_2.reset_index(drop=True, inplace=True)

Result

df_2
Out[134]: 
    day  count
0    -7      0
1    -7      0
2    -7      0
3    -6      0
4    -6      0
5    -6      1
6    -5      0
7    -5      1
8    -5      0
9    -4      0
10   -4      0
11   -4      1
12   -3      0
13   -3      0
14   -3      2
15   -2      0
16   -2      0
17   -2      0
18   -1      1
19   -1      0
20   -1      2
21    0      0
22    0      0
23    0      2
24    1      0
25    1      0
26    1      0
27    2      0
28    2      0
29    2      1
30    3      0
31    3      0
32    3      2
33    4      0
34    4      0
35    4      2
36    5      0
37    5      0
38    5      1
39    6      0
40    6      0
41    6      3
42    7      0
43    7      0
44    7      1

It is also worth inspecting the intermediate DataFrames (df_tr and df_2_unpivot) and experimenting with the melt() options yourself.
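As a possible shortcut, melt() can also be applied to df_1 directly, without the transpose step, by treating UserId and Date as identifier columns. The sketch below rebuilds the question's sample data by hand and assumes the day columns are strings, as in the code above. The stable sort (kind="mergesort") is my choice to keep the original row order within each day.

```python
import pandas as pd

# Rebuild the sample df_1 from the question (day columns as strings).
days = [str(d) for d in range(-7, 8)]
rows = [
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 2, 0, 2, 2, 0, 1, 2, 2, 1, 3, 1],
]
df_1 = pd.DataFrame(rows, columns=days)
df_1.insert(0, "UserId", [87, 487, 21])
df_1.insert(1, "Date", pd.to_datetime([
    "2011-05-10 18:38:55.030",
    "2011-11-29 14:46:12.080",
    "2012-03-02 14:35:06.867",
]))

# Melt every day column into (day, count) pairs, then tidy up.
df_2 = (
    df_1.melt(id_vars=["UserId", "Date"], var_name="day", value_name="count")
    .astype({"day": int})                   # "-7" -> -7
    .sort_values("day", kind="mergesort")   # stable: keeps row order per day
    .drop(columns=["UserId", "Date"])
    .reset_index(drop=True)
)
print(df_2.head(9))
```

For this sample the first nine rows should match rows 0 to 8 of the result shown above.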

1

You could transpose the frame, concatenate its columns into a single Series, and sort by index:

import numpy as np
import pandas as pd

# Sample data: 3 rows, 10 columns labelled 0..9
df = pd.DataFrame(np.random.random((3, 10)), columns=range(10))
df = df.T  # the column labels become the index

# Series.append was removed in pandas 2.0, so instead of appending inside
# apply() through a global, collect the columns and concatenate them once.
new_df = pd.concat([df[col] for col in df.columns])
new_df = new_df.sort_index()
print(new_df)
0    0.020673
0    0.710004
0    0.590984
1    0.643964
1    0.719694
1    0.105075
2    0.270417
2    0.537349
2    0.610228
3    0.391562
3    0.760375
3    0.105794
4    0.726044
4    0.676487
4    0.851921
5    0.447779
5    0.798975
5    0.877853
6    0.807380
6    0.639440
6    0.435890
7    0.263091
7    0.722340
7    0.586944
8    0.142973
8    0.928533
8    0.438123
9    0.076326
9    0.385373
9    0.662350
dtype: float64
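The concatenate-then-sort result can also be produced in one step by flattening the values column by column. This is only a sketch with a small deterministic frame standing in for the random data; np.repeat and ravel(order="F") are my substitution, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in: 2 rows, 3 columns labelled 0..2.
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=range(3))

# Column-major flattening lists all values of column 0, then column 1, ...
values = df.to_numpy().ravel(order="F")
# Repeat each column label once per row so it lines up with the values.
labels = np.repeat(df.columns.to_numpy(), len(df))

flat = pd.Series(values, index=labels)
print(flat)
```

Because the labels are generated in column order, no sorting is needed afterwards.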

0

Is this what you want (transpose())?

import pandas as pd
from io import StringIO

# Prework to generate your data
data = """UserId       Date                   -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
    87      2011-05-10 18:38:55.030     0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
    487     2011-11-29 14:46:12.080     0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
    21      2012-03-02 14:35:06.867     0   1   0   1   2   0   2   2   0   1   2   2   1   3"""

input_data = StringIO(data)
df_1 = pd.read_table(input_data, sep=r"\s{2,}", engine="python")

# remove unused columns
df_1.drop(["Date", "UserId"], axis=1, inplace=True)

# and transpose
df_2 = df_1.transpose()

# concatenate all columns into one Series
# (Series.append was removed in pandas 2.0; pd.concat is the replacement)
df_2 = pd.concat([df_2[0], df_2[1], df_2[2]])
df_2.sort_index(inplace=True)

print(df_2)

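Output (the index is sorted as strings, and the last value is NaN because the third row of the sample string above is one value short):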

-1    0.0
-1    2.0
-1    1.0
-2    0.0
-2    0.0
-2    0.0
-3    0.0
-3    2.0
-3    0.0
-4    0.0
-4    1.0
-4    0.0
-5    0.0
-5    0.0
-5    1.0
-6    1.0
-6    0.0
-6    0.0
-7    0.0
-7    0.0
-7    0.0
0     2.0
0     0.0
0     0.0
1     0.0
1     0.0
1     0.0
2     0.0
2     1.0
2     0.0
3     0.0
3     2.0
3     0.0
4     2.0
4     0.0
4     0.0
5     0.0
5     0.0
5     1.0
6     0.0
6     0.0
6     3.0
7     0.0
7     0.0
7     NaN
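The result above is a Series indexed by day rather than a two-column DataFrame. A minimal sketch of the conversion follows, using a tiny hand-made Series in place of df_2. The names day and count come from the question; the sample values are illustrative only.

```python
import pandas as pd

# Stand-in for the Series produced above: string day labels as the index.
s = pd.Series([0, 1, 2], index=["-7", "-6", "-5"])

# Name the index "day", then move it into a regular column next to "count".
two_col = s.rename_axis("day").reset_index(name="count")
two_col["day"] = two_col["day"].astype(int)  # "-7" -> -7, so sorting is numeric
two_col = two_col.sort_values("day", kind="mergesort").reset_index(drop=True)
print(two_col)
```

Converting day to integers before sorting avoids the string ordering seen in the output above (-1, -2, ... before 0).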

2 Comments

No, I don't want the Transpose. The obtained dataframe must have only 2 columns.
I changed it to a two-column table after transposing.
