5

I am trying to create a pandas df that looks like:

   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

To build it, I am currently creating two DataFrames:

df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})

and then appending the rows of df2 to df1 to create the desired df.
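
For reference, the append step described above could be sketched with pd.concat. This is only one common way to do it, and it assumes a fresh 0..3 index is wanted (hence ignore_index=True):

```python
import pandas as pd

df1 = pd.DataFrame({'AAA': [4] * 2, 'BBB': [10, 20], 'CCC': [100, 50]})
df2 = pd.DataFrame({'AAA': [5] * 2, 'BBB': [30, 40], 'CCC': [-30, -50]})

# Stack the rows of df2 under df1; ignore_index=True renumbers the rows 0..3
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```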

I tried:

df = pd.DataFrame({'AAA' : [4] * 2, 'AAA': [5]*2,
                   'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]})
df

But I get an error with the key message:

ValueError: arrays must all be same length

I can of course do:

df = pd.DataFrame({'AAA' : [4,4,5,5], 'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
df

But is there a more elegant way to do this? This small example is easy to write out by hand, but if I scale up to many rows, the input becomes very long.

3 Answers

7

I believe you need to join the lists with +:

df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

Or use np.repeat with np.concatenate:

import numpy as np

df = pd.DataFrame({'AAA' :  np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

Alternative:

df = pd.DataFrame({'AAA' :  np.repeat((4,5), 2),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

print (df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50
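
As a small illustration of the second argument of np.repeat: a single integer repeats every element the same number of times, while a sequence of counts (one per element) lets each value have its own number of repetitions. The counts (2, 3) here are made up for the demonstration:

```python
import numpy as np

# A single count repeats every element the same number of times
same = np.repeat((4, 5), 2)

# A sequence of counts, one per element, allows different repetitions
varied = np.repeat((4, 5), (2, 3))

print(same.tolist())    # [4, 4, 5, 5]
print(varied.tolist())  # [4, 4, 5, 5, 5]
```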

2 Comments

Could you explain the parameters of the last one, np.repeat((4,5), (2, 2)), please?
@KansaiRobot - That was old code ;) Here you only need np.repeat((4,5), 2), which repeats each value 2 times.
1

For a general solution you could do:

import pandas as pd

data = [(4, 2), (5, 2)]
df = pd.DataFrame({'AAA' : [value for value, reps in data for _ in range(reps)], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print(df)

Here data is a list of (value, repetitions) tuples. For your particular example you have 4 with 2 repetitions and 5 with 2 repetitions, hence [(4, 2), (5, 2)].
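
If NumPy is available, the same list of pairs could also be fed to np.repeat, which may be more convenient for long inputs. This is a sketch of an alternative, not part of the approach above; the names values and reps are introduced here only for illustration:

```python
import numpy as np
import pandas as pd

data = [(4, 2), (5, 2)]  # (value, repetitions) pairs

# Split the pairs into parallel tuples: values == (4, 5), reps == (2, 2)
values, reps = zip(*data)

df = pd.DataFrame({'AAA': np.repeat(values, reps),
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})
print(df)
```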

1

The error you get is quite clear. When you create a dataframe from a dictionary, all of the arrays must be the same length. When you create a dictionary, if you give the same key multiple times, the last one is used. So

{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

is the same as

{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

When you try to create a dataframe from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can create the desired column for 'AAA' by joining the lists and then creating the dataframe from the result.
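
A quick way to check this behaviour yourself is to build the dictionary on its own and inspect it before handing it to pandas. The variable names d, lengths and raised are only illustrative:

```python
import pandas as pd

# A dict literal with a repeated key keeps only the last value for that key
d = {'AAA': [4] * 2, 'AAA': [5] * 2,
     'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}

print(d['AAA'])  # [5, 5]

# Column lengths as pandas will see them: 'AAA' has 2 items, the others 4
lengths = {key: len(value) for key, value in d.items()}
print(lengths)

# The mismatched lengths are what make the constructor raise ValueError
try:
    pd.DataFrame(d)
    raised = False
except ValueError:
    raised = True
print(raised)
```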
