0

Recently I have been struggling to learn how to create DataFrames with pandas, guided by the pandas documentation. My main problems were with the parameters of the DataFrame constructor, specifically columns and index. I had one object containing the data and was trying to create a DataFrame from it.

The code I am working on, which raises an error, is this:

import pandas as pd

data = [["Orange", "Watermelon", "Banana", "Apple"], [10, 30, 23, 56]]
df = pd.DataFrame(data=data[1], columns=data[0])  # raises ValueError
print(df)

It does not work for me, but if I remove the columns parameter it works. How can I solve this?

1
  • You probably meant pd.DataFrame(data=[data[1]], columns=data[0]). The data= parameter expects a 2D list containing all the rows. In your case you have 1 row, so the outer list has 1 element holding your 4 cells of data. Commented Jan 17, 2021 at 1:18

3 Answers

1

You can simply do this:

In [1426]: df = pd.DataFrame(data=[data[1]], columns=data[0])

In [1427]: df
Out[1427]: 
   Orange  Watermelon  Banana  Apple
0      10          30      23     56

1

For someone with DataFrame experience, the problem is clear and easy to spot, but for someone who is just starting it can be very frustrating to deal with problems like this.

The problem is that pandas will not spread 4 data values across 4 columns unless you tell it to. A flat list of 4 values is read as 4 rows of 1 column, shape (4, 1), while 4 column names require 4 columns, so the two do not match. To make the code work, you can pass the column names as the index and then transpose the DataFrame. transpose() simply swaps the rows and columns of the DataFrame.

data = [["Orange", "Watermelon", "Banana", "Apple"], [10, 30, 23, 56]]
df = pd.DataFrame(data=data[1], index=data[0]).transpose()
print(df)

   Orange  Watermelon  Banana  Apple
0      10          30      23     56
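The shapes involved can be checked directly. This is only a minimal sketch of the idea above; the variable names names, values, tall and wide are introduced here for illustration and are not part of the original code.

```python
import pandas as pd

names = ["Orange", "Watermelon", "Banana", "Apple"]
values = [10, 30, 23, 56]

# A flat list becomes one column: 4 rows, 1 column.
tall = pd.DataFrame(data=values, index=names)

# Transposing swaps rows and columns: 1 row, 4 columns.
wide = tall.transpose()

print(tall.shape)
print(wide.shape)
print(wide)
```

Printing the shape before and after the transpose makes it easier to see why the row labels of the tall frame end up as the column names of the wide one.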

The original call, with columns=data[0], works as written only when data[1] is a list of rows that each contain 4 values, for example enough data for a 4x4 DataFrame:

data = [["Orange", "Watermelon", "Banana", "Apple"], [
    [10, 30, 23, 56], [18, 44, 12, 73], [2, 24, 88, 10], [5, 71, 35, 62]]]
df = pd.DataFrame(data=data[1], columns=data[0])
print(df)

   Orange  Watermelon  Banana  Apple
0      10          30      23     56
1      18          44      12     73
2       2          24      88     10
3       5          71      35     62

EDIT

As @Mayank said, it can be simplified to:

df = pd.DataFrame(data=[data[1]], columns=data[0]) 

   Orange  Watermelon  Banana  Apple
0      10          30      23     56

0

Your code is throwing the following exception:

ValueError: Shape of passed values is (4, 1), indices imply (4, 4)

This means there is a mismatch in the data shape: pandas reads the flat list as a column of shape (4, 1), but the 4 column headers require rows of 4 values, so a single row must have shape (1, 4).

You can fix this by wrapping the list of values in another list, so that it is passed as a single row:

df = pd.DataFrame(data=[data[1]], columns=data[0])
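To see the mismatch and the fix side by side, something like the following can be run. It is a rough sketch rather than a definitive test; the names raised, flat_shape and fixed are made up for this example, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

data = [["Orange", "Watermelon", "Banana", "Apple"], [10, 30, 23, 56]]

# The flat list is read as 4 rows of 1 column, so 4 column names do not fit.
raised = False
try:
    pd.DataFrame(data=data[1], columns=data[0])
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Without column names, the same flat list builds a (4, 1) frame.
flat_shape = pd.DataFrame(data=data[1]).shape

# Wrapping the values in an outer list passes them as one row of 4 cells.
fixed = pd.DataFrame(data=[data[1]], columns=data[0])

print(flat_shape, fixed.shape)
```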

1 Comment

what does this add that the other two answers don't already have?
