I'm having trouble creating a DataFrame from my list.

The list contains four columns, but pandas reports that the data has only one column:

ValueError: 4 columns passed, passed data had 1 columns.

Printed inside the loop, the list looks like this:

[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]]
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 520.34, 7.44, 0.5393857093988743]]
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 556.72, 7.96, 0.5410827096899603]]
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 688.67, 9.84, 0.5845350761787548]]
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 625.3, 8.94, 0.5612954767824924]]

I know it has something to do with the double [], but I can't figure it out. Can someone help me?

Here is the code so far:

for i in range(6):
    excel_file = pd.read_excel(input_file, sheet_name=sheet[i])
    excel_file = excel_file.values.tolist()

    filtered = [x for x in excel_file if 'TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)' in x
                or 'TOTAL DAS DESPESAS DE CUSTEIO (A)' in x
                ]

    sheet_file = sheet[i]
    sheet_variable.append(sheet_file)
    wb_name.append(file_name)
    conab_data.append(filtered)

    print(filtered)

df_conab = pd.DataFrame(conab_data, columns=['Descrição', 'Preço/ha', 'Scs/ha', 'Part. %'])
df_conab['Local/UF/Ano'] = sheet_variable
df_conab['Fonte'] = wb_name

print(df_conab)
  • Does the Excel file you are reading have a header or an empty line? Commented Aug 17, 2022 at 17:41
  • The term you're looking for is "nested" lists. A flat list gives you a single column, while a list of lists gives you rows: [1,2,3,4,5] is laid out vertically, whereas [[1,2,3,4,5],[6,7,8,9,10]] is laid out as 2 rows with 5 columns. Any further nesting puts the deeper lists into the DataFrame as single elements. Commented Aug 17, 2022 at 17:41
  • @FelipeBonfante These Excel files do not have any headers. Commented Aug 17, 2022 at 17:49
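
To illustrate the comment about nesting, here is a minimal sketch with made-up numbers (the variable names are illustrative and not taken from the question). It compares the shape pandas infers at each nesting depth. The third case mirrors the question's data and is why pandas sees one column instead of four.

```python
import pandas as pd

flat = [1, 2, 3, 4]                          # one level: each item becomes a row
nested = [[1, 2, 3, 4], [5, 6, 7, 8]]        # two levels: each inner list becomes a row
too_deep = [[[1, 2, 3, 4]], [[5, 6, 7, 8]]]  # three levels, as in the question

shape_flat = pd.DataFrame(flat).shape
shape_nested = pd.DataFrame(nested).shape
shape_deep = pd.DataFrame(too_deep).shape    # each cell holds a whole list

print(shape_flat, shape_nested, shape_deep)
```

In the three-level case every row has a single cell containing a list, so asking for four column names raises the ValueError from the question.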

2 Answers

You could fix this with a for loop:

import pandas as pd

overly_nested = [
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 520.34, 7.44, 0.5393857093988743]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 556.72, 7.96, 0.5410827096899603]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 688.67, 9.84, 0.5845350761787548]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 625.3, 8.94, 0.5612954767824924]],
]

# Replace each single-row wrapper list with the row it contains.
for i, sub_list in enumerate(overly_nested):
    overly_nested[i] = sub_list[0]

df = pd.DataFrame(overly_nested)
print(df)

I'm sure there's a way to do this with zip(). Let me experiment, and I'll edit if I find it.
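
zip() is probably not the natural tool here. One standard-library alternative (swapped in, not what this answer proposes) is itertools.chain.from_iterable, which removes exactly one level of nesting and also copes with a sheet that matched zero or several rows. The data below is a hypothetical stand-in for conab_data, with made-up labels and values.

```python
from itertools import chain

# Hypothetical stand-in for conab_data: one `filtered` list per sheet.
conab_data = [
    [['LABEL A', 1.0, 2.0, 0.5]],  # sheet with one matching row
    [],                            # sheet where nothing matched
    [['LABEL B', 3.0, 4.0, 0.25]],
]

# Remove one level of nesting: list of lists of rows -> list of rows.
rows = list(chain.from_iterable(conab_data))
print(rows)
```

rows can then go straight to pd.DataFrame(rows, columns=[...]). In the question's loop, conab_data.extend(filtered) instead of conab_data.append(filtered) should have the same effect, though sheet_variable and wb_name then need one entry per matched row to stay the same length.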

2 Comments

Thanks a lot, I tried something like this and it worked!
This happens pretty often when importing from Google Sheets or Excel. I just came across the solution overly_nested = [np.squeeze(i).tolist() for i in overly_nested] to remove the innermost layer of nesting.
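
One caveat with the np.squeeze one-liner, sketched here on a single row from the question: NumPy picks one dtype for the whole array, so a row that mixes text and numbers comes back with the numbers converted to strings. Passing dtype=object (my addition, not part of the comment) appears to avoid that.

```python
import numpy as np

row = [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]]

# Default conversion: the mixed row is coerced to a string dtype.
as_strings = np.squeeze(row).tolist()

# dtype=object keeps each element as its original Python object.
as_objects = np.squeeze(np.array(row, dtype=object)).tolist()

print(type(as_strings[1]).__name__)
print(type(as_objects[1]).__name__)
```

With the default conversion, the numeric columns of the resulting DataFrame would hold strings and would need to be cast back before doing arithmetic.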
You can try:

import pandas as pd

data = [
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 520.34, 7.44, 0.5393857093988743]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 556.72, 7.96, 0.5410827096899603]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 688.67, 9.84, 0.5845350761787548]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 625.3, 8.94, 0.5612954767824924]]
]

# Take the single row out of each wrapper list before building the DataFrame.
df = pd.DataFrame([x[0] for x in data], columns=['A', 'B', 'C', 'D'])

print(df)

Output:

                                              A       B     C         D
0  TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)  559.64  8.01  0.552077
1  TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)  520.34  7.44  0.539386
2  TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)  556.72  7.96  0.541083
3  TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)  688.67  9.84  0.584535
4  TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)  625.30  8.94  0.561295

1 Comment

I think you could also make data into a numpy array and then drop an axis, which might be more performant if there are a lot of rows.
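
A rough sketch of what that could look like, using two rows from the answer's data. The dtype=object argument and the infer_objects() call are my assumptions for keeping the numbers numeric; whether this is actually faster than the list comprehension would need measuring.

```python
import numpy as np
import pandas as pd

data = [
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]],
    [['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 520.34, 7.44, 0.5393857093988743]],
]

# dtype=object stops NumPy from coercing the numbers to strings.
arr = np.array(data, dtype=object)  # shape (2, 1, 4)
rows = arr[:, 0, :]                 # index away the middle axis -> shape (2, 4)

# infer_objects() turns the object columns that hold floats into float64.
df = pd.DataFrame(rows, columns=['A', 'B', 'C', 'D']).infer_objects()
print(df.dtypes)
```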
