Adding (list of) lists to CSV using pandas DataFrame or anything else (Python 3)

Question

I have 10 lists with thousands of rows, for example:

l1 = ['a1', 'a2', ...], l2 = ['1', '2', ...], ..., l10 = ['abc', 'sde',...]

All of them have the same number of rows. I would like to create a CSV file like:

name reg... address

'a1' '1'... 'abc'

'a2' '2'... 'sde'

First I thought of using a pandas DataFrame (I only used the first 103 rows for testing):

data = [l1, l2,..., l10]
labels = ['name', 'reg', ...,'address']
df = pd.DataFrame(data, columns=labels)
....

I got this error:

Traceback (most recent call last):
  File "ch.py", line 122, in <module>
    status_list, retrieved_at_list, source_url_list)
  File "ch.py", line 95, in charity
    df = pd.DataFrame(data, columns=labels)
  File "C:\Users\MON\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 369, in __init__
    arrays, columns = _to_arrays(data, columns, dtype=dtype)
  File "C:\Users\MON\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 6284, in _to_arrays
    dtype=dtype)
  File "C:\Users\MON\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 6363, in _list_to_arrays
    coerce_float=coerce_float)
  File "C:\Users\MON\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 6420, in _convert_object_array
    'columns' % (len(columns), len(content)))

AssertionError: 10 columns passed, passed data had 103 columns

Then I tried to use:

data = [l1, l2,..., l10]
with open('charity.csv', 'w') as ch_list:
    wr = csv.writer(ch_list, lineterminator='\n')
    wr.writerows(data)

But I got all of the data of l1 to l10 in one column.

I have two questions:

1. How can I solve my problem? In terms of performance, I would prefer to use a pandas DataFrame, but I am open to other suggestions.

2. What is the meaning of the error I got for the DataFrame, and how can I solve it?

Prashant Gupta · Accepted Answer · 2020-12-11 13:46:09Z

2

Problem (answer to your second question): the error arises from the way the data is passed to the DataFrame constructor.

Consider the code:

import pandas as pd
l1 = [1,2,3,4]
l2=['a','b','c','d']
values = [l1,l2]
df2 = pd.DataFrame(values, columns=['p', 'q', 'r', 's'])
df2.head()

    p   q   r   s
0   1   2   3   4
1   a   b   c   d

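The problem is that the lists you meant as columns are interpreted as rows (see the pandas documentation). Each item of values is one row, and the length of a row is the number of columns, which is 4 here. In your case that gives 10 rows of 103 columns each, so the 10 labels you passed do not match the 103 columns pandas found.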
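To make the row interpretation concrete, here is a minimal sketch that reuses l1 and l2 from the example above. The labels 'num' and 'char' are illustrative only. Depending on the pandas version, the mismatch is reported as an AssertionError (as in the question) or as a ValueError, so the sketch catches both.

```python
import pandas as pd

l1 = [1, 2, 3, 4]
l2 = ['a', 'b', 'c', 'd']
values = [l1, l2]

# Each inner list becomes one row: two lists of four items give a 2 x 4 frame.
by_rows = pd.DataFrame(values)
print(by_rows.shape)  # (2, 4)

# Two labels for four columns reproduces the kind of error from the question.
error_name = None
try:
    pd.DataFrame(values, columns=['num', 'char'])
except (AssertionError, ValueError) as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```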

Solution: pass a dictionary that maps each column name to its list

d = {'num':l1, 'char':l2}
df = pd.DataFrame(data=d)
df.head()
    char  num
0   a     1
1   b     2
2   c     3
3   d     4
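Applied to the question, the same dictionary approach could look like the sketch below. The three short lists are hypothetical stand-ins for the asker's l1 to l10, and passing columns=labels is an assumption made here to pin the column order regardless of pandas version.

```python
import pandas as pd

# Hypothetical stand-ins for the question's l1 ... l10 (only three shown).
names = ['a1', 'a2', 'a3']
regs = ['1', '2', '3']
addresses = ['abc', 'sde', 'xyz']
labels = ['name', 'reg', 'address']

# Map each column name to its list; columns=labels fixes the column order.
df = pd.DataFrame({'name': names, 'reg': regs, 'address': addresses},
                  columns=labels)

# With no path, to_csv returns the CSV text. Pass a filename such as
# 'charity.csv' to write a file instead. index=False omits the row numbers.
csv_text = df.to_csv(index=False)
print(csv_text)
```

For lists with a few thousand entries each, building the frame this way should not be a performance concern.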

edited Dec 11, 2020 at 13:46

answered May 7, 2018 at 20:25

Prashant Gupta


2 Comments

G. Hak. Over a year ago

I found this page chrisalbon.com/python/data_wrangling/… and noticed that the problem was that I used a list instead of a dictionary. When I came back to leave a note, I saw @Prashant's answer, which is less code and simpler.

Prashant Gupta Over a year ago

@G.Hak. Thanks for the appreciation.

harpan · Accepted Answer · 2018-05-07 18:47:03Z

1

2-What is the meaning of the error I got for DataFrame and how can I solve it?

Your error says that each row of data has 103 entries while labels has only 10 column headers. You can use the solution above.

EDIT: based on the OP's comment, it seems the following is the solution:

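import numpy as np
import pandas as pd

l1 = ['a1', 'a2', 'a3']
l2 = ['c1', 'c2', 'c3']
l3 = [1, 2, 3]
labels = ['name', 'reg', 'address']
# column_stack builds one array with a common dtype, so the integers in l3
# end up as strings; each input list becomes one column.
df = pd.DataFrame(np.column_stack([l1, l2, l3]), columns=labels)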

Output:

    name    reg  address
0   a1      c1     1
1   a2      c2     2
2   a3      c3     3
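If pandas and NumPy are not otherwise needed, the same column-wise layout can be written with the standard library alone. This is a sketch rather than part of the original answer: zip pairs up the i-th items of each list so that every tuple is one CSV row, which is the step missing from the csv.writer attempt in the question. The io.StringIO buffer stands in for a real file such as open('charity.csv', 'w', newline='').

```python
import csv
import io

l1 = ['a1', 'a2', 'a3']
l2 = ['c1', 'c2', 'c3']
l3 = [1, 2, 3]
labels = ['name', 'reg', 'address']

# In-memory stand-in for open('charity.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')

writer.writerow(labels)            # header line
writer.writerows(zip(l1, l2, l3))  # one row per position across the lists

csv_text = buffer.getvalue()
print(csv_text)
```

With a list of lists like the question's data, zip(*data) does the same transposition.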

edited May 7, 2018 at 18:47

answered May 7, 2018 at 17:03

harpan


3 Comments

G. Hak. Over a year ago

Thanks for your help. I tried both solutions and got the same error. The problem is that my lists are not like l1 = ['a1', 1, 'c1'], l2 = ['a2', 2, 'c2'], but like l1 = ['a1', 'a2'], l2 = ['1', '2']. Each list holds the values for one column, and I want the output that you mentioned above.

harpan Over a year ago

@G.Hak. see the edit. If that solves the problem, I will edit my entire answer.

G. Hak. Over a year ago

Thank you, it solved the problem. However, I am worried about performance, as each list has thousands of rows.

Jonas Byström · Accepted Answer · 2018-05-07 20:29:55Z

0

df = pd.DataFrame({'l%i'%i:data[i] for i in range(len(data))})
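The one-liner above names the columns l0, l1, and so on. A possible variation, assuming a labels list like the one in the question, pairs each label with its list through zip. The data values here are hypothetical and shortened to three columns.

```python
import pandas as pd

# Hypothetical, shortened version of the question's data and labels.
data = [['a1', 'a2'], ['1', '2'], ['abc', 'sde']]
labels = ['name', 'reg', 'address']

# dict(zip(...)) maps each label to the list holding that column's values;
# columns=labels pins the column order.
df = pd.DataFrame(dict(zip(labels, data)), columns=labels)
print(df)
```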

answered May 7, 2018 at 20:29

Jonas Byström
