
I have a list of lists with 10 items of 12 elements each. I would like to obtain a DataFrame with 12 columns and 10 rows, but pd.DataFrame() treats the twelve comma-separated items as a single column. The apostrophes are part of the list and mark each item as a string, but I suspect the DataFrame function interprets them as the column boundaries. They cannot be replaced. How can this be done, and what is causing this behaviour? Here is the sample data:

[['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
 ['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
 ['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
 ['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
 ['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
 ['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
 ['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
 ['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
 ['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
 ['0,DAI,6,Bikini de Larcy,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

Any help welcome.

  • That's a csv format, so why not use read_csv to parse it? Commented Jun 13, 2022 at 11:03
  • Simply because it is part of a bigger chain of actions and I'm not ready at this point in the sequence to open a file in append mode, write the file, read the file to get the dataframe. 3 lines of code vs. 1. I need to iterate this step through a range that will multiply the code. Besides I wanted to know how to do this, I'm still learning. I already know read_csv ;) Commented Jun 13, 2022 at 16:04
  • read_csv can also read from io.StringIO, i.e from a string. :) Commented Jun 13, 2022 at 16:05
  • StringIO does not like lists: "initial_value must be str or None, not list". Commented Jun 13, 2022 at 16:19

2 Answers


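The apostrophes only mean that each item in the list is a string; they are not column boundaries. Each inner list holds a single string, which can be extracted as its first element using my_list[0]. That string needs to be split on commas, for example with a list comprehension, before the result is passed to the DataFrame.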
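To see why pd.DataFrame() produces a single column, here is a minimal sketch using a shortened, made-up version of the data. The names nested, one_col and split_cols are only illustrative and do not come from the question.

```python
import pandas as pd

# Shortened, made-up records in the same shape as the question's data:
# every inner list contains exactly one string.
nested = [["1,er,2"], ["2,e,7"]]

# One element per inner list means one column per row.
one_col = pd.DataFrame(nested)
print(one_col.shape)  # (2, 1)

# Splitting each string on commas gives one list element per field.
split_cols = pd.DataFrame([row[0].split(",") for row in nested])
print(split_cols.shape)  # (2, 3)
```

The constructor never looks inside the strings, so the commas have no effect until the strings are split.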

There seems to be a typo in the last line of data (the value in column 4 is missing), so I corrected it by adding 'null'.

import pandas as pd

data = [['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
 ['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
 ['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
 ['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
 ['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
 ['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
 ['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
 ['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
 ['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
 ['0,DAI,6,Bikini de Larcy,null,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

df = pd.DataFrame([line[0].split(',') for line in data])
print(df)

Output

   0    1   2                   3       4  5   6     7                   8   \
0  1   er   2        Fado de Padd  1'18"1  H   6  2600           J. Dekker   
1  2    e   7      Elixir Normand  1'18"2  H   7  2600      S. Schoonhoven   
2  3    e   3  Give You All of Me  1'18"2  H   5  2600  JF. Van Dooyeweerd   
3  4    e   4            Gouritch  1'18"3  H   5  2600          BJ. Crebas   
4  5    e   1  Franky du Cap Vert  1'18"4  H   6  2600          JH. Mieras   
5  6    e  10          Défi Magik  1'18"0  H   8  2620          F. Verkaik   
6  7    e   9             Fleuron  1'18"2  H   6  2620          M. Brouwer   
7  8    e   8         Dream Gibus  1'18"6  H   8  2620          R. Ebbinge   
8  9    e   5       Beau Gaillard  1'19"5  H  10  2600           A. Bakker   
9  0  DAI   6     Bikini de Larcy    null  H  10  2600    D. Den Dubbelden   

          9                  10  11    12  
0  17 490 €         A. De Wrede   1     6  
1  24 755 €      S. Schoonhoven  14  None  
2  17 600 €  JF. Van Dooyeweerd  10  None  
3  20 700 €          BJ. Crebas  32  None  
4  15 536 €        N. De Vreede  65  None  
5  44 865 €         AW. Bosscha   6     3  
6  44 830 €          D. Brouwer   7     3  
7  33 330 €      Mme A. Lehmann  36  None  
8  20 140 €        N. De Vreede  44  None  
9  21 834 €              N. Rip  52  None 

Second method with the same output:

df = pd.DataFrame(data)[0].str.split(',', expand=True)

Third method with similar output:

from io import StringIO

stringdata = StringIO('\n'.join([line[0] for line in data]))
df = pd.read_csv(stringdata, sep=',', header=None)
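The output of the third method is only "similar" because read_csv infers column types, while split() leaves every field as a string. The sketch below illustrates this on two shortened, made-up records. The quoting=csv.QUOTE_NONE argument is an addition of this sketch, not part of the method above; it is assumed to be a reasonable precaution so that the " inside the time values is read as a literal character.

```python
import csv
from io import StringIO

import pandas as pd

# Two shortened, made-up records in the same shape as the question's data.
sample = [['1,er,Fado de Padd,1\'18"1,2600'],
          ['2,e,Elixir Normand,1\'18"2,2600']]

# Join the single string of each inner list into one CSV-like text.
text = "\n".join(row[0] for row in sample)

# read_csv parses the text and converts numeric-looking columns to numbers.
parsed = pd.read_csv(StringIO(text), sep=",", header=None,
                     quoting=csv.QUOTE_NONE)

# split() keeps every field as a string.
split_cols = pd.DataFrame([row[0].split(",") for row in sample])

print(parsed[0].tolist())      # [1, 2]
print(split_cols[0].tolist())  # ['1', '2']
```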

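However, please note that the first method (list comprehension) is still the fastest: on this sample it was timed at about 0.0008 seconds, versus about 0.0012 seconds for the str.split version.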
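The speed difference can be checked locally. The following is a rough sketch using timeit on synthetic records; the data and the function names with_comprehension and with_str_split are made up for illustration, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Synthetic records, repeated so the timings are less dominated by noise.
data = [["1,e,Fado de Padd,2600"], ["2,e,Elixir Normand,2600"]] * 500


def with_comprehension():
    # First method: split each string in plain Python.
    return pd.DataFrame([line[0].split(",") for line in data])


def with_str_split():
    # Second method: build a one-column frame, then split with the .str accessor.
    return pd.DataFrame(data)[0].str.split(",", expand=True)


for func in (with_comprehension, with_str_split):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```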


7 Comments

So, list comprehension cycles through the lines and split() separates each entry. It would be simpler if you could tell DataFrame() that your records are comma separated like in read_csv because the function already cycles through each line.
You could use the alternative: pd.DataFrame(data)[0].str.split(',', expand=True)
Hey, I like this, it seems more efficient!
@Zen4ttitude - hmm, the list comprehension is more efficient here
Let's have a race... List comprehension: 0.000804901123046875 seconds. DataFrame with split: 0.0012080669403076172 seconds. You are right @jezrael

Using only split works well, but the last row is missing a value, so its values from column 4 onward have to be shifted one position to the right:

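# L is the list of lists from the question
df = pd.DataFrame([y.split(',') for x in L for y in x])

# Shift the last row one position to the right, starting at column 4
df.iloc[-1, 4:] = df.iloc[-1, 4:].shift()

print(df)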
   0    1   2                   3       4  5   6     7                   8   \
0  1   er   2        Fado de Padd  1'18"1  H   6  2600           J. Dekker   
1  2    e   7      Elixir Normand  1'18"2  H   7  2600      S. Schoonhoven   
2  3    e   3  Give You All of Me  1'18"2  H   5  2600  JF. Van Dooyeweerd   
3  4    e   4            Gouritch  1'18"3  H   5  2600          BJ. Crebas   
4  5    e   1  Franky du Cap Vert  1'18"4  H   6  2600          JH. Mieras   
5  6    e  10          Défi Magik  1'18"0  H   8  2620          F. Verkaik   
6  7    e   9             Fleuron  1'18"2  H   6  2620          M. Brouwer   
7  8    e   8         Dream Gibus  1'18"6  H   8  2620          R. Ebbinge   
8  9    e   5       Beau Gaillard  1'19"5  H  10  2600           A. Bakker   
9  0  DAI   6     Bikini de Larcy     NaN  H  10  2600    D. Den Dubbelden   

          9                  10  11    12  
0  17 490 €         A. De Wrede   1     6  
1  24 755 €      S. Schoonhoven  14  None  
2  17 600 €  JF. Van Dooyeweerd  10  None  
3  20 700 €          BJ. Crebas  32  None  
4  15 536 €        N. De Vreede  65  None  
5  44 865 €         AW. Bosscha   6     3  
6  44 830 €          D. Brouwer   7     3  
7  33 330 €      Mme A. Lehmann  36  None  
8  20 140 €        N. De Vreede  44  None  
9  21 834 €              N. Rip  52  None  
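If more than one row can lack the time value, the gap could also be filled before the DataFrame is built. The sketch below is only one possible approach under stated assumptions: the time always sits at position 4 and looks like 1'18"1. The names TIME_PATTERN and split_and_pad are hypothetical and not part of pandas or of the code above.

```python
import re

import pandas as pd

# Assumed shape of the time field, e.g. 1'18"1 (inferred from the sample data).
TIME_PATTERN = re.compile(r"""^\d+'\d+"\d+$""")


def split_and_pad(line, time_pos=4):
    """Split a record on commas and insert None where the time field is absent."""
    fields = line.split(",")
    if len(fields) <= time_pos or not TIME_PATTERN.match(fields[time_pos]):
        fields.insert(time_pos, None)
    return fields


# First and last records from the question.
sample = [
    ['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
    ['0,DAI,6,Bikini de Larcy,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52'],
]

df = pd.DataFrame([split_and_pad(line[0]) for line in sample])
print(df)
```

Unlike the shift on the last row, this does not depend on knowing in advance which row is incomplete, but it relies on the regular expression matching every valid time value.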

1 Comment

Thanks to both! @jezrael nice observation and solution
