I would like to create a pandas DataFrame from a list variable. With pd.DataFrame() I cannot specify a delimiter, so each list entry ends up in a single column.

If I use pd.read_csv() instead, I of course get the following error:

ValueError: Invalid file path or buffer object type: <class 'list'>

Is there a way to use pd.read_csv() with my list directly, without first saving the list to a CSV file and then reading that file in a second step?

I also tried pd.read_table(), which also needs a file path or buffer object.

Example data (separated by tab stops):

Col1    Col2    Col3
12      Info1   34.1
15      Info4   674.1

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]

Current workaround:

with open(f'{filepath}tmp.csv', 'w', encoding='UTF8') as f:
    for line in consolidated_file:
        f.write(line + "\n")

df = pd.read_csv(f'{filepath}tmp.csv', sep='\t', index_col=1)
  • You could convert it to a nested list, [['Col1', 'Col2', 'Col3'], ['12', 'Info1', '34.1'], ['15', 'Info4', '674.1']], and then use DataFrame, as in the answer. Alternatively, if you join it into a single string (using '\n' as the line separator), you could use read_csv with io.StringIO (or io.BytesIO, if you have bytes) to create a file in memory. io.BytesIO is popular when you get a file (data, image, audio) from the network and want to use it without saving it to disk. Commented Sep 17, 2021 at 20:25

2 Answers

import pandas as pd
df = pd.DataFrame([x.split('\t') for x in test])
print(df)

If you want the first row to be the header, then:

df.columns = df.iloc[0]
df = df[1:]
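
One thing to keep in mind with the split approach: every value stays a string, whereas read_csv would infer numeric types. Below is a minimal sketch of converting afterwards. It assumes Col1 holds integers and Col3 holds floats, as in the example data; the astype mapping is an illustration, not part of the original answer.

```python
import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]

# Split each line on tabs; the first row becomes the header.
data = [line.split('\t') for line in test]
df = pd.DataFrame(data[1:], columns=data[0])

# All values are still strings here, so convert the numeric columns explicitly.
# Assumption: Col1 is integer-valued and Col3 is float-valued.
df = df.astype({"Col1": int, "Col3": float})

print(df.dtypes)
```

If the column types are not known in advance, the read_csv route with an in-memory buffer may be the more convenient choice, since it infers them.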

1 Comment

You could also first convert it to a nested list, data = [line.split('\t') for line in test], and then use it in df = pd.DataFrame(data[1:], columns=data[0]).

It seems simpler to convert it to a nested list, as in the other answer:

import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]

data = [line.split('\t') for line in test]

df = pd.DataFrame(data[1:], columns=data[0])

But you can also join it back into a single string (or get it directly from a file or from a socket/network as a single string) and then use io.StringIO (for text) or io.BytesIO (for bytes) to simulate a file in memory:

import pandas as pd
import io

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]

single_string = "\n".join(test)

file_like_object = io.StringIO(single_string)

df = pd.read_csv(file_like_object, sep='\t')

Or, shorter:

df = pd.read_csv(io.StringIO("\n".join(test)), sep='\t')

This method is popular when you get data from the network (socket, web API) as a single string or as raw data.
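
If the data arrives as bytes rather than as a str, io.BytesIO plays the same role. Below is a minimal sketch, assuming UTF-8 encoded bytes; the raw variable is a hypothetical stand-in for whatever a socket or HTTP response would return.

```python
import io
import pandas as pd

# Assumption: raw bytes as they might arrive from a socket or HTTP response.
lines = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]
raw = "\n".join(lines).encode("utf-8")

# io.BytesIO wraps the bytes in a file-like object; read_csv decodes and parses it.
df = pd.read_csv(io.BytesIO(raw), sep='\t', encoding='utf-8')

print(df)
```

Passing a str to io.BytesIO raises a TypeError, so use io.StringIO for text and io.BytesIO only for bytes.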

2 Comments

Thank you, the first solution works fine. Unfortunately, with the io variant I get only one column in my dataframe, which contains Col1, Col2 and Col3 together.
I forgot sep='\t'; it is needed here, just as with a normal file.
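
For anyone hitting the same one-column result, here is a small sketch of the difference. It uses the example list from the question; the variable names one_column and three_columns are only for illustration.

```python
import io
import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]
text = "\n".join(test)

# Without sep, read_csv splits on commas, so each tab-separated line
# stays together in a single column.
one_column = pd.read_csv(io.StringIO(text))

# With sep='\t', the lines are split on tabs into separate columns.
three_columns = pd.read_csv(io.StringIO(text), sep='\t')

print(one_column.shape[1], three_columns.shape[1])
```

Each call gets its own io.StringIO, because a buffer that has already been read to the end would need seek(0) before it could be parsed again.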
