Handle variable as file with pandas dataframe

Question

I would like to create a pandas DataFrame from a list variable. With pd.DataFrame() I cannot declare a delimiter, so each list entry ends up as a single, unsplit string in one column.

If I use pd.read_csv() instead, I of course receive the following error:

ValueError: Invalid file path or buffer object type: <class 'list'>

Is there a way to use pd.read_csv() with my list without first saving the list to a CSV file and then reading that file back in a second step?

I also tried pd.read_table(), which also needs a file or buffer object.

Example data (separated by tabs):

Col1    Col2    Col3
12      Info1   34.1
15      Info4   674.1

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
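A quick way to see the problem described above, using only the example list: pd.DataFrame() treats each string as one cell and does not interpret the tabs as delimiters.

```python
import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]

# Each string becomes a single cell; the tab characters stay inside it.
df = pd.DataFrame(test)
print(df.shape)  # (3, 1)
```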

Current workaround:

with open(f'{filepath}tmp.csv', 'w', encoding='UTF8') as f:
    for line in consolidated_file:
        f.write(line + "\n")

df = pd.read_csv(f'{filepath}tmp.csv', sep='\t', index_col=1)

You could convert it to a nested list [['Col1', 'Col2', 'Col3'], ['12', 'Info1', '34.1'], ['15', 'Info4', '674.1']] and then use DataFrame, as in the answer. Or, if you convert it to a single string (using '\n' as the line separator), you could use read_csv with io.BytesIO or io.StringIO to create a file in memory. io.BytesIO is popular when you get a file (data, image, audio) from the network and want to use it without saving it to disk.
– furas, Commented Sep 17, 2021 at 20:25

Prarthan Ramesh · Accepted Answer · 2021-09-17 16:52:11Z

1

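import pandas as pd
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]
# Split each line on tabs so every field becomes its own column.
df = pd.DataFrame([x.split('\t') for x in test])
print(df)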

If you want the first row to become your header:

df.columns = df.iloc[0]
df = df[1:]
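A possible tidy-up after promoting the header, assuming you want a 0-based index and no leftover name on the column index. The column Index may carry the name 0 inherited from the row Series df.iloc[0], and the remaining rows keep the labels 1 and 2; rename_axis and reset_index are one way to clean both up.

```python
import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]
df = pd.DataFrame([x.split('\t') for x in test])

# Promote the first row to the header, then drop it from the data.
df.columns = df.iloc[0]
df = df[1:]

# Clear any name left on the column Index and renumber the rows from 0.
df = df.rename_axis(None, axis="columns").reset_index(drop=True)
print(df)
```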

edited Sep 17, 2021 at 16:52

answered Sep 17, 2021 at 16:19

Prarthan Ramesh



1 Comment

furas Over a year ago

You could also first convert it to a nested list, data = [line.split('\t') for line in test], and then use it in df = pd.DataFrame(data[1:], columns=data[0]).

furas · Accepted Answer · 2021-09-18 12:19:09Z

0

It seems simpler to convert it to a nested list, as in the other answer:

import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]

data = [line.split('\t') for line in test]

df = pd.DataFrame(data[1:], columns=data[0])
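One difference worth noting: str.split() returns strings, so the nested-list approach leaves every column holding text, whereas read_csv infers numeric types. A minimal sketch of converting the numeric columns explicitly, assuming Col1 should be an integer and Col3 a float (the names raw and df are only for illustration):

```python
import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]
data = [line.split('\t') for line in test]
raw = pd.DataFrame(data[1:], columns=data[0])

# All cells are still strings here, e.g. raw["Col1"] holds "12" and "15".
print(raw.dtypes)

# Convert the numeric columns; Col2 stays as text.
df = raw.astype({"Col1": int, "Col3": float})
print(df.dtypes)
```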

But you can also join it back into a single string (or receive it directly as a single string from a file, socket or network) and then use io.StringIO (for text) or io.BytesIO (for bytes) to simulate a file in memory.

import pandas as pd
import io

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]

single_string = "\n".join(test)

file_like_object = io.StringIO(single_string)

df = pd.read_csv(file_like_object, sep='\t')

Or, more concisely:

df = pd.read_csv(io.StringIO("\n".join(test)), sep='\t')

This approach is common when you receive data from the network (socket, web API) as a single string or as raw bytes.
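For the bytes case, a minimal sketch with io.BytesIO. The variable payload is a hypothetical stand-in for bytes received from a socket or HTTP response, and UTF-8 encoding is an assumption about that source.

```python
import io

import pandas as pd

# Hypothetical payload: the example rows as UTF-8 bytes, as a network read might return them.
payload = "\n".join(["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]).encode("utf-8")

# BytesIO wraps the bytes in a binary file-like object; read_csv decodes it with `encoding`.
df = pd.read_csv(io.BytesIO(payload), sep="\t", encoding="utf-8")
print(df)
```

Unlike the nested-list approach, read_csv infers the column types here, so Col1 comes back as integers and Col3 as floats.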

edited Sep 18, 2021 at 12:19

answered Sep 17, 2021 at 20:30

furas


2 Comments

Chris Over a year ago

Thank you, the first solution works fine. Unfortunately, the io variant gives me only one column in my dataframe, which contains Col1, Col2 and Col3 together.

furas Over a year ago

I forgot sep='\t', which is needed here just as with a normal file.
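The single-column symptom from the comment above can be reproduced with the example list: read_csv defaults to sep=',', finds no commas, and keeps each line as one field. The names no_sep, with_sep and via_table are only for illustration; read_table is shown because its default separator is already '\t'.

```python
import io

import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]
text = "\n".join(test)

# Default sep=',' does not split on tabs: one column, two data rows.
no_sep = pd.read_csv(io.StringIO(text))
print(no_sep.shape)  # (2, 1)

# Passing sep='\t' splits each line into three fields.
with_sep = pd.read_csv(io.StringIO(text), sep="\t")
print(with_sep.shape)  # (2, 3)

# read_table behaves like read_csv with sep='\t' as the default.
via_table = pd.read_table(io.StringIO(text))
print(via_table.shape)  # (2, 3)
```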
