Create dataframe from a string Python

Question

How do I create a dataframe from a string that look like this (part of the string)

,file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,

Trying to make a dataframe that look like this

bb1 · Accepted Answer · 2021-02-13 09:10:05Z

2

In the code below s is the string:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(s)).dropna(axis=1)
df.rename(columns={df.columns[0]: ""}, inplace=True)

By the way, if the string comes from a csv file then it is simpler to read the file directly using pd.read_csv.

Edit: This code will create a multiindex of columns:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(s), header = None).dropna(how="all", axis=1).T
df[0] = df.loc[1, 0]
df = df.set_index([0, 1]).T

edited Feb 13, 2021 at 9:10

answered Feb 13, 2021 at 8:52

bb1

7,9232 gold badges11 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Joe Ferndz Over a year ago

Very interesting. Thanks for sharing this. I didnt think read_csv will figure out the additional commas in first row.

Joe Ferndz Over a year ago

the multi-index solution is even better. Thanks for updating it. I was going to recommend the same

Joe Ferndz · Accepted Answer · 2021-02-13 08:50:11Z

Looks like you want a multi-level dataframe from the string. Here's how I would do it.

Step 1: Split the string by '\r\n'. Then for each value, split by ','
Step 2: The above step will create a list of list. Element #0 has 4 items and element #1 has 2 items. The rest have 3 items each and is the actual data
Step 3: Convert the data into a dictionary from element #3 onwards. Use values in element #2 as keys for the dictionary (namely x data and y data). To ensure you have key:[list of values], use the dict.setdefault(key,[]).append(value). This will ensure the data is created as a `key:[list of values]' dictionary.
Step 4: Create a normal dataframe using the dictionary as all the values are stored as key and values in the dictionary.
Step 5: Now that you have the dictionary, you want to create the MultiIndex. Convert the column to MultiIndex.

Putting all this together, the code is:

import pandas as pd

text = ',file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,'

line_text = [txt.split(',') for txt in text.split('\r\n')]

dct = {}

for x,y,z in line_text[2:]:
    dct.setdefault(line_text[1][0], []).append(x)
    dct.setdefault(line_text[1][1], []).append(y)

df = pd.DataFrame(dct)

df.columns = pd.MultiIndex.from_tuples([(line_text[0][i],line_text[1][i]) for i in [0,1]])

print (df)

Output of this will be:

              file_05
   x data      y data
0  -970.0   -34.12164
1  -959.0   -32.37526
2  -949.0  -30.360199
3  -938.0   -28.74816
4  -929.0   -27.53912
5  -920.0   -25.92707
6  -911.0   -24.31503
7  -900.0   -23.64334
8  -891.0   -22.29997

William Mou · Accepted Answer · 2021-02-13 08:43:07Z

0

You should convert your raw data to a table with python.
Save to csv file by import csv package with python.



from pandas import DataFrame

# s is raw datas
s = ",file_05,,\r\nx data,y data\r\n-970.0,-34.12164,\r\n-959.0,-32.37526,\r\n-949.0,-30.360199,\r\n-938.0,-28.74816,\r\n-929.0,-27.53912,\r\n-920.0,-25.92707,\r\n-911.0,-24.31503,\r\n-900.0,-23.64334,\r\n-891.0,-22.29997,"

# convert raw data to a table
table = [i.split(',') for i in s.split("\r\n")]
table = [i[:2] for i in table]

# table is like
"""
[['', 'file_05'],
 ['x data', 'y data'],
 ['-970.0', '-34.12164'],
 ['-959.0', '-32.37526'],
 ['-949.0', '-30.360199'],
  ...
 ['-891.0', '-22.29997']]
 """

# save to output.csv file 
import csv
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(table)
    
# Save to DataFrame df
from pandas import DataFrame
df = DataFrame (table[2:],columns=table[1][:2])
print(df)

edited Feb 13, 2021 at 8:43

answered Feb 13, 2021 at 8:21

William Mou

3312 silver badges6 bronze badges

1 Comment

Joe Ferndz Over a year ago

OP wants to create a dataframe not write to a file

Collectives™ on Stack Overflow

Create dataframe from a string Python

3 Answers 3

2 Comments

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related