Python Data Frame - nested comma in csv file

Question

The data file is like given below. How shall I read through data frame?

'''
 [[2020,1,22],0,0,0], 
 [[2020,1,23],0,0,0], 
 [[2020,1,24],0,0,0], 
 [[2020,1,25],0,0,0], 
 [[2020,1,26],0,0,0], 
 [[2020,1,27],0,0,0], 

'''

You could prepare and modify the input with a script to better suit the CSV format: separate the year, month and day into individual columns, remove all square brackets, and remote the trailing comma. Then read it as a standard CSV file. — 9769953
– 9769953, Commented May 12, 2021 at 20:00
I modified like this and then using quotechar ='~' ~2020,1,22~,0,0,0, ~2020,1,23~,0,0,0, ~2020,1,24~,0,0,0, ~2020,1,25~,0,0,0, ~2020,1,26~,0,0,0, — Kalai
– Kalai, Commented May 12, 2021 at 20:16

tdy · Accepted Answer · 2021-05-12 20:46:59Z

1

Read the data as a single column of strings:

df = pd.read_fwf('data.txt', header=None)

# or read as csv with sep='\n'
# df = pd.read_csv('data.txt', sep='\n', header=None)

Parse the list-looking strings into actual lists with ast.literal_eval and expand them into columns with apply(pd.Series):

from ast import literal_eval
df = df[0].str.strip(', ').apply(literal_eval).apply(pd.Series)

Convert the date lists to real datetimes:

df[0] = df[0].agg(lambda x: pd.to_datetime('-'.join(map(str, x))))

Output:

           0  1  2  3
0 2020-01-22  0  0  0
1 2020-01-23  0  0  0
2 2020-01-24  0  0  0
3 2020-01-25  0  0  0
4 2020-01-26  0  0  0
5 2020-01-27  0  0  0

edited May 12, 2021 at 20:46

answered May 12, 2021 at 20:14

tdy

42k42 gold badges124 silver badges125 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Kalai Over a year ago

Thanks. I am new to this forum. Need 15 reputations it seems. Accepted your answer. Thanks again.

jjramsey Over a year ago

One catch with read_fwf is that it's for fixed-width formats, but there's no guarantee that the lines will all be the same width. For example, the month that is 1 on one line could be 11 on another, or the zeros in the 2nd through 4th columns could be replaced with any other numbers.

tdy Over a year ago

@jjramsey it's flexible enough to handle that unless you explicitly specify colspecs or widths. the main caveat here is if there are spaces in the original data, in which case read_fwf will infer more columns

tdy Over a year ago

read_csv with sep='\n' also works if read_fwf gives you trouble

Kalai Over a year ago

tdy and jjramsey , you guys are right. fwf gave problem at the third line mentioned below. [[2020,5,8],3344,1111,96], [[2020,5,9],3113,1414,116], [[2020,5,10],4353,1668,111], used csv instead. But didn't give problem here [[2020,2,9],0,0,0], [[2020,2,10],0,0,0], It worked. Thanks.

Collectives™ on Stack Overflow

Python Data Frame - nested comma in csv file

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related