1

The data file is like given below. How shall I read through data frame?

'''
 [[2020,1,22],0,0,0], 
 [[2020,1,23],0,0,0], 
 [[2020,1,24],0,0,0], 
 [[2020,1,25],0,0,0], 
 [[2020,1,26],0,0,0], 
 [[2020,1,27],0,0,0], 

'''
3
  • You could prepare and modify the input with a script to better suit the CSV format: separate the year, month and day into individual columns, remove all square brackets, and remote the trailing comma. Then read it as a standard CSV file. Commented May 12, 2021 at 20:00
  • 1
    What have you tried so far ? Commented May 12, 2021 at 20:04
  • I modified like this and then using quotechar ='~' ~2020,1,22~,0,0,0, ~2020,1,23~,0,0,0, ~2020,1,24~,0,0,0, ~2020,1,25~,0,0,0, ~2020,1,26~,0,0,0, Commented May 12, 2021 at 20:16

1 Answer 1

1

Read the data as a single column of strings:

df = pd.read_fwf('data.txt', header=None)

# or read as csv with sep='\n'
# df = pd.read_csv('data.txt', sep='\n', header=None)

Parse the list-looking strings into actual lists with ast.literal_eval and expand them into columns with apply(pd.Series):

from ast import literal_eval
df = df[0].str.strip(', ').apply(literal_eval).apply(pd.Series)

Convert the date lists to real datetimes:

df[0] = df[0].agg(lambda x: pd.to_datetime('-'.join(map(str, x))))

Output:

           0  1  2  3
0 2020-01-22  0  0  0
1 2020-01-23  0  0  0
2 2020-01-24  0  0  0
3 2020-01-25  0  0  0
4 2020-01-26  0  0  0
5 2020-01-27  0  0  0
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks. I am new to this forum. Need 15 reputations it seems. Accepted your answer. Thanks again.
One catch with read_fwf is that it's for fixed-width formats, but there's no guarantee that the lines will all be the same width. For example, the month that is 1 on one line could be 11 on another, or the zeros in the 2nd through 4th columns could be replaced with any other numbers.
@jjramsey it's flexible enough to handle that unless you explicitly specify colspecs or widths. the main caveat here is if there are spaces in the original data, in which case read_fwf will infer more columns
read_csv with sep='\n' also works if read_fwf gives you trouble
tdy and jjramsey , you guys are right. fwf gave problem at the third line mentioned below. [[2020,5,8],3344,1111,96], [[2020,5,9],3113,1414,116], [[2020,5,10],4353,1668,111], used csv instead. But didn't give problem here [[2020,2,9],0,0,0], [[2020,2,10],0,0,0], It worked. Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.