loading csv file using pandas in python

Question

Here is my sample data:

2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"

I tried to load the data with pandas using:

data = pd.read_csv("sample.csv",header = None)

My output is:

                0                 1           2
0  2017-11-27T00:29:37.698-06:00 NaN  42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
1  2017-11-27T00:29:37.698-06:00 NaN  42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
2  2017-11-27T00:29:37.698-06:00 NaN  42,00,00,00,3E,51,1B,D7,42,1C,00,00,40

I want to split the comma-separated values in the quoted column into separate columns, keeping the first column as the timestamp.

My expected output would be:

    0                             1  2  3  4....
0  2017-11-27T00:29:37.698-06:00  42 00 00 00
1  2017-11-27T00:29:37.698-06:00  42 00 00 00
2  2017-11-27T00:29:37.698-06:00  42 00 00 00

@Dark sorry for the confusion! I have updated the question again.
– gokyori, Commented Jan 14, 2018 at 5:19

Stephen Rauch · Accepted Answer · 2018-01-14 05:39:12Z

3

If needed, you can write your own CSV parser, like this:

Code:

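def read_my_csv(filename):
    # newline='' lets the csv module handle line endings itself
    # (the old 'rU' mode was removed in Python 3.11)
    with open(filename, newline='') as f:

        # build csv reader; it keeps the quoted third field as one string
        reader = csv.reader(f)

        # for each row, keep the timestamp and split the quoted field
        for row in reader:
            yield [row[0]] + row[2].split(',')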

Test Code:

import csv
import pandas as pd

df = pd.DataFrame(read_my_csv('csvfile.csv'))
print(df)

Results:

                                  0   1   2   3   4   5   6   7   8   9   10  \
0      2017-11-27T00:29:37.698-06:00  42  00  00  00  3E  51  1B  D7  42  1C   
1      2017-11-27T00:29:37.698-06:00  42  00  00  00  3E  51  1B  D7  42  1C   
2      2017-11-27T00:29:37.698-06:00  42  00  00  00  3E  51  1B  D7  42  1C   

   11  12  13  
0  00  00  40  
1  00  00  40  
2  00  00  40
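
For anyone who wants to try this without creating a file, here is a minimal, self-contained sketch of the same idea. It assumes the three sample rows from the question and feeds them through `io.StringIO`; the names `SAMPLE` and `split_rows` are made up for illustration and are not part of the answer above.

```python
import csv
import io

import pandas as pd

# The three sample rows from the question, held in memory instead of a file.
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
)


def split_rows(lines):
    """Yield [timestamp, byte1, byte2, ...] for each CSV row."""
    # csv.reader honours the double quotes, so row[2] is the whole
    # "42,00,...,40" string and row[1] is the empty middle field.
    for row in csv.reader(lines):
        yield [row[0]] + row[2].split(',')


df = pd.DataFrame(split_rows(io.StringIO(SAMPLE)))
print(df.shape)
```

Every value in the resulting frame is still a string, so any numeric conversion has to happen in a later step.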

edited Jan 14, 2018 at 5:39

answered Jan 14, 2018 at 5:24

Stephen Rauch♦


1 Comment

Anton vBR Over a year ago

Nice but isn’t it possible to move the with open to inside the function?

cs95 · Accepted Answer · 2018-01-14 05:26:04Z

3

Pass a sep argument with a regular expression. Afterwards, do a little cleanup on the data.

df = pd.read_csv(
      'file.csv', 
      sep='"*,',           # separator
      header=None,         # no headers
      engine='python',     # allows a regex with multiple characters
      index_col=[0]        # specify timestamp as the index
)   

df.iloc[:, 1] = df.iloc[:, 1].str.strip('"').astype(int)
df.iloc[:, -1] = df.iloc[:, -1].str.strip('"').astype(int)

df

                               1   2   3   4   5   6   7   8   9   10  11  12  \
0                                                                               
2017-11-27T00:29:37.698-06:00 NaN  42   0   0   0  3E  51  1B  D7  42  1C   0   
2017-11-27T00:29:37.698-06:00 NaN  42   0   0   0  3E  51  1B  D7  42  1C   0   
2017-11-27T00:29:37.698-06:00 NaN  42   0   0   0  3E  51  1B  D7  42  1C   0   

                               13  14  
0                                      
2017-11-27T00:29:37.698-06:00   0  40  
2017-11-27T00:29:37.698-06:00   0  40  
2017-11-27T00:29:37.698-06:00   0  40
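
To see why the cleanup step is needed, it can help to apply the same regular expression to one line by hand. The sketch below uses `re.split` as a rough stand-in for what a regex separator does; it is an illustration, not a description of pandas internals. The pattern `"*,` matches a comma together with any quotes directly before it, so the opening quote on the first byte and the closing quote on the last byte are left in place.

```python
import re

# One line of the sample data from the question.
line = '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"'

# Split on "zero or more double quotes followed by a comma".
fields = re.split('"*,', line)

# The first and last byte fields still carry a stray quote ...
print(fields[2], fields[-1])

# ... which is what the str.strip('"') calls remove.
cleaned = [f.strip('"') for f in fields]
print(cleaned[2], cleaned[-1])
```

The empty second field is also visible here as `fields[1]`, which is the column that shows up as NaN in the frame.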

To drop the column of NaNs, use dropna:

df.dropna(how='all', axis=1, inplace=True)
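
One caveat about the `astype(int)` calls: they work on this sample because the two stripped columns contain `42` and `40`, which happen to look like decimal numbers. If the fields are actually hexadecimal bytes (an assumption, since the question does not say so), values such as `3E` cannot be converted that way. A minimal sketch of a base-16 conversion, using a small made-up frame named `raw` in place of the parsed columns:

```python
import pandas as pd

# Hypothetical stand-in for a few of the split byte columns,
# using values that appear in the sample row.
raw = pd.DataFrame([["42", "00", "3E", "D7"]] * 3)

# int(s, 16) parses each string as a base-16 number.
decoded = raw.apply(lambda col: col.map(lambda s: int(s, 16)))
print(decoded)
```

Under that reading, `42` decodes to 66 rather than 42, so it is worth deciding which interpretation is intended before converting.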

edited Jan 14, 2018 at 5:26

answered Jan 14, 2018 at 5:19

cs95


4 Comments

jezrael Over a year ago

Downvoter, if there's something wrong with this answer, please let us know so it can be corrected. Thanks.

cs95 Over a year ago

@jezrael Thanks, I just wanted to confirm. I guess it's the same user who has been mass downvoting my answers. They've downvoted 4 of my answers so far. I believe the moderators should be able to handle it from there. They'll record the user's misconduct.

jezrael Over a year ago

I have had the same experience, but the best approach is a single comment on Meta about it. Don't panic. I had no reason to downvote.

jezrael Over a year ago

Unfortunately, I think this must be some angry user(s) :(

jezrael · Accepted Answer · 2018-01-14 05:30:05Z

3

First, add the parameter parse_dates=[0] to parse the first column as datetimes.

Then split column 2 on commas, shift the new column labels up by 1 with rename, and join the result to the original frame after dropping columns 1 and 2:

df = pd.read_csv("sample.csv",header = None, parse_dates=[0])

df = (df.drop([1,2], axis=1)
        .join(df[2].str.split(',', expand=True)
        .rename(columns = lambda x: x+1))   
      )  
print (df)
                       0   1   2   3   4   5   6   7   8   9   10  11  12  13
0 2017-11-27 06:29:37.698  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
1 2017-11-27 06:29:37.698  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
2 2017-11-27 06:29:37.698  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40

Detail

print (df[2].str.split(',', expand=True))
   0   1   2   3   4   5   6   7   8   9   10  11  12
0  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
1  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
2  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
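
The same steps can be tried without a file. The sketch below is a variation under stated assumptions, not the exact code above: it reads the question's sample rows from `io.StringIO`, and it converts the timestamps with `pd.to_datetime(..., utc=True)` instead of `parse_dates`, because how `parse_dates` treats the `-06:00` offset may depend on the pandas version. The names `SAMPLE`, `timestamps`, `parts` and `out` are illustrative only.

```python
import io

import pandas as pd

# The three sample rows from the question, held in memory instead of a file.
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
)

df = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Convert explicitly to UTC: 00:29:37.698 at -06:00 is 06:29:37.698 UTC.
timestamps = pd.to_datetime(df[0], utc=True)

# Split the quoted field into one column per byte (labels 0..12),
# then relabel them 1..13 so they do not collide with column 0.
parts = df[2].str.split(',', expand=True)
parts.columns = range(1, parts.shape[1] + 1)

# Put the timestamp column and the byte columns side by side.
out = pd.concat([timestamps, parts], axis=1)
print(out.shape)
```

The explicit UTC conversion also accounts for the `06:29:37.698` in the printed result above: it is the same instant as the `00:29:37.698-06:00` in the file.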

edited Jan 14, 2018 at 5:30

answered Jan 14, 2018 at 5:20

jezrael


2 Comments

cs95 Over a year ago

Seems like my answer has been downvoted. You see anything wrong with it?

jezrael Over a year ago

Not me, I don't know... Your answer is OK, there's no reason for it.
