
Here is my sample data:

2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"

I tried to load the data with pandas using:

import pandas as pd

data = pd.read_csv("sample.csv", header=None)

My output is:

                0                 1           2
0  2017-11-27T00:29:37.698-06:00 NaN  42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
1  2017-11-27T00:29:37.698-06:00 NaN  42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
2  2017-11-27T00:29:37.698-06:00 NaN  42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
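
A self-contained reproduction of the output above, using io.StringIO in place of sample.csv (the SAMPLE string is an assumption standing in for the file). read_csv follows CSV quoting rules, so everything between the double quotes stays together as one field, and the empty middle field becomes NaN:

```python
import io

import pandas as pd

# The three sample rows from the question, held in memory instead of sample.csv
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
)

data = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Only three columns: the quoted byte list is kept as a single string in column 2
print(data.shape)
print(data.loc[0, 2])
```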

I want to split each value in the last column (column 2) into its own column, keeping the first column as the timestamp.

My expected output would be:

    0                             1  2  3  4....
0  2017-11-27T00:29:37.698-06:00  42 00 00 00
1  2017-11-27T00:29:37.698-06:00  42 00 00 00
2  2017-11-27T00:29:37.698-06:00  42 00 00 00

  • So what's the expected output? Commented Jan 14, 2018 at 5:14
  • @Dark Sorry for the confusion! I have updated the question again. Commented Jan 14, 2018 at 5:19

3 Answers

If needed, you can write your own CSV parser, like this:

Code:

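import csv

def read_my_csv(filename):
    # newline='' is what the csv module expects; the old 'rU' mode no longer works on Python 3.11+
    with open(filename, newline='') as f:

        # build csv reader; it honors the double quotes around the byte list
        reader = csv.reader(f)

        # for each row, keep the timestamp and split the quoted field into separate values
        for row in reader:
            yield [row[0]] + row[2].split(',')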

Test Code:

import csv
import pandas as pd

df = pd.DataFrame(read_my_csv('csvfile.csv'))
print(df)

Results:

                                  0   1   2   3   4   5   6   7   8   9   10  \
0      2017-11-27T00:29:37.698-06:00  42  00  00  00  3E  51  1B  D7  42  1C   
1      2017-11-27T00:29:37.698-06:00  42  00  00  00  3E  51  1B  D7  42  1C   
2      2017-11-27T00:29:37.698-06:00  42  00  00  00  3E  51  1B  D7  42  1C   

   11  12  13  
0  00  00  40  
1  00  00  40  
2  00  00  40  
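
The generator can also be tried without a file on disk. This sketch is a variant of read_my_csv that takes any iterable of lines instead of a filename (the name read_my_csv_lines and the in-memory SAMPLE string are hypothetical), so an io.StringIO holding the sample rows works; with a real file you would pass the open file object instead:

```python
import csv
import io

import pandas as pd

# The question's sample rows, held in memory instead of csvfile.csv
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
)


def read_my_csv_lines(lines):
    """Yield [timestamp, byte1, byte2, ...] for each CSV row in `lines`."""
    for row in csv.reader(lines):
        # row[0] is the timestamp, row[1] the empty field, row[2] the quoted byte list
        yield [row[0]] + row[2].split(',')


df = pd.DataFrame(read_my_csv_lines(io.StringIO(SAMPLE)))
print(df)
```

Taking lines rather than a filename keeps the parsing logic independent of where the data comes from, which also makes it easy to test.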

1 Comment

Nice, but isn't it possible to move the with open inside the function?

Pass a sep argument with a regular expression. Afterwards, do a little cleanup on the data.

import pandas as pd

df = pd.read_csv(
      'file.csv',
      sep='"*,',           # regex: any double quotes followed by a comma
      header=None,         # no headers
      engine='python',     # allows a regex with multiple characters
      index_col=[0]        # use the timestamp as the index
)

# the regex split leaves a stray quote on the first and last byte columns ("42 and 40")
df.iloc[:, 1] = df.iloc[:, 1].str.strip('"').astype(int)
df.iloc[:, -1] = df.iloc[:, -1].str.strip('"').astype(int)

df

                               1   2   3   4   5   6   7   8   9   10  11  12  \
0                                                                               
2017-11-27T00:29:37.698-06:00 NaN  42   0   0   0  3E  51  1B  D7  42  1C   0   
2017-11-27T00:29:37.698-06:00 NaN  42   0   0   0  3E  51  1B  D7  42  1C   0   
2017-11-27T00:29:37.698-06:00 NaN  42   0   0   0  3E  51  1B  D7  42  1C   0   

                               13  14  
0                                      
2017-11-27T00:29:37.698-06:00   0  40  
2017-11-27T00:29:37.698-06:00   0  40  
2017-11-27T00:29:37.698-06:00   0  40  

To drop the all-NaN column, use dropna:

df.dropna(how='all', axis=1, inplace=True)
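
The same regex separator can keep the bytes exactly as written (42, 00, 3E, ...) if type inference is turned off with dtype=str, which is closer to the expected output in the question. A minimal sketch, assuming the sample rows are held in an in-memory string (SAMPLE) rather than file.csv. It relies on the python engine splitting each line with the regex and ignoring the CSV quotes, which is why the stray quotes are stripped afterwards:

```python
import io

import pandas as pd

# The question's sample rows, held in memory instead of file.csv
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
)

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep='"*,',        # regex: any double quotes followed by a comma
    header=None,      # no header row
    engine='python',  # the C engine does not accept regex separators
    index_col=0,      # the timestamp becomes the index
    dtype=str,        # keep "00", "3E", etc. as text instead of inferring numbers
)

# Column 1 is the empty field between the two commas, so it is all NaN
df = df.dropna(how='all', axis=1)

# The regex split leaves a quote on the first and last byte ('"42' and '40"')
df = df.apply(lambda col: col.str.strip('"'))

# Renumber the byte columns 1..13
df.columns = range(1, df.shape[1] + 1)
print(df)
```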

4 Comments

Downvoter, if there's something wrong with this answer, please let me know so it can be corrected. Thanks.
@jezrael Thanks, I just wanted to confirm. I guess it's the same user who has been mass downvoting my answers. They've downvoted 4 of my answers so far. I believe the moderators should be able to handle it from there. They'll record the user's misconduct.
I have had the same experience, but the best thing is to post one comment about it on Meta. Don't panic. I see no reason for it.
Unfortunately, I think this must be some angry user(s). :(

First, add the parameter parse_dates=[0] to parse the first column as datetime.

Then drop columns 1 and 2, join the result of splitting the original column 2 on commas, and finally rename the new columns by adding 1 to each label:

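import pandas as pd

df = pd.read_csv("sample.csv", header=None, parse_dates=[0])

# drop the empty column and the unsplit byte list, then attach one column per byte
df = (df.drop([1, 2], axis=1)
        .join(df[2].str.split(',', expand=True)
                   .rename(columns=lambda x: x + 1)))
print(df)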
                       0   1   2   3   4   5   6   7   8   9   10  11  12  13
0 2017-11-27 06:29:37.698  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
1 2017-11-27 06:29:37.698  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
2 2017-11-27 06:29:37.698  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
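
A self-contained version of the same split-and-join idea, reading from an in-memory string (SAMPLE, an assumption standing in for sample.csv). parse_dates is left out here so the timestamp stays as text: how pandas parses timestamps that carry a UTC offset has changed across versions, so the datetime column above may print differently on a newer release.

```python
import io

import pandas as pd

# The question's sample rows, held in memory instead of sample.csv
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
)

df = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Split the quoted byte list (column 2) into one column per byte, labelled 1..13
bytes_df = df[2].str.split(',', expand=True).rename(columns=lambda x: x + 1)

# Keep the timestamp (column 0), drop the empty and unsplit columns, attach the bytes
df = df.drop([1, 2], axis=1).join(bytes_df)
print(df)
```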

Detail (splitting column 2 of the DataFrame as originally read, before the join):

print(df[2].str.split(',', expand=True))
   0   1   2   3   4   5   6   7   8   9   10  11  12
0  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
1  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
2  42  00  00  00  3E  51  1B  D7  42  1C  00  00  40
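
If the split values are meant as hexadecimal bytes (an assumption: entries such as 3E and D7 suggest it, but the question does not say), they stay as strings after str.split, and a plain astype(int), as in the regex answer, would read "42" as decimal 42 rather than 0x42. A sketch that parses each value with base 16 instead, using one sample row:

```python
import pandas as pd

# One row of byte strings, split the same way as in the detail above
split_bytes = pd.Series(["42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"]).str.split(',', expand=True)

# Assumption: every value is a hex byte, so parse it with int(value, 16)
as_ints = split_bytes.apply(lambda col: col.map(lambda value: int(value, 16)))
print(as_ints.iloc[0].tolist())  # [66, 0, 0, 0, 62, 81, 27, 215, 66, 28, 0, 0, 64]
```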

2 Comments

Seems like my answer has been downvoted. You see anything wrong with it?
Not me, I don't know... Your answer is OK; I see no reason for it.
