
I have a CSV file with this structure:

Last Name   First Name  Start Date  End Date            
Example     Eva         1.1.2021    15.6.2021
                                        
Here is some random information.                                        
                                        
------- Header-------                       
Index   Date    Time        Reading
0   10.4.2021   16:26:01    0,1             
1   10.4.2021   16:25:44    0,1             
2   10.4.2021   16:00:00    0,1             
3   10.4.2021   16:00:00    0,1             
4   10.4.2021   14:00:00    0,1             
5   10.4.2021   14:00:00    0,1             
6   10.4.2021   13:00:00    0,3             

------- Header------- 
Index   Date    Time        Reading
0   10.4.2021   16:26:01    0,1             
1   10.4.2021   16:25:44    0,1             
2   10.4.2021   16:00:00    0,1             
3   10.4.2021   16:00:00    0,1             
4   10.4.2021   14:00:00    0,1             
5   10.4.2021   14:00:00    0,1             
6   10.4.2021   13:00:00    0,3

I want to read the file using pandas and build a dictionary from the data, for example {'last_name': 'Example', 'first_name': 'Eva'} and so on. How can I read specific values into variables? At the moment, I read the CSV file like this: data = pd.read_csv(file, sep='delimiter').

Comment: So, you don't care about everything after the first space? (Aug 31, 2021 at 12:55)

1 Answer


header

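If you only want to read the beginning of the file as a dictionary, read just the first data row, using a separator of two or more whitespace characters so that multi-word labels such as "Last Name" stay in one field: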

import pandas as pd
pd.read_csv('filename.csv', sep=r'\s\s+', engine='python', nrows=1).loc[0].to_dict()

output:

{'Last Name': 'Example',
 'First Name': 'Eva',
 'Start Date': '1.1.2021',
 'End Date': '15.6.2021'}
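The question asks for snake_case keys such as 'last_name' rather than the raw labels. One way to get there is to post-process the keys of the dictionary above. This is a minimal sketch: the helper name to_snake_case is hypothetical, and the sample text assumes the fields are separated by two or more spaces, as in the file shown in the question.

```python
import io
import re

import pandas as pd

# Sample text mirroring the first two lines of the file in the question
# (assumption: fields are separated by two or more spaces)
SAMPLE = (
    "Last Name   First Name  Start Date  End Date\n"
    "Example     Eva         1.1.2021    15.6.2021\n"
)


def to_snake_case(label: str) -> str:
    """Hypothetical helper: turn a label such as 'Last Name' into 'last_name'."""
    return re.sub(r"\s+", "_", label.strip()).lower()


# Read only the first data row, then rename its keys
row = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s\s+", engine="python", nrows=1).loc[0]
person = {to_snake_case(key): value for key, value in row.to_dict().items()}
print(person)
# {'last_name': 'Example', 'first_name': 'Eva', 'start_date': '1.1.2021', 'end_date': '15.6.2021'}
```

With a real file, replace io.StringIO(SAMPLE) with the file name.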

rest of the file

To read the rest of the file:

df = (pd.read_csv('filename.csv',
                  sep=r'\s+',
                  skiprows=6,
                  index_col=0,
                 )
        .drop(['Index', '-------'])  # get rid of the repeated headers
     )

output:

            Date      Time Reading
Index                             
0      10.4.2021  16:26:01     0,1
1      10.4.2021  16:25:44     0,1
2      10.4.2021  16:00:00     0,1
3      10.4.2021  16:00:00     0,1
4      10.4.2021  14:00:00     0,1
5      10.4.2021  14:00:00     0,1
6      10.4.2021  13:00:00     0,3
0      10.4.2021  16:26:01     0,1
1      10.4.2021  16:25:44     0,1
2      10.4.2021  16:00:00     0,1
3      10.4.2021  16:00:00     0,1
4      10.4.2021  14:00:00     0,1
5      10.4.2021  14:00:00     0,1
6      10.4.2021  13:00:00     0,3
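All columns in this result are text: because the repeated header rows are parsed as data, Date, Time and Reading come back as strings such as "0,1". If you need numbers and timestamps, a minimal sketch could convert them afterwards. It assumes day.month.year dates and a decimal comma in the readings; the Timestamp column name is only illustrative, and the sample text is trimmed to two rows per block.

```python
import io

import pandas as pd

# Sample text mirroring the layout in the question, trimmed to two rows per block
SAMPLE = """Last Name   First Name  Start Date  End Date
Example     Eva         1.1.2021    15.6.2021

Here is some random information.

------- Header-------
Index   Date    Time        Reading
0   10.4.2021   16:26:01    0,1
1   10.4.2021   13:00:00    0,3

------- Header-------
Index   Date    Time        Reading
0   10.4.2021   16:26:01    0,1
1   10.4.2021   13:00:00    0,3
"""

df = (
    pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", skiprows=6, index_col=0)
    .drop(["Index", "-------"])  # get rid of the repeated headers
)

# Readings use a decimal comma ("0,1"), so swap it for a dot before converting
df["Reading"] = df["Reading"].str.replace(",", ".", regex=False).astype(float)

# Combine the day-first date and the time into one datetime column
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d.%m.%Y %H:%M:%S")
print(df[["Timestamp", "Reading"]])
```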

If you need to determine programmatically the number of lines to skip:

# Count the lines up to and including the first "-------" separator line
with open('filename.csv') as f:
    for skip, line in enumerate(f, start=1):
        if line.startswith('-------'):
            break

skip: 6
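To tie the two steps together, the computed skip can be passed straight to read_csv. This is a sketch wrapped in a hypothetical helper, read_readings; it assumes, as above, that every separator line starts with "-------" and that the columns are whitespace-separated.

```python
import pandas as pd


def read_readings(path: str) -> pd.DataFrame:
    """Hypothetical helper: read all measurement blocks from a file laid out like the one above."""
    # Count the lines up to and including the first separator line ("-------...")
    with open(path) as f:
        for skip, line in enumerate(f, start=1):
            if line.startswith("-------"):
                break
        else:
            raise ValueError(f"no '-------' separator line found in {path}")

    df = pd.read_csv(path, sep=r"\s+", skiprows=skip, index_col=0)
    # Later blocks repeat the separator and the column header as data rows; drop them
    # (errors="ignore" keeps files with a single block working)
    return df.drop(["Index", "-------"], errors="ignore")
```

Usage: df = read_readings('filename.csv').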


2 Comments

Thank you @mozway, this is definitely the right direction. My goal is to get a clean dict out of the file with only the information I need. Let's say I want only the name, dates and readings, structured like this: {'last_name': 'Example', 'first_name': 'Eva', 'measurements': [{'date': 'some_date', 'reading': 'some_reading'}, ...]}. How can I iterate through the columns and get only the ones I need?
Difficult to answer without knowing the exact format ;) As this is a different question, I suggest you give it a try first and start a new question if need be.
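For the nested structure asked about in the comment, one possible sketch combines the two reads from the answer: the first data row supplies the name, and the measurement table is reduced to the wanted columns with to_dict(orient="records"). It assumes the same layout as the file in the question (six lines before the column header, a single measurement block in this trimmed sample); the key names are taken from the comment, and the readings are left as the original strings.

```python
import io

import pandas as pd

# Assumed layout: the same as the file in the question, trimmed to one short block
SAMPLE = """Last Name   First Name  Start Date  End Date
Example     Eva         1.1.2021    15.6.2021

Here is some random information.

------- Header-------
Index   Date    Time        Reading
0   10.4.2021   16:26:01    0,1
1   10.4.2021   13:00:00    0,3
"""

# Person details: the first data row of the file
person = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s\s+", engine="python", nrows=1).loc[0]

# Measurements: everything after the first separator line (6 lines in this layout)
readings = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", skiprows=6, index_col=0)
readings = readings.drop(["Index", "-------"], errors="ignore")  # repeated headers, if any

record = {
    "last_name": person["Last Name"],
    "first_name": person["First Name"],
    # Keep only the wanted columns, rename them, and turn each row into a dict
    "measurements": (
        readings[["Date", "Reading"]]
        .rename(columns={"Date": "date", "Reading": "reading"})
        .to_dict(orient="records")
    ),
}
print(record)
```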
