How to write a Python script to open a csv file correctly

Question

I have a CSV file that I cannot read properly: it is separated by semicolons instead of commas, so I cannot read it as a table.

Can I write a script to read it properly? Below is part of the file as I currently see it.

;"sid";"aid";"sentnr";"parnr";"sentence";"Subject.party";                                               
1;43160789;74861000;1;1;"Officieel â€žaanzoek"" namens                                                  
2;43160790;74861000;1;2;"Van onze parlementaire redactie  NA;NA;NA;NA;NA;NA;NA                                      
3;43160791;74861000;2;2;"Hierdoor is de opvolging van                                                   
4;43160792;74861000;3;2;"Dr. Samkalden had in ;NA;NA;NA;NA;NA;NA;NA                                             
5;43160793;74861000;4;2;"In het kabinet-Bi                                  
6;43160794;74861000;5;2;"_";NA;NA;NA;NA;NA;NA;NA

What's the significance of the lines of NA? They will screw up tokenising.
– EdChum, Commented Mar 30, 2015 at 10:23

Nebril · Accepted Answer · 2015-03-30 10:24:49Z

1

I recommend using the csv module.

import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f, delimiter=';')
    data = list(reader)
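
On Python 3, the csv documentation recommends opening the file with newline='' so that the reader handles line endings itself. A minimal sketch of that variant follows. The helper name read_rows is hypothetical, and the UTF-8 default is an assumption: the sample in the question looks mis-decoded, so a different codec may be needed.

```python
import csv


def read_rows(path, encoding="utf-8"):
    """Return every row of a semicolon-delimited file as a list of strings."""
    # newline="" leaves line-ending handling to the csv module.
    # The encoding default is an assumption, not known from the question.
    with open(path, "r", newline="", encoding=encoding) as f:
        return list(csv.reader(f, delimiter=";"))
```

Calling read_rows('file.csv') would then give the same list of rows as the snippet above.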

edited Mar 30, 2015 at 10:24

answered Mar 30, 2015 at 10:22


2 Comments

Nebril Over a year ago

@tobias_k: true, I forgot about the delimiter argument.

EdChum Over a year ago

Sorry, the OP's input data has more tokens than columns, so your code won't work.

mhawke · Accepted Answer · 2015-03-30 10:39:08Z

1

Use the delimiter argument to csv.reader():

import csv

with open('your_file.csv') as f:
    reader = csv.reader(f, delimiter=';')
    _ = next(reader)    # skip header row
    for row in reader:
        print(row)

Output

['1', '43160789', '74861000', '1', '1', 'Officieel \xc3\xa2\xe2\x82\xac\xc5\xbeaanzoek" namens\n2;43160790;74861000;1;2;Van onze parlementaire redactie  NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']
['3', '43160791', '74861000', '2', '2', 'Hierdoor is de opvolging van\n4;43160792;74861000;3;2;Dr. Samkalden had in ', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']
['5', '43160793', '74861000', '4', '2', 'In het kabinet-Bi\n6;43160794;74861000;5;2;_"', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']

This code splits the fields on the semicolon as required. However, as pointed out by EdChum, there are other problems with the file, notably the unbalanced quotes, which make the reader merge several physical lines into a single field.
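
One possible workaround for the unbalanced quotes is to pass quoting=csv.QUOTE_NONE, so the reader treats the quote character as ordinary text and never joins lines, and then strip the stray quotes afterwards. This is only a sketch: it assumes each record sits on exactly one physical line and that no field legitimately contains a semicolon. The helper name parse_unquoted and the shortened sample lines are illustrative, not taken from the original file.

```python
import csv


def parse_unquoted(text):
    """Split each line on ';' without interpreting quote characters.

    Returns (rows, mismatched), where mismatched holds the indexes into
    rows whose field count differs from that of the first (header) row.
    """
    reader = csv.reader(text.splitlines(), delimiter=";", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if row:  # skip blank lines
            # Remove surrounding whitespace and leftover quote characters.
            rows.append([field.strip().strip('"') for field in row])
    if not rows:
        return rows, []
    width = len(rows[0])
    mismatched = [i for i, row in enumerate(rows) if len(row) != width]
    return rows, mismatched


# Shortened, made-up sample in the style of the question's data.
sample = (
    ';"sid";"aid";"sentence"\n'
    '1;43160789;74861000;"Officieel aanzoek namens\n'
    '2;43160790;74861000;"Van onze redactie;NA;NA\n'
)
rows, mismatched = parse_unquoted(sample)
print(rows)
print(mismatched)  # indexes of rows with more or fewer fields than the header
```

Rows reported in mismatched still need to be fixed by hand or with a rule specific to the file, since the parser cannot know which extra tokens belong together.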

edited Mar 30, 2015 at 10:39

answered Mar 30, 2015 at 10:24


2 Comments

EdChum Over a year ago

This won't work: the OP's csv has screwed-up content, with variable token counts and quoting.

mhawke Over a year ago

@EdChum. Thanks, you're right, but it is at least now splitting the fields based on the semicolon. I have added a note as per your comment.
