
I have a CSV file that I cannot read properly because its fields are separated by semicolons instead of commas, so I cannot load it as a table.

Can I write a script to read it properly? Below is part of the file as I currently see it.

;"sid";"aid";"sentnr";"parnr";"sentence";"Subject.party";                                               
1;43160789;74861000;1;1;"Officieel „aanzoek"" namens                                                  
2;43160790;74861000;1;2;"Van onze parlementaire redactie  NA;NA;NA;NA;NA;NA;NA                                      
3;43160791;74861000;2;2;"Hierdoor is de opvolging van                                                   
4;43160792;74861000;3;2;"Dr. Samkalden had in ;NA;NA;NA;NA;NA;NA;NA                                             
5;43160793;74861000;4;2;"In het kabinet-Bi                                  
6;43160794;74861000;5;2;"_";NA;NA;NA;NA;NA;NA;NA
  • What's the significance of the lines of NA? They will break tokenising. Commented Mar 30, 2015 at 10:23

2 Answers


I recommend using the csv module and passing the semicolon as the delimiter.

import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f, delimiter=';')
    data = list(reader)
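Because some lines in the question appear to carry extra NA fields, it may help to check which rows do not match the header width before treating the result as a table. The following is only a sketch, assuming Python 3; the sample data and the helper name `find_ragged_rows` are made up for illustration and are not part of the original answer.

```python
import csv
import io

# Hypothetical sample: a 4-column header and one row with extra trailing fields.
sample = (
    ';"sid";"aid";"sentnr"\n'
    '1;43160789;74861000;1\n'
    '2;43160790;74861000;1;NA;NA\n'
)

def find_ragged_rows(stream, delimiter=';'):
    """Return (row_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader)          # first row defines the expected width
    expected = len(header)
    return [
        (number, len(row))
        for number, row in enumerate(reader, start=1)
        if len(row) != expected
    ]

ragged = find_ragged_rows(io.StringIO(sample))
print(ragged)
```

For a real file, the same function could be called on the object returned by `open('file.csv', newline='')` instead of the in-memory sample.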

2 Comments

@tobias_k: true, I forgot about delimiter argument.
Sorry, but the OP's input data has more tokens than columns, so your code won't work.

Use the delimiter argument to csv.reader():

import csv

with open('your_file.csv') as f:
    reader = csv.reader(f, delimiter=';')
    _ = next(reader)    # skip header row
    for row in reader:
        print(row)

Output

['1', '43160789', '74861000', '1', '1', 'Officieel \xc3\xa2\xe2\x82\xac\xc5\xbeaanzoek" namens\n2;43160790;74861000;1;2;Van onze parlementaire redactie  NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']
['3', '43160791', '74861000', '2', '2', 'Hierdoor is de opvolging van\n4;43160792;74861000;3;2;Dr. Samkalden had in ', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']
['5', '43160793', '74861000', '4', '2', 'In het kabinet-Bi\n6;43160794;74861000;5;2;_"', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA']

This code splits the fields on the semicolon as required. However, as EdChum pointed out, the file has other problems, notably unbalanced quotes, which is why consecutive lines are merged into a single field in the output above.
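One possible workaround for the unbalanced quotes is to tell the reader not to interpret quote characters at all, using `csv.QUOTE_NONE`, and to strip the stray quotes afterwards. This is a rough sketch, assuming Python 3 and assuming that no text field contains a semicolon of its own (such a field would be split); the sample data and the helper name `read_ignoring_quotes` are invented for illustration.

```python
import csv
import io

# Hypothetical sample imitating the question's data: the opening quote in
# the text field is never closed.
sample = (
    ';"sid";"aid";"sentence"\n'
    '1;43160789;74861000;"Officieel aanzoek namens\n'
    '2;43160790;74861000;"Van onze parlementaire redactie\n'
)

def read_ignoring_quotes(stream, delimiter=';'):
    """Split every physical line on the delimiter, treating '"' as plain text."""
    reader = csv.reader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    # Remove leading and trailing quote characters left over in each field.
    return [[field.strip('"') for field in row] for row in reader]

rows = read_ignoring_quotes(io.StringIO(sample))
for row in rows:
    print(row)
```

With quoting disabled, each physical line becomes its own row, so an unclosed quote can no longer swallow the following line. The trade-off is that quoted fields containing the delimiter are no longer protected.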

2 Comments

This won't work: the OP's CSV has malformed content, a variable number of tokens per line, and inconsistent quoting.
@EdChum. Thanks, you're right, but it is at least now splitting the fields based on the semicolon. I have added a note as per your comment.
