How to parse a mixed CSV file in Python?

Question

I am dealing with a CSV file similar to this one

foo; val1; position1
bar; name1; address1; phone_nbr1
bar; name2; address2; phone_nbr2
foo; val2; position2
bar; name3; address3; phone_nbr3
bar; name4; address4; phone_nbr4
bar; name5; address5; phone_nbr5
bar; name6; address6; phone_nbr6
foo; val3; position3

Needless to say, I cannot modify the CSV.

Instances displayed in foo lines are different from the ones with bar lines (notice they don't even have the same number of fields)

I need simply reading this data, no need to write it.

My first idea was to separate the file into two temporary files and then read each one separately with a csv.DictReader, however I really don't like this approach.

Is there a simpler way to do this? I would like to avoid if possible having to write files to disk.

For the record, I'm using Python2.7 on a Solaris 10 machine.

Sven Marnach · Accepted Answer · 2011-10-26 12:47:43Z

6

You could collect the records from a csv.reader in two different lists, depending on their length (or whatever criterion you use to distinguish the two streams):

list1 = []
list2 = []
with open("input.csv", "rb") as f:
    for record in csv.reader(f, delimiter=";"):
        if len(record) == 3:
            list1.append(record)
        else:
            list2.append(record)

answered Oct 26, 2011 at 12:47

Sven Marnach

608k123 gold badges968 silver badges865 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Razzle · Accepted Answer · 2021-02-07 21:14:15Z

4

csv.reader() has no problem with this:

import csv
foo = []
bar = []
with open("test.csv", 'r') as f:
    c = csv.reader(f, delimiter = ";")
    for row in c:
        if row[0] == "foo":
            foo.append(row[1:])
        elif row[0] == "bar":
            bar.append(row[1:])
print(foo)
print(bar)

results in

[[' val1', ' position1'], [' val2', ' position2'], [' val3', ' position3']]
[[' name1', ' address1', ' phone_nbr1'], [' name2', ' address2', ' phone_nbr2'], [' name3', ' address3', ' phone_nbr3'], [' name4', ' address4', ' phone_nbr4'], [' name5', ' address5', ' phone_nbr5'], [' name6', ' address6', ' phone_nbr6']]

edited Feb 7, 2021 at 21:14

Razzle

5136 silver badges11 bronze badges

answered Oct 26, 2011 at 12:48

Tim Pietzcker

337k59 gold badges520 silver badges572 bronze badges

Comments

Klaus Byskov Pedersen · Accepted Answer · 2011-10-26 12:44:05Z

1

What about just using str.split on each line?

items = line.split(";")

Then if the first item in the items list is foo you do one thing, and if it's bar you do something else.

answered Oct 26, 2011 at 12:44

Klaus Byskov Pedersen

122k31 gold badges192 silver badges223 bronze badges

Comments

Joël · Accepted Answer · 2011-10-26 12:49:05Z

0

The fact that the lines are different is not a problem for csv module, but you'll have to analyze line content differently depending on first 'cell'.

Example of code:

with open(input_file, 'rb') as fin:
    c = csv.reader(fin)
    for line in c:
         if line[0] == 'foo':
              # do some treatment
         elif line[0] == 'bar':
              # do something else
    c.close()

answered Oct 26, 2011 at 12:49

Joël

2,8491 gold badge22 silver badges38 bronze badges

Comments

kojiro · Accepted Answer · 2011-10-26 12:50:03Z

0

It's not clear from your question what it is you actually want to achieve, but I'm not sure you need the csv module here.

for row in myfile.readlines():
    cols = [r.strip() for r in row.split(';')]
    if (cols[0] == "foo"):
        # Do something for foo
    elif (cols[0] == "bar"):
        # Do something for bar

answered Oct 26, 2011 at 12:50

kojiro

77.8k20 gold badges151 silver badges217 bronze badges

Comments

Nicola Musatti · Accepted Answer · 2011-10-26 12:50:15Z

0

What about something like:

foos = []
bars = []
for line in csv.reader(open("file.csv","rb"), delimiter=";"):
  if line[0] == "foo":
    foos.append(Foo(line[1], line[2]))
  else:
    bars.append(Bar(line[1], line[2], line[3]))

Assuming you have a Foo and a Bar class taking the rest of your line cells as arguments.

answered Oct 26, 2011 at 12:50

Nicola Musatti

18.4k3 gold badges49 silver badges56 bronze badges

Collectives™ on Stack Overflow

How to parse a mixed CSV file in Python?

6 Answers 6

Comments

Comments

Comments

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

Comments

Comments

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related