Python 3 CSV data structure problems

Question

I have a CSV file like this:

Category    Subcategory
-----------------------
cat         panther
cat         tiger
dog         wolf
dog         hyena
cat         lion
dog         beagle

I'm trying to write a script that outputs something like this (order not important):

animals = [
              [['cat'], ['panther', 'tiger', 'lion']],
              [['dog'], ['wolf', 'hyena', 'beagle']]
          ]

So far I am able to make a list of unique categories and a list of unique subcategories.

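catlist = []      # unique categories, in first-seen order
subcatlist = []   # unique subcategories, in first-seen order

for p in infile:
    if p[0] not in catlist:
        catlist.append(p[0])
    if p[1] not in subcatlist:
        subcatlist.append(p[1])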

But I am having trouble writing the logic that says "if Category 'cat' is in animals[], but 'panther' is not in 'cat', append it."

I've played with zip() and dict() some, but I'm pretty much just flailing about here. Fairly new to Python. Using Python 3.

Do you really want the nested lists? Dicts would be more comfortable to use.
– Thomas Fenzl, Commented May 20, 2013 at 18:55
I'll post it in a bit, it's all screwed up now. I guess I'm basically looking for a better way to handle 2-dimensional arrays, or if there's just some overall better way of approaching this kind of problem.
– jason, Commented May 20, 2013 at 18:55
@Tom: isn't a dict a 1-to-1 (cat : lion), or can it be a 1-to-many (cat : lion, tiger)? I guess I would need a list of dicts?
– jason, Commented May 20, 2013 at 18:57

Thomas Fenzl · Accepted Answer · 2013-05-20 19:02:26Z

4

Dictionaries make it a lot easier to map keys to values, and defaultdict is especially convenient for building them.

Assuming that iterating over infile yields each input line already split on whitespace into a category and a subcategory, the following should help:

from collections import defaultdict

animals = defaultdict(list)

for p in infile:
    animals[p[0]].append(p[1])
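
For a fuller picture, here is a minimal sketch that goes from raw lines to the nested-list shape the question asks for. It assumes the file is whitespace-separated with two header lines (the column names and the dashed rule), as in the sample. The `lines` list and the `group_rows` helper are hypothetical names used only for illustration; in practice the lines would come from the opened file.

```python
from collections import defaultdict

# Hypothetical stand-in for the file contents; in practice use
# something like: lines = open(path).read().splitlines()
lines = [
    "Category    Subcategory",
    "-----------------------",
    "cat         panther",
    "cat         tiger",
    "dog         wolf",
    "dog         hyena",
    "cat         lion",
    "dog         beagle",
]

def group_rows(lines, header_lines=2):
    """Map each category to the list of its subcategories."""
    animals = defaultdict(list)
    for line in lines[header_lines:]:
        parts = line.split()          # split on any run of whitespace
        if len(parts) != 2:
            continue                  # skip blank or malformed lines
        category, subcategory = parts
        animals[category].append(subcategory)
    return animals

animals = group_rows(lines)

# Convert the dict into the nested-list layout from the question.
nested = [[[category], subcategories]
          for category, subcategories in animals.items()]
print(nested)
```

A dict value can be any object, including a list, so one key can map to many subcategories; the final comprehension is only needed if the nested-list form is really required.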

1 Comment

Gareth Latty Over a year ago

Note that rather than indexing p[0]/p[1], the more readable thing is to use unpacking and do for key, value in infile: animals[key].append(value).
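
As a small sketch of that suggestion, assuming each item yielded by infile is a two-element sequence (the `rows` list below is a hypothetical stand-in for it):

```python
from collections import defaultdict

# Hypothetical stand-in for infile: an iterable of (category, subcategory) pairs.
rows = [("cat", "panther"), ("dog", "wolf"), ("cat", "lion")]

animals = defaultdict(list)
for key, value in rows:
    # Unpacking names the two fields instead of indexing p[0] and p[1].
    animals[key].append(value)

print(dict(animals))
```

One thing to keep in mind is that unpacking raises a ValueError if a row does not contain exactly two items, so malformed lines need to be filtered out first.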

philosodad · Accepted Answer · 2013-05-20 19:55:04Z

You might consider using a dict of sets, with the category name as the dictionary key. For every p in infile, call animals[p[0]].add(p[1]), assuming that p[0] and p[1] are the type and the species.

The advantage of this is that if 'Panther' appears multiple times as a 'Cat', you won't have to check whether it already exists in the 'Cat' list, because the set type will ensure that each element is stored only once.

>>> from collections import defaultdict
>>> animals = defaultdict(set)
>>> animals['Cat'].add('Panther')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Panther'}})
>>> animals['Cat'].add('Lion')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Lion', 'Panther'}})
>>> animals['Cat'].add('Panther')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Lion', 'Panther'}})

Compare this with using a list:

>>> moreanimals = defaultdict(list)
>>> moreanimals['Cat'].append('Panther')
>>> moreanimals
defaultdict(<class 'list'>, {'Cat': ['Panther']})
>>> moreanimals['Cat'].append('Panther')
>>> moreanimals
defaultdict(<class 'list'>, {'Cat': ['Panther', 'Panther']})
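
To tie this back to the nested-list output in the question, here is a rough sketch that builds the dict of sets from rows containing a duplicate and then converts it. The `rows` data is hypothetical. Because sets are unordered, the subcategories are sorted to get a stable result; the second variant uses a dict as an insertion-ordered set (guaranteed from Python 3.7) when first-seen order matters.

```python
from collections import defaultdict

# Hypothetical parsed rows; note that ("cat", "panther") appears twice.
rows = [
    ("cat", "panther"),
    ("cat", "tiger"),
    ("dog", "wolf"),
    ("cat", "panther"),
    ("dog", "beagle"),
]

# Variant 1: sets drop duplicates but do not remember order.
animals = defaultdict(set)
for category, subcategory in rows:
    animals[category].add(subcategory)

nested = [[[category], sorted(subcategories)]
          for category, subcategories in animals.items()]

# Variant 2: dict keys drop duplicates and keep first-seen order.
ordered = defaultdict(dict)
for category, subcategory in rows:
    ordered[category][subcategory] = None

nested_ordered = [[[category], list(subcategories)]
                  for category, subcategories in ordered.items()]

print(nested)
print(nested_ordered)
```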
