
I have a CSV file like this:

Category    Subcategory
-----------------------
cat         panther
cat         tiger
dog         wolf
dog         heyena
cat         lion
dog         beagle

I'm trying to write a script that outputs something like this (order not important):

animals = [
              [['cat'], ['panther', 'tiger', 'lion']],
              [['dog'], ['wolf', 'heyena', 'beagle']]
          ]

So far I am able to make a list of unique categories and a list of unique subcategories:

catlist = []
subcatlist = []

for p in infile:
    if p[0] not in catlist:
        catlist.append(p[0])
    if p[1] not in subcatlist:
        subcatlist.append(p[1])

But I am having trouble writing the logic that says "if Category 'cat' is in animals[], but 'panther' is not in 'cat', append it."

I've played with zip() and dict() some, but I'm pretty much just flailing about here. I'm fairly new to Python and using Python 3.

4
  • Where is the rest of your code? What output did you get? Commented May 20, 2013 at 18:49
  • Do you really want the nested lists? Dicts would be more comfortable to use. Commented May 20, 2013 at 18:55
  • I'll post it in a bit, it's all screwed up now. I guess I'm basically looking for a better way to handle 2-dimensional arrays, or whether there's just some overall better way of approaching this kind of problem. Commented May 20, 2013 at 18:55
  • @Tom: isn't a dict a 1-to-1 mapping (cat : lion), or can it be 1-to-many (cat : lion, tiger)? I guess I would need a list of dicts? Commented May 20, 2013 at 18:57

2 Answers

4

It is a lot easier to use dictionaries if you want to map keys to some values. collections.defaultdict is especially convenient for building them.

Assuming infile yields each input line already split on whitespace, the following should help:

from collections import defaultdict

animals = defaultdict(list)

for p in infile:
    animals[p[0]].append(p[1])
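
For a fuller picture, here is a minimal sketch that goes from raw text to the nested-list shape asked for in the question. It assumes the rows are whitespace-separated as in the sample, with a header line and a dashed rule to skip. SAMPLE and group_rows are names made up for this illustration, not part of any library.

```python
from collections import defaultdict

# Sample data copied from the question (assumed whitespace-separated).
SAMPLE = """\
Category    Subcategory
-----------------------
cat         panther
cat         tiger
dog         wolf
dog         heyena
cat         lion
dog         beagle
"""


def group_rows(lines):
    """Group subcategories under their category (hypothetical helper)."""
    animals = defaultdict(list)
    for line in lines[2:]:          # skip the header and the dashed rule
        fields = line.split()
        if len(fields) != 2:        # ignore blank or malformed lines
            continue
        category, subcategory = fields
        animals[category].append(subcategory)
    return animals


animals = group_rows(SAMPLE.splitlines())

# Convert to the nested-list layout from the question.
nested = [[[category], subcategories] for category, subcategories in animals.items()]
print(nested)
```

Since a dict keeps insertion order in Python 3.7 and later, the categories come out in the order they first appear in the file. If the real file is comma-separated, the csv module would likely be a better fit than str.split().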

1 Comment

Note that rather than indexing p[0]/p[1], the more readable thing is to use unpacking and do for key, value in infile: animals[key].append(value).
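
A small sketch of the unpacking form described in this comment. The rows list is a hypothetical stand-in for infile and assumes every row has exactly two fields.

```python
from collections import defaultdict

# Hypothetical stand-in for infile: each row is already a (category, subcategory) pair.
rows = [("cat", "panther"), ("cat", "tiger"), ("dog", "wolf")]

animals = defaultdict(list)
# Unpacking names both fields; it raises ValueError if a row
# does not contain exactly two items.
for key, value in rows:
    animals[key].append(value)
```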
2

You might consider using a set and a dict. Use the category name as the key of the dictionary and a set as its value, so that for every p in infile you call animals[p[0]].add(p[1]), assuming p[0] and p[1] are the type and the species.

The advantage of this is that if 'Panther' appears multiple times as a 'Cat', you won't have to check whether it already exists in the 'Cat' list, because the set type will ensure that you have a set of unique elements.

>>> from collections import defaultdict
>>> animals = defaultdict(set)
>>> animals['Cat'].add('Panther')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Panther'}})
>>> animals['Cat'].add('Lion')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Lion', 'Panther'}})
>>> animals['Cat'].add('Panther')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Lion', 'Panther'}})

Compare that with the use of a list:

>>> moreanimals = defaultdict(list)
>>> moreanimals['Cat'].append('Panther')
>>> moreanimals
defaultdict(<class 'list'>, {'Cat': ['Panther']})
>>> moreanimals['Cat'].append('Panther')
>>> moreanimals
defaultdict(<class 'list'>, {'Cat': ['Panther', 'Panther']})
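
Applied to the data from the question, the set-based approach might look like the sketch below. The rows list is a hypothetical stand-in for the parsed file, with one duplicate row added on purpose to show the deduplication. Because sets are unordered, the sketch sorts before building the nested list.

```python
from collections import defaultdict

# Rows from the question plus one deliberate duplicate (an assumption for the demo).
rows = [
    ("cat", "panther"), ("cat", "tiger"), ("dog", "wolf"),
    ("dog", "heyena"), ("cat", "lion"), ("dog", "beagle"),
    ("cat", "panther"),
]

animals = defaultdict(set)
for category, subcategory in rows:
    animals[category].add(subcategory)   # adding a duplicate is a no-op

# Sets have no defined order, so sort for a reproducible nested-list result.
nested = [
    [[category], sorted(subcategories)]
    for category, subcategories in sorted(animals.items())
]
print(nested)
```

The trade-off is that the original file order of the subcategories is lost; if that order matters, a list with an explicit membership check keeps it.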
