20

I have a huge file (with around 200k inputs). The inputs are in the form:

A B C D
B E F
C A B D
D  

I am reading this file and storing it in a list as follows:

text = f.read().split('\n')

This splits the file at every newline, so text is a list with one string per line:

['A B C D', 'B E F', 'C A B D', 'D']

I now have to store these values in a dictionary whose keys are the first element of each line, i.e. the keys will be A, B, C, D. I am finding it difficult to store the remaining elements of each line as the values, i.e. the dictionary should look like:

{A: [B C D]; B: [E F]; C: [A B D]; D: []}

I have done the following:

    inlinkDict = {}
    for doc in text:
        adoc = doc.split(' ')
        docid = adoc[0]
        inlinkDict[docid] = inlinkDict.get(docid,0) +  {I do not understand what to put in here}

Please help me work out how I should add the values to my dictionary. The value should be 0 if the line has no elements other than the one that becomes the key, as for D in the example.

3
  • Do you want the dictionary to be {A: [B, C, D]; B: [E, F]; C: [A, B, D]; D: []}? Or maybe {A: "B C D"; B: "E F"; C: "A B D"; D: 0}? Commented Mar 25, 2012 at 5:27
  • Please edit your question to say what you want to do about duplicate keys; for example, what if you have a 5th line containing A P Q R? Do you want to store the values B C D ... as a list ['B', 'C', 'D']? If you do, it will be much better to represent the case of no values as an empty list [], not as an integer 0. Commented Mar 25, 2012 at 5:34
  • @JohnMachin: There are no duplicate values. And yes storing values as a list will definitely help. I will edit my question. Commented Mar 25, 2012 at 5:38

3 Answers

27

A dictionary comprehension makes short work of this task:

>>> s = [['A','B','C','D'], ['B','E','F'], ['C','A','B','D'], ['D']]
>>> {t[0]:t[1:] for t in s}
{'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
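
The comprehension above starts from a list of lists, while the question starts from raw text. A minimal sketch of the full path follows, assuming whitespace-separated tokens; the `raw` string is a made-up stand-in for the file contents, and the names `rows` and `inlink_dict` are illustrative only.

```python
# Hypothetical sample standing in for the contents of the real file.
raw = "A B C D\nB E F\nC A B D\nD  \n"

# splitlines() yields one string per line. split() with no argument
# splits on any run of whitespace and never produces empty strings,
# so trailing spaces after "D" are ignored. Blank lines are skipped.
rows = [line.split() for line in raw.splitlines() if line.strip()]

# First token becomes the key, the remaining tokens become the value.
inlink_dict = {row[0]: row[1:] for row in rows}
```

Here `inlink_dict['D']` is an empty list, because the slice `row[1:]` of a one-element list is empty rather than an error.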

3 Comments

If you're using an old version of Python that doesn't have dict comprehensions, you can use dict((t[0], t[1:]) for t in s) instead.
And if you're using a version of python that predates generator expressions, you can use dict([(t[0], t[1:]) for t in s]). And, if you're using a version older than that, you can use for t in s: d[t[0]] = t[1:]. And, if you're so far back in time that Python doesn't exist, you can use Dartmouth BASIC to DIM an array so that you can simulate a hash table by writing your own hash function. And, if you're working on a system without a higher level language, you can hand translate your assembler code into machine language and input your program with toggle switches ...
Ha, ha, ha. It's just that 2.5 and 2.6 are still very common, and dict comprehensions were only added in 2.7.
22

Try using a slice:

inlinkDict[docid] = adoc[1:]

This will give you an empty list instead of a 0 for the case where only the key value is on the line. To get a 0 instead, use an or (which always returns one of the operands):

inlinkDict[docid] = adoc[1:] or 0
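
To illustrate how the `or` fallback behaves, here is a small sketch. The helper name `values_or_zero` is hypothetical, and it assumes the line is split with a no-argument `split()`; with `split(' ')`, trailing spaces produce empty strings, so the slice would not be empty and the fallback would not trigger.

```python
def values_or_zero(line):
    """Return the tokens after the key, or 0 when there are none."""
    adoc = line.split()   # no-argument split ignores trailing spaces
    return adoc[1:] or 0  # an empty list is falsy, so `or` yields 0

with_values = values_or_zero("B E F")  # non-empty slice is returned as-is
key_only = values_or_zero("D  ")       # empty slice falls through to 0

# For contrast: splitting on a single space keeps the empty strings
# created by the two trailing spaces.
trailing = "D  ".split(' ')
```

Mixing lists and the integer 0 as values means every consumer has to check the type, which is why an empty list is usually the more convenient representation.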

Easier way with a dict comprehension:

>>> with open('/tmp/spam.txt') as f:
...     data = [line.split() for line in f]
... 
>>> {d[0]: d[1:] for d in data}
{'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
>>> {d[0]: ' '.join(d[1:]) if d[1:] else 0 for d in data}
{'A': 'B C D', 'C': 'A B D', 'B': 'E F', 'D': 0}

Note: dict keys must be unique, so if you have, say, two lines beginning with 'C', the value from the first one will be overwritten by the value from the later one.
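
A quick sketch of that overwrite, plus one possible way to detect repeated keys before building the dict. The `data` list is made up for illustration, and using `collections.Counter` here is a suggestion rather than part of the answer above.

```python
from collections import Counter

# Hypothetical input with two rows that share the key 'C'.
data = [['C', 'A', 'B', 'D'], ['C', 'H', 'I', 'J']]

# The later row wins: the value from the first 'C' row is replaced.
result = {d[0]: d[1:] for d in data}

# Count how often each key occurs and keep those seen more than once.
counts = Counter(d[0] for d in data)
duplicates = [key for key, n in counts.items() if n > 1]
```

If `duplicates` is non-empty, the comprehension is silently dropping data and a merging approach may be a better fit.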


4

The accepted answer is correct, except that it reads the entire file into memory (may not be desirable if you have a large file), and it will overwrite duplicate keys.

An alternate approach using defaultdict, which is available from Python 2.5, solves this:

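from collections import defaultdict

d = defaultdict(list)  # a missing key starts out as an empty list
with open('/tmp/spam.txt') as f:
    for line in f:  # iterate line by line instead of reading the whole file
        parts = line.strip().split()
        if not parts:  # skip blank lines, which would raise IndexError below
            continue
        d[parts[0]] += parts[1:]  # extend, so duplicate keys are merged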

Input:

A B C D
B E F
C A B D
D  
C H I J

Result:

>>> d = defaultdict(list)
>>> with open('/tmp/spam.txt') as f:
...    for line in f:
...      parts = line.strip().split()
...      d[parts[0]] += parts[1:]
...
>>> d['C']
['A', 'B', 'D', 'H', 'I', 'J']
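
One thing to keep in mind with this approach: looking up a missing key on a defaultdict inserts it. A minimal sketch of converting the result to a plain dict once loading is done follows; `io.StringIO` stands in for the real file here, and the name `inlinks` is illustrative.

```python
import io
from collections import defaultdict

# In-memory stand-in for /tmp/spam.txt, using the input shown above.
sample = io.StringIO("A B C D\nB E F\nC A B D\nD  \nC H I J\n")

d = defaultdict(list)
for line in sample:
    parts = line.split()
    if parts:  # ignore blank lines
        # `+=` first looks up the key, which creates the empty list,
        # so 'D' ends up as a key even though it has no values.
        d[parts[0]] += parts[1:]

# A plain dict raises KeyError for unknown keys instead of
# quietly creating new empty entries on lookup.
inlinks = dict(d)
```

After the conversion, `'Z' in inlinks` stays False even after failed lookups, whereas `d['Z']` on the defaultdict would add an empty entry for 'Z'.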

