I have the following code:

import codecs

corpus_file = codecs.open("corpus_en-tr.txt", encoding="utf-8").readlines()

# Lines alternate: source sentence, then its target sentence
corpus = []
for a in range(0, len(corpus_file), 2):
    corpus.append({'src': corpus_file[a].rstrip(), 'tgt': corpus_file[a+1].rstrip()})

params = {}

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0

Basically, I am trying to create a dictionary of dictionaries of floats, but I get the following error:

Traceback (most recent call last):
  File "initial_guess.py", line 15, in <module>
    params[srcWord][tgtWord] = 1.0
KeyError: u'A'

I checked the related question "UTF-8 string as key in dictionary causes KeyError", but it didn't help.

I don't understand why the Unicode string 'A' is not allowed as a dictionary key in Python. Is there any way to fix this?

2 Answers

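Your params dict is empty, so the lookup params[srcWord] raises KeyError before anything can be assigned to [tgtWord]. This has nothing to do with the key being a Unicode string.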
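
A minimal sketch of the failure and a manual fix, using made-up keys u'A' and u'B' in place of the corpus data. It only illustrates that the outer lookup is what fails, and that the same Unicode key works once an inner dict exists:

```python
# The outer key is missing, so the nested assignment fails on the lookup
params = {}

try:
    params[u'A'][u'B'] = 1.0   # params[u'A'] is evaluated first and does not exist
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for outer key:", repr(missing_key))

# The same Unicode string is a valid key once the inner dict has been created
params[u'A'] = {}
params[u'A'][u'B'] = 1.0
print(params)
```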

To avoid the KeyError, you can use a tree of nested defaultdicts:

from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['any']['keys']['you']['want'] = 1.0
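
One thing to keep in mind with this approach: merely reading a missing key creates it. The sketch below shows that side effect, plus a way to convert the result back to ordinary dicts. The helper to_plain_dict is a hypothetical name introduced here for illustration, not part of the standard library:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_plain_dict(node):
    """Recursively convert nested defaultdicts to ordinary dicts (hypothetical helper)."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

params = tree()
params['a']['b'] = 1.0

# Reading a missing key inserts an empty subtree as a side effect
_ = params['typo']
has_typo_key = 'typo' in params    # True

# Membership tests do not create keys
has_other_key = 'other' in params  # False

plain = to_plain_dict(params)
print(plain)
```

If accidental key creation is a concern, test with `in` or use `.get()` for reads, and convert to plain dicts once the structure is built.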

Or a simpler defaultdict case without tree:

from collections import defaultdict

params = defaultdict(dict)

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0

If you don't want to import anything, you can add an inner dict to params yourself with setdefault on each iteration:

params = {}

for sentencePair in corpus:
    for srcWord in sentencePair['src']:
        params.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt']:
            params[srcWord][tgtWord] = 1.0

Note that I've changed the order of the for loops, because srcWord has to be known before its inner dict can be created.

If you keep the original loop order, you have to check for the key on every pass of the innermost loop:

params = {}

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params.setdefault(srcWord, {})[tgtWord] = 1.0

2 Comments

Why did you change the order of the for loops?
@yusuf, I've updated the answer. Check out the last variant; it lets you order the for loops however you like.
1

You can just use setdefault:

Replace

params[srcWord][tgtWord] = 1.0

with

params.setdefault(srcWord, {})[tgtWord] = 1.0
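
A related point worth checking: iterating over a string yields single characters, which is consistent with the u'A' in the traceback. If word-level entries are intended (an assumption based on the names srcWord and tgtWord), the sentences would need to be split first. A rough sketch with made-up sample data and plain whitespace tokenization, both of which are assumptions:

```python
# Made-up sample data standing in for the corpus built in the question
corpus = [
    {'src': u'A house', 'tgt': u'Bir ev'},
    {'src': u'A book', 'tgt': u'Bir kitap'},
]

params = {}

for sentence_pair in corpus:
    # split() with no arguments breaks on whitespace (assumed tokenization)
    src_words = sentence_pair['src'].split()
    tgt_words = sentence_pair['tgt'].split()
    for src_word in src_words:
        # Fetch or create the inner dict once per source word
        inner = params.setdefault(src_word, {})
        for tgt_word in tgt_words:
            inner[tgt_word] = 1.0

print(params[u'A'])
```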
