1

I am in the process of shifting from Perl to Python, and I am struggling with what was a hash of hashes of arrays. I have this data structure returned from a REST service:

[
    {
      "gene": "ENSG00000270076", 
      "minus_log10_p_value": 0.0271298550085406, 
      "tissue": "Thyroid", 
      "value": 0.939442373223424
    },
    {
      "gene": "ENSG00000104643", 
      "minus_log10_p_value": 0.255628260060896, 
      "tissue": "Thyroid", 
      "value": 0.555100655197016
    }
]

Speaking in Perl, I'd like to parse it and have the Python equivalent of

${$tissue}{$value} = [$gene]
${Thyroid}{0.5555} = [ENSG1, ENSG2, ENSG3]

In Python I tried something along these lines:

d={}
d[hit['tissue']][hit['value']].append(hit[gene])

but encountered various errors.

In the end, I want d to look like:

{
    'Thyroid': {
        0.939442373223424: ['ENSG00000270076'],
        0.555100655197016: ['ENSG00000104643']
    }
}

so grouping by tissue, then by value, and for each value have a list of genes.

8
  • No, {'Thyroid': {0.555100655197016: 'ENSG00000104643', 0.939442373223424: 'ENSG00000270076'}} is what OP is looking for. It states that clearly, doesn't it? Commented May 11, 2017 at 17:33
  • @KeerthanaPrabhakaran: yet we disagree on what they expect. No, that's not what they want, because they expect to have a list of values per gene, not using the values as keys. Commented May 11, 2017 at 17:34
  • Not really. I understood it in the first go. Commented May 11, 2017 at 17:34
  • OP has clearly included ${Thyroid}{0.5555} = [ENSG1, ENSG2, ENSG3]. So d[tissue][value]=gene is what is expected. Commented May 11, 2017 at 17:35
  • @KeerthanaPrabhakaran: and they also included conflicting information in the same question. Commented May 11, 2017 at 17:36

4 Answers

2

You can use a plain for loop to build the nested dictionary, keyed by tissue and then by value:

>>> l = [{'minus_log10_p_value': 0.0271298550085406, 'gene': 'ENSG00000270076', 'tissue': 'Thyroid', 'value': 0.939442373223424}, {'minus_log10_p_value': 0.255628260060896, 'gene': 'ENSG00000104643', 'tissue': 'Thyroid', 'value': 0.555100655197016}]
>>> res = {}
>>> for each in l:
...     if each['tissue'] not in res:
...             res[each['tissue']]={each['value']:each['gene']}
...     else:
...             res[each['tissue']][each['value']]=each['gene']
... 
>>> res
{'Thyroid': {0.555100655197016: 'ENSG00000104643', 0.939442373223424: 'ENSG00000270076'}}

6 Comments

Please leave a comment stating the reason when you downvote! :)
How is this going to produce a single nested dictionary? Whatever we may disagree with in the comments on the question, this is further removed from either format the OP is discussing.
@MartijnPieters I'm sorry about that. I shared the wrong code; I've edited the answer!
That still doesn't produce lists in the nested dictionary.
This is the approach that I suggest. It's up to the OP to adapt it to their needs!
2

You can use the dict.setdefault() method to insert nested data structures for keys that are missing. Because that method returns either the value already stored for the key or the newly inserted default value, you can chain these calls:

d = {}
for hit in list_of_hits:
    tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
    d.setdefault(tissue, {}).setdefault(value, []).append(gene)

So for each d[tissue] key, ensure that there is a nested dictionary. For each d[tissue][value] pair of keys, ensure that there is a nested list value, and append the gene to that.

Demo:

>>> list_of_hits = [
...     {
...       "gene": "ENSG00000270076",
...       "minus_log10_p_value": 0.0271298550085406,
...       "tissue": "Thyroid",
...       "value": 0.939442373223424
...     },
...     {
...       "gene": "ENSG00000104643",
...       "minus_log10_p_value": 0.255628260060896,
...       "tissue": "Thyroid",
...       "value": 0.555100655197016
...     }
... ]
>>> d = {}
>>> for hit in list_of_hits:
...     tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
...     d.setdefault(tissue, {}).setdefault(value, []).append(gene)
...
>>> d
{'Thyroid': {0.939442373223424: ['ENSG00000270076'], 0.555100655197016: ['ENSG00000104643']}}
>>> from pprint import pprint
>>> pprint(d)
{'Thyroid': {0.555100655197016: ['ENSG00000104643'],
             0.939442373223424: ['ENSG00000270076']}}

Do realise that floating point values can be imprecise. You may want to apply some rounding to normalise the values. 0.555100655197016 and 0.555100655197017 are very close together, for example, but not equal:

>>> 0.555100655197016 == 0.555100655197017
False

You could simply use the round() function on value, to a number of digits that still makes sense for your application:

d = {}
for hit in list_of_hits:
    tissue, value, gene = hit['tissue'], hit['value'], hit['gene']
    value = round(value, 4)
    d.setdefault(tissue, {}).setdefault(value, []).append(gene)
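
To make the effect of rounding visible, here is a small sketch. The two hits and their gene IDs are made-up placeholders, and group_hits and ndigits are names introduced only for this illustration; 4 digits is an arbitrary choice, not a recommendation:

```python
# Two hypothetical hits whose values differ only in the last digit.
hits = [
    {"tissue": "Thyroid", "value": 0.555100655197016, "gene": "ENSG_A"},
    {"tissue": "Thyroid", "value": 0.555100655197017, "gene": "ENSG_B"},
]


def group_hits(hits, ndigits=None):
    """Group genes by tissue, then by value (optionally rounded)."""
    grouped = {}
    for hit in hits:
        value = hit["value"]
        if ndigits is not None:
            value = round(value, ndigits)
        grouped.setdefault(hit["tissue"], {}).setdefault(value, []).append(hit["gene"])
    return grouped


exact = group_hits(hits)               # two separate float keys
rounded = group_hits(hits, ndigits=4)  # both genes share one key
print(exact)
print(rounded)
```

Without rounding, each gene ends up under its own key; with rounding, both land in the same list.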


1

I would personally use a mix of list comprehensions and default dicts, but I wanted to illustrate the simplest, most introductory approach, since you're transitioning to Python:

output = {}
for a_dict in results:
    tissue = a_dict['tissue']
    value = a_dict['value']
    gene = a_dict['gene']
    # the `tissue` is a nested dict
    if tissue not in output:
        output[tissue] = {}
    # the genes should be an array
    if value not in output[tissue]:
        output[tissue][value] = []
    output[tissue][value].append(gene)

The reason this is so verbose (compared to a Perl approach) is that Perl creates nested data structures of the right type on demand (autovivification). In Python you need to either check for the presence of the nested container yourself or use one of several approaches to a dict that supplies default values.
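
To see the difference concretely, here is a minimal sketch (the value and gene ID are made-up placeholders) of what a plain dict does when you index into a key that isn't there yet. Python raises a KeyError instead of creating the nested dict, which is probably the kind of error the attempt in the question ran into:

```python
d = {}
try:
    # A plain dict does not create d["Thyroid"] on the fly.
    d["Thyroid"][0.5551].append("ENSG_EXAMPLE")
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for missing key:", missing_key)

# Nothing was inserted as a side effect of the failed lookup.
print(d)
```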


0

I think this will do the job. In fact, you had almost done it:

from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
for value in values:
    d[value["tissue"]][value["value"]].append(value["gene"])
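
As a quick check, here is the same idea run against the two records from the question. This is only a sketch: values is assumed to be the already-parsed JSON list, and hit and plain are names introduced just for this example. Keep in mind that merely looking up a missing key on a defaultdict inserts it, so converting back to regular dicts once the structure is built can avoid surprises later:

```python
from collections import defaultdict

# Assumed input: the parsed JSON list from the question.
values = [
    {
        "gene": "ENSG00000270076",
        "minus_log10_p_value": 0.0271298550085406,
        "tissue": "Thyroid",
        "value": 0.939442373223424,
    },
    {
        "gene": "ENSG00000104643",
        "minus_log10_p_value": 0.255628260060896,
        "tissue": "Thyroid",
        "value": 0.555100655197016,
    },
]

# Outer level creates inner defaultdicts; inner level creates empty lists.
d = defaultdict(lambda: defaultdict(list))
for hit in values:
    d[hit["tissue"]][hit["value"]].append(hit["gene"])

# Convert to plain dicts so later lookups of missing keys raise KeyError
# instead of silently inserting empty entries.
plain = {tissue: dict(by_value) for tissue, by_value in d.items()}
print(plain)
```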
