Beginner Python query - selecting items in a JSON list

Question

I have a set of Twitter files (one JSON-encoded tweet per line) that I read into a Python list like so:

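import codecs
import json

data5 = []
# 'rU' mode is deprecated and rejected by Python 3.11+; plain 'r' works (UTF-8 is an assumed encoding)
with codecs.open('twitFile_5.txt', 'r', encoding='utf-8') as file5:
    for line in file5:
        data5.append(json.loads(line))  # one JSON object per line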

I can select "text", for example, to get the text of a single tweet:

data5[1]["text"]

However, I don't know how to:

1) make a list of all the "text" items

2) search that list of texts and count how many times each phrase in a list of phrases is mentioned, e.g. ['apple', 'orange fruit', 'bunch of bananas'].

Thanks.

ryachza · Accepted Answer · 2015-08-23 19:34:50Z


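It sounds like map and reduce could solve both of these. (On Python 3, reduce has to be imported from functools, and map returns a lazy iterator, so wrap it in list() if you need an actual list.)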

For example:

texts = list(map(lambda x: x['text'], data5))  # list() is only needed on Python 3

and, for the phrase counts:

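from functools import reduce  # built in on Python 2; must be imported on Python 3

texts = ['apple test', 'test orange fruit']

init = {'apple': 0, 'orange fruit': 0, 'bunch of bananas': 0}

# For each phrase k, add 1 to its counter if k appears in the text x.
# This mutates and returns the same dict, so init ends up holding the totals.
def aggregate(agg, x):
    for k in agg:
        if k in x:
            agg[k] += 1
    return agg

counts = reduce(aggregate, texts, init)
# counts == {'apple': 1, 'orange fruit': 1, 'bunch of bananas': 0}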
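reduce(aggregate, texts, init) folds the list from left to right, computing aggregate(aggregate(init, texts[0]), texts[1]) and so on. Because aggregate modifies the dict it receives, init itself is changed, so reusing it (for example, for a second file) would add to the old totals. A minimal sketch of a non-mutating variant, assuming the phrases are kept in a separate list (the name `phrases` is hypothetical), rebuilds the dict at each step:

```python
from functools import reduce

phrases = ['apple', 'orange fruit', 'bunch of bananas']  # hypothetical name for the search list
texts = ['apple test', 'test orange fruit']

def aggregate(agg, text):
    # Return a new dict instead of mutating agg, so the initial value is never changed
    return {k: agg[k] + (1 if k in text else 0) for k in agg}

init = dict.fromkeys(phrases, 0)  # {'apple': 0, 'orange fruit': 0, 'bunch of bananas': 0}
counts = reduce(aggregate, texts, init)
print(counts)  # {'apple': 1, 'orange fruit': 1, 'bunch of bananas': 0}
print(init)    # {'apple': 0, 'orange fruit': 0, 'bunch of bananas': 0}
```

Copying the dict on every step costs a little extra work per tweet, but init is left untouched and can safely be reused.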

Edit

Per the comments, to collect the matching tweets themselves instead of just counting them:

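from functools import reduce  # needed on Python 3

values = [
    {'text': 'apple test', 'user': 'A'},
    {'text': 'test orange fruit', 'user': 'B'},
]

# Each phrase maps to a list of matching tweets instead of an integer counter
init = {'apple': [], 'orange fruit': [], 'bunch of bananas': []}

def aggregate(agg, x):
    for k in agg:
        if k in x['text']:
            agg[k].append(x)  # keep the whole tweet dict
    return agg

counts = reduce(aggregate, values, init)
# counts['apple'] == [{'text': 'apple test', 'user': 'A'}]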
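Since each phrase now maps to a list of matching tweets rather than an integer, the count is simply the length of that list. A self-contained sketch that repeats the setup on the same sample values and derives the per-phrase counts (`matches` and `match_counts` are illustrative names, not from the answer):

```python
from functools import reduce

values = [
    {'text': 'apple test', 'user': 'A'},
    {'text': 'test orange fruit', 'user': 'B'},
]
init = {'apple': [], 'orange fruit': [], 'bunch of bananas': []}

def aggregate(agg, x):
    # Append every tweet whose text contains the phrase
    for k in agg:
        if k in x['text']:
            agg[k].append(x)
    return agg

matches = reduce(aggregate, values, init)

# The "count" for each phrase is the number of tweets collected for it
match_counts = {k: len(v) for k, v in matches.items()}
print(match_counts)  # {'apple': 1, 'orange fruit': 1, 'bunch of bananas': 0}
```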

edited Aug 23, 2015 at 19:34

answered Aug 23, 2015 at 17:43

ryachza



7 Comments

Betty Over a year ago

Thanks! I should probably just ask a new question, but how would I also return other details for each matching tweet, such as "user"?

ryachza Over a year ago

@Betty Do you mean instead of x['text']? You could return any arbitrary object. The simplest would be a tuple, something like (x['text'], x['user']), which might work well for a handful of fields. A tuple is basically a fixed-size, immutable list, so you access its items by index. For something more robust, you would probably want to define a class and construct and return an instance of it.
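Both options from this comment, sketched on made-up sample tweets. collections.namedtuple stands in here for a hand-written class (a lightweight swap, not exactly what the comment proposes), and the name Tweet is hypothetical:

```python
from collections import namedtuple

data5 = [
    {'text': 'apple test', 'user': 'A'},
    {'text': 'test orange fruit', 'user': 'B'},
]

# Option 1: map each tweet to a plain tuple, accessed by index
pairs = list(map(lambda x: (x['text'], x['user']), data5))
print(pairs[0][1])  # A

# Option 2: a named record type, accessed by attribute name
Tweet = namedtuple('Tweet', ['text', 'user'])  # hypothetical record type
tweets = list(map(lambda x: Tweet(x['text'], x['user']), data5))
print(tweets[1].user)  # B
```

A namedtuple is still a tuple (index access keeps working), so it is an easy upgrade once the number of fields grows.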

Betty Over a year ago

What's there is perfect for now. What I had in mind was returning more details, such as the user, every time "apple" is found, but perhaps a SQL database would make more sense for that kind of querying.

ryachza Over a year ago

@Betty I updated the answer. Rather than an integer counter, you can use a list. The length of each list is then the "count", and this way you get the matched values as well.

Betty Over a year ago

Excellent, thanks! One more thing, as this is great stuff to learn: this returns everything in my list relating to the search term. Is there any way to return just certain fields, like "user" or "date_created"?
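One way to do this, sketched on made-up sample tweets and assuming every tweet dict has the requested keys, is to append a smaller dict holding only those fields. The `fields` list is a hypothetical parameter; use whatever key names your JSON actually contains (Twitter's own timestamp field is `created_at`).

```python
from functools import reduce

values = [
    {'text': 'apple test', 'user': 'A', 'date_created': '2015-08-01'},          # made-up data
    {'text': 'test orange fruit', 'user': 'B', 'date_created': '2015-08-02'},   # made-up data
]
fields = ['user', 'date_created']  # hypothetical: the keys you want to keep

def aggregate(agg, x):
    for k in agg:
        if k in x['text']:
            # Keep only the requested fields instead of the whole tweet
            agg[k].append({f: x[f] for f in fields})
    return agg

init = {'apple': [], 'orange fruit': [], 'bunch of bananas': []}
matches = reduce(aggregate, values, init)
print(matches['apple'])  # [{'user': 'A', 'date_created': '2015-08-01'}]
```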


saulspatz · Accepted Answer · 2015-08-23 17:50:44Z


1) Use a list comprehension

texts = [d["text"] for d in data5]

2) List comprehension again

count = len([t for t in texts if 'apple' in t])

I'm interpreting your post to mean you want to count the number of texts that mention "apple". If you instead want the total number of times "apple" occurs (so a text that mentions it twice counts twice), you can use

count = sum(t.count('apple') for t in texts)
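The one-liners above handle a single phrase. For the full phrase list in the question, a dict comprehension can apply either interpretation to every phrase; a sketch on made-up texts:

```python
phrases = ['apple', 'orange fruit', 'bunch of bananas']
texts = ['apple test', 'test orange fruit', 'apple and another apple']  # made-up sample texts

# Number of texts that mention each phrase at least once
tweets_mentioning = {p: len([t for t in texts if p in t]) for p in phrases}

# Total number of occurrences of each phrase across all texts
occurrences = {p: sum(t.count(p) for t in texts) for p in phrases}

print(tweets_mentioning)  # {'apple': 2, 'orange fruit': 1, 'bunch of bananas': 0}
print(occurrences)        # {'apple': 3, 'orange fruit': 1, 'bunch of bananas': 0}
```

Note that both `in` and str.count are case-sensitive substring checks, so 'apple' also matches inside 'pineapple' but not 'Apple'; lowercasing each text first (t.lower()) handles the case issue.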

answered Aug 23, 2015 at 17:50

saulspatz


1 Comment

Betty Over a year ago

Thank you so much. How would I also return other details, such as "user", for each matching tweet?
