
I have a list of tweets that is grouped into sublists (chunks of tweets), like so:

[[tweet1, tweet2, tweet3],[tweet4,tweet5,tweet6],[tweet7, tweet8, tweet9]]

I want to count the number of occurrences of each word within each subgroup. To do this, I need to split each tweet into individual words. I want to use something like str.split(' '), but I get this error:

AttributeError: 'list' object has no attribute 'split' 

Is there a way to split each tweet into its individual words? The result should look something like:

[['word1', 'word2', 'word3', 'word2', 'word2'],['word1', 'word1', 'word3', 'word4', 'word5'],['word1', 'word3', 'word3', 'word5', 'word6']]
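For illustration, here is a minimal reproduction with made-up sample data (the names `tweet_groups`, `error_message` and `first_group_words` are hypothetical). Calling split on a whole group fails because the group is a list, while calling it on each string inside the group works.

```python
tweet_groups = [["tweet one", "tweet two"], ["tweet three"]]

# Each element of tweet_groups is itself a list, so calling .split() on it fails.
error_message = None
try:
    tweet_groups[0].split(" ")
except AttributeError as exc:
    error_message = str(exc)

# split() has to be called on the individual strings inside each group.
first_group_words = [tweet.split(" ") for tweet in tweet_groups[0]]

print(error_message)      # 'list' object has no attribute 'split'
print(first_group_words)  # [['tweet', 'one'], ['tweet', 'two']]
```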

If you have a list of strings:

tweets = ['a tweet', 'another tweet']

then you can split each element using a list comprehension:

split_tweets = [tweet.split(' ')
                for tweet in tweets]
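One detail worth knowing at this step: `split(' ')` splits on every single space, so two spaces in a row produce an empty string, while `split()` with no argument splits on any run of whitespace and discards empty strings. A small sketch with made-up tweets:

```python
tweets = ["a tweet", "another  tweet with  double spaces"]

# split(' ') splits on each single space, so runs of spaces yield '' entries.
with_space_arg = [tweet.split(" ") for tweet in tweets]

# split() with no argument splits on any whitespace run and drops empty strings.
no_arg = [tweet.split() for tweet in tweets]

print(with_space_arg)
# [['a', 'tweet'], ['another', '', 'tweet', 'with', '', 'double', 'spaces']]
print(no_arg)
# [['a', 'tweet'], ['another', 'tweet', 'with', 'double', 'spaces']]
```

For counting words, the empty strings from `split(' ')` would be counted as a "word", so `split()` is usually the safer choice for free-form tweet text.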

Since it's a list of lists of tweets:

tweet_groups = [['tweet 1', 'tweet 1b'], ['tweet 2', 'tweet 2b']]
tweet_group_words = [[word
                      for tweet in group
                      for word in tweet.split(' ')]
                     for group in tweet_groups]

This gives a list of lists of words: one flat list of words per group.

If you only need the distinct words in each group (for example, to count them), use a set:

words = [set(word 
             for tweet in group
             for word in tweet.split(' '))
         for group in tweet_groups]
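Note that a set only records which words appear, not how many times, so repeated words collapse to one entry. A minimal sketch using the sample `tweet_groups` from this answer (`distinct_words` and `distinct_counts` are illustrative names):

```python
tweet_groups = [["tweet 1", "tweet 1b"], ["tweet 2", "tweet 2b"]]

# One set of distinct words per group; "tweet" appears twice but is stored once.
distinct_words = [set(word
                      for tweet in group
                      for word in tweet.split(" "))
                  for group in tweet_groups]

# Number of distinct words in each group.
distinct_counts = [len(words) for words in distinct_words]

print(sorted(distinct_words[0]))  # ['1', '1b', 'tweet']
print(distinct_counts)            # [3, 3]
```

To count how often each word occurs, a `collections.Counter` per group is the usual tool instead of a set.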

Comments

This gives the OP an extra level of unwanted list nesting
@sshashank124 This wasn't clear to me when I first read it. See edit.

You want something like this:

l1 = [['a b', 'c d', 'e f'], ['a b', 'c d', 'e f'], ['a b', 'c d', 'e f']]

l2 = []
for i,j in enumerate(l1):
    l2.append([])
    for k in j:
        l2[i].extend(k.split())

print(l2)
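The same loop can be written without `enumerate` by building each group's word list first and appending it when the group is done. This is a sketch of an equivalent variant, not a change in behavior:

```python
l1 = [['a b', 'c d', 'e f'], ['a b', 'c d', 'e f'], ['a b', 'c d', 'e f']]

l2 = []
for group in l1:
    words = []                       # words for this one group
    for tweet in group:
        words.extend(tweet.split())  # add this tweet's words to the group's list
    l2.append(words)

print(l2)
# [['a', 'b', 'c', 'd', 'e', 'f'], ['a', 'b', 'c', 'd', 'e', 'f'], ['a', 'b', 'c', 'd', 'e', 'f']]
```

Avoiding the index keeps each group's list in a local variable, which is a little easier to read than `l2[i]`.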

If you want to count the occurrences, use a Counter, chaining all the words together with itertools.chain after splitting.

from collections import Counter
from itertools import chain

tweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
print([Counter(chain.from_iterable(map(str.split, sub))) for sub in tweets])
# [Counter({'foo': 2, 'bar': 1, 'foobar': 1}), Counter({'bar': 2, 'foo': 1})]
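Once you have one Counter per group, looking up a word or finding the most frequent words is straightforward. A short usage sketch with the same sample data (`counts` is an illustrative name):

```python
from collections import Counter
from itertools import chain

tweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]

# One Counter per group, built the same way as in this answer.
counts = [Counter(chain.from_iterable(map(str.split, sub))) for sub in tweets]

# Missing words count as 0 instead of raising KeyError.
print(counts[0]['foo'])          # 2
print(counts[0]['missing'])      # 0
# most_common(n) returns the n most frequent (word, count) pairs.
print(counts[1].most_common(1))  # [('bar', 2)]
```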

groups = [["foo bar", "bar baz"], ["foo foo"]]
[sum((tweet.split(' ') for tweet in group), []) for group in groups]
# => [['foo', 'bar', 'bar', 'baz'], ['foo', 'foo']]

EDIT: It seems an explanation is needed.

  • For each group [... for group in groups]

    • For each tweet, split into words (tweet.split(' ') for tweet in group)
    • Concatenate the split tweets sum(..., [])
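A design note on the concatenation step: `sum(..., [])` builds a new list for each addition, so it can get slow for very large groups, and the Python documentation for `sum()` itself suggests `itertools.chain()` for concatenating a series of iterables. A sketch of the same result using `chain.from_iterable`:

```python
from itertools import chain

groups = [["foo bar", "bar baz"], ["foo foo"]]

# chain.from_iterable lazily concatenates the per-tweet word lists,
# so no intermediate lists are built the way repeated sum() additions do.
flat = [list(chain.from_iterable(tweet.split(' ') for tweet in group))
        for group in groups]

print(flat)  # [['foo', 'bar', 'bar', 'baz'], ['foo', 'foo']]
```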

Comments

Downvoter, care to comment? This is the most elegant and Pythonic solution by far.
@Shashank: I actually like tobyodavies's solution quite a bit, now that he started answering the right question :) So I wouldn't say mine is the most elegant, but thank you.
Yes, his solution with the nested for-loop comprehension is great as well, I upvoted it. However, I still like this one the most and don't believe it deserves a downvote with no reason.
This post was flagged by at least one user, presumably because they thought an answer without explanation should be deleted. ... Yeah, I got nothin'.

You could write a function that takes your list and returns a dictionary mapping each word to the number of times it appears in your tweets. Note that this version counts across all groups combined, not per subgroup.

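def countWords(listitem):
    # Flatten every group into one list of words.
    a = []
    for x in listitem:              # each group of tweets
        for y in x:                 # each tweet in the group
            for z in y.split(' '):
                a.append(z)
    # Tally how many times each word appears.
    b = {}
    for word in a:
        if word not in b:
            b[word] = 1
        else:
            b[word] += 1
    return b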

This way your original list is left unchanged, and you can assign the return value to a new variable for inspection:

dictvar = countWords(listoftweets)

Wrapping this in a function also lets you put it in its own file (module) that you can import whenever you need it.
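Since the question asks for counts within each subgroup, here is a minimal per-group variant of the same idea. The name `count_words_per_group` is hypothetical, and the sample data is made up:

```python
def count_words_per_group(tweet_groups):
    """Return one {word: count} dict per group of tweets (hypothetical helper)."""
    result = []
    for group in tweet_groups:
        counts = {}
        for tweet in group:
            for word in tweet.split(' '):
                # dict.get returns 0 the first time a word is seen.
                counts[word] = counts.get(word, 0) + 1
        result.append(counts)
    return result


listoftweets = [['foo bar', 'foo foobar'], ['bar foo', 'bar']]
print(count_words_per_group(listoftweets))
# [{'foo': 2, 'bar': 1, 'foobar': 1}, {'bar': 2, 'foo': 1}]
```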
