0

I am creating a string that is about 30 million words long. As you can imagine, this takes absolutely forever to create with a for-loop increasing by about 100 words at a time. Is there a way to represent the string in a more memory-friendly way, like a numpy array? I have very little experience with numpy.

bigStr = ''
for tweet in df['text']:
  bigStr = bigStr + ' ' + tweet
len(bigStr)
4
  • What are you doing with the string once you've created it? Do you need to create the string at all? If all you are doing is getting its length, then compute just that. Commented Jul 30, 2021 at 13:47
  • What exactly is your goal? Loading all words into memory? If not, you may want to look into generators. Commented Jul 30, 2021 at 13:47
  • The question is which operation is more expensive? Looping through the data or appending the string? Commented Jul 30, 2021 at 13:47
  • bigStr is, and will be, a regular Python str value, no matter what compatible type tweet may have. Commented Jul 30, 2021 at 13:54

2 Answers

1

If you want to build a string, use ' '.join, which creates the final string in O(n) time. Building it up one piece at a time can take O(n^2) time, because each concatenation may copy everything accumulated so far.

bigStr = ' '.join(df['text'])
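As a rough sketch of the difference, the snippet below runs both approaches on a small list. The list `tweets` is an assumed stand-in for `df['text']`, and the helper names `concat_loop` and `concat_join` are made up for illustration.

```python
# Hypothetical sample data standing in for df['text'].
tweets = ["first tweet here", "second tweet", "third one"]


def concat_loop(texts):
    """Build the string piece by piece, as in the question."""
    result = ''
    for text in texts:
        # Each step creates a new, longer string object.
        result = result + ' ' + text
    return result


def concat_join(texts):
    """Build the string in a single pass with str.join."""
    return ' '.join(texts)


loop_str = concat_loop(tweets)
join_str = concat_join(tweets)

# The loop version starts with a space; the join version does not.
print(loop_str == ' ' + join_str)
```

Note that the two results differ only by the leading space that the loop adds before the first tweet.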

0

I can see you're trying to get the total length of all the data. For that, you don't need to concatenate the strings at all. (I also see that you add a space before each element.)

Just add the length of each tweet to an integer variable (+1 for each space):

total_length = 0
for tweet in df['text']:
  # +1 accounts for the space added before each tweet
  total_length += 1 + len(tweet)

print(total_length)
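The same running total can also be written with `sum` and a generator expression, which never materializes the combined string. This is only a sketch: the list `tweets` is assumed sample data in place of `df['text']`.

```python
# Hypothetical sample data standing in for df['text'].
tweets = ["hello world", "foo", "bar baz"]

# One space per tweet plus the tweet's own length, summed lazily.
total_length = sum(len(tweet) + 1 for tweet in tweets)

print(total_length)
```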

1 Comment

Sorry, I shouldn't have included the len() function. That was just for curiosity. I need the array of strings so I can convert each unique word to an int.
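If the goal is to map each unique word to an int, one possible approach skips the giant string entirely and processes one tweet at a time. The sketch below assumes that a "word" is any whitespace-separated token and that IDs are assigned in order of first appearance; the names `tweets`, `vocab`, and `encoded` are hypothetical, with `tweets` standing in for `df['text']`.

```python
# Hypothetical sample data standing in for df['text'].
tweets = ["the cat sat", "the dog sat down"]

vocab = {}    # word -> integer ID, assigned on first appearance
encoded = []  # one list of integer IDs per tweet

for tweet in tweets:
    ids = []
    # Assumption: words are separated by whitespace.
    for word in tweet.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    encoded.append(ids)

print(vocab)
print(encoded)
```

Only the vocabulary and the integer IDs are kept in memory, and each ID list could be converted to a numpy array afterwards if needed.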
