Can I loop through a text file when values are strings?

Question

I have a problem I'd be very grateful for help with.

Specifically, I have a gigantic text file; I need to replace specific strings in it with entries from a dictionary. Usefully, the words I need to replace are named in sequential fashion: 'Word1', 'Word2', ... , 'Wordn'.

Now, I'd like to write a 'for' loop that loops across the file, and for all instances of 'Wordx' replaces it with dictionary[x]. The problem, of course, is that 'Wordx' requires the 'x' part to function as a variable, which (so far as I know) can't be done inside a string.

Does anyone have a workaround? I tried looking at regular expressions, but found nothing obvious (possibly because I also found them somewhat confusing).

(Note that when I generate the text file, I have complete control over the form the words I want to replace can take: i.e., it need not be 'Word11'; it can be 'Wordeleven' or 'wordXI' or anything ASCII at all.)

Edit: To add more detail, as requested: my text file is an export of the JavaScript behind a survey file. The original survey software only allows me to enter text prompts one at a time (as opposed to piping them in from a CSV), but I have several thousand text prompts to enter (the words). My plan is to manually enter about 100 words ('Word1', ..., 'Word100'), export the survey JavaScript as a text file, write a script to replace the words with dictionary entries, import the resulting files, and join them into a new survey.

However, the issue remains whether I can use the number portion of a string as a variable to loop across.

Maybe you need to show a clearer example, with more about your text file and what you want.
– BertramLAU, Commented Jun 4, 2016 at 11:21
Eh, the size isn't really my point: I say 'gigantic' only to convey that it rewards writing some code, as opposed to doing a 'find' and 'replace' one word at a time.
– Lodore66, Commented Jun 4, 2016 at 12:41

z0r · Accepted Answer · 2016-06-04 11:51:05Z

5

With re.sub(), you can pass it a function instead of a replacement string. This function can look up the replacement from a dictionary. For example:

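import re

d = {'0': 'foo', '1': 'bar', '2': 'baz'}
# The lambda receives each match and looks up the captured digits in d.
result = re.sub(r'word(\d+)',
                lambda match: d[match.group(1)],
                "Hello word0, this is word2. How is word1?")
print(result)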

Hello foo, this is baz. How is bar?

edited Jun 4, 2016 at 11:51

answered Jun 4, 2016 at 11:31


4 Comments

user6275647 Over a year ago

This is great. The only issue I have with it is that the code throws an error if it encounters words with numbers that are not in the dictionary. This seems sub-optimal given that the author really just wants to replace where the integer is found in the dictionary, not react to all integers in the doc.

Jasper Over a year ago

I don't know how well re.sub() can handle "gigantic" text files as input.

PM 2Ring Over a year ago

@Jason: Sure, but that only requires a minor adjustment to the replacement function. But what makes you think there could be words of the "wordx" pattern in the file that aren't in the dictionary? According to the info given, the OP has enough control over the file to prevent that situation from arising.
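
The minor adjustment mentioned above could look like the following sketch. It assumes the same dictionary `d` as in the answer and uses `dict.get()` to fall back to the original matched text when the number has no entry. The helper name `replace_known` and the sample sentence are made up for illustration.

```python
import re

d = {'0': 'foo', '1': 'bar', '2': 'baz'}

def replace_known(match):
    # Fall back to the whole matched text (e.g. 'word7') when the
    # captured number is not a key in d.
    return d.get(match.group(1), match.group(0))

result = re.sub(r'word(\d+)', replace_known,
                "Hello word0, this is word7. How is word1?")
print(result)
```

Here `word7` is left untouched because `'7'` is not in `d`, while `word0` and `word1` are still replaced.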

z0r Over a year ago

@Jasper If the performance of sub is a problem with a large string, you could do it line-by-line.
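
A minimal sketch of the line-by-line idea, assuming the same `word(\d+)` pattern and string keys as in the answer. The function name `replace_stream` is hypothetical, and in-memory `io.StringIO` objects stand in for real files so the example runs on its own.

```python
import io
import re

PATTERN = re.compile(r'word(\d+)')

def replace_stream(src, dst, mapping):
    """Copy src to dst one line at a time, replacing each 'wordN'."""
    for line in src:
        dst.write(PATTERN.sub(lambda m: mapping[m.group(1)], line))

# In-memory stand-ins for real files, so the sketch is self-contained.
src = io.StringIO("Hello word0,\nthis is word2.\nHow is word1?\n")
dst = io.StringIO()
replace_stream(src, dst, {'0': 'foo', '1': 'bar', '2': 'baz'})
output = dst.getvalue()
print(output, end="")
```

With real files, the same function could be given two objects returned by `open()`, one opened for reading and one for writing, so that only one line is held in memory at a time.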

Rory Daulton · Accepted Answer · 2016-06-04 11:30:44Z

2

n = 1
while not done:
    replace_str = 'Word' + str(n)
    # find and replace all instances of replace_str in the file text
    # set variable done if finished
    n += 1

Does that framework solve your needs? A string is not a variable: a string is a value which can be calculated, while a variable is a name, which (usually) is not calculated. With more difficulty you can also set strings like 'WordEleven' and so on.
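
One way to fill in that framework is sketched below. The sample text and the contents of `dictionary` are made up for illustration. The loop runs from the largest number down because 'Word1' is also the start of 'Word10', so replacing in ascending order with `str.replace()` would corrupt the longer names.

```python
# Hypothetical sample data standing in for the file text and the dictionary.
text = "Word1 and Word10 and Word2"
dictionary = {1: 'apple', 2: 'pear', 10: 'plum'}

# Go from the largest number down so 'Word10' is handled before 'Word1'.
for n in sorted(dictionary, reverse=True):
    replace_str = 'Word' + str(n)   # build the search string from n
    text = text.replace(replace_str, dictionary[n])

print(text)
```

This still scans the whole text once per dictionary entry, which is acceptable for a one-off job but slower than a single pass.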

edited Jun 4, 2016 at 11:30

answered Jun 4, 2016 at 11:25


3 Comments

Barmar Over a year ago

Reading through a huge file repeatedly for each n is a very expensive method. It would be better to read through the file once, and do all the replacements on each line.

Rory Daulton Over a year ago

I agree with those concerns. I ignored them in my answer because the question as originally written gave very few details, and I wanted to concentrate on what seemed to be the main issue, "'Wordx' requires the 'x' part to function as a variable, which (so far as I know) can't be done inside a string".

Lodore66 Over a year ago

For what it's worth, Rory Daulton's suggestion does capture what I was looking for in a fairly direct way. The other suggestions are excellent too, but this gives the kind of workaround that the problem needs. It may well be less efficient, though the problem is a once-off.

Community · Accepted Answer · 2020-06-20 09:12:55Z

I suppose the text file you were talking about looks like this:

Hi! This is word1

I like to swim, word2 and word3 ....

If so, you can read the file line by line, split each line into words, and replace the placeholder words with values from a dictionary whose keys would be int(word[-1]).

Here is the code:

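from __future__ import print_function

# Example replacements, keyed by the digit at the end of each placeholder.
# Named 'replacements' so the built-in dict is not shadowed.
replacements = {1: 'Aravind', 2: 'eat', 3: 'play'}

def word_gen(file):
    for line in file:
        for word in line.split():
            # Only matches single-digit placeholders: 'word1' to 'word9'.
            # The isdigit() check keeps ordinary words like 'words' intact.
            if word[0:4] == 'word' and len(word) == 5 and word[-1].isdigit():
                # Remove int() if the keys are strings like {'1': 'Mark', ...}
                print(replacements[int(word[-1])], end=" ")
            else:
                print(word, end=" ")
        print()  # end the current output line


with open('re.txt', 'r') as f:
    word_gen(f)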

Now redirect the terminal output to another file with:

python replace.py > replaced.txt

Hope that helps :)
