Adjusting iteration amounts in a Python loop

Question

I'm trying to create an algorithm that goes through a list of strings, joins strings together if they meet certain criteria, then skips ahead by the number of strings it joined, to avoid double-counting parts of the same joined string.

I understand that i = i + x or i += x doesn't change the amount each loop iterates by, so I am looking for an alternative way to skip a variable number of iterations.

Background: I'm trying to create a named entity recognition algorithm for use on news articles. I tokenise the text ('Prime Minister Jacinda Ardern is from New Zealand') into ('Prime','Minister','Jacinda','Ardern','is'...) and run the NLTK POS tagger over it, giving: ...(('Jacinda','NNP'),('Ardern','NNP'),('is','VBZ')... I then combine words when the words that follow are also 'NNP' (proper nouns).

The goal is to count 'Prime Minister Jacinda Ardern' as 1 string as opposed to 4, then to skip the loop ahead by that many words, so that the next strings aren't 'Minister Jacinda Ardern' and then 'Jacinda Ardern'.

Context: 'text' is a list of (word, tag) tuples created by tokenising and then POS tagging my article, in the format: [...('She', 'PRP'), ('said', 'VBD'), ('the', 'DT'), ('roughly', 'RB'), ('25-minute', 'JJ'), ('meeting', 'NN')...]. 'NNP' = proper noun, i.e. the names of places, people, organisations, etc.

for (i) in range(len(text)):

    print(i)

    #initialising wordcounter as a variable
    wordcounter = 0

    # if text[i] is a Proper Noun, make namedEnt = the word. 
    # then increase wordcounter by 1
    if text[i][1] == 'NNP':
        namedEnt = text[i][0]
        wordcounter +=1

        # while the next word in text is also a Proper Noun,
        # increase wordcounter by 1. Initialise J as = 1
        while text[i + wordcounter][1] == 'NNP':
            wordcounter +=1
            j = 1


            # While J is less than wordcounter, join text[i+j] to 
            # namedEnt. Increase J by 1. When that is no longer
            # the case append namedEnt to a namedEntity list
            while j < wordcounter:
                namedEnt = ' '.join([namedEnt,text[i+j][0]])
                j += 1
            InitialNamedEntity.append(namedEnt)

        i += wordcounter

If I print(i) at the start, it goes up by 1 each time. When I print the Counter of the NamedEntity list made up of namedEnts, I get results like this: (...'New Zealand': 7, 'Zealand': 7, 'United': 4, 'Prime Minister Minister Jacinda Minister Jacinda Ardern': 3...)

So I'm not only getting double counts, as in 'New Zealand' and 'Zealand', but I'm also getting wacky results like 'Prime Minister Minister Jacinda Minister Jacinda Ardern'.

The results I would like would be ('New Zealand':7, 'United States':4,'Prime Minister Jacinda Ardern':3)

Any help would be greatly appreciated. Cheers

Just use a while loop here

– juanpa.arrivillaga, Commented Oct 21, 2019 at 3:27

Barmar · Accepted Answer · 2019-10-20 23:51:26Z

1

Don't use a for loop if you need to adjust how i is incremented, as it always sets it to the next value in the range. Use a while loop:

i = 0
while i < len(text):
    ...
    i += wordcounter
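
A fuller sketch of that loop, assuming text is a list of (word, tag) tuples as described in the question. The function name merge_proper_nouns and the sample data are made up for illustration, and the bounds check and the single join per run are choices of this sketch, not something taken from the question's code:

```python
def merge_proper_nouns(tagged):
    """Join each run of consecutive 'NNP' tokens into one string."""
    entities = []
    i = 0
    while i < len(tagged):
        wordcounter = 0
        # Count how many consecutive tokens starting at i are proper nouns.
        # The bounds check keeps i + wordcounter inside the list.
        while i + wordcounter < len(tagged) and tagged[i + wordcounter][1] == 'NNP':
            wordcounter += 1
        if wordcounter > 0:
            # Join the whole run once, after its length is known.
            words = [word for word, _tag in tagged[i:i + wordcounter]]
            entities.append(' '.join(words))
            i += wordcounter  # skip past the run that was just consumed
        else:
            i += 1  # not a proper noun: advance by one token
    return entities


# Hypothetical sample input in the same shape as the question's data.
sample = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ'), ('from', 'IN'),
          ('New', 'NNP'), ('Zealand', 'NNP')]
print(merge_proper_nouns(sample))
# ['Prime Minister Jacinda Ardern', 'New Zealand']
```

Passing the returned list to collections.Counter would then count each joined name once per occurrence.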

answered Oct 20, 2019 at 23:51

Michiel · Accepted Answer · 2019-10-21 03:35:25Z

1

range() creates an iterable object. The for...in construct asks that object for an iterator and then calls next on the iterator; each call returns the next value in the sequence. So your i variable is not a counter that the loop increments, it's just the next value produced by the iterator. Modifying i has no effect: it will simply be overwritten when the next value is retrieved from the sequence.

This is very different from a loop like for (int i = 0; i < 5; i++) {} in C, where there is no concept of a sequence; that just checks whether i is less than five before executing the block.

Compare it to this:

for i in {2,-1,-4}:
  print(i)
  i = i + 2

Perhaps here it is more obvious that setting i will have no effect.
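
To make that concrete, here is a rough hand-written equivalent of a for loop, using the built-in iter() and next(). It is only a sketch of the protocol, not what the interpreter literally executes. A list is used instead of a set so the order is predictable, and the names values and seen are only for illustration:

```python
values = [2, -1, -4]

# Roughly what "for i in values:" does behind the scenes.
iterator = iter(values)      # ask the iterable for an iterator
seen = []
while True:
    try:
        i = next(iterator)   # i is rebound to the next item here
    except StopIteration:
        break                # the iterator is exhausted, so the loop ends
    seen.append(i)
    i = i + 2                # rebinds the local name only; lost on the next pass

print(seen)  # [2, -1, -4]
```

The assignment at the bottom of the loop body never reaches the iterator, which is why it cannot change what the next pass receives.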

You can write that C-like construct in Python too, using a while loop:

i = 0
while i < 6:
  print(i)
  if i == 2:
    i = i + 2
  else:
    i = i + 1

This prints:

0
1
2
4
5

See how it didn't output 3? When it got to i == 2, it added 2 so it skipped over 3. You can do something similar in your code.

(these examples were Python 3)

edited Oct 21, 2019 at 3:35

answered Oct 21, 2019 at 0:15

2 Comments

Michiel Over a year ago

I think you mean "constructor". And thank you, iterable is the correct term. I'll edit my answer. For any wanting to read more on range, here is the documentation: docs.python.org/3/library/stdtypes.html#typesseq

Michiel Over a year ago

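Specifically, it's a lazy sequence type rather than a generator: range computes its values on demand, but it also supports len() and indexing.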

Ameth Rawat · Accepted Answer · 2019-10-23 04:09:09Z

0

Thanks for the help everyone. I used the while loop shown by Barmar:

i = 0
while i < len(text):
    ...
    i += wordcounter

and at the end used an if/else statement:

if wordcounter > 0:
    i += wordcounter
else:
    i += 1
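
For comparison, the same grouping can be done without managing an index at all. The sketch below swaps in itertools.groupby from the standard library, which is a different technique from the while loop above. The function name merge_proper_nouns_grouped and the sample data are hypothetical, and it assumes the same list of (word, tag) tuples:

```python
from itertools import groupby


def merge_proper_nouns_grouped(tagged):
    """Join each run of consecutive 'NNP' tokens into one string."""
    entities = []
    # groupby yields one group per run of consecutive items sharing a key;
    # here the key is simply whether the tag is 'NNP'.
    for is_proper, run in groupby(tagged, key=lambda pair: pair[1] == 'NNP'):
        if is_proper:
            entities.append(' '.join(word for word, _tag in run))
    return entities


# Hypothetical sample input in the same shape as the question's data.
sample = [('She', 'PRP'), ('met', 'VBD'), ('Jacinda', 'NNP'), ('Ardern', 'NNP'),
          ('in', 'IN'), ('New', 'NNP'), ('Zealand', 'NNP')]
print(merge_proper_nouns_grouped(sample))
# ['Jacinda Ardern', 'New Zealand']
```

Because each run is consumed as a single group, there is nothing to skip and no way to count part of a name twice.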

edited Oct 23, 2019 at 4:09

answered Oct 21, 2019 at 7:07
