How to deal with "String index out of range" in Python

Question

I'm learning a bit of Python and I'm doing the Python Workbook exercises. Right now I'm stuck on one called Tokenizing a String. I'm sure you know what that means. In my case the string must be a math equation and my code must tokenize it. Here is my code:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while x[j]>="0" and x[j]<="9":
            temp = temp + x[j]
            while j<len(x):
                j=j+1
        if temp!="":
            list.append(temp)
            temp=""
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()

So the problem is that I can't find a solution that doesn't run out of range. It all comes from that "while" loop. I tried the solution from the book, which was quite similar to mine, but it gives the same result. How can I avoid running out of range when I'm using while and incrementing a counter ("j" in my case)?

Thanks!

Can you give examples of what you would consider to be valid math expressions and their respective outputs? Also, you probably don't want to shadow the built-in list type.
– jackal, Commented Apr 3, 2022 at 7:25

Anshumaan Mishra · Accepted Answer · 2022-04-03 08:10:05Z

The problem is that you add 1 to j in this block without checking afterwards that j is still a valid index:

while j < len(x):
    if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
        list.append(x[j])
        j=j+1
    while x[j]>="0" and x[j]<="9":
        temp = temp + x[j]
        while j<len(x):
            j=j+1
    if temp!="":
        list.append(temp)
        temp=""

Suppose j = len(x)-1 and the if condition evaluates to True. The j=j+1 statement then runs, so j becomes len(x). When execution reaches the inner while loop, it evaluates x[j]>="0", but x[j] is now x[len(x)]. Indexing starts at zero, so for a string like

a = "abcd"

len(a) is 4, but a[4] does not exist (the last valid index is 3), so accessing it raises an IndexError.
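
A minimal sketch of that boundary, using the string a from above. The helper name safe_char is hypothetical and exists only for illustration:

```python
a = "abcd"

# Valid indices run from 0 to len(a) - 1.
last_index = len(a) - 1
print(a[last_index])  # the last character, "d"


def safe_char(s, i):
    """Hypothetical helper: return s[i], or None when i is out of range."""
    if 0 <= i < len(s):
        return s[i]
    return None


try:
    a[len(a)]  # index 4 is one past the end
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range
```

Any loop that increments an index has to re-check it against len(a) before the next read, which is what the corrected code does.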

Code with corrections:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        else:
            # Fix 1: run the digit loop only when the operator check above fails.
            # Fix 2: check that j is still in range before reading x[j].
            while j<len(x) and x[j].isnumeric():
                temp = temp + x[j]
                # The inner "while j<len(x)" loop from the question is not needed.
                j=j+1
            if temp!="":
                list.append(temp)
                temp=""
            else:
                # Neither an operator nor a digit: skip the character
                # so the outer loop cannot stall on it.
                j=j+1
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()
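
Another way to avoid the out-of-range index is to iterate over the characters directly instead of managing j by hand. The following is only a sketch of that idea, not the book's solution or the code above. The names tokenize and OPERATORS are hypothetical, and silently skipping unrecognized characters is an assumption made for the example:

```python
OPERATORS = set("*/+-^()")


def tokenize(expression):
    """Hypothetical variant: split an expression into operator and integer tokens."""
    tokens = []
    number = ""  # digits collected for the integer currently being read
    for ch in expression.replace(" ", ""):
        if ch.isnumeric():
            number += ch
            continue
        # A non-digit ends the current number, if there is one.
        if number:
            tokens.append(number)
            number = ""
        if ch in OPERATORS:
            tokens.append(ch)
        # Any other character is skipped (an assumption for this sketch).
    if number:
        tokens.append(number)
    return tokens


print(tokenize("12 + (3*45)^2"))
# ['12', '+', '(', '3', '*', '45', ')', '^', '2']
```

Because the for loop never produces an index past the end of the string, there is no x[j] read that can fail.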

edited Apr 3, 2022 at 8:10

answered Apr 3, 2022 at 7:10

Anshumaan Mishra


3 Comments

Georgi Ivanov Over a year ago

Thanks for your answer, Anshumaan. I understand that, but how can I avoid it? I've been banging my head on the desk for the last few hours. I tried a for loop and tried adding more conditions, such as extra if checks. No matter what I do, the last character in the string is always out of range. I'm sure there is a simple solution; I just can't think of it!

Anshumaan Mishra Over a year ago

@GeorgiIvanov I have added the code

Georgi Ivanov Over a year ago

Thanks again, man. I found out why my second solution worked, and it is so simple: in the while loop, the "j<len(x)" condition was last. When I put it first, the first code works just fine as well :). I had no idea that the order matters too.
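
The order matters because Python's and operator short-circuits: it evaluates its operands left to right and stops at the first false one. A small sketch of the two orderings, with j deliberately set one past the end (the variable names in_range_first and index_first are illustrative only):

```python
x = "2+3"
j = len(x)  # one past the last valid index

# Bounds check first: `and` stops at the false left operand,
# so x[j] is never evaluated.
in_range_first = j < len(x) and x[j].isnumeric()
print(in_range_first)  # False

# Index first: x[j] is evaluated before the bounds check and raises.
try:
    index_first = x[j].isnumeric() and j < len(x)
except IndexError:
    index_first = "IndexError"
print(index_first)  # IndexError
```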

Georgi Ivanov · Accepted Answer · 2022-04-03 08:07:10Z

I have no idea why, but this works:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while j<len(x) and x[j].isnumeric():
            temp=temp+x[j]
            j=j+1
        if temp!="":
            list.append(temp)
            temp=""

    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()

I replaced the x[j]>="0" and x[j]<="9" condition with .isnumeric(), and for some weird reason it now works. To me both conditions are identical. Can anyone explain why this works? I really want to learn how to handle cases like this in the future without losing my sanity!

Thanks
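
On why this works: for the ASCII digits "0" to "9" the two tests do agree, so the swap to .isnumeric() is unlikely to be what fixed the error. The working version also puts j<len(x) in front of the digit test, and that check is what keeps x[j] from being read past the end. A short sketch comparing the two tests (is_ascii_digit is a hypothetical name for the comparison used in the question):

```python
def is_ascii_digit(ch):
    """The comparison from the question, wrapped in a hypothetical helper."""
    return ch >= "0" and ch <= "9"


# For the ten ASCII digits both tests give the same answer.
agree_on_digits = all(is_ascii_digit(ch) == ch.isnumeric() for ch in "0123456789")
print(agree_on_digits)  # True

# They differ only on other numeric characters, such as a fraction sign.
print(is_ascii_digit("½"), "½".isnumeric())  # False True
```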
