
I'm learning a bit of Python and working through the Python Workbook exercises. Right now I'm stuck on one called Tokenizing a String. In my case the string is a math expression, and my code must split it into tokens. Here is my code:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while x[j]>="0" and x[j]<="9":
            temp = temp + x[j]
            while j<len(x):
                j=j+1
        if temp!="":
            list.append(temp)
            temp=""
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()

The problem is that I can't find a solution that doesn't run out of range. It all comes from the inner "while" loop. I tried the solution from the book, which is quite similar to mine, but it gives the same result. How can I avoid an index out of range when I'm using while and incrementing the counter "j"?

Thanks!

    Can you give examples of what you would consider valid math expressions and their expected outputs? Also, you probably don't want to shadow the built-in list type. Commented Apr 3, 2022 at 7:25
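
To illustrate the point about the built-in: once a variable is named list, the built-in is hidden within that scope. A small sketch (the function names are made up for illustration and are not part of the question's code):

```python
def shadowing_example():
    list = []              # hides the built-in list inside this function
    list.append("+")
    try:
        list("abc")        # calls the local list object, not the built-in
    except TypeError:
        return "TypeError"
    return "ok"


def renamed_example():
    tokens = []            # a different name leaves the built-in usable
    tokens.append("+")
    return tokens + list("12")


print(shadowing_example())  # TypeError
print(renamed_example())    # ['+', '1', '2']
```

Renaming the variable (for example to tokens) avoids the problem without changing how the tokenizer works.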

2 Answers


The problem is that j is incremented in this block without checking again that it is still in range:

while j < len(x):
    if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
        list.append(x[j])
        j=j+1
    while x[j]>="0" and x[j]<="9":
        temp = temp + x[j]
        while j<len(x):
            j=j+1
    if temp!="":
        list.append(temp)
        temp=""

Let's say j = len(x)-1 and the if condition evaluates to True. That executes j=j+1, so j is now len(x). The while loop that follows then evaluates x[j]>="0", which is x[len(x)]. Indexing starts at zero, so for a string like

a = "abcd"

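len(a) is 4, but a[4] does not exist (the last valid index is 3), so this raises an IndexError. The same thing happens after a digit: the innermost while j<len(x) loop pushes j all the way to len(x) before x[j] is checked again.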
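
To make the off-by-one concrete, here is a small sketch using the same string. The variable names are only for illustration.

```python
a = "abcd"

# Valid indices run from 0 to len(a) - 1.
last_index = len(a) - 1
last_char = a[last_index]

# Indexing at len(a) is one past the end and raises IndexError.
try:
    a[len(a)]
    raised = False
except IndexError:
    raised = True

print(last_index, last_char, raised)  # 3 d True
```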

Code with corrections:

def tokenizer(x):
    x = x.replace(" ", "")
    tokens = []  # named "tokens" so the built-in list is not shadowed
    j = 0
    temp = ""
    while j < len(x):
        if x[j] in "*/+-^()":
            tokens.append(x[j])
            j = j + 1
        else:
            # Fix 1: only look for digits when the character
            # is not an operator.
            # Fix 2: check that j is still in range *before*
            # indexing x[j].
            while j < len(x) and x[j].isnumeric():
                temp = temp + x[j]
                # The extra inner "while j<len(x)" loop is not needed.
                j = j + 1
            if temp != "":
                tokens.append(temp)
                temp = ""
            else:
                # Neither an operator nor a digit: stop here,
                # otherwise j never advances and the loop never ends.
                raise ValueError("unexpected character: " + x[j])
    return tokens

def main():
    x = input("Enter math expression: ")
    tokens = tokenizer(x)
    print("the tokens are: ", tokens)

if __name__ == '__main__':
    main()
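
For comparison, here is a sketch that avoids manual index arithmetic altogether by iterating over the characters with a for loop, so there is no x[j] that could go out of range. This is an alternative formulation, not the book's solution; the names OPERATORS and tokenize_no_index are made up for illustration, and rejecting unknown characters with ValueError is an assumption about the desired behavior.

```python
OPERATORS = "*/+-^()"


def tokenize_no_index(expression):
    """Split an expression into number and operator tokens without indexing."""
    tokens = []
    number = ""  # digits of the number currently being read
    for ch in expression.replace(" ", ""):
        if "0" <= ch <= "9":
            number = number + ch
            continue
        # Any non-digit ends the number that was being collected.
        if number != "":
            tokens.append(number)
            number = ""
        if ch in OPERATORS:
            tokens.append(ch)
        else:
            raise ValueError("unexpected character: " + ch)
    # Flush a number that ends the expression.
    if number != "":
        tokens.append(number)
    return tokens


print(tokenize_no_index("12 + 3*(4-1)"))
# ['12', '+', '3', '*', '(', '4', '-', '1', ')']
```

The trade-off is the final flush after the loop: with a for loop, a number at the very end of the string has to be appended once the characters run out.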

3 Comments

Thanks for your answer, Anshumaan. I understand that, but how can I avoid it? I have been banging my head on the desk for the last few hours. I tried a for loop and tried adding more conditions such as if statements. No matter what I do, the last symbol in the string is always out of range. I'm sure there is a simple solution; I just can't think of it!
@GeorgiIvanov I have added the code.
Thanks again. I found out why my second solution worked, and it is so simple: in the while loop the "j<len(x)" condition was last. When I put it first, the first code works just fine as well :). I had no idea the order mattered.

I have no idea why, but this works:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while j<len(x) and x[j].isnumeric():
            temp=temp+x[j]
            j=j+1
        if temp!="":
            list.append(temp)
            temp=""

    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()

I replaced the x[j]>="0" and x[j]<="9" condition with .isnumeric(), and for some reason it now works. To me both conditions look identical. Can anyone explain why this works? I really want to learn how to handle cases like this in the future without losing my sanity!

Thanks
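
As the comment under the first answer points out, the change that matters is the order of the conditions, not .isnumeric() itself. Python evaluates and from left to right and stops at the first false operand, so with j<len(x) first, x[j] is never evaluated once j has run past the end. A minimal sketch of both orderings (the helper names are made up for illustration):

```python
def digits_bounds_first(x, j):
    """Collect digits starting at index j, checking the bounds first."""
    temp = ""
    # j < len(x) is tested first; when it is False, x[j] is never evaluated.
    while j < len(x) and "0" <= x[j] <= "9":
        temp = temp + x[j]
        j = j + 1
    return temp


def digits_bounds_last(x, j):
    """Same loop with the conditions swapped: x[j] is evaluated first."""
    temp = ""
    while "0" <= x[j] <= "9" and j < len(x):
        temp = temp + x[j]
        j = j + 1
    return temp


print(digits_bounds_first("12", 0))  # 12

try:
    digits_bounds_last("12", 0)
except IndexError as err:
    print("IndexError:", err)
```

The original "0" to "9" comparison works just as well once the bounds check comes first. The two digit tests are not strictly identical, though: .isnumeric() also accepts other Unicode numeric characters such as "²", while the range comparison only accepts the ASCII digits.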
