
I want to extract a number from an HTML string (I usually do not know the number in advance).

The crucial part looks like this:

<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>

I want to extract the "286". I want to do something like: start after "L :" and stop before "<". How can I do this? Thank you very much in advance.

  • Maybe I should be a little more specific. The problem is that this is a huge HTML document with hundreds of ":" symbols. The only unique combination in this file is "TOTAL : " (or just "L : " works as well). I don't know the length of the number, so the only option is to end the search when we reach the "<" of the next tag. Commented Mar 4, 2020 at 10:22
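The idea in the question (start right after "TOTAL : ", stop at the next "<") maps directly onto `str.index()` plus slicing. A minimal sketch, assuming the "TOTAL : " marker appears exactly once; `extract_between` is a hypothetical helper name, not a library function:

```python
def extract_between(text: str, start: str, end: str) -> str:
    """Return the text after the first `start` marker, up to the next `end` marker.

    Hypothetical helper for illustration; str.index() raises ValueError
    if either marker is missing.
    """
    begin = text.index(start) + len(start)  # first character after the start marker
    stop = text.index(end, begin)           # first end marker after that point
    return text[begin:stop]


html = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>\n<tagend>'
number = int(extract_between(html, "TOTAL : ", "<"))
print(number)  # 286
```

Because the search for "<" starts at `begin`, the length of the number does not matter.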

6 Answers


If the string "TOTAL : number" is unique, use a regular expression to first find this substring and then extract the number from it.

import re

string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

reg__expr = r'TOTAL\s:\s\d+'  # TOTAL<whitespace>:<whitespace><number>
# find the substring
result = re.findall(reg__expr, string)
if result:
    substring = result[0]  # 'TOTAL : 286'

    # extract the number from the substring
    reg__expr = r'\d+'  # <number>
    result = re.findall(reg__expr, substring)
    number = int(result[0])

    print(number)
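The two `findall` calls above can be folded into a single `re.search` by putting the digits in a capturing group. A sketch, assuming the label is spelled exactly "TOTAL"; using `\s*` instead of `\s` is a small change that also tolerates "TOTAL:286" or extra spaces:

```python
import re

html = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>\n<tagend>'

# One pass: "TOTAL", optional whitespace, ":", optional whitespace,
# then capture the run of digits that follows.
match = re.search(r"TOTAL\s*:\s*(\d+)", html)
number = int(match.group(1)) if match else None  # None if the label is absent
print(number)  # 286
```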

You can test your own regular expressions here https://regex101.com/


In your view.py file you can try this:

import re

my_string = "TOTAL : 286"
print(int(re.search(r'\d+', my_string).group()))

Output:

286
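This works on the isolated string "TOTAL : 286", but applied to the whole HTML snippet from the question, `\d+` matches the first digits it finds anywhere, which is the "3" in `test="3"`. A short sketch showing the difference and anchoring the pattern on the label instead:

```python
import re

html = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>\n<tagend>'

# re.search returns the FIRST run of digits in the string: the "3" in test="3".
first_number = int(re.search(r"\d+", html).group())
print(first_number)  # 3

# Anchoring the pattern on the "TOTAL :" label picks the intended number.
total = int(re.search(r"TOTAL\s*:\s*(\d+)", html).group(1))
print(total)  # 286
```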


If the HTML is always in the same format, you can split the string at "TOTAL : ", and the first part of the next string is your answer. If what follows is consistent, splitting again gets you exactly what you want.

Example HTML (I have just made up the surrounding content):

Target : 123
TOTAL : 286
Mass : 123

Code

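t = """    Target : 123
    TOTAL : 286
    Mass : 123"""

# Take the text after "TOTAL : ", cut it at "Mass", then trim the leftover newline and spaces.
print(t.split("TOTAL : ")[1].split("Mass")[0].strip())

prints: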

286

There are tools that do this much more neatly, such as BeautifulSoup, but for a basic example this works too.
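BeautifulSoup is a third-party package that has to be installed separately. As a dependency-free sketch of the same parser-based idea, the standard library's `html.parser.HTMLParser` can walk the text nodes and pick out the one that starts with "TOTAL". `TotalFinder` is a hypothetical class name, and the sketch assumes the text node looks like "TOTAL : 286":

```python
from html.parser import HTMLParser


class TotalFinder(HTMLParser):
    """Stores the number from the first text node that starts with 'TOTAL'.

    Hypothetical class for illustration; assumes text like 'TOTAL : 286'.
    """

    def __init__(self):
        super().__init__()
        self.total = None

    def handle_data(self, data):
        # Called for each run of text between tags.
        text = data.strip()
        if self.total is None and text.startswith("TOTAL"):
            # Keep whatever follows the colon: " 286" -> 286.
            self.total = int(text.partition(":")[2])


html = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>\n<tagend>'
finder = TotalFinder()
finder.feed(html)
finder.close()
print(finder.total)  # 286
```

Because only text nodes are inspected, colons and digits inside tag attributes are never considered.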

1 Comment

Exactly, so I just need the Python syntax for that; that would solve the problem.

You can try something like this:

    prefix = "TOTAL : "
    line = "TOTAL : 286"
    if line.startswith(prefix):
        print(line[len(prefix):])  # everything after the prefix

Output:

    286
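In the real page, "TOTAL : " is not at the start of a line but follows an opening tag, so `startswith()` on the raw HTML line would not match. A sketch that scans each line with `str.find()` instead, assuming the label and its number sit on the same line (the sample document is made up):

```python
html = """<div>Target : 123</div>
<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>"""

prefix = "TOTAL : "
total = None
for line in html.splitlines():
    pos = line.find(prefix)  # -1 if the label is not on this line
    if pos != -1:
        start = pos + len(prefix)
        stop = line.find("<", start)  # stop at the next tag
        if stop == -1:                # no tag after the number on this line
            stop = len(line)
        total = int(line[start:stop])
        break

print(total)  # 286
```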


You can use string partitioning to extract the number as a string from the whole HTML string like this (assuming the HTML code is in the html_string variable):

num_string = html_string.partition("TOTAL :")[2].partition("<")[0].strip()

There you get num_string with the number as a string; then simply convert it to an integer or whatever you want. The separator must match the HTML exactly ("TOTAL :" with a space before the colon in this document). Keep in mind that this will process the first occurrence of anything that looks like "TOTAL : anything_goes_here<", so you want to make sure that this pattern is unique.
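`str.partition()` never raises: if the separator is missing, it returns the whole string followed by two empty strings, so the partition chain yields an empty string and `int()` then fails. A sketch that checks the separator first; `total_from_html` is a hypothetical helper name, and it assumes the number is followed by a "<":

```python
def total_from_html(html_string):
    """Return the number after 'TOTAL :' as an int, or None if the label is absent.

    Hypothetical helper; assumes the number is followed by a '<'.
    """
    _, sep, tail = html_string.partition("TOTAL :")
    if not sep:  # partition() returns (html_string, '', '') when not found
        return None
    return int(tail.partition("<")[0].strip())


html_string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>\n<tagend>'
print(total_from_html(html_string))        # 286
print(total_from_html("<p>no total</p>"))  # None
```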


If your HTML string is this:

html_string = """<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>"""

Try this:

int(html_string.split("</test>")[0].split(":")[-1].replace(" ", ""))
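Splitting the one-liner into steps shows what each call produces. Note that `split("</test>")[0]` keeps everything up to the first `</test>` in the string, so in a larger document this only works if the TOTAL element is the first `<test>` element:

```python
html_string = """<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>"""

before_close = html_string.split("</test>")[0]  # everything up to the first </test>
after_colon = before_close.split(":")[-1]        # text after the last ':' in that part: ' 286'
number = int(after_colon.replace(" ", ""))       # drop the spaces and convert: 286

print(repr(after_colon), number)  # ' 286' 286
```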
