Extract string from HTML String

Question

I want to extract a number from an HTML string (I usually do not know the number in advance).

The crucial part looks like this:

<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>

I want to extract the "286". I want to do something like: start after "L :" and stop before "<". How can I do this? Thank you very much in advance.

Maybe I should be a little more specific. The problem is that this is a huge HTML document with hundreds of ":" symbols. The only unique combination in this file is "TOTAL : " (just "L : " works as well). I don't know the length of the number, so the only option is to end the search when we reach the "<" that begins the next tag.
– Jannik732, Commented Mar 4, 2020 at 10:22

Mace · Accepted Answer · 2020-03-04 10:39:15Z

1

If the string "TOTAL : number" is unique, use a regular expression to first find this substring and then extract the number from it.

import re

string = 'test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'

reg__expr = r'TOTAL\s:\s\d+'  # TOTAL<whitespace>:<whitespace><number>
# find the substring
result = re.findall(reg__expr, string)
if result:
    substring = result[0]

    reg__expr = r'\d+'  # <number>
    result = re.findall(reg__expr, substring)
    number = int(result[0])

    print(number)

You can test your own regular expressions here https://regex101.com/
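
The two passes above can also be collapsed into a single pattern with a capture group. The following is a minimal sketch rather than part of the original answer: the helper name `extract_total` is made up for illustration, and `\s*` is used in place of `\s` on the assumption that the spacing around the colon may vary.

```python
import re


# Hypothetical helper; the name is chosen for illustration only.
def extract_total(html):
    """Return the integer after 'TOTAL :' or None if the marker is absent."""
    # The parentheses capture only the digits, so no second search is needed.
    match = re.search(r'TOTAL\s*:\s*(\d+)', html)
    return int(match.group(1)) if match else None


sample = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>'
print(extract_total(sample))
```

Returning `None` when nothing matches avoids the `AttributeError` that calling `.group()` on a failed `re.search` would raise.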

edited Mar 4, 2020 at 10:39

answered Mar 4, 2020 at 10:33


Saeed Ramezani · Accepted Answer · 2020-03-04 10:23:36Z

0

In your view.py file you can try this:

import re
my_string = "TOTAL : 286"
int(re.search(r'\d+', my_string).group())

286

answered Mar 4, 2020 at 10:23


CodeCupboard · Accepted Answer · 2020-03-04 10:29:39Z

0

If the HTML is always in the same format, you could split the string at "TOTAL : "; the start of the second piece is then your answer. If what follows is consistent, splitting again gets you exactly what you want.

Example HTML (I have just made up the surrounding text):

Target : 123
TOTAL : 286
Mass : 123

Code

t = """    Target : 123
    TOTAL : 286
    Mass : 123"""


print(t.split("TOTAL : ")[1].split("Mass")[0].strip())

returns:

286

There are tools that do this much more neatly, such as BeautifulSoup, but for a basic example this works too.
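
As a rough illustration of the parser-based route, here is a sketch that uses the standard library's `html.parser` module instead of BeautifulSoup (a substitution made only so the example has no third-party dependency). The class name `TotalFinder` and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser
import re


# Hypothetical class; the name is chosen for illustration only.
class TotalFinder(HTMLParser):
    """Stores the number from the first text node containing 'TOTAL :'."""

    def __init__(self):
        super().__init__()
        self.total = None

    def handle_data(self, data):
        # handle_data is called with the text found between tags.
        if self.total is None:
            match = re.search(r'TOTAL\s*:\s*(\d+)', data)
            if match:
                self.total = int(match.group(1))


finder = TotalFinder()
finder.feed('<div><p>Target : 123</p><p>TOTAL : 286</p><p>Mass : 123</p></div>')
finder.close()
print(finder.total)
```

Because the parser separates text from markup, the search never has to reason about where the next "<" is.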

edited Mar 4, 2020 at 10:29

answered Mar 4, 2020 at 10:26


1 Comment

Jannik732 Over a year ago

Exactly, so I just need the Python syntax for that; that would solve the problem.

Abhishek Kulkarni · Accepted Answer · 2020-03-04 10:30:32Z

0

You can try the following:

    line = "TOTAL : 286"
    if line.startswith('TOTAL : '):
        print(line[8:])

Output:

286

answered Mar 4, 2020 at 10:30


Rado · Accepted Answer · 2020-03-04 10:36:37Z

0

You can use string partitioning to extract a "number" string from the whole HTML string like this (assuming the HTML code is in the html_string variable):

num_string = html_string.partition("TOTAL :")[2].partition("<")[0].strip()

This gives you num_string with the number as a string, which you can then convert to an integer or whatever you need. Note that the separator has to match the text exactly, including the space before the colon. Keep in mind that this processes the first occurrence of anything that looks like "TOTAL : anything_goes_here <", so make sure that this pattern is unique.
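
A slightly more defensive variant of the partition approach might look like the sketch below. The helper name `number_after` and its `marker` parameter are hypothetical, and the sample string is the one from the question. It checks that the marker was actually found, since `str.partition` returns empty strings when the separator is missing.

```python
# Hypothetical helper; name and parameters are chosen for illustration only.
def number_after(html_string, marker="TOTAL :"):
    """Return the integer between `marker` and the next '<', or None."""
    # partition returns (before, separator, after); the separator slot is ''
    # when the marker does not occur in the string.
    _, found, rest = html_string.partition(marker)
    if not found:
        return None
    candidate = rest.partition("<")[0].strip()
    # Guard against non-numeric text following the marker.
    return int(candidate) if candidate.isdecimal() else None


html_string = '<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>\n<tagend>'
print(number_after(html_string))
```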

answered Mar 4, 2020 at 10:36


Jatin Chauhan · Accepted Answer · 2020-03-04 10:50:51Z

0

If your HTML string is this:

html_string = """<test test="3" test="search_summary_figure WHR WVM">TOTAL : 286</test>
<tagend>"""

Try this:

int(html_string.split("</test>")[0].split(":")[-1].replace(" ", ""))

answered Mar 4, 2020 at 10:50
