I have the following HTML:

<td>
   <input maxlen="1" name="db" size="1" type="text" value="25"/>
   <div style="display:inline-block;position:relative;top:6px;left:0px;width:20px;">
    <input class="p_b" name="ta" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▴"/>
    <input class="p_b" name="ta" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▾"/>
   </div>
   <span style="position:relative;top:8px">
    
   </span>
   <input maxlen="1" name="dc" size="1" type="text" value="0"/>
   <div style="display:inline-block;position:relative;top:6px;left:0px;width:20px;">
    <input class="p_b" name="tb" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▴"/>
    <input class="p_b" name="tb" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▾"/>
   </div>
  </td>

I need to extract both numbers, from value="25" and value="0". My current workaround is:

import re

y = soup.findAll('input', {'type':'text'})
a = re.findall(r'(?<=value=")(\d*)', str(y))

But I think there should be a more direct way to do this with the parser itself. Can anyone help?

  • I'd use an XPath approach. Commented Dec 15, 2020 at 11:45
  • More or less a duplicate of this post. Commented Dec 15, 2020 at 11:46
  • Does this answer your question? Python beautifulsoup - getting input value Commented Dec 15, 2020 at 11:57
  • @MetallimaX BeautifulSoup doesn't support XPath. Commented Dec 15, 2020 at 12:00
  • @Parolla I know, and you don't have to stick to it either. XPath was designed for these kinds of queries. Commented Dec 15, 2020 at 12:01

1 Answer

Try the line below to extract the value attribute from each text input node:

values = [element['value'] for element in soup.findAll('input', {'type':'text'})]

P.S. Note that using regex to parse HTML is fragile and generally considered bad practice. Dedicated parsing tools can easily do this for you (in Python, for instance, BeautifulSoup and lxml).
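
For completeness, here is a minimal self-contained sketch of the same idea. The HTML is a trimmed copy of the snippet from the question, and the names `text_input_values`, `numbers`, and `css_values` are made up for illustration. It assumes the `bs4` package is installed and uses the built-in `html.parser` backend; `find_all` is the current spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (inline styles omitted).
html = """
<td>
  <input maxlen="1" name="db" size="1" type="text" value="25"/>
  <input class="p_b" name="ta" type="submit" value="▴"/>
  <input maxlen="1" name="dc" size="1" type="text" value="0"/>
  <input class="p_b" name="tb" type="submit" value="▾"/>
</td>
"""

soup = BeautifulSoup(html, "html.parser")


def text_input_values(soup):
    """Return the value attribute of every <input type="text"> (hypothetical helper)."""
    # .get() returns None instead of raising KeyError if an input has no value.
    return [tag.get("value") for tag in soup.find_all("input", attrs={"type": "text"})]


values = text_input_values(soup)

# Attribute values are always strings; convert if numbers are needed.
numbers = [int(v) for v in values]

# The same query written as a CSS selector.
css_values = [tag["value"] for tag in soup.select('input[type="text"]')]

print(values, numbers, css_values)
```

The submit buttons are skipped because the query filters on `type="text"`, so only the two number fields come back.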
