
My goal is to grab a list of all input names and values, pair them up, and submit the form. The names and values are randomised.

from bs4 import BeautifulSoup # parsing

html = """
<html>
<head id="Head1"><title>Title Page</title></head>
<body>
    <form id="formS" action="login.asp?dx=" method="post">

    <input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' >
    <input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' >
    <input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' >
    </form>

</body>

</html>
"""

html_proc = BeautifulSoup(html)

This bit works fine:

print html_proc.find("input", value=True)["value"]
> gDcZHY+nV

However the following statements don't work or don't work as hoped:

print html_proc.find("input", name=True)["name"]
> TypeError: find() got multiple values for keyword argument 'name'

print html_proc.findAll("input", value=True, attrs={'value'})
> []  

print html_proc.findAll('input', value=True)
> <input name="qw1NWJOJi/E8IyqHSHA==" type="hidden" value="gDcZHY+nV">
> <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
> <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
> </input></input></input>, <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
> <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
> </input></input>, <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV"></input>

2 Answers


You cannot submit a form with BeautifulSoup, but here's how you can get the list of (name, value) pairs:

print [(element['name'], element['value']) for element in html_proc.find_all('input')]

prints:

[('qw1NWJOJi/E8IyqHSHA==', 'gDcZHY+nV'), 
 ('sfqwWJOJi/E8DFDHSHB==', 'kgDcZHY+n'), 
 ('Jsfqw1NdddfDDSDKKSL==', 'rNg4pUhnV')]

4 Comments

Thank you. Brilliant. Parsimonious solutions to code are intoxicating.
When my reputation is higher I'll come back and rate your answer up.
@sarasimple thanks, but don't worry about it, just glad it helped. Happy web-scraping!
@alecxe, what if I need to submit a form? What would you recommend?
d = {e['name']: e.get('value', '') for e in html_proc.find_all('input', {'name': True})}
print(d)

prints:

{'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n', 
 'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV', 
 'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV'}

Building on @alecxe's answer, this avoids a KeyError when an input has no name or no value, and it parses the form into a dictionary that can be passed straight to requests:

import requests

url = 'http://example.com/' + html_proc.form['action']
requests.post(url, data=d)

Though if this gets any more complicated (cookies, scripts), you might want to use Mechanize.
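Concatenating 'http://example.com/' with the form's action only works when the action is relative to the site root. As a rough sketch, the standard library's urljoin can resolve the action against the address the page was fetched from, and urlencode shows how the randomised names and values would be escaped in the POST body. The page_url below is a hypothetical address, and the dictionary holds one pair from the question for brevity.

```python
from urllib.parse import parse_qs, urlencode, urljoin

# Hypothetical address of the page that contained the form (an assumption).
page_url = "http://example.com/account/"

# The form's action attribute, as it appears in the question's HTML.
action = "login.asp?dx="

# One name/value pair from the question; in practice this is the dict `d`.
fields = {"qw1NWJOJi/E8IyqHSHA==": "gDcZHY+nV"}

# Resolve the relative action against the page URL instead of concatenating.
post_url = urljoin(page_url, action)

# Form-encode the fields; '/', '=' and '+' are percent-escaped.
body = urlencode(fields)

# Decoding the body gives back the original pair, so nothing is lost.
decoded = parse_qs(body)

print(post_url)
print(body)
```

requests performs the same form encoding when a dictionary is passed as data=, so in practice only the urljoin step is needed before calling requests.post(post_url, data=d).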


The reason for the TypeError is that the first positional parameter of find() is itself called 'name', so passing name=True as a keyword supplies that parameter twice. Use html_proc.find("input", attrs={'name': True}) instead. Likewise, for the attrs parameter, pass the dictionary {'value': True} rather than the set {'value'}.
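A minimal sketch of the difference, assuming Python 3 with bs4 installed and the built-in "html.parser" backend. The form below is a simplified stand-in with made-up names (tokenA, tokenB), not the question's exact markup.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the question's form; the names are hypothetical.
html = """
<form id="formS" action="login.asp?dx=" method="post">
  <input type="hidden" name="tokenA" value="valueA">
  <input type="hidden" name="tokenB" value="valueB">
  <input type="submit" value="Go">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# name=True collides with find()'s own first parameter, which is also 'name'.
try:
    soup.find("input", name=True)
    raised = False
except TypeError:
    raised = True

# Passing the filter through attrs avoids the collision.
first_name = soup.find("input", attrs={"name": True})["name"]

# A dict maps attribute names to filters; True means "attribute is present".
named = [tag["name"] for tag in soup.find_all("input", attrs={"name": True})]
values = [tag["value"] for tag in soup.find_all("input", attrs={"value": True})]

print(raised, first_name, named, values)
```

The submit button has a value but no name, so it shows up in values and is skipped in named.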
