Extracting Field Names of an HTML Form - Python

Question

Assume there is a page at "http://www.someHTMLPageWithTwoForms.com" that contains two forms (say Form 1 and Form 2). I have code like this:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer
h = httplib2.Http('.cache')
response, content = h.request('http://www.someHTMLPageWithTwoForms.com')
for field in BeautifulSoup(content, parseOnlyThese=SoupStrainer('input')):
    if field.has_key('name'):
        print field['name']

This returns all the field names that belong to both Form 1 and Form 2 of my HTML page. Is there any way to get only the field names that belong to a particular form (say, Form 2 only)?

Anas · Accepted Answer · 2013-11-11 12:43:02Z

5

If there are only two forms, you can select the second one by its index:

from BeautifulSoup import BeautifulSoup

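forms = BeautifulSoup(content).findAll('form')
# forms[1] is the second form; search its descendants for input tags
# (iterating the form directly would also yield text nodes, which have no has_key)
for field in forms[1].findAll('input'):
    if field.has_key('name'):
        print field['name']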

If you cannot rely on the form's position, make the lookup more specific by matching an attribute such as its id or class:

from BeautifulSoup import BeautifulSoup

forms = BeautifulSoup(content).findAll(attrs={'id' : 'yourFormId'})
# forms[0] is the first element with that id; collect its input tags
for field in forms[0].findAll('input'):
    if field.has_key('name'):
        print field['name']

edited Nov 11, 2013 at 12:43

answered Aug 2, 2011 at 11:15

Anas

3 Comments

Khokhar Over a year ago

I tried this solution but got the following error message: "<type 'exceptions.NameError'>: global name 'BeautifulSoup' is not defined"

Anas Over a year ago

Please make sure that BeautifulSoup is installed and imported. I edited the response for the import.

shao.lo Over a year ago

For BeautifulSoup version 4, use 'from bs4 import BeautifulSoup'.

mdeous · Accepted Answer · 2011-08-02 12:19:54Z

1

This kind of parsing is also quite easy with lxml (which I personally prefer over BeautifulSoup because of its XPath support). For example, the following snippet prints the names of all fields (if they have one) that belong to forms named "form2":

# you can ignore this part, it's only here for the demo
from StringIO import StringIO
HTML = StringIO("""
<html>
<body>
    <form name="form1" action="/foo">
        <input name="uselessInput" type="text" />
    </form>
    <form name="form2" action="/bar">
        <input name="firstInput" type="text" />
        <input name="secondInput" type="text" />
    </form>
</body>
</html>
""")

# here goes the useful code
import lxml.html
tree = lxml.html.parse(HTML) # you can pass parse() a file-like object or a URL
root = tree.getroot()
for form in root.xpath('//form[@name="form2"]'):
    for field in form.getchildren():
        if 'name' in field.keys():
            print field.get('name')

answered Aug 2, 2011 at 12:19

mdeous

1 Comment

janek37 Over a year ago

This is not ideal: it only looks at the immediate children of the form element and does not check whether they are form inputs (other elements may also have name attributes).
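
One possible way to address both points raised in this comment, sketched with lxml on Python 3: search all descendants of the form and keep only input, select, and textarea elements that carry a name attribute. The sample HTML and the helper name form_field_names are assumptions made for illustration.

```python
import lxml.html

# Hypothetical sample: the field sits inside a wrapper element that also
# has a name attribute, which should not be reported as a field.
SAMPLE_HTML = """
<html><body>
  <form name="form1" action="/foo">
    <input name="uselessInput" type="text">
  </form>
  <form name="form2" action="/bar">
    <div name="wrapper">
      <input name="firstInput" type="text">
    </div>
    <select name="choice"><option>a</option></select>
    <textarea name="notes"></textarea>
  </form>
</body></html>
"""


def form_field_names(root, form_name):
    """Collect names of form controls nested anywhere inside matching forms."""
    names = []
    # $n is an XPath variable, filled in from the keyword argument.
    for form in root.xpath("//form[@name=$n]", n=form_name):
        # .// searches all descendants; only real form controls are matched.
        controls = form.xpath(
            ".//input[@name] | .//select[@name] | .//textarea[@name]"
        )
        names.extend(control.get("name") for control in controls)
    return names


root = lxml.html.fromstring(SAMPLE_HTML)
print(form_field_names(root, "form2"))
```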

Rusty · Accepted Answer · 2018-02-06 14:27:44Z

1

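If you have the lxml and cssselect Python packages installed, you can collect each input's name and value into a dictionary:

from lxml import html

def parse_form(form):
    # form is the HTML source to parse, given as a string
    tree = html.fromstring(form)
    data = {}
    # 'form input' matches every <input> nested anywhere inside a <form>
    for e in tree.cssselect('form input'):
        # skip inputs that have no name attribute
        if e.get('name'):
            data[e.get('name')] = e.get('value')
    return data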

answered Feb 6, 2018 at 14:27

Rusty

CivFan · Accepted Answer · 2019-12-04 22:47:58Z

1

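If you know the attribute name and value, you can search by them. Be aware that the name keyword of findAll refers to the tag name, not to the name attribute, so the following search returns an empty list: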

from BeautifulSoup import BeautifulStoneSoup
xml = '<person name="Bob"><parent rel="mother" name="Alice">'
xmlSoup = BeautifulStoneSoup(xml)

xmlSoup.findAll(name="Alice")
# []
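
To match on the name attribute itself, the attribute can be passed in a dictionary through the attrs argument. Below is a minimal sketch using BeautifulSoup 4 on Python 3; the sample markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two named inputs.
html = '<form><input name="Alice" type="text"><input name="Bob" type="text"></form>'
soup = BeautifulSoup(html, "html.parser")

# The `name` keyword refers to the tag name, so this looks for <Alice> tags.
by_tag_name = soup.find_all(name="Alice")

# Passing a dict via `attrs` matches on the name attribute instead.
by_attribute = soup.find_all(attrs={"name": "Alice"})

print(len(by_tag_name))
print([tag["name"] for tag in by_attribute])
```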

edited Dec 4, 2019 at 22:47

CivFan

answered Aug 2, 2011 at 11:17

Kracekumar
