How to parse HTML using the lxml.html library

Question

Here is the HTML that appears on my site:

<meta content="auth" name="param" />
<meta content="I_WANT_THIS" name="token" />

How can I use lxml.html to grab the content value (I_WANT_THIS) of the meta tag named token?

alecxe · Accepted Answer · 2014-03-12 21:47:32Z

Use XPath to find the meta tag by its name attribute, then read the value of its content attribute:

from lxml.html import fromstring


html_data = """ <meta content="auth" name="param" />
 <meta content="I_WANT_THIS" name="token" />"""

tree = fromstring(html_data)
# xpath() returns a list with one string per matching content attribute
print(tree.xpath('//meta[@name="token"]/@content'))

prints:

['I_WANT_THIS']
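Since xpath() returns a list, you usually want the first match or a fallback when the tag is absent. The following is a minimal sketch of that, not part of the original answer: the helper name get_meta_content and the lookup name no_such_name are made up for illustration. It passes the name as an XPath variable (lxml accepts these as keyword arguments) rather than formatting it into the expression.

```python
from lxml.html import fromstring

html_data = """<meta content="auth" name="param" />
<meta content="I_WANT_THIS" name="token" />"""


def get_meta_content(html_text, name):
    """Hypothetical helper: content of the first <meta name=...> tag, or None."""
    tree = fromstring(html_text)
    # $name is an XPath variable; lxml fills it from the keyword argument.
    matches = tree.xpath('//meta[@name=$name]/@content', name=name)
    return matches[0] if matches else None


token = get_meta_content(html_data, "token")
missing = get_meta_content(html_data, "no_such_name")
print(token)
print(missing)
```

Returning None for a missing tag avoids the IndexError you would get from indexing an empty result list directly.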

2 Comments

user3412816 Over a year ago

Can you explain what './/meta[@name="token"]/@content' does?

alecxe Over a year ago

@user3412816 yup, it is an XPath expression that basically says: find a meta tag anywhere in the HTML that has a name attribute with the value token, then give me the value of its content attribute.
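To make that explanation concrete, here is a rough sketch that performs the same three steps one at a time in Python and compares the result with the single expression. The variable names are illustrative only. It also checks the './/' form quoted in the comment, which, when called on the root element as here, should select the same nodes as '//'.

```python
from lxml.html import fromstring

html_data = """<meta content="auth" name="param" />
<meta content="I_WANT_THIS" name="token" />"""

tree = fromstring(html_data)

# Step 1: `//meta` selects every meta element anywhere in the document.
all_metas = tree.xpath('//meta')

# Step 2: `[@name="token"]` keeps only those whose name attribute is "token".
token_metas = [m for m in all_metas if m.get('name') == 'token']

# Step 3: `/@content` reads the content attribute of each remaining element.
step_by_step = [m.get('content') for m in token_metas]

# The single expression from the answer does all three steps at once.
one_shot = tree.xpath('//meta[@name="token"]/@content')

# The relative form from the comment, evaluated from the root element.
relative = tree.xpath('.//meta[@name="token"]/@content')

print(step_by_step == one_shot)
print(relative == one_shot)
```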
