Python HTML parse, getting tag name with its value

Question

I'm using BeautifulSoup in Python.
Is there a way to get each tag's name together with its value, like:

name=title value=This is title

name=link value=.../style.css

where soup.html.head is:

<meta content="all" name="audience"/>
<meta content="2006-2013 webrazzi.com." name="copyright"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>

falsetru · Accepted Answer · 2014-02-25 08:12:37Z


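Use the .text or .string attribute to get the text content of an element.

Use .get('attrname') or ['attrname'] to get an attribute's value. The two differ only when the attribute is missing: .get() returns None (or a default you pass), while ['attrname'] raises KeyError.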

from bs4 import BeautifulSoup

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''

# Pass a parser explicitly; without one, BeautifulSoup guesses and emits a warning.
soup = BeautifulSoup(html, 'html.parser')
print('name={} value={}'.format('title', soup.title.text))   # text content of <title>
print('name={} value={}'.format('link', soup.link['href']))  # href attribute of <link>

output:

name=title value= This is title
name=link value=.../style.css
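To make the difference between these accessors concrete, here is a small sketch. The <p> and <meta name="robots"/> elements are hypothetical additions, not part of the question's HTML. It shows that .string is None when an element has more than one child, that .get_text(strip=True) drops the leading space visible in the output above, and that subscripting a missing attribute raises KeyError while .get() returns None or a default.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <p> and the content-less <meta> are added for illustration.
html = '''
<head>
    <title> This is title</title>
    <p>Hello <b>world</b></p>
    <meta name="robots"/>
</head>
'''
soup = BeautifulSoup(html, 'html.parser')

# Text content
title_text = soup.title.text                   # ' This is title' (whitespace kept)
title_string = soup.title.string               # ' This is title' (the single text child)
title_clean = soup.title.get_text(strip=True)  # 'This is title'
p_text = soup.p.text                           # 'Hello world' (all descendant text joined)
p_string = soup.p.string                       # None: <p> has more than one child

# Attribute values
meta = soup.meta
try:
    content = meta['content']  # subscripting raises KeyError if the attribute is missing
except KeyError:
    content = None
content_or_none = meta.get('content')                 # None
content_or_default = meta.get('content', 'fallback')  # 'fallback'

print(repr(title_clean), repr(p_text), p_string, content, content_or_default)
```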

UPDATE, based on the OP's comment (get all the values in a single loop):

def get_text(el):
    return el.text

def get_href(el):
    return el['href']

# map tag names to functions (what to retrieve from the tag)
what_todo = {
    'title': get_text,
    'link': get_href,
}
for el in soup.select('head *'):  # every descendant element of <head>
    f = what_todo.get(el.name)
    if f is None:  # skip tags with no rule (here: the meta tags)
        continue
    print('name={} value={}'.format(el.name, f(el)))

output: same as above
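A minimal sketch of how the same mapping could be extended, assuming you also want the meta tags from the question. The meta rule, the extra <meta name="robots"/> tag, and the helper names get_meta_content and head_values are hypothetical. Each meta tag is labeled by its name attribute, and attributes are read with .get() and a default so a tag without content yields '' instead of raising KeyError. get_text(strip=True) is used so the title's leading space is dropped.

```python
from bs4 import BeautifulSoup

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <meta name="robots"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''
# (the robots meta tag, which has no content attribute, is a hypothetical addition)


def get_text(el):
    # Text content with surrounding whitespace stripped.
    return el.get_text(strip=True)


def get_href(el):
    # .get() with a default avoids KeyError if href is missing.
    return el.get('href', '')


def get_meta_content(el):
    # Hypothetical rule: meta tags report their content attribute, or '' if absent.
    return el.get('content', '')


# Map tag names to the function that extracts their value.
what_todo = {
    'title': get_text,
    'link': get_href,
    'meta': get_meta_content,
}


def head_values(soup):
    """Return (name, value) pairs for every element in <head> that has a rule."""
    pairs = []
    for el in soup.select('head *'):
        f = what_todo.get(el.name)
        if f is None:
            continue  # no rule for this tag
        # Label meta tags by their name attribute (falling back to the tag name).
        label = el.get('name', el.name) if el.name == 'meta' else el.name
        pairs.append((label, f(el)))
    return pairs


soup = BeautifulSoup(html, 'html.parser')
for name, value in head_values(soup):
    print('name={} value={}'.format(name, value))
```

Keeping the rules in a dictionary means supporting another tag is a one-entry change to what_todo, and the pairs come back in document order.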

answered Feb 25, 2014 at 8:01 by falsetru (edited Feb 25, 2014 at 8:12)


7 Comments

ridvanzoro Over a year ago

Thanks for your reply. It works. But I'm looking for another way: a loop that gets all the values at once, like while(){print property,value}

falsetru Over a year ago

@ridvanzoro, Then you first need to define which tags should yield their text content and which tags should yield which attributes.

falsetru Over a year ago

@ridvanzoro, Do you mean the content attribute of the meta tags? Did you add a rule for it to the mapping?

falsetru Over a year ago

@ridvanzoro, It seems that, in your real HTML, some tags have no content attribute. Replacing el['content'] with el.get('content', 'fallback-default-value') returns the default value instead of raising KeyError. Try it.

falsetru Over a year ago

@ridvanzoro, If you have another question, please post it as a separate question. That way, you are more likely to get an answer.
