I'm using BeautifulSoup in Python.
Is there a way to get each element's name together with its value, like this:

name=title value=This is title

name=link value=.../style.css

The content of soup.html.head is:

<meta content="all" name="audience"/>
<meta content="2006-2013 webrazzi.com." name="copyright"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>

1 Answer

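Use the .text or .string attribute to get the text content of an element.

Use .get('attrname') or ['attrname'] to get an attribute's value. The ['attrname'] form raises KeyError if the attribute is missing, while .get('attrname') returns None.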

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''

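from bs4 import BeautifulSoup
# Name the parser explicitly; omitting it makes bs4 guess one and emit a warning.
soup = BeautifulSoup(html, 'html.parser')
print('name={} value={}'.format('title', soup.title.text))  # text content of <title>
print('name={} value={}'.format('link', soup.link['href'])) # href attribute of <link>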

output:

name=title value= This is title
name=link value=.../style.css

UPDATE according to the OP's comment:

def get_text(el):
    return el.text

def get_href(el):
    return el['href']

# map tag names to functions (what to retrieve from the tag)
what_todo = {
    'title': get_text,
    'link': get_href,
}
for el in soup.select('head *'): # To retrieve all children inside `head`
    f = what_todo.get(el.name)
    if not f: # skip non-title, non-link tags.
        continue
    print('name={} value={}'.format(el.name, f(el)))

output: same as above
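
If maintaining a per-tag mapping is more than you need, a rough alternative is to loop over every tag in head and report its text plus all of its attributes. This is only a sketch, not part of the original answer: the describe helper and the name[attr] label format are made up for illustration, and the HTML is a trimmed copy of the sample above. It relies on el.attrs, the dictionary of a tag's attributes.

```python
from bs4 import BeautifulSoup

html = '''
<head>
    <meta content="all" name="audience"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''

soup = BeautifulSoup(html, 'html.parser')

def describe(el):
    """Hypothetical helper: return (name, value) pairs for one tag."""
    pairs = []
    text = el.get_text(strip=True)  # stripped text content, '' if none
    if text:
        pairs.append((el.name, text))
    # el.attrs maps attribute names to their values
    for attr, value in el.attrs.items():
        pairs.append(('{}[{}]'.format(el.name, attr), value))
    return pairs

# find_all(True) matches every tag nested inside <head>
rows = [pair for el in soup.head.find_all(True) for pair in describe(el)]
for name, value in rows:
    print('name={} value={}'.format(name, value))
```

Note that BeautifulSoup treats some attributes, such as rel and class, as multi-valued, so their values come back as lists rather than strings.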

7 Comments

Thanks for your reply. It works, but I'm looking for another way: a loop that gets all the values at once, like while(){print property, value}
@ridvanzoro, Then you first need to define which tags should yield their text content and which tags should yield which attributes.
@ridvanzoro, Do you mean the content attribute of the meta tag? Did you define it in the mapping?
@ridvanzoro, It seems that in your real HTML some tags have no content attribute. Replacing el['content'] with el.get('content', 'fallback-default-value') will give you a default value instead of raising KeyError. Try it.
@ridvanzoro, If you have another question, please post it as a separate question. That gives you a better chance of getting an answer.
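
The suggestions in the comments above (adding meta to the mapping and using .get with a fallback) could be combined roughly as follows. This is a sketch under assumptions: the sample HTML is invented so that one meta tag lacks a content attribute, and the results list exists only to make the outcome easy to inspect.

```python
from bs4 import BeautifulSoup

# Assumed sample: the second <meta> deliberately has no content attribute.
html = '''
<head>
    <meta content="all" name="audience"/>
    <meta name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" rel="stylesheet"/>
</head>
'''

soup = BeautifulSoup(html, 'html.parser')

# Map tag names to functions that say what to retrieve from each tag.
# .get() returns the fallback instead of raising KeyError when the
# attribute is missing.
what_todo = {
    'title': lambda el: el.text,
    'link': lambda el: el.get('href', 'fallback-default-value'),
    'meta': lambda el: el.get('content', 'fallback-default-value'),
}

results = []
for el in soup.select('head *'):  # every tag inside <head>
    f = what_todo.get(el.name)
    if f is None:  # skip tags that are not in the mapping
        continue
    results.append((el.name, f(el)))
    print('name={} value={}'.format(el.name, f(el)))
```

With el['content'] in place of el.get(...), the loop would stop with a KeyError at the second meta tag.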