BeautifulSoup and Python Lambda

Question

I am having a hard time understanding this code.

I would like to extract HTML comments using BeautifulSoup and Python3.

Given:

from bs4 import BeautifulSoup, Comment

html = '''
       <!-- Python is awesome -->
       <!-- Lambda is confusing -->
       <title>I don't grok it</title>
       '''

soup = BeautifulSoup(html, 'html.parser')

I searched for solutions and most people said:

comments = soup.find_all(text=lambda text: isinstance(text, Comment))

Which in my case would result in:

[' Python is awesome ', ' Lambda is confusing ']

This is what I understand:

isinstance asks if text is an instance of Comment and returns a boolean.
I sort of understand lambda: it takes text as an argument and evaluates the isinstance expression.
You can pass a function to find_all.

This is what I do not understand:

What is text in text=?
What is text in lambda text?
What argument from html is passed into lambda text?
soup.text returns I don't grok it. Why is lambda text passing  as an argument?

It really sounds like you need a basic Python tutorial, followed by a detailed reading of the relevant beautifulsoup docs. The docs are pretty hard to understand if you do not have a working knowledge of basic Python constructs. I am telling you this because even the explanations you get here will not help you as much as they ought until you understand the underlying concepts. While jumping head first into a subject is indeed a good way to learn, fundamentals are also very important.
– Mad Physicist, Commented Apr 24, 2018 at 13:44
@MadPhysicist can you please clarify what fundamentals I need? I think I am strong with basic Python.
– tomordonez, Commented Apr 24, 2018 at 15:10

innicoder · Accepted Answer · 2018-04-24 16:45:15Z

5

Summary

.find_all() walks through every node in the parsed document and, because we passed text=, tests each string against the value we gave it. That value can be a literal string (as in the example below), but here it is a lambda function that expresses a condition.

I'll explain each part of this question.

`text=`

html = '''
       <!--Python is awesome-->
       <!--Lambda is confusing-->
       <title>I don't grok it</title>
       '''

soup = BeautifulSoup(html, 'html.parser')

print(soup.find_all(text='Python is awesome'))

Output:

['Python is awesome']

Here text= is just a keyword argument of find_all. It accepts a string, a regular expression, a list, a function, or True. In our case it happens to be a lambda. We'll explain next what the lambda does.
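As a rough sketch of the kinds of values this argument accepts, the snippet below passes a plain string, a compiled regex, and a function. It uses string=, the newer name for text= that comes up in the comments under this answer. The variable names exact, by_regex and by_function are only illustrative.

```python
import re

from bs4 import BeautifulSoup

html = '''
       <!--Python is awesome-->
       <!--Lambda is confusing-->
       <title>I don't grok it</title>
       '''

soup = BeautifulSoup(html, 'html.parser')

# A plain string must equal the whole string in the document.
exact = soup.find_all(string='Python is awesome')

# A compiled regex is searched for inside each string.
by_regex = soup.find_all(string=re.compile('Lambda'))

# A function is called with each string and should return True or False.
by_function = soup.find_all(string=lambda s: 'grok' in s)

print(exact)
print(by_regex)
print(by_function)
```

All three calls return a list of matching strings. Only the way the match is decided changes.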

`Lambda`

This lambda function takes one parameter, named text.

find_all calls the lambda once for each string in the document, passing that string in as text.

lambda text: isinstance(text, Comment)

isinstance checks whether its first argument, text, is an instance of Comment and returns either True or False. For example, given some_var = 'Ey man', isinstance(some_var, str) returns True because some_var is a string (str).
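To make that concrete, here is a small sketch showing that the lambda is just a short way of writing an ordinary function. The names is_comment, is_comment_lambda, sample_comment and sample_text are made up for the illustration. The two sample objects are built by hand here, whereas in real use BeautifulSoup creates them while parsing.

```python
from bs4 import Comment, NavigableString


def is_comment(text):
    """Return True if text is a BeautifulSoup Comment object."""
    return isinstance(text, Comment)


# The same check written as a lambda (illustrative name).
is_comment_lambda = lambda text: isinstance(text, Comment)

# Hand-built stand-ins for what the parser would normally create.
sample_comment = Comment(' Python is awesome ')
sample_text = NavigableString("I don't grok it")

print(is_comment(sample_comment), is_comment_lambda(sample_comment))
print(is_comment(sample_text), is_comment_lambda(sample_text))
```

Comment is a subclass of NavigableString, so a comment is a string as far as BeautifulSoup is concerned, but an ordinary string is not a Comment.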

Next, we combine both of these.

soup.find_all(text=lambda text: isinstance(text, Comment))

1. soup.find_all goes through each node in the document: <!--Python is awesome-->, <!--Lambda is confusing-->, <title>I don't grok it</title> and the string inside it.
2. find_all(<the_condition>) applies a condition and keeps only the nodes that fulfill it.
3. The condition in our case has three parts:

3.1. Because we passed text=, we don't check tags, only strings: the text inside tags, comments, and any other string in the document.

3.2. Not every string is kept, only those for which the lambda function returns True.

3.3. The lambda returns True only if the string is an instance of Comment.

Only if all of these conditions are met is the string kept in the result.
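One way to see these steps in action is to swap the lambda for a named function that records everything it is called with. This is only a sketch: the seen list and the is_comment name are made up for the illustration, and it uses string=, the newer name for text=.

```python
from bs4 import BeautifulSoup, Comment

html = '''
       <!-- Python is awesome -->
       <!-- Lambda is confusing -->
       <title>I don't grok it</title>
       '''

soup = BeautifulSoup(html, 'html.parser')

seen = []  # every value find_all hands to our function


def is_comment(text):
    # Record the type and content of each string we are asked about.
    seen.append((type(text).__name__, str(text)))
    return isinstance(text, Comment)


comments = soup.find_all(string=is_comment)

for type_name, value in seen:
    print(type_name, repr(value))

print(comments)
```

The printout should show the whitespace between nodes and the title text arriving as NavigableString, and the two comments arriving as Comment. Only the latter pass the isinstance check.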

edited Apr 24, 2018 at 16:45

answered Apr 24, 2018 at 16:28


3 Comments

innicoder Over a year ago

@tomordonez I have explained it in great detail in my answer, let me know if I should clarify anything else. I would also appreciate edits (suggestions or other ways) to my answer if they would improve it. Anything: readability, functionality, effectiveness.

tomordonez Over a year ago

Thanks for the details. The only change I would make comes from digging into the BS4 docs: they say that the text argument was replaced by string. Also, the text variable inside the lambda could be named anything, such as: soup.find_all(string=lambda html_comment: isinstance(html_comment, Comment)). Thanks.

innicoder Over a year ago

text is still supported. Yes exactly, awesome understanding, thank you too for going through the effort of looking into the docs, you made an educated and formidable question and without any research, it'd be hard for anyone to explain all of this. Take care!

user1443098 · Accepted Answer · 2018-04-24 13:37:28Z

2

What is text in text=?

A keyword argument to the find_all function

What is text in lambda text?

The parameter for the function, same as

def <name>(text)...

What argument from html is passed into lambda text

that would be up to you, in the sample the variable Comments refers to the text to parse.

soup.text returns I don't grok it. Why is lambda text passing as an argument?

that's just an example to be replaced with real HTML
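On the soup.text part of the question, here is a small sketch using the same sample HTML that suggests why the two calls see different things. As far as I can tell, get_text() (which is what soup.text uses) leaves out Comment strings, while find_all with string= visits every string, comments included. The names page_text and all_strings are only illustrative.

```python
from bs4 import BeautifulSoup, Comment

html = '''
       <!-- Python is awesome -->
       <!-- Lambda is confusing -->
       <title>I don't grok it</title>
       '''

soup = BeautifulSoup(html, 'html.parser')

# get_text() gathers the ordinary text of the page; comments are left out.
page_text = soup.get_text()

# string=True matches every string in the tree, whatever its type.
all_strings = soup.find_all(string=True)

print(repr(page_text))
print([type(s).__name__ for s in all_strings])
```

So the lambda is not fed soup.text. It is called once per string in the tree, and the comments are among those strings.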

answered Apr 24, 2018 at 13:37


2 Comments

tomordonez Over a year ago

Thanks. Can you please clarify? I know text= is a keyword argument, but what is its value? I know lambda text is the same as def name(text), but what is the value of text there? There is no variable Comments. Above, comments refers to a list of results, not to the text to parse. What do you mean by "to be replaced with real HTML"? Thanks.

user1443098 Over a year ago

The value is the lambda (a function definition, in other words). The value of text in the lambda definition is set at run time, when it is called by find_all. Comment is the type being tested in isinstance. The sample code you posted has sample html, in the real world it will be, well, real! Like the html for this page you are looking at, right now.
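To illustrate "set at run time, when it is called by find_all", here is a toy model in plain Python. It is not how BeautifulSoup is actually implemented: toy_find_all and ToyComment are hypothetical stand-ins invented for this sketch.

```python
class ToyComment(str):
    """Hypothetical stand-in for bs4's Comment: a str subclass."""


def toy_find_all(strings, text):
    """Call the function passed as text once per item and keep the matches."""
    return [item for item in strings if text(item)]


# Stand-in for the strings a parser would have found in the page.
strings = [
    ToyComment(' Python is awesome '),
    ToyComment(' Lambda is confusing '),
    "I don't grok it",
]

# The lambda's parameter gets its value each time toy_find_all calls it.
result = toy_find_all(
    strings,
    text=lambda html_comment: isinstance(html_comment, ToyComment),
)

print(result)
```

The keyword text= holds the function itself. The lambda's own parameter, named html_comment here to show the name is arbitrary, only receives a value when the function is called inside the loop.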
