I'm a novice with Python and Beautiful Soup, so the answer may be obvious.

I'm using Beautiful Soup to parse the following HTML and extract the date.

html='''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>

<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>

<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

The following code works when there is only one date, but I'm having a hard time when the HTML contains multiple dates (I only get one date back).

date_raw = html.find_all('strong',string='Date:')
date = str(date_raw.p.nextSibling).strip()

Is there a way to do this in bs4, or should I use regular expressions? Any other suggestions?

Desired list output:

['Mon, Apr 25, 2016, 11 am','Mon, May 2, 2016, 11 am','Mon, May 9, 2016, 11 am']

2 Answers

1

I would probably iterate over every found element and append it to a list. Something like this maybe (untested):

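from bs4 import BeautifulSoup

date_list = []
# html is the markup string from the question; parse it first
soup = BeautifulSoup(html, 'html.parser')
date_raw = soup.find_all('strong', string='Date:')

for d in date_raw:
    # the date is the text node right after the <strong> tag
    date = str(d.next_sibling).strip()
    date_list.append(date)

print(date_list)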
1 Comment

Thanks for your attempt to help. I posted the answer. I had to put in the index and iterate off of that.
1

Rookie mistake...fixed it:

date_list = []
for x in range(0,len(date_raw)):
    # next_sibling of each <strong> tag is the text node holding the date
    date_add = date_raw[x].next_sibling.strip()
    date_list.append(date_add)
    print(date_add)
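
For reference, here is a self-contained sketch of the same idea, using the HTML from the question. The function name extract_dates and the choice of the built-in 'html.parser' backend are my own assumptions, not something from the original posts; it also assumes a bs4 version recent enough to accept the string argument to find_all.

```python
from bs4 import BeautifulSoup

html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>

<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>

<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''


def extract_dates(markup):
    """Return the text that follows each <strong>Date:</strong> label.

    extract_dates is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    dates = []
    for label in soup.find_all('strong', string='Date:'):
        sibling = label.next_sibling
        # Text nodes are str subclasses; skip labels followed by
        # nothing at all or by another tag.
        if isinstance(sibling, str):
            dates.append(sibling.strip())
    return dates


print(extract_dates(html))
```

With the sample HTML this should print the list given under "Desired list output" in the question. No regular expression is needed, because the date is simply the text node that comes right after each matching strong tag.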

5 Comments

Good you spotted your error. FYI iterating over range(len(something)) is considered an anti-pattern in Python. Ed Dunn's answer with for d in date_raw is more idiomatic.
Thanks for your feedback! Could you elaborate on how this is an anti-pattern? I see it as redundant but not counterproductive in terms of logic. Is it an anti-pattern because it takes extra unnecessary steps?
Trey, I reworked the code. This doesn't look right, but it's the only way I could get it to work:
for x in date_raw: date_add = date_raw[date_raw.index(x)].next_sibling.strip() date_list.append(date_add)
I believe date_raw[date_raw.index(x)] could simply be x.
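
To illustrate the point made in these comments, here is a small sketch that compares the three loop styles without needing bs4 installed. FakeTag is a hypothetical stand-in I made up for this example. It assumes, as the Beautiful Soup documentation describes, that two tags compare equal when they represent the same markup, so every <strong>Date:</strong> tag equals every other one.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag: equality looks only at the markup."""
    markup: str
    # Excluded from ==, because a real tag's sibling is not part of its markup.
    next_sibling: str = field(compare=False)


date_raw = [
    FakeTag('<strong>Date:</strong>', ' Mon, Apr 25, 2016, 11 am'),
    FakeTag('<strong>Date:</strong>', ' Mon, May 2, 2016, 11 am'),
    FakeTag('<strong>Date:</strong>', ' Mon, May 9, 2016, 11 am'),
]

# Index-based loop: correct, but the index is only used to fetch the element.
by_index = [date_raw[i].next_sibling.strip() for i in range(len(date_raw))]

# Direct iteration: same result with less to get wrong.
direct = [tag.next_sibling.strip() for tag in date_raw]

# index() lookup: list.index returns the FIRST element equal to tag,
# so every lookup lands on date_raw[0] when the tags compare equal.
via_lookup = [date_raw[date_raw.index(tag)].next_sibling.strip()
              for tag in date_raw]

print(by_index == direct)
print(via_lookup)
```

The index-based loop is not wrong, only indirect. The index() version is the risky one: besides rescanning the list on every pass, it can silently return the first date for every entry when the matched tags have identical markup, which is why plain x is the safer choice.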
