This is the HTML I'm working with:

<hr>
<b>1914 December 12 - </b>. 
<ul>
    <li>
        <b>Birth of Herbert Hans Guendel</b> - . 
        <i>Nation</i>: 
        <a href="http://www.astronautix.com/g/germany.html">Germany</a>, 
        <a href="http://www.astronautix.com/u/usa.html">USA</a>. 
        <i>Related Persons</i>: 
        <a href="http://www.astronautix.com/g/guendel.html">Guendel</a>.
     
    German-American engineer in WW2, member of the Rocket Team in the United
     States thereafter. German expert in guided missiles during WW2. As of 
    January 1947, working at Fort Bliss, Texas. Died at Boston, New York.. 
    </li>
</ul>

I would like for it to look like this:

Birth of Herbert Hans Guendel
German-American engineer in WW2, member of the Rocket Team in the United
     States thereafter. German expert in guided missiles during WW2. As of 
    January 1947, working at Fort Bliss, Texas. Died at Boston, New York.

Here's my code:

from bs4 import BeautifulSoup
import requests
import linkMaker as linkMaker

url = linkMaker.link

page = requests.get(url)

soup = BeautifulSoup(page.content, "html.parser")

with open("test1.txt", "w") as file:
    hrs = soup.find_all('hr')
    for hr in hrs:
        lis = soup.find_all('li')
        for li in lis:
            file.write(str(li.text)+str(hr.text)+"\n"+"\n"+"\n")

Here's what it's returning:

Birth of Herbert Hans Guendel - . 
: Germany, 
USA. 
Related Persons: Guendel. 
German-American engineer in WW2, member of the Rocket Team in the United States thereafter. German expert in guided missiles during WW2. As of January 1947, working at Fort Bliss, Texas. Died at Boston, New York.. 

My ultimate goal is to extract those two parts of the HTML (the title and the description) so I can tweet them out.

1 Answer

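Looking at the HTML snippet, the title is the first <b> inside the <li> tag, and the description is the last item in the .contents of that <li>. Note that strip(" .\n") also removes the trailing periods, so the printed description ends without a full stop: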

from bs4 import BeautifulSoup


html_doc = """\
<hr>
<b>1914 December 12 - </b>. 
<ul>
    <li>
        <b>Birth of Herbert Hans Guendel</b> - . 
        <i>Nation</i>: 
        <a href="http://www.astronautix.com/g/germany.html">Germany</a>, 
        <a href="http://www.astronautix.com/u/usa.html">USA</a>. 
        <i>Related Persons</i>: 
        <a href="http://www.astronautix.com/g/guendel.html">Guendel</a>.
     
    German-American engineer in WW2, member of the Rocket Team in the United
     States thereafter. German expert in guided missiles during WW2. As of 
    January 1947, working at Fort Bliss, Texas. Died at Boston, New York.. 
    </li>
</ul>"""

soup = BeautifulSoup(html_doc, "html.parser")

title = soup.find("li").b.text
text = soup.find("li").contents[-1].strip(" .\n")

print(title)
print(text)

Prints:

Birth of Herbert Hans Guendel
German-American engineer in WW2, member of the Rocket Team in the United
     States thereafter. German expert in guided missiles during WW2. As of 
    January 1947, working at Fort Bliss, Texas. Died at Boston, New York
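
If the real page contains many <li> entries and the text is meant for a tweet, the same idea can be applied in a loop, with the line breaks collapsed into single spaces. This is only a minimal sketch: the function name extract_entries is hypothetical, and it assumes every entry of interest has a <b> title and keeps its description in the trailing text node of the <li>, as in the snippet above.

```python
from bs4 import BeautifulSoup, NavigableString


def extract_entries(html):
    """Return (title, text) pairs, one per <li> that has a <b> title.

    Hypothetical helper: assumes the description is the last text node
    inside each <li>, as in the snippet from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for li in soup.find_all("li"):
        if li.b is None or not li.contents:
            continue  # skip list items without a bold title
        last = li.contents[-1]
        if not isinstance(last, NavigableString):
            continue  # the last child is a tag, not a trailing text node
        title = li.b.get_text(strip=True)
        # Collapse runs of whitespace, then trim stray dots and spaces.
        text = " ".join(last.split()).strip(" .")
        if text:
            text += "."  # restore a single closing period
        entries.append((title, text))
    return entries
```

Calling extract_entries(page.content) on the downloaded page should then give a list of pairs that can be written to the file one entry at a time, instead of looping over every <hr> and every <li>.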