
I am trying to get the string value of each link (for example, Pennsylvania):

 <li class="facetbox-shownrow ">
    <a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&amp;s=1&amp;q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
        Pennsylvania        <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span>    </a>
</li>

But since the link also has title and id attributes, I am a bit confused about how to do it. I get a null result when I display my array. Here is my code:

for link in links_array:
    main_url_link = base_url_link + link
    html_page_link = requests.get(main_url_link)
    soup_link = BeautifulSoup(html_page_link.text, 'html.parser')
    allData_link = soup_link.findAll('li', {'class': 'facetbox-shownrow'})

    distric = [y.text_content() for y in allData_link]
    district_array.append(distric)

district_array

1 Answer


Use .stripped_strings to generate the strings inside each element of your selection, with surrounding whitespace removed, then pick or slice the result. In this case, pick the first element to get Pennsylvania:

[list(x.stripped_strings)[0] for x in soup.find_all('li',{'class':'facetbox-shownrow'})]
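To see why index [0] is the state name, here is a minimal sketch that runs on the single `<li>` from the question. It also shows where the title and id attributes fit in: they are attributes, not text, so they never show up among the strings. As a side note, text_content() is a method of lxml.html elements, not of BeautifulSoup tags; the BeautifulSoup equivalents are get_text() and the .strings / .stripped_strings generators.

```python
from bs4 import BeautifulSoup

html = """
<li class="facetbox-shownrow ">
    <a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&amp;s=1&amp;q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
        Pennsylvania        <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span>    </a>
</li>
"""
soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "facetbox-shownrow"})

# Every non-blank text node with surrounding whitespace removed:
# the link label first, then the text of the count <span>.
print(list(li.stripped_strings))   # ['Pennsylvania', '[1]']

# get_text(strip=True) joins the same strings, so the count is glued to the name.
print(li.get_text(strip=True))     # Pennsylvania[1]

# title and id are attributes; read them from the tag like a dict if you need them.
print(li.a["id"])                  # facetItemcosponsor-statePennsylvania
print(li.a["title"])               # include this search constraint
```

Because the label is the first text node and the count comes after it, [0] gives the name and [1] would give the bracketed count.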

Note: in new code, use find_all(). findAll() still works, but it is the old BeautifulSoup 3 name.

To get the href:

[x.a['href'] for x in soup.find_all('li',{'class':'facetbox-shownrow'})]
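If you want the label and the link together, a small sketch can pair them and turn the relative href into a full URL with urllib.parse.urljoin. The base URL https://www.congress.gov is an assumption based on the /bill/116th-congress/... paths; substitute whatever base_url_link holds in your code.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: the hrefs on this page are relative to this site root.
BASE_URL = "https://www.congress.gov"

html = """
<li class="facetbox-shownrow ">
    <a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&amp;s=1&amp;q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
        Pennsylvania        <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span>    </a>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

links = {}
for li in soup.find_all("li", {"class": "facetbox-shownrow"}):
    label = list(li.stripped_strings)[0]
    # html.parser already decodes &amp; to & in attribute values,
    # so the href can be joined to the base URL as-is.
    links[label] = urljoin(BASE_URL, li.a["href"])

for label, url in links.items():
    print(label, url)
```

urljoin is used instead of plain string concatenation so a base URL with or without a trailing slash still produces a single slash before /bill/....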

Example

With multiple li tags:

from bs4 import BeautifulSoup

html="""
<li class="facetbox-shownrow ">
    <a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&amp;s=1&amp;q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
        Pennsylvania        <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span>    </a>
</li>
<li class="facetbox-shownrow ">
    <a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&amp;s=1&amp;q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
        Main        <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span>    </a>
</li>
<li class="facetbox-shownrow ">
    <a href="/bill/116th-congress/house-bill/9043/cosponsors?r=1&amp;s=1&amp;q=%7B%22search%22%3A%5B%22H.R.9043%22%2C%22H.R.9043%22%5D%2C%22cosponsor-state%22%3A%22Pennsylvania%22%7D" title="include this search constraint" id="facetItemcosponsor-statePennsylvania">
        California        <span id="facetItemcosponsor-statePennsylvaniacount" class="count">[1]</span>    </a>
</li>
"""
soup=BeautifulSoup(html,"html.parser")

[list(x.stripped_strings)[0] for x in soup.find_all('li',{'class':'facetbox-shownrow'})]

Output

['Pennsylvania', 'Main', 'California']
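Putting this back into the loop from the question might look like the sketch below. It assumes links_array holds relative hrefs and base_url_link is the site root, as in the question's code; extract_states and collect_districts are hypothetical helper names. It swaps text_content() for .stripped_strings, uses urljoin instead of concatenation, and calls raise_for_status() so a failed request raises an error instead of quietly producing an empty list.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_states(page_html):
    """Return the visible label of every facet row (hypothetical helper)."""
    soup = BeautifulSoup(page_html, "html.parser")
    labels = []
    for row in soup.find_all("li", {"class": "facetbox-shownrow"}):
        strings = list(row.stripped_strings)
        if strings:  # skip rows that contain no visible text
            labels.append(strings[0])  # first string is the label, e.g. 'Pennsylvania'
    return labels


def collect_districts(links_array, base_url_link):
    """Fetch each page and collect one list of labels per page (hypothetical helper)."""
    import requests  # imported here so extract_states can be used without requests installed

    district_array = []
    for link in links_array:
        response = requests.get(urljoin(base_url_link, link), timeout=30)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error page
        district_array.append(extract_states(response.text))
    return district_array
```

Keeping the parsing in its own function means you can test it on saved HTML, like the example above, before hitting the network.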

3 Comments

I just realized, after trying what you suggested, that I cannot get the link in links_array. But when I display links_array on its own, I do get output. Do you have any idea why?
If by link you mean the href, take a look at my answer; I have updated it. Otherwise, please give some more context. Thanks.
Thank you very much for your help. I posted another question about it. stackoverflow.com/questions/3575359/…
