
I need to extract the name of the city where each shop is located from this website. I created this code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

def get_page_data(number):
    print('number:', number)

    url = 'https://www.biedronka.pl/pl/sklepy/lista,lat,52.25,lng,21,page,{}'.format(number)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    container = soup.find(class_='s-content shop-list-page')
    items = container.find_all(class_='shopListElement')

    dane = []
    for item in items:
        miasto = item.find(class_='h4').get_text(strip=True)
        adres = item.find(class_='shopFullAddress').get_text(strip=True)
        dane.append([miasto, adres])

    return dane

wszystkie_dane = []
for number in range(1, 2):
    dane_na_stronie = get_page_data(number)

    wszystkie_dane.extend(dane_na_stronie)

dane = pd.DataFrame(wszystkie_dane, columns=['miasto','adres'])

dane.to_csv('biedronki_lista.csv', index=False)

The error is raised on this line:

   miasto = item.find(class_='h4').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'

Any ideas how to extract the name of the city (inside the h4) from this website?


2 Answers


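class_='h4' filters on the class attribute, but h4 is a tag name, not a class. No element has class="h4", so find() returns None and calling .get_text() on it raises the AttributeError. Pass the tag name instead: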

miasto = item.find('h4').get_text(strip=True)
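To see the difference in isolation, here is a minimal sketch run on a small hand-written fragment instead of the live page. The fragment is modelled on the <h4> shown in the other answer; the <li class="shopListElement"> wrapper and the address text are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the wrapper <li> and the address are assumptions,
# only the <h4> + span.shopFullAddress shape comes from the real page.
html = """
<li class="shopListElement">
  <h4 style="margin-bottom: 10px;">
    Rzeszów <span class="shopFullAddress">ul. Przykładowa 1</span>
  </h4>
</li>
"""

item = BeautifulSoup(html, "html.parser").find(class_="shopListElement")

# class_= matches the class attribute; nothing has class="h4", so this is None.
by_class = item.find(class_="h4")

# A positional string is treated as a tag name, so this finds the <h4> element.
by_tag = item.find("h4")

print(by_class)     # None
print(by_tag.name)  # h4
```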


Try using:

miasto = item.find('h4').text.split()[0]

Or:

miasto = item.find('h4').get_text(strip=True)

Note:

"h4" is a tag, not a class.


Explanation:

  • When you call .find('h4'), it returns:
<h4 style="margin-bottom: 10px;">

                Rzeszów             <span class="shopFullAddress">ul.<span class="shopAddress"> </span></span>
  • When you then call .text, it returns:
'Rzeszów            \tul.'
  • When you call .split() on that, it returns:
['Rzeszów', 'ul.']
  • Taking the first element with [0] gives the city name, 'Rzeszów'.
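Because the address <span> is nested inside the <h4>, both .text and get_text(strip=True) include the address as well. The sketch below, on a made-up fragment that uses the two-word city "Nowy Sącz", compares the approaches. Taking the first stripped string of the <h4> is one alternative, assuming the city text always comes before the nested span, as in the HTML shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with a two-word city name and a made-up address.
html = '<h4> Nowy Sącz <span class="shopFullAddress">ul. Przykładowa 1</span></h4>'
h4 = BeautifulSoup(html, "html.parser").find("h4")

# .text includes the nested <span>; split()[0] keeps only the first word.
first_word = h4.text.split()[0]

# get_text(strip=True) strips each text piece and joins them with no separator,
# so the city and the address run together.
joined = h4.get_text(strip=True)

# The first stripped text piece is the <h4>'s own text, before the nested span
# (assumption: the city always precedes the span on the real page).
city = next(h4.stripped_strings)

print(first_word)  # Nowy
print(joined)      # Nowy Sączul. Przykładowa 1
print(city)        # Nowy Sącz
```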

Apply the same idea wherever a lookup fails in this code. In the loop that means:

dane = []
for item in items:
    miasto = item.find('h4').get_text(strip=True)
    adres = item.find(class_='shopFullAddress').get_text(strip=True)
    dane.append([miasto, adres])
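A self-contained sketch of the loop, run on an HTML string instead of a live request so it can be tried offline. parse_shops is a hypothetical helper name, and the sample markup and addresses are made up; the class names shopListElement, shopFullAddress and shopAddress are taken from the question and the HTML shown above. Entries missing either element are skipped rather than crashing on None.

```python
from bs4 import BeautifulSoup


def parse_shops(html):
    """Return [miasto, adres] rows from one page of the shop list (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    dane = []
    for item in soup.find_all(class_="shopListElement"):
        h4 = item.find("h4")                              # tag name, so no class_=
        adres_tag = item.find(class_="shopFullAddress")   # this one really is a class
        if h4 is None or adres_tag is None:
            continue                                      # skip entries missing either element
        miasto = next(h4.stripped_strings, "")            # city text before the nested span
        adres = adres_tag.get_text(" ", strip=True)       # join address pieces with a space
        dane.append([miasto, adres])
    return dane


# Made-up sample markup shaped like the list on the page.
sample = (
    '<ul>'
    '<li class="shopListElement"><h4> Rzeszów <span class="shopFullAddress">ul.'
    '<span class="shopAddress">Przykładowa 1</span></span></h4></li>'
    '<li class="shopListElement"><h4> Nowy Sącz <span class="shopFullAddress">ul.'
    '<span class="shopAddress">Testowa 2</span></span></h4></li>'
    '</ul>'
)

rows = parse_shops(sample)
print(rows)  # [['Rzeszów', 'ul. Przykładowa 1'], ['Nowy Sącz', 'ul. Testowa 2']]
```

The returned rows have two items each, so they can be passed straight to pd.DataFrame(rows, columns=['miasto', 'adres']).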

2 Comments

remove the class_=
shopFullAddress is a class (class="shopFullAddress" in the HTML above), so keep class_= for that lookup.
