I'm scraping a Cyrillic website with Python using BeautifulSoup, but every word comes out garbled like this:

СилÑановÑка Ðавкова во Ðази

I also tried some other Cyrillic websites, and they work fine.

My code is this:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://').text

soup = BeautifulSoup(source, 'lxml')

print(soup.prettify())

How should I fix it?

1 Answer

requests fails to detect the page as UTF-8, so set the encoding yourself before reading the text:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://time.mk/')  # don't convert to text just yet

# print(source.encoding)
# prints out ISO-8859-1

source.encoding = 'utf-8'  # override encoding manually

soup = BeautifulSoup(source.text, 'lxml')  # this will now decode utf-8 correctly
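
To see why the text comes out garbled, here is a small offline sketch. The sample word is only an illustrative assumption and is not fetched from the site. UTF-8 bytes decoded as ISO-8859-1 turn every Cyrillic letter into two Latin-1 characters, and undoing the wrong decode recovers the original:

```python
# Hypothetical sample text; any Cyrillic string behaves the same way.
original = "Давкова"

# What the server sends: UTF-8 bytes (two bytes per Cyrillic letter).
raw = original.encode("utf-8")

# What happens when those bytes are decoded with the wrong codec.
garbled = raw.decode("iso-8859-1")

# Reversing the wrong decode and decoding as UTF-8 restores the text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

Setting source.encoding = 'utf-8' before reading source.text avoids the wrong decode in the first place, so no repair step is needed.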
6 Comments

The site doesn't serve a content-type header so requests falls back to ISO-8859-1/latin-1. However there is a meta tag in the html that defines the charset, so another approach might be to pass source.content to BeautifulSoup and let BeautifulSoup handle the decoding.
When I add the line source.encoding = 'utf-8' I don't get any errors, but the output is blank. Did you get any result with this?
@scpbook setting a variable doesn't print anything. Just like foo = 42 doesn't print anything unless you print(foo). You can add a print(source.encoding) on the following line to test it, or simply see if it fixed your problem. It has for me, at least.
@PatrykBratkowski Of course I'm printing it. My code:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://time.mk/')
source.encoding = 'utf-8'
soup = BeautifulSoup(source.text, 'lxml')
print(soup)
It shows that I have 2740 lines of text, but when I open it, it's empty.
@scpbook I think you should make a new post, if you are having a different problem now, as SO isn't really suited to discuss it in comments. The code I posted definitely works.
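
The alternative raised in the first comment, handing BeautifulSoup the raw bytes so it can pick up the charset from the meta tag, might look roughly like this. This is a minimal sketch: the inline HTML is a made-up stand-in for the real page, and html.parser is used only to keep the example free of the lxml dependency. With a live response you would pass source.content in place of html_bytes.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for response.content: UTF-8 bytes with a meta charset tag.
html_bytes = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><p>Давкова</p></body></html>'
).encode("utf-8")

# Given bytes instead of str, BeautifulSoup works out the encoding itself,
# taking the declared meta charset into account.
soup = BeautifulSoup(html_bytes, "html.parser")

text = soup.p.get_text()
print(soup.original_encoding)  # the encoding BeautifulSoup settled on
print(text)
```

This leaves response.encoding out of the picture entirely, which can help when the HTTP headers are missing or misleading but the page itself declares its charset.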