I am sending a POST request to a website that responds by displaying a CSV. The CSV is not downloadable; it only appears as text on the page, so it can be copied and pasted. I am trying to take the HTML from the POST response, extract the CSV, and write it to a CSV file so that I can then run a function on it. I have managed to get it into CSV form as a string, but there don't appear to be any newlines, i.e.
>>> print(text1)
"Heading 1","Heading 2""Item 1","Item 2"
not
"Heading 1","Heading 2"
"Item 1","Item 2"
Is this format OK?
If not, how do I get it into an OK format?
Secondly, how can I write this string into a CSV file? If I convert text1 into bytes, I get _csv.Error: iterable expected, not int; if I don't, I get TypeError: a bytes-like object is required, not 'str'.
My code so far:
import requests
from bs4 import BeautifulSoup

# headers, data and url are defined elsewhere
with requests.Session() as s:
    response = s.post(headers=headers, data=data, url=url)
    html = response.content
    soup = BeautifulSoup(html, features="html.parser")
    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()  # rip it out
    # get text
    text = soup.get_text()
    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    text1 = text.replace(text[:56], '')
    print(text1)
text1 is a str. Don't use soup.get_text(), because that will give you all the text in all the HTML elements on the page. You can go to the table element and just scrape the text of each table row into a list of lists, with the <td> cells as the items.
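A minimal sketch of that approach, assuming the CSV is rendered as an HTML <table> on the page. The names table_to_rows, write_rows and sample_html are hypothetical and only for illustration; sample_html stands in for the text of the POST response. The two errors in the question most likely come from passing bytes to the csv writer (iterating over bytes yields ints) and from writing a str to a file opened in binary mode, so the sketch keeps everything as text and hands csv.writer a list of lists.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_to_rows(html):
    """Return the first <table> in html as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells are usually <th> and data cells <td>; collect both.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


def write_rows(rows, fileobj):
    """Write rows as CSV to a file object opened in text mode."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)


# Hypothetical stand-in for response.text from the POST request.
sample_html = """
<table>
  <tr><th>Heading 1</th><th>Heading 2</th></tr>
  <tr><td>Item 1</td><td>Item 2</td></tr>
</table>
"""

rows = table_to_rows(sample_html)

# For a real file, use: open("output.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

Opening the real output file in text mode ("w", not "wb") with newline="" is what the csv module documentation recommends, and it avoids the bytes/str mismatch. If the page turns out not to use a <table>, the selector in table_to_rows would need to change to match whatever element holds the data.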