Python Text-Specific Search from an HTML Page

Question

I am trying to write a program that does a text-based search of an HTML web page and writes the values found under those attributes into columns. The code I tried is:

import time
from bs4 import BeautifulSoup
import urllib.request
response=urllib.request.urlopen('http://api.sl.se/api2/deviations.html?key=--------')

html=response.read()

soup = BeautifulSoup(html, 'html.parser')
field.value.text for field in document.find_all("Details") if field["Details"] == "MATCHING.Title"]

#text = soup.get_text(strip=True)

#print(soup.get_text("Details"))

for tag in soup.find_all(['Details']):
  print(tag)

[field.value.text for field in document.find_all("field") if field["name"] == "MATCHING.Title"]

The desired output is a CSV file with separate columns under the headings Details, FromDateTime, UpToDateTime, Updated.

The URL you're using, api.sl.se/api2/… , contains JSON data. You can just parse it using the JSON module...
– AKX, Commented Sep 30, 2019 at 9:39
@AKX is right ;) Please read about JSON first ;) No need for Beautiful Soup here. BTW: please remove your API key from the question, and better add minimal example data.
– koks der drache, Commented Sep 30, 2019 at 9:42
OK, where can I find more resources on how to parse JSON, please? The key is removed now, thanks.
– Zygote, Commented Sep 30, 2019 at 9:46
Can you please share the HTML text, so that we can filter it using BS4?
– Akash Pagar, Commented Sep 30, 2019 at 9:47
Try the standard library documentation on JSON. Furthermore, I suggest working with the Python requests module. It handles these kinds of things a lot better.
– Kraay89, Commented Sep 30, 2019 at 9:47

AKX · Accepted Answer · 2019-09-30 10:46:52Z

To parse the response as JSON and write the output using the built-in csv module:

import urllib.request
import json
import csv
import sys

response = urllib.request.urlopen('http://api.sl.se/api2/deviations.html?key=...')
data = json.loads(response.read())

headers = ['Details', 'FromDateTime', 'UpToDateTime', 'Updated']
writer = csv.writer(sys.stdout)  # write to standard output; use `python script.py > something.csv` to redirect
writer.writerow(headers)
for event in data.get('ResponseData', []):
    writer.writerow([event.get(key) for key in headers])

This outputs e.g.

Details,FromDateTime,UpToDateTime,Updated
"Buss 748 trafikerar ej mellan Jovisgatan till Polhemsgatan pga vägarbete. Hållplatserna Jovisgatan och Polhemsgatan är därmed indragna. Resenärer hänvisas till alternativa linjer 751,753,758 Och 783. Detta gäller båda riktningar.
Gäller från och med 2019-10-01 kl 0:00 beräknas vara klart till 2019-12-13 kl 23:30 ",2019-10-01T00:00:00,2019-12-13T23:30:00,2019-09-26T07:40:08.47+02:00
"Hållplats Brottby trafikplats och Söderhalls trafikplats är tillfälligt flyttad till hållplats för 625 för blåbuss 676 mot Tekniska Högskolan och buss 639 mot Stockholm och 696 mot Tekniska Högskolan från och med 2019-09-23 på grund av vägarbete.
Detta planeras pågå måndag till fredag mellan kl. 21:00 - 05:00 till och med 2019-10-04.

Resande från Söderhall trafikplats riktning Stockholm hänvisas till andra sidan av motorvägen vid bussvändslingan vid pendlarslingan.
Resande från Brottby trafikplats riktning Stockholm hänvisas ombordstigning på bron över E18.",2019-09-23T21:00:00,2019-10-04T05:00:00,2019-09-22T08:44:31.783+02:00

etc.
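
As the sample output shows, the Details values can contain line breaks, which the csv module handles by quoting the field. A minimal sketch of a variant that uses csv.DictWriter instead of building each row by hand is given below. It runs offline against a made-up payload: SAMPLE_JSON, its values, the "Extra" key and the helper name write_events_csv are assumptions for illustration only; just the "ResponseData" key and the four column names come from the code above.

```python
import csv
import io
import json

# Hypothetical sample payload, shaped like what the code above expects:
# a top-level "ResponseData" list of event objects. All values are made up,
# and "Extra" stands in for any field we do not want in the CSV.
SAMPLE_JSON = """
{
  "ResponseData": [
    {"Details": "Line 1\\nLine 2", "FromDateTime": "2019-10-01T00:00:00",
     "UpToDateTime": "2019-12-13T23:30:00", "Updated": "2019-09-26T07:40:08",
     "Extra": "ignored"},
    {"Details": "No end date yet", "FromDateTime": "2019-09-23T21:00:00",
     "Updated": "2019-09-22T08:44:31"}
  ]
}
"""

HEADERS = ["Details", "FromDateTime", "UpToDateTime", "Updated"]


def write_events_csv(data, out):
    """Write the HEADERS columns of each event in data['ResponseData'] to out."""
    # extrasaction="ignore" skips keys that are not listed in HEADERS;
    # restval="" leaves a column empty when an event lacks that key.
    writer = csv.DictWriter(out, fieldnames=HEADERS,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    for event in data.get("ResponseData", []):
        writer.writerow(event)


data = json.loads(SAMPLE_JSON)
buffer = io.StringIO()  # in-memory stand-in for a file or sys.stdout
write_events_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a file directly rather than redirecting standard output, the same function can be given a file opened with `open("deviations.csv", "w", newline="", encoding="utf-8")` (the file name is only an example). Passing `newline=""` is what the csv documentation recommends, so that line breaks inside quoted fields are written unchanged.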
