I have parsed XML from a website and found that the root element has two branches (children).

How can I separate the two branches into two lists of dictionaries?

Here's my code so far:

import pandas as pd
import xml.etree.ElementTree as ET
import requests
url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {'data':'spurpyr'}
response = requests.get(url, params=params)
tree = response.content

#extract the root element as separate variable, and display the root tag.
root = ET.fromstring(tree)
print(root.tag)

#Get attributes of root
root_attr = root.attrib
print(root_attr)

#Finding children of root
for child in root:
    print(child.tag, child.attrib)

#extract the two children of the root element into two separate variables, and display their tags as well
#(keep the elements themselves, not just their tag names, so their contents stay accessible)
children = list(root)
tweets_branch = children[0]
cities_branch = children[1]
print(tweets_branch.tag, cities_branch.tag)

#the elements in the entire tree
[elem.tag for elem in root.iter()]

#specify both the encoding and decoding of the document you are displaying as the string
print(ET.tostring(root, encoding='utf8').decode('utf8'))

1 Answer

One option is the BeautifulSoup module (bs4); its "xml" parser requires lxml to be installed. To parse the tweets and cities into lists of dictionaries, you can use this example:

import requests
from bs4 import BeautifulSoup

url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {"data": "spurpyr"}

soup = BeautifulSoup(requests.get(url, params=params).content, "xml")

tweets = []
for t in soup.select("tweets > tweet"):
    tweets.append({"id": t["id"], **{x.name: x.text for x in t.find_all()}})

cities = []
for c in soup.select("cities > city"):
    cities.append({"id": c["id"], **{x.name: x.text for x in c.find_all()}})

print(tweets)
print(cities)

Prints:

[
    {
        "id": "16620625 5686",
        "Name": "Kenyon Conley",
        "Phone": "0327 103 9485",
        "Email": "[email protected]",
        "Location": "45.5333, -73.2833",
        "GenderID": "male",
        "Tweet": "#FollowFriday @DanielleMorrill - She's with @Seattle20 and @Twilio. Also fun to talk to.  #entrepreneur",
        "City": "Saint-Basile-le-Grand",
        "Country": "Canada",
        "Age": "34",
    },
    {
        "id": "16310427-5502",
        "Name": "Griffin Norton",
        "Phone": "0306 178 7917",
        "Email": "[email protected]",
        "Location": "52.0000, 84.9833",
        "GenderID": "male",
        "Tweet": "!!!Veryy Bored!!!  ~~Craving Million's Of MilkShakes~~",
        "City": "Belokurikha",
        "Country": "Russia",
        "Age": "33",
    },

...
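
If you would rather stay with xml.etree.ElementTree, as in the question, the same idea can be sketched without BeautifulSoup. This is only a sketch under assumptions: it uses a small made-up sample document instead of the real feed, the tag names tweets/tweet and cities/city are taken from the selectors above, and the root tag, the field values, and the helper name branch_to_dicts are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the branch and record tag names mirror the
# selectors used above; the root tag and all values are made up for illustration.
SAMPLE_XML = """
<data>
  <tweets>
    <tweet id="t1"><Name>Alice</Name><Age>34</Age></tweet>
    <tweet id="t2"><Name>Bob</Name><Age>33</Age></tweet>
  </tweets>
  <cities>
    <city id="c1"><City>Stirling</City><Country>Scotland</Country></city>
  </cities>
</data>
"""


def branch_to_dicts(branch):
    """Turn each record element directly under `branch` into one dictionary."""
    records = []
    for record in branch:
        row = dict(record.attrib)  # copy attributes such as "id"
        for field in record:  # direct children only, e.g. <Name>, <Age>
            row[field.tag] = field.text
        records.append(row)
    return records


root = ET.fromstring(SAMPLE_XML.strip())
tweets = branch_to_dicts(root.find("tweets"))
cities = branch_to_dicts(root.find("cities"))

print(tweets)
print(cities)
```

With the real data, you would pass response.content to ET.fromstring instead of SAMPLE_XML. Looking the branches up by tag with root.find avoids relying on the order of the root's children.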