
I'm writing a Python program that reads an XML file returned by an API and should collect each user's initials into a list for later use. The XML file looks like this, with about 60 users:

<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <user>
        <active>true</active>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <dept>3</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials1</rep>
    </user>
    <user>
        <active>true</active>
        <datelastlogin>12/1/2022 3:31:25 PM</datelastlogin>
        <dept>5</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>4/8/2020 3:02:08 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials2</rep>
    </user>
...
...
...
</ArrayOfuser>

I'm trying to use an XML parser to collect the text of the <rep> tag for each user into a list. Ideally the list would be sorted by date of last login, but that is optional; I'll just alphabetize it if sorting by date overcomplicates things.

The code below is my attempt at just printing the data without saving it to a list, but the output is not what I expect. Code I tried:

from lxml import etree

#load file
activeusers = etree.parse("activeusers.xml")

#declare namespaces
ns = {'xx': 'http://schemas.datacontract.org/2004/07/IQWebAPI.Users'}

#locate rep tag and print (saving to list once printing shows expected output)
targets = activeusers.xpath('//xx:user[xx:rep]',namespaces=ns)
for target in targets:
    print(target.attrib)

Output:

{}
{}

I'm expecting output like the block below. Once it looks like that, I should be able to change the print statement to save to a list instead.

{userinitials1}
{userinitials2}

I think my issue is in the print statement, where I print the element's attributes. I also tried variations of target.getparent() with keys(), items(), and get(), and they all print the same empty output.

EDIT: I found a post from someone with a similar problem; the accepted solution was the code below, with the filename changed to suit my case:

root = (etree.parse("activeusers.xml"))
values = [s.find('rep').text for s in root.findall('.//user') if s.find('rep') is not None]
print(values)

Again, I expected a populated list, but the printed list is empty. I now think the issue may be that my document contains namespaces. I may just delete them, since I don't think they will be needed, so please correct me if namespaces are more important than I realize.

SECOND EDIT: I also realized the API can send this data as JSON instead of XML, in which case the file looks like the block below. Any solution that appends the text of the "rep" child of each user to a list, from either the JSON or the XML, would be greatly appreciated; once I have this list, I will not need the XML or JSON file for anything else.

[
    {
        "active": true,
        "datelastlogin": "8/21/2019 9:16:30 PM",
        "dept": 3,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "2/6/2019 11:10:29 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials1"
    },
    {
        "active": true,
        "datelastlogin": "12/1/2022 3:31:25 PM",
        "dept": 5,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "4/8/2020 3:02:08 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials2"
    }
]
3 Answers

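Because this XML declares a default namespace, you need to pass a namespace mapping when searching, for example:

import xml.etree.ElementTree as ET

# xml_in_qes holds the XML from the question as a string
root = ET.fromstring(xml_in_qes)
# map a prefix of our choosing to the document's default namespace URI
my_ns = {'root': 'WebsiteWhereDataComesFrom.com'}
myUser = []
for eachUser in root.findall('root:user', my_ns):
    rep = eachUser.find('root:rep', my_ns)
    if rep is not None:  # skip users without a <rep> element
        print(rep.text)
        myUser.append(rep.text)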

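Note: xml_in_qes is the XML from the question, as a string.

('root:user', my_ns): searches for user elements in the namespace that my_ns maps to the prefix root, i.e. WebsiteWhereDataComesFrom.com. The prefix name is arbitrary; only the URI has to match the one declared in the document.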
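
For a version that runs as-is, here is a minimal sketch with a trimmed copy of the question's XML inlined (the shortened sample and the variable names are my own, not something the API provides). It shows the prefix mapping described above and, as an alternative, the {*} wildcard that ElementTree accepts on Python 3.8 and later, which matches a tag in any namespace. The last lookup uses no namespace at all and finds nothing, which is most likely why the list in the question's EDIT came back empty:

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample based on the question's XML (most fields omitted)
xml_text = """<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <user>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <rep>userinitials1</rep>
    </user>
    <user>
        <datelastlogin>12/1/2022 3:31:25 PM</datelastlogin>
        <rep>userinitials2</rep>
    </user>
</ArrayOfuser>"""

root = ET.fromstring(xml_text)

# Option 1: map a prefix of your choosing to the namespace URI
ns = {"u": "WebsiteWhereDataComesFrom.com"}
reps_with_map = [rep.text for rep in root.findall("u:user/u:rep", ns)]

# Option 2 (Python 3.8+): {*} matches the tag in any namespace
reps_with_wildcard = [rep.text for rep in root.findall("{*}user/{*}rep")]

# Without any namespace information, no element matches
reps_without_ns = [rep.text for rep in root.findall("user/rep")]

print(reps_with_map)
print(reps_with_wildcard)
print(reps_without_ns)
```

Either option avoids editing the namespace declarations out of the file by hand.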


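XML data implementation (note that the namespace declarations have been removed from the root element in this sample, so no namespace mapping is needed):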

import xml.etree.ElementTree as ET
xmlstring = '''
<ArrayOfuser>
    <user>
        <active>true</active>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <dept>3</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials1</rep>
    </user>
    <user>
        <active>true</active>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <dept>3</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials2</rep>
    </user>
    <user>
        <active>true</active>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <dept>3</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials3</rep>
    </user>
</ArrayOfuser>
'''

user_array = ET.fromstring(xmlstring)

replist = []
for user in user_array.findall('user'):
    replist.append(user.find('rep').text)

print(replist)

Output:

['userinitials1', 'userinitials2', 'userinitials3']

JSON data implementation:

userlist = [
    {
        "active": "true",
        "datelastlogin": "8/21/2019 9:16:30 PM",
        "dept": 3,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "2/6/2019 11:10:29 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials1"
    },
    {
        "active": "true",
        "datelastlogin": "12/1/2022 3:31:25 PM",
        "dept": 5,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "4/8/2020 3:02:08 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials2"
    },
    {
        "active": "true",
        "datelastlogin": "12/1/2022 3:31:25 PM",
        "dept": 5,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "4/8/2020 3:02:08 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials3"
    }
]

replist = []
for user in userlist:
    replist.append(user["rep"])

print(replist)

Output:

['userinitials1', 'userinitials2', 'userinitials3']
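
If the API response arrives as JSON text rather than as a Python literal, it can be parsed with the standard json module first. The sketch below assumes the field names from the question and that datelastlogin always has the month/day/year, 12-hour form shown there; the trimmed sample string and the helper name last_login are my own. It also covers the optional sort by last login:

```python
import json
from datetime import datetime

# Hypothetical trimmed response; a real one would come from the API or a file
json_text = """
[
    {"active": true, "datelastlogin": "8/21/2019 9:16:30 PM", "rep": "userinitials1"},
    {"active": true, "datelastlogin": "12/1/2022 3:31:25 PM", "rep": "userinitials2"}
]
"""

users = json.loads(json_text)  # JSON true/false become Python True/False


def last_login(user):
    # Assumed format: month/day/year hour:minute:second AM/PM
    return datetime.strptime(user["datelastlogin"], "%m/%d/%Y %I:%M:%S %p")


# Most recent login first
replist = [user["rep"] for user in sorted(users, key=last_login, reverse=True)]
print(replist)
```

Dropping the key and reverse arguments and sorting on user["rep"] instead would give the alphabetical fallback mentioned in the question.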


If you would like a table of users sorted by last login, you can put the parsed values into pandas:

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("activeusers.xml")
root = tree.getroot()

# The prefix name ("ns") is arbitrary; the URI must match the document's default namespace
namespaces = {"ns": "WebsiteWhereDataComesFrom.com"}

columns = ["rep", "datelastlogin"]
data = []
for user in root.findall("ns:user", namespaces):
    # read both values from the same <user> so each rep stays paired with its login date
    rep = user.findtext("ns:rep", namespaces=namespaces)
    lastlog = user.findtext("ns:datelastlogin", namespaces=namespaces)
    data.append((rep, lastlog))


df = pd.DataFrame(data, columns=columns)
df["datelastlogin"] = pd.to_datetime(df["datelastlogin"])
df = df.sort_values(by='datelastlogin', ascending = False)
print(df.to_string())

Output:

             rep       datelastlogin
1  userinitials2 2022-12-01 15:31:25
0  userinitials1 2019-08-21 21:16:30
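
If pandas is more than you need, the same ordering can be sketched with the standard library alone. This assumes the namespace URI from the question and the date format shown there; the inline sample and the variable names are hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed, hypothetical sample based on the question's XML
xml_text = """<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com">
    <user>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <rep>userinitials1</rep>
    </user>
    <user>
        <datelastlogin>12/1/2022 3:31:25 PM</datelastlogin>
        <rep>userinitials2</rep>
    </user>
</ArrayOfuser>"""

ns = {"u": "WebsiteWhereDataComesFrom.com"}
root = ET.fromstring(xml_text)

rows = []
for user in root.findall("u:user", ns):
    rep = user.findtext("u:rep", namespaces=ns)
    stamp = user.findtext("u:datelastlogin", namespaces=ns)
    if rep is None or stamp is None:
        continue  # skip records missing either element
    # Assumed format: month/day/year hour:minute:second AM/PM
    rows.append((datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), rep))

rows.sort(reverse=True)  # most recent login first
reps_by_last_login = [rep for _, rep in rows]
print(reps_by_last_login)
```

Because each tuple carries both the parsed date and the initials, the two values cannot drift out of step when a user record is incomplete.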
