
I'm trying to extract text from specific nodes. For every person I want the id value and the age. For person 10 the age would be 30, as can be seen in the text of the attribute with name="age". However, I get an error saying that no text exists (see my code and the resulting error below), and I don't understand why.

I've used the same code for an almost identical structure before, and it worked without an issue. I'd be really glad if someone could give me a hint about what's causing the problem.

The XML structure:

<population desc="Switzerland Baseline">
   <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="home_x" class="java.lang.Double" >2679482.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1237545.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >374775</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >281604</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000137</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240012081086</attribute>
        </attributes>
        <plan score="-9.025277777777776" selected="yes">
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" end_time="07:50:56" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="100">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >3</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >false</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >true</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >false</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >324961</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >-1</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000049</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240013385042</attribute>
        </attributes>
        <plan score="0.0" selected="no">
            <activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
            </activity>
        </plan>

        <plan score="0.0" selected="yes">
            <activity type="home" link="362038" facility="home27" x="2678781.0" y="1237314.0" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="1000">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >48</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >yes</attribute>
            <attribute name="home_x" class="java.lang.Double" >2678966.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1235785.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >137604</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >496052</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000745</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240009138483</attribute>
        </attributes>
        <plan score="-437.00166666666667" selected="yes">
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="05:33:00" >
            </activity>
            <leg mode="transit_walk" dep_time="07:15:00" trav_time="00:01:01">
                <route type="generic" start_link="812194" end_link="588385" trav_time="00:01:01" distance="73.45759253010056"></route>
            </leg>
            <activity type="pt interaction" link="588385" x="2682500.5564242266" y="1246491.125064118" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="07:16:01" trav_time="00:13:58">
                <route type="enriched_pt" start_link="588385" end_link="368678" trav_time="00:13:58" distance="8378.187255109851">{"inVehicleTime":420.0,"transferTime":418.7853395582497,"accessStopIndex":4,"egressStopindex":5,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05362"}</route>
            </leg>
            <activity type="pt interaction" link="368678" x="2685173.595399507" y="1238953.4179927576" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="07:30:00" trav_time="00:01:10">
                <route type="generic" start_link="368678" end_link="812077" trav_time="00:01:10" distance="82.96796919207021"></route>
            </leg>
            <activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="15:52:43" >
            </activity>
            <leg mode="outside" dep_time="15:52:43" trav_time="00:00:00">
                <route type="generic" start_link="812077" end_link="812077" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="outside" link="812077" facility="outside_6" x="2685153.844294359" y="1239014.106373788" end_time="16:59:00" >
            </activity>
            <leg mode="transit_walk" dep_time="16:59:00" trav_time="01:42:47">
                <route type="generic" start_link="812077" end_link="555704" trav_time="01:42:47" distance="7401.037993401233"></route>
            </leg>
            <activity type="outside" link="555704" facility="outside_7" x="2690699.2533230074" y="1240302.4760125757" end_time="17:07:39" >
            </activity>
            <leg mode="access_walk" dep_time="17:07:39" trav_time="00:33:33">
                <route type="generic" start_link="555704" end_link="348266" trav_time="00:33:33" distance="2415.2684761259893"></route>
            </leg>
            <activity type="pt interaction" link="348266" x="2688841.9870530544" y="1240253.9986282045" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="17:41:12" trav_time="00:10:48">
                <route type="enriched_pt" start_link="348266" end_link="166875" trav_time="00:10:48" distance="3166.770768054601">{"inVehicleTime":420.0,"transferTime":228.0,"accessStopIndex":0,"egressStopindex":10,"transitRouteId":"02828_023","transitLineId":"VZO_line961","departureId":"125106"}</route>
            </leg>
            <activity type="pt interaction" link="166875" x="2687161.005729228" y="1240076.9559941967" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="17:52:00" trav_time="00:00:21">
                <route type="generic" start_link="166875" end_link="771010" trav_time="00:00:21" distance="25.959922652207396"></route>
            </leg>
            <activity type="pt interaction" link="771010" x="2687180.6471416447" y="1240073.3528400902" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="17:52:21" trav_time="00:19:38">
                <route type="enriched_pt" start_link="771010" end_link="955474" trav_time="00:19:38" distance="9742.201043728513">{"inVehicleTime":960.0,"transferTime":218.36673112316203,"accessStopIndex":1,"egressStopindex":7,"transitRouteId":"19622_002","transitLineId":"SBB_S16_8503016-8503103","departureId":"06187"}</route>
            </leg>
            <activity type="pt interaction" link="955474" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="18:12:00" trav_time="00:00:00">
                <route type="generic" start_link="955474" end_link="955504" trav_time="00:00:00" distance="0.0"></route>
            </leg>
            <activity type="pt interaction" link="955504" x="2683187.8521402166" y="1248065.21559948" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="18:12:00" trav_time="00:07:00">
                <route type="enriched_pt" start_link="955504" end_link="4223" trav_time="00:07:00" distance="3304.5168456795577">{"inVehicleTime":120.0,"transferTime":300.0,"accessStopIndex":2,"egressStopindex":3,"transitRouteId":"18221_002","transitLineId":"SBB_S2_8503016-8503225","departureId":"05406"}</route>
            </leg>
            <activity type="pt interaction" link="4223" x="2681934.8161827456" y="1247302.7661533705" max_dur="00:00:00" >
            </activity>
            <leg mode="transit_walk" dep_time="18:19:00" trav_time="00:00:59">
                <route type="generic" start_link="4223" end_link="586407" trav_time="00:00:59" distance="71.92245024668337"></route>
            </leg>
            <activity type="pt interaction" link="586407" x="2681990.0107938214" y="1247298.9705903793" max_dur="00:00:00" >
            </activity>
            <leg mode="pt" dep_time="18:19:59" trav_time="01:01:00">
                <route type="enriched_pt" start_link="586407" end_link="617712" trav_time="01:01:00" distance="15771.43292404094">{"inVehicleTime":1920.0,"transferTime":1740.0646247944242,"accessStopIndex":0,"egressStopindex":19,"transitRouteId":"07744_004","transitLineId":"PAG_line236","departureId":"77196"}</route>
            </leg>
            <activity type="pt interaction" link="617712" x="2679299.97008475" y="1237575.0077440983" max_dur="00:00:00" >
            </activity>
            <leg mode="egress_walk" dep_time="19:21:00" trav_time="00:15:42">
                <route type="generic" start_link="617712" end_link="360294" trav_time="00:15:42" distance="1130.0689845763227"></route>
            </leg>
            <activity type="outside" link="360294" facility="outside_3" x="2678575.5094664157" y="1237094.5796047896" end_time="17:53:00" >
            </activity>
        </plan>

    </person>

<!-- ====================================================================== -->

    <person id="1000157">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >52</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_ALL</attribute>
            <attribute name="carAvail" class="java.lang.String" >always</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >yes</attribute>
            <attribute name="home_x" class="java.lang.Double" >2695732.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1259962.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >275258</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >212563</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >true</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201202300043212</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240010759877</attribute>
        </attributes>
        <plan score="-1.7305555555555556" selected="yes">
            <activity type="outside" link="557064" facility="outside_8" x="2691803.987049347" y="1253846.2689263367" end_time="07:04:33" >
            </activity>
        </plan>

    </person>
</population>

My code:

import xml.etree.ElementTree as ET
import pandas as pd
import gzip


tree = ET.parse(gzip.open('STORAGE/500/1/output_plans.xml.gz', 'r'))

root = tree.getroot()
rows = []
for it in root.iter('person'):
    id = it.attrib['id']
    age = it.find('attributes/attribute[@name="age"]').text 
    rows.append([id, age])

d = pd.DataFrame(rows, columns=['id', 'age'])

The error:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-badcde9dbf74> in <module>
      8 for it in root.iter('person'):
      9     id = it.attrib['id']
---> 10     age = it.find('attributes/attribute[@name="age"]').text
     11     rows.append([id, age])
     12 

AttributeError: 'NoneType' object has no attribute 'text'
  • Your code runs just fine and produces the right result... Try unzipping the file manually and try again. (Commented May 26, 2020 at 14:19)
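If the loop works on a small sample but fails on the full file, the traceback itself says what happened: find() returned None because, for at least one person, the path attributes/attribute[@name="age"] matched nothing, and .text was then looked up on None. Below is a minimal None-safe sketch, assuming you would rather record a missing age than stop. It uses Element.findtext(), which returns its default when nothing matches. The trimmed XML (including person 200 without an age) and the missing list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample: person 200 deliberately has no "age" attribute.
xml = '''<population desc="Switzerland Baseline">
    <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
        </attributes>
    </person>
    <person id="200">
        <attributes>
            <attribute name="sex" class="java.lang.String" >f</attribute>
        </attributes>
    </person>
</population>'''

root = ET.fromstring(xml)

rows = []
missing = []
for person in root.iter('person'):
    pid = person.attrib['id']  # avoids shadowing the built-in id()
    # findtext() returns `default` instead of None.text raising AttributeError
    age = person.findtext('attributes/attribute[@name="age"]', default=None)
    if age is None:
        missing.append(pid)  # remember which persons had no age
    rows.append([pid, age])

print(rows)
print(missing)
```

Printing missing after running this against the real file lists the ids worth inspecting by hand.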

2 Answers


Consider migrating all attributes, not just age, into the DataFrame:

rows = []
for it in root.iter('person'):
    attribute = it.find('attributes')

    id_dict = {'id':it.attrib['id']}
    attrs_dict = {a.attrib['name']:a.text for a in attribute.findall('attribute')}

    # MERGE DICTIONARIES (ONLY WORKS Python 3.5+)
    rows.append({**id_dict, **attrs_dict})

d = pd.DataFrame(rows)

print(d)    
#         id age bikeAvailability carAvail employed  ... ptHasVerbund sex spRegion statpopHouseholdId  statpopPersonId
# 0       10  30         FOR_SOME    never     true  ...        false   f        1    201200010000137  201240012081086
# 1      100   3         FOR_SOME    never    false  ...         true   f        1    201200010000049  201240013385042
# 2     1000  48         FOR_SOME    never     true  ...        false   f        1    201200010000745  201240009138483
# 3  1000157  52          FOR_ALL   always     true  ...         true   f        1    201202300043212  201240010759877

Alternatively, with a nested list/dict comprehension:

attrs_list = [{**{'id':it.attrib['id']}, **{a.attrib['name']:a.text 
                    for a in it.find('attributes').findall('attribute')}} 
                    for it in root.iter('person')]

d = pd.DataFrame(attrs_list)

print(d)
#         id age bikeAvailability carAvail employed hasLicense  ... ptHasStrecke ptHasVerbund sex spRegion statpopHouseholdId  statpopPersonId
# 0       10  30         FOR_SOME    never     true         no  ...        false        false   f        1    201200010000137  201240012081086
# 1      100   3         FOR_SOME    never    false         no  ...         true         true   f        1    201200010000049  201240013385042
# 2     1000  48         FOR_SOME    never     true        yes  ...        false        false   f        1    201200010000745  201240009138483
# 3  1000157  52          FOR_ALL   always     true        yes  ...        false         true   f        1    201202300043212  201240010759877
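Both versions above keep every value as a string (age is '30', not 30). Since each attribute element carries a Java class name, a small lookup table can convert the values while the rows are built. This is a sketch, assuming only the class names seen in the sample occur; any other class is left as a string. CONVERTERS and parse_value are hypothetical names, not part of ElementTree or pandas.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the Java class names in the sample to Python converters.
CONVERTERS = {
    'java.lang.Integer': int,
    'java.lang.Long': int,
    'java.lang.Double': float,
    'java.lang.Boolean': lambda s: s == 'true',
    'java.lang.String': str,
}

def parse_value(attribute):
    """Convert one <attribute> element's text using its class attribute."""
    if attribute.text is None:
        return None  # empty element, nothing to convert
    # Unknown or missing classes fall back to plain strings.
    convert = CONVERTERS.get(attribute.attrib.get('class'), str)
    return convert(attribute.text)

xml = '''<population desc="Switzerland Baseline">
    <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="home_x" class="java.lang.Double" >2679482.0</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
        </attributes>
    </person>
</population>'''

root = ET.fromstring(xml)

# Same shape as the answer's comprehension, but with typed values.
rows = [
    {'id': person.attrib['id'],
     **{a.attrib['name']: parse_value(a)
        for a in person.find('attributes').findall('attribute')}}
    for person in root.iter('person')
]

print(rows)
```

Passing rows to pd.DataFrame as in the answer should then give numeric and boolean columns instead of string columns.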

2 Comments

My god... I wish I had known that before. I wasted so much time ^^ Thanks a lot! That's a brilliant solution.
Haha...no problem. Glad to help. Happy coding!

See below (it works)

Look at it.find("attributes/attribute[@name='age']") and note the difference from your version.

import xml.etree.ElementTree as ET


xml = '''<population desc="Switzerland Baseline">
   <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
            <attribute name="bikeAvailability" class="java.lang.String" >FOR_SOME</attribute>
            <attribute name="carAvail" class="java.lang.String" >never</attribute>
            <attribute name="employed" class="java.lang.Boolean" >true</attribute>
            <attribute name="hasLicense" class="java.lang.String" >no</attribute>
            <attribute name="home_x" class="java.lang.Double" >2679482.0</attribute>
            <attribute name="home_y" class="java.lang.Double" >1237545.0</attribute>
            <attribute name="isCarPassenger" class="java.lang.Boolean" >false</attribute>
            <attribute name="isOutside" class="java.lang.Boolean" >true</attribute>
            <attribute name="mzHeadId" class="java.lang.Long" >374775</attribute>
            <attribute name="mzPersonId" class="java.lang.Long" >281604</attribute>
            <attribute name="ptHasGA" class="java.lang.Boolean" >true</attribute>
            <attribute name="ptHasHalbtax" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasStrecke" class="java.lang.Boolean" >false</attribute>
            <attribute name="ptHasVerbund" class="java.lang.Boolean" >false</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
            <attribute name="spRegion" class="java.lang.Integer" >1</attribute>
            <attribute name="statpopHouseholdId" class="java.lang.Long" >201200010000137</attribute>
            <attribute name="statpopPersonId" class="java.lang.Long" >201240012081086</attribute>
        </attributes>
        <plan score="-9.025277777777776" selected="yes">
            <activity type="home" link="270549" facility="home4" x="2679482.0" y="1237545.0" end_time="07:50:56" >
            </activity>
        </plan>

    </person>
</population>'''

root = ET.fromstring(xml)

rows = []
for it in root.iter('person'):
    id = it.attrib['id']
    age = it.find("attributes/attribute[@name='age']").text 
    rows.append([id, age])

print(rows)
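Whether the quote style inside the predicate matters can be checked directly on your own interpreter. This minimal sketch runs both spellings against the same element, using a hypothetical one-person sample cut from the question's XML. On current Python 3 versions of ElementTree, both quote styles appear to be accepted in [@name=...] predicates. If they agree for you as well, the quoting is probably not what broke the original loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-person sample cut from the question's XML.
xml = '''<population desc="Switzerland Baseline">
    <person id="10">
        <attributes>
            <attribute name="age" class="java.lang.Integer" >30</attribute>
            <attribute name="sex" class="java.lang.String" >f</attribute>
        </attributes>
    </person>
</population>'''

person = ET.fromstring(xml).find('person')

# Same predicate: double quotes (as in the question) vs. single quotes (as in this answer).
double_quoted = person.find('attributes/attribute[@name="age"]')
single_quoted = person.find("attributes/attribute[@name='age']")

print(double_quoted.text, single_quoted.text)
print(double_quoted is single_quoted)  # True if both found the same element
```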

4 Comments

Thank you for checking it. Stupid question: the " or ' shouldn't make a difference, right?
I think in this special case it does make a difference. See effbot.org/zone/element-xpath.htm
If you think my answer solved your problem - feel free to vote up.
^^ Thank you, I was still running it. The entire file is huge.
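Because the full file is huge, building the whole tree with ET.parse() can use a lot of memory. Below is a sketch of a streaming alternative. It uses ET.iterparse() over the gzip stream and clears each person element after reading its attributes, so the plans and legs are not all kept in memory. The generator name iter_person_attributes is hypothetical. The demo compresses a small in-memory sample instead of opening the real file, whose path from the question appears only in a comment.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def iter_person_attributes(fileobj):
    """Yield one dict per <person>, streaming instead of loading the whole tree."""
    for event, elem in ET.iterparse(fileobj, events=('end',)):
        if elem.tag != 'person':
            continue  # inner elements also fire 'end'; only act on complete persons
        row = {'id': elem.attrib['id']}
        attributes = elem.find('attributes')
        if attributes is not None:  # tolerate persons without an <attributes> block
            for a in attributes.findall('attribute'):
                row[a.attrib['name']] = a.text
        yield row
        elem.clear()  # drop this person's children (attributes, plans, legs, ...)

# Hypothetical demo data; for the real file you would pass
# gzip.open('STORAGE/500/1/output_plans.xml.gz', 'rb') instead.
sample = b'''<population desc="Switzerland Baseline">
    <person id="10"><attributes>
        <attribute name="age" class="java.lang.Integer" >30</attribute>
    </attributes></person>
    <person id="100"><attributes>
        <attribute name="age" class="java.lang.Integer" >3</attribute>
    </attributes></person>
</population>'''

with gzip.open(io.BytesIO(gzip.compress(sample)), 'rb') as f:
    rows = list(iter_person_attributes(f))

print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame exactly like the rows in the accepted answer.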
