I am writing a Python script to parse XML and am having trouble navigating to the right elements. The script reads a .csv file in which each row holds an XML document as a string, and I need to extract values from that string. Everything I have tried comes back empty. I only need four pieces of information (marked by **** in the XML). Under 'HotelReservationIDs' I am trying to get ResID_Value and ResID_Source for both entries. Under 'TimeSpan' I am trying to get both 'Start' and 'End'. I have tried using indexes and navigating with root/OTA_HotelResModifyRQ/HotelResModifies/HotelResModify. Here is the XML:

<soapns:Envelope xmlns:soapns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns="http://www.opentravel.org/OTA/2003/05">
   <soapns:Body>
      <OTA_HotelResModifyRQ xsi:schemaLocation="http://www.opentravel.org/OTA/2003/05 OTA_HotelResModifyRQ.xsd" TimeStamp="2021-04-01T05:00:23+00:00" Target="Production" Version="2.001" ResStatus="Commit" SequenceNmbr="1" TransactionIdentifier="xxxxxx" TransactionStatusCode="End" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opentravel.org/OTA/2003/05">
         <POS>
            <Source>
               <RequestorID Type="13" ID="WWWBC" ID_Context="xxxxxx" URL="xxxxxx"/>
            </Source>
         </POS>
         <HotelResModifies>
            <HotelResModify>
               <UniqueID Type="14" ID="xxxxxx" ID_Context="CompanyX"/>
               <UniqueID Type="14" ID="xxxxxx" ID_Context="CompanyY" Instance="1"/>
               <RoomStays>
                  <RoomStay IndexNumber="104">
                     <RoomTypes>
                        <RoomType RoomTypeCode="32458814">
                           <RoomDescription Name="Deluxe Double or Twin Room with Mountain View">
                              <Text>This modern room is on the fifth or sixth floor and  offers a private balcony overlooking the mountains. It includes a flat-screen TV, a DVD player and a minibar. The bathroom has free toiletries, a shower and a hairdryer.</Text>
                           </RoomDescription>
                           <Amenities>
                              <Amenity>Minibar</Amenity>
                              <Amenity>Shower</Amenity>
                              <Amenity>Bath</Amenity>
                              <Amenity>Safety Deposit Box</Amenity>
                           </Amenities>
                        </RoomType>
                     </RoomTypes>
                     <RatePlans>
                        <RatePlan>
                           <Commission>
                              <CommissionPayableAmount Amount="832" DecimalPlaces="1" CurrencyCode="OMR"/>
                           </Commission>
                        </RatePlan>
                     </RatePlans>
                     <RoomRates>
                        <RoomRate EffectiveDate="2017-03-12" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-12" ExpireDate="2017-03-13">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                        <RoomRate EffectiveDate="2017-03-13" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-13" ExpireDate="2017-03-14">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                        <RoomRate EffectiveDate="2017-03-14" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-14" ExpireDate="2017-03-15">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                        <RoomRate EffectiveDate="2017-03-15" RatePlanCode="1431301">
                           <Rates>
                              <Rate EffectiveDate="2017-03-15" ExpireDate="2017-03-16">
                                 <Base AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                                 <Total AmountBeforeTax="xxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                              </Rate>
                           </Rates>
                        </RoomRate>
                     </RoomRates>
                     <GuestCounts>
                        <GuestCount Count="2" AgeQualifyingCode="10"/>
                     </GuestCounts>
    **************** <TimeSpan Start="2017-03-12" End="2017-03-16"/>
                     <Total AmountBeforeTax="xxxxx" DecimalPlaces="2" CurrencyCode="OMR"/>
                     <BasicPropertyInfo HotelCode="xxxxx"/>
                     <ResGuestRPHs>
                        <ResGuestRPH RPH="1"/>
                     </ResGuestRPHs>
                     <SpecialRequests>
                        <SpecialRequest Name="smoking preference">
                           <Text>Non-Smoking</Text>
                        </SpecialRequest>
                     </SpecialRequests>
                  </RoomStay>
               </RoomStays>
               <ResGuests>
                  <ResGuest ResGuestRPH="1">
                     <Profiles>
                        <ProfileInfo>
                           <Profile ProfileType="1">
                              <Customer>
                                 <PersonName>
                                   <GivenName>francois</GivenName>
                                    <Surname>maire</Surname>
                                 </PersonName>
                              </Customer>
                           </Profile>
                       </ProfileInfo>
                     </Profiles>
                     <GuestCounts>
                        <GuestCount Count="2"/>
                     </GuestCounts>
                  </ResGuest>
               </ResGuests>
               <ResGlobalInfo>
                  <Comments>
                     <Comment ParagraphNumber="1">
                        <Text>** Genius Booker You have a booker that prefers communication by email</Text>
                     </Comment>
                  </Comments>
                  <Total AmountBeforeTax="52000" DecimalPlaces="2" CurrencyCode="OMR"/>
                  <HotelReservationIDs>
   ****************  <HotelReservationID ResID_Value="xxxxxx" ResID_Date="2016-12-10T06:13:26" ResID_Source="CompanyX" ResID_Type="14" ResID_SourceContext="324588"/>
   ****************  <HotelReservationID ResID_Value="xxxxxx" ResID_Date="2016-12-10T06:13:26" ResID_Source="CompanyY" ResID_Type="14"/>
                  </HotelReservationIDs>
                  <Profiles>
                     <ProfileInfo>
                        <UniqueID Type="5" ID="xxxxx"/>
                        <Profile ProfileType="1">
                           <Customer>
                              <PersonName>
                                 <GivenName>francois</GivenName>
                                 <Surname>maire</Surname>
                              </PersonName>
                              <Address>
                                 <AddressLine>123 main st</AddressLine>
                                 <CityName>paris</CityName>
                                 <PostalCode>75016</PostalCode>
                                 <CountryName Code="FR"/>
                                 <CompanyName>[Unknown]</CompanyName>
                              </Address>
                           </Customer>
                        </Profile>
                     </ProfileInfo>
                  </Profiles>
               </ResGlobalInfo>
            </HotelResModify>
         </HotelResModifies>
      </OTA_HotelResModifyRQ>
   </soapns:Body>
</soapns:Envelope>

I have been trying with xml.etree.ElementTree. I understand how to read the data once I am pointed at the right element, but how do I reach elements nested that deeply? If you need more information, please let me know. This is my first attempt at XML parsing, so any guidance would be much appreciated. Here is the code I have so far (nothing prints; it never even enters the second for loop):

import xml.etree.ElementTree as Xet
import pandas as pd

file_path = xxxx

# read only the column that holds the XML documents
df = pd.read_csv(file_path, usecols=['Client Content'])

for i in range(len(df)):
    xml_string = df.values[i][0]
    root = Xet.fromstring(xml_string)
    for TimeSpan in root.findall('./OTA_HotelResModifyRQ/HotelResModifies/HotelResModify/RoomStays/RoomStay'):
        print(TimeSpan)
  • Welcome to Stack Overflow. I don't see a question here. Please read Why is "Can someone help me?" not an actual question? Commented Apr 2, 2021 at 12:17
  • I have updated it for you. Commented Apr 2, 2021 at 12:27
  • Please read How to Ask, paying particular attention to the part that says, "pretend you're talking to a busy colleague". We shouldn't need to deduce your question, and it isn't usually a good idea to be snarky to people who are trying to help you for free. Make it as easy as possible for us to help you. Commented Apr 2, 2021 at 12:29
  • You say "I understand how to grab the data once I can point in the right direction, but how can I get that deep in the subattributes?" but I don't understand the problem. If you can navigate into one element, simply navigate into the next, and so on. Can you illustrate the problem by sharing your code? Commented Apr 2, 2021 at 12:33
  • I have added what I have so far. It never enters the second for loop. When I try for child in root: print(child.tag, child.attrib), all that prints is {schemas.xmlsoap.org/soap/envelope}Header {} and {schemas.xmlsoap.org/soap/envelope}Body {}. Commented Apr 2, 2021 at 13:10

1 Answer

Is it possible to use the lxml parser? It supports XPath, which would make the job a bit easier:

from lxml import etree

# declare namespaces
ns = {'ns': 'http://www.opentravel.org/OTA/2003/05'}

# parse XML from string (`xml` is assumed to hold the XML document for one row)
root = etree.fromstring(xml)

# retrieve time span using xpath
time_span = root.xpath('//ns:OTA_HotelResModifyRQ/ns:HotelResModifies/ns:HotelResModify/ns:RoomStays/ns:RoomStay/ns:TimeSpan', namespaces=ns)[0]
print(time_span.get('Start'))
print(time_span.get('End'))

# retrieve list of reservation ids
hotel_reservation_ids = root.xpath('//ns:OTA_HotelResModifyRQ/ns:HotelResModifies/ns:HotelResModify/ns:ResGlobalInfo/ns:HotelReservationIDs/ns:HotelReservationID', namespaces=ns)
for hotel_reservation_id in hotel_reservation_ids:
  print(hotel_reservation_id.get('ResID_Value'))
  print(hotel_reservation_id.get('ResID_Date'))
  print(hotel_reservation_id.get('ResID_Source'))
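
If lxml is not an option, the same lookup can be sketched with the standard library's xml.etree.ElementTree, which the question already uses. The path in the question most likely returns nothing for two reasons. First, root is the soapns:Envelope element, so a path starting at ./OTA_HotelResModifyRQ skips the soapns:Body level. Second, every OTA element belongs to the default namespace http://www.opentravel.org/OTA/2003/05, so an unqualified tag name never matches. In the sketch below, the prefix ota, the function name extract_reservation, and the trimmed sample document with the ResID values A1 and B2 are illustrative assumptions, not part of the original data.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map. The prefix "ota" is an arbitrary label chosen here;
# only the URI has to match the namespace declared in the document.
NS = {"ota": "http://www.opentravel.org/OTA/2003/05"}


def extract_reservation(xml_string):
    """Return the stay dates and (ResID_Value, ResID_Source) pairs from one document."""
    root = ET.fromstring(xml_string)

    # ".//" searches all descendants, so the SOAP Envelope/Body levels
    # do not need to be spelled out in the path.
    time_span = root.find(".//ota:RoomStay/ota:TimeSpan", NS)
    ids = root.findall(".//ota:HotelReservationIDs/ota:HotelReservationID", NS)

    return {
        "start": time_span.get("Start") if time_span is not None else None,
        "end": time_span.get("End") if time_span is not None else None,
        "reservation_ids": [(e.get("ResID_Value"), e.get("ResID_Source")) for e in ids],
    }


# Trimmed stand-in for the document in the question (placeholder ID values).
sample = """<soapns:Envelope xmlns:soapns="http://schemas.xmlsoap.org/soap/envelope/">
  <soapns:Body>
    <OTA_HotelResModifyRQ xmlns="http://www.opentravel.org/OTA/2003/05">
      <HotelResModifies>
        <HotelResModify>
          <RoomStays>
            <RoomStay>
              <TimeSpan Start="2017-03-12" End="2017-03-16"/>
            </RoomStay>
          </RoomStays>
          <ResGlobalInfo>
            <HotelReservationIDs>
              <HotelReservationID ResID_Value="A1" ResID_Source="CompanyX"/>
              <HotelReservationID ResID_Value="B2" ResID_Source="CompanyY"/>
            </HotelReservationIDs>
          </ResGlobalInfo>
        </HotelResModify>
      </HotelResModifies>
    </OTA_HotelResModifyRQ>
  </soapns:Body>
</soapns:Envelope>"""

result = extract_reservation(sample)
print(result)
```

Inside the loop from the question, calling extract_reservation(xml_string) for each row should give one dictionary per reservation, which can then be collected into a list or a new DataFrame.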

2 Comments

Thanks for the help! I tried using this and got a ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
After encoding the string to ascii it worked like a charm, thank you SO MUCH!
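
For anyone hitting the same ValueError: lxml raises it when etree.fromstring receives a Python str that begins with an XML declaration naming an encoding, such as <?xml version="1.0" encoding="UTF-8"?>. Passing bytes lets the parser read the encoding from the declaration itself. The sketch below is a minimal illustration with a hypothetical helper name (to_xml_bytes) and a made-up one-element document. It assumes the declaration says UTF-8, so the encoding argument should be adjusted if the data declares something else. It parses with the standard library to stay dependency-free; lxml's etree.fromstring also accepts bytes input and can be swapped in.

```python
import xml.etree.ElementTree as ET


def to_xml_bytes(cell, encoding="utf-8"):
    """Return the XML text of one CSV cell as bytes (hypothetical helper).

    The default of UTF-8 is an assumption; it should match the encoding
    named in the XML declaration of the actual data.
    """
    if isinstance(cell, bytes):
        return cell
    return cell.encode(encoding)


# Made-up document that starts with an encoding declaration.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<root><TimeSpan Start="2017-03-12" End="2017-03-16"/></root>'
)

# Bytes input works with the stdlib parser; lxml's etree.fromstring
# accepts the same bytes object.
root = ET.fromstring(to_xml_bytes(xml_text))
span = root.find("TimeSpan")
print(span.get("Start"), span.get("End"))
```

Encoding to ASCII, as in the comment above, works as long as the text contains only ASCII characters; a guest name with an accented letter would raise UnicodeEncodeError, which is why UTF-8 is used here.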
