
I'm trying to read this file and keep only the entries whose "TypeOfVessel" value is not null: an entry should be read only if it has a "TypeOfVessel" value. Please see my code below. Any suggestions? Thanks

<ArrayOfConsolidatedList xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/">
    <ConsolidatedList>
        <RegimeName>Test1</RegimeName>
        <Subsidiaries i:nil="true" />
        <TonnageOfVessel i:nil="true" />
        <TownOfBirth i:nil="true" />
        <TypeOfVessel i:nil="true" />
    </ConsolidatedList>
    <ConsolidatedList>
        <RegimeName>Test2</RegimeName>
        <Subsidiaries i:nil="true"/>
        <TonnageOfVessel>841</TonnageOfVessel>
        <TownOfBirth i:nil="true"/>
        <TypeOfVessel>Bunkering Vessel</TypeOfVessel>
    </ConsolidatedList>
</ArrayOfConsolidatedList>

Python code:

import xml.etree.ElementTree as ET
import inspect

def ListParse():
    tree = ET.parse('ListRead.xml')
    root = tree.getroot()
    all_entity_entries = root.find("{http://schemas.datacontract.org/2004/07/}ArrayOfConsolidatedList")
    for entry in all_entity_entries:                                                
        RegimeName = entry.find('RegimeName').text
        TonnageOfVessel = entry.find('TonnageOfVessel')
        TypeOfVessel = entry.find('TypeOfVessel')
        print(TypeOfVessel)
            
ListParse()
  • What should be the output? What is your current problem? Commented Jun 29, 2021 at 15:42
  • I want all the values in all_entity_entries that have a "TypeOfVessel" value. The problem is that the "all_entity_entries" variable is null. Thanks Commented Jun 29, 2021 at 15:44
  • ListParse() does not return any value. Change the code and make it return what you need. Commented Jun 29, 2021 at 15:45
  • I will do some database work with this TypeOfVessel value and then return data from the database. I didn't include that portion here. Thanks Commented Jun 29, 2021 at 15:49
  • Based on the XML you have attached, what is the expected output? Add it to the post. Why do you have matches and country_list in the code? Clean the code, share the expected output and explain what is the problem Commented Jun 29, 2021 at 15:51
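For reference, the likely reason all_entity_entries is None in the question's code: tree.getroot() already returns the ArrayOfConsolidatedList element, and find() only searches that element's children, so looking for another ArrayOfConsolidatedList finds nothing. A minimal sketch, using a trimmed inline copy of the sample XML (an assumption, in place of ListRead.xml):

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the sample data, standing in for ListRead.xml
XML = b"""<ArrayOfConsolidatedList xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/">
    <ConsolidatedList>
        <RegimeName>Test1</RegimeName>
        <TypeOfVessel i:nil="true" />
    </ConsolidatedList>
    <ConsolidatedList>
        <RegimeName>Test2</RegimeName>
        <TypeOfVessel>Bunkering Vessel</TypeOfVessel>
    </ConsolidatedList>
</ArrayOfConsolidatedList>"""

NS = "{http://schemas.datacontract.org/2004/07/}"

root = ET.fromstring(XML)  # same element as ET.parse(...).getroot()

# The root element *is* ArrayOfConsolidatedList...
print(root.tag)  # {http://schemas.datacontract.org/2004/07/}ArrayOfConsolidatedList
# ...and find() only looks at its direct children, so this is None
print(root.find(NS + "ArrayOfConsolidatedList"))  # None

# Iterating the root directly yields the ConsolidatedList entries.
# Child tags also need the namespace: entry.find("RegimeName") would return None.
for entry in root:
    print(entry.find(NS + "RegimeName").text)  # Test1, then Test2
```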

1 Answer

import xml.etree.ElementTree as ET


def ListParse():
    root = ET.parse('ListRead.xml')
    vessels_entries = root.findall("{http://schemas.datacontract.org/2004/07/}ConsolidatedList")
    for vessel_entry in vessels_entries:
        RegimeName = vessel_entry.find("{http://schemas.datacontract.org/2004/07/}RegimeName").text
        TypeOfVessel = vessel_entry.find("{http://schemas.datacontract.org/2004/07/}TypeOfVessel")
        TypeOfVessel_is_missing = TypeOfVessel.attrib.get("{http://www.w3.org/2001/XMLSchema-instance}nil", "false")
        print(RegimeName)
        print("missing" if TypeOfVessel_is_missing == "true" else "available")


ListParse()

Output:

Test1
missing
Test2
available
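Since ListParse() only prints, here is a sketch of the same tree-based approach that returns the matching entries instead, as one of the comments suggests. It assumes the whole file fits in memory; list_vessels, NAMESPACES and the prefix "c" are illustrative names (not part of the original code), and an inline sample replaces ListRead.xml.

```python
import xml.etree.ElementTree as ET

# Prefix mapping understood by find()/findall()/findtext(); "c" is an arbitrary prefix
NAMESPACES = {"c": "http://schemas.datacontract.org/2004/07/"}
NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def list_vessels(root):
    """Return (RegimeName, TypeOfVessel) pairs for entries whose TypeOfVessel is not nil."""
    vessels = []
    for entry in root.findall("c:ConsolidatedList", NAMESPACES):
        type_of_vessel = entry.find("c:TypeOfVessel", NAMESPACES)
        # Skip entries where the element is absent or marked i:nil="true"
        if type_of_vessel is None or type_of_vessel.get(NIL) == "true":
            continue
        regime_name = entry.findtext("c:RegimeName", namespaces=NAMESPACES)
        vessels.append((regime_name, type_of_vessel.text))
    return vessels


# Inline sample (hypothetical stand-in for ET.parse("ListRead.xml").getroot())
sample = ET.fromstring(
    '<ArrayOfConsolidatedList xmlns:i="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns="http://schemas.datacontract.org/2004/07/">'
    '<ConsolidatedList><RegimeName>Test1</RegimeName><TypeOfVessel i:nil="true"/></ConsolidatedList>'
    '<ConsolidatedList><RegimeName>Test2</RegimeName><TypeOfVessel>Bunkering Vessel</TypeOfVessel></ConsolidatedList>'
    '</ArrayOfConsolidatedList>'
)
print(list_vessels(sample))  # [('Test2', 'Bunkering Vessel')]
```

The returned list can then be handed to the database step mentioned in the comments.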


EDIT: In the comments you indicated that you don't want to hold all the data in memory. In that case, you should use event-based (pull) parsing instead of building the whole tree up front, combined with a Python generator. Here is an example:

import xml.etree.ElementTree as ET


def get_vessels_with_non_null_type():
    with open("ListRead.xml", "rb") as xml_file:
        parser = ET.XMLPullParser(["end"])  # we are only interested in the end of tags
        # now we read the file by chunk (deliberately low for example purposes)
        chunk_size = 10
        while True:
            chunk = xml_file.read(chunk_size)
            if chunk == b"":
                break  # end-of-file
            else:
                parser.feed(chunk)
            # the parser received a few more bytes; let's see if there are new vessels
            new_events = parser.read_events()
            for event_name, element in new_events:
                # check whether the element that has just finished parsing is the one we are interested in
                if element.tag == "{http://schemas.datacontract.org/2004/07/}ConsolidatedList":
                    # and filter out the ones which do not have a value for TypeOfVessel
                    TypeOfVessel = element.find("{http://schemas.datacontract.org/2004/07/}TypeOfVessel")
                    TypeOfVessel_is_missing = TypeOfVessel.attrib.get("{http://www.w3.org/2001/XMLSchema-instance}nil", "false")
                    if TypeOfVessel_is_missing == "false":
                        yield element
                    # the consumer is done with this entry: drop its children so the tree does not keep growing
                    element.clear()


def do_something_with_a_vessel(vessel_entry):
    RegimeName = vessel_entry.find("{http://schemas.datacontract.org/2004/07/}RegimeName").text
    TypeOfVessel = vessel_entry.find("{http://schemas.datacontract.org/2004/07/}TypeOfVessel").text
    print(RegimeName, TypeOfVessel)


for vessel_entry in get_vessels_with_non_null_type():
    do_something_with_a_vessel(vessel_entry)

Output: only Test2 Bunkering Vessel

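Because the file is read in small chunks and each matching entry is yielded as soon as it is complete, the raw file is never read into memory in one go. However, the parser still attaches every finished element to the tree it builds, so the elements must also be cleared with element.clear() once processed to keep the memory footprint near a minimum.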
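A possibly simpler streaming alternative is ET.iterparse, which does the chunked reading internally instead of feeding an XMLPullParser by hand. The following is a sketch under the same assumptions as above (namespaced tags, i:nil marking a missing value); iter_vessels and SAMPLE are hypothetical names, and an in-memory BytesIO stands in for ListRead.xml.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://schemas.datacontract.org/2004/07/}"
NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Hypothetical in-memory sample standing in for ListRead.xml
SAMPLE = (
    b'<ArrayOfConsolidatedList xmlns:i="http://www.w3.org/2001/XMLSchema-instance" '
    b'xmlns="http://schemas.datacontract.org/2004/07/">'
    b'<ConsolidatedList><RegimeName>Test1</RegimeName><TypeOfVessel i:nil="true"/></ConsolidatedList>'
    b'<ConsolidatedList><RegimeName>Test2</RegimeName><TypeOfVessel>Bunkering Vessel</TypeOfVessel></ConsolidatedList>'
    b'</ArrayOfConsolidatedList>'
)


def iter_vessels(source):
    """Yield (RegimeName, TypeOfVessel) for every entry whose TypeOfVessel is not nil.

    source: a filename or a binary file object, as accepted by ET.iterparse.
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag != NS + "ConsolidatedList":
            continue  # child tags (RegimeName, ...) are read through their parent
        type_of_vessel = element.find(NS + "TypeOfVessel")
        if type_of_vessel is not None and type_of_vessel.get(NIL) != "true":
            yield element.findtext(NS + "RegimeName"), type_of_vessel.text
        # Free the entry's children; only an empty shell stays attached to the root
        element.clear()


for regime_name, vessel_type in iter_vessels(io.BytesIO(SAMPLE)):
    print(regime_name, vessel_type)  # Test2 Bunkering Vessel
```

Yielding plain tuples rather than the elements themselves means the consumer never holds a reference to an element that is about to be cleared.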


4 Comments

Thanks for your answer. Is it possible to get only the entries where TypeOfVessel data is available? The XML file is very large, so I only want the list of entries that have TypeOfVessel data. Can findall pull only the entries whose TypeOfVessel is available? vessels_entries = root.findall("{http://schemas.datacontract.org/2004/07/}ConsolidatedList")
@Letoncse Simply print(TypeOfVessel.text): you will get None when it is missing and Bunkering Vessel when it is available.
Is it possible to get vessels_entries containing only the entries whose TypeOfVessel value is available? I don't want to pull out everything because there will be thousands of records. Thanks vessels_entries = root.findall("{http://schemas.datacontract.org/2004/07/}ConsolidatedList")
@Letoncse I updated my answer with a streaming-based solution: it reads the file one chunk at a time and produces an event whenever an element is ready to be inspected. Elements that do not match are ignored, so only ConsolidatedList entries with a non-null TypeOfVessel are yielded, one by one, as they are requested.
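One caveat with checking .text: it is None both for an element marked i:nil="true" and for an element that is present but empty, whereas the i:nil attribute check used in the answer tells the two apart. A small sketch; the empty TypeOfVessel entry is a hypothetical addition, not part of the question's data:

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.datacontract.org/2004/07/}"
NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

root = ET.fromstring(
    '<ArrayOfConsolidatedList xmlns:i="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns="http://schemas.datacontract.org/2004/07/">'
    '<ConsolidatedList><TypeOfVessel i:nil="true"/></ConsolidatedList>'
    # hypothetical entry: element present but empty, without i:nil
    '<ConsolidatedList><TypeOfVessel></TypeOfVessel></ConsolidatedList>'
    '<ConsolidatedList><TypeOfVessel>Bunkering Vessel</TypeOfVessel></ConsolidatedList>'
    '</ArrayOfConsolidatedList>'
)

for entry in root:
    type_of_vessel = entry.find(NS + "TypeOfVessel")
    is_nil = type_of_vessel.get(NIL) == "true"
    # .text is None for both of the first two entries; only is_nil differs
    print(repr(type_of_vessel.text), is_nil)
```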
