
This is my input file:

<datasource formatted-name='federated.1819qwi0hys5391dzxhl70o95li4' inline='true' source-platform='win' version='18.1' xmlns:user='http://www.tableausoftware.com/xml/user'>
  <connection class='federated'>
    <named-connections>
      <named-connection caption='Sample - Superstore' name='excel.1ew9u4t0tggb9315darmm0nfz2kb'>
        <connection class='excel' driver='' filename='C:/Users/XXXX/Downloads/Sample - Superstore.xls' header='yes' imex='1' password='' server='' />
      </named-connection>
    </named-connections>
    <relation connection='excel.1ew9u4t0tggb9315darmm0nfz2kb' name='Custom SQL Query' type='text'>SELECT [Orders$].[Category] AS [Category],&#13;&#10;  [Orders$].[City] AS [City],&#13;&#10;  [Orders$].[Country] AS [Country],&#13;&#10;  [Orders$].[Customer ID] AS [Customer ID],&#13;&#10;  [Orders$].[Customer Name] AS [Customer Name],&#13;&#10;  [Orders$].[Discount] AS [Discount],&#13;&#10;  [Orders$].[Profit] AS [Profit],&#13;&#10;  [Orders$].[Quantity] AS [Quantity],&#13;&#10;  [Orders$].[Region] AS [Region],&#13;&#10;  [Orders$].[State] AS [State],&#13;&#10;  [People$].[Person] AS [Person],&#13;&#10;  [People$].[Region] AS [Region (People)]&#13;&#10;FROM [Orders$]&#13;&#10;  INNER JOIN [People$] ON [Orders$].[Region] = [People$].[Region]</relation>
    <metadata-records>
      <metadata-record class='column'>
        <remote-name>Category</remote-name>
        <remote-type>130</remote-type>
        <local-name>[Category]</local-name>
        <parent-name>[Custom SQL Query]</parent-name>
        <remote-alias>Category</remote-alias>
        <ordinal>1</ordinal>
        <local-type>string</local-type>
        <aggregation>Count</aggregation>
        <contains-null>true</contains-null>
        <collation>LEN_RUS_S2_WO</collation>
        <attributes>
          <attribute datatype='string' name='DebugRemoteType'>&quot;WSTR&quot;</attribute>
        </attributes>
      </metadata-record>

I want to get the attribute tag. I have tried:

for x in xmlRoot.findall('./metadata-record'):
    sqlString = x.find('attribute').text

but I'm getting only whitespace as the result. I have tried all the possible combinations in findall and still can't get the result. I want to read that attribute tag dynamically and write it to the output file unchanged. I have retrieved the other tags from metadata-record, but this one alone is not working. Can someone help?

My expected output is:

<metadata-records>
      <metadata-record class='column'>
        <remote-name>Category</remote-name>
        <remote-type>130</remote-type>
        <local-name>[Category]</local-name>
        <parent-name>[Custom SQL Query]</parent-name>
        <remote-alias>Category</remote-alias>
        <ordinal>1</ordinal>
        <local-type>string</local-type>
        <aggregation>Count</aggregation>
        <contains-null>true</contains-null>
        <collation>LEN_RUS_S2_WO</collation>
        <attributes>
          <attribute datatype='string' name='DebugRemoteType'>&quot;WSTR&quot;</attribute>
        </attributes>
      </metadata-record>

I have retrieved everything up to the collation tag, but I do not know how to get the attributes tag. Can someone help?

Thanks, Aarush

    Please explain a little further: you are looking for the <attribute> tag, but how does that combine with your expected output? Commented Jul 15, 2020 at 17:35

2 Answers


Using xml.etree.ElementTree, you can try something like this:

import xml.etree.ElementTree as ET

xmlRoot = ET.fromstring(xml)
print(''.join([ET.tostring(x, encoding="unicode") for x in xmlRoot.findall('.//metadata-records//*')]))

where xml is a string holding your XML input (it has to be well-formed for fromstring to parse it).

The key is the findall path: starting from the root, it looks for any descendant element called metadata-records, and from there it matches any element (*).

The double forward slash // makes sure that not only direct children are found, but any descendant of the metadata-records element. That is why you did find the <attributes> element (a child of metadata-record) but failed to find the <attribute> element (a child of that child).
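To tie this back to the loop in the question, here is a small sketch with xml.etree.ElementTree. It uses a trimmed, well-formed stand-in for the input, stored in xml_data (a name chosen here rather than xml). It shows that find('attribute') on a metadata-record returns None because <attribute> is a grandchild, and that the text of <attributes> is only the whitespace before its child tag, which would explain the whitespace result if the loop was reading <attributes>. Either the explicit path 'attributes/attribute' or './/attribute' reaches the tag. It also serializes the <metadata-records> element once: ET.tostring() already includes every descendant, so joining the output for each element matched by '//*' repeats nested elements. Note that ElementTree writes attributes with double quotes and the quote characters in the text literally, so the result is equivalent XML rather than a byte-for-byte copy.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the input (assumption: the real file has been fixed)
xml_data = """<datasource>
  <metadata-records>
    <metadata-record class='column'>
      <remote-name>Category</remote-name>
      <collation>LEN_RUS_S2_WO</collation>
      <attributes>
        <attribute datatype='string' name='DebugRemoteType'>&quot;WSTR&quot;</attribute>
      </attributes>
    </metadata-record>
  </metadata-records>
</datasource>"""

xmlRoot = ET.fromstring(xml_data)

for record in xmlRoot.findall('.//metadata-record'):
    # find() with a bare tag name only checks direct children, so this prints None
    print(record.find('attribute'))
    # <attributes> is a direct child, but its text is only the whitespace before <attribute>
    print(repr(record.find('attributes').text))
    # Spell out the path, or use './/' to search all descendants; both reach <attribute>
    attr = record.find('attributes/attribute')
    print(attr.text)    # entity-decoded text: "WSTR"
    print(attr.attrib)  # the tag's XML attributes as a dict
    print(record.find('.//attribute') is attr)

# tostring() serializes an element together with all of its descendants,
# so one call on <metadata-records> reproduces the whole block without repeats
records = xmlRoot.find('.//metadata-records')
print(ET.tostring(records, encoding="unicode"))
```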



Fix XML

First, I would fix the input file. It is not well-formed XML, because it is missing some closing tags.

I fixed it for you here:

<datasource formatted-name='federated.1819qwi0hys5391dzxhl70o95li4' inline='true' source-platform='win' version='18.1' xmlns:user='http://www.tableausoftware.com/xml/user'>
  <connection class='federated'>
    <named-connections>
      <named-connection caption='Sample - Superstore' name='excel.1ew9u4t0tggb9315darmm0nfz2kb'>
        <connection class='excel' driver='' filename='C:/Users/XXXX/Downloads/Sample - Superstore.xls' header='yes' imex='1' password='' server='' />
      </named-connection>
    </named-connections>
    <relation connection='excel.1ew9u4t0tggb9315darmm0nfz2kb' name='Custom SQL Query' type='text'>SELECT [Orders$].[Category] AS [Category],&#13;&#10;  [Orders$].[City] AS [City],&#13;&#10;  [Orders$].[Country] AS [Country],&#13;&#10;  [Orders$].[Customer ID] AS [Customer ID],&#13;&#10;  [Orders$].[Customer Name] AS [Customer Name],&#13;&#10;  [Orders$].[Discount] AS [Discount],&#13;&#10;  [Orders$].[Profit] AS [Profit],&#13;&#10;  [Orders$].[Quantity] AS [Quantity],&#13;&#10;  [Orders$].[Region] AS [Region],&#13;&#10;  [Orders$].[State] AS [State],&#13;&#10;  [People$].[Person] AS [Person],&#13;&#10;  [People$].[Region] AS [Region (People)]&#13;&#10;FROM [Orders$]&#13;&#10;  INNER JOIN [People$] ON [Orders$].[Region] = [People$].[Region]
    </relation>
  </connection>
    <metadata-records>
      <metadata-record class='column'>
        <remote-name>Category</remote-name>
        <remote-type>130</remote-type>
        <local-name>[Category]</local-name>
        <parent-name>[Custom SQL Query]</parent-name>
        <remote-alias>Category</remote-alias>
        <ordinal>1</ordinal>
        <local-type>string</local-type>
        <aggregation>Count</aggregation>
        <contains-null>true</contains-null>
        <collation>LEN_RUS_S2_WO</collation>
        <attributes>
          <attribute datatype='string' name='DebugRemoteType'>&quot;WSTR&quot;</attribute>
        </attributes>
      </metadata-record>
    </metadata-records>
</datasource>
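A quick way to check whether a file is well-formed before traversing it is to let a parser try it. This sketch uses xml.etree.ElementTree on a shortened, illustrative version of the question's snippet: the truncated text raises ET.ParseError, while the same text with the missing closing tags appended parses. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's snippet: nothing after </metadata-record> is closed
truncated = """<datasource>
  <connection class='federated'>
    <metadata-records>
      <metadata-record class='column'>
        <remote-name>Category</remote-name>
      </metadata-record>
"""

# The same text with the missing closing tags appended
fixed = truncated + """    </metadata-records>
  </connection>
</datasource>
"""


def is_well_formed(text):
    """Hypothetical helper: True if text parses as XML, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("Not well-formed:", err)
        return False
    return True


print(is_well_formed(truncated))  # False
print(is_well_formed(fixed))      # True
```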

Now use minidom to traverse the XML:

  1. Import the minidom module from xml.dom
  2. Parse the XML (I just saved it to my file system as x.xml)
  3. Get the element you are looking for with getElementsByTagName

Here is my code

from xml.dom import minidom

mydoc = minidom.parse('x.xml')

items = mydoc.getElementsByTagName('attribute')

print(items)

print(items) prints a NodeList such as [<DOM Element: attribute at 0x10aad6690>]. To get the value between the tags, iterate over the element's child nodes and print the text nodes:

# Traverse the childNodes of the tag
for t in items[0].childNodes:
    # if the node is a text node then print it
    if t.nodeType == t.TEXT_NODE:
        print(t.nodeValue)

One-liner:

print(''.join((t.nodeValue for t in items[0].childNodes if t.nodeType == t.TEXT_NODE)))
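The same approach can also cover the rest of the question (reading the tag's XML attributes and writing it back out). In this sketch parseString stands in for parse('x.xml') so it is self-contained. getAttribute() reads the datatype and name attributes of <attribute>, and toxml() serializes a whole <metadata-record>, nested <attributes> included, so it can be written to an output file. The helper write_records and the file name output.xml are hypothetical; a StringIO buffer is used so the example runs anywhere. minidom rewrites attributes with double quotes, so the written XML is equivalent to the input, not byte-identical.

```python
import io
from xml.dom import minidom

# Self-contained stand-in for minidom.parse('x.xml'), trimmed to the relevant part
doc = minidom.parseString("""<metadata-records>
  <metadata-record class='column'>
    <collation>LEN_RUS_S2_WO</collation>
    <attributes>
      <attribute datatype='string' name='DebugRemoteType'>&quot;WSTR&quot;</attribute>
    </attributes>
  </metadata-record>
</metadata-records>""")

for node in doc.getElementsByTagName('attribute'):
    # getAttribute() returns the value of an XML attribute on the tag
    print(node.getAttribute('datatype'), node.getAttribute('name'))
    # Join the text child nodes, as in the one-liner above
    print(''.join(t.nodeValue for t in node.childNodes if t.nodeType == t.TEXT_NODE))


def write_records(document, out):
    """Hypothetical helper: write each <metadata-record> (including <attributes>) to out."""
    for record in document.getElementsByTagName('metadata-record'):
        out.write(record.toxml())
        out.write('\n')


# In practice out could be open('output.xml', 'w', encoding='utf-8'); StringIO keeps this runnable
buffer = io.StringIO()
write_records(doc, buffer)
print(buffer.getvalue())
```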

This page really helped me get started with XML parsing: Reference page

Comment: Thanks much for the help
