
I have some XML data that needs to be parsed so that certain information can be extracted. However, there is a catch when I try to extract the name field from the XML using BeautifulSoup.

  1. Issue 1: attribData.name gives me attribute-item (the name of the enclosing tag) instead of the text of its <name> field, "priority".
  2. Issue 2: I also need to extract the ID from the XML, i.e. the id attribute of <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">.

I am using BeautifulSoup as the standard approach and can't switch to another package, so a workaround that uses BeautifulSoup would be much appreciated.

Below is the XML data. For each attribute-item I need to extract the id attribute and the name, description and value fields.

<configurations>
   <attributes-configuration>
      <attributes>
         <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
            <name>priority</name>
            <description>priority of a requirement</description>
            <customization-element>mydata.core.customization.requirements</customization-element>
            <attribute-type>mydata.attribute_type.list</attribute-type>
            <options>
               <option>
                  <key>DEFAULT_LIST</key>
                  <value class="java.lang.String"> high,low,medium</value>
               </option>
               <option>
                  <key>LIST_TYPE</key>
                  <value class="java.lang.String">CUSTOM</value>
               </option>
            </options>
            <editable>true</editable>
            <userDefined>true</userDefined>
            <internal>false</internal>
         </attribute-item>
         <attribute-item id="mydata.core.customization.teststep.prerequisite">
            <name>Prerequisite</name>
            <description>User Defined Attribute</description>
            <customization-element>mydata.core.customization.teststep</customization-element>
            <attribute-type>mydata.attribute_type.string</attribute-type>
            <options>
               <option>
                  <key>DEFAULT_VALUE</key>
                  <value/>
               </option>
               <option>
                  <key>MAX_CHARACTERS</key>
                  <value class="java.lang.String">5000</value>
               </option>
            </options>
            <editable>true</editable>
            <userDefined>true</userDefined>
            <internal>false</internal>
         </attribute-item>
      </attributes>
   </attributes-configuration>
   <test-management/>
</configurations>

Below is my Python code:

import os
from bs4 import BeautifulSoup as bs

fileName = 'Configuration.xml'
fullFile = os.path.abspath(os.path.join('DataTransporter', fileName))
attributeList = []
with open(fullFile) as f:
    soup = bs(f, 'xml')

for attribData in soup.find_all('attribute-item'):
    dat = {
            'attribName' : attribData.name,
            'attribDesc' : attribData.description.text,
            'attribValue' : attribData.options.value.text,
          }
    attributeList.append(dat)
    #for attribParams in soup.find_all(name = 'value'):
    #newdict[attribName.text] = attribParams.text
print(attributeList)

My output:

[{'attribName': 'attribute-item', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'attribute-item', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]

Expected output:

[{'attribName': 'priority', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'Prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]

1 Answer


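At first I thought that attribData.name.text should do it, but name is a reserved attribute on BeautifulSoup tags: attribData.name holds the tag's own name ('attribute-item'), so it never looks up the <name> child the way attribData.description does (and attribData.name.text fails, because a plain string has no .text). To get the correct value, search for the child explicitly with the findChildren(<tag name>) method: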

attribData.findChildren('name')[0].text

findChildren() (the older camelCase alias of find_all()) returns a list. Each attribute-item here contains exactly one <name> element, so [0] selects it and .text returns its string. attribData.find('name').text is an equivalent, slightly shorter form.
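A small self-contained check of the difference, assuming BeautifulSoup 4 with the lxml-backed "xml" parser (as in the question) and a trimmed copy of one attribute-item:

```python
from bs4 import BeautifulSoup

# A trimmed-down copy of one <attribute-item> from the question's XML.
xml = """<attributes>
  <attribute-item id="mydata.core.customization.teststep.prerequisite">
    <name>Prerequisite</name>
    <description>User Defined Attribute</description>
  </attribute-item>
</attributes>"""

soup = BeautifulSoup(xml, "xml")  # the "xml" parser requires lxml
item = soup.find("attribute-item")

# Tag.name is a reserved attribute holding the tag's own name (a str),
# so it never performs a child lookup the way item.description does.
print(item.name)                          # attribute-item

# Explicit searches return the <name> child element instead.
print(item.find("name").text)             # Prerequisite
print(item.findChildren("name")[0].text)  # Prerequisite
print(item.description.text)              # User Defined Attribute
```

Any other child whose tag name collides with a Tag attribute (name is the one that bites here) needs the same explicit find() treatment.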

To get the ID, read the tag's attribute with attribData['id']. In summary, your code would look like this (inside the for loop):

dat = {
    'attribName' : attribData.findChildren('name')[0].text,
    'id': attribData['id'],
    'attribDesc' : attribData.description.text,
    'attribValue' : attribData.options.value.text,
}

The output would look like this:

[{'attribName': 'priority', 'id': 'mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'Prerequisite', 'id': 'mydata.core.customization.teststep.prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
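Note that attribData.options.value.text reads only the first <value> under <options>, which is why the second item yields '' (its first option is the empty DEFAULT_VALUE). If the other options are needed as well (the commented-out lines in the question hint at this), here is one possible sketch. It assumes a dict mapping each <key> to its <value> text is a convenient shape; the "options" field name is hypothetical, and it again uses the "xml" parser (lxml):

```python
from bs4 import BeautifulSoup

# The question's XML, shortened to the elements this sketch reads.
xml = """<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <options>
          <option><key>DEFAULT_LIST</key><value class="java.lang.String"> high,low,medium</value></option>
          <option><key>LIST_TYPE</key><value class="java.lang.String">CUSTOM</value></option>
        </options>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <options>
          <option><key>DEFAULT_VALUE</key><value/></option>
          <option><key>MAX_CHARACTERS</key><value class="java.lang.String">5000</value></option>
        </options>
      </attribute-item>
    </attributes>
  </attributes-configuration>
</configurations>"""

soup = BeautifulSoup(xml, "xml")  # requires lxml

attribute_list = []
for item in soup.find_all("attribute-item"):
    # Map every <option>'s <key> text to its <value> text;
    # an empty <value/> yields "".
    options = {
        opt.find("key").text: opt.find("value").text
        for opt in item.find_all("option")
    }
    attribute_list.append({
        "attribName": item.find("name").text,
        "id": item.get("id"),  # .get() returns None instead of raising KeyError
        "attribDesc": item.find("description").text,
        "options": options,  # hypothetical field name for all key/value pairs
    })

print(attribute_list)
```

Using item.get('id') rather than item['id'] is a small defensive choice: it returns None for an attribute-item without an id instead of stopping the loop with a KeyError.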

I hope it helps!


1 Comment

Man, this works like a charm! :) Perfect. Thanks a ton.
