I have some XML data that needs to be parsed so that certain information can be extracted. However, there is a catch when I try to extract the name field from the XML using BeautifulSoup.
- Issue 1: I get the tag name of the parent element, "attribute-item", instead of the text of the name field, "priority".
- Issue 2: I also need to extract the ID from the XML, which is the value of the id attribute in:
<attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
I am using BeautifulSoup as the standard approach and cannot switch to any other package, so a workaround that stays with BeautifulSoup would be much appreciated.
Below is the XML data. The fields I need to extract are the id attribute and the name of each attribute-item, along with its description and value.
<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <customization-element>mydata.core.customization.requirements</customization-element>
        <attribute-type>mydata.attribute_type.list</attribute-type>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
          <option>
            <key>LIST_TYPE</key>
            <value class="java.lang.String">CUSTOM</value>
          </option>
        </options>
        <editable>true</editable>
        <userDefined>true</userDefined>
        <internal>false</internal>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <customization-element>mydata.core.customization.teststep</customization-element>
        <attribute-type>mydata.attribute_type.string</attribute-type>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
          <option>
            <key>MAX_CHARACTERS</key>
            <value class="java.lang.String">5000</value>
          </option>
        </options>
        <editable>true</editable>
        <userDefined>true</userDefined>
        <internal>false</internal>
      </attribute-item>
    </attributes>
  </attributes-configuration>
  <test-management/>
</configurations>
Below is my Python code:
import os
from bs4 import BeautifulSoup as bs
fileName = 'Configuration.xml'
fullFile = os.path.abspath(os.path.join('DataTransporter', fileName))
attributeList = []
with open(fullFile) as f:
    soup = bs(f, 'xml')
for attribData in soup.find_all('attribute-item'):
    dat = {
        'attribName' : attribData.name,
        'attribDesc' : attribData.description.text,
        'attribValue' : attribData.options.value.text,
    }
    attributeList.append(dat)
    #for attribParams in soup.find_all(name = 'value'):
        #newdict[attribName.text] = attribParams.text
print(attributeList)
My output:
[{'attribName': 'attribute-item', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'attribute-item', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
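A likely explanation for the 'attribute-item' value, as far as I can tell: on a BeautifulSoup tag, `.name` is the tag's own name, so it never reaches the `<name>` child element. The minimal sketch below uses a one-element stand-in for the XML (the id value `example.id` is made up for illustration) and assumes lxml is installed for the 'xml' feature, as in my script.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for one <attribute-item>; the id value is hypothetical.
snippet = '<attribute-item id="example.id"><name>priority</name></attribute-item>'

# The 'xml' feature relies on lxml being installed.
soup = BeautifulSoup(snippet, 'xml')
item = soup.find('attribute-item')

tag_name = item.name                 # name of the tag itself
child_text = item.find('name').text  # text of the <name> child element

print(tag_name)
print(child_text)
```

Here `tag_name` is the element's own tag name, while `find('name')` looks up the child element explicitly.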
Expected output:
[{'attribName': 'priority', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
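For reference, here is a sketch of one way both issues might be handled while staying with BeautifulSoup: `find('name')` for the name text and `get('id')` for the id attribute. The XML is a shortened inline copy so the snippet runs on its own, and the `attribId` key and the snake_case variable names are my own additions, not part of the original script. It again assumes lxml is available for the 'xml' feature.

```python
from bs4 import BeautifulSoup

# Shortened copy of the XML from the question, inlined so the sketch is self-contained.
xml_data = """<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
        </options>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
        </options>
      </attribute-item>
    </attributes>
  </attributes-configuration>
</configurations>"""

soup = BeautifulSoup(xml_data, 'xml')

attribute_list = []
for item in soup.find_all('attribute-item'):
    attribute_list.append({
        'attribId': item.get('id'),              # hypothetical key for the id attribute
        'attribName': item.find('name').text,    # <name> child, not the tag's own name
        'attribDesc': item.description.text,
        'attribValue': item.options.value.text,  # value of the first <option> only
    })

print(attribute_list)
```

Note that the name comes back exactly as written in the XML, so the second entry would be 'Prerequisite' with a capital P; calling `.lower()` on it would be needed to get the lowercase form.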