
I am trying to search XML file content with a regex pattern, and I am stuck on how to pass a substring that always ends with a digit. That trailing number is the dynamic part of the XML file, so I don't know how to build a pattern and search for it.

Once the pattern is found, I need to get its child tags, i.e. their attributes and text values.

XML file content:

         <author NAME="PYTHON_DD101">
             <type>BOOK</type>
             <ID>59</ID>
             <inst ID="A">Garry</inst>
             <inst ID="B">Gerald</inst>
         </author>
         <author NAME="PYTHON_ABC4">
             <type>BOOK</type>
             <SrcID>62</SrcID>
             <inst ID="A">Niel</inst>
             <inst ID="B">Long</inst>
         </author>

Code:

text = "PYTHON"
tmp = '"' + text + "_ABC" + '"'
print(tmp)
#pattern = re.compile('%s\d+'%tmp)
endsWithNumber = re.compile('%s\d$'%tmp)
print(endsWithNumber)
#FoundDetails = Content.find("PYTHON_ABC4")
FoundDetails = Content.find(".//author[@NAME='{}']".format(endsWithNumber))
#regex = re.compile('%s\d+'%tmp)
#matches = regex.match(Content)
#print(matches)
print(type(Content))      
print(type(FoundDetails))
print(FoundDetails)
for FoundDetails in FoundDetails.iterfind('author'):
    author = FoundDetails.attrib['NAME']
    print 'author:', author
for inst in FoundDetails.iterfind('inst'):
    print 'inst id:', inst.attrib['ID'], 'inst name:', inst.text    

Output and error I am getting:

PYTHON_ABC
<_sre.SRE_Pattern object at 0x000000000403F168>
<class 'xml.etree.ElementTree.Element'>
<type 'NoneType'>
None
Traceback (most recent call last):
  File "C:\test_Book.py", line 45, in <module>
    bookauthor = book.get_Book_by_author(Book)
  File "C:\Book.py", line 219, in get_Book_by_author
    for FoundDetails in FoundDetails.iterfind('author'):
AttributeError: 'NoneType' object has no attribute 'iterfind'

Expected output:

inst id: A inst name: Niel
inst id: B inst name: Long

If I pass the exact NAME value, i.e. "PYTHON_ABC4", in the line below, it works. But I don't want to hard-code the value: the file may contain other authors whose names follow the same pattern, e.g. "PYTHON_ABC12", and in that case I want to get those book details as well.

FoundDetails = Content.find(".//author[@NAME='{}']".format("PYTHON_ABC4"))
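ElementTree's find() path syntax only supports exact attribute comparisons such as [@NAME='...'], so a compiled regex cannot be formatted into the path; the regex has to be applied in Python to each NAME value. A minimal sketch of building such a pattern from the known prefix, assuming the suffix is always one or more digits. The double quotes in the XML only delimit the attribute, so they are not part of the NAME value, and \d+ (rather than a single \d) also accepts multi-digit suffixes such as PYTHON_ABC12. The list of names is illustrative only.

```python
import re

# Known, fixed part of the name (taken from the question).
text = "PYTHON"

# re.escape guards against regex metacharacters in the prefix.
# \d+ requires one or more trailing digits; $ anchors the end of the value.
# No quotes in the pattern: they delimit the XML attribute, not its value.
name_pattern = re.compile(r'{}_ABC\d+$'.format(re.escape(text)))

# Illustrative NAME values; re.match anchors the pattern at the start.
names = ["PYTHON_DD101", "PYTHON_ABC4", "PYTHON_ABC12", "PYTHON_ABC"]
matching = [n for n in names if name_pattern.match(n)]
print(matching)  # ['PYTHON_ABC4', 'PYTHON_ABC12']
```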

1 Answer


I modified your code a little to get the desired output; hope it helps.

data='''
<PARAMETER-VALUES>
<author NAME="PYTHON_DD11">
             <type>BOOK</type>
             <ID>59</ID>
             <inst ID="A">Garry</inst>
             <inst ID="B">Gerald</inst>
         </author>
         <author NAME="PYTHON_ABC4">
             <type>BOOK</type>
             <SrcID>62</SrcID>
             <inst ID="A">Niel</inst>
             <inst ID="B">Long</inst>
         </author>
</PARAMETER-VALUES>
'''

# Use ElementTree to parse the XML data

import xml.etree.ElementTree as ET
import re
root = ET.fromstring(data)

# Helper that returns True if the NAME value contains at least one digit

def hasnumbers(result):
    return bool(re.search(r'\d', result))

for author in root.iter('author'):
    # Default to '' so a missing NAME attribute does not break re.search
    result = author.attrib.get('NAME', '')
    if hasnumbers(result):
        for inst in author.iterfind('inst'):
            # Single formatted argument, so the output is the same in Python 2 and 3
            print('inst id: {} inst name: {}'.format(inst.attrib.get('ID'), inst.text))

Output:

inst id: A inst name: Garry
inst id: B inst name: Gerald
inst id: A inst name: Niel
inst id: B inst name: Long
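The hasnumbers check accepts any NAME containing a digit, which is why PYTHON_DD11 (Garry, Gerald) also appears in the output above, while the question's expected output lists only the PYTHON_ABC author. A stricter variant, sketched below, filters each author with a regex that requires the PYTHON_ABC prefix followed only by digits. The pattern is an assumption based on the examples given in the question (PYTHON_ABC4, PYTHON_ABC12); adjust it if other suffixes are possible.

```python
import re
import xml.etree.ElementTree as ET

# Sample data from the answer, wrapped in a single root element so it parses.
data = '''
<PARAMETER-VALUES>
  <author NAME="PYTHON_DD11">
    <type>BOOK</type>
    <ID>59</ID>
    <inst ID="A">Garry</inst>
    <inst ID="B">Gerald</inst>
  </author>
  <author NAME="PYTHON_ABC4">
    <type>BOOK</type>
    <SrcID>62</SrcID>
    <inst ID="A">Niel</inst>
    <inst ID="B">Long</inst>
  </author>
</PARAMETER-VALUES>
'''

# Assumed pattern: the PYTHON_ABC prefix followed by one or more digits, nothing else.
name_pattern = re.compile(r'PYTHON_ABC\d+$')

root = ET.fromstring(data)
lines = []
for author in root.iter('author'):
    # ElementTree paths have no regex predicates, so the NAME test runs in Python.
    if name_pattern.match(author.get('NAME', '')):
        for inst in author.iterfind('inst'):
            lines.append('inst id: {} inst name: {}'.format(inst.get('ID'), inst.text))

for line in lines:
    print(line)
```

With this sample data it prints only the two PYTHON_ABC4 lines (Niel and Long), matching the expected output in the question.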

1 Comment

Perfect, thanks Pankaj. Calling a "hasnumbers" helper function is a good approach; I learned a new idea today. Many thanks again.
