
I want to extract specific information from data.xml. The root element, 'CaplockSet', contains more than 100 'Caplock' elements, and from these I only need the author information. Any help would be much appreciated.

<?xml version="1.0"?>

<CaplockSet>

<Caplock>
    <MedlineCitation Status="clonelisher" Owner="NLM">
        <PMID Version="1">32045906</PMID>
        <DateRevised>
            <Year>2020</Year>
            <Month>02</Month>
            <Day>11</Day>
        </DateRevised>
        <Article cloneModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">1423-0135</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <cloneDate>
                        <Year>2020</Year>
                        <Month>Feb</Month>
                        <Day>11</Day>
                    </cloneDate>
                </JournalIssue>
                <Title>Journal of vascular research</Title>
                <ISOAbbreviation>J. Vasc. Res.</ISOAbbreviation>
            </Journal>
            <ArticleTitle>miR-96-5p Regulates Proliferation, Migration, and Apoptosis of Vascular Smooth Muscle Cell Induced by Angiotensin II via Targeting NFAT5.</ArticleTitle>
            <Pagination>
                <MedlinePgn>1-11</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.1159/000505457</ELocationID>
            <Abstract>
                <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Aberrant proliferation, migration, and apoptosis of vascular smooth muscle cells (VSMCs) are major pathological phenomenon in hypertension. MicroRNAs (miRNAs/miRs) serve crucial roles in the progression of hypertension. We aimed to determine the role of miR-96-5p in the proliferation, migration, and apoptosis of VSMCs and its underlying mechanisms.</AbstractText>
                <AbstractText Label="METHODS" NlmCategory="METHODS">Angiotensin II (Ang II) was employed to treat VSMCs, and the expression of miR-96-5p was detected by RT-qPCR. Then, miR-96-5p mimic was transfected into VSMCs. Cell Counting Kit-8 assay, flow cytometry, transwell assay, and wound healing assay were applied to measure proliferation, cell cycle, and migration of VSMCs. The expression of proteins associated with proliferation, migration, and apoptosis was assessed. A luciferase reporter assay was applied to confirm the target binding between miR-96-5p and nuclear factors of activated T-cells 5 (NFAT5). Subsequently, siRNA was used to silence NFAT5, and cell proliferation, migration, and apoptosis were assessed.</AbstractText>
                <AbstractText Label="RESULTS" NlmCategory="RESULTS">The results revealed that the expression of miR-96-5p was downregulated in Ang II-induced VSMCs. MiR-96-5p overexpression inhibited cell proliferation and migration but promoted cell apoptosis, enhanced the percentages of cells in the G1 and G2 phases, and reduced those in the S phase, accompanied by changes in the expression associated proteins. NFAT5 was confirmed as a direct target of miR-96-5p. NFAT5 silencing had the same results with miR-96-5p overexpression on VSMC proliferation, migration, and apoptosis, whereas miR-96-5p inhibitor reversed these effects.</AbstractText>
                <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Our findings concluded that miR-96-5p could regulate proliferation, migration, and apoptosis of VSMCs induced by Ang II via targeting NFAT5.</AbstractText>
                <CopyrightInformation>© 2020 S. Karger AG, Basel.</CopyrightInformation>
            </Abstract>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Tian</LastName>
                    <ForeName>Long</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Cai</LastName>
                    <ForeName>Dinghua</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Zhuang</LastName>
                    <ForeName>Derong</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Wenyuan</ForeName>
                    <Initials>W</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Xuan</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Bian</LastName>
                    <ForeName>Xiaoli</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Xu</LastName>
                    <ForeName>Rui</ForeName>
                    <Initials>R</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Nephrology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wu</LastName>
                    <ForeName>Guanji</ForeName>
                    <Initials>G</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Xi'an Central Hospital of Xi'an Jiaotong University, Xi'an, China, [email protected].</Affiliation>
                    </AffiliationInfo>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <clonelicationTypeList>
                <clonelicationType UI="D016428">Journal Article</clonelicationType>
            </clonelicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2020</Year>
                <Month>02</Month>
                <Day>11</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <Country>Switzerland</Country>
            <MedlineTA>J Vasc Res</MedlineTA>
            <NlmUniqueID>9206092</NlmUniqueID>
            <ISSNLinking>1018-1172</ISSNLinking>
        </MedlineJournalInfo>
        <CitationSubset>IM</CitationSubset>
        <KeywordList Owner="NOTNLM">
            <Keyword MajorTopicYN="N">Migration</Keyword>
            <Keyword MajorTopicYN="N">NFAT5</Keyword>
            <Keyword MajorTopicYN="N">Proliferation</Keyword>
            <Keyword MajorTopicYN="N">Vascular smooth muscle cell</Keyword>
            <Keyword MajorTopicYN="N">miR-96-5p</Keyword>
        </KeywordList>
    </MedlineCitation>
    <CardData>
        <History>
            <CardcloneDate cloneStatus="received">
                <Year>2019</Year>
                <Month>09</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="accepted">
                <Year>2019</Year>
                <Month>12</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="entrez">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="Card">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="medline">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
        </History>
        <clonelicationStatus>aheadofprint</clonelicationStatus>
        <ArticleIdList>
            <ArticleId IdType="Card">32045906</ArticleId>
            <ArticleId IdType="pii">000505457</ArticleId>
            <ArticleId IdType="doi">10.1159/000505457</ArticleId>
        </ArticleIdList>
    </CardData>
</Caplock>


</CaplockSet>

I have tried several ways of doing this in Python but keep running into errors. Here is one of the approaches I tried:

import xml.etree.ElementTree as ET

mytree = ET.parse('data.xml')
myroot = mytree.getroot()
for x in myroot.findall('Author'):
    lastname = x.find('LastName').text
    forename = x.find('ForeName').text
    affiliation = x.find('AffiliationInfo/Affiliation').text

    print(lastname,forename,affiliation)

Error

Traceback (most recent call last):
  File "c:/Users/jeeva/Desktop/data/program.py", line 3, in <module>
    mytree = ET.parse('data/data.xml')
  File "C:\Users\jeeva\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Users\jeeva\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: syntax error: line 2, column 21
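Two separate things go wrong in the code above. First, ET.parse fails before any searching happens, because the file it reads is not well-formed XML at the reported position. Second, even with a valid file, myroot.findall('Author') would return an empty list, since a bare tag name only matches direct children of the root (the 'Caplock' elements here). A minimal sketch addressing both, assuming the structure shown in the question; load_authors is a hypothetical helper name and the inline sample only stands in for data.xml:

```python
import io
import xml.etree.ElementTree as ET


def load_authors(source):
    """Return (lastname, forename, affiliation) tuples, or [] if the XML is malformed."""
    try:
        root = ET.parse(source).getroot()
    except ET.ParseError as err:
        # err.position is a (line, column) pair pointing at the offending spot
        line, column = err.position
        print(f"XML is not well-formed near line {line}, column {column}: {err}")
        return []
    authors = []
    # './/Author' searches the whole tree; plain 'Author' only checks direct children
    for a in root.findall(".//Author"):
        authors.append((
            a.findtext("LastName"),  # findtext returns None if the element is missing
            a.findtext("ForeName"),
            a.findtext("AffiliationInfo/Affiliation"),
        ))
    return authors


# Usage with a small inline sample (stands in for data.xml):
sample = io.BytesIO(
    b"<CaplockSet><Caplock><MedlineCitation><Article><AuthorList>"
    b"<Author><LastName>Tian</LastName><ForeName>Long</ForeName>"
    b"<AffiliationInfo><Affiliation>Jiangdu People's Hospital</Affiliation></AffiliationInfo>"
    b"</Author></AuthorList></Article></MedlineCitation></Caplock></CaplockSet>"
)
for lastname, forename, affiliation in load_authors(sample):
    print(lastname, forename, affiliation)
```

With the real file, load_authors('data.xml') either returns the authors or prints where the parser gave up, which is the place to look in the file.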
  • “It doesn’t work” is completely unhelpful because it doesn’t give any indication where to look for problems. Please be specific about the “lots of errors” you are getting with the code and data you put in your question. Commented Feb 14, 2020 at 8:14
  • @barny Thanks for the feedback; I added the error message to the question. Commented Feb 14, 2020 at 8:26

2 Answers


Something like this should work:

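import xml.etree.ElementTree as ET


def find_rec(node):
    # Visit every <Author> below node (at any depth) and turn it into a dict
    # mapping each descendant tag to its text (LastName, ForeName, Affiliation, ...)
    for item in node.iter("Author"):
        author_values = {}
        for i in item.iter():
            author_values[i.tag] = i.text
        yield author_values


auth = find_rec(ET.parse('./data.xml').getroot())
for x in auth:
    print(x["LastName"], x["ForeName"], x["Affiliation"])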
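One caveat with the dictionary approach: author_values[i.tag] = i.text keeps only the last value seen for each tag, so if an Author carries several AffiliationInfo elements, all but the last affiliation are silently dropped. A sketch that keeps every affiliation as a list, assuming the same element names as the question's sample; find_authors is a hypothetical name and the second affiliation in the sample is made up for illustration:

```python
import xml.etree.ElementTree as ET


def find_authors(root):
    """Yield one dict per <Author>, keeping every affiliation in a list."""
    for author in root.iter("Author"):
        yield {
            "LastName": author.findtext("LastName"),
            "ForeName": author.findtext("ForeName"),
            # findall returns every match, not just the first one
            "Affiliations": [aff.text for aff in author.findall("AffiliationInfo/Affiliation")],
        }


sample = ET.fromstring(
    "<AuthorList><Author><LastName>Xu</LastName><ForeName>Rui</ForeName>"
    "<AffiliationInfo><Affiliation>Department of Nephrology</Affiliation></AffiliationInfo>"
    "<AffiliationInfo><Affiliation>Hypothetical second affiliation</Affiliation></AffiliationInfo>"
    "</Author></AuthorList>"
)
for a in find_authors(sample):
    print(a["LastName"], a["ForeName"], "; ".join(a["Affiliations"]))
```

With the real file, pass ET.parse('./data.xml').getroot() instead of the sample.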

2 Comments

I have a 1.15 GB XML file and would like to reduce its size by removing everything except the author lists. Is that possible? If so, could you help me with it? Thanks.
@smaragadus Sorry for the late answer. Building on @balderman's solution, this code can solve the problem (parseString is from xml.dom.minidom; dicttoxml is a third-party package): with codecs.open("output.xml", 'wb', encoding='utf-8') as outfile: outfile.write(parseString(dicttoxml(data, custom_root='Author', attr_type=False)).toprettyxml(indent=' '*4))
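For a file of that size, ET.parse builds the entire tree in memory, which can take far more RAM than the file itself. One standard-library alternative, sketched here rather than taken from the commenters, streams the file with ET.iterparse: each Caplock is handled as soon as its end tag is read, its Author elements are kept, and the rest of the record is cleared. The output root name AuthorSet and the function name extract_authors are hypothetical, and the element names are assumed to match the sample in the question:

```python
import io
import xml.etree.ElementTree as ET


def extract_authors(src, dst):
    """Stream src and write only its <Author> elements to dst."""
    out_root = ET.Element("AuthorSet")  # hypothetical root name for the reduced file
    for event, elem in ET.iterparse(src, events=("end",)):
        if elem.tag == "Caplock":
            # copy references to this record's authors into the output tree
            for author in list(elem.iter("Author")):
                out_root.append(author)
            elem.clear()  # drop the rest of the processed record to save memory
    ET.ElementTree(out_root).write(dst, encoding="utf-8", xml_declaration=True)


# Usage (file names are placeholders):
# extract_authors("data.xml", "authors_only.xml")
```

The output keeps each Author element whole (names, initials, affiliations) but loses which record it came from; copying the PMID alongside would need an extra wrapper element per record.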

One-liner:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0"?>
<CaplockSet>
<Caplock>
    <MedlineCitation Status="clonelisher" Owner="NLM">
        <PMID Version="1">32045906</PMID>
        <DateRevised>
            <Year>2020</Year>
            <Month>02</Month>
            <Day>11</Day>
        </DateRevised>
        <Article cloneModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">1423-0135</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <cloneDate>
                        <Year>2020</Year>
                        <Month>Feb</Month>
                        <Day>11</Day>
                    </cloneDate>
                </JournalIssue>
                <Title>Journal of vascular research</Title>
                <ISOAbbreviation>J. Vasc. Res.</ISOAbbreviation>
            </Journal>
            <ArticleTitle>miR-96-5p Regulates Proliferation, Migration, and Apoptosis of Vascular Smooth Muscle Cell Induced by Angiotensin II via Targeting NFAT5.</ArticleTitle>
            <Pagination>
                <MedlinePgn>1-11</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.1159/000505457</ELocationID>
            <Abstract>
                <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Aberrant proliferation, migration, and apoptosis of vascular smooth muscle cells (VSMCs) are major pathological phenomenon in hypertension. MicroRNAs (miRNAs/miRs) serve crucial roles in the progression of hypertension. We aimed to determine the role of miR-96-5p in the proliferation, migration, and apoptosis of VSMCs and its underlying mechanisms.</AbstractText>
                <AbstractText Label="METHODS" NlmCategory="METHODS">Angiotensin II (Ang II) was employed to treat VSMCs, and the expression of miR-96-5p was detected by RT-qPCR. Then, miR-96-5p mimic was transfected into VSMCs. Cell Counting Kit-8 assay, flow cytometry, transwell assay, and wound healing assay were applied to measure proliferation, cell cycle, and migration of VSMCs. The expression of proteins associated with proliferation, migration, and apoptosis was assessed. A luciferase reporter assay was applied to confirm the target binding between miR-96-5p and nuclear factors of activated T-cells 5 (NFAT5). Subsequently, siRNA was used to silence NFAT5, and cell proliferation, migration, and apoptosis were assessed.</AbstractText>
                <AbstractText Label="RESULTS" NlmCategory="RESULTS">The results revealed that the expression of miR-96-5p was downregulated in Ang II-induced VSMCs. MiR-96-5p overexpression inhibited cell proliferation and migration but promoted cell apoptosis, enhanced the percentages of cells in the G1 and G2 phases, and reduced those in the S phase, accompanied by changes in the expression associated proteins. NFAT5 was confirmed as a direct target of miR-96-5p. NFAT5 silencing had the same results with miR-96-5p overexpression on VSMC proliferation, migration, and apoptosis, whereas miR-96-5p inhibitor reversed these effects.</AbstractText>
                <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Our findings concluded that miR-96-5p could regulate proliferation, migration, and apoptosis of VSMCs induced by Ang II via targeting NFAT5.</AbstractText>
                <CopyrightInformation>© 2020 S. Karger AG, Basel.</CopyrightInformation>
            </Abstract>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Tian</LastName>
                    <ForeName>Long</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Cai</LastName>
                    <ForeName>Dinghua</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Zhuang</LastName>
                    <ForeName>Derong</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Wenyuan</ForeName>
                    <Initials>W</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Xuan</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Bian</LastName>
                    <ForeName>Xiaoli</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Xu</LastName>
                    <ForeName>Rui</ForeName>
                    <Initials>R</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Nephrology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wu</LastName>
                    <ForeName>Guanji</ForeName>
                    <Initials>G</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Xi'an Central Hospital of Xi'an Jiaotong University, Xi'an, China, [email protected].</Affiliation>
                    </AffiliationInfo>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <clonelicationTypeList>
                <clonelicationType UI="D016428">Journal Article</clonelicationType>
            </clonelicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2020</Year>
                <Month>02</Month>
                <Day>11</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <Country>Switzerland</Country>
            <MedlineTA>J Vasc Res</MedlineTA>
            <NlmUniqueID>9206092</NlmUniqueID>
            <ISSNLinking>1018-1172</ISSNLinking>
        </MedlineJournalInfo>
        <CitationSubset>IM</CitationSubset>
        <KeywordList Owner="NOTNLM">
            <Keyword MajorTopicYN="N">Migration</Keyword>
            <Keyword MajorTopicYN="N">NFAT5</Keyword>
            <Keyword MajorTopicYN="N">Proliferation</Keyword>
            <Keyword MajorTopicYN="N">Vascular smooth muscle cell</Keyword>
            <Keyword MajorTopicYN="N">miR-96-5p</Keyword>
        </KeywordList>
    </MedlineCitation>
    <CardData>
        <History>
            <CardcloneDate cloneStatus="received">
                <Year>2019</Year>
                <Month>09</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="accepted">
                <Year>2019</Year>
                <Month>12</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="entrez">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="Card">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="medline">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
        </History>
        <clonelicationStatus>aheadofprint</clonelicationStatus>
        <ArticleIdList>
            <ArticleId IdType="Card">32045906</ArticleId>
            <ArticleId IdType="pii">000505457</ArticleId>
            <ArticleId IdType="doi">10.1159/000505457</ArticleId>
        </ArticleIdList>
    </CardData>
</Caplock>
</CaplockSet>'''

root = ET.fromstring(xml)
# './/Author' finds Author elements at any depth below the root
data = [{'Affiliation': a.find('AffiliationInfo/Affiliation').text,
         'ForeName': a.find('ForeName').text,
         'LastName': a.find('LastName').text}
        for a in root.findall('.//Author')]
print(data)
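The comprehension produces data, a list of dictionaries with one entry per author. If the goal is a flat file rather than a Python list, a short follow-up with the standard csv module could write it out. A sketch, where authors_to_csv is a hypothetical helper, authors.csv is a placeholder filename, and the single row only stands in for the real data:

```python
import csv
import io


def authors_to_csv(rows, fileobj):
    """Write a list of author dicts as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=["LastName", "ForeName", "Affiliation"])
    writer.writeheader()
    writer.writerows(rows)


# Stand-in for `data`; the real list comes from the comprehension above.
rows = [{"Affiliation": "Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.",
         "ForeName": "Long", "LastName": "Tian"}]
buffer = io.StringIO()
authors_to_csv(rows, buffer)
print(buffer.getvalue())

# Writing to disk instead; newline="" avoids blank lines between rows on Windows:
# with open("authors.csv", "w", newline="", encoding="utf-8") as f:
#     authors_to_csv(data, f)
```

Affiliations containing commas are quoted automatically, so the file opens cleanly in a spreadsheet.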

1 Comment

Sorry for the delayed response, and thanks for your help. I changed .xml to .py and added the code, and now I get: SyntaxError: Non-UTF-8 code starting with '\xef' in file c:/Users/jeeva/Desktop/data/data.py on line 37561, but no encoding declared; see python.org/dev/peps/pep-0263 for details. Sorry to bother you; I am new to coding and find this error hard to understand. Could you help me with it?
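Renaming data.xml to a .py file (or pasting its contents into a script) makes Python decode the whole file as source code, and source files must be valid UTF-8 unless an encoding is declared (PEP 263). The inline string in the answer above only stands in for the file; with real data, keep data.xml as it is and load it from a separate script with ET.parse('data.xml'). The error message does reveal that part of the file is not valid UTF-8. A minimal sketch for locating the first such line, assuming the file is meant to be UTF-8; first_bad_utf8 is a hypothetical helper, and it reads line by line so a large file never has to fit in memory at once:

```python
import io


def first_bad_utf8(lines):
    """Return (line number, byte offset in that line) of the first invalid UTF-8, or None."""
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            return lineno, err.start
    return None


print(first_bad_utf8(io.BytesIO("<Affiliation>© 2020</Affiliation>\n".encode("utf-8"))))  # None
print(first_bad_utf8(io.BytesIO(b"<a>\n<b>\xef\x28</b>\n")))  # (2, 3)

# Against the real file, opened in binary mode:
# with open("data.xml", "rb") as f:
#     print(first_bad_utf8(f))
```

Splitting on newlines is safe here because a newline byte never occurs inside a multi-byte UTF-8 sequence.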
