I am parsing large projects with many thousands of XML files for specific elements and attributes. I have managed to print all the elements and attributes I want, but I cannot write them to a CSV table. Ideally, every occurrence of every element/attribute would appear under its respective header. The problem is that I get "NameError: name 'X' is not defined" and I do not know how to restructure the code; everything seemed to work fine with my variables until I tried writing them to the CSV.
from logging import root
import xml.etree.ElementTree as ET
import csv
import os
path = r'C:\Users\briefe\V'
f = open('jp-elements.csv', 'w', encoding="utf-8")
writer = csv.writer(f)
writer.writerow(["Note", "Supplied", "@Certainty", "@Source"])
#opening files in folder for project
for filename in os.listdir(path):
    if filename.endswith(".xml"):
        fullpath = os.path.join(path, filename)
        #getting the root of each file as my starting point
        for file in fullpath:
            tree = ET.parse(fullpath)
            root = tree.getroot()
            try:
                for note in root.findall('.//note'):
                    notes = note.attrib, note.text
                for supplied in root.findall(".//supplied"):
                    print(supplied.attrib)
                    for suppliedChild in supplied.findall(".//*"):
                        supplies = suppliedChild.tag, suppliedChild.attrib
                #attribute search
                for responsibility in root.findall(".//*[@resp]"):
                    responsibilities = responsibility.tag, responsibility.attrib, responsibility.text
                for certainty in root.findall(".//*[@cert]"):
                    certainties = certainty.tag, certainty.attrib, certainty.text
                writer.writerow([notes, supplies, responsibilities, certainties])
            finally:
                f.close()
As was kindly advised, I am trying to save results that look like this:
{http://www.tei-c.org/ns/1.0}add {'resp': '#MB', 'status': 'unremarkable'} Nach H gedruckt IV. Abt., V, Anhang Nr.
10.
{http://www.tei-c.org/ns/1.0}date {'cert': 'medium', 'when': '1805-04-09'} 9. April 1805
I am trying to save these mixtures of tuples and dictionary items as strings in CSV fields, but I get, for example, "NameError: name 'notes' is not defined".
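One likely cause, sketched below under the assumption that the files declare the TEI default namespace (as the printed tags above suggest): an unqualified search such as `.//note` finds nothing, so a variable assigned only inside that loop is never bound. The inline `sample` string and the `tei` prefix are made up for illustration; they are not taken from the project files.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents use the TEI default namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny stand-in document, invented for this illustration.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div type="letter">
    <note type="ig">Kopie</note>
    <p>wenn <name cert="high" type="reg">Baucis</name></p>
  </div></body></text>
</TEI>"""

root = ET.fromstring(sample)

# Without the namespace the tag does not match, so a loop over this
# result never runs and anything assigned inside it stays undefined.
unqualified = root.findall(".//note")

# With a prefix mapping the same element is found.
qualified = root.findall(".//tei:note", TEI_NS)

# Collecting into a list means the name exists even when nothing matches.
notes = [(n.attrib, n.text) for n in qualified]

print(len(unqualified))  # 0
print(len(qualified))    # 1
print(notes)             # [({'type': 'ig'}, 'Kopie')]
```

The attribute searches (`.//*[@resp]`, `.//*[@cert]`) are not affected in the same way, because `*` matches elements regardless of namespace.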
XML code example:
<?xml version="1.0" encoding="UTF-8"?><TEI xmlns="http://www.tei-c.org/ns/1.0" type="letter" xml:id="V_100">
<teiHeader>
</teiHeader>
<text>
<body>
<div type="letter">
<note type="ig">Kopie</note>
<p>Erlauben Sie mir, in Ihre Ehrenpforte noch einige Zwick<lb xml:id="V_39-7" rendition="#hyphen"/>steinchen einzuschieben. Philemon und Baucis müssen —
wenn<note corresp="#V_39-8">
<listPerson type="lineReference">
<person corresp="#JP-000228">
<persName>
<name cert="high" type="reg">Baucis</name>
</persName>
</person>
<person corresp="#JP-003214" ana="†">
<persName>
<name cert="low" type="reg">Philemon</name>
</persName>
</person>
</listPerson>
<p>
<hi rendition="#aq">Der Brief ist vielleicht nicht an den Minister Hardenberg
gerichtet,<lb/>
</p>
<lb/>
</div>
</body>
</text>
</TEI>
attrib) to every row. Ideally, scalar strings/numbers should be saved to every CSV row.

<choice> <sic>cheesemakers</sic> <corr resp="#editor" cert="high">peacemakers</corr> </choice>: for they shall be called the children of God.

It is all TEI-conformant, but I am looking for many different elements and attributes. I just want the key info (tag, attributes and text) all added as a string to a CSV field.

notes is never assigned in an iteration when its for loop retrieves nothing. The results indicate the XML may have namespaces, which can vary by element. Because of namespaces, always post at least the root of the XML. Your snippet can be anywhere in the document.
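Putting those points together, here is a minimal sketch of one way to restructure the script. It is not the only option, and it rests on several assumptions: the TEI namespace URI is the one shown in the printed tags, the helper names (`describe`, `collect_columns`, `write_elements_csv`) and the column headers are made up for illustration, and each CSV field holds tag, attributes and text joined into one string. Hits are gathered into lists first, so an empty search yields an empty column rather than an undefined name, and `zip_longest` pads the shorter columns.

```python
import csv
import os
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Assumption: all files use the TEI default namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def describe(elem):
    """Flatten one element into a single string: tag, attributes, text."""
    text = " ".join((elem.text or "").split())  # collapse line breaks
    return f"{elem.tag} {elem.attrib} {text}".strip()


def collect_columns(root):
    """Return four lists of strings, one per CSV column (any may be empty)."""
    notes = [describe(e) for e in root.findall(".//tei:note", TEI_NS)]
    supplied = [
        describe(child)
        for s in root.findall(".//tei:supplied", TEI_NS)
        for child in s.findall(".//*")  # every descendant of <supplied>
    ]
    resp = [describe(e) for e in root.findall(".//*[@resp]")]
    cert = [describe(e) for e in root.findall(".//*[@cert]")]
    return notes, supplied, resp, cert


def write_elements_csv(folder, out_path):
    """Parse every .xml file in folder and write all hits to one CSV."""
    # The with-block closes the file once, after all files are processed.
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["File", "Note", "Supplied", "@Resp", "@Cert"])
        for filename in sorted(os.listdir(folder)):
            if not filename.endswith(".xml"):
                continue
            root = ET.parse(os.path.join(folder, filename)).getroot()
            # Pad shorter columns with "" so every hit gets a row.
            for row in zip_longest(*collect_columns(root), fillvalue=""):
                writer.writerow([filename, *row])


# Example call (paths taken from the question):
# write_elements_csv(r'C:\Users\briefe\V', 'jp-elements.csv')
```

Each file is parsed exactly once here; the original `for file in fullpath:` loop iterated over the characters of the path string. The sample XML posted above is not well-formed as shown, so `ET.parse` would raise a `ParseError` on it; the real files are assumed to parse.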