I have a CSV file in which one of the columns contains XML. I'd like to parse that XML into separate columns and save the result. I am trying to do this with Python, but I am not having much luck. I have looked at similar questions on Stack Exchange, but I still don't know how to approach it.

Thank you for your help in advance!

K

  • Try extracting the data from the XML with BeautifulSoup (BS4) and writing it to new columns. Commented Jan 11, 2018 at 20:10
  • Welcome to SO. Please take the tour, read How do I ask a good question?, and edit your question to provide your code, including sample input, expected and actual output, and error messages. Commented Jan 11, 2018 at 20:10
  • @Piinthesky your comment was unnecessary. From the link you shared: "Not all questions benefit from including code. But if your problem is with code you've written, you should include some." User is asking how to parse XML; including code about how they have read csv files probably isn't going to add anything of value, and, even if it did, the question is not necessarily about a specific problem with specific code. Commented Jan 11, 2018 at 21:10

1 Answer

ElementTree is an XML parser in the Python standard library (https://docs.python.org/2/library/xml.etree.elementtree.html).

Parse the XML in each CSV cell as a string, then iterate over the resulting elements and write their values back out:

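from xml.etree.ElementTree import XML

# Replace this literal with the variable that holds the XML string
# read from your CSV cell.
xml_string = '''
<root>
  <group>
    <child id="a">This is child "a".</child>
    <child id="b">This is child "b".</child>
  </group>
  <group>
    <child id="c">This is child "c".</child>
  </group>
</root>
'''

# XML() parses a string and returns the root Element.
parsed = XML(xml_string)
print('parsed =', parsed)

# Iterating over an Element yields its direct children only (here, the
# two <group> elements); use parsed.iter() to walk every descendant.
for elem in parsed:
    print(elem.tag)
    # Text directly inside the element, before its first child.
    if elem.text is not None and elem.text.strip():
        print('  text: "%s"' % elem.text)
    # Text that follows the element's closing tag.
    if elem.tail is not None and elem.tail.strip():
        print('  tail: "%s"' % elem.tail)
    for name, value in sorted(elem.attrib.items()):
        print('  %-4s = "%s"' % (name, value))
    print()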

Source: https://pymotw.com/2/xml/etree/ElementTree/parse.html#parsing-strings
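To connect this back to the CSV file, here is a minimal end-to-end sketch using only the standard library. Everything specific in it is an assumption, since the question does not show the data: the column name `details`, the sample rows, and the helper name `expand_xml_column` are all hypothetical, and it assumes each XML cell is a flat record whose direct children each become one column.

```python
import csv
import io
from xml.etree import ElementTree as ET


def expand_xml_column(rows, xml_column):
    """Replace one XML-valued column with one column per child element.

    rows is an iterable of dicts (for example from csv.DictReader);
    xml_column is the name of the column holding the XML string.
    Returns the list of output column names and the expanded rows.
    """
    expanded = []
    fieldnames = []
    for row in rows:
        row = dict(row)
        xml_text = row.pop(xml_column, "") or ""
        if xml_text.strip():
            root = ET.fromstring(xml_text)
            # Assumption: each direct child of the root is one flat field.
            for child in root:
                row[child.tag] = (child.text or "").strip()
        # Record column names in the order they are first seen.
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
        expanded.append(row)
    return fieldnames, expanded


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "id,details\n"
    "1,<item><name>Widget</name><price>9.99</price></item>\n"
    "2,<item><name>Gadget</name><color>red</color></item>\n"
)

fieldnames, rows = expand_xml_column(csv.DictReader(sample), "details")

# Write the expanded rows; restval fills columns a row does not have.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
print(output.getvalue())
```

For a real file, you would swap the `io.StringIO` objects for `open(path, newline="")` handles. Nested XML or repeated tags would need a different flattening rule, and a malformed cell raises `xml.etree.ElementTree.ParseError`, so you may want to catch that per row.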

Alternatively, you can convert the XML directly to CSV, as described here:

http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-xml-to-csv-using-python/
