2

The link below gives us the list of ingredients in recipelist. I would like to extract the names of the ingredient and save it to another file using python. http://stream.massey.ac.nz/file.php/6087/Eva_Material/Tutorials/recipebook.xml

So far I have tried using the following code, but it gives me the complete recipe not the names of the ingredients:

from xml.sax.handler import ContentHandler
import xml.sax
import sys
def recipeBook(): 
    path = "C:\Users\user\Desktop"
    basename = "recipebook.xml"
    filename = path+"\\"+basename
    file=open(filename,"rt")
    # find contents 
    contents = file.read()

    class textHandler(ContentHandler):
      def characters(self, ch):
      sys.stdout.write(ch.encode("Latin-1"))
    parser = xml.sax.make_parser()
    handler = textHandler( )
    parser.setContentHandler(handler)
    parser.parse("C:\Users\user\Desktop\\recipebook.xml")



  file.close()

How do I extract the name of each ingredient and save them to another file?

3
  • 1
    Unrelated: You should look at os.path and raw strings rather than dealing with windows paths like that. Commented May 7, 2012 at 1:35
  • 1
    Put the XML string as example in the question; it is easier to give specific answer when we can see what data you are working with. Your URL is inaccessible to those without a Massey ID. Also consider navigating the XML tree using ElementTree. Commented May 7, 2012 at 4:28
  • 2
    PS: This answer, using ElementTree should get you there. Commented May 7, 2012 at 4:58

3 Answers 3

3

@Neha

I guess you have solved your request by now, here is a little piece I put together using the tutorial at http://lxml.de/tutorial.html. The XML file is saved in 'rough_data.xml'

import xml.etree.cElementTree as etree

xmlDoc = open('rough_data.xml', 'r')
xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)

for ingredient in xmlDocTree.iter('ingredient'):
    print ingredient[0].text

To all experienced Python programmers reading this, kindly improve this "newbie" code.

Note: The lxml package looks very good, it definitely worth using. Thanks

Sign up to request clarification or add additional context in comments.

Comments

1

Please place the relevant XML text in order to receive a proper answer. Also please consider using lxml for anything xml specific (including html) .

try this :

from lxml import etree

tree=etree.parse("your xml here")
all_recipes=tree.xpath('./recipebook/recipe')
recipe_names=[x.xpath('recipe_name/text()') for x in all_recipes]
ingredients=[x.getparent().xpath('../ingredient_list/ingredients') for x in recipe_names]
ingredient_names=[x.xpath('ingredient_name/text()') for x in ingredients]

Here is the beginning only, but i think you get the idea from here -> get the parent from each ingredient_name and the ingredients/quantities from there and so on. You can't really do any other kind of search i think due to the structured nature of the document.

you can read more on [www.lxml.de]

2 Comments

<recipebook> <recipe> <recipe_name>Omelette</recipe_name> <ingredients_list> <ingredient> <name>eggs</name> <quantity>2</quantity> </ingredient> <ingredient> <name>milk</name> <quantity>50</quantity> <unit>ml</unit> </ingredient> <ingredient> <name>tomato</name> <quantity>1</quantity> </ingredient> <ingredient> <name>oil</name> <quantity>1</quantity> <unit>ts</unit> </ingredient> </ingredients_list> </recipe> <recipe> <recipe_name>Sandwich</recipe_name> <ingredients_list> <ingredient> <name>bread</name> <quantity>2</quantity> <unit>slices</unit> </ingredient> <ingredient> <name>butter</name>
thats the xml file I am working on and i just need the ingredients from the file i.e. milk, eggs etc. i need to extract the ingredients and write them to another file. thanks
0

Some time ago I made a series of screencasts that explain how to collect data from sites. The code is in Python and there are 2 videos about XML parsing with the lxml library. All the videos are posted here: http://railean.net/index.php/2012/01/27/fortune-cowsay-python-video-tutorial

The ones you want are:

  • XPath experiments and query examples
  • Python and LXML, examples of XPath queries with Python
  • Automate page retrieving via HTTP and HTML parsing with lxml

You'll learn how to write and test XPath queries, as well as how to run such queries in Python. The examples are straightforward, I hope you'll find them helpful.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.