I want to parse specific content from a website into a MySQL database. For example, from the page http://allrecipes.com/Recipe/Fluffy-Pancakes-2/Detail.aspx I want to extract the recipe into my database, which has a table with the columns RecipeName and Ingredients 1-10.

So basically my database will contain the name and all the ingredients for that recipe. There is no need to edit the content; it can simply be stored as is (e.g. "3/4 cup milk"), since I am using character columns in my database.

How exactly do I go about doing this? I was looking at pre-built parsers, and it seems it's tough to find one that's easy to use, since I am fairly new to programming. Of course, I could enter the values manually, but I want to parse them in.

Would it be possible to just parse this content and write a file containing a RecipeName and an Ingredient string, which I can then load into my database? Or should I write directly into the database? I am also unsure how to connect a parser directly to a database, but I might be able to find some information online.

Basically, I am looking for help on exactly how to go about doing this, since I am not very well versed in programming and this seems a lot more complicated than it probably is.

I am using Java as my main language right now, although I can't say I am very good at it. But I should be able to understand the basic concepts.

Any suggestions on what parser to use or how to do this?

Thanks!

  • Which programming language are you using? PHP? Commented Apr 5, 2011 at 3:43

1 Answer

This is how I would do it in PHP. This is almost certainly NOT the most efficient way to do it, nor has it been debugged.

function parseHTML($rawHTML){
 // Find the start of the ingredients list (a character offset), or false if it is absent.
 $startPosition = strpos($rawHTML,'<div class="ingredients"');
 if ($startPosition === false) { return ''; }
 // Find the first closing </div> after the start of the list.
 $endPosition = strpos($rawHTML,'</div>',$startPosition);
 if ($endPosition === false) { return ''; }
 // substr() takes a length, not an end offset, so subtract the start position.
 $relevantPart = substr($rawHTML,$startPosition,$endPosition - $startPosition);
 // Strip the HTML tags from the ingredients list.
 $parsedString = strip_tags($relevantPart);
 return $parsedString;
}

Still to be done: You say you have a MySQL database with 10 separate ingredient columns. This code outputs everything as one big string. You would have to change the strip_tags($relevantPart) call to strip_tags($relevantPart,"<li>"), which lets the <li> tags through. Then you would have to loop through every <li> element, extracting its text in a similar way to the function above. It shouldn't be too hard, but I don't feel comfortable writing it with no functioning PHP server.
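For what it's worth, the <li> loop described above might look roughly like the sketch below. It is equally untested against the live page. The function name parseIngredients and the $sample markup are made up for illustration, and the class="ingredients" wrapper is assumed from the code above rather than checked against the site. Instead of passing "<li>" to strip_tags, it uses preg_match_all to pull out each <li> element directly, which is a swapped-in shortcut for the same step. Like the original, it stops at the first </div>, so nested divs inside the list would cut it short.

```php
<?php
// Hypothetical helper: returns the text of each <li> inside the ingredients block.
function parseIngredients(string $rawHTML): array
{
    // Assumed wrapper markup, taken from the answer's code, not verified on the page.
    $start = strpos($rawHTML, '<div class="ingredients"');
    if ($start === false) {
        return [];
    }
    $end = strpos($rawHTML, '</div>', $start);
    if ($end === false) {
        return [];
    }
    $block = substr($rawHTML, $start, $end - $start);

    // Capture the contents of every <li>...</li> pair.
    // The "s" flag lets "." match newlines, "i" ignores tag case.
    preg_match_all('/<li\b[^>]*>(.*?)<\/li>/is', $block, $matches);

    $ingredients = [];
    foreach ($matches[1] as $item) {
        // Remove any inner tags, decode entities such as &amp;, and trim whitespace.
        $text = trim(html_entity_decode(strip_tags($item)));
        if ($text !== '') {
            $ingredients[] = $text;
        }
    }
    return $ingredients;
}

// Made-up sample markup for a quick check.
$sample = '<div class="ingredients"><ul>'
        . '<li>3/4 cup milk</li>'
        . '<li>2 tablespoons white vinegar</li>'
        . '</ul></div>';
print_r(parseIngredients($sample));
```

The returned array holds one string per ingredient, so each entry could be written to one of the ten ingredient columns, with any unused columns left empty.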
