PHP query XML with SQL

Question

I am doing some web scraping and have come across several tables of data that I want to query. So far I have:

$url = 'http://finance.yahoo.com/q/op?s=QQQQ&m=2012-04';
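// loadHTMLFile() is an instance method; calling it statically fails on PHP 8
$html = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings for messy HTML instead of hiding them with @
$html->loadHTMLFile($url);
$xml = simplexml_import_dom($html);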
$results = $xml->xpath('//table[@class="yfnc_datamodoutline1"]');
var_dump($results);

Produces results: http://pastebin.com/6p3L2Kcc

This is well-ordered HTML table data, with TH and TD and everything. I'd like to use it like this:

$sql = 'SELECT Last,Open_Int FROM TABLE1 WHERE Last>25 AND Symbol LIKE "%C%"';
$results = $xmltable->sql($sql);
while($result = $results->fetch_assoc())
  echo $result['Last'] . " -- " . $result['Open_Int'] . "\n";

Without any creativity, I can write classes to parse that HTML table, take the first row, create a table in SQLite from it, then select the other rows and turn them into INSERT statements. But do you know a better way to do this, or is there some powerful PHP function that I'm not seeing?

Update: Perhaps the scope here is too big. I'd be happy with a link to a library or advice on getting an HTML table into a (proper) XML table.

Is there a good reason why you're loading the document with BOTH DOMDocument and SimpleXML?
– Mark Eirich, Commented Mar 5, 2011 at 19:44
The "simple" approach I'm referring to is: get data using this method phpro.org/examples/Parse-HTML-With-PHP-And-DOM.html and then insert data in a database. The question is: is there a better way to get it done than that?
– William Entriken, Commented Mar 5, 2011 at 19:46
@Mark: nope, I didn't know simplexml accepted html directly, thank you
– William Entriken, Commented Mar 5, 2011 at 19:48
Yeah, try simplexml_load_file() or simplexml_load_string(). For your actual question, if you really need full SQL ability, then you need to store it in a database. However, do you really need all that SQL provides? If not, you may do well to write your own function to do the queries using XPath.
– Mark Eirich, Commented Mar 5, 2011 at 19:54

Ken Downs · Accepted Answer · 2011-03-14 03:08:29Z

The answer depends on your larger needs. Here are three questions that can flesh those out:

1) How often is the data read vs. written?

2) Do you keep old versions or is only the latest required?

3) Will the data be compared to other data?

In one case let's say the answer to #1 is "many more reads" and the answer to #3 is "yes". In this case it might be well worthwhile to put the XML results into a SQL table for frequent and flexible querying.

However, in another case, let's say the answer to #2 is "no" and the answer to #3 is "no": you just keep the latest retrieval and don't compare it to anything. In this case you can just stick it into a file and retrieve it as needed for display (#1 becomes largely irrelevant).
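For the "keep only the latest retrieval" case, a minimal sketch might look like the following. It assumes the scraped rows have already been turned into a PHP array; the function names `saveLatest` and `loadLatest`, the file name, and the sample row are hypothetical, and JSON is just one convenient on-disk format.

```php
<?php
// Hypothetical helpers: keep only the most recent scrape in a JSON file.

function saveLatest(string $path, array $rows): void
{
    // Overwrite the previous retrieval; LOCK_EX guards against concurrent writers.
    $json = json_encode($rows, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);
    file_put_contents($path, $json, LOCK_EX);
}

function loadLatest(string $path): array
{
    // No file yet means nothing has been retrieved so far.
    if (!is_file($path)) {
        return [];
    }
    return json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);
}

// Usage with made-up sample data standing in for the scraped table rows.
$cacheFile = sys_get_temp_dir() . '/options_latest.json';
saveLatest($cacheFile, [['Symbol' => 'EXAMPLE', 'Last' => '26.10', 'Open_Int' => '120']]);
foreach (loadLatest($cacheFile) as $row) {
    echo $row['Last'] . " -- " . $row['Open_Int'] . "\n";
}
```

This gives no query ability beyond looping over the array, which is the trade-off described above.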

EDIT in response to question in comment: Assuming you want to put it into a database, the display you link to shows a nested set of objects/arrays. You "walk the tree" to peel out the nested objects, strip off their properties and issue individual inserts to the particular tables.
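The "walk the tree and issue inserts" step can be sketched roughly as below for a single flat HTML table. This is one possible approach under stated assumptions, not a definitive implementation: it uses an in-memory SQLite database through PDO (which requires the pdo_sqlite extension), the helper name `htmlTableToSqlite` is hypothetical, and the inline markup is made-up sample data standing in for the scraped page.

```php
<?php
// Made-up sample markup standing in for the scraped table.
$html = <<<HTML
<table class="yfnc_datamodoutline1">
  <tr><th>Symbol</th><th>Last</th><th>Open Int</th></tr>
  <tr><td>QQQQ120421C00050000</td><td>26.10</td><td>120</td></tr>
  <tr><td>QQQQ120421P00050000</td><td>0.45</td><td>300</td></tr>
  <tr><td>QQQQ120421C00060000</td><td>12.30</td><td>75</td></tr>
</table>
HTML;

// Hypothetical helper: copy one HTML table into an in-memory SQLite table.
function htmlTableToSqlite(string $html, string $tableXPath, string $tableName): PDO
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $table = $xpath->query($tableXPath)->item(0);
    if ($table === null) {
        throw new RuntimeException('No table matched the XPath expression');
    }

    // The TH cells become column names ("Open Int" -> "Open_Int").
    $columns = [];
    foreach ($xpath->query('.//tr[th][1]/th', $table) as $th) {
        $columns[] = preg_replace('/\W+/', '_', trim($th->textContent));
    }

    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // NUMERIC affinity stores number-like text as numbers, so "Last > 25"
    // compares numerically; other text (e.g. Symbol) stays text.
    $defs = array_map(fn($c) => '"' . $c . '" NUMERIC', $columns);
    $pdo->exec('CREATE TABLE "' . $tableName . '" (' . implode(', ', $defs) . ')');

    $marks  = implode(', ', array_fill(0, count($columns), '?'));
    $insert = $pdo->prepare('INSERT INTO "' . $tableName . '" VALUES (' . $marks . ')');

    // Each row of TD cells becomes one INSERT.
    foreach ($xpath->query('.//tr[td]', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if (count($cells) === count($columns)) {   // skip ragged rows
            $insert->execute($cells);
        }
    }
    return $pdo;
}

$pdo  = htmlTableToSqlite($html, '//table[@class="yfnc_datamodoutline1"]', 'TABLE1');
$stmt = $pdo->query("SELECT Last, Open_Int FROM TABLE1 WHERE Last > 25 AND Symbol LIKE '%C%'");
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['Last'] . " -- " . $row['Open_Int'] . "\n";
}
```

Nested structures would need one such pass per table; this sketch only handles a single header row followed by data rows.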

edited Mar 14, 2011 at 3:08

answered Mar 5, 2011 at 19:21


1 Comment

William Entriken Over a year ago

I agree with your response. However, what you are describing is using the data after it is in a usable format. My question is asking how to get the data (currently in HTML, scraped from the web) into a useful format.
