I am very new to programming and need a little help with getting data from a website and passing it into my PHP script.

The website is http://www.birthdatabase.com/.

I would like to plug in a name (First and Last) and retrieve the result. I know you can query the site by passing the name in the URL, but I am having problems scraping the results.

http://www.birthdatabase.com/cgi-bin/query.pl?textfield=FIRST&textfield2=LAST&age=&affid=

I am using the file_get_contents($URL) function to get the page but need help after that. Specifically, I would like to scrape only the results from a certain state if there are multiple results for that name.

Thanks for your help.

3 Comments
  • It's working for me. Refer to: code.google.com/p/php-html2array/downloads/… Commented Mar 24, 2013 at 17:24
  • I've tried using preg_match, but I'm not sure if that's the best way. Commented Mar 24, 2013 at 17:25
  • How do I use that HTML parser? Commented Mar 24, 2013 at 17:28

1 Answer

You need the awesome simple_html_dom class.

With this class you can query the webpage's DOM in a similar way to jQuery.

First include the class in your page, then get the page content with this snippet:

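// urlencode() keeps spaces and special characters in the names from breaking the query string
$html = file_get_html('http://www.birthdatabase.com/cgi-bin/query.pl?textfield=' . urlencode($first) . '&textfield2=' . urlencode($last) . '&age=&affid=');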

Then you can use CSS selectors to scrape your data (something like this):

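$row = array();
$n = 0;
foreach ($html->find('table tbody tr td div font b table tbody') as $element) {
    // find('tr') alone returns an array; index 0 selects the first matching row
    $tr = $element->find('tr', 0);
    if ($tr !== null) {
        $row[$n]['tr'] = $tr->plaintext;
        $n++;
    }
}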

// output your data
print_r($row);
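
To keep only the results for one state and skip the ad tables, one option is to select rows by their shape and then compare the state cell. The sketch below uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, so it runs without extra files. The sample markup, the three-column layout (name, birthday, state) and the rowsForState() helper are all assumptions made for illustration; check the real page source and adjust the XPath and the column index to match.

```php
<?php
// Hypothetical sample markup standing in for the real result page.
// The actual table layout on birthdatabase.com is NOT known here.
$sampleHtml = <<<HTML
<html><body>
<table><tr><td>Advertisement</td></tr></table>
<table>
  <tr><th>Name</th><th>Birthday</th><th>State</th></tr>
  <tr><td>FIRST LAST</td><td>1970-01-15</td><td>TX</td></tr>
  <tr><td>FIRST LAST</td><td>1982-06-03</td><td>CA</td></tr>
  <tr><td>FIRST LAST</td><td>1991-11-20</td><td>TX</td></tr>
</table>
</body></html>
HTML;

/**
 * Hypothetical helper: returns every three-cell row (as an array of cell
 * text) whose third cell matches $state, ignoring case.
 */
function rowsForState(string $html, string $state): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $matches = array();

    // Rows with exactly three <td> cells: this skips the one-cell ad table
    // and the header row, which uses <th> cells.
    foreach ($xpath->query('//tr[count(td) = 3]') as $tr) {
        $cells = array();
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Assumed column order: 0 = name, 1 = birthday, 2 = state.
        if (strcasecmp($cells[2], $state) === 0) {
            $matches[] = $cells;
        }
    }
    return $matches;
}

$texasRows = rowsForState($sampleHtml, 'TX');
print_r($texasRows);
```

With the live site you would pass the string returned by file_get_contents($URL) in place of $sampleHtml. Selecting rows by cell count rather than by a long tag path tends to survive small layout changes, but it only works if the result rows really have a distinctive number of cells.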
8 Comments

Thanks for the help. This class definitely looks like what I need. The output from birthdatabase.com contains multiple tables, and there are no unique tags to scrape. I hope I'm using the right terminology. I am such a noob at all of this, so any help would be appreciated.
I'm not from the US, so I don't know the answer to this, but could you use the ZIP code to force the state you want somehow?
The states are listed as part of the output. I could probably search for the appropriate values in the array and then output the corresponding birthdays. I guess my question is still about getting that array in the first place. The output of the database has multiple tables that contain ads and other nonsense that I don't want in the array. How do I get around that?
Otherwise you will need to either find a way to make them un-paginate the data or make MANY requests to their server to get all the items, then loop and filter based on the state column.
I have amended my answer with a better CSS DOM path.