Is it possible to use PHP to connect to a URL such as http://en.wikipedia.org/wiki/Wiki and extract every word that starts with a prefix like "Exa" or "ins", so that the resulting PHP page prints all the words it found? For example, with "Exa", the word "Example" would be printed each time an instance of "Example" is found. The same goes for words that start with "ins".
Your question is very broad, and almost impossible to answer in a post. Consider breaking this task down into chunks and working on each one separately, and asking for help as necessary. – eykanal, May 9, 2011 at 18:12
Also, just an FYI: you'll want to check if accessing a website via PHP is against their terms/conditions. – sdleihssirhc, May 9, 2011 at 18:14
4 Answers
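$url = 'http://en.wikipedia.org/wiki/Wiki';
// Fetch the page and drop the HTML tags so attribute values are not matched.
$data = strip_tags(file_get_contents($url));
$matches = array();
// \b anchors each prefix at the start of a word; \w* consumes the rest of the word.
preg_match_all('/\b(?:Exa|ins)\w*/', $data, $matches);
foreach ($matches[0] as $match) {
    echo "Match: '".$match."'\r\n";
}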
Probably something like this, though I'm not so sure about the regex; I haven't tested it yet...
Edit: I changed it, so it should work now (\B => \b, and strip_tags to prevent HTML classes from being matched).
I don't have a full answer with example to give you, but yes, you should be able to read the whole page into a string variable and then do normal string operations on it. It will read in all the HTML, so you will probably need to do a lot of regex to eliminate tags if you don't want them.
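As a rough illustration of the tag clean-up described above, here is a minimal sketch. The helper name html_to_text and the sample markup are made up for this example. It assumes that script and style contents should be discarded, which strip_tags alone does not do, since it removes only the tags and keeps the text between them.

```php
<?php
// Hypothetical helper (not from the original answer): reduce HTML to plain text.
function html_to_text(string $html): string
{
    // Drop <script> and <style> elements together with their contents;
    // strip_tags() alone would keep the code or CSS inside them.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before each tag so adjacent words do not fuse together.
    $html = str_replace('<', ' <', $html);
    // Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up sample markup for the demonstration.
$sample = '<html><head><style>p { color: red; }</style></head>'
        . '<body><p>An <b>Example</b> of text.</p><script>var ins = 1;</script></body></html>';
echo html_to_text($sample), "\n"; // prints: An Example of text.
```

For real-world pages a proper HTML parser is likely to be more robust than regular expressions; this is only meant to show the idea.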
Read the page into a string using file_get_contents. Use one of the various string functions to examine the page.
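One way this could look with plain string functions is sketched below. The function name words_with_prefix and the sample sentence are assumptions for illustration. In practice $text would come from file_get_contents on the target URL. Note that str_word_count is byte-based and locale-dependent, so it may split non-ASCII words unexpectedly.

```php
<?php
// Hypothetical helper: return every word in $text that starts with one of $prefixes.
function words_with_prefix(string $text, array $prefixes): array
{
    $found = [];
    // With format 1, str_word_count() returns the words as an array.
    foreach (str_word_count($text, 1) as $word) {
        foreach ($prefixes as $prefix) {
            // strncmp() returns 0 when the first strlen($prefix) bytes are equal
            // (the comparison is case-sensitive).
            if (strncmp($word, $prefix, strlen($prefix)) === 0) {
                $found[] = $word;
                break; // do not record the same word twice
            }
        }
    }
    return $found;
}

// Made-up sample text standing in for the downloaded page.
$text = 'Example one: insert the instance. Another Example inside.';
print_r(words_with_prefix($text, ['Exa', 'ins']));
// Finds: Example, insert, instance, Example, inside
```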
Yes, this is possible. A potential approach would be to:
1. Use something like fopen (if allow_url_fopen is enabled; failing that, use cURL) to grab the external web page content.
2. Remove the (presumably not required) HTML tags via strip_tags.
3. Use strtok to tokenise and iterate over the remaining content, checking for whatever conditions you require.
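The steps above could be sketched roughly as follows. The helper names fetch_page and tokens_with_prefix, the delimiter set, and the sample markup are all assumptions made for this example. The sketch uses file_get_contents rather than fopen for brevity; both rely on allow_url_fopen.

```php
<?php
// Hypothetical helper: fetch a URL via the URL wrappers, falling back to cURL.
// Returns null when neither mechanism is available or the request fails.
function fetch_page(string $url): ?string
{
    if (ini_get('allow_url_fopen')) {
        $body = @file_get_contents($url);
        return $body === false ? null : $body;
    }
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        $body = curl_exec($ch);
        return $body === false ? null : $body;
    }
    return null;
}

// Hypothetical helper: tokenise $text with strtok() and keep the tokens
// that start with one of $prefixes (case-sensitive).
function tokens_with_prefix(string $text, array $prefixes): array
{
    $found = [];
    $delimiters = " \t\r\n.,;:!?()[]\"'"; // assumed delimiter set; adjust as needed
    $token = strtok($text, $delimiters);
    while ($token !== false) {
        foreach ($prefixes as $prefix) {
            if (strpos($token, $prefix) === 0) {
                $found[] = $token;
                break;
            }
        }
        $token = strtok($delimiters); // continue with the same string
    }
    return $found;
}

// Demonstration on made-up markup; for a live page, pass the result of
// fetch_page('http://en.wikipedia.org/wiki/Wiki') to strip_tags() instead.
$text = strip_tags('<p>Example text, with an <i>instance</i> (inserted) here.</p>');
print_r(tokens_with_prefix($text, ['Exa', 'ins']));
// Finds: Example, instance, inserted
```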