0

Do you know of any PHP function that cleans up HTML code (fetched with cURL) and extracts only the visible text (the text the browser is going to show)? Thanks.

1
  • I'm assuming that you're counting any text in the HTML file, since finding out which text is actually visible would be extremely difficult (it could be hidden with CSS display: none; or simply have something overlaying it). Commented Apr 6, 2011 at 20:26

3 Answers

4

This is harder than you'd think. The obvious simple solution is to run strip_tags() over it, but that only removes the tags and leaves all text content intact, including embedded JavaScript and CSS, as well as the text inside elements that are normally hidden (e.g. by setting display: none on them). You could try some regex magic to filter out the parts you're not interested in, but regular expressions on HTML are generally a bad idea for anything nontrivial. The ultimate solution, I'm afraid, is to use a proper HTML parser and then pull the actual text out of the resulting DOM tree. By the time you've done that properly, you'll be pretty close to implementing a web browser.
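As a rough illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension. The function name visible_text() is made up for this example, and the approach is an assumption about what "good enough" means: it only drops script, style, noscript and template elements and then reads the text of the body. It does not evaluate CSS, so text hidden with display: none still comes through.

```php
<?php
// Hypothetical helper (not a built-in): returns the text content of an HTML
// document, skipping elements whose content a browser never renders as text.
// It does NOT evaluate CSS, so elements hidden via display: none are kept.
function visible_text(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // Note: loadHTML() assumes ISO-8859-1 unless the document declares a charset.
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the matches into an array before modifying the tree.
    $xpath = new DOMXPath($doc);
    $unwanted = iterator_to_array($xpath->query('//script | //style | //noscript | //template'));
    foreach ($unwanted as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Prefer the body so that <title> and other head content is excluded.
    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body !== null ? $body->textContent : $doc->textContent;

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$page = '<html><head><title>T</title><style>p { color: red; }</style></head>'
      . '<body><p>Hello, <b>world</b>!</p><script>var hidden = "secret";</script></body></html>';
$result = visible_text($page);
```

Note that textContent simply concatenates text nodes, so adjacent block elements with no whitespace between them will run together; inserting separators per element would need a tree walk instead.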


2 Comments

Thanks, I'll try to cut out the java and css before running strip_tags() :-)
I hope you mean javascript. Java is an entirely different thing.
1

Take a look at strip_tags():

http://us.php.net/manual/en/function.strip-tags.php
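A quick sketch of what strip_tags() does on its own, using a made-up input string: it removes the tags but keeps everything between them, so the source of script and style blocks ends up in the result.

```php
<?php
// Made-up sample input for illustration.
$html = '<p>Hello</p><script>alert("hi");</script><style>p { color: red; }</style>';

// strip_tags() drops the tags themselves but keeps the text between them,
// which includes the JavaScript and CSS source.
$text = strip_tags($html);
```

So for pages with inline scripts or stylesheets, those blocks would have to be removed before calling it.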

1 Comment

Now I remember why I stopped contributing to this site originally. It only took 30 minutes to rediscover the feeling of frustration caused by over-zealous down-voters.
0

If you're literally just "cleaning up" the code, then a tool like HTML Tidy could be your answer.

Some solutions like this will allow you to pull out plain text and might ease your pain.

However, "full on" parsing is a whole other story and you'd better bone up on your regex.

1 Comment

The strip_tags function with some improvement should work great. Thanks
