
I have an HTML table, generated by another website, that I'm trying to convert to a PHP array.

I cannot convert it using SimpleXML because the markup of the generated table is not valid and causes a lot of errors. I also need to keep some attributes of the table's td elements and remove the others.

What would be the most efficient way of doing this? Or do you know of any PHP class that could help me achieve this?

BTW: What I'm trying to do is convert a school schedule to a PHP array that I will be able to use afterwards.

Here is an example of the data I retrieve: http://paste2.org/p/1869193

Btw, using PHP's strip_tags, I already remove the unnecessary tags such as spans and fonts.

  • Try this stackoverflow.com/questions/292926/…, although it might not work because the HTML is not valid. Commented Jan 15, 2012 at 22:49
  • Thank you! It cleans my HTML, so I might be able to work with this. Commented Jan 15, 2012 at 22:56
  • Great, I'll post an answer as it might be helpful to someone else too. Commented Jan 15, 2012 at 23:01

2 Answers


You can also use PHP's Tidy extension if it is installed (it is enabled by default on some installs). It not only cleans up the HTML, but also lets you traverse the DOM:

http://www.php.net/manual/en/book.tidy.php
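
As a rough sketch of how this could look for the table in the question: the snippet below repairs the markup with `tidy_repair_string()` when the Tidy extension is loaded, then walks the rows with `DOMDocument` (used here instead of Tidy's own node API, purely for brevity). The function name `tableToArray`, the `$keepAttrs` whitelist, and the sample markup are assumptions made for illustration, not something taken from the asker's data.

```php
<?php
// Hypothetical helper: convert a (possibly invalid) HTML table into a PHP array.
// $keepAttrs is an assumed whitelist of td/th attributes worth keeping.
function tableToArray(string $html, array $keepAttrs = ['rowspan', 'colspan']): array
{
    // Repair the markup with Tidy when the extension is available; otherwise
    // fall back on DOMDocument's own tolerance for broken HTML.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string($html, ['output-xhtml' => true], 'utf8');
        if ($repaired !== false) {
            $html = $repaired;
        }
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    // Note: this also picks up rows of nested tables, if there are any.
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            if (!$node instanceof DOMElement
                || !in_array($node->nodeName, ['td', 'th'], true)) {
                continue; // skip whitespace text nodes and other elements
            }
            // Keep only the whitelisted attributes, drop everything else.
            $attrs = [];
            foreach ($keepAttrs as $name) {
                if ($node->hasAttribute($name)) {
                    $attrs[$name] = $node->getAttribute($name);
                }
            }
            $cells[] = ['text' => trim($node->textContent), 'attrs' => $attrs];
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample with unclosed <td> tags and unquoted attributes.
$sample = '<table><tr><td rowspan=2 bgcolor=red>Math<td>Room 12</tr>'
        . '<tr><td>Physics</table>';
$schedule = tableToArray($sample);
```

Each entry of `$schedule` is one table row, and each cell is an array with a `text` key and an `attrs` key holding only the whitelisted attributes, so attributes like `bgcolor` are dropped while `rowspan` survives.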



You can find a list of HTML parsers in the answers to the following question on SO: Robust and Mature HTML Parser for PHP

1 Comment

I'm not really sure it is even possible to build a parser that fixes HTML before parsing it. I think your best bet is to fix the HTML yourself before feeding it to any parser.
