I want to store HTML text in a database, split into individual characters. Since the text is long and the process runs frequently, performance is particularly important. I therefore need an efficient way to do this in PHP without the overhead of building multiple arrays.

The input is simple text with a few HTML markup tags and no nested nodes; it could equally be BBCode or something similar. I just want to be able to skip certain tags during the split.

Example:

$html='This <i>is</i> a <strong>test</strong>!';

This string should be stored in a MySQL database as:

id  character  html_tag
1    T
2    h
3    i
4    s
5
6    i          italic
7    s          italic
8
9    a
10
11   t          strong
12   e          strong
13   s          strong
14   t          strong
15   !

How can I capture the individual characters without their corresponding HTML tags?

  • How do you plan on storing <strong>nested <em>tags</em> </strong>? But if performance and correctness are an issue, go with XMLReader. Commented Dec 14, 2012 at 20:14
  • @Wrikken 1. Once there is a practical solution, it can be extended to child nodes too. 2. I am talking about simple HTML text (consider even BBCode); otherwise it would be impossible to do this with nested DIVs. Commented Dec 14, 2012 at 20:46
  • What you are doing is parsing HTML. You need to use an HTML parser to do this. See htmlparsing.com/php.html for examples and pointers to libraries. See also stackoverflow.com/questions/292926/… Commented Dec 14, 2012 at 20:56
  • @AndyLester No, it is not parsing HTML. As I edited the question, it could apply to something other than HTML. It is just a way to skip some tags during the split. Commented Dec 14, 2012 at 20:59
  • @AndyLester Regarding the tags: yes, it is connected with MySQL, since this could even be done with MySQL functions, as the result is to be stored in the database. It is just easier to do in PHP, but that is not a requirement. Commented Dec 14, 2012 at 21:02

1 Answer

Parse the HTML with XMLReader, which is fast.

This code also works with nested tags: the $tags variable is a stack of the currently open tags. Here I always echo the most deeply nested tag, which is the last one on the stack.

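$html = 'This <i>is</i> a <strong>test</strong>!';

$reader = new XMLReader();
// Wrap the fragment in a root element so it forms a well-formed XML document
$reader->XML('<root>' . $html . '</root>');
// Skip the root node
$reader->read();
// Stack of open tags; '' means the text is outside any tag
$tags = array('');
while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            // Self-closing elements such as <br/> produce no END_ELEMENT, so do not push them
            if (!$reader->isEmptyElement) {
                $tags[] = $reader->name;
            }
            break;
        case XMLReader::END_ELEMENT:
            array_pop($tags);
            break;
        default:
            // Text nodes. Note: strlen() and [] work on bytes, not multibyte characters
            $text = $reader->value;
            $length = strlen($text);
            for ($i = 0; $i < $length; $i++) {
                // your insert sql here
                echo "<br/>'" . $text[$i] . "' " . end($tags);
            }
    }
}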
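
Since the question asks to avoid building intermediate arrays, the same loop can be sketched as a generator that hands out one character at a time. This is only an illustration, not part of the original answer: the function name splitCharacters is hypothetical, it assumes PHP 7.1 or later, UTF-8 input, and a fragment that is well-formed XML once wrapped in a root element. It splits with preg_split and the u flag, so multibyte characters stay intact.

```php
<?php
// Hypothetical helper (name is an assumption): yields [character, tag] pairs
// lazily instead of collecting them into an array first.
function splitCharacters(string $html): Generator
{
    $reader = new XMLReader();
    // Assumes $html is well-formed XML once wrapped in a single root element.
    $reader->XML('<root>' . $html . '</root>');
    $reader->read();   // move onto <root> itself
    $tags = [''];      // stack of open tags; '' means "outside any tag"

    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Self-closing elements never emit END_ELEMENT, so skip them.
                if (!$reader->isEmptyElement) {
                    $tags[] = $reader->name;
                }
                break;
            case XMLReader::END_ELEMENT:
                array_pop($tags);
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                // The 'u' flag makes the split UTF-8 aware (characters, not bytes).
                $chars = preg_split('//u', $reader->value, -1, PREG_SPLIT_NO_EMPTY);
                foreach ($chars as $char) {
                    yield [$char, end($tags)];
                }
                break;
        }
    }
    $reader->close();
}

// Usage: consume the pairs one by one, e.g. to feed an INSERT.
foreach (splitCharacters('This <i>is</i>!') as [$char, $tag]) {
    printf("'%s' %s\n", $char, $tag);
}
```

Each text node is still split into a short temporary array, but the full list of rows is never held in memory at once.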

Also, because speed is crucial, consider buffering the inserts into a single string and running them as one batch:

INSERT INTO tname (`character`, html_tag) VALUES ('T',''),('h','');
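
One way to build such a batch without concatenating raw values into the SQL is to generate placeholders and bind the values. The sketch below is an assumption-laden illustration rather than the answer's own code: buildBatchInsert is a hypothetical name, the table and column names are taken from the example above, and it assumes PHP 7.1 or later. The column is backtick-quoted because CHARACTER is a reserved word in MySQL.

```php
<?php
// Hypothetical helper (name is an assumption): turns [[char, tag], ...]
// into one multi-row INSERT statement plus a flat parameter list.
function buildBatchInsert(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert.');
    }

    // One "(?,?)" group per row, e.g. "(?,?),(?,?)" for two rows.
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?)'));
    $sql = 'INSERT INTO tname (`character`, html_tag) VALUES ' . $placeholders;

    // Flatten the pairs into [char, tag, char, tag, ...] to match the placeholders.
    $params = [];
    foreach ($rows as [$char, $tag]) {
        $params[] = $char;
        $params[] = $tag;
    }

    return [$sql, $params];
}

[$sql, $params] = buildBatchInsert([['T', ''], ['h', ''], ['i', 'i']]);

// With an existing PDO connection in $pdo (not created here), the batch
// could then be run as: $pdo->prepare($sql)->execute($params);
```

For very long texts it is probably worth flushing in chunks of a few hundred rows, since the server limits both the statement size and the number of placeholders.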