I have an XML file composed of blocks, lines, words, and characters:

 <block id="48" left="2532" top="108" right="2896" bottom="137">
    <line id="49" left="2532" top="108" right="2896" bottom="137">
        <word id="50" left="2532" top="108" right="2616" bottom="137" value='Date&quot;d&apos;' confidence="69" font="MP" type="-1">
            <char id="51" left="2532" top="115" right="2550" bottom="137" value="D" confidence="92" />
            <char id="52" left="2551" top="120" right="2565" bottom="137" value="a" confidence="51" />
            <char id="53" left="2566" top="116" right="2574" bottom="137" value="t" confidence="33" />
            <char id="54" left="2574" top="120" right="2589" bottom="136" value="e" confidence="100" />
            <char id="55" left="2589" top="108" right="2592" bottom="112" value='&quot;' confidence="39" />
            <char id="56" left="2597" top="115" right="2611" bottom="136" value="d" confidence="76" />
            <char id="57" left="2612" top="115" right="2616" bottom="123" value="&apos;" confidence="100" />
        </word>
    </line>
 </block>

  • Every Block is composed of 1,...,n Lines
  • Every line is composed of 1,...,k words
  • Every word is composed of 1,...,l characters

I am trying to create objects as follows:

Block(int top, int left, int bottom, int right, vector<Line>)
Line(int top, int left, int bottom, int right, vector<Word>)
Word(int top, int left, int bottom, int right, vector<Char>)
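
As a rough illustration of the nesting described above, the hierarchy could be modelled with plain value types that hold their children in vectors. This is only a sketch: `Box`, the field names, and `countChars` are hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bounding box, in the same order as the constructors above.
struct Box { int top, left, bottom, right; };

// Each level owns its children by value, so no manual delete is needed.
struct Char  { Box box; std::string value; };
struct Word  { Box box; std::vector<Char> chars; };
struct Line  { Box box; std::vector<Word> words; };
struct Block { Box box; std::vector<Line> lines; };

// Counts every character stored under a block by walking the nested vectors.
inline std::size_t countChars(const Block& block) {
    std::size_t total = 0;
    for (const Line& line : block.lines)
        for (const Word& word : line.words)
            total += word.chars.size();
    return total;
}
```

With this layout, a block is complete once its `lines` vector has been filled, and each line is complete once its `words` vector has been filled, which is the linking the question is asking about.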

I am using TinyXML in C++, but I can't link the objects together. My code can only handle one object (block, line, word, or character) at a time.

void Keywords::checkChild(TiXmlElement *child)
{
    if (child)
    {
        if ((string)child->Value() == "block")
        {
            cout << child->Value() << endl;

            double x1 = atoi(child->Attribute("left"));
            double y1 = atoi(child->Attribute("top"));
            double x2 = atoi(child->Attribute("right"));
            double y2 = atoi(child->Attribute("bottom"));
            int bid = atoi(child->Attribute("id"));
            double xcenter = (x1 + x2) / 2.0;
            double ycenter = (y1 + y2) / 2.0;
            double hauteur = y2 - y1; // height
            double largeur = x2 - x1; // width
            // lineList is a vector, and I can't find a way to fill it:
            // blockList.push_back(new Block(y1,x1,y2,x2,xcenter,ycenter,largeur,hauteur,xmlFile,lineList));
        }

        checkChild(child->FirstChildElement());

        checkChild(child->NextSiblingElement());
    } // end if (child)
}
  • What can't you link together? Commented May 29, 2013 at 11:26
  • What I mean is that I can extract a block, line, or word on its own, but I can't find a way to build, for example, an object block(int,int,int,vector<line>) whose vector contains all the lines inside the block, or an object line(int,int,int,vector<word>) whose vector contains all the words, and so on. Commented May 29, 2013 at 11:28
  • I can't find a way to iterate through the XML file. I'll post the algorithm. Commented May 29, 2013 at 11:30
  • I edited the question; I hope it's clearer now. Commented May 29, 2013 at 11:41

1 Answer

Instead of trying to build the tree while walking every element with one generic function, it makes more sense to parse the document as the tree it already is, with one function per element type that collects its own children:

// Forward declarations so each parser can call the one for the level below it.
Block* parseBlock(TiXmlElement* element);
Line* parseLine(TiXmlElement* element);
Word* parseWord(TiXmlElement* element);
Char* parseChar(TiXmlElement* element);

// Collect every <block> directly under the given element.
void parseFile(TiXmlElement* document, vector<Block*>& blocks)
{
  for (TiXmlElement* sub = document->FirstChildElement("block"); sub; sub = sub->NextSiblingElement("block"))
    blocks.push_back(parseBlock(sub));
}

// Build a Block together with all of its <line> children.
Block* parseBlock(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Line*> lines;
  for (TiXmlElement* sub = element->FirstChildElement("line"); sub; sub = sub->NextSiblingElement("line"))
    lines.push_back(parseLine(sub));
  return new Block(x1, ..., lines);
}

// Build a Line together with all of its <word> children.
Line* parseLine(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Word*> words;
  for (TiXmlElement* sub = element->FirstChildElement("word"); sub; sub = sub->NextSiblingElement("word"))
    words.push_back(parseWord(sub));
  return new Line(x1, ..., words);
}

// Build a Word together with all of its <char> children.
Word* parseWord(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  vector<Char*> chars;
  for (TiXmlElement* sub = element->FirstChildElement("char"); sub; sub = sub->NextSiblingElement("char"))
    chars.push_back(parseChar(sub));
  return new Word(x1, ..., chars);
}

// A <char> is a leaf: it has attributes but no children to collect.
Char* parseChar(TiXmlElement* element)
{
  double x1 = atof(element->Attribute("left"));
  // ...
  return new Char(x1, ...);
}
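
To make the top-down idea easy to try without linking TinyXML, here is a minimal self-contained sketch of the same pattern. Everything in it is an assumption for illustration: `Node` is a hypothetical stand-in for an XML element, the structs and `intAttr` are made-up names, and objects are returned by value rather than with `new`. With TinyXML, the loops over `children` would instead be the `FirstChildElement("...")` / `NextSiblingElement("...")` loops from the answer.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML element (NOT a TinyXML type).
struct Node {
    std::string name;
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

struct Box   { int top, left, bottom, right; };
struct Char  { Box box; std::string value; };
struct Word  { Box box; std::vector<Char> chars; };
struct Line  { Box box; std::vector<Word> words; };
struct Block { Box box; std::vector<Line> lines; };

// Reads an integer attribute, falling back to 0 when it is absent.
int intAttr(const Node& n, const std::string& key) {
    auto it = n.attrs.find(key);
    return it == n.attrs.end() ? 0 : std::atoi(it->second.c_str());
}

Box parseBox(const Node& n) {
    return Box{intAttr(n, "top"), intAttr(n, "left"),
               intAttr(n, "bottom"), intAttr(n, "right")};
}

// Leaf level: a char has no children to collect.
Char parseChar(const Node& n) {
    auto it = n.attrs.find("value");
    return Char{parseBox(n), it == n.attrs.end() ? std::string() : it->second};
}

// Each parent parses its own children and stores them in its vector.
Word parseWord(const Node& n) {
    Word word{parseBox(n), {}};
    for (const Node& c : n.children)
        if (c.name == "char") word.chars.push_back(parseChar(c));
    return word;
}

Line parseLine(const Node& n) {
    Line line{parseBox(n), {}};
    for (const Node& c : n.children)
        if (c.name == "word") line.words.push_back(parseWord(c));
    return line;
}

Block parseBlock(const Node& n) {
    Block block{parseBox(n), {}};
    for (const Node& c : n.children)
        if (c.name == "line") block.lines.push_back(parseLine(c));
    return block;
}

// Entry point: collect every <block> directly under the root.
std::vector<Block> parseFile(const Node& root) {
    std::vector<Block> blocks;
    for (const Node& c : root.children)
        if (c.name == "block") blocks.push_back(parseBlock(c));
    return blocks;
}
```

The point of the design is that each function only knows about the level directly below it, so the vector for a parent is filled before the parent is returned. The fallback in `intAttr` is one possible way to handle a missing attribute; as far as I know, TinyXML's `Attribute` returns a null pointer in that case, so passing it straight to `atoi` or `atof` would be unsafe.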