4

I'm aware there are several XML libraries out there, but unfortunately I am unable to use them for a school project I am working on.

I have a program that created this XML file.

<theKey>
<theValue>23432</theValue>
</theKey>

What I am trying to do is parse out "23432" between the tags. However, there are other tags at random places in the file, so the value may not always be on the second line from the top. Also, I don't know how many digits the number between the tags has.

The code I have developed so far is basic because I don't know which parts of the C++ language I can use to parse the value out. My hunch, from working with Java, is to use something from the "String" library, but so far I am coming up short on what I can use.

Can anyone give me direction or a clue on what I can do/use? Thanks a lot.

Here is the code I developed so far:

#include <iostream>
#include <fstream>
#include <string>

using std::cout;
using std::cin;
using std::endl;
using std::fstream;
using std::string;
using std::ifstream;


int main()
{
 ifstream inFile;
 inFile.open("theXML.xml");

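 if (!inFile)
 {
  // Report the failure and stop instead of silently continuing.
  std::cerr << "Could not open theXML.xml" << endl;
  return 1;
 }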

 string x;
 while (inFile >> x)
 {
  cout << x << endl;
 }

 inFile.close();

 system ( "PAUSE" );


 return 0;
}
9
  • Just grab Bison and use it to generate your own XML parser. Commented Feb 8, 2010 at 22:58
  • ... or Boost Spirit, if you prefer. Commented Feb 8, 2010 at 23:00
  • @*: He mentioned he can't use publicly available libraries -- homework possibly. Commented Feb 8, 2010 at 23:04
  • If you can use a regular expressions library (such as boost.regex or std::tr1::regex) then you might consider doing as this post says: immike.net/blog/2007/04/06/… Commented Feb 8, 2010 at 23:10
  • 2
    I have figured out a solution based on your ideas. Here is my basic algorithm: read the XML file into a string; use an iterator to iterate through the string; find my tag and record its location; parse out the value. This may not be the best solution, but it works. Commented Feb 8, 2010 at 23:49

4 Answers

7

To parse arbitrary XML, you really need a proper XML parser. When you include all the character-model nooks and DTD-related crannies of the language, it is not at all simple to parse, and it's a terrible faux pas to write a parser that only understands an arbitrary subset of XML.

In the real world, it would be wrong to use anything but a proper XML parser library to implement this. If you can't use a library and you can't change the program's output format to something more easily parsed (e.g. newline-separated key/value pairs), you're in an untenable position. Any school project that requires you to parse XML without an XML parser is totally misguided.

(Well, unless the whole point of the project is to write an XML parser in C++. But that would be a very cruel assignment.)

4

Here's an outline of what your code should look like (I've left out the tedious parts as an exercise):

std::string whole_file;

// TODO:  read your whole XML file into "whole_file"

std::size_t found = whole_file.find("<theValue>");

// TODO: ensure that the opening tag was actually found ...

std::string aux = whole_file.substr(found);
found = aux.find(">");

// TODO: ensure that the closing angle bracket was actually found ...

aux = aux.substr(found + 1);

std::size_t end_found = aux.find("</theValue>");

// TODO: ensure that the closing tag was actually found ...

std::string num_as_str = aux.substr(0, end_found); // "23432"

int the_num;

// TODO: convert "num_as_str" to int

This is not a proper XML parser of course, just something quick and dirty that solves your problem.
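For reference, here is one way the TODOs in the outline above might be filled in. It is a minimal sketch, not a real XML parser: the helper names extractTagText and toInt are hypothetical (they do not appear in the outline), and it assumes the tag has no attributes and that the first occurrence in the text is the one you want.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: finds the first <tag>...</tag> pair in `xml` and
// stores the text between the two tags in `out`.
// Returns false if either the opening or the closing tag is missing.
bool extractTagText(const std::string& xml, const std::string& tag,
                    std::string& out)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::size_t start = xml.find(open);
    if (start == std::string::npos)
        return false;                        // opening tag not found

    const std::size_t valueBegin = start + open.size();
    const std::size_t valueEnd = xml.find(close, valueBegin);
    if (valueEnd == std::string::npos)
        return false;                        // closing tag not found

    out = xml.substr(valueBegin, valueEnd - valueBegin);
    return true;
}

// Hypothetical helper: converts the extracted text to an int.
// Returns false if the text does not begin with a number (leading
// whitespace is skipped) or the number does not fit in an int.
bool toInt(const std::string& text, int& value)
{
    std::istringstream in(text);
    return !(in >> value).fail();
}
```

Calling extractTagText(whole_file, "theValue", num_as_str) and then toInt(num_as_str, the_num) covers the steps in the outline; both return values should be checked before the_num is used.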

5 Comments

Except that it doesn't necessarily solve his problem. It'll produce the wrong value for something like: "<theValue>123</thevalue><theKey><theValue>345</theValue></theKey>".
At least I hope that this will get him started.
@Jerry - I've replaced a backslash with a slash in the literal "</theValue>". Is that why you said my code didn't work?
No -- at least according to the sample he gave, he only wants a "theValue" that's inside of a theKey, whereas your code appears to look for any theValue anywhere in the file.
Please do not assume you can use std::string to store UTF-16.
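One possible way to address the nesting concern raised in this thread is to narrow the search to the body of theKey first and only then look for theValue inside it. The sketch below assumes tags without attributes and no theKey nested inside another theKey; the helper names textBetween and valueInsideKey are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: stores the text between the first <tag> and the
// following </tag> in `out`; returns false if either tag is missing.
bool textBetween(const std::string& text, const std::string& tag,
                 std::string& out)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::size_t start = text.find(open);
    if (start == std::string::npos)
        return false;

    const std::size_t begin = start + open.size();
    const std::size_t end = text.find(close, begin);
    if (end == std::string::npos)
        return false;

    out = text.substr(begin, end - begin);
    return true;
}

// Looks for <theValue> only inside the body of the first <theKey> element,
// so a stray <theValue> elsewhere in the file is ignored.
bool valueInsideKey(const std::string& xml, std::string& value)
{
    std::string keyBody;
    return textBetween(xml, "theKey", keyBody)
        && textBetween(keyBody, "theValue", value);
}
```

With a stray theValue element placed before theKey, as in the counterexample above, valueInsideKey skips it and picks up only the value nested in theKey.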
2

You will need to create functions to at least:

  • If the node is a container node then
    • Identify/parse elements (beginnings and ends) and attributes, if any
    • Parse children recursively
  • Otherwise, extract the value, trimming leading and trailing whitespace if it is not significant

The std::string class provides quite a few useful member functions, such as find, find_first_of, and substr. Try to use these in your functions.
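As a rough illustration of how those member functions fit the steps listed above, here are two small building blocks. The names trim and tagName are hypothetical, and the sketch assumes plain single-byte text; it is not a full parser.

```cpp
#include <string>

// Removes leading and trailing whitespace from `text`.
std::string trim(const std::string& text)
{
    const char* const whitespace = " \t\r\n";
    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return "";                           // empty or all whitespace
    const std::size_t last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Returns the element name of the opening tag that starts at index `lt`
// (which must be the position of a '<'), e.g. "theValue" for
// "<theValue id=\"1\">". Returns "" for a closing tag or a malformed tag.
std::string tagName(const std::string& xml, std::size_t lt)
{
    const std::size_t nameBegin = lt + 1;
    // The name ends at the first whitespace, '/' or '>' after the '<'.
    const std::size_t nameEnd = xml.find_first_of(" \t\r\n/>", nameBegin);
    if (nameEnd == std::string::npos)
        return "";
    return xml.substr(nameBegin, nameEnd - nameBegin);
}
```

A recursive parser could call tagName at each '<' it finds, recurse into the element body, and apply trim to the text of leaf elements.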

2

The C++ Standard Library provides no XML parsing features. If you want to write this on your own, I suggest looking at std::getline() to read your data into strings (don't try to use operator>> for this), and then at the std::string class's basic features, like the substr() function, to chop it up. But be warned that writing your own XML parser, even a basic one, is very far from trivial.
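A minimal sketch of the reading step, assuming the file is small enough to hold in memory. The function names readAll and readFile are hypothetical. Unlike operator>>, which splits on every space, std::getline() returns each line whole, so spaces inside a line are preserved.

```cpp
#include <fstream>
#include <istream>
#include <string>

// Reads `in` line by line and joins the lines into one string.
// getline strips the '\n', so it is added back after every line
// (including the last one, even if the input did not end with a newline).
std::string readAll(std::istream& in)
{
    std::string contents;
    std::string line;
    while (std::getline(in, line)) {
        contents += line;
        contents += '\n';
    }
    return contents;
}

// Hypothetical convenience wrapper: sets `ok` to false and returns an
// empty string if the file cannot be opened.
std::string readFile(const std::string& path, bool& ok)
{
    std::ifstream in(path.c_str());
    ok = in.is_open();
    return ok ? readAll(in) : std::string();
}
```

The resulting string can then be searched with find() and cut up with substr().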

3 Comments

Why is it preferred to use std::getline() over >>?
The stream operator>> is basically intended for reading space-delimited numeric values. You can make it work for other values, but it is particularly bad at reading strings, which may contain spaces.
I have figured out a solution based on your ideas. Here is my basic algorithm: read the XML file into a string; use an iterator to iterate through the string; find my tag and record its location; parse out the value. This may not be the best solution, but it works.
