1

In maintaining a large legacy code base I came across this function which serves as an accessor to an XML tree.

std::string DvCfgNode::getStringValue() const
{
    xmlChar *val = xmlNodeGetContent(mpNode);
    if (val == 0)
        return 0;

    std::string value = (const char *)val;
    xmlFree(val);
    return value;
}

How can this function return '0'? In C you could return a zero pointer as char * indicating no data was found, but is this possible in C++ with std::string? I wouldn't think so but not knowledgable enough with C++.

The (10 year old) code compiles and runs under C++ 98.

EDIT I

Thanks to everyone for the comments. To answer a few questions:

a) Code is actually 18 years old and about, umm, 500K-1M lines (?)

b) exception handling is turned on, but there are no try-catch blocks anywhere except for a few in main(), which result in immediate program termination.

c) Poor design in the calling code which seems to "trust" getStringValue() to return a proper value, so lots of something like:

std::string s = pTheNode->getStringValue()

Probably just lucky it never returned zero (or if it did, nobody found the bug until now).

1
  • There is also another problem with the code in that function. Given that xmlNodeGetContent(mpNode) has obviously 2 reasons to return NULL. One, being that there is no content. The other being that the system is out of memory. If you fix the semantics of the function, for example by having it return std::optional<std::string> to express that there is not always a value, you still have to worry about the out of memory condition, which might need another form of addressing in this particular code base. Commented Jul 22, 2018 at 10:11

2 Answers 2

5

Your intuition about the "zero pointer as char*" is correct. What is happening is that 0 is being interpreted as the null pointer, resulting in the returned string being initialized from a const char* null pointer.

However, that is undefined behaviour in C++. The std::string(const char*) constructor requires a pointer to a null-terminated string. So you have found a bug. The fix really depends on the requirements of the function, but I throwing an exception would be an improvement over undefined behaviour*.


* That is a massive understatement. Code should not have undefined behaviour

Sign up to request clarification or add additional context in comments.

5 Comments

If the code base is old, it might well be that it is not really "throwing" exceptions. Maybe it is even compiled with exception handling off.. So basically, the signature of the function suggests, that there will always be a result, while the implementation shows, that this is not true. The easiest fix I see is to change the return value from 0 to return std::string(). This should make the code more readable at least.
@BitTickler The code has undefined behaviour. That is what needs to be fixed. A fix could be to throw exceptions. Another one could be to return an empty string. There is not enough information to know which is best, but I would stay away from returning an empty string, because it is a valid string, and it means the client code will have to check for empty strings to figure out if the function had an error condition or not.
Agreed - Inspection of what calling code expects here is advised. If something is undefined behavior in general, it usually still does something specific in a particular case. And returning an empty string is a good approximation UNLESS callers of this function are riddled with try .. catch() blocks as they expect an exception.
@BitTickler Wrong about UB, wrong about the empty string, and wrong about the try-catch blocks on the caller side.
BitTickler, good suggestion, thanks. I've modified it to return an empty string, which is acceptable to the logic of the calling ode.
1

It depends on how you want to signal that there was no data. If no data means that there is an empty string value in the xml tree you can just return an empty string.

In case you want to model e.g. that there is no data item and thus no data in the tree, you have several options depending on your data semantics.

  • If the data is mandatory and shall be present, you have an object with a violated invariant, i.e. an object in an illegal state. Using that object for anything is illegal. I would either std::terminate the program (or use some other termination mechanism that is suitable, e.g. an error reporter) or throw something that is guaranteed not to be caught and handled.
  • If the data is optional you can return something that models this. In C, you would probably go with a pointer to an object which can be null, but this introduces ownership issues. In C++, you can return an std::optional<std::string> which exactly describes this.

7 Comments

A code maintainers role is to keep the code running. They have to make a delicate choice between fixing too aggressively (which might result in a complete re-write of the application) and being too timid. I would probably find all callers, and depending on the number of callers and what they expect so far I would make a choice either between the std::optional if the number of callers is small and an empty string if the number of callers is huge. The throwing of an exception most likely would require change of the callers code as well to keep the application running.
@BitTickler I would disagree. I would try to define the semantics of the function, and depending on this (e.g. does a valid object always have a value?) decide on the bug fix. From a maintenance point of view, as pointed out by juanchopanza in his answer, the current code contains undefined behavior which means that there is great freedom when choosing the fix because nobody can currently rely on the behavior in case ´val == 0`. No client can be broken because no client can make any assumption.
That is the tricky thing with undefined behavior. The code exists and also the code calling this function exists. As such, they made an assumption. And the implementation of the caller sites is based on that assumption. Now, of course - with updates of the standard library, those assumption could become wrong if the new version of the standard library does some different "undefined behavior". But - if for example the current code base never encountered exceptions due to the return 0, I would consider it a bad idea to now throw one and thus break the calling code for sure.
@BitTickler Still not convinced. The code in question contains UB on the path val == 0. If that path is never taken I can change it because it will not break callers. If the path is executed the program executes UB. I would consider this a bug that has to be fixed, even if the program somehow passed a test because the test does not verify anything. The argument "the code exists and also the calling function exists. As such they made an assumption" is ill-formed because you cannot make assumptions about UB.
@BitTickler In addition, you are changing code and recompiling it, so what has been tested before may now be different as it is UB. Other compiler version, other compiler switches, other whatever can arbitrarily change the behavior. If your code executes UB there is something wrong in your code and it should be fixed. Sometimes I think it would be a good idea for compiler vendors to add some random effects in case UB is executed to prevent people from making assumptions about it... Wasn't there a gcc version that started Towers of Hanoi?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.