To do some serious parsing work, you need to learn some language theory. Fortunately, it isn't very difficult.
The method we are going to cover here is what we called Top Down Recursive Parsing.
The full listing of the source code here is going to be too long for the purpose of this forum, instead, I will present some pseudo-code for it.
The first thing you will need to do is to define your grammar. What is considered valid and what is not, you represent a grammar like this:
formula := term
:= term + formula
:= term - formula
term := variable
:= coefficient variable
So a formula C + 2D can be represented as
formula
term
variable
C
+
formula
term
coefficient
2
variable
D
With this in mind, we first solve a simpler problem, there are only a few types of things we need from the input string
+
-
coefficient
variable
Only these four things are valid input, you may want to skip space. Splitting the input string into these 4 types of things is called lexical analysis. We typically implement a so called scanner to do this.
A scanner typically look like this
class Scanner
{
public:
Scanner(const char* text);
Token GetToken(); // The current token
void Scan(); // read the next token
}
Next, you will want to group these token into a tree like what I have shown you above. This logic we typically call it parsing and it implemented as a parser. You can implement a parser in many ways, here is one way you can do it with a top down predictive parser
class Parser
{
public:
private:
bool ParseVariable()
{
if (s.GetToken() is variable) { s.Scan(); return true; }
}
bool ParseTerm()
{
if (s.GetToken() is variable) { s.Scan(); return true; }
if (s.GetToken() is coefficient) { s.Scan(); return this->ParseVariable(); }
}
Scanner s;
}
The similar code goes on. Obviously one can extend the return type of those Parse() method to return something useful to its caller and assemble the representation you need for your purpose.
For my personal purposes, I wrote a few parsers for different languages. You can take a look at them as sample.
This is a sample in Python.
https://github.com/cshung/MiscLab/blob/master/GreatestCommonDivisor/polynomial_module.py
This is a sample in C++ with a small twist, I parsed the string backwards to avoid 'left recursion'
https://github.com/cshung/Competition/blob/master/Competition/LEET_BASIC_CALCULATOR.cpp
To see a top down parser in action in real life product, see this example in ChakraCore, which I proudly worked on some time ago.
https://github.com/Microsoft/ChakraCore/blob/master/lib/Parser/Parse.cpp
char * blah; blah = input.c_str();