0

I'm doing a project that reads some string reactions from file at formula e.g: (5A+3B=c+10D) as an input. I need to do a parsing for the string reaction so that I can extract &(split) integer values beside a char and put them into a vector i.e vector associated with the reaction here is :[5 3 1 10]. I thought about std::strtok function but I think it cannot seperate integer values!!! Can any one help me ?? Here my try:

int main()
{
  std::string input;
  std::getline(std::cin, input);
  std::stringstream stream(input);
  while(1) {
      int n;
      stream >> n;
      char * pch;

      pch = strtok (input," ");
      while (pch != NULL)
        {
          printf ("%s\n",pch);
          pch = strtok (NULL, " ,.");
        }
      return 0;
  }
}
4
  • 1
    State machine. Or search StackOverflow for "c++ parse math expression". Commented Feb 2, 2016 at 5:03
  • I am quite sure that strtok won't work unless you convert input to c style string char * blah; blah = input.c_str(); Commented Feb 2, 2016 at 5:12
  • So you want to seperate digits from given input string? Commented Feb 2, 2016 at 6:30
  • @Sagar patel yes exactly i only need the digits ..can any one help ? Commented Feb 3, 2016 at 0:59

1 Answer 1

1

To do some serious parsing work, you need to learn some language theory. Fortunately, it isn't very difficult.

The method we are going to cover here is what we called Top Down Recursive Parsing.

The full listing of the source code here is going to be too long for the purpose of this forum, instead, I will present some pseudo-code for it.

The first thing you will need to do is to define your grammar. What is considered valid and what is not, you represent a grammar like this:

formula := term 
        := term + formula
        := term - formula
term    := variable
        := coefficient variable

So a formula C + 2D can be represented as

formula
        term
             variable
                      C
        +
        formula
                term
                     coefficient
                                 2
                     variable
                              D

With this in mind, we first solve a simpler problem, there are only a few types of things we need from the input string

+
-
coefficient
variable

Only these four things are valid input, you may want to skip space. Splitting the input string into these 4 types of things is called lexical analysis. We typically implement a so called scanner to do this.

A scanner typically look like this

class Scanner
{
public:
    Scanner(const char* text);
    Token GetToken(); // The current token
    void Scan(); // read the next token
}

Next, you will want to group these token into a tree like what I have shown you above. This logic we typically call it parsing and it implemented as a parser. You can implement a parser in many ways, here is one way you can do it with a top down predictive parser

class Parser
{
public:

private:
    bool ParseVariable()
    {
        if (s.GetToken() is variable) { s.Scan(); return true; }
    }
    bool ParseTerm()
    {
        if (s.GetToken() is variable) { s.Scan(); return true; }
        if (s.GetToken() is coefficient) { s.Scan(); return this->ParseVariable(); }
    }
    Scanner s;
}

The similar code goes on. Obviously one can extend the return type of those Parse() method to return something useful to its caller and assemble the representation you need for your purpose.

For my personal purposes, I wrote a few parsers for different languages. You can take a look at them as sample.

This is a sample in Python. https://github.com/cshung/MiscLab/blob/master/GreatestCommonDivisor/polynomial_module.py

This is a sample in C++ with a small twist, I parsed the string backwards to avoid 'left recursion' https://github.com/cshung/Competition/blob/master/Competition/LEET_BASIC_CALCULATOR.cpp

To see a top down parser in action in real life product, see this example in ChakraCore, which I proudly worked on some time ago. https://github.com/Microsoft/ChakraCore/blob/master/lib/Parser/Parse.cpp

Sign up to request clarification or add additional context in comments.

3 Comments

thank you very much for your explanation althought it sound a little difficult, could you please clarify what refers to that type of the function : Token GetToken(); // The current token ??
and how to state this in c++ code : formula term variable C + formula term coefficient 2 variable D ? ? :/
The code above in the post are really just 'pseudocode', they are meant for describing concepts, not actual code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.