For context, I'm trying to create an overly simplified version of bash. Not a full script interpreter, just a small interpreter for a series of commands and operators (`|`, `||`, `&&`, `<`, `>`, `<<`, `>>`, `$`, `$?`). The mental model I used, in a nutshell, is:
- Lexer + Expander: in the first stage I use a simple state machine to lex the input into tokens and store their data (commands, arguments, redirection files, etc.). I also expand environment variables and handle lexical errors (as simple as checking finite states of valid characters).
- Parser: in the second stage I intend to build an AST out of the tokens + data, and handle parsing errors (rough token/AST shapes are sketched right after this list).
- Executor: finally, I'll execute the AST.
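
To make the data flow between these stages concrete, here is a minimal sketch of the token and AST types I have in mind. The names (`t_token`, `t_node`, and the `TOK_*` constants) are purely illustrative, not from any existing codebase:

```c
/* Token kinds the lexer can produce (illustrative names). */
typedef enum e_token_type {
    TOK_WORD,       /* command names, arguments, filenames */
    TOK_PIPE,       /* |  */
    TOK_AND_IF,     /* && */
    TOK_OR_IF,      /* || */
    TOK_LESS,       /* <  */
    TOK_GREAT,      /* >  */
    TOK_DLESS,      /* << */
    TOK_DGREAT,     /* >> */
    TOK_EOF         /* end-of-input sentinel */
} t_token_type;

typedef struct s_token {
    t_token_type    type;
    char            *text;  /* lexeme, env variables already expanded */
    struct s_token  *next;
} t_token;

/* AST: && / || / | become inner nodes, commands become leaves. */
typedef struct s_node {
    t_token_type    op;     /* TOK_AND_IF, TOK_OR_IF, TOK_PIPE, or TOK_WORD for a leaf */
    struct s_node   *left;
    struct s_node   *right;
    char            **argv; /* only used by leaf command nodes */
} t_node;
```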
Now I'm at the parser stage, and I'm trying to think about how I might handle parsing errors. The thought I had is that the range of possible statements is so big that checking the validity of an input case by case seems very difficult, or at least that's what I think. And I'm sure there's some generalized solution to this problem. Why am I sure? Because bash has done it.
For example, this statement:
$ < $FILE || && > outfile
From the lexer's point of view it's all bright and shiny, but it's surely not a valid input from the parser's perspective. One possible solution is to check whether there's a command token in the input; if not, the line is invalid. But what about this one:
$ || ls > $FILE && cat < $FILE
Again, all valid lexemes, but an unparsable statement. Maybe that too could be checked against a rule like "if the line starts with an OR or AND token, error out" (both ad-hoc checks are sketched below).
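
To show why this route worries me, here is roughly what those two ad-hoc checks would look like over the illustrative `t_token` list from the sketch above; every new invalid shape I discover would need yet another function like these:

```c
#include <stdbool.h>

static bool is_redirect(t_token_type t)
{
    return (t == TOK_LESS || t == TOK_GREAT
        || t == TOK_DLESS || t == TOK_DGREAT);
}

/* Ad-hoc check 1: is there at least one WORD that is an actual command,
** i.e. not consumed as the filename of a redirection? */
static bool has_command_word(const t_token *tok)
{
    while (tok && tok->type != TOK_EOF)
    {
        if (is_redirect(tok->type))
        {
            tok = tok->next;            /* skip the operator... */
            if (tok && tok->type == TOK_WORD)
                tok = tok->next;        /* ...and its filename */
            continue;
        }
        if (tok->type == TOK_WORD)
            return (true);
        tok = tok->next;
    }
    return (false);
}

/* Ad-hoc check 2: reject lines that start with && or ||. */
static bool starts_with_logical_op(const t_token *tok)
{
    return (tok && (tok->type == TOK_AND_IF || tok->type == TOK_OR_IF));
}
```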
Now the specific question is: how exactly does bash parse these combinations of commands and operators? Either there's some sort of more generalized solution, or I'm left with if/else error checking against every input I think is invalid, which honestly seems stupid and cumbersome.
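
From what I've read, bash's own parser is generated by bison from a grammar (parse.y, based on the POSIX shell grammar), so I imagine the "generalized solution" looks something like a parser driven by grammar rules, where invalid inputs get rejected naturally instead of by special-case checks. Here is a rough recursive-descent sketch of what I mean, reusing the illustrative types and `is_redirect` from above (`parse_command` collects nothing and the error paths leak memory, it's just to show the shape):

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Grammar sketch (a small subset of the POSIX shell grammar):
**   and_or   : pipeline ( ('&&' | '||') pipeline )*
**   pipeline : command  ( '|' command )*
**   command  : ( WORD | redirect WORD )+   with at least one command WORD
** The token list is assumed to end with a TOK_EOF sentinel. */

static t_node	*parse_pipeline(t_token **cur);

static t_node	*syntax_error(t_token *tok)
{
	fprintf(stderr, "syntax error near `%s'\n",
		(tok && tok->text) ? tok->text : "newline");
	return (NULL);
}

static t_node	*make_binary(t_token_type op, t_node *l, t_node *r)
{
	t_node	*n = calloc(1, sizeof(*n));

	n->op = op;
	n->left = l;
	n->right = r;
	return (n);
}

/* command: WORDs and redirections in any order, at least one command WORD.
** "< file || && > out" dies here: after "< file" no WORD follows. */
static t_node	*parse_command(t_token **cur)
{
	bool	saw_word = false;

	while ((*cur)->type == TOK_WORD || is_redirect((*cur)->type))
	{
		if (is_redirect((*cur)->type))
		{
			*cur = (*cur)->next;
			if ((*cur)->type != TOK_WORD)	/* redirection needs a filename */
				return (syntax_error(*cur));
		}
		else
			saw_word = true;
		*cur = (*cur)->next;
	}
	if (!saw_word)				/* also catches a line starting with || */
		return (syntax_error(*cur));
	return (make_binary(TOK_WORD, NULL, NULL));	/* placeholder leaf */
}

static t_node	*parse_pipeline(t_token **cur)
{
	t_node	*left = parse_command(cur);

	while (left && (*cur)->type == TOK_PIPE)
	{
		*cur = (*cur)->next;
		t_node	*right = parse_command(cur);

		if (!right)
			return (NULL);
		left = make_binary(TOK_PIPE, left, right);
	}
	return (left);
}

static t_node	*parse_and_or(t_token **cur)
{
	t_node	*left = parse_pipeline(cur);

	while (left && ((*cur)->type == TOK_AND_IF || (*cur)->type == TOK_OR_IF))
	{
		t_token_type	op = (*cur)->type;
		*cur = (*cur)->next;
		t_node	*right = parse_pipeline(cur);

		if (!right)
			return (NULL);
		left = make_binary(op, left, right);
	}
	return (left);
}
```

Is this grammar-driven approach the right mental model for how bash rejects statements like the ones above, or does it do something else entirely?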