Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are seperated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 is a minimum version, it also defines the say builtin (like a print but appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
next if /^\s*\#/; # jump comment lines
# parse the line. We get a destination $var, a $command, and any number of @args
my ($var, $command, @args) = parse($_);
# Execute the anonymous sub specified by $command with the @args
my $value = $functions{ $command }->(@args);
# Store the return value if a destination $var was specified
$env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flags allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion. This is where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in out string to being reset.
sub parse {
my ($line) = @_; # get the $line, which is a argument
my ($var, $command, @args); # declare variables to be filled
# Test if this statement has a variable declaration
if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
$var = $1; # assign first capture if successful
}
# Parse the function of this statement.
if ($line =~ m/\G\s* (\w+) \s*/xgc) {
$command = $1;
# Test if the specified function exists in our %functions
if (not exists $functions{$command}) {
die "The command $command is not known\n";
}
} else {
die "Command required\n"; # Throw fatal exception on parse error.
}
# As long as our matches haven't consumed the whole string...
while (pos($line) < length($line)) {
# Try to match variables
if ($line =~ m/\G \$(\w+) \s*/xgc) {
die "The variable $1 does not exist\n" if not exists $env{$1};
push @args, $env{$1};
}
# Try to match strings
elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
push @args, $1;
}
# Try to match ints or floats
elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
push @args, 0+$1;
}
# Throw error if nothing matched
else {
die "Didn't understand that line\n";
}
}
# return our -- now filled -- vars.
return $var, $command, @args;
}
Perl arrays can be handled like linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Out little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved in my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).