I have a Perl script that does some regex replacements on a text file (it transforms output from a transcription program into LaTeX code). I need to modify it along the following lines: (a) the text has to be processed as blocks, and depending on the presence or absence of one particular line, different replacements need to be done; (b) text has to be added to the end of each block.
These are supposed to be two columns: the left shows how the input looks, the right what it should become.

    ORIGINAL INPUT              EXPECTED OUTCOME

    # Single-line blocks: label to be replaced and \xe added to end
    txt@#Name Text text text    \ex[exno=\spkr{Name}] \txt Text text text
                                \xe

    nvb@#Name Text text text    \ex[exno=\spkr{Name}] \nvb Text text text
                                \xe

    # Multi-line blocks: labels to be replaced and \xe added to end
    txt@#Name Text text text    \ex[exno=\spkr{Name}] \txt Text text text
    fte@#Name Text text text    \freetr Text text text
                                \xe

    txt@#Name Text text text    \ex[exno=\spkr{Name}] \txt Text text text
    SD (0.0)                    \silence{0.0}
                                \xe

    txt@#Name Text text text    \ex[exno=\spkr{Name}] \txt Text text text
    tli@#Name Text text text    \translit Text text text
    fte@#Name Text text text    \freetr Text text text
                                \xe

    # Multi-line block that has the mrb@... line (must start with txt):
    txt@#Name Text text text    \ex[exno=\spkr{Name}] \begingl \glpreamble Text text text //
    mrb@#Name Text text text    \gla Text text text //
    gle@#Name Text text text    \glb Text text text //
    fte@#Name Text text text    \glft Text text text //
    SD (0.0)                    \endgl
                                \silence{0.0}
                                \xe
The tricky thing with the last block is that the labels get replaced differently: the txt line gets two commands, \begingl and \glpreamble; every line has to end with //; and the block has to end with \endgl and \xe. If there is an SD (silence duration) line, it needs to go between the \endgl and the \xe (not all blocks have an SD line).
Blocks are separated by a blank line. The first line of each block begins with a label (txt@..., nvb@... or event) and may or may not be followed by further lines starting with different labels. Each label needs to be replaced with something else, which the script below does with regexes (there are some other replacements as well; this is just a minimal version for the purpose of explanation). I then need to mark the end of each block.
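As far as I understand, input that is separated by blank lines like this matches what Perl calls paragraph mode, which is switched on by setting the input record separator `$/` to the empty string. The following is only a minimal sketch of that idea, assuming the blank-line layout described above; the sample text and the `@blocks` array are made up for illustration and are not part of my script.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample input: two blocks separated by one blank line.
my $sample = <<'END';
txt@#Name Text text text
fte@#Name Text text text

nvb@#Name Text text text
END

# Read from an in-memory filehandle so the sketch needs no input file.
open my $fh, '<', \$sample or die "Cannot open sample: $!";

my @blocks;
{
    local $/ = '';                 # paragraph mode: each read returns one block
    while (my $block = <$fh>) {
        $block =~ s/\n+\z//;       # drop the newline(s) that ended the block
        push @blocks, $block;
    }
}
close $fh;

printf "%d blocks\n", scalar @blocks;              # prints "2 blocks"
for my $block (@blocks) {
    my @lines = split /\n/, $block;
    printf "block with %d line(s), first line: %s\n", scalar @lines, $lines[0];
}
```

With a real file, the in-memory handle would simply be replaced by the handle opened on `$ARGV[0]`.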
Furthermore, I need one conditional somewhere in there: if a block includes a line starting with an mrb@ label (like the sixth block above), different replacement patterns apply.
The following script is what I have, but it processes everything line by line. I know Perl can read input block by block, which should make these modifications possible, but unfortunately my skills are too rudimentary to figure it out by myself.
    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $fh_in,  '<', $ARGV[0] or die "No input: $!";
    open my $fh_out, '>', $ARGV[1] or die "No output: $!";

    print $fh_out "\\begin{myenv}\n\n"; # begins group at beginning of file

    while (<$fh_in>)
    {
        # general replacements for everything except if block includes a "mrb@" line:
        s/^txt@#(\S*)\s+(.*)/\\ex[exno=\\spkr{$1}] \\txt $2 /g;
        s/^nvb@#(\S*)\s+(.*)/\\ex[exno=\\spkr{$1}] \\nvb $2 /g;
        s/^tli@#\S*\s+(.*)/\\translit $1 /g;
        s/^fte@#\S*\s+(.*)/\\freetr $1 /g;
        s/^SD\s*\((\d*)\.(\d*)\)/\\silence{$1.$2}/g;
        # after each block I need to add "\\xe"

        # replacements if block includes a "mrb@" line:
        s/^txt@#(\S*)\s+(.*)/\\ex[exno=\\spkr{$1}] \\begingl \\glpreamble $2 \/\/ /g;
        s/^mrb@#\S*\s+(.*)/\\gla $1 \/\/ /g;
        s/^gle@#\S*\s+(.*)/\\glb $1 \/\/ /g;
        s/^fte@#\S*\s+(.*)/\\glft $1 \/\/ /g;
        s/^tli@#\S*\s+(.*)/\\translit $1 \/\/ /g;
        s/^fte@#\S*\s+(.*)/\\freetr $1 \/\/ /g;
        s/^SD\s*\((\d*)\.(\d*)\)/\\silence{$1.$2}/g;
        # after each block with a "mrb@" line I need to add "\\endgl" and "\\xe"
        # if there is a line starting with SD at the end of the block it needs to go between "\\endgl" and "\\xe"

        print $fh_out $_;
    }

    print $fh_out "\\end{myenv}"; # ends group
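To make the intended behaviour concrete, here is a rough sketch of the per-block logic I am after. It is not a finished solution: the helper name `convert_block` and the two lookup tables `%plain` and `%gloss` are hypothetical, the sample is hard-coded and split on blank lines, and lines with labels not listed in the tables (such as event) are assumed to pass through unchanged.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical lookup tables: label => LaTeX command.
my %plain = (tli => '\\translit', fte => '\\freetr');                 # no mrb@ line
my %gloss = (mrb => '\\gla', gle => '\\glb', fte => '\\glft',
             tli => '\\translit');                                    # with mrb@ line

# Hypothetical helper: convert one block (a multi-line string) to LaTeX.
sub convert_block {
    my ($block) = @_;
    my @lines   = grep { /\S/ } split /\n/, $block;
    my $has_mrb = grep { /^mrb\@/ } @lines;       # count of mrb@ lines (0 = none)
    my $map     = $has_mrb ? \%gloss : \%plain;

    my (@out, $silence);
    for my $line (@lines) {
        if ($line =~ /^SD\s*\((\d*\.\d*)\)/) {
            $silence = "\\silence{$1}";           # held back until the block ends
        }
        elsif ($line =~ /^(txt|nvb)\@#(\S*)\s+(.*)/) {
            my $cmd = $has_mrb ? '\\begingl \\glpreamble' : "\\$1";
            push @out, "\\ex[exno=\\spkr{$2}] $cmd $3";
        }
        elsif ($line =~ /^(\w+)\@#\S*\s+(.*)/ && exists $map->{$1}) {
            push @out, "$map->{$1} $2";
        }
        else {
            push @out, $line;                     # unknown label: left unchanged
        }
    }

    if ($has_mrb) {
        $_ .= ' //' for @out;                     # every gloss line ends with //
        push @out, '\\endgl';
    }
    push @out, $silence if defined $silence;      # SD goes just before \xe
    push @out, '\\xe';
    return join("\n", @out) . "\n";
}

# Hypothetical sample: one block with an mrb@ line, one single-line block.
my $sample = <<'END';
txt@#Name Text text text
mrb@#Name Text text text
gle@#Name Text text text
fte@#Name Text text text
SD (0.0)

nvb@#Name Text text text
END

print join("\n", map { convert_block($_) } split /\n{2,}/, $sample);
```

For the sample above this should reproduce the last and the second block of the expected outcome, with \silence{0.0} placed between \endgl and \xe.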
Any help much appreciated!