I have a simple perl script that does a number of text replacements along the following lines:
#!/usr/bin/perl
{
open(my $in, "<", "Texts.txt") or die "No input: $!";
open(my $out, ">", "TeXed/Texts.tex") or die "No output directory: $!";
LINE: while (<$in>) {
s/(txt@)(.*)(?<!\t|\[)\[(.*)/\1\2\\ovl{}\3/g;#
# there are a bunch of other replacements like the above
print $out $_ ;
}
}
So far so good. The text I am running this script on is organized into blocks (not always of the same length). Each block starts with the same identifier (txt@) and then a unique label. Every label starts with a #.
What I would like to achieve is to delete all repeated labels – essentially I only want to keep every first instance of a label and replace/delete all subsequent ones until the label changes. In the below example the ones to be replaced/deleted are in bold.
txt@#Label1 some text
some more text
some more text
txt@#Label1 some other text
some more text
some more text
some more text
txt@#Label1 some random text
some more text
some more text
txt@#Label2 some text
some more text
some more text
some more text
txt@#Label1 some text
some more text
some more text
txt@#Label3 some text
some more text
some more text
txt@#Label3 some text
some more text
some more text
txt@#Label1 some text
some more text
some more text
etc.
Sorry for the long example – I couldn't come up with a better way of explaining this.
So I want to delete all repeated Label1, Label2, etc., but without modifying the rest of the text (some text, some more text) both on the same line but also on subsequent lines. The number of subsequent lines is not always the same (so it's not every n-th line that has to be replaced).
Is that possible with perl? Or any other way? (I'm not married to perl, if it is easier with another language I'd be happy to try that – I'm not a programmer though so detailed instructions would be highly appreciated).
#into the non-capturing group. Then the replacement doesn't put it back. Let me know if more comes up.