Script matching literal pattern over multiple lines?

Question

I have a multi-line string in the variable $PAT. $PAT must be searched for within the file $FILE. If $PAT is in $FILE, the script needs to print the file with $PAT removed. If $PAT is not found, it should print nothing. It is unknown whether $PAT contains any special characters, and it must be matched literally. For example, if $PAT is //\/\\|* then that exact same 8-character string should be searched for in $FILE.

The real world use for this is for installing and removing text within already existing files/scripts. If you want to append $PAT in $FILE, you want to know if it has already been appended previously. If $PAT is already in $FILE, then the output without $PAT allows you to easily uninstall it.

The systems I need this script for (Android devices) have only BusyBox on them: no Perl or other scripting languages.

What BusyBox utilities are available? Only what comes with Android (which aren't actually BusyBox), or more? Android ships a very limited set of utilities, but more and more with each release: what minimal version are you targeting?
– Gilles 'SO- stop being evil', Commented Dec 6, 2012 at 22:53
Also, how big are the files? Will they fit comfortably in RAM (so no more than ~100MB)?
– Gilles 'SO- stop being evil', Commented Dec 6, 2012 at 22:56
The answers to this question so far have been so imaginatively brilliant. I am so impressed, I'm beyond words. Excellent thinking everyone.
– Sepero, Commented Dec 7, 2012 at 17:30
Yes, the files would generally be small, Gilles. Your answer was great. The Android devices will be rooted Androids, so default BusyBox. ;)
– Sepero, Commented Dec 7, 2012 at 17:32

jfg956 · Answer · 2012-12-06 12:30:52Z

If you want to match $PAT as complete lines, I have a solution. By complete lines, I mean that, in the case of a match, you can split $FILE into three sub-files (f1, f2 and f3) where:

cat f1 f2 f3 is $FILE,
f2 is $PAT.

Note that f1 and/or f3 can be empty.

First, create the f2 file:

cat << EOF > f2
$PAT
EOF

Then, diff $FILE and f2, saving the result:

diff "$FILE" f2 > diff_res
res=$?

If $res is zero, then f1 and f3 are empty, and $FILE is equal to $PAT. I will suppose that you want an empty file in this case.

If diff_res contains a line starting with ">", f2 contains at least one line that is not in $FILE. To test that:

grep -q '^> ' diff_res
test $? -eq 0 && echo "PAT not found"

If diff_res contains no line starting with ">", every line of f2 is in $FILE, though perhaps not contiguously. If the lines are contiguous, diff_res contains either:

A single line not starting with "<" (if f1 or f3 is empty), or
Two lines not starting with "<", the first always starting with "1d" or "1,".

To test this, we have:

nb=$(grep -v "^< " diff_res | wc -l)
if test $nb -gt 2; then
  pat_found=0
elif test $nb -eq 1; then
  pat_found=1
else
  pat_found=$(sed -n -e '1{/^1d/p;/^1,/p}' diff_res | wc -l)
fi

Then, if pat_found is 1, the file without $PAT is the diff output restricted to the lines starting with "<", with those first two characters removed:

grep '^< ' diff_res | cut -c 3-
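
As a rough illustration of the diff output this approach relies on, here is a small sketch. It assumes a diff that emits the traditional "normal" format (as GNU diff does by default); the temporary directory, the five-line sample file and the two-line $PAT are made up for the example.

```shell
# Assumption: diff prints the traditional "normal" format (e.g. "1d0", "< a").
dir=$(mktemp -d)                        # hypothetical scratch directory
printf '%s\n' a b c d e > "$dir/file"   # hypothetical contents of $FILE
PAT='b
c'
printf '%s\n' "$PAT" > "$dir/f2"

# diff exits with status 1 when the files differ, so ignore its status here.
diff "$dir/file" "$dir/f2" > "$dir/diff_res" || :
cat "$dir/diff_res"
# Expected shape: two hunk headers ("1d0" and "4,5d2"), every other line
# starting with "< ", and no line starting with "> ".

# Stripping the "< " marker leaves the file without $PAT.
remainder=$(grep '^< ' "$dir/diff_res" | cut -c 3-)
printf '%s\n' "$remainder"
rm -rf "$dir"
```

Here f1 is the line a and f3 is the lines d and e, so the first hunk header starts with "1d" and exactly two lines do not start with "<".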

The complete and reorganized script would look like:

# Output the desired result on stdout.

f2=/tmp/f2              # Using the PID or mktemp would be better
diff_res=/tmp/diff_res  # Using the PID or mktemp would be better

cat << EOF > "$f2"
$PAT
EOF

diff "$FILE" "$f2" > "$diff_res"
if test $? -ne 0; then
  grep -q '^> ' "$diff_res"
  if test $? -ne 0; then
    nb=$(grep -v "^< " "$diff_res" | wc -l)
    if test "$nb" -eq 1; then
      grep '^< ' "$diff_res" | cut -c 3-
    elif test "$nb" -eq 2; then
      pat_found=$(sed -n -e '1{/^1d/p;/^1,/p}' "$diff_res" | wc -l)
      test "$pat_found" -eq 1 && grep '^< ' "$diff_res" | cut -c 3-
    fi
  fi
fi

rm -f "$f2" "$diff_res"

I had to pick Gilles's answer for simplicity, but this was just brilliant! Excellent stuff.
– Sepero, Commented Dec 7, 2012 at 17:35

Gilles 'SO- stop being evil' · Accepted Answer · 2012-12-07 00:29:39Z

I assume that you're rewriting a text file that fits in memory (it looks like you're rewriting a configuration file).

The following script only uses shell builtin features and cat. It should work on Android's shell, at least since Gingerbread and definitely since Ice Cream Sandwich. It prints the file contents minus the first occurrence of $PAT if there is one; if $PAT does not occur, it prints nothing.

contents=$(cat "$FILE")
case $contents in
  *"$PAT"*)
    echo "${contents%%"$PAT"*}${contents#*"$PAT"}";;
esac

This snippet assumes that the file does not contain any null byte, ends in a single newline, and does not start with a dash. Also, if the pattern ends with a newline, it won't be found at the end of the file. The following more complex snippet copes with arbitrary text files:

contents=$(cat "$FILE"; echo a)
contents=${contents%a}
case $contents in
  *"$PAT"*)
    contents="${contents%%"$PAT"*}${contents#*"$PAT"}"
    dashes=${contents%%[!-]*}
    echo -n "$dashes"
    echo -n "${contents#$dashes}";;
esac
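
The reason for the extra `echo a` is that command substitution strips all trailing newlines, so the sentinel character keeps them and is then removed again. A minimal sketch of the difference, using a made-up temporary file (the names `tmp`, `plain` and `guarded` are only for illustration):

```shell
tmp=$(mktemp)                    # hypothetical scratch file
printf 'line\n\n\n' > "$tmp"     # one line of text followed by two empty lines

plain=$(cat "$tmp")              # trailing newlines are stripped
guarded=$(cat "$tmp"; echo a)    # the sentinel "a" protects them
guarded=${guarded%a}             # drop the sentinel again

echo "${#plain}"                 # 4: only "line" survives
echo "${#guarded}"               # 7: "line" plus three newlines
rm -f "$tmp"
```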

(Note that your proposed behavior makes it impossible to distinguish a file that contained exactly the pattern and an empty file.)

It's actually easier to implement your append/remove script directly than to use the proposed intermediate function.

contents=$(cat "$FILE"; echo a)
contents=${contents%a}
case $contents in
  *"$PAT"*) contents="${contents%%"$PAT"*}${contents#*"$PAT"}";;
  *) contents="$contents$PAT";;
esac
dashes=${contents%%[!-]*}
{ echo -n "$dashes"; echo -n "${contents#$dashes}"; } >"$FILE.new"
mv -- "$FILE.new" "$FILE"
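
Since $PAT has to be matched literally, it matters whether it is quoted inside the pattern part of `${var%%pattern}` and `${var#pattern}`: an unquoted expansion there is interpreted as a glob, while a quoted one is taken character for character. A small sketch with made-up sample values:

```shell
contents='axxb then a*b end'   # hypothetical file contents
PAT='a*b'                      # hypothetical pattern containing a glob character

# Unquoted: $PAT acts as the glob a*b, which also matches "axxb",
# so the whole string is removed as a suffix.
unquoted=${contents%%$PAT*}

# Quoted: only the literal three characters a*b match.
quoted=${contents%%"$PAT"*}

printf '[%s]\n' "$unquoted"    # []
printf '[%s]\n' "$quoted"      # [axxb then ]
```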

You're so right about it being impossible to distinguish the two. I made a real error. Gosh your answer is great! Thanks so much.
– Sepero, Commented Dec 7, 2012 at 17:40
Wouldn't it be safer to catch newlines using this? case "$contents" in
– Sepero, Commented Dec 7, 2012 at 17:43
@Sepero You mean case "$contents" in instead of case $contents in? This is one of the few cases where you can leave out the double quotes. There can only be a single word, so the shell doesn't do word splitting and globbing even without double quotes. You can also leave out the double quotes in assignments: foo=$bar is equivalent to foo="$bar". However, export foo=$bar is not safe: if bar is hello world, this runs export foo=hello world (i.e. it sets foo to hello and exports both foo and world). If you don't want to memorize such exceptions, use double quotes all the time.
– Gilles 'SO- stop being evil', Commented Dec 7, 2012 at 17:50

choroba · Answer · 2012-12-06 08:44:23Z

Read the file character by character. If a character matches the first character of the variable, compare the following characters in turn. If the comparison fails before the whole variable has been matched, go back to the character after the one where the attempt started and try again from there. You could even implement a more advanced algorithm to make it faster, but as your language happens to be the shell, it would be terribly slow anyway.
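
A minimal sketch of that naive search, written with only POSIX parameter expansion and `[`. The function name `find_literal` and its calling convention are made up for this example, and it works on a string already held in memory rather than reading the file byte by byte.

```shell
# find_literal HAYSTACK NEEDLE (hypothetical helper)
# Prints the 0-based offset of the first literal match; returns 1 if none.
find_literal() {
  hay=$1 needle=$2 offset=0
  while :; do
    h=$hay n=$needle
    # Compare character by character from the current start position.
    while [ -n "$n" ] && [ -n "$h" ]; do
      hc=${h%"${h#?}"}              # first character of h
      nc=${n%"${n#?}"}              # first character of n
      [ "$hc" = "$nc" ] || break    # mismatch: give up on this start position
      h=${h#?} n=${n#?}
    done
    if [ -z "$n" ]; then            # the whole needle was consumed: match
      echo "$offset"
      return 0
    fi
    [ -n "$hay" ] || return 1       # nothing left to try
    hay=${hay#?}                    # go back and start one character later
    offset=$((offset + 1))
  done
}

find_literal 'ab//\/\\|*cd' '//\/\\|*'    # prints 2
```

Each comparison re-slices the strings, so the cost grows quickly with file size, which is the slowness mentioned above.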
