3

I'm looking to replace characters at specific byte offsets.

Here's what is provided: an input file of plain ASCII text, and an array in a Bash shell script in which each element is a numerical byte-offset value.

The goal: Take the input file, and at each of the byte-offsets, replace the character there with an asterisk.

So essentially the idea I have in mind is to somehow go through the file, byte-by-byte, and if the current byte-offset being read is a match for an element value from the array of offsets, then replace that byte with an asterisk.

This post seems to indicate that the dd command would be a good candidate for this action, but I can't understand how to perform the replacement multiple times on the input file.

Input file looks like this:

00000
00000
00000

The array of offsets looks like this:

offsetsArray=("2" "8" "9" "15")

The output file's desired format looks like this:

0*000
0**00
00*00

Any help you could provide is most appreciated. Thank you!

4
  • How does offset of 7 result in 0**00? Commented Apr 19, 2014 at 20:57
  • I assume you're talking about offset from the beginning of the file. But your example output seems to ignore newline characters at the end of each line. I would expect that output with the array ("2" "8" "9" "15") Commented Apr 19, 2014 at 21:02
  • If you already have a way of doing it once, can't you just do it once for each input? Commented Apr 19, 2014 at 21:04
  • @DigitalTrauma You're correct, I neglected those. I edited the question to reflect your corrected array values. Commented Apr 19, 2014 at 21:35

3 Answers

4

Please check my comment about the newline offset. Assuming this is correct (note I have changed your offset array), then I think this should work for you:

#!/bin/bash

read -r -d ''
offsetsArray=("2" "8" "9" "15")
txt="${REPLY}"
for i in "${offsetsArray[@]}"; do
    txt="${txt:0:$i-1}*${txt:$i}"
done
printf "%s" "$txt"

Explanation:

  • read -d '' reads the whole input (redirected file) in one go into the $REPLY variable. If you have large files, this can run you out of memory.
  • We then loop through the offsets array, one element at a time. We use each index i to grab i-1 characters from the beginning of the string, then insert a * character, then add the remaining bytes from offset i. This is done with bash parameter expansion. Note that while your offsets are one-based, strings use zero-based indexing.
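To make the substring arithmetic concrete, here is a minimal walk-through of a single replacement step on a five-character string. The variable names head and tail are hypothetical and used only for illustration; the expansions are the same ones used in the script above.

```bash
#!/bin/bash
# Illustrative single step; head and tail are hypothetical helper names.
txt="00000"
i=2                      # one-based offset of the character to replace

head="${txt:0:$i-1}"     # the first i-1 characters: "0"
tail="${txt:$i}"         # everything from zero-based index i onward: "000"
txt="${head}*${tail}"    # the character between head and tail is dropped

printf '%s\n' "$txt"     # prints 0*000
```

The character at one-based offset i sits at zero-based index i-1, which is exactly the position that neither head nor tail covers, so the asterisk takes its place.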

In use:

$ ./replacechars.sh < input.txt
0*000
0**00
00*00
$ 

Caveat:

This is not really a very efficient solution, as it causes the string containing the whole file to be copied for every offset. If you have large files and/or a large number of offsets, then this will run slowly. If you need something faster, then another language that allows modification of individual characters in a string would be much better.


6 Comments

Boy, I'm impressed. I'm still struggling to understand 0**00. But +1 for this great piece of work!
Wow, that's like Elven magic! I don't know how that works, but it certainly does. I took this code and plugged it into a script that was already being written. Changed the read line to read -r -d '' < /path/to/input/file, and changed the printf line to printf "%s" "$txt" > /path/to/output/file ... and voilà, works like a charm! Thank you!
@stotrami I am with you. His solution is giving me trauma digitally. ;)
@DigitalTrauma Could you take a minute and explain what's going on in the line txt="${txt:0:10#$i-1}*${txt:10#$i}" please?
The 10# is actually not needed; it's the same as txt="${txt:0:$i-1}*${txt:$i}", which creates a new string by taking the part from 0 to $i-1, then an *, and then the rest from index $i. Reference: bash parameter expansion
3

The usage of dd can be a bit confusing at times, but it's not that hard:

outfile="test.txt"

# create some test data
echo -n 0123456789abcde > "$outfile"

offsetsArray=("2" "7" "8" "13")
for offset in "${offsetsArray[@]}"; do
    dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*'
done

cat "$outfile"

The important part of this example is conv=notrunc; otherwise dd truncates the file to the length of the blocks it seeks over. bs=1 specifies that you want to work with blocks of size 1, and seek specifies the offset, counted in blocks, at which to start writing count blocks.

The above produces 01*3456**9abc*e
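As a hedged sketch, the same loop can be applied to the sample input from the question. It assumes the question's offsets are one-based, so each is reduced by one before being passed to seek, which counts skipped blocks from zero. The scratch file created with mktemp and the result variable are hypothetical names for illustration, and printf is used in place of the here-string to feed exactly one byte.

```bash
#!/bin/bash
# Sketch only: applies the dd loop to the question's sample data.
# Assumption: offsets are one-based, so seek is offset - 1.
outfile="$(mktemp)"                        # hypothetical scratch file
printf '00000\n00000\n00000\n' > "$outfile"

offsetsArray=("2" "8" "9" "15")
for offset in "${offsetsArray[@]}"; do
    # dd reports its status on stderr, so 2>/dev/null keeps the run quiet
    printf '*' | dd bs=1 count=1 seek="$((offset - 1))" conv=notrunc of="$outfile" 2>/dev/null
done

result="$(cat "$outfile")"
printf '%s\n' "$result"
rm -f "$outfile"
```

Because the file is modified in place one byte at a time, nothing has to be held in memory, at the cost of starting one dd process per offset.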

4 Comments

+1 for addressing the use of dd. The explanation is immensely helpful :) Also this works great, thank you!
dd is sometimes referred to as the Swiss Army knife of unix utilities. I prefer to think of it as a Swiss Army knife with a chainsaw attachment. You can very easily do serious damage to data on your disk without even knowing it. That being said, this answer looks good and should be much faster than mine for large files. +1
@DigitalTrauma That's a good analogy, and a wise precaution to keep in mind. Although this option works, I was forced to not use it because I could not figure out how to suppress dd's operations messages from going to stdout. Normally I would redirect to null, but the syntax provided in this example does not appear to allow that. Maybe there's another workaround?
dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*' 2>/dev/null works fine for me... (dd writes messages to stderr, not stdout)
2

With the same offset considerations as @DigitalTrauma's superior solution, here's a GNU awk-based alternative. This assumes your file contains no null bytes.

(IFS=','; awk -F '' -v RS=$'\0' -v OFS=''  -v offsets="${offsetsArray[*]}" \
'BEGIN{split(offsets, o, ",")};{for (k in o)  $o[k]="*"; print}' file)

0*000
0**00
00*00

1 Comment

awk could be nice for certain reasons. Is there a way to include the array via variable name into this awk command?
