Bash shell scripting: How to replace characters at specific byte offsets

Question

I'm looking to replace characters at specific byte offsets.

Here's what is provided: an input file that is simple ASCII text, and an array within a Bash shell script in which each element is a numerical byte-offset value.

The goal: Take the input file, and at each of the byte-offsets, replace the character there with an asterisk.

So essentially the idea I have in mind is to somehow go through the file, byte-by-byte, and if the current byte-offset being read is a match for an element value from the array of offsets, then replace that byte with an asterisk.

This post seems to indicate that the dd command would be a good candidate for this action, but I can't understand how to perform the replacement multiple times on the input file.

Input file looks like this:

00000
00000
00000

The array of offsets looks like this:

offsetsArray=("2" "8" "9" "15")

The output file's desired format looks like this:

0*000
0**00
00*00

Any help you could provide is most appreciated. Thank you!

I assume you're talking about offset from the beginning of the file. But your example output seems to ignore newline characters at the end of each line. I would expect that output with the array ("2" "8" "9" "15")
– Digital Trauma, Commented Apr 19, 2014 at 21:02
If you already have a way of doing it once, can't you just do it once for each input?
– that other guy, Commented Apr 19, 2014 at 21:04
@DigitalTrauma You're correct, I neglected those. I edited the question to reflect your corrected array values.
– stotrami, Commented Apr 19, 2014 at 21:35

Digital Trauma · Accepted Answer · 2014-04-19 22:57:23Z

4

Please check my comment about the newline offset. Assuming this is correct (note I have changed your offset array), then I think this should work for you:

#!/bin/bash

read -r -d ''
offsetsArray=("2" "8" "9" "15")
txt="${REPLY}"
for i in "${offsetsArray[@]}"; do
    txt="${txt:0:$i-1}*${txt:$i}"
done
printf "%s" "$txt"

Explanation:

read -d '' reads the whole input (redirected file) in one go into the $REPLY variable. If you have large files, this can run you out of memory.
We then loop through the offsets array, one element at a time. We use each index i to grab i-1 characters from the beginning of the string, then insert a * character, then add the remaining bytes from offset i. This is done with bash parameter expansion. Note that while your offsets are one-based, bash strings use zero-based indexing.
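
To see the substring expansion in isolation, here is a minimal sketch on a single short string. It is not part of the original answer, and the names s, i and out are made up for illustration:

```bash
#!/bin/bash
# Illustrative only: replace the character at one-based position i in a string.
s="abcdef"
i=3

# ${s:0:i-1} -> the first i-1 characters ("ab")
# ${s:i}     -> everything from zero-based index i onward ("def")
out="${s:0:i-1}*${s:i}"

printf '%s\n' "$out"   # prints: ab*def
```

The offset and length inside ${var:offset:length} are evaluated as arithmetic, which is why i-1 works without an explicit $(( )).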

In use:

$ ./replacechars.sh < input.txt
0*000
0**00
00*00
$

Caveat:

This is not a very efficient solution, as it causes the string containing the whole file to be copied for every offset. If you have large files and/or a large number of offsets, then this will run slowly. If you need something faster, then another language that allows modification of individual characters in a string would be much better.

edited Apr 19, 2014 at 22:57

answered Apr 19, 2014 at 21:07

Digital Trauma

16.1k4 gold badges55 silver badges87 bronze badges


6 Comments

jaypal singh Over a year ago

Boy, I'm impressed. I'm still struggling to understand 0**00. But +1 for this great piece of work!

stotrami Over a year ago

Wow, that's like Elven magic! I don't know how that works, but it certainly does. I took this code and plugged it into a script that was already being written. Changed the read line to read -r -d '' < /path/to/input/file, and changed the printf line to printf "%s" "$txt" > /path/to/output/file ... and voilà, works like a charm! Thank you!

jaypal singh Over a year ago

@stotrami I am with you. His solution is giving me trauma digitally. ;)

stotrami Over a year ago

@DigitalTrauma Could you take a minute and explain what's going on in the line txt="${txt:0:10#$i-1}*${txt:10#$i}" please?

mata Over a year ago

the 10# is actually not needed, it's the same as txt="${txt:0:$i-1}*${txt:$i}" - which creates a new string by taking the part from 0 to $i-1, then an * and then the rest from index $i. reference: bash parameter expansion


mata · Accepted Answer · 2014-04-19 21:16:17Z

3

The usage of dd can be a bit confusing at times, but it's not that hard:

outfile="test.txt"

# create some test data
echo -n 0123456789abcde > "$outfile"

offsetsArray=("2" "7" "8" "13")
for offset in "${offsetsArray[@]}"; do
    dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*'
done

cat "$outfile"

The important part of this example is conv=notrunc; otherwise dd truncates the file at the position it seeks to. bs=1 specifies that you want to work with blocks of size 1, and seek specifies the offset (in blocks) at which to start writing count blocks.

The above produces 01*3456**9abc*e
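
As a hedged sketch of how the same dd call could be applied to the question's data: the helper name star_at and the scratch file are assumptions for illustration, not part of the original answer. Because seek counts from zero while the question's offsets are one-based, the sketch subtracts 1 before calling dd, and it sends dd's status messages (which go to stderr) to /dev/null:

```bash
#!/bin/bash
# Hypothetical helper: overwrite one byte of a file in place.
# $1 = file, $2 = zero-based byte offset (what seek= means when bs=1).
star_at() {
    printf '*' | dd of="$1" bs=1 count=1 seek="$2" conv=notrunc 2>/dev/null
}

demo=$(mktemp)                      # scratch file so no real data is touched
printf '00000\n00000\n00000\n' > "$demo"

# One-based offsets from the question; subtract 1 for dd.
offsetsArray=(2 8 9 15)
for offset in "${offsetsArray[@]}"; do
    star_at "$demo" "$((offset - 1))"
done

result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

On the three-line input from the question this should print 0*000, 0**00 and 00*00. Each dd call touches a single byte, so the cost grows with the number of offsets rather than with the file size.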

answered Apr 19, 2014 at 21:16

mata

69.3k10 gold badges168 silver badges162 bronze badges

4 Comments

stotrami Over a year ago

+1 for addressing the use of dd. The explanation is immensely helpful :) Also this works great, thank you!

Digital Trauma Over a year ago

dd is sometimes referred to as the Swiss Army knife of unix utilities. I prefer to think of it as a Swiss Army knife with a chainsaw attachment. You can very easily do serious damage to data on your disk without even knowing it. That being said, this answer looks good and should be much faster than mine for large files. +1

stotrami Over a year ago

@DigitalTrauma That's a good analogy, and a wise precaution to keep in mind. Although this option works, I was forced to not use it because I could not figure out how to suppress dd's operations messages from going to stdout. Normally I would redirect to null, but the syntax provided in this example does not appear to allow that. Maybe there's another workaround?

mata Over a year ago

dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*' 2>/dev/null works fine for me... (dd writes messages to stderr, not stdout)

iruvar · Accepted Answer · 2014-04-19 21:57:23Z

2

With the same offset considerations as @DigitalTrauma's superior solution, here's a GNU awk-based alternative. This assumes your file contains no null bytes.

(IFS=','; awk -F '' -v RS=$'\0' -v OFS=''  -v offsets="${offsetsArray[*]}" \
'BEGIN{split(offsets, o, ",")};{for (k in o)  $o[k]="*"; print}' file)

0*000
0**00
00*00
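
The part that hands the Bash array to awk can be looked at on its own. The following is a minimal sketch, with joined and count as illustrative names that do not appear in the answer; it only shows the join-and-split round trip, not the character replacement itself:

```bash
#!/bin/bash
offsetsArray=(2 8 9 15)

# "${arr[*]}" joins the elements with the first character of IFS.
# The command substitution runs in a subshell, so the IFS change does not leak.
joined=$(IFS=','; printf '%s' "${offsetsArray[*]}")
printf '%s\n' "$joined"

# awk splits the string back into an awk array; split() returns the element count.
count=$(awk -v offsets="$joined" 'BEGIN { print split(offsets, o, ",") }')
printf '%s\n' "$count"
```

Since awk cannot read Bash arrays directly, passing a delimited string through -v and rebuilding the array with split() is one common way to get the values across.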

edited Apr 19, 2014 at 21:57

answered Apr 19, 2014 at 21:12

iruvar

23.5k7 gold badges58 silver badges83 bronze badges

1 Comment

stotrami Over a year ago

awk could be nice for certain reasons. Is there a way to include the array via variable name into this awk command?
