Bash Text file formatting

Question

I have some files with the following format:

555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
552185022741;01-04-2013 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511965271852;01-04-2013 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511980644500;01-04-2013 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
553186398559;01-04-2013 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
558487839822;01-04-2013 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz

I need to add a 10-digit, zero-padded sequence number at the beginning of each line, remove the 55 prefix from the phone number (which I have done with a simple sed 's/^55//g'), and reformat the date, so the result looks like this:

0000000001;555584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;552185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;5511965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;5511980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;553186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz

I have the date part working separately:

cat file.txt | cut -d\; -f2 | awk '{print $1}' |awk -v OFS="-" -F"-" '{print $3$2$1}'

It works, but I don't know how to put everything together: the sequence, the sed for the prefix, and the date format change. I'm not even sure how to do the sequence part.

Thanks for the help.

jaypal singh · Accepted Answer · 2014-03-07 03:37:04Z

6

awk is one of the best tools out there for text parsing and formatting. Here is one way to meet your requirements:

awk '
BEGIN { FS = OFS = ";" }
{
    printf "%010d;", NR
    $1 = substr($1,3)
    split($2, tmp, /[- ]/)
    $2=tmp[3]tmp[2]tmp[1]" "tmp[4]
}1' file

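We set both the input and output field separators to ;
printf "%010d;" prints the record number NR, zero-padded to 10 digits and followed by ;, without a newline
substr($1,3) drops the first two characters of column 1
split breaks column 2 on - and on the space, so the date can be reassembled as year, month, day, followed by the time
The trailing 1 is an always-true pattern, so awk prints the modified record right after the sequence number.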
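Note that substr($1,3) drops the first two characters whatever they are. If some numbers might not start with 55, a variant along these lines may be safer. This is only a sketch: the function name reformat_records and the second sample record (starting with 44) are made up for illustration.

```shell
# Hypothetical helper; reads records on stdin instead of from "file".
reformat_records() {
    awk '
    BEGIN { FS = OFS = ";" }
    {
        sub(/^55/, "", $1)                    # drop the prefix only if present
        split($2, tmp, /[- ]/)                # tmp: day, month, year, time
        $2 = tmp[3] tmp[2] tmp[1] " " tmp[4]  # YYYYMMDD HH:MM:SS
        printf "%010d;", NR                   # zero-padded record number
    }
    1'                                        # always-true pattern: print record
}

# The second record is invented to show a number without the 55 prefix.
printf '%s\n' \
    '555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz' \
    '441234567890;01-04-2013 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz' |
    reformat_records
# 0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
# 0000000002;441234567890;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
```

Using sub with an anchored regex instead of substr leaves numbers without the prefix untouched.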

Output:

0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz


1 Comment

Age Over a year ago

This one worked perfect! Hats off. Thanks a lot. I will certainly read more about awk.

John1024 · Accepted Answer · 2014-03-07 03:50:29Z

2

If the name of the input file is input, then the following command removes the 55, adds a 10-digit line number, and rearranges the date. With GNU sed:

 nl -nrz -w10 -s\; input | sed -r 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'

If you are on macOS (or another system whose sed lacks the GNU -r option), use -E instead:

 nl -nrz -w10 -s\; input | sed -E 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'

Sample output:

0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz

How it works: nl is a handy *nix utility for adding line numbers. -w10 sets the number width to 10 characters, -nrz right-justifies the numbers and pads them with leading zeros, and -s\; puts a semicolon between the number and the line. (The semicolon is escaped so that the shell passes it to nl instead of treating it as a command separator.)
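To see the nl options in isolation, here is a minimal sketch that numbers three throwaway lines. The wrapper name number_lines is hypothetical; the nl flags are the ones from the answer.

```shell
# Hypothetical wrapper around the nl invocation used above.
# Extra arguments (a file name, more nl options) are passed through.
number_lines() {
    nl -nrz -w10 -s\; "$@"
}

printf '%s\n' a b c | number_lines
# 0000000001;a
# 0000000002;b
# 0000000003;c
```

One thing to keep in mind: by default nl numbers only non-empty lines, so if the input can contain blank lines and they should be counted too, adding -ba is probably what you want.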

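The remaining changes are handled by sed. The sed command s/55// removes the first occurrence of 55 on the line. For the sample data that is the phone-number prefix, but on a line whose number itself contains 55 (for example 0000000055) it would hit the line number instead; anchoring the substitution, as in s/^([0-9]+;)55/\1/, avoids that. The rearrangement of the date is handled by s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/, which captures day, month, and year and emits them in reverse order.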
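As a rough sketch of the anchored form, the pipeline below only touches a 55 that directly follows the 10-digit line number. The function name number_and_reformat is hypothetical, and -v55 (an nl option that sets the starting number) is used here only to reproduce the problematic line number quickly.

```shell
# Hypothetical helper: numbers stdin, then edits only the phone and date fields.
# Extra arguments are handed to nl.
number_and_reformat() {
    nl -nrz -w10 -s\; "$@" |
        sed -E 's/^([0-9]{10};)55/\1/; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
}

# Start numbering at 55 so the line number itself contains "55".
printf '%s\n' '555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz' |
    number_and_reformat -v55
# 0000000055;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
```

With the unanchored s/55// the same input would lose the 55 from the line number and keep it on the phone number.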

edited Mar 7, 2014 at 3:50

answered Mar 7, 2014 at 3:31


2 Comments

Age Over a year ago

Not sure if it's my sed version, but I got an error on this one:

➜ SMS_April_2013 nl -nrz -w10 -s\; SMS.txt | sed -r 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
sed: illegal option -- r
usage: sed script [-Ealn] [-i extension] [file ...]
       sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]

John1024 Over a year ago

@Age That likely means that you are using a Mac. See updated answer.

John B · Accepted Answer · 2014-03-07 04:22:00Z

0

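You could also do this with a plain Bash loop:

i=0
# read splits on whitespace: f1 is "number;date", f2 is "time;remaining fields"
while read -r f1 f2; do
    ((++i))
    IFS=\; read -r n d <<< "$f1"
    n=${n#55}                              # strip the 55 prefix
    d=${d:6:4}${d:3:2}${d:0:2}             # DD-MM-YYYY -> YYYYMMDD
    printf '%010d;%s;%s %s\n' "$i" "$n" "$d" "$f2"
done < file.txt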
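A loop like this can also split on the semicolons directly rather than on whitespace. The following is only a sketch under that assumption; the function and variable names are made up, and it sticks to POSIX parameter expansion so it should run in bash as well as in other sh-compatible shells.

```shell
# Hypothetical helper; reads records on stdin and writes them to stdout.
reformat_loop() {
    i=0
    # With IFS=";" the last variable, rest, keeps fields 3..n and their semicolons.
    while IFS=';' read -r num stamp rest; do
        i=$((i + 1))
        num=${num#55}                 # strip a leading 55, if any
        day=${stamp%% *}              # 01-04-2013
        clock=${stamp#* }             # 00:00:11
        dd=${day%%-*}                 # 01
        yyyy=${day##*-}               # 2013
        mm=${day#*-}; mm=${mm%-*}     # 04
        printf '%010d;%s;%s%s%s %s;%s\n' \
            "$i" "$num" "$yyyy" "$mm" "$dd" "$clock" "$rest"
    done
}

printf '%s\n' '555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz' |
    reformat_loop
# 0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
```

For large files a shell loop will generally be much slower than the awk or sed approaches, so this is mostly useful when the work has to stay inside a script.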

