1

I have a file named 142490.1 with content like this:

^A^A^@^@^@=^@^@=y^B^@e^A^C^@f^B^H¬^\ÂA^Y^A^G^B<81>s
^A^@G@client.1424906160996.30431.DC1.5faa5c2a-c382-40b8-baa8-234a8e6ecd19^@^@^A^F<8b>f^@ø^@y^@^@^AKÃ^F<86>T^@^@^@êõ^A\^@^R304344351^N2047675^@^D77^@^Y^W^B^@
27.99^@^X261449949761^@Ã^O^@<92>^NICHOLSON Baseball     ^V|t -S M L XL XXL(2)^@
15724^@
63862^U^GðV11450^@^B7^@<9a>^A^@^L823196^@¨<99>´°øR^B^@^TBj%2FRZUw*^@^PBoZf8jU*^@^T1032869222^B^@&LH_DefaultDomain_77^@^@^A^@^@H@client.1424906160992.116975.DC1.344073e8-93f6-487c-b343-7923080f07aa^@^@^AKÃ^F<8b>f^@­^@y^@^@^AKÃ^Eò<9f>£^AX^@^T1169755138^N2047935^@^B3.^W^@ð^?^B^@^H0.99^@^X171689807229^B^@rTOPSHOP LEATHER 3 EU 36^B^B^@
45333^B^B^@^F^@^L161103^@ðï°øR^B^B^@^PBosZQlE*^B^B^B^@^@^A^@^@G@client.1424906160976.1295684.DC1.66a6ca77-30ee-4d50-b7ea-4a524eb94af1^@^@^AKÃ^F<8b>f^@¤^@y^@^@^AKÃ^F<89>^O^@^@^@<96><9a>^AT^@^R129569484^N2047935^@^B3^]^V^B^@^F499^853759648^B^@bWILLIS AND^B^B^@
20489^B^B^@^F^@^P-1404420^@<9e>¤´°øR^B^B^@^PBop4ml0*^B^B^B^@^@^A^@^@H@client.1424906160989.104826.DC1.4d58c06a-3526-408a-a48b-8bdc82b94dba^@^@^AKÃ^F<8b>f^@¨^@R^@^@^AKÃ^F<83>¶^@^@^@<9a>·^AX^@^T1048328026^N2045573^@^B0.^W^@^P^B^B^^Að@^@^H6000^@^Z1955 corvette^@ì<8e>´°øR^B^@^PBiZzFm8*^@^PBoO8YKc*^@^@^A^@

I know the file content above looks mostly binary, but it contains some strings that can be read clearly.

If you look at the file content above, you will see a string like this:

@client.1424906160996.30431.DC1.5faa5c2a-c382-40b8-baa8-234a8e6ecd19

In the above string 1424906160996 is a timestamp.

Problem statement:

I need to find all the strings that start with @client and whose timestamp is more than one minute older than the current timestamp.

Say the strings below start with @client and have a timestamp more than one minute older than the current timestamp. After reading the file, the script should print them like this:

@client.1424906161996.3031.DC1.5faaa-c382-40b8-baa8-234a8ed19
@client.1424906162996.3041.DC1.5a5c2a-c382-40b8-baa8-238e6ec9
@client.1424906163996.3043231.DC1.5faa2a-c382-40b8-baa8-23e6ed19
@client.1424906164996.3016731.DC1.5faa5a-c382-40b8-baa8-234ad19

Is there any way to do this with a shell script that reads the above file and prints the strings that start with @client and whose timestamp is older than 1 minute?

I have Ubuntu 12.04 running.

2 Answers

2

You should try something with strings, which keeps only the printable ASCII characters from your file:

strings - 142490.1 |
  awk -F '.' -v timestamp="$(date +%s)" '/^@client/ && $2 < (timestamp - 60)*1000 {print}'

This awk script may be too specific to this example: it looks at the field between the first and second dots and treats it as the timestamp (in milliseconds). If that value is less than the current timestamp minus 60 seconds, it prints the line.
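To see what the field split does, here is a small sketch on one sample string from the question. The function name `older_than_minute` and the fixed "now" values are assumptions made for illustration only; in real use the argument would come from `date +%s`.

```shell
#!/bin/sh
# Hypothetical helper: print @client lines whose timestamp (field 2, in ms)
# is more than 60 seconds older than "now" (epoch seconds, first argument).
older_than_minute() {
  awk -F '.' -v now="$1" '/^@client/ && $2 < (now - 60) * 1000 { print }'
}

sample='@client.1424906160996.30431.DC1.5faa5c2a-c382-40b8-baa8-234a8e6ecd19'

# 1424906300 s is about 139 s after the sample timestamp, so the line prints.
printf '%s\n' "$sample" | older_than_minute 1424906300

# 1424906200 s is only about 39 s after it, so nothing prints.
printf '%s\n' "$sample" | older_than_minute 1424906200
```

Note that the pattern only matches when @client is the first thing on the line; if strings emits a printable byte in front of it, the line is skipped.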

Hope it helps.

EDIT: As noted by Thomas Dickey (I'm new here and don't know how to properly reference your account), you have to use the - flag with strings.

EDIT2: After a few attempts, we reached a working version by adapting another answer from @ThomasDickey:

FILE=1424911080.1
strings - "$FILE" |
  awk -v fileTs="${FILE%.*}000" '/@client/ { ts = $0; sub(/^.*@client\./, "", ts); sub(/\..*$/, "", ts); if (ts - fileTs > 500 || ts - fileTs < -500) print $0 }'

Finally, to get the percentage of @client lines whose timestamp differs from the file timestamp by more than 500 ms:

FILE=1424911080.1
tot=$(strings - "$FILE" | grep -c '@client')
old=$(strings - "$FILE" |
  awk -v fileTs="${FILE%.*}000" '/@client/ { ts = $0; sub(/^.*@client\./, "", ts); sub(/\..*$/, "", ts); if (ts - fileTs > 500 || ts - fileTs < -500) print $0 }' |
  wc -l)

echo "old : $(( old * 100 / tot ))%"
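
The same percentage can probably be computed in a single pass, which avoids running strings twice and guards against a file with no @client lines (where the shell arithmetic above would divide by zero). This is only a sketch: the function name `percent_off` and the "no @client lines" message are assumptions, not part of the working version above.

```shell
#!/bin/sh
# Hypothetical helper: read strings output on stdin, take the file timestamp
# in milliseconds as the first argument, and print the percentage of @client
# lines whose timestamp differs from it by more than 500 ms.
percent_off() {
  awk -v fileTs="$1" '
    /@client/ {
      tot++
      ts = $0
      sub(/^.*@client\./, "", ts)   # drop everything through "@client."
      sub(/\..*$/, "", ts)          # drop everything from the next dot on
      d = ts - fileTs
      if (d > 500 || d < -500) old++
    }
    END {
      if (tot == 0) { print "no @client lines"; exit }
      printf "old : %d%%\n", old * 100 / tot
    }'
}

# Usage, with the file name from the answer:
# FILE=1424911080.1
# strings - "$FILE" | percent_off "${FILE%.*}000"
```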

13 Comments

If you don't use the "-" option, strings assumes it is an executable file, and will omit parts.
@CorentinPeuvrel I tried this, but it also printed a bunch of stuff that doesn't start with @client. Any idea what could be wrong?
Try replacing the awk script with /^@client/ && $2 < (timestamp - 60)*1000 {print}. I hadn't checked that the line starts with @client.
@CorentinPeuvrel Can you edit that as well for the sake of completeness?
As I said, "strings" is the simplest way - as a caveat it sometimes adds trailing junk if the strings are not null-terminated.
2

The simplest way to extract the data is by using the strings utility, telling it to scan the whole file, e.g.,

strings - inputfile | egrep '@client(\.[[:xdigit:]]+)+(-[[:xdigit:]]+)+'

but as noted in the other example, there is still the timestamp to consider. That can be done by piping the raw data through awk, e.g.,

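awk '/@client/ { ts = $0; sub(/^.*@client\./, "", ts); sub(/\..*$/, "", ts); if ( ts + 0 >= '$TS' - 60000 && ts + 0 < '$TS' ) { print $0; } }'

where $TS is the value that you are looking for (a range makes more sense than equality). The timestamps are in milliseconds, so 60000 is one minute, and "ts + 0" forces a numeric rather than a string comparison.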

Actually the egrep is redundant (awk/mawk/gawk can do character classes unless you're using the obsolete version from Ubuntu). But it helps to break the process into stages to check that they work. In the awk script,

  • it starts with a simple pattern /@client/
  • I'm not certain strings will return this at the beginning of a line, but then
  • assign the line contents $0 to a variable which I can modify,
  • trim off the part through "@client."
  • trim off the part beginning with "." (is that milliseconds?)
  • compare the value to the $TS variable (passed in as part of the script, though another recent posting reminds us that awk's "-v" option would work too).
  • if it passes the comparison, print the original line
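
The steps above can be sketched as a small function. This is an illustration under assumptions rather than the exact script from this answer: the name `in_last_minute` is hypothetical, the value is passed with "-v" instead of direct substitution, and the window is taken to be 60000 ms because the timestamps appear to be in milliseconds.

```shell
#!/bin/sh
# Hypothetical helper: print lines containing an @client string whose
# millisecond timestamp falls in the window [TS - 60000, TS), where TS is
# the first argument (also in milliseconds).
in_last_minute() {
  awk -v TS="$1" '
    /@client/ {
      ts = $0                       # work on a copy of the line
      sub(/^.*@client\./, "", ts)   # trim through "@client."
      sub(/\..*$/, "", ts)          # trim from the next "." onward
      ts += 0                       # force a numeric comparison below
      if (ts >= TS - 60000 && ts < TS) print $0
    }'
}

# Usage (assumes GNU date; TS is the current time in milliseconds):
# TS=$(( $(date +%s) * 1000 ))
# strings - inputfile | in_last_minute "$TS"
```

To select strings older than one minute instead, the condition would become `ts < TS - 60000`.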

As an aside, I'm aware that awk has a "-v" option, but since I generally build up scripts starting with the simplest tool that works (such as sed), I do direct substitution by habit, saving "-v" for scripts passed as separate files. I did (long ago) run into an awk which did not support "-v" (see changelog), but we can take for granted that it is there.

5 Comments

I tried your suggestion but I didn't get any output on the console. In your suggestion, how can I do the check for the timestamp as well?
The timestamp check is in the second chunk of script -- I'll add some text to help.
How do I pass the value of the $TS variable?
One more thing: it looks like I described what I need incorrectly. I need to print the @client strings for which the difference between the current timestamp and the timestamp in the string is greater than 1 minute.
In the example I gave, $TS would be an ordinary shell variable, e.g., TS=1424906161996 (awk user-defined variables do not begin with "$").
