2

I want to extract part of a binary file: bytes #480161397 through #480170447 inclusive (9051 bytes in total).

I used cut -b and expected trunk1.gz to be 9051 bytes, but the resulting file is much larger:

$ wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2016-07/segments/1454701152097.59/warc/CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz

$ cut -b480161397-480170447 CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz >trunk1.gz

$ echo $((480170447-480161397+1))
9051

$ ls -l trunk1.gz
-rw-r--r--  1 david  staff     3400324 Sep  8 10:28 trunk1.gz

What is wrong?

4
  • What do you get if you do a wc -c trunk1.gz? Commented Sep 8, 2016 at 8:48
  • 3400324 trunk1.gz Commented Sep 8, 2016 at 8:54
  • This could help stackoverflow.com/questions/1423346/… Commented Sep 8, 2016 at 8:55
  • That means your cut is not doing what you thought it would. I tried cut -b with some .gz files that I had as well, and I also got files larger than the number of bytes specified. With ordinary text files this is explained by the fact that cut works on columns: cut -b picks out the corresponding bytes from each line, hence the large file sizes. In other words, cut -b is probably not what you need here. Commented Sep 8, 2016 at 8:57

2 Answers

2

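cut -bN-M copies bytes N through M from every line of the input, not from the file as a whole. A binary file contains many newline bytes, so cut treats it as many "lines" and writes an output line (at least a newline) for each of them.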

Example:

$ cut -b4-7 <<END
0123456789
abcdefghij
ABCDEFGHIJ
END

Output:

3456
defg
DEFG

Consider using dd for your purposes.
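To make the difference concrete, here is a minimal sketch that reuses the three-line sample above and compares how many bytes each tool writes. The temporary file and the variable names (tmp, cut_size, dd_size) are illustrative only, and it assumes a dd that accepts bs=, skip= and count= (as the GNU and BSD versions do).

```shell
# Three lines of 10 bytes each, plus 3 newlines = 33 bytes in total.
tmp=$(mktemp)
printf '0123456789\nabcdefghij\nABCDEFGHIJ\n' > "$tmp"

# cut applies the range to every line: (4 bytes + newline) * 3 lines.
cut_size=$(cut -b4-7 "$tmp" | wc -c | tr -d ' ')

# dd applies the range to the file as a whole: skip 3 bytes, copy 4.
dd_size=$(dd if="$tmp" bs=1 skip=3 count=4 2>/dev/null | wc -c | tr -d ' ')

echo "cut wrote $cut_size bytes, dd wrote $dd_size bytes"
rm -f "$tmp"
```

On this sample cut writes 15 bytes while dd writes 4, which is the same effect as in the question, only on a much smaller scale.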


1

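If you work with binary data, I advise you to use the dd command, reading from the original file rather than from the output of cut:

dd if=CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz bs=1 skip=480161397 count=9051 of=output.bin

bs is the block size; it is set to 1 byte here, so skip and count are both measured in bytes. Note that skip is the number of bytes to pass over before copying starts, so skip=480161397 begins at zero-based offset 480161397. If you number bytes from 1, as cut does, subtract 1 from skip.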
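The offset arithmetic is easy to get wrong by one, so it may help to check it on a tiny file first. The sketch below is only an illustration: the sample file and the variables first, last and range are hypothetical, and the byte numbers are assumed to be 1-based, as with cut.

```shell
# Hypothetical 10-byte sample standing in for the real archive.
sample=$(mktemp)
printf 'abcdefghij' > "$sample"

first=4   # 1-based number of the first wanted byte ("d")
last=7    # 1-based number of the last wanted byte ("g")

# With bs=1, skip is the number of bytes to pass over (first - 1)
# and count is the inclusive length of the range.
range=$(dd if="$sample" bs=1 skip=$((first - 1)) count=$((last - first + 1)) 2>/dev/null)

echo "$range"   # prints: defg
rm -f "$sample"
```

The same two expressions, with first=480161397 and last=480170447, give skip=480161396 and count=9051 for the file in the question.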

1 Comment

dd if=CC-MAIN-20160205193912-00264-ip-10-236-182-209.ec2.internal.warc.gz bs=1 skip=$((480161397-1)) count=9051 of=trunk1.gz
