I am trying to compress a CSV file into Snappy format using a Python script and the python-snappy module. This is my code so far:

import snappy
d = snappy.compress("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv")
with open("compressed_file.snappy", 'w') as snappy_data:
     snappy_data.write(d)
snappy_data.close()

This code does create a snappy file, but the file contains only a string: "C:\Users\my_user\Desktop\Test\Test_file.csv"

So I am a bit lost on how to get my CSV compressed. I did manage it from the Windows command prompt with this command:

python -m snappy -c Test_file.csv compressed_file.snappy

But I need this to happen as part of a Python script, so running it from cmd does not work for me.

Thank you very much, Álvaro

1 Answer

You are compressing the path string itself rather than the file's contents: the compress function takes raw data, not a file name.

There are two ways to compress data with snappy: as a single block, or as streaming (framed) data.

This function compresses a file using the framed method:

import snappy

def snappy_compress(path):
    path_to_store = path + '.snappy'

    # Open both files in binary mode; the with blocks close them automatically.
    with open(path, 'rb') as in_file:
        with open(path_to_store, 'wb') as out_file:
            snappy.stream_compress(in_file, out_file)

    return path_to_store

snappy_compress('testfile.csv')

You can decompress from the command line using:

python -m snappy -d testfile.csv.snappy testfile_decompressed.csv

Note that the framing currently used by python-snappy is not compatible with the framing used by Hadoop.