26

Can anyone provide an example of how to create a zip file from a CSV file using Python and the pandas package? Thank you.

4 Answers

37

Use

df.to_csv('my_file.gz', compression='gzip')

From the docs:

compression : string, optional a string representing the compression to use in the output file, allowed values are ‘gzip’, ‘bz2’, ‘xz’, only used when the first argument is a filename

See discussion of support of zip files here.
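A minimal sketch to check what this call actually writes. The sample DataFrame and the temporary path are assumptions made for illustration. The check relies on the fact that every gzip stream starts with the bytes 0x1f 0x8b, so the output is a gzip file rather than a ZIP archive:

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_file.csv.gz"
    df.to_csv(path, index=False, compression="gzip")

    # Every gzip stream begins with the magic number 0x1f 0x8b.
    magic = path.read_bytes()[:2]

    # The standard-library gzip module can decompress the file directly.
    with gzip.open(path, "rt") as fh:
        text = fh.read()

    # pandas infers the compression from the .gz extension when reading.
    restored = pd.read_csv(path)

print(magic, restored.shape)
```

Reading the file back with pd.read_csv needs no compression argument here, because the default is to infer it from the file extension.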


1 Comment

Is this zip or gzip?

22

In the pandas to_csv() method, besides the compression type (gzip, zip, etc.), you can also specify the name of the file inside the archive. Pass a dict with the necessary parameters as the compression argument:

compression_opts = dict(method='zip',
                        archive_name='out.csv')  
df.to_csv('out.zip', compression=compression_opts)

In the example above, the first argument of the to_csv method defines the name of the [ZIP] archive file, the method key of the dict defines [ZIP] compression type and the archive_name key of the dict defines the name of the [CSV] file inside the archive file.

Result:

├─ out.zip
│  └─ out.csv

See details in to_csv() pandas docs
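To confirm the layout shown above, here is a small sketch that writes the archive into a temporary directory and lists its members with the standard-library zipfile module. The sample DataFrame is an assumption, and the dict form of compression assumes pandas 1.0 or later:

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "out.zip"
    compression_opts = dict(method="zip", archive_name="out.csv")
    df.to_csv(zip_path, index=False, compression=compression_opts)

    # List the files stored inside the archive.
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.namelist()

    # A zip archive holding a single CSV can be read back directly.
    restored = pd.read_csv(zip_path)

print(members)
```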


3

In response to Stefan's answer: use the '.csv.gz' extension, so that the gzip file decompresses to a file with a .csv extension

df.to_csv('my_file.csv.gz', compression='gzip')

Hope that helps


2

The pandas to_csv compression has a security weakness: on a Linux machine, it leaves the absolute path of the file in the zip archive. In addition, you might want to save the file at the top level of the archive. The following function addresses both issues by using zipfile directly. On top of that, it doesn't suffer from the pickle protocol change (4 to 5).

from pathlib import Path
import zipfile

def save_compressed_df(df, dirPath, fileName):
    """Save a Pandas dataframe as a zipped .csv file.

    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        Input dataframe.
    dirPath : str or pathlib.PosixPath
        Parent directory of the zipped file.
    fileName : str
        File name without extension.
    """

    dirPath = Path(dirPath)
    path_zip = dirPath / f'{fileName}.csv.zip'
    txt = df.to_csv(index=False)
    with zipfile.ZipFile(path_zip, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f'{fileName}.csv', txt)
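A usage sketch for the function above, assuming a small sample DataFrame and a temporary directory. The function is repeated in condensed form so the snippet runs on its own. Returning the archive path is an addition made only for this example:

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd


def save_compressed_df(df, dirPath, fileName):
    """Condensed copy of the function above, so this snippet is self-contained."""
    path_zip = Path(dirPath) / f"{fileName}.csv.zip"
    with zipfile.ZipFile(path_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # writestr stores the data under exactly the member name given here.
        zf.writestr(f"{fileName}.csv", df.to_csv(index=False))
    return path_zip  # returned only for this example; the original returns None


# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    path_zip = save_compressed_df(df, tmp, "sample")
    with zipfile.ZipFile(path_zip) as zf:
        members = zf.namelist()
        csv_text = zf.read("sample.csv").decode("utf-8")

print(members)
```

Because the member name is passed explicitly to writestr, the archive holds a single top-level entry with no directory component.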

1 Comment

As written in another answer and in the help page of df.to_csv, the compression argument accepts a dictionary which can specify the archive_name inside the zip archive. This works only for zip archives, though. For gzip you have to use df.to_csv("/tmp/df.csv.gz", compression="gzip").
