Python/Pandas create zip file from csv

Question

Is anyone can provide example how to create zip file from csv file using Python/Pandas package? Thank you

Stefan · Accepted Answer · 2018-12-04 13:12:59Z

37

Use

df.to_csv('my_file.gz', compression='gzip')

From the docs:

compression : string, optional a string representing the compression to use in the output file, allowed values are ‘gzip’, ‘bz2’, ‘xz’, only used when the first argument is a filename

See discussion of support of zip files here.

edited Dec 4, 2018 at 13:12

answered Jun 10, 2016 at 17:38

Stefan

43.1k13 gold badges80 silver badges84 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Felix Over a year ago

Is this zip or gzip?

Anton Ermakov · Accepted Answer · 2020-11-28 20:28:18Z

22

In the to_csv() method of pandas, besides the compression type (gz, zip etc) you can specify the archive file name - just pass the dict with necessary params as the compression parameter:

compression_opts = dict(method='zip',
                        archive_name='out.csv')  
df.to_csv('out.zip', compression=compression_opts)

In the example above, the first argument of the to_csv method defines the name of the [ZIP] archive file, the method key of the dict defines [ZIP] compression type and the archive_name key of the dict defines the name of the [CSV] file inside the archive file.

Result:

├─ out.zip
│  └─ out.csv

See details in to_csv() pandas docs

edited Nov 28, 2020 at 20:28

answered Apr 7, 2020 at 23:00

Anton Ermakov

4694 silver badges15 bronze badges

Comments

Yi Xiang Chong · Accepted Answer · 2019-10-21 02:33:47Z

3

In response to Stefan's answer, add '.csv.gz' for the zip csv file to work

df.to_csv('my_file.csv.gz', compression='gzip')

Hope that helps

answered Oct 21, 2019 at 2:33

Yi Xiang Chong

77411 silver badges9 bronze badges

Comments

Miladiouss · Accepted Answer · 2021-06-24 00:39:27Z

2

The Pandas to_csv compression has some security vulnerabilities where it leaves the absolute path of the file in the zip archive on Linux machine. Not to mention one might want to save a file in the highest level of a zipped file. The following function addresses this issue by using zipfile. On top of that, it doesn't suffer from pickle protocol change (4 to 5).

from pathlib import Path
import zipfile

def save_compressed_df(df, dirPath, fileName):
    """Save a Pandas dataframe as a zipped .csv file.

    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        Input dataframe.
    dirPath : str or pathlib.PosixPath
        Parent directory of the zipped file.
    fileName : str
        File name without extension.
    """

    dirPath = Path(dirPath)
    path_zip = dirPath / f'{fileName}.csv.zip'
    txt = df.to_csv(index=False)
    with zipfile.ZipFile(path_zip, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f'{fileName}.csv', txt)

answered Jun 24, 2021 at 0:39

Miladiouss

4,8201 gold badge32 silver badges38 bronze badges

1 Comment

Paul Rougieux Over a year ago

As written in another answer and in the help page of df.to_csv, the compression argument accepts a dictionary which can specifiy the archive_name inside the zip archive. This works only for zip archives though, for gzip you have to use df.to_csv("/tmp/df.csv.gz", compression="gzip").

Collectives™ on Stack Overflow

Python/Pandas create zip file from csv

4 Answers 4

1 Comment

Comments

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related