
The following is my code. It works fine, and I get an output file with a pipe as the delimiter. However, I do not want a new file to be generated; rather, I would like the existing file to be rewritten with a pipe delimiter instead of a comma. I appreciate your inputs. I am new to Python and learning it on the go.

import csv

with open(dst1, encoding='utf-8', errors='ignore') as input_file:
    with open(dst2, 'w', encoding='utf-8', errors='ignore', newline='') as output_file:
        reader = csv.DictReader(input_file, delimiter=',')
        writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
2 Comments

  • Well, if everything fits in memory, just keep the data and rewrite it afterwards. If not, just use a temp file. Commented Sep 4, 2019 at 17:36
  • @snakecharmerb: Usually you'd do it the other way around; write a new file, then atomically replace the original file with the new file only when the new file has been completely written. Commented Sep 4, 2019 at 17:47

3 Answers

2

The only truly safe way to do this is to write to a new file, then atomically replace the old file with the new file. Any other solution risks data loss/corruption on power loss. The simple approach is to use the tempfile module to make a temporary file in the same directory (so atomic replace will work):

import csv
import os
import os.path
import tempfile

with open(dst1, encoding='utf-8', errors='ignore', newline='') as input_file, \
     tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', newline='',
                                 dir=os.path.dirname(dst1), delete=False) as tf:
    try:
        reader = csv.DictReader(input_file)
        writer = csv.DictWriter(tf, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
    except:
        # On error, remove temporary before reraising exception
        os.remove(tf.name)
        raise
    else:
        # The else block is optional: flush + fsync makes extra sure the
        # data is synced to disk, reducing the risk that the rename is
        # recorded before the data itself reaches the disk:
        tf.flush()
        os.fsync(tf.fileno())

# Atomically replace the original file with the temporary now that the
# with block has exited and the data is fully written
try:
    os.replace(tf.name, dst1)
except:
    # On error, remove temporary before reraising exception
    os.remove(tf.name)
    raise

0

Since you are simply replacing one single-character delimiter with another, the file size and the positions of all characters not being replaced do not change. That makes this a perfect scenario for opening the file in r+ mode, which lets you write the processed content back to the very same file you are reading, so no temporary file is ever needed:

import csv

with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)

EDIT: Please read @ShadowRanger's comment for limitations of this approach.
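One mitigation suggested in the comments below is to truncate the file after writing, so that if the new data ends up shorter than the original (see the quoting discussion there), no stale bytes are left at the end. A minimal sketch of that variant; it still does not protect against a crash partway through:

import csv

with open(dst, encoding='utf-8', errors='ignore') as input_file, \
     open(dst, 'r+', encoding='utf-8', errors='ignore', newline='') as output_file:
    reader = csv.DictReader(input_file, delimiter=',')
    writer = csv.DictWriter(output_file, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
    # Cut the file off at the current write position, discarding any
    # leftover old content that the (possibly shorter) new data did
    # not overwrite.
    output_file.truncate()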

14 Comments

Actually, there's no guarantee the file size won't change. The default quoting rule for the csv module is csv.QUOTE_MINIMAL, which only quotes fields if they contain the delimiter, quote character or line terminator; if you change the delimiter from , to |, fields that previously required quoting due to embedded commas won't be quoted if they don't contain | (see the sketch at the end of these comments). And if the script is killed partway through (for whatever reason: power loss, program crash, user hits Ctrl-C), you'll end up with a mix of new and old data.
Good point. I'll leave my answer here still just in case the OP's actual CSV files don't involve any quoted fields and just want something minimal. But I agree that this is not a robust solution in general.
Note: You could probably fix the file size issue (though not the problems with power loss/crash/Ctrl-C) by adding output_file.truncate() after the writerows call. Leaves the (relatively unlikely) possibility that the new file data is so much larger that it overwrites part of the file before you get around to buffering the data from the file, but at least it doesn't risk trailing garbage.
Both the solutions worked. However, I am inclined towards using ShadowRanger's solution. Thank you for the help; I appreciate your time.
Hi again, on the same note: one of the columns in my CSV file has data with | embedded in it. How shall I remove it? Thank you!!
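To make the quoting behavior discussed in these comments concrete, here is a minimal, self-contained sketch (the sample data is hypothetical):

import csv
import io

# One field with an embedded comma, one with an embedded pipe.
rows = [{'name': 'Smith, John', 'note': 'a|b'}]

# With the default comma delimiter, QUOTE_MINIMAL quotes only the
# field containing a comma.
buf = io.StringIO()
csv.DictWriter(buf, ['name', 'note']).writerows(rows)
print(buf.getvalue())  # "Smith, John",a|b

# With a pipe delimiter, the comma field is no longer quoted (so the
# row length changes), while the field containing '|' is quoted instead;
# embedded pipes are escaped automatically rather than needing removal.
buf = io.StringIO()
csv.DictWriter(buf, ['name', 'note'], delimiter='|').writerows(rows)
print(buf.getvalue())  # Smith, John|"a|b"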
0

I'm not totally sure, but if the file is not too big, you can load the file into pandas using read_csv and then save it with whatever delimiter you like using the to_csv function. For example:

import pandas as pd

# Note: index=False keeps pandas from writing its row index as an extra column.
data = pd.read_csv(input_file, encoding='utf-8')
data.to_csv(output_file, sep='|', encoding='utf-8', index=False)

Hope this helps!!

2 Comments

This doesn't replace the original file... And even if you change it to do so by passing input_file to to_csv as well, it does risk data corruption (since it will be rewriting the file in place by truncating it, then writing out the new data, and a crash partway through will lose data). Beyond that, if the OP isn't already using pandas, adding it as a dependency is a pretty heavyweight solution.
Yeah, I do agree with you. But I think it is a neat solution. Thanks for bringing it to my notice
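For completeness, if one did want the pandas route without the in-place overwrite risk described above, it could be combined with the write-then-os.replace pattern from the accepted answer. A hedged sketch, assuming dst1 holds the path of the CSV being converted:

import os
import tempfile

import pandas as pd

data = pd.read_csv(dst1, encoding='utf-8')

# Write to a temporary file in the same directory, then atomically
# swap it in, as in the accepted answer.
fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dst1), suffix='.csv')
os.close(fd)  # pandas reopens the path itself
try:
    data.to_csv(tmp_path, sep='|', encoding='utf-8', index=False)
    os.replace(tmp_path, dst1)
except:
    # On error, remove the temporary before reraising
    os.remove(tmp_path)
    raise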
