
I have a .txt file with some data that I want to clean and export as CSV, but the format is messy. The lines in the file look like this:

[email protected]:specialcode | Status - 2022-11-25

[email protected]:anothercode | Status - 2023-08-15

[email protected]:codeworcd | Status - 2036-06-19

and so on.

I want to convert each line to:

[email protected], specialcode, Status, 2022-11-25

[email protected], anothercode, Status, 2023-08-15

[email protected], codeworcd, Status, 2036-06-19

So that I can save the file as CSV.

How can I approach this? I could loop over the lines and split each one with split(':'), but the fields are separated by different characters (a colon, a pipe and a hyphen), so a single split isn't enough.

Thanks

2 Answers

Based on your sample, try the following:

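import csv
import re

# newline="" stops csv.writer from emitting blank rows on Windows
with open("file.txt") as fi, open("output.csv", "w", newline="") as fo:
    writer = csv.writer(fo)
    for line in fi:
        if not line.strip():
            continue  # skip blank lines
        # split on ':' or on a '|' or '-' that has a single space on each side
        fields = re.split(r':| [|-] ', line.rstrip())
        writer.writerow(fields)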

Result:

[email protected],specialcode,Status,2022-11-25
[email protected],anothercode,Status,2023-08-15
[email protected],codeworcd,Status,2036-06-19
  • It assumes the input filename is "file.txt" and the output filename is "output.csv".
  • The delimiter pattern is :| [|-] . It splits each line on either a colon, or a pipe or hyphen that has a single space on each side. This relies on the pipe and the separating hyphen being surrounded by spaces, as in your sample; the hyphens inside the date have no spaces around them, so the date stays intact.
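If you also want to catch lines that don't follow the expected layout, a stricter variant is to match the whole line with re.fullmatch and named groups instead of splitting it. This is a different technique from the answer above, sketched under the assumption that every valid line looks exactly like "email:code | status - YYYY-MM-DD". The function name convert, the choice to collect bad lines rather than raise, and the example.com addresses (stand-ins for the obfuscated ones in the question) are all illustrative.

```python
import csv
import io
import re

# Assumed layout: email:code | status - YYYY-MM-DD
# (the email contains no ':', the code contains no '|').
LINE_RE = re.compile(
    r"(?P<email>[^:]+):(?P<code>[^|]+?)\s*\|\s*"
    r"(?P<status>.+?)\s+-\s+(?P<date>\d{4}-\d{2}-\d{2})"
)


def convert(src, dst):
    """Write one CSV row per matching line of src; return the lines that didn't match."""
    writer = csv.writer(dst)
    bad = []
    for lineno, line in enumerate(src, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = LINE_RE.fullmatch(line)
        if m is None:
            bad.append((lineno, line))  # remember malformed lines for later review
            continue
        writer.writerow([m["email"], m["code"], m["status"], m["date"]])
    return bad


if __name__ == "__main__":
    sample = io.StringIO(
        "a@example.com:specialcode | Status - 2022-11-25\n"
        "b@example.com:anothercode | Status - 2023-08-15\n"
        "this line is malformed\n"
    )
    out = io.StringIO()
    bad = convert(sample, out)
    print(out.getvalue(), end="")
    print("skipped:", bad)
```

With real files you would pass open("file.txt") as src and open("output.csv", "w", newline="") as dst. The trade-off is that the pattern has to be kept in sync with the data; in exchange, a line with a missing field is reported instead of silently producing a short row.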

Here is one way to do it without regex, using pandas.

This assumes the text file is on disk and can be read with pandas' read_csv.

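import pandas as pd

# read in the text file and name the columns, assuming there is exactly one '|' per line
df = pd.read_csv(r'csv.csv', sep='|', header=None, names=['col1', 'col2'])

# split col1 on the colon
df[['email', 'code']] = df['col1'].str.split(':', n=1, expand=True)

# split col2 on the first hyphen only, so the hyphens inside the date are kept
df[['status', 'date']] = df['col2'].str.split('-', n=1, expand=True)

# strip the spaces left over around the '|' and '-' separators
for col in ['email', 'code', 'status', 'date']:
    df[col] = df[col].str.strip()

# drop the read-in columns
df = df.drop(columns=['col1', 'col2'])

# write to csv without the DataFrame index
df.to_csv(r'csvout.csv', index=False)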


    email                 code          status  date
0   [email protected]    specialcode   Status  2022-11-25
1   [email protected] anothercode   Status  2023-08-15
2   [email protected]    codeworcd     Status  2036-06-19
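The same split-in-stages idea also works with only the standard library. This sketch splits on the first '|', then on the first ':' and the first '-', and strips the surrounding spaces; it assumes the status itself never contains a hyphen. The function names split_line and convert are hypothetical, and the example.com address stands in for the obfuscated ones above.

```python
import csv


def split_line(line):
    """Split 'email:code | status - date' into [email, code, status, date]."""
    left, _, right = line.partition("|")    # 'email:code ' and ' status - date'
    email, _, code = left.partition(":")    # first colon only
    status, _, date = right.partition("-")  # first hyphen only; the date keeps its own hyphens
    return [part.strip() for part in (email, code, status, date)]


def convert(in_path, out_path):
    """Write one CSV row per non-blank line of in_path."""
    with open(in_path) as fi, open(out_path, "w", newline="") as fo:
        writer = csv.writer(fo)
        for line in fi:
            if line.strip():  # skip blank lines
                writer.writerow(split_line(line))


if __name__ == "__main__":
    print(split_line("a@example.com:specialcode | Status - 2022-11-25"))
```

Calling convert("file.txt", "output.csv") would produce the same rows as the pandas version, without the extra dependency.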
