I'm new to Python. I have a text file that contains data like this:

0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue, 05 April 03:54:02 
0: 480x640 3 persons, 1 cat, 1 laptop, 1 clock, 1: 480x640 4 persons, 2 chairs, Done. date (0.587s) Tue, 05 April 03:54:05 
0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, Done. date (0.582s) Tue, 05 April 03:54:07 

I want to convert it into a pandas DataFrame using multiple delimiters.

I tried this code:

import pandas as pd

student_csv = pd.read_csv('output.txt', names=['a', 'b','date','status'], sep='[0: 480x640, 1: 480x640 , date]')

student_csv.to_csv('txttocsv.csv', index = None)

How can I convert it into a pandas DataFrame like this?

        a                  b                        c

2 persons    2 persons, Done    Tue, 05 April 03:54:02

How do I convert the text file into a DataFrame?

  • I ran this code on the text file and it throws an error: "Expected 67 fields in line 2, saw 73. Error could possibly be due to quotes being ignored when a multi-char delimiter is used." This means that the example text file is not being read correctly. Please provide a correct dataframe or a minimal reproducible example (stackoverflow.com/help/minimal-reproducible-example). Commented Apr 5, 2022 at 11:13
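
A likely reason for the error quoted in the comment above, sketched with the standard `re` module rather than pandas: a `sep` longer than one character (other than `'\s+'`) is interpreted as a regular expression, and square brackets in a regex define a character class. The pattern from the question therefore matches any single listed character (a digit, a space, a comma, a letter of "date") instead of the whole phrases. The alternation pattern below is an illustrative assumption, not something taken from the question.

```python
import re

# First sample line from the question
line = ("0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, "
        "Done. date (0.635s) Tue, 05 April 03:54:02")

# Separator from the question: the brackets make it a character class,
# so the line is cut at every single character contained in the set.
char_class = re.compile(r"[0: 480x640, 1: 480x640 , date]")
class_fields = char_class.split(line)

# Illustrative alternative (an assumption): alternation with | matches
# the whole "N: WxH" prefix or the whole "date (...)" segment.
alternation = re.compile(r"\d+: \d+x\d+|date \(.*?\)")
alt_fields = alternation.split(line)

print(len(class_fields))  # many more fields than the four column names
print(alt_fields)         # four pieces, the first one empty
```

Because each line contains a different number of those characters, the number of fields can vary from line to line, which is consistent with the "Expected 67 fields in line 2, saw 73" message.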

2 Answers

It's hard to know exactly what your rules for splitting are, but you can use a regex as the delimiter.

Here is a working example that splits the lists and the date into columns, but you'll probably have to tweak it to match your exact rules:

df = pd.read_csv('output.txt', sep=r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)',
                 header=None, engine='python', names=(None, 'a', 'b', 'date')).iloc[:, 1:]

output:

                                      a                     b                    date
0             2 persons, 1 cat, 1 clock    2 persons, 1 chair  Tue, 05 April 03:54:02
1   3 persons, 1 cat, 1 laptop, 1 clock   4 persons, 2 chairs  Tue, 05 April 03:54:05
2                    3 persons, 1 chair   4 persons, 2 chairs  Tue, 05 April 03:54:07
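
To see why the call above names a throwaway first column and then drops it with `.iloc[:, 1:]`, it can help to apply the same separator to a single line with the standard `re` module. This is only a sketch of the splitting step, assuming each stripped line is split on the regex; the variable names are illustrative.

```python
import re

# Same separator as in the read_csv call above
SEP = re.compile(r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)')

line = ("0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, "
        "Done. date (0.635s) Tue, 05 April 03:54:02")

fields = SEP.split(line.strip())

# The line begins with a separator ("0: 480x640"), so the first field is
# an empty string; that is the column the answer discards.
print(fields)

# The remaining fields keep a leading space, so strip them if needed.
a, b, date = (field.strip() for field in fields[1:])
print(a, b, date, sep=" | ")
```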

1 Comment

df.to_json(), check the docs for the exact format you want
1

You can use | in the sep argument to specify multiple delimiters. Since sep is then a regular expression, pass engine='python':

df = pd.read_csv('data.txt', sep=r'0: 480x640|1: 480x640|date \(.*\)',
                 engine='python', names=('None', 'a', 'b', 'c')).drop('None', axis=1)
print(df)

                                        a                             b  \
0             2 persons, 1 cat, 1 clock,     2 persons, 1 chair, Done.
1   3 persons, 1 cat, 1 laptop, 1 clock,    4 persons, 2 chairs, Done.
2                    3 persons, 1 chair,    4 persons, 2 chairs, Done.

                     c
0  Tue, 05 April 03:54:02
1  Tue, 05 April 03:54:05
2  Tue, 05 April 03:54:07
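
The columns above still carry the trailing commas and the `Done.` marker. One way to tidy them, sketched here with the standard `re` module on the raw lines rather than on the DataFrame, is to split with the same pattern and then strip the leftovers from each field. `clean_field` and `parse_line` are hypothetical helpers, and the cleanup rules are an assumption about the desired output.

```python
import re

# Same separator as in the read_csv call above
SEP = re.compile(r'0: 480x640|1: 480x640|date \(.*\)')


def clean_field(field):
    """Drop surrounding whitespace, a trailing 'Done.' and a trailing comma."""
    field = re.sub(r',?\s*Done\.$', '', field.strip())
    return field.rstrip(',').strip()


def parse_line(line):
    """Split one line into the columns a, b and c (hypothetical helper)."""
    # The first piece is the empty string before the leading "0: 480x640".
    _, a, b, c = SEP.split(line.strip())
    return {'a': clean_field(a), 'b': clean_field(b), 'c': clean_field(c)}


# Sample lines from the question
lines = [
    "0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue, 05 April 03:54:02",
    "0: 480x640 3 persons, 1 cat, 1 laptop, 1 clock, 1: 480x640 4 persons, 2 chairs, Done. date (0.587s) Tue, 05 April 03:54:05",
    "0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, Done. date (0.582s) Tue, 05 April 03:54:07",
]

rows = [parse_line(line) for line in lines]
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to get the cleaned columns as a DataFrame.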
