I have multiple .txt files in a directory and I want to merge them into one by importing them in Python. The catch is that after the merge I want to convert the result into a single CSV file, on which the whole program is based.

So far I have only had to read one .txt file, which I converted to CSV with this code:

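import io
import pandas as pd

# Read the raw bytes, decode them as UTF-8, and parse the tab-separated text
with open('XYZ.txt', 'rb') as f:
    raw = f.read()
df = pd.read_csv(io.StringIO(raw.decode('utf-8')), sep='\t', parse_dates=['Time'])
df.head()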

Now I need to read multiple .txt files, merge them, and then convert the result into a CSV file. Is there a way to do this?

4
  • Are the headers of your files the same? Commented Jul 12, 2018 at 5:27
  • Yes. Just need to merge the data in the columns Commented Jul 12, 2018 at 5:29
  • You don't need pandas if that's all it's doing; the csv module in the standard library would do fine. Commented Jul 14, 2018 at 4:31
  • Also, those are commonly referred to as TSV files, so really your question is about converting TSV to CSV? Commented Jul 14, 2018 at 4:39
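
The standard-library route mentioned in the comments could look roughly like the sketch below. It assumes every .txt file is UTF-8, tab-separated, and shares the same header row (as confirmed above); `merge_tsv_to_csv` and its parameters are hypothetical names, not something from the original posts.

```python
import csv
from pathlib import Path


def merge_tsv_to_csv(directory, output_path):
    """Merge every tab-separated .txt file in `directory` into one CSV file.

    Assumes all files share the same header row; the header is written once.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(Path(directory).glob("*.txt")):
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # First non-empty file: keep its header for the output
                    header = file_header
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling `merge_tsv_to_csv("location/of/folder", "merged.csv")` would write one comma-separated file without parsing the `Time` column, which is fine when the only goal is to change the delimiter and stack the rows.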

2 Answers

3

If the headers are the same, it should be as easy as this:

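import os
import pandas as pd

directory = "PATH_OF_DIRECTORY"
frames = []
for name in sorted(os.listdir(directory)):
    if name.endswith(".txt"):
        # Join the directory with the file name; a bare name only resolves
        # relative to the current working directory.
        path = os.path.join(directory, name)
        frames.append(pd.read_csv(path, sep='\t', parse_dates=['Time']))

# DataFrame.append was removed in pandas 2.0; concatenate once instead.
merged_df = pd.concat(frames, ignore_index=True)
print(len(merged_df))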

5 Comments

You can filter the list of files with string or regex operations.
Can you please tell how I can edit it so that it only takes .txt files?
There are 4 .txt files in the directory and it is only reading one. It works the same as the code I wrote for reading 1 file.
Then make sure the path is right and all the extensions are exactly ".txt" (matching the case), as this is very straightforward code.
Path is alright as it is reading one file. And the other files are also .txt. Not working. Weird. Thanks anyway man!
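
One way to rule out the two suspects raised in this thread (extension case and path resolution) is to build the file list explicitly and inspect it before reading anything. A minimal sketch, where `list_txt_files` is a hypothetical helper and not part of the answer above:

```python
import os


def list_txt_files(directory):
    """Return full paths of files in `directory` ending in .txt, any letter case."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # os.listdir returns bare names, so join them with the directory
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and name.lower().endswith(".txt"):
            paths.append(full_path)
    return paths
```

Printing the result of `list_txt_files("PATH_OF_DIRECTORY")` should show all four files. If it does, the filter is fine and the problem is more likely in how each file is opened, for example opening the bare file name instead of the full path.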
0
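import glob
import os
import pandas as pd

path = "location/of/folder"
# os.path.join picks the right separator on any platform
allFiles = glob.glob(os.path.join(path, "*.txt"))

list_ = []
for file in allFiles:
    print(file)
    # glob returns path strings, so pass each path straight to read_csv
    df = pd.read_csv(file, sep='\t', parse_dates=['Time'])
    list_.append(df)
combined_files = pd.concat(list_)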
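
To finish what the question asks for (one CSV file on disk), the combined frame still has to be written out. A minimal sketch, assuming pandas is installed, the files are tab-separated with a shared header, and a `Time` column exists as in the question; `merge_txt_to_csv` and the output file name are hypothetical:

```python
import glob
import os

import pandas as pd


def merge_txt_to_csv(directory, output_path):
    """Read every tab-separated .txt file in `directory` and write one CSV."""
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")))
    if not paths:
        raise FileNotFoundError(f"no .txt files found in {directory!r}")
    frames = [pd.read_csv(p, sep="\t", parse_dates=["Time"]) for p in paths]
    # ignore_index gives the merged frame one continuous row index
    merged = pd.concat(frames, ignore_index=True)
    # index=False keeps the row index out of the CSV file
    merged.to_csv(output_path, index=False)
    return merged
```

For example, `merge_txt_to_csv("location/of/folder", "merged.csv")` returns the merged DataFrame and leaves `merged.csv` behind for the rest of the program to read.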

1 Comment

Could you please explain why you think this is a solution? How is your code different from the one in the question?
