2

My question: Is there a way to load data from all files in a directory using Python?

Input: All files in a given directory of mine (wow.txt, testting.txt, etc.)

Process: I want to run all the files through a function

Output: I want the output to be each file's name with its content below it. For example:

/home/file/wow.txt
"all of its content"
/home/file/www.txt
"all of its content"


Here is my code:

# Imports
import os

# Define the file path
path = "/home/my_files"
file_name = "wow.txt"

# Load data function
def load_data(path, file_name):
    """
    Input  : path and file_name
    Purpose: loading text file
    Output : list of paragraphs/documents and
             titles (first 100 characters of each document)
    """
    documents_list = []
    titles = []
    with open(os.path.join(path, file_name), "rt", encoding='latin-1') as fin:
        for line in fin:
            text = line.strip()
            documents_list.append(text)
            # slicing already stops at the end of a short string
            titles.append(text[:100])
    print("Total Number of Documents:", len(documents_list))
    return documents_list, titles

# Output
load_data(path, file_name)

Here is my output:

[Image: screenshot of the program output]


My problem is that my output covers only one file and shows its content. Obviously, I set the path and file name in my code to a single file, but I am confused about how to write the path so that all the files are loaded and each file's content is output separately. Any suggestions?

1
  • 1
    Look for os.listdir() or glob, then read all the files and their contents. Commented Apr 17, 2019 at 15:13

6 Answers

4

Using glob:

import glob
files = glob.glob("*.txt")           # get all the .txt files in the current directory

for file in files:                   # iterate over the list of files
    with open(file, "r") as fin:     # open the file
        content = fin.read()         # rest of the code goes here

Using os.listdir():

import os
arr = os.listdir()                   # names of all entries in the current directory
files = [x for x in arr if x.endswith('.txt')]

for file in files:                   # iterate over the list of files
    with open(file, "r") as fin:     # open the file
        content = fin.read()         # rest of the code goes here
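
To produce the output format asked for in the question (each path followed by that file's content) for a directory other than the current one, the glob approach could be sketched roughly as follows. The helper names read_text_files and print_files are hypothetical, and the latin-1 default is simply borrowed from the question's code.

```python
import glob
import os


def read_text_files(directory, pattern="*.txt", encoding="latin-1"):
    """Return a dict mapping each matching file path to its full content.

    read_text_files is a hypothetical helper, not part of any library.
    """
    contents = {}
    # sorted() gives a predictable order; glob does not guarantee one
    for file_path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(file_path, "r", encoding=encoding) as fin:
            contents[file_path] = fin.read()
    return contents


def print_files(contents):
    """Print each path followed by the content of that file."""
    for file_path, text in contents.items():
        print(file_path)
        print(text)
```

With the question's directory this would be called as print_files(read_text_files("/home/my_files")).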

3

Try this:

import glob

for file in glob.glob("test/*.xyz"):
    print(file)

This assumes the directory is named "test" and contains many .xyz files.
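
The same match can also be written with pathlib from the standard library, which returns Path objects instead of strings. This is only a sketch of an alternative; list_by_extension is a hypothetical helper name.

```python
from pathlib import Path


def list_by_extension(directory, extension):
    """Return the sorted paths in `directory` whose names end with `extension`.

    list_by_extension is a hypothetical helper used only for illustration.
    """
    # Path.glob yields Path objects; the pattern is relative to `directory`
    return sorted(Path(directory).glob("*" + extension))
```

For the example above, list_by_extension("test", ".xyz") would list the same files as glob.glob("test/*.xyz").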

1 Comment

I accidentally published before finishing it :D
0

You can use glob and pandas:

import glob
import pandas as pd

path = r'some_directory' # use your path
all_files = glob.glob(path + "/*.txt")

li = []

for filename in all_files:
    # read the file here
    # if you decide to use pandas, you might need the 'sep' parameter as well
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

# get it all together
frame = pd.concat(li, axis=0, ignore_index=True)


0

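I will take advantage of the function you have already written, so use the following:

import os

data = []
path = "/home/my_files"
for file_name in os.listdir(path):
    # skip subdirectories, which open() cannot read
    if os.path.isfile(os.path.join(path, file_name)):
        data.append(load_data(path, file_name))

After the loop, the list data holds the result of load_data for every file.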
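
A plain list loses track of which result came from which file. One possible variation keeps a dict keyed by file name instead. In this sketch, load_lines is a simplified stand-in for the question's load_data, and load_directory is a hypothetical name.

```python
import os


def load_lines(path, file_name):
    """Simplified stand-in for load_data: return the stripped lines of one file."""
    with open(os.path.join(path, file_name), "rt", encoding="latin-1") as fin:
        return [line.strip() for line in fin]


def load_directory(path):
    """Map each regular file name in `path` to its list of lines."""
    data = {}
    for file_name in sorted(os.listdir(path)):
        # os.listdir also returns folder names, so keep regular files only
        if os.path.isfile(os.path.join(path, file_name)):
            data[file_name] = load_lines(path, file_name)
    return data
```

Iterating over load_directory(path).items() then gives each file name together with its content.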


0

Hi, you can use a for loop over the result of os.listdir:

os.listdir(<path of your directory>)

This gives you the names of the files in your directory, but it also gives you the names of any folders in that directory.
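
Because folder names are mixed in with file names, it can help to separate the two before opening anything. A minimal sketch using os.scandir from the standard library follows; split_entries is a hypothetical helper name.

```python
import os


def split_entries(directory):
    """Return (file_names, folder_names) for the entries directly inside `directory`."""
    file_names, folder_names = [], []
    # os.scandir yields DirEntry objects that know whether they are files or folders
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                file_names.append(entry.name)
            elif entry.is_dir():
                folder_names.append(entry.name)
    return sorted(file_names), sorted(folder_names)
```

Only the first list returned should be passed on to code that calls open().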


0

Try generating a file list first, then passing that to a modified version of your function.

import os

def dir_recursive(dirName):
    """Return the paths of all .txt files under dirName, including subdirectories."""
    fileList = []
    for (dirPath, _, files) in os.walk(dirName):
        for f in files:
            # match the extension literally
            if f.endswith('.txt'):
                fileList.append(os.path.join(dirPath, f))
    return fileList

def load_data2(file_list):
    documents_list = []
    titles = []
    for file_path in file_list:
        with open(file_path, "rt", encoding='latin-1') as fin:
            for line in fin:
                text = line.strip()
                documents_list.append(text)
                # first 100 characters of each document serve as its title
                titles.append(text[:100])
    print("Total Number of Documents:", len(documents_list))
    return documents_list, titles

# Generate a file list & load the data from it
file_list = dir_recursive(path)
documents_list, titles = load_data2(file_list)
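
If a shorter way to build the recursive file list is preferred, glob can do the same walk with the "**" pattern. This is a sketch of an alternative, not a drop-in replacement: find_txt_recursive is a hypothetical name, and unlike os.walk, glob patterns skip names that begin with a dot by default.

```python
import glob
import os


def find_txt_recursive(directory):
    """Return sorted paths of all .txt files under `directory`, at any depth."""
    # "**" matches zero or more directory levels when recursive=True
    pattern = os.path.join(directory, "**", "*.txt")
    return sorted(glob.glob(pattern, recursive=True))
```

The returned list can be passed to a loader that accepts a list of paths, such as load_data2 above.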
