7

I have a folder full of text documents, the text of which needs to be loaded into a single list variable.

Each element of the list should be the full text of one document.

So far I have this code, but it is not working as intended.

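import glob
import os

current_working_directory = os.getcwd()
folder = os.path.join(current_working_directory, 'FolderName')  # avoid shadowing the built-in dir()
file_list = glob.glob(os.path.join(folder, '*.txt'))
corpus = []  # my list variable
for file_path in file_list:
    text_file = open(file_path, 'r')
    corpus.append(text_file.readlines())  # appends a list of lines per file
    text_file.close()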

Is there a better way to do this?

Edit: Replaced the CSV reading function (read_csv) with the text reading function (readlines()).


4 Answers

15

You just need to read() each file in and append it to your corpus list as follows:

import glob
import os

file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

corpus = []

for file_path in file_list:
    with open(file_path) as f_input:
        corpus.append(f_input.read())

print(corpus)

Each list entry is then the entire contents of one text file. Note that readlines() would instead give you a list of lines for each file rather than the raw text.
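
To make the difference concrete, here is a minimal sketch that writes a small throwaway file and reads it back both ways. The file name sample.txt and its contents are made up for illustration and are not part of the question.

```python
import os
import tempfile

# Create a throwaway file (hypothetical name and contents, for illustration only).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w") as f_output:
    f_output.write("first line\nsecond line\n")

with open(sample_path) as f_input:
    as_text = f_input.read()        # one string holding the whole file

with open(sample_path) as f_input:
    as_lines = f_input.readlines()  # a list with one string per line

print(repr(as_text))
print(as_lines)
```

Appending as_text to corpus gives one string per document, which is what the question asks for; appending as_lines gives a nested list.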

With a list-comprehension

file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

corpus = [open(file).read() for file in file_list]

This approach, though, might use more resources, as there is no with block to close each file automatically. The files stay open until their file objects are garbage-collected.


2 Comments

If you define a function, e.g. def get_file_text(), that uses a with block, you can then use it in your list comprehension so that the files are still closed.
Indeed, you are correct. I was trying to emphasize why a one-line approach might have disadvantages.
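
A minimal sketch of what the first comment suggests. The name get_file_text comes from the comment; its parameter and body are assumptions about what the commenter had in mind.

```python
import glob
import os


def get_file_text(file_path):
    """Return the full text of one file; the with block closes it before returning."""
    with open(file_path) as f_input:
        return f_input.read()


file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

# Still a one-line list comprehension, but every file is closed as soon as it is read.
corpus = [get_file_text(file_path) for file_path in file_list]
```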
3
  • Solve this with the pathlib module, which treats paths as objects with methods.
  • Use Path() to create a pathlib object of the path (or use .cwd()), and use .glob (or .rglob()) to find the files matching the specific pattern.
    • files = (Path().cwd() / 'FolderName').glob('*.txt')
      • / is used to add folders (extend) to a pathlib object.
    • Alternatives:
      • files = Path('./FolderName').glob('*.txt')
      • files = Path('e:/PythonProjects/stack_overflow/t-files/').glob('*.txt')
  • Path.read_text() reads the full text of a file without an explicit .open() call; the file is opened, read, and closed for you.
    • text = [f.read_text() for f in files]
    • Alternatives:
      • text = [f.open().read() for f in files] - works, but the files are not closed explicitly.
      • text = [f.open().readlines() for f in files] - creates a list of lists of lines, and the files are not closed explicitly.
from pathlib import Path

# get the files
files = (Path().cwd() / 'FolderName').glob('*.txt')

# read the text from each file into a list with a list comprehension - each file is opened and closed
text = [f.read_text() for f in files]
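
If the .txt files may also sit in subfolders, the .rglob() method mentioned above searches recursively. A minimal sketch, assuming the same 'FolderName' directory relative to the current working directory:

```python
from pathlib import Path

# 'FolderName' is the folder from the question; rglob also descends into its subfolders.
files = Path("FolderName").rglob("*.txt")

# read_text() opens each file, reads it, and closes it.
text = [f.read_text() for f in files]
```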

for-loop Alternative

Option 1

files = Path('./FolderName').glob('*.txt')

text = list()

for file in files:
    text.append(file.read_text())  # the file is opened and closed

Option 2

  • Path.open() in a with block, combined with .read(), opens each file, reads its text into the list, and closes the file.
files = Path('./FolderName').glob('*.txt')

text = list()

for file in files:
    with file.open() as f:
        text.append(f.read())


-1

I find this to be an easier way:

    import glob


    corpus = []

    file_list = glob.glob("Foldername/*.txt")
    for file_path in file_list:
        with open(file_path, 'r') as file_input:
            corpus.append(file_input.read())
    print(corpus)


-3
import csv

csv_file = "card.csv"

with open(csv_file, 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for row in reader:
        filename, filepath, x, y, w, h = row

        # append the four values to <filename>.txt, one per line
        with open(filename + ".txt", "a") as out_file:
            out_file.write("%s\n%s\n%s\n%s\n" % (x, y, w, h))

1 Comment

Add some explanation to your answer.
