I want to extract data from multiple files nested within subfolders.

For example, with this folder structure:

A/B/C/D.dat
A/B/E/F.dat
A/B/G/H.dat

The code I came up with is:

import os
values = 2
doc = []
rootdir = 'C:/A/B'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.dat'):
            with open (file, 'rt') as myfile:
                    current_line = 0
                    for mylines in myfile:
                            if current_line == values:
                                doc.append()
                                break
                            current_line += 1
            continue

print(doc)

The error I'm struggling to solve:

...with open (file, 'rt') as myfile:
IOError: [Errno 2] No such file or directory: 'D.dat'

5 Answers

1

Your solution is not the cleanest, but the bug you are getting comes from

            with open (file, 'rt') as myfile:

which should be replaced with

            with open (subdir + "/" + file, 'rt') as myfile:
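The same fix can be written with os.path.join, which handles the path separator for you instead of hard-coding "/". A minimal sketch, assuming a hypothetical helper `collect_paths` that only gathers the full paths of the matching files:

```python
import os


def collect_paths(rootdir, suffix='.dat'):
    """Return the full path of every file under rootdir whose name ends in suffix.

    os.walk yields bare file names, so each name is joined with the
    directory (subdir) it was found in before it is returned.
    """
    paths = []
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if file.endswith(suffix):
                paths.append(os.path.join(subdir, file))
    return paths
```

Each returned path can then be passed straight to open(path, 'rt').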

1

The error is due to the missing directory part of the file path. os.walk gives you bare file names, so you need to make sure that the full path, e.g. "A/B/C/D.dat", is what you pass to open() as myfile.

You can add the snippet below to your logic to achieve this:

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filepath = subdir + '/' + file
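To see why the bare name fails: open() resolves a relative name against the current working directory, not against the folder where os.walk found the file. A small sketch that compares the two paths (`compare_paths` is a hypothetical helper, and 'A/B/C' is just the example folder from the question):

```python
import os


def compare_paths(subdir, file):
    """Return (path open() would try for the bare name, path of the file os.walk found)."""
    # A bare name is resolved relative to the current working directory...
    bare = os.path.abspath(file)
    # ...whereas joining it with subdir points at the directory it was listed in.
    joined = os.path.abspath(os.path.join(subdir, file))
    return bare, joined


bare, joined = compare_paths('A/B/C', 'D.dat')
print(bare)    # <current working directory>/D.dat, which does not exist
print(joined)  # <current working directory>/A/B/C/D.dat
```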

0

It sounds like you are looking for the third line of every .dat file in the subdirectories. Using pathlib.Path you can do this in a few simple steps.

from pathlib import Path
doc = []
line_number_of_each_file = values = 2  # 0-based index: 2 is the third line

for file in Path('C:/A/B').rglob('*.dat'):
    doc.append(file.read_text().splitlines()[line_number_of_each_file])

print(doc)
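If some .dat files have fewer than three lines, indexing with [2] raises IndexError. A hedged variant that skips such files and returns the results in a deterministic order (`nth_lines` is a hypothetical name; `pattern` defaults to '*.dat' as in the question):

```python
from pathlib import Path


def nth_lines(root, n=2, pattern='*.dat'):
    """Return line n (0-based) of every file under root matching pattern.

    Files with fewer than n + 1 lines are skipped instead of raising IndexError.
    """
    doc = []
    # sorted() makes the order deterministic; rglob's order depends on the filesystem
    for file in sorted(Path(root).rglob(pattern)):
        lines = file.read_text().splitlines()
        if len(lines) > n:
            doc.append(lines[n])
    return doc
```

Usage would be `print(nth_lines('C:/A/B'))`. Since read_text() loads the whole file, this suits small files; for very large ones, reading line by line is cheaper.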

0

I had a similar problem. My file structure is something like this:

project
|__dir1
|  |__file_to_read.txt
|
|__dir2
   |__file_reader.py

In order to actually find the other file, I have to go up one level from the directory that contains my .py file. I used this code originally:

import os

current_path = os.path.dirname(__file__)

file_to_read = os.path.relpath('project/dir1/file_to_read.txt', current_path)

This worked for me, but I later switched to a different version. The reason isn't anything you need to worry about; pathlib simply offers a more convenient, object-oriented way to build and manipulate paths than os.path.

from pathlib import Path

parent = Path.cwd().parent
file_to_read = (parent / 'project' / 'dir1' / 'file_to_read.txt').resolve()

This version may be preferable, since it's the one that was more highly recommended to me. I hope this helps with your problem.
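Path.cwd() depends on where Python was started from, not on where the script lives. If the lookup should work regardless of the working directory, a sketch that anchors on the script's own location instead (the helper name `sibling_file` is hypothetical; you would pass it __file__):

```python
from pathlib import Path


def sibling_file(script_path, *parts):
    """Build a path relative to the parent of the script's own directory.

    Unlike Path.cwd(), this does not depend on where Python was launched from.
    """
    # For project/dir2/file_reader.py: .parent is dir2, .parent.parent is project
    project_dir = Path(script_path).resolve().parent.parent
    return project_dir.joinpath(*parts)
```

Inside file_reader.py the call would be `file_to_read = sibling_file(__file__, 'dir1', 'file_to_read.txt')`.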

0

The issue here is that you are trying to open filename.dat on its own. Instead you need to open A/B/C/filename.dat, so join the directory path and the file name:

import os

values = 2  # 0-based index of the line to extract (the third line)
doc = []
rootdir = 'C:/A/B'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.dat'):
            # subdir already starts with rootdir, so join only subdir and file
            filepath = os.path.join(subdir, file)
            with open(filepath, 'rt') as myfile:
                current_line = 0
                for mylines in myfile:
                    if current_line == values:
                        doc.append(mylines)
                        break
                    current_line += 1

print(doc)
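Counting lines by hand works, but itertools.islice can skip straight to the wanted line. A minimal sketch, assuming a hypothetical helper `line_at` that returns None for files that are too short:

```python
from itertools import islice


def line_at(filepath, index):
    """Return the line at 0-based index (newline included), or None if the file is shorter."""
    with open(filepath, 'rt') as myfile:
        # islice skips the first `index` lines; next() takes the one after them
        # and stops, so the rest of the file is never iterated over
        return next(islice(myfile, index, None), None)
```

Inside the os.walk loop you would call `line = line_at(os.path.join(subdir, file), values)` and append it to doc when it is not None.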
