I have several .txt files and I need to extract certain data from them. The files look similar, but each of them stores different data. Here is an example of such a file:
Start Date: 21/05/2016
Format: TIFF
Resolution: 300dpi
Source: X Company
...
There is more information in the text files, but I only need to extract the start date, the format and the resolution. The files are in the same parent directory ("E:\Images"), but each file has its own folder, so I need a script that reads these files recursively. Here is my script so far:
# importing a library
import os

# defining location of parent folder (raw string so the backslash is kept literally)
BASE_DIRECTORY = r'E:\Images'

# scanning through subfolders
for dirpath, dirnames, filenames in os.walk(BASE_DIRECTORY):
    for filename in filenames:
        # only look at text files
        if not filename.lower().endswith('.txt'):
            continue
        # build the full path first; os.walk yields bare file names
        txtfile_full_path = os.path.join(dirpath, filename)
        # reset the values so one file's data never carries over to the next
        start_date = data_format = resolution = None
        with open(txtfile_full_path, "r") as txtfile:
            for line in txtfile:
                if line.startswith('Start Date:'):
                    start_date = line.split()[-1]
                elif line.startswith('Format:'):
                    data_format = line.split()[-1]
                elif line.startswith('Resolution:'):
                    resolution = line.split()[-1]
        print(
            txtfile_full_path,
            start_date,
            data_format,
            resolution)
Ideally, Python would extract these values together with the name of each file and save them in a text file. Because I don't have much experience in Python, I don't know how to progress any further.
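To make the desired result concrete, here is a minimal sketch of how the saving step might look. It is an illustration under assumptions, not a finished solution: the names `FIELDS`, `extract_fields` and `write_report`, the tab-separated layout, and the report file name are all hypothetical choices, and the files are assumed to be readable with the default text encoding.

```python
import os

# Labels to look for, mapped to the column names used in the report.
# The column names are illustrative choices, not part of the original files.
FIELDS = {
    "Start Date:": "start_date",
    "Format:": "format",
    "Resolution:": "resolution",
}


def extract_fields(path):
    """Return a dict with the wanted values from one text file.

    Fields that do not appear in the file are left as empty strings.
    """
    values = {name: "" for name in FIELDS.values()}
    with open(path, "r") as handle:
        for line in handle:
            for label, name in FIELDS.items():
                if line.startswith(label):
                    # Keep everything after the label, so values that
                    # contain spaces are not cut short.
                    values[name] = line[len(label):].strip()
    return values


def write_report(base_directory, report_path):
    """Walk base_directory and write one tab-separated line per .txt file."""
    with open(report_path, "w") as report:
        report.write("file\tstart_date\tformat\tresolution\n")
        for dirpath, dirnames, filenames in os.walk(base_directory):
            for filename in sorted(filenames):
                if not filename.lower().endswith(".txt"):
                    continue
                full_path = os.path.join(dirpath, filename)
                values = extract_fields(full_path)
                report.write("\t".join([
                    full_path,
                    values["start_date"],
                    values["format"],
                    values["resolution"],
                ]) + "\n")


# Example call (both paths are placeholders):
# write_report(r"E:\Images", r"E:\summary.txt")
```

In this sketch the report is written outside the scanned folder on purpose. If it were saved as a .txt file inside "E:\Images", the walk would pick it up as one more input file.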