Preface - I'm pretty new to Python, having had more experience in another language.
I have a text file with single column list of strings in the generic (but slightly varying) format "./abc123a1/type/1ab2_x_data_type.file.type"
I need to extract the abc123a1 and the 1ab2 portions from all several hundred of the rows and put them under two columns (column a and b) in a csv. Sometimes there may be a "1ab2_a" and a "1ab2_b", but I only want one 1ab2. So I'd want to grab "1ab2_a" and ignore all others.
I have the regex which I THINK will work:
tmp = list()
if re.findall(re.compile(r'^([a-zA-Z0-9]{4})_'), x):
tmp = re.findall(re.compile(r'^([a-zA-Z0-9]{4})_'), x)
elif re.findall(re.compile(r'_([a-zA-Z0-9]{4})_'), x):
tmp = re.findall(re.compile(r'_([a-zA-Z0-9]{4})_'), x)
if len(tmp) == 0:
return None
elif len(tmp) > 1:
print "ERROR found multiple matches"
return "ERROR"
else:
return tmp[0].upper()
I am trying to make this script step by step and testing things to make sure it works, but it's just not.
import sys
import csv
listOfData = []
with open(sys.argv[1]) as f:
print "yes"
for line in f:
print line
for line in f:
listOfData.append([line])
print listOfData
with open('extracted.csv', 'w') as out_file:
writer = csv.writer(out_file)
writer.writerow(('column a', 'column b'))
writer.writerows(listOfData)
print listOfData
Still failing to get anything in the csv other than column headers, much less a parsed version!
Does anyone have any better ideas or formats I could do this in? A friend mentioned looking into glob.glob, but I haven't had luck getting that to work either.
listOfData, does it have the data that you want?1ab2or1ab2_a?