I am trying to search multiple text files and count the 'plate appearances' in each game. A plate appearance occurs when
temp_array2[0] == "play" and temp_array2[2] == "1"
I want to export the results as a .csv file. However, when I print the row, the count of plate appearances appears as a number inside a list (e.g. [35]) rather than as a string, and I think it needs to be a string to export to .csv.
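To illustrate why I think the type matters, here is a minimal sketch. The row values are copied from my printed output; the names `row`, `fixed_row` and `csv_line` are just for illustration and are not in my script. As far as I can tell, `','.join(...)` only accepts strings, so a list such as `[35]` in the tuple raises a `TypeError`, whereas converting the count with `str()` lets the join go through.

```python
# Hypothetical row mirroring output_for_csv2 from my script:
# the last element is a list holding an int, not a string.
row = ('ARI201803300', '2018/03/30', 'ARI', 'COL', [35])

try:
    ','.join(row)
except TypeError as exc:
    # str.join only accepts str items, so the list [35] is rejected.
    error_message = str(exc)
    print("join failed:", error_message)

# Dropping the list brackets and converting the count with str()
# gives a tuple made only of strings, which join accepts.
count = 35
fixed_row = ('ARI201803300', '2018/03/30', 'ARI', 'COL', str(count))
csv_line = ','.join(fixed_row) + '\n'
print(csv_line, end='')
```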
The code I have got so far is below.
import os

input_folder = 'files'  # path of folder containing the multiple text files

# create a list with file names
data_files = [os.path.join(input_folder, file) for file in
              os.listdir(input_folder)]

# open csv file for writing
csv = open('myoutput.csv', 'w')

def write_to_csv(line):
    print(line)
    csv.write(line)

j=0 # initialise as 0
count_of_plate_appearances=0 # initialise as 0

for file in data_files:
    with open(file, 'r') as f: # use context manager to open files
        for line in f:
            lines = f.readlines()
            i=0
            while i < len(lines):
                temp_array = lines[i].rstrip().split(",")
                if temp_array[0] == "id":
                    j=0
                    count_of_plate_appearances=0
                    game_id = temp_array[1]
                    awayteam = lines[i+2].rstrip().split(",")[2]
                    hometeam = lines[i+3].rstrip().split(",")[2]
                    date = lines[i+5].rstrip().split(",")[2]
                    output_for_csv=(game_id,date,hometeam, awayteam)
                    print(output_for_csv)
                    #csv.write(','.join(output_for_csv) + '\n') #write to the csv file. This works
                    for j in range(i+46,i+120,1): #only check for plate appearances when temp_array[0] == "id"
                        temp_array2 = lines[j].rstrip().split(",") #create new array to check for plate appearances
                        if temp_array2[0] == "play" and temp_array2[2] == "1": # plate appearance occurs when these are true
                            count_of_plate_appearances=count_of_plate_appearances+1
                            #print(count_of_plate_appearances)
                    output_for_csv2=(game_id,date,hometeam, awayteam,[count_of_plate_appearances]) #this is what I want to output to the CSV
                    print(output_for_csv2)
                    #csv.write(','.join(output_for_csv2) + '\n') #the code does not run when this is uncommented
                    i=i+1
                else:
                    i=i+1
                    j=0
                    count_of_plate_appearances=0
#quit()
csv.close()
The output is as follows
('ARI201803300', '2018/03/30', 'ARI', 'COL')
('ARI201803300', '2018/03/30', 'ARI', 'COL', [35])
('ARI201803310', '2018/03/31', 'ARI', 'COL')
('ARI201803310', '2018/03/31', 'ARI', 'COL', [33])
('ARI201804020', '2018/04/02', 'ARI', 'LAN')
('ARI201804020', '2018/04/02', 'ARI', 'LAN', [32])
('ARI201804030', '2018/04/03', 'ARI', 'LAN')
('ARI201804030', '2018/04/03', 'ARI', 'LAN', [38])
('ARI201804040', '2018/04/04', 'ARI', 'LAN')
*Note: there are two lines of output for each game. The first one is without the count of plate appearances, and it writes to the .csv fine. The second line is what I want to write.
I think each row needs to look like this in order to get the .csv output:
('ARI201803300', '2018/03/30', 'ARI', 'COL', '35')
('ARI201803310', '2018/03/31', 'ARI', 'COL', '33')
('ARI201804020', '2018/04/02', 'ARI', 'LAN', '32')
('ARI201804030', '2018/04/03', 'ARI', 'LAN', '38')
Any suggestions on how I can do this?
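One possible direction, sketched under some assumptions rather than tested against the real files: Python's standard `csv` module has a `writer` whose `writerow` converts non-string values such as ints with `str()`, so the count would not need manual conversion. The helper name `count_plate_appearances` and the sample lines are made up for illustration, and an in-memory `io.StringIO` buffer stands in for the real output file. Note that the script above binds the name `csv` to a file object, so that variable would need a different name (for example `out_file`) before importing the module.

```python
import csv
import io

def count_plate_appearances(lines):
    """Count lines whose first field is 'play' and whose third field is '1'
    (the same condition as in the question)."""
    count = 0
    for line in lines:
        fields = line.rstrip().split(",")
        # Guard against short lines before indexing the third field.
        if len(fields) > 2 and fields[0] == "play" and fields[2] == "1":
            count += 1
    return count

# Made-up sample lines, only shaped like the comma-separated input.
sample_lines = [
    "play,1,0,playerA,00,X,S7\n",
    "play,1,1,playerB,00,X,63\n",
    "play,1,1,playerC,12,CSBX,K\n",
    "info,visteam,COL\n",
]

count = count_plate_appearances(sample_lines)

# io.StringIO stands in for open('myoutput.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer)
# The count is passed as a plain int (no list brackets); writerow stringifies it.
writer.writerow(["ARI201803300", "2018/03/30", "ARI", "COL", count])
print(buffer.getvalue(), end="")
```

With a real file, the same `writer.writerow([...])` call would go where `output_for_csv2` is built, passing `count_of_plate_appearances` directly instead of wrapping it in a list.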