I currently have a csv file with 200k rows, each row containing 80 entries separated by commas. I open the csv file with open() and append each row to a 2-D Python list. When I then iterate through that list and concatenate the 80 entries of each row into a single string, the computer freezes. Does my code produce some kind of memory issue? Should I work with my data in batches, or is there a more efficient way to go about what I'm trying to do?
In short: open the csv, go through the 200k rows, and turn each row of 80 entries, e.g. [1, 2, 3, 4, 5, ..., 80], into one concatenated string like '12345...80', so that I end up with 200k strings.
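To illustrate what I mean for a single (shortened) row, this is the transformation I'm after; the names row and joined are just placeholders:

# a shortened example row as read from the csv
row = ['1', '2', '3', '4', '5']
# desired result: one concatenated string per row
joined = ''.join(row)  # '12345'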
import csv

# create empty shells
raw_data = []
concatenate_data = []


def get_data():
    counter = 1
    # open the raw data file and put it into a list
    with open('raw_data_train.csv', 'r') as file:
        reader = csv.reader(file, dialect='excel')
        for row in reader:
            print('\rCurrent item: {0}'.format(counter), end='', flush=True)
            raw_data.append(row)
            counter += 1
    print('\nReading done')


def format_data():
    counter = 1
    temp = ''
    # concatenate the separated letters for each string in the csv file
    for batch in raw_data:
        for letters in batch:
            temp += letters
        concatenate_data.append(temp)
        print('\rCurrent item: {0}'.format(counter), end='', flush=True)
        counter += 1
    print('\nTransforming done')
    print(concatenate_data[0:10])
Could the problem be that temp is only initialized once, at the start of format_data(), so every appended string keeps growing?
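If that is the cause, a minimal sketch of what I suspect the fix might look like is below (the name format_data_fixed is just my placeholder; it reuses the raw_data and concatenate_data lists from above, and the commented-out str.join variant is an alternative I'm considering):

def format_data_fixed():
    # reset the accumulator for every row instead of only once per run
    for counter, batch in enumerate(raw_data, start=1):
        temp = ''
        for letters in batch:
            temp += letters
        concatenate_data.append(temp)
        print('\rCurrent item: {0}'.format(counter), end='', flush=True)
    print('\nTransforming done')

    # shorter alternative: join each row directly
    # concatenate_data[:] = [''.join(batch) for batch in raw_data]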