I have tried

num_columns = 982

def transform_row(row):
    #row = row.split('\n')  # split on new line
    row = row.split(',')  # split on commas
    row = [i.split() for i in row if i!='5']  # remove 5s
    row += ['0']*(num_columns - len(row))  # add 0s to end
    return ','.join(row) 
#and then apply this over the csv.

out = open('outfile.csv', 'w')
for row in open('dataset_TR1.csv'):
    out.write(transform_row(row))

In essence, I want to remove all 5s from each row of a CSV file and pad the missing length with 0s between columns 982 and 983. However, using the data file from http://www.filedropper.com/datasettr1 , this seems to write everything to a single row, and the output is not as expected.

  • Correct. The second line to that does that, however when I split by comma, it starts writing to the second row instead of the current row. Commented Feb 22, 2018 at 16:08
  • Also, the double comma suggests that the list « row » contains an empty string. Inspect what « row » contains after you call split to make sure it holds nothing you don't want. Commented Feb 22, 2018 at 16:09
  • I don't see anything extra it could be adding... Do you? Please point it out if you do. Commented Feb 22, 2018 at 16:11
  • I tried: row = "1,5,5,5,3"; row = row.split(',') (split on commas); row = [i for i in row if i != '5'] (remove 5s); row += ['0'] * (num_columns - len(row)) (add 0s to end); row = ','.join(row). That gives the output you want, so it must be something to do with splitting on \n. Commented Feb 22, 2018 at 16:14
  • When you read a line from your csv, it gives you the whole of '1,5,5,5,3\n', and when you split by comma it gives ['1', '5', '5', '5', '3\n']. So it's only natural that printing (after the transform) the list ['1', '3\n', '0', '0', '0'] outputs text on two lines. You need to additionally strip '\n' by changing your list comprehension to [i.strip() for i in row if i != '5']. Commented Feb 22, 2018 at 16:27
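Applying that suggestion, a minimal sketch of the line-by-line approach might look like the following. It assumes plain comma-separated lines with no quoted fields; `transform_line` and `transform_file` are hypothetical names, while the width and file names are taken from the question.

```python
num_columns = 982  # target row width, taken from the question


def transform_line(line, width=num_columns):
    """Drop '5' fields from one comma-separated line and pad it with '0'
    fields up to `width`, returning the line with its newline restored."""
    fields = line.rstrip('\r\n').split(',')  # strip the newline *before* splitting
    fields = [f.strip() for f in fields if f.strip() != '5']
    fields += ['0'] * (width - len(fields))
    return ','.join(fields) + '\n'  # file.write() does not add a newline itself


def transform_file(src='dataset_TR1.csv', dst='outfile.csv', width=num_columns):
    """Apply transform_line to every line of src and write the result to dst."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            fout.write(transform_line(line, width))
```

For example, transform_line('1,5,5,5,3\n', width=5) returns '1,3,0,0,0\n'. Without the strip, the newline stays attached to the last kept field, so the padding zeros land at the start of the next output line, glued to the following row.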

3 Answers

import csv

with open('dataset_TR1.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    result = []
    for line in reader:
        remove_5s = [elem for elem in line if elem != '5']
        trailing_zeros = ['0'] * (len(line) - len(remove_5s))

        # if you want the zeros added to the end of the line
        # result.append(remove_5s + trailing_zeros)

        # or if you want the zeros added before the last element of the line
        # (guard against blank lines or rows where every field was a 5)
        if remove_5s:
            result.append(remove_5s[:-1] + trailing_zeros + [remove_5s[-1]])
        else:
            result.append(trailing_zeros)

# newline='' stops csv.writer from producing blank lines between rows on Windows
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(result)

4 Comments

This works! However, is there a way to add the trailing 0s between columns 982 and 983 instead of just at the end?
@xion If there are 983 columns, wouldn't index 982 be at the end anyways? The index of a list starts at 0. Are you saying you want the zeros added before the last element of the row/line?
Correct. I am trying to add the zeroes before the last element of each row
works beautifully. Exactly what I was trying to achieve. Thank you!
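The answer above pads each row back to its original length. If the goal is instead a fixed width with the zeros placed before the last column, as discussed in these comments, one possible sketch follows. `pad_before_last` and `transform_csv_text` are hypothetical helpers, the target width is a parameter (the thread does not settle whether it should be 982 or 983 for the question's file), and in-memory strings stand in for the real files.

```python
import csv
import io


def pad_before_last(fields, width):
    """Drop '5' fields, then insert '0's before the last remaining field
    so the row has `width` fields (rows already that long are left as is)."""
    kept = [f for f in fields if f != '5']
    if not kept:
        return ['0'] * width  # nothing left to keep at the end
    zeros = ['0'] * max(0, width - len(kept))
    return kept[:-1] + zeros + kept[-1:]


def transform_csv_text(text, width):
    """Run pad_before_last over every row of CSV text and return new CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        writer.writerow(pad_before_last(row, width))
    return out.getvalue()
```

To use it on the real data, open the input file with open(..., newline='') and hand the file object straight to csv.reader instead of wrapping a string in io.StringIO.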

You'll have to handle commas and newlines separately to keep the rows intact.

rows = "1,5,5,5,3\n2,5,5,5,9"
rows = rows.split('\n')
lines = []

for row in rows:
    row = row.split(',')  # split on commas
    row = [i for i in row if i != '5']  # remove 5s
    row += ['0'] * (5 - len(row))  # add 0s to end
    row = ','.join(row)
    lines.append(row)

print(rows)
lines = '\n'.join(lines)
print(lines)

Split the text on \n first, then process each line individually (split on commas, remove the 5s, pad with 0s), and finally join the lines back together with \n.

5 Comments

Could you update this to reflect usage with the csv file? Thanks
@xion: ?? but this should already work instead of your own current code! It seems to me there is no further "usage" left.
@usr2564301 I am reading a csv file in. Could it be updated to reflect that for clarity?
Is it as simple as replacing "\n" with ","? The problem seems to happen because the last value of each row doesn't end with a comma.
You read each row in from the csv file but the last value on each row is followed by "\n" not ",". Simply replacing that on each row might help.
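Following up on the request to read from the file and on the point about the trailing "\n": one way to adapt this answer, assuming the whole file fits in memory, is to read the text and let str.splitlines() remove the line endings before splitting on commas. `process_text` and `process_file` are hypothetical names.

```python
def process_text(text, width):
    """Split text into lines, drop '5' fields, pad each line with '0's
    up to `width` fields, and join the lines back together."""
    out_lines = []
    for line in text.splitlines():  # removes '\n' (and '\r\n') from each line
        fields = [f for f in line.split(',') if f != '5']
        fields += ['0'] * (width - len(fields))
        out_lines.append(','.join(fields))
    return '\n'.join(out_lines) + ('\n' if out_lines else '')


def process_file(src, dst, width):
    """Read src, transform it with process_text, and write the result to dst."""
    with open(src) as fin:
        text = fin.read()
    with open(dst, 'w') as fout:
        fout.write(process_text(text, width))


# Why the newline matters: a plain split leaves it attached to the last field.
print('1,5,5,5,3\n'.split(','))  # ['1', '5', '5', '5', '3\n']
```

Calling process_file('dataset_TR1.csv', 'outfile.csv', 982) would then apply the same logic as the answer to the question's file.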

A better way to do this is to use the built-in csv module:

import csv
num_columns = 982

def transform_row(row):
    row = [column for column in row if column != '5']  # remove 5s
    row += ['0'] * (num_columns - len(row))  # pad with 0s to num_columns
    return row

# the with block closes (and flushes) both files when the loop finishes
with open('dataset_TR1.csv', 'r', newline='') as fin, \
        open('outfile.csv', 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow(transform_row(row))
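To illustrate why the csv module helps here, a small self-contained sketch using an in-memory string instead of the real file. The sample data, including the quoted field, is made up for illustration, and `num_columns` is passed as a parameter rather than read from a global.

```python
import csv
import io


def transform_row(row, num_columns):
    """Same logic as the answer, with the target width passed in explicitly."""
    row = [column for column in row if column != '5']
    row += ['0'] * (num_columns - len(row))
    return row


sample = '1,5,5,5,3\n2,"5,5",5,9\n'  # hypothetical data

# Naive splitting keeps the newline on the last field and breaks the quoted field apart.
naive_rows = [line.split(',') for line in sample.splitlines(keepends=True)]

# csv.reader drops the line terminators and respects quoting.
parsed_rows = list(csv.reader(io.StringIO(sample)))

# csv.writer adds one line terminator per row and re-quotes fields that need it.
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
for row in parsed_rows:
    writer.writerow(transform_row(row, 4))
result = out.getvalue()  # '1,3,0,0\n2,"5,5",9,0\n'
```

For all-numeric data like the question's, the quoting difference probably never comes up; the main gain is that the reader removes each line terminator and the writer adds one back per row.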

2 Comments

This works! However, is there a way to add the trailing 0s between columns 982 and 983 instead of just at the end?
I don't think I understand your question. Maybe an example would be helpful