
I am trying to combine multiple rows of a CSV file into a single row. I could easily do it in Excel, but I want to do this for hundreds of files, so I need to do it in code. I have tried to store rows in arrays, but it doesn't seem to work. I am using Python to do it.

So let's say I have a CSV file:

1,2,3
4,5,6
7,8,9

All I want is a CSV file like this:

1,2,3,4,5,6,7,8,9

The code I have tried is this:

fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv",'w')
for line in fin.xreadlines():
  new = line.replace(',', ' ', 1)
  fout.write (new)
fin.close()
fout.close()

Could you please help?

  • Please show your attempt that didn't work so we can help you understand what went wrong. Commented Dec 7, 2018 at 13:08
  • So I have tried this but I don't think it is right fin = open("C:\\1.csv", 'r+') fout = open("C:\\2.csv",'w') for line in fin.xreadlines(): new = line.replace(',', ' ', 1) fout.write (new) fin.close() fout.close() Commented Dec 7, 2018 at 13:10
  • It's really not possible to read code in comments, please edit it into your original question with code formatting. Commented Dec 7, 2018 at 13:11
  • Sorry about this, I am new here. I have edited my original question and added the code. Thanks Commented Dec 7, 2018 at 13:12
  • No worries :) Working on an answer; you should be using the csv module Commented Dec 7, 2018 at 13:13

4 Answers

You should be using the csv module for this, as splitting CSV manually on commas is very error-prone: a single column can contain a string with commas in it, and you would incorrectly split it into multiple columns. The csv module represents each row as a list of values.
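To make the comma problem concrete, here is a small sketch (the sample line is made up for illustration) comparing a naive str.split with csv.reader on a quoted field that contains a comma:

```python
import csv
import io

# A hypothetical CSV line whose second field is quoted and contains a comma
line = 'a,"b,c",d\n'

# Naive splitting on commas breaks the quoted field apart
naive = line.strip().split(",")

# csv.reader honours the quotes and keeps "b,c" as one field
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['a', '"b', 'c"', 'd']
print(parsed)  # ['a', 'b,c', 'd']
```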

import csv

def return_contents(file_name):
    with open(file_name, newline='') as infile:
        reader = csv.reader(infile)
        return list(reader)

data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')

print(data1)
print(data2)

combined = []
for row in data1:
    combined.extend(row)

for row in data2:
    combined.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)

That code gives you the basis of the approach, but it would be ugly to extend it to hundreds of files. Instead, you probably want to use os.listdir to get all the files in a single directory and add them to your output one by one. This is why I packed the reading code into the return_contents function: we can repeat the same process on any number of files with only one set of code doing the actual reading. Something like this:

import csv
import os


def return_contents(file_name):
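    with open(file_name, newline='') as infile:
        reader = csv.reader(infile)
        return list(reader)

# sorted() gives a predictable order; os.listdir returns names in arbitrary order
all_files = sorted(os.listdir('my_csvs'))

combined_output = []

for file_name in all_files:
    # Skip anything in the folder that is not a CSV file
    if not file_name.endswith('.csv'):
        continue
    data = return_contents(os.path.join('my_csvs', file_name))
    for row in data:
        combined_output.extend(row)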

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
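The example in the question turns each file into its own single row, whereas the code above merges every file into one row. If you would rather get one flattened output file per input file, a minimal sketch could look like this; the function names flatten_csv and flatten_folder and the folder layout are assumptions for illustration, not part of any library:

```python
import csv
from pathlib import Path


def flatten_csv(src, dst):
    """Read every row of src and write all values as a single row to dst (hypothetical helper)."""
    with open(src, newline='') as infile:
        values = [value for row in csv.reader(infile) for value in row]
    with open(dst, 'w', newline='') as outfile:
        csv.writer(outfile).writerow(values)
    return values


def flatten_folder(in_dir, out_dir):
    """Flatten each .csv file in in_dir into a file with the same name in out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob('*.csv')):
        flatten_csv(src, out_path / src.name)
```

Usage, with placeholder folder names: flatten_folder('my_csvs', 'flattened'). Writing into a separate output folder avoids re-reading the files you just produced.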

If you are specifically dealing with the CSV file format, I recommend using the csv package for the file operations. If you also use a with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all .csv files in that folder. Here is what you can do:

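import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # Open the current file (joined with PATH), not a fixed "data.csv"
            with open(os.path.join(PATH, filename), newline='') as csvfile:
                # QUOTE_NONNUMERIC makes the reader convert unquoted fields to float
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)

if __name__ == '__main__':
    order_list()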
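One detail worth knowing about the code above: with quoting=csv.QUOTE_NONNUMERIC, csv.reader converts every unquoted field to float, so the numbers come back as 1.0, 2.0 and so on rather than strings, and an unquoted non-numeric field raises ValueError. A small sketch with made-up input:

```python
import csv
import io

# Hypothetical input: two unquoted numbers and one quoted string
sample = '1,2,"x"\n'

# Unquoted fields become floats, the quoted field stays a string
rows = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONNUMERIC))
print(rows)   # [[1.0, 2.0, 'x']]

# With the default quoting every field stays a string
plain = list(csv.reader(io.StringIO(sample)))
print(plain)  # [['1', '2', 'x']]
```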

3 Comments

This doesn't show how it can be done for multiple files. The OP mentions that they have hundreds.
@roganjosh Hi, I have edited my answer. I would be glad if you could suggest any improvements. Best Regards
I extended mine to show how you could use os.listdir and package the CSV reading code into a function to avoid all of the nesting.

Store your data in a pandas DataFrame:

import pandas as pd
df = pd.read_csv('file.csv')

Store the modified DataFrame in a new one:

df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x.astype(str))).reset_index()  # replace 'Column_Name' with the name of your column

Write the DataFrame to a new CSV:

df_2.to_csv("file_modified.csv", index=False)
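Note that groupby/agg combines values within a column, which is a different operation from the one in the question. If the goal is to turn all rows of a headerless file into one row, a pandas sketch could read the file with header=None and flatten the values; the function name flatten_with_pandas and the file names are placeholders:

```python
import pandas as pd


def flatten_with_pandas(src, dst):
    # header=None because the example file in the question has no header row
    df = pd.read_csv(src, header=None)
    # ravel() flattens the 2-D values row by row (C order): 1,2,3,4,5,6,...
    flat = pd.DataFrame([df.to_numpy().ravel()])
    # Write a single row with no index column and no header
    flat.to_csv(dst, index=False, header=False)
    return flat
```

Usage: flatten_with_pandas('file.csv', 'file_modified.csv'). This assumes every row has the same number of columns; read_csv will complain about ragged files.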

You could also do it like this:

with open("test.csv", "r") as fIn, open("output.csv", "w") as fOut:
    # Join all lines with commas, then strip the newline characters
    fOut.write(",".join(fIn).replace("\n", ""))

If you now want to run it on multiple files, you can run it as a script with arguments:

import sys

# Usage: python csvOnliner.py input.csv output.csv
with open(sys.argv[1], "r") as fIn, open(sys.argv[2], "w") as fOut:
    fOut.write(",".join(fIn).replace("\n", ""))

Now, assuming you are on a Linux system and the script is called csvOnliner.py, you could call it with:

for i in *.csv; do python csvOnliner.py "$i" "changed_$i"; done

On Windows you could do something like this:

FOR %i IN (*.csv) DO python csvOnliner.py "%i" "changed_%i"
