
I have 5 files in a folder: a.csv, b.csv, ...

I need to combine these files into one file called X.csv, and in future keep merging the contents of X.csv with a, b, c, etc.

Even with only two lines in each file, I get an error message saying that there is not enough memory; it is essentially only copying 10 lines across. I'm using the following command:

 paste -d, *.csv >> X.csv

However, when I use

 paste -d, *.csv > X.csv

there are no memory issues. I cannot use this, however, since I also need the information already in X.csv, so I should only append to the file rather than overwrite its whole contents.

Would anyone know how I can achieve this? These are comma-separated CSV files, and I would like to copy the header (row 1), which names the columns, only once.

I'm using Mac OS X Mavericks with 8 GB of RAM.

Thank you :)

  • Please show us a.csv, b.csv and X.csv. Commented Jul 7, 2014 at 12:00
  • Thank you, but I am having a similar memory issue with cat as well: cat *.csv > X.csv. I cannot use a..e since some files have slightly different names, but it's 5 files in total. Commented Jul 7, 2014 at 12:01
  • I think you may have a weird file in there. Try ls -al and see whether there is anything big and odd! Otherwise, try rebooting. Commented Jul 7, 2014 at 12:04
  • I guess "*.csv" also matches "X.csv"; let the fun begin :) Commented Jul 7, 2014 at 13:10
  • Seriously, save your output to a temporary file whose extension is NOT ".csv" and rename it when you're done. Commented Jul 7, 2014 at 13:12
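A minimal sketch of that suggestion, wrapped in a hypothetical helper named paste_via_tmp that writes to a hypothetical scratch file X.tmp (neither name comes from the thread). Because X.tmp does not end in .csv, the *.csv glob cannot pick up the output while it is still being written, and X.csv is only replaced once paste has finished.

```shell
# Hypothetical helper: run it inside the folder that holds the CSV files.
paste_via_tmp() {
    # X.tmp does not end in .csv, so the *.csv glob never matches the
    # file that paste is currently writing to.
    paste -d, *.csv > X.tmp || return 1
    # Swap the result into place only after paste has succeeded.
    # Note: if X.csv already exists, it is still one of the *.csv inputs.
    mv X.tmp X.csv
}
```

This only stops X.csv from feeding into itself while paste runs; it does not by itself handle appending or repeated header rows.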

2 Answers


As @loreb said, *.csv matches X.csv as well. You can avoid that with a more specific glob pattern, but that depends on whether your shell can handle it. I know bash can, and you tagged your question bash, but you said you are using Mac OS X. Well, I guess you can try it anyway.

paste -d, [a-z]*.csv >> X.csv 

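That will run paste on the files whose names start with a lowercase letter (a.csv through z.csv, plus longer names such as apple.csv), so it will not pick up X.csv, which I think is the current problem: because X.csv matches *.csv, paste reads X.csv while it is appending to it, so the file keeps feeding itself.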
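Since the question also asks for the header row only once, here is a hedged sketch that builds on the [a-z]*.csv pattern. It assumes every input's first line is its header, so line 1 of the paste output is the combined header; the helper name append_pasted is hypothetical. Depending on the bash version and locale, a range such as [a-z] may also match uppercase letters, so running under LC_ALL=C is the safer assumption.

```shell
# Hypothetical helper: append the pasted rows of [a-z]*.csv to X.csv,
# keeping the combined header (line 1 of the paste output) only once.
append_pasted() {
    if [ -s X.csv ]; then
        # X.csv already has content, so it already has a header: drop line 1.
        paste -d, [a-z]*.csv | tail -n +2 >> X.csv
    else
        # First run: write everything, header included.
        paste -d, [a-z]*.csv > X.csv
    fi
}
```

Calling append_pasted twice on the same inputs leaves one header line followed by the data rows twice.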


2 Comments

I don't understand what you intend to imply about OS X and bash. OS X comes with an older version of bash, but it still has extglob (enable it on its own line before using the pattern):
 shopt -s extglob
 paste -d, !(X).csv >> X.csv
@kojiro, I'm not an OS X user, so I wasn't sure whether that works in its shell; better to try than not. I wasn't implying anything bad, by the way :) Thanks for clarifying it for me ;)
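A sketch of kojiro's extglob approach as a bash script, assuming bash is the shell (extglob is a bash option, not POSIX sh); the helper name append_except_x is hypothetical. Bash reads a complete line before executing any command on it, and extglob changes how !(...) is parsed, so shopt -s extglob has to sit on an earlier line than any use of the pattern, here before the function definition.

```shell
#!/usr/bin/env bash
# extglob changes how bash parses !(...), so enable it on its own line,
# before any line (including a function body) that uses the pattern.
shopt -s extglob

# Hypothetical helper: append every *.csv except X.csv itself to X.csv.
append_except_x() {
    # !(X).csv matches a.csv, b.csv, ... but never X.csv.
    paste -d, !(X).csv >> X.csv
}
```

Because X.csv is never an input, each call appends the same pasted rows (header included), so repeated headers still need handling separately.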

If you have a lot of data, and in the absence of a tool that keeps track of the line at which processing last stopped in each [a-z].csv file (I don't think one exists), you can use the following process:

  1. Ensure that the writing program is not going to write to the CSV files anymore. Some possible ways to do this:
    • Move the files to a read-only filesystem.
    • Stop the file writing program.
    • Somehow force the program to start writing to a new file descriptor.
  2. Rename or move the files (for example into /temporary_directory) if necessary, so the writer does not open them again.
  3. Restart the writer if necessary.
  4. Append the moved files to the final file: paste -d, /temporary_directory/*.csv >> /final_destination/X.csv
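The file-handling part of these steps (2 and 4) can be sketched as a small shell function; stopping or redirecting the writer (steps 1 and 3) depends on the program and is left out. The function name freeze_and_append and its three arguments are hypothetical. Moving a file does not stop a program that already has it open: within one filesystem the move is a rename, and the writer keeps writing to the moved file, which is why step 1 comes first.

```shell
# Hypothetical helper: freeze_and_append SRC_DIR STAGING_DIR TARGET
# Moves the CSV files out of SRC_DIR (step 2), then appends their pasted
# contents to TARGET (step 4). TARGET should live outside SRC_DIR.
freeze_and_append() {
    src=$1
    staging=$2
    target=$3
    mkdir -p "$staging" || return 1
    # Within one filesystem, mv is a rename, so this is quick even for big files.
    mv "$src"/*.csv "$staging"/ || return 1
    paste -d, "$staging"/*.csv >> "$target"
}
```

For example: freeze_and_append ./incoming /temporary_directory /final_destination/X.csv, where ./incoming stands in for wherever the writer puts its files.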
