
I have 40 csv files that I need to edit. 20 have matching format and the names only differ by one character, e.g., docA.csv, docB.csv, etc. The other 20 also match and are named pair_docA.csv, pair_docB.csv, etc.

I have the code written to edit and combine docA.csv and pair_docA.csv, but I'm struggling to write a loop that picks up both of the above files, edits them, combines them under the name combinedA.csv, and then goes on to the next pair.

Can anyone help with my rudimentary bash scripting? Here's what I have thus far. I've tried a single for loop, and now I'm trying 2 (probably 3) for loops. I'd prefer to keep it in a single loop.

set -x
DIR=/path/to/file/location

for file in `ls $DIR/doc?.csv`
do

#code to edit the doc*.csv files ie $file

done

for pairdoc in `ls $DIR/pair_doc?.csv`
do

#code to edit the pair_doc*.csv files ie $pairdoc

done

#still need to combine the files. I have the join written for a single iteration, 
#but how do I loop the code to save each join as a different file corresponding
#to combined*.csv

2 Answers

3

Something along these lines:

#!/bin/bash

dir=/path/to/file/location
 
cd "$dir" || exit
for file in doc?.csv; do
    pair=pair_$file
    # "${file#doc}" deletes the prefix "doc"
    combined=combined_${file#doc}
    cat "$file" "$pair" >> "$combined" 
done

As a rule, ls shouldn't be used in a shell script to iterate over files. It is intended for interactive use and is almost never needed in a script: its output goes through word splitting, so a filename containing whitespace breaks the loop, whereas a glob hands each filename to the loop intact. Also, all-uppercase names shouldn't be used for ordinary variables, since they may collide with internal shell variables or environment variables.
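To make the point about ls concrete, here is a minimal sketch that counts loop iterations both ways. It assumes a scratch directory created with mktemp and two made-up filenames, one of which contains a space; neither comes from the question.

```shell
# Scratch directory so the demo does not touch real data.
tmp=$(mktemp -d)
touch "$tmp/doc A.csv" "$tmp/docB.csv"   # hypothetical names; the first contains a space

# The output of ls is split on whitespace, so "doc A.csv" becomes two words.
ls_count=0
for f in $(ls "$tmp"/doc*.csv); do
    ls_count=$((ls_count + 1))
done

# A glob expands to exactly one word per matching file.
glob_count=0
for f in "$tmp"/doc*.csv; do
    glob_count=$((glob_count + 1))
done

echo "ls loop iterations:   $ls_count"
echo "glob loop iterations: $glob_count"
rm -r "$tmp"
```

With two files on disk, the ls loop runs three times and the glob loop runs twice.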


Below is a version without changing the directory.

#!/bin/bash

dir=/path/to/file/location

for file in "$dir/"doc?.csv; do
    basename=${file#"$dir/"}
    pair=$dir/pair_$basename
    combined=$dir/combined_${basename#doc}
    cat "$file" "$pair" >> "$combined"
done
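
The parameter expansions used in the two scripts above can be tried on a single pathname. This is only an illustrative sketch: the variable names base and letter are made up here, and the `%/*` and `##*/` forms are an alternative way to split a path when the directory is not already stored in a variable.

```shell
file=/path/to/file/location/docA.csv   # placeholder path from the answer

dir=${file%/*}       # remove the shortest suffix matching "/*"  -> directory part
base=${file##*/}     # remove the longest prefix matching "*/"   -> docA.csv
letter=${base#doc}   # remove the leading "doc"                  -> A.csv

pair=$dir/pair_$base
combined=$dir/combined_$letter

printf '%s\n' "$dir" "$base" "$letter" "$pair" "$combined"
```

A single `#` or `%` removes the shortest match and a doubled one removes the longest, which is why `##*/` strips everything up to the last slash.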

3 Comments

Bless you! Thank you. That makes sense. Is it a preferential thing to change directories rather than add the directory $DIR in the for loop setup, or is it seen as bad practice to call the $DIR like I did? Also, what if I need to append each output to combined_A.csv, combined_B.csv, etc.?
Actually, I'd rather append each output into a single combined.csv file
@swgRrr Please see the updated answer. Changing the directory is not strictly necessary but it makes things easy for this particular task. Otherwise, some string manipulation via shell parameter expansions would be required in order to split the pathname into a path prefix (the directory name) and a basename (the non-directory portion of the pathname).
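
For the single-file variant asked about in the comments, one possible sketch is shown here. It assumes a scratch directory with made-up sample rows standing in for the real data, and it truncates combined.csv first so that rerunning the script does not append the same rows twice; whether that is wanted depends on the workflow.

```shell
# Scratch directory with sample data (hypothetical contents).
dir=$(mktemp -d)
printf 'a,1\n' > "$dir/docA.csv"
printf 'a,2\n' > "$dir/pair_docA.csv"
printf 'b,1\n' > "$dir/docB.csv"
printf 'b,2\n' > "$dir/pair_docB.csv"

cd "$dir" || exit
out=combined.csv
: > "$out"                          # start from an empty output file
for file in doc?.csv; do
    pair=pair_$file
    [ -e "$pair" ] || continue      # skip a doc file that has no partner
    cat "$file" "$pair" >> "$out"   # append this pair to the single output
done

result=$(cat "$out")
echo "$result"
cd / && rm -r "$dir"
```

Because combined.csv does not match the doc?.csv pattern, the output file is never picked up as an input.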
0

This might work for you (GNU parallel):

parallel cat {1} {2} \> join_{1}_{2} ::: doc{A..T}.csv :::+ pair_doc{A..T}.csv

Replace cat with your chosen commands; {1} stands for each docX.csv file and {2} for the matching pair_docX.csv file.

N.B. X represents the letters A through T.
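
In the command above, `:::` introduces a list of arguments and `:::+` links the second list to the first, so the two lists are paired position by position instead of being combined in every possible way. As a rough sketch of what that pairing generates, the plain shell loop below prints the equivalent commands without running them; it uses three letters instead of A through T to keep the output short, and it does not need GNU parallel to be installed.

```shell
# Print the command each linked pair of arguments would produce.
count=0
for x in A B C; do
    # {1} is doc$x.csv and {2} is pair_doc$x.csv, so join_{1}_{2} becomes:
    cmd="cat doc$x.csv pair_doc$x.csv > join_doc$x.csv_pair_doc$x.csv"
    echo "$cmd"
    count=$((count + 1))
done
```

GNU parallel can show the real thing as well: adding `--dry-run` makes it print the commands instead of executing them.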

2 Comments

Whoo, this one may be a bit above my current paygrade. I understand the flow and such, but could you explain what parallel does and what the series of colons do? If not, I'll look it up! Thanks for your input!
@swgRrr GNU parallel is a tool worth investing time in. It provides the possibilities of loops in a one-liner format, and much more besides.
