Trying to change the value of a column based on another column in another CSV

Let's say we have a CSV_1 with over 1000 lines and 3 columns:

shape   Color    size  
round      2      big  
triangle   1      small   
square     3      medium 

Then we have a CSV2 that has only 10 lines, with the following information:

color  
1 REd  
2 Blue  
3 Yellow  
etc  

Now I want to replace the value in the Color column of CSV_1 with the name of the color from CSV2.

In other words, something like:

for (i=0; i<column.color(csv1); i++) {
  if color.csv1=1; then
    substitute with color.csv2=1
}

so that the loop iterates over the whole Color column of CSV1 and replaces each value with the matching value from CSV2.

4
  • yes will do thanks! Commented Jun 5, 2021 at 20:20
  • 1
    You said csv files, but I cannot see any commas here. Do you use spaces or tabs to delimit them? Commented Jun 5, 2021 at 20:31
  • True! Sorry, I should have specified that: the ',' is the separator in this case! Commented Jun 5, 2021 at 20:33
  • 2
    Not a problem. That's what I'd expect from a csv file. But you should adapt your file examples to actually use , as the separator. Commented Jun 5, 2021 at 20:35

3 Answers

2

An explicit loop for this would be very slow in bash. Use a command that does the line-wise processing for you.

sed 's/abc/xyz/' searches each line for abc and replaces the first match with xyz. Use this to replace the numbers in the 2nd column of your first file with the names from your 2nd file. The sed script can be generated automatically from the 2nd file using another sed command.

The following command assumes CSV files without spaces around the delimiting commas:

sed -E "$(sed -E '1d;s#^([^,]*),(.*)#s/^([^,]*,)\1,/\\1\2,/#' 2.csv)" 1.csv

Interactive Example

$ cat 1.csv 
shape,Color,size
round,2,big
triangle,1,small
square,3,medium
$ cat 2.csv 
color
1,REd
2,Blue
3,Yellow
$ sed -E "$(sed -E '1d;s#^([^,]*),(.*)#s/^([^,]*,)\1,/\\1\2,/#' 2.csv)" 1.csv
shape,Color,size
round,Blue,big
triangle,REd,small
square,Yellow,medium
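
To make the nested command easier to follow, it can help to run the inner sed on its own and look at the script it generates. The following is only a sketch: the temporary directory and the variable name `generated` are illustrative, and the sample 2.csv is recreated from the example above.

```shell
# Recreate the sample lookup file in a scratch directory (illustrative only).
workdir=$(mktemp -d)
printf '%s\n' 'color' '1,REd' '2,Blue' '3,Yellow' > "$workdir/2.csv"

# Run only the inner sed: drop the header (1d), then turn every
# "key,name" line into one substitution command for the outer sed.
generated=$(sed -E '1d;s#^([^,]*),(.*)#s/^([^,]*,)\1,/\\1\2,/#' "$workdir/2.csv")

# Show the generated script, one "s" command per line of 2.csv.
printf '%s\n' "$generated"
```

For the sample file this should print three commands of the form s/^([^,]*,)1,/\1REd,/, one per color. Each of them only matches when the number fills the whole second field, which is why the shape and size columns are left untouched.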

1 Comment

Thanks a lot! I'll have to look deeper into this! These sorts of sed definitions are still hard for me to follow. Thanks for your time!
1

Here is one approach using mapfile, which is a bash 4+ feature, plus some common Linux/Unix utilities.

Assuming both files are delimited with a comma:

#!/usr/bin/env bash

mapfile -t colors_csv2 < csv2.csv

head -n1 csv1.csv

while IFS=, read -r shape_csv1 color_csv1 size_csv1; do
  for color_csv2 in "${colors_csv2[@]:1}"; do
    if [[ $color_csv1 == ${color_csv2%,*} ]]; then
      printf '%s,%s,%s\n' "$shape_csv1" "${color_csv2#*,}" "$size_csv1"
    fi
  done
done < <(tail -n +2 csv1.csv)

This would be very slow on a large set of data/files.
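
If speed matters, one common alternative (not part of the script above) is to let awk build a lookup table from csv2.csv and rewrite csv1.csv in a single pass. This is a minimal sketch, assuming comma-delimited files with one header line each; the scratch directory, the `result` variable and the array name `name` are illustrative.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'shape,Color,size' 'round,2,big' 'triangle,1,small' 'square,3,medium' > "$workdir/csv1.csv"
printf '%s\n' 'color' '1,REd' '2,Blue' '3,Yellow' > "$workdir/csv2.csv"

# While reading the first file (csv2.csv): store key -> name, skipping its header.
# While reading the second file (csv1.csv): replace column 2 if it is a known key.
result=$(awk -F, -v OFS=, '
  NR == FNR { if (FNR > 1) name[$1] = $2; next }
  FNR > 1 && ($2 in name) { $2 = name[$2] }
  { print }
' "$workdir/csv2.csv" "$workdir/csv1.csv")

printf '%s\n' "$result"
```

Each line of csv1.csv costs one array lookup here instead of an inner loop over all colors, so the work grows with the size of csv1.csv rather than with the product of both file sizes.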


If ed is available and acceptable, here is an alternative, again with the bash shell:

#!/usr/bin/env bash

ed -s csv1.csv < <(
  printf '%s\n' '1d' $'g|.|s|,|/|\\\ns|^|,s/|\\\ns|$|/|' '$a' ',p' 'Q' . ,p |
  ed -s csv2.csv
)

3 Comments

Thanks! I'm still not able to follow it completely. I think I should have named the columns more distinctly, because in the code above I'm not sure which one is the color for csv1 and which is the color for csv2 (or would it be easier to give both the same name?). So I get that on line 1, mapfile defines colors as the column name for csv2, right? Then shape, color refers to the columns in csv1, right? We set up the loop, with colors referring again to csv2; then the loop runs, checks whether the color from csv1 is in csv2, and changes the color in csv1. Am I right? Again, thanks for your time! :)
I have changed the variable names; hopefully that gives you an idea. Add the debug flag (bash -x ./myscript) to see what the script is actually doing. Tested both on your sample files.
There are two loops: a for loop and a while loop with the read builtin.
1

To add to @Jetchisel's interesting answer, here is an old bash way to achieve that. It should work with bash release 2, as that release supports escape literals, indexed arrays, string expansion and indirect variable references. It assumes that the color keys in csv2.csv are always numeric values. Add shopt -s compat31 at the beginning to test it in the 'old way' with a recent bash. You can also replace declare -a csv2 with a Bash 4+ declare -A csv2 for an associative array, in which case the key can be anything.

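#!/bin/bash
declare -a csv2
# Placeholder character that temporarily stands in for spaces inside fields
esc=$'\x1B'
# Pass 1: load csv2.csv into an indexed array (numeric key -> color name)
while read -r colors; do
   if [ "${colors}" ] ; then
     colors="${colors// /${esc}}"
     # Split on commas into $1, $2, ...; "--" keeps set from printing all
     # variables or parsing options when the expansion is empty or starts with "-"
     set -- ${colors//,/ }
     if [ "$1" ] ; then
       csv2["$1"]="$2"
     fi
   fi
done < csv2.csv
# Pass 2: rewrite csv1.csv, replacing column 2 when a color name is known
while read -r output; do
   if [ "${output}" ] ; then
     outputfilter="${output// /${esc}}"
     set -- ${outputfilter//,/ }
     if [ "$2" ] ; then
       color="${csv2["$2"]}"
       # Rebuild the line and turn the placeholders back into spaces
       [ "${color}" ] && { tmp="$1,${color},$3";output="${tmp//${esc}/ }"; };
     fi
     echo "${output}"
   fi
done < csv1.csv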
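
The declare -A variant mentioned above could look roughly like the following. This is a sketch rather than a drop-in replacement: it needs Bash 4+, splits fields with IFS=, instead of the placeholder trick, and assumes exactly one header line per file. The scratch directory and the names `color_name` and `result` are illustrative.

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'shape,Color,size' 'round,2,big' 'triangle,1,small' 'square,3,medium' > "$workdir/csv1.csv"
printf '%s\n' 'color' '1,REd' '2,Blue' '3,Yellow' > "$workdir/csv2.csv"

# Associative array (Bash 4+): the key can be any non-empty string.
declare -A color_name

# Load csv2.csv, skipping its header line: field 1 is the key, field 2 the name.
while IFS=, read -r key name; do
  if [[ -n $key && -n $name ]]; then
    color_name[$key]=$name
  fi
done < <(tail -n +2 "$workdir/csv2.csv")

# Copy the header of csv1.csv unchanged, then replace known keys in column 2.
result=$(
  {
    IFS= read -r header && printf '%s\n' "$header"
    while IFS=, read -r shape color size; do
      if [[ -n $color && -n ${color_name[$color]:-} ]]; then
        color=${color_name[$color]}
      fi
      printf '%s,%s,%s\n' "$shape" "$color" "$size"
    done
  } < "$workdir/csv1.csv"
)

printf '%s\n' "$result"
```

Rows whose color key is not found in csv2.csv are printed unchanged, which matches the behaviour of the indexed-array script above.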

1 Comment

I have learned something new, well not new but tricks to use for old shells, I'll keep this bookmarked :-)
