
I am working with multi-column data (ID,num,score):

7LMQ,Y6G,1.99
7LAA,Y65,2.95
7LZZ,Y55,8.106
7LDD,YAA,9.063
7N66,0HG,6.042
7444,HOP,5.02
7LJF,HEI,5.14
7LFD,LAL,4.128
7KCV,Cho,4.31
7GHJ,Ro,9.045

Using a simple script, I need to create two bash arrays from this data:

  1. a simple array containing the elements of the second column:

    sdf_lists=("Y6G" "Y65" "Y55" "YAA" "0HG" "HOP" "HEI" "LAL" "Cho" "Ro")

  2. an associative array made from the elements of the 2nd and the 1st columns:

    dataset=( [Y6G]=7LMQ [Y65]=7LAA [Y55]=7LZZ [YAA]=7LDD [0HG]=7N66 [HOP]=7444 [HEI]=7LJF [LAL]=7LFD [Cho]=7KCV [Ro]=7GHJ )

Do I need something complex like AWK to achieve this, or will a simple grep solution work as well?

  • Hi, is your data available in CSV format? Commented Dec 15, 2021 at 12:06
  • Please, never show text as images. They are not searchable, not copy-pasteable, and much heavier than needed. Moreover, they hurt accessibility. Please copy-paste the text into your question and format it properly instead. And do not forget to explain which format you use (CSV, TSV...). Commented Dec 15, 2021 at 12:19
  • I am terribly sorry, I actually tried to paste it as CSV initially... so it's been edited :-) Commented Dec 15, 2021 at 12:22
  • I would do this completely in bash, i.e. [1] read the file line by line, [2] split each line into fields, and [3] populate the two arrays. Don't forget to consider whether the values in the second column can occur multiple times or are unique (as in your example). Commented Dec 15, 2021 at 12:47

1 Answer

Bash by itself is all you need:

declare -a sdf_lists=()
declare -A dataset=()

while IFS=, read -r id sdf value; do
  sdf_lists+=("$sdf")
  dataset[$id]="$sdf"
done < file.csv

declare -p sdf_lists dataset

Result:

declare -a sdf_lists=([0]="Y6G" [1]="Y65" [2]="Y55" [3]="YAA" [4]="0HG" [5]="HOP" [6]="HEI" [7]="LAL" [8]="Cho" [9]="Ro")
declare -A dataset=([7LMQ]="Y6G" [7LJF]="HEI" [7444]="HOP" [7GHJ]="Ro" [7KCV]="Cho" [7N66]="0HG" [7LFD]="LAL" [7LZZ]="Y55" [7LDD]="YAA" [7LAA]="Y65" )
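
Note that the loop above keys dataset by the first column, whereas the question asks for the second column as the key and the first column as the value. A minimal sketch of that variant follows. It assumes that keeping the first occurrence of a repeated second-column value is acceptable (an assumption prompted by the comment about duplicates, not something the question states). The here-document stands in for file.csv and holds three rows from the question.

```bash
#!/usr/bin/env bash
# Sketch only: associative array keyed by column 2, valued by column 1.
# Associative arrays need bash 4.0 or newer.

declare -a sdf_lists=()
declare -A dataset=()

while IFS=, read -r id sdf _score; do
  # Skip blank or malformed lines that have no second field.
  [[ -n $sdf ]] || continue

  # Assumption: on a repeated key, keep the first ID and warn on stderr.
  if [[ -n ${dataset[$sdf]+set} ]]; then
    printf 'duplicate key %s, keeping %s\n' "$sdf" "${dataset[$sdf]}" >&2
    continue
  fi

  sdf_lists+=("$sdf")      # ordered list of column-2 values
  dataset[$sdf]=$id        # column 2 -> column 1
done <<'EOF'
7LMQ,Y6G,1.99
7LAA,Y65,2.95
7GHJ,Ro,9.045
EOF

declare -p sdf_lists dataset
```

To read the real data, replace the here-document with `done < file.csv`. The order in which declare -p prints the associative array is not the insertion order, which is why sdf_lists is kept alongside it.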

To address Andre Wildberg's valid concern about CSV data: with bash 5.1, we can use the loadable csv builtin to split each line:

enable -f /usr/local/lib/bash/csv csv     # your location may be different

while IFS= read -r line; do
  csv -a fields "$line"
  sdf_lists+=("${fields[1]}")
  dataset[${fields[0]}]="${fields[1]}"
done < file.csv

Or use a tool like Python or Ruby, both of which ship with a CSV module in their standard library.


2 Comments

Careful: This fails on quoted commas and probably all sorts of whitespace characters.
It would have trouble with quoted commas, but not whitespace.
