
I am trying to write a function in bash, but it doesn't work. The function is shown below; it takes a file in the following format:

1 2 first 3
4 5 second 6
...

I'm trying to take the third word of every line and fill the array "arr" with those strings, without storing identical strings twice. When I enabled the commented-out "echo" command at the top of the for loop, it printed only the first string on every iteration (in the above case "first").

Thank you!

function storeDevNames {

n=0
b=0
while read line; do
    line=$line
    tempArr=( $line )
    name=${tempArr[2]}
    for i in $arr ; do
        #echo ${arr[i]}
        if [ "${arr[i]}" == "$name" ]; then
            b=1
            break
        fi
    done
    if [ "$b" -eq 0 ]; then
        arr[n]=$name
        n=$(($n+1))
    fi
    b=0
done < $1
}
  • How do you call the function? How do you echo the array? Commented Apr 5, 2015 at 8:33
  • Use shellcheck.net Commented Apr 5, 2015 at 8:37
  • choroba: I call it using "storeDevNames a.txt". I am printing the array in a different function. I'll try to see your answer. Thanks! Commented Apr 5, 2015 at 8:49
  • Are you sure that the input file is separated by regular spaces, and not unbreaking space characters? stackoverflow.com/questions/11272374/… Commented Apr 5, 2015 at 9:00
  • @asimovwasright I think it is. This is a .comp file Commented Apr 5, 2015 at 9:26

3 Answers

1

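The following line seems suspicious, because $arr expands to only the first element of the array (${arr[0]}), so the loop never visits the other elements: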

    for i in $arr ; do

I changed it as follows and it works for me:

#! /bin/bash

function storeDevNames {
    n=0
    b=0
    while read line; do
        # line=$line # ?!
        tempArr=( $line )
        name=${tempArr[2]}
        for i in "${arr[@]}" ; do
            if [ "$i" == "$name" ]; then
                b=1
                break
            fi
        done
        if [ "$b" -eq 0 ]; then
            arr[n]=$name
            (( n++ ))
        fi
        b=0
    done
}

storeDevNames < <(cat <<EOF 
1 2 first 3
4 5 second 6
7 8 first 9
10 11 third 12
13 14 second 15
EOF
)

echo "${arr[@]}"
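
To see why the original loop misbehaved, here is a small sketch (the array contents are made up for the demo). In bash, $arr is shorthand for ${arr[0]}, so for i in $arr runs once with i set to the first string. Inside the loop, ${arr[i]} treats i as an arithmetic expression, and a name that is not a set variable evaluates to 0, so the comparison always looked at ${arr[0]}. That would account for both the echo showing only the first string and duplicates getting stored.

```bash
# Illustrative array; the values are made up for this demo.
arr=(first second third)

# $arr is shorthand for ${arr[0]}, so this loop runs exactly once.
unquoted=()
for i in $arr; do
    unquoted+=("$i")
done

# "${arr[@]}" expands to every element, one word per element.
quoted=()
for i in "${arr[@]}"; do
    quoted+=("$i")
done

# The subscript of an indexed array is an arithmetic expression.
# "second" is not a set variable here, so it evaluates to 0.
i=second
picked=${arr[i]}

printf 'for i in $arr            visits: %s\n' "${unquoted[*]}"
printf 'for i in "${arr[@]}"     visits: %s\n' "${quoted[*]}"
printf '${arr[i]} with i=second  yields: %s\n' "$picked"
```

The first loop visits only first, the second visits all three elements, and the last line yields first again.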

5 Comments

You're right, it does print the whole array. I still don't get two things: 1. Why does it store two identical strings? Is there something wrong with my if-else? 2. Why doesn't it print every single element in the array with this echo command but prints only the first one every time?
@GalFl: I don't understand. I'm getting no duplicates. Show the code that produces them in the question.
I tried it now with a simple txt file and it worked with no duplicates, but with a .comp file (the format I have to work with) it does show duplicates. Maybe it's something about this format? Maybe the spaces or line endings are different?
It works! So just to fully understand: the "i" in the for loop represents a number (like in C for example) or in this case a string?
It's the string. If you wanted numbers, you'd need something like for i in $(seq 0 $((${#arr[@]} - 1))) or for ((i=0; i<${#arr[@]}; i++))
1

You can replace all of your read block with:

arr=( $(awk '{print $3}' <"$1" | sort | uniq) )

This fills arr with the unique names found in the 3rd word of each line (first, second, ...) and reduces the entire function to:

function storeDevNames {
    arr=( $(awk '{print $3}' <"$1" | sort | uniq) )
}

Note: this provides a list of all unique device names in sorted order, so removing duplicates this way also destroys the original order. If you need to preserve the original order except where duplicates are removed, see 4ae1e1's alternative.
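
One caveat with arr=( $(...) ): the unquoted command substitution undergoes word splitting and pathname expansion, so a name containing a glob character such as * could be replaced by matching filenames. A minimal sketch that reads one name per line with mapfile instead, assuming bash 4 or newer; the function name storeDevNamesSorted and the sample file are made up for illustration:

```bash
# Hypothetical variant of storeDevNames; fills the global array "arr".
storeDevNamesSorted() {
    # mapfile -t stores one line per array element and strips the newline.
    # sort -u does the same job as sort | uniq.
    mapfile -t arr < <(awk '{print $3}' "$1" | sort -u)
}

# Sample input, made up for this demo.
tmpfile=$(mktemp)
printf '%s\n' '1 2 first 3' '4 5 second 6' '7 8 first 9' > "$tmpfile"

storeDevNamesSorted "$tmpfile"
printf '%s\n' "${arr[@]}"

rm -f "$tmpfile"
```

For the sample input this leaves two elements in arr, first and second.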

2 Comments

Your answer breaks the order of lines, which might (or might not) be important. See the other awk answer (disclosure: mine) for how to preserve the order.
Indeed it does. If retaining the order of device names is important, then your answer is the one to use.
1

You're using the wrong tool. awk is designed for this kind of job.

awk '{ if (!seen[$3]++) print $3 }' <"$1"

This one-liner prints the third column of each line, removing duplicates along the way while preserving the order of lines (only the first occurrence of each unique string is printed). sort | uniq, on the other hand, breaks the original order of lines. For large files (which doesn't seem to apply in the OP's case), the one-liner is also faster than sort | uniq: it scans the file once, while sort needs on the order of n log n comparisons.

As an example, for an input file with contents

1 2 first 3
4 5 second 6
7 8 third 9
10 11 second 12
13 14 fourth 15

the above awk one-liner gives you

first
second
third
fourth

To put the results in an array:

arr=( $(awk '{ if (!seen[$3]++) print $3 }' <"$1") )

Then echo "${arr[@]}" will give you first second third fourth.
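
If awk is not an option, the same "seen" idea can be sketched in pure bash with an associative array. This assumes bash 4 or newer; the function name storeDevNamesOrdered and the throwaway variable names are made up for illustration:

```bash
# Hypothetical pure-bash variant; fills the global array "arr".
storeDevNamesOrdered() {
    local -A seen=()            # names already stored (needs bash 4+)
    local _a _b name _rest
    arr=()
    # read splits each line on whitespace; the third field lands in "name".
    while read -r _a _b name _rest; do
        [[ -n $name ]] || continue      # skip lines with fewer than 3 fields
        if [[ -z ${seen[$name]:-} ]]; then
            seen[$name]=1               # remember the name
            arr+=("$name")              # keep first occurrences, in file order
        fi
    done < "$1"
}

# Sample input, same shape as the example above.
tmpfile=$(mktemp)
printf '%s\n' '1 2 first 3' '4 5 second 6' '7 8 third 9' \
              '10 11 second 12' '13 14 fourth 15' > "$tmpfile"

storeDevNamesOrdered "$tmpfile"
echo "${arr[@]}"

rm -f "$tmpfile"
```

Like the awk version, this looks each name up once instead of looping over the whole array for every line.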

2 Comments

This looks like a really good solution, but since I'm a beginner in bash I'm trying to write the simplest code to understand rather than the most efficient to write. Plus we are probably not allowed to use "awk". Thank you very much!
@GalFl No problem, this might still help future users.
