I have a spreadsheet in which each column represents a day of the week. Each cell in a column holds the name of an animal on the farm that was fed that day. Like this:

Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat, 
horse, horse, , horse, horse, horse, horse
 , pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
 , , , , , , goat

Notice that the cow was fed every day; the goat was fed every day, but this was recorded on two separate rows; the horse was not fed on Wednesday; the pig was fed only on Tuesday and Friday; and on Saturday the goose was fed instead of the duck, but this was recorded on the duck's row.

What I want to do now is construct an AWK script that will tell me which animals were fed every day of the week.

What I think I want to do is loop through the data once, and make an associative array of every unique value in field $7, the idea being that if an animal wasn't fed on Sunday, it wasn't fed every day of the week.

Then, I want to loop through the file again, and increment the value of the array holding the value of the animal on each day it is found. I then want to print out the names of every animal that was fed every day.

Here is the pseudo-code I've got so far:

awk -F "," 'FNR > 1 BEGIN {
    [SOMEHOW MAGICALLY CONSTRUCT AN ARRAY HOLDING THE VALUES OF FIELD $7]
    }
    {
        for (i=1; i <= NR; i++) {
            if ($i in animals) {
                animals[$i]++
            }
            else {
                 animals[$i]=0
            }
         }
     }
     END {
         for (animal in animals) {
             if (animals[animal]==7) {
                 print $animal[animal]
             }
          }
     }
}

I know that AWK code is probably not correct on a lot of levels. But I've been bashing my head against this problem all day, despite having read O'Reilly's "sed & awk" book and referencing it and The Googles all day.

Any help would be greatly appreciated.

  • You're right that if the animal was not fed on Sunday, it was not fed every day. However, if the animal was fed on Sunday but was not fed on Monday (or Tuesday, or …) then it still wasn't fed every day. So, being fed on Sunday is a necessary condition; it is not a sufficient condition. Commented Jun 3, 2016 at 5:54
  • Could the data have a line goose, goose, goose, goose, goose, duck, goose too? Could it have a line goat, goat, goat, goat, goat, goat, goat too? This would mean that the goats being fed (every day) was recorded on 3 lines of data, and each day would have two entries for goats. It's a question of 'how chaotic can the input data be'. Commented Jun 3, 2016 at 5:58
  • Is it possible that an animal was fed more than once on a single day? Commented Jun 3, 2016 at 6:05

1 Answer

What I want to do now is construct an AWK script that will tell me which animals were fed every day of the week.

Only the goat and cow were fed every day:

$ awk -F'[[:space:]]*,[[:space:]]*' 'NR>1{for (i=1;i<=7;i++) if ($i) fed[$i]+=1} END{for (a in fed) if (fed[a]==7) print a}' farmdata
goat
cow

How it works

awk implicitly loops over each record (line) in the file. This script uses one array, called fed, to keep track of how many times each animal was fed.

  • -F'[[:space:]]*,[[:space:]]*'

    This sets the field separator to a comma together with any whitespace on either side of it, so padding around the commas does not end up in the field values.

  • NR>1{for (i=1;i<=7;i++) if ($i) fed[$i]+=1}

    For every line after the first (NR>1 skips the header row), loop over the seven fields and, for each non-empty field, add one to the count for the name in that field.

  • END{for (a in fed) if (fed[a]==7) print a}

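    After we reach the end of the file, print out every animal that was counted seven times. The order in which for (a in fed) visits the array is unspecified, so the names may appear in any order.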
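
To see why only the goat and the cow qualify, it can help to print the whole tally rather than just the entries that reach seven. The following is a minimal sketch, not part of the original one-liner: the function name feed_counts is made up for illustration, the output is piped through sort only to make the order predictable, and it tests $i != "" explicitly instead of if ($i) (a variation of mine, since if ($i) would also skip a field that contained just 0).

```shell
# feed_counts FILE: print "animal count" for every animal in FILE.
# (Hypothetical helper name; the separator and the loop follow the answer above.)
feed_counts() {
    awk -F'[[:space:]]*,[[:space:]]*' '
        NR>1 {
            # Skip the header row, then tally each non-empty field.
            for (i=1; i<=7; i++)
                if ($i != "") fed[$i]++
        }
        END {
            # Print every animal with its count, not only the sevens.
            for (a in fed) print a, fed[a]
        }
    ' "$1" | LC_ALL=C sort
}
```

Called as feed_counts farmdata on the sample data from the question, this should list cow 7, duck 6, goat 7, goose 1, horse 6 and pig 2, one per line.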

Multiple lines

For those who prefer their code spread over multiple lines:

awk -F'[[:space:]]*,[[:space:]]*' '
    NR>1{
        for (i=1;i<=7;i++) 
           if ($i) fed[$i]+=1
    }  

    END{
        for (a in fed) 
           if (fed[a]==7) print a
    }
    ' farmdata
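
Duplicate entries

The comments on the question ask whether an animal could be recorded more than once on the same day. The script above assumes it cannot: an animal entered twice on Monday but never on Sunday would still reach a count of 7, and an animal entered on all seven days plus one duplicate would reach 8 and be missed. If that can happen in your data, one possible approach is to count each animal/day pair only once. This is a sketch under that assumption; the function name fed_every_day and the seen and days arrays are names I made up for illustration.

```shell
# fed_every_day FILE: print animals that appear in all seven day columns,
# counting each (animal, column) pair at most once.
fed_every_day() {
    awk -F'[[:space:]]*,[[:space:]]*' '
        NR>1 {
            for (i=1; i<=7; i++)
                # seen[animal, column] is 0 the first time a pair shows up,
                # so days[animal] counts distinct days, not total entries.
                if ($i != "" && !seen[$i,i]++) days[$i]++
        }
        END {
            for (a in days)
                if (days[a]==7) print a
        }
    ' "$1"
}
```

Usage is fed_every_day farmdata. On the sample data in the question it should give the same two animals as before, because no animal appears twice in one column there.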

1 Comment

Thank you so much! I was getting mentally stuck on the idea that I had to use the BEGIN{} block.
