I have a spreadsheet in which each column represents a day of the week. Each cell in a column holds the name of a farm animal that was fed that day. Like this:
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat,
horse, horse, , horse, horse, horse, horse
, pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
, , , , , , goat
Notice that the cow was fed every day; the goat was fed every day, but this was recorded on two separate rows; the horse was not fed on Wednesday; the pig was only fed on Tuesday and Friday; and on Saturday the goose was fed instead of the duck, but it was recorded on the duck line.
What I want to do now is construct an AWK script that will tell me which animals were fed every day of the week.
What I think I want to do is loop through the data once and build an associative array keyed by every unique value in field $7, the idea being that if an animal wasn't fed on Sunday, it wasn't fed every day of the week.
Then I want to loop through the file again and, each time an animal from that array is found, increment its entry. Finally, I want to print the name of every animal whose count shows it was fed every day.
Here is the pseudo-code I've got so far:
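awk -F ' *, *' '
# farm.csv is a placeholder file name; it is listed twice so awk reads it twice.
# Pass 1 (NR == FNR only during the first read): remember each Sunday animal.
NR == FNR {
    if (FNR > 1 && $7 != "") {
        animals[$7] = 0
    }
    next
}
# Pass 2: skip the header, then count every cell holding a remembered animal.
FNR > 1 {
    for (i = 1; i <= NF; i++) {
        if ($i in animals) {
            animals[$i]++
        }
    }
}
END {
    # An animal seen in 7 cells is assumed to have been fed on all 7 days.
    for (animal in animals) {
        if (animals[animal] == 7) {
            print animal
        }
    }
}' farm.csv farm.csv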
I know that AWK code is probably not correct on a lot of levels. But I've been bashing my head against this problem all day, despite having read O'Reilly's "sed & awk" book and consulting both it and The Googles.
Any help would be greatly appreciated.
Could the data have a line goose, goose, goose, goose, goose, duck, goose too? Could it have a line goat, goat, goat, goat, goat, goat, goat too? This would mean that the goats being fed (every day) was recorded on 3 lines of data, and each day would have two entries for goats. It's a question of 'how chaotic can the input data be'.
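One way to make the result independent of how chaotic the rows are is to count distinct days per animal rather than cells. The following is only a minimal sketch under some assumptions: the data is saved in a hypothetical file named feed.csv, the fields are separated by a comma with optional spaces around it, and only the first seven columns are days. The names fed_every_day, seen and days are made up for illustration.

```shell
# feed.csv is a hypothetical file name; the rows are the sample from the question.
cat > feed.csv <<'EOF'
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat,
horse, horse, , horse, horse, horse, horse
, pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
, , , , , , goat
EOF

# Print (sorted) every animal that has an entry in all seven day columns.
fed_every_day() {
    awk -F ' *, *' '
    FNR > 1 {                                   # skip the header row
        for (i = 1; i <= 7 && i <= NF; i++) {
            # Count each (animal, day) pair once, so repeated rows are harmless.
            if ($i != "" && !seen[$i, i]++) {
                days[$i]++
            }
        }
    }
    END {
        for (animal in days) {
            if (days[animal] == 7) {
                print animal
            }
        }
    }' "$1" | sort
}

result=$(fed_every_day feed.csv)
echo "$result"
```

On the sample data this should print cow and goat. If the two extra lines asked about above were also present, the same sketch would additionally report duck and goose, since each of them would then have an entry for all seven days. The output is piped through sort because the order of awk's for (animal in days) loop is unspecified.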