I want a loop that finds, for each of several languages, the letter that most frequently ends a word, and outputs the data in columns. So far I have:
count="./wordlist/french/fr.txt ./wordlist/spanish/es.txt ./wordlist/german/de.$
lang="French Spanish German Portuguese Italian"
(
echo -e "Language Letter Count"
for i in $count
do
    (for j in {a..z}
    do
        echo -e "LANG" $j $(grep -c $j\> $i)
    done
    ) | sort -k3 -rn | head -1
done
) | column -t
I want it to output as shown:
Language Letter Count
French e 196195
Spanish a 357193
German e 251892
Portuguese a 217178
Italian a 216125
Instead I get:
Language Letter Count
LANG z 0
LANG z 0
LANG z 0
LANG z 0
LANG z 0
The word list files have the format:
Word Freq(#)
where the word and its frequency (a number) are delimited by a space.
This means I have two problems:
First, the grep command is not handling the argument $j\> to find a character at the end of a word. I have tried using grep -E $j\> and grep '$j\>' and neither worked.
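On the first problem, a likely cause is shell quoting rather than grep itself: unquoted, the shell consumes the backslash in $j\> so grep receives e>, and inside single quotes $j is never expanded. Double quotes keep both the expansion and the backslash. A minimal sketch, assuming GNU grep (for \>) and a made-up sample file in the format described above:

```shell
#!/usr/bin/env bash
# Hypothetical sample data in the "word frequency" format (not the real word list).
sample=$(mktemp)
printf '%s\n' 'de 1622928' 'je 1622619' 'est 1348809' 'pas 1128894' 'le 1093232' > "$sample"

j=e
# grep -c exits non-zero when the count is 0, hence the "|| true".
unquoted=$(grep -c $j\> "$sample" || true)   # shell strips the backslash: grep sees e>
single=$(grep -c '$j\>' "$sample" || true)   # no expansion: grep sees the literal text $j\>
double=$(grep -c "$j\>" "$sample" || true)   # grep sees e\> : "e" at the end of a word

echo "unquoted=$unquoted single=$single double=$double"
rm -f "$sample"
```

Because \> also matches before an apostrophe or hyphen inside a word, a pattern anchored on the separator, such as "$j " (the letter followed by the space), may be closer to what is wanted.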
The second problem is that I don't know how to output the name of the language (in the variable lang). Nesting another for loop did not work when I tried it like this (or with i and k in the opposite order):
(
for i in $count
do
    for k in $lang
    do
        (
        for j in {a..z}
        do
            echo -e $k $j $(grep -c $j\> $i)
        done
        ) | sort -k3 -rn | head -1
    done
done
) | column -t
This does not work because it pairs every file with every language, so the language name "$k" is printed multiple times, in places where it does not belong.
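One way to keep each language name tied to its own file, instead of nesting a second loop, might be two parallel bash arrays walked with a shared index. The sketch below assumes bash (for arrays) and GNU grep, and uses two tiny made-up files in place of the real word lists; the names files, langs and report are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real word lists (format: "word frequency").
tmp=$(mktemp -d)
printf '%s\n' 'de 1622928' 'je 1622619' 'est 1348809' 'pas 1128894' 'le 1093232' > "$tmp/fr.txt"
printf '%s\n' 'la 100' 'casa 90' 'que 80' 'una 70' > "$tmp/es.txt"

# Parallel arrays: files[n] belongs to langs[n].
files=("$tmp/fr.txt" "$tmp/es.txt")
langs=(French Spanish)

report=$(
    echo "Language Letter Count"
    for n in "${!files[@]}"          # iterate over the indices 0, 1, ...
    do
        for j in {a..z}
        do
            # One line per letter: language, letter, number of lines where it ends a word.
            echo "${langs[n]} $j $(grep -c "$j\>" "${files[n]}")"
        done | sort -k3,3rn | head -1   # keep only the most frequent letter
    done
)
echo "$report" | column -t
rm -rf "$tmp"
```

Adding a language then means appending one entry to each array, with no change to the loop.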
I know that I can just copy and paste the loop for each language, but I would like to extend this to every language. Thanks in advance!
is 1000; xertz 1; showbiz 1; the result would be z 2 (rather than s 1000). z 2 is what I want, since I want to count the frequency and display the character that most frequently ends a word within the file itself. And, roelofs, a sample of the file is shown here:
de 1622928
je 1622619
est 1348809
pas 1128894
le 1093232
So within this file itself, e most commonly ends a word. Sorry for the misconception.
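Going by this clarification (each listed word counts once and the frequency column is ignored), the 26 grep calls per file could probably be replaced by a single awk pass that tallies the last character of the first field. This is only a sketch on the sample lines quoted above; the variable names are made up, ties are broken arbitrarily, and accented letters may need a UTF-8-aware awk such as gawk:

```shell
#!/usr/bin/env bash
# Hypothetical sample: the five lines quoted above, one "word frequency" pair per line.
sample=$(mktemp)
printf '%s\n' 'de 1622928' 'je 1622619' 'est 1348809' 'pas 1128894' 'le 1093232' > "$sample"

# Tally the last character of the word (field 1) once per non-empty line,
# then print the character with the highest tally and its count.
top=$(awk '
    NF { last = substr($1, length($1)); count[last]++ }
    END {
        for (c in count)
            if (count[c] > max) { max = count[c]; best = c }
        print best, max
    }
' "$sample")

echo "French $top"
rm -f "$sample"
```

Unlike the a..z loop, this also counts word-final characters outside the unaccented Latin alphabet.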