
I'm asking this as a new question because people didn't seem to understand my original question.

I can figure out how to find if a word starts with a capital and is followed by 9 letters with the code:

echo "word" | grep -Eo '^[A-Z][[:alpha:]]{8}'

So that's part 1 of what I'm supposed to do. My actual script is supposed to loop through each word in a text file that is given as the first and only argument, then check if any of those words start with a capital and are 9 letters long.

I've tried:

cat textfile | grep -Eo '^[A-Z][[:alpha:]]{8}'

and

while read -r p
do echo "$p" | grep -Eo '^[A-Z][[:alpha:]]{8}'
done < "$1"

to no avail.

Although:

cat randomtext.txt 

outputs:

The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha

so it's correctly outputting all the words in the file randomtext.txt

then why wouldn't

cat randomtext.txt | grep -Eo '^[A-Z][[:alpha:]]{8}'

work?


4 Answers

2

The problem is in the anchor. Your pattern starts with ^ which matches the beginning of a line, but the word you want to get returned is in the middle of a line. You can replace it with \b to match at a word boundary.
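
As a minimal sketch of the difference, the snippet below runs both patterns on the sample line from the question. It assumes GNU grep, where \b is available as an extension; the names sample, anchored and bounded are made up for illustration.

```shell
# Sample line taken from the question.
sample='The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha'

# Anchored with ^: only the text at the very start of the line ("The ...")
# is tried, so nothing matches.
anchored() { printf '%s\n' "$sample" | grep -Eo '^[A-Z][[:alpha:]]{8}'; }

# Anchored with \b on both sides: every word in the line is a candidate,
# and the closing \b rejects words longer than nine letters.
# Assumption: GNU grep, since \b is not part of POSIX ERE.
bounded() { printf '%s\n' "$sample" | grep -Eo '\b[A-Z][[:alpha:]]{8}\b'; }

anchored || echo "no match with ^"
bounded
```

The first call finds nothing, while the second prints Abcdefgha, the only nine-letter capitalised word in the sample.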


2 Comments

Yes, that's why I use printf to put each word on its own line.
This fixed it, thank you. Can you explain what \b does? I've never encountered it.
1

The words are all on one line, one after the other, but your grep expression is anchored to the start of the whole line.

You ought to split the file into words:

sed -e 's/\s*\b\s*/\n/g' < file.txt | grep ...

Or maybe better, since you're only interested in alphanumeric sequences,

sed -e 's/\W\W*/\n/g' < file.txt | grep -E '^[A-Z][[:alpha:]]{8}$'

The $ (end of line) is necessary because otherwise 'Supercalifragilisticexpialidocious' would match.

(I had changed {8} to {9} because you specified "and is followed by 9 letters", but then I saw you also state "and are 9 letters long".)

By the way, if you use {8} and -o, you might be led into thinking a match is there where it isn't. "-o" means "only print the part matching my pattern".

So if you fed "Supercalifragilistic" to "^[A-Z][[:alpha:]]{8}", it would accept it as a match and print "Supercali". I don't think this is what you asked for.
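
The -o pitfall can be reproduced with a short sketch like the one below. The function names truncated and exact are hypothetical, and the input words are only test data.

```shell
# Without a trailing $, -o prints just the first nine letters of a longer
# word, which looks like a valid nine-letter match.
truncated() { printf '%s\n' 'Supercalifragilistic' | grep -Eo '^[A-Z][[:alpha:]]{8}'; }

# With the trailing $, the longer word is rejected and only a word of
# exactly nine letters gets through.
exact() { printf '%s\n' 'Supercalifragilistic' 'Abcdefgha' | grep -E '^[A-Z][[:alpha:]]{8}$'; }

truncated   # prints Supercali
exact       # prints Abcdefgha
```

Once the pattern is anchored at both ends, -o is no longer needed, because a matching line is always the whole word.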


0

If you cat the file, the whole line is fed to grep at once. You should split it into words before feeding it to grep.

You could try:

cat randomtext | awk '{ for(i=1; i <= NF; i++) {print $i } }' | grep -Eo '^[A-Z][a-z]{8}'
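
The same idea can probably be done in awk alone, without cat or grep. The sketch below is one way, not the only one: nine_letter_words is a hypothetical helper name, and it uses a length test rather than {8} because not every awk implementation supports interval expressions. Fields are split on whitespace, so a word with trailing punctuation such as "Moon." is compared with the punctuation attached.

```shell
# Hypothetical helper: print each whitespace-separated field that is a
# capital letter followed by lowercase letters, nine characters in total.
# "$@" lets it read from named files, or from stdin when none are given.
nine_letter_words() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 9 && $i ~ /^[A-Z][a-z]+$/)
                print $i
    }' "$@"
}

printf '%s\n' 'November October tesTer Abcdefgh Abcdefgha' | nine_letter_words
```

Because the length is checked explicitly, longer words are not truncated into false matches.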

2 Comments

cat | awk is useless; awk can read the file itself. partmaps.org/era/unix/award.html#cat
@sputnick I think it is quite useful that awk can react to stdin. Presumably you'd want the command to work with other sources than the example randomtext. But thanks for the award anyway!
0

You should do this:

$ cat file.txt
The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha
$ printf '%s\n' $(<file.txt) | grep -Eo '^[A-Z][[:alpha:]]{8}$' 
Abcdefgha

If you want to work on the source line as it is, you need to remove the ^ character (which means the beginning of the line):

grep -Eo '\b[A-Z][[:alpha:]]{8}\b' file.txt

(added \b like choroba explains)
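
Since the question asks for a script that takes the text file as its first and only argument, here is a rough sketch of how the \b version might be wrapped. find_words is a hypothetical function standing in for the script body, the temporary file only exists to make the example runnable, and GNU grep is assumed for \b.

```shell
# Hypothetical stand-in for the script body; "$1" is the text file.
find_words() {
    # Insist on exactly one argument, as the question describes.
    [ $# -eq 1 ] || { echo "usage: find_words FILE" >&2; return 2; }
    # Assumption: GNU grep, for the \b word-boundary extension.
    grep -Eo '\b[A-Z][[:alpha:]]{8}\b' -- "$1"
}

# Example run against a temporary copy of the sample text.
tmp=$(mktemp)
printf '%s\n' 'The loud Brown Cow jumped over the White Moon. November October tesTer Abcdefgh Abcdefgha' > "$tmp"
find_words "$tmp"
rm -f "$tmp"
```

Passing the file name straight to grep avoids expanding the whole file on a command line.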

5 Comments

It probably doesn't matter much, but this way you're limited to $( getconf ARG_MAX ) characters in the file. Also, eleven-character words are accepted because of the missing $.
...and added $ for the first solution.
@Iserni, you have misunderstood my command's behaviour; there's no ARG_MAX limitation like you said here, see pastie.org/5100411
If you want to trigger the error you expected, try doing grep -Eo $(<file.txt) file.txt > /dev/null
You're right; it's shell-dependent. If you use the internal printf (e.g. bash), it works; if it falls through to /usr/bin/printf, it fails. I still think it's a bit risky, and I'd rather split with sed, but hey, TMTOWTDI :-)
