I need to find the set of words that are 10 characters long and that contain a substring of three consecutive vowels. So far I have tried these commands:

grep -E '^.{10}$'| grep 'a*.e*.i*.o*.u*' words2.txt
grep -E '^.{10}$&a*.e*.i*.o*.u*' words2.txt

Input data, extracted via OCR of this screenshot:

unpernicious
unperspicuous
unpervious
unpious
unpiteous
unpiteously
unpiteousness
unplebeian
unplenteous
unportmanteaued
unportuous
unprecarious
unprecious
unprecocious
unpredacious
unpresumptuous
unpresumptuously
unpretentious
unpretentiously
unpretentiousness
unpromiscuous
unpropitious
unpropitiously
unpropitiousness
unpugnacious
unpunctilious
unquailed
unquailing
unquailingly
unqueen
unqueened
unqueening
unqueenlike
unqueenly
unquiescence
unquiescent
unquiescently
unquiet
unquietable
unquieted
unquieting
unquietly
unquietness
unquietude
unrapacious
unrebellious
unreligious
unreligiously
unreligiousness
unrighteous
unrighteously
unrighteousness
unsacrilegious
Unsagacious
unsalubrious
unsanctimonious
unsanctimoniously
unsanctimoniousness
unsanguineous
unsanguineously
unseditious
unseeable
unseeing
  • Should it report words like plateauing (4 consecutive vowels)? Commented Apr 26, 2017 at 12:01
  • Please be more specific in your question's title. That new "need a help and clarify with grep & regex" title is not useful and won't help you get answers or help people with a similar need find this Q&A. The original title ("Find the set of words that are exactly 10 characters long and that contain a substring of 3 consecutive vowels") was a lot better. Commented Apr 27, 2017 at 11:19

4 Answers

Your problem is (IMHO) better solved with awk, but I'll first point out a problem with your command:

grep -E '^.{10}$'| grep 'a*.e*.i*.o*.u*' words2.txt 

To filter the contents of the file words2.txt through both grep invocations, this ought to look like

grep -E '^.{10}$' words2.txt | grep 'a*.e*.i*.o*.u*'

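The pattern in the second grep is also wrong: every starred letter in a*.e*.i*.o*.u* may match nothing, so it matches any line of at least four characters. It should be [auoie]{3}, used with -E so that {3} is read as a repetition count, which lands us at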

grep -E '^.{10}$' words2.txt | grep -E '[aouie]{3}'

The input to the first grep is your file. The input to the second grep is the output of the first grep, not your file.
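
To see what goes wrong when the file name is given to the second grep instead, here is a small sketch. The sample file and its contents are made up for illustration (it is not the OP's words2.txt). When grep is given a file operand it reads that file and ignores its standard input, so the length filter is silently bypassed.

```shell
# Hypothetical sample file, one word per line (not the OP's words2.txt).
sample=$(mktemp)
printf '%s\n' unpious unquiet unpervious unplebeian unpunished > "$sample"

# Broken: the first grep has no file operand, so it reads standard input
# (redirected from /dev/null here so the sketch does not wait on a terminal);
# the second grep has a file operand, so it ignores the pipe and scans the whole file.
broken=$(grep -E '^.{10}$' < /dev/null | grep -E '[aeiou]{3}' "$sample")

# Fixed: the file goes to the first grep, the second grep filters its output.
fixed=$(grep -E '^.{10}$' "$sample" | grep -E '[aeiou]{3}')

printf 'broken:\n%s\n\nfixed:\n%s\n' "$broken" "$fixed"
rm -f "$sample"
```

The broken variant also prints the 7-character words unpious and unquiet, which is the kind of too-short result you get when the length check never sees the file.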

Using a POSIX awk (like recent versions of GNU awk):

$ awk 'length == 10 && /[aouei]{3}/' words2.txt
unpervious
unplebeian
unportuous
unprecious
unquailing
unqueening
unquieting
unquietude

mawk, BSD awk and historical pre-POSIX implementations of awk don't support {n} in regular expressions as pointed out by Stéphane Chazelas.
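
For an awk without {n} support, one possible workaround (a sketch, not part of the original answer) is to write the repetition out by hand. The sample words are taken from the question's list; the temporary file is only there to make the example self-contained.

```shell
# Sample words from the question's list, one per line, in a temporary file.
sample=$(mktemp)
printf '%s\n' unpious unpervious unquailed unquailing unqueenly unquietude > "$sample"

# [aeiou][aeiou][aeiou] is the interval [aeiou]{3} spelled out,
# so it does not depend on {n} support in the awk implementation.
result=$(awk 'length == 10 && /[aeiou][aeiou][aeiou]/' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

This should behave the same in gawk, mawk and BSD awk, at the cost of a longer pattern.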

You had the 10 characters right, but to find 3 vowels in a row, match a bracket expression such as [AEIOU] repeated three times:

egrep '^.{10}$' | egrep -i '[AEIOU]{3}'

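To reject whitespace, use this ([[:space:]] rather than \t, because a backslash is not an escape inside a bracket expression, so [^ \t] would also reject words containing the letter t):

egrep '^[^[:space:]]{10}$' | egrep -i '[AEIOU]{3}'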
  • It worked, but some of the results are words with fewer than 10 characters :( Commented Apr 26, 2017 at 5:51
  • Is there extra whitespace? Commented Apr 26, 2017 at 5:53
  • Nope, there isn't any whitespace, but some of the words have fewer than 10 characters. I need words of 10 or more characters that contain a substring of 3 consecutive vowels. Commented Apr 26, 2017 at 6:00
  • To help further, you are likely going to need to show some sample data in your post. Commented Apr 26, 2017 at 6:01
  • @StephenRauch The OP is putting the input file name at the very end of the command line. Commented Apr 26, 2017 at 11:59

Assuming 1 word/line, you can do this:

sed -nE '/^.{10}$/!d;/[aAeEiIoOuU]{3}/p' words.txt

With grep built with PCRE support:

grep -iPx '(?=.*[aeiou]{3}.*).{10}'

Or:

grep -wiP '(?=\w*[aeiou]{3}\w*)\w{10}'

to search for those words when they're not one per line (add -o, if your grep implementation supports it, to print only the matching words instead of the whole lines they're found in). Here, a word means any sequence of word characters: letters, digits and underscore. Letters are those of the Latin script without diacritics only; add (*UCP) to cover letters in any script, though that still won't cover vowels like é or α.
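
As a rough illustration of the -o variant, here is a sketch on a made-up line of text. It assumes a grep that accepts both -P and -o (GNU grep built with PCRE does), and it checks for -P first because not every build has it.

```shell
# A line holding several words (made-up sample text).
line='the unquieting queen was unpervious, not unquiet'

# Only run the search if this grep accepts -P (PCRE); not every build does.
if printf 'a\n' | grep -qP 'a' 2>/dev/null; then
    # -o prints each matching word on its own line;
    # -w makes sure the 10 characters are a whole word, not part of a longer one.
    matches=$(printf '%s\n' "$line" | grep -owiP '(?=\w*[aeiou]{3}\w*)\w{10}')
    printf '%s\n' "$matches"
else
    echo 'this grep has no PCRE support' >&2
fi
```

With PCRE available, this should print unquieting and unpervious, one per line; unquiet has three consecutive vowels but only 7 characters.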
