I would like to extract the 3 words before "selay dervice", but the query returns an empty column :(
with a as (
select * from tablename1 b
where lower(ptranscript) rlike 'selay dervice'
)
select *,regexp_extract(lower(a.ptranscript),'([a-zA-Z0-9]+\s+){3}selay dervice',0) from a
Update 1:
As Raid pointed out earlier, in Hive we cannot write \s and have to write \\s, because the backslash is itself an escape character in Hive string literals. I updated the regex above accordingly, and it now works:
with a as (
select * from tablename1 b
where lower(ptranscript) rlike 'selay dervice'
)
select *,regexp_extract(lower(a.ptranscript),'([a-zA-Z0-9]+\\s+){3}selay dervice',0) from a
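The effect of the doubled backslash can be sketched outside Hive. This is a minimal Python sketch under two assumptions: Hive's string-literal parser drops the backslash from an unrecognised escape such as \s (so the regex engine receives a plain s), and regexp_extract returns the requested group of the first match, or an empty string when nothing matches. The helpers hive_unescape and regexp_extract below are hypothetical stand-ins, not Hive code, and Python's re module is used in place of Java's java.util.regex, which behaves the same for this pattern.

```python
import re


def hive_unescape(literal: str) -> str:
    # Hypothetical, simplified model of Hive reading a string literal:
    # a backslash followed by any character yields just that character,
    # so '\\' becomes '\' and '\s' becomes 's'. Real Hive also maps escapes
    # such as '\n' and '\t' to control characters; that is ignored here.
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    # Hypothetical stand-in for Hive's regexp_extract: group `index` of the
    # first match, or '' when the pattern does not match at all.
    m = re.search(pattern, subject)
    if m is None or m.group(index) is None:
        return ""
    return m.group(index)


text = "yes this is a selay dervice call"

# Literal as typed in the original query (single backslash).
single = hive_unescape(r"([a-zA-Z0-9]+\s+){3}selay dervice")
# Literal as typed in the updated query (double backslash).
double = hive_unescape(r"([a-zA-Z0-9]+\\s+){3}selay dervice")

print(single)  # ([a-zA-Z0-9]+s+){3}selay dervice
print(double)  # ([a-zA-Z0-9]+\s+){3}selay dervice
print(repr(regexp_extract(text, single, 0)))  # ''
print(repr(regexp_extract(text, double, 0)))  # 'this is a selay dervice'
```

With the single backslash, the engine looks for runs of characters ending in one or more letters s, glued directly to "selay dervice". Ordinary text with spaces between words never contains that, so every row comes back as an empty string, which matches the empty column in the question.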
You apply lower() but then look for A-Z as well; after lowercasing, [a-z0-9] is enough.
Index 0 means returning the whole matched string, not the 3 words you want. To get the 3 words at once you may need to add an extra pair of parentheses, (([a-zA-Z0-9]+\s+){3}), because otherwise the only group is the repeated one, and it captures just the last of the three words.
Testing the regex on its own works fine. It matches "This is a selay dervice", and with the extra parentheses group 1 gives "This is a".
rlike is for regular expressions; since you are only looking for a fixed phrase, it might be faster to use like '%selay dervice%' instead.
Following from my comment above, I think you need to use this (with \\s, as Hive requires): regexp_extract(lower(a.ptranscript), '(([a-z0-9]+\\s+){3})selay dervice', 1).
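The difference between index 0, the repeated inner group, and the extra outer group can be sketched as follows. As before, this is Python's re module standing in for Hive's Java-based regexp_extract, through a hypothetical helper that returns '' on no match. The patterns are written as the regex engine sees them, that is, after Hive has turned the \\s in the literal into \s.

```python
import re


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    # Hypothetical stand-in for Hive's regexp_extract: returns group `index`
    # of the first match, or '' when the pattern does not match.
    m = re.search(pattern, subject)
    if m is None or m.group(index) is None:
        return ""
    return m.group(index)


text = "yes this is a selay dervice call"

# Patterns as the regex engine receives them.
inner_only = r"([a-z0-9]+\s+){3}selay dervice"
with_outer = r"(([a-z0-9]+\s+){3})selay dervice"

print(repr(regexp_extract(text, inner_only, 0)))  # 'this is a selay dervice'
print(repr(regexp_extract(text, inner_only, 1)))  # 'a ' (last repetition only)
print(repr(regexp_extract(text, with_outer, 1)))  # 'this is a '
print(regexp_extract(text, with_outer, 1).strip())  # this is a
```

Group 1 still ends with the whitespace that preceded "selay dervice", so in Hive the extracted words would presumably need wrapping in trim(...) if the trailing space matters.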