Postgresql: Dynamic Regex Pattern

Question

I have event data that looks like this:

 id | instance_id | value
 1  | 1           | a
 2  | 1           | ap
 3  | 1           | app
 4  | 1           | appl
 5  | 2           | b
 6  | 2           | bo
 7  | 1           | apple
 8  | 2           | boa
 9  | 2           | boat
10  | 2           | boa
11  | 1           | appl
12  | 1           | apply

Basically, each row is a user typing a new letter. They can also delete letters.

I'd like to create a dataset from it that looks like this (in the queries below, the source event table is called data):

 id | instance_id | value
 7  | 1           | apple
 9  | 2           | boat
12  | 1           | apply

My goal is to extract all the complete words in each instance, accounting for deletions as well, so it is not sufficient to just take the longest word or the most recently typed one.

To do this, I was planning to use a regex match like this:

select * from data
where not exists (select * from data d2 where d2.value ~ (d.value || '.'))

Effectively, I'm trying to build a dynamic regex, specific to the row it is matched against, that matches the row's value followed by one more character.

The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?

How do you know what a word is? I mean, why aren't "boa", "a" and "app" in your list? – Gordon Linoff, commented Jul 12, 2018 at 19:52
Sounds more like you want to compare all values against a dictionary. I cannot imagine how regular expressions could help here (at least not on their own). – sticky bit, commented Jul 12, 2018 at 20:08
Hey guys, I'm actually not interested in dictionary words at all (our actual data isn't dictionary words anyway). The values form triangular patterns, and I'm interested in the "peaks", e.g. a > aa > aaa > aaaa > aaa > aa > aab has two peaks: aaaa and aab. – A User, commented Jul 12, 2018 at 21:30

krokodilko · Accepted Answer · 2018-07-12 20:13:50Z

1

Try the simple LIKE operator instead of a regex pattern:

SELECT * FROM data d1
WHERE NOT EXISTS (
  SELECT * FROM data d2
  WHERE d2.value LIKE d1.value ||'_%'
)

Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39

Create an index on the value column; this may speed up the query a bit.
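On the follow-up about building the pattern dynamically: PostgreSQL has no separate compile step to call, because the right-hand side of ~ can be any text expression evaluated for each row. The query in the question is rejected because the outer table is never given the alias d; the pattern should also start with ^ so it only matches values that begin with the current one, e.g. d2.value ~ ('^' || d1.value || '.'). Keep the parentheses, since ~ and || have the same precedence in PostgreSQL. The sketch below models that per-row rule in Python, compiling one pattern per row with re.compile. The name complete_words and the in-memory ROWS list (the question's sample data) are illustrative, and re.escape is an added assumption so that values containing regex metacharacters are matched literally. Like the LIKE query, it compares each value against all rows, regardless of instance_id.

```python
import re

# Sample events from the question: (id, instance_id, value)
ROWS = [
    (1, 1, "a"), (2, 1, "ap"), (3, 1, "app"), (4, 1, "appl"),
    (5, 2, "b"), (6, 2, "bo"), (7, 1, "apple"), (8, 2, "boa"),
    (9, 2, "boat"), (10, 2, "boa"), (11, 1, "appl"), (12, 1, "apply"),
]


def complete_words(rows):
    """Keep rows whose value is not a strict prefix of any other value.

    Models NOT EXISTS (... WHERE d2.value ~ ('^' || d1.value || '.')).
    """
    result = []
    for row_id, instance_id, value in rows:
        # One pattern per row: the value itself, followed by at least one more character.
        # re.escape makes characters such as '.' or '+' inside the value match literally.
        pattern = re.compile("^" + re.escape(value) + ".")
        if not any(pattern.match(other) for _, _, other in rows):
            result.append((row_id, instance_id, value))
    return result


print(complete_words(ROWS))  # [(7, 1, 'apple'), (9, 2, 'boat'), (12, 1, 'apply')]
```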

This worked for my base case, thank you! Is there a way to build a dynamic pattern though? – A User

Abelisto · Accepted Answer · 2018-07-12 22:22:17Z

1

To find peaks in sequential data, window functions are a good choice. You just need to compare each value with the previous and next ones using the lag() and lead() functions:

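with cte as (
  select
    *,
    -- longer than the next value in the same instance (0 when there is no next row)
    length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)), 0) and
    -- longer than the previous value (the first row of an instance is compared with itself, so it is never a peak)
    length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)), length(value)) as is_peak
  from data)
select * from cte where is_peak order by id;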

Demo
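The window-function logic can be checked outside the database with a small Python model. This is only a sketch: the name peaks and the ROWS list (the question's sample data) are illustrative, and sorting by (instance_id, id) followed by itertools.groupby stands in for PARTITION BY instance_id ORDER BY id. As in the SQL, a missing next row counts as length 0 and a missing previous row counts as the row's own length.

```python
from itertools import groupby

# Sample events from the question: (id, instance_id, value)
ROWS = [
    (1, 1, "a"), (2, 1, "ap"), (3, 1, "app"), (4, 1, "appl"),
    (5, 2, "b"), (6, 2, "bo"), (7, 1, "apple"), (8, 2, "boa"),
    (9, 2, "boat"), (10, 2, "boa"), (11, 1, "appl"), (12, 1, "apply"),
]


def peaks(rows):
    """Return rows whose value is longer than both neighbours in its instance."""
    result = []
    # Sorting by (instance_id, id) plays the role of PARTITION BY instance_id ORDER BY id
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        events = list(group)
        for i, (row_id, instance_id, value) in enumerate(events):
            length = len(value)
            # lag(): the first row of an instance is compared with itself, so it is never a peak
            prev_len = len(events[i - 1][2]) if i > 0 else length
            # lead(): after the last row of an instance, the next length counts as 0
            next_len = len(events[i + 1][2]) if i + 1 < len(events) else 0
            if length > prev_len and length > next_len:
                result.append((row_id, instance_id, value))
    # ORDER BY id
    return sorted(result)


print(peaks(ROWS))  # [(7, 1, 'apple'), (9, 2, 'boat'), (12, 1, 'apply')]
```

The same function also reproduces the example from the comments: for the sequence a, aa, aaa, aaaa, aaa, aa, aab in one instance, it returns the two peaks aaaa and aab.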
