Removing consecutive duplicates in a PostgreSQL database where data is in a JSON column

Question

I have a PostgreSQL table called state_data with two columns: datetime and state. The state column is of type jsonb and holds various state data for a given datetime. Here is an example of the table:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

Over time this table will get very big, particularly if I increase the sampling frequency, and I really only want to store rows whose temperature differs from that of the previous row. So the table above would reduce to:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

I know how to do this if the temperature data were in its own column, but is there a straightforward way to delete all consecutive duplicates based on an item within a JSON column?

What if I wanted to delete consecutive duplicates based on both JSON items (temp and location)? For example, the table would then reduce to:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

I figured out I can use an AND statement in your last line to solve the condition based on the intersection of two subconditions. (racket99, commented Oct 31, 2018 at 20:01)

klin · Accepted Answer · 2018-10-31 19:49:48Z

Use the window function lag():

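-- Keep only rows whose temp differs from the previous row (ordered by datetime).
select datetime, state
from (
    -- prev holds the state of the preceding row; it is null for the first row
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp';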
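
To try the query without touching real data, the sample rows from the question can be loaded into a temporary table. This is only a sketch: the table name state_data_demo is made up for the illustration, and the column types are assumed from the question (timestamp and jsonb).

```sql
-- Hypothetical scratch table holding the question's sample rows;
-- a temporary table disappears when the session ends.
create temporary table state_data_demo (
    datetime timestamp,
    state    jsonb
);

insert into state_data_demo (datetime, state) values
    ('2018-10-31 08:00:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:01:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:02:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:03:00', '{"temp":75.0,"location":2}'),
    ('2018-10-31 08:04:00', '{"temp":74.8,"location":1}'),
    ('2018-10-31 08:05:00', '{"temp":74.8,"location":2}'),
    ('2018-10-31 08:06:00', '{"temp":74.7,"location":1}');

-- Same lag() query as in the answer, pointed at the scratch table.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data_demo
    ) s
where state->>'temp' is distinct from prev->>'temp'
order by datetime;
```

The first row is kept because its prev is null, and "is distinct from" treats null as an ordinary comparable value instead of yielding null the way "<>" would.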

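If the table has a primary key, you should use it in the delete command. Without a primary key, you can match on the pair (datetime, state) instead. If state is of type json, it has to be cast to jsonb for this comparison, because json has no equality operator; for a jsonb column the cast is harmless: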

delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    where state->>'temp' is distinct from prev->>'temp'
)
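
For the primary-key case mentioned above, a minimal sketch could look like the following. It assumes a primary key column named id, which is not part of the table in the question, so treat the name as a placeholder. It also selects the rows to delete directly instead of excluding the rows to keep.

```sql
-- Assumption: state_data has a primary key column "id" (hypothetical name).
delete from state_data
where id in (
    select id
    from (
        select id, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    -- "is not distinct from" is the null-safe form of "=":
    -- true when this row repeats the previous row's temp
    where state->>'temp' is not distinct from prev->>'temp'
);
```

Running the statement inside a transaction (begin; ... rollback;) first makes it possible to check the reported row count before committing.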

edited Oct 31, 2018 at 19:49

answered Oct 31, 2018 at 19:36


5 Comments

racket99 Over a year ago

Sorry, this is probably a dumb question, but what does the s do in the penultimate line?

klin Over a year ago

s is an alias; a subquery in FROM must have an alias. I've updated the answer with the delete command.

racket99 Over a year ago

What if I wanted to modify it so that it deletes duplicate records of temp AND location (see modified question)?

klin Over a year ago

You can use AND, or compare state::jsonb to prev::jsonb. The issue is that you cannot compare json columns directly (you have to cast them to jsonb).

racket99 Over a year ago

Yes, state is in jsonb. I edited original post to reflect same. Thanks.
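
One way to read the two suggestions in the comments, sketched against the question's table: a row is a duplicate when temp AND location both equal the previous row's values, so the condition for rows to keep is the negation, with the two tests joined by OR. This is an interpretation of the comments, not a query taken from the answer.

```sql
-- Keep a row when temp OR location changed relative to the previous row.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
   or state->>'location' is distinct from prev->>'location';

-- Alternative for a jsonb column: compare the whole document.
-- Note this also reacts to changes in any other key of state.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state is distinct from prev;
```

On the sample data in the question, the first query keeps the rows at 08:00, 08:03, 08:04, 08:05 and 08:06, which matches the second expected table.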
