I have a PostgreSQL table called state_data with two columns: datetime and state. The state column is of type jsonb and holds various state data for a given datetime. Here is an example of the table:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

Over time this table will get very big, particularly if I increase the sampling frequency, and I really only want to store rows whose temperature differs from the previous row's. So the table above would reduce to:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

I know how to do this if the temperature data were in its own column, but is there a straightforward way to handle this operation and delete all consecutive duplicates based on an item within a json column?

What if I wanted to delete a row only when both JSON items (temp and location) duplicate the previous row? For example, the table above would then reduce to:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
  • I figured out I can use AND statement in your last line to solve the condition based on intersection of two subconditions. Commented Oct 31, 2018 at 20:01

1 Answer

Use the window function lag():

select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
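
To see why this works, it can help to look at what lag() returns before filtering. The sketch below reuses the same subquery and only adds display columns (the aliases temp, prev_temp and is_kept are made up for illustration). The first row has no previous row, so prev is null there; is distinct from treats null as an ordinary value and returns true, whereas <> would return null and the first row would be filtered out.

```sql
-- Show each row next to the previous row's temp, plus the keep/drop decision.
select datetime,
       state->>'temp' as temp,
       prev->>'temp'  as prev_temp,   -- null for the first row
       (state->>'temp' is distinct from prev->>'temp') as is_kept
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
order by datetime;
```

With the sample data from the question, is_kept should be true only for the 08:00, 08:04 and 08:06 rows.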

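If the table has a primary key, you should use it in the delete command. Without a primary key, you can match rows on (datetime, state) instead, casting state to jsonb in case it is stored as json (json values cannot be compared directly):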

delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    where state->>'temp' is distinct from prev->>'temp'
)
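
For the primary-key case mentioned above, a minimal sketch might look like the following. It assumes a hypothetical primary key column named id, which is not part of the table shown in the question. Instead of listing the rows to keep and deleting everything else, it selects the rows to delete directly by flipping the condition to is not distinct from; the two forms are complementary because is distinct from never returns null.

```sql
-- Assumption: state_data has a primary key column "id" (hypothetical).
delete from state_data
where id in (
    select id
    from (
        select id, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    -- rows whose temp equals the previous row's temp are the duplicates
    where state->>'temp' is not distinct from prev->>'temp'
);
```

Running the inner select on its own first, or wrapping the delete in a transaction, makes it easy to check which rows would be removed before committing.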

5 Comments

Sorry, this is probably a dumb question, but what does the s do in the penultimate line?
s is an alias; a subquery in FROM must have an alias. I've updated the answer with the delete command.
What if I wanted to modify such that it deletes duplicate records of temp AND location (see modified question)?
You can use and, or compare state::jsonb to prev::jsonb. The issue is that you cannot compare json columns (you have to cast them to jsonb).
Yes, state is jsonb. I edited the original post to reflect that. Thanks.
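
The two-key variant discussed in these comments could be sketched as follows, reusing the select from the answer. Because this where clause picks the rows to keep, a row stays if either key changed, so the two conditions are joined with or here; and would apply to the complementary form that picks the rows to delete (both keys unchanged).

```sql
-- Keep a row when temp OR location differs from the previous row.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
   or state->>'location' is distinct from prev->>'location';

-- Alternative, assuming state holds only these two keys:
-- compare the whole jsonb value instead of each key.
-- where state is distinct from prev
```

With the sample data, this should return the 08:00, 08:03, 08:04, 08:05 and 08:06 rows, matching the second reduced table in the question. The same where clause can be dropped into the subquery of the delete command.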
