I have a table in PostgreSQL with two date fields (start and end). Both fields contain many invalid dates, such as 0988-08-11, 4987-09-11, etc. Is there a simple query to identify them? The data type of both fields is DATE. Thanks in advance.
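Values in a DATE column are valid by definition: PostgreSQL rejects anything that is not a real calendar date when it is stored. The year 0988 (= 988) is a valid historic date, just as the year 4987 is a valid date far in the future; the DATE type accepts years from 4713 BC to 5874897 AD.
To find dates that are too far in the past or too far in the future, compare them against your own minimum and maximum: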
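SELECT
    date_col
FROM
    mytable                             -- your table name ("table" itself is a reserved word)
WHERE
    date_col < '1900-01-01'::DATE       -- <MINIMUM DATE>, example value
    OR date_col > '2099-12-31'::DATE    -- <MAXIMUM DATE>, example value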
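Since your table has two date columns, the same idea can check both at once. A minimal sketch, assuming an acceptable window of 1900-01-01 to 2099-12-31 (substitute your own limits); the inline VALUES list is made-up sample data standing in for your table, and t, start_date and end_date are hypothetical names, so the query runs as-is:

```sql
-- Hypothetical sample rows standing in for the real table.
WITH t(start_date, end_date) AS (
    VALUES
        (DATE '2015-01-01', DATE '2017-01-01'),  -- inside the window
        (DATE '0988-08-11', DATE '2000-01-01'),  -- start too early
        (DATE '2000-01-01', DATE '4987-09-11')   -- end too late
)
-- Report every row where either column falls outside the assumed window.
SELECT start_date, end_date
FROM t
WHERE start_date NOT BETWEEN DATE '1900-01-01' AND DATE '2099-12-31'
   OR end_date   NOT BETWEEN DATE '1900-01-01' AND DATE '2099-12-31';
```

A NULL date is never reported on its own, because NOT BETWEEN yields NULL for it; add IS NULL checks if missing dates should count as invalid too.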
Because your minimum and maximum dates form a range, you could also use PostgreSQL's range types (daterange):
- https://www.postgresql.org/docs/current/static/rangetypes.html
- https://www.postgresql.org/docs/current/static/functions-range.html
Example table (dates):
start_date  end_date
2015-01-01  2017-01-01  -- valid
0200-01-01  0900-01-01  -- completely too early
3000-01-01  4000-01-01  -- completely too late
0200-01-01  2000-01-01  -- start too early
2000-01-01  4000-01-01  -- end too late
0200-01-01  4000-01-01  -- start too early, end too late
Query:
SELECT
start_date,
end_date
FROM
dates
WHERE
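daterange('1900-01-01', '2100-01-01') @> daterange(start_date, end_date, '[]')
-- '[]' makes end_date inclusive; with the default '[)' a row whose start_date
-- equals end_date becomes an empty range, which is contained in any range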
Result:
start_date end_date
2015-01-01 2017-01-01
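The query above returns the rows that lie inside the range, while the question asks for the ones that do not. A sketch of the inverse, reusing the example rows as inline sample data plus two made-up rows: one with equal start and end dates, and one with reversed bounds. daterange(start_date, end_date) raises "range lower bound must be less than or equal to range upper bound" when end_date is earlier than start_date, so the CASE reports such rows directly instead of building a range from them:

```sql
-- Example rows from above, plus two hypothetical edge cases.
WITH dates(start_date, end_date) AS (
    VALUES
        (DATE '2015-01-01', DATE '2017-01-01'),  -- valid
        (DATE '0200-01-01', DATE '0900-01-01'),
        (DATE '3000-01-01', DATE '4000-01-01'),
        (DATE '0200-01-01', DATE '2000-01-01'),
        (DATE '2000-01-01', DATE '4000-01-01'),
        (DATE '4000-01-01', DATE '4000-01-01'),  -- single day, too late
        (DATE '2017-01-01', DATE '2015-01-01')   -- reversed bounds
)
SELECT start_date, end_date
FROM dates
WHERE CASE
        -- Reversed rows are reported without calling daterange() on them.
        WHEN start_date > end_date THEN true
        -- '[]' treats end_date as inclusive, so single-day rows are checked too.
        ELSE NOT (daterange('1900-01-01', '2100-01-01')
                  @> daterange(start_date, end_date, '[]'))
      END;
```

Every row except the first one is returned.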
Those are valid dates, but if you have business rules that state they are not valid for your purpose, you can delete them based on those rules:
For example, if you don't want any dates before 1900-01-01 or on or after 2999-01-01, this statement deletes the records containing such dates:
DELETE FROM mytable
WHERE
start_date < '1900-01-01'::DATE OR
start_date >= '2999-01-01'::DATE OR
end_date < '1900-01-01'::DATE OR
end_date >= '2999-01-01'::DATE;
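Because a committed DELETE cannot be undone, one option is to run it inside a transaction with RETURNING first and inspect what it removed. A sketch, with a temporary mytable holding three made-up rows standing in for the real table (the id column is a hypothetical addition):

```sql
-- Hypothetical stand-in for the real table.
CREATE TEMP TABLE mytable (id int, start_date date, end_date date);
INSERT INTO mytable VALUES
    (1, '2015-01-01', '2017-01-01'),
    (2, '0988-08-11', '2000-01-01'),
    (3, '2000-01-01', '4987-09-11');

BEGIN;
DELETE FROM mytable
WHERE
    start_date < '1900-01-01'::DATE OR
    start_date >= '2999-01-01'::DATE OR
    end_date < '1900-01-01'::DATE OR
    end_date >= '2999-01-01'::DATE
RETURNING *;   -- lists the deleted rows (here ids 2 and 3)
ROLLBACK;      -- undoes the delete; use COMMIT once the list looks right
```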
If you want to replace the dates with the lowest/highest acceptable dates instead of deleting the entire record, you could do something like this:
UPDATE mytable
SET
-- 2998-12-31 is the last date the rule above accepts (2999-01-01 itself is rejected)
start_date = least('2998-12-31'::DATE, greatest('1900-01-01'::DATE, start_date)),
end_date = least('2998-12-31'::DATE, greatest('1900-01-01'::DATE, end_date))
WHERE
start_date < '1900-01-01'::DATE OR
start_date >= '2999-01-01'::DATE OR
end_date < '1900-01-01'::DATE OR
end_date >= '2999-01-01'::DATE;
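To see what the least/greatest clamping does before touching any rows, you can apply the same expression to a few sample values. This sketch uses 2998-12-31 as the upper bound, the last day the DELETE rule above accepts (it rejects anything on or after 2999-01-01); the sample values are made up:

```sql
-- greatest() lifts values below the minimum; least() caps values above the maximum.
SELECT d AS original,
       least('2998-12-31'::DATE, greatest('1900-01-01'::DATE, d)) AS clamped
FROM (VALUES (DATE '0988-08-11'),   -- becomes 1900-01-01
             (DATE '2015-01-01'),   -- unchanged
             (DATE '4987-09-11')    -- becomes 2998-12-31
     ) AS v(d);
```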
A DATE column cannot contain invalid dates.