Context
We are storing weather forecasts as multiband rasters in PostgreSQL with PostGIS. Each forecast is identified by its date and time (00:00, 06:00, or 12:00). We have about 20 years of data. Besides research analyses, for which speed is not a priority, an online dashboard will display the latest forecast(s) on a map.
The table could have the following columns:
datetime, metadata_id, raster_data
I foresee two main use cases: (1) getting the most recent forecast, or (2) selecting a single past forecast. The queries could look something like this:
SELECT datetime, metadata_id, raster_data FROM myTable ORDER BY datetime DESC LIMIT 1;
SELECT datetime, metadata_id, raster_data FROM myTable WHERE datetime = '2015-10-12 12:00:00'::timestamp;
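A minimal sketch of the single-source table, assuming PostGIS 3 or later (where raster support is the separate postgis_raster extension) and at most one forecast per date and time. A single B-tree index on datetime, created here implicitly by the primary key, can serve the equality lookup directly and the ORDER BY datetime DESC LIMIT 1 query by reading the index backwards. The constraint name and the hour check are hypothetical additions, not part of the original design.

```sql
-- Assumes PostGIS 3+; in PostGIS 2.x the raster type ships with postgis itself.
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;

CREATE TABLE myTable (
    -- Forecast time; the primary key also creates a unique B-tree index on it.
    datetime     timestamp PRIMARY KEY,
    -- Assumed to point at a separate metadata table (not shown).
    metadata_id  integer   NOT NULL,
    -- One multiband raster per forecast.
    raster_data  raster    NOT NULL,
    -- Hypothetical guard: only whole hours 0, 6 or 12 are accepted.
    CONSTRAINT mytable_issue_hour_chk CHECK (
        datetime = date_trunc('hour', datetime)
        AND extract(hour FROM datetime) IN (0, 6, 12)
    )
);

-- Inspect the plans for both use cases (the planner may still choose a
-- sequential scan while the table is small).
EXPLAIN SELECT datetime, metadata_id, raster_data
FROM myTable ORDER BY datetime DESC LIMIT 1;

EXPLAIN SELECT datetime, metadata_id, raster_data
FROM myTable WHERE datetime = '2015-10-12 12:00:00'::timestamp;
```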
Broader Context
The situation above actually applies to 6 different tables, which differ only in their data source and in the physical variables they represent. I am therefore considering a single table with the following columns:
datetime, dataSource_physVar, metadata_id, raster_data
The sources have different raster extents, so I am considering using child tables.
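PostgreSQL offers two forms of child tables: classic table inheritance and, since version 10, declarative partitioning. The sketch below uses declarative LIST partitioning on dataSource_physVar, assuming PostgreSQL 11 or later (needed for a primary key on a partitioned table) and PostGIS 3 or later. The table name forecast, the partition names and the key values are hypothetical placeholders; the real setup would have six partitions.

```sql
-- Assumes PostgreSQL 11+ and PostGIS 3+; all names and values are hypothetical.
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;

CREATE TABLE forecast (
    datetime            timestamp NOT NULL,
    dataSource_physVar  text      NOT NULL,
    metadata_id         integer   NOT NULL,
    raster_data         raster    NOT NULL,
    -- The partition key must be part of any primary key on a partitioned table.
    PRIMARY KEY (dataSource_physVar, datetime)
) PARTITION BY LIST (dataSource_physVar);

-- One child table per source/variable combination.
CREATE TABLE forecast_source_a PARTITION OF forecast
    FOR VALUES IN ('sourceA_temperature');
CREATE TABLE forecast_source_b PARTITION OF forecast
    FOR VALUES IN ('sourceB_precipitation');

-- Latest forecast for one source: the planner can prune to one partition
-- and use that partition's (dataSource_physVar, datetime) index.
SELECT datetime, metadata_id, raster_data
FROM forecast
WHERE dataSource_physVar = 'sourceA_temperature'
ORDER BY datetime DESC
LIMIT 1;
```

Each partition is an ordinary table, so source-specific constraints can be attached to it individually.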
Question 1
Given the limited use cases, is it better to use a single datetime field, or rather a date field with an indexed smallint field for the hour?
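For comparison, a sketch of the date-plus-hour alternative. The table name forecast_split and the columns fc_date and fc_hour are hypothetical, and raster_data is omitted so the sketch runs without PostGIS.

```sql
-- Hypothetical split layout: separate date and smallint hour columns.
-- raster_data omitted so this runs on plain PostgreSQL.
CREATE TABLE forecast_split (
    fc_date      date     NOT NULL,
    fc_hour      smallint NOT NULL CHECK (fc_hour IN (0, 6, 12)),
    metadata_id  integer  NOT NULL,
    -- Composite key: one B-tree index over (fc_date, fc_hour).
    PRIMARY KEY (fc_date, fc_hour)
);

-- Use case 1, latest forecast: a two-column sort served by the composite key.
SELECT fc_date, fc_hour, metadata_id
FROM forecast_split
ORDER BY fc_date DESC, fc_hour DESC
LIMIT 1;

-- Use case 2, one past forecast: two equality predicates instead of one.
SELECT fc_date, fc_hour, metadata_id
FROM forecast_split
WHERE fc_date = DATE '2015-10-12' AND fc_hour = 12;

-- A timestamp can be rebuilt when needed (date + interval gives a timestamp).
SELECT fc_date + make_interval(hours => fc_hour) AS datetime
FROM forecast_split;
```

As sketched, the split layout needs a composite key and a two-column ORDER BY to reproduce what a single timestamp column with one index already provides. Per-row storage differs little: a timestamp takes 8 bytes, a date 4 and a smallint 2.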
Question 2
Given the quantity of data (365 days × 3 forecasts × 20 years ≈ 22k rows), is it even worth worrying about efficiency?
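The row count works out as follows, ignoring leap days (which add roughly 5 × 3 = 15 rows over 20 years), with the combined six-source table from the broader context for comparison:

```latex
\begin{aligned}
365 \times 3 \times 20 &= 21\,900 \ \text{rows per source table} \\
6 \times 21\,900 &= 131\,400 \ \text{rows in a single combined table}
\end{aligned}
```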