I'm dealing with a legacy database of about 2 million rows that tracks order information. It's essentially a transactional log of order entry and every subsequent edit, cancellation, etc., to that order. The data comes from a rock-solid 1980s-era text database program. I'm building some web views of the data.
Simplified schema:
| id | Order# | Order Line# | Widget ID# | Quantity | ~ 15 other columns |
|---|---|---|---|---|---|
| 1 | 100 | 1 | ABC | 5 | |
| 2 | 100 | 2 | DEF | 2 | |
| 3 | 101 | 1 | XYZ | 10 | |
| 4 | 102 | 1 | ABC | 5 | |
| 5 | 100 | 1 | ABC | -5 | |
| 6 | 100 | 1 | GHI | 2 | |
The relevant part starts with rows id 1 and 2:
- id 1: defines Order #100, Order Line 1, widget "ABC", in quantity of 5
- id 2: defines Order #100, Order Line 2, widget "DEF" in quantity 2.
Then, in id 5, Order Line 1 is cancelled:
- id 5: references Order #100, Order Line 1, widget "ABC", and adds a quantity of -5. This zeroes out the earlier quantity of 5, effectively cancelling Order #100, Order Line 1.
Then a NEW widget replaces the original, now zeroed-out, Order #100 / Order Line 1:
- id 6: replaces Order #100, Order Line 1 with widget "GHI" in quantity 2.
It's simple to deal with this data structure on an order-by-order basis, but I was hoping to create a VIEW that would remove some of the application logic.
Basically what I need is a query that GROUPs BY:
- Order #
- Order Line #
And returns the entire row with the highest id. So the above table as a VIEW would remove rows 1 and 5 since they cancel each other out and are superseded by row id 6. Like:
| id | Order# | Order Line# | Widget ID# | Quantity | ... other columns ... |
|---|---|---|---|---|---|
| 2 | 100 | 2 | DEF | 2 | |
| 3 | 101 | 1 | XYZ | 10 | |
| 4 | 102 | 1 | ABC | 5 | |
| 6 | 100 | 1 | GHI | 2 | |
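To make the goal concrete, here is a minimal sketch of the kind of VIEW I mean. It uses Python's built-in `sqlite3` module purely so that it is runnable end to end. The table name `order_log`, the view name `latest_order_lines`, and the column names are placeholders I made up for this example, not the real schema, and the SQL may need adjusting for a different database engine.

```python
import sqlite3

# In-memory database holding the simplified sample rows from the question.
# Table and column names here are hypothetical stand-ins for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_log (
    id         INTEGER PRIMARY KEY,
    order_no   INTEGER,
    order_line INTEGER,
    widget_id  TEXT,
    quantity   INTEGER
);

INSERT INTO order_log VALUES
    (1, 100, 1, 'ABC',  5),
    (2, 100, 2, 'DEF',  2),
    (3, 101, 1, 'XYZ', 10),
    (4, 102, 1, 'ABC',  5),
    (5, 100, 1, 'ABC', -5),
    (6, 100, 1, 'GHI',  2);

-- For each (order_no, order_line) pair, find the highest id,
-- then join back to pull the entire row for that id.
CREATE VIEW latest_order_lines AS
SELECT o.*
FROM order_log AS o
JOIN (
    SELECT order_no, order_line, MAX(id) AS max_id
    FROM order_log
    GROUP BY order_no, order_line
) AS m ON o.id = m.max_id;
""")

rows = conn.execute("SELECT * FROM latest_order_lines ORDER BY id").fetchall()
for row in rows:
    print(row)
```

On the sample data this keeps ids 2, 3, 4 and 6, which matches the desired output table above. The join-back on `MAX(id)` is one common way to get the whole row rather than just the grouped columns.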
Orders can comprise hundreds of rows and have multiple edits, but I just want the most recent row (highest id) for each Order# / Order Line# grouping, if that makes sense.
Edit: Actually this is even more complicated than I realized, because it's not always a total replacement as in my contrived example. Sometimes a new quantity is added to an existing line, in which case the quantity has to be summed across multiple rows.
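A rough sketch of the summed variant, again using Python's `sqlite3` only to make it runnable. It rests on assumptions that I have not verified against the real data: that quantity adjustments to a line always repeat the same widget ID, and that a net quantity of zero means the line was cancelled. The names `order_log` and `net_order_lines` are made up, and row id 7 is a hypothetical extra row added to show a quantity increase.

```python
import sqlite3

# Hypothetical table and column names; sample rows from the question
# plus one invented row (id 7) that adds 3 to Order #101, Order Line 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_log (
    id         INTEGER PRIMARY KEY,
    order_no   INTEGER,
    order_line INTEGER,
    widget_id  TEXT,
    quantity   INTEGER
);

INSERT INTO order_log VALUES
    (1, 100, 1, 'ABC',  5),
    (2, 100, 2, 'DEF',  2),
    (3, 101, 1, 'XYZ', 10),
    (4, 102, 1, 'ABC',  5),
    (5, 100, 1, 'ABC', -5),
    (6, 100, 1, 'GHI',  2),
    (7, 101, 1, 'XYZ',  3);  -- hypothetical quantity increase

-- Sum quantities per order, line and widget; drop groups that net to
-- zero (assumed to be cancelled). last_id is the most recent row id in
-- the group, usable for joining back to fetch the other columns.
CREATE VIEW net_order_lines AS
SELECT order_no,
       order_line,
       widget_id,
       SUM(quantity) AS net_quantity,
       MAX(id)       AS last_id
FROM order_log
GROUP BY order_no, order_line, widget_id
HAVING SUM(quantity) <> 0;
""")

net_rows = conn.execute(
    "SELECT * FROM net_order_lines ORDER BY order_no, order_line, widget_id"
).fetchall()
for row in net_rows:
    print(row)
```

Here the ABC rows for Order #100, Order Line 1 net to zero and disappear, GHI survives with quantity 2, and the hypothetical XYZ rows collapse into a single line with quantity 13. If an edit can change the widget without a cancelling row first, grouping by widget ID would no longer be correct.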