Working forwards like this is definitely the right solution if it's achievable. ...

Working forwards like this is definitely the right solution if it's achievable. I have worked with some government and police datasets, and they reflect that the records-keeping approach is very much designed with the old-world use case of manually reviewing individual records. For example, a record of a traffic collision would be perfectly fine if you wanted to go back see what happened in a specific collision. However, if you wanted to run an analysis over a set of collision records, you would run into problems like vehicle types being specified as 'free text' (anything can be entered), with no standard set of vehicle classifications (like an enumeration).