Correcting insertion of new data
-
I have a fairly simple schema:
CREATE TABLE "mysensor" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "date" INTEGER NOT NULL,
    "sensor_data" INTEGER NOT NULL
);
When inserting data into this table, the 3rd party application takes care of the date field; it's a standard UNIX timestamp in seconds.
Until now this was acceptable, as the sensors I used sent their data:
- either at a low frequency (1 sample/minute, or even 1 sample/day),
- or with low accuracy, so the measured value had to change a lot to trigger an insertion (the 3rd party app has a constraint that it doesn't insert if the last value matches the current one).
However, I got some new sensors which are more precise, so the app's filtering no longer kicks in: the measured value keeps changing slightly, so technically the last measurement is never equal to the current one, and the insertion always takes place.
Anyway, I would like to solve this on the DB side, without too much overhead.
I know how to silently abort the insertion with a TRIGGER; however, I'm looking for an alternative, best-effort method that updates the table when too-fresh data arrives.
Too fresh: let's say anything less than 1 minute after the previous sample.
So currently I'm dropping all data arriving within one minute of the last row:
CREATE TRIGGER "mytrigger" BEFORE INSERT ON "mysensor"
WHEN NEW.date < 60 + (SELECT MAX(date) FROM "mysensor")
BEGIN
    SELECT RAISE(ABORT, 'Last data is yet too fresh');
END;
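For reference, the trigger's effect can be checked quickly against an in-memory database with Python's built-in sqlite3 module (a minimal sketch; the table and trigger are the ones above, the timestamps are toy values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "mysensor" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "date" INTEGER NOT NULL,
    "sensor_data" INTEGER NOT NULL
);
CREATE TRIGGER "mytrigger" BEFORE INSERT ON "mysensor"
WHEN NEW.date < 60 + (SELECT MAX(date) FROM "mysensor")
BEGIN
    SELECT RAISE(ABORT, 'Last data is yet too fresh');
END;
""")

conn.execute("INSERT INTO mysensor(date, sensor_data) VALUES (1000, 1)")
try:
    # Only 30 s after the previous row -> rejected by the trigger.
    conn.execute("INSERT INTO mysensor(date, sensor_data) VALUES (1030, 2)")
except sqlite3.IntegrityError as e:
    print(e)  # Last data is yet too fresh
conn.execute("INSERT INTO mysensor(date, sensor_data) VALUES (1070, 3)")

count = conn.execute("SELECT COUNT(*) FROM mysensor").fetchone()[0]
print(count)  # 2
```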
I thought about some clever averaging, but for a correct average I would also need an extra count variable.
So currently I can either
- lose data correctness by weighting earlier vs. new data incorrectly, or
- lose storage by adding a new field to the schema to store the count of previously averaged samples.
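To make the trade-off concrete, the count-column variant I'm trying to avoid would look roughly like this (a sketch, not my real schema: it assumes an extra "n" column, a REAL-typed sensor_data, and relies on SQLite's RAISE(IGNORE), which abandons the pending INSERT while keeping changes the trigger already made):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "mysensor" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "date" INTEGER NOT NULL,
    "sensor_data" REAL NOT NULL,    -- REAL so the running average doesn't truncate
    "n" INTEGER NOT NULL DEFAULT 1  -- how many raw samples this row averages
);

-- If the new sample is less than 60 s after the newest row, fold it into
-- that row's running average instead of inserting it.
CREATE TRIGGER "fold_fresh" BEFORE INSERT ON "mysensor"
WHEN NEW.date < 60 + (SELECT MAX(date) FROM "mysensor")
BEGIN
    UPDATE mysensor
    SET sensor_data = (sensor_data * n + NEW.sensor_data) / (n + 1),
        n = n + 1
    WHERE id = (SELECT id FROM mysensor ORDER BY date DESC LIMIT 1);
    SELECT RAISE(IGNORE);  -- drop the INSERT itself, keep the UPDATE
END;
""")

conn.execute("INSERT INTO mysensor(date, sensor_data) VALUES (1000, 10)")
conn.execute("INSERT INTO mysensor(date, sensor_data) VALUES (1030, 20)")  # folded in
rows = conn.execute("SELECT date, sensor_data, n FROM mysensor").fetchall()
print(rows)  # [(1000, 15.0, 2)]
```

This keeps the average exact, but only at the cost of the extra count field, which is exactly the storage I'd like to avoid.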
Is there a better way to keep data-correctness while not sacrificing storage?
-
I'm not sure it's wise to drop data. Your current assumptions may change in the future, or you may need the extra information later.
If possible, I would do the following:
1 - Create a new table that collects all raw data:
CREATE TABLE "sensorRawData" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "date" INTEGER NOT NULL,
    "sensor_data" INTEGER NOT NULL,
    "sqltime" TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
2 - Insert only the data you need from sensorRawData into your mysensor table (either via a trigger or a recurring job), by calculating the difference between "date" and "sqltime".
3 - If space is a limitation, run a weekly cleanup job that deletes data older than X weeks or X months, depending on your configuration.
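As a rough sketch of steps 2 and 3 — here I use the question's 60-second threshold for step 2, comparing against the newest "date" already in mysensor (the "date" vs. "sqltime" comparison would be structured the same way); the trigger name and the toy timestamps are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "mysensor" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "date" INTEGER NOT NULL,
    "sensor_data" INTEGER NOT NULL
);
CREATE TABLE "sensorRawData" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "date" INTEGER NOT NULL,
    "sensor_data" INTEGER NOT NULL,
    "sqltime" TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);

-- Step 2 as a trigger: copy a raw sample into mysensor only when it is
-- at least 60 s newer than the newest row already stored there.
CREATE TRIGGER "thin_raw" AFTER INSERT ON "sensorRawData"
WHEN NEW.date >= 60 + COALESCE((SELECT MAX(date) FROM "mysensor"), 0)
BEGIN
    INSERT INTO mysensor(date, sensor_data) VALUES (NEW.date, NEW.sensor_data);
END;
""")

for d, v in [(1000, 1), (1020, 2), (1070, 3)]:
    conn.execute("INSERT INTO sensorRawData(date, sensor_data) VALUES (?, ?)", (d, v))

thinned = conn.execute("SELECT date, sensor_data FROM mysensor ORDER BY id").fetchall()
print(thinned)  # [(1000, 1), (1070, 3)] -- the 1020 sample stayed only in the raw table

# Step 3: the recurring cleanup. In production the cutoff could be e.g.
# int(time.time()) - 4 * 7 * 86400 (four weeks); here a toy value is used.
cutoff = 1050
conn.execute("DELETE FROM sensorRawData WHERE date < ?", (cutoff,))
raw_left = conn.execute("SELECT COUNT(*) FROM sensorRawData").fetchone()[0]
print(raw_left)  # 1
```

This way mysensor keeps its thinned-out shape, while the raw table retains everything until the cleanup window expires.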