Correcting insertion of new data



  • I have a fairly simple schema:

    CREATE TABLE "mysensor" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "date" INTEGER NOT NULL,
        "sensor_data" INTEGER NOT NULL
    );
    

    When inserting data into this table, the 3rd-party application takes care of the date field; it is a standard UNIX timestamp in seconds.

    Until now this was acceptable, as the sensors I used sent their data:

    • either at low frequency (1 sample/minute, or even 1 sample/day)
    • or they were not very accurate, so the measured value had to change a lot before an insertion was triggered (the 3rd-party app has a constraint: it does not insert if the last value matches the current one)

    However, I got some new sensors that are more precise, so the app's filtering no longer works: the measured data keeps changing slightly, so the last measured value is technically never equal to the current one, and the insertion always takes place.

    So, anyway, I would like to solve this on the DB side, without too much overhead.

    I know how to silently abort the insertion with a TRIGGER; however, I'm looking for an alternative, best-effort method that updates the table when too-fresh data arrives.

    Too fresh: let's say anything less than one minute after the previous sample.

    So currently, I'm dropping all data within one minute:

    CREATE TRIGGER "mytrigger" BEFORE INSERT ON "mysensor"
    WHEN NEW.date < 60 + (SELECT MAX(date) FROM "mysensor")
    BEGIN
      SELECT RAISE(ABORT, 'Last data is yet too fresh');
    END;
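A minimal sketch of how this trigger behaves, using Python's built-in sqlite3 module and hypothetical sample timestamps (on the very first insert MAX(date) is NULL, so the WHEN clause is not true and the row is accepted):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mysensor (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  date INTEGER NOT NULL,
  sensor_data INTEGER NOT NULL
);
CREATE TRIGGER mytrigger BEFORE INSERT ON mysensor
WHEN NEW.date < 60 + (SELECT MAX(date) FROM mysensor)
BEGIN
  SELECT RAISE(ABORT, 'Last data is yet too fresh');
END;
""")

con.execute("INSERT INTO mysensor (date, sensor_data) VALUES (1000, 42)")
try:
    # only 30 s after the previous row: the trigger aborts this insert
    con.execute("INSERT INTO mysensor (date, sensor_data) VALUES (1030, 43)")
except sqlite3.IntegrityError as e:
    print(e)  # prints the trigger's message
# 100 s after the previous row: accepted
con.execute("INSERT INTO mysensor (date, sensor_data) VALUES (1100, 44)")
```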
    

    I thought about some clever averaging, but for correct averaging I would also need an extra count value.

    So currently I can either

    • lose data correctness by incorrectly weighting earlier vs. newer data, or
    • lose storage by adding a new column to the schema that stores the count of samples averaged so far.
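The trade-off between the two bullets can be sketched with a hypothetical running-average helper in Python: the sample count `n` is exactly the extra value that would need a new column, and dropping it biases the result toward recent samples.

```python
def running_avg(avg, n, x):
    # Mean of n+1 samples, given the mean `avg` of the first n samples
    # and a new sample x.
    return avg + (x - avg) / (n + 1)

samples = [10, 20, 30, 40]

# Correct weighting: needs the count n.
avg, n = 0.0, 0
for x in samples:
    avg = running_avg(avg, n, x)
    n += 1

# Naive pairwise averaging (no count): over-weights the newest readings.
naive = samples[0]
for x in samples[1:]:
    naive = (naive + x) / 2

print(avg)    # 25.0  (true mean)
print(naive)  # 31.25 (biased)
```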

    Is there a better way to keep data-correctness while not sacrificing storage?



  • I'm not sure it's wise to drop data. Maybe your current assumptions will change in the future, or perhaps you'll need the extra information later.

    If possible, I would do the following:

    1 - Create a new table that collects all raw data

    CREATE TABLE "sensorRawData" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "date" INTEGER NOT NULL,
        "sensor_data" INTEGER NOT NULL,
        "sqltime" TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
    );
    

    2 - Only insert the data you need from sensorRawData into your mysensor table (either as a trigger or a recurring job), by calculating the difference between "date" and "sqltime".
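A minimal sketch of step 2 as a recurring job, using Python's sqlite3, hypothetical sample rows, and the question's one-minute threshold (a raw row is copied only if it is at least 60 s newer than the last accepted row):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sensorRawData (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  date INTEGER NOT NULL,
  sensor_data INTEGER NOT NULL,
  sqltime TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
CREATE TABLE mysensor (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  date INTEGER NOT NULL,
  sensor_data INTEGER NOT NULL
);
""")
con.executemany("INSERT INTO sensorRawData (date, sensor_data) VALUES (?, ?)",
                [(1000, 1), (1030, 2), (1100, 3), (1110, 4)])

# The job: walk the raw rows in time order and accept a row only if it is
# at least 60 s newer than the last row accepted into mysensor.
for date, val in con.execute(
        "SELECT date, sensor_data FROM sensorRawData ORDER BY date").fetchall():
    last = con.execute("SELECT MAX(date) FROM mysensor").fetchone()[0]
    if last is None or date >= last + 60:
        con.execute("INSERT INTO mysensor (date, sensor_data) VALUES (?, ?)",
                    (date, val))

print(con.execute("SELECT date, sensor_data FROM mysensor").fetchall())
# [(1000, 1), (1100, 3)]
```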

    3 - If space is a limitation, run a weekly cleanup job that deletes data older than X weeks or X months, depending on your configuration.
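Step 3 could be sketched like this, assuming a hypothetical four-week retention window (the DELETE compares the stored UNIX timestamp against the cutoff):

```python
import sqlite3
import time

RETENTION_SECONDS = 4 * 7 * 86400  # assumption: keep 4 weeks of raw data

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sensorRawData (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  date INTEGER NOT NULL,
  sensor_data INTEGER NOT NULL)""")

now = int(time.time())
# One row far older than the retention window, one recent row.
con.executemany("INSERT INTO sensorRawData (date, sensor_data) VALUES (?, ?)",
                [(now - 90 * 86400, 1), (now - 3600, 2)])

# The weekly cleanup: drop everything older than the cutoff.
con.execute("DELETE FROM sensorRawData WHERE date < ?",
            (now - RETENTION_SECONDS,))

print(con.execute("SELECT COUNT(*) FROM sensorRawData").fetchone()[0])  # 1
```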

