Splitting time period into multiple smaller ones with irregular length



  • We have a set of data containing event dates, other event data and the start and end date that of the 'event season' that the event took place in.

    The issue is as follows: In postgresql, I wish to split the event seasons into multiple smaller ones, ending a new season a week after an event has taken place. A new season will start the day after, again ending 7 days after the first event in this new season. This is quite easy to do if there would have been an event every day, I could just use generate_series() and split each season into 7 day chunks.

    However, the problem is that there is not an event every day, and I wish a new season to only end 6 days after an event has taken place.

    For example, if there is an event on day 1,2,16,18 and 20 of the old season (day 1-31), the new seasons would have the following start and end dates:

    • Part 1: Day 1 - Day 7 (6 days after first event in this new season)
    • Part 2: Day 8 - Day 16 (6 days after the 10th)
    • Part 3: Day 17 - Day 24 (6 days after the 18th)
    • Part 4: Day 25 - Day 31 (which would contain no events)

    I have tried to create this using a combination of LAG, least and greatest, but can't seem to find a way that doesn't result in at least part of the script being iterative.

    Can someone perhaps give some pointers as to the direction how to best solve this issue?

    Example: Input table:

    CREATE TABLE events (
     season_start  date
    ,season_end    date
    ,event_date    date
    )
    

    INSERT INTO events (season_start, season_end, event_date) VALUES
    ('2022-01-01', '2022-01-31', '2022-01-01')
    ,('2022-01-01', '2022-01-31', '2022-01-02')
    ,('2022-01-01', '2022-01-31', '2022-01-10')
    ,('2022-01-01', '2022-01-31', '2022-01-16')
    ,('2022-01-01', '2022-01-31', '2022-01-18')
    ,('2022-01-01', '2022-01-31', '2022-01-20')
    ;

    The expected output would be:

    season_start | season_end | event_date
    2022-01-01   | 2022-01-07 | 2022-01-01
    2022-01-01   | 2022-01-07 | 2022-01-02
    2022-01-08   | 2022-01-16 | 2022-01-10
    2022-01-08   | 2022-01-16 | 2022-01-16
    2022-01-17   | 2022-01-24 | 2022-01-18
    2022-01-17   | 2022-01-24 | 2022-01-20
    

    If the event of the 10th would not exist, then the expected output would be:

    season_start | season_end | event_date
    2022-01-01   | 2022-01-07 | 2022-01-01
    2022-01-01   | 2022-01-07 | 2022-01-02
    2022-01-08   | 2022-01-22 | 2022-01-16
    2022-01-08   | 2022-01-22 | 2022-01-18
    2022-01-08   | 2022-01-22 | 2022-01-20
    


  • WITH RECURSIVE
    ranges AS (
        (
        SELECT id, 
               season_start, 
               season_end, 
               event_date,
               season_start range_start,
               (event_date + INTERVAL '6 day') :: DATE range_end
        FROM events
        ORDER BY event_date LIMIT 1
        )
        UNION ALL
        (
        SELECT events.id, 
               events.season_start, 
               events.season_end, 
               events.event_date,
               (ranges.range_end + INTERVAL '1 day') :: DATE,
               (events.event_date + INTERVAL '6 day') :: DATE
        FROM events
        JOIN ranges ON ranges.range_end < events.event_date
        ORDER BY event_date LIMIT 1
        )
    )
    SELECT events.id,
           events.season_start, 
           events.season_end,
           events.event_date,
           ranges.range_start,
           ranges.range_end
    FROM events
    JOIN ranges ON events.event_date BETWEEN range_start AND range_end;
    

    https://dbfiddle.uk/?rdbms=postgres_12&fiddle=8108f98dc0370d28dba04a7df4dae1ba




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2