Get rows grouped by a foreign key with count of consecutive values



  • I have a SQL Server database with a transactions table with client_id, date, and is_cancelled.

    I'm trying to get the client_ids that have 3 or more transactions in a row marked as is_cancelled, along with the in_a_row count. I've gotten as far as the following, which gives me a 1 for is_same when the is_cancelled supports are consecutive, and a running total of cancelled transactions (which is not quite what I need)

    SELECT 
    client_id,
    date,
    is_same,
    SUM(is_same) OVER (PARTITION BY client_id ORDER BY date) AS sum_same,
    transCancelled
    FROM
    (
        SELECT
        client_id,
        LAG(is_cancelled) OVER (PARTITION BY client_id ORDER BY date) AS previous_cancelled,
        CASE 
            WHEN is_cancelled = LAG(is_cancelled) OVER (PARTITION BY client_id ORDER BY date)
            THEN 1
            ELSE 0
        END as is_same,
        date,
        is_cancelled
        FROM transactions
        WHERE deleted_at IS NULL -- Ignore soft-deleted rows
    ) AS t_01
    WHERE previous_cancelled = 1
    ORDER BY date
    

    Fiddle with sample data: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a0c9b12203ab2d0c83f73604ccc9d0a0

    Expected data (client_id, count) 1, 3 3, 6



  • This is a type of gaps-and-islands problem.

    The key to most solutions of this type, is to count the changes, so you want an is_different column, not is_same. Then you conditionally count that column (easier if you use NULL instead of 0) to create an ID for each group of rows.

    It's unclear exactly what final results you want, but by grouping up that result, you can get the maximum and minimum number of consecutive rows, as well as the count of actual row-groups, per client_id:

    WITH PrevValues AS (
        SELECT
          client_id,
          LAG(is_cancelled) OVER (PARTITION BY client_id ORDER BY date) AS previous_cancelled,
          CASE WHEN is_cancelled = LAG(is_cancelled) OVER (PARTITION BY client_id ORDER BY date)
            THEN NULL ELSE 1 END
            as is_different,
          date,
          is_cancelled
        FROM transactions
    ),
    Grouped AS (
        SELECT 
          client_id,
          date,
          is_different,
          COUNT(is_different) OVER (PARTITION BY client_id ORDER BY date ROWS UNBOUNDED PRECEDING) AS group_id
        FROM PrevValues
    ),
    ByGroups AS (
        SELECT
          client_id,
          COUNT(*) as in_a_row
        FROM Grouped
        GROUP BY
          client_id,
          group_id
        HAVING COUNT(*) >= 3
    )
    SELECT
      client_id,
      MAX(in_a_row) as max_in_a_row,
      MIN(in_a_row) as min_in_a_row,
      COUNT(*) as num_groups
    FROM ByGroups
    GROUP BY
      client_id;
    

    https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ca7d37a3a4d51a5fc9f9a27f941b739a

    Note that your sample data has rows with identical dates, and is one reason you should always use ROWS UNBOUNDED PRECEDING (the default for ordered window functions is RANGE UNBOUNDED PRECEDING which is subtly different). In any event, you should always try to have deterministic ordering.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2