PostgreSQL: partition by number of rows after the current one



  • I need to calculate the correlation over each window of, say, 3 consecutive rows.

    For example, assume there is a table with the following data.

    create table a (
        asset_id int,
        time int not null,
        value decimal not null
    );
    insert into a values
    (1,1,30),(1,2,45),(1,3,30),(1,4,30),(1,5,30),
    (2,1,30),(2,2,40),(2,3,30),(2,4,25),(2,5,25);
    

    select row_number() over() as r, a.asset_id, b.asset_id, a.time, a.value, b.value
    from a join a b on a.time=b.time and a.asset_id<b.asset_id;

    r | asset_id | asset_id | time | value | value
    ---+----------+----------+------+-------+-------
    1 | 1 | 2 | 1 | 30 | 30
    2 | 1 | 2 | 2 | 45 | 40
    3 | 1 | 2 | 3 | 30 | 30
    4 | 1 | 2 | 4 | 30 | 25
    5 | 1 | 2 | 5 | 30 | 25
    (5 rows)

    The correlation of the entire series can be calculated like this:

    # select a.asset_id, b.asset_id as asset2_id, corr(a.value, b.value)
    from a join a b on a.time=b.time and a.asset_id<b.asset_id
    group by a.asset_id, b.asset_id;
    asset_id | asset2_id |        corr
    ----------+-----------+--------------------
    1 | 2 | 0.9128709291752769
    (1 row)

    But I would like it calculated for each run of 3 consecutive rows, i.e. for rows 1-3, 2-4, 3-5. Is it possible to achieve that?



  • This does it:

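    select a.asset_id, b.asset_id,
        count(*) over w,
        corr(a.value, b.value) over w
    from a join a b on a.time=b.time and a.asset_id<b.asset_id
    -- frame: the current row and the 2 rows after it, per asset pair
    window w as (partition by a.asset_id, b.asset_id order by a.time
        rows between current row and 2 following);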

    Note that it still generates values for rows that are too close to the end of a partition to have 2 following rows, so use the count to filter those out.
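
    A minimal sketch of that filtering step, assuming the frame is the current row plus the 2 following rows within each asset pair, ordered by time. Window functions cannot appear in a WHERE clause, so the windowed query is wrapped in a subquery; the names start_time, n, corr3 and s are made up for illustration.

    ```sql
    select asset_id, asset2_id, start_time, corr3
    from (
        select a.asset_id,
               b.asset_id as asset2_id,
               a.time     as start_time,               -- first row of the window
               count(*) over w as n,                   -- rows actually in the frame
               corr(a.value, b.value) over w as corr3  -- correlation over the frame
        from a join a b on a.time = b.time and a.asset_id < b.asset_id
        window w as (partition by a.asset_id, b.asset_id
                     order by a.time
                     rows between current row and 2 following)
    ) s
    where n = 3   -- drop the short windows at the end of each partition
    order by asset_id, asset2_id, start_time;
    ```

    With the sample data this should leave three rows, for the windows starting at time 1, 2 and 3. Expect corr3 to come back NULL for the window covering times 3-5, since asset 1 is constant there and the correlation is undefined.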

    So that is how you do it in SQL. But from a statistics perspective, this seems like malpractice. That is a topic for a different forum though.



