How to transpose values from one column into columns with a value from a different column?



  • I have a database with the following structure:

    Date role type duration
    2022-04-16 Nurse Food preparation 45
    2022-04-17 Nurse Cleaning 30
    2022-04-17 Volunteer Cleaning 20
    2022-04-17 Nurse Food preparation 60

    Note: I don't know the values in the "type" column in advance, since they are defined by the user. Also, There can be multiple rows with overlapping date, role, and type.

    I am using a charting library that would like for the data to be grouped as follows:

    role Food preparation Cleaning
    Nurse 105 30
    Volunteer Null 20

    So far, I am able to group the data using the following query

    select 
        role,
        type, 
        sum(duration) as total_minutes
    from work
    group by role, type;
    
    role type total_mintes
    Nurse Cleaning 45
    Nurse Food preparation 20
    Volunteer Cleaning 15
    Volunteer Food preparation 43

    How can I "pivot"/"transpose" the data so that each row represents a role with one column containing the sum of minutes for each type of work?

    In effect, I would like to transpose the data similar to the Pandas https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html function, but using only SQL.



  • First of all you will need to install the tablefunc extension using the create extension tablefunc; command, otherwise the pivot function crosstab will not work.

    Even after reading this answer, it is recommended that you read the official documentation of PostgreSQL on crosstab https://www.postgresql.org/docs/current/tablefunc.html

    As for how to do this:

    select *
    from crosstab(
        'select
        role,
        type,
        sum(duration) as total_minutes
    from work
    group by role, type
    order by type',
        'select distinct type from work order by type'
    ) as ct(
        role text,
        "Cleaning" text,
        "Food preparation" text
    );
    

    Pay attention to the explicit order by clause in both queries, this is a must, otherwise it may map values incorrectly as SQL does not guarantee the order of data without it.

    You will have to specify each possible output of the column type in the alias.


    A more dynamic version of the above (although not perfect by any means):

    create or replace function get_dynamic_transpose()
      returns text
      language plpgsql
    as
    $$
    declare
        v_output_columns text;
    begin
        select array_to_string(array_agg(distinct quote_ident(type) || ' ' || pg_typeof(type) || E' \n'),',','null')
        into v_output_columns
        from testing;
    
    return format(
    

    'select *
    from crosstab(
    ''select
    role,
    type,
    sum(duration) as total_minutes
    from testing
    group by role, type
    order by type'',
    ''select distinct type from testing order by type''
    ) as ct(
    role text,
    %s
    );', v_output_columns
    );
    end;
    $$;

    This function would return the query you need to execute to get your desired result. It will dynamically build the list of possible columns you need for the output. This function can definitely be made to be more general purpose like it is done https://stackoverflow.com/a/15514334 but it's not a small amount of work to do that as PostgreSQL cannot return a set it does not know its definition beforehand.

    There is the other option of this function instead of returning a query string, it can instead return an array of json objects each representing a row, and you would split this json into normal rows and column on application side. If such a solution is acceptable then this works fine:

    create or replace function get_dynamic_transpose_jsonb()
      returns jsonb
      language plpgsql
    as
    $$
    declare
        v_output_columns text;
        v_query text;
        v_result jsonb;
    begin
        select array_to_string(array_agg(distinct quote_ident(type) || ' ' || pg_typeof(type) || E' \n'),',','null')
        into v_output_columns
        from testing;
    
    v_query = format(
    

    'select jsonb_agg(ct)
    from crosstab(
    ''select
    role,
    type,
    sum(duration) as total_minutes
    from testing
    group by role, type
    order by type'',
    ''select distinct type from testing order by type''
    ) as ct(
    role text,
    %s
    );', v_output_columns
    );

    execute v_query into v_result;
    
    return v_result;
    

    end;
    $$;

    The result of this function would be something similar to the following

    [{"role": "Nurse", "Cleaning": "30", "Food preparation": null}, {"role": "Volunteer", "Cleaning": null, "Food preparation": "55"}]
    



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2