Redshift Spectrum Implicit Unnest of List in Join



  • I have data that is stored in S3 and I am querying it using SQL via Redshift Spectrum. One of the columns is stored as a list in the S3 file like so:

    Table_ABC

    column_a items_list
    a. [item_1, item_2]
    b. [item_3, item_4]

    Redshift has no unnest function, but I have picked out of our codebase the syntax below that works to unnest the list.

    select 
        column_a,
        unnested_list_items
    from table_abc as abc
    left join abc.items_list as unnested_list_items on true
    

    Running this results in:

    column_a unnested_list_items
    a. item_1.
    a. item_2.
    b. item_3.
    b. item_4.

    No one seems to understand why this works and I can't find any documentation on this and why it works. The closest I can find is the documentation below on Athena syntax, but the unnest seems to be implicit and afaik it's not being processed by Athena.

    https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html

    Is anyone able to explain how this syntax is processed by the query engine or point to any documentation on this?

    I am using Redshift 1.0.37680. Thanks!



  • As found in the documentation https://docs.aws.amazon.com/redshift/latest/dg/query-super.html under 'Querying semistructured data' the from clause syntax x as y means that y is iterating over x.

    In the syntax in the example abc.items_list as unnested_items_list means that unnested_items_list is iterating over the array abc.items_list.


Log in to reply
 


Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2