Unique array values for this string_to_array

  • This is a follow-up to:

    • https://dba.stackexchange.com/questions/308684/best-way-to-map-different-json-keys-to-same-target-columns/308686#308686

    Based on these sample tables:

    id | field_map
    1  | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}

    id | data_provider_id | email | data

    1 | 201 | hi@hi | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
    2 | 201 | xx@xx | {"FULL NAME": "Billy Bob"}

    desired output:

    data_provider_id | email | full_name | segment

    201 | hi@hi | John Doe | id1
    201 | hi@hi | John Doe | id2
    201 | hi@hi | John Doe | id3
    201 | xx@xx | Billy Bob | NULL

    I have the following query:

      leads.data->>(p.field_map->>'full_name') AS full_name,
    FROM leads
    LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
    LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true

    This query is doing 2 particular things:

    1. its joining on data_providers table to get the field_map column which contains a JSONB mapping if CSV column headers. So something like {"segments": "SEGMENT ID", "full_name": "FULL NAME"}

    2. Within the data JSONB column of leads, there is a key (which I discover through the field map above) that contains a comma separated string of segment_ids (it comes in a CSV and they chose to put 2 values within 1 row). I want to split it so each segment_id gets its own row (and obviously all other columns remain the same on both rows).

    I have 2 goals:

    1. If there is an empty string or the key doesn't exist within the map, I want to return the row but just with NULL for the segment_id. I already got this working by changing CROSS JOIN to LEFT JOIN.

    2. I'm trying to remove duplicates in segment ids, so if someone enters 'id1,id1' it should only produce 1 row. I do this because there is a unique index on that column for the materialized view.

    I'm currently stuck on #2.

  • Make it a subquery and throw in DISTINCT:

    SELECT l.data_provider_id
         , l.email
         , l.data->>(p.field_map->>'full_name') AS full_name
         , s.segment
    FROM   leads l
    LEFT   JOIN data_providers p ON p.id = l.data_provider_id
       SELECT DISTINCT segment
       FROM   unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
       ) s ON true

    Your field_map holds the key 'segment', not 'segments', btw.

    You could even use this short syntax:

       SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
       ) s(segment) ON true

    (But the last one might make unsuspecting SQL purists cringe.)

    Original order of array elements is not preserved. If you need that, see:

    • https://dba.stackexchange.com/questions/27279/how-to-preserve-the-original-order-of-elements-in-an-unnested-array/27287#27287

    And use GROUP BY rather than DISTINCT and also aggregate the minimum ordinal position for each group of duplicates.

Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2