Canonical-or-best data type to store a Flake ID in PostgreSQL for two query patterns
-
I want to generate Flake IDs (https://github.com/boundary/flake) and write them to a column from clients connected to my PostgreSQL database. These IDs are essentially large, globally unique numbers that grow with time, similar to a monotonically increasing ID but generated without any coordination.
The two most common query patterns would look like this:
- Sort by the Flake ID:
  SELECT * FROM some_table ORDER BY flake_id;
- Group by another column, and then select the maximum Flake ID in each group:
  SELECT max(flake_id) FROM some_table GROUP BY some_other_column;
There seem to be a few possible ways to do this:
- bigint with two columns. Simple to order by, but it is unclear how to select the maximum across two words after a group-by.
- bytea with 16 bytes.
- bit(n) where n=128.
- uuid, which happens to be 128-bit. Testing yields ORDER BY results, but there is no max function that can be applied to it.
- Encoded as a string and stored in text. I haven't tested it, but a max function seems nonsensical unless it uses a lexicographic order. It also seems a bit dirty to use a string to encode what is orderable on its own numerically.
There's a bit of choice paralysis in the options and how they interact with indexes to perform the above query patterns most efficiently.
I am looking for insight into the ideal data-type considering the above query patterns, and how it would interact with the relevant indexes.
-
If you want to go with two bigints, create a composite type for flake_id:
CREATE TYPE pair AS (a bigint, b bigint);
Your first query can then remain as it is, and your second query could be rewritten to
SELECT DISTINCT ON (some_other_column) flake_id FROM some_table ORDER BY some_other_column, flake_id DESC;
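To make the composite-type route concrete, here is a minimal sketch. The table name events, the index name, and the sample values are hypothetical, and it assumes each 64-bit half of the Flake ID fits in a signed bigint (a half with its top bit set would come out negative and sort first). Composite values are compared field by field, left to right, which is why ORDER BY and a plain B-tree index should work on the column.

```sql
-- Hypothetical names throughout: pair, events, events_group_flake_idx.
CREATE TYPE pair AS (a bigint, b bigint);

CREATE TABLE events (
    flake_id          pair NOT NULL,  -- a = high 64 bits, b = low 64 bits
    some_other_column text NOT NULL
);

-- One index intended to serve the per-group lookup;
-- the ordering matches the DISTINCT ON query below.
CREATE INDEX events_group_flake_idx
    ON events (some_other_column, flake_id DESC);

-- Illustrative sample rows only.
INSERT INTO events (flake_id, some_other_column) VALUES
    (ROW(1, 10)::pair, 'x'),
    (ROW(1, 20)::pair, 'x'),
    (ROW(2, 5)::pair,  'y');

-- Query pattern 1: sort by the Flake ID (a first, then b).
SELECT * FROM events ORDER BY flake_id;

-- Query pattern 2: the greatest Flake ID in each group.
SELECT DISTINCT ON (some_other_column) some_other_column, flake_id
FROM events
ORDER BY some_other_column, flake_id DESC;
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking with EXPLAIN on realistic data.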
But why don't you use the obvious data type numeric?
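As a rough sketch of what that could look like (the table name events_numeric and the index name are hypothetical, and the CHECK constraint is my own addition to keep values within the unsigned 128-bit range): 2^128 has 39 decimal digits, so numeric(39, 0) is wide enough, and both of your queries work unchanged because numeric has ordinary ordering and a max aggregate.

```sql
-- Hypothetical names: events_numeric, events_numeric_group_flake_idx.
CREATE TABLE events_numeric (
    flake_id          numeric(39, 0) NOT NULL
        -- Assumed bound: Flake IDs are unsigned 128-bit, i.e. 0 <= id < 2^128.
        CHECK (flake_id >= 0
               AND flake_id < 340282366920938463463374607431768211456),
    some_other_column text NOT NULL
);

-- Index intended to help the per-group maximum.
CREATE INDEX events_numeric_group_flake_idx
    ON events_numeric (some_other_column, flake_id);

-- Query pattern 1: sort by the Flake ID.
SELECT * FROM events_numeric ORDER BY flake_id;

-- Query pattern 2: maximum Flake ID in each group.
SELECT some_other_column, max(flake_id)
FROM events_numeric
GROUP BY some_other_column;
```

The client would send the ID as a decimal integer; numeric is variable-length, so the stored size will differ somewhat from a fixed 16 bytes.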