Non-clustered Primary Key and Clustered Index
It is my understanding that in SQL Server, you can have a Primary Key that is non-clustered, and have another index that is the clustered one.
To me, this seems the same as just having a Primary Key, and an extra UNIQUE key.
So I have two questions:
if a Primary Key is non-clustered, does it store all the columns with it? Or only the Primary Key columns and the columns referencing the clustered index?
I've just read that if the PK isn't the clustered index, then the clustered index does NOT have to be UNIQUE (but it's highly encouraged). Does this mean then that the table could be "randomly sorted" on the rows with the same key?
if a Primary Key is non-clustered, does it store all the columns with it?
Nope, that is a characteristic of the clustered index, not the primary key. A nonclustered primary key will only store the fields it is defined on plus the clustered index's key fields.
...the clustered index does NOT have to be UNIQUE (but it's highly encouraged). Does this mean then that the table could be "randomly sorted" on the rows with the same key?
The rows of the table will always be logically sorted by the clustered index fields. In the case where the clustered index is not unique, and there are two rows with the same values for the clustered index key, those rows are given a unique row identifier that is stored behind the scenes (this 4 byte uniquifier is only added to the duplicate key rows). This is the determining factor in how they're sorted relative to each other.
One reason why one might choose to make the primary key a nonclustered index is because they find performance benefits in having the data logically sorted differently than the fields that identify uniqueness for the row (regardless if the clustered index is unique or not).
An example would be if a GUID was used in a
UNIQUEIDENTIFIERcolumn as the primary key of your table. GUIDs are great for uniqueness (most times) but aren't usually a great way to keep your data sorted because of their random-like values. Instead you may have a natural set of fields in that table, that your queries typically join or filter on, that aren't necessarily guaranteed to be unique, which instead make for a good clustered index then. Or even a set of fields you typically order on in your queries could be a candidate for a clustered index, to eliminate a heavy sort operation from the query plan. The data will then be sorted in an order that makes sense instead of by the semi-random values of a GUID.
Please see https://www.sqlshack.com/sql-server-clustered-indexes-internals-with-examples/ for more information on how clustered indexes work.