Does any RDBMS "inline" a 1:0-1 relationship's dependent table?



  • In the case of a 1:0-1 relationship between a principal table p and dependent table d, when commonly-used queries will read from both p and d, do any RDMBs inline d to eliminate the need to store d separately (and maintain d's indexes, and other costs).

    I want to invoke Codd's paper where he suggests that tables could be entirely abstract - whereas right now I'm not aware of any RDBMS that treats tables as an abstraction; they all treat SQL TABLE objects as 1:1 representations of an on-disk rowstore structure (such that two SQL TABLE objects (sharing a PK) can't share a single rowstore, nor can multiple rowstores be used to represent a single TABLE... excepting SQL PARTITION, of course).

    Because surely a SQL TABLE relation object is intended as an abstraction and the RDMBS is free to choose the most appropriate on-disk representation, which might include inlining dependent tables, to quote Codd:

    https://www.seas.upenn.edu/%7Ezives/03f/cis550/codd.pdf
    The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only-that is, without superimposing any additional structure for machine representation purposes

    For example:

    CREATE TABLE principal (
        principalId int         NOT NULL IDENTITY,
        foo         varchar(50) NOT NULL,
        bar         bigint      NOT NULL,
    
    CONSTRAINT PK_principal PRIMARY KEY ( principalId )
    

    );

    CREATE TABLE dependent (
    principalId int NOT NULL,
    baz varchar(50) NOT NULL,
    qux bigint NOT NULL,

    CONSTRAINT PK_dependent PRIMARY KEY ( principalId ),
    CONSTRAINT PK_dependent_to_principal FOREIGN KEY ( principalId ) REFERENCES principal ( principalId )
    

    );

    The above can be considered equivalent to:

    CREATE TABLE principal (
        principalId int         NOT NULL IDENTITY,
        foo         varchar(50) NOT NULL,
        bar         bigint      NOT NULL,
    
    d_baz       varchar(50)     NULL,
    d_qux       bigint          NULL,
    
    CONSTRAINT PK_principal PRIMARY KEY ( principalId ),
    
    CONSTRAINT CK_dependent CHECK (
        ( d_baz IS     NULL AND q_qux IS     NULL )
        OR
        ( d_baz IS NOT NULL AND q_qux IS NOT NULL )
    )
    

    );

    ...which presumably would have better DML performance as it has to maintain only the PK_principal index, instead of PK_principal and PK_dependent indexes when dependent data is added or removed.

    It also means that concurrent applications wouldn't need to take a lock on multiple tables during a transaction, and maintains proximal-locality which I assume would greatly benefit performance.



  • The most common one's don't automatically do this, as far as I'm aware, and probably for good reason. Your question is under the assumption that one would always want to query for all of the columns between the two tables. If that were the case, why even have two tables to begin with?

    "It also means that concurrent applications wouldn't need to take a lock on multiple tables during a transaction" - True but the time to take the lock on the single "inlined" table would be approximately proportionally longer since more data needs to be read from disk, written into memory, and then processed concurrently. In fact, a single longer running transaction could be worse for overall performance than two shorter locks, in some scenarios.

    "maintains proximal-locality which I assume would greatly benefit performance" - This is a moot point these days with modern hardware and modern database systems with how data is stored and located. I doubt proximal location makes any tangible difference compared to the usual bottlenecks with data management and querying.


Log in to reply
 


Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2