Using Indexing When Performing JOINs on Part of a Composite Key



  • I have a table in my database (hosted on MariaDB) that looks like this (I have given the definition in SQLAlchemy),

    class Address(Base):
    __tablename__ = 'address'
    __table_args__ = {'schema': DB_NAME}
    

    referenceID = sqlalchemy.Column(sqlalchemy.String(length=25), primary_key=True)
    referenceTable = sqlalchemy.Column(sqlalchemy.String(length=25), primary_key=True)
    sourceReferenceID = sqlalchemy.Column(sqlalchemy.String(length=25), primary_key=True)
    addressType = sqlalchemy.Column(sqlalchemy.String(length=25), primary_key=True)
    zip = sqlalchemy.Column(sqlalchemy.String(length=10))
    streetAddress = sqlalchemy.Column(sqlalchemy.String(length=50))
    city = sqlalchemy.Column(sqlalchemy.String(length=25))
    fullAddress = sqlalchemy.Column(sqlalchemy.String(length=200))
    country = sqlalchemy.Column(sqlalchemy.String(length=25))
    validFrom = sqlalchemy.Column(sqlalchemy.Date)
    validTo = sqlalchemy.Column(sqlalchemy.Date)

    Now, it can be seen that my table has a composite key which includes several attributes.

    As far as I know, a clustered index will be created out of the primary key of a table. I am not completely aware of how this works when the table consists of a composite key. An explanation on this would be appreciated.

    Further, I am going to want to perform several JOINs on this table, particularly using the referenceID and sourceReferenceID attributes of the table. To do this efficiently, do I need to create indexes (non-clustered) on these attributes separately using index=True?



  • You can JOIN on any column or combination of columns (except for TEXT or BLOB).

    JOINing is likely to be faster (perhaps much faster) if the columns in the ON clause are in some index.

    I don't understand the semantics of your four primary key columns; the rest of the columns feel like they need a location_id as the PRIMARY KEY.

    I MySQL, the PRIMARY KEY has these properties:

    • There is only one PK for the table.
    • "Clustered": The Data's BTree is ordered by the PK. This makes some queries with WHERE or ON involving the PK column(s) faster than they would otherwise be.
    • The order of columns in the PK (and any secondary key) is important, depending on details of the SELECT or other SQL that is using it.
    • A "composite" (multi-column) index (PRIMARY or secondary) can be though of as concatenating the columns together to make one long string that provides the storage order of the data (for Primary) or index (for secondary).
    • Usually, if the first column of a composite key is not tested with =, the rest of the columns are not used.
    • A "covering" index is one that includes all of the columns needed for the entire SELECT. This is likely to make the Select run somewhat faster because it need look only in the index's BTree.

    It would be better to discuss a specific query.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2