What indexing can help improve speed of multi join query with where clause on nvarchar field?



  • I have a multi-join SQL query.

    SELECT A.COMPANYEMAIL, A.CONTRACTNAME, 
    B.CLAIMNAME, C.PARTNAME, C.PARTPRICE, 
    C.DAMAGEAMOUNT, C.DAMAGEDATE
    FROM CONTRACT A
    JOIN CLAIM B ON A.ID=B.CONTRACTID
    JOIN PART C ON B.ID=C.CLAIMID
    WHERE A.COMPANYEMAIL='XYZ@ABC.COM'
    ORDER BY DAMAGEDATE DESC
    

    The ID column in each table is primary key, clustered. Each table has atleast 10M records.

    Does it make sense to:

    1. Create non clustered index on B.CONTRACTID and C.CLAIMID

    2. Instead of non clustered index, create non clustered covering index for example B.CONTRACTID (INCLUDE B.CLAIMNAME) and C.CLAIMID (INCLUDE PARTNAME)?

    3. A.EMAIL column has unique index- what can be done to improve the WHERE clause and prevent it from having to read all pages of the table? If I add non clustered index on A.EMAIL then would that help? Or do I need INCLUDE A.ID and A.CONTRACTNAME as well?

    4. Any other indexing recommendation please?



    1. Create non clustered index on B.CONTRACTID and C.CLAIMID

    No, by themselves, these are non-covering and would leave you with Key Lookup operations on a large tables. A Key Lookup is a row based operation, meaning if you feed it 10 million rows, it'll turn around and execute 10 million independent Key Lookups.

    1. Instead of non clustered index, create non clustered covering index for example B.CONTRACTID (INCLUDE B.CLAIMNAME) and C.CLAIMID (INCLUDE PARTNAME)?

    Yes/No

    1. The Index on CLAIM would be good in this case, as your proposed index has a key on the column used in the where clause, and an INCLUDE on the column returned in the select statement.
    2. However, the index you're proposing on PART only has a key on CLAIMID and an INCLUDE on PARTNAME. This still leaves PARTPRICE, DAMAGEAMOUNT, and DAMAGEDATE uncovered, in need of Key Lookups to find their values. Of course, this is all the fact that you're sorting, which is going to have significant impact on the keys and includes of your indexes. More on that later.

    3 A.EMAIL column has unique index. What can be done to improve the WHERE clause and prevent it from having to read all pages of the table? If I add nonclustered index on A.EMAIL then would that help? Or do I need INCLUDE A.ID and A.CONTRACTNAME as well?

    Yes, you would want to try adding a NONCLUSTERED index on CONTACT, and you'd want INCLUDE for CONTRACTNAME. Including ID is not necessary, as it's implicitly included in all NONCLUSTERED indexes, since it's the CLUSTERED key. The only time you might want considering it as an INCLUDE is if there a chance the CLUSTERED index on the table may be changed in the future.

    4.Any other indexing recommendation please?

    An ORDER BY may throw a monkey wrinkle into every thing mentioned here. Since you're joining three tables, you basically need to decide between performance on the JOIN operations, or performance when sorting the data. You say you 10M rows in each table. Since you're only WHERE clause is on EMAIL, I would suspect that you're going to narrow down your result drastically upfront, making the ORDER By a more trivial operation. For example, if you have 5-10 rows returned by the query, that's not enough to worry about optimizing for. However, if you were to be left with 1 million rows returned, that's where you'd want to rethink your strategy.

    To summarize, I'd create the below three indexes as a starting point, and test them. As with most indexing operations, without having the full data set to test, query plan and I/O stats, we can only guess these will be optimal for your situation. For now, I would not even worry about the ORDER BY without knowing how many rows this query returns, on average.

    CREATE NONCLUSTERED INDEX IX_Contract_CompanyEmail
        ON Contract(CompanyEmail) INCLUDE(ContractName);
    

    CREATE NONCLUSTERED INDEX IX_Claim_ContractID
    ON Claim(ContractID) INCLUDE(ClaimName);

    CREATE NONCLUSTERED INDEX IX_Part_ClaimId
    ON Part(ClaimId) INCLUDE(PartName,PartPrice, DamageAmount,DamageDate);




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2