TSQL Query to match different lengths of strings with each other



  • I'm writing a TVF to query a large table (tens of millions of rows) where a string (Postcode) in that table matches a string (a partial postcode (outcode/sector portion)) from another table.

    I'm hitting an edge case that I can't resolve.

    For those unfamiliar with UK Postcodes enter image description here

    Here's some sample data.

    DECLARE @tab1 TABLE (Sk INT, Postcode VARCHAR(8))
    DECLARE @tab2 TABLE (Sk INT, Coverage VARCHAR(8)) 
    

    INSERT INTO @tab1 (Sk, Postcode) VALUES (1, 'E12 5HH'), (6, 'SW1X 6AA')
    INSERT INTO @tab2 (Sk, Coverage) VALUES (1, 'E12'), (1, 'E12 5'),
    (2, 'E1'), (2, 'E11'), (2, 'E13'),
    (3, 'E12 6'),
    (4, 'E12 5') ,
    (5, 'E12') ,
    (7, 'SW1') ,
    (8, 'SW1X')

    And my current query

    SELECT  S.Sk, 
            S.Postcode, 
            CoverageSk = X.Sk, 
            X.Coverage 
    FROM        @tab1   S 
    OUTER APPLY (
        SELECT  Sk , 
                Coverage , 
                [Length] = LEN(Coverage) 
        FROM    @tab2
    )   X 
    WHERE   S.Sk <> X.Sk 
    AND     LEFT(S.Postcode,X.[Length] ) = X.Coverage 
    

    These are my results.

    enter image description here

    The data on rows 1 and 4 shouldn't be in the results. Row 1, The Coverage or outcode "E1" isn't the same as the outcode portion ("E12") of the Postcode "E12 5HH".

    The same for Row 4, The Coverage or outcode "SW1" isn't the same as the outcode ("SW1X") portion of the Postcode "SW1X 6AA".



  • Ideally, the source tables would have the component parts broken out so you could match them directly.

    If that's not possible, this works for the sample data:

    SELECT 
        T1.Sk, 
        T1.Postcode, 
        CoverageSk = T2.Sk, 
        T2.Coverage 
    FROM @tab1 AS T1
    JOIN @tab2 AS T2
        ON T2.Sk <> T1.Sk
        AND T1.Postcode LIKE 
                CASE
                    WHEN CHARINDEX(SPACE(1), T2.Coverage) > 0 
                    THEN T2.Coverage
                    ELSE T2.Coverage + SPACE(1)
                END + '%';
    
    Sk Postcode CoverageSk Coverage
    1 E12 5HH 4 E12 5
    1 E12 5HH 5 E12
    6 SW1X 6AA 8 SW1X

    SQL Server can use an index on Postcode via a https://www.sql.kiwi/2012/01/dynamic-seeks-and-hidden-implicit-conversions.html :

    dynamic seek




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2