TSQL Query to match different lengths of strings with each other
inna last edited by
I'm writing a TVF to query a large table (tens of millions of rows) where a string (Postcode) in that table matches a string (a partial postcode (outcode/sector portion)) from another table.
I'm hitting an edge case that I can't resolve.
For those unfamiliar with UK Postcodes
Here's some sample data.
DECLARE @tab1 TABLE (Sk INT, Postcode VARCHAR(8)) DECLARE @tab2 TABLE (Sk INT, Coverage VARCHAR(8))
INSERT INTO @tab1 (Sk, Postcode) VALUES (1, 'E12 5HH'), (6, 'SW1X 6AA')
INSERT INTO @tab2 (Sk, Coverage) VALUES (1, 'E12'), (1, 'E12 5'),
(2, 'E1'), (2, 'E11'), (2, 'E13'),
(3, 'E12 6'),
(4, 'E12 5') ,
(5, 'E12') ,
(7, 'SW1') ,
And my current query
SELECT S.Sk, S.Postcode, CoverageSk = X.Sk, X.Coverage FROM @tab1 S OUTER APPLY ( SELECT Sk , Coverage , [Length] = LEN(Coverage) FROM @tab2 ) X WHERE S.Sk <> X.Sk AND LEFT(S.Postcode,X.[Length] ) = X.Coverage
These are my results.
The data on rows 1 and 4 shouldn't be in the results. Row 1, The Coverage or outcode "E1" isn't the same as the outcode portion ("E12") of the Postcode "E12 5HH".
The same for Row 4, The Coverage or outcode "SW1" isn't the same as the outcode ("SW1X") portion of the Postcode "SW1X 6AA".
Ideally, the source tables would have the component parts broken out so you could match them directly.
If that's not possible, this works for the sample data:
SELECT T1.Sk, T1.Postcode, CoverageSk = T2.Sk, T2.Coverage FROM @tab1 AS T1 JOIN @tab2 AS T2 ON T2.Sk <> T1.Sk AND T1.Postcode LIKE CASE WHEN CHARINDEX(SPACE(1), T2.Coverage) > 0 THEN T2.Coverage ELSE T2.Coverage + SPACE(1) END + '%';
Sk Postcode CoverageSk Coverage 1 E12 5HH 4 E12 5 1 E12 5HH 5 E12 6 SW1X 6AA 8 SW1X
SQL Server can use an index on Postcode via a https://www.sql.kiwi/2012/01/dynamic-seeks-and-hidden-implicit-conversions.html :