Postgresql Search for a 6 digit number in text
-
I have a table with one field Address - varchar(250), is it possible to search this field for a 6 digit (integer) Pincode. Is it also possible to list out records without any 12 digit Pincode.
Sl No | Address 1 | "16/127, Off M G Road, Goregaon (west) Mumbai, Maharashtra, 400062 02228744298" 2 | "113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka, 560002 2229218" 3 | "2, 2, Panchal Estate, Bapunagar Ahmedabad, Gujarat, 380024 079227401432nd Flr, B Wing, Jay Chamber, Service Rd, Near Telephone Exchange, Vile Parle (east) Mumbai, Maharashtra, 400057 02226104168" 4 | "New#27, Gopalakrishna Street, T Nagar Chennai, Tamil Nadu, 600017 04428150211" 5 | "9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi, 110055 01123550815" 6 | "113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka, 2229218" 7 | "9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi, 01123550815"
-
To resolve your issue, I did the following (all of the code below is available on the fiddle https://dbfiddle.uk/?rdbms=postgres_13&fiddle=5db82aefe80e18bc218c2aa3c0515d6f ). The code makes use of the https://www.postgresql.org/docs/9.1/functions-string.html function ( https://www.postgresql.org/docs/9.1/functions-string.html ) and the https://www.2ndquadrant.com/en/blog/postgresql-regular-expressions-and-pattern-matching/ (
~
) regular expression (regex) matching operator ( https://www.postgresql.org/docs/current/functions-matching.html ). https://www.regular-expressions.info/ is an excellent site for regexes (quick start https://www.regular-expressions.info/quickstart.html ).CREATE TABLE test ( s1_no INTEGER GENERATED ALWAYS AS identity, address TEXT NOT NULL );
and then populate using your data (and some test data of my own). BTW, all of the PINs that were supposed to be 12 digits long were actually 11 digits in your data, so I've assumed that 11 is the correct number of digits - change as required:
INSERT INTO test (address) VALUES
--
-- my own test cases('"sdfg ghhjbghjj 123456 12345678901"'),
('"dsdcfg vbggff 12345 123456789"'), -- <<---- should NOT be returned
-- no 11 digit number-- your data (with concatenated records corrected)
('"16/127, Off M G Road, Goregaon (west) Mumbai, Maharashtra, 400062 02228744298"'),
('"113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka, 560002 2229218"'),
('"2, 2, Panchal Estate, Bapunagar Ahmedabad, Gujarat, 380024 07922740143"'),
('"2nd Flr, B Wing, Jay Chamber, Service Rd, Near Telephone Exchange, Vile Parle (east) Mumbai, Maharashtra, 400057 02226104168"'),
('"New#27, Gopalakrishna Street, T Nagar Chennai, Tamil Nadu, 600017 04428150211"'),
('"9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi, 110055 01123550815"'),
('"113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka, 2229218"'),--
-- This last record from your own data, returns NULL for sstr (see below)
-- because there is no 6 digit no. in the address.('"9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi, 01123550815"');
Then use the query:
SELECT s1_no, TRIM(SUBSTRING(address FROM ' \d{6} ')) AS sstr FROM test WHERE address ~ ' \d{11}"' ORDER BY s1_no;
Result:
s1_no sstr 1 123456 3 400062 5 380024 6 400057 7 600017 8 110055 10 NULL 7 rows
Records 2, 4 & 9 aren't returned because there is no 11 digit PIN (couldn't find out what this was for) and record 10 returns
NULL
as there is no https://%20Decoding%20the%20PINCODE%20(Postal%20Index%20Number)%208 . The https://www.postgresqltutorial.com/postgresql-string-functions/postgresql-trim-function/ function ( https://www.postgresql.org/docs/9.1/functions-string.html ) removes the spaces returned by the regex.Explanation of regexes:
The
SUBSTRING()
function extracts a string which matches the pattern.' \d{6} '
Match a space,
Followed by 6 digits exactly. (
\d
) is a "shorthand character class" (see https://www.regular-expressions.info/quickstart.html ) that represents a single digit.The curly braces act as a https://techbyexample.com/curly-braces-quantifier-regex/ in regexes. The single 6 between them means match 6 occurrences exactly. One could put something like {3,6} which would match between 3 and 6 occurrences for example.
The final space means that these 6 digits must also be followed by a space in order for a match to occur.
The tilde (
~
) operator can be seen as a synonym for the English word "contains".~ ' \d{11}"'
- does the expression contain a space, followed by exactly 11 digits, followed by a double quote ("
)?
You can eliminate the
NULL
by using (see fiddle):SELECT s1_no, TRIM(SUBSTRING(address FROM ' \d{6} ')) AS sstr FROM test WHERE SUBSTRING(address FROM ' \d{6} ') IS NOT NULL AND address ~ ' \d{11}"';
Finally, your https://economictimes.indiatimes.com/nation-world/here-is-how-your-pin-code-is-decided/first-digit/slideshow/57889801.cms will be returned as
TEXT
and not as anINTEGER
. This is OK, since one does not add, subtract, multiply or divide using PINs.However, should you wish to https://en.wikipedia.org/wiki/Hash_function#Hashing_integer_data_types as
INTEGER
s for partitioning purposes or somesuch. You can use the very useful[PG_TYPEOF()][10]
function ( https://www.postgresql.org/docs/13/functions-info.html ) which returns the the type of any expression as follows (also in fiddle):SELECT s1_no, TRIM(SUBSTRING(address FROM ' \d{6} ')) AS sstr, PG_TYPEOF(TRIM(SUBSTRING(address FROM ' \d{6} '))) FROM test WHERE address ~ ' \d{11}"' ORDER BY s1_no;
Result (snipped for brevity):
s1_no sstr pg_typeof 1 123456 text 3 400062 text ... ... ...
For the purposes of hashing, you could use the non-standard PostgreSQL https://www.postgresql.org/docs/13/sql-expressions.html ( https://www.postgresql.org/docs/13/sql-expressions.html ) as follows (see fiddle):
TRIM(SUBSTRING(address FROM ' \d{6} '))::INT
This could also help if you wished to partition your PINs by range as opposed to list.