Wednesday, February 22, 2012

Sql + RegEx = Awesome

NOTE: This is not a post about regex (if you want more info check out http://www.regular-expressions.info/ or http://www.zytrax.com/tech/web/regex.htm and http://gskinner.com/RegExr/ is a great tool for testing your regular expressions).

One of my clients called and asked me to create an auto-complete list of employee names – no problem. I cracked open the table and started looking at the data only to realize that there were names that were “not valid”, i.e., they contained invalid characters, numbers, periods, etc. When I questioned them about it, the response was “just filter those out”. Now I was ready to get to work.

Step 1, filter out all the names that begin with a number:

SELECT
    FirstName, LastName
FROM
    dbo.Employee
WHERE
    FirstName NOT LIKE '1%'
    AND
    FirstName NOT LIKE '2%'
    AND
    FirstName NOT LIKE '3%' 
    ...
    AND
    FirstName NOT LIKE '9%' 

Great, now on to step 2, filter out names that begin with a dash, period, comma, or backslash:
SELECT
    FirstName, LastName
FROM
    dbo.Employee
WHERE
    FirstName NOT LIKE '1%'
    AND
    FirstName NOT LIKE '2%'
    AND
    FirstName NOT LIKE '3%' 
    ...
    AND
    FirstName NOT LIKE '9%'  
    AND
    FirstName NOT LIKE '-%'
    AND
    FirstName NOT LIKE '.%'
    AND
    FirstName NOT LIKE ',%'  
    AND
    FirstName NOT LIKE '\%'  

Awesome, step 3, abandon all of that and find a better way. In reality, step 2 never happened and step 1 only lasted past the first where condition. I knew there had to be a better way and it is inline RegEx, and it is good.
Let’s rewrite the above query so you can see what it looks like using RegEx:
SELECT
    FirstName, LastName
FROM
    dbo.Employee
WHERE
    FirstName NOT LIKE '[0-9-\.,\\]%'

It’s that simple!

In my case, it was the best solution for filtering out data, but you could just as easily use this technique to create complex select statements too. Give it a try and enjoy!