Recently a colleague submitted a proposal to our team regarding the storage of regular expressions in a database lookup table. Whilst I could see the obvious benefits from this, it did make me slightly uneasy.

My principal concern was that vast swathes of key business logic could be broken with a simple update, and that we were effectively storing (pseudo) executable code in a database table. I was also concerned that, because these expressions would be defined outside compiled code, they would bypass any syntactic or lexical validation performed by the compiler.
Further concerns were raised about the nature of the expressions to be stored. For example, postcode pattern matching was deemed acceptable, whereas specific business-related search terms were not.
There was also the concern regarding different flavours of regex: Perl, PL/SQL, Unix, JavaScript, C# etc. How does a platform know which regexes it can use, and which it cannot?
After a bit of parley, we came up with a sensible set of proposals:
- Provide an API for their access and use, with rigorous escaping and error trapping. In particular, erroneous or poisonous expressions (such as those that may facilitate SQL injection) should be handled.
- Ensure different ‘flavours’ are captured in the lookup table. Provide an API for those platforms wishing to subscribe.
- Avoid storing very specific search terms (business-related or otherwise), and give special consideration to terms that do not need a regular expression at all.
- Require evidence of testing and impact analysis before the submission of any new regex is accepted.
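To make the first two proposals a little more concrete, here is a minimal sketch in Python of what such an access API might look like. Everything in it is an assumption for illustration: the `regex_lookup` table, its columns, the `load_pattern` and `matches` helpers and the `PatternError` exception are hypothetical names, not what we actually built, and the postcode expression is deliberately simplified. It uses an in-memory SQLite database so it can be run as-is.

```python
import re
import sqlite3


class PatternError(Exception):
    """Raised when a stored expression is missing or does not compile."""


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical lookup table: one row per (name, flavour) pair.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE regex_lookup ("
        "name TEXT, flavour TEXT, expression TEXT, "
        "PRIMARY KEY (name, flavour))"
    )
    conn.executemany(
        "INSERT INTO regex_lookup VALUES (?, ?, ?)",
        [
            # Simplified UK postcode shape, for illustration only.
            ("uk_postcode", "python", r"[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"),
            # A deliberately malformed expression (unterminated character set).
            ("broken", "python", r"([A-Z"),
        ],
    )
    return conn


def load_pattern(conn: sqlite3.Connection, name: str, flavour: str = "python") -> re.Pattern:
    # Bound parameters keep the lookup key out of the SQL text itself.
    row = conn.execute(
        "SELECT expression FROM regex_lookup WHERE name = ? AND flavour = ?",
        (name, flavour),
    ).fetchone()
    if row is None:
        raise PatternError(f"no {flavour!r} expression stored for {name!r}")
    try:
        # Compiling here traps malformed expressions before any caller uses them.
        return re.compile(row[0])
    except re.error as exc:
        raise PatternError(f"stored expression {name!r} is invalid: {exc}") from exc


def matches(conn: sqlite3.Connection, name: str, text: str, flavour: str = "python") -> bool:
    # fullmatch requires the whole input to fit the stored expression.
    return load_pattern(conn, name, flavour).fullmatch(text) is not None
```

A caller would then write something like `matches(conn, "uk_postcode", "SW1A 1AA")` and never touch the raw expression. A platform asking for a flavour that has no row gets a `PatternError` rather than an expression it cannot run, and a bad row fails in one place instead of wherever the expression happens to be used.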
And finally, should you want to learn or brush up on them, you could do far worse than Regex Coach.

