Wednesday, September 14, 2005

To Regex or not to Regex

Regular expressions (regex) are undoubtedly one of Perl's strongest features. Many people choose Perl when they need pattern matching, for example to match DNA sequences or fingerprint data. However, a regex is not always the fastest way. In some simple cases it can be avoided by using built-in Perl functions such as index and substr. Note: index is very fast because it uses a Boyer-Moore style search.

Here is a simple example where a regex is slower than it needs to be.

&call_a_function() if $condition =~ m/^yes$/i;


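The function call_a_function() will be called when $condition is 'Yes', 'yes', 'YES', 'YeS', or any other case combination. A regex is not needed for such a simple task. You can write the following code to do the same job (the only difference is that the regex also accepts a value with one trailing newline, such as "yes\n", because $ matches before a final newline):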

&call_a_function() if lc($condition) eq 'yes';
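
To see how the two tests compare, here is a minimal sketch that runs both against a few sample values. The helper names is_yes_regex and is_yes_lc are made up for this illustration; they are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this comparison.
sub is_yes_regex { my ($condition) = @_; return $condition =~ m/^yes$/i ? 1 : 0; }
sub is_yes_lc    { my ($condition) = @_; return lc($condition) eq 'yes' ? 1 : 0; }

for my $value ('yes', 'YeS', 'no', 'yes please', "yes\n") {
    (my $shown = $value) =~ s/\n/\\n/;   # make the newline visible when printing
    printf "%-14s regex=%d lc=%d\n", "'$shown'", is_yes_regex($value), is_yes_lc($value);
}
```

The two helpers should agree on every value except "yes\n", which only the regex accepts. If your input may carry a trailing newline, chomp it before comparing with eq.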


Now let's look at replacing a regex in slightly more complex situations.

my $sentence='I like apples, oranges, and bananas';


To check whether $sentence contains the word 'apples', we usually use this command:

($sentence=~m/apples/ and print 'exist') or print 'not exist';


To save some time, we can replace the command above with this command:

(index($sentence, 'apples')!=-1 and print 'exist') or print 'not exist';


The function index() returns the position of the first occurrence (counting from zero). It returns -1 if no match is found.
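
As a small illustration of those return values, the sketch below calls index() on the same sentence. The variable names are only for this example, and the optional third argument (the position to start searching from) is something the text above does not use.

```perl
use strict;
use warnings;

my $sentence = 'I like apples, oranges, and bananas';

my $found   = index($sentence, 'apples');            # 7: 'apples' starts at the eighth character
my $missing = index($sentence, 'grapes');            # -1: no match
my $first_a = index($sentence, 'a');                 # 7: the 'a' in 'apples'
my $next_a  = index($sentence, 'a', $first_a + 1);   # 17: the 'a' in 'oranges'

print "found=$found missing=$missing first_a=$first_a next_a=$next_a\n";
```

Because 0 is a valid position, always compare the result against -1 rather than testing it for truth.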

On my computer, index() is 38% faster than the regex method; the exact figure will vary with the Perl version, the machine, and the input.
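
If you want to measure this yourself, a rough sketch using the core Benchmark module could look like the following. The wrapper subs has_apples_regex and has_apples_index are hypothetical names for this example, and this is not necessarily how the figure above was obtained.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sentence = 'I like apples, oranges, and bananas';

# Hypothetical wrappers so both approaches are measured the same way.
sub has_apples_regex { return $sentence =~ m/apples/ ? 1 : 0 }
sub has_apples_index { return index($sentence, 'apples') != -1 ? 1 : 0 }

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    regex => \&has_apples_regex,
    index => \&has_apples_index,
});
```

cmpthese prints a small table of rates and relative percentages. The subroutine call overhead is included in both measurements, so treat the numbers as a relative comparison only.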

What about string replacement? Is it possible to use Perl's built-in functions for that? For some simple tasks, it is. Let's say we have a sentence and we want to replace a specific string with another string. See the example below:


my $sentence='I like classic and jazz. Do you like jazz?';
my $string='jazz';
my $word='metal';


To replace the first occurrence of 'jazz' with 'metal', we usually use this command:

$sentence=~s/$string/$word/;
# now $sentence is 'I like classic and metal. Do you like jazz?'


We can also use the four-argument form of the substr function, as shown below:

substr($sentence, index($sentence,$string), length($string), $word);
# it gives the exact same result, provided $string occurs in $sentence
# (index() returns -1 when it does not, so check for that first)
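
The substr approach only behaves like s/// when the search string is actually present, and the s/// approach only behaves like substr when $string contains no regex metacharacters. The sketch below covers both points; replace_first is a hypothetical helper name, not a Perl built-in, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# replace_first is a hypothetical helper, not a Perl built-in.
sub replace_first {
    my ($text, $search, $replacement) = @_;
    my $pos = index($text, $search);
    return $text if $pos == -1;            # nothing to replace; leave the text alone
    substr($text, $pos, length($search), $replacement);
    return $text;
}

print replace_first('I like classic and jazz. Do you like jazz?', 'jazz', 'metal'), "\n";
print replace_first('I like classic and jazz.', 'blues', 'metal'), "\n";

# The regex form treats $string as a pattern; \Q...\E makes it literal.
my $price  = 'cost: $5.00 (approx)';
my $string = '(approx)';
(my $regex_result = $price) =~ s/\Q$string\E/[estimated]/;
print "$regex_result\n";
```

Without \Q...\E the parentheses in '(approx)' would be read as a capture group rather than literal characters, so the substitution would match 'approx' and leave the parentheses behind.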


The 'g' regex modifier replaces every occurrence of 'jazz' with 'metal':

$sentence=~s/$string/$word/g;
# $sentence is 'I like classic and metal. Do you like metal?'


Can this be accomplished with the substr function? It is possible, but perhaps you won't like it :)

(substr($sentence,index($sentence,$string), length($string),$word)) while(index($sentence,$string)!=-1);


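Since it is less readable than the regex method, you might want to put it inside a subroutine and call the subroutine whenever you need a simple string replacement like the one shown above. Be aware that this one-liner loops forever if $word itself contains $string (for example, replacing 'jazz' with 'jazzy'), because every pass finds the text it has just inserted.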
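
One possible shape for such a subroutine is sketched below. The name replace_all is hypothetical, and unlike the one-liner it remembers where the previous replacement ended, so it does not rescan text it has already inserted.

```perl
use strict;
use warnings;

# replace_all is a hypothetical helper name, not a Perl built-in.
sub replace_all {
    my ($text, $search, $replacement) = @_;
    return $text if $search eq '';         # an empty search string matches everywhere

    my $pos = 0;
    while (($pos = index($text, $search, $pos)) != -1) {
        substr($text, $pos, length($search), $replacement);
        $pos += length($replacement);      # skip past the text just inserted
    }
    return $text;
}

my $sentence = 'I like classic and jazz. Do you like jazz?';
print replace_all($sentence, 'jazz', 'metal'), "\n";
print replace_all($sentence, 'jazz', 'jazzy'), "\n";   # terminates even though 'jazzy' contains 'jazz'
```

The subroutine works on a copy and returns the new string, so the caller decides whether to overwrite the original, for example with $sentence = replace_all($sentence, $string, $word);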