And First There Was Love...
Regular expressions, also
known as "regex" by the geek community, are a powerful
tool used in pattern-matching and substitution. They are commonly
associated with almost all *NIX-based tools, including editors
like vi, scripting languages like Perl and PHP, and command-line
utilities like awk and sed. You'll even find them in client-side
scripting languages like JavaScript - kinda like Madonna, their
popularity cuts across languages and territorial boundaries...
A regular expression lets you build patterns using a set of
special characters; these patterns can then be compared with text
in a file, data entered into an application, or input from a form
filled in by users on a Web site. Depending on whether or not
there's a match, appropriate action can be taken, and appropriate
program code executed.
For example, one of the most common applications of regular
expressions is to check whether or not a user's email address, as
entered into an online form, is in the correct format; if it is,
the form is processed, whereas if it's not, a warning message
pops up asking the user to correct the error. Regular expressions
thus play an important role in the decision-making routines of
Web applications - although, as you'll see, they can also be used
to great effect in complex find-and-replace operations.
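To make that decision-making idea a little more concrete, here's a rough sketch in Python (picked purely for illustration; nothing in this article depends on it). The pattern and the helper names looks_like_email and handle_form are assumptions invented for this example, and the pattern is deliberately loose: it only checks for a "something@something.something" shape and is not a complete email validator.

```python
import re

# Hypothetical, deliberately loose pattern: some characters that are not
# "@" or whitespace, an "@", then a domain part containing at least one dot.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(address: str) -> bool:
    """Return True if the whole string fits the rough email shape."""
    return EMAIL_PATTERN.fullmatch(address) is not None


def handle_form(address: str) -> str:
    """Pick an action depending on whether the pattern matched."""
    if looks_like_email(address):
        return "form processed"
    return "warning: please correct your email address"


if __name__ == "__main__":
    for candidate in ["madonna@example.com", "madonna.example.com"]:
        print(candidate, "->", handle_form(candidate))
```

The special characters used in this pattern go beyond the ones covered on this page; the point here is only the match-then-decide flow.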
A regular expression usually looks something like this:
/love/
All this does is match
the pattern "love" in the text it's applied to. Like
many other things in life, it's simpler to get your mind around
the pattern than the concept - but then, that's neither here nor
there...
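If you'd like to try the /love/ pattern yourself, here's a minimal sketch using Python's re module (an assumption on our part; any language with regex support would do). The sample strings are made up for the example. Note that the pattern matches anywhere in the text, even inside a longer word.

```python
import re

# The /love/ pattern from above, written as a Python raw string.
pattern = re.compile(r"love")

samples = ["all you need is love", "gloves and mittens", "hate"]

for text in samples:
    # search() scans the whole string for the first place the pattern fits.
    found = pattern.search(text) is not None
    print(f"{text!r}: {'match' if found else 'no match'}")
```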
How about something a little more complex? Try this:
/fo+/
This would match the
words "fool", "footsie" and "four-seater".
And although it's a pretty silly example, you have to admit that
there's truth to it - after all, who but fools in love would play
footsie in a four-seater?
The "+" that you see above is the first of what are
called "meta-characters" - these are characters that
have a special meaning when used within a pattern. The
"+" meta-character is used to match one or more
occurrences of the preceding character - in the example above, the
letter "f" followed by one or more occurrences of the
letter "o".
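Here's a small sketch of the "+" meta-character in action, again assuming Python's re module. The helper matched_text is a hypothetical name introduced just for this example; it returns the exact piece of the word that the pattern grabbed, which shows that "+" takes as many "o"s as it can find.

```python
import re

# "f" followed by one or more "o" characters.
pattern = re.compile(r"fo+")


def matched_text(word):
    """Return the part of the word that the pattern matched, or None."""
    match = pattern.search(word)
    return match.group(0) if match else None


if __name__ == "__main__":
    for word in ["fool", "footsie", "four-seater", "fun"]:
        print(word, "->", matched_text(word))
```

A word like "fun" gets no match at all, since its "f" is not followed by at least one "o".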
Similar to the "+" meta-character, we have
"*" and "?" - these are used to match zero or
more occurrences of the preceding character, and zero or one
occurrence of the preceding character, respectively. So,
/eg*/
would match "easy",
"egocentric" and "egg"
while
/Wil?/
would match "Winnie",
"Wimpy", "Wilson" and "William",
though not "Wendy" or "Wolf".
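The two patterns above can be checked with a short sketch like this one (Python's re module is assumed, and the helper name matched_text is made up for the example). It reports which part of each word was matched, so you can see "*" happily settling for zero "g"s and "?" taking at most one "l".

```python
import re

# "e" followed by zero or more "g" characters.
eg_star = re.compile(r"eg*")
# "Wi" followed by zero or one "l" character.
wil_optional = re.compile(r"Wil?")


def matched_text(pattern, word):
    """Return the part of the word that the pattern matched, or None."""
    match = pattern.search(word)
    return match.group(0) if match else None


if __name__ == "__main__":
    for word in ["easy", "egocentric", "egg"]:
        print(word, "->", matched_text(eg_star, word))
    for word in ["Winnie", "Wimpy", "Wilson", "William", "Wendy", "Wolf"]:
        print(word, "->", matched_text(wil_optional, word))
```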
In case all this seems a little too imprecise, you can also
specify a range for the number of matches. For example, the
regular expression
/jim{2,6}/
would match "jimmy"
and "jimmmmmy!", but not "jim". The numbers
in the curly braces represent the lower and upper limits on how
many times the preceding character (here, "m") may repeat; you can
leave out the upper limit, as in {2,}, for an open-ended range
match.
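Range matches are easy to experiment with, too. The sketch below assumes Python's re module once more, and the helper matched_text is a hypothetical name used only here. It compares the bounded {2,6} range with the open-ended {2,} form on a name containing more "m"s than the upper limit allows.

```python
import re

# "ji" followed by between two and six "m" characters.
bounded = re.compile(r"jim{2,6}")
# "ji" followed by two or more "m" characters (no upper limit).
open_ended = re.compile(r"jim{2,}")


def matched_text(pattern, word):
    """Return the part of the word that the pattern matched, or None."""
    match = pattern.search(word)
    return match.group(0) if match else None


if __name__ == "__main__":
    long_name = "ji" + "m" * 8 + "y"  # eight "m"s, more than the upper limit
    for word in ["jim", "jimmy", long_name]:
        print(word, "->", matched_text(bounded, word),
              "|", matched_text(open_ended, word))
```

With eight "m"s, the bounded pattern stops after the sixth one, while the open-ended pattern takes all eight.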