Of Carrots, Bombshells And Four-Figure Incomes
Now that you've got the
basics down, how about taking it to the next level? It's also
possible to search for white space, numbers and alphabetic
characters with a regular expression - and here's the merry gang
of meta-characters that will help you do just that:
\s = used to match a single white space character, including tabs
and newline characters
\S = used to match any single character that is *not* a white space
character
\d = used to match a single digit from 0 to 9
\w = used to match a single "word" character - a letter, digit or underscore
\W = used to match any single character that \w does not match
. = used to match any single character except the newline character
Now, you're probably thinking, "Hey, that's great - but what
does it all mean?!". Well, suppose you wanted to find all
the white space in a document...
/\s+/
Easy, isn't it? If you're looking only for digits, try
/\d/
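If you'd like to try these out for yourself, here's a minimal sketch in Python. The choice of Python and the sample string are assumptions made purely for illustration; the patterns themselves behave the same way in most regex flavours.

```python
import re

# Hypothetical sample text, made up for this illustration
text = "Invoice 42:\tdue in 7 days\n"

# \s+ matches each run of one or more white space characters
whitespace_runs = re.findall(r"\s+", text)

# \d matches one digit at a time
digits = re.findall(r"\d", text)

# \w+ matches each run of letters, digits and underscores
words = re.findall(r"\w+", text)

print(whitespace_runs)  # [' ', '\t', ' ', ' ', ' ', '\n']
print(digits)           # ['4', '2', '7']
print(words)            # ['Invoice', '42', 'due', 'in', '7', 'days']
```

Notice that \d on its own picks up the "4" and the "2" separately, since each match is exactly one character long.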
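So, if you had a complex financial spreadsheet in front of you, and you wanted to quickly find all amounts of a thousand dollars or more, you could look for four digits in a row with
/\d\d\d\d/
And if you only cared about round thousands like 1000 or 25000, you could use
/\d000/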
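It's worth checking exactly which amounts a pattern like this picks up. The sketch below, again in Python and with an invented list of amounts, compares /\d000/ with four digits in a row, /\d\d\d\d/.

```python
import re

# Hypothetical amounts, invented for this example
amounts = ["850", "1000", "1500", "25000", "999"]

# \d000 needs a digit followed by three literal zeros
round_thousands = [a for a in amounts if re.search(r"\d000", a)]

# \d\d\d\d needs any four digits in a row, so it fits any number
# that is at least four digits long
four_or_more = [a for a in amounts if re.search(r"\d\d\d\d", a)]

print(round_thousands)  # ['1000', '25000']
print(four_or_more)     # ['1000', '1500', '25000']
```

So /\d000/ skips an amount like 1500, while four digits in a row catches it. Neither pattern would cope with amounts written with separators, such as 1,500.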
How about limiting your
search to the beginning or end of a string? Well, that's why we
have "pattern anchors" - these simply tie your regular
expression to either the first or last character of the string,
and come in very useful when you're looking for a way to filter
through a mass of matches.
There are two basic pattern anchors - the first one is
represented by a caret (^), and is used to indicate that the
expression should be matched only at the beginning of the string
that it is applied to. For example, the expression
/^hell/
will return a match only
if the string it is applied to begins with "hell" - so it matches
"hello" and "hellhound", but not "shell".
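Here's a quick way to check that for yourself, sketched in Python (an assumption for illustration, with each word tested as a separate string):

```python
import re

# Each list item is treated as its own string
words = ["hello", "hellhound", "shell"]

# ^hell only matches when "hell" sits at the very start of the string
starts_with_hell = [w for w in words if re.search(r"^hell", w)]

print(starts_with_hell)  # ['hello', 'hellhound']
```

Drop the caret and "shell" would match as well, since "hell" appears inside it.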
And similarly, to match the end of a string, there's the "$"
pattern anchor. So
/ar$/
would match "scar",
"car" and "bar", though not "art",
"army" or "arrow".
There's also a related meta-character that anchors your
expression to a word rather than to the whole string - \b. This
is used to check that the regex matches at a word boundary (the
point where a \w character meets a \W character or the edge of
the string), and it can be placed either at the beginning or end
of the pattern to be matched - like this:
/\bbom/
This would match both "bombay" and "bombshell", while
/man\b/
would match "human",
"woman" and "man", though not "manitou"
or "mannequin". And the converse of this is \B, which
matches everywhere except at a word boundary.
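To see how \b and \B behave, and how they differ from the "$" anchor, here's a small sketch in Python. The language, the sample sentences and the trailing \w* (added only so that the whole word is returned) are assumptions for illustration.

```python
import re

words = ["human", "woman", "man", "manitou", "mannequin"]

# man\b: "man" must be followed by a word boundary
ends_word = [w for w in words if re.search(r"man\b", w)]

# man\B: "man" must be followed by another word character
mid_word = [w for w in words if re.search(r"man\B", w)]

print(ends_word)  # ['human', 'woman', 'man']
print(mid_word)   # ['manitou', 'mannequin']

# \bbom only matches "bom" at the start of a word, wherever that
# word sits in the string
sentence = "a bombshell from bombay, not an abomination"
bom_words = re.findall(r"\bbom\w*", sentence)
print(bom_words)  # ['bombshell', 'bombay']

# $ ties the match to the end of the string; \b to the end of any word
phrase = "the car by the bar"
at_string_end = [m.start() for m in re.finditer(r"ar$", phrase)]
at_word_ends = [m.start() for m in re.finditer(r"ar\b", phrase)]
print(at_string_end)  # [16]
print(at_word_ends)   # [5, 16]
```

The last pair shows the difference: "$" finds only the "ar" in "bar", at the very end of the string, while \b finds the one in "car" too.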