Of Carrots, Bombshells And Four-Figure Incomes
Now that you've got the basics down, how about taking it to the next level? It's also possible to search for white space, numbers and alphabetic characters with a regular expression - and here's the merry gang of meta-characters that will help you do just that:

\s = used to match a single white space character, including tabs and newline characters

\S = used to match any single character that is *not* a white space character

\d = used to match a single digit from 0 to 9

\w = used to match a single letter, number or underscore

\W = used to match any single character that does not match \w

. = used to match any single character except the newline character
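To see the gang in action, here's a minimal sketch using Python's re module (the choice of Python is an assumption - the patterns themselves carry over to most regex flavours). The sample string and variable names are made up for illustration. One caveat: in Python, \d and \w also match non-ASCII digits and letters unless you pass the re.ASCII flag.

```python
import re

# Made-up sample containing a space, a comma, a tab and a trailing newline.
sample = "Room 42,\tfloor_3\n"

# Each meta-character matches ONE character; findall collects every hit.
whitespace = re.findall(r"\s", sample)       # [' ', '\t', '\n']
non_whitespace = re.findall(r"\S", sample)
digits = re.findall(r"\d", sample)           # ['4', '2', '3']
word_chars = re.findall(r"\w", sample)
non_word_chars = re.findall(r"\W", sample)   # [' ', ',', '\t', '\n']
any_but_newline = re.findall(r".", sample)   # everything except the final '\n'

print("".join(word_chars))      # Room42floor_3
print("".join(non_whitespace))  # Room42,floor_3
```

Note how \w and \W split the string cleanly between them, as do \s and \S.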

Now, you're probably thinking, "Hey, that's great - but what does it all mean?!" Well, suppose you wanted to find all the white space in a document...


/\s+/

Easy, isn't it? If you're looking only for digits, try


/\d/
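Here's a rough sketch of those two patterns at work, again assuming Python's re module; the sample text and variable names are invented for the example. It also shows what happens when you bolt a + onto \d.

```python
import re

# Invented sample: two spaces after "Paid", a tab before "on".
text = "Paid  250 dollars\ton day 7"

# \s+ matches each run of one or more white space characters.
gaps = re.findall(r"\s+", text)

# \d on its own matches one digit at a time.
single_digits = re.findall(r"\d", text)

# Adding + groups consecutive digits into whole numbers.
whole_numbers = re.findall(r"\d+", text)

print(gaps)           # ['  ', ' ', '\t', ' ', ' ']
print(single_digits)  # ['2', '5', '0', '7']
print(whole_numbers)  # ['250', '7']
```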

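So, if you had a complex financial spreadsheet in front of you, and you wanted to quickly find all amounts of a thousand dollars or more, you could look for four digits in a row with /\d\d\d\d/. And if only the round thousands (1000, 5000 and so on) interest you, you could use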


/\d000/
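It's worth checking exactly what this pattern catches. The sketch below (Python's re module assumed, with a made-up list of amounts) compares /\d000/ against four digits in a row, /\d\d\d\d/. The first only fires when a digit is followed by three zeros, so it skips figures like 1500.

```python
import re

# Hypothetical amounts, written without currency symbols or commas.
amounts = ["1000", "5000", "1500", "999", "25000", "4999"]

# A digit followed by exactly the characters "000".
round_thousands = [a for a in amounts if re.search(r"\d000", a)]

# Any four digits in a row, i.e. a number with four or more figures.
four_or_more_digits = [a for a in amounts if re.search(r"\d\d\d\d", a)]

print(round_thousands)      # ['1000', '5000', '25000']
print(four_or_more_digits)  # ['1000', '5000', '1500', '25000', '4999']
```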

How about limiting your search to the beginning or end of a string? Well, that's why we have "pattern anchors" - these simply tie your regular expression to either the first or last character of the string, and come in very useful when you're looking for a way to filter through a mass of matches.

There are two basic pattern anchors - the first one is represented by a caret (^), and is used to indicate that the expression should be matched only at the beginning of the string that it is applied to. For example, the expression


/^hell/

will return a match only if the string itself begins with "hell" - so it matches "hello" and "hellhound", but not "shell".
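A quick sketch of the caret at work, assuming Python's re module (the word list is illustrative). The last line shows that the anchor is tied to the start of the whole string, not to the start of each word in it.

```python
import re

words = ["hello", "hellhound", "shell", "hell"]

# ^ pins the match to the very first character of the string.
starts_with_hell = [w for w in words if re.search(r"^hell", w)]
print(starts_with_hell)  # ['hello', 'hellhound', 'hell']

# "hello" is a word here, but it is not at the start of the string.
mid_string_match = re.search(r"^hell", "say hello")
print(mid_string_match)  # None
```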

And similarly, to match the end of a string, there's the "$" pattern anchor. So


/ar$/

would match "scar", "car" and "bar", though not "art", "army" or "arrow".
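The same idea in a short sketch, assuming Python's re module with an illustrative word list. One Python-specific wrinkle is included as a caveat: there, $ also matches just before a single trailing newline, which may or may not hold in other regex flavours.

```python
import re

words = ["scar", "car", "bar", "art", "army", "arrow"]

# $ pins the match to the end of the string.
ends_with_ar = [w for w in words if re.search(r"ar$", w)]
print(ends_with_ar)  # ['scar', 'car', 'bar']

# In Python, $ also matches right before a trailing newline.
matches_before_newline = re.search(r"ar$", "car\n") is not None
print(matches_before_newline)  # True
```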

There's also a close cousin of the pattern anchors - the \b meta-character. This matches at a word boundary, which is the position between a \w character and a \W character, or between a \w character and the start or end of the string. Unlike ^ and $, it can match anywhere in the string, and it can be placed either at the beginning or end of the pattern to be matched - like this:


/\bbom/

This would match both "bombay" and "bombshell", while


/man\b/

would match "human", "woman" and "man", though not "manitou" or "mannequin". And the converse of this is \B, which matches everywhere except at a word boundary.
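Here's a small sketch of \b and \B side by side, assuming Python's re module; the word list and sample sentence are invented. The last two checks show the difference between a word boundary and the start of the string.

```python
import re

words = ["human", "woman", "man", "manitou", "mannequin"]

# man\b: "man" must be followed by a word boundary (here, the end of the word).
ends_in_man = [w for w in words if re.search(r"man\b", w)]
print(ends_in_man)  # ['human', 'woman', 'man']

# man\B: "man" must NOT be followed by a word boundary.
man_then_more = [w for w in words if re.search(r"man\B", w)]
print(man_then_more)  # ['manitou', 'mannequin']

# \b matches at the start of any word, even in the middle of a string...
boundary_hit = re.search(r"\bbom", "drop the bombshell") is not None
# ...whereas ^ only matches at the start of the string itself.
caret_hit = re.search(r"^bom", "drop the bombshell") is not None
print(boundary_hit, caret_hit)  # True False
```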
