Sometimes metacharacters such as ^ or $ need to appear in the string to be searched for as literal characters, rather than with their special meaning in regular expression syntax. To do so, we escape them with a backslash. If a backslash itself has to be matched, it must be escaped with another backslash (two backslashes: \\).
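As a minimal sketch of the escaping rule above, the following compares an unescaped and an escaped pattern. The sample strings and variable names are assumptions made for illustration only.

```javascript
// Unescaped, "$" is an end-of-string anchor and "." matches any character.
var unescaped = /$5.00/;
// Escaped, both characters are matched literally.
var escaped = /\$5\.00/;

var price = "Total: $5.00";
var unescapedResult = unescaped.test(price); // false: nothing can follow the end anchor
var escapedResult = escaped.test(price);     // true

// A literal backslash is written as two backslashes in the pattern.
// The string literal below also needs "\\" to hold a single backslash.
var path = "C:\\temp";
var hasBackslash = /\\/.test(path);          // true
```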
Anything enclosed in square brackets, [ and ], is a character class: a set of characters to which a matched character must belong. Please note that the expression in the square brackets matches only a single character.
which means any vowel.
which matches "1" and "3" but not "a" or "6".
We can also describe a range, or set of ranges with the special hyphen character:
Besides, we can use sets to specify that a character cannot be a member of a set.
The caret symbol means "not" when it is placed inside the square brackets. As we have seen previously, it has a different meaning when it's used outside, anchoring the beginning of a string.
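The character class rules above can be tried out with a short sketch like the one below. The patterns and test strings are illustrative assumptions, not examples taken from the original article.

```javascript
var vowel = /[aeiou]/;           // any one lowercase vowel
var lowerOrDigit = /[a-z0-9]/;   // two ranges combined in one class
var notDigit = /[^0-9]/;         // inside the brackets, ^ means "not"
var startsWithDigit = /^[0-9]/;  // outside the brackets, ^ anchors the start

var results = {
  vowelInSky: vowel.test("sky"),            // false
  vowelInSea: vowel.test("sea"),            // true
  rangeInABC: lowerOrDigit.test("ABC"),     // false
  rangeInAB3: lowerOrDigit.test("AB3"),     // true
  nonDigitIn2024: notDigit.test("2024"),    // false
  nonDigitIn20x4: notDigit.test("20x4"),    // true
  digitFirst7up: startsWithDigit.test("7up"), // true
  digitFirstUp7: startsWithDigit.test("up7")  // false
};
```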
Often, it's useful to specify that there might be multiple occurrences of a particular string. We can represent this using the following special characters: "?", "+" and "*". Specifically, "?" means that the preceding character is optional, "+" means one or more of the previous character, while "*" means zero or more of the previous character.
Sometimes it's useful to split an expression into subexpressions so that we can represent, for example, "at least one of these strings followed by exactly one of those." We can achieve this using parentheses combined with the special characters ?, + and *, exactly as we would group terms in an arithmetic expression. In the examples below, ^ and $ anchor the pattern so that the whole string must match:
^(good )?computer$ // Matches "good computer" and "computer", but not "good good computer".
^(good )+computer$ // Matches "good computer" and "good good computer" but not "computer".
^(good )*computer$ // Matches "computer" and "good good computer" but not "good computers".
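To see the difference between quantifying a single character and quantifying a parenthesized group, here is a small sketch. The words used are assumptions chosen for illustration.

```javascript
// "?" applies to the single preceding character "u".
var colour = /^colou?r$/;
// "+" applies to the single preceding character "o".
var longO = /^no+$/;
// "+" applies to the whole group "ha".
var laugh = /^(ha)+$/;

var quantifierResults = {
  color: colour.test("color"),     // true
  colour: colour.test("colour"),   // true
  colouur: colour.test("colouur"), // false: "u" may appear at most once
  no: longO.test("no"),            // true
  nooo: longO.test("nooo"),        // true
  n: longO.test("n"),              // false: at least one "o" is required
  hahaha: laugh.test("hahaha"),    // true
  hah: laugh.test("hah")           // false: the group must repeat as a unit
};
```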
Regular expressions in JavaScript - Counted Subexpressions
We can specify how many times something can be repeated by using a numerical expression in curly braces ({ }). We can define an exact number of repetitions ({3} means exactly 3 repetitions), a range of repetitions ({2,4} means from 2 to 4 repetitions), or an open-ended range of repetitions ({2,} means at least two repetitions).
For example,
(computer){1,3} // Matches "computer", "computercomputer" and "computercomputercomputer". Without the parentheses, computer{1,3} would repeat only the final "r".
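A brief sketch of the three counted forms follows. The anchors and the digit patterns are assumptions added so that the results are unambiguous.

```javascript
var exactlyThree = /^\d{3}$/;   // exactly 3 digits
var twoToFour = /^\d{2,4}$/;    // from 2 to 4 digits
var atLeastTwo = /^\d{2,}$/;    // 2 or more digits

// The braces apply only to the item immediately before them.
var lastCharOnly = /^computer{1,3}$/; // repeats just the final "r"
var wholeWord = /^(computer){1,3}$/;  // repeats the whole group

var countedResults = {
  three: exactlyThree.test("123"),              // true
  four: exactlyThree.test("1234"),              // false
  rangeLow: twoToFour.test("1"),                // false
  rangeHigh: twoToFour.test("1234"),            // true
  open: atLeastTwo.test("1234567"),             // true
  extraR: lastCharOnly.test("computerrr"),      // true
  doubledChar: lastCharOnly.test("computercomputer"), // false
  doubledWord: wholeWord.test("computercomputer")     // true
};
```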
Branching
Another useful option in building regular expressions is to represent choices for a string. This is done with a vertical pipe (|).
For example, if we want to match several domains, such as com, edu or net, the following expression would be used:
(com)|(edu)|(net)
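As a hedged illustration of branching, the sketch below checks the ending of a host name. The host names are made up, and the second pair of patterns shows why grouping matters when anchors are combined with the pipe.

```javascript
// Matches a host name ending in .com, .edu or .net.
var domain = /\.(com|edu|net)$/;

// Without parentheses, the pipe splits the whole pattern:
// this means "starts with com" OR "ends with edu".
var loose = /^com|edu$/;
// With parentheses, the anchors apply to both alternatives.
var strict = /^(com|edu)$/;

var branchResults = {
  edu: domain.test("example.edu"),       // true
  org: domain.test("example.org"),       // false
  looseComputer: loose.test("computer"), // true: it starts with "com"
  strictComputer: strict.test("computer"), // false
  strictEdu: strict.test("edu")          // true
};
```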
Summary of special characters
Here are a few special characters that can be used for matching characters in regular expressions:
\n // a newline character
. // any character except a newline
\r // a carriage return character
\t // a tab character
\b // a word boundary (the start or end of a word)
\B // anything but a word boundary.
\d // any digit (same as [0-9])
\D // anything but a digit (same as [^0-9])
\s // single whitespace (space, tab, newline, etc.)
\S // single nonwhitespace.
\w // a "word character" (same as [0-9a-zA-Z_])
\W // a "nonword character" (same as [^0-9a-zA-Z_])
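A few of the special characters listed above can be seen at work in the following sketch. The sample text is an assumption, and the example uses the match() and test() methods ahead of their introduction in the article.

```javascript
var text = "Order 66 shipped\ton 2024-05-01";

var digits = text.match(/\d+/g);  // every run of digits
var words = text.match(/\w+/g);   // every run of word characters
var hasTab = /\t/.test(text);     // true: there is a tab before "on"
var hasSpace = /\s/.test(text);   // true: spaces and tabs both count
// "ship" only appears as part of "shipped", so no boundary follows it.
var shipAsWord = /\bship\b/.test(text);       // false
var shippedAsWord = /\bshipped\b/.test(text); // true
```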
Of course, there are more special characters and tips for regular expressions, generally well covered in any complete reference. For the sake of brevity, this list is good enough for this article. Since JavaScript's regular expression syntax is modeled on Perl's, most guides focused on Perl regular expressions largely apply to JavaScript too, although some Perl features are not available.
Now, with all of the basics covered, we'll see how we can add the power of regular expressions to our JavaScript code, making our developer life a lot easier and expanding our background a little bit more.
Regular expressions in JavaScript - Using regular expressions in JavaScript
Using regular expressions in JavaScript is very easy, yet the feature is often passed over by people who don't know that it can be done, or by developers who argue that parsing regular expressions slows down client-side applications. Whatever the reasons, let's see how we can create a regular expression in JavaScript:
var re = /regexp/;
where regexp is the regular expression itself. Extending the concept to our first example presented in the basics section, let's build one that detects the string "JavaScript":
var re = /JavaScript/;
By default, JavaScript regular expressions are case sensitive and only search for the first match in any given string. We can change this behavior with the g and i modifiers (g for global and i for case insensitive). By appending the modifiers after the last /, we can make a regular expression search for all matches in the string and ignore case. Once again, let's see some examples to properly understand these concepts.
Given the string "example1 Example2 EXAMPLE3", the following regular expressions match as listed below:
/Example[0-9]+/ // Matches "Example2"
/Example[0-9]+/i // Matches "example1"
/Example[0-9]+/gi // Matches "example1", "Example2" and "EXAMPLE3"
As seen in the previous examples, the "i" and "g" modifiers noticeably increase the matching capabilities of regular expressions. So don't forget they exist when you code your next script.
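The three cases above can be reproduced with a short sketch. It relies on the string method match(), and the variable names are assumptions.

```javascript
var text = "example1 Example2 EXAMPLE3";

// No modifiers: case sensitive, first match only.
var first = text.match(/Example[0-9]+/);
// i modifier: ignore case, still first match only.
var firstAnyCase = text.match(/Example[0-9]+/i);
// g and i together: every match, ignoring case.
var all = text.match(/Example[0-9]+/gi);

var firstText = first[0];               // "Example2"
var firstAnyCaseText = firstAnyCase[0]; // "example1"
var allText = all.join(", ");           // "example1, Example2, EXAMPLE3"
```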
Applying methods to JavaScript strings
Using a regular expression is easy. JavaScript provides four main methods for working with regular expressions: match(), replace(), search() and test(). The first three are string methods, while the last one is a method of the regular expression object.
We'll see in turn how they work.
Regular expressions in JavaScript - The match() method
The match() method takes a regular expression as a parameter. Without the g modifier, it returns an array describing the first match (the matched text followed by any parenthesized subexpressions); with the g modifier, it returns an array of all the matching strings found in the string. If no matches are found, match() returns null. Let's say we want to check the proper format for a phone number entered by a user, with the form of (XXX) XXX-XXXX. The code listed below does that:
function checkPhone( phone ) {
    var phoneRegex = /^\(\d\d\d\) \d\d\d-\d\d\d\d$/;
    if( !phone.match( phoneRegex ) ) {
        alert( 'Please enter a valid phone number' );
        return false;
    }
    return true;
}
Let's break down the code to understand how it works. First, we define a function that will check if the phone number entered has a valid format. Next, we declare the regular expression to define our pattern. It begins with ^, to indicate that any match must begin at the start of the string. Then we have \(, which will match the opening parenthesis. As seen previously, the character is escaped with a backslash to remove its special meaning in regular expression syntax. As mentioned, \d is a special code that matches any digit. The expression \d\d\d matches any three digits (the same effect is achieved with [0-9][0-9][0-9]).
The rest of the pattern is pretty easy to understand. \) matches the closing parenthesis, the space matches the space that follows it in the phone number, then \d\d\d-\d\d\d\d matches any three digits followed by a dash, and then any four digits. Finally, the $ indicates that any match must end at the end of the string.
It's possible to shorten the regular expression as follows:
phoneRegex = /^\(\d{3}\) \d{3}-\d{4}$/;
Now that we have seen the regular expression pattern in detail, let's see how our function works. It checks whether or not the string contained in phone, passed as a parameter, matches our regular expression. If it does, match() returns an array, which JavaScript evaluates as true in a condition. Otherwise it returns null, which evaluates as false, so the function displays the error message to the user. This kind of function is commonly used to validate user input data coming from HTML forms, chaining several specific functions to check if data entered is valid or not.
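To make the return values of match() concrete, here is a minimal sketch without any alert() calls. The helper name isValidPhone and the extra parentheses that capture each part of the number are assumptions added for illustration.

```javascript
// Hypothetical helper: true when the text looks like (XXX) XXX-XXXX.
function isValidPhone(phone) {
  var phoneRegex = /^\(\d{3}\) \d{3}-\d{4}$/;
  return phone.match(phoneRegex) !== null;
}

// Parentheses around each part make match() return the pieces as well.
var parts = "(123) 456-7890".match(/^\((\d{3})\) (\d{3})-(\d{4})$/);
var whole = parts[0];     // the full matched text
var areaCode = parts[1];  // first parenthesized subexpression

// When nothing matches, match() returns null, which is falsy.
var noMatch = "123-456-7890".match(/^\(\d{3}\) \d{3}-\d{4}$/);
```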
Here is an example:
First, the JavaScript code located in the HEAD section (or even better, in a separate .js file):
<script type="text/javascript">
var validateForm = function() {
    if ( checkPhone( this.phone, 'Please enter a valid phone number' ) ) {
        return true;
    }
    return false;
};
var checkPhone = function( field, errorMsg ) {
    var phoneRegex = /^\(\d{3}\) \d{3}-\d{4}$/;
    // field is the form input, so we test its value
    if( !field.value.match( phoneRegex ) ) {
        alert( errorMsg );
        field.focus();
        field.select();
        return false;
    }
    return true;
};
// Wait until the page has loaded, so that the form exists when we look it up.
window.onload = function() {
    var signupForm = document.forms[0]; // assumes that it's the first form present in the document
    signupForm.onsubmit = validateForm;
};
</script>
<form action="signup.htm">
<p>Phone number ( e.g. (123) 456-7890):<input type="text" name="phone" /></p>
<p><input type="submit" value="send" /></p>
</form>
The user will be unable to submit this form unless a valid phone number has been entered. If the number format is not valid, an error message will be displayed (the one our validateForm function passes to checkPhone).
As stated above, it's easy to add more functionality to our validateForm() function. If we want to apply more than one check to the form, we can embed several calls to specific functions to perform particular validation, achieving something like this:
validateForm = function() {
    if ( checkPhone( this.phone, 'Please enter a valid phone number' ) && checkEmail( this.email, 'Please enter a valid email address' ) ) {
        return true;
    }
    return false;
}
The code is very compact and is separated completely from the HTML.
Next, it's time to see another useful JavaScript method for working with regular expressions: the replace() method.
Regular expressions in JavaScript - The replace() method
As you might suppose, the replace() method replaces matches to a given regular expression with some new string. For a simple example, let's say we want to replace every newline character (\n) with a <br /> tag in a form field used for comments, so that the content displays properly.
For example:
comment = document.forms[0].comments.value; // assumes that our HTML form is the first one present in the document, and it has a field named "comments"
comment = comment.replace(/\n/g, "<br />");
Pretty simple, right? The first parameter is the regular expression we're searching for (please note the g modifier indicating that it will do a global search, so it will find all of the occurrences in the string, not just the first). The second parameter is the string with which we want to replace any matches (in this case, the <br /> tag).
Let's wrap the above code into a simple function:
function formatField( fieldValue ) {
    return fieldValue.replace(/\n/g, "<br />");
}
The function accepts any string as a parameter, and returns the new string with all of the newline characters replaced by <br /> tags.
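The effect of the g modifier on replace() can be checked with a small sketch like this one. The sample comment text is an assumption, and the last pattern, which also accepts Windows-style line endings, is an addition not covered in the article.

```javascript
var comment = "line one\nline two\nline three";

// Without g, only the first newline is replaced.
var firstOnly = comment.replace(/\n/, "<br />");
// With g, every newline is replaced.
var all = comment.replace(/\n/g, "<br />");

// Optional carriage return before the newline (an assumption about the input).
var windowsText = "a\r\nb";
var windowsResult = windowsText.replace(/\r?\n/g, "<br />");
```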
In a moment, we'll see another useful method used with regular expressions: the search() method.
The search() method
The search() method is very similar to the indexOf() method, with the difference being that it takes a regular expression as a parameter instead of a string. It searches the string for the first match to the given regular expression and returns an integer that indicates the position in the string (strings in JavaScript are zero-indexed, so it would return 0 if the match is at the start of the string, 5 if the match begins at the sixth character, and so on). If no match is found, the method will return -1.
Let's say we wish to know the location of the first absolute link within an HTML document. We might code something like this:
pos = htmlString.search(/<a href="http:\/\//i);
if ( pos != -1 ) {
    alert( 'First absolute link found at position ' + pos + '.' );
}
else {
    alert( 'Absolute links not found' );
}
It's very simple and not especially useful, but good enough for example purposes.
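Here is a sketch of search() that runs without a browser. The HTML fragment is made up for illustration, and the pattern is left unanchored so that the link can be found anywhere in the string.

```javascript
var htmlString = '<p>See <a href="/about">about</a> or ' +
                 '<a href="http://example.com">this site</a>.</p>';

// Position of the first absolute link; the relative link before it is skipped.
var pos = htmlString.search(/<a href="http:\/\//i);

// The i modifier lets the same pattern find upper-case markup too.
var upperPos = '<A HREF="http://example.com">x</A>'.search(/<a href="http:\/\//i);

// No match yields -1.
var missing = htmlString.search(/<a href="ftp:\/\//i);
```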
So far, all methods described here work by accepting a regular expression as the parameter. Now, let's take a detailed look at our final method: test(), which is by far the most used to perform client-side validation when using regular expressions in JavaScript.
Regular expressions in JavaScript - The test() method
The test() method is somewhat different from the rest, as we'll see shortly. In JavaScript, when a pattern is defined following the syntax previously described, we are actually creating a new object, called a "regular expression object". I don't intend to go deeply into object programming concepts here. All we need to know is that this object has its own test() method, which allows us to check a given string against the pattern.
The test() method takes a given string as a parameter and looks for matches according to the pattern defined within the regular expression object itself. If any matches are found, it will return true. If no matches are found, then it will return false. Let's see an example to explain how this method works:
emailpat = /^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9])+(\.[a-zA-Z0-9_-]+)+$/;
if( !emailpat.test( emailString ) ) {
    alert( 'Please enter a valid email address' );
}
First, we have defined a regular expression object that represents a simplified, commonly used format of an email address. Then, we use the test() method to check for any matches to the email string passed as a parameter. If there are no matches, the error message will be displayed to the user.
We can easily build a function to check for email address validity, as we have seen so many times:
function validateEmail( emailField, errorMsg ) {
    var emailpat = /^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9])+(\.[a-zA-Z0-9_-]+)+$/;
    if( !emailpat.test( emailField.value ) ) {
        alert( errorMsg );
        emailField.focus();
        emailField.select();
        return false;
    }
    return true;
}
To validate an email address, we should call the function as:
validateEmail( this.email, 'Please enter a valid email address' );
where "this.email" represents the form field named "email".
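The behavior of test() with the article's email pattern can be checked outside a form with a sketch like the following. The helper name isValidEmail and the sample addresses are assumptions, and the pattern is a simplified check rather than a complete description of valid addresses.

```javascript
var emailpat = /^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9])+(\.[a-zA-Z0-9_-]+)+$/;

// Hypothetical helper: test() is called on the pattern, with the string as argument.
function isValidEmail(email) {
  return emailpat.test(email);
}

var emailResults = {
  normal: isValidEmail("jane.doe@example.com"),   // true
  noDot: isValidEmail("jane.doe@example"),        // false: no ".something" after the host
  noLocalPart: isValidEmail("@example.com"),      // false: nothing before the @
  withSpace: isValidEmail("jane doe@example.com") // false: spaces are not allowed
};
```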
Summing it up
Having described the most common methods used with regular expressions, we can appreciate that they are not as intimidating as they seem. What's more, we took a deeper look at their powerful capabilities for client-side validation, since they are an invaluable tool for verifying user input. By taking advantage of regular expressions in JavaScript, that verification can be done without making any requests to the server.
Validating user input prior to its being submitted is a good way to make sure that data are, at least, well formatted. However, JavaScript cannot be used on its own for complete validation, since it can be disabled in most browsers and offers limited control over user data. Server-side validation is always the best resource for complete and effective control.