NSRegularExpression Tutorial and Cheat Sheet
An NSRegularExpression tutorial that shows you how to search, replace, and validate data in your app. Includes a handy NSRegularExpression cheat sheet PDF! By Soheil Azarpour.
Update 3/11/15: Updated for Xcode 6.2.
A regular expression (commonly known as a “regex”) is a string or a sequence of characters that specifies a pattern. Think of it as a search string — but with super powers!
A plain old search in a text editor or word processor will allow you to find simple matches. A regular expression can also perform these simple searches, but it takes things a step further and lets you search for patterns, such as two digits followed by a letter, or three letters followed by a hyphen.
This pattern matching allows you to do useful things like validate fields (phone numbers, email addresses), check user input, perform advanced text manipulation and much much more.
If you have been eager to know more about using regular expressions in iOS, look no further than this tutorial — no previous experience required!
By the end of this NSRegularExpression tutorial you will have implemented code to search for patterns in text, replace those matches with whatever you wish, validate user input information and find and highlight some complex strings in a block of text.
In addition, I’ll give you a handy NSRegularExpression Cheat Sheet PDF that you can print out and use as reference as you’re developing!
Without further ado, it’s time to start crunching some regular expressions.
/The (Basics|Introduction)/
Note: If you’re already familiar with regular expressions, feel free to skip ahead to the next section.
If you are new to regular expressions and are wondering what all the hype is about, here’s a simple definition: a regular expression is a simple string that can describe a large number of possibilities in a concise notation. There are many awesome books and tutorials written about regular expressions; you’ll find a short list of them at the end of this tutorial.
Examples
Let’s start with a few brief examples to show you what regular expressions look like.
Here’s an example of a regular expression that matches the phrase “NSRegularExpression”:
NSRegularExpression
That’s about as simple as regular expressions get. You can use some APIs that are available in iOS to search a string of text for any part that matches this regular expression – and once you find a match, you can find where it is, or replace the text, etc.
Here’s a slightly more complicated example – this one matches the phrase “NSRegularExpression” or “NSRegularExpressions”:
NSRegularExpression(s)?
This is an example of using some special characters that are available in regular expressions. The parentheses create a group, and the question mark says “match the previous element (the group in this case) 0 or 1 times”.
Now let’s go for a really complex example. This one matches a pair of opening and closing HTML or XML tags, along with the content between them:
<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>
Wow, looks complicated, eh? :] Don’t worry, you’ll be learning about all the special characters in this regular expression in the rest of this tutorial, and by the time you’re done you should be able to understand how this works! :]
Testing Regular Expressions
In this tutorial, you’ll be creating a lot of regular expressions. If you want to try them out visually as you’re working with them, check out regexpal, a web-based regular expression parser. Enter a regular expression in the top field, enter some text in the bottom field, and the matches in the searched text will automatically highlight.
Load up regexpal and try out the above example expressions one at a time. Here’s some good sample text to use:
NSRegularExpression tutorial or NSRegularExpressions tutorial. And here's an <strong>html</strong> tag.
Pretty handy, eh? It’s great to see regular expressions in action, so you can test out your own regular expressions as you’re working with them.
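If you would rather try the three example expressions in code, here is a minimal sketch. It assumes a recent Swift toolchain (Swift 5 syntax, not the Xcode 6.2 syntax this tutorial was written against), and findMatches(of:in:) is a hypothetical helper written for this illustration, not a Foundation API.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): returns every substring of
// `text` that `pattern` matches, in order of appearance.
func findMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []  // in this sketch, an invalid pattern simply yields no matches
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { (result: NSTextCheckingResult) -> String? in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

let sample = "NSRegularExpression tutorial or NSRegularExpressions tutorial. And here's an <strong>html</strong> tag."

// Plain literal: also matches the first 19 characters of "NSRegularExpressions".
let literal = findMatches(of: "NSRegularExpression", in: sample)

// Optional group: picks up the trailing "s" when it is present.
let optionalS = findMatches(of: "NSRegularExpression(s)?", in: sample)

// Tag pair: backslashes are doubled because this is a Swift string literal.
let tags = findMatches(of: "<([a-z][a-z0-9]*)\\b[^>]*>(.*?)</\\1>", in: sample)

print(literal, optionalS, tags)
```

The helper returns only the full match text. If you need the captured groups as well, NSTextCheckingResult exposes them through range(at:).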
Overall Concepts
Before you go any further, it’s important to understand a few core concepts about regular expressions.
Literal characters are the simplest kind of regular expression. They’re similar to a “find” operation in a word processor or text editor. For example, the single-character regular expression t will find all occurrences of the letter “t”, and the regular expression hello will find all appearances of “hello”. Pretty straightforward!
Just like a programming language, there are some “reserved” characters in regular expression syntax, as follows:
- [
- ( and )
- \
- *
- +
- ?
- { and }
- ^
- $
- .
- | (pipe)
- /
These characters are used for advanced pattern matching. If you want to search for one of these characters, you need to escape it with a backslash. For example, to search for all periods in a block of text, the pattern is not . but rather \. (a backslash followed by a period).
As an extra complication, since regular expressions are strings themselves, the backslash character needs to be escaped when working with NSString and NSRegularExpression. That means the standard regular expression \. will be written as \\. in your code.
To clarify the above concept in point form:
- The literal @"\\." defines a string that looks like this: \.
- The regular expression \. will then match a single period character
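The same double-escaping rule applies to Swift string literals. The sketch below assumes Swift 5 (for the raw string literal), and countMatches(of:in:) is a hypothetical helper written for this illustration. It shows that "\\." in source code is the two-character pattern \., and contrasts it with an unescaped period.

```swift
import Foundation

// Hypothetical helper: counts how many times `pattern` matches in `text`.
func countMatches(of pattern: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return 0 }
    return regex.numberOfMatches(in: text, range: NSRange(text.startIndex..., in: text))
}

let escapedPeriod = "\\."      // two backslashes in source; the string itself is \.
let rawEscapedPeriod = #"\."#  // Swift 5 raw literal: the same string without doubling

let text = "Version 6.2. Updated."
let periodCount = countMatches(of: escapedPeriod, in: text)  // literal periods only
let anyCharCount = countMatches(of: ".", in: text)           // unescaped . matches every character here

print(escapedPeriod, periodCount, anyCharCount)
```

Raw string literals are a convenience in newer Swift versions. With an ordinary literal, every backslash in the pattern has to be doubled.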
Capturing parentheses are used to group part of a pattern. For example, 3 (pm|am) would match the text “3 pm” as well as the text “3 am”. The pipe character here (|) acts like an OR operator. You can include as many pipe characters in your regular expression as you would like. As an example, (Tom|Dick|Harry) is a valid pattern.
Grouping with parentheses comes in handy when you need to optionally match a certain text string. Say you are looking for “November” in some text, but the user may or may not have abbreviated the month as “Nov”. You can define the pattern as Nov(ember)? where the question mark after the capturing parentheses means that whatever is inside the parentheses is optional.
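The alternation and optional-group patterns above can be tried out with a short sketch. It assumes Swift 5 syntax; findMatches(of:in:) is a hypothetical helper and the sample sentences are made up for this illustration.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func findMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { (result: NSTextCheckingResult) -> String? in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// The pipe inside the group acts as OR: "pm" or "am" after "3 ".
let times = findMatches(of: "3 (pm|am)", in: "Meet at 3 pm or 3 am, not 3 o'clock.")

// The question mark makes the whole group "ember" optional.
let months = findMatches(of: "Nov(ember)?", in: "Nov 5 and November 12")

print(times, months)
```

Because the question mark is greedy, the regex takes “ember” whenever it is available, so “November” is matched in full instead of stopping at “Nov”.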
These parentheses are termed “capturing” because they capture the matched content and allow you to reference it in other places in your regular expression.
As an example, assume you have the string “Say hi to Harry”. If you created a search-and-replace regular expression to replace any occurrences of (Tom|Dick|Harry) with that guy $1, the result would be “Say hi to that guy Harry”. The $1 allows you to reference the first captured group of the preceding rule.
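Here is how that search-and-replace might look in code. This is a sketch assuming Swift 5 syntax; it uses NSRegularExpression's stringByReplacingMatches(in:options:range:withTemplate:) with the options argument left at its default, and the variable names are chosen for illustration.

```swift
import Foundation

let text = "Say hi to Harry"
let template = "that guy $1"  // $1 stands for whatever the first group captured

var result = text
if let regex = try? NSRegularExpression(pattern: "(Tom|Dick|Harry)") {
    let fullRange = NSRange(text.startIndex..., in: text)
    result = regex.stringByReplacingMatches(in: text, range: fullRange, withTemplate: template)
}

print(result)
```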
Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups later on in the tutorial.
Character classes represent a set of possible single-character matches. Character classes appear between square brackets ([ and ]).
As an example, the regular expression t[aeiou] will match “ta”, “te”, “ti”, “to”, or “tu”. You can have as many character possibilities inside the square brackets as you like, but remember that any single character in the set will match. [aeiou] looks like five characters, but it actually means “a” or “e” or “i” or “o” or “u”.
You can also define a range in a character class if the characters appear consecutively. For example, to search for a number from 100 to 109, the pattern would be 10[0-9]. This returns the same results as 10[0123456789], but using ranges makes your regular expressions much cleaner and easier to understand.
But character classes aren’t limited to digits; you can do the same thing with letters. For instance, [a-f] will match “a”, “b”, “c”, “d”, “e”, or “f”.
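The three character-class patterns from this section can be checked with a sketch like the following. It assumes Swift 5 syntax; findMatches(of:in:) is a hypothetical helper and the sample strings are invented for this illustration.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func findMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { (result: NSTextCheckingResult) -> String? in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// "t" followed by exactly one vowel; "try" contributes nothing.
let vowelPairs = findMatches(of: "t[aeiou]", in: "take ten tips to tune try")

// A digit range: "10" followed by any single digit.
let rooms = findMatches(of: "10[0-9]", in: "Rooms 99, 100, 105, 109 and 110")

// A letter range: each match is one character from a through f.
let letters = findMatches(of: "[a-f]", in: "badge")

print(vowelPairs, rooms, letters)
```

Note that 10[0-9] matches three characters anywhere in the text, so it would also match the start of a longer number such as 1000. Restricting matches to whole numbers needs extra syntax that this section has not introduced yet.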
Character classes usually contain the characters you want to match, but what if you want to explicitly not match a character? You can also define negated character classes, which use the ^ character immediately after the opening bracket. For example, the pattern t[^o] will match “t” followed by any single character other than “o”, so it matches “ta” or “t1” but never “to”.
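A quick way to see the negated class in action is the sketch below. It assumes Swift 5 syntax; findMatches(of:in:) is a hypothetical helper and the sample phrase is made up for this illustration.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func findMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { (result: NSTextCheckingResult) -> String? in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// "to" is skipped, "te" matches, and so does "t " because a space is not "o".
let notTo = findMatches(of: "t[^o]", in: "to tea at noon")

print(notTo)
```

The second match is easy to overlook: a negated class still has to consume one character, and that character can be a space or punctuation mark just as well as a letter.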