Regular Expressions in Kotlin
Learn how to improve your string manipulation with the power of regular expressions in Kotlin. You’ll love them! By Arjuna Sky Kok.
Flag expression
RegexOption’s purpose is to alter the behavior of a regex. But you can achieve the same result without RegexOption by writing the rule in the regex string itself, like this:
val pattern = Regex("(?i)batman(?-i)")
You get the same result as using RegexOption.IGNORE_CASE.
This strange syntax is a flag expression. Flag expressions have a special meaning: (?i)batman(?-i) doesn’t mean the regex matches the literal string (?i)batman(?-i).
The regex engine interprets flag expressions differently from normal characters. (?i) tells the regex engine to treat the characters case-insensitively from that point on. On the other hand, (?-i) tells the regex engine to treat the characters case-sensitively from that point on.
So (?i)b(?-i)atman means only b is case-insensitive. The rest of the characters are case-sensitive.
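As a rough illustration of the difference between a pattern-wide flag and a scoped one, the sketch below compares three patterns. The helper name isBlocked and the use of containsMatchIn are assumptions made for this example; the sample project may check names differently.

```kotlin
// Hypothetical helper: true when the pattern occurs anywhere in the name.
fun isBlocked(pattern: Regex, name: String): Boolean = pattern.containsMatchIn(name)

fun main() {
    val withOption = Regex("batman", RegexOption.IGNORE_CASE)
    val withFlag = Regex("(?i)batman")
    val firstLetterOnly = Regex("(?i)b(?-i)atman")

    println(isBlocked(withOption, "BATMAN"))      // true
    println(isBlocked(withFlag, "BATMAN"))        // true
    println(isBlocked(firstLetterOnly, "Batman")) // true
    println(isBlocked(firstLetterOnly, "BATMAN")) // false: "atman" must be lowercase
}
```

The first two patterns behave the same way, while the third accepts an uppercase or lowercase b but requires the remaining letters to be lowercase.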
But for this example, you’ll use only RegexOption.
Understanding Character Classes, Groups, Quantifiers and Boundaries
Another problem appears. A superhero called catman enters Supervillains Club. How do you forbid both catman and batman?
With standard string methods, you could use an if condition with a logical operator. But here you’ll use a regex.
You want to check whether the string is batman or catman. Notice that only one character differs. The remaining characters, atman, are the same.
Using Character Classes
You can use a character class to group b and c. Replace your pattern line with:
val pattern = Regex("[bc]atman", RegexOption.IGNORE_CASE)
The [ and ] create a character class. [bc] means either b or c. [aiueo] means any one of the vowels.
Some characters have a special meaning inside square brackets. If you want to negate the characters, use ^ as the first character. [^aiueo] means any single character other than these vowels.
You can also use - to create a range of characters. [a-z] means any one character from a through z.
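The character-class rules above can be tried in isolation with a small sketch like the following. The helper names are hypothetical, and matches is used here so that each check applies to the whole input string.

```kotlin
// Hypothetical helpers for trying out single-character classes.
fun isVowel(ch: String): Boolean = Regex("[aiueo]").matches(ch)
fun isNotVowel(ch: String): Boolean = Regex("[^aiueo]").matches(ch)
fun isLowercaseLetter(ch: String): Boolean = Regex("[a-z]").matches(ch)

fun main() {
    val pattern = Regex("[bc]atman", RegexOption.IGNORE_CASE)
    println(pattern.containsMatchIn("catman")) // true
    println(pattern.containsMatchIn("Batman")) // true
    println(pattern.containsMatchIn("ratman")) // false

    println(isVowel("a"))           // true
    println(isNotVowel("a"))        // false
    println(isNotVowel("b"))        // true
    println(isLowercaseLetter("q")) // true
    println(isLowercaseLetter("Q")) // false
}
```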
Build and run the app. Try to input catman. The validation works flawlessly.
Next, you’ll take a look at groups and quantifiers.
Using Groups and Quantifiers
All is well until batwoman breaks into Supervillains Club. Now you need to block batwoman as well as batman. Notice that the difference is the wo string. You can’t use a character class to solve this problem: bat[wo]man means batwman or batoman. It doesn’t match batwoman.
What you want is a group.
Add this new rule to the existing regex. Replace your pattern line with:
val pattern = Regex("[bc]at(wo)?man", RegexOption.IGNORE_CASE)
Here, you use ( and ) to create a group. (wo) means a group containing the wo string. A group makes its characters a single unit.
You want to make this group optional, so you apply ? after the group, as in bat(wo)?man.
? is a quantifier. A quantifier defines how many times a unit may occur. There are a few varieties of quantifiers in regex:
- ?: 0 or 1 occurrence.
- +: 1 or more occurrences.
- *: 0 or more occurrences.
You could use quantifiers to match occurrences of a unit:
- ba+ matches ba and baaaaa fully, but doesn’t match b.
- ba* matches b, ba and baaaaa fully.
- ba? matches b and ba fully, but only matches baaaa partially.
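The difference between a full and a partial match can be checked with a short sketch. The helper names matchesFully and firstMatch are hypothetical; they wrap the standard Regex.matches and Regex.find calls.

```kotlin
// Hypothetical helper: true only when the regex matches the entire input.
fun matchesFully(regex: String, input: String): Boolean = Regex(regex).matches(input)

// Hypothetical helper: the first matching substring, or null if there is none.
fun firstMatch(regex: String, input: String): String? = Regex(regex).find(input)?.value

fun main() {
    println(matchesFully("ba+", "baaaaa")) // true
    println(matchesFully("ba+", "b"))      // false
    println(matchesFully("ba*", "b"))      // true
    println(matchesFully("ba?", "ba"))     // true
    println(matchesFully("ba?", "baaaa"))  // false
    println(firstMatch("ba?", "baaaa"))    // ba (the partial match)
}
```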
In your group, (wo)?, the syntax means the group on the left side of ? occurs either once or not at all.
That’s the purpose of the group: the w and o in wo aren’t separable.
Build and run the app. Check that batwoman and catwoman can’t enter Supervillains Club.
What if you hadn’t used a group, so that the regex string was batwo?man?
Solution: The ? quantifier then applies only to the o character. So batwo?man matches batwman and batwoman, but it no longer matches batman.
You don’t want this. You want either batman or batwoman, but not batwman.
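To see the three variants side by side, here is a minimal sketch that runs the grouped pattern, the ungrouped pattern and the character-class attempt against the same names. The helper name fullyMatches is hypothetical, and the patterns are the simplified bat... forms from the text rather than the full pattern line.

```kotlin
// Hypothetical helper: true only when the whole name matches the regex.
fun fullyMatches(regex: String, name: String): Boolean = Regex(regex).matches(name)

fun main() {
    val names = listOf("batman", "batwoman", "batwman")
    for (regex in listOf("bat(wo)?man", "batwo?man", "bat[wo]man")) {
        val accepted = names.filter { fullyMatches(regex, it) }
        println("$regex -> $accepted")
    }
    // bat(wo)?man -> [batman, batwoman]
    // batwo?man -> [batwoman, batwman]
    // bat[wo]man -> [batwman]
}
```

Only the grouped version accepts exactly batman and batwoman.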
Using Boundaries
You’re satisfied with your superb code: You protected Supervillains Club from superheroes. Then one day, a supervillain named I'm not Batman tries to register, and the validation stops the supervillain.
You get a complaint from your employer.
Now you need to add logic so that the regex matches batman, catman, batwoman and catwoman only if they appear at the beginning of the string.
Use a boundary to solve this problem. Add ^ at the front of the regex string by replacing your pattern line with:
val pattern = Regex("^[bc]at(wo)?man", RegexOption.IGNORE_CASE)
Here, the ^ character doesn’t have the same meaning as the ^ character inside brackets. ^bat means bat at the beginning of the string. [^bat] means any single character other than b, a and t.
Build and run the app. Now I'm not Batman can register successfully in Supervillains Club.
To anchor a match at the end of the string instead, use $. So bat$ means bat at the end of the string.
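The two boundaries can be tried out with the sketch below. The helper names are hypothetical, and containsMatchIn is assumed as the matching call; the sample project may use a different one.

```kotlin
// Hypothetical helper: true when the name starts with one of the forbidden hero names.
fun startsWithHeroName(name: String): Boolean =
    Regex("^[bc]at(wo)?man", RegexOption.IGNORE_CASE).containsMatchIn(name)

// Hypothetical helper: true when the text ends with "bat".
// "\$" is an escaped dollar sign, so the regex engine receives bat$.
fun endsWithBat(text: String): Boolean = Regex("bat\$").containsMatchIn(text)

fun main() {
    println(startsWithHeroName("Batman Begins"))  // true
    println(startsWithHeroName("I'm not Batman")) // false
    println(endsWithBat("wombat"))                // true
    println(endsWithBat("batman"))                // false
}
```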
Regex Helper Tools from IntelliJ IDEA
Sometimes when writing your regex pattern, you want to check if it works as soon as possible, even without running your app. For this purpose, use regex helper tools from IntelliJ IDEA.
Move your caret to the regex pattern and press Alt-Enter on Linux/Windows or Option-Enter on Mac:
You have two helper tools dealing with regex. One edits the regex fragment, and the other checks the regex pattern.
Choose Check RegExp:
You have a form to validate an input string with your regex pattern. If the input string matches the regex pattern, you’ll see a green check mark.
If you put in an invalid input string:
You’ll see a red exclamation mark.
If you find that your regex pattern doesn’t work as expected, go back to your regex pattern. Press Alt-Enter or Option-Enter again:
Then choose Edit RegExp Fragment:
You’ll see a dedicated editor where you can edit your regex pattern and get hints. For example, delete the ) after the letter o:
You’ll see a warning about the missing ).
For this regex pattern, an editor is overkill. But it can be handy while editing a complex regex.
Understanding Predefined Classes and Groups
You were working on the signup page when you got a distress call: Some superheroes have infiltrated Supervillains Club, and you need to root them out!
Open http://localhost:8080/impostors and you’ll see some names:
Click Find Impostors and you’ll get… nothing. The clue is that anyone with Captain in their name is a superhero. Based on that information, it’s time to write a new regex.
You’ll use findAll, a different method of Regex. This time, you don’t want to check whether a regex matches a string. You want to extract every substring that matches the regex.
In RegexValidator.kt, replace the content of filterNames with:
val pattern = Regex("""Captain""")
return pattern.findAll(names).map {
    it.value
}.toList()
The findAll method returns a sequence of MatchResult objects. To get the matched string, you use the value property of MatchResult.
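As a standalone sketch of the same idea, the function below wraps the findAll call so you can run it outside the web app. The function name extractMatches and the comma-separated sample input are assumptions; the real filterNames may receive its names in another format.

```kotlin
// Hypothetical stand-in for filterNames: returns every substring matching the pattern.
fun extractMatches(pattern: Regex, names: String): List<String> =
    pattern.findAll(names)   // Sequence<MatchResult>
        .map { it.value }    // the matched text of each result
        .toList()

fun main() {
    // Hypothetical sample input.
    val names = "Captain America, Joker, Captain Marvel"
    println(extractMatches(Regex("""Captain"""), names)) // [Captain, Captain]
}
```

With this pattern, each match is only the word Captain, not the full name.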
Build and run the app. Submit the form, and you’ll get this result:
Not good! Each match is only the word Captain, not the full name. You could write the regex string like this: Captain (America|Marvel). It works for this case, but it’s not scalable.
What if there’s another impostor named Captain Saving the World or Captain Love? Then you’d need to rewrite your regex string.
There’s a better way. You can use predefined classes and the + quantifier.
Replace your Regex with:
val pattern = Regex("""Captain\s\w+""")
\s and \w are predefined classes. \s means any whitespace character, such as a space or a tab. \w means any word character: a letter, a digit or an underscore.
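Here is a minimal sketch of what this pattern extracts. The function name findCaptains and the sample input are hypothetical and only serve to show the behavior.

```kotlin
// Hypothetical helper: extracts "Captain" followed by one whitespace character and one word.
fun findCaptains(names: String): List<String> =
    Regex("""Captain\s\w+""").findAll(names).map { it.value }.toList()

fun main() {
    // Hypothetical sample input.
    println(findCaptains("Captain America, Joker, Captain Marvel, Captain Love"))
    // [Captain America, Captain Marvel, Captain Love]

    // \w+ stops at the next space, so only the first word after Captain is captured.
    println(findCaptains("Captain Saving the World")) // [Captain Saving]
}
```

Note that a multi-word name is captured only up to its first word after Captain, which is still enough to flag the impostor.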
Build and run the app. Click Find Impostors and you’ll get this result:
Bingo! You successfully rooted them out.
\w is the same as [a-zA-Z_0-9], so you could also write your regex in this more explicit form:
val pattern = Regex("""Captain\s[a-zA-Z_0-9]+""")