Regular Expressions in Kotlin
Learn how to improve your string manipulation with the power of regular expressions in Kotlin. You’ll love them! By arjuna sky kok.
Contents
Regular Expressions in Kotlin
30 mins
- Getting Started
- Understanding the Backstory
- Building and Running the Web App
- The Regex Object
- Using RegexOption
- Flag expression
- Understanding Character Classes, Groups, Quantifiers and Boundaries
- Using Character Classes
- Using Groups and Quantifiers
- Using Boundaries
- Regex Helper Tools from IntelliJ IDEA
- Understanding Predefined Classes and Groups
- Captured Groups and Back-references
- Understanding Greedy Quantifiers, Possessive Quantifiers and Reluctant Quantifiers
- Using Greedy Quantifiers
- Using Possessive Quantifiers
- Using Reluctant Quantifiers
- Understanding the Logical Operator and Escaping Regex
- Where to Go From Here?
Captured Groups and Back-references
You captured all superheroes infiltrating Supervillains Club with the first name `Captain`. Now, they’re ready to convert to supervillains, but supervillains can’t use `Captain` as their name.
Your task now is to extract the last name from the superheroes. Later, Supervillains Club will give them a first name suitable for a supervillain.
To recap, you have to remove `Captain` from `Captain Marvel`, then give `Marvel` to your employer. Later, your employer will give them a different first name, like `Dark Marvel`. You only need to extract the last name.
Build and run the app. Open http://localhost:8080/extract then click Extract Names. Nothing happens:
To solve this problem, you’ll still use `findAll`. But this time, you’ll use a group in the regex string.
In RegexValidator.kt, replace the content of `extractNames` with:
val pattern = Regex("""Captain\s(\w+)""")
val results = pattern.findAll(names)
return results.map {
    it.groupValues[1]
}.toList()
This code is almost the same as the previous code, but there are two differences:
- `(\w+)`: The regex string now has a group.
- `groupValues[1]`: You use `groupValues` instead of `value`.
`groupValues[1]` refers to the `(\w+)` group in the regex string. Remember that `(\w+)` is the last name.
What is the number 1 in `groupValues[1]` exactly? It’s the index of the first group in the `groupValues` list.
You don’t use index 0 because it refers to the full match, such as `Captain Marvel`. But how big could `groupValues` be? It depends on the number of groups in the regex string.
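To see the difference between index 0 and index 1 side by side, here is a minimal sketch. The function names and the sample input are made up for illustration; they are not part of the tutorial’s project.

```kotlin
// Hypothetical helper: returns the captured (\w+) group of every match.
fun extractLastNames(names: String): List<String> {
    val pattern = Regex("""Captain\s(\w+)""")
    return pattern.findAll(names)
        .map { it.groupValues[1] } // index 1: the first (and only) group
        .toList()
}

// Hypothetical helper: returns the full text of every match.
fun extractFullMatches(names: String): List<String> {
    val pattern = Regex("""Captain\s(\w+)""")
    return pattern.findAll(names)
        .map { it.groupValues[0] } // index 0: the full match, same as it.value
        .toList()
}

fun main() {
    // Made-up sample input.
    val names = "Captain Marvel, Iron Man, Captain America"
    println(extractFullMatches(names)) // [Captain Marvel, Captain America]
    println(extractLastNames(names))   // [Marvel, America]
}
```

With one group in the pattern, each match’s `groupValues` has two elements: the full match and the captured last name.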
Suppose you have three groups in the regex string:
val pattern = Regex("""((Cap)tain)\s(\w+)""")
If the input string is `Captain Marvel`:
- Index 0 refers to `Captain Marvel`.
- Index 1 refers to `Captain`.
- Index 2 refers to `Cap`.
- Index 3 refers to `Marvel`.
Groups are numbered by the position of their opening parenthesis, counting from left to right. So an outer group comes before the groups nested inside it. Index 0 always refers to the full match. Index 1 refers to `((Cap)tain)`.
Then you go inside that group to get index 2, which refers to `(Cap)`. Then you move to the right, and index 3 refers to `(\w+)`.
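You can check this numbering yourself with a small sketch. The helper name `groupsOf` is hypothetical; it just returns `groupValues` for the first match.

```kotlin
// Hypothetical helper: returns groupValues of the first match,
// or an empty list when the input doesn't match.
fun groupsOf(input: String): List<String> {
    val pattern = Regex("""((Cap)tain)\s(\w+)""")
    return pattern.find(input)?.groupValues ?: emptyList()
}

fun main() {
    // Three groups in the pattern, so four elements: full match + 3 groups.
    println(groupsOf("Captain Marvel")) // [Captain Marvel, Captain, Cap, Marvel]
}
```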
Build and run the app. Then click Extract Names. You’ll get this result:
You’ve extracted the last name perfectly. Good job!
You feel proud of your code. It helps supervillains prosper in this wicked world.
But your employer doesn’t have time to pick a custom first name for each superhero willing to become a supervillain. They tell you to use a generic first name, `Super Evil`, and be done with it. So `Captain Marvel` will become `Super Evil Marvel`.
Open http://localhost:8080/replace and click Replace Names. Nothing happens:
It’s time to convert these superheroes to supervillains!
To replace strings with regex, you use… guess what? `replace`. :]
Replace the content of `replaceNames` in RegexValidator.kt with the code below:
val pattern = Regex("""Captain\s(\w+)""")
return pattern.replace(names, "Super Evil $1")
`replace` accepts two parameters. The first is the string against which you want to match your regex `"""Captain\s(\w+)"""`.
The second is the replacement string. It’s `Super Evil $1`.
The `$1` in `Super Evil $1` is a special sequence. `$1` is the same as `groupValues[1]` in the previous example. This is a back-reference.
So the back-reference refers to the captured group. The captured group is `(\w+)` in `Captain\s(\w+)`.
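As a quick sketch of the back-reference in action, the snippet below runs the same replacement two ways: with `$1` and with the lambda overload of `replace`, which receives each `MatchResult`. The function names and sample input are assumptions made for illustration.

```kotlin
val captainPattern = Regex("""Captain\s(\w+)""")

// Uses the $1 back-reference in the replacement string.
fun withBackReference(names: String): String =
    captainPattern.replace(names, "Super Evil $1")

// Uses the lambda overload; groupValues[1] plays the role of $1.
fun withLambda(names: String): String =
    captainPattern.replace(names) { match -> "Super Evil ${match.groupValues[1]}" }

fun main() {
    // Made-up sample input.
    val names = "Captain Marvel, Iron Man, Captain America"
    println(withBackReference(names)) // Super Evil Marvel, Iron Man, Super Evil America
    println(withLambda(names))        // Super Evil Marvel, Iron Man, Super Evil America
}
```

Both versions leave text that doesn’t match the pattern, such as `Iron Man`, untouched.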
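It’s roughly like writing the code below, except that `replace` also keeps the text around each match, while this version returns only the new names joined with commas:
val pattern = Regex("""Captain\s(\w+)""")
val results = pattern.findAll(names)
return results.map {
    "Super Evil ${it.groupValues[1]}"
}.joinToString()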
But it’s much less code!
Build and run the app. Click Replace Names. You’ll see all superheroes who want to repent got a new first name:
Now with these new names, the superheroes have become supervillains officially!
Understanding Greedy Quantifiers, Possessive Quantifiers and Reluctant Quantifiers
Supervillains Club throws you another task. All supervillains have diet plans. The nutritionist in Supervillains Club has made a plan tailored for supervillains.
Open http://localhost:8080/diet and you’ll see a diet plan for supervillains in HTML format:
The data scientists ask you to extract the diet plan from the HTML file. In other words, you want to extract an array of the meals from the HTML string: 5kg Unicorn Meat, 2L Lava, 2kg Meteorite.
You need to match strings between the `li` tags. The strings could be anything. How do you match strings that can be anything?
You use `.` to represent any character in regex. That means any character at all, with one exception: whether `.` matches line terminators depends on the configuration of the regex. But you don’t need to worry about this in this tutorial.
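If you’re curious about that exception, here is a minimal sketch, assuming the JVM regex engine. It uses `RegexOption.DOT_MATCHES_ALL` from the Kotlin standard library; the helper name `dotMatches` and the inputs are made up.

```kotlin
// Hypothetical helper: checks whether "a.b" matches the whole input,
// with or without the DOT_MATCHES_ALL option.
fun dotMatches(input: String, dotMatchesAll: Boolean): Boolean {
    val pattern = if (dotMatchesAll) {
        Regex("a.b", RegexOption.DOT_MATCHES_ALL)
    } else {
        Regex("a.b")
    }
    return pattern.matches(input)
}

fun main() {
    println(dotMatches("a-b", dotMatchesAll = false))  // true
    println(dotMatches("a\nb", dotMatchesAll = false)) // false: . skips the newline
    println(dotMatches("a\nb", dotMatchesAll = true))  // true
}
```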
You know the `?`, `*` and `+` quantifiers. These are called greedy quantifiers. You’ll see why they’re greedy soon!
What happens if you join `.` and `*`? Together, they match any string, including an empty one!
Interestingly, you can add the `?` or `+` quantifiers to `.*`. The quantifiers alter the behavior of `.*`. You’ll experiment with all of them.
Using Greedy Quantifiers
First, you’ll use the greedy quantifier, `.*`.
In RegexValidator.kt, replace the content of `extractNamesFromHtml` with:
val pattern = Regex("""<li>(.*)</li>""")
val results = pattern.findAll(names)
return results.map {
    it.groupValues[1]
}.toList()
Here, you use the method you used previously, `findAll`. The logic is simple: You use a group to capture the string between the `li` tags. Then you use `groupValues` when extracting the string.
Build and run the app, then submit the form. The result isn’t what you’d expect:
You got a one-item array, not a three-item array. The `(.*)` regex pattern swallowed every `</li>` string as well, except the last one.
That’s why people call this quantifier greedy. It matches as much of the string as possible while still letting the full pattern match.
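Here is a standalone sketch of the greedy behavior. The function name `extractGreedy` and the single-line HTML string are assumptions for illustration; the real page markup may differ.

```kotlin
// Hypothetical stand-in for extractNamesFromHtml with the greedy pattern.
fun extractGreedy(html: String): List<String> {
    val pattern = Regex("""<li>(.*)</li>""")
    return pattern.findAll(html)
        .map { it.groupValues[1] }
        .toList()
}

fun main() {
    // Assumed sample markup, all on one line.
    val html = "<ul><li>5kg Unicorn Meat</li><li>2L Lava</li><li>2kg Meteorite</li></ul>"
    val meals = extractGreedy(html)
    println(meals.size) // 1
    println(meals)      // [5kg Unicorn Meat</li><li>2L Lava</li><li>2kg Meteorite]
}
```

The group runs from the first `<li>` to the last `</li>`, so the inner tags end up inside the single captured string.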
But there’s another quantifier that’s greedier than the greedy quantifier: the possessive quantifier.
Using Possessive Quantifiers
Now, replace the content of `extractNamesFromHtml` with:
val pattern = Regex("""<li>(.*+)</li>""")
val results = pattern.findAll(names)
return results.map {
    it.groupValues[1]
}.toList()
Notice that the only difference is the `+` you put to the right of `.*`. This makes it a possessive quantifier.
Build and run the app. Then submit the form:
The result is empty. The regex pattern failed to match the string because `.*+` in `<li>(.*+)</li>` matches `5kg Unicorn Meat</li><li>2L Lava</li><li>2kg Meteorite</li>` and, being possessive, never gives any of it back. So by the time the regex pattern moves to `</li>` in `<li>(.*+)</li>`, it can’t match the string because there is nothing left to match.
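The sketch below reproduces the empty result, assuming the JVM regex engine, which supports possessive quantifiers. The function names and sample strings are made up for illustration. The second function shows that a possessive quantifier still works when the token after it is something the quantifier can’t consume.

```kotlin
// Hypothetical stand-in for extractNamesFromHtml with the possessive pattern.
fun extractPossessive(html: String): List<String> {
    val pattern = Regex("""<li>(.*+)</li>""")
    return pattern.findAll(html)
        .map { it.groupValues[1] }
        .toList()
}

// \d*+ grabs digits only, so the literal "kg" after it can still match.
fun hasKilograms(text: String): Boolean =
    Regex("""\d*+kg""").containsMatchIn(text)

fun main() {
    // Assumed sample markup, all on one line.
    val html = "<ul><li>5kg Unicorn Meat</li><li>2L Lava</li><li>2kg Meteorite</li></ul>"
    println(extractPossessive(html))          // []
    println(hasKilograms("5kg Unicorn Meat")) // true
}
```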
What you want is a reluctant quantifier.