Regular Expressions in Kotlin
Learn how to improve your string manipulation with the power of regular expressions in Kotlin. You’ll love them! By arjuna sky kok.
Contents
30 mins
- Getting Started
- Understanding the Backstory
- Building and Running the Web App
- The Regex Object
- Using RegexOption
- Flag expression
- Understanding Character Classes, Groups, Quantifiers and Boundaries
- Using Character Classes
- Using Groups and Quantifiers
- Using Boundaries
- Regex Helper Tools from IntelliJ IDEA
- Understanding Predefined Classes and Groups
- Captured Groups and Back-references
- Understanding Greedy Quantifiers, Possessive Quantifiers and Reluctant Quantifiers
- Using Greedy Quantifiers
- Using Possessive Quantifiers
- Using Reluctant Quantifiers
- Understanding the Logical Operator and Escaping Regex
- Where to Go From Here?
Using Reluctant Quantifiers
Replace the content of extractNamesFromHtml with:
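// (.*?) is a reluctant capture group: it stops at the first </li> it reaches
val pattern = Regex("""<li>(.*?)</li>""")
// Find every <li>...</li> element in the HTML input
val results = pattern.findAll(names)
// groupValues[0] is the whole match; groupValues[1] is the captured group
return results.map {
    it.groupValues[1]
}.toList()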
Notice that the difference is the ? you put to the right of .*. This makes it a reluctant quantifier.
Build and run the app. Then submit the form:
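This is the correct result. The (.*?) matches as few characters as possible before </li>. Instead of grabbing the rest of the input, (.*?) reluctantly moves forward one character at a time and stops at the first </li> it reaches.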
Now you’ve successfully extracted the meals data using regex.
To get more familiar with these quantifiers, check out this comparison between them and their results:
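If you’d like to reproduce a comparison like this yourself, here’s a minimal Kotlin sketch. The HTML snippet and the captures helper are made up for illustration, and the greedy and possessive patterns are assumed counterparts of the reluctant pattern above, not necessarily the exact ones used earlier in the tutorial.

```kotlin
// Hypothetical helper: returns the first captured group of every match.
fun captures(pattern: String, html: String): List<String> =
    Regex(pattern).findAll(html).map { it.groupValues[1] }.toList()

fun main() {
    // Made-up sample input with two list items
    val html = "<li>Pizza</li><li>Taco</li>"

    // Greedy: .* grabs the rest of the input, then backtracks to the LAST </li>
    println(captures("""<li>(.*)</li>""", html))  // [Pizza</li><li>Taco]

    // Reluctant: .*? grabs as little as possible, stopping at the FIRST </li>
    println(captures("""<li>(.*?)</li>""", html)) // [Pizza, Taco]

    // Possessive: .*+ grabs the rest of the input and never gives any back,
    // so </li> can't match afterwards and there are no results
    println(captures("""<li>(.*+)</li>""", html)) // []
}
```

Only the reluctant version returns one name per list item, which is why extractNamesFromHtml uses it.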
Understanding the Logical Operator and Escaping Regex
Supervillains Club recruits a lot of young supervillains. They also monitor the chats between young supervillains to make sure they don’t defect. But Gen Z writes differently: they don’t respect English grammar.
This creates a problem when Supervillains Club wants to analyze Gen Z’s chat dialog. A Gen Z supervillain might write: “I just beat a hero :] looks like I’m good :)”.
You have to separate the dialog into sentences, but Gen Z supervillains don’t use end punctuation. Kids these days… :]
Fortunately, Supervillains Club’s NLP scientists have done their research. It looks like Gen Z uses :], :) and 🤣 as a replacement for the period (.).
Open http://localhost:8080/split and submit the form:
Nothing happens. It’s time to analyze Gen Z using regex!
To split the sentences using regex, you use… split!
In RegexValidator.kt, replace the content of splitSentences with:
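// Escape :) so that ) is matched literally instead of closing a group
val escapedString = Regex.escape(""":)""")
// Match :] or :) or 🤣; each one marks the end of a sentence
val pattern = Regex("""(:]|${escapedString})|🤣""")
// Split around every match, then trim the spaces left on each piece
return pattern.split(sentences).map {
    it.trim()
}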
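split treats every match of the regex in the input string as a separator and returns the pieces between the matches. If the regex is Y and the input string is sunny Y rainy Y cloudy, the result is sunny, rainy and cloudy. Strictly speaking, the spaces around each Y stay attached to the pieces, which is why splitSentences calls trim() on each one.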
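Here’s a small sketch of that behavior, using the sunny Y rainy Y cloudy example. The separatorY name is just for illustration. It also shows what happens when the input ends with a separator, as the sample chat message does with :).

```kotlin
// Every match of Y acts as a separator
val separatorY = Regex("Y")

fun main() {
    val pieces = separatorY.split("sunny Y rainy Y cloudy")
    // The spaces around each Y stay attached to the pieces
    println(pieces.map { "\"$it\"" })  // ["sunny ", " rainy ", " cloudy"]
    // Trimming gives the clean words
    println(pieces.map { it.trim() })  // [sunny, rainy, cloudy]
    // A separator at the very end leaves an empty last piece
    println(separatorY.split("sunny Y").map { "\"$it\"" })  // ["sunny ", ""]
}
```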
You may also have noticed another character, |. It’s a special character in regex: the logical OR operator, also called alternation. To split on more than one separator, join the alternatives with |. If the regex is Y|B, you split the input on every Y or B.
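A minimal sketch of alternation with split follows. The sample strings and the yOrB and andOr names are made up for illustration; the second regex shows that each alternative can be longer than one character.

```kotlin
// Y|B means "Y or B", so both letters act as separators
val yOrB = Regex("Y|B")

// Alternatives don't have to be single characters
val andOr = Regex(" and | or ")

fun main() {
    println(yOrB.split("sunny Y rainy B cloudy").map { it.trim() })  // [sunny, rainy, cloudy]
    println(andOr.split("tea and cake or pie"))                       // [tea, cake, pie]
}
```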
You’ll also see that you escape :) using Regex.escape:
Regex.escape(""":)""")
The ) character is special in regex: as you learned previously, together with ( it delimits a group. Escaping it makes the regex treat :) as the two literal characters : and ).
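To see why the escape matters, here’s a minimal sketch. The isValidRegex helper is hypothetical; it relies on the fact that on the JVM, Kotlin’s Regex constructor compiles the pattern immediately and throws java.util.regex.PatternSyntaxException for invalid syntax. It also shows Regex.fromLiteral, which builds a regex that matches a string literally.

```kotlin
import java.util.regex.PatternSyntaxException

// Hypothetical helper: returns true if text compiles as a regex
fun isValidRegex(text: String): Boolean =
    try {
        Regex(text)
        true
    } catch (e: PatternSyntaxException) {
        false
    }

fun main() {
    // Unescaped, the ) tries to close a group that was never opened
    println(isValidRegex(":)"))  // false

    // Escaped, :) is matched literally
    val smiley = Regex(Regex.escape(":)"))
    println(smiley.containsMatchIn("looks like I'm good :)"))  // true

    // Regex.fromLiteral does the escaping for you
    println(Regex.fromLiteral(":)").containsMatchIn("good :)"))  // true
}
```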
Build and run the app. Submit the form again. You’ll see this:
Your work impressed Supervillains Club. They offer to make you a supervillain.
Why not?
Your supervillain name is Regex Monster. When people have a problem, you tell them a popular regex joke: “Now you have two problems.” :]
Where to Go From Here?
Download the final project using the Download Materials button at the top or bottom of the tutorial.
You learned the most common Regex methods, but there are some you didn’t try, such as replaceFirst, splitToSequence and toPattern. Consult the Regex API documentation to learn more.
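As a quick taste of those three methods, here’s a minimal sketch. The digits regex and the sample strings are made up for illustration; splitToSequence requires Kotlin 1.6 or later, and toPattern is available only on the JVM.

```kotlin
import java.util.regex.Pattern

// One or more digits
val digits = Regex("""\d+""")

fun main() {
    // replaceFirst replaces only the first match
    println(digits.replaceFirst("room 12, floor 3", "#"))  // room #, floor 3

    // splitToSequence splits lazily; take(2) stops after two pieces
    println(Regex(",").splitToSequence("a,b,c,d").take(2).toList())  // [a, b]

    // toPattern exposes the underlying java.util.regex.Pattern
    val pattern: Pattern = digits.toPattern()
    println(pattern.matcher("42").matches())  // true
}
```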
You also need to be careful about catastrophic backtracking. A badly written regex, for example one that nests quantifiers like (a+)+, can take exponential time on certain inputs, consume a lot of CPU and cause an outage.
You used several regex constructs, but regex syntax is vast. For example, you haven’t used multiline mode or named groups. Head to the regex pattern documentation to learn more about regex syntax.
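Here’s a minimal sketch of both features, assuming a recent Kotlin version on the JVM where a match’s groups can be looked up by name. The date regex, yearOf and firstWords are hypothetical examples, not part of the tutorial’s project.

```kotlin
// Named groups: (?<name>...) lets you read a group by name instead of index
val date = Regex("""(?<year>\d{4})-(?<month>\d{2})""")

fun yearOf(text: String): String? =
    date.find(text)?.groups?.get("year")?.value

// MULTILINE makes ^ match at the start of every line, not only the input
fun firstWords(text: String, multiline: Boolean): List<String> {
    val options = if (multiline) setOf(RegexOption.MULTILINE) else emptySet()
    return Regex("""^\w+""", options).findAll(text).map { it.value }.toList()
}

fun main() {
    println(yearOf("Released 2023-05"))                                 // 2023
    println(firstWords("alpha beta\ngamma delta", multiline = false))  // [alpha]
    println(firstWords("alpha beta\ngamma delta", multiline = true))   // [alpha, gamma]
}
```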
Regex isn’t invincible. It fails at fuzzy tasks such as classifying the sentiment of a tweet. For problems like that, you need Natural Language Processing (NLP).
Regex is complicated, so it helps to debug your patterns in a regex playground. One example is regex101. Choose the Java 8 flavor in the playground.
I hope you enjoyed this tutorial! Please join the forum discussion below if you have any questions or comments.