Swift Regex Tutorial: Getting Started
Master the pattern-matching superpowers of Swift Regex. Learn to write regular expressions that are easy to understand, work with captures and try out RegexBuilder, all while making a Marvel Movies list app! By Ehab Amer.
Looking Ahead
To make the expression look ahead, use the operation called NegativeLookahead. In regular expression syntax, it's written as (?!pattern), where pattern is the expression that must not match at the current position.
titleField should look ahead for fieldSeparator before each repetition of its any-character expression.
Change the declaration of titleField to the following:
let titleField = OneOrMore {
  NegativeLookahead { fieldSeparator }
  CharacterClass.any
}
Build and run. Observe the output in the console log:
Found 49 matches
tt10857160 She-Hulk: Attorney at Law |
tt10648342 Thor: Love and Thunder |
tt13623148 I Am Groot |
tt9419884 Doctor Strange in the Multiverse of Madness |
tt10872600 Spider-Man: No Way Home |
Excellent. You fixed the expression, and it's back to only picking up the fields you requested.
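To see the effect in isolation, here is a minimal sketch that compares an any-character repetition with and without the lookahead. The sample row and the ";" separator are assumptions made for illustration; they stand in for the tutorial's data and fieldSeparator.

```swift
import RegexBuilder

// Hypothetical sample row; ";" stands in for the tutorial's fieldSeparator.
let sampleRow = "tt10648342;Thor: Love and Thunder;(2022)"
let sampleSeparator = ";"

// Without a lookahead, the greedy repetition runs straight through the separators.
let greedyField = OneOrMore { CharacterClass.any }

// With the lookahead, each repetition first checks that no separator starts here.
let boundedField = OneOrMore {
  NegativeLookahead { sampleSeparator }
  CharacterClass.any
}

let greedyResult = sampleRow.prefixMatch(of: greedyField).map { String($0.output) }
let boundedResult = sampleRow.prefixMatch(of: boundedField).map { String($0.output) }

print(greedyResult ?? "no match")   // the whole row
print(boundedResult ?? "no match")  // only the first field
```

The lookahead itself consumes no characters. It only vetoes the next repetition, which is why the separator remains available for the next component to match.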
Before you add the remaining fields, update the ones that use an any-character expression to include a negative lookahead. Change the declaration of premieredOnField to:
let premieredOnField = OneOrMore {
  NegativeLookahead { fieldSeparator }
  CharacterClass.any
}
Then, change imdbRatingField to:
let imdbRatingField = OneOrMore {
  NegativeLookahead { CharacterClass.newlineSequence }
  CharacterClass.any
}
Since you expect the rating at the end of the line, the negative lookahead searches for a newline character instead of a field separator.
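As a rough illustration of the end-of-line variant, the sketch below runs the same shape of expression over a hypothetical two-line string. The input is made up for this example and isn't taken from the project's data file.

```swift
import RegexBuilder

// Hypothetical input: two ratings, each ending its own line.
let ratingLines = "5.7\n6.7\n"

// Repeat any character, but stop as soon as a newline comes next.
let lineEndField = OneOrMore {
  NegativeLookahead { CharacterClass.newlineSequence }
  CharacterClass.any
}

let foundRatings = ratingLines.matches(of: lineEndField).map { String($0.output) }
print(foundRatings)
```

Each match stops just before its newline, so the two ratings come back as separate results without the line breaks attached.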
Update recordMatcher to include the remaining fields:
let recordMatcher = Regex {
  idField
  fieldSeparator
  titleField
  fieldSeparator
  yearField
  fieldSeparator
  premieredOnField
  fieldSeparator
  urlField
  fieldSeparator
  imdbRatingField
}
Build and run. The console will show that it found 49 matches and will print all the rows correctly. Now, you want to hold or capture the relevant parts of the string that the expressions found so you can convert them to the proper objects.
Capturing Matches
Capturing data inside a Regex object is straightforward. Simply wrap the expressions you want to capture in a Capture block.
Change the declaration of recordMatcher to the following:
let recordMatcher = Regex {
  Capture { idField }
  fieldSeparator
  Capture { titleField }
  fieldSeparator
  Capture { yearField }
  fieldSeparator
  Capture { premieredOnField }
  fieldSeparator
  Capture { urlField }
  fieldSeparator
  Capture { imdbRatingField }
  /\n/ // Consume the newline that ends each row.
}
Then change the loop that goes over the matches to the following:
for match in matches {
  print("Full Row: " + match.output.0)
  print("ID: " + match.output.1)
  print("Title: " + match.output.2)
  print("Year: " + match.output.3)
  print("Premiered On: " + match.output.4)
  print("Image URL: " + match.output.5)
  print("Rating: " + match.output.6)
  print("---------------------------")
}
Build and run. The console log should output each row in full with a breakdown of each value underneath:
Found 49 matches
Full Row: tt10857160 She-Hulk: Attorney at Law......
ID: tt10857160
Title: She-Hulk: Attorney at Law
Year: (2022– )
Premiered On: Aug 18, 2022
Image URL: https://m.media-amazon.com/images/M/MV5BMjU4MTkxNz......jpg
Rating: 5.7
---------------------------
Full Row: tt10648342 Thor: Love and Thunder.....
ID: tt10648342
Title: Thor: Love and Thunder
Year: (2022)
Premiered On: July 6, 2022
Image URL: https://m.media-amazon.com/images/M/MV5BYmMxZWRiMT......jpg
Rating: 6.7
---------------------------
Before you added any captures, the output object contained the whole row. By adding captures, it became a tuple whose first value is the whole row. Each capture adds a value to that tuple. With six captures, your tuple has seven values.
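The same tuple shape can be seen on a smaller scale. This sketch assumes a hypothetical two-field row with a ";" separator, so two captures give a three-value tuple.

```swift
import RegexBuilder

// Hypothetical two-field row; ";" is an assumed separator.
let shortRow = "tt10648342;(2022)"

let shortMatcher = Regex {
  Capture { OneOrMore(.word) }
  ";"
  Capture { OneOrMore(.any) }
}

var capturedID = ""
var capturedYear = ""
if let match = shortRow.wholeMatch(of: shortMatcher) {
  // Two captures produce a three-value tuple: the whole match, then each capture.
  let (whole, id, year) = match.output
  print("Full Row: " + whole)
  capturedID = String(id)
  capturedYear = String(year)
}
print(capturedID, capturedYear)
```

Destructuring the tuple like this is convenient for a couple of fields, but it still depends on the order of the captures.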
Naming Captures
Depending on order isn't always a good idea for API design. If an update to the raw data introduces a new column anywhere but the end, the change ripples beyond the Regex itself. You'll need to revisit every captured index and make sure you're still picking the right item.
A better way is to give a reference name to each value that matches its column name. That'll make your code more resilient and more readable.
You can do this by using Reference. Add the following at the top of loadData():
let idFieldRef = Reference(Substring.self)
let titleFieldRef = Reference(Substring.self)
let yearFieldRef = Reference(Substring.self)
let premieredOnFieldRef = Reference(Substring.self)
let urlFieldRef = Reference(Substring.self)
let imdbRatingFieldRef = Reference(Substring.self)
You create a Reference object for each value field in the document, typed to match the data it holds. Since captures are of type Substring, all the references use that type. Later, you'll see how to convert the captured values to a different type.
Next, change the declaration of recordMatcher to:
let recordMatcher = Regex {
  Capture(as: idFieldRef) { idField }
  fieldSeparator
  Capture(as: titleFieldRef) { titleField }
  fieldSeparator
  Capture(as: yearFieldRef) { yearField }
  fieldSeparator
  Capture(as: premieredOnFieldRef) { premieredOnField }
  fieldSeparator
  Capture(as: urlFieldRef) { urlField }
  fieldSeparator
  Capture(as: imdbRatingFieldRef) { imdbRatingField }
  /\n/
}
Notice that each capture now receives its reference object through the as parameter.
Finally, change the contents of the loop that prints the values to:
print("Full Row: " + match.output.0)
print("ID: " + match[idFieldRef])
print("Title: " + match[titleFieldRef])
print("Year: " + match[yearFieldRef])
print("Premiered On: " + match[premieredOnFieldRef])
print("Image URL: " + match[urlFieldRef])
print("Rating: " + match[imdbRatingFieldRef])
print("---------------------------")
Notice how you are accessing the values with the reference objects. If any changes happen to the data, you'll just need to change the regex reading the values, and capture it with the proper references. The rest of your code won't need any updates.
Build and run to ensure everything is correct. You won't see any differences in the console log.
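For a compact view of the pattern, here is a sketch with two references on a hypothetical row. The row layout, the ";" separator and the reference names are assumptions for illustration only.

```swift
import RegexBuilder

let nameRef = Reference(Substring.self)
let yearRef = Reference(Substring.self)

// Hypothetical row layout with an assumed ";" separator.
let referencedMatcher = Regex {
  Capture(as: nameRef) {
    OneOrMore {
      NegativeLookahead { ";" }
      CharacterClass.any
    }
  }
  ";"
  Capture(as: yearRef) { OneOrMore(.any) }
}

var nameText = ""
var yearText = ""
if let match = "Thor: Love and Thunder;(2022)".wholeMatch(of: referencedMatcher) {
  // Lookups go through the references, not through tuple positions.
  nameText = String(match[nameRef])
  yearText = String(match[yearRef])
}
print(nameText, yearText)
```

If a column were added between the two captures, the lookups through nameRef and yearRef would stay the same. Only the regex would need to change.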
At this point, you're probably thinking that it would be nice to access the value like a property instead of through a subscript.
The good news is that you can! But you'll need to write the expression as a literal instead of using RegexBuilder. You'll see how it's done soon. :]
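As a quick preview, a regex literal with named capture groups exposes each group as a labeled element of the output tuple. The pattern and sample row below are assumptions for this sketch rather than the expression the tutorial builds later. The extended #/.../# delimiters are used so the literal compiles without extra compiler flags.

```swift
// Hypothetical row layout; the names id and title are chosen for this sketch.
let namedPattern = #/(?<id>tt\d+);(?<title>[^;]+)/#

var namedID = ""
var namedTitle = ""
if let match = "tt10648342;Thor: Love and Thunder".wholeMatch(of: namedPattern) {
  // Named groups become labeled tuple elements, readable as properties.
  namedID = String(match.output.id)
  namedTitle = String(match.output.title)
}
print(namedID, namedTitle)
```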
Transforming Data
One great feature of Swift Regex is the ability to transform captured data into different types.
Currently, you capture all the data as Substring. There are two fields that are easy to convert:
- The image URL, which doesn't need to stay as a string. It's more convenient to convert it to a URL.
- The rating, which works better as a number, so you'll convert it to a Float.
You'll change these now.
In ProductionsDataProvider.swift, change the declaration of urlFieldRef to:
let urlFieldRef = Reference(URL.self)
This changes the expected type to URL.
Then, change imdbRatingFieldRef to:
let imdbRatingFieldRef = Reference(Float.self)
Similarly, this changes the expected data type to Float.
Next, change the declaration of recordMatcher to the following:
let recordMatcher = Regex {
  Capture(as: idFieldRef) { idField }
  fieldSeparator
  Capture(as: titleFieldRef) { titleField }
  fieldSeparator
  Capture(as: yearFieldRef) { yearField }
  fieldSeparator
  Capture(as: premieredOnFieldRef) { premieredOnField }
  fieldSeparator
  TryCapture(as: urlFieldRef) { // 1
    urlField
  } transform: {
    URL(string: String($0))
  }
  fieldSeparator
  TryCapture(as: imdbRatingFieldRef) { // 2
    imdbRatingField
  } transform: {
    Float(String($0))
  }
  /\n/
}
Notice how the captures for urlField and imdbRatingField changed from Capture(as:_:) to TryCapture(as:_:transform:). TryCapture passes each captured value to the transform closure, which converts it to the desired type. If the closure returns nil, the capture fails and the regex doesn't match at that point. In this case, you converted urlField to a URL and imdbRatingField to a Float.
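The sketch below shows both outcomes of a transforming capture on a hypothetical "Rating: " line format. The format, the reference name and the sample strings are assumptions for illustration.

```swift
import RegexBuilder

let scoreRef = Reference(Float.self)

// Hypothetical "Rating: <value>" line format, assumed for this sketch.
let scoreMatcher = Regex {
  "Rating: "
  TryCapture(as: scoreRef) {
    OneOrMore(.any)
  } transform: {
    Float(String($0))  // Returns nil when the text isn't a number.
  }
}

// The transform succeeds, so the reference yields a Float.
let parsedScore = "Rating: 6.7".wholeMatch(of: scoreMatcher)?[scoreRef]

// The transform returns nil, so the whole match fails.
let rejectedLine = "Rating: n/a".wholeMatch(of: scoreMatcher)

print(parsedScore as Any)
print(rejectedLine == nil)
```

One consequence of this design is that rows with an unparsable value are skipped by the matcher and never reach your code with a placeholder value.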
Now that you have the proper types, it's time to populate the data source.
Replace the print statements inside the loop with:
let production = MarvelProductionItem(
  imdbID: String(match[idFieldRef]), // 1
  title: String(match[titleFieldRef]),
  productionYear: ProductionYearInfo.fromString(String(match[yearFieldRef])), // 2
  premieredOn: PremieredOnInfo.fromString(String(match[premieredOnFieldRef])), // 3
  posterURL: match[urlFieldRef], // 4
  imdbRating: match[imdbRatingFieldRef]) // 5
marvelProductions.append(production)
This creates an instance of MarvelProductionItem and appends it to the array, but there's a little more happening:
1. You convert the first two Substring parameters to strings.
2. ProductionYearInfo is an enum. You're creating an instance from the string value. You'll implement this part in the next section. For now, the value is always ProductionYearInfo.unknown.
3. PremieredOnInfo is also an enum you'll implement in the next section. The value for now is PremieredOnInfo.unknown.
4. The value provided for the poster is a URL and not a string.
5. The rating value is already a Float.
Build and run. You should see the movies and TV shows listed in the app.