Scanner Tutorial for macOS
Use NSScanner to analyze strings ranging from natural language to computer languages. In this NSScanner tutorial, you’ll learn how to extract information from emails. By Hai Nguyen.
Creating the Data Structure
Navigate to File\New\File… (or simply press Command+N). Select macOS > Source > Swift File and click Next. Set the file’s name to HardwarePost.swift, then click Create.
Open HardwarePost.swift and add the following structure:
struct HardwarePost {
  // MARK: Properties
  // Once extracted, each field's value is stored in these properties
  let email: String
  let sender: String
  let subject: String
  let date: String
  let organization: String
  let numberOfLines: Int
  let message: String
  let costs: [Double] // cost-related information
  let keywords: Set<String> // set of distinct keywords
}
This code defines the HardwarePost structure, which stores the parsed data. By default, Swift provides a memberwise initializer based on the structure's properties, but you’ll come back to this later to implement your own custom initializer.
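Until you write that custom initializer, you can create a value with the memberwise initializer Swift synthesizes, passing the arguments in declaration order. This is a minimal sketch; every field value below is a made-up placeholder, not data from the tutorial's sample emails.

```swift
// HardwarePost as defined above, repeated so the snippet compiles on its own.
struct HardwarePost {
  let email: String
  let sender: String
  let subject: String
  let date: String
  let organization: String
  let numberOfLines: Int
  let message: String
  let costs: [Double]       // cost-related information
  let keywords: Set<String> // set of distinct keywords
}

// Swift synthesizes
// init(email:sender:subject:date:organization:numberOfLines:message:costs:keywords:).
// All values here are hypothetical placeholders.
let post = HardwarePost(
  email: "someone@example.com",
  sender: "Some One",
  subject: "Re: Memory upgrade",
  date: "placeholder date",
  organization: "Example University",
  numberOfLines: 12,
  message: "Placeholder message body.",
  costs: [99.95, 120.0],
  keywords: ["memory", "upgrade"] // array literal converts to Set<String>
)

print(post.sender, post.numberOfLines)
```

If you later add a custom initializer inside the structure's body, Swift stops synthesizing the memberwise one; declaring the custom initializer in an extension keeps both available.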
Are you ready to see Scanner parsing in action? Let’s do this.
Creating the Data Parser
Navigate to File\New\File… (or simply press Command+N), select macOS > Source > Swift File and click Next. Set the file’s name to ParserEngine.swift, then click Create.
Open ParserEngine.swift and create the ParserEngine class by adding the following code:
final class ParserEngine {
}
Extracting Metadata Fields
Consider the following sample metadata segment:
Here’s where Scanner comes in and separates the fields from their values. The image below gives you a general visual representation of this structure.
Open ParserEngine.swift and implement this code inside the ParserEngine class:
// 1.
typealias Fields = (sender: String, email: String, subject: String, date: String, organization: String, lines: Int)

/// Returns a collection of predefined fields' extracted values
func fieldsByExtractingFrom(_ string: String) -> Fields {
  // 2.
  var (sender, email, subject, date, organization, lines) = ("", "", "", "", "", 0)

  // 3.
  let scanner = Scanner(string: string)
  scanner.charactersToBeSkipped = CharacterSet(charactersIn: " :\n")

  // 4.
  while !scanner.isAtEnd { // A
    let field = scanner.scanUpTo(":") ?? "" // B
    let info = scanner.scanUpTo("\n") ?? "" // C
    // D
    switch field {
    case "From": (email, sender) = fromInfoByExtractingFrom(info) // E
    case "Subject": subject = info
    case "Date": date = info
    case "Organization": organization = info
    case "Lines": lines = Int(info) ?? 0
    default: break
    }
  }

  return (sender, email, subject, date, organization, lines)
}
Don’t panic! The Xcode error about an unresolved identifier will go away in the next section.
Here’s what the above code does:
- Defines a Fields type alias for the tuple of parsed fields.
- Creates variables that will hold the returned values.
- Initializes a Scanner instance and sets its charactersToBeSkipped property to a space, a colon, and a linefeed, so the scanner skips colons as well as the spaces and linefeeds it would skip by default.
- Obtains the values of all the wanted fields by repeating the process below:
  - Uses while to loop through string’s content until it reaches the end.
  - Invokes one of the helper functions you created earlier to get the field’s title before the colon (:).
  - Continues scanning up to the end of the line, where the linefeed character \n is located, and assigns the result to info.
  - Uses switch to find the matching field and stores info in the proper variable.
  - Analyzes the From field by calling fromInfoByExtractingFrom(_:). You’ll implement that method after this section.
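You can try the scanning loop from fieldsByExtractingFrom(_:) in isolation. The sketch below is a standalone approximation, not the tutorial's code: it collects every field into a dictionary and calls Foundation's built-in scanUpToString(_:) (macOS 10.15 and later) in place of the scanUpTo(_:) helper. The metadataFields(in:) name and the sample text are hypothetical.

```swift
import Foundation

/// Splits "Field: value" lines into a dictionary, mirroring the loop in
/// fieldsByExtractingFrom(_:). Hypothetical helper for illustration.
func metadataFields(in text: String) -> [String: String] {
  let scanner = Scanner(string: text)
  // Skip spaces, colons, and linefeeds before each scan, as in the tutorial.
  scanner.charactersToBeSkipped = CharacterSet(charactersIn: " :\n")
  var fields: [String: String] = [:]
  while !scanner.isAtEnd {
    // Field name: everything up to the next colon.
    guard let field = scanner.scanUpToString(":") else { break }
    // Value: skip the colon and spaces, then read to the end of the line.
    let info = scanner.scanUpToString("\n") ?? ""
    fields[field] = info
  }
  return fields
}

// Hypothetical metadata block written for this sketch.
let sample = """
From: someone@example.com (Some One)
Subject: Re: SIMM prices
Organization: Example University
Lines: 38
"""

let fields = metadataFields(in: sample)
let lines = Int(fields["Lines"] ?? "") ?? 0 // same fallback as the tutorial
print(fields["Subject"] ?? "", lines)
```

Because each value is read up to the linefeed rather than up to the next colon, a subject such as Re: SIMM prices keeps its inner colon.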
Remember the tricky part of the From field? Hang tight, because you’ll need help from regular expressions to overcome this challenge.
At the end of ParserEngine.swift, add the following String extension:
private extension String {
  func isMatched(_ pattern: String) -> Bool {
    return NSPredicate(format: "SELF MATCHES %@", pattern).evaluate(with: self)
  }
}
This extension defines a private helper method that checks whether the string matches a given regular expression pattern.
It creates an NSPredicate object with the MATCHES operator and the pattern, then invokes evaluate(with:) to check whether the string satisfies it. MATCHES tests the pattern against the entire string rather than searching for a matching substring, which is why the patterns you’ll use next start with .* to allow any leading text.
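To see the whole-string behavior of MATCHES for yourself, here is a minimal sketch that swaps in NSRegularExpression for NSPredicate: it wraps the pattern as ^(?:pattern)$ so a match has to cover the entire string. The fullyMatches(_:) name is hypothetical; the two patterns are the ones fromInfoByExtractingFrom(_:) uses, and the sample strings come from that method's comments.

```swift
import Foundation

extension String {
  /// Returns true only when the whole string matches `pattern`, mirroring
  /// NSPredicate's MATCHES operator. Hypothetical helper built on NSRegularExpression.
  func fullyMatches(_ pattern: String) -> Bool {
    // Anchor the pattern at both ends so partial matches don't count.
    guard let regex = try? NSRegularExpression(pattern: "^(?:" + pattern + ")$") else {
      return false // treat an invalid pattern as "no match"
    }
    let wholeString = NSRange(location: 0, length: utf16.count)
    return regex.firstMatch(in: self, options: [], range: wholeString) != nil
  }
}

// The two From-field patterns from fromInfoByExtractingFrom(_:).
let emailThenName = ".*[\\s]*\\({1}(.*)"
let nameThenEmail = ".*[\\s]*<{1}(.*)"

let emailStyle = "barry.davis@hal9k.ann-arbor.mi.us (Barry Davis)"
let nameStyle = "Thomas Kephart <kephart@snowhite.eeap.cwru.edu>"

print(emailStyle.fullyMatches(emailThenName)) // the whole string fits email (name)
print(emailStyle.fullyMatches("\\("))         // false: "(" alone doesn't cover the string
```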
You can read more about NSPredicate in the official Apple documentation.
Now add the following method inside the ParserEngine implementation, just after the fieldsByExtractingFrom(_:) method:
fileprivate func fromInfoByExtractingFrom(_ string: String) -> (email: String, sender: String) {
  let scanner = Scanner(string: string)

  // 1.
  /*
   * ROGOSCHP@MAX.CC.Uregina.CA (Are we having Fun yet ???)
   * oelt0002@student.tc.umn.edu (Bret Oeltjen)
   * (iisi owner)
   * mbuntan@staff.tc.umn.edu ()
   * barry.davis@hal9k.ann-arbor.mi.us (Barry Davis)
   */
  if string.isMatched(".*[\\s]*\\({1}(.*)") { // A
    scanner.charactersToBeSkipped = CharacterSet(charactersIn: "() ") // B
    let email = scanner.scanUpTo("(") // C
    let sender = scanner.scanUpTo(")") // D
    return (email ?? "", sender ?? "")
  }

  // 2.
  /*
   * "Jonathan L. Hutchison" <jh6r+@andrew.cmu.edu>
   * <BR4416A@auvm.american.edu>
   * Thomas Kephart <kephart@snowhite.eeap.cwru.edu>
   * Alexander Samuel McDiarmid <am2o+@andrew.cmu.edu>
   */
  if string.isMatched(".*[\\s]*<{1}(.*)") {
    scanner.charactersToBeSkipped = CharacterSet(charactersIn: "<> ")
    let sender = scanner.scanUpTo("<")
    let email = scanner.scanUpTo(">")
    return (email ?? "", sender ?? "")
  }

  // 3.
  // Only an email address: return it as the email and "unknown" as the sender.
  return (string, "unknown")
}
After examining the 49 data sets, you end up with three cases to consider:
- email (name)
- name <email>
- email with no name
Here’s what the code does:
- Matches string against the first pattern, email (name). If there’s no match, continues to the next case.
  - The pattern looks for zero or more occurrences of any character, .*, followed by zero or more whitespace characters, [\\s]*, followed by exactly one opening parenthesis, \\({1}, and finally zero or more occurrences of any character, (.*).
  - Sets the Scanner object’s charactersToBeSkipped to include “(”, “)” and a space.
  - Scans up to ( to get the email value.
  - Scans up to ), which gives you the sender name. Together, these two scans pick up the email before the ( and the name between the parentheses.
- Checks whether the given string matches the second pattern, name <email>. The if body is practically the same as in the first case, except that it deals with angle brackets and scans the sender before the email.
- Finally, if neither pattern matches, the string holds only an email. You simply return the string as the email and “unknown” as the sender.
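For comparison, here is a hypothetical parseFromField(_:) that follows the same three cases with Foundation's scanUpToString(_:) and scanString(_:) (macOS 10.15 and later). It is a sketch, not the tutorial's implementation: it replaces the regular-expression checks with contains(_:), which is equivalent for single-line input because .* and (.*) accept anything around the bracket, and it trims surrounding spaces and quotation marks from each value.

```swift
import Foundation

/// Hypothetical variant of fromInfoByExtractingFrom(_:). Returns (email, sender).
func parseFromField(_ string: String) -> (email: String, sender: String) {
  // Characters trimmed from both ends of each extracted value.
  let trimSet = CharacterSet.whitespaces.union(CharacterSet(charactersIn: "\""))
  let scanner = Scanner(string: string)
  scanner.charactersToBeSkipped = .whitespaces

  // Case 1: email (name)
  if string.contains("(") {
    let email = scanner.scanUpToString("(") ?? ""  // text before "("
    _ = scanner.scanString("(")                    // consume "("
    let sender = scanner.scanUpToString(")") ?? "" // text inside the parentheses
    return (email.trimmingCharacters(in: trimSet), sender.trimmingCharacters(in: trimSet))
  }

  // Case 2: name <email>, so the sender comes first.
  if string.contains("<") {
    let sender = scanner.scanUpToString("<") ?? "" // text before "<"
    _ = scanner.scanString("<")                    // consume "<"
    let email = scanner.scanUpToString(">") ?? ""  // text inside the angle brackets
    return (email.trimmingCharacters(in: trimSet), sender.trimmingCharacters(in: trimSet))
  }

  // Case 3: only an email address, so the sender is unknown.
  return (string, "unknown")
}

let parsed = parseFromField("\"Jonathan L. Hutchison\" <jh6r+@andrew.cmu.edu>")
print(parsed.email, parsed.sender)
```

Consuming the opening bracket with scanString(_:) instead of skipping it means a value such as (iisi owner), which has a name but no email, yields an empty email and iisi owner as the sender.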
At this point, you can build the project. The previous compile error is gone.
NSDataDetector would be a better solution for known data types such as phone numbers, addresses, and email addresses. You can check out this blog about email validation with NSDataDetector.
You’ve been working with Scanner to analyze and retrieve information from a patterned string. In the next two sections, you’ll learn how to parse unstructured data.