Scanner Tutorial for macOS
Use NSScanner to analyze strings from natural form to computer languages. In this NSScanner tutorial, you’ll learn how to extract information from emails. By Hai Nguyen.
25 mins
Extracting Cost-Related Information
A good example of parsing unstructured data is determining whether an email's body contains cost-related information. To do this, you'll use Scanner to search for occurrences of the dollar character: $.
Still working on ParserEngine.swift, add the following implementation inside the ParserEngine class:
func costInfoByExtractingFrom(_ string: String) -> [Double] {
  // 1.
  var results = [Double]()
  // 2.
  let dollar = CharacterSet(charactersIn: "$")
  // 3.
  let scanner = Scanner(string: string)
  scanner.charactersToBeSkipped = dollar
  // 4.
  while !scanner.isAtEnd && scanner.scanUpToCharacters(from: dollar, into: nil) {
    // scanDouble() is your Optional-returning helper method;
    // keep the value only when a number follows the $.
    if let cost = scanner.scanDouble() {
      results.append(cost)
    }
  }
  return results
}
The code is fairly straightforward:
- Defines an empty array to store the cost values.
- Creates a CharacterSet object containing the $ character.
- Initializes a Scanner instance and configures it to skip the $ character.
- Loops through string's content and, whenever a $ is found, grabs the number after the $ with your helper method and appends it to the results array.
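If you'd like to experiment with this technique outside the app, here's a minimal standalone sketch. It assumes Swift 5.1 or later with the Optional-returning Scanner methods that Foundation added in macOS 10.15 (scanUpToString(_:), scanString(_:) and scanDouble()), so it doesn't depend on the tutorial's helper extension; the function name extractCosts(from:) is hypothetical. One design difference: instead of putting $ in charactersToBeSkipped, it consumes each $ explicitly. In the version above, Scanner skips a leading $ before scanUpToCharacters(from:into:) runs, so a price at the very start of the string can be passed over.

```swift
import Foundation

/// Hypothetical standalone version of costInfoByExtractingFrom(_:).
/// Assumes the Optional-returning Scanner API (Swift 5.1+, macOS 10.15+).
func extractCosts(from text: String) -> [Double] {
  let scanner = Scanner(string: text)
  // Skip nothing automatically; each "$" is consumed explicitly below.
  scanner.charactersToBeSkipped = nil
  var results = [Double]()
  while !scanner.isAtEnd {
    // Move past any text before the next "$" (nil if already on one).
    _ = scanner.scanUpToString("$")
    // Consume the "$"; if there isn't one, there are no more prices.
    guard scanner.scanString("$") != nil else { break }
    // Keep the number only if one immediately follows the "$".
    if let value = scanner.scanDouble() {
      results.append(value)
    }
  }
  return results
}

print(extractCosts(from: "The printer costs $350 and the disk $12.50 today"))
// [350.0, 12.5]
```

Because the loop either consumes a $ or breaks out, it always makes progress, and a stray $ with no number after it is simply ignored.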
Parsing the Message
Another example of parsing unstructured data is finding keywords in a given body of text. Your search strategy is to look at every word and check it against a set of keywords to see if it matches. As you scan the message, you'll use whitespace and newline characters to separate its words.
Add the following code at the end of the ParserEngine class:
// 1.
let keywords: Set<String> = ["apple", "macs", "software", "keyboard",
                             "printers", "printer", "video", "monitor",
                             "laser", "scanner", "disks", "cost", "price",
                             "floppy", "card", "phone"]

/// Returns the set of keywords found in the given string.
func keywordsByExtractingFrom(_ string: String) -> Set<String> {
  // 2.
  var results: Set<String> = []
  // 3.
  let scanner = Scanner(string: string)
  // 4.
  while !scanner.isAtEnd, let word = scanner.scanUpTo(" ")?.lowercased() {
    if keywords.contains(word) {
      results.insert(word)
    }
  }
  return results
}
Here's what this code does:
- Defines the keywords set that you'll match against.
- Creates a Set of String to store the found keywords.
- Initializes a Scanner instance. You'll use the default charactersToBeSkipped, which are the whitespace and newline characters.
- For every word found, checks whether it's one of the predefined keywords. If it is, inserts it into results.
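Because scanUpTo(" ") only stops at spaces, a word followed by a newline or punctuation (for example "printer,\nand") reaches keywords.contains(_:) as a single token and won't match. Here's a minimal standalone sketch of one way to tighten this, assuming Swift 5.1 or later and Foundation's Optional-returning scanUpToCharacters(from:). The function name extractKeywords(from:matching:) is hypothetical, and trimming punctuation is an added assumption rather than part of the tutorial's approach.

```swift
import Foundation

/// Hypothetical variant of keywordsByExtractingFrom(_:) that splits on any
/// whitespace or newline and ignores surrounding punctuation.
func extractKeywords(from text: String, matching keywords: Set<String>) -> Set<String> {
  let scanner = Scanner(string: text)
  // Skip runs of whitespace and newlines before each word (also Scanner's default).
  scanner.charactersToBeSkipped = .whitespacesAndNewlines
  var found = Set<String>()
  // scanUpToCharacters(from:) returns nil once nothing but separators remains.
  while let token = scanner.scanUpToCharacters(from: .whitespacesAndNewlines) {
    // Normalize "Printer," to "printer" before the lookup.
    let word = token.lowercased().trimmingCharacters(in: .punctuationCharacters)
    if keywords.contains(word) {
      found.insert(word)
    }
  }
  return found
}

let keywords: Set<String> = ["laser", "printer", "monitor", "cost"]
let text = "Looking for a cheap Laser printer,\nand a new monitor."
print(extractKeywords(from: text, matching: keywords).sorted())
// ["laser", "monitor", "printer"]
```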
There, you now have all of the methods you need to acquire the desired information. Time to put them to good use and create HardwarePost instances for the 49 data files.
Connecting the Parser With Data Samples
Open HardwarePost.swift and add this initializer to the HardwarePost structure:
init(fromData data: Data) {
  // 1.
  let parser = ParserEngine()
  // 2.
  let string = String(data: data, encoding: .utf8) ?? ""
  // 3.
  let scanner = Scanner(string: string)
  // 4.
  let metadata = scanner.scanUpTo("\n\n") ?? ""
  let (sender, email, subject, date, organization, lines) = parser.fieldsByExtractingFrom(metadata)
  // 5.
  self.sender = sender
  self.email = email
  self.subject = subject
  self.date = date
  self.organization = organization
  self.numberOfLines = lines
  // 6.
  let location = scanner.scanLocation // A
  let message = (string as NSString).substring(from: location) // B
  self.message = message.trimmingCharacters(in: .whitespacesAndNewlines) // C
  // 7.
  costs = parser.costInfoByExtractingFrom(message)
  keywords = parser.keywordsByExtractingFrom(message)
}
Here's how HardwarePost initializes its properties:
- Creates a ParserEngine object named parser.
- Converts data into a String, falling back to an empty string if the data isn't valid UTF-8.
- Initializes an instance of Scanner to parse the Metadata and Message segments, which are separated by "\n\n".
- Scans up to the first \n\n to grab the metadata string, then invokes the parser's fieldsByExtractingFrom(_:) method to obtain all of the metadata fields.
- Assigns the parsing results to the HardwarePost properties.
- Prepares the message content:
  - A: Gets the scanner's current reading position with scanLocation. This is an offset measured in UTF-16 code units, the units NSString uses, not in Swift Characters.
  - B: Assigns the part of string that scanner has yet to read to the new message variable. Bridging to NSString and calling substring(from:) interprets the offset in the same UTF-16 units, so the split lands in the right place even when the text contains non-ASCII characters.
  - C: Since message still begins with the \n\n where scanner left off, trims the surrounding whitespace and newlines and assigns the result to the HardwarePost instance's message property.
- Invokes the parser's methods with message to retrieve values for the costs and keywords properties.
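On Swift 5.1 and later (macOS 10.15+), Scanner also exposes currentIndex, a String.Index into scanner.string, which lets you slice the remaining text without working with the UTF-16 scanLocation offset at all. Here's a minimal sketch of just the metadata/message split under that assumption; splitPost(_:) is a hypothetical name, and it returns plain strings instead of calling ParserEngine.

```swift
import Foundation

/// Hypothetical helper: splits a raw post into its metadata header and
/// message body, which are separated by the first blank line ("\n\n").
func splitPost(_ text: String) -> (metadata: String, message: String) {
  let scanner = Scanner(string: text)
  // Don't skip anything, so the header keeps its exact text.
  scanner.charactersToBeSkipped = nil
  // Everything before the first "\n\n" (nil if the text starts with it).
  let metadata = scanner.scanUpToString("\n\n") ?? ""
  // currentIndex sits just before the "\n\n"; take the rest and trim it.
  let rest = scanner.string[scanner.currentIndex...]
  let message = rest.trimmingCharacters(in: .whitespacesAndNewlines)
  return (metadata, message)
}

let post = splitPost("From: jo@example.com\nSubject: Café ☕️\n\nIs the Laser printer worth $350?\n")
print(post.message) // Is the Laser printer worth $350?
```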
At this point, you can create HardwarePost instances directly from the files' data. You're only a few more steps away from displaying the final product!
Displaying Parsed Data
Open PostCell.swift and add the following method inside the PostCell class implementation:
func configure(_ post: HardwarePost) {
  senderLabel.stringValue = post.sender
  emailLabel.stringValue = post.email
  dateLabel.stringValue = post.date
  subjectLabel.stringValue = post.subject
  organizationLabel.stringValue = post.organization
  numberOfLinesLabel.stringValue = "\(post.numberOfLines)"
  // 1.
  costLabel.stringValue = post.costs.isEmpty ? "NO" :
    post.costs.map { "\($0)" }.joined(separator: "; ")
  // 2.
  keywordsLabel.stringValue = post.keywords.isEmpty ? "No keywords found" :
    post.keywords.joined(separator: "; ")
}
This code assigns the post values to the cell labels. costLabel and keywordsLabel require special treatment because they can be empty. Here's what happens:
- If the costs array is empty, it sets the costLabel string value to NO; otherwise, it concatenates the cost values with "; " as a separator.
- Similarly, it sets the keywordsLabel string value to No keywords found when post.keywords is empty; otherwise, it joins the keywords with "; ".
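Because the label text depends only on the post's values, you can check the formatting logic without AppKit. This minimal sketch pulls the two conditional expressions into hypothetical free functions, costText(for:) and keywordsText(for:). One assumption goes beyond the tutorial: it sorts the keywords, since a Set has no defined order and joined(separator:) would otherwise list them in an arbitrary order.

```swift
/// Hypothetical helper mirroring the costLabel expression in configure(_:).
func costText(for costs: [Double]) -> String {
  return costs.isEmpty ? "NO" : costs.map { "\($0)" }.joined(separator: "; ")
}

/// Hypothetical helper mirroring the keywordsLabel expression,
/// sorted so the label text is stable between runs.
func keywordsText(for keywords: Set<String>) -> String {
  return keywords.isEmpty ? "No keywords found" : keywords.sorted().joined(separator: "; ")
}

print(costText(for: [350.0, 12.5]))            // 350.0; 12.5
print(keywordsText(for: ["printer", "laser"])) // laser; printer
```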
You're almost there! Open DataSource.swift. Delete the DataSource initializer init() and add the following code to the class:
let hardwarePosts: [HardwarePost] // 1.

override init() {
  self.hardwarePosts = Bundle.main // 2.
    .urls(forResourcesWithExtension: nil, subdirectory: "comp.sys.mac.hardware")? // 3.
    .compactMap { try? Data(contentsOf: $0) } // 4.
    .map(HardwarePost.init(fromData:)) ?? [] // 5.
  super.init()
}
This is what the code does:
- Stores the HardwarePost instances.
- Obtains a reference to the application's main Bundle.
- Retrieves the URLs of the sample files inside the comp.sys.mac.hardware directory. This returns nil if the directory can't be found.
- Reads each file's contents with Data's throwing initializer init(contentsOf:). try? turns a read failure into nil, and compactMap(_:) drops those nil values, so you get back an array containing only the files that were read successfully.
- Finally, transforms each Data value into a HardwarePost object and assigns the resulting array to the DataSource hardwarePosts property, falling back to an empty array if no URLs were found.
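A script has no application bundle, so to try the read-and-convert pipeline on its own you can point it at a plain directory instead. This minimal sketch swaps Bundle.main.urls(forResourcesWithExtension:subdirectory:) for FileManager's contentsOfDirectory(at:includingPropertiesForKeys:) and returns strings instead of HardwarePost values; loadTexts(in:) is a hypothetical name. It uses the same try? pattern, with compactMap(_:) (the Swift 4.1+ name for the nil-dropping form of flatMap(_:)) discarding any file that can't be read.

```swift
import Foundation

/// Hypothetical standalone version of the DataSource pipeline:
/// list a directory, read each file with try?, and drop failures.
func loadTexts(in directory: URL) -> [String] {
  // [] if the directory can't be listed, like the ?? [] fallback in DataSource.
  let urls = (try? FileManager.default.contentsOfDirectory(
    at: directory, includingPropertiesForKeys: nil)) ?? []
  return urls
    // Directory listings have no guaranteed order, so sort by file name.
    .sorted { $0.lastPathComponent < $1.lastPathComponent }
    // try? turns a read error into nil; compactMap(_:) discards those nils.
    .compactMap { try? Data(contentsOf: $0) }
    // Stand-in for HardwarePost.init(fromData:): decode the bytes as UTF-8.
    .map { String(decoding: $0, as: UTF8.self) }
}
```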
Now you need to set up the table view's data source and delegate so that your app can show your hard work.
Open DataSource.swift. Find numberOfRows(in:) and replace it with the following:
func numberOfRows(in tableView: NSTableView) -> Int {
  return hardwarePosts.count
}
numberOfRows(in:) is part of the table view's data source protocol, NSTableViewDataSource; it tells the table view how many rows to display.
Next, find tableView(_:viewFor:row:) and replace the comment that says //TODO: Set up cell view with the code below:
cell.configure(hardwarePosts[row])
The table view invokes its delegate's tableView(_:viewFor:row:) method to set up every individual cell. This code gets the post for that row and invokes PostCell's configure(_:) method to display its data.
Now you need to show the post in the text view when you select a post in the table view. Replace the initial implementation of tableViewSelectionDidChange(_:) with the following:
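func tableViewSelectionDidChange(_ notification: Notification) {
  // selectedRow is -1 when the selection is cleared, so bail out in that case too.
  guard let tableView = notification.object as? NSTableView,
    tableView.selectedRow >= 0 else {
    return
  }
  textView.string = hardwarePosts[tableView.selectedRow].message
}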
tableViewSelectionDidChange(_:) is called when the table view's selection has changed. When that happens, this code gets the hardware post for the selected row and displays its message in the text view.
Build and run your project.
All of the parsed fields are now neatly displayed in the table. Select a cell on the left, and you'll see the corresponding message on the right. Good job!