How to Make a Narrated Book Using AVSpeechSynthesizer in iOS 7

Learn how to make Siri read you a bedtime story by using one of iOS 7’s newest features: AVSpeechSynthesizer.

To Speak or Not to Speak!

That is the question.

Open RWTPageViewController.m and underneath #import "RWTPage.h", add the following line:

@import AVFoundation;

iOS speech support is in the AVFoundation framework so you must import the AVFoundation module.

Note: The @import will both import and link the AVFoundation framework. To learn more about @import as well as some other new Objective-C language features in iOS 7, check out the article: What’s New in Objective-C and Foundation in iOS 7.
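
To see what the module syntax buys you, here’s the pre-modules equivalent for contrast. This is illustration only; the tutorial code uses the @import line you just added:

// Pre-modules style: import the umbrella header, and separately link
// AVFoundation.framework in your target's Build Phases.
#import <AVFoundation/AVFoundation.h>

// Modules style (new in iOS 7): one line both imports and links the framework.
@import AVFoundation;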

Add the following line just below the declaration of the currentPageIndex property in the RWTPageViewController class extension:

@property (nonatomic, strong) AVSpeechSynthesizer *synthesizer;

You’ve just added the speech synthesizer that will speak the words in each page.

Think of the AVSpeechSynthesizer you just added to your view controller as the person doing the speaking. AVSpeechUtterance instances represent the chunks of text the synthesizer speaks.

Note: An AVSpeechUtterance can be a single word like “Whisky” or an entire sentence, such as, “Whisky, frisky, hippidity hop.”
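
To make the relationship concrete, here’s a minimal standalone sketch (not part of the tutorial project) of one synthesizer speaking two utterances. If the synthesizer is already speaking, speakUtterance: simply queues the new utterance behind the current one:

AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];

// A single word...
AVSpeechUtterance *word = [[AVSpeechUtterance alloc] initWithString:@"Whisky,"];
// ...or an entire sentence.
AVSpeechUtterance *sentence =
  [[AVSpeechUtterance alloc] initWithString:@"Whisky, frisky, hippidity hop."];

[synthesizer speakUtterance:word];
// Queued: this speaks only after the first utterance finishes.
[synthesizer speakUtterance:sentence];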

Add the following code just before the @end at the bottom of RWTPageViewController.m:

#pragma mark - Speech Management

- (void)speakNextUtterance
{
  AVSpeechUtterance *nextUtterance = [[AVSpeechUtterance alloc]
                                       initWithString:[self currentPage].displayText];
  [self.synthesizer speakUtterance:nextUtterance];
}

You’ve created an utterance to speak, and told the synthesizer to speak it.

Now add the following code just below speakNextUtterance:

- (void)startSpeaking
{
  if (!self.synthesizer) {
    self.synthesizer = [[AVSpeechSynthesizer alloc] init];
  }

  [self speakNextUtterance];
}

This code lazily initializes the synthesizer property the first time it’s needed, then invokes speakNextUtterance to start speaking.

Add the following line of code to the very end of viewDidLoad, gotoNextPage, and gotoPreviousPage:

  [self startSpeaking];

Your additions ensure that speech starts when the book loads, as well as when the user advances to the next or previous page.
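
For example, gotoNextPage would end up looking something like this. This is a hypothetical sketch; the starter project’s existing page-flipping code is elided as a comment:

- (void)gotoNextPage
{
  // ... the starter project's existing code that advances to the next page ...

  [self startSpeaking];
}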

Build and run and listen to the dulcet tones of AVSpeechSynthesizer.

Note: If you don’t hear anything, check the volume on your Mac or iOS device (wherever you’re running the app). You might need to swipe between pages to start speech again if you missed it.

Also note: if you’re running this project in the simulator, be prepared for your console to fill with cryptic error messages. This appears to happen only in the simulator; the messages won’t print when you run on a device.

Once you’ve confirmed that you can hear speech, try building and running again, but this time, swipe from right-to-left before the first page finishes talking. What do you notice?

The synthesizer will start speaking the second page’s text once it’s completed the first page. That’s not what users will expect; they’ll expect that swiping to another page will stop speech for the current page and start it for the next page. This glitch isn’t so worrisome for short pages like nursery rhymes, but imagine what could happen with very long pages…
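
For the record, AVSpeechSynthesizer does offer a way to cut speech short: stopSpeakingAtBoundary: halts the synthesizer and discards any queued utterances. A quick sketch of the two options (you’ll build proper per-page speech control over the rest of this tutorial):

// Stop right away, even mid-word, and discard any queued utterances...
[self.synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];

// ...or let the synthesizer finish the current word before stopping.
[self.synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryWord];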

Breaking Speech into Parts

One reliable principle of software engineering is to keep data and code separate. It makes your code easier to test and easier to run on different input data. Moreover, keeping data out of code allows you to download new data at runtime. For example, wouldn’t it be grand if your book app could download new books at runtime?

You’re currently using a simple test book, RWTBook.testBook, to exercise your code. You’re about to change that by storing books in, and reading them from, files in Apple’s plist (XML) format.

Open Supporting Files\WhirlySquirrelly.plist and you’ll see the book’s data in Xcode’s property list editor. You can also see the raw data structure by right-clicking Supporting Files\WhirlySquirrelly.plist and selecting Open As\Source Code, which reveals something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>bookPages</key>
  <array>
    <!-- First page -->
    <dict>
      <key>backgroundImage</key>
      <string>PageBackgroundImage.jpg</string>
      <key>utterances</key>
      <array>
        <dict>
          <key>utteranceProperties</key>
          <dict>
            <key>pitchMultiplier</key>
            <real>1</real>
            <key>rate</key>
            <real>1.2</real>
          </dict>
          <key>utteranceString</key>
          <string>Whisky,</string>
        </dict>
        ...
      </array>
    </dict>
    <!-- Second page -->
    <dict>
      <key>backgroundImage</key>
      <string>PageBackgroundImage.jpg</string>
      <key>utterances</key>
      <array>
        <dict>
          <key>utteranceProperties</key>
          <dict>
            <key>pitchMultiplier</key>
            <real>1.2</real>
            <key>rate</key>
            <real>1.3</real>
          </dict>
          <key>utteranceString</key>
          <string>Whirly,</string>
        </dict>
        ...
      </array>
    </dict>
  </array>
</dict>
</plist>

It’s nice to have a high-level view of your data structures. The data structure in Supporting Files\WhirlySquirrelly.plist is outlined as follows (where {} indicates a dictionary and [] an array):

Book {
  bookPages => [
    {FirstPage
      backgroundImage => "Name of background image file",
      utterances => [
        { utteranceString     => "what to say first",
          utteranceProperties => { how to say it }
        },
        { utteranceString     => "what to say next",
          utteranceProperties => { how to say it }
        }
      ]
    },
    {SecondPage
      backgroundImage => "Name of background image file",
      utterances => [
        { utteranceString     => "what to say last",
          utteranceProperties => { how to say it }
        }
      ]
    }
  ]
}

Behold the power of ASCII art! :]

Supporting Files\WhirlySquirrelly.plist breaks the text up into one utterance per word. The virtue of doing this is that you can control the speech properties for each word: pitch (high voice or low voice) and rate (slow or fast talking).

The reason your synthesizer sounds so mechanical, like a robot from a cheesy 1950s sci-fi movie, is that its diction is too uniform. To make your synthesizer speak more like a human, you’ll need to control the pitch and meter, which will vary its diction.
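
As a rough sketch of the idea, using the synthesizer property you added earlier (the values here are illustrative, not the ones in the plist), varying rate and pitchMultiplier from word to word is what breaks up the monotone:

// Each word gets its own utterance, so each word can sound different.
AVSpeechUtterance *whisky = [[AVSpeechUtterance alloc] initWithString:@"Whisky,"];
whisky.pitchMultiplier = 1.0f;  // normal pitch
whisky.rate = 0.4f;             // on the slow side

AVSpeechUtterance *frisky = [[AVSpeechUtterance alloc] initWithString:@"frisky,"];
frisky.pitchMultiplier = 1.3f;  // noticeably higher
frisky.rate = 0.5f;             // a touch faster

[self.synthesizer speakUtterance:whisky];
[self.synthesizer speakUtterance:frisky];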

Parsing Power

You’ll parse Supporting Files\WhirlySquirrelly.plist into an RWTBook object. Open RWTBook.h and add the following line right after the declaration of bookWithPages:

  + (instancetype)bookWithContentsOfFile:(NSString*)path;

This method will read a file like Supporting Files\WhirlySquirrelly.plist, then initialize and return an RWTBook instance that holds the file’s data.

Open RWTBook.m and add the following code right below #import "RWTPage.h":

#pragma mark - External Constants

NSString* const RWTBookAttributesKeyBookPages = @"bookPages";

This is the key you’ll use to retrieve the book’s pages from files like Supporting Files\WhirlySquirrelly.plist.

With RWTBook.m still open, add the following code at the bottom of the file, just before the @end.


#pragma mark - Public

+ (instancetype)bookWithContentsOfFile:(NSString*)path
{
  // 1
  NSDictionary *bookAttributes = [NSDictionary dictionaryWithContentsOfFile:path];
  if (!bookAttributes) {
    return nil;
  }

  // 2
  NSMutableArray *pages = [NSMutableArray arrayWithCapacity:2];
  for (NSDictionary *pageAttributes in [bookAttributes objectForKey:RWTBookAttributesKeyBookPages]) {
    RWTPage *page = [RWTPage pageWithAttributes:pageAttributes];
    if (page) {
      [pages addObject:page];
    }
  }

  // 3
  return [self bookWithPages:pages];
}

Here’s what your new code does:

  1. Reads and initializes a dictionary of book attributes from the given path. This is where your code reads Supporting Files\WhirlySquirrelly.plist.
  2. Creates a new RWTPage object for each dictionary of page attributes under the book attributes.
  3. Returns a new book using the handy bookWithPages: provided in the starter project.

Open RWTPageViewController.m and navigate to viewDidLoad. Replace the line

  [self setupBook:[RWTBook testBook]];

with

  NSString *path = [[NSBundle mainBundle] pathForResource:@"WhirlySquirrelly" ofType:@"plist"];
  [self setupBook:[RWTBook bookWithContentsOfFile:path]];

Your new code locates WhirlySquirrelly.plist and creates a book from it by using bookWithContentsOfFile:.

You’re almost ready to run your new code. Open RWTPage.m and add the following code below #import "RWTPage.h":

@import AVFoundation;

Now you can reference AVSpeechUtterance in this file.

Add the following constant definitions just below the definition of RWTPageAttributesKeyBackgroundImage:

NSString* const RWTUtteranceAttributesKeyUtteranceString = @"utteranceString";
NSString* const RWTUtteranceAttributesKeyUtteranceProperties = @"utteranceProperties";

These are the keys you’ll use to parse out individual AVSpeechUtterance attributes from a plist.

Replace pageWithAttributes: with the following:

+ (instancetype)pageWithAttributes:(NSDictionary*)attributes
{
  RWTPage *page = [[RWTPage alloc] init];

  if ([[attributes objectForKey:RWTPageAttributesKeyUtterances] isKindOfClass:[NSString class]]) {
    // 1
    page.displayText = [attributes objectForKey:RWTPageAttributesKeyUtterances];
    page.backgroundImage = [attributes objectForKey:RWTPageAttributesKeyBackgroundImage];
  } else if ([[attributes objectForKey:RWTPageAttributesKeyUtterances] isKindOfClass:[NSArray class]]) {
    // 2
    NSMutableArray *utterances = [NSMutableArray arrayWithCapacity:31];
    NSMutableString *displayText = [NSMutableString stringWithCapacity:101];

    // 3
    for (NSDictionary *utteranceAttributes in [attributes objectForKey:RWTPageAttributesKeyUtterances]) {
      // 4
      NSString *utteranceString =
                 [utteranceAttributes objectForKey:RWTUtteranceAttributesKeyUtteranceString];
      NSDictionary *utteranceProperties =
                     [utteranceAttributes objectForKey:RWTUtteranceAttributesKeyUtteranceProperties];

      // 5
      AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:utteranceString];
      // 6
      [utterance setValuesForKeysWithDictionary:utteranceProperties];

      if (utterance) {
        // 7
        [utterances addObject:utterance];
        [displayText appendString:utteranceString];
      }
    }

    // 8
    page.displayText = displayText;
    page.backgroundImage = [UIImage imageNamed:[attributes objectForKey:RWTPageAttributesKeyBackgroundImage]];
  }

  return page;
}

Here’s what your new code does:

  1. Handles the case like RWTBook.testBook where a page’s utterances are a single NSString. Sets the display text and background image.
  2. Handles the case like Supporting Files\WhirlySquirrelly.plist where a page’s utterances are an NSArray of NSDictionary. Accumulates all the utterances and display text.
  3. Loops over the individual utterances for the page.
  4. Grabs the individual utterance’s utteranceString and utteranceProperties.
  5. Creates a new AVSpeechUtterance to speak utteranceString.
  6. Sets the new utterance’s properties using Key-Value Coding (KVC), as the sketch after this list shows. Although not openly documented by Apple, AVSpeechUtterance responds to the selector setValuesForKeysWithDictionary:, so you can use it to set all the utteranceProperties in one fell swoop. Conveniently, this means you can add new utterance properties to your plist without needing to write new setter invocation code; setValuesForKeysWithDictionary: will handle the new properties automatically. That is, of course, provided the corresponding properties exist on AVSpeechUtterance and are writable.
  7. Accumulate the utterance and display text.
  8. Set the display text and background image.
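
Here’s a minimal sketch of the KVC trick in isolation, where the dictionary literal stands in for an utteranceProperties dictionary parsed from the plist:

NSDictionary *utteranceProperties = @{ @"pitchMultiplier": @1.2f,
                                       @"rate":            @1.3f };

AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:@"Whirly,"];
// One call sets every property named in the dictionary, equivalent to
// assigning utterance.pitchMultiplier and utterance.rate directly.
[utterance setValuesForKeysWithDictionary:utteranceProperties];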

Build and run and listen to the speech.

You’ve constructed each RWTPage.displayText from the combined utteranceStrings for the page in the plist, so your page view displays the entire page’s text. However, remember that RWTPageViewController.speakNextUtterance creates a single AVSpeechUtterance for the entire RWTPage.displayText. The result is that it overlooks your carefully parsed utterance properties.

In order to modify how each utterance is spoken, you need to synthesize each page’s text as individual utterances. If only there were some way to observe and control how and when AVSpeechSynthesizer speaks. Hmmm…