How to Play, Record and Merge Videos in iOS and Swift

Learn the basics of working with videos on iOS with AVFoundation in this tutorial. You’ll play, record and even do some light video editing! By Owen L Brown.


Merging: Step 1

In this step, you’ll merge the videos into one long video.

Add the following code to merge(_:):

guard
  let firstAsset = firstAsset,
  let secondAsset = secondAsset
  else { return }

activityMonitor.startAnimating()

// 1
let mixComposition = AVMutableComposition()

// 2
guard
  let firstTrack = mixComposition.addMutableTrack(
    withMediaType: .video,
    preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
  else { return }
    
// 3
do {
  try firstTrack.insertTimeRange(
    CMTimeRangeMake(start: .zero, duration: firstAsset.duration),
    of: firstAsset.tracks(withMediaType: .video)[0],
    at: .zero)
} catch {
  print("Failed to load first track")
  return
}

// 4
guard
  let secondTrack = mixComposition.addMutableTrack(
    withMediaType: .video,
    preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
  else { return }
    
do {
  try secondTrack.insertTimeRange(
    CMTimeRangeMake(start: .zero, duration: secondAsset.duration),
    of: secondAsset.tracks(withMediaType: .video)[0],
    at: firstAsset.duration)
} catch {
  print("Failed to load second track")
  return
}

// 5
// TODO: PASTE CODE A

In the code above:

  1. You create an AVMutableComposition to hold your video and audio tracks.
  2. Next, you create an AVMutableCompositionTrack for the video and add it to your AVMutableComposition.
  3. Then, you insert the video from the first video asset to this track.

    Notice that insertTimeRange(_:of:at:) allows you to insert part of a video, rather than the whole thing, into your main composition. This way, you can trim the video to a time range of your choice, as the sketch after this list shows.

    In this instance, you want to insert the whole video, so you create a time range from CMTime.zero to your video asset’s duration.

  4. Next, you do the same thing with the second video asset.

    Notice how the code inserts firstAsset at time .zero, and it inserts secondAsset at the end of the first video. That’s because this tutorial assumes you want your video assets one after the other, but you can also overlap the assets by playing with the time ranges.

  5. // TODO: PASTE CODE A is a placeholder. You’ll replace this line with the code in the next section.
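
As an aside, here’s what trimming might look like. This is a minimal sketch, not part of the tutorial’s merge(_:); it assumes firstAsset and firstTrack from the code above and picks an arbitrary four-second range starting at the two-second mark:

// Hypothetical trim: insert only seconds 2 through 6 of firstAsset.
let trimStart = CMTime(seconds: 2, preferredTimescale: 600)
let trimDuration = CMTime(seconds: 4, preferredTimescale: 600)
do {
  try firstTrack.insertTimeRange(
    CMTimeRange(start: trimStart, duration: trimDuration),
    of: firstAsset.tracks(withMediaType: .video)[0],
    at: .zero)
} catch {
  print("Failed to insert trimmed range")
}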

In this step, you set up two separate AVMutableCompositionTrack instances. Now, you need to apply an AVMutableVideoCompositionLayerInstruction to each track to make some editing possible.

Merging the Videos: Step 2

Next up is to add instructions to the composition to tell it how you want the assets to be merged.

Next, replace // TODO: PASTE CODE A in merge(_:) with the following code:

// 6
let mainInstruction = AVMutableVideoCompositionInstruction()
mainInstruction.timeRange = CMTimeRangeMake(
  start: .zero,
  duration: CMTimeAdd(firstAsset.duration, secondAsset.duration))

// 7
let firstInstruction = AVMutableVideoCompositionLayerInstruction(
  assetTrack: firstTrack)
firstInstruction.setOpacity(0.0, at: firstAsset.duration)
let secondInstruction = AVMutableVideoCompositionLayerInstruction(
  assetTrack: secondTrack)

// 8
mainInstruction.layerInstructions = [firstInstruction, secondInstruction]
let mainComposition = AVMutableVideoComposition()
mainComposition.instructions = [mainInstruction]
mainComposition.frameDuration = CMTimeMake(value: 1, timescale: 30)
mainComposition.renderSize = CGSize(
  width: UIScreen.main.bounds.width,
  height: UIScreen.main.bounds.height)

// 9                          
// TODO: PASTE CODE B

Here’s what’s happening in this code:

  1. First, you set up mainInstruction to wrap the entire set of instructions. Notice that the total time here is the sum of the first asset’s duration and the second asset’s duration.
  2. Next, you set up two instructions, one for each asset. The instruction for the first video needs one extra addition: You set its opacity to 0 at the end so it becomes invisible when the second video starts. A cross-fade variation appears after this list.
  3. Now that you have your AVMutableVideoCompositionLayerInstruction instances for the first and second tracks, you simply add them to mainInstruction. Next, you add mainInstruction to the instructions property of an instance of AVMutableVideoComposition. You also set the frame rate for the composition to 30 frames per second and the render size to the size of the device’s screen.
  4. // TODO: PASTE CODE B is a placeholder. You’ll replace this line with the code in the next section.
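
If you’d rather fade between the clips than cut, you could swap the hard opacity change for a ramp. This is a hedged sketch, not the tutorial’s approach; for a true cross-fade you’d also insert secondTrack slightly before firstAsset ends so the clips overlap:

// Hypothetical cross-fade: fade firstTrack out over its final second.
let fadeDuration = CMTime(seconds: 1, preferredTimescale: 600)
let fadeRange = CMTimeRange(
  start: CMTimeSubtract(firstAsset.duration, fadeDuration),
  duration: fadeDuration)
firstInstruction.setOpacityRamp(
  fromStartOpacity: 1.0,
  toEndOpacity: 0.0,
  timeRange: fadeRange)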

OK, so you’ve now merged your two video files. It’s time to spice them up with some sound!

Merging the Audio: Step 3

To give your clip some musical flair, replace // TODO: PASTE CODE B in merge(_:) with the following code:

// 10
if let loadedAudioAsset = audioAsset {
  let audioTrack = mixComposition.addMutableTrack(
    withMediaType: .audio,
    preferredTrackID: 0)
  do {
    try audioTrack?.insertTimeRange(
      CMTimeRangeMake(
        start: .zero,
        duration: CMTimeAdd(
          firstAsset.duration,
          secondAsset.duration)),
      of: loadedAudioAsset.tracks(withMediaType: .audio)[0],
      at: .zero)
  } catch {
    print("Failed to load Audio track")
  }
}

// 11
guard
  let documentDirectory = FileManager.default.urls(
    for: .documentDirectory,
    in: .userDomainMask).first
  else { return }
let dateFormatter = DateFormatter()
dateFormatter.dateStyle = .long
dateFormatter.timeStyle = .short
let date = dateFormatter.string(from: Date())
let url = documentDirectory.appendingPathComponent("mergeVideo-\(date).mov")

// 12
guard let exporter = AVAssetExportSession(
  asset: mixComposition,
  presetName: AVAssetExportPresetHighestQuality)
  else { return }
exporter.outputURL = url
exporter.outputFileType = AVFileType.mov
exporter.shouldOptimizeForNetworkUse = true
exporter.videoComposition = mainComposition

// 13
exporter.exportAsynchronously {
  DispatchQueue.main.async {
    self.exportDidFinish(exporter)
  }
}

Here’s what the code above does:

  1. Similarly to video tracks, you create a new track for your audio and add it to the main composition. You set the audio time range to the sum of the duration of the first and second videos, because that will be the complete length of your video.
  2. Before you can save the final video, you need a path for the saved file. Create a unique file name based upon the current date and time that points to a file in the documents folder.
  3. Render and export the merged video. To do this, you create an AVAssetExportSession that transcodes the contents of the composition to create an output of the form described by a specified export preset. Because you’ve already configured AVMutableVideoComposition, all you need to do is assign it to your exporter.
  4. After you’ve initialized an export session with the asset that contains the source media, the export presetName and outputFileType, you run the export by invoking exportAsynchronously().

    Because the code performs the export asynchronously, this method returns immediately. The code calls the completion handler you supply to exportAsynchronously() whether the export completes, fails or is canceled.

    Upon completion, the exporter’s status property indicates whether the export has completed successfully. If it fails, the value of the exporter’s error property supplies additional information about the reason for the failure.
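
To make that status check concrete, a completion handler could branch on the exporter like the following hedged sketch. The tutorial’s actual exportDidFinish(_:) also saves the result to the photo library, which is omitted here:

// Hypothetical status check inside an export completion handler.
switch exporter.status {
case .completed:
  print("Export finished: \(exporter.outputURL?.path ?? "unknown URL")")
case .failed:
  print("Export failed: \(exporter.error?.localizedDescription ?? "unknown error")")
case .cancelled:
  print("Export canceled")
default:
  break
}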

AVComposition combines media data from multiple file-based sources. At its top level, AVComposition is a collection of tracks, each presenting media of a specific type such as audio or video. An instance of AVCompositionTrack represents a single track.

Similarly, AVMutableComposition and AVMutableCompositionTrack also present a higher-level interface for constructing compositions. These objects offer insertion, removal and scaling operations that you’ve seen before and that will come up again.
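
To see that track structure for yourself, you could inspect the composition you just built. A small sketch, assuming mixComposition from the code above:

// List each track's ID, media type and duration.
for track in mixComposition.tracks {
  print("Track \(track.trackID): \(track.mediaType.rawValue), "
    + "duration: \(track.timeRange.duration.seconds)s")
}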

Finally, build and run.

Select two videos and an audio file and merge the selected files. You’ll see a Video Saved message, which indicates that the merge was successful. At this point, your new video will be present in the photo album.

[Image: Pop-up after merge with the message: Video Saved]

Go to the photo album, or browse using the Select and Play Video screen in the app, and you might notice some orientation issues in the merged video: A portrait video may show up in landscape mode, and some videos may be upside down.

[Image: Merged video showing orientation issues]

This is due to the default AVAsset orientation. All movie and image files recorded using the default iPhone camera app have the video frame set to landscape, so the iPhone saves the media in landscape mode and stores the intended orientation in the video track’s metadata. You’ll fix these problems next.
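
As a preview, that orientation metadata lives in each video track’s preferredTransform. Here’s a hedged sketch of how you might detect a portrait recording; the full correction comes in the next part of the tutorial:

// Hypothetical check: read the first video track's orientation transform.
if let videoTrack = firstAsset.tracks(withMediaType: .video).first {
  let t = videoTrack.preferredTransform
  // A 90-degree rotation (portrait capture) zeroes the a and d components.
  let isPortrait = t.a == 0 && t.d == 0 && abs(t.b) == 1 && abs(t.c) == 1
  print("First track recorded in portrait: \(isPortrait)")
}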