
April 16, 2018

Apple’s Latest Machine Learning Journal Entry Focuses on ‘Hey Siri’ Trigger Phrase

by John_A

Apple’s latest entry in its online Machine Learning Journal focuses on the personalization process users go through when setting up “Hey Siri” on their iOS devices. Across all Apple products, “Hey Siri” invokes the company’s AI assistant and can be followed by requests like “How is the weather?” or “Message Dad I’m on my way.”

“Hey Siri” was introduced in iOS 8 on the iPhone 6, and at the time it could only be used while the iPhone was charging. Later devices added a low-power, always-on processor that lets the iPhone and iPad listen for “Hey Siri” continuously, so the trigger phrase now works at any time.

In the new Machine Learning Journal entry, Apple’s Siri team breaks down its technical approach to the development of a “speaker recognition system.” The team created deep neural networks and “set the stage for improvements” in future iterations of Siri, all motivated by the goal of creating “on-device personalization” for users.
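Apple doesn’t publish its model code, but the core step it describes, using a deep neural network to turn an utterance’s acoustic features into a fixed-length voice embedding, can be sketched in a few lines of Python. Everything below is a toy stand-in: the NumPy implementation, layer sizes, and mean-pooling over frames are illustrative assumptions, not Apple’s architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class SpeakerEmbedder:
    """Toy feed-forward network mapping acoustic features to a
    fixed-length speaker vector. Sizes are illustrative, not Apple's."""

    def __init__(self, in_dim=40, hidden=128, emb_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, emb_dim)) * 0.1
        self.b2 = np.zeros(emb_dim)

    def embed(self, frames):
        """frames: (num_frames, in_dim) array of acoustic features,
        e.g. log filter-bank energies for one 'Hey Siri' utterance."""
        h = relu(frames @ self.w1 + self.b1)   # per-frame hidden layer
        pooled = h.mean(axis=0)                # average over time
        vec = pooled @ self.w2 + self.b2       # project to embedding
        return vec / np.linalg.norm(vec)       # unit-length speaker vector
```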

Apple’s team says the phrase “Hey Siri” was chosen because it sounds “natural,” and describes three scenarios in which unintended activations prove troubling: “when the primary user says a similar phrase,” “when other users say ‘Hey Siri,’” and “when other users say a similar phrase.” According to the team, the last scenario is “the most annoying false activation of all.”

To lessen these accidental activations, Apple leverages techniques from the field of speaker recognition. Importantly, the Siri team says it is focused on “who is speaking” rather than on “what was spoken.”

The overall goal of speaker recognition (SR) is to ascertain the identity of a person using his or her voice. We are interested in “who is speaking,” as opposed to the problem of speech recognition, which aims to ascertain “what was spoken.” SR performed using a phrase known a priori, such as “Hey Siri,” is often referred to as text-dependent SR; otherwise, the problem is known as text-independent SR.
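Given voice embeddings like the toy ones above, text-dependent SR on a known phrase reduces to comparing the vector from an incoming “Hey Siri” utterance against the enrolled user’s profile vector. A minimal sketch, assuming unit-normalized vectors and an arbitrary acceptance threshold (Apple does not disclose its scoring function):

```python
import numpy as np

def verify(utterance_vec, profile_vec, threshold=0.7):
    """Text-dependent speaker verification sketch: accept the trigger
    only if the utterance's speaker vector is close enough to the
    enrolled profile. The 0.7 threshold is an arbitrary placeholder."""
    score = float(np.dot(utterance_vec, profile_vec))  # cosine, unit vectors
    return score >= threshold, score
```

A higher score means the voice more closely matches the enrolled user; tuning the threshold trades missed activations against the false ones described above.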

The journal entry then explains how users enroll in a personalized “Hey Siri” profile through two paths: explicit and implicit enrollment. Explicit enrollment happens when a user deliberately speaks the trigger phrase a few times during setup, while an implicit profile is “created over a period of time” during “real-world situations.”
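One plausible way those two paths could maintain a single profile is sketched below; the simple running-average update is an assumption for illustration, not Apple’s published algorithm.

```python
import numpy as np

class SpeakerProfile:
    """Sketch of the two enrollment paths the journal entry describes.
    The running-average update is an assumption, not Apple's method."""

    def __init__(self):
        self.vec = None
        self.count = 0

    def enroll_explicit(self, utterance_vecs):
        # Setup-time enrollment: average a few deliberate utterances.
        self.vec = np.mean(utterance_vecs, axis=0)
        self.vec /= np.linalg.norm(self.vec)
        self.count = len(utterance_vecs)

    def enroll_implicit(self, utterance_vec):
        # Real-world enrollment: fold an accepted activation into the
        # profile gradually, so it adapts over time.
        if self.vec is None:
            self.vec, self.count = utterance_vec.copy(), 1
            return
        self.count += 1
        self.vec += (utterance_vec - self.vec) / self.count
        self.vec /= np.linalg.norm(self.vec)
```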

The Siri team says the main remaining challenge for speaker recognition is achieving reliable performance in reverberant (large room) and noisy (car) environments. You can check out the full Machine Learning Journal entry on “Hey Siri” right here.

Since the journal launched last summer, Apple has shared numerous entries on complex topics, including “Hey Siri,” face detection, and more. All past entries can be found on Apple.com.

Tag: machine learning
Discuss this article in our forums
