Manufacturers compete in speech recognition: who can ignite the market?

How long will it be before people can use an intelligent, capable voice assistant like the one in Iron Man? Last year, several domestic speech recognition vendors announced new strategies for the technology, and natural voice interaction between humans and machines seems to be drawing gradually closer.

iFlytek announced that its self-developed offline voice dictation engine would be applied to products such as the iFlytek Input Method, to meet users' needs for voice technology when there is no network or only a weak connection. A few days earlier, another company, Spirit, announced at an industry salon that it was redefining the direction of the human-computer interaction experience, moving machines from merely being able to listen and speak toward being able to understand.

Foreign giants are also active in speech recognition. Some foreign media reported that Microsoft is developing its own personal voice assistant software, codenamed "Cortana," and plans to launch it in the next major upgrade of the Windows Phone platform to counter Google Now and Apple's Siri.

As Li Jianhui, vice president and general manager of the Dialogue Workshop, put it, the development of smart devices and the arrival of the mobile Internet era have made cognitive computing the future direction of human-computer interaction, demanding more natural, intuitive and immersive ways of interacting.

Zhang Jidong, deputy general manager of iFlytek's Mobile Internet Division, described the evolution of speech recognition products as a marathon. Along the way, many manufacturers have already withdrawn from the field: Sogou's voice assistant is no longer promoted aggressively, Airi stopped updating a year ago, and Xiao i Robot has turned to the B2B market.

With some manufacturers exiting and new vendors entering, a new round of positioning and competition around speech recognition applications has begun.

A clumsy voice interaction experience

Although the recognition rate of the iFlytek Input Method can exceed 95%, viewed across speech recognition applications as a whole, the current user experience can only be described as clumsy.

On the one hand, this stems from the error inherent in voice interaction. "If the accuracy of speech recognition is between 85% and 95%, and the accuracy of semantic analysis is also between 85% and 95%, then the accuracy of the final result is only 70%-90%," said Yu Kai, chief scientist of Spirit.
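Yu Kai's figures follow from simple multiplication: if both stages must succeed, the end-to-end accuracy is roughly the product of the two. A minimal sketch of that arithmetic is below; the assumption that the two stages fail independently is ours, not stated in the article.

```python
# Rough illustration of how two-stage accuracy compounds.
# Assumption (ours): recognition and semantic-analysis errors are independent.

def end_to_end_accuracy(asr_accuracy: float, nlu_accuracy: float) -> float:
    """End-to-end accuracy when both stages must succeed."""
    return asr_accuracy * nlu_accuracy

low = end_to_end_accuracy(0.85, 0.85)   # ~0.72
high = end_to_end_accuracy(0.95, 0.95)  # ~0.90
print(f"End-to-end accuracy ranges from roughly {low:.0%} to {high:.0%}")
```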

Offline voice technology is even harder. At present, only the two international giants Google and Apple, plus iFlytek, have offline voice technology. However, because there is no network connection and storage space is limited, the success rate of iFlytek's offline speech recognition is only about 85%, "just reaching a usable level."

On the other hand, because of the high technical threshold, speech recognition technology is still at an early stage of its evolution. "From speech evaluation and speech synthesis to natural language understanding, each direction requires a sufficient corpus and continuous algorithmic optimization," Zhang Jidong said.

While the technology is being optimized, an ecosystem also needs to be built: for example, community question-and-answer services, or knowledge graphs for music and video that can answer questions such as which films Andy Lau has appeared in.

"It's a trend to replace the keyboard input with voice-based natural interactions, but it's a trend, but it's not the time to rise to the level you just need." Zhang Jidong said.

Heavy investment in speech recognition

Despite the difficulties, the general direction of speech recognition technology has become irreversible.

"All handset manufacturers are investing in voice, expanding their investment in voice technology, creating more elegant designs and integrating them deeply into their handsets," said Michael Thompson, vice president of voice recognition technology at Nuance.

Although Apple's Siri has been repeatedly ridiculed, even called one of Apple's biggest failures, Apple has kept increasing its investment, even setting up a secretive office near the Massachusetts Institute of Technology (MIT) to develop Siri's speech recognition technology. Yu Kai revealed that Siri's voice technology team maintains a staffing ratio of 1:4: for every person working on speech input and output, four work on natural language processing, in order to overcome the difficulty of natural voice interaction.

Domestic manufacturers deeply involved in speech recognition have also attracted investment for research and development. The year before, Spirit received a joint investment from Lenovo and Enlightenment. China Mobile, through a subsidiary, invested 1.363 billion yuan in iFlytek for a 15% stake, and in December of that year the two jointly launched the intelligent voice portal product Lingxi, which can make voice calls, send text messages and check the weather.

Who can ignite voice interaction?

"Sometimes it may be embarrassing, and it is even possible that the future will be driven by other directions." Zhang Jidong said. He believes that WeChat is one of them.

When WeChat first launched, many people were puzzled to see other users apparently talking to themselves into their phones, only to discover later that they were using WeChat's voice messaging feature. Now, people have grown accustomed to talking to WeChat.

Zhang Jidong believes the next thing likely to ignite speech recognition applications is the increasingly popular wearable device. A fitness band, for example, can send user data to the cloud, which analyzes it and returns personal health suggestions; if the data shows that a user's schedule is irregular, the voice assistant can issue a spoken reminder when the user needs to rest.

More immediate applications lie in wearables such as smart watches, where voiceprint recognition and voice wake-up are typical use cases: the former lets a user unlock the device with his or her own voice as the password, while the latter wakes the device without the user touching it.
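To make the voice wake-up idea concrete, here is a minimal sketch of an always-listening wake-word loop. The function names (capture_audio_frame, score_wake_word, on_wake) and the threshold are hypothetical placeholders for illustration, not any vendor's actual API.

```python
# Hypothetical sketch of a voice wake-up loop on a wearable device.
# capture_audio_frame(), score_wake_word() and on_wake() are placeholder
# callables supplied by the caller, not a real vendor API.

WAKE_THRESHOLD = 0.8  # illustrative confidence level for accepting the wake word

def wake_word_loop(capture_audio_frame, score_wake_word, on_wake):
    """Continuously score short audio frames with a small always-on model
    and trigger on_wake() when the detector is confident enough."""
    while True:
        frame = capture_audio_frame()          # e.g. ~100 ms of microphone audio
        confidence = score_wake_word(frame)    # lightweight, low-power detector
        if confidence >= WAKE_THRESHOLD:
            on_wake()                          # hand off to the full recognizer
```

The design point, as the article suggests, is that only the tiny wake-word detector runs continuously, so the power-hungry full recognizer is activated only after the device is woken.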

"We are also working with chip vendors to integrate speech recognition technology into smart wearable devices to reduce power consumption and increase the application time of speech recognition on wearable devices." Head of a speech recognition technology vendor Say.
