Voice control and voice interfaces have begun to permeate all cutting-edge consumer device categories. Advances in speech recognition algorithms and AI accelerator hardware mean the technology can even be used in power- and cost-constrained applications, such as smart home devices.
Translated from EE Times
From a user’s perspective, the drivers behind voice control of smart home devices are clear.
“Ease of use and convenience are the main drivers right now,” says Alireza Kenarsari-Anhari, CEO of PicoVoice. It’s easy to imagine yelling from your desk to the coffee maker in your home office when you want a cup of coffee, or giving orders to the tumble dryer while holding a basket of wet clothes.
Since these smart devices are not carried around, we can assume they are permanently connected to the home Wi-Fi. So why not do the voice processing in the cloud?
Here, the move to edge AI is driven primarily by privacy. Privacy is a concern for consumers but a must-have for some businesses, Kenarsari-Anhari said. Reliability is another driver: “Does it make sense for your washing machine to stop working when your Wi-Fi does?”
Latency matters in some cases too: some applications, such as games, require real-time guarantees for processing voice workloads.
Cost is another major driver of edge voice processing, because processing voice data in the cloud adds an ongoing cost. The pay-per-use cloud API business model does not suit home appliances and consumer electronics, which sell at low price points and may be used many times a day.
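To make the cost argument concrete, here is a back-of-the-envelope sketch. Every figure in it is a hypothetical assumption chosen for illustration (a per-request cloud price, ten voice commands a day, a ten-year appliance life, a one-time on-device cost); none of them comes from PicoVoice or from any cloud provider’s price list.

```python
# Back-of-the-envelope comparison of recurring pay-per-use cloud fees
# with a one-time on-device cost. All numbers are HYPOTHETICAL
# illustration values, not real prices.

def lifetime_cloud_cost(price_per_request: float,
                        requests_per_day: int,
                        lifetime_years: int) -> float:
    """Total pay-per-use fees (USD) over the product's lifetime."""
    days = lifetime_years * 365
    return price_per_request * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical: $0.004 per request, 10 commands a day, 10-year life.
    cloud = lifetime_cloud_cost(0.004, 10, 10)
    # Hypothetical one-time bill-of-materials cost for on-device voice.
    edge_once = 1.00
    print(f"cloud fees over lifetime: ${cloud:.2f}")
    print(f"one-time edge cost:       ${edge_once:.2f}")
```

Under these assumptions the recurring fees add up to roughly $146 per appliance, which is why a per-use model is hard to reconcile with a product whose entire margin may be a few dollars.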
PicoVoice’s AI speech-to-text inference engine is designed to run independently of the cloud on a sub-$1 microcontroller, with the goal of enabling voice control in a wide range of applications. These could include consumer wearables and hearables, which must be both power-efficient and cost-efficient, a combination that microcontroller-based voice solutions make possible. Kenarsari-Anhari said this power- and cost-optimized solution could also open up opportunities in industrial, security and medical applications.
PicoVoice recently launched Shepherd, a no-code platform for building voice applications on microcontrollers, which works with the company’s model-creation software, PicoVoice Console. Shepherd supports popular Arm Cortex-M microcontrollers from ST and NXP, with support for other devices in development.
Kenarsari-Anhari: “I see speech as just another interface to develop. If you can build your GUI or website without coding, maybe with WordPress, the next step is to build speech interfaces the same way. Shepherd empowers product managers and UX designers to prototype and iterate quickly, but our goal is to broaden the target user base further.”
While it is entirely possible to develop natural language processing models and implement them without specialized software, this approach is not for everyone.
“Certainly, Apple, Amazon, Google and Microsoft have all done it. The key is whether the business has the resources, is committed to building an organization around it, and can afford to wait a few years.”
Last summer, Syntiant CEO Kurt Busch said in an interview that voice is becoming the interface of choice for the next generation of technology users.
Busch illustrates this future with his youngest child, who can read but cannot yet write, and who uses his smartphone’s voice function to text his friends.
Busch: “His older brothers and sisters text, but his generation got cell phones years earlier than they did, and for his generation and younger, the default interface has become conversation.”
Busch believes that voice will be “the touchscreen of the future,” and that on-device processing will provide fast, responsive interfaces first on devices with keyboards or mice, and then on white goods.
Syntiant’s chips are specialized AI accelerators designed to handle voice AI workloads in consumer electronics devices with low to very low power budgets. The startup has so far sold more than 10 million chips worldwide, most of which end up in phones to enable always-on keyword detection. Syntiant’s latest chip, the NDP120, can recognize hot words such as “OK Google” and wake Google Assistant while consuming under 280 µW.
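A quick way to see what “under 280 µW” means for an always-on device is to turn it into an energy budget. The sketch below treats 280 µW as a constant upper bound for the keyword detector alone; the 3.0 V, 225 mAh coin cell is a hypothetical example battery, and the estimate ignores every other load in the device (radio, host processor, microphones).

```python
# Rough energy budget for an always-on keyword detector, using the
# quoted "under 280 µW" as a constant upper bound. The battery is a
# HYPOTHETICAL example, and all other device loads are ignored.

def daily_energy_mwh(power_uw: float) -> float:
    """Energy drawn over 24 hours at constant power, in mWh."""
    return power_uw / 1000.0 * 24.0  # µW -> mW, then x 24 h


def days_on_battery(power_uw: float, capacity_mah: float,
                    voltage_v: float) -> float:
    """Days a battery could run the detector alone (no other loads)."""
    capacity_mwh = capacity_mah * voltage_v
    return capacity_mwh / daily_energy_mwh(power_uw)


if __name__ == "__main__":
    per_day = daily_energy_mwh(280)
    # Hypothetical 3.0 V, 225 mAh coin cell.
    days = days_on_battery(280, 225, 3.0)
    print(f"{per_day:.2f} mWh/day, about {days:.0f} days on the cell")
```

At 280 µW the listener draws about 6.72 mWh per day, so even this small hypothetical cell could keep it listening for roughly 100 days, which illustrates why always-on listening becomes practical at this power level.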
Looking further ahead, Busch also sees voice control giving everyone a way to connect to and access technology.
“We believe that voice is an important tool for the democratization of technology. There are 3 billion people in the world who live on $2 a day. Most of these people have no Internet access, no education, and no ability to read or write, so a voice interface means a great deal to them. The natural interface here is voice. That’s how you bring technology to the parts of the developing world that don’t interact with technology today. We are seeing a lot of interest in voice-first applications in many developing countries, in the hope of benefiting segments of society that may not have had access before, not only from a cost perspective, but also from a comfort perspective.”
Indeed, many developing countries have already shown strong interest in conversational AI.
The danger in a fast-growing market like voice interaction is that it can become extremely fragmented very quickly, and not just in terms of hardware, said Vikram Shirastava, senior director of IoT at Knowles.
Shirastava: “The market will fragment depending on which speech recognition engine is used, and on whether the device has an integrated TV SoC or a simple built-in MCU. It will fragment by operating system and by acoustic environment, and the use cases will fragment too: is it just the home? There cannot be a one-size-fits-all solution. You have to find common ground within these verticals and solve the audio integration problem accordingly.”
Knowles has a DSP-based voice control solution that it intends to roll out for different verticals. Its approach is to segment the market into categories that share common characteristics; for example, home controls, TV audio and remote controls might fall into the same category. It then develops a solution optimized for each category. Shirastava calls this approach “a layer below turnkey”: it offers the scalability of a turnkey solution with some added flexibility.
“We had to put out several different distributions, each addressing some aspect of fragmentation, so we could cover the verticals we wanted to chase,” Shirastava said.
Knowles has introduced the AISonic Bluetooth Standard Solution, a complete development kit for fast and easy voice integration into Bluetooth devices. The kit enables OEMs and ODMs to build voice calling, voice control and far-field voice recognition into Bluetooth devices, including smart speakers, smart home locks, connected light switches, wearables and in-vehicle voice assistants. It is based on Knowles’ IA8201 dual-core DSP chip, which is designed for neural network processing and consumes far less power than an application processor. For example, the chip can simultaneously run independent AI models for keyword spotting, sound source classification, beamforming, acoustic echo cancellation (AEC) and source direction estimation at under 50 mW. This is achieved with nearly 400 custom instruction-set extensions for audio and AI processing on the Tensilica DSP cores, which allow lower clock frequencies and therefore save power.
The AISonic Bluetooth Standard Solution is one development kit in Knowles’ new family of reference solutions for voice activation, control and contextual audio processing in TVs, portable speakers, soundbars and a variety of IoT electronics. Knowles calls this family its Standard Solutions.
Sugr’s iOttie Aivo Connect car smartphone mount uses Knowles’ IA8201 for its in-vehicle voice capabilities and has the Alexa voice assistant built in.
As AI technology continues to advance, conversational AI is becoming a key tool for freeing users’ hands and increasing productivity. Complex voice development environments, the high cost of processing voice data in the cloud, high device power consumption and market fragmentation are all obstacles to voice interfaces. Will voice eventually become the default user interface for most consumer electronics? It seems likely, thanks to advanced, efficient voice control algorithms, tools that let developers integrate voice easily, and a growing ecosystem of energy- and cost-efficient hardware.