Alibaba’s speech recognition algorithm can isolate voices in noisy crowds

Chinese conglomerate Alibaba is one of the world’s largest ecommerce companies, but increasingly, it’s turning its attention to artificial intelligence (AI).

In March 2017, it launched an AI services division for health care and manufacturing, and in September, its public cloud division, Alibaba Cloud, unveiled plans to set up a dedicated subsidiary and produce a self-developed AI inference chip that could be used for logistics and autonomous driving.

Alibaba has its fingers in plenty of AI pies, needless to say. And during a presentation at NeurIPS 2018 in Montreal this morning, it delivered an update on its cross-company efforts.

“We’re solving … scenarios [with] unseen difficulties,” Rong Jin, dean of the Alibaba Institute of Data Science, said. “AI together with innovation [is helping] to solve some interesting challenges.”

One of those challenges is speech recognition in noisy environments, like a crowded subway system or congested convention center. Alibaba’s solution is part hardware, part software: a far-field microphone array and sophisticated deep learning algorithms that isolate voices in a crowd, drastically reducing the error rate.

Compared to the 84 percent accuracy the “best” speech recognition technologies are able to achieve with a mic array alone, Alibaba claims its model is between 94 and 95 percent accurate, even with heavily accented speakers.
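To put those figures in terms of error rate, here is a minimal sketch of the arithmetic. It assumes, as the presentation did not spell out, that error rate is simply one minus accuracy; the function name is illustrative and not anything Alibaba published.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Fraction of baseline errors eliminated, assuming error = 1 - accuracy."""
    baseline_error = 1.0 - baseline_accuracy
    improved_error = 1.0 - improved_accuracy
    return (baseline_error - improved_error) / baseline_error


# Figures quoted in the article: 84 percent baseline, 94 to 95 percent claimed.
for improved in (0.94, 0.95):
    reduction = relative_error_reduction(0.84, improved)
    print(f"baseline 0.84 -> {improved}: {reduction:.3f} of errors removed")
```

Under that assumption, going from 84 percent to 94 or 95 percent accuracy would cut the error rate from 16 percent to 6 or 5 percent, which works out to roughly five-eighths to two-thirds fewer mistakes.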

Already, it has been deployed as part of a voice-based subway ticketing system in Shanghai, and Alibaba is in talks to bring it to “a number of [additional] cities.”

“Nothing can save you if you don’t get enough signal to be recognized in the first place,” Jin said.

The spoken word isn’t the only domain Alibaba is tackling with AI. Using natural language processing, it’s performing automatic translation in real time, in the cloud, so that Alibaba retail customers in countries such as Russia and Malaysia can converse with human agents in their native tongues. And it’s tapping algorithms to field a portion of the tens of thousands of calls its support centers receive each day with Alime, Alibaba’s intelligent customer service engine.

Alime, much like Google’s Duplex, can carry on a phone conversation and answer basic questions without involving a human being. Perhaps more impressively, in a chatbot context, it’s able to automatically extract text and images from a supplied document with “better than human” performance.

In an onstage demo, a customer asked Dian Xiaomi — Alibaba’s answering bot — about sales promotions for a particular Bluetooth speaker, like what sort of free gifts they’d receive with their purchase and how the gifts would be delivered to their residence. (A future version rolling out later this year will add sentiment analysis and automated alerts for priority cases.) Another demo showed a humanoid embodiment of the chatbot — a prototype, Jin told the audience — with coordinated eye, lip, and head movements.

It’s a boon for bustling Alibaba divisions like AliExpress, which has over 150 million users and millions of merchants, and Cainiao, whose human workers and robots fulfill more than a billion orders each year. On Singles’ Day, the November 11 Chinese shopping holiday that this year generated $30.8 billion, Alibaba’s agents receive five times the usual number of calls in a 24-hour period, a volume that would be nearly impossible to juggle without a helping hand from AI.

Dian Xiaomi currently serves almost 3.5 million users a day, Alibaba says.

But natural language processing is just the tip of Alibaba’s AI iceberg. On Xian Yu, the retailer’s used goods marketplace, the company deployed a price negotiation bot that talks to buyers to settle on a price.

The bot’s development wasn’t a cakewalk — it needed to learn negotiating strategies and efficient ways to generate text that’d incentivize back-and-forth negotiation — but the end result is impressive. When published to 10 million users on the same platform, the bot had a 20 percent higher chance of making a deal than a typical human being.

“Most of the [users] are not professional sellers,” Jin said. “They don’t know how to set a price or talk to buyers.”

On the inventory management and image search front, Alibaba is leveraging a scalable computer vision architecture to sift through hundreds of millions of entities. Its Cloud Image Search algorithm can recognize objects and find images containing similar or identical ones, and one of its store management apps — which picks out multiple items on a shelf to generate a summary that includes a distribution of different brands — can detect more than 100,000 SKUs with “high accuracy.” (Alibaba’s working toward a goal of 10 million SKUs.)

Both complement Alibaba’s Ali Smart Supply Chain (ASSC), a suite of AI tools that help Alibaba merchants forecast product demand, allocate inventory, and select pricing strategies.

Alibaba’s machine vision work extends to satellite images. Using data gathered from AutoNavi, the largest map and navigation provider in China with over 70 million users, its systems are able to identify recently constructed buildings, for example, and gather information related to road work and points of interest.

Alibaba is also using computer vision to prevent shoplifting. At its more than 66 Hema brick-and-mortar stores, offline algorithms at its self-checkout kiosks prevent ne’er-do-well customers from scanning only the first item in a basket but not the rest, or concealing items from the overhead camera’s view.

“The goal is to … have a computer vision system figure out if a customer is intentionally or unintentionally scanning items,” Jin said. “The machine sees that things aren’t scanned.”

It’s powered by a deep learning algorithm, AliFPGA-X100, that runs on a field-programmable gate array, a reconfigurable integrated circuit within the kiosks. Alibaba says it’s able to process images up to 170 times faster than a comparable GPU-based system.

Alibaba is applying AI, too, to Youku, its video hosting service. Machine learning algorithms automatically generate thumbnails for the roughly 200,000 videos its tens of millions of active users upload each day, and target certain audience segments with said thumbnails. (Female users might see a different preview image for a given video than male users, for example.) They’ve led to a 15 percent improvement in click-through rate and a 12 percent uptick in dwell time.

Today’s survey comes just over a year after the debut of Alibaba’s new research organization, the Academy for Discovery, Adventure, Momentum, and Outlook (or DAMO), aimed at tackling emerging technologies like machine learning and network security, and the opening of labs in San Mateo, California; Seattle, Washington; Moscow, Russia; Tel Aviv, Israel; and Singapore. It also follows on the heels of the launch of Alibaba’s Tmall Genie, its AI-powered voice assistant that’s sold over 5 million units since it hit store shelves in July 2017.

Alibaba plans to spend more than $15 billion on research and development by 2020, it told Quartz in October 2017.
