Mobvoi AI Lab is located in Seattle,
placing it at the forefront of AI technology.
The lab is responsible for pushing Mobvoi technology forward.
We work with several universities in both Asia and the U.S. on innovation and
academic publications at top international conferences in speech and
NLP technologies. Our scientists are also talented developers who deliver
production-quality code for our business and clients.
Our technologies are driven by industry needs and standards,
not just academic benchmarks.
Additionally, we are actively expanding language coverage beyond Mandarin to other languages and accents, such as Taiwan-accented Mandarin, Cantonese, and English. We have also started building German and Korean IoT services.
Please visit https://github.com/mobvoi
for the list of our GitHub projects.
Dr. Mei-Yuh Hwang,
a long-time speech veteran and an IEEE Fellow, is the director of the lab.
A Ph.D. from Carnegie Mellon University, Mei-Yuh has built speech recognition systems for many languages, both for DARPA research programs (Resource Management, Wall Street Journal, ATIS, EARS, GALE) and for industry products (Microsoft Speech API, Cortana, and Skype speech-to-speech translation). She gained extensive linguistic knowledge across many languages through her experience with Bing machine translation, and developed a deep understanding of Cortana NLP in Mandarin, French, Italian, German, and Spanish. She is a rare talent combining speech recognition, NLP, and linguistics in many languages.
Dr. Yangyang Shi, a Ph.D. from Delft University, leads R&D in both speech recognition and natural language understanding. Dr. Shi worked at Microsoft China as an applied scientist on Cortana NLP from 2012 to 2016, then on the Bing search ranking team at Microsoft Sunnyvale from 2016 to 2017, and joined Mobvoi AI Lab in November 2017. Since joining Mobvoi, Dr. Shi has focused on speech recognition, often working on multiple projects simultaneously across both production systems and research publications. The lab maintains 2-4 interns
all year round, who are usually under Dr. Shi's mentorship.
Additional scientists are located in Beijing, Suzhou and Taipei. The teams work together while each site has its own focus:
- Beijing: AI on-chip, hotword triggering, and in-car personal assistants for VW
- Suzhou: two to three dozen automated customer-service support (CSS) systems powered by speech AI technologies
- Taipei: Far EasTone (FET) smart speakers and CSS business clients in Taiwan
- Seattle: English and other languages
- 2018 academic publications
  - An end-to-end approach for handling unknown slot values in dialogue state tracking, ACL
  - Domain adversarial training for accented speech recognition, ICASSP
  - Data augmentation with adversarial examples for robust speech recognition, Interspeech
  - A probability weighted beamformer for noise robust ASR, Interspeech
  - Source-critical reinforcement learning for transferring spoken language understanding to a new language, COLING
  - A robust nonlinear microphone array postfilter for noise reduction, IWAENC
  - A teacher-student framework for maintainable dialog manager, EMNLP
  - Multiple Beamformers with ROVER for the CHiME-5 Challenge, CHiME-5 Workshop
- 2019 academic publications
  - Knowledge Distillation for Recurrent Neural Network Language Modeling With Trust Regularization, ICASSP
  - End-to-end Speech Recognition Using High Rank LSTM-CTC Based Model, ICASSP
  - Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting, ICASSP
  - Incremental learning from scratch for task-oriented dialog systems, ACL
  - A novel method to correct steering vectors in MVDR beamformer for noise robust ASR, Interspeech
  - Adversarial regularization for attention based end-to-end robust speech recognition, IEEE Transactions on ASLP, Vol. 27, Issue 11, Nov. 2019, pp. 1826-1838
  - Region Proposal Network Based Small-Footprint Keyword Spotting, IEEE Signal Processing Letters, DOI: 10.1109/LSP.2019.2936282
  - Multiple fixed beamformers with a spatial Wiener-form postfilter for far-field speech recognition, APSIPA
  - Are you for Real? Detecting Identity Fraud via Dialogue Interactions, IJCNLP
- 2020 academic publications
  - Mining effective negative training samples for keyword spotting, submitted to ICASSP 2020
  - Regularize CTC with Maximum Entropy and another CTC For Speech Recognition, submitted to ICASSP 2020