Paper Digest: Recent Papers on Speech Recognition
The Paper Digest Team extracted all recent Speech Recognition-related papers on our radar and generated highlight sentences for them. The results are sorted by relevance and date. In addition to this ‘static’ page, we also provide a real-time version of this article, which has broader coverage and is updated continuously to include the most recent papers on this topic.
TABLE 1: Paper Digest: Recent Papers on Speech Recognition
Paper | Author(s) | Source | Date |
---|---|---|---|
1 | Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages Highlight: In this approach we aim to build an ASR model for languages with limited digital resources by sequentially adapting the model across linguistically similar languages. |
Leena G Pillai; Kavya Manohar; Basil K Raju; Elizabeth Sherly; | arxiv-cs.CL | 2024-11-07 |
2 | Enhancing AAC Software for Dysarthric Speakers in E-Health Settings: An Evaluation Using TORGO Highlight: Prompt-overlap is a well-known issue with this dataset where phrases overlap between training and test speakers. Our work proposes an algorithm to break this prompt-overlap. |
Macarious Hui; Jinda Zhang; Aanchan Mohan; | arxiv-cs.CL | 2024-11-01 |
3 | Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs Highlight: We propose a novel approach that first refines all available transcriptions to ensure data reliability. |
Enshi Zhang; Christian Poellabauer; | arxiv-cs.CL | 2024-10-27 |
4 | Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts Highlight: Our contributions include creating a domain-specific dataset, comprehensive ASR model evaluations, and an effective augmentation technique. |
ChaeHun Park; Hojun Cho; Jaegul Choo; | arxiv-cs.CL | 2024-10-24 |
5 | STTATTS: Unified Speech-To-Text And Text-To-Speech Model Highlight: We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task learning objective and shared parameters. |
Hawau Olamide Toyin; Hao Li; Hanan Aldarmaki; | arxiv-cs.CL | 2024-10-24 |
6 | MmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar Highlight: This paper introduces mmWave-Whisper, a system that demonstrates the feasibility of full-corpus automated speech recognition (ASR) on phone calls eavesdropped remotely using off-the-shelf frequency modulated continuous wave (FMCW) millimeter-wave radars. |
Suryoday Basak; Abhijeeth Padarthi; Mahanth Gowda; | arxiv-cs.SD | 2024-10-22 |
7 | DENOASR: Debiasing ASRs Through Selective Denoising Highlight: In this work, we introduce DENOASR, a novel framework that uses selective denoising to reduce the disparity in word error rates between the two gender groups, male and female. |
Anand Kumar Rai; Siddharth D Jaiswal; Shubham Prakash; Bendi Pragnya Sree; Animesh Mukherjee; | arxiv-cs.SD | 2024-10-22 |
8 | VoiceBench: Benchmarking LLM-Based Voice Assistants Highlight: VoiceBench also includes both real and synthetic spoken instructions that incorporate the above three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field. |
YIMING CHEN et al. | arxiv-cs.CL | 2024-10-22 |
9 | Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding Highlight: In this work, we propose a novel and less biased augmentation method of introducing the noises that are plausible to any ASR system, by cutting off the non-causal effect of noises. |
YEONJOON JUNG et al. | arxiv-cs.CL | 2024-10-20 |
10 | Roadmap Towards Superhuman Speech Understanding Using Large Language Models Highlight: To guide the development of speech LLMs, we propose a five-level roadmap, ranging from basic automatic speech recognition (ASR) to advanced superhuman models capable of integrating non-semantic information with abstract acoustic knowledge for complex tasks. |
FAN BU et al. | arxiv-cs.CL | 2024-10-17 |
11 | Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR Highlight: Parameter-efficient fine-tuning and text-only adaptation are two popular methods that have been used to address such low-resource settings. In this work, we investigate how these techniques can be effectively combined using a multilingual multimodal model like SeamlessM4T. |
Abhishek Gupta; Amruta Parulekar; Sameep Chattopadhyay; Preethi Jyothi; | arxiv-cs.CL | 2024-10-17 |
12 | Investigation of Speaker Representation for Target-Speaker Speech Processing Highlight: While most studies have focused on training schemes or system architectures for each specific task, the auxiliary network for embedding target-speaker cues has not been investigated comprehensively in a unified cross-task evaluation. Therefore, this paper aims to address a fundamental question: what is the preferred speaker embedding for TS tasks? |
TAKANORI ASHIHARA et al. | arxiv-cs.SD | 2024-10-14 |
13 | Automatic Speech Recognition with BERT and CTC Transformers: A Review Highlight: All in all, this review provides valuable insights for researchers and practitioners who are interested in ASR with BERT and CTC transformers. |
Noussaiba Djeffal; Hamza Kheddar; Djamel Addou; Ahmed Cherif Mazari; Yassine Himeur; | arxiv-cs.CL | 2024-10-12 |
14 | Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities Highlight: To develop Indonesian automatic speech recognition (ASR), we present our research on state-of-the-art speech recognition models, namely Massively Multilingual Speech (MMS) and Whisper, as well as compiling a dataset comprising Indonesian speech with variabilities to facilitate our study. |
AULIA ADILA et al. | arxiv-cs.CL | 2024-10-11 |
15 | Integrating Paralinguistics in Speech-Empowered Large Language Models for Natural Conversation Highlight: This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken responses with naturally occurring prosodic features relevant to the given input speech without relying on explicit automatic speech recognition (ASR) or text-to-speech (TTS) systems. |
HEESEUNG KIM et al. | nips | 2024-10-07 |
16 | REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR Highlight: In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. |
LIANG-HSUAN TSENG et al. | nips | 2024-10-07 |
17 | Comprehensive Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for The Polish Language Highlight: A comprehensive framework has been designed to survey, catalog, and curate available speech datasets, which allows replicable evaluation of automatic speech recognition (ASR) systems. |
Michał Junczyk; | nips | 2024-10-07 |
18 | Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments Highlight: In this work, we hypothesize that incorporating speaker representations during speech recognition can enhance model robustness to noise. |
Sagarika Alavilli; Annesya Banerjee; Gasser Elbanna; Annika Magaro; | arxiv-cs.SD | 2024-10-07 |
19 | Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models Highlight: Large language models (LLMs) have started to play a vital role in modelling speech and text. |
Pavel Stepachev; Pinzhen Chen; Barry Haddow; | arxiv-cs.CL | 2024-10-04 |
20 | Reverb: Open-Source ASR and Diarization from Rev Abstract: Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as … |
NISHCHAL BHANDARI et al. | arxiv-cs.CL | 2024-10-04 |
21 | Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition Highlight: The following paper presents an alternative approach towards generating compressed spectrogram representations, based on Convolutional Variational Autoencoders (VAEs). |
Olga Iakovenko; Ivan Bondarenko; | arxiv-cs.SD | 2024-10-03 |
22 | Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems Highlight: The rules described in the present paper are implemented in an open-source module, which can be of use to any scientific study connected to ASR or Speech To Text (STT) tasks. |
Olga Iakovenko; Ivan Bondarenko; Mariya Borovikova; Daniil Vodolazsky; | arxiv-cs.CL | 2024-10-03 |
23 | VHASR: A Multimodal Speech Recognition System With Vision Hotwords Highlight: In this paper, we propose a novel approach that effectively utilizes audio-related image information, and set up VHASR, a multimodal speech recognition system that uses vision as hotwords to strengthen the model’s speech recognition capability. |
JILIANG HU et al. | arxiv-cs.SD | 2024-10-01 |
24 | Automatic Speech Recognition for The Ika Language Highlight: We present a cost-effective approach for developing Automatic Speech Recognition (ASR) models for low-resource languages like Ika. |
Uchenna Nzenwata; Daniel Ogbuigwe; | arxiv-cs.CL | 2024-10-01 |
25 | AfriHuBERT: A Self-supervised Speech Representation Model for African Languages Highlight: In this work, we present AfriHuBERT, an extension of mHuBERT-147, a state-of-the-art (SOTA) and compact self-supervised learning (SSL) model, originally pretrained on 147 languages. |
Jesujoba O. Alabi; Xuechen Liu; Dietrich Klakow; Junichi Yamagishi; | arxiv-cs.CL | 2024-09-30 |
26 | ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5 Highlight: However, developing robust ASR models for young children’s speech remains challenging due to differences in pronunciation, tone, and pace compared to adult speech. In this paper, we introduce a new Mandarin speech dataset focused on children aged 3 to 5, addressing the scarcity of resources in this area. |
JIAMING ZHOU et al. | arxiv-cs.SD | 2024-09-27 |
27 | Improving Multilingual ASR in The Wild Using Simple N-best Re-ranking Highlight: In this paper, we present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy for several prominent acoustic models by employing external features such as language models and text-based language identification models. |
Brian Yan; Vineel Pratap; Shinji Watanabe; Michael Auli; | arxiv-cs.CL | 2024-09-26 |
28 | Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM Highlight: These models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech into text for processing before generating audio responses, which introduces latency and loses audio features. We propose a method that implicitly internalizes ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. |
Robin Shing-Hei Yuen; Timothy Tin-Long Tse; Jian Zhu; | arxiv-cs.CL | 2024-09-25 |
29 | Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition Highlight: We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. |
Andrés Piñeiro-Martín; Carmen García-Mateo; Laura Docío-Fernández; María del Carmen López-Pérez; Georg Rehm; | arxiv-cs.CL | 2024-09-25 |
30 | Spelling Correction Through Rewriting of Non-Autoregressive ASR Lattices Highlight: We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models. |
LEONID VELIKOVICH et al. | arxiv-cs.CL | 2024-09-24 |
31 | Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Highlight: In this paper, we introduce a speech-conditioned Large Language Model (LLM) integrated with a Mixture of Experts (MoE) based connector to address the challenge of Code-Switching (CS) in Automatic Speech Recognition (ASR). |
FENGRUN ZHANG et al. | arxiv-cs.SD | 2024-09-24 |
32 | Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs Highlight: This paper presents a novel training approach to enhance LLM performance in ASR tasks. |
Yang Yuhang; Peng Yizhou; Eng Siong Chng; Xionghu Zhong; | arxiv-cs.CL | 2024-09-24 |
33 | MultiMed: Multilingual Medical Speech Recognition Via Attention Encoder Decoder Highlight: In this work, we introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese, together with the corresponding real-world ASR dataset. |
KHAI LE-DUC et al. | arxiv-cs.CL | 2024-09-21 |
34 | A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering Highlight: Furthermore, the ASR model propagates its errors to the retriever. In this work, we try to alleviate these limitations by proposing an ASR-free, end-to-end trained multimodal dense retriever that can work directly on spoken questions. |
Georgios Sidiropoulos; Evangelos Kanoulas; | arxiv-cs.CL | 2024-09-20 |
35 | Fast Streaming Transducer ASR Prototyping Via Knowledge Distillation with Whisper Highlight: In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch in their entirety on consumer and accessible GPUs, using pseudo-labeled (PL) speech from foundational speech models (FSM). |
IULIIA THORBECKE et al. | arxiv-cs.CL | 2024-09-20 |
36 | Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space Highlight: In the present work, we conduct a set of experiments around zero-shot learning with synthetic speech data for the specific task of speech commands classification. |
Sebastião Quintas; Isabelle Ferrané; Thomas Pellegrini; | arxiv-cs.SD | 2024-09-19 |
37 | Personalized Speech Recognition for Children with Test-Time Adaptation Highlight: We devised a novel ASR pipeline to apply unsupervised test-time adaptation (TTA) methods for child speech recognition, so that ASR models pre-trained on adult speech can be continuously adapted to each child speaker at test time without further human annotations. |
Zhonghao Shi; Harshvardhan Srivastava; Xuan Shi; Shrikanth Narayanan; Maja J. Matarić; | arxiv-cs.LG | 2024-09-19 |
38 | ASR Benchmarking: Need for A More Representative Conversational Dataset Highlight: In this study, we introduce a multilingual conversational dataset, derived from TalkBank, consisting of unstructured phone conversations between adults. |
Gaurav Maheshwari; Dmitry Ivanov; Théo Johannet; Kevin El Haddad; | arxiv-cs.CL | 2024-09-18 |
39 | M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper Highlight: In this paper, we propose M2R-Whisper, a novel multi-stage and multi-scale retrieval augmentation approach designed to enhance ASR performance in low-resource settings. |
JIAMING ZHOU et al. | arxiv-cs.SD | 2024-09-18 |
40 | Large Language Models Are Strong Audio-Visual Speech Recognition Learners Highlight: On the contrary, tasks like visual and audio-visual speech recognition (VSR/AVSR), which also exploit noise-invariant lip movement information, have received little or no attention. To bridge this gap, we propose Llama-AVSR, a new MLLM with strong audio-visual speech recognition capabilities. |
UMBERTO CAPPELLAZZO et al. | arxiv-cs.CV | 2024-09-18 |
41 | Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition Highlight: While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To mitigate this issue, we propose a novel channel-aware data simulation method for robust ASR training. |
CHIEN-CHUN WANG et al. | arxiv-cs.SD | 2024-09-18 |
42 | Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations Highlight: In this study, we propose a speech generation system that simulates the L1 shadowing process using voice conversion (VC) techniques and latent speech representations. |
Haopeng Geng; Daisuke Saito; Nobuaki Minematsu; | arxiv-cs.SD | 2024-09-18 |
43 | WER We Stand: Benchmarking Urdu ASR Models Highlight: This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. |
SAMEE ARIF et al. | arxiv-cs.CL | 2024-09-17 |
44 | Chain-of-Thought Prompting for Speech Translation Highlight: In this work, we propose a novel approach to leverage ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. |
KE HU et al. | arxiv-cs.CL | 2024-09-17 |
45 | Speech Recognition for Analysis of Police Radio Communication Highlight: We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription is challenging in this domain. |
Tejes Srivastava; Ju-Chieh Chou; Priyank Shroff; Karen Livescu; Christopher Graziul; | arxiv-cs.SD | 2024-09-16 |
46 | Augmenting Automatic Speech Recognition Models with Disfluency Detection Highlight: In this work, we present an inference-only approach to augment any ASR model with the ability to detect open-set disfluencies. |
Robin Amann; Zhaolin Li; Barbara Bruno; Jan Niehues; | arxiv-cs.CL | 2024-09-16 |
47 | Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition Highlight: To explore new capabilities in language modeling for speech processing, we introduce the generative speech transcription error correction (GenSEC) challenge. |
CHAO-HAN HUCK YANG et al. | arxiv-cs.CL | 2024-09-15 |
48 | Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions Highlight: In this work, we present a pioneering effort to investigate the capability of LLMs in transcribing speech in multi-talker environments, following versatile instructions related to multi-talker automatic speech recognition (ASR), target talker ASR, and ASR based on specific talker attributes such as sex, occurrence order, language, and keyword spoken. |
LINGWEI MENG et al. | arxiv-cs.CL | 2024-09-13 |
49 | Exploring SSL Discrete Tokens for Multilingual ASR Highlight: This study presents a comprehensive comparison of discrete tokens generated by various leading SSL models across multiple language domains. |
MINGYU CUI et al. | arxiv-cs.CL | 2024-09-13 |
50 | LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation Highlight: However, existing methods are often constrained by the capabilities of the speech encoders under varied acoustic conditions, such as accents. To address this, we propose LA-RAG, a novel Retrieval-Augmented Generation (RAG) paradigm for LLM-based ASR. |
SHAOJUN LI et al. | arxiv-cs.SD | 2024-09-13 |
51 | M$^{3}$V: A Multi-modal Multi-view Approach for Device-Directed Speech Detection Highlight: However, in practice, these models often produce incorrect predictions for unaligned input pairs due to the unavoidable errors of automatic speech recognition (ASR). To address this challenge, we propose M$^{3}$V, a multi-modal multi-view approach for device-directed speech detection, which frames the problem as a multi-view learning task that introduces unimodal views and a text-audio alignment view in the network besides the multi-modal view. |
ANNA WANG et al. | arxiv-cs.SD | 2024-09-13 |
52 | Full-text Error Correction for Chinese Speech Recognition with Large Language Model Highlight: Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). |
Zhiyuan Tang; Dong Wang; Shen Huang; Shidong Shang; | arxiv-cs.CL | 2024-09-12 |
53 | WhisperNER: Unified Open Named Entity and Speech Recognition Highlight: In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition. |
GIL AYACHE et al. | arxiv-cs.CL | 2024-09-12 |
54 | The Faetar Benchmark: Speech Recognition in A Very Under-Resourced Language Highlight: We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. |
MICHAEL ONG et al. | arxiv-cs.CL | 2024-09-12 |
55 | Enhancing CTC-Based Visual Speech Recognition Highlight: This paper presents LiteVSR2, an enhanced version of our previously introduced efficient approach to Visual Speech Recognition (VSR). |
Hendrik Laux; Anke Schmeink; | arxiv-cs.CV | 2024-09-11 |
56 | Linear Time Complexity Conformers with SummaryMixing for Streaming Speech Recognition Highlight: Hence, this work extends SummaryMixing to a Conformer Transducer that works in both a streaming and an offline mode. |
Titouan Parcollet; Rogier van Dalen; Shucong Zhang; Sourav Batthacharya; | arxiv-cs.SD | 2024-09-11 |
57 | Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking Highlight: We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of the DST model. |
Jihyun Lee; Solee Im; Wonjun Lee; Gary Geunbae Lee; | arxiv-cs.CL | 2024-09-10 |
58 | An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition Highlight: In light of this, we explore in depth how altering the context list to contain words with different frequency distributions affects model performance, and meanwhile extend CA with a simple yet effective context-balanced learning objective. A series of experiments conducted on the AISHELL-1 benchmark dataset suggests that using all vocabulary words from the training corpus as the context list and pairing them with our balanced objective yields the best performance, demonstrating a significant reduction in character error rate (CER) by up to 1.21% and a more pronounced 9.44% reduction in the error rate of zero-shot words. |
YI-CHENG WANG et al. | arxiv-cs.CL | 2024-09-10 |
59 | What Is Lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations Highlight: Our research reveals that current text normalization practices, which aim to standardize ASR outputs for fair comparison by removing inconsistencies such as variations in spelling, punctuation, and special characters, are fundamentally flawed when applied to Indic scripts. Through empirical analysis using text similarity scores and in-depth linguistic examination, we demonstrate that these flaws lead to artificially improved performance metrics for Indic languages. |
Kavya Manohar; Leena G Pillai; Elizabeth Sherly; | arxiv-cs.CL | 2024-09-04 |
60 | Quantification of Stylistic Differences in Human- and ASR-produced Transcripts of African American English Highlight: We categorize the kinds of stylistic differences between 6 transcription versions, 4 human- and 2 ASR-produced, of 10 hours of African American English (AAE) speech. Focusing on verbatim features and AAE morphosyntactic features, we investigate the interactions of these categories with how well transcripts can be compared via word error rate (WER). |
ANNIKA HEUSER et al. | arxiv-cs.CL | 2024-09-04 |
61 | LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization Highlight: This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization. |
ZENGRUI JIN et al. | arxiv-cs.SD | 2024-09-01 |
62 | Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition Highlight: In this paper, we propose the overlapped encoding separation (EncSep) to fully utilize the benefits of the connectionist temporal classification (CTC) and attention hybrid loss. |
Hao Shi; Yuan Gao; Zhaoheng Ni; Tatsuya Kawahara; | arxiv-cs.SD | 2024-09-01 |
63 | Comparing Discrete and Continuous Space LLMs for Speech Recognition Highlight: This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervised for both discrete and continuous types. |
Yaoxun Xu; Shi-Xiong Zhang; Jianwei Yu; Zhiyong Wu; Dong Yu; | arxiv-cs.CL | 2024-09-01 |
64 | ProGRes: Prompted Generative Rescoring on ASR N-Best Highlight: This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted LLMs. |
Ada Defne Tur; Adel Moumen; Mirco Ravanelli; | arxiv-cs.CL | 2024-08-30 |
65 | Measuring The Accuracy of Automatic Speech Recognition Solutions Highlight: At the same time, the DHH community reports serious issues with the accuracy and reliability of ASR. |
Korbinian Kuhn; Verena Kersken; Benedikt Reuter; Niklas Egger; Gottfried Zimmermann; | arxiv-cs.CL | 2024-08-29 |
66 | Speech Recognition Transformers: Topological-lingualism Perspective Highlight: The paper presents a comprehensive survey of transformer techniques oriented toward the speech modality. |
Shruti Singh; Muskaan Singh; Virender Kadyan; | arxiv-cs.CL | 2024-08-27 |
67 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues Highlight: Employing conventional data augmentation to enhance the noise robustness of summarization models is not feasible either, due to the unavailability of sufficient medical dialogue audio recordings and corresponding ASR transcripts. To address this challenge, we propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models (LLMs). |
KULUHAN BINICI et al. | arxiv-cs.CL | 2024-08-26 |
68 | Self-supervised Speech Representations Still Struggle with African American Vernacular English Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American English (MAE). We evaluate four SSL models (wav2vec 2.0, HuBERT, WavLM, and XLS-R) on zero-shot Automatic Speech Recognition (ASR) for these two varieties and find that these models perpetuate the bias in performance against AAVE. |
KALVIN CHANG et. al. | arxiv-cs.CL | 2024-08-26 |
69 | Developing Vocal System Impaired Patient-aimed Voice Quality Assessment Approach Using ASR Representation-included Multiple Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article addresses these challenges by showcasing the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech. This innovative approach aims to estimate voice quality of patients with impaired vocal systems. |
SHAOXIANG DANG et. al. | arxiv-cs.SD | 2024-08-22 |
70 | Towards Measuring Fairness in Speech Recognition: Fair-Speech Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel dataset, Fair-Speech, a publicly released corpus to help researchers evaluate their ASR models for accuracy across a diverse set of self-reported demographic information, such as age, gender, ethnicity, geographic variation and whether the participants consider themselves native English speakers. |
IRINA-ELENA VELICHE et. al. | arxiv-cs.AI | 2024-08-22 |
71 | The State of Commercial Automatic French Legal Speech Recognition Systems and Their Impact on Court Reporters Et Al Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We benchmark three ASR models, including commercial and open-source options, on their ability to recognize French legal speech using a curated dataset. Our study evaluates the performance of these systems using the Word Error Rate (WER) metric and introduces the Sonnex Distance to account for phonetic accuracy. |
Nicolas Garneau; Olivier Bolduc; | arxiv-cs.CL | 2024-08-21 |
72 | Error-preserving Automatic Speech Recognition of Young English Learners' Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the mistakes made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their mistakes. |
JANICK MICHOT et. al. | acl | 2024-08-20 |
73 | Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we report on a set of experiments aiming at assessing the performance of two parsing paradigms (graph-based parsing and sequence labeling based parsing) on speech parsing. |
Adrien Pupier; Maximin Coavoux; Jérôme Goulian; Benjamin Lecouteux; | acl | 2024-08-20 |
74 | StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified framework of multi-task learning. |
SHAOLEI ZHANG et. al. | acl | 2024-08-20 |
75 | Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the error predictors in two ways: first by predicting the errors made by a Switchboard ASR system on unseen data (Fisher), and then using that same predictor to estimate the behavior of an unrelated cloud-based ASR system on a novel task. |
Prashant Serai; Peidong Wang; Eric Fosler-Lussier; | arxiv-cs.AI | 2024-08-20 |
76 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate what linguistic factors affect the performance of Automatic Speech Recognition (ASR) models. |
Chihiro Taguchi; David Chiang; | acl | 2024-08-20 |
77 | CopyNE: Better Contextual ASR By Copying Named Entities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we treat entities as indivisible wholes and introduce the idea of copying into ASR. |
SHILIN ZHOU et. al. | acl | 2024-08-20 |
78 | XCB: An Effective Contextual Biasing Approach to Bias Cross-lingual Phrases in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make the initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing(XCB) module. |
Xucheng Wan; Naijun Zheng; Kai Liu; Huan Zhou; | arxiv-cs.CL | 2024-08-20 |
79 | A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR expert as a transcription tokenizer and a hybrid Autoregressive (AR) Non-autoregressive (NAR) decoding approach to solve the above problems. |
YANGZE LI et. al. | arxiv-cs.SD | 2024-08-18 |
80 | Enhancing Dialogue Speech Recognition with Robust Contextual Awareness Via Noise Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Context Noise Representation Learning (CNRL) to enhance robustness against noisy context, ultimately improving dialogue speech recognition accuracy. |
Wonjun Lee; San Kim; Gary Geunbae Lee; | arxiv-cs.CL | 2024-08-12 |
81 | Audio Enhancement for Computer Audition — An Iterative Training Paradigm Using Sample Importance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. |
Manuel Milling; Shuo Liu; Andreas Triantafyllopoulos; Ilhan Aslan; Björn W. Schuller; | arxiv-cs.SD | 2024-08-12 |
82 | LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a key limitation of this self-supervision lies in its primary focus on acoustic features, with minimal attention to the linguistic properties of the input. To address this gap, we propose Language Informed Test-Time Adaptation (LI-TTA), which incorporates linguistic insights during TTA for ASR. |
Eunseop Yoon; Hee Suk Yoon; John Harvill; Mark Hasegawa-Johnson; Chang D. Yoo; | arxiv-cs.CL | 2024-08-11 |
83 | MooER: LLM-based Speech Recognition and Translation Models from Moore Threads Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present MooER, a LLM-based large-scale automatic speech recognition (ASR) / automatic speech translation (AST) model of Moore Threads. |
JUNHAO XU et. al. | arxiv-cs.CL | 2024-08-09 |
84 | Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present accent clustering and mining schemes for fair speech recognition systems which can perform equally well on under-represented accented speech. |
JAEYOUNG KIM et. al. | arxiv-cs.SD | 2024-08-05 |
85 | Contextualized Speech Recognition: Rethinking Second-Pass Rescoring with Generative Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a novel framework that diverges from typical second-pass rescoring methods. |
Yixuan Tang; Anthony K. H. Tung; | ijcai | 2024-08-03 |
86 | ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms Using Linguistic Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, AE-based adversarial audio samples are susceptible to ASR updates. In this paper, we identify the root cause of these limitations, namely the inability to construct AE attack samples directly around the decision boundary of deep learning (DL) models. |
PENG CHENG et. al. | arxiv-cs.CR | 2024-08-03 |
87 | MECOS: A Bilingual Manipuri-English Spontaneous Code-switching Speech Corpus for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Naorem Karline Singh; Y. J. Chanu; Hoomexsun Pangsatabam; | Comput. Speech Lang. | 2024-08-01 |
88 | On The Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use the comparison of five different TTS decoder architectures in the scope of synthetic data generation to show the impact on CTC-based speech recognition training. |
Nick Rossenbach; Ralf Schlüter; Sakriani Sakti; | arxiv-cs.CL | 2024-07-31 |
89 | Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach called sentence-wise speech summarization (Sen-SSum), which generates text summaries from a spoken document in a sentence-by-sentence manner. |
KOHEI MATSUURA et. al. | arxiv-cs.CL | 2024-07-31 |
90 | On The Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we evaluate the utility of synthetic data for training automatic speech recognition (ASR). |
Benedikt Hilmes; Nick Rossenbach; Ralf Schlüter; | arxiv-cs.CL | 2024-07-25 |
91 | Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite these advancements, they still struggle to accurately recognize domain specific words, such as proper nouns and technical terminologies. To address this problem, we propose a method to utilize the state-of-the-art Whisper without modifying its architecture, preserving its generalization performance while enabling it to leverage descriptions effectively. |
Jiwon Suh; Injae Na; Woohwan Jung; | arxiv-cs.CL | 2024-07-25 |
92 | Coupling Speech Encoders with Downstream Text Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a modular approach to building cascade speech translation (AST) models that guarantees that the resulting model performs no worse than the 1-best cascade baseline while preserving state-of-the-art speech recognition (ASR) and text translation (MT) performance for a given task. |
Ciprian Chelba; Johan Schalkwyk; | arxiv-cs.CL | 2024-07-24 |
93 | Quantifying The Role of Textual Predictability in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use this method to demonstrate that a Wav2Vec 2.0-based model makes stronger use of textual context than a hybrid ASR model, in spite of not using an explicit language model, and also use it to shed light on recent results demonstrating poor performance of standard ASR systems on African-American English. We demonstrate that these mostly represent failures of acoustic–phonetic modelling. |
Sean Robertson; Gerald Penn; Ewan Dunbar; | arxiv-cs.CL | 2024-07-23 |
94 | Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern … |
Rithik Sachdev; Zhong-Qiu Wang; Chao-Han Huck Yang; | arxiv-cs.CL | 2024-07-23 |
95 | DMel: Speech Tokenization Made Simple Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using an LM-style transformer architecture for speech-text modeling, we comprehensively evaluate different speech tokenization methods on speech recognition (ASR) and speech synthesis (TTS). |
HE BAI et. al. | arxiv-cs.CL | 2024-07-22 |
96 | SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As an instance of this approach, we present SELM, an audio-conditioned language model for SER that predicts different emotion views. |
Hazim Bukhari; Soham Deshmukh; Hira Dhamyal; Bhiksha Raj; Rita Singh; | arxiv-cs.SD | 2024-07-21 |
97 | Low-Resourced Speech Recognition for Iu Mien Language Via Weakly-Supervised Phoneme-based Multilingual Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With less than 10 hours of transcribed Iu Mien language, this paper investigates and compares the three approaches for Iu Mien speech recognition. |
LUKUAN DONG et. al. | arxiv-cs.SD | 2024-07-18 |
98 | Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding By Provenance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic speech recognition (ASR) models trained on large amounts of audio data are now widely used to convert speech to written text in a variety of applications from video captioning to automated assistants used in healthcare and other domains. |
Changye Li; Trevor Cohen; Serguei Pakhomov; | arxiv-cs.CL | 2024-07-18 |
99 | Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present novel approaches that use a generative pretrained transformer (GPT) to identify paraphasias from transcripts as well as two end-to-end approaches that focus on modeling both automatic speech recognition (ASR) and paraphasia classification as multiple sequences vs. a single sequence. |
Matthew Perez; Aneesha Sampath; Minxue Niu; Emily Mower Provost; | arxiv-cs.CL | 2024-07-15 |
100 | Textless Dependency Parsing By Labeled Sequence Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although their effectiveness is shown in capturing acoustic features, it is unclear in capturing lexical knowledge. This paper proposes a textless method for dependency parsing, examining its effectiveness and limitations. |
Shunsuke Kando; Yusuke Miyao; Jason Naradowsky; Shinnosuke Takamichi; | arxiv-cs.CL | 2024-07-14 |
101 | CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer Based Streaming ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CUSIDE-T, which successfully adapts the CUSIDE method over the recurrent neural network transducer (RNN-T) ASR architecture, instead of being based on the CTC architecture. |
Wenbo Zhao; Ziwei Li; Chuan Yu; Zhijian Ou; | arxiv-cs.SD | 2024-07-14 |
102 | Empowering Whisper As A Joint Multi-Talker and Target-Talker Speech Recognition System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recognition tasks. |
LINGWEI MENG et. al. | arxiv-cs.SD | 2024-07-13 |
103 | HebDB: A Weakly Supervised Dataset for Hebrew Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HebDB, a weakly supervised dataset for spoken language processing in the Hebrew language. |
ARNON TURETZKY et. al. | arxiv-cs.CL | 2024-07-10 |
104 | LearnerVoice: A Dataset of Non-Native English Learners’ Spontaneous Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner’s Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. |
HAECHAN KIM et. al. | arxiv-cs.CL | 2024-07-05 |
105 | Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the development of audio-prompted LLMs there is the potential for even greater control options. In this work we demonstrate that with this greater flexibility the systems can be susceptible to model-control adversarial attacks. |
Vyas Raina; Mark Gales; | arxiv-cs.SD | 2024-07-05 |
106 | TokenVerse: Towards Unifying Speech and NLP Tasks Via Transducer-based ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. |
SHASHI KUMAR et. al. | arxiv-cs.CL | 2024-07-05 |
107 | Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study yields numerous significant findings that we are discussing in this paper. |
Salima Mdhaffar; Haroun Elleuch; Fethi Bougares; Yannick Estève; | arxiv-cs.CL | 2024-07-05 |
108 | Romanization Encoding For Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce romanization encoding for script-heavy languages to optimize multilingual and code-switching Automatic Speech Recognition (ASR) systems. |
WEN DING et. al. | arxiv-cs.CL | 2024-07-05 |
109 | Improving Accented Speech Recognition Using Data Augmentation Based on Unsupervised Text-to-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the use of unsupervised text-to-speech synthesis (TTS) as a data augmentation method to improve accented speech recognition. |
Cong-Thanh Do; Shuhei Imai; Rama Doddipatla; Thomas Hain; | arxiv-cs.CL | 2024-07-04 |
110 | FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). |
KEYU AN et. al. | arxiv-cs.SD | 2024-07-04 |
111 | Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluated three publicly available end-to-end models: Whisper, OWSM 3.1, and SeamlessM4T. |
Tiia Sildam; Andra Velve; Tanel Alumäe; | arxiv-cs.CL | 2024-07-04 |
112 | Improving Self-supervised Pre-training Using Accent-Specific Codebooks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an accent-aware adaptation technique for self-supervised learning that introduces a trainable set of accent-specific codebooks to the self-supervised architecture. |
Darshan Prabhu; Abhishek Gupta; Omkar Nitsure; Preethi Jyothi; Sriram Ganapathy; | arxiv-cs.CL | 2024-07-04 |
113 | Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a layer-adapted fusion (LAF) model, called Qifusion-Net, which does not require any prior knowledge about the target accent. |
Jinming Chen; Jingyi Fang; Yuanzhong Zheng; Yaoxuan Wang; Haojun Fei; | arxiv-cs.SD | 2024-07-03 |
114 | Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we conduct a preliminary evaluation using the dataset for both direct-prompting and fine-tuning pre-trained LLMs. |
Zhiyuan Tang; Dong Wang; Shen Huang; Shidong Shang; | arxiv-cs.CL | 2024-07-01 |
115 | Less Is More: Accurate Speech Recognition & Translation Without Web-Scale Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that state-of-the-art accuracy can be reached without relying on web-scale data. |
KRISHNA C. PUVVADA et. al. | arxiv-cs.CL | 2024-06-28 |
116 | Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. |
ZHENG FANG et. al. | arxiv-cs.CR | 2024-06-27 |
117 | Enhanced ASR Robustness to Packet Loss with A Front-End Adaptation Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using a front-end adaptation network connected to a frozen ASR model. |
Yehoshua Dissen; Shiry Yonash; Israel Cohen; Joseph Keshet; | arxiv-cs.SD | 2024-06-27 |
118 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. |
Ahmed Heakl; Youssef Zaghloul; Mennatullah Ali; Rania Hossam; Walid Gomaa; | arxiv-cs.CL | 2024-06-26 |
119 | Automatic Speech Recognition for Hindi Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The final phase of the research tested a neural network for accurately aligning the speech signal to hidden Markov model (HMM) states. This included implementing a novel backpropagation method that utilizes prior statistics of node co-activations. |
Anish Saha; A. G. Ramakrishnan; | arxiv-cs.CL | 2024-06-26 |
120 | Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, we introduce a decoder-only model exclusively designed for streaming recognition, incorporating a dedicated boundary token to facilitate streaming recognition and employing causal attention masking during the training phase. |
Peikun Chen; Sining Sun; Changhao Shan; Qing Yang; Lei Xie; | arxiv-cs.SD | 2024-06-26 |
121 | Dynamic Data Pruning for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works often entail significant overhead to achieve meaningful results. To fill this gap, this paper presents the first investigation of dynamic data pruning for ASR, finding that we can reach the full-data performance by dynamically selecting 70% of data. |
QIAO XIAO et. al. | arxiv-cs.CL | 2024-06-26 |
122 | MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a regularization technique that facilitates the training of visual and audio-visual speech recognition models (VSR and AVSR) from scratch. |
ADRIANA FERNANDEZ-LOPEZ et. al. | arxiv-cs.CV | 2024-06-25 |
123 | A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, several limitations persist, including limited fine-tuning options, a lack of mechanisms to enforce speech-text alignment, and high insertion errors especially in domain mismatch conditions. This paper presents a comprehensive solution to address these issues. |
VAN TUNG PHAM et. al. | arxiv-cs.LG | 2024-06-25 |
124 | SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Classification (CTC) loss as a router in the encoder of SC-MoE to achieve a real-time streaming CS ASR system. |
Shuaishuai Ye; Shunfei Chen; Xinhui Hu; Xinkang Xu; | arxiv-cs.SD | 2024-06-25 |
125 | FASA: A Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When generating datasets, human annotations are not scalable, and existing forced-alignment tools are not usable as they make impractical assumptions about the quality of the input transcriptions. To address these challenges, we propose a new forced-alignment tool, FASA, as a flexible and automatic speech aligner to extract high-quality aligned children’s speech data from many of the existing noisy children’s speech data. |
Dancheng Liu; Jinjun Xiong; | arxiv-cs.CL | 2024-06-25 |
126 | Sequential Editing for Lifelong Training of Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems. |
Devang Kulshreshtha; Saket Dingliwal; Brady Houston; Nikolaos Pappas; Srikanth Ronanki; | arxiv-cs.CL | 2024-06-25 |
127 | Blending LLMs Into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present KIT’s offline submission in the constrained + LLM track by incorporating recently proposed techniques that can be added to any cascaded speech translation. |
SAI KONERU et. al. | arxiv-cs.CL | 2024-06-24 |
128 | Exploring The Capability of Mamba in Speech Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we compared Mamba with state-of-the-art Transformer variants for various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. |
Koichi Miyazaki; Yoshiki Masuyama; Masato Murata; | arxiv-cs.SD | 2024-06-24 |
129 | Perception of Phonological Assimilation By Neural Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds, and identifies the linguistic knowledge that is implemented by the model to compensate for assimilation during Automatic Speech Recognition (ASR). |
Charlotte Pouw; Marianne de Heer Kloots; Afra Alishahi; Willem Zuidema; | arxiv-cs.CL | 2024-06-21 |
130 | Massive End-to-end Speech Recognition Models with Time Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate massive end-to-end automatic speech recognition (ASR) models with efficiency improvements achieved by time reduction. |
WEIRAN WANG et. al. | naacl | 2024-06-20 |
131 | Lost in Transcription: Identifying and Quantifying The Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study evaluates six leading ASRs, analyzing their performance on both a real-world dataset of speech samples from individuals who stutter and a synthetic dataset derived from the widely-used LibriSpeech benchmark. |
DENA MUJTABA et. al. | naacl | 2024-06-20 |
132 | Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. |
Murali Karthick Baskar; Andrew Rosenberg; Bhuvana Ramabhadran; Neeraj Gaur; Zhong Meng; | arxiv-cs.AI | 2024-06-20 |
133 | Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a two-stage method, Contrastive and Consistency Learning (CCL), that correlates error patterns between clean and noisy ASR transcripts and emphasizes the consistency of the latent features of the two transcripts. |
Suyoung Kim; Jiyeon Hwang; Ho-Young Jung; | naacl | 2024-06-20 |
134 | Children’s Speech Recognition Through Discrete Token Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the integration of discrete speech tokens into children’s speech recognition systems as input without significantly degrading the ASR performance. |
Vrunda N. Sukhadia; Shammur Absar Chowdhury; | arxiv-cs.CL | 2024-06-19 |
135 | Joint Vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While traditional approaches take on these tasks separately, we propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture. We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets. |
Alexander Blatt; Aravind Krishnan; Dietrich Klakow; | arxiv-cs.CL | 2024-06-19 |
136 | ManWav: The First Manchu ASR Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In a pioneering effort, we introduce the first-ever Manchu ASR model ManWav, leveraging Wav2Vec2-XLSR-53. |
Jean Seo; Minha Kang; Sungjoo Byun; Sangah Lee; | arxiv-cs.CL | 2024-06-19 |
137 | Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we report on a set of experiments aiming at assessing the performance of two parsing paradigms (graph-based parsing and sequence labeling based parsing) on speech parsing. |
Adrien Pupier; Maximin Coavoux; Jérôme Goulian; Benjamin Lecouteux; | arxiv-cs.CL | 2024-06-18 |
138 | Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose finding task-specific subnetworks within a multi-task SLU model via neural network pruning. |
Hayato Futami; Siddhant Arora; Yosuke Kashiwagi; Emiru Tsunoo; Shinji Watanabe; | arxiv-cs.CL | 2024-06-18 |
139 | Bridging The Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study introduces a simple yet effective SE post-processing technique to address the gap between various pre-trained SE and ASR models. |
KUAN-CHEN WANG et. al. | arxiv-cs.SD | 2024-06-18 |
140 | CoSTA: Code-Switched Speech Translation Using Aligned Speech-Text Interleaving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. |
Bhavani Shankar; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2024-06-16 |
141 | Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the stealthiness of data poisoning, we propose a non-neural and fast algorithm called Random Spectrogram Rhythm Transformation (RSRT) in this paper. |
Wenhan Yao; Jiangkun Yang; Yongqiang He; Jia Liu; Weiping Wen; | arxiv-cs.SD | 2024-06-16 |
142 | Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Simul-Whisper, which uses the time alignment embedded in Whisper’s cross-attention to guide auto-regressive decoding and achieve chunk-based streaming ASR without any fine-tuning of the pre-trained model. |
Haoyu Wang; Guoqiang Hu; Guodong Lin; Wei-Qiang Zhang; Jian Li; | arxiv-cs.SD | 2024-06-14 |
143 | An Efficient Text Augmentation Approach for Contextualized Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA) technique, all while keeping computational costs minimal. |
Naijun Zheng; Xucheng Wan; Kai Liu; Ziqing Du; Zhou Huan; | arxiv-cs.SD | 2024-06-14 |
144 | Speech ReaLLM — Real-time Streaming Speech Recognition with Multimodal LLMs By Teaching The Flow of Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Speech ReaLLM, a new ASR architecture that marries decoder-only ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. |
FRANK SEIDE et. al. | arxiv-cs.CL | 2024-06-13 |
146 | LASER: Learning By Aligning Self-supervised Representations of Speech for Improving Content-related Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent attempts have been made to address this issue with cost-effective self-supervised fine-tuning (SSFT) approaches. Continuing in this direction, we present a cost-effective SSFT method named LASER (Learning by Aligning Self-supervised Representations). |
Amit Meghanani; Thomas Hain; | arxiv-cs.CL | 2024-06-13 |
147 | EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. |
ZIYANG ZHUANG et. al. | arxiv-cs.SD | 2024-06-13 |
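Single-step non-autoregressive ASR of this kind emits all output tokens in one forward pass and then collapses frame-level labels into text. As a generic illustration of that collapse step (the standard CTC rule, not EffectiveASR's actual decoder), a minimal sketch:

```python
BLANK = "_"  # CTC blank symbol (choice of symbol is an assumption here)

def ctc_collapse(frame_labels):
    """Standard CTC collapse: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:  # new non-blank label survives
            out.append(lab)
        prev = lab
    return "".join(out)

print(ctc_collapse(list("hh_e_ll_ll_oo")))  # prints hello
```

The blank between the two "ll" runs is what lets the rule keep both l's instead of merging them.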
148 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate what linguistic factors affect the performance of Automatic Speech Recognition (ASR) models. |
Chihiro Taguchi; David Chiang; | arxiv-cs.CL | 2024-06-13 |
149 | Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a transcription-free method for joint training using only audio signals. |
WILLIAM RAVENSCROFT et. al. | arxiv-cs.SD | 2024-06-13 |
150 | Training Data Augmentation for Dysarthric Automatic Speech Recognition By Text-to-Dysarthric-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home environment systems. |
Wing-Zin Leung; Mattias Cross; Anton Ragni; Stefan Goetze; | arxiv-cs.SD | 2024-06-12 |
151 | Improving Child Speech Recognition with Augmented Child-like Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied … |
Yuanyuan Zhang; Zhengjun Yue; T. Patel; O. Scharenborg; | ArXiv | 2024-06-12 |
152 | Towards Unsupervised Speech Recognition Without Pronunciation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by proposing the removal of reliance on a phoneme lexicon. |
JUNRUI NI et. al. | arxiv-cs.CL | 2024-06-12 |
153 | ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. |
JIATONG SHI et. al. | arxiv-cs.SD | 2024-06-12 |
154 | PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. |
TRANG LE et. al. | arxiv-cs.CL | 2024-06-11 |
155 | The Interspeech 2024 Challenge on Speech Processing Using Discrete Units Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper outlines the challenge designs and baseline descriptions. We also collate baseline and selected submission systems, along with preliminary findings, offering valuable contributions to future research in this evolving field. |
XUANKAI CHANG et. al. | arxiv-cs.SD | 2024-06-11 |
156 | AS-70: A Mandarin Stuttered Speech Dataset for Automatic Speech Recognition and Stuttering Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. |
RONG GONG et. al. | arxiv-cs.SD | 2024-06-11 |
157 | Reading Miscue Detection in Primary School Through Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8%). |
Lingyun Gao; Cristian Tejedor-Garcia; Helmer Strik; Catia Cucchiarini; | arxiv-cs.CL | 2024-06-11 |
158 | MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling Methods for Learning Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose (i) a Swap method to address the pre-training and inference mismatch observed in HuBERT and (ii) a Multicluster masked prediction loss for more effective utilization of the model’s capacity. |
Hemant Yadav; Sunayana Sitaram; Rajiv Ratn Shah; | arxiv-cs.CL | 2024-06-09 |
159 | Hypernetworks for Personalizing ASR to Atypical Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. |
Max Müller-Eberstein; Dianna Yee; Karren Yang; Gautam Varma Mantena; Colin Lea; | arxiv-cs.LG | 2024-06-06 |
160 | Improving Zero-Shot Chinese-English Code-Switching ASR with KNN-CTC and Gated Monolingual Datastores Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. |
JIAMING ZHOU et. al. | arxiv-cs.CL | 2024-06-06 |
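The dual-datastore idea can be pictured with a toy sketch: each monolingual datastore maps encoder-frame keys to token labels, and a gate selects the store whose nearest key is closer to the query frame. The stores, the Euclidean distance, and the hard gating rule below are simplifying assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def gated_knn_label(query, store_zh, store_en):
    """Gate between two monolingual (key, label) datastores by nearest-key
    distance, then return the winning store's nearest label. Toy sketch."""
    def nearest(store):
        keys = np.stack([k for k, _ in store])
        dists = np.linalg.norm(keys - query, axis=1)
        i = int(np.argmin(dists))
        return dists[i], store[i][1]

    d_zh, lab_zh = nearest(store_zh)
    d_en, lab_en = nearest(store_en)
    return lab_zh if d_zh <= d_en else lab_en  # hard gate on distance

lab = gated_knn_label(
    np.array([0.9, 0.1]),                 # query frame embedding
    [(np.array([0.0, 1.0]), "ni")],       # toy Mandarin datastore
    [(np.array([1.0, 0.0]), "a")],        # toy English datastore
)
```

Here the query lies near the English key, so the gate routes the lookup to the English store and avoids cross-lingual noise from the other datastore.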
161 | BLSP-Emo: Towards Empathetic Large Speech-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present BLSP-Emo (Bootstrapped Language-Speech Pretraining with Emotion support), a novel approach to developing an end-to-end speech-language model capable of understanding both semantics and emotions in speech and generating empathetic responses. |
CHEN WANG et. al. | arxiv-cs.CL | 2024-06-06 |
162 | Text Injection for Neural Contextual Biasing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes contextual text injection (CTI) to enhance contextual ASR. |
ZHONG MENG et. al. | arxiv-cs.CL | 2024-06-05 |
163 | Error-preserving Automatic Speech Recognition of Young English Learners’ Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the errors made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their errors. |
JANICK MICHOT et. al. | arxiv-cs.CL | 2024-06-05 |
164 | Discrete Multimodal Transformers with A Pretrained Large Language Model for Mixed-Supervision Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a decoder-only Discrete Multimodal Language Model (DMLM), which can be flexibly applied to multiple tasks (ASR, T2S, S2TT, etc.) and modalities (text, speech, vision). |
VIET ANH TRINH et. al. | arxiv-cs.CL | 2024-06-04 |
165 | Efficiently Train ASR Models That Memorize Less and Perform Better with Per-core Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across the training of a wide range of ASR models. |
LUN WANG et. al. | arxiv-cs.CR | 2024-06-04 |
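Per-core clipping bounds each accelerator core's gradient contribution to a fixed norm before the contributions are averaged, which limits how much any one core's (micro-batch's) examples can be memorized. A minimal NumPy sketch of that scheme, with the clip norm and averaging rule as illustrative assumptions rather than the paper's exact recipe:

```python
import numpy as np

def per_core_clip(core_grads, clip_norm=1.0):
    """Scale each per-core gradient down to at most clip_norm, then average.
    Sketch of per-core clipping (PCC); not the paper's implementation."""
    clipped = []
    for g in core_grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # no-op if already small
        clipped.append(g * scale)
    return np.mean(clipped, axis=0)

# Core 0 has a large gradient (norm 5), core 1 a small one (norm 0.5).
g = per_core_clip([np.array([3.0, 4.0]), np.array([0.3, 0.4])], clip_norm=1.0)
```

Only the large gradient is rescaled (to norm 1.0); the small one passes through untouched, so the averaged update is no longer dominated by a single core.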
166 | Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition Via Weakly Phonetic Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the approach of pre-training with weakly phonetic supervision towards data-efficient MCL-ASR, which is called Whistle. |
Saierdaer Yusuyin; Te Ma; Hao Huang; Wenbo Zhao; Zhijian Ou; | arxiv-cs.SD | 2024-06-04 |
167 | Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which typically feature a single transcript associated with hours-long audios. |
Ara Yeroyan; Nikolay Karpov; | arxiv-cs.CL | 2024-06-03 |
168 | Pass The Butter: A Study on Desktop-classic Multitasking Robotic Arm Based on Advanced YOLOv7 and BERT Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (based on ROS) as a carrier, locally deploying a natural language model (NLP-BERT), and integrating visual recognition (CV-YOLO) and speech recognition (ASR-Whisper) technologies as inputs to achieve autonomous decision-making and rational action by the desktop robot. |
HAOHUA QUE et. al. | arxiv-cs.RO | 2024-05-27 |
169 | Denoising LM: Pushing The Limits of Error Correction Models for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Denoising LM (DLM), which is a *scaled* error correction model trained with vast amounts of synthetic data, significantly exceeding prior attempts while achieving new state-of-the-art ASR performance. |
ZIJIN GU et. al. | arxiv-cs.LG | 2024-05-24 |
170 | Let’s Fuse Step By Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Generative Fusion Decoding (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). |
CHAN-JAN HSU et. al. | arxiv-cs.CL | 2024-05-23 |
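GFD is a shallow-fusion framework; the generic scoring rule shallow fusion builds on combines the recognizer's and the LM's log-probabilities at each decoding step. A minimal sketch with a fixed fusion weight and toy vocabularies (GFD's byte-level alignment machinery is omitted, and all names here are illustrative):

```python
import math

def shallow_fusion_step(asr_logprobs, lm_logprobs, lam=0.3):
    """Pick the next token by ASR log-prob plus lam * LM log-prob.
    Tokens unknown to the LM get a large penalty in this toy version."""
    fused = {tok: asr_logprobs[tok] + lam * lm_logprobs.get(tok, -1e9)
             for tok in asr_logprobs}
    return max(fused, key=fused.get)

# The acoustics slightly prefer "there", but the LM strongly prefers "their".
tok = shallow_fusion_step(
    {"there": math.log(0.50), "their": math.log(0.45)},
    {"there": math.log(0.20), "their": math.log(0.70)},
)
```

With lam = 0.3 the LM evidence outweighs the small acoustic margin, so the fused decoder picks "their"; setting lam = 0 recovers plain ASR decoding.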
171 | You Don’t Understand Me!: Comparing ASR Results for L1 and L2 Speakers of Swedish IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. |
Ronald Cumbal; Birger Moell; Jose Lopes; Olof Engwall; | arxiv-cs.CL | 2024-05-22 |
172 | A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This prevents human users from interrupting the robot, which limits speech-based human-robot interaction. To enable a more natural interaction that allows for such interruptions, we propose an audio processing pipeline for filtering out the robot’s ego speech using only a single-channel microphone. |
Yue Li; Florian A. Kunneman; Koen V. Hindriks; | arxiv-cs.HC | 2024-05-22 |
173 | Non-autoregressive Real-time Accent Conversion Model with Voice Cloning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have developed the non-autoregressive model for real-time accent conversion with voice cloning. |
Vladimir Nechaev; Sergey Kosyakov; | arxiv-cs.SD | 2024-05-21 |
174 | Listen Again and Choose The Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ClozeGER, a new paradigm for ASR generative error correction. |
YUCHEN HU et. al. | arxiv-cs.CL | 2024-05-16 |
175 | Towards Evaluating The Robustness of Automatic Speech Recognition Systems Via Audio Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an attack on ASR systems based on user-customized style transfer. |
WEIFEI JIN et. al. | arxiv-cs.SD | 2024-05-15 |
176 | I Know What You Mean: Context-Aware Recognition to Enhance Speech-Based Games Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent advances in language processing and speech recognition open up a large opportunity for video game companies to embrace voice interaction as an intuitive feature and … |
Nima Zargham; Mohamed Lamine Fetni; Laura Spillner; Thomas Muender; Rainer Malaka; | Proceedings of the CHI Conference on Human Factors in … | 2024-05-11 |
177 | Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple yet effective method to learn a universal acoustic realization of Whisper’s `<|endoftext|>` token, which, when prepended to any speech signal, encourages the model to ignore the speech and only transcribe the special token, effectively ‘muting’ the model. |
Vyas Raina; Rao Ma; Charles McGhee; Kate Knill; Mark Gales; | arxiv-cs.CL | 2024-05-09 |
178 | Lost in Transcription: Identifying and Quantifying The Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study evaluates six leading ASRs, analyzing their performance on both a real-world dataset of speech samples from individuals who stutter and a synthetic dataset derived from the widely-used LibriSpeech benchmark. |
DENA MUJTABA et. al. | arxiv-cs.CL | 2024-05-09 |
179 | The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. |
JINGGUANG TIAN et. al. | arxiv-cs.SD | 2024-05-08 |
180 | Open Implementation and Study of BEST-RQ for Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we describe a re-implementation of a Random-projection quantizer and perform a preliminary study with a comparison to wav2vec 2.0 on four downstream tasks. |
Ryan Whetten; Titouan Parcollet; Marco Dinarelli; Yannick Estève; | arxiv-cs.CL | 2024-05-07 |
181 | Mixat: A Data Set of Bilingual Emirati-English Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. |
Maryam Al Ali; Hanan Aldarmaki; | arxiv-cs.CL | 2024-05-04 |
182 | Unveiling The Potential of LLM-Based ASR on Chinese Open-Source Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules in the context of the speech foundation encoder-LLM ASR paradigm. |
XUELONG GENG et. al. | arxiv-cs.SD | 2024-05-03 |
183 | Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. |
FRANCISCO TEIXEIRA et. al. | arxiv-cs.LG | 2024-05-02 |
184 | Low-resource Speech Recognition and Dialect Identification of Irish in A Multi-task Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for Irish (Gaelic) low-resource speech recognition (ASR) and dialect identification (DID). |
Liam Lonergan; Mengjie Qian; Neasa Ní Chiaráin; Christer Gobl; Ailbhe Ní Chasaide; | arxiv-cs.CL | 2024-05-02 |
185 | Efficient Compression of Multitask Multilingual Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still underperforms on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we examine its limitations, demonstrating the presence of speaker-related (gender, age) and model-related (resourcefulness and model size) bias. |
Thomas Palmeira Ferraz; | arxiv-cs.CL | 2024-05-01 |
186 | Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called AFTER, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. |
Dongyuan Li; Ying Zhang; Yusong Wang; Funakoshi Kotaro; Manabu Okumura; | arxiv-cs.SD | 2024-05-01 |
187 | Confides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, preventing their seamless integration into analytical workflows. In this paper, we introduce ConFides, a visual analytic system developed in collaboration with intelligence analysts to address this issue. |
Sunwoo Ha; Chaehun Lim; R. Jordan Crouser; Alvitta Ottley; | arxiv-cs.HC | 2024-04-30 |
188 | Child Speech Recognition in Human-Robot Interaction: Problem Solved? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. |
RUBEN JANSSENS et. al. | arxiv-cs.CL | 2024-04-26 |
189 | Automatic Speech Recognition System-Independent Word Error Rate Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. |
Chanho Park; Mingjie Chen; Thomas Hain; | arxiv-cs.CL | 2024-04-25 |
190 | Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Killkan, the first dataset for automatic speech recognition (ASR) in the Kichwa language, an indigenous language of Ecuador. |
Chihiro Taguchi; Jefferson Saransig; Dayana Velásquez; David Chiang; | arxiv-cs.CL | 2024-04-23 |
191 | Semantically Corrected Amharic Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. |
Samuael Adnew; Paul Pu Liang; | arxiv-cs.CL | 2024-04-20 |
192 | Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a multi-task audio source separation (MTASS) based ASR model called JRSV, which Jointly Recognizes Speech and singing Voices. |
Ye Bai; Chenxing Li; Hao Li; Yuanyuan Zhao; Xiaorui Wang; | arxiv-cs.SD | 2024-04-17 |
193 | Task Vector Algebra for ASR Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Vector representations of text and speech signals such as word2vec and wav2vec are used commonly in automatic speech recognition (ASR) and spoken language understanding systems. … |
Gowtham Ramesh; Kartik Audhkhasi; B. Ramabhadran; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
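Task vector algebra treats a fine-tuned model as base weights plus a "task vector" (the element-wise weight difference), and composes or ablates tasks by adding scaled vectors back to the base. A minimal sketch over toy weight dictionaries; the scaling rule and names are illustrative assumptions, not the paper's method:

```python
import numpy as np

def task_vector(finetuned, base):
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {name: finetuned[name] - base[name] for name in base}

def apply_task_vector(base, tv, alpha=1.0):
    """Move the base model along the task vector by a scaling factor alpha."""
    return {name: base[name] + alpha * tv[name] for name in base}

base = {"w": np.array([1.0, 2.0])}
finetuned = {"w": np.array([2.0, 0.0])}
# Apply half of the task at alpha = 0.5; alpha = -1.0 would "forget" it.
merged = apply_task_vector(base, task_vector(finetuned, base), alpha=0.5)
```

Because the algebra is purely element-wise, vectors from different fine-tunings of the same base ASR model can be summed to approximate multi-task behavior without retraining.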
194 | Automatic Speech Recognition Tuned for Child Speech in The Classroom Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: K-12 school classrooms have proven to be a challenging environment for Automatic Speech Recognition (ASR) systems, both due to background noise and conversation, and differences … |
ROSY SOUTHWELL et. al. | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
195 | Extending Large Language Models for Speech and Audio Captioning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multimodal large language models (LLMs) have shown promising visual perception abilities by connecting with image encoders, but their performance on auditory tasks has not yet … |
CHANGLI TANG et. al. | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
196 | Generalization of Self-Supervised Learning-Based Representations for Cross-Domain Speech Emotion Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Self-supervised learning (SSL) from unlabelled speech data has revolutionized speech representation learning. Among them, wavLM, wav2vec2, HuBERT, and Data2vec have produced … |
Abinay Reddy Naini; Mary A. Kohler; Elizabeth Richerson; Donita Robinson; Carlos Busso; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
197 | Exploring Adapters with Conformers for Children’s Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The high variability in acoustic, pronunciation, and linguistic characteristics of children’s speech makes children’s automatic speech recognition (ASR) a complex task. … |
Thomas Rolland; Alberto Abad; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
198 | Enhancing Two-Stage Finetuning for Speech Emotion Recognition Using Adapters Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study investigates the effective finetuning of a pretrained model using adapters for speech emotion recognition (SER). Since emotion is related with linguistic and prosodic … |
Yuan Gao; Hao Shi; Chenhui Chu; Tatsuya Kawahara; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
199 | Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The quadratic memory complexity of self-attention has generally restricted Transformer-based models to utterance-based speech processing, preventing models from leveraging … |
William Chen; Takatomo Kano; A. Ogawa; Marc Delcroix; Shinji Watanabe; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
200 | Automatic Speech Recognition Advancements for Indigenous Languages of The Americas Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. |
Monica Romero; Sandra Gomez; Ivan G. Torre; | arxiv-cs.CL | 2024-04-12 |
201 | An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. |
Tien-Hong Lo; Fu-An Chao; Tzu-I Wu; Yao-Ting Sung; Berlin Chen; | arxiv-cs.SD | 2024-04-11 |
202 | VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in The Medical Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present VietMed – a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. |
Khai Le-Duc; | arxiv-cs.CL | 2024-04-08 |
203 | Mai Ho’omāuna I Ka ‘Ai: Language Models Improve Automatic Speech Recognition in Hawaiian Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we address the challenge of improving Automatic Speech Recognition (ASR) for a low-resource language, Hawaiian, by incorporating large amounts of independent text data into an ASR foundation model, Whisper. |
Kaavya Chaparala; Guido Zarrella; Bruce Torres Fischer; Larry Kimura; Oiwi Parker Jones; | arxiv-cs.CL | 2024-04-03 |
204 | BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data. |
Alexandros Haliassos; Andreas Zinonos; Rodrigo Mira; Stavros Petridis; Maja Pantic; | arxiv-cs.CV | 2024-04-02 |
205 | Noise Masking Attacks and Defenses for Pretrained Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They show that when a record has been seen at training time, the model will transcribe the noisy record with its memorized sensitive transcript. In our work, we extend these attacks beyond ASR models, to attack pretrained speech encoders. |
Matthew Jagielski; Om Thakkar; Lun Wang; | arxiv-cs.LG | 2024-04-02 |
206 | Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Emotion Neural Transducer for fine-grained speech emotion recognition with automatic speech recognition (ASR) joint training. |
Siyuan Shen; Yu Gao; Feng Liu; Hanyang Wang; Aimin Zhou; | arxiv-cs.SD | 2024-03-28 |
207 | Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel method combining multi-modal and multi-task unsupervised pre-training with a translation-based supervised mid-training approach. |
YASH JAIN et al. | arxiv-cs.CL | 2024-03-28 |
208 | DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as the named entity (NE) list grows, the problems of phonetic confusion in the NE list are exacerbated; for example, homophone ambiguities increase substantially. In view of this, we propose a novel Description Augmented Named entity CorrEctoR (dubbed DANCER), which leverages entity descriptions as additional information to help mitigate phonetic confusion in named entity correction (NEC) on ASR transcriptions. |
Yi-Cheng Wang; Hsin-Wei Wang; Bi-Cheng Yan; Chi-Han Lin; Berlin Chen; | arxiv-cs.CL | 2024-03-26 |
209 | More Than Words: Advancements and Challenges in Speech Recognition for Singing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition. |
Anna Kruspe; | arxiv-cs.SD | 2024-03-14 |
210 | A Review on Gujarati Language Based Automatic Speech Recognition (ASR) Systems Related Papers Related Patents Related Grants Related Venues Related Experts View |
Mohit Dua; Bhavesh Bhagat; Shelza Dua; N. Chakravarty; | Int. J. Speech Technol. | 2024-03-12 |
211 | Automatic Speech Recognition (ASR) for The Diagnosis of Pronunciation of Speech Sound Disorders in Korean Children Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. |
TAEKYUNG AHN et al. | arxiv-cs.CL | 2024-03-12 |
212 | SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation. |
Jiayu Du; Jinpeng Li; Guoguo Chen; Wei-Qiang Zhang; | arxiv-cs.CL | 2024-03-12 |
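Leaderboards like the one above rank ASR systems by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the reference length. As a minimal sketch of the standard metric (not the SpeechColab implementation), WER can be computed with edit-distance dynamic programming:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions to match an empty hypothesis
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions to build hyp from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words; production platforms additionally normalize text (casing, punctuation, numerals) before scoring, which this sketch omits.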
213 | Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The divide between SE and ASR impedes the progress of robust ASR systems, especially as SE has made major advances in recent years. This paper focuses on eliminating this divide with an ARN (attentive recurrent network) time-domain enhancement model and a CrossNet time-frequency-domain enhancement model. |
Yufeng Yang; Ashutosh Pandey; DeLiang Wang; | arxiv-cs.SD | 2024-03-10 |
214 | SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents a cost-effective SSFT method named Self-supervised Correspondence (SCORE) fine-tuning to adapt the SSL speech representations for content-related tasks. |
Amit Meghanani; Thomas Hain; | arxiv-cs.CL | 2024-03-10 |
215 | A New Benchmark for Evaluating Automatic Speech Recognition in The Arabic Call Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications. |
QUSAI ABO OBAIDAH et al. | arxiv-cs.AI | 2024-03-07 |
216 | Kirigami Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-based human activity recognition (HAR) is very popular because many human activities have unique sound signatures that can be detected using machine learning (ML) … |
Sudershan Boovaraghavan; Haozhe Zhou; Mayank Goel; Yuvraj Agarwal; | Proceedings of the ACM on Interactive, Mobile, Wearable and … | 2024-03-06 |
217 | PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization. Mixture invariant training (MixIT) … |
Joonas Kalda; Clément Pagés; R. Marxer; Tanel Alumäe; Hervé Bredin; | The Speaker and Language Recognition Workshop | 2024-03-04 |
218 | Automatic Speech Recognition Using Advanced Deep Learning Approaches: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This survey offers a comprehensive review of DTL, FL, and RL-based ASR frameworks, aiming to provide insights into the latest developments and aid researchers and professionals in understanding the current challenges. |
Hamza Kheddar; Mustapha Hemis; Yassine Himeur; | arxiv-cs.SD | 2024-03-02 |
219 | Towards Inclusive Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Siyuan Feng; B. Halpern; O. Kudina; O. Scharenborg; | Comput. Speech Lang. | 2024-03-01 |
220 | Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach, post-decoder biasing, which constructs a transform probability matrix based on the distribution of training transcriptions. |
Heyang Liu; Yu Wang; Yanfeng Wang; | arxiv-cs.CL | 2024-03-01 |
221 | Probing The Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following a large body of research on neural network interpretability, we propose in this article a protocol that aims to determine what information is encoded in an ASR acoustic model (AM), and where it is located. |
Quentin Raymondaud; Mickael Rouvier; Richard Dufour; | arxiv-cs.SD | 2024-02-29 |
222 | Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose task design, labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. |
Jeehyun Lee; Yerin Choi; Tae-Jin Song; Myoung-Wan Koo; | arxiv-cs.CL | 2024-02-29 |
223 | Exploration of Adapter for Noise Robust Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study thoroughly investigates adapter-based ASR adaptation in noisy environments. |
Hao Shi; Tatsuya Kawahara; | arxiv-cs.SD | 2024-02-28 |
224 | Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets, encompassing 19 languages from eight language families and two speaking conditions. |
Giuseppe Attanasio; Beatrice Savoldi; Dennis Fucci; Dirk Hovy; | arxiv-cs.CL | 2024-02-27 |
225 | Large Language Models Are Efficient Learners of Noise-Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The latest work proposes a GER benchmark with the HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcriptions by efficient LLM finetuning, which shows great effectiveness but lacks specificity on noise-robust ASR. In this work, we extend the benchmark to noisy conditions and investigate whether we can teach LLMs to perform denoising for GER, just as robust ASR systems do, where one solution is introducing noise information as a conditioner into the LLM. |
YUCHEN HU et al. | iclr | 2024-02-26 |
226 | An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus exclusively on improving the acoustic encoder of E2E ASR to tackle the challenge caused by the code-switching phenomenon. |
Tzu-Ting Yang; Hsin-Wei Wang; Yi-Cheng Wang; Chi-Han Lin; Berlin Chen; | arxiv-cs.CL | 2024-02-26 |
227 | It’s Never Too Late: Fusing Acoustic Information Into Large Language Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite its effectiveness, GER introduces extra data uncertainty since the LLM is trained without taking into account acoustic information available in the speech signal. In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF). |
CHEN CHEN et al. | iclr | 2024-02-26 |
228 | LipVoicer: Generating Speech from Silent Videos Guided By Lip Reading Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present LipVoicer, a novel method that generates high-quality speech, even for in-the-wild and rich datasets, by incorporating the text modality. |
Yochai Yemini; Aviv Shamsian; Lior Bracha; Sharon Gannot; Ethan Fetaya; | iclr | 2024-02-26 |
229 | Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are invoked and their placement in memory. |
YANG LI et al. | arxiv-cs.SD | 2024-02-20 |
230 | Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose the multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs. |
Qiushi Zhu; Jie Zhang; Yu Gu; Yuchen Hu; Lirong Dai; | aaai | 2024-02-20 |
231 | OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC). |
Yifan Peng; Yui Sudo; Muhammad Shakeel; Shinji Watanabe; | arxiv-cs.CL | 2024-02-19 |
232 | An Embarrassingly Simple Approach for LLM with Strong ASR Capacity IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). |
ZIYANG MA et al. | arxiv-cs.CL | 2024-02-13 |
233 | The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research represents a pioneering effort in quantifying biases in the Portuguese language context through the application of MMS and Whisper, contributing to a better understanding of ASR systems’ performance in multilingual settings. |
Ajinkya Kulkarni; Anna Tokareva; Rameez Qureshi; Miguel Couceiro; | arxiv-cs.CL | 2024-02-12 |
234 | A Comprehensive Study of The Current State-of-the-Art in Nepali Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). |
Rupak Raj Ghimire; Bal Krishna Bal; Prakash Poudyal; | arxiv-cs.SD | 2024-02-05 |
235 | Streaming Sequence Transduction Through Dynamic Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. |
WEITING TAN et al. | arxiv-cs.CL | 2024-02-02 |
236 | AccentFold: A Journey Through African Accents for Zero-Shot ASR Adaptation to Target Accents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While previous approaches have focused on modeling techniques or creating accented speech datasets, gathering sufficient data for the multitude of accents, particularly in the African context, remains impractical due to their sheer diversity and associated budget constraints. To address these challenges, we propose AccentFold, a method that exploits spatial relationships between learned accent embeddings to improve downstream Automatic Speech Recognition (ASR). |
Abraham Toluwase Owodunni; Aditya Yadavalli; Chris Chinenye Emezue; Tobi Olatunji; Clinton C Mbataku; | arxiv-cs.CL | 2024-02-02 |
237 | Digits Micro-model for Accurate and Secure Transactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present our work on creating micro models for multi-digit number recognition that handle diverse speaking styles reflecting real-world pronunciation patterns. |
Chirag Chhablani; Nikhita Sharma; Jordan Hosier; Vijay K. Gurbani; | arxiv-cs.LG | 2024-02-02 |
238 | Exploring The Limits of Decoder-only Models Trained on Public Speech Recognition Corpora Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate factors such as choice of training datasets and modeling components necessary for obtaining the best performance using public English ASR corpora alone. |
Ankit Gupta; George Saon; Brian Kingsbury; | arxiv-cs.CL | 2024-01-31 |
239 | Improving ASR Performance with OCR Through Using Word Frequency Difference IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, there has been a growing interest in conversational artificial intelligence (AI). As a result, research is actively being conducted on automatic speech recognition (ASR) … |
Kyudan Jung; Seungmin Bae; N. Kim; Hyun Gon Ryu; Hyuk-Jae Lee; | 2024 International Conference on Electronics, Information, … | 2024-01-28 |
240 | Byte Pair Encoding Is All You Need For Automatic Bengali Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent research highlights the dependency of BPE subword tokenization’s efficacy on the morphological nature of the language, particularly in languages rich in inflectional morphology, where fewer BPE merges suffice for generating highly productive tokens. Motivated by this, our study empirically identifies the optimal number of BPE tokens for Bengali, a language known for its morphological complexity, thus enhancing out-of-distribution automatic speech recognition (ASR) performance. |
Ahnaf Mozib Samin; | arxiv-cs.CL | 2024-01-27 |
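For background, BPE is a greedy procedure: starting from characters, repeatedly merge the most frequent adjacent symbol pair in the corpus until a chosen number of merges is reached; that merge count is exactly the hyperparameter the study above tunes for Bengali. A minimal, language-agnostic sketch (the `bpe_merges` function and toy corpus are illustrative, not the paper's training pipeline):

```python
from collections import Counter

def bpe_merges(words: dict, num_merges: int) -> list:
    """Learn BPE merge rules from a {word: frequency} corpus.

    Each word starts as a tuple of characters; each step merges the most
    frequent adjacent symbol pair into a single new symbol.
    """
    vocab = {tuple(word): freq for word, freq in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is a single symbol; nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge throughout the vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges
```

Varying `num_merges` trades off token granularity: fewer merges keep tokens closer to characters, which the cited work finds suits a morphologically rich language like Bengali, while more merges yield longer, word-like units.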
241 | Toward Practical Automatic Speech Recognition and Post-Processing: A Call for Explainable Error Benchmark Guideline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose the development of an Error Explainable Benchmark (EEB) dataset. |
SEONMIN KOO et al. | arxiv-cs.CL | 2024-01-25 |
242 | SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. |
CHYI-JIUNN LIN et al. | arxiv-cs.CL | 2024-01-24 |
243 | MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The prevalent approach in speech emotion recognition (SER) involves integrating both audio and textual information to comprehensively identify the speaker’s emotion, with the text … |
Jiajun He; Xiaohan Shi; Xingfeng Li; Tomoki Toda; | arxiv-cs.CL | 2024-01-24 |
244 | Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. |
W. RONNY HUANG et al. | arxiv-cs.CL | 2024-01-23 |
245 | Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers. |
Michael Hentschel; Yuta Nishikawa; Tatsuya Komatsu; Yusuke Fujita; | arxiv-cs.CL | 2024-01-22 |
246 | Using Large Language Model for End-to-End Chinese ASR and NER Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach, however, has received less attention in the literature. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches using Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks. |
YUANG LI et al. | arxiv-cs.CL | 2024-01-20 |
247 | SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct SlideAVSR, an AVSR dataset using scientific paper explanation videos. |
Hao Wang; Shuhei Kurita; Shuichiro Shimizu; Daisuke Kawahara; | arxiv-cs.CV | 2024-01-18 |
248 | Joint Unsupervised and Supervised Training for Automatic Speech Recognition Via Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) that we term bi-level joint unsupervised and supervised training (BL-JUST). |
A F M SAIF et al. | arxiv-cs.CL | 2024-01-13 |
249 | LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to rare phrase lists, the slides within videos are synchronized in real-time with the speech, enabling the extraction of long contextual bias. Therefore, we propose a novel long-context biasing network (LCB-net) for audio-visual speech recognition (AVSR) to leverage the long-context information available in videos effectively. |
Fan Yu; Haoxu Wang; Xian Shi; Shiliang Zhang; | arxiv-cs.SD | 2024-01-12 |
250 | XLS-R Deep Learning Model for Multilingual ASR on Low- Resource Languages: Indonesian, Javanese, and Sundanese Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This research paper focuses on the development and evaluation of Automatic Speech Recognition (ASR) technology using the XLS-R 300m model. The study aims to improve ASR … |
Panji Arisaputra; Alif Tri Handoyo; Amalia Zahra; | ArXiv | 2024-01-12 |
251 | UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. |
JIAXIN GUO et al. | arxiv-cs.CL | 2024-01-11 |
252 | Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Objectives: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the “Cookie Theft” picture description task. |
Changye Li; Weizhe Xu; Trevor Cohen; Serguei Pakhomov; | arxiv-cs.CL | 2024-01-10 |
253 | High-precision Voice Search Query Correction Via Retrievable Speech-text Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, ASR-hypothesis-based retrieval can yield poor precision if the textual hypotheses are too phonetically dissimilar to the true transcript. In this paper, we eliminate the hypothesis-audio mismatch problem by querying the correction database directly with embeddings derived from the utterance audio; the embeddings of the utterance audio and of the candidate corrections are produced by multimodal speech-text embedding networks trained to place the embedding of an utterance's audio close to the embedding of its corresponding textual transcript. |
CHRISTOPHER LI et al. | arxiv-cs.CL | 2024-01-08 |
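At its core, the retrieval step described in the entry above is nearest-neighbor search in a shared embedding space: embed the utterance audio, compare it against pre-embedded candidate corrections, and accept the closest match if it is similar enough. A generic sketch under that framing (the function, threshold, and toy vectors are ours; the paper's actual embedding networks are not reproduced here):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def retrieve_correction(query_emb, candidates, threshold=0.8):
    """candidates: list of (correction_text, embedding) pairs.

    Returns the candidate correction whose embedding is most similar to the
    utterance-audio embedding, or None if nothing clears the threshold.
    """
    best_text, best_sim = None, -1.0
    for text, emb in candidates:
        sim = cosine(query_emb, emb)
        if sim > best_sim:
            best_text, best_sim = text, sim
    return best_text if best_sim >= threshold else None
```

In practice such a lookup would use an approximate nearest-neighbor index rather than a linear scan, and the threshold guards precision: a query that matches no correction well is left untouched.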
254 | An Audio-quality-based Multi-strategy Approach for Target Speaker Extraction in The MISP 2023 Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. |
RUNDUO HAN et al. | arxiv-cs.SD | 2024-01-08 |
255 | Cross-Speaker Encoding Network for Multi-Talker Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a Cross-Speaker Encoding (CSE) network to address the limitations of SIMO models by aggregating cross-speaker representations. |
JIAWEN KANG et al. | arxiv-cs.SD | 2024-01-08 |
256 | ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. |
HE WANG et al. | arxiv-cs.SD | 2024-01-07 |
257 | MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current studies mainly focus on fusing the well-learned modality features, like the output of modality-specific encoders, without considering the contextual relationship during the modality feature learning. In this study, we propose a multi-layer cross-attention fusion based AVSR (MLCA-AVSR) approach that promotes representation learning of each modality by fusing them at different levels of audio/visual encoders. |
He Wang; Pengcheng Guo; Pan Zhou; Lei Xie; | arxiv-cs.SD | 2024-01-07 |
258 | Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce a method that utilizes the ASR system’s lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities and enhance SLU outcomes. |
KEVIN EVERSON et al. | arxiv-cs.CL | 2024-01-05 |
259 | An Approach for Speech Enhancement in Low SNR Environments Using Granular Speaker Embedding Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The proliferation of speech technology applications has led to an unprecedented demand for effective speech enhancement techniques, particularly in low Signal-to-Noise Ratio (SNR) … |
Jayasree Saha; Rudrabha Mukhopadhyay; A. Agrawal; Surabhi Jain; C. V. Jawahar; | Proceedings of the 7th Joint International Conference on … | 2024-01-04 |
260 | Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models. To address this, we propose a perturbation-based method for assessing the susceptibility of an automatic speech recognition (ASR) model to hallucination at test time, which does not require access to the training dataset. |
Rita Frieske; Bertram E. Shi; | arxiv-cs.CL | 2024-01-03 |
261 | Arabic Speech Recognition: Advancement and Challenges Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition is a captivating process that revolutionizes human-computer interactions, allowing us to interact and control machines through spoken commands. The foundation … |
ASHIFUR RAHMAN et al. | IEEE Access | 2024-01-01 |
262 | Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We investigate state-of-the-art automatic speech recognition (ASR) systems and provide thorough investigations on training methods to adapt them to low-resourced electrolaryngeal … |
Lester Phillip Violeta; D. Ma; Wen-Chin Huang; T. Toda; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2024-01-01 |
263 | Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance in many datasets, spectrogram-based SE … |
Hao Shi; M. Mimura; Tatsuya Kawahara; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2024-01-01 |
264 | Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition Using Adversarial Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper presents an extensive comparative study of various data augmentation approaches to improve the robustness of pre-trained ASR model fine-tuning to dysarthric speech. |
HUIMENG WANG et al. | arxiv-cs.SD | 2023-12-31 |
265 | KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional evaluation metrics for ASR systems produce a singular aggregate score, which is insufficient for understanding specific system vulnerabilities. Therefore, we aim to address the limitations of the previous ASR evaluation methods by introducing the Korean Error Explainable Benchmark Dataset for ASR and Post-processing (KEBAP). |
SEONMIN KOO et al. | emnlp | 2023-12-22 |
266 | Accented Speech Recognition With Accent-specific Codebooks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. |
Darshan Prabhu; Preethi Jyothi; Sriram Ganapathy; Vinit Unni; | emnlp | 2023-12-22 |
267 | Back Transcription As A Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models. |
Marek Kubis; Paweł Skórzewski; Marcin Sowański; Tomasz Ziętkiewicz; | emnlp | 2023-12-22 |
268 | Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). |
SRIJITH RADHAKRISHNAN et al. | emnlp | 2023-12-22 |
269 | CS2W: A Chinese Spoken-to-Written Style Conversion Dataset with Multiple Conversion Types Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the availability of datasets for this is limited. To address this issue, we present CS2W, a Chinese Spoken-to-Written style conversion dataset comprising 7,237 spoken sentences extracted from transcribed conversational texts. |
Zishan Guo; Linhao Yu; Minghui Xu; Renren Jin; Deyi Xiong; | emnlp | 2023-12-22 |
270 | Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we attempt to resolve structurally ambiguous utterances into unambiguous texts in Indonesian using prosodic information. |
RUHIYAH WIDIAPUTRI et. al. | emnlp | 2023-12-22 |
271 | CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address this robustness problem in downstream MT models by forcing the MT encoder to bring the representations of a noisy input closer to its clean version in the semantic space. This is achieved by introducing a contrastive learning method that leverages adversarial examples in the form of ASR outputs paired with their corresponding human transcripts to optimize the network parameters. |
Sathish Indurthi; Shamil Chollampatt; Ravi Agrawal; Marco Turchi; | emnlp | 2023-12-22 |
272 | Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach that builds on a pre-trained ASR model and extends it with an adaptive upstream module that fuses audio and visual information. |
Christopher Simic; Tobias Bocklet; | arxiv-cs.SD | 2023-12-21 |
273 | Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning. |
ANIRUDH S. SUNDAR et. al. | arxiv-cs.LG | 2023-12-21 |
274 | KNN-CTC: Enhancing ASR Via Retrieval of CTC Pseudo Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The success of retrieval-augmented language models in various natural language processing (NLP) tasks has been constrained in automatic speech recognition (ASR) applications due to challenges in constructing fine-grained audio-text datastores. This paper presents kNN-CTC, a novel approach that overcomes these challenges by leveraging Connectionist Temporal Classification (CTC) pseudo labels to establish frame-level audio-text key-value pairs, circumventing the need for precise ground truth alignments. |
JIAMING ZHOU et. al. | arxiv-cs.SD | 2023-12-20 |
275 | SpokesBiz — An Open Corpus of Conversational Polish Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We outline the general structure and content of the corpus, showcasing selected applications in linguistic research, evaluation and improvement of automatic speech recognition (ASR) systems. |
PIOTR PĘZIK et. al. | arxiv-cs.CL | 2023-12-19 |
276 | AdaStreamLite Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to … |
YUHENG WEI et. al. | Proceedings of the ACM on Interactive, Mobile, Wearable and … | 2023-12-19 |
277 | SpokesBiz – An Open Corpus of Conversational Polish Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper announces the early release of SpokesBiz, a freely available corpus of conversational Polish developed within the CLARIN-BIZ project and comprising over 650 hours of … |
PIOTR PĘZIK et. al. | ArXiv | 2023-12-19 |
278 | Seq2seq for Automatic Paraphasia Detection in Aphasic Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel, sequence-to-sequence (seq2seq) model that is trained end-to-end (E2E) to perform both ASR and paraphasia detection tasks. |
MATTHEW PEREZ et. al. | arxiv-cs.SD | 2023-12-16 |
279 | Parameter-Efficient Cross-Language Transfer Learning for A Language-Modular Audiovisual Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In audiovisual speech recognition (AV-ASR), for many languages only little audiovisual data is available. Building upon an English model, in this work, we first apply and analyze … |
ZHENGYANG LI et. al. | 2023 IEEE Automatic Speech Recognition and Understanding … | 2023-12-16 |
280 | Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transfer learning from large multilingual pretrained models, like XLSR, has become the new paradigm for Automatic Speech Recognition (ASR). Considering their ever-increasing size, … |
GEOFFROY VANDERREYDT et. al. | 2023 IEEE Automatic Speech Recognition and Understanding … | 2023-12-16 |
281 | Conformer-Based Speech Recognition On Extreme Edge-Computing Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. |
MINGBIN XU et. al. | arxiv-cs.LG | 2023-12-16 |
282 | Leveraging The Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Indonesia is home to roughly 700 languages, which amounts to about ten percent of the global total, positioning it as the second-most linguistically diverse country after Papua … |
S. Sakti; Benita Angela Titalim; | 2023 IEEE Automatic Speech Recognition and Understanding … | 2023-12-16 |
283 | LiteVSR: Efficient Visual Speech Recognition By Learning from Speech Representations of Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model. |
HENDRIK LAUX et. al. | arxiv-cs.CV | 2023-12-15 |
284 | Automatic Channel Selection and Spatial Feature Integration for Multi-channel Speech Recognition Across Various Array Topologies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing various array topologies that have multiple recording devices and offering reliable recognition performance in real-world environments. Addressing this task, we introduce an ASR system that demonstrates exceptional performance across various array topologies. |
BINGSHEN MU et. al. | arxiv-cs.SD | 2023-12-15 |
285 | On The Compression of Shallow Non-causal ASR Models Using Knowledge Distillation and Tied-and-reduced Decoder for Low-latency On-device Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a shallow cascaded model by combining various model compression techniques such as knowledge distillation, shared decoder, and tied-and-reduced transducer network in order to reduce the model footprint. |
NAGARAJ ADIGA et. al. | arxiv-cs.SD | 2023-12-15 |
286 | Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most past studies have simplified the learning complexity of the model by splitting the code-switching task into multiple tasks dealing with a single language and then learning the domain-specific knowledge of each language separately. Therefore, in this paper, we attempt to introduce language identification information into the middle layer of the ASR model’s encoder. |
Tzu-Ting Yang; Hsin-Wei Wang; Berlin Chen; | arxiv-cs.CL | 2023-12-15 |
287 | Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently audio-visual speech recognition (AVSR), which better leverages video modality as additional information to extend automatic speech recognition (ASR), has shown promising … |
Fan Yu; Haoxu Wang; Ziyang Ma; Shiliang Zhang; | ICASSP 2024 – 2024 IEEE International Conference on … | 2023-12-14 |
288 | FastInject: Injecting Unpaired Text Data Into CTC-Based ASR Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the … |
Keqi Deng; Phil Woodland; | ICASSP 2024 – 2024 IEEE International Conference on … | 2023-12-14 |
289 | Extending Whisper with Prompt Tuning to Target-speaker ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work leverages prompt tuning, a parameter-efficient fine-tuning approach, to extend Whisper, a large-scale single-talker ASR model, to TS-ASR. |
Hao Ma; Zhiyuan Peng; Mingjie Shao; Jing Li; Ju Liu; | arxiv-cs.CL | 2023-12-13 |
290 | ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a time-domain recognition-oriented speech enhancement (ROSE) framework is proposed to improve speech intelligibility and also advance ASR accuracy based on convolutional encoder-decoder-based U-Net framework, which serves as a plug-and-play tool in ATC scenarios and does not require additional retraining of the ASR model. |
Xincheng Yu; Dongyue Guo; Jianwei Zhang; Yi Lin; | arxiv-cs.SD | 2023-12-10 |
291 | Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. |
Wonjun Lee; Gary Geunbae Lee; Yunsu Kim; | arxiv-cs.CL | 2023-12-06 |
292 | Taiwanese Hakka Across Taiwan Corpus and Formosa Speech Recognition Challenge 2023 – Hakka ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To revive the endangered Taiwanese Hakka language, the first large-scale Taiwanese Hakka speech corpus across Taiwan (HAT) was developed, representing modern Taiwanese Hakka … |
YUAN-FU LIAO et. al. | 2023 26th Conference of the Oriental COCOSDA International … | 2023-12-04 |
293 | End-to-End Speech-to-Text Translation: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, researchers have been exploring end-to-end (E2E) models for ST translation. |
Nivedita Sethiya; Chandresh Kumar Maurya; | arxiv-cs.CL | 2023-12-02 |
294 | FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach called FAT-HuBERT, which leverages distortion-invariant self-supervised learning (SSL) to enhance the robustness of ASR. |
Dongning Yang; Wei Wang; Yanmin Qian; | arxiv-cs.SD | 2023-11-29 |
295 | End-to-end Joint Punctuated and Normalized ASR with A Limited Amount of Punctuated Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two approaches to train an end-to-end joint punctuated and normalized ASR system using limited punctuated data. |
Can Cui; Imran Ahamad Sheikh; Mostafa Sadeghi; Emmanuel Vincent; | arxiv-cs.CL | 2023-11-29 |
296 | On The Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional NSER approaches have proven effective in mitigating the impact of artificial noise sources, such as white Gaussian noise, but are of limited use against non-stationary noises in real-world environments due to their complexity and uncertainty. To overcome this limitation, we introduce a new method for NSER by adopting the automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech. |
Xiaohan Shi; Jiajun He; Xingfeng Li; Tomoki Toda; | arxiv-cs.SD | 2023-11-13 |
297 | Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Decoupling and Interacting Multi-task Network (DIMNet) for joint speech and accent recognition, which is comprised of a connectionist temporal classification (CTC) branch, an AR branch, an ASR branch, and a bottom feature encoder. |
Qijie Shao; Pengcheng Guo; Jinghao Yan; Pengfei Hu; Lei Xie; | arxiv-cs.SD | 2023-11-12 |
298 | A Survey of Technologies for Automatic Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Zhaopeng Qian; K. Xiao; Chongchong Yu; | EURASIP Journal on Audio, Speech, and Music Processing | 2023-11-11 |
299 | Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to model speech tokens in an autoregressive way, similar to text. |
QIAN CHEN et. al. | arxiv-cs.CL | 2023-11-08 |
300 | Improved Child Text-to-Speech Synthesis Through Fastpitch-based Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach that leverages the Fastpitch text-to-speech (TTS) model for generating high-quality synthetic child speech. |
Rishabh Jain; Peter Corcoran; | arxiv-cs.SD | 2023-11-07 |
301 | Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. |
RABINDRA NATH NANDI et. al. | arxiv-cs.CL | 2023-11-06 |
302 | COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. |
JING PAN et. al. | arxiv-cs.CL | 2023-11-03 |
303 | Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models Via Language-Specific Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still underperforms on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we propose DistilWhisper, an approach able to bridge the performance gap in ASR for these languages while retaining the advantages of multitask and multilingual capabilities. |
Thomas Palmeira Ferraz; Marcely Zanon Boito; Caroline Brun; Vassilina Nikoulina; | arxiv-cs.CL | 2023-11-02 |
304 | Disordered Speech Recognition Considering Low Resources and Abnormal Articulation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yuqin Lin; Longbiao Wang; Jianwu Dang; Sheng Li; Chenchen Ding; | Speech Commun. | 2023-11-01 |
305 | Learning Adapters for Code-Switching Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multilingual code-switching speech recognition has been an emerging research direction in real-world applications since most speakers are bilingual or multilingual. A … |
Chun-Yi He; Jen-Tzung Chien; | 2023 Asia Pacific Signal and Information Processing … | 2023-10-31 |
306 | MUST: A Multilingual Student-Teacher Learning Approach for Low-resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, the aforementioned limitation is addressed by proposing a MUltilingual Student-Teacher (MUST) learning which exploits a posteriors mapping approach. |
Muhammad Umar Farooq; Rehan Ahmad; Thomas Hain; | arxiv-cs.CL | 2023-10-28 |
307 | MADGF: Multi-Agent Data Generation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic Speech Recognition (ASR) systems predominantly cater to monolingual inputs and struggle with the complexity introduced by mixed language audio. In this paper, we present a novel Multi-Agent Data Generation Framework (MADGF) to address this challenge. |
Peng Xie; Kani Chen; | arxiv-cs.SD | 2023-10-27 |
308 | Uncovering Bias in ASR Systems: Evaluating Wav2vec2 and Whisper for Dutch Speakers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: It is crucial that ASR systems can handle the wide range of variations in speech of speakers from different demographic groups, with different speaking styles, and of speakers … |
Márcio Fuckner; Sophie Horsman; Pascal Wiggers; Iskaj Janssen; | 2023 International Conference on Speech Technology and … | 2023-10-25 |
309 | ArTST: Arabic Text and Speech Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. |
Hawau Olamide Toyin; Amirbek Djanibekov; Ajinkya Kulkarni; Hanan Aldarmaki; | arxiv-cs.CL | 2023-10-25 |
310 | Back Transcription As A Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models. |
Marek Kubis; Paweł Skórzewski; Marcin Sowański; Tomasz Ziętkiewicz; | arxiv-cs.CL | 2023-10-25 |
311 | CDSD: Chinese Dysarthria Speech Database Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. |
MENGYI SUN et. al. | arxiv-cs.SD | 2023-10-24 |
312 | How Much Context Does My Attention-Based ASR System Need? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct an empirical study on the effect of scaling the sequence length used to train/evaluate (dense-attention-based) acoustic models on speech recognition performance. |
Robert Flynn; Anton Ragni; | arxiv-cs.CL | 2023-10-24 |
313 | Hypotheses Paradise: An Open and Strong Baseline for Speech Recognition with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues, thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. |
CHEN CHEN et. al. | nips | 2023-10-24 |
314 | Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. |
SARA PAPI et. al. | arxiv-cs.CL | 2023-10-23 |
315 | Intuitive Multilingual Audio-Visual Speech Recognition with A Single-Trained Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach to multilingual audio-visual speech recognition tasks by introducing a single model on a multilingual dataset. |
Joanna Hong; Se Jin Park; Yong Man Ro; | arxiv-cs.MM | 2023-10-23 |
316 | Conversational Speech Recognition By Learning Audio-textual Cross-modal Contextual Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the Conformer encoder-decoder model with cross-modal conversational representation. |
KUN WEI et. al. | arxiv-cs.SD | 2023-10-22 |
317 | BUT CHiME-7 System Description Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the joint effort of Brno University of Technology (BUT), AGH University of Krakow and University of Buenos Aires on the development of Automatic Speech Recognition systems for the CHiME-7 Challenge. |
MARTIN KARAFIÁT et. al. | arxiv-cs.SD | 2023-10-18 |
318 | VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the linguistic diversity and variations, it is challenging to build a robust and generalized ASR system for Arabic. In this work, we address this gap by developing and demoing a system, dubbed VoxArabica, for dialect identification (DID) as well as automatic speech recognition (ASR) of Arabic. |
Abdul Waheed; Bashar Talafha; Peter Sullivan; AbdelRahim Elmadany; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2023-10-17 |
319 | Generative Error Correction for Code-switching Speech Recognition Using Large Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite the recent advances in automatic speech recognition (ASR), … |
CHEN CHEN et. al. | ArXiv | 2023-10-17 |
320 | Correction Focused Language Model Training for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel correction focused LM training approach which aims to prioritize ASR fallible words. |
Yingyi Ma; Zhe Liu; Ozlem Kalinli; | arxiv-cs.CL | 2023-10-17 |
321 | Multi-stage Large Language Model Correction for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the usage of large language models (LLMs) to improve the performance of competitive speech recognition systems. |
Jie Pu; Thai-Son Nguyen; Sebastian Stüker; | arxiv-cs.CL | 2023-10-17 |
322 | End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame crosschannel attention and a speaker-attributed Transformer-based decoder. |
Can Cui; Imran Ahamad Sheikh; Mostafa Sadeghi; Emmanuel Vincent; | arxiv-cs.CL | 2023-10-16 |
323 | Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. |
ZHIHONG LEI et. al. | arxiv-cs.LG | 2023-10-15 |
324 | Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing leveraging the power of deep learning models in accurately delivering spot-on transcriptions across a wide variety of vocabularies and speaking styles. |
Ankitha Sudarshan; Vinay Samuel; Parth Patwa; Ibtihel Amara; Aman Chadha; | arxiv-cs.CL | 2023-10-14 |
325 | SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2023-10-13 |
326 | On The Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It has been shown that TTS-generated outputs still do not have the same qualities as real data. In this work we focus on the temporal structure of synthetic data and its relation to ASR training. |
Nick Rossenbach; Benedikt Hilmes; Ralf Schlüter; | arxiv-cs.CL | 2023-10-12 |
327 | Adapting The Adapters for Code-switching in Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. |
Atharva Kulkarni; Ajinkya Kulkarni; Miguel Couceiro; Hanan Aldarmaki; | arxiv-cs.CL | 2023-10-11 |
328 | Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). |
SRIJITH RADHAKRISHNAN et. al. | arxiv-cs.CL | 2023-10-10 |
329 | A Study of Speech Recognition, Speech Translation, and Speech Summarization of TED English Lectures Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Our research focuses on developing an automatic speech recognition system for English lectures, which involves summarizing the content and providing Japanese subtitles. Subtitling … |
Kazumasa Yamamoto; Haruhiko Banno; Haruki Sakurai; Toichiro Adachi; Seiichi Nakagawa; | 2023 IEEE 12th Global Conference on Consumer Electronics … | 2023-10-10 |
330 | No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While in the context of hybrid ASR models several solutions have been proposed, the gender bias issue has not been explicitly addressed in end-to-end neural architectures. To fill this gap, we propose a data augmentation technique that manipulates the fundamental frequency (f0) and formants. |
Dennis Fucci; Marco Gaido; Matteo Negri; Mauro Cettolo; Luisa Bentivogli; | arxiv-cs.CL | 2023-10-10 |
331 | Acoustic Model Fusion for End-to-end Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. |
ZHIHONG LEI et. al. | arxiv-cs.SD | 2023-10-10 |
332 | Ed-cec: Improving Rare Word Recognition Using Asr Postprocessing Based on Error Detection and Context-aware Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text summarization. To address this challenge, we present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction. |
Jiajun He; Zekun Yang; Tomoki Toda; | arxiv-cs.AI | 2023-10-08 |
333 | Improving End-to-End Speech Processing By Efficient Text Data Utilization with Latent Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. |
JIANQIAO LU et. al. | arxiv-cs.CL | 2023-10-08 |
334 | LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LauraGPT, a novel unified audio-and-text GPT-based LLM for audio recognition, understanding, and generation. |
ZHIHAO DU et. al. | arxiv-cs.SD | 2023-10-06 |
335 | Dementia Assessment Using Mandarin Speech with An Attention-based Speech Recognition Encoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper utilizes a speech recognition model to construct a dementia assessment system tailored for Mandarin speakers during the picture description task. |
ZIH-JYUN LIN et. al. | arxiv-cs.CL | 2023-10-05 |
336 | EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose EFFUSE, a novel approach that uses a single SSL model to mimic the features of multiple SSL models via prediction, resulting in a lightweight framework with competitive performance. |
Tejes Srivastava; Jiatong Shi; William Chen; Shinji Watanabe; | arxiv-cs.SD | 2023-10-05 |
337 | LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-end ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models. |
ALEKSANDR MEISTER et. al. | arxiv-cs.CL | 2023-10-04 |
338 | Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. |
Liming Wang; Mark Hasegawa-Johnson; Chang D. Yoo; | arxiv-cs.CL | 2023-10-03 |
339 | Evaluating Speech Synthesis By Training Recognizers on Synthetic Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Prior works focus on evaluating synthetic speech based on pre-trained speech recognition models, however, this can be limiting since this approach primarily measures speech intelligibility. In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech. |
DAREEN ALHARTHI et. al. | arxiv-cs.CL | 2023-10-01 |
340 | AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several publications have highlighted racial bias in speech-to-text algorithms, and performance on minority accents lags significantly. |
TOBI OLATUNJI et. al. | arxiv-cs.CL | 2023-09-30 |
341 | AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method to train an audio-visual speech recognition (AVSR) model on a combination of labeled and unlabeled videos with continuously regenerated pseudo-labels. |
Andrew Rouditchenko; Ronan Collobert; Tatiana Likhomanenko; | arxiv-cs.LG | 2023-09-29 |
342 | Federated Learning with Differential Privacy for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. |
MARTIN PELIKAN et. al. | arxiv-cs.LG | 2023-09-29 |
343 | SLM: Bridge The Thin Gap Between Speech and Text Foundation Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. |
MINGQIU WANG et. al. | arxiv-cs.CL | 2023-09-29 |
344 | The Gift of Feedback: Improving ASR Model Quality By Learning from User Corrections Through Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continually learn from on-device user corrections through Federated Learning (FL) to address this issue. |
LILLIAN ZHOU et. al. | arxiv-cs.CL | 2023-09-29 |
345 | LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this information may be helpful for ASR modeling. To alleviate this issue, we propose the LAE-ST-MoE framework. |
GUODONG MA et. al. | arxiv-cs.SD | 2023-09-28 |
346 | Speech Collage: Code-switched Audio Generation By Collaging Monolingual Corpora Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. |
AMIR HUSSEIN et. al. | arxiv-cs.SD | 2023-09-27 |
347 | Lip2Vec: Efficient and Robust Visual Speech Recognition Via Latent-to-Latent Visual to Audio Representation Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike previous works that involve auxiliary losses or complex training procedures and architectures, we propose a simple approach, named Lip2Vec, that is based on learning a prior model. |
Yasser Abdelaziz Dahou Djilali; Sanath Narayan; Haithem Boussaid; Ebtessam Almazrouei; Merouane Debbah; | iccv | 2023-09-27 |
348 | HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. |
CHEN CHEN et. al. | arxiv-cs.CL | 2023-09-27 |
349 | Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in The HYKIST Project Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In today’s interconnected world, moving abroad is increasingly prevalent, whether for employment, refugee resettlement, or other causes. Language difficulties between …
Khai Le-Duc; | ArXiv | 2023-09-26 |
350 | Updated Corpora and Benchmarks for Long-Form Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we re-release three standard ASR corpora – TED-LIUM 3, GigaSpeech, and VoxPopuli-en – with updated transcription and alignments to enable their use for long-form ASR research. |
JENNIFER DREXLER FOX et. al. | arxiv-cs.CL | 2023-09-26 |
351 | Speech Dereverberation With Frequency Domain Autoregressive Modeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation. The task of dereverberation constitutes an important step to … |
Anurenjan Purushothaman; Debottam Dutta; Rohit Kumar; Sriram Ganapathy; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-09-24 |
352 | AudioFool: Fast, Universal and Synchronization-free Cross-Domain Attack on Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent research has focused on exploring methods to create such attacks; however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the properties required of robust attacks compatible with the OTA model, and we design a method of generating attacks with arbitrary such desired properties, namely invariance to synchronization and robustness to filtering: this allows a Denial-of-Service (DoS) attack against ASR systems. |
Mohamad Fakih; Rouwaida Kanj; Fadi Kurdahi; Mohammed E. Fouda; | arxiv-cs.CR | 2023-09-20 |
353 | A Survey of Automatic Speech Recognition Deep Models Performance for Polish Medical Terms Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for … |
MARTA ZIELONKA et. al. | 2023 Signal Processing: Algorithms, Architectures, … | 2023-09-20 |
354 | Directional Source Separation for Robust Speech Recognition on Smart Glasses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve voice quality, this work investigates directional source separation using the multi-microphone array. |
TIANTIAN FENG et. al. | arxiv-cs.SD | 2023-09-19 |
355 | HypR: A Comprehensive Study for ASR Hypothesis Revising with A Reference Corpus Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we first concentrate on providing an ASR hypothesis revising (HypR) dataset in this study. |
Yi-Wei Wang; Ke-Han Lu; Kuan-Yu Chen; | arxiv-cs.CL | 2023-09-18 |
356 | Instruction-Following Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the mechanisms behind these models’ speech understanding and reasoning capabilities remain underexplored. To study this question from the data perspective, we introduce instruction-following speech recognition, training a Listen-Attend-Spell model to understand and execute a diverse set of free-form text instructions. |
Cheng-I Jeff Lai; Zhiyun Lu; Liangliang Cao; Ruoming Pang; | arxiv-cs.CL | 2023-09-18 |
357 | Are Soft Prompts Good Zero-shot Learners for Speech Recognition? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, not many people understand how and why this is so. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). |
DIANWEN NG et. al. | arxiv-cs.SD | 2023-09-17 |
358 | Open Vocabulary Keyword Spotting with Small-Footprint ASR-based Architecture and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the results of experiments on minimizing the model size for the text-based Open Vocabulary Keyword Spotting task. The main goal is to perform inference on devices with … |
Mikołaj Pudo; Mateusz Wosik; Artur Janicki; | 2023 18th Conference on Computer Science and Intelligence … | 2023-09-17 |
359 | Augmenting Conformers with Structured State-space Sequence Models for Online Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. |
HAOZHE SHAN et. al. | arxiv-cs.CL | 2023-09-15 |
360 | Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage. To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. |
YANG LI et. al. | arxiv-cs.LG | 2023-09-14 |
361 | Echotune: A Modular Extractor Leveraging The Variable-Length Nature of Speech in ASR Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Historically, many approaches have leaned on fixed-length attention windows, which becomes problematic for varied speech samples in duration and complexity, leading to data over-smoothing and neglect of essential long-term connectivity. Addressing this limitation, we introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism that accommodates a range of speech sample complexities and durations. |
Sizhou Chen; Songyang Gao; Sen Fang; | arxiv-cs.SD | 2023-09-14 |
362 | CPPF: A Contextual and Post-processing-free Model for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on ASR-related processing tasks, including Contextual ASR and multiple ASR post processing tasks. |
LEI ZHANG et. al. | arxiv-cs.CL | 2023-09-13 |
363 | SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multi-Modal automatic speech recognition (ASR) techniques aim to leverage additional modalities to improve the performance of speech recognition systems. While existing approaches … |
HAOXU WANG et. al. | ICASSP 2024 – 2024 IEEE International Conference on … | 2023-09-11 |
364 | SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the pipeline for constructing the corpus and propose baseline methods for utilizing text information in the visual slide context. |
HAOXU WANG et. al. | arxiv-cs.SD | 2023-09-11 |
365 | Leveraging Large Language Models for Exploiting ASR Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis. |
PRANAY DIGHE et. al. | arxiv-cs.CL | 2023-09-09 |
366 | Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the effectiveness of this method has not been demonstrated for various model architectures, nor has it been verified that the encoder has the expected look-ahead capability to reduce latency. This study, therefore, examines the effectiveness of Mask-CTC-based pre-training for models with different architectures, such as Transformer-Transducer and contextual block streaming ASR. |
Huaibo Zhao; Yosuke Higuchi; Yusuke Kida; Tetsuji Ogawa; Tetsunori Kobayashi; | arxiv-cs.SD | 2023-09-08 |
367 | LanSER: Language-Model Supported Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. |
TAESIK GONG et. al. | arxiv-cs.CL | 2023-09-07 |
368 | Bring The Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to extract the denoising capabilities, that can be applied to any encoder-decoder architecture. |
Patrick Eickhoff; Matthias Möller; Theresa Pekarek Rosin; Johannes Twiefel; Stefan Wermter; | arxiv-cs.CL | 2023-09-05 |
369 | SememeASR: Boosting Performance of End-to-End Speech Recognition Against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that knowledge-driven approaches can help data-driven approaches alleviate their flaws, we introduce sememe-based semantic knowledge information to speech recognition (SememeASR). |
Jiaxu Zhu; Changhe Song; Zhiyong Wu; Helen Meng; | arxiv-cs.SD | 2023-09-04 |
370 | Text-Only Domain Adaptation for End-to-End Speech Recognition Through Down-Sampling Acoustic Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel representation-matching strategy that down-samples acoustic representations to align with the text modality. |
JIAXU ZHU et. al. | arxiv-cs.SD | 2023-09-04 |
371 | Boosting Low-Resource Speech Recognition in Air Traffic Communication Via Pretrained Feature Aggregation and Multi-Task Learning IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Developing a robust Automatic Speech Recognition (ASR) system usually requires a large amount of well-annotated samples, which is extremely hard to build in the Air Traffic Control …
Dongyue Guo; Zichen Zhang; Bo Yang; Jianwei Zhang; Yi Lin; | IEEE Transactions on Circuits and Systems II: Express Briefs | 2023-09-01 |
372 | ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenge, we propose ASTER, a technique for automatically testing the accessibility of ASR systems. |
YI LIU et. al. | arxiv-cs.SD | 2023-08-29 |
373 | Speech Wikimedia: A 77 Language Multilingual Speech Dataset Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA … |
RAFAEL MOSQUERA GÓMEZ et. al. | arxiv-cs.AI | 2023-08-29 |
374 | NAaLoss: Rethinking The Objective of Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, in this study, we suggest a Noise- and Artifacts-aware loss function, NAaLoss, to ameliorate the influence of artifacts from a novel perspective. |
Kuan-Hsun Ho; En-Lun Yu; Jeih-weih Hung; Berlin Chen; | arxiv-cs.SD | 2023-08-24 |
375 | Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a cross-modal global interaction and local alignment (GILA) approach for AVSR, which captures the deep audio-visual (A-V) correlations from both global and local perspectives. |
YUCHEN HU et. al. | ijcai | 2023-08-23 |
376 | Convoifilter: A Case Study of Doing Cocktail Party Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an end-to-end model designed to improve automatic speech recognition (ASR) for a particular speaker in a crowded, noisy environment. |
Thai-Binh Nguyen; Alexander Waibel; | arxiv-cs.SD | 2023-08-22 |
377 | SeamlessM4T: Massively Multilingual & Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. |
SEAMLESS COMMUNICATION et. al. | arxiv-cs.CL | 2023-08-22 |
378 | On Training A Neural Residual Acoustic Echo Suppressor for Improved ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Acoustic Echo Cancellation (AEC) is critical for accurate recognition of speech directed at a smart device playing audio. Previous work has shown that neural AEC models can …
S. Panchapagesan; T. Shabestary; A. Narayanan; | Interspeech | 2023-08-20 |
379 | A Conformer-based Classifier for Variable-length Utterance Processing in Anti-spoofing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The success achieved by conformers in Automatic Speech Recognition (ASR) leads us to their application in other domains, such as spoofing detection for automatic speaker … |
Eros Rosello; Alejandro Gomez-Alanis; A. Gómez; A. Peinado; | Interspeech | 2023-08-20 |
380 | Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We release 840 hours of read speech multi-dialect ASR corpora consisting of 700 hours of the main Thai dialect, named Thai-central, and 40 hours for each local dialect, named …
Artit Suwanbandit; Burin Naowarat; Orathai Sangpetch; E. Chuangsuwanich; | Interspeech | 2023-08-20 |
381 | Two-stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper addresses effective pretraining of automatic speech recognition (ASR) and gender recognition to improve wav2vec 2.0 embedding for speech emotion recognition (SER). … |
Yuan Gao; Chenhui Chu; Tatsuya Kawahara; | Interspeech | 2023-08-20 |
382 | Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper proposes autoregressive modeling of the joint multi-talker automatic speech recognition (ASR) and timestamp prediction. Autoregressive modeling of multi-talker ASR is a … |
Naoki Makishima; Keita Suzuki; Satoshi Suzuki; Atsushi Ando; Ryo Masumura; | Interspeech | 2023-08-20 |
383 | Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The use of self-supervised pre-trained speech models has greatly improved speech tasks in low-resource settings. However, fine-tuning the entire model can be computationally … |
DIANWEN NG et. al. | Interspeech | 2023-08-20 |
384 | Embedding Articulatory Constraints for Low-resource Speech Recognition Based on Large Pre-trained Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Knowledge about phonemes and their articulatory attributes can help improve automatic speech recognition (ASR) of low-resource languages. In this study, we propose a simple and … |
Jaeyoung Lee; M. Mimura; Tatsuya Kawahara; | Interspeech | 2023-08-20 |
385 | Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of … |
Enno Hermann; Mathew Magimai; | Interspeech | 2023-08-20 |
386 | Dialect Speech Recognition Modeling Using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In order to utilize the large amount of historical speech resources for applications such as linguistic analysis and retrieval, automatic speech recognition technology that can … |
Shogo Miwa; A. Kai; | Interspeech | 2023-08-20 |
387 | MiniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on The Edge Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Real-time applications of Automatic Speech Recognition (ASR) on user devices on the edge require streaming processing. Conformer model has achieved state-of-the-art performance in … |
Haris Gulzar; Monikka Roslianna Busto; Takeharu Eda; Katsutoshi Itoyama; K. Nakadai; | Interspeech | 2023-08-20 |
388 | Whisper Features for Dysarthric Severity-Level Classification Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Dysarthria is a speech disorder caused by improper coordination between the brain and the muscles that produce intelligible speech. Accurately diagnosing the severity of … |
Siddharth Rathod; Monil Charola; Akshat Vora; Yash Jogi; H. Patil; | Interspeech | 2023-08-20 |
389 | TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TokenSplit, a speech separation model that acts on discrete token sequences. |
HAKAN ERDOGAN et. al. | arxiv-cs.SD | 2023-08-20 |
390 | Unsupervised Code-switched Text Generation from Parallel Text Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: There has been great interest in developing automatic speech recognition (ASR) systems that can handle code-switched (CS) speech to meet the needs of a growing bilingual … |
JI-EUN CHI et. al. | Interspeech | 2023-08-20 |
391 | Data Augmentation for Children ASR and Child-adult Speaker Classification Using Voice Conversion Methods Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Many young children prefer speech-based interfaces over text, as they are relatively slow and error-prone with text input. However, children’s ASR can be challenging due to the lack …
Shuyang Zhao; Mittul Singh; Abraham Woubie; Reima Karhila; | Interspeech | 2023-08-20 |
392 | Exploring Sources of Racial Bias in Automatic Speech Recognition Through The Lens of Rhythmic Variation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although studies have shown that one issue of bias in modern automatic speech recognition (ASR) technologies is degraded performance for African American English (AAE) speakers, … |
Li-Fang Lai; N. Holliday; | Interspeech | 2023-08-20 |
393 | Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing Based Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View |
ZHENG LIANG et. al. | Interspeech | 2023-08-20 |
394 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, this work proposes Bayes Risk Transducer (BRT), which uses a Bayes risk function to set lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. |
JINCHUAN TIAN et. al. | arxiv-cs.CL | 2023-08-19 |
395 | Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A standard pipeline for automated spoken language assessment is to start with an automatic speech recognition (ASR) system and derive features that exploit transcriptions and … |
Stefano Bannò; K. Knill; M. Matassoni; Vyas Raina; M. Gales; | Slate | 2023-08-18 |
396 | An Ambient Intelligence-based Approach For Longitudinal Monitoring of Verbal and Vocal Depression Symptoms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Another major challenge in depression relapse research is the scarcity of publicly available datasets. To overcome these issues, we propose a one-shot learning framework for detecting depression relapse from speech. |
Alice Othmani; Muhammad Muzammel; | arxiv-cs.HC | 2023-08-16 |
397 | Accurate Synthesis of Dysarthric Speech for ASR Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. |
Mohammad Soleymanpour; Michael T. Johnson; Rahim Soleymanpour; Jeffrey Berry; | arxiv-cs.SD | 2023-08-16 |
398 | A Comprehensive Survey on Automatic Speech Recognition Using Neural Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Amandeep Singh Dhanjal; Williamjeet Singh; | Multim. Tools Appl. | 2023-08-15 |
399 | Radio2Text: Streaming Speech Recognition Using MmWave Radio Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Radio2Text, the first mmWave-based system for streaming automatic speech recognition (ASR) with a vocabulary size exceeding 13,000 words. |
Running Zhao; Jiangtao Yu; Hang Zhao; Edith C. H. Ngai; | arxiv-cs.SD | 2023-08-15 |
400 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use text injection to improve the recognition of PII categories by including fake textual substitutes of PII categories in the training data. |
YOCHAI BLAU et. al. | arxiv-cs.CL | 2023-08-14 |
401 | Text Injection for Capitalization and Turn-Taking Prediction in Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. |
SHAAN BIJWADIA et. al. | arxiv-cs.CL | 2023-08-14 |
402 | A Novel Self-training Approach for Low-resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a self-training approach for automatic speech recognition (ASR) for low-resource settings. |
Satwinder Singh; Feng Hou; Ruili Wang; | arxiv-cs.CL | 2023-08-09 |
403 | Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). |
Yang Zhang; Krishna C. Puvvada; Vitaly Lavrukhin; Boris Ginsburg; | arxiv-cs.SD | 2023-08-09 |
404 | Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text generated by ASR output. |
JIAXIN FAN et. al. | arxiv-cs.CL | 2023-08-07 |
405 | Federated Representation Learning for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. |
GURUPRASAD V RAMESH et. al. | arxiv-cs.SD | 2023-08-03 |
406 | Inaudible Adversarial Perturbation: Manipulating The Recognition of User Speech in Real Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we seek to bridge the gap in existing research and extend the attack to user-present scenarios. |
XINFENG LI et. al. | arxiv-cs.CR | 2023-08-02 |
407 | ÌròyìnSpeech: A Multi-purpose Yorùbá Speech Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ÌròyìnSpeech, a new corpus influenced by the desire to increase the amount of high-quality, contemporary Yorùbá speech data, which can be used for both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) tasks. |
Tolulope Ogunremi; Kola Tubosun; Anuoluwapo Aremu; Iroro Orife; David Ifeoluwa Adelani; | arxiv-cs.CL | 2023-07-29 |
408 | The Timing Bottleneck: Why Timing and Overlap Are Mission-critical for Conversational User Interfaces, Speech Recognition and Dialogue Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). |
Andreas Liesenfeld; Alianda Lopez; Mark Dingemanse; | arxiv-cs.CL | 2023-07-28 |
409 | Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. |
Christophe Van Gysel; | sigir | 2023-07-25 |
410 | Adaptation of Whisper Models to Child Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly … |
Rishabh Jain; Andrei Barcovschi; Mariam Yiwere; Peter Corcoran; H. Cucu; | ArXiv | 2023-07-24 |
411 | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While punctuated texts are abundant from written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. |
VIET DAC LAI et. al. | arxiv-cs.CL | 2023-07-24 |
412 | Code-Switched Urdu ASR for Noisy Telephonic Environment Using Data Centric Approach with Hybrid HMM and CNN-TDNN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, this paper describes an implementation framework of a resource-efficient Automatic Speech Recognition / Speech-to-Text system in a noisy call-center environment using Chain Hybrid HMM and CNN-TDNN for Code-Switched Urdu Language. |
Muhammad Danyal Khan; Raheem Ali; Arshad Aziz; | arxiv-cs.CL | 2023-07-24 |
413 | Exploring The Integration of Speech Separation and Recognition with Self-Supervised Learning Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further improve multi-speaker recognition performance, we present a carefully designed training strategy for integrating speech separation and recognition with SSLR. |
YOSHIKI MASUYAMA et. al. | arxiv-cs.SD | 2023-07-23 |
414 | A Meta Learning Scheme for Fast Accent Domain Expansion in Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition, which expands the field of accents without deteriorating the performance of Mandarin ASR. |
Ziwei Zhu; Changhao Shan; Bihong Zhang; Jian Yu; | arxiv-cs.SD | 2023-07-23 |
415 | Robust Automatic Speech Recognition Via WavAugment Guided Phoneme Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developing a practically-robust automatic speech recognition (ASR) is challenging since the model should not only maintain the original performance on clean samples, but also achieve consistent efficacy under small volume perturbations and large domain shifts. To address this problem, we propose a novel WavAugment Guided Phoneme Adversarial Training (wapat). |
GEGE QI et. al. | arxiv-cs.SD | 2023-07-23 |
416 | Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome this issue, we propose a novel E2E SLU system that enhances robustness to ASR errors by fusing audio and text representations based on the estimated modality confidence of ASR hypotheses. We introduce two novel techniques: 1) an effective method to encode the quality of ASR hypotheses and 2) an effective approach to integrate them into E2E SLU models. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2023-07-22 |
417 | A Change of Heart: Improving Speech Emotion Recognition Through Speech-to-Text Modality Conversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. |
Zeinab Sadat Taghavi; Ali Satvaty; Hossein Sameti; | arxiv-cs.SD | 2023-07-21 |
418 | A Deep Dive Into The Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of ~9.8K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. |
Anand Kumar Rai; Siddharth D Jaiswal; Animesh Mukherjee; | arxiv-cs.CL | 2023-07-20 |
419 | Ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: We introduce ivrit.ai, a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) … |
Yanir Marmor; Kinneret Misgav; Y. Lifshitz; | ArXiv | 2023-07-17 |
420 | Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. |
Theresa Pekarek Rosin; Stefan Wermter; | arxiv-cs.CL | 2023-07-14 |
421 | SGGNet²: Speech-Scene Graph Grounding Network for Speech-guided Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel speech-scene graph grounding network (SGGNet²) that robustly grounds spoken utterances by leveraging the acoustic similarity between correctly recognized and misrecognized words obtained from automatic speech recognition (ASR) systems. |
DOHYUN KIM et. al. | arxiv-cs.RO | 2023-07-14 |
422 | SGGNet2: Speech-Scene Graph Grounding Network for Speech-guided Navigation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The spoken language serves as an accessible and efficient interface, enabling non-experts and disabled users to interact with complex assistant robots. However, accurately … |
DOHYUN KIM et. al. | 2023 32nd IEEE International Conference on Robot and Human … | 2023-07-14 |
423 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare our model with encoders pretrained on self-supervised learning (SSL), and show that ASR pretraining is much more effective than SSL for SICSF. |
He Huang; Jagadeesh Balam; Boris Ginsburg; | arxiv-cs.CL | 2023-07-13 |
424 | Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. |
Wenxuan Wang; Guodong Ma; Yuke Li; Binbin Du; | arxiv-cs.SD | 2023-07-12 |
425 | Exploring The Integration of Large Language Models Into Automatic Speech Recognition Systems: An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. |
Zeping Min; Jinbo Wang; | arxiv-cs.CL | 2023-07-12 |
426 | SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes a novel linear-time alternative to self-attention. |
Titouan Parcollet; Rogier van Dalen; Shucong Zhang; Sourav Bhattacharya; | arxiv-cs.CL | 2023-07-12 |
427 | The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. |
KUN SONG et. al. | arxiv-cs.SD | 2023-07-10 |
428 | Introducing Semantics Into Speech Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. |
DEREK XU et. al. | acl | 2023-07-08 |
429 | Building Accurate Low Latency ASR for Streaming Voice Search in E-commerce Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build accurate LSTM, attention and CTC based streaming ASR models for large-scale Hinglish (blend of Hindi and English) Voice Search. |
Abhinav Goyal; Nikesh Garera; | acl | 2023-07-08 |
430 | DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Choosing an informative subset of speech samples that are most representative of the target accents becomes important for effective ASR finetuning. To address this problem, we propose DITTO (Data-efficient and faIr Targeted subseT selectiOn), which uses Submodular Mutual Information (SMI) functions as acquisition functions to find the most informative set of utterances matching a target accent within a fixed budget. |
SURAJ KOTHAWADE et. al. | acl | 2023-07-08 |
431 | Hybrid Transducer and Attention Based Encoder-Decoder Modeling for Speech-to-Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. |
YUN TANG et. al. | acl | 2023-07-08 |
432 | BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. |
MINGDA CHEN et. al. | acl | 2023-07-08 |
433 | STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present STT4SG-350, a corpus of Swiss German speech, annotated with Standard German text at the sentence level. |
MICHEL PLÜSS et. al. | acl | 2023-07-08 |
434 | A Theory of Unsupervised Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general theoretical framework to study the properties of unsupervised ASR (ASR-U) systems based on random matrix theory and the theory of neural tangent kernels. |
Liming Wang; Mark Hasegawa-Johnson; Chang Yoo; | acl | 2023-07-08 |
435 | Back Translation for Speech-to-text Translation Without Transcripts IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to utilize large amounts of target-side monolingual data to enhance ST without transcripts. |
Qingkai Fang; Yang Feng; | acl | 2023-07-08 |
436 | Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate whether data augmentation techniques could help improve low-resource ASR performance, focusing on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal). |
Martijn Bartelds; Nay San; Bradley McDonnell; Dan Jurafsky; Martijn Wieling; | acl | 2023-07-08 |
437 | Why Aren't We NER Yet? Artifacts of ASR Errors in Named Entity Recognition in Spontaneous Speech Transcripts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine in detail the complex relationship between ASR and NER errors which limit the ability of NER models to recover entity mentions from spontaneous speech transcripts. |
PIOTR SZYMANSKI et. al. | acl | 2023-07-08 |
438 | Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To produce ASR and ST content effectively with minimal latency, we propose a joint token-level serialized output training method that interleaves source and target words by leveraging an off-the-shelf textual aligner. |
SARA PAPI et. al. | arxiv-cs.CL | 2023-07-06 |
439 | Transcribing Educational Videos Using Whisper: A Preliminary Study on Using AI for Transcribing Educational Videos Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Videos are increasingly being used for e-learning, and transcripts are vital to enhance the learning experience. The costs and delays of generating transcripts can be alleviated … |
Ashwin Rao; | ArXiv | 2023-07-04 |
440 | Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. |
Guangzhi Sun; Chao Zhang; Ivan Vulić; Paweł Budzianowski; Philip C. Woodland; | arxiv-cs.CL | 2023-07-04 |
441 | Boosting Norwegian Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk. |
Javier de la Rosa; Rolv-Arild Braaten; Per Egil Kummervold; Freddy Wetjen; Svein Arne Brygfjeld; | arxiv-cs.CL | 2023-07-04 |
442 | Using Data Augmentations and VTLN to Reduce Bias in Dutch End-to-End Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to reduce bias against different age groups and non-native speakers of Dutch. |
Tanvina Patel; Odette Scharenborg; | arxiv-cs.CL | 2023-07-04 |
443 | Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a supervision loss for smoother training of the Contextual Adapters. |
Devang Kulshreshtha; Saket Dingliwal; Brady Houston; Sravan Bodapati; | arxiv-cs.CL | 2023-07-03 |
444 | Don’t Stop Self-Supervision: Accent Adaptation of Speech Representations Via Residual Adapters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose and investigate self-supervised adaptation of speech representations to such populations in a parameter-efficient way via training accent-specific residual adapters. |
ANSHU BHATIA et. al. | arxiv-cs.CL | 2023-07-01 |
445 | Trends and Developments in Automatic Speech Recognition Research IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
D. O’Shaughnessy; | Comput. Speech Lang. | 2023-07-01 |
446 | Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. |
Simone Wills; Yu Bai; Cristian Tejedor-Garcia; Catia Cucchiarini; Helmer Strik; | arxiv-cs.CL | 2023-06-29 |
447 | Accelerating Transducers Through Adjacent Token Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this design is inefficient, particularly for long speech signals due to the quadratic computation of self-attention. To address this, we propose a new method, Adjacent Token Merging (A-ToMe), which gradually combines adjacent tokens with high similarity scores between their key values. |
Yuang Li; Yu Wu; Jinyu Li; Shujie Liu; | arxiv-cs.CL | 2023-06-28 |
448 | Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different from these methods, in this work, with only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA, a 7-billion-parameter large language model (LLM). |
Yuang Li; Yu Wu; Jinyu Li; Shujie Liu; | arxiv-cs.CL | 2023-06-28 |
449 | Cascaded Encoders for Fine-tuning ASR Models on Overlapped Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an MT-ASR model formed by combining a well-trained foundation model with a multi-talker mask model in a cascaded RNN-T encoder configuration. |
Richard Rose; Oscar Chang; Olivier Siohan; | arxiv-cs.SD | 2023-06-28 |
450 | Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated from the spiking transformer to get the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). |
QINGYU WANG et. al. | aaai | 2023-06-26 |
451 | Don’t Be So Sure! Boosting ASR Decoding Via Confidence Relaxation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. |
Tomer Wullach; Shlomo E. Chazan; | aaai | 2023-06-26 |
452 | Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, different from audio forced alignment, it is challenging to develop a reliable visual forced alignment technology for the following two reasons: 1) Visual Speech Recognition (VSR) has a much lower performance compared to audio-based Automatic Speech Recognition (ASR), and 2) the translation from text to video is not reliable, so the method typically used for building audio forced alignment cannot be utilized in developing visual forced alignment. In order to alleviate these challenges, in this paper, we propose a new method that is appropriate for visual forced alignment, namely Deep Visual Forced Alignment (DVFA). |
Minsu Kim; Chae Won Kim; Yong Man Ro; | aaai | 2023-06-26 |
453 | Performance Disparities Between Accents in Automatic Speech Recognition (Student Abstract) Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this work, we expand the discussion of bias in Automatic Speech Recognition (ASR) through a large-scale audit. Using a large and global data set of speech, we perform an audit … |
Alex DiChristofano; Henry Shuster; Shefali Chandra; Neal Patwari; | AAAI Conference on Artificial Intelligence | 2023-06-26 |
454 | An Analysis of Personalized Speech Recognition System Development for The Deaf and Hard-of-Hearing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To do so, we analyze the use of openly-available automatic speech recognition (ASR) tools with a DHH Japanese speaker dataset. As these out-of-the-box ASR models typically do not perform well on DHH speech, we provide a thorough analysis of creating personalized ASR systems. |
Lester Phillip Violeta; Tomoki Toda; | arxiv-cs.SD | 2023-06-24 |
455 | Mixture Encoder for Joint Speech Separation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a middle-ground approach that leverages explicit speech separation similarly to the modular approach but also incorporates mixture speech information directly into the ASR module in order to mitigate the propagation of errors made by the speech separator. |
Simon Berger; Peter Vieting; Christoph Boeddeker; Ralf Schlüter; Reinhold Haeb-Umbach; | arxiv-cs.CL | 2023-06-21 |
456 | NoRefER: A Referenceless Quality Metric for Automatic Speech Recognition Via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces NoRefER, a novel referenceless quality metric for automatic speech recognition (ASR) systems. |
Kamer Ali Yuksel; Thiago Ferreira; Golara Javadi; Mohamed El-Badrashiny; Ahmet Gunduz; | arxiv-cs.CL | 2023-06-21 |
457 | Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a Conformer-based architecture, called Aformer, to leverage both the acoustic information from large non-accented and limited accented training data. |
Xuefei Wang; Yanhua Long; Yijie Li; Haoran Wei; | arxiv-cs.SD | 2023-06-20 |
458 | Improved Keyword Recognition Based on Aho-Corasick Automaton Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The recognition of out-of-vocabulary (OOV) words in many state-of-the-art automatic speech recognition (ASR) systems, which need to recognize words that have never been seen … |
Yachao Guo; Zhibin Qiu; Hao Huang; Chng Eng Siong; | 2023 International Joint Conference on Neural Networks … | 2023-06-18 |
459 | A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In collaborative learning environments, effective intelligent learning systems need to accurately analyze and understand the collaborative discourse between learners (i.e., group … |
JIE CAO et. al. | Proceedings of the 31st ACM Conference on User Modeling, … | 2023-06-18 |
460 | Research on An Improved Conformer End-to-end Speech Recognition Model with R-Drop Structure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue of poor generalization ability in end-to-end speech recognition models within deep learning, this study proposes a new Conformer-based speech recognition model called Conformer-R that incorporates the R-drop structure. |
Weidong Ji; Shijie Zan; Guohui Zhou; Xu Wang; | arxiv-cs.SD | 2023-06-14 |
461 | Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing Based Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current data augmentation methods mainly rely on audio splicing and text-to-speech (TTS) models, which might result in discontinuous, unrealistic, and less diversified speech. To mitigate these potential issues, we propose a novel data augmentation method by applying the text-based speech editing model. |
ZHENG LIANG et. al. | arxiv-cs.CL | 2023-06-14 |
462 | Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, a novel multilingual model fusion technique has been proposed where a model is trained to learn cross-lingual acoustic-phonetic similarities as a mapping function. |
Muhammad Umar Farooq; Thomas Hain; | arxiv-cs.CL | 2023-06-14 |
463 | IIITH-CSTD Corpus: Crowdsourced Strategies for The Collection of A Large-scale Telugu Speech Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Due to the lack of a large annotated speech corpus, many low-resource Indian languages struggle to utilize recent advancements in deep neural network architectures for Automatic … |
MIRISHKAR SAI GANESH et. al. | ACM Transactions on Asian and Low-Resource Language … | 2023-06-12 |
464 | Multimodal Audio-textual Architecture for Robust Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Because such approach relies on the ASR output, it often suffers from the so-called ASR error propagation. In this work, we investigate impacts of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa. |
Anderson R. Avila; Mehdi Rezagholizadeh; Chao Xing; | arxiv-cs.CL | 2023-06-11 |
465 | Adversarial Training For Low-Resource Disfluency Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC) that utilizes a small amount of labeled real disfluent data in conjunction with a large amount of unlabeled data. |
Vineet Bhat; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-06-10 |
466 | Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end (E2E) systems have shown comparable performance to hybrid systems for automatic speech recognition (ASR). Word timings, as a by-product of ASR, are essential in many … |
Xianzhao Chen; Yist Y. Lin; Kang Wang; Yi He; Zejun Ma; | ArXiv | 2023-06-09 |
467 | Developing Speech Processing Pipelines for Police Accountability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops. |
Anjalie Field; Prateek Verma; Nay San; Jennifer L. Eberhardt; Dan Jurafsky; | arxiv-cs.CL | 2023-06-09 |
468 | Latent Phrase Matching for Dysarthric Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Many consumer speech recognition systems are not tuned for people with speech disabilities, resulting in poor recognition and user experience, especially for severe speech … |
COLIN S. LEA et. al. | ArXiv | 2023-06-08 |
469 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. |
CLAYTONE SIKASOTE et. al. | arxiv-cs.CL | 2023-06-07 |
470 | An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. |
Yu Bai; Cristian Tejedor-Garcia; Ferdy Hubers; Catia Cucchiarini; Helmer Strik; | arxiv-cs.CL | 2023-06-07 |
471 | Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new lenient evaluation metric as a more defensible CER measure for Japanese ASR. |
Shigeki Karita; Richard Sproat; Haruko Ishikawa; | arxiv-cs.CL | 2023-06-07 |
472 | Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach. |
Massa Baali; Ibrahim Almakky; Shady Shehata; Fakhri Karray; | arxiv-cs.SD | 2023-06-07 |
473 | Label Aware Speech Representation Learning For Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task. |
SHIKHAR VASHISHTH et. al. | arxiv-cs.CL | 2023-06-07 |
474 | A Study on The Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving respectively +24.7%, +61%, and +7.2% accuracy compared to classical acoustic features. |
Xavier F. Cadet; Ranya Aloufi; Sara Ahmadi-Abhari; Hamed Haddadi; | arxiv-cs.CL | 2023-06-07 |
475 | Alzheimer Disease Classification Through ASR-based Transcriptions: Exploring The Impact of Punctuation and Pauses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we used the new state-of-the-art Automatic Speech Recognition (ASR) model Whisper to obtain the transcriptions, which also include automatic punctuation. |
LUCÍA GÓMEZ-ZARAGOZÁ et. al. | arxiv-cs.CL | 2023-06-06 |
476 | N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is not clear how Whisper would fare under diverse conditions even on languages it was evaluated on such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. |
Bashar Talafha; Abdul Waheed; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2023-06-05 |
477 | SpellMapper: A Non-autoregressive Neural Spellchecker for ASR Customization with Candidate Retrieval Based on N-gram Mappings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose: 1) a novel algorithm for candidate retrieval, based on misspelled n-gram mappings, which gives up to 90% recall with just the top 10 candidates on Spoken Wikipedia; 2) a non-autoregressive neural model based on BERT architecture, where the initial transcript and ten candidates are combined into one input. |
Alexandra Antonova; Evelina Bakhturina; Boris Ginsburg; | arxiv-cs.CL | 2023-06-04 |
478 | End-to-End Joint Target and Non-Target Speakers ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speakers’ speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. |
RYO MASUMURA et al. | arxiv-cs.CL | 2023-06-04 |
479 | A Reference-Less Quality Metric for Automatic Speech Recognition Via Contrastive-Learning of A Multi-Language Model with Self-Supervision Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The common standard for quality evaluation of automatic speech recognition (ASR) systems is reference-based metrics such as the Word Error Rate (WER), computed using manual … |
K. Yuksel; Thiago Castro Ferreira; Ahmet Gunduz; Mohamed Al-Badrashiny; Golara Javadi; | 2023 IEEE International Conference on Acoustics, Speech, … | 2023-06-04 |
480 | Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The limited availability of non-native speech datasets presents a major challenge in automatic speech recognition (ASR) to narrow the performance gap between native and non-native speakers. To address this, the focus of this study is on the efficient incorporation of the L2 phonemes, which in this work refer to Korean phonemes, through articulatory feature analysis. |
Jisung Wang; Haram Lee; Myungwoo Oh; | arxiv-cs.CL | 2023-06-04 |
481 | Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Combining several active learning paradigms and the core-set approach, we propose a new multi-round adaptation process that uses epistemic uncertainty to automate the annotation process, significantly reducing the associated costs and human labor. |
Bonaventure F. P. Dossou; | arxiv-cs.CL | 2023-06-03 |
482 | Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home deployment data with kids going through gamified math learning activities. |
Eda Okur; Roddy Fuentes Alba; Saurav Sahay; Lama Nachman; | arxiv-cs.CY | 2023-06-01 |
483 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in … |
DONGJI GAO et al. | ArXiv | 2023-06-01 |
484 | SlothSpeech: Denial-of-service Attack Against Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose SlothSpeech, a denial-of-service attack against ASR models, which exploits the dynamic behaviour of the model. |
MIRAZUL HAQUE et al. | arxiv-cs.SD | 2023-06-01 |
485 | Towards Hate Speech Detection in Low-resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We specifically use a multilingual AWE model trained on labelled data from well-resourced languages to spot keywords in data in the unseen target language. |
Christiaan Jacobs; Nathanaël Carraz Rakotonirina; Everlyn Asiko Chimoto; Bruce A. Bassett; Herman Kamper; | arxiv-cs.CL | 2023-06-01 |
486 | Adaptation and Optimization of Automatic Speech Recognition (ASR) for The Maritime Domain in The Field of VHF Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a multilingual automatic speech recognizer (ASR) for maritime radio communication that automatically converts received VHF radio signals into text. |
Emin Cagatay Nakilcioglu; Maximilian Reimann; Ole John; | arxiv-cs.SD | 2023-06-01 |
487 | Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST). |
Santosh Kesiraju; Marek Sarvas; Tomas Pavlicek; Cecile Macaire; Alejandro Ciuba; | arxiv-cs.CL | 2023-05-31 |
488 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, new approaches are explored and compared to improve the performance of the CLS-based multilingual ASR model. |
Kaousheik Jayakumar; Vrunda N. Sukhadia; A Arunkumar; S. Umesh; | arxiv-cs.CL | 2023-05-31 |
489 | Zero-Shot Automatic Pronunciation Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model, HuBERT. |
Hongfu Liu; Mingqian Shi; Ye Wang; | arxiv-cs.SD | 2023-05-31 |
490 | Accurate and Structured Pruning for Efficient Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance. |
HUIQIANG JIANG et al. | arxiv-cs.CL | 2023-05-31 |
491 | STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech, annotated with Standard German text at the sentence level. |
MICHEL PLÜSS et al. | arxiv-cs.CL | 2023-05-30 |
492 | Towards Selection of Text-to-speech Data to Augment ASR Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic … |
SHUO LIU et al. | ArXiv | 2023-05-30 |
493 | Improving Textless Spoken Language Understanding with Discrete Units As Intermediate Target Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, inspired by the content-disentangled discrete units from self-supervised speech models, we proposed to use discrete units as intermediate guidance to improve textless SLU performance. |
Guan-Wei Wu; Guan-Ting Lin; Shang-Wen Li; Hung-yi Lee; | arxiv-cs.CL | 2023-05-29 |
494 | HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for long input sequences. Here, we address this by extending HyperMixer, an efficient alternative to attention exhibiting linear complexity, to the Conformer architecture for speech recognition, leading to HyperConformer. |
Florian Mai; Juan Zuluaga-Gomez; Titouan Parcollet; Petr Motlicek; | arxiv-cs.CL | 2023-05-29 |
495 | CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish). |
Juan Zuluaga-Gomez; Sara Ahmed; Danielius Visockas; Cem Subakan; | arxiv-cs.CL | 2023-05-29 |
496 | Building Accurate Low Latency ASR for Streaming Voice Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on developing accurate LSTM, attention, and CTC based streaming ASR models for large-scale Hinglish (a blend of Hindi and English) Voice Search. |
Abhinav Goyal; Nikesh Garera; | arxiv-cs.SD | 2023-05-29 |
497 | Exploration of Efficient End-to-End ASR Using Discretized Input from Self-Supervised Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new protocol that utilizes discretized token sequences in ASR tasks, which includes de-duplication and sub-word modeling to enhance the input sequence. |
Xuankai Chang; Brian Yan; Yuya Fujita; Takashi Maekaku; Shinji Watanabe; | arxiv-cs.SD | 2023-05-29 |
498 | Speech and Noise Dual-stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a dual-stream spectrogram refine network to simultaneously refine the speech and noise and decouple the noise from the noisy input. |
HAOYU LU et al. | arxiv-cs.SD | 2023-05-28 |
499 | Retraining-free Customized ASR for Enharmonic Words Based on A Named-Entity-Aware Model and Phoneme Similarity Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since such NE words tend to be important keywords, ASR easily loses user trust if it misrecognizes them. To solve these problems, this paper proposes a novel retraining-free customized method for E2E-ASRs based on a named-entity-aware E2E-ASR model and phoneme similarity estimation. |
Yui Sudo; Kazuya Hata; Kazuhiro Nakadai; | arxiv-cs.SD | 2023-05-28 |
500 | Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on The False Alarms in Automated Speech Recognition Testing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate false alarm occurrences in five popular ASR systems using synthetic audio generated from four TTS systems and human audio obtained from two commonly used datasets. |
JULIA KAIWEN LAU et al. | arxiv-cs.SE | 2023-05-27 |
501 | DisfluencyFixer: A Tool to Enhance Language Learning Through Speech To Speech Disfluency Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents DisfluencyFixer, a tool that performs speech-to-speech disfluency correction in English and Hindi using a pipeline of Automatic Speech Recognition (ASR), Disfluency Correction (DC) and Text-To-Speech (TTS) models. |
Vineet Bhat; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-05-26 |
502 | INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. … |
Eunseop Yoon; Hee Suk Yoon; John Harvill; M. Hasegawa-Johnson; C. Yoo; | Annual Meeting of the Association for Computational … | 2023-05-25 |
503 | Svarah: Evaluating English ASR Systems on Indian Accents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. |
TAHIR JAVED et al. | arxiv-cs.CL | 2023-05-25 |
504 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with A Sidecar Separator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A recent study proposed a cost-effective method to convert a single-talker automatic speech recognition (ASR) system into a multi-talker one, by inserting a Sidecar separator into the frozen well-trained ASR model. Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters. |
LINGWEI MENG et al. | arxiv-cs.SD | 2023-05-25 |
505 | Iteratively Improving Speech Recognition and Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel iterative way of improving both the ASR and VC models. |
Mayank Kumar Singh; Naoya Takahashi; Onoe Naoyuki; | arxiv-cs.SD | 2023-05-24 |
506 | InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods pay less attention to the interaction of local and global features, and their series architectures are rigid to reflect local and global relationships. To address these issues, this paper proposes InterFormer for interactive local and global features fusion to learn a better representation for ASR. |
ZHI-HAO LAI et al. | arxiv-cs.CL | 2023-05-24 |
507 | Evaluating OpenAI’s Whisper ASR for Punctuation Prediction and Topic Modeling of Life Histories of The Museum of The Person Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This chapter presents the first study on the performance of Whisper for punctuation prediction in the Portuguese language. |
LUCAS RAFAEL STEFANEL GRIS et al. | arxiv-cs.CL | 2023-05-23 |
508 | Personalized Predictive ASR for Latency Reduction in Voice Assistants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, thus saving latency. In this paper, we extend this idea by introducing predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance. |
Andreas Schwarz; Di He; Maarten Van Segbroeck; Mohammed Hethnawi; Ariya Rastrow; | arxiv-cs.CL | 2023-05-23 |
509 | SE-Bridge: Speech Enhancement with Consistent Brownian Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SE-Bridge, a novel method for speech enhancement (SE). |
Zhibin Qiu; Mengfan Fu; Fuchun Sun; Gulila Altenbek; Hao Huang; | arxiv-cs.SD | 2023-05-23 |
510 | Text Generation with Speech Synthesis for ASR Data Augmentation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work … |
ZHUANGQUN HUANG et al. | ArXiv | 2023-05-22 |
511 | GNCformer Enhanced Self-attention for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an Enhanced Self-Attention (ESA) mechanism has been put forward for robust feature extraction. The proposed ESA is integrated with the recursive gated convolution and self-attention mechanism. In particular, the former is used to capture multi-order feature interaction and the latter is for global feature extraction. In addition, the location of interest that is suitable for inserting the ESA is also worth being explored. In this paper, the ESA is embedded into the encoder layer of the Transformer network for automatic speech recognition (ASR) tasks, and this newly proposed model is named GNCformer. |
J. Li; Z. Duan; S. Li; X. Yu; G. Yang; | arxiv-cs.SD | 2023-05-22 |
512 | Self-supervised Representations in Speech-based Depression Detection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). |
Wen Wu; Chao Zhang; Philip C. Woodland; | arxiv-cs.CL | 2023-05-20 |
513 | Wavoice: A MmWave-assisted Noise-resistant Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As automatic speech recognition evolves, the deployment of voice user interfaces (VUIs) has expanded rapidly. Especially since the COVID-19 pandemic, VUI has gained more attention in … |
TIANTIAN LIU et al. | ACM Transactions on Sensor Networks | 2023-05-18 |
514 | A Lexical-aware Non-autoregressive Transformer-based ASR Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A series of experiments are conducted on the AISHELL-1, CSJ, and TEDLIUM 2 datasets. |
Chong-En Lin; Kuan-Yu Chen; | arxiv-cs.CL | 2023-05-18 |
515 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. |
ZHIFU GAO et al. | arxiv-cs.SD | 2023-05-18 |
516 | A Comparative Study on E-Branchformer Vs Conformer in Speech Recognition, Translation, and Understanding Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work compares E-Branchformer and Conformer through extensive experiments using different types of end-to-end sequence-to-sequence models. |
YIFAN PENG et al. | arxiv-cs.CL | 2023-05-18 |
517 | AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AVFormer, a simple method for augmenting audioonly models with visual information, at the same time performing lightweight domain adaptation. |
Paul Hongsuck Seo; Arsha Nagrani; Cordelia Schmid; | cvpr | 2023-05-17 |
518 | MmMIC: Multi-modal Speech Recognition Based on MmWave Radar IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the proliferation of voice assistants, microphone-based speech recognition technology usually cannot achieve good performance in the situation of multiple sound sources and … |
LONG FAN et al. | IEEE INFOCOM 2023 – IEEE Conference on Computer … | 2023-05-17 |
519 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, … |
FAZLE RAKIB et al. | ArXiv | 2023-05-15 |
520 | Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how HuBERT uses clustering to discover hidden acoustic units, we formulate a factor analysis (FA) model that uses the discovered hidden acoustic units to align the SSL features. |
Weiwei Lin; Chenhang He; Man-Wai Mak; Youzhi Tu; | arxiv-cs.SD | 2023-05-14 |
521 | Investigating The Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes. |
Emma O’Neill; Julie Carson-Berndsen; | arxiv-cs.CL | 2023-05-12 |
522 | Multi-Temporal Lip-Audio Memory for Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Multi-Temporal Lip-Audio Memory (MTLAM) that makes the best use of audio signals to complement insufficient information of lip movements. |
Jeong Hun Yeo; Minsu Kim; Yong Man Ro; | arxiv-cs.CV | 2023-05-08 |
523 | Hybrid Transducer and Attention Based Encoder-Decoder Modeling for Speech-to-Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. |
YUN TANG et. al. | arxiv-cs.CL | 2023-05-04 |
524 | TrojanModel: A Practical Trojan Attack Against Automatic Speech Recognition Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While deep learning techniques have achieved great success in modern digital products, researchers have shown that deep learning models are susceptible to Trojan attacks. In a … |
W. Zong; Yang-Wai Chow; Willy Susilo; Kien Do; S. Venkatesh; | 2023 IEEE Symposium on Security and Privacy (SP) | 2023-05-01 |
525 | Edge Computing Solutions Supporting Voice Recognition Services for Speakers with Dysarthria Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the framework of Automatic Speech Recognition (ASR), the synergism between edge computing and artificial intelligence has led to the development of intelligent objects that … |
Davide Mulfari; Lorenzo Carnevale; A. Galletta; M. Villari; | 2023 IEEE/ACM 23rd International Symposium on Cluster, … | 2023-05-01 |
526 | Automatic Speech Recognition of Portuguese Phonemes Using Neural Networks Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts View |
N. Nedjah; Alejandra D. Bonilla; Luiza de Macedo Mourelle; | Expert Syst. Appl. | 2023-05-01 |
527 | Building A Non-native Speech Corpus Featuring Chinese-English Bilingual Children: Compilation and Rationale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a non-native speech corpus consisting of narratives from fifty 5- to 6-year-old Chinese-English children. |
Hiuchung Hung; Andreas Maier; Thorsten Piske; | arxiv-cs.CL | 2023-04-30 |
528 | Enhancing Multilingual Speech Recognition in Air Traffic Control By Sentence-level Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a two-stage multilingual ASR framework. |
Peng Fan; Dongyue Guo; JianWei Zhang; Bo Yang; Yi Lin; | arxiv-cs.SD | 2023-04-29 |
529 | Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing multilingual hierarchical Softmax decoding. |
Q. LIU et al. | icassp | 2023-04-27 |
530 | Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel modeling framework for effective training of end-to-end automatic speech recognition (ASR) models on various sources of data from diverse domains: speech paired with clean ground truth transcripts, speech with noisy pseudo transcripts from semi-supervised decodes and unpaired text-only data. |
T. Fukuda; S. Thomas; | icassp | 2023-04-27 |
531 | Continual Learning for On-Device Speech Recognition Using Disentangled Conformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel compute-efficient continual learning algorithm called DisentangledCL. This algorithm produces ASR models consisting of a frozen ‘core’ network for general-purpose use and several tunable ‘augment’ networks for speaker-specific tuning. |
A. DIWAN et al. | icassp | 2023-04-27 |
532 | Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a speech summarization system that enables E2E summarization from 100 seconds, which is the limit of the conventional method, to up to 10 minutes (i.e., the duration of typical instructional videos on YouTube). |
T. KANO et al. | icassp | 2023-04-27 |
533 | DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for generative tasks such as speech enhancement and speech separation, most self-supervised speech representations did not show substantial improvements. To deal with this problem, in this paper, we propose data2vec-SG (Speech Generation), which is a teacher-student learning framework that addresses speech generation tasks. |
H. WANG et al. | icassp | 2023-04-27 |
534 | Stabilising and Accelerating Light Gated Recurrent Units for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the unbounded nature of its rectified linear unit on the candidate recurrent gate induces an exploding-gradient phenomenon that disrupts the training process and prevents it from being applied to medium-to-large ASR datasets. In this paper, we theoretically and empirically derive the necessary conditions for its stability, as well as engineering mechanisms to speed up its training time by a factor of five, hence introducing a novel version of this architecture named SLi-GRU. |
A. Moumen; T. Parcollet; | icassp | 2023-04-27 |
535 | Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores a series of approaches to integrate domain adapted Self-Supervised Learning (SSL) pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and domain adapted wav2vec2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec2.0 features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain adapted wav2vec2.0 models. |
S. HU et al. | icassp | 2023-04-27 |
536 | Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. |
S. Sadhu; H. Hermansky; | icassp | 2023-04-27 |
537 | A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains … |
L. MENG et al. | icassp | 2023-04-27 |
538 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. |
P. MA et al. | icassp | 2023-04-27 |
539 | Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. |
H. LIU et al. | icassp | 2023-04-27 |
540 | Improving Speech-to-Speech Translation Through Unlabeled Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data to improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
X. -P. NGUYEN et al. | icassp | 2023-04-27 |
541 | Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-to-End Automated Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the potential of leveraging external knowledge, particularly through off-policy generated text-to-speech key-value stores, to allow for flexible post-training adaptation to new data distributions. |
D. M. Chan; S. Ghosh; A. Rastrow; B. Hoffmeister; | icassp | 2023-04-27 |
542 | The Edinburgh International Accents of English Corpus: Towards The Democratization of English ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first release of The Edinburgh International Accents of English Corpus (EdAcc). |
R. SANABRIA et al. | icassp | 2023-04-27 |
543 | Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since these are E2E models operating on speech directly, there remains a potential to improve their performance using purely text based models like BERT, which have strong language understanding capabilities. In this paper, we propose a new training criteria for RNN-T based E2E ASR and SLU to transfer BERT’s knowledge into these systems. |
V. Sunder; S. Thomas; H. -K. J. Kuo; B. Kingsbury; E. Fosler-Lussier; | icassp | 2023-04-27 |
544 | Robust Audio-Visual ASR with Unified Cross-Modal Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new audio-visual speech recognition model with a unified cross-modal attention mechanism. |
J. Li; C. Li; Y. Wu; Y. Qian; | icassp | 2023-04-27 |
545 | Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The research community has produced many successful self-supervised speech representation learning methods over the past few years. |
A. ELKAHKY et. al. | icassp | 2023-04-27 |
546 | Transcription Free Filler Word Detection with Neural Semi-CRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate a filler word detection system that does not depend on ASR systems. |
G. Zhu; Y. Yan; J. -P. Caceres; Z. Duan; | icassp | 2023-04-27 |
547 | Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce multi-resolution LBT to estimate the complex spectrograms from low to high time and frequency resolutions. |
H. Taherian; D. Wang; | icassp | 2023-04-27 |
548 | An ASR-Free Fluency Scoring Approach with Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a novel ASR-free approach for automatic fluency assessment using self-supervised learning (SSL). |
W. LIU et. al. | icassp | 2023-04-27 |
549 | Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When training data is lacking in ASR, a large-scale pre-training and fine-tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pre-training and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain-shift gap between the pre-training and target data. |
L. P. Violeta; D. Ma; W. -C. Huang; T. Toda; | icassp | 2023-04-27 |
550 | Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to overcome CF for E2E ASR by inserting adapters, small modules with few parameters that allow a general model to be fine-tuned to a specific task, into our model. |
S. V. Eeckt; H. Van Hamme; | icassp | 2023-04-27 |
551 | Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. |
K. Yang; T. -Y. Hu; J. -H. R. Chang; H. Swetha Koppula; O. Tuzel; | icassp | 2023-04-27 |
552 | Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to employ a method built on pseudo target generation and domain adversarial training with an iterative training strategy to improve the intelligibility and naturalness of the speech recovered from silent tongue and lip articulation. |
R. -C. Zheng; Y. Ai; Z. -H. Ling; | icassp | 2023-04-27 |
553 | Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the effective fine-tuning of a large-scale pretrained model for automatic speech recognition (ASR) of low-resource languages with only a one-hour matched dataset. |
K. Soky; S. Li; C. Chu; T. Kawahara; | icassp | 2023-04-27 |
554 | UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. |
J. GUO et. al. | icassp | 2023-04-27 |
555 | Selective FiLM Conditioning with CTC-Based ASR Probability for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although prior studies have improved the performance, they are inefficient because the two networks are combined and require large model sizes. To address this limitation, we propose an efficient way to use feature-wise linear modulation (FiLM) conditioning with CTC-based ASR probabilities for the SE system. |
D. -H. Yang; J. -H. Chang; | icassp | 2023-04-27 |
556 | Leveraging Large Text Corpora For End-To-End Speech Summarization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two novel methods that leverage a large amount of external text summarization data for E2E SSum training. |
K. MATSUURA et. al. | icassp | 2023-04-27 |
557 | WeavSpeech: Data Augmentation Strategy For Automatic Speech Recognition Via Semantic-Aware Weaving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, if speech signals are indiscriminately mixed without considering semantics, the risk of generating nonsensical sentences arises. To address these issues, in this paper, we propose WeavSpeech, a simple yet effective cut-and-paste augmentation method for ASR tasks that weaves a pair of speech data considering semantics. |
K. Seo; J. Park; J. Song; E. Yang; | icassp | 2023-04-27 |
558 | Automatic Severity Classification of Dysarthric Speech By Using Self-Supervised Model with Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using the self-supervised model in conjunction with multi-task learning. |
E. J. Yeo; K. Choi; S. Kim; M. Chung; | icassp | 2023-04-27 |
559 | The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. |
P. Guo; H. Wang; B. Mu; A. Zhang; P. Chen; | icassp | 2023-04-27 |
560 | An Analysis of Degenerating Speech Due to Progressive Dysarthria on ASR Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aims of this study were to (1) analyze the change of performance of ASR over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition throughout disease progression. |
K. TOMANEK et. al. | icassp | 2023-04-27 |
561 | SAN: A Robust End-to-End ASR Model Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. |
Z. Min; Q. Ge; G. Huang; | icassp | 2023-04-27 |
562 | Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Focusing on End-to-End ASR, in this paper, we propose a simple yet effective method to overcome catastrophic forgetting: weight averaging. |
S. Vander Eeckt; H. Van Hamme; | icassp | 2023-04-27 |
563 | An Adapter Based Multi-Label Pre-Training for Speech Separation and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on HuBERT, this work investigates improving the SSL model for SS and SE. |
T. Wang; X. Chen; Z. Chen; S. Yu; W. Zhu; | icassp | 2023-04-27 |
564 | Ensemble Knowledge Distillation of Self-Supervised Speech Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On top of that, we propose a multiple prediction head method for student models to predict different layer outputs of multiple teacher models simultaneously. |
K. -P. HUANG et. al. | icassp | 2023-04-27 |
565 | Self-Supervised Learning-Based Source Separation for Meeting Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, seven SSL models were compared on both simulated and real-world corpora. |
Y. Li; X. Zheng; P. C. Woodland; | icassp | 2023-04-27 |
566 | Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the paper, we analyze the performance of features at different layers of a foundation model on the speech recognition task and propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. |
Z. HUO et. al. | icassp | 2023-04-27 |
567 | Slot-Triggered Contextual Biasing For Personalized Speech Recognition Using Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method whereby the E2E ASR model is trained to emit opening and closing tags around slot content which are used to both selectively enable biasing and decide which catalog to use for biasing. |
S. Tong; P. Harding; S. Wiesler; | icassp | 2023-04-27 |
568 | Avoid Overthinking in Self-Supervised Models for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then motivate further research in EE by computing an optimal bound for performance versus speed trade-offs. To approach this bound we propose two new strategies for ASR: (1) we adapt the recently proposed patience strategy to ASR; and (2) we design a new EE strategy specific to ASR that performs better than all strategies previously introduced. |
D. Berrebbi; B. Yan; S. Watanabe; | icassp | 2023-04-27 |
569 | Robust Multi-modal Speech Emotion Recognition with ASR Error Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an SER method robust to ASR errors. |
B. Lin; L. Wang; | icassp | 2023-04-27 |
570 | Towards Improved Room Impulse Response Estimation for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). |
A. RATNARAJAH et. al. | icassp | 2023-04-27 |
571 | Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In streaming ASR, high accuracy is assured by attending to look-ahead frames, which increases latency. To tackle this trade-off, we propose a multi-latency streaming ASR approach that achieves high accuracy with zero look-ahead. |
H. ZHAO et. al. | icassp | 2023-04-27 |
572 | Federated Self-Learning with Weak Supervision for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine transcriptions from a stronger ASR model. |
M. RAO et. al. | icassp | 2023-04-27 |
573 | Improving Noisy Student Training on Non-Target Domain Data for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. |
Y. Chen; W. Ding; J. Lai; | icassp | 2023-04-27 |
574 | Representation of Vocal Tract Length Transformation Based on Group Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the property of vocal tract length transformation (VTLT) that forms a group, and derive the novel speech representation VTL spectrum based on group theory analysis, where only the phase of the VTL spectrum is changed by VTLT, which is a simple linear shift. |
A. Miyashita; T. Toda; | icassp | 2023-04-27 |
575 | Towards Accurate and Real-Time End-of-Speech Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variant of the endpoint (EP) detection problem in automatic speech recognition (ASR), which we call the end-of-speech (EOS) estimation. |
Y. FAN et. al. | icassp | 2023-04-27 |
576 | Multi-Temporal Lip-Audio Memory for Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Multi-Temporal Lip-Audio Memory (MTLAM) that makes the best use of audio signals to complement insufficient information of lip movements. |
J. H. Yeo; M. Kim; Y. M. Ro; | icassp | 2023-04-27 |
577 | Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. |
Y. Hu; C. Chen; R. Li; Q. Zhu; E. S. Chng; | icassp | 2023-04-27 |
578 | Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized impaired speech. |
Z. JIN et. al. | icassp | 2023-04-27 |
579 | Multi-Blank Transducers for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In standard RNN-T, the emission of a blank symbol consumes exactly one input frame; in our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted. |
H. Xu; F. Jia; S. Majumdar; S. Watanabe; B. Ginsburg; | icassp | 2023-04-27 |
580 | De’HuBERT: Disentangling Noise in A Self-Supervised Model for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow’s redundancy-reduction principle. |
D. NG et. al. | icassp | 2023-04-27 |
581 | WL-MSR: Watch and Listen for Multimodal Subtitle Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Watch and Listen for Multimodal Subtitle Recognition (WL-MSR) framework to obtain comprehensive video subtitles, by fusing the information provided by Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) models. |
J. Liu; H. Wang; W. Wang; X. He; J. Liu; | icassp | 2023-04-27 |
582 | Understanding Shared Speech-Text Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand our understanding of the resulting shared speech-text representations with two types of analyses. |
G. WANG et. al. | icassp | 2023-04-27 |
583 | Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). |
Y. Zhang; K. C. Puvvada; V. Lavrukhin; B. Ginsburg; | icassp | 2023-04-27 |
584 | Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to factorize out the language component in the AED model, we propose the factorized attention-based encoder-decoder (Factorized AED) model whose decoder takes as input the posterior probabilities of a jointly trained LM. |
X. Gong; W. Wang; H. Shao; X. Chen; Y. Qian; | icassp | 2023-04-27 |
585 | Understanding Shared Speech-Text Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand our understanding of the resulting shared speech-text representations with two types of analyses. |
GARY WANG et. al. | arxiv-cs.CL | 2023-04-27 |
586 | Align, Write, Re-Order: Explainable End-to-End Speech Translation Via Operation Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we propose to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. |
M. Omachi; B. Yan; S. Dalmia; Y. Fujita; S. Watanabe; | icassp | 2023-04-27 |
587 | Code-Switching Text Generation and Injection in Mandarin-English ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate text generation and injection for improving the performance of a streaming model commonly used in industry, Transformer-Transducer (T-T), in Mandarin-English code-switching speech recognition. |
H. YU et. al. | icassp | 2023-04-27 |
588 | Improving Non-Autoregressive Speech Recognition with Autoregressive Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose AR pretraining to the NAR encoder to reduce the accuracy gap between AR and NAR models. |
Y. Li; L. Samarakoon; I. Fung; | icassp | 2023-04-27 |
589 | Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget. |
R. Gody; D. Harwath; | icassp | 2023-04-27 |
590 | Anchored Speech Recognition with Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech. |
D. RAJ et. al. | icassp | 2023-04-27 |
591 | Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a privacy-preserving approach to improve fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. |
I. -E. Veliche; P. Fung; | icassp | 2023-04-27 |
592 | Visual Information Matters for ASR Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The other is that the community lacks a high-quality benchmark where visual information matters for the EC models. Therefore, this paper provides 1) simple yet effective methods, namely gated fusion and image captions as prompts to incorporate visual information to help EC; 2) large-scale benchmark datasets, namely Visual-ASR-EC, where each item in the training data consists of visual, speech, and text information, and the test data are carefully selected by human annotators to ensure that even humans could make mistakes when visual information is missing. |
V. B. Kumar; S. Cheng; N. Peng; Y. Zhang; | icassp | 2023-04-27 |
593 | Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we improve the self-adaptive TTS using character-vocabulary level ASR feedback at higher granularity, considering the losses in the positive and negative classes. |
S. Novitasari; S. Sakti; S. Nakamura; | icassp | 2023-04-27 |
594 | LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor and then embeds token-level long-form features inside the vocabulary predictor, with a pre-trained contextual encoder RoBERTa to further boost the performance. |
X. GONG et. al. | icassp | 2023-04-27 |
595 | Self-Supervised Representations in Speech-Based Depression Detection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). |
W. Wu; C. Zhang; P. C. Woodland; | icassp | 2023-04-27 |
596 | Exploring Wav2vec 2.0 Fine Tuning for Improved Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that V-FT is able to outperform state-of-the-art models on the IEMOCAP dataset. |
L. -W. Chen; A. Rudnicky; | icassp | 2023-04-27 |
597 | Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. |
F. WU et. al. | icassp | 2023-04-27 |
598 | Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion. |
Z. HUANG et. al. | icassp | 2023-04-27 |
599 | Context-Aware Fine-Tuning of Self-Supervised Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the use of context, i.e., surrounding segments, during fine-tuning and propose a new approach called context-aware fine-tuning. |
S. SHON et. al. | icassp | 2023-04-27 |
600 | Towards Zero-Shot Code-Switched Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we seek to build effective code-switched (CS) automatic speech recognition (ASR) systems under the zero-shot setting where no transcribed CS speech data is available for training. |
B. Yan; M. Wiesner; O. Klejch; P. Jyothi; S. Watanabe; | icassp | 2023-04-27 |
601 | Enhancing Unsupervised Speech Recognition with Diffusion GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion GAN. |
X. Wu; | icassp | 2023-04-27 |
602 | MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel UDA approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which improves the model transferability by fine-grained inter-domain matching and discriminability by intra-domain contrastive discrimination simultaneously. |
J. Zhou; S. Zhao; N. Jiang; G. Zhao; Y. Qin; | icassp | 2023-04-27 |
603 | VarArray Meets T-SOT: Advancing The State of The Art of Streaming Distant Conversational Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. |
N. KANDA et. al. | icassp | 2023-04-27 |
604 | Database-Aware ASR Error Correction for Speech-to-SQL Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an ASR correction method, DBATI (DataBase-Aware TaggerILM). |
Y. Shao; A. Kumar; N. Nakashole; | icassp | 2023-04-27 |
605 | Improving Accented Speech Recognition with Multi-Domain Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. |
L. Maison; Y. Esteve; | icassp | 2023-04-27 |
606 | MoLE : Mixture Of Language Experts For Multi-Lingual Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a multi-lingual speech recognition network named Mixture-of-Language-Experts (MoLE), which digests speech in a variety of languages. |
Y. Kwon; S. -W. Chung; | icassp | 2023-04-27 |
607 | Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR). |
A. Ogawa; T. Moriya; N. Kamo; N. Tawara; M. Delcroix; | icassp | 2023-04-27 |
608 | A Speech Representation Anonymization Framework Via Selective Noise Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a speech anonymization framework that achieves privacy via noise perturbation to a selected subset of the high-utility representations extracted using a pre-trained speech encoder. |
M. Tran; M. Soleymani; | icassp | 2023-04-27 |
609 | From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize the other languages. |
C. -H. H. YANG et. al. | icassp | 2023-04-27 |
610 | Simulating Realistic Speech Overlaps Improves Multi-Talker ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an improved technique to simulate multi-talker overlapping speech with realistic speech overlaps, where an arbitrary pattern of speech overlaps is represented by a sequence of discrete tokens. |
M. YANG et. al. | icassp | 2023-04-27 |
611 | Context-Aware End-to-end ASR Using Self-Attentive Embedding and Tensor Fusion IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a context-aware end-to-end ASR model that injects the self-attentive context embedding into the decoder of the recurrent neural network transducer (RNN-T). |
S. -Y. Chang; C. Zhang; T. N. Sainath; B. Li; T. Strohman; | icassp | 2023-04-27 |
612 | Multi-modal ASR Error Correction with Joint ASR Error Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To include the audio information for better error correction, we propose a sequence-to-sequence multi-modal ASR error correction model. |
B. Lin; L. Wang; | icassp | 2023-04-27 |
613 | UML: A Universal Monolingual Output Layer For Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For multilingual ASR, due to the differences in written scripts across languages, multilingual WPMs bring the challenges of having overly large output layers and scaling to more languages. In this work, we propose a universal monolingual output layer (UML) to address such problems. |
C. Zhang; B. Li; T. N. Sainath; T. Strohman; S. -Y. Chang; | icassp | 2023-04-27 |
614 | Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. |
J. SHI et. al. | icassp | 2023-04-27 |
615 | Investigation Into Phone-Based Subword Units for Multilingual End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the use of phone-based sub-words, specifically Byte Pair Encoding (BPE), as modeling units for multilingual end-to-end speech recognition. |
S. Yusuyin; H. Huang; J. Liu; C. Liu; | icassp | 2023-04-27 |
616 | Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Cleanformer, a streaming multichannel neural enhancement frontend for automatic speech recognition (ASR). |
J. Caroselli; A. Narayanan; N. Howard; T. O’Malley; | icassp | 2023-04-27 |
617 | Structured State Space Decoder for Speech Recognition and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks, respectively, by comparing it with the Transformer decoder. |
K. Miyazaki; M. Murata; T. Koriyama; | icassp | 2023-04-27 |
618 | Pretraining Conformer with ASR for Speaker Verification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to pretrain Conformer with an automatic speech recognition (ASR) task for speaker verification. |
D. Cai; W. Wang; M. Li; R. Xia; C. Huang; | icassp | 2023-04-27 |
619 | Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. |
Y. Tseng; C. -I. J. Lai; H. -Y. Lee; | icassp | 2023-04-27 |
620 | Learning ASR Pathways: A Sparse Multilingual ASR Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in multilingual ASR, language-agnostic pruning may lead to severe performance drops on some languages because language-agnostic pruning masks may not fit all languages and discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks (pathways), such that the parameters for each language are learned explicitly. |
M. YANG et al. | icassp | 2023-04-27 |
621 | Adaptive Multi-Corpora Language Model Training for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel adaptive multi-corpora training algorithm that dynamically learns and adjusts the sampling probability of each corpus along the training process. |
Y. Ma; Z. Liu; X. Zhang; | icassp | 2023-04-27 |
622 | Multi-Speaker Data Augmentation for Improved End-to-end Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While E2E ASR models achieve state-of-the-art performance on recognition tasks that match well with such training data, they are observed to fail on test recordings that contain multiple speakers, significant channel or background noise, or span longer durations than training data utterances. To mitigate these issues, we propose an on-the-fly data augmentation strategy that transforms single-speaker training data into multiple-speaker data by appending together multiple single-speaker utterances. |
S. Thomas; H. -K. J. Kuo; G. Saon; B. Kingsbury; | icassp | 2023-04-27 |
623 | Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a dual-stream spectrogram refine network to simultaneously refine the speech and noise and decouple the noise from the noisy input. |
H. LU et al. | icassp | 2023-04-27 |
624 | Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose to employ a novel bidirectional attention mechanism (BiAM) to jointly learn both ASR encoder (bottom layers) and text encoder with a multi-modal learning method. |
Y. Yang; H. Xu; H. Huang; E. S. Chng; S. Li; | icassp | 2023-04-27 |
625 | HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit Bert for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose HuBERT-AGG, a novel method that learns noise-invariant SSL representations for robust speech recognition by distilling aggregated layer-wise representations. |
W. Wang; Y. Qian; | icassp | 2023-04-27 |
626 | Joint Unsupervised and Supervised Learning for Context-Aware Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we need additional text labels to train the model to recognize speech, and acquiring such text labels is costly. To overcome this problem, we propose context-aware language identification using a combination of unsupervised and supervised learning without any text labels. |
J. PARK et. al. | icassp | 2023-04-27 |
627 | Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Virtuoso, a massively multilingual speech–text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. |
T. SAEKI et. al. | icassp | 2023-04-27 |
628 | Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the respective … |
Eduardo Medeiros; Leonel Corado; Luís Rato; P. Quaresma; Pedro Salgueiro; | Future Internet | 2023-04-24 |
629 | Using Automatic Speech Recognition to Measure The Intelligibility of Speech Synthesized From Brain Signals Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Brain-computer interfaces (BCIs) can potentially restore lost function in patients with neurological injury. A promising new application of BCI technology has focused on speech … |
Suvi Varshney; D. Farias; David M. Brandman; S. Stavisky; Lee M. Miller; | 2023 11th International IEEE/EMBS Conference on Neural … | 2023-04-24 |
630 | Situating Automatic Speech Recognition Development Within Communities of Under-heard Language Speakers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper we develop approaches to automatic speech recognition (ASR) development that suit the needs and functions of under-heard language speakers. Our novel contribution to … |
THOMAS REITMAIER et al. | Proceedings of the 2023 CHI Conference on Human Factors in … | 2023-04-19 |
631 | Collaboratively Mitigating Racial Disparities in Automated Speech Recognition and Language Technologies with African American English Speakers: Community-Collaborative and Equity-Centered Approaches Toward Designing Inclusive Natural Language Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automated speech recognition (ASR) systems that rely on natural language processing (NLP) techniques are becoming increasingly prevalent within people’s everyday lives. From … |
Jay L. Cunningham; | Extended Abstracts of the 2023 CHI Conference on Human … | 2023-04-19 |
632 | Improving Automatic Summarization for Browsing Longform Spoken Dialog IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Longform spoken dialog delivers rich streams of informative content through podcasts, interviews, debates, and meetings. While production of this medium has grown tremendously, … |
Daniel Li; Thomas Chen; Alec Zadikian; Albert Tung; Lydia B. Chilton; | Proceedings of the 2023 CHI Conference on Human Factors in … | 2023-04-19 |
633 | Speech Command Recognition Based on Convolutional Spiking Neural Networks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This article presents a new technique for speech recognition that combines Convolutional Neural Networks (CNNs) with Spiking Neural Networks (SNNs) to create an SNNCNN model. The … |
Erik Sadovsky; Maroš Jakubec; R. Jarina; | 2023 33rd International Conference Radioelektronika … | 2023-04-19 |
634 | Political Corpus Creation Through Automatic Speech Recognition on EU Debates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a transcribed corpus of the LIBE committee of the EU parliament, totalling 3.6 Million running words. |
Hugo de Vos; Suzan Verberne; | arxiv-cs.CL | 2023-04-17 |
635 | Speech2Spikes: Efficient Audio Encoding Pipeline for Real-time Neuromorphic Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Despite the maturity and availability of speech recognition systems, there are few available spiking speech recognition tasks that can be implemented with current neuromorphic … |
Kenneth Michael Stewart; Timothy M. Shea; Noah Pacik-Nelson; Eric M Gallo; Andreea Danielescu; | Proceedings of the 2023 Annual Neuro-Inspired Computational … | 2023-04-11 |
636 | Speech Recognition Method Based on Deep Learning of Artificial Intelligence: An Example of BLSTM-CTC Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the rapid development of information, networking, and intelligent technologies, China's intelligent technology and related fields have made great progress and …
Kangyu Chen; Zhiyuan Peng; | Proceedings of the 2023 5th International Symposium on … | 2023-03-24 |
637 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. |
PINGCHUAN MA et al. | arxiv-cs.CV | 2023-03-24 |
638 | Beyond Universal Transformer: Block Reusing with Adaptor in Transformer for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the drawback of universal Transformer models for the application of ASR on edge devices, we propose a solution that can reuse the block in Transformer models for the occasion of the small footprint ASR system, which meets the objective of accommodating resource limitations without compromising recognition accuracy. |
Haoyu Tang; Zhaoyi Liu; Chang Zeng; Xinfeng Li; | arxiv-cs.SD | 2023-03-23 |
639 | Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
M. Latha; M. Shivakumar; G. Manjula; M. Hemakumar; M. K. Kumar; | SN Computer Science | 2023-03-20 |
640 | A Deep Learning System for Domain-specific Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems are proposed. However, … |
Yanan Jia; | arxiv-cs.CL | 2023-03-18 |
641 | Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. |
Yuan Tseng; Cheng-I Lai; Hung-yi Lee; | arxiv-cs.CL | 2023-03-15 |
642 | Improving Accented Speech Recognition with Multi-Domain Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. |
Lucas Maison; Yannick Estève; | arxiv-cs.LG | 2023-03-14 |
643 | MIXPGD: Hybrid Adversarial Training for Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose mixPGD adversarial training method to improve the robustness of the model for ASR systems. |
Aminul Huq; Weiyi Zhang; Xiaolin Hu; | arxiv-cs.SD | 2023-03-10 |
644 | An Overview of Bengali Speech Recognition: Methods, Challenges, and Future Direction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the subject of human-computer interactions, speech recognition is an appealing technique that gives users the opportunity to interact with and control the machine. Currently, … |
NABILA TASNIA et al. | 2023 IEEE 13th Annual Computing and Communication Workshop … | 2023-03-08 |
645 | End-to-End Speech Recognition: A Survey Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, … |
Rohit Prabhavalkar; Takaaki Hori; Tara N. Sainath; R. Schluter; Shinji Watanabe; | ArXiv | 2023-03-03 |
646 | Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. |
YU ZHANG et al. | arxiv-cs.CL | 2023-03-02 |
647 | Leveraging Large Text Corpora for End-to-End Speech Summarization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two novel methods that leverage a large amount of external text summarization data for E2E SSum training. |
KOHEI MATSUURA et al. | arxiv-cs.CL | 2023-03-02 |
648 | MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages. |
MOHAMED ANWAR et al. | arxiv-cs.CL | 2023-03-01 |
649 | WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View |
T. Choudhary; Vishal Goyal; A. Bansal; | Big Data Min. Anal. | 2023-03-01 |
650 | Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation. |
PHILIPP KLUMPP et al. | arxiv-cs.CL | 2023-03-01 |
651 | N-best T5: Robust ASR Error Correction Using Multiple Input Hypotheses and Constrained Decoding Space IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. |
Rao Ma; Mark J. F. Gales; Kate M. Knill; Mengjie Qian; | arxiv-cs.CL | 2023-03-01 |
652 | Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores a series of approaches to integrate domain adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and domain adapted wav2vec2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec2.0 features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain adapted wav2vec2.0 models. |
SHUJIE HU et al. | arxiv-cs.SD | 2023-02-28 |
653 | Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In this paper, we propose a language-universal adapter learning framework based on a pre-trained model for end-to-end multilingual automatic speech recognition (ASR). For acoustic … |
Zhijie Shen; Wu Guo; Bin Gu; | ArXiv | 2023-02-28 |
654 | DeHuBERT: Disentangling Noise in A Self-supervised Model for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow’s redundancy-reduction principle. |
DIANWEN NG et al. | arxiv-cs.SD | 2023-02-28 |
655 | Deep Learning Methods for Arabic Autoencoder Speech Recognition System for Electro-Larynx Device Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent advances in speech recognition have achieved remarkable performance comparable with human transcribers’ abilities. But this significant performance is not the same for all … |
Z. J. M. Ameen; A. Kadhim; | Adv. Hum. Comput. Interact. | 2023-02-28 |
656 | Multimodal Speech Recognition for Language-Guided Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructions by considering the accompanying visual context. |
ALLEN CHANG et al. | arxiv-cs.CL | 2023-02-27 |
657 | Diacritic Recognition Performance in Arabic ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an analysis of diacritic recognition performance in Arabic Automatic Speech Recognition (ASR) systems. |
Hanan Aldarmaki; Ahmad Ghannam; | arxiv-cs.CL | 2023-02-27 |
658 | Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications … |
Jaeyoung Huh; Sangjoon Park; Jeonghyeon Lee; Jong-Chul Ye; | ArXiv | 2023-02-27 |
659 | A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we summarize and compare different data augmentation strategies using S3PRL toolkit. |
Mina Huh; Ruchira Ray; Corey Karnei; | arxiv-cs.SD | 2023-02-27 |
660 | Speech Corpora Divergence Based Unsupervised Data Selection for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora. |
Changfeng Gao; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan; | arxiv-cs.CL | 2023-02-25 |
661 | MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel UDA approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which improves the model transferability by fine-grained inter-domain matching and discriminability by intra-domain contrastive discrimination simultaneously. |
Jiaming Zhou; Shiwan Zhao; Ning Jiang; Guoqing Zhao; Yong Qin; | arxiv-cs.CL | 2023-02-22 |
662 | An Approach for Speech Enhancement with Dysarthric Speech Recognition Using Optimization Based Machine Learning Frameworks Related Papers Related Patents Related Grants Related Venues Related Experts View |
Bhuvaneshwari Jolad; Rajashri Khanai; | International Journal of Speech Technology | 2023-02-21 |
663 | Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the application of language and speech technology to open-ended questions in a Dutch panel survey. |
Henk van den Heuvel; Martijn Bentum; Simone Wills; Judith C. Koops; | arxiv-cs.CL | 2023-02-21 |
664 | A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains … |
LINGWEI MENG et al. | arxiv-cs.SD | 2023-02-20 |
665 | Chinese ASR and NER Improvement Based on Whisper Fine-Tuning IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Based on 680k hours of weakly supervised multilingual and multi-task speech transcription/translation data, Whisper [1] has developed a robust system for both Automated Speech … |
Hao Yang; Min Zhang; Shimin Tao; Miaomiao Ma; Ying Qin; | 2023 25th International Conference on Advanced … | 2023-02-19 |
666 | Reconsidering Read and Spontaneous Speech: Causal Perspectives on The Generation of Training Data for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Superficially, read and spontaneous speech—the two main kinds of training data for automatic speech recognition—appear as complementary, but are equal: pairs of texts and acoustic … |
Philipp Gabler; Bernhard C. Geiger; Barbara Schuppler; Roman Kern; | Inf. | 2023-02-19 |
667 | Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words. |
Leyuan Qu; Cornelius Weber; Stefan Wermter; | arxiv-cs.CL | 2023-02-19 |
668 | QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a lightweight VITS-based VC model that uses the HuBERT-Soft model to extract content information features without speaker information. |
Houjian Guo; Chaoran Liu; Carlos Toshinori Ishi; Hiroshi Ishiguro; | arxiv-cs.SD | 2023-02-16 |
669 | ASR Bundestag: A Large-Scale Political Debate Dataset in German Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ASR Bundestag, a dataset for automatic speech recognition in German, consisting of 610 hours of aligned audio-transcript pairs for supervised training as well as 1,038 hours of unlabeled audio snippets for self-supervised learning, based on raw audio data and transcriptions from plenary sessions and committee meetings of the German parliament. |
Johannes Wirth; René Peinl; | arxiv-cs.CL | 2023-02-12 |
670 | Leveraging Supplementary Text Data to Kick-start Automatic Speech Recognition System Development with Limited Transcriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the use of different amounts of text data, both for creating a lexicon that constrains ASR decoding to possible words (e.g. *dogz vs. dogs), and for training larger language models that bias the system toward probable word sequences (e.g. too dogs vs. two dogs). |
NAY SAN et al. | arxiv-cs.CL | 2023-02-09 |
671 | PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose PATCorrect-a novel non-autoregressive (NAR) approach based on multi-modal fusion leveraging representations from both text and phoneme modalities, to reduce word error rate (WER) and perform robustly with varying input transcription quality. |
Ziji Zhang; Zhehui Wang; Rajesh Kamma; Sharanya Eswaran; Narayanan Sadagopan; | arxiv-cs.CL | 2023-02-09 |
672 | MAC: A Unified Framework Boosting Low Resource Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). |
Zeping Min; Qian Ge; Zhong Li; Weinan E; | arxiv-cs.CL | 2023-02-05 |
673 | Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated from the spiking transformer to get the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). |
MINGLUN HAN et al. | arxiv-cs.NE | 2023-02-02 |
674 | Measuring The Intelligibility of Dysarthric Speech Through Automatic Speech Recognition in A Pluricentric Language Related Papers Related Patents Related Grants Related Venues Related Experts View |
Wei Xue; C. Cucchiarini; R. Hout; H. Strik; | Speech Commun. | 2023-02-01 |
675 | In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach (TOLSTOI) that imputes speech representations internal to a baseline RNN-T, starting from text-only inputs, and performs in-situ adaptation that results in higher adaptation accuracy without any runtime overheads during decoding. |
Ashish Mittal; Sunita Sarawagi; Preethi Jyothi; | iclr | 2023-02-01 |
676 | Improving Rare Words Recognition Through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the problem of low-resource Cantonese speech recognition, this paper presents a novel homophone extension method to integrate human knowledge of the homophone lexicon into the beam search decoding process with language model re-scoring. |
HOLAM CHUNG et al. | arxiv-cs.CL | 2023-02-01 |
677 | A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The proliferation of personal artificial intelligence (AI) -assistant technologies with speech-based conversational AI interfaces is driving the exponential growth in the consumer … |
THIERRY TAMBE et al. | IEEE Journal of Solid-State Circuits | 2023-02-01 |
678 | Attention-based Latent Features for Jointly Trained End-to-end Automatic Speech Recognition with Modified Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View |
Dali Yang; Joon‐Hyuk Chang; | J. King Saud Univ. Comput. Inf. Sci. | 2023-02-01 |
679 | Prioritizing Speech Test Cases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PROPHET (PRiOritizing sPeecH tEsT), a tool that predicts potential error-uncovering speech test cases only based on their reference texts. |
ZHOU YANG et al. | arxiv-cs.SE | 2023-02-01 |
680 | Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers Via Hierarchical Distillation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous works may be limited by the inflexible structures of PLMs and the insufficient utilization of PLMs. To alleviate these problems, we propose the hierarchical knowledge distillation (HKD) on the continuous integrate-and-fire (CIF) based ASR models. |
Minglun Han; Feilong Chen; Jing Shi; Shuang Xu; Bo Xu; | arxiv-cs.CL | 2023-01-30 |
681 | Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, to the best of our knowledge, there isn't a resource that brings together the research perspectives influencing Spoken Language Understanding (SLU) on these speech events. The aim of this article is to survey a breadth of perspectives in a holistic way; i.e. from considering underlying (psycho)linguistic theory, to their annotation and consideration in Automatic Speech Recognition (ASR) and SLU systems, to lastly, their study from a generation standpoint. |
Tanvi Dinkar; Chloé Clavel; Ioana Vasilescu; | arxiv-cs.CL | 2023-01-25 |
682 | From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize the other languages. |
CHAO-HAN HUCK YANG et. al. | arxiv-cs.SD | 2023-01-18 |
683 | Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we investigate the impact of using syllables as subword tokens instead of words in Malayalam ASR, and evaluate the relative improvement in lexicon size, model memory requirement and word error rate. |
Kavya Manohar; A. R. Jayan; Rajeev Rajan; | arxiv-cs.CL | 2023-01-17 |
684 | Using Kaldi for Automatic Speech Recognition of Conversational Austrian German Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ASR experiments with read and conversational Austrian German as target. |
Julian Linke; Saskia Wepner; Gernot Kubin; Barbara Schuppler; | arxiv-cs.CL | 2023-01-16 |
685 | M2ASR-KIRGHIZ: A Free Kirghiz Speech Database and Accompanied Baselines Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep learning has significantly boosted the performance improvement of automatic speech recognition (ASR) with the cooperation of large amounts of data resources. For minority … |
Ikram Mamtimin; Wenqiang Du; A. Hamdulla; | Inf. | 2023-01-16 |
686 | Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Context within the segments produced by ASR decoders can be helpful but limiting in overall punctuation performance for a continuous speech session. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. |
Piyush Behre; Sharman Tan; Padma Varadharajan; Shuangyu Chang; | arxiv-cs.CL | 2023-01-10 |
687 | Exploring A Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We build a single automatic speech recognition (ASR) model for several south Indian languages using a common set of intermediary labels, which can be easily mapped to the desired … |
C. Anoop; A. Ramakrishnan; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
688 | Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper investigates an approach for adapting RNN-Transducer (RNN-T) based automatic speech recognition (ASR) model to improve the recognition of unseen words during training. … |
Sungjun Han; Deepak Baby; Valentin Mendelev; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
689 | Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Luxembourgish is a West Germanic language spoken by roughly 390,000 people, mainly in Luxembourg. It is one of Europe’s under-described and under-resourced languages, not … |
Le-Minh Nguyen; Shekhar Nayak; M. Coler; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
690 | A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) systems need to be accurate, have low latency, and effectively handle language switching in order to be useful for the 60% of the world … |
S. MAVANDADI et. al. | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
691 | Learning Mask Scalars for Improved Robust Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Improving robustness of streaming automatic speech recognition (ASR) systems using neural network based acoustic frontends is challenging because of causality constraints and the … |
A. Narayanan; James Walker; S. Panchapagesan; N. Howard; Yuma Koizumi; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
692 | Context-Aware Neural Confidence Estimation for Rare Word Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Confidence estimation for automatic speech recognition (ASR) is important for many downstream tasks. Recently, neural confidence estimation models (CEMs) have been shown to … |
David Qiu; Tsendsuren Munkhdalai; Yanzhang He; K. Sim; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
693 | ASBERT: ASR-Specific Self-Supervised Learning with Self-Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Pre-training of self-supervised learning (SSL) generally shows a good performance on various speech processing tasks. However, this pre-training scheme may lead to a sub-optimal … |
H. KIM et. al. | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
694 | Interdecoder: Using Attention Decoders As Intermediate Regularization for CTC-Based Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We propose InterDecoder: a new non-autoregressive automatic speech recognition (NAR-ASR) training method that injects the advantage of token-wise autoregressive decoders while … |
Tatsuya Komatsu; Yusuke Fujita; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
695 | SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use. |
Heli Qi; Sashi Novitasari; Andros Tjandra; Sakriani Sakti; Satoshi Nakamura; | arxiv-cs.CL | 2023-01-07 |
696 | Uncovering The Potential for A Weakly Supervised End-to-End Model in Recognising Speech from Patient with Post-Stroke Aphasia Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Post-stroke speech and language deficits (aphasia) significantly impact patients’ quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the … |
Giulia Sanguedolce; P. Naylor; F. Geranmayeh; | Clinical Natural Language Processing Workshop | 2023-01-01 |
697 | Submission of USTC’s System for The IWSLT 2023 – Offline Speech Translation Track Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the submissions of the research group USTC-NELSLIP to the 2023 IWSLT Offline Speech Translation competition, which involves translating spoken English into … |
XINYUAN ZHOU et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
698 | The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the MineTrans English-to-Chinese speech translation systems developed for two challenge tracks of IWSLT 2023, i.e., Offline Speech Translation (S2T) and … |
YICHAO DU et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
699 | QUESPA Submission for The IWSLT 2023 Dialect and Low-resource Speech Translation Tasks IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This article describes the QUESPA team speech translation (ST) submissions for the Quechua to Spanish (QUE–SPA) track featured in the Evaluation Campaign of IWSLT 2023: … |
John E. Ortega; Rodolfo Zevallos; William Chen; | International Workshop on Spoken Language Translation | 2023-01-01 |
700 | Towards Training Bilingual and Code-Switched Speech Recognition Models from Monolingual Data Sources Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multilingual Automatic Speech Recognition (ASR) models are capable of transcribing audio across multiple languages, eliminating the need for separate models. In addition, they … |
Kunal Dhawan; Dima Rekesh; Boris Ginsburg; | ArXiv | 2023-01-01 |
701 | Transfer Learning Using Whisper for Dysarthric Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Siddharth Rathod; Monil Charola; Hemant A. Patil; | International Conference on Speech and Computer | 2023-01-01 |
702 | Listen, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Existing supervised sign language recognition systems rely on an abundance of well-annotated data. Instead, an unsupervised speech-to-sign language recognition (SSR-U) system … |
LIMING WANG et. al. | Annual Meeting of the Association for Computational … | 2023-01-01 |
703 | Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Modern speech recognition systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down … |
Titouan Parcollet; R. V. Dalen; Shucong Zhang; Sourav Bhattacharya; | ArXiv | 2023-01-01 |
704 | SRI-B’s Systems for IWSLT 2023 Dialectal and Low-resource Track: Marathi-Hindi Speech Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the speech translation systems SRI-B developed for the IWSLT 2023 Evaluation Campaign Dialectal and Low-resource track: Marathi-Hindi Speech Translation. We … |
BALAJI RADHAKRISHNAN et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
705 | Query-Efficient Black-Box Adversarial Attacks on Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The susceptibility of Deep Neural Networks (DNNs) to adversarial attacks has raised concerns regarding their practical applications in real-world scenarios. Although the … |
CHUXUAN TONG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
706 | Query-Efficient Adversarial Attack With Low Perturbation Against End-to-End Speech Recognition Systems IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the widespread use of automated speech recognition (ASR) systems in modern consumer devices, attacks against ASR systems have become an attractive topic in recent years. … |
SHEN WANG et. al. | IEEE Transactions on Information Forensics and Security | 2023-01-01 |
707 | JHU IWSLT 2023 Dialect Speech Translation System Description Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents JHU’s submissions to the IWSLT 2023 dialectal and low-resource track of Tunisian Arabic to English speech translation. The Tunisian dialect lacks formal … |
A. HUSSEIN et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
708 | A Deep Diacritics-Based Recognition Model for Arabic Speech: Quranic Verses As Case Study Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Arabic is spoken by more than 422 million people worldwide. Although Classical Arabic is the language of the Quran, which 1.9 billion Muslims are required to recite, limited … |
Sarah S. Alrumiah; Amal A. Al-Shargabi; | IEEE Access | 2023-01-01 |
709 | A CIF-Based Speech Segmentation Method for Streaming E2E ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Long utterance segmentation is crucial in end-to-end (E2E) streaming automatic speech recognition (ASR). However, commonly used voice activity detection (VAD)-based and … |
Yuchun Shu; Haoneng Luo; Shiliang Zhang; Longbiao Wang; J. Dang; | IEEE Signal Processing Letters | 2023-01-01 |
710 | Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Training Automatic Speech Recognition (ASR) systems with sequentially incoming data from alternate domains is an essential milestone in order to reach human intelligibility level … |
Shahram Ghorbani; J. Hansen; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
711 | End-to-End Multi-Modal Speech Recognition on An Air and Bone Conducted Speech Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) has been significantly improved in the past years. However, most robust ASR systems are based on air-conducted (AC) speech, and their … |
Mou-Sheng Wang; Junqi Chen; Xiao-Lei Zhang; S. Rahardja; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
712 | Evaluating and Improving Automatic Speech Recognition Using Severity Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which takes into account only discrepancies at the word level. Although useful, WER is … |
Ryan Whetten; C. Kennington; | Workshop on Biomedical Natural Language Processing | 2023-01-01 |
713 | Recognition of English Speech – Using A Deep Learning Algorithm Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The accurate recognition of speech is beneficial to the fields of machine translation and intelligent human–computer interaction. After briefly introducing speech recognition … |
Shuyan Wang; | Journal of Intelligent Systems | 2023-01-01 |
714 | A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Though speech enhancement (SE) can be used to improve speech quality in noisy environments, it may also cause distortions that degrade the performance of automatic speech … |
Qiu-shi Zhu; J. Zhang; Zitian Zhang; Lirong Dai; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
715 | Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition of a target speaker in the presence of interfering speakers remains a challenging issue. One approach to tackle this problem is target-speaker speech … |
Takafumi Moriya; Hiroshi Sato; Tsubasa Ochiai; Marc Delcroix; T. Shinozaki; | IEEE Access | 2023-01-01 |
716 | AdvDDoS: Zero-Query Adversarial Attacks Against Commercial Speech Recognition Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) has been widely and commercially employed in health care, autonomous vehicles, and finance. Yet, recent studies have shown that universal … |
Yunjie Ge; Lingchen Zhao; Qian Wang; Yiheng Duan; Minxin Du; | IEEE Transactions on Information Forensics and Security | 2023-01-01 |
717 | A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Both unpaired speech and text have been shown to be beneficial for low-resource automatic speech recognition (ASR), which, however, were either separately used for pre-training, … |
Ye Du; J Zhang; Xin Fang; Ming Wu; Zhouwang Yang; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
718 | Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Owing to the linguistic richness of the Arabic language, which contains more than 6000 roots, building a reliable Arabic language model for Arabic speech recognition systems faces … |
Mona A. Azim; Wedad Hussein; N. Badr; | IEEE Access | 2023-01-01 |
719 | Spectral Analysis of EEG Signals for Automatic Imagined Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Brain–computer interface (BCI) systems are intended to provide a means of communication for both the healthy and those suffering from neurological disorders. Imagined speech … |
Ashwin Kamble; P. Ghare; Vinay Kumar; Ashwin Kothari; A. Keskar; | IEEE Transactions on Instrumentation and Measurement | 2023-01-01 |
720 | Speech Recognition for Minority Languages Using HuBERT and Model Adaptation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the field of speech recognition, models and datasets are becoming larger and larger. However, it is difficult to create large datasets for minority languages, which is an … |
Tomohiro Hattori; S. Tamura; | International Conference on Pattern Recognition … | 2023-01-01 |
721 | Augmentation Techniques for Adult-Speech to Generate Child-Like Speech Data Samples at Scale Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Technologies such as Text-To-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) have become important in providing speech-based Artificial Intelligence (AI) solutions … |
Mariam Yiwere; Andrei Barcovschi; Rishabh Jain; H. Cucu; Peter Corcoran; | IEEE Access | 2023-01-01 |
722 | Why Aren’t We NER Yet? Artifacts of ASR Errors in Named Entity Recognition in Spontaneous Speech Transcripts Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transcripts of spontaneous human speech present a significant obstacle for traditional NER models. The lack of grammatical structure of spoken utterances and word errors … |
PIOTR SZYMAŃSKI et. al. | Annual Meeting of the Association for Computational … | 2023-01-01 |
723 | Towards Recognition for Radio-Echo Speech in Air Traffic Control: Dataset and A Contrastive Learning Approach Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the air traffic control (ATC) domain, automatic speech recognition (ASR) suffers from radio speech echo, which cannot be addressed by existing echo cancellation due to … |
YI LIN et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
724 | Indonesian Automatic Speech Recognition with XLSR-53 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study focuses on the development of Indonesian Automatic Speech Recognition (ASR) using the XLSR-53 pre-trained model, the XLSR stands for cross-lingual speech … |
Panji Arisaputra; Amalia Zahra; | ArXiv | 2022-12-31 |
725 | Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. |
GEORGIOS PARASKEVOPOULOS et. al. | arxiv-cs.CL | 2022-12-31 |
726 | Can Visual Context Improve Automatic Speech Recognition for An Embodied Agent? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method to incorporate a robot's visual information into an ASR system and improve the recognition of a spoken utterance containing a visible entity. |
Pradip Pramanick; Chayan Sarkar; | emnlp | 2022-12-30 |
727 | RED-ACE: Robust Error Detection for ASR Using Confidence Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model’s encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. |
Zorik Gekhman; Dina Zverinski; Jonathan Mallinson; Genady Beryozkin; | emnlp | 2022-12-30 |
728 | Towards Relation Extraction from Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new listening information extraction task, i.e., speech relation extraction. |
TONGTONG WU et. al. | emnlp | 2022-12-30 |
729 | Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. |
CHEN WANG et. al. | emnlp | 2022-12-30 |
730 | Memory Augmented Lookup Dictionary Based Language Modeling for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new memory augmented lookup dictionary based Transformer architecture for LM. |
Yukun Feng; Ming Tu; Rui Xia; Chuanzeng Huang; Yuxuan Wang; | arxiv-cs.CL | 2022-12-30 |
731 | SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. |
ZIQIANG ZHANG et. al. | emnlp | 2022-12-30 |
732 | Don’t Be So Sure! Boosting ASR Decoding Via Confidence Relaxation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. |
Tomer Wullach; Shlomo E. Chazan; | arxiv-cs.CL | 2022-12-27 |
733 | Skit-S2I: An Indian Accented Speech to Intent Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we release the Skit-S2I dataset, the first publicly available Indian-accented SLU dataset in the banking domain in a conversational tonality. |
Shangeth Rajaa; Swaraj Dalmia; Kumarmanas Nethil; | arxiv-cs.CL | 2022-12-26 |
734 | I Spy You Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents iSpyU, a system that shows the feasibility of recognition of natural speech content played on a phone during conference calls (Skype, Zoom, etc) using a fusion … |
Shijia Zhang; Yilin Liu; Mahanth K. Gowda; | Proceedings of the ACM on Interactive, Mobile, Wearable and … | 2022-12-21 |
735 | End-to-End Automatic Speech Recognition Model for The Sudanese Dialect Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the field lacks wide support for many widely spoken languages and their dialects, even though most daily conversations are carried out in them. This paper inspects the viability of designing an Automatic Speech Recognition model for the Sudanese dialect, an Arabic dialect whose complexity is a product of historical and social conditions unique to its speakers. |
Ayman Mansour; Wafaa F. Mukhtar; | arxiv-cs.CL | 2022-12-21 |
736 | Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to better utilize the potential power of SSL models, in this work, we explore the effective fusion on multiple SSL models. |
Changli Tang; Yujin Wang; Xie Chen; Wei-Qiang Zhang; | arxiv-cs.SD | 2022-12-20 |
737 | Mu2SLAM: Multitask, Multilingual Speech and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech … |
Yong Cheng; Yu Zhang; Melvin Johnson; Wolfgang Macherey; Ankur Bapna; | ArXiv | 2022-12-19 |
738 | Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. |
Yong Cheng; Yu Zhang; Melvin Johnson; Wolfgang Macherey; Ankur Bapna; | arxiv-cs.CL | 2022-12-19 |
739 | DSTC-11: Speech Aware Task-Oriented Dialog Modeling Track Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Most research on task oriented dialog modeling is based on written text input. However, users interact with practical dialog systems often using speech as input. Typically, … |
H. SOLTAU et. al. | DSTC | 2022-12-16 |
740 | Speech Aware Dialog System Technology Challenge (DSTC11) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These included ASR transcripts, word time stamps, and latent representations of the audio (audio encoder outputs). In this paper, we describe the corpus, report results from participating teams, provide preliminary analyses of their results, and summarize the current state-of-the-art in this domain. |
HAGEN SOLTAU et. al. | arxiv-cs.AI | 2022-12-16 |
741 | BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. |
MINGDA CHEN et. al. | arxiv-cs.CL | 2022-12-16 |
742 | Disentangling Prosody Representations with Unsupervised Speech Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. |
LEYUAN QU et. al. | arxiv-cs.SD | 2022-12-13 |
743 | End-to-End Speech Translation of Arabic to English Broadcast News Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system. |
Fethi Bougares; Salim Jouili; | arxiv-cs.CL | 2022-12-11 |
744 | Improving Speech Recognition with Augmented Synthesized Data and Conditional Model Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With recent advances in end-to-end text-to-speech (TTS), the quality of synthetic data has been significantly improved. Synthesized speech is becoming a feasible alternative to … |
Shaofei Xue; Jian Tang; Yazhu Liu; | 2022 13th International Symposium on Chinese Spoken … | 2022-12-11 |
745 | Efficient Conformer-Based CTC Model for Intelligent Cockpit Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we discuss the rationale of our work for automatic speech recognition (ASR) in the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge and provide a … |
Hanzhi Guo; Yunshu Chen; Xukang Xie; G. Xu; Wei Guo; | 2022 13th International Symposium on Chinese Spoken … | 2022-12-11 |
746 | Adaptive Attention Network with Domain Adversarial Training for Multi-Accent Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Spoken accents severely degrade the performance of automatic speech recognition (ASR) systems. Domain adversarial training (DAT) is widely adopted for generating domain-invariant … |
YANBING YANG et. al. | 2022 13th International Symposium on Chinese Spoken … | 2022-12-11 |
747 | An Automatic Speech Recognition System in Indian and Foreign Languages: A State-of-the-art Review Analysis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech Recognition is one of the prominent research topics in the field of Natural Language Processing (NLP). The Speech Recognition technique removes the barriers and makes the … |
Astha Gupta; Rakesh Kumar; Y. Kumar; | Intell. Decis. Technol. | 2022-12-02 |
748 | Robust Speech Recognition Using Teacher-Student Learning Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Han Ma; Qiaoling Zhang; Roubing Tang; Lu Zhang; Yubo Jia; | IEICE Trans. Inf. Syst. | 2022-12-01 |
749 | Audio Adversarial Detection Through Classification Score on Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View |
Hyung-Min Kwon; Seung-Hun Nam; | Comput. Secur. | 2022-12-01 |
750 | MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In this paper, we propose a novel multi-modal multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR), which employs both … |
XIAOHUAN ZHOU et. al. | ArXiv | 2022-11-29 |
751 | TESSP: Text-Enhanced Self-Supervised Speech Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the distinct pre-training objectives make it challenging to jointly optimize the speech and text representation in the same model. To solve this problem, we propose Text-Enhanced Self-Supervised Speech Pre-training (TESSP), aiming to incorporate the linguistic information into speech pre-training. |
ZHUOYUAN YAO et. al. | arxiv-cs.SD | 2022-11-24 |
752 | Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation … |
INJY HAMED et. al. | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2022-11-22 |
753 | SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited setting. |
RAPHAEL TANG et. al. | arxiv-cs.CL | 2022-11-21 |
754 | CORAA ASR: A Large Corpus of Spontaneous and Prepared Speech Manually Validated for Speech Recognition in Brazilian Portuguese Related Papers Related Patents Related Grants Related Venues Related Experts View |
ARNALDO CANDIDO JUNIOR et. al. | Language Resources and Evaluation | 2022-11-21 |
755 | LongFNT: Long-form Speech Recognition with Factorized Neural Transducer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor and then embeds token-level long-form features inside the vocabulary predictor, with a pre-trained contextual encoder RoBERTa to further boost the performance. |
XUN GONG et. al. | arxiv-cs.SD | 2022-11-17 |
756 | Hey ASR System! Why Aren’t You More Inclusive? Automatic Speech Recognition Systems’ Bias and Proposed Bias Mitigation Techniques. A Literature Review IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These systems do not work equally for everyone and actually hinder the productivity of some users. In this paper, we present research that addresses ASR biases against gender, race, and the sick and disabled, while exploring studies that propose ASR debiasing techniques for mitigating these discriminations. |
Mikel K. Ngueajio; Gloria Washington; | arxiv-cs.CL | 2022-11-17 |
757 | Introducing Semantics Into Speech Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. |
DEREK XU et. al. | arxiv-cs.CL | 2022-11-15 |
758 | Improving Children’s Speech Recognition By Fine-tuning Self-supervised Adult Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage self-supervised adult speech representations and use three well-known child speech corpora to build models for children’s speech recognition. |
Renee Lu; Mostafa Shahin; Beena Ahmed; | arxiv-cs.CL | 2022-11-14 |
759 | The Far Side of Failure: Investigating The Impact of Speech Recognition Errors on Subsequent Dementia Classification Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive … |
Changye Li; T. Cohen; Serguei V. S. Pakhomov; | ArXiv | 2022-11-11 |
760 | Align, Write, Re-order: Explainable End-to-End Speech Translation Via Operation Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A major challenge arises from the fact that translation is a non-monotonic sequence transduction task due to word ordering differences between languages — this clashes with the monotonic nature of ASR. Therefore, we propose to generate ST tokens out-of-order while remembering how to re-order them later. |
Motoi Omachi; Brian Yan; Siddharth Dalmia; Yuya Fujita; Shinji Watanabe; | arxiv-cs.CL | 2022-11-10 |
761 | A Study on The Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? |
YIFAN PENG et. al. | arxiv-cs.CL | 2022-11-10 |
762 | Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. |
Yu Chen; Wen Ding; Junjie Lai; | arxiv-cs.SD | 2022-11-09 |
763 | ATCO2 Corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. |
JUAN ZULUAGA-GOMEZ et. al. | arxiv-cs.CL | 2022-11-08 |
764 | Streaming, Fast and Accurate On-device Inverse Text Normalization for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe the development of an on-device ITN system that is streaming, lightweight & accurate. |
YASHESH GAUR et. al. | arxiv-cs.CL | 2022-11-07 |
765 | Towards Improved Room Impulse Response Estimation for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). |
ANTON RATNARAJAH et. al. | arxiv-cs.SD | 2022-11-07 |
766 | Non-Acoustic Speech Sensing System Based on Flexible Piezoelectric Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech is one of the most important biological signals to complement human-human and human-computer interaction. Traditional speech datasets were collected by air microphones, but … |
SHIJI YUAN et. al. | Proceedings of the 20th ACM Conference on Embedded … | 2022-11-06 |
767 | Global Normalization for Streaming Speech Recognition in A Modular Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. |
EHSAN VARIANI et. al. | nips | 2022-11-06 |
768 | Bridging Speech and Textual Pre-trained Models with Unsupervised ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. |
JIATONG SHI et. al. | arxiv-cs.CL | 2022-11-06 |
769 | Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel framework to finetune the connections of speech SSL models, instead of model weights, to empower efficient multilingual and multitask speech processing. |
YONGGAN FU et. al. | nips | 2022-11-06 |
770 | LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to … |
PEIDONG WANG et. al. | INTERSPEECH 2023 | 2022-11-05 |
771 | LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers. |
PEIDONG WANG et. al. | arxiv-cs.CL | 2022-11-05 |
772 | Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a linguistic perspective, using the French language as a case study toward the disambiguation of French homophones. |
Hannaneh B. Pasandi; Haniyeh B. Pasandi; | arxiv-cs.CL | 2022-11-05 |
773 | Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing … |
XIN ZHANG et. al. | ArXiv | 2022-11-04 |
774 | Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the performance of features at different layers of a foundation model on the speech recognition task and propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. |
ZHOUYUAN HUO et. al. | arxiv-cs.LG | 2022-11-04 |
775 | H_eval: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose H_eval, a new hybrid evaluation metric for ASR systems that considers both semantic correctness and error rate, and performs well in scenarios where WER and SD perform poorly. |
Zitha Sasindran; Harsha Yelchuri; T. V. Prabhakar; Supreeth Rao; | arxiv-cs.CL | 2022-11-03 |
776 | Probing Statistical Representations For End-To-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: End-to-End automatic speech recognition (ASR) models aim to learn a generalised speech representation to perform recognition. |
Anna Ollerenshaw; Md Asif Jalal; Thomas Hain; | arxiv-cs.CL | 2022-11-03 |
777 | The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC). |
AO ZHANG et. al. | arxiv-cs.SD | 2022-11-03 |
778 | Towards Zero-Shot Code-Switched Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. |
Brian Yan; Matthew Wiesner; Ondrej Klejch; Preethi Jyothi; Shinji Watanabe; | arxiv-cs.CL | 2022-11-02 |
779 | Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims to enhance the practical usage of speech SSL models towards a win-win in both enhanced efficiency and alleviated overfitting via our proposed S$^3$-Router framework, which for the first time discovers that simply discarding no more than 10\% of model weights via only finetuning model connections of speech SSL models can achieve better accuracy over standard weight finetuning on downstream speech processing tasks. |
YONGGAN FU et. al. | arxiv-cs.LG | 2022-11-02 |
780 | Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When training data is lacking in ASR, a large-scale pretraining and fine-tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. |
Lester Phillip Violeta; Ding Ma; Wen-Chin Huang; Tomoki Toda; | arxiv-cs.SD | 2022-11-02 |
781 | Conversation-oriented ASR with Multi-look-ahead CBS Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In streaming ASR, high accuracy is achieved by attending to look-ahead frames, which increases latency. To tackle this trade-off, we propose a multi-latency streaming ASR that achieves high accuracy with zero look-ahead. |
HUAIBO ZHAO et. al. | arxiv-cs.SD | 2022-11-01 |
782 | Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a method to jointly train the ASR and EP tasks in a single end-to-end (E2E) multitask model, improving EP quality by optionally leveraging information from the ASR audio encoder. |
SHAAN BIJWADIA et. al. | arxiv-cs.SD | 2022-11-01 |
783 | An Online Intelligent Electronic Medical Record System Via Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Traditional electronic medical record systems in hospitals rely on healthcare workers to manually enter patient information, resulting in healthcare workers having to spend a … |
Xin Xia; Yunlong Ma; Ye Luo; Jianwei Lu; | International Journal of Distributed Sensor Networks | 2022-11-01 |
784 | Improving Vietnamese Accent Recognition Using ASR Transfer Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Accent Recognition (AR) is a critical task in voice-controlled systems. If accent information is known in advance, voice-controlled systems can switch to a suitable … |
Bao Thang Ta; Xuan Vuong Dang; Quang Tien Duong; Nhat Minh Le; Van Hai Do; | 2022 25th Conference of the Oriental COCOSDA International … | 2022-11-01 |
785 | Structured State Space Decoder for Speech Recognition and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks by comparing it with the Transformer decoder. |
Koichi Miyazaki; Masato Murata; Tomoki Koriyama; | arxiv-cs.SD | 2022-10-31 |
786 | Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present our Joint Audio/Text training method for Transformer Rescorer, to leverage unpaired text-only data which is relatively cheaper than paired audio-text data. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2022-10-31 |
787 | On The Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Interacting with a speech interface to query a Question Answering (QA) system is becoming increasingly popular. Typically, QA systems rely on passage retrieval to select candidate … |
Georgios Sidiropoulos; Svitlana Vakulenko; Evangelos Kanoulas; | cikm | 2022-10-29 |
788 | XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a novel linear transformer by examining the properties of the key-query product within self-attentions. |
Roshan Sharma; Bhiksha Raj; | arxiv-cs.CL | 2022-10-29 |
789 | Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the transcription process and the development of a Kiswahili speech corpus, which includes both read-out texts and spontaneous speech data from native Kiswahili speakers. |
EBBIE AWINO et. al. | arxiv-cs.CL | 2022-10-29 |
790 | Filter and Evolve: Progressive Pseudo Label Refining for Semi-supervised Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Fine-tuning self-supervised pre-trained models using pseudo-labels can effectively improve speech recognition performance. However, low-quality pseudo-labels can misguide decision … |
ZEZHONG JIN et. al. | ArXiv | 2022-10-28 |
791 | Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. |
TAKAAKI SAEKI et. al. | arxiv-cs.SD | 2022-10-27 |
792 | Automatic Severity Classification of Dysarthric Speech By Using Self-supervised Model with Multi-task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using the self-supervised model in conjunction with multi-task learning. |
Eun Jung Yeo; Kwanghee Choi; Sunhee Kim; Minhwa Chung; | arxiv-cs.CL | 2022-10-27 |
793 | SAN: A Robust End-to-end ASR Model Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. |
Zeping Min; Qian Ge; Guanhua Huang; | arxiv-cs.SD | 2022-10-27 |
794 | Improving Speech-to-Speech Translation Through Unlabeled Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data to improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
XUAN-PHI NGUYEN et. al. | arxiv-cs.CL | 2022-10-26 |
795 | End-to-End Speech to Intent Prediction to Improve E-commerce Customer Support Voicebot in Hindi and English Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automation of on-call customer support relies heavily on accurate and efficient speech-to-intent (S2I) systems. Building such systems using multi-component pipelines can pose … |
Abhinav Goyal; Ashutosh Kumar Singh; Nikesh Garera; | ArXiv | 2022-10-26 |
796 | Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite joining four models into one, our unified tagging approach matches or outperforms task-specific models across all four tasks on benchmark test sets across several domains. |
Sharman Tan; Piyush Behre; Nick Kibre; Issac Alphonso; Shuangyu Chang; | arxiv-cs.CL | 2022-10-26 |
797 | ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To promote the development of multi-domain speech systems, we introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition (ASR) system across a broad set of speech datasets. |
Sanchit Gandhi; Patrick von Platen; Alexander M. Rush; | arxiv-cs.CL | 2022-10-24 |
798 | Investigating Self-supervised, Weakly Supervised and Fully Supervised Training Approaches for Multi-domain Automatic Speech Recognition: A Study on Bangladeshi Bangla Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the robustness of the state-of-the-art transfer learning approaches such as self-supervised wav2vec 2.0 and weakly supervised Whisper as well as fully supervised convolutional neural networks (CNNs) for multi-domain ASR. |
AHNAF MOZIB SAMIN et. al. | arxiv-cs.CL | 2022-10-23 |
799 | Beyond Subtitles: Captioning and Visualizing Non-speech Sounds to Improve Accessibility of User-Generated Videos Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Captioning provides access to sounds in audio-visual content for people who are Deaf or Hard-of-hearing (DHH). As user-generated content in online videos grows in prevalence, … |
Oliver Alonzo; Hijung Valentina Shin; Dingzeyu Li; | Proceedings of the 24th International ACM SIGACCESS … | 2022-10-22 |
800 | Guided Contrastive Self-supervised Pre-training for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC). |
Aparna Khare; Minhua Wu; Saurabhchand Bhati; Jasha Droppo; Roland Maas; | arxiv-cs.CL | 2022-10-21 |
801 | Deep LSTM Spoken Term Detection Using Wav2Vec 2.0 Recognizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use the Wav2Vec speech recognizer in the task of spoken term detection over a large set of spoken documents. |
Jan Švec; Jan Lehečka; Luboš Šmídl; | arxiv-cs.CL | 2022-10-21 |
802 | A Textless Metric for Speech-to-Speech Comparison Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. |
Laurent Besacier; Swen Ribeiro; Olivier Galibert; Ioan Calapodescu; | arxiv-cs.CL | 2022-10-21 |
803 | Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. |
THIEN NGUYEN et. al. | arxiv-cs.SD | 2022-10-21 |
804 | End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel end-to-end architecture by integrating dereverberation, beamforming, SSLR, and ASR within a single neural network. |
Yoshiki Masuyama; Xuankai Chang; Samuele Cornell; Shinji Watanabe; Nobutaka Ono; | arxiv-cs.SD | 2022-10-19 |
805 | Throat Microphone Speech Recognition Using Wav2vec 2.0 and Feature Mapping Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Throat microphones can record the voice while simultaneously suppressing the impact of external noise. This work aims to improve speech recognition performance using throat … |
Kohta Masuda; J. Ogata; M. Nishida; M. Nishimura; | 2022 IEEE 11th Global Conference on Consumer Electronics … | 2022-10-18 |
806 | HMM Vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). |
Tina Raissi; Wei Zhou; Simon Berger; Ralf Schlüter; Hermann Ney; | arxiv-cs.SD | 2022-10-18 |
807 | Maestro-U: Leveraging Joint Speech-text Representation Learning for Zero Supervised Speech ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that a modality-matched joint speech and text model can be leveraged to train a massively multilingual ASR model without any supervised (manually transcribed) speech for some languages. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2022-10-18 |
808 | Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, only a small amount of transcribed and aligned CS speech is available. To overcome this problem and train multilingual systems that can transcribe CS speech, we propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated. |
Enes Yavuz Ugan; Christian Huber; Juan Hussain; Alexander Waibel; | arxiv-cs.CL | 2022-10-17 |
809 | Experiments on Turkish ASR with Self-Supervised Speech Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this report, we present our findings on Turkish ASR with speech representation learning using HUBERT. |
Ali Safaya; Engin Erzin; | arxiv-cs.CL | 2022-10-13 |
810 | Summary on The ISCSLP 2022 Chinese-English Code-Switching ASR Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we will describe the datasets, the associated baselines system and the requirements, and summarize the CSASR challenge results and major techniques and tricks used in the submitted systems. |
SHUHAO DENG et. al. | arxiv-cs.CL | 2022-10-12 |
811 | A Context-aware Knowledge Transferring Strategy for CTC-based ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the challenge, we propose a context-aware knowledge transferring strategy, consisting of a knowledge transferring module and a context-aware training strategy, for CTC-based ASR. |
Ke-Han Lu; Kuan-Yu Chen; | arxiv-cs.CL | 2022-10-12 |
812 | Language Identification-Based Evaluation of Single Channel Speech Separation of Overlapped Speeches Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In multi-lingual, multi-speaker environments (e.g., international conference scenarios), speech, language, and background sounds can overlap. In real-world scenarios, source … |
Zuhragvl Aysa; Mijit Ablimit; Hankiz Yilahun; A. Hamdulla; | Inf. | 2022-10-11 |
813 | An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora. |
Chao-Han Huck Yang; I-Fan Chen; Andreas Stolcke; Sabato Marco Siniscalchi; Chin-Hui Lee; | arxiv-cs.SD | 2022-10-11 |
814 | Streaming Punctuation for Long-form Dictation with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. |
Piyush Behre; Sharman Tan; Padma Varadharajan; Shuangyu Chang; | arxiv-cs.CL | 2022-10-11 |
815 | Automatic Speech Recognition of Low-Resource Languages Based on Chukchi Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The following paper presents a project focused on the research and creation of a new Automatic Speech Recognition (ASR) system for the Chukchi language. |
Anastasia Safonova; Tatiana Yudina; Emil Nadimanov; Cydnie Davenport; | arxiv-cs.CL | 2022-10-11 |
816 | CTC Alignments Improve Autoregressive Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework wherein CTC’s core properties can counteract several key weaknesses of pure-attention models during training and decoding. |
BRIAN YAN et. al. | arxiv-cs.CL | 2022-10-11 |
817 | A Platform for Deploying The TFE Ecosystem of Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Since data regulations such as the European Union’s General Data Protection Regulation (GDPR) have taken effect, the traditional two-step Automatic Speech Recognition (ASR) … |
YUANFENG SONG et. al. | Proceedings of the 30th ACM International Conference on … | 2022-10-10 |
818 | Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides an overall introduction of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. |
LEI WANG et. al. | arxiv-cs.CL | 2022-10-07 |
819 | Pronunciation Modeling of Foreign Words for Mandarin ASR By Considering The Effect of Language Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on examining the phonetic effect of language transfer in automatic speech recognition. |
Lei Wang; Rong Tong; | arxiv-cs.CL | 2022-10-07 |
820 | Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose several techniques such as a limited training strategy and regularized adapter modules for the Transducer encoder, prediction, and joiner network. |
Somshubra Majumdar; Shantanu Acharya; Vitaly Lavrukhin; Boris Ginsburg; | arxiv-cs.SD | 2022-10-06 |
821 | JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a … |
Mayumi Ohta; Julia Kreutzer; Stefan Riezler; | arxiv-cs.CL | 2022-10-05 |
822 | Code-Switching Without Switching: Language Agnostic End-to-End Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we eliminate the need for that, by treating speech recognition and translation as one unified end-to-end speech translation problem. |
Christian Huber; Enes Yavuz Ugan; Alexander Waibel; | arxiv-cs.CL | 2022-10-04 |
823 | IoT Device Control with Offline Automatic Speech Recognition on Edge Device Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) on edge devices is still barely used in industry. Most ASR systems, such as speech-to-text, commonly depend on network presence. This is the … |
Panji Setiawan; Rahadian Yusuf; | 2022 12th International Conference on System Engineering … | 2022-10-03 |
824 | Tamil Speech Recognition Using XLSR Wav2Vec2.0 & CTC Algorithm Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition is a promising research topic with lots of real-world applications like virtual assistants, aids for the physically challenged, etc. Tamil language speech … |
A. Akhilesh; Brinda P; Keerthana S; Deepa Gupta; Susmitha Vekkot; | 2022 13th International Conference on Computing … | 2022-10-03 |
825 | Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end automatic speech recognition (ASR) has become a popular alternative to traditional module-based systems, simplifying the model-building process with a single deep … |
Yosuke Higuchi; Niko Moritz; J. Le Roux; Takaaki Hori; | IEEE Journal of Selected Topics in Signal Processing | 2022-10-01 |
826 | Towards Better Domain Adaptation for Self-Supervised Models: A Case Study of Child ASR IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods … |
Ruchao Fan; Yunzheng Zhu; Jinhan Wang; A. Alwan; | IEEE Journal of Selected Topics in Signal Processing | 2022-10-01 |
827 | Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). |
CHENDONG ZHAO et. al. | arxiv-cs.CL | 2022-09-29 |
828 | DAMO-NLP at NLPCC-2022 Task 2: Knowledge Enhanced Robust NER for Speech Entity Linking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach called Knowledge Enhanced Named Entity Recognition (KENER), which focuses on improving robustness through painlessly incorporating proper knowledge in the entity recognition stage and thus improving the overall performance of entity linking. |
SHEN HUANG et. al. | arxiv-cs.CL | 2022-09-27 |
829 | Unsupervised Domain Adaptation for Speech Recognition with Unsupervised Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an unsupervised error correction method for unsupervised ASR domain adaption, aiming to recover transcription errors caused by domain mismatch. |
Long Mai; Julie Carson-Berndsen; | arxiv-cs.SD | 2022-09-24 |
830 | A Russian Continuous Speech Recognition System Based on The DTW Algorithm Under Artificial Intelligence Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To improve continuous speech recognition, this paper applies the DTW algorithm to construct a Russian continuous speech recognition system and proposes a … |
Chunping Yu; Xin Eric Wang; | J. Robotics | 2022-09-19 |
831 | Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Evaluating automatic speech recognition (ASR) systems is a classical but difficult and still open problem, which often boils down to focusing only on the word error rate (WER). … |
Thibault Bañeras-Roux; Mickael Rouvier; Jane Wottawa; Richard Dufour; | Interspeech | 2022-09-18 |
832 | Reducing Multilingual Context Confusion for End-to-end Code-switching Automatic Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-switching deals with alternating languages in the communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially … |
SHUAI ZHANG et. al. | Interspeech | 2022-09-18 |
833 | Articulatory Synthesis for Data Augmentation in Phoneme Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While numerous studies on automatic speech recognition have been published in recent years describing data augmentation strategies based on time or frequency domain signal … |
P. K. KRUG et. al. | Interspeech | 2022-09-18 |
834 | Mitigating Bias Against Non-native Accents IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists …
Yuanyuan Zhang; Yixuan Zhang; B. Halpern; T. Patel; O. Scharenborg; | Interspeech | 2022-09-18 |
835 | Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine learning (ML) and Deep Neural Networks (DNN) have greatly aided the problem of Automatic Speech Recognition (ASR). However, accurate ASR for dysarthric speech remains a …
Chitralekha Bhat; Ashish Panda; H. Strik; | Interspeech | 2022-09-18 |
836 | Finer-grained Modeling Units-based Meta-Learning for Low-resource Tibetan Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Tibetan is a typical under-resourced language due to its relatively smaller population. Although a character-based end-to-end (E2E) automatic speech recognition (ASR) model with … |
Siqing Qin; Longbiao Wang; Sheng Li; Yuqin Lin; J. Dang; | Interspeech | 2022-09-18 |
837 | Enhancing Speech Privacy with Slicing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Privacy preservation calls for anonymization methods which hide the speaker’s identity in speech signals while minimizing the impact on downstream tasks such as automatic speech …
MOHAMED MAOUCHE et. al. | Interspeech | 2022-09-18 |
838 | External Text Based Data Augmentation for Low-Resource Speech Recognition in The Constrained Condition of OpenASR21 Challenge Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes our USTC NELSLIP system submitted to the Open Automatic Speech Recognition (OpenASR21) Challenge for the Constrained condition, where only a 10-hour speech … |
GUOLONG ZHONG et. al. | Interspeech | 2022-09-18 |
839 | Incremental Learning for RNN-Transducer Based Speech Recognition Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper investigates an incremental learning framework for a real-world voice assistant employing an RNN-Transducer based automatic speech recognition (ASR) model. Such a model …
Deepak Baby; Pasquale D’Alterio; Valentin Mendelev; | Interspeech | 2022-09-18 |
840 | Preventing Sensitive-word Recognition Using Self-supervised Learning to Preserve User-privacy for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Smart voice assistants that rely on automatic speech recognition (ASR) are widely used by people for multiple reasons. These devices, however, feature “always on” microphones that … |
Yuchen Liu; Apu Kapadia; D. Williamson; | Interspeech | 2022-09-18 |
841 | Generalized Keyword Spotting Using ASR Embeddings IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Keyword Spotting (KWS) detects a set of pre-defined spoken keywords. Building a KWS system for an arbitrary set requires massive training datasets. We propose to use the text … |
K. R.; V. Kurmi; Vinay Namboodiri; C. V. Jawahar; | Interspeech | 2022-09-18 |
842 | End-to-End Dependency Parsing of Spoken French Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Research efforts in syntactic parsing have focused on written texts. As a result, speech parsing is usually performed on transcriptions, either in unrealistic settings (gold … |
Adrien Pupier; Maximin Coavoux; B. Lecouteux; Jérôme Goulian; | Interspeech | 2022-09-18 |
843 | Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Conventional Automatic Speech Recognition (ASR) systems are susceptible to dialect variations within a language, thereby adversely affecting the ASR. Therefore, the current … |
Aditya Yadavalli; Ganesh S Mirishkar; A. Vuppala; | Interspeech | 2022-09-18 |
844 | Global RNN Transducer Models For Multi-dialect Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Constructing single, unified automatic speech recognition (ASR) models that work effectively across various dialects of a language is a challenging problem. Although many recently … |
TAKASHI FUKUDA et. al. | Interspeech | 2022-09-18 |
845 | Improving ASR Robustness in Noisy Condition Through VAD Integration Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) systems are often deployed together with a voice activity detection (VAD) system to run ASR only on the voiced acoustic signals. Although it can … |
Sashi Novitasari; Takashi Fukuda; Gakuto Kurata; | Interspeech | 2022-09-18 |
846 | End-to-End Spontaneous Speech Recognition Using Disfluency Labeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Spontaneous speech often contains disfluent acoustic features such as fillers and hesitations, which are major causes of errors during automatic speech recognition (ASR). In this … |
KOHARU HORII et. al. | Interspeech | 2022-09-18 |
847 | OpenASR21: The Second Open Challenge for Automatic Speech Recognition of Low-Resource Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In 2021, the National Institute of Standards and Technology (NIST), in cooperation with the Intelligence Advanced Research Project Activity (IARPA), conducted OpenASR21, the … |
Kay Peterson; Audrey Tong; Yan Yu; | Interspeech | 2022-09-18 |
848 | Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The performance of automatic speech recognition (ASR) systems strongly deteriorates when the desired speech signal is contaminated with room reverberation and when the speech of … |
Mieszko Fraś; Marcin Witkowski; K. Kowalczyk; | Interspeech | 2022-09-18 |
849 | Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained … |
Geoffroy Vanderreydt; François Remy; Kris Demuynck; | Interspeech | 2022-09-18 |
850 | Ant Multilingual Recognition System for OLR 2021 Challenge Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a comprehensive description of the Ant multilingual recognition system for the 6th Oriental Language Recognition (OLR 2021) Challenge. Inspired by the transfer …
Anqi Lyu; Zhiming Wang; Huijia Zhu; | Interspeech | 2022-09-18 |
851 | Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR By Fusing Speech Generation Methods Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Out-of-vocabulary (OOV) is a common problem for end-to-end (E2E) ASR. For code-switching (CS), the OOV problem on the embedded language is further aggravated and becomes a … |
LINGXUAN YE et. al. | Interspeech | 2022-09-18 |
852 | Prompt-based Re-ranking Language Model for ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In Automatic Speech Recognition (ASR), language model re-ranking based on unlabeled text can improve performance and realize flexible scene adaptation. The scheme of ASR …
Mengxi Nie; Ming Yan; Caixia Gong; | Interspeech | 2022-09-18 |
853 | Gram Vaani ASR Challenge on Spontaneous Telephone Speech Recordings in Regional Variations of Hindi IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the corpus and baseline systems for the Gram Vaani Automatic Speech Recognition (ASR) challenge in regional variations of Hindi. The corpus for this challenge … |
ANISH BHANUSHALI et. al. | Interspeech | 2022-09-18 |
854 | Analysis of The Effect of Audio Data Augmentation Techniques on Phone Digit Recognition For Algerian Arabic Dialect Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this study, we describe a solution for dealing with the problem of data scarcity in Speech Processing tasks involving low-resource languages, including Automatic Speech … |
Khaled Lounnas; Mohamed Lichouri; Mourad Abbas; | 2022 International Conference on Advanced Aspects of … | 2022-09-17 |
855 | Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that modern ASR architectures, specifically ones based on Self-Supervised Learning, are in fact vulnerable to transferability. |
Raphael Olivier; Hadi Abdullah; Bhiksha Raj; | arxiv-cs.LG | 2022-09-17 |
856 | MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a MVNet consisted of a memory assistance module which improves the performance of downstream ASR and a vocal reinforcement module which boosts the performance of ASV. |
JIANRONG WANG et. al. | arxiv-cs.SD | 2022-09-15 |
857 | Non-Parallel Voice Conversion for ASR Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that voice conversion can be used as a data augmentation technique to improve ASR performance, even on LibriSpeech, which contains 2,456 speakers. |
GARY WANG et. al. | arxiv-cs.SD | 2022-09-14 |
858 | A Hybrid TDNN-HMM Automatic Speech Recognizer for Filipino Children’s Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Previous studies presented in the literature in the recent years have shown the feasibility of developing an automatic speech recognition (ASR) system for Filipino-speaking … |
John Andrew Y. Ing; Ronald M. Pascual; Francis D. Dimzon; | 2022 IEEE International Conference on Artificial … | 2022-09-13 |
859 | Bengali Speech Recognition: An Overview Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study outlines the notable efforts of creating of automatic speech recognition (ASR) system in Bengali. It describes data from the Bengali language’s existing voice corpus … |
MASHUK AREFIN PRANJOL et. al. | 2022 IEEE International Conference on Artificial … | 2022-09-13 |
860 | Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although over 300M around the world speak Bangla, scant work has been done in improving Bangla voice-to-text transcription due to Bangla being a low-resource language. However, … |
Mohammed Rakib; Md. Ismail Hossain; Nabeel Mohammed; F. Rahman; | Proceedings of the 2023 12th International Conference on … | 2022-09-13 |
861 | Automatic Speech Recognition Systems: A Survey of Discriminative Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View |
Amrit Kaur; Ashutosh Kumar Singh; Rohit Sachdeva; Vinay Kukreja; | Multimedia Tools and Applications | 2022-09-09 |
862 | Conversion of Acoustic Signal (Speech) Into Text By Digital Filter Using Natural Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we created an interface that transforms speech and other auditory inputs into text using a digital filter. |
Abhiram Katuri; Sindhu Salugu; Gelli Tharuni; Challa Sri Gouri; | arxiv-cs.AI | 2022-09-09 |
863 | Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present a new way to group multiple low-resource locales together and optimize the performance of Multilingual Transformer LMs in ASR. |
Li Miao; Jian Wu; Piyush Behre; Shuangyu Chang; Sarangarajan Parthasarathy; | arxiv-cs.CL | 2022-09-08 |
864 | Distilling The Knowledge of BERT for CTC-based ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR. |
Hayato Futami; Hirofumi Inaguma; Masato Mimura; Shinsuke Sakai; Tatsuya Kawahara; | arxiv-cs.CL | 2022-09-05 |
865 | Predict-and-Update Network: Audio-Visual Speech Recognition Inspired By Human Speech Perception IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Human studies suggest that visual signal primes the listener in advance as to when and on which frequency to attend to. We propose a Predict-and-Update Network (P&U net), to simulate such a visual cueing mechanism for Audio-Visual Speech Recognition (AVSR). |
Jiadong Wang; Xinyuan Qian; Haizhou Li; | arxiv-cs.MM | 2022-09-05 |
866 | Deep Sparse Conformer for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We improve Conformer’s long-sequence representation ability in two directions, \emph{sparser} and \emph{deeper}. |
Xianchao Wu; | arxiv-cs.CL | 2022-09-01 |
867 | Visual Speech Recognition in A Driver Assistance System IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Visual speech recognition or automated lip-reading is a field of growing attention. Video data proved its usefulness in multimodal speech recognition, especially when acoustic … |
D. Ivanko; D. Ryumin; Alexey Kashevnik; A. Axyonov; Alexey Karpov; | 2022 30th European Signal Processing Conference (EUSIPCO) | 2022-08-29 |
868 | DualVoice: Speech Interaction That Discriminates Between Normal and Whispered Voice Input Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Interactions based on automatic speech recognition (ASR) have become widely used, with speech input being increasingly utilized to create documents. However, as there is no easy … |
J. Rekimoto; | Proceedings of the 35th Annual ACM Symposium on User … | 2022-08-22 |
869 | Audio-Driven Deformation Flow for Effective Lip Reading Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Lip reading, also known as visual speech recognition (VSR), is the task to recognize the speech content using only the visual modality. Inspired by the natural synchronization … |
Dalu Feng; Shuang Yang; S. Shan; Xilin Chen; | 2022 26th International Conference on Pattern Recognition … | 2022-08-21 |
870 | Synthesising Audio Adversarial Examples for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the first time, we propose the Speech Synthesising based Attack (SSA), a novel threat model that constructs audio adversarial examples entirely from scratch, i.e., without depending on any existing audio to fool cutting-edge ASR models. To this end, we introduce a conditional variational auto-encoder (CVAE) as the speech synthesiser. |
XINGHUA QU et. al. | kdd | 2022-08-12 |
871 | ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by these challenges, in this paper we use a cloud based framework for production systems to demonstrate insights from privacy preserving incremental learning for automatic speech recognition (ILASR). |
GOPINATH CHENNUPATI et. al. | kdd | 2022-08-12 |
872 | Thai Wav2Vec2.0 with CommonVoice V8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, most of the Thai ASR models are closed-sourced, and the performance of existing open-sourced models lacks robustness. To address this problem, we train a new ASR model on a pre-trained XLSR-Wav2Vec model with the Thai CommonVoice corpus V8 and train a trigram language model to boost the performance of our ASR model. |
Wannaphong Phatthiyaphaibun; Chompakorn Chaksangchaichot; Peerat Limkonchotiwat; Ekapol Chuangsuwanich; Sarana Nutanong; | arxiv-cs.CL | 2022-08-09 |
873 | Large Vocabulary Speech Recognition for Languages of Africa: Multilingual Modeling and Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data and collected data for 15 languages, and trained experimental models using these techniques. |
SANDY RITCHIE et. al. | arxiv-cs.CL | 2022-08-05 |
874 | Automatic Speech Recognition in German: A Detailed Error Analysis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The amount of freely available systems for automatic speech recognition (ASR) based on neural networks is growing steadily, with equally increasingly reliable predictions. … |
Johannes Wirth; R. Peinl; | 2022 IEEE International Conference on Omni-layer … | 2022-08-01 |
875 | Self-conducted Speech Audiometry Using Automatic Speech Recognition: Simulation Results for Listeners with Hearing Loss Related Papers Related Patents Related Grants Related Venues Related Experts View |
Jasper Ooster; Laura Tuschen; B. Meyer; | Comput. Speech Lang. | 2022-08-01 |
876 | Global Performance Disparities Between English-Language Accents in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand the discussion beyond bias as a function of the individual national origin of the speaker to look for bias as a function of the geopolitical orientation of their nation of origin. |
Alex DiChristofano; Henry Shuster; Shefali Chandra; Neal Patwari; | arxiv-cs.CL | 2022-08-01 |
877 | Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: Cumulating Speech Resources in A Pluricentric Language IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Wei Xue; C. Cucchiarini; R. Hout; H. Strik; | Speech Commun. | 2022-08-01 |
878 | Pronunciation-aware Unique Character Encoding for RNN Transducer-based Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to use a novel pronunciation-aware unique character encoding for building E2E RNN-T-based Mandarin ASR systems. |
Peng Shen; Xugang Lu; Hisashi Kawai; | arxiv-cs.CL | 2022-07-29 |
879 | Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new approach to perform unsupervised fine-tuning and self-training using unlabeled speech data for recurrent neural network (RNN)-Transducer (RNN-T) end-to-end (E2E) automatic speech recognition (ASR) systems. |
Cong-Thanh Do; Mohan Li; Rama Doddipatla; | arxiv-cs.CL | 2022-07-29 |
880 | Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents our efforts to build a robust ASR model for the shared task Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022). |
Alef Iury Siqueira Ferreira; Gustavo dos Reis Oliveira; | arxiv-cs.CL | 2022-07-28 |
881 | Automatic Speech Recognition Using Limited Vocabulary: A Survey IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support … |
Jean Louis Fendji Kedieng Ebongue; D. Tala; B. Yenke; M. Atemkeng; | Applied Artificial Intelligence | 2022-07-25 |
882 | Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this article, gesture recognition and speech recognition applications are implemented on embedded systems with Tiny Machine Learning (TinyML). The main benefit of using TinyML …
V. VISWANATHA et. al. | ArXiv | 2022-07-23 |
883 | Toward Fairness in Speech Recognition: Discovery and Mitigation of Performance Disparities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we report on initial findings with both discovery and mitigation of performance disparities using data from a product-scale AI assistant speech recognition system. |
PRANAV DHERAM et. al. | arxiv-cs.CL | 2022-07-22 |
884 | ASR Error Detection Via Audio-Transcript Entailment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. |
Nimshi Venkat Meripo; Sandeep Konam; | arxiv-cs.CL | 2022-07-21 |
885 | When Is TTS Augmentation Through A Pivot Language Useful? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an alternative: produce synthetic audio by running text from the target language through a trained TTS system for a higher-resource pivot language. |
Nathaniel Robinson; Perez Ogayo; Swetha Gangu; David R. Mortensen; Shinji Watanabe; | arxiv-cs.CL | 2022-07-20 |
886 | ASRTest: Automated Testing for Deep-neural-network-driven Speech Recognition Systems IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the rapid development of deep neural networks and end-to-end learning techniques, automatic speech recognition (ASR) systems have been deployed into our daily life and assist in …
Pin Ji; Yang Feng; Jia Liu; Zhihong Zhao; Zhenyu Chen; | Proceedings of the 31st ACM SIGSOFT International Symposium … | 2022-07-18 |
887 | Self-supervised Learning with Random-projection Quantizer for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple and effective self-supervised learning approach for speech recognition. |
Chung-Cheng Chiu; James Qin; Yu Zhang; Jiahui Yu; Yonghui Wu; | icml | 2022-07-15 |
888 | Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language that has a smaller data size than … |
K. Wongpatikaseree; Sattaya Singkul; Narit Hnoohom; Sumeth Yuenyong; | Big Data Cogn. Comput. | 2022-07-15 |
889 | Data Augmentation for Low-Resource Quechua ASR Improvement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe our data augmentation approach to improve the results of ASR models for low-resource and agglutinative languages. |
Rodolfo Zevallos; Nuria Bel; Guillermo Cámbara; Mireia Farrús; Jordi Luque; | arxiv-cs.SD | 2022-07-14 |
890 | Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature Enhancement module (V-CAFE) to enhance the input noisy audio speech with a help of audio-visual correspondence. |
Joanna Hong; Minsu Kim; Daehun Yoo; Yong Man Ro; | arxiv-cs.SD | 2022-07-13 |
891 | Huqariq: A Multilingual Speech Corpus of Native Languages of Peru For Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Huqariq corpus is a multilingual collection of speech from native Peruvian languages. The transcribed corpus is intended for the research and development of speech …
Rodolfo Zevallos; Luis Camacho; Nelsi Melgarejo; | ArXiv | 2022-07-12 |
892 | Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to verify the quality of the corpus, we present speech recognition experiments using 220 hours of fully transcribed audio. |
Rodolfo Zevallos; Luis Camacho; Nelsi Melgarejo; | arxiv-cs.CL | 2022-07-12 |
893 | Multi-Task Conformer with Multi-Feature Combination for Speech Emotion Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Along with automatic speech recognition, many researchers have been actively studying speech emotion recognition, since emotion information is as crucial as the textual … |
Jiyoung Seo; Bowon Lee; | Symmetry | 2022-07-12 |
894 | Speaker Consistency Loss and Step-wise Optimization for Semi-supervised Joint Training of TTS and ASR Using Unpaired Text Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the semi-supervised joint training of text to speech (TTS) and automatic speech recognition (ASR), where a small amount of paired data and a large amount of unpaired text data are available. |
Naoki Makishima; Satoshi Suzuki; Atsushi Ando; Ryo Masumura; | arxiv-cs.SD | 2022-07-11 |
895 | Online Continual Learning of End-to-End Speech Recognition Models IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic … |
Muqiao Yang; I. Lane; Shinji Watanabe; | Interspeech | 2022-07-11 |
896 | Non-Autoregressive Chinese ASR Error Correction with Phonological Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the errors introduced by ASR systems will impair the performance of downstream tasks, we introduce a post-processing error correction method, PhVEC, to correct errors in text space. |
Zheng Fang; Ruiqing Zhang; Zhongjun He; Hua Wu; Yanan Cao; | naacl | 2022-07-09 |
897 | Investigating The Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities. |
Muhammad Umar Farooq; Thomas Hain; | arxiv-cs.CL | 2022-07-07 |
898 | Improving Transformer-based Conversational ASR By Inter-Sentential Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to explicitly model the inter-sentential information in a Transformer based end-to-end architecture for conversational speech recognition. |
Kun Wei; Pengcheng Guo; Ning Jiang; | arxiv-cs.SD | 2022-07-02 |
899 | Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR. |
Guangzhi Sun; Chao Zhang; Philip C. Woodland; | arxiv-cs.SD | 2022-07-02 |
900 | Adversarial Example Attacks Against ASR Systems: An Overview Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the development of hardware and algorithms, ASR (Automatic Speech Recognition) systems have evolved a lot. As the models get simpler, the difficulty of development and deployment …
XIAO ZHANG et. al. | 2022 7th IEEE International Conference on Data Science in … | 2022-07-01 |
901 | SpeechHide: A Hybrid Privacy-preserving Mechanism for Speech Content and Voiceprint in Speech Data Sharing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the development of speech technology, huge amounts of speech data generated by users is collected by speech service providers and may be used for data sharing. However, … |
Yu Hu; Ran Li; Simin Wang; Fuqiang Tao; Zhe Sun; | 2022 7th IEEE International Conference on Data Science in … | 2022-07-01 |
902 | Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining Vs. Semi-Supervised Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate continued pretraining (CoPT) with unlabeled in-language audio data on the XLSR-53 pretrained model in several low-resource languages. |
Mitchell DeHaven; Jayadev Billa; | arxiv-cs.CL | 2022-07-01 |
903 | FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models. |
Szu-Jui Chen; Jiamin Xie; John H. L. Hansen; | arxiv-cs.SD | 2022-06-30 |
904 | A NLP-based Approach to Improve Speech Recognition Services for People with Speech Disorders Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Current speech recognition services are not suitable for people with speech disorders, who have difficulties in coordinating muscles and articulating words and sentences. In …
A. Celesti; M. Fazio; Lorenzo Carnevale; M. Villari; | 2022 IEEE Symposium on Computers and Communications (ISCC) | 2022-06-30 |
905 | Space-Efficient Representation of Entity-centric Query Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. |
Christophe Van Gysel; Mirko Hannemann; Ernest Pusateri; Youssef Oualil; Ilya Oparin; | arxiv-cs.CL | 2022-06-29 |
906 | Bengali Common Voice Speech Dataset for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present insights obtained from the dataset and discuss key linguistic challenges that need to be addressed in future versions. |
SAMIUL ALAM et. al. | arxiv-cs.CL | 2022-06-28 |
907 | TALCS: An Open-Source Mandarin-English Code-Switching Corpus and A Speech Recognition Baseline IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we will introduce the recording procedure in detail, including audio capturing devices and corpus environments. |
CHENGFEI LI et. al. | arxiv-cs.CL | 2022-06-27 |
908 | TEVR: Improving Speech Recognition By Token Entropy Variance Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents TEVR, a speech recognition model designed to minimize the variation in token entropy w.r.t. the language model.
Hajo Nils Krabbenhöft; Erhardt Barth; | arxiv-cs.CL | 2022-06-25 |
909 | Distilling A Pretrained Language Model to A Multilingual ASR Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method called the Distilling a Language model to a Speech model (Distill-L2S), which aligns the latent representations of two different modalities. |
Kwanghee Choi; Hyung-Min Park; | arxiv-cs.CL | 2022-06-25 |
910 | Pruned RNN-T for Fast, Memory-efficient ASR Training IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with … |
FANGJUN KUANG et. al. | Interspeech | 2022-06-23 |
911 | A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) has been dominated by deep learning-based end-to-end speech recognition models. These approaches require large amounts of labeled data in the … |
Raviraj Joshi; Ashutosh Kumar Singh; | ArXiv | 2022-06-22 |
912 | Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance and also the pre-training efficiency, either through decoding with a hybrid ASR system to generate phoneme-level alignments (named PBERT), or performing clustering on the supervised speech features extracted from an end-to-end CTC model (named CTC clustering). |
CHENGYI WANG et. al. | arxiv-cs.CL | 2022-06-21 |
913 | The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the Makerere Artificial Intelligence research lab releases a Luganda radio speech corpus of 155 hours.
Jonathan Mukiibi; Andrew Katumba; Joyce Nakatumba-Nabende; Ali Hussein; Josh Meyer; | arxiv-cs.CL | 2022-06-20 |
914 | Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This gap introduces serious problems for ASR systems, especially when training or evaluating ASR models on datasets containing a lot of colloquial speech, such as the MALACH project. In this paper, we are addressing this problem in the light of a new paradigm in end-to-end ASR systems — recently introduced self-supervised audio Transformers. |
Jan Lehečka; Josef V. Psutka; Josef Psutka; | arxiv-cs.CL | 2022-06-15 |
915 | AVATAR: Unconstrained Audiovisual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is particularly useful for unconstrained videos, where the speaker is not necessarily visible. To solve this task, we propose a new sequence-to-sequence AudioVisual ASR TrAnsformeR (AVATAR) which is trained end-to-end from spectrograms and full-frame RGB. |
VALENTIN GABEUR et. al. | arxiv-cs.CV | 2022-06-15 |
916 | Exploring Capabilities of Monolingual Audio Transformers Using Large Datasets in Automatic Speech Recognition of Czech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. |
Jan Lehečka; Jan Švec; Aleš Pražák; Josef V. Psutka; | arxiv-cs.CL | 2022-06-15 |
917 | Jira: A Central Kurdish Speech Recognition System, Designing and Building Speech Corpus and Pronunciation Lexicon Related Papers Related Patents Related Grants Related Venues Related Experts View |
H. Veisi; Hawre Hosseini; Mohammad MohammadAmini; Wirya Fathy; A. Mahmudi; | Language Resources and Evaluation | 2022-06-14 |
918 | Joint Encoder-Decoder Self-Supervised Pre-training for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Usually, an encoder-decoder architecture works exceptionally well for a sequence-to-sequence task like ASR. Therefore, in this paper, we propose a new paradigm that exploits the power of a decoder during self-supervised learning.
Arunkumar A; Umesh S; | arxiv-cs.CL | 2022-06-09 |
919 | The Necessity of Emotion Recognition from Speech Signals for Natural and Effective Human-Robot Interaction in Society 5.0 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The history of humanity has reached Industry 4.0 that aims to the integration of information technologies and especially artificial intelligence with all life-sustaining … |
Yeşím Ülgen Sönmez; A. Varol; | 2022 10th International Symposium on Digital Forensics and … | 2022-06-06 |
920 | Lip-Listening: Mixing Senses to Understand Lips Using Cross Modality Knowledge Distillation for Word-Based Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize … |
Hadeel Mabrouk; Omar Abugabal; Nourhan Sakr; Hesham M. Eraqi; | ArXiv | 2022-06-05 |
921 | LAE: Language-Aware Encoder for Monolingual and Multilingual ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. |
JINCHUAN TIAN et. al. | arxiv-cs.CL | 2022-06-05 |
922 | Adaptive Activation Network For Low Resource Multilingual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduced an adaptive activation network into the upper layers of the ASR model, and applied different activation functions to different languages.
Jian Luo; Jianzong Wang; Ning Cheng; Zhenpeng Zheng; Jing Xiao; | arxiv-cs.CL | 2022-05-28 |
923 | Contextual Adapters for Personalized Speech Recognition in Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose training neural contextual adapters for personalization in neural transducer based ASR models. |
KANTHASHREE MYSORE SATHYENDRA et. al. | arxiv-cs.CL | 2022-05-26 |
924 | Global Normalization for Streaming Speech Recognition in A Modular Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. |
EHSAN VARIANI et. al. | arxiv-cs.LG | 2022-05-26 |
925 | On Building Spoken Language Understanding Systems for Low Resourced Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a series of experiments to explore extremely low-resourced settings where we perform intent classification with systems trained on as few as one data point per intent and with only one speaker in the dataset.
Akshat Gupta; | arxiv-cs.CL | 2022-05-25 |
926 | Heterogeneous Reservoir Computing Models for Persian Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the accuracy of the RC in ASR applications, we propose heterogeneous single and multi-layer ESNs to create non-linear transformations of the inputs that capture temporal context at different scales. |
Zohreh Ansari; Farzin Pourhoseini; Fatemeh Hadaeghi; | arxiv-cs.SD | 2022-05-25 |
927 | FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. |
ALEXIS CONNEAU et. al. | arxiv-cs.CL | 2022-05-24 |
928 | Improved Language Models for ASR Using Written Language Text Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The performance of an Automatic Speech Recognition (ASR) engine primarily depends on ($a$) the acoustic model (AM), (b) the language model (LM) and (c) the lexicon (Lx), While the … |
Kaustuv Mukherji; Meghna Pandharipande; Sunil Kumar Kopparapu; | 2022 National Conference on Communications (NCC) | 2022-05-24 |
929 | Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel method involving multi-level modeling units, which integrates multi-level information for Mandarin speech recognition.
Yuting Yang; Binbin Du; Yuke Li; | arxiv-cs.CL | 2022-05-24 |
930 | End-to-End ASR-Enhanced Neural Network for Alzheimer’s Disease Diagnosis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents an approach to Alzheimer’s disease (AD) diagnosis from spontaneous speech using an end-to-end ASR-enhanced neural network. Under the condition that only audio … |
Jiancheng Gui; Yikai Li; Kai Chen; Joanna Siebert; Qingcai Chen; | ICASSP 2022 – 2022 IEEE International Conference on … | 2022-05-23 |
931 | Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the effectiveness of multi-modal acoustic modelling for dysarthric speech recognition using acoustic features along with articulatory information. |
Z. Yue; E. Loweimi; Z. Cvetkovic; H. Christensen; J. Barker; | icassp | 2022-05-22 |
932 | Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generative adversarial network to simulate noisy spectrum from the clean spectrum (SimuGAN), where only 10 minutes of unparalleled in-domain noisy speech data is required as labels. |
C. Chen; N. Hou; Y. Hu; S. Shirol; E. S. Chng; | icassp | 2022-05-22 |
933 | Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the use of the NOS rescoring model on a first-pass multilingual model and show that similar to the first-pass model, the rescoring model can be made multilingual. |
N. GAUR et. al. | icassp | 2022-05-22 |
934 | A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a two-step callsign boosting approach: (1) at the 1st step (ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the decoding FST (lattices), (2) at the 2nd step (NLP), callsigns extracted from the improved recognition outputs with Named Entity Recognition (NER) are correlated with the surveillance data to select the most suitable one. |
I. Nigmatulina; J. Zuluaga-Gomez; A. Prasad; S. Saeed Sarfjoo; P. Motlicek; | icassp | 2022-05-22 |
935 | Joint Modeling of Code-Switched and Monolingual ASR Via Conditional Factorization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. |
B. Yan; et al. | icassp | 2022-05-22 |
936 | Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents initial Speech Recognition results on "Casual Conversations", a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.
C. Liu; et al. | icassp | 2022-05-22 |
937 | Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an embedding aligner and modality switch training to better align the speech and text latent spaces. |
W. Wang; et al. | icassp | 2022-05-22 |
938 | Model-Based Approach for Measuring The Fairness in ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce mixed-effects Poisson regression to better measure and interpret any WER difference among subgroups of interest. |
Z. Liu; I. -E. Veliche; F. Peng; | icassp | 2022-05-22 |
939 | Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). |
C. Zorila; R. Doddipatla; | icassp | 2022-05-22 |
940 | The Royalflush System of Speech Recognition for M2met Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. |
S. Ye; P. Wang; S. Chen; X. Hu; X. Xu; | icassp | 2022-05-22 |
941 | Unsupervised Speech Enhancement with Speech Recognition Embedding and Disentanglement Losses IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose an unsupervised loss function to tackle those two problems. |
V. A. Trinh; S. Braun; | icassp | 2022-05-22 |
942 | Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a DNN-based switching method that directly estimates whether ASR will perform better on the enhanced or observed signals. |
H. SATO et. al. | icassp | 2022-05-22 |
943 | End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. |
R. Kumar; A. Purushothaman; A. Sreeram; S. Ganapathy; | icassp | 2022-05-22 |
944 | Integrating Multiple ASR Systems Into NLP Backend with Attention Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reduce the impact of ASR errors on the NLP back-end by combining transcriptions from various ASR systems. |
T. Kano; A. Ogawa; M. Delcroix; S. Watanabe; | icassp | 2022-05-22 |
945 | TED Talk Teaser Generation with Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the challenge of automatically generating teasers for TED talks. |
G. Vico; J. Niehues; | icassp | 2022-05-22 |
946 | Improved Meta Learning for Low Resource Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new meta learning based framework for low resource speech recognition that improves the previous model agnostic meta learning (MAML) approach. |
S. Singh; R. Wang; F. Hou; | icassp | 2022-05-22 |
947 | Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training before being cross-domain adapted to the 102.7-hour UASpeech corpus and to produce articulatory features. |
S. Hu; et al. | icassp | 2022-05-22 |
948 | Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. |
A. RATNARAJAH et. al. | icassp | 2022-05-22 |
949 | Dementia Detection By Fusing Speech and Eye-Tracking Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a method of detecting dementia from the simultaneous speech and eye-tracking recordings of subjects in a picture description task. |
Z. Sheng; Z. Guo; X. Li; Y. Li; Z. Ling; | icassp | 2022-05-22 |
950 | Optimize Wav2vec2's Architecture for Small Training Set Through Analyzing Its Pre-Trained Model's Attention Pattern Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage two techniques, local attention mechanism and cross-block parameter sharing, with counter-intuitive configurations.
L. Chen; M. Asgari; H. H. Dodge; | icassp | 2022-05-22 |
951 | Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Transcribe-to-Diarize, a new approach for neural speaker diarization that uses an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR). |
N. Kanda; et al. | icassp | 2022-05-22 |
952 | Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of suppressing background noise with a conventional cascaded pipeline, we employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. |
H. Wang; et al. | icassp | 2022-05-22 |
953 | Punctuation Prediction for Streaming On-Device Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss one-pass models for both ASR and punctuation prediction to replace the conventional two-pass post-processing pipeline. |
Z. Zhou; T. Tan; Y. Qian; | icassp | 2022-05-22 |
954 | Contextual Adapters for Personalized Speech Recognition in Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose training neural contextual adapters for personalization in neural transducer based ASR models. |
K. M. Sathyendra; et al. | icassp | 2022-05-22 |
955 | Being Greedy Does Not Hurt: Sampling Strategies for End-To-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, the Optimal Completion Distillation (OCD) training method was proposed which attempts to address some of those issues. In this paper, we analyze whether the method is competitive with a strong MLE baseline and investigate its scalability towards large speech data beyond read speech, which to our knowledge is the first such attempt in the literature.
J. Heymann; E. Lakomkin; L. Rädel; | icassp | 2022-05-22 |
956 | Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks, and increase scalability of the model to multiple tasks or languages. |
B. Thomas; S. Kessler; S. Karout; | icassp | 2022-05-22 |
957 | Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. |
T. Munkhdalai; et al. | icassp | 2022-05-22 |
958 | Speech Pattern Based Black-Box Model Watermarking for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. |
H. CHEN et. al. | icassp | 2022-05-22 |
959 | Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages. |
S. Khurana; A. Laurent; J. Glass; | icassp | 2022-05-22 |
960 | Massively Multilingual ASR: A Lifelong Learning Solution IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the impact of adding more languages and propose a lifelong learning approach to build high quality MMASR systems. |
B. Li; et al. | icassp | 2022-05-22 |
961 | Analyzing The Robustness of Unsupervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Unsupervised speech recognition (unsupervised ASR) aims to learn the ASR system with non-parallel speech and text corpus only. Wav2vec-U [1] has shown promising results in … |
G. -T. Lin; C. -J. Hsu; D. -R. Liu; H. -Y. Lee; Y. Tsao; | icassp | 2022-05-22 |
962 | Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We trained personalized models for 195 individuals with different types and severities of speech impairment with training sets ranging in size from <1 minute to 18-20 minutes of speech data. |
J. Tobin; K. Tomanek; | icassp | 2022-05-22 |
963 | Fusing ASR Outputs in Joint Training for Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to fuse Automatic Speech Recognition (ASR) outputs into the pipeline for joint training SER. |
Y. Li; P. Bell; C. Lai; | icassp | 2022-05-22 |
964 | Multi-Turn RNN-T for Streaming Recognition of Multi-Party Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through an in-depth analysis, we discuss potential pitfalls of the proposed system as well as promising future research directions. |
I. Sklyar; A. Piunova; X. Zheng; Y. Liu; | icassp | 2022-05-22 |
965 | Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. |
A. Ogawa; N. Tawara; M. Delcroix; S. Araki; | icassp | 2022-05-22 |
966 | Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the over-suppression phenomenon in the enhanced speech might degrade the performance of the downstream automatic speech recognition (ASR) task due to missing latent information. To alleviate this problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and the original noisy feature.
Y. Hu; N. Hou; C. Chen; E. Siong Chng; | icassp | 2022-05-22 |
967 | End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a conformer-based multi-modal speech recognition system. |
J. Chen; M. Wang; X. -L. Zhang; Z. Huang; S. Rahardja; | icassp | 2022-05-22 |
968 | Conversational Speech Recognition By Learning Conversation-Level Characteristics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. |
K. Wei; Y. Zhang; S. Sun; L. Xie; L. Ma; | icassp | 2022-05-22 |
969 | LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, the sparsity of the supervised training data requires the model to be able to learn from limited data. To address these problems, we propose LatticeBART, a model that decodes the sequence from the lattice in an end-to-end fashion and can use the pre-trained language models' prior.
L. Dai; L. Chen; Z. Zhou; K. Yu; | icassp | 2022-05-22 |
970 | Integer-Only Zero-Shot Quantization for Efficient Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, they require training and/or validation data during quantization, which may not be available due to security or privacy concerns. To address these limitations, we propose an integer-only, zero-shot quantization scheme for ASR models.
S. Kim; et al. | icassp | 2022-05-22 |
971 | An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. |
S. Kessler; B. Thomas; S. Karout; | icassp | 2022-05-22 |
972 | Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel text representation and training framework for E2E ASR models. |
S. Thomas; B. Kingsbury; G. Saon; H. -K. J. Kuo; | icassp | 2022-05-22 |
973 | Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from previous one-piece models, in this paper we propose a novel and agile framework called CR-ID for ASR-error-robust intent detection with two plug-and-play modules, namely a semantic drift calibration module (SDCM) and a phonemic refinement module (PRM), which are both model-agnostic and thus can be easily integrated into any existing intent detection model without modifying its structure.
Peilin Zhou; Dading Chong; Helin Wang; Qingcheng Zeng; | arxiv-cs.CL | 2022-05-22 |
974 | Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach that combines their ideas for an end-to-end speech recognition model.
S. Ling; C. Shen; M. Cai; Z. Ma; | icassp | 2022-05-22 |
975 | End-to-End Speech Recognition from Federated Acoustic Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. |
Y. Gao; et al. | icassp | 2022-05-22 |
976 | DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a dual-path network for the far-field acoustic model, which uses the voice processing (VP) signal and the acoustic echo cancellation (AEC) signal as inputs.
D. MA et. al. | icassp | 2022-05-22 |
977 | Joint and Adversarial Training with ASR for Expressive Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to alleviate the entanglement problem by integrating a Text-To-Speech (TTS) model and an Automatic Speech Recognition (ASR) model with a shared-layer network for joint training, and using ASR adversarial training to eliminate the content information in the style information.
K. ZHANG et. al. | icassp | 2022-05-22 |
978 | An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore HuBERT with larger numbers of clusters and iterations in order to obtain better speech representation. |
T. Maekaku; X. Chang; Y. Fujita; S. Watanabe; | icassp | 2022-05-22 |
979 | Transformer-Based Streaming ASR with Cumulative Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR). |
M. Li; S. Zhang; C. Zorila; R. Doddipatla; | icassp | 2022-05-22 |
980 | Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel approach to introduce LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. |
J. Tian; et al. | icassp | 2022-05-22 |
981 | Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. |
M. Soleymanpour; M. T. Johnson; R. Soleymanpour; J. Berry; | icassp | 2022-05-22 |
982 | Joint Speech Recognition and Audio Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of AAC is to generate natural language descriptions of contents in audio samples. We propose several approaches for end-to-end joint modeling of ASR and AAC tasks and demonstrate their advantages over traditional approaches, which model these tasks independently. |
C. Narisetty et al. | icassp | 2022-05-22 |
983 | A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, we propose a time domain progressive learning (TDPL) approach for speech enhancement and ASR. |
Z. Nian; J. Du; Y. Ting Yeung; R. Wang; | icassp | 2022-05-22 |
984 | Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset. |
N. Tomashenko; S. Mdhaffar; M. Tommasi; Y. Estève; J. -F. Bonastre; | icassp | 2022-05-22 |
985 | Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all three stages of the system is proposed. |
G. Li; J. Yu; J. Deng; X. Liu; H. Meng; | icassp | 2022-05-22 |
986 | Channel-Wise AV-Fusion Attention for Multi-Channel Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our work for automatic speech recognition (ASR) in the Multimodal Information Based Speech Processing (MISP) Challenge 2021. |
G. Xu; et al. | icassp | 2022-05-22 |
987 | Exploring Machine Speech Chain For Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the TTS→ASR pipeline in machine speech chain to perform domain adaptation for both E2E ASR and neural TTS models with only text data from the target domain. |
F. Yue; Y. Deng; L. He; T. Ko; Y. Zhang; | icassp | 2022-05-22 |
988 | RescoreBERT: Discriminative Speech Recognition Rescoring With Bert IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. |
L. Xu; et al. | icassp | 2022-05-22 |
989 | Bilingual End-to-End ASR with Byte-Level Subwords Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). |
L. Deng; R. Hsiao; A. Ghoshal; | icassp | 2022-05-22 |
990 | Listen, Know and Spell: Knowledge-Infused Subword Modeling for Improving ASR Performance of OOV Named Entities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Knowledge-Infused Subword Model (KISM), a novel technique for incorporating semantic context from KGs into the ASR pipeline for improving the performance of OOV named entities. |
N. Das et al. | icassp | 2022-05-22 |
991 | Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning. |
Y. Wang et al. | icassp | 2022-05-22 |
992 | Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach for multi-modal emotion recognition from conversations using speech and text. |
S. Dutta; S. Ganapathy; | icassp | 2022-05-22 |
993 | End-to-End ASR-Enhanced Neural Network for Alzheimer's Disease Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an approach to Alzheimer's disease (AD) diagnosis from spontaneous speech using an end-to-end ASR-enhanced neural network. |
J. Gui; Y. Li; K. Chen; J. Siebert; Q. Chen; | icassp | 2022-05-22 |
994 | Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. |
Q. Li et al. | icassp | 2022-05-22 |
995 | Reference Microphone Selection and Low-Rank Approximation Based Multichannel Wiener Filter with Application to Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an experimental study on the low-rank approximation and reference microphone selection based MWF with application to noisy speech recognition. |
X. -Y. Chen; J. Zhang; L. -R. Dai; | icassp | 2022-05-22 |
996 | SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. |
S. Shon; et al. | icassp | 2022-05-22 |
997 | Improving Recognition-Synthesis Based Any-to-one Voice Conversion with Cyclic Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This inconsistency between conversion and training stages constrains the speaker similarity of converted speech. To address this issue, a cyclic training method is proposed in this paper. |
Y. -N. Chen; L. -J. Liu; Y. -J. Hu; Y. Jiang; Z. -H. Ling; | icassp | 2022-05-22 |
998 | Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast and accurate end-to-end (E2E) model, which executes automatic speech recognition (ASR) and downstream natural language processing (NLP) simultaneously. |
M. Omachi; Y. Fujita; S. Watanabe; T. Wang; | icassp | 2022-05-22 |
999 | Speaker-Targeted Audio-Visual Speech Recognition Using A Hybrid CTC/Attention Model with Interference Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although AV Align shows an improvement in recognition accuracy in background noise environments, we have observed that the recognition accuracy degrades significantly in interference speaker environments, where a target speech and an interfering speech overlap each other. In order to improve the speech recognition accuracy of the target speaker in such situations, we propose a method that combines the auxiliary loss function that maximizes the recognition accuracy of the interference speaker and the CTC loss function for training the AV-ASR model. |
R. Tsunoda; R. Aihara; R. Takashima; T. Takiguchi; Y. Imai; | icassp | 2022-05-22 |
1000 | Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fulfill the two demands, in this paper, we propose a NAR CTC/attention model utilizing both pre-trained acoustic and language models: wav2vec2.0 and BERT. |
K. Deng et al. | icassp | 2022-05-22 |
1001 | Caching Networks: Capitalizing on Common Speech for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. |
A. Alexandridis; et al. | icassp | 2022-05-22 |
1002 | Improving Spoken Language Understanding By Enhancing Text Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel model to train a language model, namely CapuBERT, that is able to handle spoken-form input from the ASR module. |
T. B. Nguyen; | icassp | 2022-05-22 |
1003 | Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a phone-informed post-processing network that refines Mel spectrograms without using the vocoder. |
S. Ueno; T. Kawahara; | icassp | 2022-05-22 |
1004 | Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages. |
X. Zhao; et al. | icassp | 2022-05-22 |
1005 | Sentiment-Aware Automatic Speech Recognition Pre-Training for Enhanced Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). |
A. Ghriss; B. Yang; V. Rozgic; E. Shriberg; C. Wang; | icassp | 2022-05-22 |
1006 | Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech remains one of the most challenging tasks to the speech community. In this paper, we look into this challenge by utilizing the location information of target speakers in the 3D space for the first time. |
Y. Shao; S. -X. Zhang; D. Yu; | icassp | 2022-05-22 |
1007 | Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A question that naturally arises is whether the dissemination of personalized acoustic models can leak personal information. In this paper, we show that it is possible to retrieve the gender of the speaker, but also his identity, by just exploiting the weight matrix changes of a neural acoustic model locally adapted to this speaker. |
S. Mdhaffar; J. -F. Bonastre; M. Tommasi; N. Tomashenko; Y. Estève; | icassp | 2022-05-22 |
1008 | Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network, which can be applied to a wider range of tasks. |
X. Chang; N. Moritz; T. Hori; S. Watanabe; J. L. Roux; | icassp | 2022-05-22 |
1009 | Factorized Neural Transducer for Efficient Language Model Adaptation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This drawback might prevent their potential applications in practice. In order to address this issue, we propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction, and adopting a standalone language model for the vocabulary prediction. |
X. Chen; Z. Meng; S. Parthasarathy; J. Li; | icassp | 2022-05-22 |
1010 | Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. |
C. -H. H. Yang; et al. | icassp | 2022-05-22 |
1011 | AISHELL-NER: Named Entity Recognition from Chinese Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new dataset, AISHELL-NER, for NER from Chinese speech. |
B. Chen et al. | icassp | 2022-05-22 |
1012 | Effect of Noise Suppression Losses on Speech Distortion and ASR Performance IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the introduced speech distortion and artifacts greatly harm speech quality and intelligibility, and often significantly degrade automatic speech recognition (ASR) rates. In this work, we shed light on the success of the spectral complex compressed mean squared error (MSE) loss, and how its magnitude and phase-aware terms relate to the speech distortion vs. noise reduction trade-off. |
S. Braun; H. Gamper; | icassp | 2022-05-22 |
1013 | Performance-Efficiency Trade-Offs in Unsupervised Pre-Training for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Putting together all our observations, we introduce SEW-D (Squeezed and Efficient Wav2vec with Disentangled Attention), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. |
F. Wu et al. | icassp | 2022-05-22 |
1014 | M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems. |
F. Yu; et al. | icassp | 2022-05-22 |
1015 | Enhance RNNLMs with Hierarchical Multi-Task Learning for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, how to best share information among related tasks in MTL remains to be addressed. In this work, we propose a hierarchical multi-task learning (HMTL) approach to incorporate linguistic knowledge into recurrent neural network language models (RNNLM), instead of using linguistic features as word factors. |
M. Song; Y. Zhao; | icassp | 2022-05-22 |
1016 | Knowledge Transfer from Large-Scale Pretrained Language Models to End-To-End Speech Recognizers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since end-to-end models are also known to be severely data hungry, this constraint is crucial especially because obtaining transcribed utterances is costly and can possibly be impractical or impossible. This paper proposes a method for alleviating this issue by transferring knowledge from a language model neural network that can be pretrained with text-only data. |
Y. Kubo; S. Karita; M. Bacchiani; | icassp | 2022-05-22 |
1017 | Multi-Stage and Multi-Loss Training for Fullband Non-Personalized and Personalized Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep learning-based wideband (16kHz) speech enhancement approaches have surpassed traditional methods. This work further extends the existing wideband systems to enable full-band … |
L. Chen; et al. | icassp | 2022-05-22 |
1018 | WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total. |
B. Zhang; et al. | icassp | 2022-05-22 |
1019 | SYNT++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two novel techniques during training to mitigate the problems due to the distribution gap: (i) a rejection sampling algorithm and (ii) using separate batch normalization statistics for the real and the synthetic samples. |
T. -Y. Hu et al. | icassp | 2022-05-22 |
1020 | Curriculum Optimization for Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions. |
A. Kuznetsova; A. Kumar; J. D. Fox; F. M. Tyers; | icassp | 2022-05-22 |
1021 | Usted: Improving ASR with A Unified Speech and Text Encoder-Decoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose training an ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. |
B. Yusuf; A. Gandhe; A. Sokolov; | icassp | 2022-05-22 |
1022 | A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however brings a performance degradation on the clean test set. To avoid this issue, in this work we propose an enhanced wav2vec2.0 model. |
Q. -S. Zhu et al. | icassp | 2022-05-22 |
1023 | Best of Both Worlds: Multi-Task Audio-Visual Automatic Speech Recognition and Active Speaker Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent work has shown that we can solve both problems simultaneously by employing an attention mechanism over the competing video tracks of the speakers' faces, at the cost of sacrificing some accuracy on active speaker detection. This work closes this gap in active speaker detection accuracy by presenting a single model that can be jointly trained with a multi-task loss. |
O. Braga; O. Siohan; | icassp | 2022-05-22 |
1024 | SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the advantages of applying SRU++ in ASR tasks by comparing with Conformer across multiple ASR benchmarks and study how the benefits can be generalized to long-form speech inputs. |
J. Pan; T. Lei; K. Kim; K. J. Han; S. Watanabe; | icassp | 2022-05-22 |
1025 | Multi-Modal Pre-Training for Automated Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel approach that leverages a self-supervised learning technique based on masked language modeling to compute a global, multi-modal encoding of the environment in which the utterance occurs. |
D. M. Chan; S. Ghosh; D. Chakrabarty; B. Hoffmeister; | icassp | 2022-05-22 |
1026 | Endpoint Detection for Streaming End-to-End Multi-Talker ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the EP detection problem in the SURT framework by introducing an end-of-sentence token as an output unit, following the practice of single-talker end-to-end models. |
L. Lu; J. Li; Y. Gong; | icassp | 2022-05-22 |
1027 | Exploring Effective Data Utilization for Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a series of training strategies to explore more effective data utilization for low-resource speech recognition. |
Z. Zhou; W. Wang; W. Zhang; Y. Qian; | icassp | 2022-05-22 |
1028 | The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the SJTU system for ICASSP Multi-modal Information based Speech Processing Challenge (MISP) 2021. |
W. Wang; et al. | icassp | 2022-05-22 |
1029 | Spell My Name: Keyword Boosted Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recognition of uncommon words such as names and technical terminology is important to understanding conversations in context. However, the ability to recognise such words remains … |
N. Jung; G. Kim; J. S. Chung; | icassp | 2022-05-22 |
1030 | Towards Better Meta-Initialization with Task Augmentation for Kindergarten-Aged Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we validate the effectiveness of MI in children's ASR and attempt to alleviate the problem of learner overfitting. |
Y. Zhu; R. Fan; A. Alwan; | icassp | 2022-05-22 |
1031 | Summary on The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We briefly describe the released dataset, track setups, baselines and summarize the challenge results and major techniques used in the submissions. |
F. Yu; et al. | icassp | 2022-05-22 |
1032 | Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence and ASR Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). |
Z. Wang; et al. | icassp | 2022-05-22 |
1033 | Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-Box Acoustic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A deep neural network (DNN)-based speech enhancement (SE) aiming to maximize the performance of an automatic speech recognition (ASR) system is proposed in this paper. |
R. Sawata; Y. Kashiwagi; S. Takahashi; | icassp | 2022-05-22 |
1034 | Insights on Neural Representations for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: End-to-end automatic speech recognition (ASR) models aim to learn a generalised speech representation. |
Anna Ollerenshaw; Md Asif Jalal; Thomas Hain; | arxiv-cs.CL | 2022-05-19 |
1035 | Minimising Biasing Word Errors for Contextual ASR with The Tree-Constrained Pointer Generator IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-tail words obtained using external contextual information. |
Guangzhi Sun; Chao Zhang; Philip C Woodland; | arxiv-cs.CL | 2022-05-18 |
1036 | PriMock57: A Dataset Of Primary Care Mock Consultations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We detail the development of a public access, high quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. |
Alex Papadopoulos Korfiatis; Francesco Moramarco; Radmila Sarac; Aleksandar Savkov; | acl | 2022-05-17 |
1037 | Unified Speech-Text Pre-training for Speech Translation and Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition. |
Yun Tang et al. | acl | 2022-05-17 |
1038 | Deploying Self-supervised Learning in The Wild for Hybrid Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a full exploration of how to utilize uncurated audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model. |
Mostafa Karimi et al. | arxiv-cs.SD | 2022-05-17 |
1039 | Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR Via Speech Chain Reconstruction and Self-Transcribing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an improved consistency training paradigm of semi-supervised S2S ASR. |
Heli Qi; Sashi Novitasari; Sakriani Sakti; Satoshi Nakamura; | arxiv-cs.CL | 2022-05-14 |
1040 | LAS-Transformer: An Enhanced Transformer Based on The Local Attention Mechanism for Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, Transformer-based models have shown promising results in automatic speech recognition (ASR), outperforming models based on recurrent neural networks (RNNs) and … |
Pengbin Fu; Daxing Liu; Huirong Yang; | Inf. | 2022-05-13 |
1041 | DMS-SK/BLSTM-CTC Hybrid Network for Gesture/Speech Fusion and Its Application in Lunar Robot-Astronauts Interaction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the future manned lunar exploration mission, astronauts would work with the lunar robots, which has a high requirement for human–robot interaction (HRI). As the accuracy of … |
Jianli Ding; Jin Liu; X. Ning; Z. Kang; | Int. J. Pattern Recognit. Artif. Intell. | 2022-05-12 |
1042 | MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large-scale automatic speech recognition model has achieved impressive performance. However, huge computational resources and massive amount of data are required to train an ASR … |
Xing Wu; Yifan Jin; Jianjia Wang; Quan Qian; Yike Guo; | Algorithms | 2022-05-11 |
1043 | Hearing Voices at The National Library – A Speech Corpus and Acoustic Model for The Swedish Language Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper details our work in developing new acoustic models for automated speech recognition (ASR) at KBLab, the infrastructure for data-driven research at the National Library … |
Martin Malmsten; Chris Haffenden; Love Börjeson; | ArXiv | 2022-05-06 |
1044 | Hearing Voices at The National Library — A Speech Corpus and Acoustic Model for The Swedish Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate different approaches for a viable speech-to-text pipeline for audiovisual resources in Swedish, using the wav2vec 2.0 architecture in combination with speech corpora created from KB’s collections. |
Martin Malmsten; Chris Haffenden; Love Börjeson; | arxiv-cs.CL | 2022-05-06 |
1045 | Speaker Recognition in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a pipeline to find the number of speakers, as well as the audios belonging to each of these now identified speakers, in a source of audio data where the number of speakers or speaker labels are not known a priori. |
Neeraj Chhimwal et al. | arxiv-cs.SD | 2022-05-05 |
1046 | DLD: An Optimized Chinese Speech Recognition Model Based on Deep Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition technology has played an indispensable role in realizing human-computer intelligent interaction. However, most of the current Chinese speech recognition systems … |
Hong Lei; Yue Xiao; Yanchun Liang; Dalin Li; Heow Pueh Lee; | Complex. | 2022-05-02 |
1047 | Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. |
Felix Wu et al. | arxiv-cs.CL | 2022-05-02 |
1048 | Exploring AI-based Speaker Dependent Methods in Dysarthric Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present our recent improvements within the CapisciAMe project, an Italian initiative aimed at investigating the usage of deep learning strategies for automatic … |
Davide Mulfari; A. Celesti; M. Villari; | 2022 22nd IEEE International Symposium on Cluster, Cloud … | 2022-05-01 |
1049 | Bilingual End-to-End ASR with Byte-Level Subwords Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). |
Liuhui Deng; Roger Hsiao; Arnab Ghoshal; | arxiv-cs.CL | 2022-05-01 |
1050 | Opportunities and Challenges of Automatic Speech Recognition Systems for Low-Resource Language Speakers IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) researchers are turning their attention towards supporting low-resource languages, such as isiXhosa or Marathi, with only limited training … |
Thomas Reitmaier et al. | Proceedings of the 2022 CHI Conference on Human Factors in … | 2022-04-29 |
1051 | Stuttering Disfluency Detection Using Machine Learning Approaches Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Stuttering is a neurodevelopmental speech disorder wherein people suffer from disfluency in speech generation. Recent research has applied machine learning and deep learning … |
Abedal-Kareem Al-Banna; E. Edirisinghe; H. Fang; W. Hadi; | J. Inf. Knowl. Manag. | 2022-04-28 |
1052 | Why Does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study which factor leads to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. |
SANYUAN CHEN et. al. | arxiv-cs.CL | 2022-04-27 |
1053 | Improving Multimodal Speech Recognition By Data Augmentation and Speech Representations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate ways of improving the base speech recognition system by following similar techniques to the ones used for the visual encoder, namely, transferring representations and data augmentation. |
Dan Oneata; Horia Cucu; | arxiv-cs.SD | 2022-04-27 |
1054 | DualVoice: A Speech Interaction Method Using Whisper-Voice As Commands Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Applications based on speech recognition have become widely used, and speech input is increasingly being utilized to create documents. However, it is still difficult to correct … |
J. Rekimoto; | CHI Conference on Human Factors in Computing Systems … | 2022-04-27 |
1055 | E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to replace the VAD with an end-to-end ASR model capable of predicting segment boundaries in a streaming fashion, allowing the segmentation decision to be conditioned not only on better acoustic features but also on semantic features from the decoded text with negligible extra computation. |
W. RONNY HUANG et. al. | arxiv-cs.SD | 2022-04-22 |
1056 | Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which yields only limited improvements. In this work, we aim to tackle these problems with a novel layer-wise adaptation structure injected into the E2E ASR model encoder. |
Xun Gong; Yizhou Lu; Zhikai Zhou; Yanmin Qian; | arxiv-cs.SD | 2022-04-21 |
1057 | Personalized Taiwanese Speech Synthesis Using Cascaded ASR and TTS Framework Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To bring endangered Taiwanese language back to life, this paper leveraged a large-scale Taiwanese across Taiwan (TAT) corpus to construct cascaded automatic speech recognition … |
Y. LIAO et. al. | 2022 32nd International Conference Radioelektronika … | 2022-04-21 |
1058 | WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Wave BERT (WaBERT), a novel end-to-end model combining the speech model and the language model for SLU tasks. |
LIN YAO et. al. | arxiv-cs.CL | 2022-04-21 |
1059 | Disappeared Command: Spoofing Attack On Automatic Speech Recognition Systems with Sound Masking Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The development of deep learning technology has greatly promoted the performance improvement of automatic speech recognition (ASR) technology, which has demonstrated an ability … |
Jinghui Xu; Jifeng Zhu; Yong Yang; | arxiv-cs.SD | 2022-04-19 |
1060 | Automated Speech Tools for Helping Communities Process Restricted-access Corpora for Language Revival Efforts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely-used language such as English for meta-linguistic commentary and questions (e.g. |
NAY SAN et. al. | arxiv-cs.CL | 2022-04-14 |
1061 | HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the model efficiency, we propose an early exit scheme for ASR, namely HuBERT-EE, that allows the model to stop the inference dynamically. |
Ji Won Yoon; Beom Jun Woo; Nam Soo Kim; | arxiv-cs.CL | 2022-04-13 |
1062 | ASR in German: A Detailed Error Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a selection of ASR model architectures that are pretrained on the German language and evaluates them on a benchmark of diverse test datasets. |
Johannes Wirth; Rene Peinl; | arxiv-cs.CL | 2022-04-12 |
1063 | Large-Scale Streaming End-to-End Speech Translation with Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce it to streaming end-to-end speech translation (ST), which aims to convert audio signals to texts in other languages directly. |
Jian Xue; Peidong Wang; Jinyu Li; Matt Post; Yashesh Gaur; | arxiv-cs.CL | 2022-04-11 |
1064 | MAESTRO: Matched Speech Text Representations Through Modality Matching IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Maestro, a self-supervised training method to unify representations learnt from speech and text modalities. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2022-04-07 |
1065 | Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there exists little parallel S2ST data, compared to the amount of data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis. In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue. |
SRAVYA POPURI et. al. | arxiv-cs.CL | 2022-04-06 |
1066 | Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a learnable and interpretable framework to combine SF and SSL representations. |
DAN BERREBBI et. al. | arxiv-cs.CL | 2022-04-05 |
1067 | Towards End-to-end Unsupervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Similar to the trend of making supervised speech recognition end-to-end, we introduce wav2vec-U 2.0 which does away with all audio-side pre-processing and improves accuracy through better architecture. |
Alexander H. Liu; Wei-Ning Hsu; Michael Auli; Alexei Baevski; | arxiv-cs.CL | 2022-04-05 |
1068 | A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage unpaired data to train a general sequence-to-sequence model. |
YE-QIAN DU et. al. | arxiv-cs.SD | 2022-04-05 |
1069 | Audio-visual Multi-channel Speech Separation, Dereverberation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all three stages of the system is proposed. |
Guinan Li; Jianwei Yu; Jiajun Deng; Xunying Liu; Helen Meng; | arxiv-cs.SD | 2022-04-05 |
1070 | A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using French as our investigation language, we train and compare gender-specific wav2vec 2.0 models against models containing different degrees of gender balance in their pre-training data. |
Marcely Zanon Boito; Laurent Besacier; Natalia Tomashenko; Yannick Estève; | arxiv-cs.CL | 2022-04-04 |
1071 | Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model. |
ABNER HERNANDEZ et. al. | arxiv-cs.CL | 2022-04-04 |
1072 | Deliberation Model for On-Device Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR’s text and audio embeddings. |
DUC LE et. al. | arxiv-cs.CL | 2022-04-04 |
1073 | Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we have used a transfer learning approach with the most recent Deep Speech model, i.e., deepspeech-0.9.3, to develop an end-to-end speech recognition system for Indian-English accents. |
Priyank Dubey; Bilal Shah; | arxiv-cs.CL | 2022-04-02 |
1074 | Speaker Adaptation for Wav2vec2 Based Dysarthric ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple adaptation network for fine-tuning wav2vec2 using fMLLR features. |
MURALI KARTHICK BASKAR et. al. | arxiv-cs.SD | 2022-04-02 |
1075 | End-to-end Model for Named Entity Recognition from Speech Without Paired Training Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an approach to build an end-to-end neural model to extract semantic information in a scenario in which zero paired audio data is available. |
Salima Mdhaffar; Jarod Duret; Titouan Parcollet; Yannick Estève; | arxiv-cs.CL | 2022-04-02 |
1076 | Zero-Shot Cross-lingual Aphasia Detection Using Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations and are fine-tuned for our desired low-resource languages. |
GERASIMOS CHATZOUDIS et. al. | arxiv-cs.LG | 2022-04-01 |
1077 | End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS). |
Xuankai Chang; Takashi Maekaku; Yuya Fujita; Shinji Watanabe; | arxiv-cs.SD | 2022-04-01 |
1078 | End-to-end Multi-talker Audio-visual ASR Using An Active Speaker Attention Module Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new approach for end-to-end audio-visual multi-talker speech recognition. |
Richard Rose; Olivier Siohan; | arxiv-cs.SD | 2022-04-01 |
1079 | Text-To-Speech Data Augmentation for Low Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this research is to propose a new data augmentation method to improve ASR models for agglutinative and low-resource languages. |
Rodolfo Zevallos; | arxiv-cs.CL | 2022-04-01 |
1080 | A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct a comparative study on speaker-attributed automatic speech recognition (SA-ASR) in the multi-party meeting scenario, a topic with increasing attention in meeting rich transcription. |
Fan Yu; Zhihao Du; Shiliang Zhang; Yuxiao Lin; Lei Xie; | arxiv-cs.SD | 2022-03-31 |
1081 | Perceptive, Non-linear Speech Processing and Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. |
Jean Rouat; Ramin Pichevar; Stéphane Loiselle; | arxiv-cs.SD | 2022-03-31 |
1082 | Effectiveness of Text to Speech Pseudo Labels for Forced Alignment and Cross Lingual Pretrained Models for Low Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an approach to create labelled data for Maithili, Bhojpuri and Dogri by utilising pseudo labels from text to speech for forced alignment. |
ANIRUDH GUPTA et. al. | arxiv-cs.CL | 2022-03-31 |
1083 | Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. |
JUNYI AO et. al. | arxiv-cs.SD | 2022-03-31 |
1084 | Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the problems, we introduce explicit interaction between characters and syllables using Self-conditioned connectionist temporal classification (CTC), in which the upper layers are “self-conditioned” on the intermediate predictions from the lower layers. |
Yusuke Fujita; Tatsuya Komatsu; Yusuke Kida; | arxiv-cs.CL | 2022-03-31 |
1085 | Analyzing The Factors Affecting Usefulness of Self-Supervised Pre-trained Representations for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, as part of the Interspeech Gram Vaani ASR challenge, we study the effect of domain, language, dataset size, and other aspects of our upstream pre-training SSL data on the final performance of the low-resource downstream ASR task. |
Ashish Seth; Lodagala V S V Durga Prasad; Sreyan Ghosh; S. Umesh; | arxiv-cs.CL | 2022-03-31 |
1086 | CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel channel and temporal-wise attention RNN (CTA-RNN) architecture based on the intermediate representations of pre-trained ASR models. |
Chengxin Chen; Pengyuan Zhang; | arxiv-cs.SD | 2022-03-31 |
1087 | HiFi-VC: High Quality ASR-Based Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new any-to-any voice conversion pipeline. |
A. Kashkin; I. Karpukhin; S. Shishkin; | arxiv-cs.SD | 2022-03-31 |
1088 | Is Word Error Rate A Good Evaluation Metric for Speech Recognition in Indic Languages? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR). |
PRIYANSHI SHAH et. al. | arxiv-cs.CL | 2022-03-30 |
1089 | Code Switched and Code Mixed Speech Recognition for Indic Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare the performance of an end-to-end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID). |
HARVEEN SINGH CHADHA et. al. | arxiv-cs.CL | 2022-03-30 |
1090 | Improving Speech Recognition for Indic Languages Using Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages. |
ANKUR DHURIYA et. al. | arxiv-cs.CL | 2022-03-30 |
1091 | Vakyansh: ASR Toolkit for Low Resource Indic Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through Vakyansh, we introduce automatic data pipelines for data creation, model training, model evaluation and deployment. |
HARVEEN SINGH CHADHA et. al. | arxiv-cs.CL | 2022-03-30 |
1092 | Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generative adversarial network to simulate noisy spectrum from the clean spectrum (Simu-GAN), where only 10 minutes of unparalleled in-domain noisy speech data is required as labels. |
Chen Chen; Nana Hou; Yuchen Hu; Shashank Shirol; Eng Siong Chng; | arxiv-cs.SD | 2022-03-29 |
1093 | WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. |
BINBIN ZHANG et. al. | arxiv-cs.SD | 2022-03-29 |
1094 | Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate SSL frameworks such as the wav2vec 2.0 and WavLM models using different setups and compare their performance with different supervised pretraining setups, using two types of pathological speech, namely, Japanese electrolaryngeal and English dysarthric. |
Lester Phillip Violeta; Wen-Chin Huang; Tomoki Toda; | arxiv-cs.SD | 2022-03-29 |
1095 | Earnings-22: A Practical Benchmark for Accents in The Wild IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To ensure this type of speech is represented in ASR benchmarking, we present Earnings-22, a 125 file, 119 hour corpus of English-language earnings calls gathered from global companies. |
Miguel Del Rio; Peter Ha; Quinten McNamara; Corey Miller; Shipra Chandra; | arxiv-cs.CL | 2022-03-29 |
1096 | Finnish Parliament ASR Corpus – Analysis, Benchmarks and Statistics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we publish and analyse the Finnish parliament ASR corpus, the largest publicly available collection of manually transcribed speech data for Finnish with over 3000 hours of speech and 449 speakers for which it provides rich demographic metadata. |
Anja Virkkunen; Aku Rouhe; Nhan Phan; Mikko Kurimo; | arxiv-cs.CL | 2022-03-28 |
1097 | A Dataset for Speech Emotion Recognition in Greek Theatrical Plays Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Machine learning methodologies can be adopted in cultural applications and propose new ways to distribute or even present the cultural content to the public. For instance, speech … |
Maria Moutti; S. Eleftheriou; Panagiotis Koromilas; Theodoros Giannakopoulos; | International Conference on Language Resources and … | 2022-03-27 |
1098 | Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a modification of the conventional FDLP model that allows easy interpretability of the complex cepstrum as temporal modulations in an all-pole model approximation of the power of the speech signal. |
Samik Sadhu; Hynek Hermansky; | arxiv-cs.SD | 2022-03-24 |
1099 | Disentangling Content and Fine-grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages. |
XINTAO ZHAO et. al. | arxiv-cs.SD | 2022-03-23 |
1100 | BeParrot: Efficient Interface for Transcribing Unclear Speech Via Respeaking Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transcribing speech from audio files to text is an important task not only for exploring the audio content in text form but also for utilizing the transcribed data as a source to … |
Riku Arakawa; Hiromu Yakura; Masataka Goto; | 27th International Conference on Intelligent User Interfaces | 2022-03-22 |
1101 | Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence and ASR Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). |
ZEXUN WANG et. al. | arxiv-cs.CL | 2022-03-22 |
1102 | Inequity in Popular Speech Recognition Systems for Accented English Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Voice-enabled technology has become increasingly common in homes, businesses, and other parts of everyday life. The benefits of smart speakers, hands-free controllers, and digital … |
Chinaemere Ike; Seth Polsley; T. Hammond; | 27th International Conference on Intelligent User Interfaces | 2022-03-22 |
1103 | A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datasets. |
Rishabh Jain; Mariam Yiwere; Dan Bigioi; Peter Corcoran; Horia Cucu; | arxiv-cs.SD | 2022-03-22 |
1104 | Intelligent Stuttering Speech Recognition: A Succinct Review Related Papers Related Patents Related Grants Related Venues Related Experts View |
N. Banerjee; Samarjeet Borah; Nilambar Sethi; | Multimedia Tools and Applications | 2022-03-19 |
1105 | Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR. |
Abdul Hameed Azeemi; Ihsan Ayyub Qazi; Agha Ali Raza; | arxiv-cs.LG | 2022-03-18 |
1106 | Prediction of Speech Intelligibility with DNN-based Performance Measures IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. |
Angel Mario Castro Martinez; Constantin Spille; Jana Roßbach; Birger Kollmeier; Bernd T. Meyer; | arxiv-cs.SD | 2022-03-17 |
1107 | Data Analytics on Eco-Conditional Factors Affecting Speech Recognition Rate of Modern Interaction Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech-based Interaction systems contribute to the growing class of contemporary interactive techniques (Human-Computer Interactive system), which have emerged quickly in the last … |
A. C. KALADEVI et. al. | J. Mobile Multimedia | 2022-03-16 |
1108 | Modelling Word Learning and Recognition Using Visually Grounded Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Methods: We investigate the time-course of word recognition as simulated by the model using a gating paradigm to test whether its recognition is affected by well-known word-competition effects in human speech processing. |
Danny Merkx; Sebastiaan Scholten; Stefan L. Frank; Mirjam Ernestus; Odette Scharenborg; | arxiv-cs.CL | 2022-03-14 |
1109 | The Improving Effect of Intelligent Speech Recognition System on English Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To improve the effect of English learning in the context of smart education, this study combines speech coding to improve the intelligent speech recognition algorithm, builds an … |
Qinqin Luo; | Adv. Multim. | 2022-03-10 |
1110 | Adaptation of A Pronunciation Dictionary for Dysarthric Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the general framework of an automatic speech recognition system, a pronunciation dictionary, that is a mapping table from a phoneme sequence to a word, is used both in the … |
Yuya Sawa; R. Takashima; T. Takiguchi; | 2022 IEEE 4th Global Conference on Life Sciences and … | 2022-03-07 |
1111 | Data Augmentation for Dysarthric Speech Recognition Based on Text-to-Speech Synthesis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the field of automatic speech recognition (ASR) for people with dysarthria, it is problematic that not enough training speech data can be collected from people with dysarthria. … |
Yuki Matsuzaka; R. Takashima; Chiho Sasaki; T. Takiguchi; | 2022 IEEE 4th Global Conference on Life Sciences and … | 2022-03-07 |
1112 | AaeCAPTCHA: The Design and Implementation of Audio Adversarial CAPTCHA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the robustness of audio CAPTCHAs against automated abuses, we present the design and implementation of an audio adversarial CAPTCHA (aaeCAPTCHA) system in this paper. |
Md Imran Hossen; Xiali Hei; | arxiv-cs.CR | 2022-03-05 |
1113 | Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. |
XIAOQIANG WANG et. al. | arxiv-cs.CL | 2022-03-02 |
1114 | Spike‐Enabled Audio Learning in Multilevel Synaptic Memristor Array‐Based Spiking Neural Network IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition involves the ability to learn the audios which are closely related to event sequence. Although speech recognition has been widely implemented in software neural … |
Xulei Wu; B. Dang; Hong Wang; Xiu-Qing Wu; Yuchao Yang; | Advanced Intelligent Systems | 2022-03-01 |
1115 | Uneven Success: Automatic Speech Recognition and Ethnicity-related Dialects IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Alicia B. Wassink; Cady Gansen; Isabel Bartholomew; | Speech Commun. | 2022-03-01 |
1116 | Multilingual Speech Recognition for GlobalPhone Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Martha Yifiru Tachbelie; S. Abate; Tanja Schultz; | Speech Commun. | 2022-03-01 |
1117 | A Conformer Based Acoustic Model for Robust Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acoustic model. |
Yufeng Yang; Peidong Wang; DeLiang Wang; | arxiv-cs.SD | 2022-03-01 |
1118 | Deep Neural Network Based Chinese Dialect Classification Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the recent advance of neural networks in automatic speech recognition (ASR), Deep Neural Network Based ASR has been widely used in multiple application scenarios such as smart … |
MIAO WAN et. al. | 2021 Ninth International Conference on Advanced Cloud and … | 2022-03-01 |
1119 | Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network, which can be applied to a wider range of tasks. |
Xuankai Chang; Niko Moritz; Takaaki Hori; Shinji Watanabe; Jonathan Le Roux; | arxiv-cs.SD | 2022-03-01 |
1120 | Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel text representation and training framework for E2E ASR models. |
Samuel Thomas; Brian Kingsbury; George Saon; Hong-Kwang J. Kuo; | arxiv-cs.CL | 2022-02-26 |
1121 | Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simpler self-supervised learning (SSL)-based method for language-independent speaker anonymization without any explicit language-dependent model, which can be easily used for other languages. |
Xiaoxiao Miao; Xin Wang; Erica Cooper; Junichi Yamagishi; Natalia Tomashenko; | arxiv-cs.SD | 2022-02-26 |
1122 | A Survey of Multilingual Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we survey the state of the art in multilingual ASR models that are built with cross-lingual transfer in mind. |
Hemant Yadav; Sunayana Sitaram; | arxiv-cs.CL | 2022-02-25 |
1123 | Language Technology Practitioners As Language Managers: Arbitrating Data Bias and Predictive Bias in ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we use the lens of language policy to analyse how current practices in training and testing ASR systems in industry lead to the data bias giving rise to these systematic error differences. |
Nina Markl; Stephen Joseph McNulty; | arxiv-cs.CL | 2022-02-25 |
1124 | Ask2Mask: Guided Data Selection for Masked Speech Modeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames which are randomly masked within an utterance. While these methods … |
M. Baskar; A. Rosenberg; B. Ramabhadran; Yu Zhang; P. Moreno; | IEEE Journal of Selected Topics in Signal Processing | 2022-02-24 |
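The random masking that MSM methods such as wav2vec2 apply can be sketched as follows: start frames are sampled uniformly with some probability, and a fixed-length span from each start is masked. The probability and span length below are illustrative defaults, not values from the paper; Ask2Mask's contribution is precisely to guide this selection with an external scorer rather than sample uniformly.

```python
import random

def mask_spans(num_frames, p=0.065, span=10, seed=0):
    """Sample start frames with probability p and mask a fixed-length
    span from each start (wav2vec2-style random masking).
    Returns the set of masked frame indices."""
    rng = random.Random(seed)  # seeded for reproducibility
    masked = set()
    for t in range(num_frames):
        if rng.random() < p:
            masked.update(range(t, min(t + span, num_frames)))
    return masked

masked = mask_spans(200)
```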
1125 | Korean Tokenization for Beam Search Rescoring in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Korean tokenization method for neural network-based LM used for Korean ASR. |
Kyuhong Shim; Hyewon Bae; Wonyong Sung; | arxiv-cs.CL | 2022-02-22 |
1126 | Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An adversarial attack might entail presenting a model with inaccurate or fabricated samples as its training data, or introducing maliciously designed data to deceive an already trained model. |
NGOC DUNG HUYNH et. al. | arxiv-cs.SD | 2022-02-21 |
1127 | End-to-End Contextual ASR Based on Posterior Distribution Adaptation for Hybrid CTC/Attention System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to add a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases. |
Zhengyi Zhang; Pan Zhou; | arxiv-cs.CL | 2022-02-17 |
1128 | AISHELL-NER: Named Entity Recognition from Chinese Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new dataset AISHELL-NER for NER from Chinese speech. |
BOLI CHEN et. al. | arxiv-cs.CL | 2022-02-17 |
1129 | Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a method for alleviating this issue by transferring knowledge from a language model neural network that can be pretrained with text-only data. |
Yotaro Kubo; Shigeki Karita; Michiel Bacchiani; | arxiv-cs.CL | 2022-02-16 |
1130 | Conversational Speech Recognition By Learning Conversation-level Characteristics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. |
Kun Wei; Yike Zhang; Sining Sun; Lei Xie; Long Ma; | arxiv-cs.SD | 2022-02-15 |
1131 | USTED: Improving ASR with A Unified Speech and Text Encoder-Decoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose training an ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. |
Bolaji Yusuf; Ankur Gandhe; Alex Sokolov; | arxiv-cs.CL | 2022-02-12 |
1132 | ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle those challenges by proposing ASRPU, a programmable accelerator for on-edge ASR. |
Dennis Pinto; Jose-María Arnau; Antonio González; | arxiv-cs.AR | 2022-02-10 |
1133 | Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We further \textbf{(ii)} incorporate language model decoding in the ASR system, along with the fine-tuning method. |
Peter Sullivan; Toshiko Shibano; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2022-02-10 |
1134 | English Speech Emotion Recognition Method Based on Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Man Liu; | International Journal of Speech Technology | 2022-02-08 |
1135 | Understanding The Role of Self Attention for Efficient Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the role of self attention in Transformer-based speech recognition and present a practical technique to design a model that accelerates inference and improves performance. |
Kyuhong Shim; Jungwook Choi; Wonyong Sung; | iclr | 2022-02-08 |
1136 | A Two-step Approach to Leverage Contextual Data: Speech Recognition in Air-traffic Communications IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a two-step callsign boosting approach: (1) at the 1st step (ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the decoding FST (lattices), (2) at the 2nd step (NLP), callsigns extracted from the improved recognition outputs with Named Entity Recognition (NER) are correlated with the surveillance data to select the most suitable one. |
Iuliia Nigmatulina; Juan Zuluaga-Gomez; Amrutha Prasad; Seyyed Saeed Sarfjoo; Petr Motlicek; | arxiv-cs.CL | 2022-02-08 |
1137 | Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We briefly describe the released dataset, track setups, baselines and summarize the challenge results and major techniques used in the submissions. |
FAN YU et. al. | arxiv-cs.SD | 2022-02-08 |
1138 | Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks, and increase scalability of the model to multiple tasks or languages. |
Bethan Thomas; Samuel Kessler; Salah Karout; | arxiv-cs.CL | 2022-02-07 |
1139 | Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a key characteristic in audio-visual speech recognition (AVSR), relating linguistic information observed across visual and audio data has been a challenge, benefiting not only audio/visual speech recognition (ASR/VSR) but also the manipulation of data within/across modalities. In this paper, we present a feature disentanglement-based framework for jointly addressing the above tasks. |
Chih-Chun Yang; Wan-Cyuan Fan; Cheng-Fu Yang; Yu-Chiang Frank Wang; | aaai | 2022-02-07 |
1140 | Polyphonic Pitch Detection with Convolutional Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we outline an online polyphonic pitch detection system that streams audio to MIDI by ConvLSTMs. |
Carl Thomé; Sven Ahlbäck; | arxiv-cs.SD | 2022-02-04 |
1141 | The RoyalFlush System of Speech Recognition for M2MeT Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. |
Shuaishuai Ye; Peiyao Wang; Shunfei Chen; Xinhui Hu; Xinkang Xu; | arxiv-cs.SD | 2022-02-03 |
1142 | Error Correction in ASR Using Sequence-to-Sequence Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The outputs of an ASR system are largely prone to phonetic and spelling errors. In this paper, we propose to use a powerful pre-trained sequence-to-sequence model, BART, further adaptively trained to serve as a denoising model, to correct errors of such types. |
SAMRAT DUTTA et. al. | arxiv-cs.CL | 2022-02-02 |
1143 | Language Dependencies in Adversarial Attacks on Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare the attackability of a German and an English ASR system, taking Deepspeech as an example. |
Karla Markert; Donika Mirdita; Konstantin Böttinger; | arxiv-cs.CL | 2022-02-01 |
1144 | A Bidirectional Context Embedding Transformer for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transformers have become popular in building end-to-end automatic speech recognition (ASR) systems. However, transformer ASR systems are usually trained to give output sequences … |
L. LIAO et. al. | Inf. | 2022-01-29 |
1145 | Reducing Language Context Confusion for End-to-end Code-switching Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory. |
SHUAI ZHANG et. al. | arxiv-cs.CL | 2022-01-28 |
1146 | Sentiment-Aware Automatic Speech Recognition Pre-training for Enhanced Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). |
Ayoub Ghriss; Bo Yang; Viktor Rozgic; Elizabeth Shriberg; Chao Wang; | arxiv-cs.CL | 2022-01-27 |
1147 | Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. |
PIOTR ŻELASKO et. al. | arxiv-cs.SD | 2022-01-26 |
1148 | The Norwegian Parliamentary Speech Corpus IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To test the usefulness of this dataset, we have compared an ASR system trained on the NPSC with a baseline system trained on only manuscript-read speech. |
Per Erik Solberg; Pablo Ortiz; | arxiv-cs.CL | 2022-01-26 |
1149 | Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality as an additional source of information. In this work, the … |
Dmitriy Serdyuk; Otavio Braga; O. Siohan; | Interspeech | 2022-01-25 |
1150 | Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, the information contained in the motion of the speaker’s mouth is used to augment the audio features. |
Dmitriy Serdyuk; Otavio Braga; Olivier Siohan; | arxiv-cs.CV | 2022-01-25 |
1151 | Speech Recognition for Light Control on Raspberry Pi Using Python Programming Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Internet of Things has been substantially developed for disabled and elderly persons in various domains. Speech recognition is an extremely challenging technique for … |
P. Netinant; Krairat Arpabusayapan; Meennapa Rukhiran; | Proceedings of the 2022 5th International Conference on … | 2022-01-21 |
1152 | Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech including accent, age or gender, when … |
MENGZHE GENG et. al. | arxiv-cs.SD | 2022-01-14 |
1153 | Investigation of Data Augmentation Techniques for Disordered Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. |
MENGZHE GENG et. al. | arxiv-cs.SD | 2022-01-14 |
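Speed perturbation, one of the augmentation techniques the paper investigates, resamples the waveform by a factor (commonly 0.9, 1.0, or 1.1), shifting both tempo and pitch. A minimal pure-Python sketch using linear interpolation (the resampling details are illustrative, not taken from the paper):

```python
def speed_perturb(samples, factor):
    """Resample a waveform by `factor` via linear interpolation:
    factor > 1.0 shortens the signal (faster speech),
    factor < 1.0 lengthens it (slower speech)."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                       # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

slow = speed_perturb([0.0, 1.0, 0.0, -1.0], 0.5)  # twice as long
```

Tempo perturbation differs in that it changes duration while preserving pitch, typically via time-scale modification rather than plain resampling.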
1154 | The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. |
Luke Prananta; Bence Mark Halpern; Siyuan Feng; Odette Scharenborg; | arxiv-cs.SD | 2022-01-13 |
1155 | Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a zero-shot learning methodology for CS-ASR by augmenting the monolingual data with artificially generating CS text. |
AMIR HUSSEIN et. al. | arxiv-cs.CL | 2022-01-07 |
1156 | End-to-End Speech to Braille Translation in Japanese Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study addresses an end-to-end braille translation approach from Japanese speech for the deaf-blind. In Japan, automatic Braille translation from spoken language is expected … |
A. Kobayashi; Junji Onishi; H. Nishizaki; N. Kitaoka; | 2022 IEEE International Conference on Consumer Electronics … | 2022-01-07 |
1157 | Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. |
TIEZHENG YU et. al. | arxiv-cs.CL | 2022-01-07 |
1158 | Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences for each partial hypothesis. |
Jinchuan Tian; Jianwei Yu; Chao Weng; Yuexian Zou; Dong Yu; | arxiv-cs.CL | 2022-01-06 |
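The core idea of bringing a word n-gram LM into end-to-end decoding can be illustrated by rescoring competing hypotheses with a toy bigram model. The counts and `lm_weight` below are hypothetical, and the paper's on-the-fly word-lattice construction is considerably more involved than this whole-hypothesis rescoring sketch:

```python
import math

# Toy counts standing in for a trained word n-gram LM (hypothetical data).
BIGRAMS = {("<s>", "speech"): 4, ("speech", "recognition"): 3,
           ("<s>", "speak"): 1, ("speak", "recondition"): 1}
UNIGRAMS = {"<s>": 5, "speech": 4, "speak": 1, "recognition": 3, "recondition": 1}

def lm_logprob(words, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a word sequence."""
    score, prev, vocab = 0.0, "<s>", len(UNIGRAMS)
    for w in words:
        score += math.log((BIGRAMS.get((prev, w), 0) + alpha)
                          / (UNIGRAMS.get(prev, 0) + alpha * vocab))
        prev = w
    return score

def rescore(hypotheses, lm_weight=0.5):
    """Pick the (acoustic_score, words) pair maximizing the fused score."""
    return max(hypotheses, key=lambda h: h[0] + lm_weight * lm_logprob(h[1]))

best = rescore([(-1.0, ["speak", "recondition"]),
                (-1.2, ["speech", "recognition"])])
```

Here the LM overturns the acoustically preferred hypothesis in favor of the more plausible word sequence, which is the effect shallow fusion aims for.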
1159 | Robust Self-Supervised Audio-Visual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a self-supervised AVSR framework built upon Audio-Visual HuBERT (AV-HuBERT), a state-of-the-art audio-visual speech representation learning model. |
Bowen Shi; Wei-Ning Hsu; Abdelrahman Mohamed; | arxiv-cs.SD | 2022-01-05 |
1160 | Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the easiest and most efficient way for … |
Yuanfeng Song; Raymond Chi-Wing Wong; Xuefang Zhao; Di Jiang; | arxiv-cs.DB | 2022-01-04 |
1161 | Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this study, an automatic end-to-end speech recognition system based on hybrid CTC-attention network for Korean language is proposed. Deep neural network/hidden Markov model … |
Hosung Park; Changmin Kim; Hyunsoo Son; Soonshin Seo; Ji-Hwan Kim; | J. Web Eng. | 2022-01-04 |
1162 | ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Spoken Language Understanding (SLU) aims to interpret the meanings of human speeches in order to support various human-machine interaction systems. A key technique for SLU is … |
CHENGYU WANG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1163 | Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders Related Papers Related Patents Related Grants Related Venues Related Experts View |
J. Svec; Filip Polák; A. Bartoš; M. Zapletalová; Martin Víta; | International Conference on Text, Speech and Dialogue | 2022-01-01 |
1164 | Deep Investigation of The Recent Advances in Dialectal Arabic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition systems play an important role in human–machine interactions. Many systems exist for Arabic speech; however, there are limited systems for dialectal Arabic … |
Hamzah A. Alsayadi; A. Abdelhamid; I. Hegazy; Bandar Alotaibi; Z. Fayed; | IEEE Access | 2022-01-01 |
1165 | Automatic Speech Recognition Post-Processing for Readability: Task, Dataset and A Two-Stage Pre-Trained Approach Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Nowadays Automatic Speech Recognition (ASR) systems can accurately recognize which words are said. However, due to disfluencies, grammatical errors, and other phenomena in … |
Junwei Liao; Yu Shi; Yong Xu; | IEEE Access | 2022-01-01 |
1166 | Multi-sequence Intermediate Conditioning for CTC-based ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end automatic speech recognition (ASR) directly maps input speech to a character sequence without using pronunciation lexica. However, in languages with thousands of … |
Yusuke Fujita; Tatsuya Komatsu; Yusuke Kida; | ArXiv | 2022-01-01 |
1167 | Cleanformer: A Microphone Array Configuration-invariant, Streaming, Multichannel Neural Enhancement Frontend for ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This work introduces the Cleanformer, a streaming multichannel neural-based enhancement frontend for automatic speech recognition (ASR). This model has a conformer-based … |
J. Caroselli; A. Naranayan; Tom O’Malley; | ArXiv | 2022-01-01 |
1168 | An E2E-ASR-Based Iteratively-Trained Timestamp Estimator Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Text-to-speech alignment, also known as time alignment, is essential for automatic speech recognition (ASR) systems used for speech retrieval tasks, such as keyword search and … |
Runyan Yang; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan; | IEEE Signal Processing Letters | 2022-01-01 |
1169 | The Performance of Wearable Speech Enhancement System Under Noisy Environment: An Experimental Study Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Wearable speech enhancement can improve the recognition accuracy of speech signals in stationary noise environments at 0 dB to 60 dB signal-to-noise ratio. Beamforming, adaptive … |
Pavani Cherukuru; Mumtaz Begum Mustafa; Hema Subramaniam; | IEEE Access | 2022-01-01 |
1170 | Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech emotion recognition (SER) is essential for understanding a speaker’s intention. Recently, some groups have attempted to improve SER performance using a bidirectional long … |
Jennifer Santoso; Takeshi Yamada; K. Ishizuka; Taiichi Hashimoto; S. Makino; | IEEE Access | 2022-01-01 |
1171 | Estonian Speech Recognition and Transcription Editing Service IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the latest iteration of our Estonian speech recognition system and the publicly available transcription editing service. The system is now based on an … |
Aivo Olev; Tanel Alumäe; | Balt. J. Mod. Comput. | 2022-01-01 |
1172 | Deep Convolutional Neural Network for Arabic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
RAFIK AMARI et. al. | International Conference on Computational Collective … | 2022-01-01 |
1173 | On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic recognition of dysarthric and elderly speech remains a highly challenging task to date. Speaker-level heterogeneity attributed to accent or gender commonly found in normal … |
MENGZHE GENG et. al. | ArXiv | 2022-01-01 |
1174 | MTL-SLT: Multi-Task Learning for Spoken Language Tasks IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Language understanding in speech-based systems has attracted extensive interest from both academic and industrial communities in recent years with the growing demand for … |
ZHIQI HUANG et. al. | NLP4CONVAI | 2022-01-01 |
1175 | A Novel Approach of Audio-Visual Color Recognition Using KNN Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech is one of the most attractive research areas for scientists in the field of machine learning, and Automatic Speech Recognition systems have achieved considerable success. ASR … |
Bachchu Paul; Tanushree Dey; Debashri Das Adhikary; Sanchita Guchhai; Somnath Bera; | 2022-01-01 | |
1176 | Speech Recognition Technologies Based on Artificial Intelligence Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View |
M. Musaev; I. Khujayarov; M. Ochilov; | IEEE International Conference on Healthcare Informatics | 2022-01-01 |
1177 | IBGS: A Wearable Smart System to Assist Visually Challenged IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Traditional blind guide devices are expensive and large. In this study, an intelligent blind guide system (IBGS) was introduced. GD32 is used as the main control chip; it … |
Kun Xia; Xueyong Li; Haiyang Liu; Mingli Zhou; Kexin Zhu; | IEEE Access | 2022-01-01 |
1178 | Natural Backdoor Attacks on Speech Recognition Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Jinwen Xin; X. Lyu; Jing Ma; | International Conference on Machine Learning for Cyber … | 2022-01-01 |
1179 | A Hidden Markov Optimization Model for Processing and Recognition of English Speech Feature Signals Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition plays an important role in human–computer interaction. The higher the accuracy and efficiency of speech recognition are, the larger the improvement of … |
Yinchun Chen; | Journal of Intelligent Systems | 2022-01-01 |
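HMM-based recognition of the kind described above ultimately relies on Viterbi decoding to recover the most likely hidden-state path for an observation sequence. A toy sketch with entirely hypothetical two-state parameters (real acoustic models use Gaussian mixture or neural emission probabilities over many phone states):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence
    under an HMM (classic Viterbi dynamic programming)."""
    # V[t][s] = (best probability of reaching s at time t, best path so far)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-1][prev][1] + [s])
                for prev in states)
            row[s] = (prob, path)
        V.append(row)
    return max(V[-1].values())[1]

# Hypothetical silence/speech HMM decoding three acoustic symbols.
states = ["sil", "sp"]
path = viterbi(["q", "v", "v"], states,
               {"sil": 0.6, "sp": 0.4},
               {"sil": {"sil": 0.7, "sp": 0.3}, "sp": {"sil": 0.2, "sp": 0.8}},
               {"sil": {"q": 0.8, "v": 0.2}, "sp": {"q": 0.1, "v": 0.9}})
```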
1180 | The BiLSTM-based Synthesized Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Dmitry Efanov; P. Aleksandrov; Nikolay Karapetyants; | BICA*AI | 2022-01-01 |
1181 | Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In this article we present the design and the development of a knowledge based computational linguistic tool, Mlphon for Malayalam language. Mlphon computationally models … |
K. Manohar; A. Jayan; R. Rajan; | IEEE Access | 2022-01-01 |
1182 | Emotional Speech Recognition Based on Lip-Reading Related Papers Related Patents Related Grants Related Venues Related Experts View |
E. Ryumina; D. Ivanko; | International Conference on Speech and Computer | 2022-01-01 |
1183 | Performance Disparities Between Accents in Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) services are ubiquitous, transforming speech into text for systems like Amazon’s Alexa, Google’s Assistant, and Microsoft’s Cortana. However, … |
Alex DiChristofano; Henry Shuster; Shefali Chandra; Neal Patwari; | ArXiv | 2022-01-01 |
1184 | Evaluation of Automatic Speech Recognition Approaches Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is essential for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. … |
R. P. MAGALHÃES et. al. | J. Inf. Data Manag. | 2022-01-01 |
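Evaluations like the one above typically report word error rate (WER): the word-level edit distance (substitutions + insertions + deletions) between reference and hypothesis, normalized by reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance,
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

score = wer("the cat sat", "the cat sat down")  # one insertion over 3 words
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions.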
1185 | ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Even though attention-based end-to-end (E2E) automatic speech recognition (ASR) models have been yielding state-of-the-art recognition accuracy, they still fall behind many of the … |
Gaofeng Cheng; Haoran Miao; Runyan Yang; Keqi Deng; Yonghong Yan; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1186 | Taris: An Online Speech Recognition Framework with Sequence to Sequence Neural Networks for Both Audio-only and Audio-visual Speech Related Papers Related Patents Related Grants Related Venues Related Experts View |
George Sterpu; N. Harte; | Comput. Speech Lang. | 2022-01-01 |
1187 | Findings of The Shared Task on Speech Recognition for Vulnerable Individuals in Tamil IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents an overview of the shared task on automatic speech recognition in the Tamil language. In the shared task, spontaneous Tamil speech data gathered from elderly … |
B. B et. al. | LTEDI | 2022-01-01 |
1188 | NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper provides an overview of NVIDIA NeMo’s speech translation systems for the IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) Conformer RNN-T … |
OLEKSII HRINCHUK et. al. | International Workshop on Spoken Language Translation | 2022-01-01 |
1189 | Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Time-frequency (TF) masks are widely used in speech enhancement (SE). However, accurately estimating TF masks from noisy speech remains a challenge to both statistical or neural … |
Suliang Bu; Yunxin Zhao; Tuo Zhao; Shaojun Wang; Mei Han; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1190 | The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the HW-TSC’s designation of the Offline Speech Translation System submitted for IWSLT 2022 Evaluation. We explored both cascade and end-to-end system on three … |
MINGHAN WANG et. al. | International Workshop on Spoken Language Translation | 2022-01-01 |
1191 | CMU’s IWSLT 2022 Dialect Speech Translation System IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes CMU’s submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional … |
BRIAN YAN et. al. | International Workshop on Spoken Language Translation | 2022-01-01 |
1192 | Generative Adversarial Networks for Speech Processing: A Review IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
AAMIR WALI et. al. | Computer Speech & Language | 2022-01-01 |
1193 | JHU IWSLT 2022 Dialect Speech Translation System Description Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper details the Johns Hopkins speech translation (ST) system used in the IWLST2022 dialect speech translation task. Our system uses a cascade of automatic speech … |
Jinyi Yang; A. Hussein; Matthew Wiesner; S. Khudanpur; | International Workshop on Spoken Language Translation | 2022-01-01 |
1194 | Speech Recognition Lab Related Papers Related Patents Related Grants Related Venues Related Experts View |
Alessia Cornaggia; Fahrettin Gökgöz; F. Kurth; Hans-Christian Schmitz; Kevin Wilkinghoff; | ICMCIS | 2022-01-01 |
1195 | A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments … |
Sashi Novitasari; S. Sakti; Satoshi Nakamura; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1196 | Bangla Spoken Numerals Recognition By Using HMM Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech is one of the most natural forms of vocalized communication. Nowadays, with the advancement of machine learning, many doors have opened for finding several … |
Bachchu Paul; Debashri Das Adhikary; Tanushree Dey; Sanchita Guchhait; Somnath Bera; | 2022-01-01 | |
1197 | Towards Representative Subset Selection for Self-Supervised Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR) which is … |
Abdul Hameed Azeemi; I. Qazi; Agha Ali Raza; | ArXiv | 2022-01-01 |
1198 | SSNCSE_NLP@LT-EDI-ACL2022: Speech Recognition for Vulnerable Individuals in Tamil Using Pre-trained XLSR Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition is a tool used to transform human speech into a written form. It is used in a variety of avenues, such as in voice commands, customer service, and … |
Dhanya Srinivasan; B. Bharathi; Thenmozhi Durairaj; B. Senthilkumar; | LTEDI | 2022-01-01 |
1199 | End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention in recent years. Most existing methods feature a signal processing frontend and an ASR … |
WANGYOU ZHANG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1200 | Exploring The Effect of Dialect Mismatched Language Models in Telugu Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Previous research has found that Acoustic Models (AM) of an Automatic Speech Recognition (ASR) system are susceptible to dialect variations within a language, thereby adversely … |
Aditya Yadavalli; Mirishkar Sai Ganesh; A. Vuppala; | North American Chapter of the Association for Computational … | 2022-01-01 |
1201 | Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transformer-based models have led to significant innovation in various classic and practical subjects, including speech processing, natural language processing, and computer … |
Fu-Hao Yu; Kuan-Yu Chen; Keda Lu; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1202 | Seamless Equal Accuracy Ratio for Inclusive CTC Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
HETING GAO et. al. | Speech Commun. | 2022-01-01 |
1203 | Multi-Variant Consistency Based Self-supervised Learning for Robust Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the proposed method on the commercially-motivated dataset, CHiME-4, and the meeting dataset, AMI. |
Changfeng Gao; Gaofeng Cheng; Pengyuan Zhang; | arxiv-cs.SD | 2021-12-23 |
1204 | Voice Quality and Pitch Features in Transformer-Based Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the effects of incorporating voice quality and pitch features altogether and separately to a Transformer-based ASR model, with the intuition that the attention mechanisms might exploit latent prosodic traits. |
Guillermo Cámbara; Jordi Luque; Mireia Farrús; | arxiv-cs.CL | 2021-12-21 |
1205 | JTubeSpeech: Corpus of Japanese Speech Collected from YouTube for Speech Recognition and Speaker Verification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct a new Japanese speech corpus called JTubeSpeech. |
Shinnosuke Takamichi; Ludwig Kürzinger; Takaaki Saeki; Sayaka Shiota; Shinji Watanabe; | arxiv-cs.SD | 2021-12-17 |
1206 | Improving Deep Learning Based Automatic Speech Recognition for Gujarati IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that … |
Deep Raval; Vyom Pathak; Muktan Patel; Brijesh Bhatt; | Transactions on Asian and Low-Resource Language Information … | 2021-12-14 |
1207 | Automatic Speech Recognition for Low-Resource Languages: The Thuee Systems for The IARPA Openasr20 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The paper introduces our Automatic Speech Recognition (ASR) systems for the IARPA Open Automatic Speech Recognition Challenge (OpenASR20) as well as some post explorations with … |
Jing Zhao; Gui-Xin Shi; Guan-Bo Wang; Weiqiang Zhang; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1208 | Detecting Audio Adversarial Examples with Logit Noising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method to detect audio adversarial examples by adding noise to the logits before feeding them into the decoder of the ASR. |
Namgyu Park; Sangwoo Ji; Jong Kim; | arxiv-cs.CR | 2021-12-13 |
1209 | Far-Field Speech Recognition Based on Complex-Valued Neural Networks and Inter-Frame Similarity Difference Method Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Far-field automatic speech recognition (ASR) is a challenging task due to the background noise and reverberation. To address this issue, we introduce a novel end-to-end … |
Y. Guo; Yifan Chen; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1210 | PM-MMUT: Boosted Phone-Mask Data Augmentation Using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost the performance of PMT, we propose multi-modeling unit training (MMUT) architecture fusion with PMT (PM-MMUT). |
Guodong Ma; Pengfei Hu; Nurmemet Yolwas; Shen Huang; Hao Huang; | arxiv-cs.SD | 2021-12-13 |
1211 | Data Augmentation for ASR Using TTS Via A Discrete Representation IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While end-to-end automatic speech recognition (ASR) has achieved high performance, it requires a huge amount of paired speech and transcription data for training. Recently, data … |
Sei Ueno; M. Mimura; S. Sakai; Tatsuya Kawahara; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1212 | Improving ASR Error Correction Using N-Best Hypotheses IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the field of Automatic Speech Recognition (ASR), Grammatical Error Correction (GEC) can be used to correct errors in recognition results of ASR systems and whereby it further … |
Linchen Zhu; Wenjie Liu; Linquan Liu; Ed Lin; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1213 | Improving Speech Recognition on Noisy Speech Via Speech Enhancement with Multi-Discriminators CycleGAN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method named Multi-discriminators CycleGAN to reduce the noise of input speech and thereby improve automatic speech recognition performance. |
Chia-Yu Li; Ngoc Thang Vu; | arxiv-cs.CL | 2021-12-12 |
1214 | Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we used a pre-trained acoustic model to generate a perceptual loss that makes speech enhancement more aware of the phonetic properties of the signal. |
Peter Plantinga; Deblin Bagchi; Eric Fosler-Lussier; | arxiv-cs.SD | 2021-12-11 |
1215 | Building A Great Multi-lingual Teacher with Sparsely-gated Mixture of Experts for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm in order to achieve better accuracy. |
KENICHI KUMATANI et. al. | arxiv-cs.CL | 2021-12-10 |
1216 | Sequence-level Self-learning with Multiple Hypotheses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). |
KENICHI KUMATANI et. al. | arxiv-cs.CL | 2021-12-10 |
1217 | Revisiting The Boundary Between ASR and NLU in The Age of Conversational Dialog Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of the observations we make in this paper, we argue that (1) NLU should be cognizant of the presence of ASR models being used upstream in a dialog system’s pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end datasets that provide semantic annotations on spoken input, (4) there should be stronger collaboration between ASR and NLU research communities. |
Manaal Faruqui; Dilek Hakkani-Tür; | arxiv-cs.CL | 2021-12-10 |
1218 | A Sequence-to-sequence Based Error Correction Model for Medical Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The use of Automatic Speech Recognition (ASR) systems in medical applications is receiving rapidly growing interest due to their ability to reduce distractions and the cognitive … |
Yu Jiang; C. Poellabauer; | 2021 IEEE International Conference on Bioinformatics and … | 2021-12-09 |
1219 | CommanderGabble: A Universal Attack Against ASR Systems Leveraging Fast Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated … |
Zhaohe Zhang; Edwin Yang; Song Fang; | Annual Computer Security Applications Conference | 2021-12-06 |
1220 | Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. |
JINCHUAN TIAN et. al. | arxiv-cs.AI | 2021-12-05 |
1221 | Multimodal N-best List Rescoring with Weakly Supervised Pre-training in Hybrid Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: N-best list rescoring, an essential step in hybrid automatic speech recognition (ASR), aims to re-evaluate the N-best hypothesis list decoded by the acoustic model (AM) and … |
Yuanfeng Song; Xiaoling Huang; Xuefang Zhao; Di Jiang; Raymond Chi-Wing Wong; | 2021 IEEE International Conference on Data Mining (ICDM) | 2021-12-01 |
1222 | Joint Modeling of Code-Switched and Monolingual ASR Via Conditional Factorization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. |
BRIAN YAN et. al. | arxiv-cs.CL | 2021-11-29 |
1223 | Romanian Speech Recognition Experiments from The ROBIN Project Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents different speech recognition experiments with deep neural networks, focusing on producing fast models (under 100 ms latency from the network itself) that are still reliable. |
Andrei-Marius Avram; Vasile Păiş; Dan Tufiş; | arxiv-cs.CL | 2021-11-23 |
1224 | Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we look into this challenge by utilizing the location information of target speakers in the 3D space for the first time. |
Yiwen Shao; Shi-Xiong Zhang; Dong Yu; | arxiv-cs.SD | 2021-11-22 |
1225 | Speech-T: Transducer for Text to Speech and Beyond IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that monotonic alignments are also critical to text to speech (TTS) synthesis and streaming TTS is also an important application scenario, in this work, we explore the possibility of applying Transducer to TTS and more. |
JIAWEI CHEN et. al. | nips | 2021-11-20 |
1226 | PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results. |
CHENG-I JEFF LAI et. al. | nips | 2021-11-20 |
1227 | FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. |
YICHONG LENG et. al. | nips | 2021-11-20 |
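The FastCorrect entry above builds on edit alignment between an ASR hypothesis and its reference, i.e. identifying the insertions, deletions, and substitutions that separate the two. A minimal sketch of that alignment (the standard Levenshtein dynamic program, not the paper's NAR model; the function name and example are illustrative only):

```python
def edit_alignment(ref, hyp):
    """Align two token sequences with dynamic programming and count the
    insertions, deletions, and substitutions separating them (the
    classic Levenshtein alignment underlying WER and error correction)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = minimum edits to align ref[:i] with hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    # Backtrack to recover the operation counts.
    ops = {"ins": 0, "del": 0, "sub": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops["sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops["del"] += 1
            i -= 1
        else:
            ops["ins"] += 1
            j -= 1
    return ops

ops = edit_alignment("the cat sat".split(), "the hat sat down".split())
# one substitution (cat -> hat) and one insertion (down)
```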
1228 | SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. |
SUWON SHON et. al. | arxiv-cs.CL | 2021-11-19 |
1229 | Simultaneous Speech-to-Speech Translation System with Transformer-Based Incremental ASR, MT, and TTS Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present an English-to-Japanese simultaneous speech-to-speech translation (S2ST) system. It has three Transformer-based incremental processing modules for S2ST: … |
RYO FUKUDA et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1230 | GAMVA: A Japanese Audio-Visual Multi-Angle Speech Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-visual speech recognition (AVSR) has contributed to improve Automatic Speech Recognition (ASR) accuracy in noisy environments. In real-world scenarios, a speaker does not … |
SHINNOSUKE ISOBE et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1231 | Khmer Speech Translation Corpus of The Extraordinary Chambers in The Courts of Cambodia (ECCC) Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech translation (ST) is a subject of rapidly increasing interest in the area of speech processing research. This interest is apparent from the increasing tools and corpora for … |
KAK SOKY et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1232 | M2ASR-MONGO: A Free Mongolian Speech Database and Accompanied Baselines Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep learning has significantly improved the performance of automatic speech recognition (ASR), in particular for major languages such as English and Chinese. However, for minor … |
Tiankai Zhi; Ying Shi; Wenqiang Du; Guanyu Li; Dong Wang; | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1233 | Investigation of A Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Due to the presence of distortion, most of the single-channel frequency-domain speech enhancement (SE) approaches are still challenging for downstream automatic speech recognition … |
MAHBUB E. NOOR et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1234 | A Multi-Genre Urdu Broadcast Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper reports the development of a multi-genre Urdu Broadcast (BC) corpus and a Large Vocabulary Continuous Speech Recognition (LVCSR) system. BC speech corpus of 98 hours … |
Erbaz Khan; Sahar Rauf; F. Adeeba; S. Hussain; | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1235 | Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. |
Yi-Chang Chen; Chun-Yen Cheng; Chien-An Chen; Ming-Chieh Sung; Yi-Ren Yeh; | arxiv-cs.CL | 2021-11-16 |
1236 | Analysis of Data Augmentation Methods for Low-Resource Maltese ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. |
ANDREA DEMARCO et. al. | arxiv-cs.CL | 2021-11-15 |
1237 | Visualizing Automatic Speech Recognition – Means for A Better Understanding? Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) is improving ever more at mimicking human speech processing. The functioning of ASR, however, remains to a large extent obfuscated by the … |
KARLA MARKERT et. al. | ArXiv | 2021-11-10 |
1238 | Scaling ASR Improves Zero and Few Shot Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose data selection techniques to efficiently scale training data to find the most valuable samples in massive datasets. |
ALEX XIAO et. al. | arxiv-cs.CL | 2021-11-10 |
1239 | Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that it is possible to retrieve the gender of the speaker, but also his identity, by just exploiting the weight matrix changes of a neural acoustic model locally adapted to this speaker. |
Salima Mdhaffar; Jean-François Bonastre; Marc Tommasi; Natalia Tomashenko; Yannick Estève; | arxiv-cs.CL | 2021-11-07 |
1240 | Protection Method Based on Multiple Sub-Detectors Against Audio Adversarial Examples Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Applications with audio speech recognition usually involve personal and authentication information; therefore, security measurement for audio speech recognition is one of the most … |
Keiichi Tamura; Hajime Ito; | 2021 IEEE 12th International Workshop on Computational … | 2021-11-06 |
1241 | Context-Aware Transformer Transducer for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel context-aware transformer transducer (CATT) network that improves the state-of-the-art transformer-based ASR system by taking advantage of such contextual signals. |
FENG-JU CHANG et. al. | arxiv-cs.CL | 2021-11-05 |
1242 | Effective Cross-Utterance Language Modeling for Conversational Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To flesh out our ideas, we frame the ASR N-best hypothesis rescoring task as a prediction problem, leveraging BERT, an iconic pre-trained LM, as the ingredient vehicle to facilitate selection of the oracle hypothesis from a given N-best hypothesis list. |
Bi-Cheng Yan; Hsin-Wei Wang; Shih-Hsuan Chiu; Hsuan-Sheng Chiu; Berlin Chen; | arxiv-cs.CL | 2021-11-05 |
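Several entries above (e.g. 1221 and 1242) rely on N-best hypothesis rescoring: a list of candidate transcriptions from the first-pass decoder is re-ranked using an external language model. A toy sketch of the basic interpolation, assuming hypothetical scores in place of a real acoustic model and a real pre-trained LM:

```python
def rescore_nbest(nbest, lm_score, lam=0.5):
    """Re-rank an ASR N-best list by interpolating each hypothesis's
    first-pass log-score with an external language-model log-score.
    `nbest` is a list of (hypothesis, asr_log_score) pairs; `lm_score`
    maps a hypothesis string to a log-probability."""
    rescored = [(hyp, (1 - lam) * asr + lam * lm_score(hyp)) for hyp, asr in nbest]
    # Highest combined score wins.
    return max(rescored, key=lambda pair: pair[1])[0]

# Hypothetical stand-in for a pre-trained LM: prefers the fluent hypothesis.
toy_lm = {"recognize speech": -1.0, "wreck a nice beach": -5.0}.get
best = rescore_nbest([("wreck a nice beach", -2.0), ("recognize speech", -2.5)], toy_lm)
# the LM term outweighs the small acoustic gap, so "recognize speech" wins
```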
1243 | Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset. |
Natalia Tomashenko; Salima Mdhaffar; Marc Tommasi; Yannick Estève; Jean-François Bonastre; | arxiv-cs.CL | 2021-11-05 |
1244 | Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that by adding a relatively small number of extra parameters to the encoder layers via so-called residual adapter, we can achieve similar adaptation gains compared to model fine-tuning, while only updating a tiny fraction (less than 0.5%) of the model parameters. |
Katrin Tomanek; Vicky Zayats; Dirk Padfield; Kara Vaillancourt; Fadi Biadsy; | emnlp | 2021-11-05 |
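The residual-adapter entry above hinges on a parameter-count argument: a bottleneck adapter adds only a down-projection and an up-projection per layer, a tiny fraction of the full model. A back-of-the-envelope sketch of that count (the sizes below are illustrative, not the paper's configuration):

```python
def adapter_param_fraction(d_model, bottleneck, n_layers, total_params):
    """Rough parameter count for bottleneck residual adapters: each
    adapter adds a down-projection (d_model x bottleneck) and an
    up-projection (bottleneck x d_model), plus biases, per layer.
    Returns the added parameters as a fraction of the full model."""
    per_adapter = 2 * d_model * bottleneck + bottleneck + d_model
    added = n_layers * per_adapter
    return added / total_params

# Hypothetical encoder: 17 layers, width 512, bottleneck 16, 100M params.
frac = adapter_param_fraction(512, 16, 17, 100_000_000)
# well under the 0.5% of parameters quoted in the highlight
```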
1245 | Sequential Randomized Smoothing for Adversarially Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion. |
Raphael Olivier; Bhiksha Raj; | arxiv-cs.CL | 2021-11-05 |
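The randomized-smoothing entry above defends ASR by decoding many noise-perturbed copies of the input and aggregating the outputs. A minimal sketch of that idea with majority voting over transcripts (a toy recognizer stands in for a real ASR model; the sequential/ROVER-style aggregation of the paper is not reproduced):

```python
import random
from collections import Counter

def smoothed_transcribe(audio, recognize, sigma=0.1, n=25, seed=0):
    """Randomized-smoothing style decoding sketch: run the recognizer on
    several Gaussian-perturbed copies of the input and return the
    majority-vote transcript. `recognize` is any audio -> string model."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n):
        noisy = [x + rng.gauss(0.0, sigma) for x in audio]
        votes[recognize(noisy)] += 1
    transcript, _ = votes.most_common(1)[0]
    return transcript

# Hypothetical recognizer: thresholds the mean sample value.
toy_asr = lambda a: "yes" if sum(a) / len(a) > 0.0 else "no"
out = smoothed_transcribe([0.5, 0.6, 0.4], toy_asr, sigma=0.1)
```

Small adversarial perturbations get drowned out by the injected noise, which is what makes the aggregated prediction certifiably stable.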
1246 | A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explored partial fine-tuning and entire fine-tuning on wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. |
Yingzhi Wang; Abdelmoumene Boumadane; Abdelwahab Heba; | arxiv-cs.CL | 2021-11-04 |
1247 | Speech Recognition for Air Traffic Control Via Feature Learning and End-to-end Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. |
Peng Fan; Dongyue Guo; Yi Lin; Bo Yang; Jianwei Zhang; | arxiv-cs.SD | 2021-11-04 |
1248 | Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of suppressing background noise with a conventional cascaded pipeline, we employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. |
HEMING WANG et. al. | arxiv-cs.SD | 2021-10-28 |
1249 | WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. |
SANYUAN CHEN et. al. | arxiv-cs.CL | 2021-10-26 |
1250 | ViDA-MAN: Visual Dialog with Digital Humans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers realtime audio-visual responses to instant speech inquiries. |
TONG SHEN et. al. | arxiv-cs.CV | 2021-10-25 |
1251 | A Speech Recognition Algorithm of Speaker-Independent Chinese Isolated Words Based on RNN-LSTM and Attention Mechanism Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The speech recognition technology of isolated words is one of the widely used speech recognition technologies at present. The isolated Words Speech Recognition technology for … |
Qiuyun Hao; Fuqiang Wang; Xiaofeng Ma; Peng Zhang; | 2021 14th International Congress on Image and Signal … | 2021-10-23 |
1252 | Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an embedding aligner and modality switch training to better align the speech and text latent spaces. |
WEI WANG et. al. | arxiv-cs.SD | 2021-10-23 |
1253 | Research of Automatic Speech Recognition of Asante-Twi Dialect For Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a new way of building low-resourced dialect Automatic Speech Recognition (ASR) systems using a small database using the Asante-Twi dialect. Three different ASR … |
Adwoa Agyeiwaa Boakye-Yiadom; Mingwei Qin; Ren Jing; | Proceedings of the 2021 5th International Conference on … | 2021-10-22 |
1254 | A Preliminary Study on Wav2Vec 2.0 Embeddings for Text-to-Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Wav2Vec 2.0 (W2V), a self-supervised speech representation trained with massive unlabeled speech data, showed promising results on Automatic Speech Recognition (ASR). In spite of … |
Yohan Lim; Namhyeong Kim; Seung Yun; Sang-Hun Kim; Seung-Ik Lee; | 2021 International Conference on Information and … | 2021-10-20 |
1255 | An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. |
Huaibo Zhao; Yosuke Higuchi; Tetsuji Ogawa; Tetsunori Kobayashi; | arxiv-cs.SD | 2021-10-20 |
1256 | Speech Pattern Based Black-box Model Watermarking for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. |
HAOZHE CHEN et. al. | arxiv-cs.SD | 2021-10-19 |
1257 | SLAM: A Unified Encoder for Speech and Language Modeling Via Speech-Text Joint Pre-Training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. |
ANKUR BAPNA et. al. | arxiv-cs.CL | 2021-10-19 |
1258 | AequeVox: Automated Fairness Testing of Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce, AequeVox, an automated testing framework for evaluating the fairness of ASR systems. |
Sai Sathiesh Rajan; Sakshi Udeshi; Sudipta Chattopadhyay; | arxiv-cs.LG | 2021-10-19 |
1259 | Improving Model Stability and Training Efficiency in Fast, High Quality Expressive Voice Conversion System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Voice conversion (VC) systems have made significant progress owing to advanced deep learning methods. Current research is not only concerned with high-quality and fast audio … |
ZHIYUAN ZHAO et. al. | Companion Publication of the 2021 International Conference … | 2021-10-18 |
1260 | Measuring Frequency of Child-directed WH-Question Words for Alternate Preschool Locations Using Speech Recognition and Location Tracking Technologies Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech and language development in children are crucial for ensuring effective skills in their long-term learning ability. A child’s vocabulary size at the time of entry into … |
PRASANNA V. KOTHALKAR et. al. | Companion Publication of the 2021 International Conference … | 2021-10-18 |
1261 | Analysis of French Phonetic Idiosyncrasies for Accent Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using spectrograms of speech signals, we propose a multi-class classification framework for accent recognition. |
Pierre Berjon; Avishek Nag; Soumyabrata Dev; | arxiv-cs.CL | 2021-10-18 |
1262 | Multilingual Speech Recognition Using Knowledge Transfer Across Learning Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to enhance the multilingual ASR performance in two ways: 1) studying the impact of feeding a one-hot vector identifying the language, and 2) formulating the task with a meta-learning objective combined with self-supervised learning (SSL). |
Rimita Lahiri; Kenichi Kumatani; Eric Sun; Yao Qian; | arxiv-cs.CL | 2021-10-15 |
1263 | CORAA: A Large Corpus of Spontaneous and Prepared Speech Manually Validated for Speech Recognition in Brazilian Portuguese IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents CORAA (Corpus of Annotated Audios) v1. |
ARNALDO CANDIDO JUNIOR et. al. | arxiv-cs.CL | 2021-10-14 |
1264 | M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems. |
FAN YU et. al. | arxiv-cs.SD | 2021-10-14 |
1265 | A Chinese Speech Recognition System Based on Fusion Network Structure Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The purpose of an automatic speech recognition system is to convert speech into recognizable text. Chinese is a language in which the same pronunciation but different writing … |
LUNVI GUO et. al. | 2021 IEEE 21st International Conference on Communication … | 2021-10-13 |
1266 | Prompt-tuning in ASR Systems for Efficient Domain-adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we overcome the problem using prompt-tuning, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain. |
SAKET DINGLIWAL et. al. | arxiv-cs.CL | 2021-10-13 |
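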
1267 | Corpus Design and Automatic Speech Recognition for Deaf and Hard-of-Hearing People Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study describes automatic speech recognition (ASR) for the deaf and hard-of-hearing people. In the relevant literature, ASR for the deaf has been studied in a manner similar … |
A. Kobayashi; K. Yasu; H. Nishizaki; N. Kitaoka; | 2021 IEEE 10th Global Conference on Consumer Electronics … | 2021-10-12 |
1268 | Emotion Recognition Combining Acoustic and Linguistic Features Based on Speech Recognition Results Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this study, a speech emotion recognition method that uses both acoustic and linguistic features is studied. Various emotion recognition methods using both the abovementioned … |
Misaki Sakurai; T. Kosaka; | 2021 IEEE 10th Global Conference on Consumer Electronics … | 2021-10-12 |
1269 | Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose evaluating ASR output hypotheses quality with SemDist that can measure semantic correctness by using the distance between the semantic vectors of the reference and hypothesis extracted from a pre-trained language model. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2021-10-11 |
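The SemDist entry above evaluates ASR quality by the distance between semantic vectors of the reference and the hypothesis rather than by literal token edits. The core distance is simply one minus cosine similarity; a minimal sketch (in a real system the vectors would come from a pre-trained language model, not be hand-supplied):

```python
import math

def sem_dist(ref_vec, hyp_vec):
    """Semantic-distance sketch in the spirit of SemDist: one minus the
    cosine similarity between sentence embeddings of the reference
    transcript and the ASR hypothesis."""
    dot = sum(r * h for r, h in zip(ref_vec, hyp_vec))
    norm = (math.sqrt(sum(r * r for r in ref_vec))
            * math.sqrt(sum(h * h for h in hyp_vec)))
    return 1.0 - dot / norm

# Identical embeddings give distance 0; orthogonal embeddings give 1,
# so a semantically faithful hypothesis scores near 0 even if its
# surface form (and thus its WER) differs from the reference.
```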
1270 | K-Wav2vec 2.0: Automatic Speech Recognition Based on Joint Decoding of Graphemes and Syllables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present K-Wav2Vec 2.0, which is a modified version of Wav2vec 2.0 designed for Korean automatic speech recognition by exploring and optimizing various factors of the original Wav2vec 2.0. |
Jounghee Kim; Pilsung Kang; | arxiv-cs.CL | 2021-10-11 |
1271 | Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning. |
YIMING WANG et. al. | arxiv-cs.CL | 2021-10-10 |
1272 | An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the general applications of pretrained speech representations, on advanced end-to-end automatic speech recognition (E2E-ASR) models. |
XUANKAI CHANG et. al. | arxiv-cs.CL | 2021-10-09 |
1273 | Magic Dust for Cross-lingual Adaptation of Monolingual Wav2vec-2.0 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages. |
Sameer Khurana; Antoine Laurent; James Glass; | arxiv-cs.CL | 2021-10-07 |
1274 | Explaining The Attention Mechanism of End-to-End Speech Recognition Using Decision Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we use decision trees to explain how the attention mechanism impacts itself in speech recognition. |
Yuanchao Wang; Wenji Du; Chenghao Cai; Yanyan Xu; | arxiv-cs.CL | 2021-10-07 |
1275 | WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, 22400+ hours in total. |
BINBIN ZHANG et. al. | arxiv-cs.SD | 2021-10-07 |
1276 | Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, this paper uses the recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS. |
Liang-Hsuan Tseng; Yu-Kuan Fu; Heng-Jui Chang; Hung-yi Lee; | arxiv-cs.CL | 2021-10-07 |
1277 | FAST-RIR: Fast Neural Diffuse Room Impulse Response Generator IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. |
ANTON RATNARAJAH et. al. | arxiv-cs.SD | 2021-10-07 |
1278 | Spell My Name: Keyword Boosted Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords, which in turn enables better readability of the results. |
Namkyu Jung; Geonmin Kim; Joon Son Chung; | arxiv-cs.SD | 2021-10-06 |
1279 | Integrating Categorical Features in End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we treat all these aspects as categorical information in an ASR system, and propose a simple yet effective way to integrate categorical features into E2E model. |
Rongqing Huang; | arxiv-cs.CL | 2021-10-06 |
1280 | Is Attention Always Needed? A Case Study on Language Identification from Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The present study introduces a convolutional recurrent neural network (CRNN)-based LID system, designed to operate on the Mel-frequency Cepstral Coefficient (MFCC) features of audio samples. |
Atanu Mandal; Santanu Pal; Indranil Dutta; Mahidas Bhattacharya; Sudip Kumar Naskar; | arxiv-cs.LG | 2021-10-05 |
1281 | BERT Attends The Conversation: Improving Low-Resource Conversational ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose new, data-efficient training tasks for BERT models that improve performance of automatic speech recognition (ASR) systems on conversational speech. |
Pablo Ortiz; Simen Burud; | arxiv-cs.CL | 2021-10-05 |
1282 | Evaluation of Automatic Speech Recognition Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and … |
MATHEUS XAVIER SAMPAIO et. al. | SBBD | 2021-10-04 |
1283 | Audio Steganography with Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples that are intentionally crafted by adding small perturbations to the original input. Most works focus on … |
HAO TAN et. al. | 2021 IEEE Sixth International Conference on Data Science in … | 2021-10-01 |
1284 | SpliceOut: A Simple and Efficient Audio Augmentation Method IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SpliceOut, a simple modification to time masking which makes it computationally more efficient. |
Arjit Jain; Pranay Reddy Samala; Deepak Mittal; Preethi Jyoti; Maneesh Singh; | arxiv-cs.SD | 2021-09-30 |
1285 | FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. |
YICHONG LENG et. al. | arxiv-cs.CL | 2021-09-29 |
1286 | Challenges and Opportunities of Speech Recognition for Bengali Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research work, we thoroughly survey the current status of research efforts on Bengali ASR systems. |
M. F. Mridha; Abu Quwsar Ohi; Md. Abdul Hamid; Muhammad Mostafa Monowar; | arxiv-cs.CL | 2021-09-27 |
1287 | Audio-Visual Speech Recognition Is Worth $32\times 32\times 8$ Voxels Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-visual automatic speech recognition (AV-ASR) introduces the video modality into the speech recognition process, often by relying on information conveyed by the motion of … |
Dmitriy Serdyuk; Otavio Braga; O. Siohan; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-09-20 |
1288 | Audio-Visual Speech Recognition Is Worth 32$\times$32$\times$8 Voxels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to replace the 3D convolutional visual front-end with a video transformer front-end. |
Dmitriy Serdyuk; Otavio Braga; Olivier Siohan; | arxiv-cs.CV | 2021-09-20 |
1289 | Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. |
FELIX WU et. al. | arxiv-cs.CL | 2021-09-14 |
1290 | Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. |
Anoop C S; Prathosh A P; A G Ramakrishnan; | arxiv-cs.CL | 2021-09-12 |
1291 | Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the self-attention channel combinator (SACC) ASR frontend, which leverages the self-attention mechanism to combine multichannel audio signals in the magnitude spectral domain. |
RONG GONG et. al. | arxiv-cs.SD | 2021-09-10 |
1292 | Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Current ASR systems show poor performance in recognition of children’s speech in noisy environments because recognizers are typically trained with clean adults’ speech and … |
H. Kathania; Sudarsana Reddy Kadiri; P. Alku; M. Kurimo; | Applied Sciences | 2021-09-10 |
1293 | DeepEMO: Deep Learning for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an industry-level deep learning approach for the speech emotion recognition task. |
Enkhtogtokh Togootogtokh; Christian Klasen; | arxiv-cs.SD | 2021-09-09 |
1294 | Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel tree-constrained pointer generator (TCPGen) component is proposed that incorporates such knowledge as a list of biasing words into both attention-based encoder-decoder and transducer end-to-end ASR models in a neural-symbolic way. |
Guangzhi Sun; Chao Zhang; Philip C. Woodland; | arxiv-cs.CL | 2021-09-01 |
1295 | ETLT 2021: Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The paper presents the Second ASR Challenge for Non-native Children’s Speech proposed as a Special Session at Interspeech 2021, following the successful first challenge at … |
R. GRETTER et. al. | Interspeech | 2021-08-30 |
1296 | Adversarial Example Devastation and Detection on Speech Recognition System By Adding Random Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm for the devastation and detection of adversarial examples that can attack current advanced ASR systems. |
Mingyu Dong; Diqun Yan; Yongkang Gong; Rangding Wang; | arxiv-cs.SD | 2021-08-30 |
1297 | Weakly Supervised Construction of ASR Systems from Massive Video Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Despite the rapid development of deep learning models, for real-world applications, building large-scale Automatic Speech Recognition (ASR) systems from scratch is still … |
Mengli Cheng; Chengyu Wang; Jun Huang; Xiaobo Wang; | Interspeech | 2021-08-30 |
1298 | ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate research on ASR-robust general language understanding, in this paper we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR errors across 3 different levels of background noise and 6 speakers with various voice characteristics. |
LINGYUN FENG et. al. | arxiv-cs.CL | 2021-08-30 |
1299 | You Don’t Understand Me!: Comparing ASR Results for L1 and L2 Speakers of Swedish IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The performance of Automatic Speech Recognition (ASR) systems has constantly increased in state-of-the-art development. However, performance tends to decrease considerably in more … |
Ronald Cumbal; Birger Moell; José Lopes; Olov Engwall; | Interspeech | 2021-08-30 |
1300 | BERT-Based Semantic Model for Rescoring N-Best Speech Recognition List IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This work aims to improve automatic speech recognition (ASR) by modeling long-term semantic relations. We propose to perform this through rescoring the ASR N-best hypotheses list. … |
D. Fohr; I. Illina; | Interspeech | 2021-08-30 |
1301 | Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR). |
INJY HAMED et. al. | arxiv-cs.CL | 2021-08-29 |
1302 | Improving Callsign Recognition with Air-surveillance Data in Air-traffic Communication IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate two approaches: (1) G-boosting, when callsigns weights are adjusted at language model level (G) and followed by the dynamic decoder with an on-the-fly composition, and (2) lattice rescoring when callsign information is introduced on top of lattices generated using a conventional decoder. |
Iuliia Nigmatulina; Rudolf Braun; Juan Zuluaga-Gomez; Petr Motlicek; | arxiv-cs.CL | 2021-08-27 |
1303 | Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to (1) automatically segment the ATCO and pilot data based on an intuitive approach exploiting ASR transcripts and (2) subsequently consider an automatic recognition of ATCOs’ and pilots’ voice as two separate tasks. |
AMRUTHA PRASAD et. al. | arxiv-cs.CL | 2021-08-27 |
1304 | Injecting Text in Self-Supervised Speech Pretraining IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to jointly learn representations during pretraining from two different modalities: speech and text. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2021-08-27 |
1305 | Task-aware Warping Factors in Mask-based Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the use of two task-aware warping factors in mask-based speech enhancement (SE). |
Qiongqiong Wang; Kong Aik Lee; Takafumi Koshinaka; Koji Okabe; Hitoshi Yamamoto; | arxiv-cs.SD | 2021-08-27 |
1306 | Spontaneous Speech Summarization: Transformers All The Way Through Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper proposes a speech summarization system for spontaneous speech. The proposed system consists of speech segmentation, speech recognition, and extractive text … |
Tomoki Hayashi; Takenori Yoshimura; Masaya Inuzuka; Ibuki Kuroyanagi; Osamu Segawa; | 2021 29th European Signal Processing Conference (EUSIPCO) | 2021-08-23 |
1307 | Automatic Speech Recognition And Limited Vocabulary: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. |
Jean Louis K. E. Fendji; Diane C. M. Tala; Blaise O. Yenke; Marcellin Atemkeng; | arxiv-cs.AI | 2021-08-23 |
1308 | Few-Shot Learning for Frame-Wise Phoneme Recognition: Adaptation of Matching Networks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, the topic of Few-Shot Learning (FSL) is emerging as a radical direction in machine learning, well established with a variety of paradigms and network realizations for … |
TIRTHANKAR BANERJEE et. al. | 2021 29th European Signal Processing Conference (EUSIPCO) | 2021-08-23 |
1309 | Data Augmentation Using CycleGAN for End-to-End Children ASR IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent deep learning algorithms are known to perform better for Automatic Speech Recognition (ASR) of adult speakers; however, it remains a challenge to recognize children’s … |
D. K. Singh; Preet P. Amin; Hardik B. Sailor; H. Patil; | 2021 29th European Signal Processing Conference (EUSIPCO) | 2021-08-23 |
1310 | Multilingual Speech Recognition for Low-Resource Indian Languages Using Multi-Task Conformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages. |
Krishna D N; | arxiv-cs.CL | 2021-08-22 |
1311 | Hierarchical Summarization for Longform Spoken Dialog IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we design a two-stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges. |
Daniel Li; Thomas Chen; Albert Tung; Lydia Chilton; | arxiv-cs.CL | 2021-08-21 |
1312 | A Light-weight Contextual Spelling Correction Model for Customizing Transducer-based Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. |
Xiaoqiang Wang; Yanqing Liu; Sheng Zhao; Jinyu Li; | arxiv-cs.CL | 2021-08-17 |
1313 | Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an approach, "Mondegreen", to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints or privacy or bandwidth (for example, some ASR systems run on-device) considerations. |
SUKHDEEP S. SODHI et. al. | kdd | 2021-08-12 |
1314 | Meaning Error Rate: ASR Domain-specific Metric Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we build a speech recognition quality evaluation framework that unifies feedback coming from different types of customers into a single metric. |
Ludmila Gordeeva; Vasily Ershov; Oleg Gulyaev; Igor Kuralenok; | kdd | 2021-08-12 |
1315 | On The Compensation Between Magnitude and Phase in Speech Separation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. |
Zhong-Qiu Wang; Gordon Wichern; Jonathan Le Roux; | arxiv-cs.SD | 2021-08-11 |
1316 | The HW-TSC’s Offline Speech Translation Systems for IWSLT 2021 Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our work in participation of the IWSLT-2021 offline speech translation task. |
MINGHAN WANG et. al. | arxiv-cs.CL | 2021-08-09 |
1317 | StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized By Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome this problem, we propose the use of automatic speech recognition to assist model training, to improve StarGAN-VC, especially in low-resource scenarios. |
Shoki Sakamoto; Akira Taniguchi; Tadahiro Taniguchi; Hirokazu Kameoka; | arxiv-cs.SD | 2021-08-09 |
1318 | Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we exploit the scope of the transformer distillation method that is specifically designed for knowledge distillation from a transformer based language model to a transformer based speech model. |
Yidi Jiang; Bidisha Sharma; Maulik Madhavi; Haizhou Li; | arxiv-cs.CL | 2021-08-05 |
1319 | Dyn-ASR: Compact, Multilingual Speech Recognition Via Spoken Language and Accent Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach to enable multilingual speech recognition on edge devices. |
Sangeeta Ghangam; Daniel Whitenack; Joshua Nemecek; | arxiv-cs.CL | 2021-08-04 |
1320 | An Intelligent Hybrid–Integrated System Using Speech Recognition and A 3D Display for Early Childhood Education IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the past few years, people’s attitudes toward early childhood education (PAUD) have undergone a complete transformation. Personalized and intelligent communication methods are … |
Kun Xia; Xinghao Xie; Hongliang Fan; Haiyang Liu; | Electronics | 2021-08-03 |
1321 | Improving Distinction Between ASR Errors and Speech Disfluencies with Feature Space Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a scheme to improve existing LM-based ASR error detection systems, both in terms of detection scores and resilience to such distracting auxiliary tasks. |
SEONGMIN PARK et. al. | arxiv-cs.CL | 2021-08-03 |
1322 | Decoupling Recognition and Transcription in Mandarin ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. |
JIAHONG YUAN et. al. | arxiv-cs.CL | 2021-08-02 |
1323 | The Role of Phonetic Units in Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. |
Jiahong Yuan; Xingyu Cai; Renjie Zheng; Liang Huang; Kenneth Church; | arxiv-cs.CL | 2021-08-02 |
1324 | The History of Speech Recognition to The Year 2030 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: I attempt to forecast the state of speech recognition research and applications by the year 2030. |
Awni Hannun; | arxiv-cs.CL | 2021-07-30 |
1325 | Automatic Speech Recognition (ASR) Systems for Learning Arabic Language and Al-Quran Recitation: A Review Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper provides a literature survey about Automatic Speech Recognition (ASR) systems for learning Arabic language and Al-Quran Recitation. The growth in communication … |
Nazik O’mar Balula; M. Rashwan; S. Abdou; | International Journal of Computer Science and Mobile … | 2021-07-30 |
1326 | Lightweight Adapter Tuning for Multilingual Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). |
HANG LE et. al. | acl | 2021-07-26 |
1327 | VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VoxPopuli, a large-scale multilingual corpus providing 400K hours of unlabeled speech data in 23 languages. |
CHANGHAN WANG et. al. | acl | 2021-07-26 |
1328 | Stacked Acoustic-and-Textual Encoding: Integrating The Pre-trained Models Into Speech Translation Encoders IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Stacked Acoustic-and-Textual Encoding (SATE) method for speech translation. |
CHEN XU et. al. | acl | 2021-07-26 |
1329 | OLR 2021 Challenge: Datasets, Rules and Baselines IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the sixth Oriental Language Recognition (OLR) 2021 Challenge, which intends to improve the performance of language recognition systems and speech recognition systems within multilingual scenarios. |
BINLING WANG et. al. | arxiv-cs.CL | 2021-07-23 |
1330 | Brazilian Portuguese Speech Recognition Using Wav2vec 2.0 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this sense, this work presents the development of a public Automatic Speech Recognition (ASR) system using only openly available audio data, by fine-tuning the Wav2vec 2.0 XLSR-53 model, pre-trained on many languages, on BP data. |
Lucas Rafael Stefanel Gris; Edresson Casanova; Frederico Santos de Oliveira; Anderson da Silva Soares; Arnaldo Candido Junior; | arxiv-cs.CL | 2021-07-23 |
1331 | On Prosody Modeling for ASR+TTS Based Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, in this work, we propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP). |
Wen-Chin Huang; Tomoki Hayashi; Xinjian Li; Shinji Watanabe; Tomoki Toda; | arxiv-cs.SD | 2021-07-20 |
1332 | A Comparison of Methods for OOV-word Recognition on A New Public Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a new tool for calculating relevant performance metrics. We showcase very large improvements in OOV-word recognition and make both the data and code available. |
Rudolf A. Braun; Srikanth Madikeri; Petr Motlicek; | arxiv-cs.CL | 2021-07-16 |
1333 | Zero-shot Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These models tend to output the wrong language when performing zero-shot ST. We tackle this issue by including additional training data and an auxiliary loss function that minimizes the text-audio difference. |
Tu Anh Dinh; | arxiv-cs.CL | 2021-07-13 |
1334 | The IWSLT 2021 BUT Speech Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study their efficiency from the perspective of having a large amount of separate ASR training data and MT training data, and a smaller amount of speech-translation training data. |
Hari Krishna Vydana; Martin Karafiát; Lukáš Burget; Honza Černocký; | arxiv-cs.CL | 2021-07-13 |
1335 | Noisy Training Improves E2E ASR for The Edge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a simple yet effective noisy training strategy to further improve the E2E ASR model training. |
DILIN WANG et. al. | arxiv-cs.CL | 2021-07-09 |
1336 | Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize compared to a strong baseline. |
Christian Huber; Juan Hussain; Sebastian Stüker; Alexander Waibel; | arxiv-cs.CL | 2021-07-05 |
1337 | Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel modeling method for single-channel multi-talker overlapped automatic speech recognition (ASR) systems. |
RYO MASUMURA et. al. | arxiv-cs.CL | 2021-07-04 |
1338 | Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a cross-modal transformer-based neural correction model that refines the output of an automatic speech recognition (ASR) system so as to exclude ASR errors. |
TOMOHIRO TANAKA et. al. | arxiv-cs.CL | 2021-07-04 |
1339 | Arabic Code-Switching Speech Recognition Using Monolingual Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. With this study, we release artificially generated development and test sets, along with an ecological code-switching test set, to benchmark ASR performance. |
Ahmed Ali; Shammur Chowdhury; Amir Hussein; Yasser Hifny; | arxiv-cs.CL | 2021-07-04 |
1340 | Developing Children’s Speech Recognition System for Low Resource Punjabi Language IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Virender Kadyan; Syed Shanawazuddin; Amitoj Singh; | Applied Acoustics | 2021-07-01 |
1341 | IMS’ Systems for The IWSLT 2021 Low-Resource Speech Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team. We utilize state-of-the-art models combined with several data … |
Pavel Denisov; Manuel Mager; Ngoc Thang Vu; | International Workshop on Spoken Language Translation | 2021-06-30 |
1342 | Word-Free Spoken Language Understanding for Mandarin-Chinese Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Transformer-based SLU system that works directly on phones. |
Zhiyuan Guo; Yuexin Li; Guo Chen; Xingyu Chen; Akshat Gupta; | arxiv-cs.CL | 2021-06-30 |
1343 | IMS’ Systems for The IWSLT 2021 Low-Resource Speech Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team. |
Pavel Denisov; Manuel Mager; Ngoc Thang Vu; | arxiv-cs.CL | 2021-06-30 |
1344 | Alzheimer’s Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer’s Disease and to what degree, evaluating the ADReSSo challenge 2021 data. |
Morteza Rohanian; Julian Hough; Matthew Purver; | arxiv-cs.CL | 2021-06-29 |
1345 | Towards Multilingual End‐to‐end Speech Recognition for Air Traffic Control IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this work, an end-to-end framework is proposed to achieve multilingual automatic speech recognition (ASR) in air traffic control (ATC) systems. Considering the standard ATC … |
Yi Lin; Bo Yang; Dongyue Guo; Peng Fan; | IET Intelligent Transport Systems | 2021-06-22 |
1346 | Using Heterogeneity in Semi-supervised Transcription Hypotheses to Improve Code-switched Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, we propose a semi-supervised approach for code-switched ASR. |
Andrew Slottje; Shannon Wotherspoon; William Hartmann; Matthew Snover; Owen Kimball; | arxiv-cs.CL | 2021-06-14 |
1347 | SynthASR: Unlocking Synthetic Data for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to utilize synthetic speech for ASR training (SynthASR) in applications where data is sparse or hard to get for ASR model training. |
AMIN FAZEL et. al. | arxiv-cs.LG | 2021-06-14 |
1348 | Assessing The Use of Prosody in Constituency Parsing of Imperfect Transcripts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work explores constituency parsing on automatically recognized transcripts of conversational speech. |
Trang Tran; Mari Ostendorf; | arxiv-cs.CL | 2021-06-14 |
1349 | GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. |
GUOGUO CHEN et. al. | arxiv-cs.SD | 2021-06-13 |
1350 | Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In view of this, we seek in this paper to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterance, global word interaction relationships. |
Shih-Hsuan Chiu; Tien-Hong Lo; Fu-An Chao; Berlin Chen; | arxiv-cs.CL | 2021-06-13 |
1351 | PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Prune-Adjust-Re-Prune (PARP), which discovers and finetunes subnetworks for much better performance, while only requiring a single downstream ASR finetuning run. |
CHENG-I JEFF LAI et. al. | arxiv-cs.CL | 2021-06-10 |
1352 | Unsupervised Automatic Speech Recognition: A Review IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. |
Hanan Aldarmaki; Asad Ullah; Nazar Zaki; | arxiv-cs.CL | 2021-06-09 |
1353 | Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To extract learnable and adaptive features and mitigate information loss, we propose a new encoder that adopts globally attentive locally recurrent (GALR) networks and directly takes raw waveform as input. |
Max W. Y. Lam; Jun Wang; Chao Weng; Dan Su; Dong Yu; | arxiv-cs.SD | 2021-06-08 |
1354 | Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech. |
Zixuan Peng; Yu Lu; Shengfeng Pan; Yunfeng Liu; | arxiv-cs.SD | 2021-06-08 |
1355 | Vowel Non-Vowel Based Spectral Warping and Time Scale Modification for Improvement in Children’s ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Acoustic differences between children’s and adults’ speech cause the degradation in the automatic speech recognition system performance when the system is trained on adults’ speech and … |
H. Kathania; Avinash Kumar; M. Kurimo; | ICASSP 2021 – 2021 IEEE International Conference on … | 2021-06-06 |
1356 | Towards Data Selection on TTS Data for Children’s Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although great progress has been made on automatic speech recognition (ASR) systems, children’s speech recognition still remains a challenging task. General ASR systems for … |
WEI WANG et. al. | ICASSP 2021 – 2021 IEEE International Conference on … | 2021-06-06 |
1357 | Semantic-WER: A Unified Metric for The Evaluation of ASR Transcript for End Usability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The reason is that the WER works at the surface level and does not include any syntactic and semantic knowledge. The current work proposes Semantic-WER (SWER), a metric to evaluate the ASR transcripts for downstream applications in general. |
Somnath Roy; | arxiv-cs.CL | 2021-06-03 |
1358 | Attention-based Contextual Language Model Adaptation for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. |
RICHARD DIEHL MARTINEZ et. al. | arxiv-cs.CL | 2021-06-02 |
1359 | Arabic Speech Recognition Using End-to-end Deep Learning IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Arabic automatic speech recognition (ASR) methods with diacritics have the ability to be integrated with other systems better than Arabic ASR methods without diacritics. In this … |
Hamzah A. Alsayadi; A. Abdelhamid; I. Hegazy; Z. Fayed; | IET Signal Process. | 2021-06-02 |
1360 | Improving Low-resource ASR Performance with Untranscribed Out-of-domain Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the issue of low-resource ASR when only untranscribed out-of-domain speech data is readily available in the target language. |
Jayadev Billa; | arxiv-cs.CL | 2021-06-02 |
1361 | CrossASR++: A Modular Differential Testing Framework for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So in this accompanying tool demo paper, we devote more engineering and propose CrossASR++, an easy-to-use ASR testing tool that can be conveniently extended to incorporate different TTS and ASR systems, and failure estimators. |
Muhammad Hilmi Asyrofi; Zhou Yang; David Lo; | arxiv-cs.SE | 2021-05-31 |
1362 | End-to-end ASR to Jointly Predict Transcriptions and Linguistic Annotations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags. |
Motoi Omachi; Yuya Fujita; Shinji Watanabe; Matthew Wiesner; | naacl | 2021-05-23 |
1363 | Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an approach, Mondegreen, to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints or privacy or bandwidth (for example, some ASR systems run on-device) considerations. |
SUKHDEEP S. SODHI et. al. | arxiv-cs.SD | 2021-05-20 |
1364 | Development of The Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using The Dementiabank Corpus IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the development of a state-of-the-art automatic speech recognition (ASR) system built on the Dementia-Bank Pitt corpus for automatic NCD detection. |
Z. Ye; et al. | icassp | 2021-05-16 |
1365 | Speech Acoustic Modelling from Raw Phase Spectrum IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time phase spectrum. |
E. Loweimi; Z. Cvetkovic; P. Bell; S. Renals; | icassp | 2021-05-16 |
1366 | Phoneme-Based Distribution Regularization for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. |
Y. Liu; X. Peng; Z. Xiong; Y. Lu; | icassp | 2021-05-16 |
1367 | Meta-Learning for Improving Rare Word Recognition in End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta learning mechanism into an E2E ASR system, enabling few-shot adaptation. |
F. Lux; N. T. Vu; | icassp | 2021-05-16 |
1368 | Federated Acoustic Modeling for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate federated acoustic modeling using data from multiple clients. |
X. Cui; S. Lu; B. Kingsbury; | icassp | 2021-05-16 |
1369 | MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). |
L. MENG et. al. | icassp | 2021-05-16 |
1370 | A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer and Large Scale Synthetic Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since our goal is to recognize the input speech, we consider enhancements which improve word error rates (WERs) when the predicted speech signal is passed to an automatic speech recognition (ASR) model. |
N. Howard; A. Park; T. Z. Shabestary; A. Gruenstein; R. Prabhavalkar; | icassp | 2021-05-16 |
1371 | Task Aware Multi-Task Learning for Speech to Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a task modulation network which allows the model to learn task specific features, while learning the shared features simultaneously. |
S. Indurthi; et al. | icassp | 2021-05-16 |
1372 | Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first apply a known decoding technique that was developed to perform single-speaker ASR for long-form audio to our E2E SA-ASR task. Then, we propose a novel method using a sequence-to-sequence model, called hypothesis stitcher. |
X. CHANG et. al. | icassp | 2021-05-16 |
1373 | Recent Developments on Espnet Toolkit Boosted By Conformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present recent developments on ESPnet: End-to- End Speech Processing toolkit, which mainly involves a recently proposed architecture called Conformer, Convolution-augmented Transformer. |
P. Guo; et al. | icassp | 2021-05-16 |
1374 | Construction of A Large-Scale Japanese ASR Corpus on TV Recordings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new large-scale Japanese speech corpus for training automatic speech recognition (ASR) systems. |
S. Ando; H. Fujihara; | icassp | 2021-05-16 |
1375 | The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The Accented English Speech Recognition Challenge (AESRC2020) is designed for providing a common testbed and promoting accent-related research. |
X. Shi; et al. | icassp | 2021-05-16 |
1376 | Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the neural architecture search (NAS) for automatic speech recognition (ASR) systems. |
L. He; D. Su; D. Yu; | icassp | 2021-05-16 |
1377 | End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have, therefore, conducted ML E2E ASR experiments for four less-resourced Ethiopian languages using different language and acoustic modelling units. |
S. T. Abate; M. Y. Tachbelie; T. Schultz; | icassp | 2021-05-16 |
1378 | Code-Switch Speech Rescoring with Monolingual Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the code-switch speech recognition in mainland China, which is obviously different from the Hong Kong and Southeast Asia area in linguistic characteristics. |
G. Liu; L. Cao; | icassp | 2021-05-16 |
1379 | Analysis of X-Vectors for Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper presents a study of usability of x-vectors for adaptation of automatic speech recognition (ASR) systems. |
M. Karafiát; et al. | icassp | 2021-05-16 |
1380 | AISpeech-SJTU ASR System for The Accented English Speech Recognition Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the AISpeech-SJTU ASR system for the Interspeech-2020 Accented English Speech Recognition Challenge (AESRC). |
T. TAN et. al. | icassp | 2021-05-16 |
1381 | Federated Marginal Personalization for ASR Rescoring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce federated marginal personalization (FMP), a novel method for continuously updating personalized neural network language models (NNLMs) on private devices using federated learning (FL). |
Z. Liu; F. Peng; | icassp | 2021-05-16 |
1382 | Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. |
Q. Li; et al. | icassp | 2021-05-16 |
1383 | Partially Overlapped Inference for Long-Form Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a more effective way of overlapped inference by aligning partially matched hypotheses. |
T. G. Kang; H. -G. Kim; M. -J. Lee; J. Lee; H. Lee; | icassp | 2021-05-16 |
1384 | Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an extended Parrotron model: a single, end-to-end network that enables voice conversion and recognition simultaneously. |
R. Doshi; et al. | icassp | 2021-05-16 |
1385 | A Comparison of Methods for OOV-Word Recognition on A New Public Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We showcase very large improvements in OOV-word recognition and make both the data and code available. |
R. A. Braun; S. Madikeri; P. Motlicek; | icassp | 2021-05-16 |
1386 | A Sequential Contrastive Learning Framework for Robust Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a contrastive learning framework for robust dysarthric speech recognition (DSR) by capturing the dysarthric speech variability. |
L. Wu; D. Zong; S. Sun; J. Zhao; | icassp | 2021-05-16 |
1387 | A Causal Deep Learning Framework for Classifying Phonemes in Cochlear Implants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a causal deep learning framework for classifying phonemes using features extracted at the time-frequency resolution of a CI processor. |
K. Chu; L. Collins; B. Mainsah; | icassp | 2021-05-16 |
1388 | End-To-End Multi-Accent Speech Recognition with Unsupervised Accent Modelling IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to grapple with such an issue, we first investigate and improve the current mainstream end-to-end multi-accent speech recognition technologies. In addition, we propose two unsupervised accent modelling methods, which convert accent information into a global embedding, and use it to improve the performance of the end-to-end multi-accent speech recognition systems. |
S. LI et. al. | icassp | 2021-05-16 |
1389 | An End-to-End Speech Accent Recognition Method Based on Hybrid CTC/Attention Transformer ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel accent recognition system in the framework of a transformer-based end-to-end speech recognition system. |
Q. Gao; H. Wu; Y. Sun; Y. Duan; | icassp | 2021-05-16 |
1390 | Dynamic Sparsity Neural Networks for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time. |
Z. WU et. al. | icassp | 2021-05-16 |
1391 | Top-Down Attention in End-to-End Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this insight, we propose Top-Down SLU (TD-SLU), a new transformer-based E2E SLU model that uses top-down attention and an attention gate to fuse high-level NLU features with low-level ASR features, which leads to a better optimization of both tasks. |
Y. Chen; et al. | icassp | 2021-05-16 |
1392 | Towards Data Selection on TTS Data for Children's Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we adopt text-to-speech data augmentation to improve the performance of children's speech recognition system. |
W. WANG et. al. | icassp | 2021-05-16 |
1393 | Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models. |
T. Doutre; et al. | icassp | 2021-05-16 |
1394 | Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech. |
Z. Peng; Y. Lu; S. Pan; Y. Liu; | icassp | 2021-05-16 |
1395 | BLSTM-Based Confidence Estimation for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we perform confidence estimation for end-to-end (E2E) ASR hypotheses. |
A. Ogawa; N. Tawara; T. Kano; M. Delcroix; | icassp | 2021-05-16 |
1396 | Towards An ASR Approach Using Acoustic and Language Models for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to modify the speech estimation process, by treating speech enhancement as a classification problem in an ASR-style manner. |
K. M. Nayem; D. S. Williamson; | icassp | 2021-05-16 |
1397 | Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end (E2E) neural network manner, called directional automatic speech recognition (D-ASR), which explicitly models source speaker locations. |
A. S. Subramanian; et al. | icassp | 2021-05-16 |
1398 | Improved Mask-CTC for Non-Autoregressive End-to-End ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost the performance of Mask-CTC, we first propose to enhance the encoder network architecture by employing a recently proposed architecture called Conformer. Next, we propose new training and decoding methods by introducing auxiliary objective to predict the length of a partial target sequence, which allows the model to delete or insert tokens during inference. |
Y. Higuchi; H. Inaguma; S. Watanabe; T. Ogawa; T. Kobayashi; | icassp | 2021-05-16 |
1399 | Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes a novel technique for child ASR using both feature normalization and data augmentation methods based on the relationship between formants and fundamental frequency (fo). |
G. Yeung; R. Fan; A. Alwan; | icassp | 2021-05-16 |
1400 | Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training labels. |
N. Moritz; T. Hori; J. L. Roux; | icassp | 2021-05-16 |
1401 | Meta-Adapter: Efficient Cross-Lingual Adaptation With Meta-Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to combine the adapter module with meta-learning algorithms to achieve high recognition performance under low-resource settings and improve the parameter-efficiency of the model. |
W. Hou; Y. Wang; S. Gao; T. Shinozaki; | icassp | 2021-05-16 |
1402 | Synthesis of New Words for Improved Dysarthric Speech Recognition on An Expanded Vocabulary IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data augmentation method using voice conversion that allows dysarthric ASR systems to accurately recognize words outside of the training set vocabulary. |
J. Harvill; D. Issa; M. Hasegawa-Johnson; C. Yoo; | icassp | 2021-05-16 |
1403 | Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Memory-Self-Attention (MSA), which adds history information into the Restricted-Self-Attention unit. |
J. Luo; J. Wang; N. Cheng; J. Xiao; | icassp | 2021-05-16 |
1404 | Content-Aware Speaker Embeddings for Speaker Diarisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the content-aware speaker embeddings (CASE) approach is proposed, which extends the input of the speaker classifier to include not only acoustic features but also their corresponding speech content, via phone, character, and word embeddings. |
G. Sun; D. Liu; C. Zhang; P. C. Woodland; | icassp | 2021-05-16 |
1405 | ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel all deep learning MVDR framework, where the matrix inversion and eigenvalue decomposition are replaced by two recurrent neural networks (RNNs), to resolve both issues at the same time. |
Z. ZHANG et. al. | icassp | 2021-05-16 |
1406 | Joint Masked CPC And CTC Training For ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we demonstrate a single-stage training of ASR models that can utilize both unlabeled and labeled data. |
C. Talnikar; T. Likhomanenko; R. Collobert; G. Synnaeve; | icassp | 2021-05-16 |
1407 | A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a progressive learning-based adaptive noise and speech estimation (PL-ANSE) method for speech preprocessing in noisy speech recognition, leveraging upon a frame-level noise tracking capability of improved minima controlled recursive averaging (IMCRA) and an utterance-level deep progressive learning of nonlinear interactions between speech and noise. |
Z. Nian; Y. -H. Tu; J. Du; C. -H. Lee; | icassp | 2021-05-16 |
1408 | End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the multichannel multi-speaker reverberant condition, and propose to extend our previous framework for end-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend subnetworks including voice activity detection like masks. |
W. Zhang; et al. | icassp | 2021-05-16 |
1409 | Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a method to pre-train transformer-based encoder-decoder automatic speech recognition (ASR) models using sufficient target-domain text. |
C. GAO et. al. | icassp | 2021-05-16 |
1410 | Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and its effective training method based on knowledge distillation. |
R. MASUMURA et. al. | icassp | 2021-05-16 |
1411 | Echo State Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose automatic speech recognition (ASR) models inspired by echo state network (ESN) [1], in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained. |
H. Shrivastava; A. Garg; Y. Cao; Y. Zhang; T. Sainath; | icassp | 2021-05-16 |
1412 | Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed for improved inference time latency on speech recognition tasks. |
J. Macoskey; G. P. Strimel; A. Rastrow; | icassp | 2021-05-16 |
1413 | Transformer-Transducers for Code-Switched Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. |
S. Dalmia; Y. Liu; S. Ronanki; K. Kirchhoff; | icassp | 2021-05-16 |
1414 | A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a general multi-task learning framework to leverage text data for ASR and ST tasks. |
Y. Tang; J. Pino; C. Wang; X. Ma; D. Genzel; | icassp | 2021-05-16 |
1415 | Multilingual Phonetic Dataset for Low Resource Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a large multilingual phonetic dataset, which is preprocessed and aligned from the UCLA phonetic dataset. |
X. Li; D. R. Mortensen; F. Metze; A. W. Black; | icassp | 2021-05-16 |
1416 | Using Synthetic Audio to Improve The Recognition of Out-of-Vocabulary Words in End-to-End Asr Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use a text-to-speech (TTS) engine to provide synthetic audio for out-of-vocabulary (OOV) words. |
X. Zheng; Y. Liu; D. Gunceler; D. Willett; | icassp | 2021-05-16 |
1417 | Unsupervised Domain Adaptation for Speech Recognition Via Uncertainty Driven Self-Training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that self-training (ST) combined with an uncertainty-based pseudo-label filtering approach can be effectively used for domain adaptation. |
S. Khurana; N. Moritz; T. Hori; J. L. Roux; | icassp | 2021-05-16 |
1418 | Speech Recognition By Simply Fine-Tuning Bert IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. |
W. -C. HUANG et. al. | icassp | 2021-05-16 |
1419 | Emotion Recognition By Fusing Time Synchronous and Time Asynchronous Representations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB). |
W. Wu; C. Zhang; P. C. Woodland; | icassp | 2021-05-16 |
1420 | SEQ-CPC : Sequential Contrastive Predictive Coding for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the contrastive predictive coding (CPC), we propose a feature representation scheme for automatic speech recognition (ASR), which encodes sequential dependency information from raw audio signals. |
Y. Chen; et al. | icassp | 2021-05-16 |
1421 | Cascaded Models with Cyclic Feedback for Direct Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data in addition to out-of-domain MT and ASR data. |
T. K. Lam; S. Schamoni; S. Riezler; | icassp | 2021-05-16 |
1422 | Vowel Non-Vowel Based Spectral Warping and Time Scale Modification for Improvement in Children's ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a linear prediction based spectral warping method that uses the knowledge of vowel and non-vowel regions in speech signals to mitigate the formant frequency differences between child and adult speakers. |
H. Kathania; A. Kumar; M. Kurimo; | icassp | 2021-05-16 |
1423 | Cascaded Encoders for Unifying Streaming and Non-Streaming ASR IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents cascaded encoders for building a single E2E ASR model that can operate in both these modes simultaneously. |
A. Narayanan; et al. | icassp | 2021-05-16 |
1424 | An Investigation of End-to-End Models for Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address this gap and present a detailed comparison of speech enhancement-based techniques and three different model-based adaptation techniques covering data augmentation, multi-task learning, and adversarial learning for robust ASR. |
A. Prasad; P. Jyothi; R. Velmurugan; | icassp | 2021-05-16 |
1425 | Reducing Spelling Inconsistencies in Code-Switching ASR Using Contextualized CTC Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies of a character-based non-autoregressive ASR which allows for faster inference. |
B. Naowarat; T. Kongthaworn; K. Karunratanakul; S. H. Wu; E. Chuangsuwanich; | icassp | 2021-05-16 |
1426 | How Phonotactics Affect Multilingual and Zero-Shot ASR Performance IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that the gain from modeling crosslingual phonotactics is limited, and imposing a too strong model can hurt the zero-shot transfer. |
S. Feng; et al. | icassp | 2021-05-16 |
1427 | Refining Automatic Speech Recognition System for Older Adults Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With 12 hours of training data, we attempt to develop an ASR system for socially isolated seniors (80+ years old) with possible cognitive impairments. |
L. Chen; M. Asgari; | icassp | 2021-05-16 |
1428 | Multiple-Hypothesis CTC-Based Semi-Supervised Adaptation of End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an adaptation method for end-to-end speech recognition. |
C. -T. Do; R. Doddipatla; T. Hain; | icassp | 2021-05-16 |
1429 | Improved Robustness to Disfluencies in Rnn-Transducer Based Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies with a focus on partial words. |
V. Mendelev; T. Raissi; G. Camporese; M. Giollo; | icassp | 2021-05-16 |
1430 | Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a speaker-attributed minimum Bayes risk (SA-MBR) training method where the parameters are trained to directly minimize the expected SA-WER over the training data. |
N. Kanda; et al. | icassp | 2021-05-16 |
1431 | Multi-Task Transformer with Input Feature Reconstruction for Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a multi-task Transformer with input feature reconstruction as an auxiliary task, where the main task of DSR and the auxiliary reconstruction task share the same encoder network. |
C. Ding; S. Sun; J. Zhao; | icassp | 2021-05-16 |
1432 | Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we propose an enhanced ASR-TTS (EAT) model that incorporates two main features: 1) The ASR→TTS direction is equipped with a language model reward to penalize the ASR hypotheses before forwarding it to TTS. |
M. K. Baskar; L. Burget; S. Watanabe; R. F. Astudillo; J. Černocký; | icassp | 2021-05-16 |
1433 | Streaming Multi-Speaker ASR with RNN-T IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate two approaches to multi-speaker model training of the RNN-T: deterministic output-target assignment and permutation invariant training. |
I. Sklyar; A. Piunova; Y. Liu; | icassp | 2021-05-16 |
1434 | Non-Intrusive Binaural Prediction of Speech Intelligibility Based on Phoneme Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. |
J. Roßbach; S. Röttges; C. F. Hauth; T. Brand; B. T. Meyer; | icassp | 2021-05-16 |
1435 | Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-Training and Its Application to Children's ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a bidirectional unsupervised model pre-training (UPT) method and apply it to children's automatic speech recognition (ASR). |
R. Fan; A. Afshan; A. Alwan; | icassp | 2021-05-16 |
1436 | ASR N-Best Fusion Nets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a fusion network to jointly consider ASR n-best hypotheses for enhanced robustness to ASR errors. |
X. Liu; et al. | icassp | 2021-05-16 |
1437 | Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. |
J. Liao; et al. | icassp | 2021-05-16 |
1438 | Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. |
Khin Me Me Chit; Laet Laet Lin; | arxiv-cs.LG | 2021-05-13 |
1439 | What Shall We Do with An Hour of Data? Speech Recognition for The Un- and Under-served Languages of Common Voice IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This technical report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project. |
Francis M. Tyers; Josh Meyer; | arxiv-cs.CL | 2021-05-10 |
1440 | FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. |
YICHONG LENG et. al. | arxiv-cs.CL | 2021-05-09 |
1441 | End-to-End Speech Recognition from Federated Acoustic Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. |
YAN GAO et. al. | arxiv-cs.SD | 2021-04-29 |
1442 | Using Radio Archives for Low-Resource Speech Recognition: Towards An Intelligent Virtual Assistant for Illiterate Users IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the effectiveness of unsupervised speech representation learning on noisy radio broadcasting archives, which are abundant even in low-resource languages. First, we release two datasets to the research community. |
Moussa Doumbouya; Lisa Einstein; Chris Piech; | arxiv-cs.LG | 2021-04-27 |
1443 | LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. |
SOLENE EVAIN et. al. | arxiv-cs.CL | 2021-04-23 |
1444 | Earnings-21: A Practical Benchmark for ASR in The Wild IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. |
MIGUEL DEL RIO et. al. | arxiv-cs.CL | 2021-04-22 |
1445 | Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming at exploring the full potentials of phonetic information to improve SLU robustness to ASR errors. |
Qian Chen; Wen Wang; Qinglin Zhang; | arxiv-cs.CL | 2021-04-21 |
1446 | Accented Speech Recognition: A Survey IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a survey of current promising approaches to accented speech recognition and highlight the key challenges in the space. |
ARTHUR HINSVARK et. al. | arxiv-cs.CL | 2021-04-21 |
1447 | On The Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for The Deep Learning Era IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to tackle the above issues, we create transcripts from the original speech by applying three modern ASR systems, including an end-to-end model trained with recurrent neural network-transducer loss, a model with connectionist temporal classification loss, and a wav2vec framework for self-supervised learning. |
SHAHIN AMIRIPARIAN et. al. | arxiv-cs.SD | 2021-04-20 |
1448 | Discriminative Self-training for Punctuation Prediction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. |
Qian Chen; Wen Wang; Mengzhe Chen; Qinglin Zhang; | arxiv-cs.CL | 2021-04-20 |
1449 | Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention. |
Takaaki Hori; Niko Moritz; Chiori Hori; Jonathan Le Roux; | arxiv-cs.CL | 2021-04-19 |
1450 | Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an acoustic data-driven subword modeling (ADSM) approach that adapts the advantages of several text-based and acoustic-based subword methods into one pipeline. |
Wei Zhou; Mohammad Zeineldeen; Zuoyun Zheng; Ralf Schlüter; Hermann Ney; | arxiv-cs.CL | 2021-04-19 |
1451 | Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on the unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains. |
Wenxin Hou; Jindong Wang; Xu Tan; Tao Qin; Takahiro Shinozaki; | arxiv-cs.SD | 2021-04-15 |
1452 | Experiments of ASR-based Mispronunciation Detection for Children and Adult English Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, to detect mispronunciations, we used a phone-based ASR implemented using Kaldi. |
Nina Hosseini-Kivanani; Roberto Gretter; Marco Matassoni; Giuseppe Daniele Falavigna; | arxiv-cs.CL | 2021-04-13 |
1453 | Comparing The Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the effect of varying pre-processing, the speaker embedding and input encoding of the TTS system w.r.t. the effectiveness of the synthesized data for AED-ASR training. |
Nick Rossenbach; Mohammad Zeineldeen; Benedikt Hilmes; Ralf Schlüter; Hermann Ney; | arxiv-cs.CL | 2021-04-12 |
1454 | Non-autoregressive Transformer-based End-to-end ASR Using BERT IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, to not only inherit the advantages of non-autoregressive ASR models but also enjoy the benefits of a pre-trained language model (e.g., BERT), we propose a non-autoregressive Transformer-based end-to-end ASR model based on BERT. |
Fu-Hao Yu; Kuan-Yu Chen; | arxiv-cs.CL | 2021-04-10 |
1455 | Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the TTS->ASR pipeline in speech chain to do domain adaptation for both neural TTS and E2E ASR models, with only text data from target domain. |
Fengpeng Yue; Yan Deng; Lei He; Tom Ko; | arxiv-cs.CL | 2021-04-08 |
1456 | WNARS: WFST Based Non-autoregressive Streaming End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, namely WNARS, using hybrid CTC-attention AED models and weighted finite-state transducers (WFST) to solve these problems together. |
Zhichao Wang; Wenwen Yang; Pan Zhou; Wei Chen; | arxiv-cs.SD | 2021-04-08 |
1457 | BSTC: A Large-Scale Chinese-English Speech Translation Dataset IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-English speech translation dataset. |
RUIQING ZHANG et. al. | arxiv-cs.CL | 2021-04-08 |
1458 | Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the first challenge, we propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both. |
SUJEONG CHA et. al. | arxiv-cs.CL | 2021-04-07 |
1459 | Towards An Automatic Speech-Based Diagnostic Test for Alzheimer’s Disease IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is widely used in many applications and tools. Smartphones, video games, and cars are a few examples where people use ASR routinely and often … |
Roozbeh Sadeghian; J. Schaffer; S. Zahorian; | Frontiers of Computer Science | 2021-04-07 |
1460 | EasyCall Corpus: A Dysarthric Speech Dataset IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a new dysarthric speech command dataset in Italian, called EasyCall corpus. |
ROSANNA TURRISI et. al. | arxiv-cs.CL | 2021-04-06 |
1461 | Non-autoregressive Mandarin-English Code-switching Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate these methods on the SEAME corpus and achieved exciting results. |
Shun-Po Chuang; Heng-Jui Chang; Sung-Feng Huang; Hung-yi Lee; | arxiv-cs.CL | 2021-04-05 |
1462 | Streaming Multi-talker Speech Recognition with Joint Speaker Identification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Streaming Unmixing, Recognition and Identification Transducer (SURIT) — a new framework that deals with this problem in an end-to-end streaming fashion. |
Liang Lu; Naoyuki Kanda; Jinyu Li; Yifan Gong; | arxiv-cs.SD | 2021-04-05 |
1463 | Dissecting User-Perceived Latency of On-Device E2E Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work examines the impact of various techniques – model architectures, training criteria, decoding hyperparameters, and endpointer parameters – on UPL. |
YUAN SHANGGUAN et. al. | arxiv-cs.SD | 2021-04-05 |
1464 | Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems to address this issue. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2021-04-05 |
1465 | Talk, Don’t Write: A Study of Direct Speech-Based Image Retrieval IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extensively study and expand choices of encoder architectures, training methodology (including unimodal and multimodal pretraining), and other factors. |
Ramon Sanabria; Austin Waters; Jason Baldridge; | arxiv-cs.CL | 2021-04-05 |
1466 | On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. |
Tsz Kin Lam; Mayumi Ohta; Shigehiko Schamoni; Stefan Riezler; | arxiv-cs.CL | 2021-04-03 |
1467 | Tusom2021: A Phonetically Transcribed Speech Dataset from An Endangered Language for Universal Phone Recognition Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a publicly available, phonetically transcribed corpus of 2255 utterances (words and short phrases) in the endangered Tangkhulic language East Tusom (no ISO 639-3 code), a Tibeto-Burman language variety spoken mostly in India. |
David R. Mortensen; Jordan Picone; Xinjian Li; Kathleen Siminyu; | arxiv-cs.CL | 2021-04-01 |
1468 | Configurable Privacy-Preserving Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate whether modular automatic speech recognition (ASR) can improve privacy in voice assistive systems by combining independently trained separation, recognition, and discretization modules to design configurable privacy-preserving ASR systems. |
Ranya Aloufi; Hamed Haddadi; David Boyle; | arxiv-cs.CL | 2021-04-01 |
1469 | Multilingual and Code-switching ASR Challenges for Low Resource Indian Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English. |
ANUJ DIWAN et. al. | arxiv-cs.CL | 2021-03-31 |
1470 | Multiple-hypothesis CTC-based Semi-supervised Adaptation of End-to-end Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an adaptation method for end-to-end speech recognition. |
Cong-Thanh Do; Rama Doddipatla; Thomas Hain; | arxiv-cs.CL | 2021-03-29 |
1471 | An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results compared with the ASR models trained with a small amount of … |
J. Jeong; S. I. M. M. R. Mondol; Y. Kim; Sangmin Lee; | Electronics | 2021-03-29 |
1472 | Construction of A Large-scale Japanese ASR Corpus on TV Recordings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new large-scale Japanese speech corpus for training automatic speech recognition (ASR) systems. |
Shintaro Ando; Hiromasa Fujihara; | arxiv-cs.SD | 2021-03-26 |
1473 | Leveraging Pre-trained Representations to Improve Access to Untranscribed Speech from Endangered Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using data from 7 Australian Aboriginal languages and a regional variety of Dutch, all of which are endangered or vulnerable, we show that QbE-STD can be improved by leveraging representations developed for ASR (wav2vec 2.0: the English monolingual model and XLSR53 multilingual model). |
NAY SAN et. al. | arxiv-cs.CL | 2021-03-26 |
1474 | An Approach to Improve Robustness of NLP Systems Against ASR Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we utilize the prevalent pre-trained language model to generate training samples with ASR-plausible noise. |
Tong Cui; Jinghui Xiao; Liangyou Li; Xin Jiang; Qun Liu; | arxiv-cs.CL | 2021-03-25 |
1475 | Real-time Low-resource Phoneme Recognition on Edge Devices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The method presented in this paper shows how to create and train models for speech recognition in any language which are not only highly accurate, but also require very little storage, memory and training data when compared with traditional models. |
Yonatan Alon; | arxiv-cs.CL | 2021-03-25 |
1476 | Hallucination of Speech Recognition Errors with Sequence to Sequence Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present novel end-to-end models to directly predict hallucinated ASR word sequence outputs, conditioning on an input word sequence as well as a corresponding phoneme sequence. |
Prashant Serai; Vishal Sunder; Eric Fosler-Lussier; | arxiv-cs.CL | 2021-03-22 |
1477 | SoK: A Modularized Approach to Study The Security of Automatic Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present our systematization of knowledge for ASR security and provide a comprehensive taxonomy for existing work based on a modularized workflow. |
YUXUAN CHEN et. al. | arxiv-cs.CR | 2021-03-19 |
1478 | Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore different ways to incorporate context into a LSTM based NLM in order to model long range dependencies and improve speech recognition. |
Ashish Shenoy; Sravan Bodapati; Katrin Kirchhoff; | arxiv-cs.CL | 2021-03-18 |
1479 | Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Transformer-based ASR model with a time reduction layer, in which we incorporate the time reduction layer inside the Transformer encoder layers in addition to traditional sub-sampling methods applied to input features, further reducing the frame rate. |
Md Akmal Haidar; Chao Xing; Mehdi Rezagholizadeh; | arxiv-cs.AI | 2021-03-17 |
1480 | Fast Development of ASR in African Languages Using Self Supervised Speech Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. |
Jama Hussein Mohamud; Lloyd Acquaye Thompson; Aissatou Ndoye; Laurent Besacier; | arxiv-cs.SD | 2021-03-16 |
1481 | OkwuGbé: End-to-End Speech Recognition for Fon and Igbo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the growing awareness and effort to include more low-resourced languages in NLP research, African languages have recently been a major subject of research in machine translation, and other text-based areas of NLP. |
Bonaventure F. P. Dossou; Chris C. Emezue; | arxiv-cs.CL | 2021-03-13 |
1482 | Continuous Speech Separation with Ad Hoc Microphone Arrays IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several techniques are introduced to enable speech separation for real continuous recordings. |
DONGMEI WANG et. al. | arxiv-cs.SD | 2021-03-03 |
1483 | Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a deep learning-based algorithm to improve the performance of automatic speech recognition (ASR) systems for aphasia, apraxia, and dysarthria speech by utilizing electroencephalography (EEG) features recorded synchronously with aphasia, apraxia, and dysarthria speech. |
GAUTAM KRISHNA et. al. | arxiv-cs.SD | 2021-02-27 |
1484 | MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). |
LINGHUI MENG et. al. | arxiv-cs.CL | 2021-02-24 |
1485 | Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. |
JUNWEI LIAO et. al. | arxiv-cs.CL | 2021-02-22 |
1486 | The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The variety of accents has posed a big challenge to speech recognition. The Accented English Speech Recognition Challenge (AESRC2020) is designed for providing a common testbed … |
XIAN SHI et. al. | arxiv-cs.SD | 2021-02-19 |
1487 | Echo State Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose automatic speech recognition (ASR) models inspired by echo state network (ESN), in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained. |
Harsh Shrivastava; Ankush Garg; Yuan Cao; Yu Zhang; Tara Sainath; | arxiv-cs.CL | 2021-02-17 |
1488 | ATCSpeechNet: A Multilingual End-to-end Speech Recognition Framework for Air Traffic Control Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. |
YI LIN et. al. | arxiv-cs.CL | 2021-02-16 |
1489 | Improving Speech Recognition Models with Small Samples for Air Traffic Control Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. |
YI LIN et. al. | arxiv-cs.SD | 2021-02-16 |
1490 | Jira: A Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first large vocabulary speech recognition system (LVSR) for the Central Kurdish language, named Jira. To fill this gap, we introduce the first speech corpus and pronunciation lexicon for the Kurdish language. |
Hadi Veisi; Hawre Hosseini; Mohammad Mohammadamini; Wirya Fathy; Aso Mahmudi; | arxiv-cs.AI | 2021-02-15 |
1491 | Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and its effective training method based on knowledge distillation. |
RYO MASUMURA et. al. | arxiv-cs.CL | 2021-02-15 |
1492 | Thank You for Attention: A Survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey paper, a comprehensive review of the different attention models used in developing automatic speech recognition systems is provided. |
Priyabrata Karmakar; Shyh Wei Teng; Guojun Lu; | arxiv-cs.SD | 2021-02-14 |
1493 | Exploring Transfer Learning For End-to-End Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an E2E system that is designed to jointly train on multiple speech-to-text tasks, such as ASR (speech-transcription) and SLU (speech-hypothesis), and text-to-text tasks, such as NLU (text-hypothesis). |
SUBENDHU RONGALI et. al. | aaai | 2021-02-09 |
1494 | BembaSpeech: A Speech Recognition Corpus for The Bemba Language IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia. |
Claytone Sikasote; Antonios Anastasopoulos; | arxiv-cs.CL | 2021-02-09 |
1495 | Federated Acoustic Modeling For Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate federated acoustic modeling using data from multiple clients. |
Xiaodong Cui; Songtao Lu; Brian Kingsbury; | arxiv-cs.SD | 2021-02-08 |
1496 | Effects of Layer Freezing on Transferring A Speech Recognition System to Under-resourced Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the effect of layer freezing on the effectiveness of model transfer in the area of automatic speech recognition. |
Onno Eberhard; Torsten Zesch; | arxiv-cs.CL | 2021-02-08 |
1497 | A Bandit Approach to Curriculum Generation for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an approach to mitigate the lack of training data by employing Automated Curriculum Learning in combination with an adversarial bandit approach inspired by Reinforcement learning. |
Anastasia Kuznetsova; Anurag Kumar; Francis M. Tyers; | arxiv-cs.CL | 2021-02-06 |
1498 | Effects of Number of Filters of Convolutional Layers on Speech Recognition Model Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the progress of the End-to-End approach [1], this paper systematically studies the effects of Number of Filters of convolutional layers on the model prediction accuracy of CNN+RNN (Convolutional Neural Networks adding to Recurrent Neural Networks) for ASR Models (Automatic Speech Recognition). |
James Mou; Jun Li; | arxiv-cs.LG | 2021-02-03 |
1499 | WeNet: Production Oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. |
ZHUOYUAN YAO et. al. | arxiv-cs.SD | 2021-02-02 |
1500 | The Multilingual TEDx Corpus for Speech Recognition and Translation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. |
ELIZABETH SALESKY et. al. | arxiv-cs.CL | 2021-02-02 |