Paper Digest: Recent Papers on Speech Recognition
The Paper Digest Team extracted all recent Speech Recognition-related papers on our radar and generated highlight sentences for them. The results are sorted by relevance and date. In addition to this ‘static’ page, we also provide a real-time version of this article, which has broader coverage and is updated continuously to include the most recent papers on this topic.
TABLE 1: Paper Digest: Recent Papers on Speech Recognition
Paper | Author(s) | Source | Date |
---|---|---|---|
1 | Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages Highlight: In this approach we aim to build an ASR model for languages with limited digital resources by sequentially adapting the model across linguistically similar languages. |
Leena G Pillai; Kavya Manohar; Basil K Raju; Elizabeth Sherly; | arxiv-cs.CL | 2024-11-07 |
2 | Enhancing AAC Software for Dysarthric Speakers in E-Health Settings: An Evaluation Using TORGO Highlight: Prompt-overlap is a well-known issue with this dataset where phrases overlap between training and test speakers. Our work proposes an algorithm to break this prompt-overlap. |
Macarious Hui; Jinda Zhang; Aanchan Mohan; | arxiv-cs.CL | 2024-11-01 |
3 | Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs Highlight: We propose a novel approach that first refines all available transcriptions to ensure data reliability. |
Enshi Zhang; Christian Poellabauer; | arxiv-cs.CL | 2024-10-27 |
4 | Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts Highlight: Our contributions include creating a domain-specific dataset, comprehensive ASR model evaluations, and an effective augmentation technique. |
ChaeHun Park; Hojun Cho; Jaegul Choo; | arxiv-cs.CL | 2024-10-24 |
5 | STTATTS: Unified Speech-To-Text And Text-To-Speech Model Highlight: We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task learning objective and shared parameters. |
Hawau Olamide Toyin; Hao Li; Hanan Aldarmaki; | arxiv-cs.CL | 2024-10-24 |
6 | MmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar Highlight: This paper introduces mmWave-Whisper, a system that demonstrates the feasibility of full-corpus automated speech recognition (ASR) on phone calls eavesdropped remotely using off-the-shelf frequency modulated continuous wave (FMCW) millimeter-wave radars. |
Suryoday Basak; Abhijeeth Padarthi; Mahanth Gowda; | arxiv-cs.SD | 2024-10-22 |
7 | DENOASR: Debiasing ASRs Through Selective Denoising Highlight: In this work, we introduce DENOASR, a novel framework that uses selective denoising to reduce the disparity in word error rates between the two gender groups, male and female. |
Anand Kumar Rai; Siddharth D Jaiswal; Shubham Prakash; Bendi Pragnya Sree; Animesh Mukherjee; | arxiv-cs.SD | 2024-10-22 |
8 | VoiceBench: Benchmarking LLM-Based Voice Assistants Highlight: VoiceBench also includes both real and synthetic spoken instructions that incorporate the above three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field. |
YIMING CHEN et al. | arxiv-cs.CL | 2024-10-22 |
9 | Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding Highlight: In this work, we propose a novel and less biased augmentation method of introducing the noises that are plausible to any ASR system, by cutting off the non-causal effect of noises. |
YEONJOON JUNG et al. | arxiv-cs.CL | 2024-10-20 |
10 | Roadmap Towards Superhuman Speech Understanding Using Large Language Models Highlight: To guide the development of speech LLMs, we propose a five-level roadmap, ranging from basic automatic speech recognition (ASR) to advanced superhuman models capable of integrating non-semantic information with abstract acoustic knowledge for complex tasks. |
FAN BU et al. | arxiv-cs.CL | 2024-10-17 |
11 | Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR Highlight: Parameter-efficient fine-tuning and text-only adaptation are two popular methods that have been used to address such low-resource settings. In this work, we investigate how these techniques can be effectively combined using a multilingual multimodal model like SeamlessM4T. |
Abhishek Gupta; Amruta Parulekar; Sameep Chattopadhyay; Preethi Jyothi; | arxiv-cs.CL | 2024-10-17 |
12 | Investigation of Speaker Representation for Target-Speaker Speech Processing Highlight: While most studies have focused on training schemes or system architectures for each specific task, the auxiliary network for embedding target-speaker cues has not been investigated comprehensively in a unified cross-task evaluation. Therefore, this paper aims to address a fundamental question: what is the preferred speaker embedding for TS tasks? |
TAKANORI ASHIHARA et al. | arxiv-cs.SD | 2024-10-14 |
13 | Automatic Speech Recognition with BERT and CTC Transformers: A Review Highlight: All in all, this review provides valuable insights for researchers and practitioners who are interested in ASR with BERT and CTC transformers. |
Noussaiba Djeffal; Hamza Kheddar; Djamel Addou; Ahmed Cherif Mazari; Yassine Himeur; | arxiv-cs.CL | 2024-10-12 |
14 | Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities Highlight: To develop Indonesian automatic speech recognition (ASR), we present our research on state-of-the-art speech recognition models, namely Massively Multilingual Speech (MMS) and Whisper, as well as compiling a dataset comprising Indonesian speech with variabilities to facilitate our study. |
AULIA ADILA et al. | arxiv-cs.CL | 2024-10-11 |
15 | Integrating Paralinguistics in Speech-Empowered Large Language Models for Natural Conversation Highlight: This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM), designed to generate coherent spoken responses with naturally occurring prosodic features relevant to the given input speech without relying on explicit automatic speech recognition (ASR) or text-to-speech (TTS) systems. |
HEESEUNG KIM et al. | nips | 2024-10-07 |
16 | REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR Highlight: In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. |
LIANG-HSUAN TSENG et al. | nips | 2024-10-07 |
17 | Comprehensive Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for The Polish Language Highlight: A comprehensive framework has been designed to survey, catalog, and curate available speech datasets, which allows replicable evaluation of automatic speech recognition (ASR) systems. |
Michał Junczyk; | nips | 2024-10-07 |
18 | Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments Highlight: In this work, we hypothesize that incorporating speaker representations during speech recognition can enhance model robustness to noise. |
Sagarika Alavilli; Annesya Banerjee; Gasser Elbanna; Annika Magaro; | arxiv-cs.SD | 2024-10-07 |
19 | Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models Highlight: Large language models (LLMs) have started to play a vital role in modelling speech and text. |
Pavel Stepachev; Pinzhen Chen; Barry Haddow; | arxiv-cs.CL | 2024-10-04 |
20 | Reverb: Open-Source ASR and Diarization from Rev Abstract: Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as … |
NISHCHAL BHANDARI et al. | arxiv-cs.CL | 2024-10-04 |
21 | Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition Highlight: The following paper presents an alternative approach towards generating compressed spectrogram representations, based on Convolutional Variational Autoencoders (VAEs). |
Olga Iakovenko; Ivan Bondarenko; | arxiv-cs.SD | 2024-10-03 |
22 | Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems Highlight: The rules described in the present paper are implemented in an open-source module, which can be of use to any scientific study connected to ASR or Speech To Text (STT) tasks. |
Olga Iakovenko; Ivan Bondarenko; Mariya Borovikova; Daniil Vodolazsky; | arxiv-cs.CL | 2024-10-03 |
23 | VHASR: A Multimodal Speech Recognition System With Vision Hotwords Highlight: In this paper, we propose a novel approach that effectively utilizes audio-related image information, and set up VHASR, a multimodal speech recognition system that uses vision as hotwords to strengthen the model’s speech recognition capability. |
JILIANG HU et al. | arxiv-cs.SD | 2024-10-01 |
24 | Automatic Speech Recognition for The Ika Language Highlight: We present a cost-effective approach for developing Automatic Speech Recognition (ASR) models for low-resource languages like Ika. |
Uchenna Nzenwata; Daniel Ogbuigwe; | arxiv-cs.CL | 2024-10-01 |
25 | AfriHuBERT: A Self-supervised Speech Representation Model for African Languages Highlight: In this work, we present AfriHuBERT, an extension of mHuBERT-147, a state-of-the-art (SOTA) and compact self-supervised learning (SSL) model, originally pretrained on 147 languages. |
Jesujoba O. Alabi; Xuechen Liu; Dietrich Klakow; Junichi Yamagishi; | arxiv-cs.CL | 2024-09-30 |
26 | ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5 Highlight: However, developing robust ASR models for young children’s speech remains challenging due to differences in pronunciation, tone, and pace compared to adult speech. In this paper, we introduce a new Mandarin speech dataset focused on children aged 3 to 5, addressing the scarcity of resources in this area. |
JIAMING ZHOU et al. | arxiv-cs.SD | 2024-09-27 |
27 | Improving Multilingual ASR in The Wild Using Simple N-best Re-ranking Highlight: In this paper, we present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy for several prominent acoustic models by employing external features such as language models and text-based language identification models. |
Brian Yan; Vineel Pratap; Shinji Watanabe; Michael Auli; | arxiv-cs.CL | 2024-09-26 |
28 | Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM Highlight: These models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech into text for processing before generating audio responses, which introduces latency and loses audio features. We propose a method that implicitly internalizes ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. |
Robin Shing-Hei Yuen; Timothy Tin-Long Tse; Jian Zhu; | arxiv-cs.CL | 2024-09-25 |
29 | Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition Highlight: We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. |
Andrés Piñeiro-Martín; Carmen García-Mateo; Laura Docío-Fernández; María del Carmen López-Pérez; Georg Rehm; | arxiv-cs.CL | 2024-09-25 |
30 | Spelling Correction Through Rewriting of Non-Autoregressive ASR Lattices Highlight: We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models. |
LEONID VELIKOVICH et al. | arxiv-cs.CL | 2024-09-24 |
31 | Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM Highlight: In this paper, we introduce a speech-conditioned Large Language Model (LLM) integrated with a Mixture of Experts (MoE) based connector to address the challenge of Code-Switching (CS) in Automatic Speech Recognition (ASR). |
FENGRUN ZHANG et al. | arxiv-cs.SD | 2024-09-24 |
32 | Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs Highlight: This paper presents a novel training approach to enhance LLM performance in ASR tasks. |
Yang Yuhang; Peng Yizhou; Eng Siong Chng; Xionghu Zhong; | arxiv-cs.CL | 2024-09-24 |
33 | MultiMed: Multilingual Medical Speech Recognition Via Attention Encoder Decoder Highlight: In this work, we introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese, together with the corresponding real-world ASR dataset. |
KHAI LE-DUC et al. | arxiv-cs.CL | 2024-09-21 |
34 | A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering Highlight: Furthermore, the ASR model propagates its errors to the retriever. In this work, we try to alleviate these limitations by proposing an ASR-free, end-to-end trained multimodal dense retriever that can work directly on spoken questions. |
Georgios Sidiropoulos; Evangelos Kanoulas; | arxiv-cs.CL | 2024-09-20 |
35 | Fast Streaming Transducer ASR Prototyping Via Knowledge Distillation with Whisper Highlight: In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch in their entirety on consumer and accessible GPUs, using pseudo-labeled (PL) speech from foundational speech models (FSM). |
IULIIA THORBECKE et al. | arxiv-cs.CL | 2024-09-20 |
36 | Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space Highlight: In the present work, we conduct a set of experiments around zero-shot learning with synthetic speech data for the specific task of speech commands classification. |
Sebastião Quintas; Isabelle Ferrané; Thomas Pellegrini; | arxiv-cs.SD | 2024-09-19 |
37 | Personalized Speech Recognition for Children with Test-Time Adaptation Highlight: We devised a novel ASR pipeline to apply unsupervised test-time adaptation (TTA) methods for child speech recognition, so that ASR models pre-trained on adult speech can be continuously adapted to each child speaker at test time without further human annotations. |
Zhonghao Shi; Harshvardhan Srivastava; Xuan Shi; Shrikanth Narayanan; Maja J. Matarić; | arxiv-cs.LG | 2024-09-19 |
38 | ASR Benchmarking: Need for A More Representative Conversational Dataset Highlight: In this study, we introduce a multilingual conversational dataset, derived from TalkBank, consisting of unstructured phone conversations between adults. |
Gaurav Maheshwari; Dmitry Ivanov; Théo Johannet; Kevin El Haddad; | arxiv-cs.CL | 2024-09-18 |
39 | M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper Highlight: In this paper, we propose M2R-Whisper, a novel multi-stage and multi-scale retrieval augmentation approach designed to enhance ASR performance in low-resource settings. |
JIAMING ZHOU et al. | arxiv-cs.SD | 2024-09-18 |
40 | Large Language Models Are Strong Audio-Visual Speech Recognition Learners Highlight: On the contrary, tasks like visual and audio-visual speech recognition (VSR/AVSR), which also exploit noise-invariant lip movement information, have received little or no attention. To bridge this gap, we propose Llama-AVSR, a new MLLM with strong audio-visual speech recognition capabilities. |
UMBERTO CAPPELLAZZO et al. | arxiv-cs.CV | 2024-09-18 |
41 | Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition Highlight: While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To mitigate this issue, we propose a novel channel-aware data simulation method for robust ASR training. |
CHIEN-CHUN WANG et al. | arxiv-cs.SD | 2024-09-18 |
42 | Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations Highlight: In this study, we propose a speech generation system that simulates the L1 shadowing process using voice conversion (VC) techniques and latent speech representations. |
Haopeng Geng; Daisuke Saito; Nobuaki Minematsu; | arxiv-cs.SD | 2024-09-18 |
43 | WER We Stand: Benchmarking Urdu ASR Models Highlight: This paper presents a comprehensive evaluation of Urdu Automatic Speech Recognition (ASR) models. |
SAMEE ARIF et al. | arxiv-cs.CL | 2024-09-17 |
44 | Chain-of-Thought Prompting for Speech Translation Highlight: In this work, we propose a novel approach to leverage ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. |
KE HU et al. | arxiv-cs.CL | 2024-09-17 |
45 | Speech Recognition for Analysis of Police Radio Communication Highlight: We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription is challenging in this domain. |
Tejes Srivastava; Ju-Chieh Chou; Priyank Shroff; Karen Livescu; Christopher Graziul; | arxiv-cs.SD | 2024-09-16 |
46 | Augmenting Automatic Speech Recognition Models with Disfluency Detection Highlight: In this work, we present an inference-only approach to augment any ASR model with the ability to detect open-set disfluencies. |
Robin Amann; Zhaolin Li; Barbara Bruno; Jan Niehues; | arxiv-cs.CL | 2024-09-16 |
47 | Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition Highlight: To explore new capabilities in language modeling for speech processing, we introduce the generative speech transcription error correction (GenSEC) challenge. |
CHAO-HAN HUCK YANG et al. | arxiv-cs.CL | 2024-09-15 |
48 | Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions Highlight: In this work, we present a pioneering effort to investigate the capability of LLMs in transcribing speech in multi-talker environments, following versatile instructions related to multi-talker automatic speech recognition (ASR), target talker ASR, and ASR based on specific talker attributes such as sex, occurrence order, language, and keyword spoken. |
LINGWEI MENG et al. | arxiv-cs.CL | 2024-09-13 |
49 | Exploring SSL Discrete Tokens for Multilingual ASR Highlight: This study presents a comprehensive comparison of discrete tokens generated by various leading SSL models across multiple language domains. |
MINGYU CUI et al. | arxiv-cs.CL | 2024-09-13 |
50 | LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation Highlight: However, existing methods are often constrained by the capabilities of the speech encoders under varied acoustic conditions, such as accents. To address this, we propose LA-RAG, a novel Retrieval-Augmented Generation (RAG) paradigm for LLM-based ASR. |
SHAOJUN LI et al. | arxiv-cs.SD | 2024-09-13 |
51 | M$^{3}$V: A Multi-modal Multi-view Approach for Device-Directed Speech Detection Highlight: However, in practice, these models often produce incorrect predictions for unaligned input pairs due to the unavoidable errors of automatic speech recognition (ASR). To address this challenge, we propose M$^{3}$V, a multi-modal multi-view approach for device-directed speech detection, which frames the problem as a multi-view learning task that introduces unimodal views and a text-audio alignment view in the network besides the multi-modal view. |
ANNA WANG et al. | arxiv-cs.SD | 2024-09-13 |
52 | Full-text Error Correction for Chinese Speech Recognition with Large Language Model Highlight: Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). |
Zhiyuan Tang; Dong Wang; Shen Huang; Shidong Shang; | arxiv-cs.CL | 2024-09-12 |
53 | WhisperNER: Unified Open Named Entity and Speech Recognition Highlight: In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and entity recognition. |
GIL AYACHE et al. | arxiv-cs.CL | 2024-09-12 |
54 | The Faetar Benchmark: Speech Recognition in A Very Under-Resourced Language Highlight: We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. |
MICHAEL ONG et al. | arxiv-cs.CL | 2024-09-12 |
55 | Enhancing CTC-Based Visual Speech Recognition Highlight: This paper presents LiteVSR2, an enhanced version of our previously introduced efficient approach to Visual Speech Recognition (VSR). |
Hendrik Laux; Anke Schmeink; | arxiv-cs.CV | 2024-09-11 |
56 | Linear Time Complexity Conformers with SummaryMixing for Streaming Speech Recognition Highlight: Hence, this work extends SummaryMixing to a Conformer Transducer that works in both a streaming and an offline mode. |
Titouan Parcollet; Rogier van Dalen; Shucong Zhang; Sourav Batthacharya; | arxiv-cs.SD | 2024-09-11 |
57 | Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking Highlight: We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of the DST model. |
Jihyun Lee; Solee Im; Wonjun Lee; Gary Geunbae Lee; | arxiv-cs.CL | 2024-09-10 |
58 | An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition Highlight: In light of this, we explore in depth how altering the context list to contain words with different frequency distributions affects model performance, and meanwhile extend CA with a simple yet effective context-balanced learning objective. A series of experiments conducted on the AISHELL-1 benchmark dataset suggests that using all vocabulary words from the training corpus as the context list and pairing them with our balanced objective yields the best performance, demonstrating a significant reduction in character error rate (CER) by up to 1.21% and a more pronounced 9.44% reduction in the error rate of zero-shot words. |
YI-CHENG WANG et al. | arxiv-cs.CL | 2024-09-10 |
59 | What Is Lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations Highlight: Our research reveals that current text normalization practices, which aim to standardize ASR outputs for fair comparison by removing inconsistencies such as variations in spelling, punctuation, and special characters, are fundamentally flawed when applied to Indic scripts. Through empirical analysis using text similarity scores and in-depth linguistic examination, we demonstrate that these flaws lead to artificially improved performance metrics for Indic languages. |
Kavya Manohar; Leena G Pillai; Elizabeth Sherly; | arxiv-cs.CL | 2024-09-04 |
60 | Quantification of Stylistic Differences in Human- and ASR-produced Transcripts of African American English Highlight: We categorize the kinds of stylistic differences between 6 transcription versions, 4 human- and 2 ASR-produced, of 10 hours of African American English (AAE) speech. Focusing on verbatim features and AAE morphosyntactic features, we investigate the interactions of these categories with how well transcripts can be compared via word error rate (WER). |
ANNIKA HEUSER et al. | arxiv-cs.CL | 2024-09-04 |
61 | LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization Highlight: This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization. |
ZENGRUI JIN et al. | arxiv-cs.SD | 2024-09-01 |
62 | Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition Highlight: In this paper, we propose the overlapped encoding separation (EncSep) to fully utilize the benefits of the connectionist temporal classification (CTC) and attention hybrid loss. |
Hao Shi; Yuan Gao; Zhaoheng Ni; Tatsuya Kawahara; | arxiv-cs.SD | 2024-09-01 |
63 | Comparing Discrete and Continuous Space LLMs for Speech Recognition Highlight: This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervised for both discrete and continuous types. |
Yaoxun Xu; Shi-Xiong Zhang; Jianwei Yu; Zhiyong Wu; Dong Yu; | arxiv-cs.CL | 2024-09-01 |
64 | ProGRes: Prompted Generative Rescoring on ASR N-Best Highlight: This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted LLMs. |
Ada Defne Tur; Adel Moumen; Mirco Ravanelli; | arxiv-cs.CL | 2024-08-30 |
65 | Measuring The Accuracy of Automatic Speech Recognition Solutions Highlight: At the same time, the DHH community reports serious issues with the accuracy and reliability of ASR. |
Korbinian Kuhn; Verena Kersken; Benedikt Reuter; Niklas Egger; Gottfried Zimmermann; | arxiv-cs.CL | 2024-08-29 |
66 | Speech Recognition Transformers: Topological-lingualism Perspective Highlight: The paper presents a comprehensive survey of transformer techniques oriented toward the speech modality. |
Shruti Singh; Muskaan Singh; Virender Kadyan; | arxiv-cs.CL | 2024-08-27 |
67 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues Highlight: Employing conventional data augmentation to enhance the noise robustness of summarization models is not feasible either, due to the unavailability of sufficient medical dialogue audio recordings and corresponding ASR transcripts. To address this challenge, we propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models (LLMs). |
KULUHAN BINICI et al. | arxiv-cs.CL | 2024-08-26 |
68 | Self-supervised Speech Representations Still Struggle with African American Vernacular English Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American English (MAE). We evaluate four SSL models (wav2vec 2.0, HuBERT, WavLM, and XLS-R) on zero-shot Automatic Speech Recognition (ASR) for these two varieties and find that these models perpetuate the bias in performance against AAVE. |
KALVIN CHANG et. al. | arxiv-cs.CL | 2024-08-26 |
69 | Developing Vocal System Impaired Patient-aimed Voice Quality Assessment Approach Using ASR Representation-included Multiple Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article addresses these challenges by showcasing the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech. This innovative approach aims to estimate voice quality of patients with impaired vocal systems. |
SHAOXIANG DANG et. al. | arxiv-cs.SD | 2024-08-22 |
70 | Towards Measuring Fairness in Speech Recognition: Fair-Speech Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel dataset, Fair-Speech, a publicly released corpus to help researchers evaluate their ASR models for accuracy across a diverse set of self-reported demographic information, such as age, gender, ethnicity, geographic variation and whether the participants consider themselves native English speakers. |
IRINA-ELENA VELICHE et. al. | arxiv-cs.AI | 2024-08-22 |
71 | The State of Commercial Automatic French Legal Speech Recognition Systems and Their Impact on Court Reporters Et Al Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We benchmark three ASR models, including commercial and open-source options, on their ability to recognize French legal speech using a curated dataset. Our study evaluates the performance of these systems using the Word Error Rate (WER) metric and introduces the Sonnex Distance to account for phonetic accuracy. |
Nicolas Garneau; Olivier Bolduc; | arxiv-cs.CL | 2024-08-21 |
72 | Error-preserving Automatic Speech Recognition of Young English Learners' Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the mistakes made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their mistakes. |
JANICK MICHOT et. al. | acl | 2024-08-20 |
73 | Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we report on a set of experiments aiming at assessing the performance of two parsing paradigms (graph-based parsing and sequence labeling based parsing) on speech parsing. |
Adrien Pupier; Maximin Coavoux; Jérôme Goulian; Benjamin Lecouteux; | acl | 2024-08-20 |
74 | StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified framework of multi-task learning. |
SHAOLEI ZHANG et. al. | acl | 2024-08-20 |
75 | Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the error predictors in two ways: first by predicting the errors made by a Switchboard ASR system on unseen data (Fisher), and then using that same predictor to estimate the behavior of an unrelated cloud-based ASR system on a novel task. |
Prashant Serai; Peidong Wang; Eric Fosler-Lussier; | arxiv-cs.AI | 2024-08-20 |
76 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate what linguistic factors affect the performance of Automatic Speech Recognition (ASR) models. |
Chihiro Taguchi; David Chiang; | acl | 2024-08-20 |
77 | CopyNE: Better Contextual ASR By Copying Named Entities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we treat entities as indivisible wholes and introduce the idea of copying into ASR. |
SHILIN ZHOU et. al. | acl | 2024-08-20 |
78 | XCB: An Effective Contextual Biasing Approach to Bias Cross-lingual Phrases in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make the initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing(XCB) module. |
Xucheng Wan; Naijun Zheng; Kai Liu; Huan Zhou; | arxiv-cs.CL | 2024-08-20 |
79 | A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR expert as a transcription tokenizer and a hybrid Autoregressive (AR) Non-autoregressive (NAR) decoding approach to solve the above problems. |
YANGZE LI et. al. | arxiv-cs.SD | 2024-08-18 |
80 | Enhancing Dialogue Speech Recognition with Robust Contextual Awareness Via Noise Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Context Noise Representation Learning (CNRL) to enhance robustness against noisy context, ultimately improving dialogue speech recognition accuracy. |
Wonjun Lee; San Kim; Gary Geunbae Lee; | arxiv-cs.CL | 2024-08-12 |
81 | Audio Enhancement for Computer Audition — An Iterative Training Paradigm Using Sample Importance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. |
Manuel Milling; Shuo Liu; Andreas Triantafyllopoulos; Ilhan Aslan; Björn W. Schuller; | arxiv-cs.SD | 2024-08-12 |
82 | LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a key limitation of this self-supervision lies in its primary focus on acoustic features, with minimal attention to the linguistic properties of the input. To address this gap, we propose Language Informed Test-Time Adaptation (LI-TTA), which incorporates linguistic insights during TTA for ASR. |
Eunseop Yoon; Hee Suk Yoon; John Harvill; Mark Hasegawa-Johnson; Chang D. Yoo; | arxiv-cs.CL | 2024-08-11 |
83 | MooER: LLM-based Speech Recognition and Translation Models from Moore Threads Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present MooER, a LLM-based large-scale automatic speech recognition (ASR) / automatic speech translation (AST) model of Moore Threads. |
JUNHAO XU et. al. | arxiv-cs.CL | 2024-08-09 |
84 | Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present accent clustering and mining schemes for fair speech recognition systems which can perform equally well on under-represented accented speech. |
JAEYOUNG KIM et. al. | arxiv-cs.SD | 2024-08-05 |
85 | Contextualized Speech Recognition: Rethinking Second-Pass Rescoring with Generative Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a novel framework that diverges from typical second-pass rescoring methods. |
Yixuan Tang; Anthony K. H. Tung; | ijcai | 2024-08-03 |
86 | ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms Using Linguistic Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, AE-based adversarial audio samples are susceptible to ASR updates. In this paper, we identify the root cause of these limitations, namely the inability to construct AE attack samples directly around the decision boundary of deep learning (DL) models. |
PENG CHENG et. al. | arxiv-cs.CR | 2024-08-03 |
87 | MECOS: A Bilingual Manipuri-English Spontaneous Code-switching Speech Corpus for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Naorem Karline Singh; Y. J. Chanu; Hoomexsun Pangsatabam; | Comput. Speech Lang. | 2024-08-01 |
88 | On The Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use the comparison of five different TTS decoder architectures in the scope of synthetic data generation to show the impact on CTC-based speech recognition training. |
Nick Rossenbach; Ralf Schlüter; Sakriani Sakti; | arxiv-cs.CL | 2024-07-31 |
89 | Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach called sentence-wise speech summarization (Sen-SSum), which generates text summaries from a spoken document in a sentence-by-sentence manner. |
KOHEI MATSUURA et. al. | arxiv-cs.CL | 2024-07-31 |
90 | On The Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we evaluate the utility of synthetic data for training automatic speech recognition (ASR). |
Benedikt Hilmes; Nick Rossenbach; Ralf Schlüter; | arxiv-cs.CL | 2024-07-25 |
91 | Improving Domain-Specific ASR with LLM-Generated Contextual Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite these advancements, they still struggle to accurately recognize domain specific words, such as proper nouns and technical terminologies. To address this problem, we propose a method to utilize the state-of-the-art Whisper without modifying its architecture, preserving its generalization performance while enabling it to leverage descriptions effectively. |
Jiwon Suh; Injae Na; Woohwan Jung; | arxiv-cs.CL | 2024-07-25 |
92 | Coupling Speech Encoders with Downstream Text Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a modular approach to building cascade speech translation (AST) models that guarantees that the resulting model performs no worse than the 1-best cascade baseline while preserving state-of-the-art speech recognition (ASR) and text translation (MT) performance for a given task. |
Ciprian Chelba; Johan Schalkwyk; | arxiv-cs.CL | 2024-07-24 |
93 | Quantifying The Role of Textual Predictability in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use this method to demonstrate that a Wav2Vec 2.0-based model makes stronger use of textual context than a hybrid ASR model, in spite of not using an explicit language model, and also use it to shed light on recent results demonstrating poor performance of standard ASR systems on African-American English. We demonstrate that these mostly represent failures of acoustic–phonetic modelling. |
Sean Robertson; Gerald Penn; Ewan Dunbar; | arxiv-cs.CL | 2024-07-23 |
94 | Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern … |
Rithik Sachdev; Zhong-Qiu Wang; Chao-Han Huck Yang; | arxiv-cs.CL | 2024-07-23 |
95 | DMel: Speech Tokenization Made Simple Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using an LM-style transformer architecture for speech-text modeling, we comprehensively evaluate different speech tokenization methods on speech recognition (ASR) and speech synthesis (TTS). |
HE BAI et. al. | arxiv-cs.CL | 2024-07-22 |
96 | SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As an instance of this approach, we present SELM, an audio-conditioned language model for SER that predicts different emotion views. |
Hazim Bukhari; Soham Deshmukh; Hira Dhamyal; Bhiksha Raj; Rita Singh; | arxiv-cs.SD | 2024-07-21 |
97 | Low-Resourced Speech Recognition for Iu Mien Language Via Weakly-Supervised Phoneme-based Multilingual Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With less than 10 hours of transcribed Iu Mien language, this paper investigates and compares the three approaches for Iu Mien speech recognition. |
LUKUAN DONG et. al. | arxiv-cs.SD | 2024-07-18 |
98 | Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding By Provenance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic speech recognition (ASR) models trained on large amounts of audio data are now widely used to convert speech to written text in a variety of applications from video captioning to automated assistants used in healthcare and other domains. |
Changye Li; Trevor Cohen; Serguei Pakhomov; | arxiv-cs.CL | 2024-07-18 |
99 | Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present novel approaches that use a generative pretrained transformer (GPT) to identify paraphasias from transcripts as well as two end-to-end approaches that focus on modeling both automatic speech recognition (ASR) and paraphasia classification as multiple sequences vs. a single sequence. |
Matthew Perez; Aneesha Sampath; Minxue Niu; Emily Mower Provost; | arxiv-cs.CL | 2024-07-15 |
100 | Textless Dependency Parsing By Labeled Sequence Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although their effectiveness is shown in capturing acoustic features, it is unclear in capturing lexical knowledge. This paper proposes a textless method for dependency parsing, examining its effectiveness and limitations. |
Shunsuke Kando; Yusuke Miyao; Jason Naradowsky; Shinnosuke Takamichi; | arxiv-cs.CL | 2024-07-14 |
101 | CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer Based Streaming ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CUSIDE-T, which successfully adapts the CUSIDE method over the recurrent neural network transducer (RNN-T) ASR architecture, instead of being based on the CTC architecture. |
Wenbo Zhao; Ziwei Li; Chuan Yu; Zhijian Ou; | arxiv-cs.SD | 2024-07-14 |
102 | Empowering Whisper As A Joint Multi-Talker and Target-Talker Speech Recognition System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recognition tasks. |
LINGWEI MENG et. al. | arxiv-cs.SD | 2024-07-13 |
103 | HebDB: A Weakly Supervised Dataset for Hebrew Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HebDB, a weakly supervised dataset for spoken language processing in the Hebrew language. |
ARNON TURETZKY et. al. | arxiv-cs.CL | 2024-07-10 |
104 | LearnerVoice: A Dataset of Non-Native English Learners’ Spontaneous Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner’s Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. |
HAECHAN KIM et. al. | arxiv-cs.CL | 2024-07-05 |
105 | Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the development of audio-prompted LLMs there is the potential for even greater control options. In this work we demonstrate that with this greater flexibility the systems can be susceptible to model-control adversarial attacks. |
Vyas Raina; Mark Gales; | arxiv-cs.SD | 2024-07-05 |
106 | TokenVerse: Towards Unifying Speech and NLP Tasks Via Transducer-based ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. |
SHASHI KUMAR et. al. | arxiv-cs.CL | 2024-07-05 |
107 | Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study yields numerous significant findings that we are discussing in this paper. |
Salima Mdhaffar; Haroun Elleuch; Fethi Bougares; Yannick Estève; | arxiv-cs.CL | 2024-07-05 |
108 | Romanization Encoding For Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce romanization encoding for script-heavy languages to optimize multilingual and code-switching Automatic Speech Recognition (ASR) systems. |
WEN DING et. al. | arxiv-cs.CL | 2024-07-05 |
109 | Improving Accented Speech Recognition Using Data Augmentation Based on Unsupervised Text-to-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the use of unsupervised text-to-speech synthesis (TTS) as a data augmentation method to improve accented speech recognition. |
Cong-Thanh Do; Shuhei Imai; Rama Doddipatla; Thomas Hain; | arxiv-cs.CL | 2024-07-04 |
110 | FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). |
KEYU AN et. al. | arxiv-cs.SD | 2024-07-04 |
111 | Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluated three publicly available end-to-end models: Whisper, OWSM 3.1, and SeamlessM4T. |
Tiia Sildam; Andra Velve; Tanel Alumäe; | arxiv-cs.CL | 2024-07-04 |
112 | Improving Self-supervised Pre-training Using Accent-Specific Codebooks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an accent-aware adaptation technique for self-supervised learning that introduces a trainable set of accent-specific codebooks to the self-supervised architecture. |
Darshan Prabhu; Abhishek Gupta; Omkar Nitsure; Preethi Jyothi; Sriram Ganapathy; | arxiv-cs.CL | 2024-07-04 |
113 | Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a layer-adapted fusion (LAF) model, called Qifusion-Net, which does not require any prior knowledge about the target accent. |
Jinming Chen; Jingyi Fang; Yuanzhong Zheng; Yaoxuan Wang; Haojun Fei; | arxiv-cs.SD | 2024-07-03 |
114 | Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we conduct a preliminary evaluation using the dataset for both direct-prompting and fine-tuning pre-trained LLMs. |
Zhiyuan Tang; Dong Wang; Shen Huang; Shidong Shang; | arxiv-cs.CL | 2024-07-01 |
115 | Less Is More: Accurate Speech Recognition & Translation Without Web-Scale Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that state-of-the-art accuracy can be reached without relying on web-scale data. |
KRISHNA C. PUVVADA et. al. | arxiv-cs.CL | 2024-06-28 |
116 | Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. |
ZHENG FANG et. al. | arxiv-cs.CR | 2024-06-27 |
117 | Enhanced ASR Robustness to Packet Loss with A Front-End Adaptation Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using a front-end adaptation network connected to a frozen ASR model. |
Yehoshua Dissen; Shiry Yonash; Israel Cohen; Joseph Keshet; | arxiv-cs.SD | 2024-06-27 |
118 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. |
Ahmed Heakl; Youssef Zaghloul; Mennatullah Ali; Rania Hossam; Walid Gomaa; | arxiv-cs.CL | 2024-06-26 |
119 | Automatic Speech Recognition for Hindi Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The final phase of the research tested a neural network for accurately aligning the speech signal to hidden Markov model (HMM) states. This included implementing a novel backpropagation method that utilizes prior statistics of node co-activations. |
Anish Saha; A. G. Ramakrishnan; | arxiv-cs.CL | 2024-06-26 |
120 | Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, we introduce a decoder-only model exclusively designed for streaming recognition, incorporating a dedicated boundary token to facilitate streaming recognition and employing causal attention masking during the training phase. |
Peikun Chen; Sining Sun; Changhao Shan; Qing Yang; Lei Xie; | arxiv-cs.SD | 2024-06-26 |
121 | Dynamic Data Pruning for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works often entail significant overhead to achieve meaningful results. To fill this gap, this paper presents the first investigation of dynamic data pruning for ASR, finding that we can reach the full-data performance by dynamically selecting 70% of data. |
QIAO XIAO et. al. | arxiv-cs.CL | 2024-06-26 |
122 | MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a regularization technique that facilitates the training of visual and audio-visual speech recognition models (VSR and AVSR) from scratch. |
ADRIANA FERNANDEZ-LOPEZ et. al. | arxiv-cs.CV | 2024-06-25 |
123 | A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, several limitations persist, including limited fine-tuning options, a lack of mechanisms to enforce speech-text alignment, and high insertion errors especially in domain mismatch conditions. This paper presents a comprehensive solution to address these issues. |
VAN TUNG PHAM et. al. | arxiv-cs.LG | 2024-06-25 |
124 | SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Classification (CTC) loss as a router in the encoder of SC-MoE to achieve a real-time streaming CS ASR system. |
Shuaishuai Ye; Shunfei Chen; Xinhui Hu; Xinkang Xu; | arxiv-cs.SD | 2024-06-25 |
125 | FASA: A Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When generating datasets, human annotations are not scalable, and existing forced-alignment tools are not usable as they make impractical assumptions about the quality of the input transcriptions. To address these challenges, we propose a new forced-alignment tool, FASA, as a flexible and automatic speech aligner to extract high-quality aligned children’s speech data from many of the existing noisy children’s speech data. |
Dancheng Liu; Jinjun Xiong; | arxiv-cs.CL | 2024-06-25 |
126 | Sequential Editing for Lifelong Training of Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems. |
Devang Kulshreshtha; Saket Dingliwal; Brady Houston; Nikolaos Pappas; Srikanth Ronanki; | arxiv-cs.CL | 2024-06-25 |
127 | Blending LLMs Into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present KIT’s offline submission in the constrained + LLM track by incorporating recently proposed techniques that can be added to any cascaded speech translation. |
SAI KONERU et. al. | arxiv-cs.CL | 2024-06-24 |
128 | Exploring The Capability of Mamba in Speech Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we compared Mamba with state-of-the-art Transformer variants for various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. |
Koichi Miyazaki; Yoshiki Masuyama; Masato Murata; | arxiv-cs.SD | 2024-06-24 |
129 | Perception of Phonological Assimilation By Neural Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds, and identifies the linguistic knowledge that is implemented by the model to compensate for assimilation during Automatic Speech Recognition (ASR). |
Charlotte Pouw; Marianne de Heer Kloots; Afra Alishahi; Willem Zuidema; | arxiv-cs.CL | 2024-06-21 |
130 | Massive End-to-end Speech Recognition Models with Time Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate massive end-to-end automatic speech recognition (ASR) models with efficiency improvements achieved by time reduction. |
WEIRAN WANG et. al. | naacl | 2024-06-20 |
131 | Lost in Transcription: Identifying and Quantifying The Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study evaluates six leading ASRs, analyzing their performance on both a real-world dataset of speech samples from individuals who stutter and a synthetic dataset derived from the widely-used LibriSpeech benchmark. |
DENA MUJTABA et. al. | naacl | 2024-06-20 |
132 | Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. |
Murali Karthick Baskar; Andrew Rosenberg; Bhuvana Ramabhadran; Neeraj Gaur; Zhong Meng; | arxiv-cs.AI | 2024-06-20 |
133 | Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a two-stage method, Contrastive and Consistency Learning (CCL), that correlates error patterns between clean and noisy ASR transcripts and emphasizes the consistency of the latent features of the two transcripts. |
Suyoung Kim; Jiyeon Hwang; Ho-Young Jung; | naacl | 2024-06-20 |
134 | Children’s Speech Recognition Through Discrete Token Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the integration of discrete speech tokens into children’s speech recognition systems as input without significantly degrading the ASR performance. |
Vrunda N. Sukhadia; Shammur Absar Chowdhury; | arxiv-cs.CL | 2024-06-19 |
135 | Joint Vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While traditional approaches take on these tasks separately, we propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture. We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets. |
Alexander Blatt; Aravind Krishnan; Dietrich Klakow; | arxiv-cs.CL | 2024-06-19 |
136 | ManWav: The First Manchu ASR Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In a pioneering effort, we introduce the first-ever Manchu ASR model ManWav, leveraging Wav2Vec2-XLSR-53. |
Jean Seo; Minha Kang; Sungjoo Byun; Sangah Lee; | arxiv-cs.CL | 2024-06-19 |
137 | Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we report on a set of experiments aiming at assessing the performance of two parsing paradigms (graph-based parsing and sequence labeling based parsing) on speech parsing. |
Adrien Pupier; Maximin Coavoux; Jérôme Goulian; Benjamin Lecouteux; | arxiv-cs.CL | 2024-06-18 |
138 | Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose finding task-specific subnetworks within a multi-task SLU model via neural network pruning. |
Hayato Futami; Siddhant Arora; Yosuke Kashiwagi; Emiru Tsunoo; Shinji Watanabe; | arxiv-cs.CL | 2024-06-18 |
139 | Bridging The Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study introduces a simple yet effective SE post-processing technique to address the gap between various pre-trained SE and ASR models. |
KUAN-CHEN WANG et. al. | arxiv-cs.SD | 2024-06-18 |
140 | CoSTA: Code-Switched Speech Translation Using Aligned Speech-Text Interleaving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. |
Bhavani Shankar; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2024-06-16 |
141 | Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the stealthiness of data poisoning, we propose a non-neural and fast algorithm called Random Spectrogram Rhythm Transformation (RSRT) in this paper. |
Wenhan Yao; Jiangkun Yang; Yongqiang He; Jia Liu; Weiping Wen; | arxiv-cs.SD | 2024-06-16 |
142 | Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Simul-Whisper, which uses the time alignment embedded in Whisper’s cross-attention to guide auto-regressive decoding and achieve chunk-based streaming ASR without any fine-tuning of the pre-trained model. |
Haoyu Wang; Guoqiang Hu; Guodong Lin; Wei-Qiang Zhang; Jian Li; | arxiv-cs.SD | 2024-06-14 |
143 | An Efficient Text Augmentation Approach for Contextualized Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA) technique, all while keeping computational costs minimal. |
Naijun Zheng; Xucheng Wan; Kai Liu; Ziqing Du; Zhou Huan; | arxiv-cs.SD | 2024-06-14 |
144 | Speech ReaLLM — Real-time Streaming Speech Recognition with Multimodal LLMs By Teaching The Flow of Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Speech ReaLLM, a new ASR architecture that marries decoder-only ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. |
FRANK SEIDE et. al. | arxiv-cs.CL | 2024-06-13 |
146 | LASER: Learning By Aligning Self-supervised Representations of Speech for Improving Content-related Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent attempts have been made to address this issue with cost-effective self-supervised fine-tuning (SSFT) approaches. Continuing in this direction, we present a cost-effective SSFT method named LASER (Learning by Aligning Self-supervised Representations). |
Amit Meghanani; Thomas Hain; | arxiv-cs.CL | 2024-06-13 |
147 | EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. |
ZIYANG ZHUANG et. al. | arxiv-cs.SD | 2024-06-13 |
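Single-step non-autoregressive ASR of this kind emits all output tokens in one forward pass and then collapses frame-level labels into text. As a generic illustration of that collapse step (the standard CTC rule, not EffectiveASR's actual decoder), a minimal sketch:

```python
BLANK = "_"  # CTC blank symbol (choice of symbol is an assumption here)

def ctc_collapse(frame_labels):
    """Standard CTC collapse: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:  # new non-blank label survives
            out.append(lab)
        prev = lab
    return "".join(out)

print(ctc_collapse(list("hh_e_ll_ll_oo")))  # prints hello
```

The blank between the two "ll" runs is what lets the rule keep both l's instead of merging them.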
148 | Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate what linguistic factors affect the performance of Automatic Speech Recognition (ASR) models. |
Chihiro Taguchi; David Chiang; | arxiv-cs.CL | 2024-06-13 |
149 | Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a transcription-free method for joint training using only audio signals. |
WILLIAM RAVENSCROFT et. al. | arxiv-cs.SD | 2024-06-13 |
150 | Training Data Augmentation for Dysarthric Automatic Speech Recognition By Text-to-Dysarthric-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home environment systems. |
Wing-Zin Leung; Mattias Cross; Anton Ragni; Stefan Goetze; | arxiv-cs.SD | 2024-06-12 |
151 | Improving Child Speech Recognition with Augmented Child-like Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied … |
Yuanyuan Zhang; Zhengjun Yue; T. Patel; O. Scharenborg; | ArXiv | 2024-06-12 |
152 | Towards Unsupervised Speech Recognition Without Pronunciation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by proposing the removal of reliance on a phoneme lexicon. |
JUNRUI NI et. al. | arxiv-cs.CL | 2024-06-12 |
153 | ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. |
JIATONG SHI et. al. | arxiv-cs.SD | 2024-06-12 |
154 | PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. |
TRANG LE et. al. | arxiv-cs.CL | 2024-06-11 |
155 | The Interspeech 2024 Challenge on Speech Processing Using Discrete Units Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper outlines the challenge designs and baseline descriptions. We also collate baseline and selected submission systems, along with preliminary findings, offering valuable contributions to future research in this evolving field. |
XUANKAI CHANG et. al. | arxiv-cs.SD | 2024-06-11 |
156 | AS-70: A Mandarin Stuttered Speech Dataset for Automatic Speech Recognition and Stuttering Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. |
RONG GONG et. al. | arxiv-cs.SD | 2024-06-11 |
157 | Reading Miscue Detection in Primary School Through Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8%). |
Lingyun Gao; Cristian Tejedor-Garcia; Helmer Strik; Catia Cucchiarini; | arxiv-cs.CL | 2024-06-11 |
158 | MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling Methods for Learning Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose (i) a Swap method to address the pre-training and inference mismatch observed in HuBERT and (ii) a Multicluster masked prediction loss for more effective utilization of the model’s capacity. |
Hemant Yadav; Sunayana Sitaram; Rajiv Ratn Shah; | arxiv-cs.CL | 2024-06-09 |
159 | Hypernetworks for Personalizing ASR to Atypical Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. |
Max Müller-Eberstein; Dianna Yee; Karren Yang; Gautam Varma Mantena; Colin Lea; | arxiv-cs.LG | 2024-06-06 |
160 | Improving Zero-Shot Chinese-English Code-Switching ASR with KNN-CTC and Gated Monolingual Datastores Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. |
JIAMING ZHOU et. al. | arxiv-cs.CL | 2024-06-06 |
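The dual-datastore idea can be pictured with a toy sketch: each monolingual datastore maps encoder-frame keys to token labels, and a gate selects the store whose nearest key is closer to the query frame. The stores, the Euclidean distance, and the hard gating rule below are simplifying assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def gated_knn_label(query, store_zh, store_en):
    """Gate between two monolingual (key, label) datastores by nearest-key
    distance, then return the winning store's nearest label. Toy sketch."""
    def nearest(store):
        keys = np.stack([k for k, _ in store])
        dists = np.linalg.norm(keys - query, axis=1)
        i = int(np.argmin(dists))
        return dists[i], store[i][1]

    d_zh, lab_zh = nearest(store_zh)
    d_en, lab_en = nearest(store_en)
    return lab_zh if d_zh <= d_en else lab_en  # hard gate on distance

lab = gated_knn_label(
    np.array([0.9, 0.1]),                 # query frame embedding
    [(np.array([0.0, 1.0]), "ni")],       # toy Mandarin datastore
    [(np.array([1.0, 0.0]), "a")],        # toy English datastore
)
```

Here the query lies near the English key, so the gate routes the lookup to the English store and avoids cross-lingual noise from the other datastore.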
161 | BLSP-Emo: Towards Empathetic Large Speech-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present BLSP-Emo (Bootstrapped Language-Speech Pretraining with Emotion support), a novel approach to developing an end-to-end speech-language model capable of understanding both semantics and emotions in speech and generating empathetic responses. |
CHEN WANG et. al. | arxiv-cs.CL | 2024-06-06 |
162 | Text Injection for Neural Contextual Biasing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes contextual text injection (CTI) to enhance contextual ASR. |
ZHONG MENG et. al. | arxiv-cs.CL | 2024-06-05 |
163 | Error-preserving Automatic Speech Recognition of Young English Learners’ Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To give corrective feedback, which is a crucial part of language learning, the ASR systems in our setting need to preserve the errors made by the language learners. In this work, we build an ASR system that satisfies these requirements: it works on spontaneous speech by young language learners and preserves their errors. |
JANICK MICHOT et. al. | arxiv-cs.CL | 2024-06-05 |
164 | Discrete Multimodal Transformers with A Pretrained Large Language Model for Mixed-Supervision Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a decoder-only Discrete Multimodal Language Model (DMLM), which can be flexibly applied to multiple tasks (ASR, T2S, S2TT, etc.) and modalities (text, speech, vision). |
VIET ANH TRINH et. al. | arxiv-cs.CL | 2024-06-04 |
165 | Efficiently Train ASR Models That Memorize Less and Perform Better with Per-core Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across the training of a wide range of ASR models. |
LUN WANG et. al. | arxiv-cs.CR | 2024-06-04 |
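Per-core clipping bounds each accelerator core's gradient contribution to a fixed norm before the contributions are averaged, which limits how much any one core's (micro-batch's) examples can be memorized. A minimal NumPy sketch of that scheme, with the clip norm and averaging rule as illustrative assumptions rather than the paper's exact recipe:

```python
import numpy as np

def per_core_clip(core_grads, clip_norm=1.0):
    """Scale each per-core gradient down to at most clip_norm, then average.
    Sketch of per-core clipping (PCC); not the paper's implementation."""
    clipped = []
    for g in core_grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # no-op if already small
        clipped.append(g * scale)
    return np.mean(clipped, axis=0)

# Core 0 has a large gradient (norm 5), core 1 a small one (norm 0.5).
g = per_core_clip([np.array([3.0, 4.0]), np.array([0.3, 0.4])], clip_norm=1.0)
```

Only the large gradient is rescaled (to norm 1.0); the small one passes through untouched, so the averaged update is no longer dominated by a single core.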
166 | Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition Via Weakly Phonetic Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the approach of pre-training with weakly phonetic supervision towards data-efficient MCL-ASR, which is called Whistle. |
Saierdaer Yusuyin; Te Ma; Hao Huang; Wenbo Zhao; Zhijian Ou; | arxiv-cs.SD | 2024-06-04 |
167 | Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which typically feature a single transcript associated with hours-long audios. |
Ara Yeroyan; Nikolay Karpov; | arxiv-cs.CL | 2024-06-03 |
168 | Pass The Butter: A Study on Desktop-classic Multitasking Robotic Arm Based on Advanced YOLOv7 and BERT Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (based on ROS) as a carrier, locally deploying a natural language model (NLP-BERT), and integrating visual recognition (CV-YOLO) and speech recognition (ASR-Whisper) technologies as inputs to achieve autonomous decision-making and rational action by the desktop robot. |
HAOHUA QUE et. al. | arxiv-cs.RO | 2024-05-27 |
169 | Denoising LM: Pushing The Limits of Error Correction Models for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Denoising LM (DLM), which is a *scaled* error correction model trained with vast amounts of synthetic data, significantly exceeding prior attempts while achieving new state-of-the-art ASR performance. |
ZIJIN GU et. al. | arxiv-cs.LG | 2024-05-24 |
170 | Let’s Fuse Step By Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Generative Fusion Decoding (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). |
CHAN-JAN HSU et. al. | arxiv-cs.CL | 2024-05-23 |
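GFD is a shallow-fusion framework; the generic scoring rule shallow fusion builds on combines the recognizer's and the LM's log-probabilities at each decoding step. A minimal sketch with a fixed fusion weight and toy vocabularies (GFD's byte-level alignment machinery is omitted, and all names here are illustrative):

```python
import math

def shallow_fusion_step(asr_logprobs, lm_logprobs, lam=0.3):
    """Pick the next token by ASR log-prob plus lam * LM log-prob.
    Tokens unknown to the LM get a large penalty in this toy version."""
    fused = {tok: asr_logprobs[tok] + lam * lm_logprobs.get(tok, -1e9)
             for tok in asr_logprobs}
    return max(fused, key=fused.get)

# The acoustics slightly prefer "there", but the LM strongly prefers "their".
tok = shallow_fusion_step(
    {"there": math.log(0.50), "their": math.log(0.45)},
    {"there": math.log(0.20), "their": math.log(0.70)},
)
```

With lam = 0.3 the LM evidence outweighs the small acoustic margin, so the fused decoder picks "their"; setting lam = 0 recovers plain ASR decoding.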
171 | You Don’t Understand Me!: Comparing ASR Results for L1 and L2 Speakers of Swedish IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. |
Ronald Cumbal; Birger Moell; Jose Lopes; Olof Engwall; | arxiv-cs.CL | 2024-05-22 |
172 | A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This prevents human users from interrupting the robot, which limits speech-based human-robot interaction. To enable a more natural interaction that allows for such interruptions, we propose an audio processing pipeline for filtering out the robot’s ego speech using only a single-channel microphone. |
Yue Li; Florian A. Kunneman; Koen V. Hindriks; | arxiv-cs.HC | 2024-05-22 |
173 | Non-autoregressive Real-time Accent Conversion Model with Voice Cloning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have developed the non-autoregressive model for real-time accent conversion with voice cloning. |
Vladimir Nechaev; Sergey Kosyakov; | arxiv-cs.SD | 2024-05-21 |
174 | Listen Again and Choose The Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ClozeGER, a new paradigm for ASR generative error correction. |
YUCHEN HU et. al. | arxiv-cs.CL | 2024-05-16 |
175 | Towards Evaluating The Robustness of Automatic Speech Recognition Systems Via Audio Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an attack on ASR systems based on user-customized style transfer. |
WEIFEI JIN et. al. | arxiv-cs.SD | 2024-05-15 |
176 | I Know What You Mean: Context-Aware Recognition to Enhance Speech-Based Games Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent advances in language processing and speech recognition open up a large opportunity for video game companies to embrace voice interaction as an intuitive feature and … |
Nima Zargham; Mohamed Lamine Fetni; Laura Spillner; Thomas Muender; Rainer Malaka; | Proceedings of the CHI Conference on Human Factors in … | 2024-05-11 |
177 | Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple yet effective method to learn a universal acoustic realization of Whisper’s `<|endoftext|>` token, which, when prepended to any speech signal, encourages the model to ignore the speech and only transcribe the special token, effectively ‘muting’ the model. |
Vyas Raina; Rao Ma; Charles McGhee; Kate Knill; Mark Gales; | arxiv-cs.CL | 2024-05-09 |
178 | Lost in Transcription: Identifying and Quantifying The Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study evaluates six leading ASRs, analyzing their performance on both a real-world dataset of speech samples from individuals who stutter and a synthetic dataset derived from the widely-used LibriSpeech benchmark. |
DENA MUJTABA et. al. | arxiv-cs.CL | 2024-05-09 |
179 | The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. |
JINGGUANG TIAN et. al. | arxiv-cs.SD | 2024-05-08 |
180 | Open Implementation and Study of BEST-RQ for Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we describe a re-implementation of a Random-projection quantizer and perform a preliminary study with a comparison to wav2vec 2.0 on four downstream tasks. |
Ryan Whetten; Titouan Parcollet; Marco Dinarelli; Yannick Estève; | arxiv-cs.CL | 2024-05-07 |
181 | Mixat: A Data Set of Bilingual Emirati-English Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Mixat: a dataset of Emirati speech code-mixed with English. |
Maryam Al Ali; Hanan Aldarmaki; | arxiv-cs.CL | 2024-05-04 |
182 | Unveiling The Potential of LLM-Based ASR on Chinese Open-Source Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules in the context of the speech foundation encoder-LLM ASR paradigm. |
XUELONG GENG et. al. | arxiv-cs.SD | 2024-05-03 |
183 | Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. |
FRANCISCO TEIXEIRA et. al. | arxiv-cs.LG | 2024-05-02 |
184 | Low-resource Speech Recognition and Dialect Identification of Irish in A Multi-task Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for Irish (Gaelic) low-resource speech recognition (ASR) and dialect identification (DID). |
Liam Lonergan; Mengjie Qian; Neasa Ní Chiaráin; Christer Gobl; Ailbhe Ní Chasaide; | arxiv-cs.CL | 2024-05-02 |
185 | Efficient Compression of Multitask Multilingual Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still underperforms on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we examine its limitations, demonstrating the presence of speaker-related (gender, age) and model-related (resourcefulness and model size) bias. |
Thomas Palmeira Ferraz; | arxiv-cs.CL | 2024-05-01 |
186 | Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called AFTER, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. |
Dongyuan Li; Ying Zhang; Yusong Wang; Funakoshi Kotaro; Manabu Okumura; | arxiv-cs.SD | 2024-05-01 |
187 | Confides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, preventing their seamless integration into analytical workflows. In this paper, we introduce ConFides, a visual analytic system developed in collaboration with intelligence analysts to address this issue. |
Sunwoo Ha; Chaehun Lim; R. Jordan Crouser; Alvitta Ottley; | arxiv-cs.HC | 2024-04-30 |
188 | Child Speech Recognition in Human-Robot Interaction: Problem Solved? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. |
RUBEN JANSSENS et. al. | arxiv-cs.CL | 2024-04-26 |
189 | Automatic Speech Recognition System-Independent Word Error Rate Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. |
Chanho Park; Mingjie Chen; Thomas Hain; | arxiv-cs.CL | 2024-04-25 |
190 | Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Killkan, the first dataset for automatic speech recognition (ASR) in the Kichwa language, an indigenous language of Ecuador. |
Chihiro Taguchi; Jefferson Saransig; Dayana Velásquez; David Chiang; | arxiv-cs.CL | 2024-04-23 |
191 | Semantically Corrected Amharic Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. |
Samuael Adnew; Paul Pu Liang; | arxiv-cs.CL | 2024-04-20 |
192 | Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a multi-task audio source separation (MTASS) based ASR model called JRSV, which Jointly Recognizes Speech and singing Voices. |
Ye Bai; Chenxing Li; Hao Li; Yuanyuan Zhao; Xiaorui Wang; | arxiv-cs.SD | 2024-04-17 |
193 | Task Vector Algebra for ASR Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Vector representations of text and speech signals such as word2vec and wav2vec are used commonly in automatic speech recognition (ASR) and spoken language understanding systems. … |
Gowtham Ramesh; Kartik Audhkhasi; B. Ramabhadran; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
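Task vector algebra treats a fine-tuned model as base weights plus a "task vector" (the element-wise weight difference), and composes or ablates tasks by adding scaled vectors back to the base. A minimal sketch over toy weight dictionaries; the scaling rule and names are illustrative assumptions, not the paper's method:

```python
import numpy as np

def task_vector(finetuned, base):
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {name: finetuned[name] - base[name] for name in base}

def apply_task_vector(base, tv, alpha=1.0):
    """Move the base model along the task vector by a scaling factor alpha."""
    return {name: base[name] + alpha * tv[name] for name in base}

base = {"w": np.array([1.0, 2.0])}
finetuned = {"w": np.array([2.0, 0.0])}
# Apply half of the task at alpha = 0.5; alpha = -1.0 would "forget" it.
merged = apply_task_vector(base, task_vector(finetuned, base), alpha=0.5)
```

Because the algebra is purely element-wise, vectors from different fine-tunings of the same base ASR model can be summed to approximate multi-task behavior without retraining.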
194 | Automatic Speech Recognition Tuned for Child Speech in The Classroom Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: K-12 school classrooms have proven to be a challenging environment for Automatic Speech Recognition (ASR) systems, both due to background noise and conversation, and differences … |
ROSY SOUTHWELL et. al. | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
195 | Extending Large Language Models for Speech and Audio Captioning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multimodal large language models (LLMs) have shown promising visual perception abilities by connecting with image encoders, but their performance on auditory tasks has not yet … |
CHANGLI TANG et. al. | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
196 | Generalization of Self-Supervised Learning-Based Representations for Cross-Domain Speech Emotion Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Self-supervised learning (SSL) from unlabelled speech data has revolutionized speech representation learning. Among them, wavLM, wav2vec2, HuBERT, and Data2vec have produced … |
Abinay Reddy Naini; Mary A. Kohler; Elizabeth Richerson; Donita Robinson; Carlos Busso; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
197 | Exploring Adapters with Conformers for Children’s Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The high variability in acoustic, pronunciation, and linguistic characteristics of children’s speech makes children’s automatic speech recognition (ASR) a complex task. … |
Thomas Rolland; Alberto Abad; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
198 | Enhancing Two-Stage Finetuning for Speech Emotion Recognition Using Adapters Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study investigates the effective finetuning of a pretrained model using adapters for speech emotion recognition (SER). Since emotion is related with linguistic and prosodic … |
Yuan Gao; Hao Shi; Chenhui Chu; Tatsuya Kawahara; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
199 | Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The quadratic memory complexity of self-attention has generally restricted Transformer-based models to utterance-based speech processing, preventing models from leveraging … |
William Chen; Takatomo Kano; A. Ogawa; Marc Delcroix; Shinji Watanabe; | ICASSP 2024 – 2024 IEEE International Conference on … | 2024-04-14 |
200 | Automatic Speech Recognition Advancements for Indigenous Languages of The Americas Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. |
Monica Romero; Sandra Gomez; Ivan G. Torre; | arxiv-cs.CL | 2024-04-12 |
201 | An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. |
Tien-Hong Lo; Fu-An Chao; Tzu-I Wu; Yao-Ting Sung; Berlin Chen; | arxiv-cs.SD | 2024-04-11 |
202 | VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in The Medical Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present VietMed – a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. |
Khai Le-Duc; | arxiv-cs.CL | 2024-04-08 |
203 | Mai Ho’omāuna I Ka ‘Ai: Language Models Improve Automatic Speech Recognition in Hawaiian Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we address the challenge of improving Automatic Speech Recognition (ASR) for a low-resource language, Hawaiian, by incorporating large amounts of independent text data into an ASR foundation model, Whisper. |
Kaavya Chaparala; Guido Zarrella; Bruce Torres Fischer; Larry Kimura; Oiwi Parker Jones; | arxiv-cs.CL | 2024-04-03 |
204 | BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data. |
Alexandros Haliassos; Andreas Zinonos; Rodrigo Mira; Stavros Petridis; Maja Pantic; | arxiv-cs.CV | 2024-04-02 |
205 | Noise Masking Attacks and Defenses for Pretrained Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They show that when a record has been seen at training time, the model will transcribe the noisy record with its memorized sensitive transcript. In our work, we extend these attacks beyond ASR models, to attack pretrained speech encoders. |
Matthew Jagielski; Om Thakkar; Lun Wang; | arxiv-cs.LG | 2024-04-02 |
206 | Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Emotion Neural Transducer for fine-grained speech emotion recognition with automatic speech recognition (ASR) joint training. |
Siyuan Shen; Yu Gao; Feng Liu; Hanyang Wang; Aimin Zhou; | arxiv-cs.SD | 2024-03-28 |
207 | Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel method combining multi-modal and multi-task unsupervised pre-training with a translation-based supervised mid-training approach. |
YASH JAIN et al. | arxiv-cs.CL | 2024-03-28 |
208 | DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as the named entity (NE) list grows, the problems of phonetic confusion in the NE list are exacerbated; for example, homophone ambiguities increase substantially. In view of this, we propose a novel Description Augmented Named entity CorrEctoR (dubbed DANCER), which leverages entity descriptions as additional information to help mitigate phonetic confusion in named entity correction (NEC) on ASR transcriptions. |
Yi-Cheng Wang; Hsin-Wei Wang; Bi-Cheng Yan; Chi-Han Lin; Berlin Chen; | arxiv-cs.CL | 2024-03-26 |
209 | More Than Words: Advancements and Challenges in Speech Recognition for Singing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition. |
Anna Kruspe; | arxiv-cs.SD | 2024-03-14 |
210 | A Review on Gujarati Language Based Automatic Speech Recognition (ASR) Systems Related Papers Related Patents Related Grants Related Venues Related Experts View |
Mohit Dua; Bhavesh Bhagat; Shelza Dua; N. Chakravarty; | Int. J. Speech Technol. | 2024-03-12 |
211 | Automatic Speech Recognition (ASR) for The Diagnosis of Pronunciation of Speech Sound Disorders in Korean Children Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. |
TAEKYUNG AHN et al. | arxiv-cs.CL | 2024-03-12 |
212 | SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation. |
Jiayu Du; Jinpeng Li; Guoguo Chen; Wei-Qiang Zhang; | arxiv-cs.CL | 2024-03-12 |
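Leaderboards like the one above rank ASR systems by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the reference length. As a minimal sketch of the standard metric (not the SpeechColab implementation), WER can be computed with edit-distance dynamic programming:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions to match an empty hypothesis
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions to build hyp from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words; production platforms additionally normalize text (casing, punctuation, numerals) before scoring, which this sketch omits.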
213 | Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The divide between SE and ASR impedes the progress of robust ASR systems, especially as SE has made major advances in recent years. This paper focuses on eliminating this divide with an ARN (attentive recurrent network) time-domain enhancement model and a CrossNet time-frequency-domain enhancement model. |
Yufeng Yang; Ashutosh Pandey; DeLiang Wang; | arxiv-cs.SD | 2024-03-10 |
214 | SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents a cost-effective SSFT method named Self-supervised Correspondence (SCORE) fine-tuning to adapt the SSL speech representations for content-related tasks. |
Amit Meghanani; Thomas Hain; | arxiv-cs.CL | 2024-03-10 |
215 | A New Benchmark for Evaluating Automatic Speech Recognition in The Arabic Call Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work aims to establish a robust benchmark that not only encompasses the broad spectrum of Arabic dialects but also emulates the real-world conditions of call-based communications. |
QUSAI ABO OBAIDAH et al. | arxiv-cs.AI | 2024-03-07 |
216 | Kirigami Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-based human activity recognition (HAR) is very popular because many human activities have unique sound signatures that can be detected using machine learning (ML) … |
Sudershan Boovaraghavan; Haozhe Zhou; Mayank Goel; Yuvraj Agarwal; | Proceedings of the ACM on Interactive, Mobile, Wearable and … | 2024-03-06 |
217 | PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization. Mixture invariant training (MixIT) … |
Joonas Kalda; Clément Pagés; R. Marxer; Tanel Alumäe; Hervé Bredin; | The Speaker and Language Recognition Workshop | 2024-03-04 |
218 | Automatic Speech Recognition Using Advanced Deep Learning Approaches: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This survey offers a comprehensive review of DTL, FL, and RL-based ASR frameworks, aiming to provide insights into the latest developments and aid researchers and professionals in understanding the current challenges. |
Hamza Kheddar; Mustapha Hemis; Yassine Himeur; | arxiv-cs.SD | 2024-03-02 |
219 | Towards Inclusive Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Siyuan Feng; B. Halpern; O. Kudina; O. Scharenborg; | Comput. Speech Lang. | 2024-03-01 |
220 | Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach, post-decoder biasing, which constructs a transform probability matrix based on the distribution of training transcriptions. |
Heyang Liu; Yu Wang; Yanfeng Wang; | arxiv-cs.CL | 2024-03-01 |
221 | Probing The Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following a large body of research on neural network interpretability, we propose in this article a protocol that aims to determine what information is encoded in an ASR acoustic model (AM), and where it is located. |
Quentin Raymondaud; Mickael Rouvier; Richard Dufour; | arxiv-cs.SD | 2024-02-29 |
222 | Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose task design, labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. |
Jeehyun Lee; Yerin Choi; Tae-Jin Song; Myoung-Wan Koo; | arxiv-cs.CL | 2024-02-29 |
223 | Exploration of Adapter for Noise Robust Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study thoroughly investigates adapter-based ASR adaptation in noisy environments. |
Hao Shi; Tatsuya Kawahara; | arxiv-cs.SD | 2024-02-28 |
224 | Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets, encompassing 19 languages from eight language families and two speaking conditions. |
Giuseppe Attanasio; Beatrice Savoldi; Dennis Fucci; Dirk Hovy; | arxiv-cs.CL | 2024-02-27 |
225 | Large Language Models Are Efficient Learners of Noise-Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The latest work proposes a GER benchmark with the HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcriptions by efficient LLM finetuning, which shows great effectiveness but lacks specificity on noise-robust ASR. In this work, we extend the benchmark to noisy conditions and investigate whether we can teach LLMs to perform denoising for GER, just as robust ASR systems do, where one solution is introducing noise information as a conditioner into the LLM. |
YUCHEN HU et al. | iclr | 2024-02-26 |
226 | An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus exclusively on improving the acoustic encoder of E2E ASR to tackle the challenge caused by the code-switching phenomenon. |
Tzu-Ting Yang; Hsin-Wei Wang; Yi-Cheng Wang; Chi-Han Lin; Berlin Chen; | arxiv-cs.CL | 2024-02-26 |
227 | It’s Never Too Late: Fusing Acoustic Information Into Large Language Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite its effectiveness, GER introduces extra data uncertainty since the LLM is trained without taking into account acoustic information available in the speech signal. In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF). |
CHEN CHEN et al. | iclr | 2024-02-26 |
228 | LipVoicer: Generating Speech from Silent Videos Guided By Lip Reading Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present LipVoicer, a novel method that generates high-quality speech, even for in-the-wild and rich datasets, by incorporating the text modality. |
Yochai Yemini; Aviv Shamsian; Lior Bracha; Sharon Gannot; Ethan Fetaya; | iclr | 2024-02-26 |
229 | Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are invoked and their placement in memory. |
YANG LI et al. | arxiv-cs.SD | 2024-02-20 |
230 | Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose the multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs. |
Qiushi Zhu; Jie Zhang; Yu Gu; Yuchen Hu; Lirong Dai; | aaai | 2024-02-20 |
231 | OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC). |
Yifan Peng; Yui Sudo; Muhammad Shakeel; Shinji Watanabe; | arxiv-cs.CL | 2024-02-19 |
232 | An Embarrassingly Simple Approach for LLM with Strong ASR Capacity IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). |
ZIYANG MA et al. | arxiv-cs.CL | 2024-02-13 |
233 | The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research represents a pioneering effort in quantifying biases in the Portuguese language context through the application of MMS and Whisper, contributing to a better understanding of ASR systems’ performance in multilingual settings. |
Ajinkya Kulkarni; Anna Tokareva; Rameez Qureshi; Miguel Couceiro; | arxiv-cs.CL | 2024-02-12 |
234 | A Comprehensive Study of The Current State-of-the-Art in Nepali Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). |
Rupak Raj Ghimire; Bal Krishna Bal; Prakash Poudyal; | arxiv-cs.SD | 2024-02-05 |
235 | Streaming Sequence Transduction Through Dynamic Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. |
WEITING TAN et al. | arxiv-cs.CL | 2024-02-02 |
236 | AccentFold: A Journey Through African Accents for Zero-Shot ASR Adaptation to Target Accents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While previous approaches have focused on modeling techniques or creating accented speech datasets, gathering sufficient data for the multitude of accents, particularly in the African context, remains impractical due to their sheer diversity and associated budget constraints. To address these challenges, we propose AccentFold, a method that exploits spatial relationships between learned accent embeddings to improve downstream Automatic Speech Recognition (ASR). |
Abraham Toluwase Owodunni; Aditya Yadavalli; Chris Chinenye Emezue; Tobi Olatunji; Clinton C Mbataku; | arxiv-cs.CL | 2024-02-02 |
237 | Digits Micro-model for Accurate and Secure Transactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present our work on creating micro models for multi-digit number recognition that handle diverse speaking styles reflecting real-world pronunciation patterns. |
Chirag Chhablani; Nikhita Sharma; Jordan Hosier; Vijay K. Gurbani; | arxiv-cs.LG | 2024-02-02 |
238 | Exploring The Limits of Decoder-only Models Trained on Public Speech Recognition Corpora Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate factors such as choice of training datasets and modeling components necessary for obtaining the best performance using public English ASR corpora alone. |
Ankit Gupta; George Saon; Brian Kingsbury; | arxiv-cs.CL | 2024-01-31 |
239 | Improving ASR Performance with OCR Through Using Word Frequency Difference IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, there has been a growing interest in conversational artificial intelligence (AI). As a result, research is actively being conducted on automatic speech recognition (ASR) … |
Kyudan Jung; Seungmin Bae; N. Kim; Hyun Gon Ryu; Hyuk-Jae Lee; | 2024 International Conference on Electronics, Information, … | 2024-01-28 |
240 | Byte Pair Encoding Is All You Need For Automatic Bengali Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent research highlights the dependency of BPE subword tokenization’s efficacy on the morphological nature of the language, particularly in languages rich in inflectional morphology, where fewer BPE merges suffice for generating highly productive tokens. Motivated by this, our study empirically identifies the optimal number of BPE tokens for Bengali, a language known for its morphological complexity, thus enhancing out-of-distribution automatic speech recognition (ASR) performance. |
Ahnaf Mozib Samin; | arxiv-cs.CL | 2024-01-27 |
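For background, BPE is a greedy procedure: starting from characters, repeatedly merge the most frequent adjacent symbol pair in the corpus until a chosen number of merges is reached; that merge count is exactly the hyperparameter the study above tunes for Bengali. A minimal, language-agnostic sketch (the `bpe_merges` function and toy corpus are illustrative, not the paper's training pipeline):

```python
from collections import Counter

def bpe_merges(words: dict, num_merges: int) -> list:
    """Learn BPE merge rules from a {word: frequency} corpus.

    Each word starts as a tuple of characters; each step merges the most
    frequent adjacent symbol pair into a single new symbol.
    """
    vocab = {tuple(word): freq for word, freq in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is a single symbol; nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge throughout the vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges
```

Varying `num_merges` trades off token granularity: fewer merges keep tokens closer to characters, which the cited work finds suits a morphologically rich language like Bengali, while more merges yield longer, word-like units.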
241 | Toward Practical Automatic Speech Recognition and Post-Processing: A Call for Explainable Error Benchmark Guideline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose the development of an Error Explainable Benchmark (EEB) dataset. |
SEONMIN KOO et al. | arxiv-cs.CL | 2024-01-25 |
242 | SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. |
CHYI-JIUNN LIN et al. | arxiv-cs.CL | 2024-01-24 |
243 | MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The prevalent approach in speech emotion recognition (SER) involves integrating both audio and textual information to comprehensively identify the speaker’s emotion, with the text … |
Jiajun He; Xiaohan Shi; Xingfeng Li; Tomoki Toda; | arxiv-cs.CL | 2024-01-24 |
244 | Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. |
W. RONNY HUANG et al. | arxiv-cs.CL | 2024-01-23 |
245 | Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers. |
Michael Hentschel; Yuta Nishikawa; Tatsuya Komatsu; Yusuke Fujita; | arxiv-cs.CL | 2024-01-22 |
246 | Using Large Language Model for End-to-End Chinese ASR and NER Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach, however, has received less attention in the literature. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches using Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks. |
YUANG LI et al. | arxiv-cs.CL | 2024-01-20 |
247 | SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct SlideAVSR, an AVSR dataset using scientific paper explanation videos. |
Hao Wang; Shuhei Kurita; Shuichiro Shimizu; Daisuke Kawahara; | arxiv-cs.CV | 2024-01-18 |
248 | Joint Unsupervised and Supervised Training for Automatic Speech Recognition Via Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) that we term bi-level joint unsupervised and supervised training (BL-JUST). |
A F M SAIF et al. | arxiv-cs.CL | 2024-01-13 |
249 | LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to rare phrase lists, the slides within videos are synchronized in real-time with the speech, enabling the extraction of long contextual bias. Therefore, we propose a novel long-context biasing network (LCB-net) for audio-visual speech recognition (AVSR) to leverage the long-context information available in videos effectively. |
Fan Yu; Haoxu Wang; Xian Shi; Shiliang Zhang; | arxiv-cs.SD | 2024-01-12 |
250 | XLS-R Deep Learning Model for Multilingual ASR on Low- Resource Languages: Indonesian, Javanese, and Sundanese Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This research paper focuses on the development and evaluation of Automatic Speech Recognition (ASR) technology using the XLS-R 300m model. The study aims to improve ASR … |
Panji Arisaputra; Alif Tri Handoyo; Amalia Zahra; | ArXiv | 2024-01-12 |
251 | UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. |
JIAXIN GUO et al. | arxiv-cs.CL | 2024-01-11 |
252 | Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Objectives: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the “Cookie Theft” picture description task. |
Changye Li; Weizhe Xu; Trevor Cohen; Serguei Pakhomov; | arxiv-cs.CL | 2024-01-10 |
253 | High-precision Voice Search Query Correction Via Retrievable Speech-text Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, ASR-hypothesis-based retrieval can yield poor precision if the textual hypotheses are too phonetically dissimilar to the true transcript. In this paper, we eliminate the hypothesis-audio mismatch problem by querying the correction database directly with embeddings derived from the utterance audio; the embeddings of the utterance audio and of the candidate corrections are produced by multimodal speech-text embedding networks trained to place the embedding of an utterance's audio close to the embedding of its corresponding textual transcript. |
CHRISTOPHER LI et al. | arxiv-cs.CL | 2024-01-08 |
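At its core, the retrieval step described in the entry above is nearest-neighbor search in a shared embedding space: embed the utterance audio, compare it against pre-embedded candidate corrections, and accept the closest match if it is similar enough. A generic sketch under that framing (the function, threshold, and toy vectors are ours; the paper's actual embedding networks are not reproduced here):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def retrieve_correction(query_emb, candidates, threshold=0.8):
    """candidates: list of (correction_text, embedding) pairs.

    Returns the candidate correction whose embedding is most similar to the
    utterance-audio embedding, or None if nothing clears the threshold.
    """
    best_text, best_sim = None, -1.0
    for text, emb in candidates:
        sim = cosine(query_emb, emb)
        if sim > best_sim:
            best_text, best_sim = text, sim
    return best_text if best_sim >= threshold else None
```

In practice such a lookup would use an approximate nearest-neighbor index rather than a linear scan, and the threshold guards precision: a query that matches no correction well is left untouched.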
254 | An Audio-quality-based Multi-strategy Approach for Target Speaker Extraction in The MISP 2023 Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. |
RUNDUO HAN et al. | arxiv-cs.SD | 2024-01-08 |
255 | Cross-Speaker Encoding Network for Multi-Talker Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a Cross-Speaker Encoding (CSE) network to address the limitations of SIMO models by aggregating cross-speaker representations. |
JIAWEN KANG et al. | arxiv-cs.SD | 2024-01-08 |
256 | ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. |
HE WANG et al. | arxiv-cs.SD | 2024-01-07 |
257 | MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current studies mainly focus on fusing the well-learned modality features, like the output of modality-specific encoders, without considering the contextual relationship during the modality feature learning. In this study, we propose a multi-layer cross-attention fusion based AVSR (MLCA-AVSR) approach that promotes representation learning of each modality by fusing them at different levels of audio/visual encoders. |
He Wang; Pengcheng Guo; Pan Zhou; Lei Xie; | arxiv-cs.SD | 2024-01-07 |
258 | Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce a method that utilizes the ASR system’s lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities and enhance SLU outcomes. |
KEVIN EVERSON et al. | arxiv-cs.CL | 2024-01-05 |
259 | An Approach for Speech Enhancement in Low SNR Environments Using Granular Speaker Embedding Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The proliferation of speech technology applications has led to an unprecedented demand for effective speech enhancement techniques, particularly in low Signal-to-Noise Ratio (SNR) … |
Jayasree Saha; Rudrabha Mukhopadhyay; A. Agrawal; Surabhi Jain; C. V. Jawahar; | Proceedings of the 7th Joint International Conference on … | 2024-01-04 |
260 | Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models. To address this, we propose a perturbation-based method for assessing the susceptibility of an automatic speech recognition (ASR) model to hallucination at test time, which does not require access to the training dataset. |
Rita Frieske; Bertram E. Shi; | arxiv-cs.CL | 2024-01-03 |
261 | Arabic Speech Recognition: Advancement and Challenges Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition is a captivating process that revolutionizes human-computer interactions, allowing us to interact and control machines through spoken commands. The foundation … |
ASHIFUR RAHMAN et al. | IEEE Access | 2024-01-01 |
262 | Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We investigate state-of-the-art automatic speech recognition (ASR) systems and provide thorough investigations on training methods to adapt them to low-resourced electrolaryngeal … |
Lester Phillip Violeta; D. Ma; Wen-Chin Huang; T. Toda; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2024-01-01 |
263 | Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance in many datasets, spectrogram-based SE … |
Hao Shi; M. Mimura; Tatsuya Kawahara; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2024-01-01 |
264 | Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition Using Adversarial Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper presents an extensive comparative study of various data augmentation approaches to improve the robustness of pre-trained ASR model fine-tuning to dysarthric speech. |
HUIMENG WANG et al. | arxiv-cs.SD | 2023-12-31 |
265 | KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional evaluation metrics for ASR systems produce a singular aggregate score, which is insufficient for understanding specific system vulnerabilities. Therefore, we aim to address the limitations of the previous ASR evaluation methods by introducing the Korean Error Explainable Benchmark Dataset for ASR and Post-processing (KEBAP). |
SEONMIN KOO et al. | emnlp | 2023-12-22 |
266 | Accented Speech Recognition With Accent-specific Codebooks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. |
Darshan Prabhu; Preethi Jyothi; Sriram Ganapathy; Vinit Unni; | emnlp | 2023-12-22 |
267 | Back Transcription As A Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models. |
Marek Kubis; Paweł Skórzewski; Marcin Sowański; Tomasz Ziętkiewicz; | emnlp | 2023-12-22 |
268 | Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). |
SRIJITH RADHAKRISHNAN et al. | emnlp | 2023-12-22 |
269 | CS2W: A Chinese Spoken-to-Written Style Conversion Dataset with Multiple Conversion Types Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the availability of datasets for this is limited. To address this issue, we present CS2W, a Chinese Spoken-to-Written style conversion dataset comprising 7,237 spoken sentences extracted from transcribed conversational texts. |
Zishan Guo; Linhao Yu; Minghui Xu; Renren Jin; Deyi Xiong; | emnlp | 2023-12-22 |
270 | Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we attempt to resolve structurally ambiguous utterances into unambiguous texts in Indonesian using prosodic information. |
RUHIYAH WIDIAPUTRI et. al. | emnlp | 2023-12-22 |
271 | CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address this robustness problem in downstream MT models by forcing the MT encoder to bring the representations of a noisy input closer to its clean version in the semantic space. This is achieved by introducing a contrastive learning method that leverages adversarial examples in the form of ASR outputs paired with their corresponding human transcripts to optimize the network parameters. |
Sathish Indurthi; Shamil Chollampatt; Ravi Agrawal; Marco Turchi; | emnlp | 2023-12-22 |
272 | Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach that builds on a pre-trained ASR model and extends it with an adaptive upstream module that fuses audio and visual information. |
Christopher Simic; Tobias Bocklet; | arxiv-cs.SD | 2023-12-21 |
273 | Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning. |
ANIRUDH S. SUNDAR et. al. | arxiv-cs.LG | 2023-12-21 |
274 | KNN-CTC: Enhancing ASR Via Retrieval of CTC Pseudo Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The success of retrieval-augmented language models in various natural language processing (NLP) tasks has been constrained in automatic speech recognition (ASR) applications due to challenges in constructing fine-grained audio-text datastores. This paper presents kNN-CTC, a novel approach that overcomes these challenges by leveraging Connectionist Temporal Classification (CTC) pseudo labels to establish frame-level audio-text key-value pairs, circumventing the need for precise ground truth alignments. |
JIAMING ZHOU et. al. | arxiv-cs.SD | 2023-12-20 |
275 | SpokesBiz — An Open Corpus of Conversational Polish Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We outline the general structure and content of the corpus, showcasing selected applications in linguistic research, evaluation and improvement of automatic speech recognition (ASR) systems. |
PIOTR PĘZIK et. al. | arxiv-cs.CL | 2023-12-19 |
276 | AdaStreamLite Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to … |
YUHENG WEI et. al. | Proceedings of the ACM on Interactive, Mobile, Wearable and … | 2023-12-19 |
277 | SpokesBiz – An Open Corpus of Conversational Polish Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper announces the early release of SpokesBiz, a freely available corpus of conversational Polish developed within the CLARIN-BIZ project and comprising over 650 hours of … |
PIOTR PĘZIK et. al. | ArXiv | 2023-12-19 |
278 | Seq2seq for Automatic Paraphasia Detection in Aphasic Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel, sequence-to-sequence (seq2seq) model that is trained end-to-end (E2E) to perform both ASR and paraphasia detection tasks. |
MATTHEW PEREZ et. al. | arxiv-cs.SD | 2023-12-16 |
279 | Parameter-Efficient Cross-Language Transfer Learning for A Language-Modular Audiovisual Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In audiovisual speech recognition (AV-ASR), for many languages only little audiovisual data is available. Building upon an English model, in this work, we first apply and analyze … |
ZHENGYANG LI et. al. | 2023 IEEE Automatic Speech Recognition and Understanding … | 2023-12-16 |
280 | Parameter-Efficient Tuning with Adaptive Bottlenecks for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transfer learning from large multilingual pretrained models, like XLSR, has become the new paradigm for Automatic Speech Recognition (ASR). Considering their ever-increasing size, … |
GEOFFROY VANDERREYDT et. al. | 2023 IEEE Automatic Speech Recognition and Understanding … | 2023-12-16 |
281 | Conformer-Based Speech Recognition On Extreme Edge-Computing Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. |
MINGBIN XU et. al. | arxiv-cs.LG | 2023-12-16 |
282 | Leveraging The Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Indonesia is home to roughly 700 languages, which amounts to about ten percent of the global total, positioning it as the second-most linguistically diverse country after Papua … |
S. Sakti; Benita Angela Titalim; | 2023 IEEE Automatic Speech Recognition and Understanding … | 2023-12-16 |
283 | LiteVSR: Efficient Visual Speech Recognition By Learning from Speech Representations of Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model. |
HENDRIK LAUX et. al. | arxiv-cs.CV | 2023-12-15 |
284 | Automatic Channel Selection and Spatial Feature Integration for Multi-channel Speech Recognition Across Various Array Topologies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing various array topologies that have multiple recording devices and offering reliable recognition performance in real-world environments. Addressing this task, we introduce an ASR system that demonstrates exceptional performance across various array topologies. |
BINGSHEN MU et. al. | arxiv-cs.SD | 2023-12-15 |
285 | On The Compression of Shallow Non-causal ASR Models Using Knowledge Distillation and Tied-and-reduced Decoder for Low-latency On-device Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a shallow cascaded model by combining various model compression techniques such as knowledge distillation, shared decoder, and tied-and-reduced transducer network in order to reduce the model footprint. |
NAGARAJ ADIGA et. al. | arxiv-cs.SD | 2023-12-15 |
286 | Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most past studies have simplified the learning complexity of the model by splitting the code-switching task into multiple tasks dealing with a single language and then learning the domain-specific knowledge of each language separately. Therefore, in this paper, we attempt to introduce language identification information into the middle layer of the ASR model’s encoder. |
Tzu-Ting Yang; Hsin-Wei Wang; Berlin Chen; | arxiv-cs.CL | 2023-12-15 |
287 | Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently audio-visual speech recognition (AVSR), which better leverages video modality as additional information to extend automatic speech recognition (ASR), has shown promising … |
Fan Yu; Haoxu Wang; Ziyang Ma; Shiliang Zhang; | ICASSP 2024 – 2024 IEEE International Conference on … | 2023-12-14 |
288 | FastInject: Injecting Unpaired Text Data Into CTC-Based ASR Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the … |
Keqi Deng; Phil Woodland; | ICASSP 2024 – 2024 IEEE International Conference on … | 2023-12-14 |
289 | Extending Whisper with Prompt Tuning to Target-speaker ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work leverages prompt tuning, a parameter-efficient fine-tuning approach, to extend Whisper, a large-scale single-talker ASR model, to TS-ASR. |
Hao Ma; Zhiyuan Peng; Mingjie Shao; Jing Li; Ju Liu; | arxiv-cs.CL | 2023-12-13 |
290 | ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a time-domain recognition-oriented speech enhancement (ROSE) framework is proposed to improve speech intelligibility and also advance ASR accuracy based on convolutional encoder-decoder-based U-Net framework, which serves as a plug-and-play tool in ATC scenarios and does not require additional retraining of the ASR model. |
Xincheng Yu; Dongyue Guo; Jianwei Zhang; Yi Lin; | arxiv-cs.SD | 2023-12-10 |
291 | Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. |
Wonjun Lee; Gary Geunbae Lee; Yunsu Kim; | arxiv-cs.CL | 2023-12-06 |
292 | Taiwanese Hakka Across Taiwan Corpus and Formosa Speech Recognition Challenge 2023 – Hakka ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To revive the endangered Taiwanese Hakka language, the first large-scale Taiwanese Hakka speech corpus across Taiwan (HAT) was developed, representing modern Taiwanese Hakka … |
YUAN-FU LIAO et. al. | 2023 26th Conference of the Oriental COCOSDA International … | 2023-12-04 |
293 | End-to-End Speech-to-Text Translation: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, researchers have been exploring end-to-end (E2E) models for ST translation. |
Nivedita Sethiya; Chandresh Kumar Maurya; | arxiv-cs.CL | 2023-12-02 |
294 | FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach called FAT-HuBERT, which leverages distortion-invariant self-supervised learning (SSL) to enhance the robustness of ASR. |
Dongning Yang; Wei Wang; Yanmin Qian; | arxiv-cs.SD | 2023-11-29 |
295 | End-to-end Joint Punctuated and Normalized ASR with A Limited Amount of Punctuated Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two approaches to train an end-to-end joint punctuated and normalized ASR system using limited punctuated data. |
Can Cui; Imran Ahamad Sheikh; Mostafa Sadeghi; Emmanuel Vincent; | arxiv-cs.CL | 2023-11-29 |
296 | On The Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional NSER approaches have proven effective in mitigating the impact of artificial noise sources, such as white Gaussian noise, but are of limited use against non-stationary noises in real-world environments due to their complexity and uncertainty. To overcome this limitation, we introduce a new method for NSER by adopting the automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech. |
Xiaohan Shi; Jiajun He; Xingfeng Li; Tomoki Toda; | arxiv-cs.SD | 2023-11-13 |
297 | Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Decoupling and Interacting Multi-task Network (DIMNet) for joint speech and accent recognition, which is comprised of a connectionist temporal classification (CTC) branch, an AR branch, an ASR branch, and a bottom feature encoder. |
Qijie Shao; Pengcheng Guo; Jinghao Yan; Pengfei Hu; Lei Xie; | arxiv-cs.SD | 2023-11-12 |
298 | A Survey of Technologies for Automatic Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Zhaopeng Qian; K. Xiao; Chongchong Yu; | EURASIP Journal on Audio, Speech, and Music Processing | 2023-11-11 |
299 | Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to model speech tokens in an autoregressive way, similar to text. |
QIAN CHEN et. al. | arxiv-cs.CL | 2023-11-08 |
300 | Improved Child Text-to-Speech Synthesis Through Fastpitch-based Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach that leverages the Fastpitch text-to-speech (TTS) model for generating high-quality synthetic child speech. |
Rishabh Jain; Peter Corcoran; | arxiv-cs.SD | 2023-11-07 |
301 | Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. |
RABINDRA NATH NANDI et. al. | arxiv-cs.CL | 2023-11-06 |
302 | COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a cost-effective method to integrate speech into a large language model (LLM), resulting in a Contextual Speech Model with Instruction-following/in-context-learning Capabilities (COSMIC) multi-modal LLM. |
JING PAN et. al. | arxiv-cs.CL | 2023-11-03 |
303 | Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models Via Language-Specific Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still underperforms on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we propose DistilWhisper, an approach able to bridge the performance gap in ASR for these languages while retaining the advantages of multitask and multilingual capabilities. |
Thomas Palmeira Ferraz; Marcely Zanon Boito; Caroline Brun; Vassilina Nikoulina; | arxiv-cs.CL | 2023-11-02 |
304 | Disordered Speech Recognition Considering Low Resources and Abnormal Articulation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Yuqin Lin; Longbiao Wang; Jianwu Dang; Sheng Li; Chenchen Ding; | Speech Commun. | 2023-11-01 |
305 | Learning Adapters for Code-Switching Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multilingual code-switching speech recognition has been an emerging research direction in real-world applications since most speakers are bilingual or multilingual. A … |
Chun-Yi He; Jen-Tzung Chien; | 2023 Asia Pacific Signal and Information Processing … | 2023-10-31 |
306 | MUST: A Multilingual Student-Teacher Learning Approach for Low-resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, the aforementioned limitation is addressed by proposing a MUltilingual Student-Teacher (MUST) learning which exploits a posteriors mapping approach. |
Muhammad Umar Farooq; Rehan Ahmad; Thomas Hain; | arxiv-cs.CL | 2023-10-28 |
307 | MADGF: Multi-Agent Data Generation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic Speech Recognition (ASR) systems predominantly cater to monolingual inputs and struggle with the complexity introduced by mixed language audio. In this paper, we present a novel Multi-Agent Data Generation Framework (MADGF) to address this challenge. |
Peng Xie; Kani Chen; | arxiv-cs.SD | 2023-10-27 |
308 | Uncovering Bias in ASR Systems: Evaluating Wav2vec2 and Whisper for Dutch Speakers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: It is crucial that ASR systems can handle the wide range of variations in speech of speakers from different demographic groups, with different speaking styles, and of speakers … |
Márcio Fuckner; Sophie Horsman; Pascal Wiggers; Iskaj Janssen; | 2023 International Conference on Speech Technology and … | 2023-10-25 |
309 | ArTST: Arabic Text and Speech Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. |
Hawau Olamide Toyin; Amirbek Djanibekov; Ajinkya Kulkarni; Hanan Aldarmaki; | arxiv-cs.CL | 2023-10-25 |
310 | Back Transcription As A Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models. |
Marek Kubis; Paweł Skórzewski; Marcin Sowański; Tomasz Ziętkiewicz; | arxiv-cs.CL | 2023-10-25 |
311 | CDSD: Chinese Dysarthria Speech Database Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. |
MENGYI SUN et. al. | arxiv-cs.SD | 2023-10-24 |
312 | How Much Context Does My Attention-Based ASR System Need? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct an empirical study on the effect of scaling the sequence length used to train/evaluate (dense-attention-based) acoustic models on speech recognition performance. |
Robert Flynn; Anton Ragni; | arxiv-cs.CL | 2023-10-24 |
313 | Hypotheses Paradise: An Open and Strong Baseline for Speech Recognition with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues, thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. |
CHEN CHEN et. al. | nips | 2023-10-24 |
314 | Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. |
SARA PAPI et. al. | arxiv-cs.CL | 2023-10-23 |
315 | Intuitive Multilingual Audio-Visual Speech Recognition with A Single-Trained Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach to multilingual audio-visual speech recognition tasks by introducing a single model on a multilingual dataset. |
Joanna Hong; Se Jin Park; Yong Man Ro; | arxiv-cs.MM | 2023-10-23 |
316 | Conversational Speech Recognition By Learning Audio-textual Cross-modal Contextual Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system, extending the Conformer encoder-decoder model with cross-modal conversational representation. |
KUN WEI et. al. | arxiv-cs.SD | 2023-10-22 |
317 | BUT CHiME-7 System Description Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the joint effort of Brno University of Technology (BUT), AGH University of Krakow and University of Buenos Aires on the development of Automatic Speech Recognition systems for the CHiME-7 Challenge. |
MARTIN KARAFIÁT et. al. | arxiv-cs.SD | 2023-10-18 |
318 | VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the linguistic diversity and variations, it is challenging to build a robust and generalized ASR system for Arabic. In this work, we address this gap by developing and demoing a system, dubbed VoxArabica, for dialect identification (DID) as well as automatic speech recognition (ASR) of Arabic. |
Abdul Waheed; Bashar Talafha; Peter Sullivan; AbdelRahim Elmadany; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2023-10-17 |
319 | Generative Error Correction for Code-switching Speech Recognition Using Large Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite the recent advances in automatic speech recognition (ASR), … |
CHEN CHEN et. al. | ArXiv | 2023-10-17 |
320 | Correction Focused Language Model Training for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel correction focused LM training approach which aims to prioritize ASR fallible words. |
Yingyi Ma; Zhe Liu; Ozlem Kalinli; | arxiv-cs.CL | 2023-10-17 |
321 | Multi-stage Large Language Model Correction for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the usage of large language models (LLMs) to improve the performance of competitive speech recognition systems. |
Jie Pu; Thai-Son Nguyen; Sebastian Stüker; | arxiv-cs.CL | 2023-10-17 |
322 | End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame crosschannel attention and a speaker-attributed Transformer-based decoder. |
Can Cui; Imran Ahamad Sheikh; Mostafa Sadeghi; Emmanuel Vincent; | arxiv-cs.CL | 2023-10-16 |
323 | Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. |
ZHIHONG LEI et. al. | arxiv-cs.LG | 2023-10-15 |
324 | Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing leveraging the power of deep learning models in accurately delivering spot-on transcriptions across a wide variety of vocabularies and speaking styles. |
Ankitha Sudarshan; Vinay Samuel; Parth Patwa; Ibtihel Amara; Aman Chadha; | arxiv-cs.CL | 2023-10-14 |
325 | SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2023-10-13 |
326 | On The Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It has been shown that TTS-generated outputs still do not have the same qualities as real data. In this work we focus on the temporal structure of synthetic data and its relation to ASR training. |
Nick Rossenbach; Benedikt Hilmes; Ralf Schlüter; | arxiv-cs.CL | 2023-10-12 |
327 | Adapting The Adapters for Code-switching in Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. |
Atharva Kulkarni; Ajinkya Kulkarni; Miguel Couceiro; Hanan Aldarmaki; | arxiv-cs.CL | 2023-10-11 |
328 | Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). |
SRIJITH RADHAKRISHNAN et. al. | arxiv-cs.CL | 2023-10-10 |
329 | A Study of Speech Recognition, Speech Translation, and Speech Summarization of TED English Lectures Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Our research focuses on developing an automatic speech recognition system for English lectures, which involves summarizing the content and providing Japanese subtitles. Subtitling … |
Kazumasa Yamamoto; Haruhiko Banno; Haruki Sakurai; Toichiro Adachi; Seiichi Nakagawa; | 2023 IEEE 12th Global Conference on Consumer Electronics … | 2023-10-10 |
330 | No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While in the context of hybrid ASR models several solutions have been proposed, the gender bias issue has not been explicitly addressed in end-to-end neural architectures. To fill this gap, we propose a data augmentation technique that manipulates the fundamental frequency (f0) and formants. |
Dennis Fucci; Marco Gaido; Matteo Negri; Mauro Cettolo; Luisa Bentivogli; | arxiv-cs.CL | 2023-10-10 |
331 | Acoustic Model Fusion for End-to-end Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. |
ZHIHONG LEI et. al. | arxiv-cs.SD | 2023-10-10 |
332 | Ed-cec: Improving Rare Word Recognition Using Asr Postprocessing Based on Error Detection and Context-aware Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text summarization. To address this challenge, we present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction. |
Jiajun He; Zekun Yang; Tomoki Toda; | arxiv-cs.AI | 2023-10-08 |
333 | Improving End-to-End Speech Processing By Efficient Text Data Utilization with Latent Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. |
JIANQIAO LU et. al. | arxiv-cs.CL | 2023-10-08 |
334 | LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LauraGPT, a novel unified audio-and-text GPT-based LLM for audio recognition, understanding, and generation. |
ZHIHAO DU et. al. | arxiv-cs.SD | 2023-10-06 |
335 | Dementia Assessment Using Mandarin Speech with An Attention-based Speech Recognition Encoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper utilizes a speech recognition model to construct a dementia assessment system tailored for Mandarin speakers during the picture description task. |
ZIH-JYUN LIN et. al. | arxiv-cs.CL | 2023-10-05 |
336 | EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose EFFUSE, a novel approach that uses a single SSL model to mimic the features of multiple SSL models via prediction, resulting in a lightweight framework with competitive performance. |
Tejes Srivastava; Jiatong Shi; William Chen; Shinji Watanabe; | arxiv-cs.SD | 2023-10-05 |
337 | LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-end ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models. |
ALEKSANDR MEISTER et. al. | arxiv-cs.CL | 2023-10-04 |
338 | Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. |
Liming Wang; Mark Hasegawa-Johnson; Chang D. Yoo; | arxiv-cs.CL | 2023-10-03 |
339 | Evaluating Speech Synthesis By Training Recognizers on Synthetic Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Prior works focus on evaluating synthetic speech based on pre-trained speech recognition models, however, this can be limiting since this approach primarily measures speech intelligibility. In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech. |
DAREEN ALHARTHI et. al. | arxiv-cs.CL | 2023-10-01 |
340 | AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several publications have highlighted racial bias in speech-to-text algorithms, and performance on minority accents lags significantly. |
TOBI OLATUNJI et. al. | arxiv-cs.CL | 2023-09-30 |
341 | AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method to train an audio-visual speech recognition (AVSR) model on a combination of labeled and unlabeled videos with continuously regenerated pseudo-labels. |
Andrew Rouditchenko; Ronan Collobert; Tatiana Likhomanenko; | arxiv-cs.LG | 2023-09-29 |
342 | Federated Learning with Differential Privacy for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. |
MARTIN PELIKAN et. al. | arxiv-cs.LG | 2023-09-29 |
343 | SLM: Bridge The Thin Gap Between Speech and Text Foundation Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. |
MINGQIU WANG et. al. | arxiv-cs.CL | 2023-09-29 |
344 | The Gift of Feedback: Improving ASR Model Quality By Learning from User Corrections Through Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continually learn from on-device user corrections through Federated Learning (FL) to address this issue. |
LILLIAN ZHOU et. al. | arxiv-cs.CL | 2023-09-29 |
345 | LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this information may be helpful for ASR modeling. To alleviate this issue, we propose the LAE-ST-MoE framework. |
GUODONG MA et. al. | arxiv-cs.SD | 2023-09-28 |
346 | Speech Collage: Code-switched Audio Generation By Collaging Monolingual Corpora Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. |
AMIR HUSSEIN et. al. | arxiv-cs.SD | 2023-09-27 |
347 | Lip2Vec: Efficient and Robust Visual Speech Recognition Via Latent-to-Latent Visual to Audio Representation Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike previous works that involve auxiliary losses or complex training procedures and architectures, we propose a simple approach, named Lip2Vec, that is based on learning a prior model. |
Yasser Abdelaziz Dahou Djilali; Sanath Narayan; Haithem Boussaid; Ebtessam Almazrouei; Merouane Debbah; | iccv | 2023-09-27 |
348 | HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. |
CHEN CHEN et. al. | arxiv-cs.CL | 2023-09-27 |
349 | Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in The HYKIST Project Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In today’s interconnected world, moving abroad is increasingly prevalent, whether for employment, refugee resettlement, or other causes. Language difficulties between …
Khai Le-Duc; | ArXiv | 2023-09-26 |
350 | Updated Corpora and Benchmarks for Long-Form Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we re-release three standard ASR corpora – TED-LIUM 3, GigaSpeech, and VoxPopuli-en – with updated transcription and alignments to enable their use for long-form ASR research. |
JENNIFER DREXLER FOX et. al. | arxiv-cs.CL | 2023-09-26 |
351 | Speech Dereverberation With Frequency Domain Autoregressive Modeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation. The task of dereverberation constitutes an important step to … |
Anurenjan Purushothaman; Debottam Dutta; Rohit Kumar; Sriram Ganapathy; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-09-24 |
352 | AudioFool: Fast, Universal and Synchronization-free Cross-Domain Attack on Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent research has focused on exploring methods to create such attacks; however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the properties required of robust attacks compatible with the OTA model, and we design a method of generating attacks with arbitrary such desired properties, namely invariance to synchronization and robustness to filtering: this allows a Denial-of-Service (DoS) attack against ASR systems. |
Mohamad Fakih; Rouwaida Kanj; Fadi Kurdahi; Mohammed E. Fouda; | arxiv-cs.CR | 2023-09-20 |
353 | A Survey of Automatic Speech Recognition Deep Models Performance for Polish Medical Terms Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for … |
MARTA ZIELONKA et. al. | 2023 Signal Processing: Algorithms, Architectures, … | 2023-09-20 |
354 | Directional Source Separation for Robust Speech Recognition on Smart Glasses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve voice quality, this work investigates directional source separation using the multi-microphone array. |
TIANTIAN FENG et. al. | arxiv-cs.SD | 2023-09-19 |
355 | HypR: A Comprehensive Study for ASR Hypothesis Revising with A Reference Corpus Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we first concentrate on providing an ASR hypothesis revising (HypR) dataset in this study. |
Yi-Wei Wang; Ke-Han Lu; Kuan-Yu Chen; | arxiv-cs.CL | 2023-09-18 |
356 | Instruction-Following Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the mechanisms behind these models’ speech understanding and reasoning capabilities remain underexplored. To study this question from the data perspective, we introduce instruction-following speech recognition, training a Listen-Attend-Spell model to understand and execute a diverse set of free-form text instructions. |
Cheng-I Jeff Lai; Zhiyun Lu; Liangliang Cao; Ruoming Pang; | arxiv-cs.CL | 2023-09-18 |
357 | Are Soft Prompts Good Zero-shot Learners for Speech Recognition? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, not many people understand how and why this is so. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). |
DIANWEN NG et. al. | arxiv-cs.SD | 2023-09-17 |
358 | Open Vocabulary Keyword Spotting with Small-Footprint ASR-based Architecture and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the results of experiments on minimizing the model size for the text-based Open Vocabulary Keyword Spotting task. The main goal is to perform inference on devices with … |
Mikołaj Pudo; Mateusz Wosik; Artur Janicki; | 2023 18th Conference on Computer Science and Intelligence … | 2023-09-17 |
359 | Augmenting Conformers with Structured State-space Sequence Models for Online Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. |
HAOZHE SHAN et. al. | arxiv-cs.CL | 2023-09-15 |
360 | Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage. To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. |
YANG LI et. al. | arxiv-cs.LG | 2023-09-14 |
361 | Echotune: A Modular Extractor Leveraging The Variable-Length Nature of Speech in ASR Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Historically, many approaches have leaned on fixed-length attention windows, which becomes problematic for varied speech samples in duration and complexity, leading to data over-smoothing and neglect of essential long-term connectivity. Addressing this limitation, we introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism that accommodates a range of speech sample complexities and durations. |
Sizhou Chen; Songyang Gao; Sen Fang; | arxiv-cs.SD | 2023-09-14 |
362 | CPPF: A Contextual and Post-processing-free Model for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on ASR-related processing tasks, including Contextual ASR and multiple ASR post processing tasks. |
LEI ZHANG et. al. | arxiv-cs.CL | 2023-09-13 |
363 | SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multi-Modal automatic speech recognition (ASR) techniques aim to leverage additional modalities to improve the performance of speech recognition systems. While existing approaches … |
HAOXU WANG et. al. | ICASSP 2024 – 2024 IEEE International Conference on … | 2023-09-11 |
364 | SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the pipeline for constructing the corpus and propose baseline methods for utilizing text information in the visual slide context. |
HAOXU WANG et. al. | arxiv-cs.SD | 2023-09-11 |
365 | Leveraging Large Language Models for Exploiting ASR Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis. |
PRANAY DIGHE et. al. | arxiv-cs.CL | 2023-09-09 |
366 | Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the effectiveness of this method has not been demonstrated for various model architectures, nor has it been verified that the encoder has the expected look-ahead capability to reduce latency. This study, therefore, examines the effectiveness of Mask-CTC-based pre-training for models with different architectures, such as Transformer-Transducer and contextual block streaming ASR. |
Huaibo Zhao; Yosuke Higuchi; Yusuke Kida; Tetsuji Ogawa; Tetsunori Kobayashi; | arxiv-cs.SD | 2023-09-08 |
367 | LanSER: Language-Model Supported Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. |
TAESIK GONG et. al. | arxiv-cs.CL | 2023-09-07 |
368 | Bring The Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to extract the denoising capabilities, that can be applied to any encoder-decoder architecture. |
Patrick Eickhoff; Matthias Möller; Theresa Pekarek Rosin; Johannes Twiefel; Stefan Wermter; | arxiv-cs.CL | 2023-09-05 |
369 | SememeASR: Boosting Performance of End-to-End Speech Recognition Against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that knowledge-driven approaches can help data-driven approaches alleviate their flaws, we introduce sememe-based semantic knowledge information to speech recognition (SememeASR). |
Jiaxu Zhu; Changhe Song; Zhiyong Wu; Helen Meng; | arxiv-cs.SD | 2023-09-04 |
370 | Text-Only Domain Adaptation for End-to-End Speech Recognition Through Down-Sampling Acoustic Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel representation-matching strategy that down-samples acoustic representations to align with the text modality. |
JIAXU ZHU et. al. | arxiv-cs.SD | 2023-09-04 |
371 | Boosting Low-Resource Speech Recognition in Air Traffic Communication Via Pretrained Feature Aggregation and Multi-Task Learning IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Developing a robust Automatic Speech Recognition (ASR) system usually requires a large amount of well-annotated samples, which is extremely hard to build in the Air Traffic Control …
Dongyue Guo; Zichen Zhang; Bo Yang; Jianwei Zhang; Yi Lin; | IEEE Transactions on Circuits and Systems II: Express Briefs | 2023-09-01 |
372 | ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenge, we propose ASTER, a technique for automatically testing the accessibility of ASR systems. |
YI LIU et. al. | arxiv-cs.SD | 2023-08-29 |
373 | Speech Wikimedia: A 77 Language Multilingual Speech Dataset Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA … |
RAFAEL MOSQUERA GÓMEZ et. al. | arxiv-cs.AI | 2023-08-29 |
374 | NAaLoss: Rethinking The Objective of Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, in this study, we suggest a Noise- and Artifacts-aware loss function, NAaLoss, to ameliorate the influence of artifacts from a novel perspective. |
Kuan-Hsun Ho; En-Lun Yu; Jeih-weih Hung; Berlin Chen; | arxiv-cs.SD | 2023-08-24 |
375 | Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a cross-modal global interaction and local alignment (GILA) approach for AVSR, which captures the deep audio-visual (A-V) correlations from both global and local perspectives. |
YUCHEN HU et. al. | ijcai | 2023-08-23 |
376 | Convoifilter: A Case Study of Doing Cocktail Party Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an end-to-end model designed to improve automatic speech recognition (ASR) for a particular speaker in a crowded, noisy environment. |
Thai-Binh Nguyen; Alexander Waibel; | arxiv-cs.SD | 2023-08-22 |
377 | SeamlessM4T: Massively Multilingual & Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. |
SEAMLESS COMMUNICATION et. al. | arxiv-cs.CL | 2023-08-22 |
378 | On Training A Neural Residual Acoustic Echo Suppressor for Improved ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Acoustic Echo Cancellation (AEC) is critical for accurate recognition of speech directed at a smart device playing audio. Previous work has shown that neural AEC models can …
S. Panchapagesan; T. Shabestary; A. Narayanan; | Interspeech | 2023-08-20 |
379 | A Conformer-based Classifier for Variable-length Utterance Processing in Anti-spoofing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The success achieved by conformers in Automatic Speech Recognition (ASR) leads us to their application in other domains, such as spoofing detection for automatic speaker … |
Eros Rosello; Alejandro Gomez-Alanis; A. Gómez; A. Peinado; | Interspeech | 2023-08-20 |
380 | Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We release 840 hours of read speech multi-dialect ASR corpora consisting of 700 hours of the main Thai dialect, named Thai-central, and 40 hours for each local dialect, named …
Artit Suwanbandit; Burin Naowarat; Orathai Sangpetch; E. Chuangsuwanich; | Interspeech | 2023-08-20 |
381 | Two-stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper addresses effective pretraining of automatic speech recognition (ASR) and gender recognition to improve wav2vec 2.0 embedding for speech emotion recognition (SER). … |
Yuan Gao; Chenhui Chu; Tatsuya Kawahara; | Interspeech | 2023-08-20 |
382 | Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper proposes autoregressive modeling of the joint multi-talker automatic speech recognition (ASR) and timestamp prediction. Autoregressive modeling of multi-talker ASR is a … |
Naoki Makishima; Keita Suzuki; Satoshi Suzuki; Atsushi Ando; Ryo Masumura; | Interspeech | 2023-08-20 |
383 | Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The use of self-supervised pre-trained speech models has greatly improved speech tasks in low-resource settings. However, fine-tuning the entire model can be computationally … |
DIANWEN NG et. al. | Interspeech | 2023-08-20 |
384 | Embedding Articulatory Constraints for Low-resource Speech Recognition Based on Large Pre-trained Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Knowledge about phonemes and their articulatory attributes can help improve automatic speech recognition (ASR) of low-resource languages. In this study, we propose a simple and … |
Jaeyoung Lee; M. Mimura; Tatsuya Kawahara; | Interspeech | 2023-08-20 |
385 | Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of … |
Enno Hermann; Mathew Magimai; | Interspeech | 2023-08-20 |
386 | Dialect Speech Recognition Modeling Using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In order to utilize the large amount of historical speech resources for applications such as linguistic analysis and retrieval, automatic speech recognition technology that can … |
Shogo Miwa; A. Kai; | Interspeech | 2023-08-20 |
387 | MiniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on The Edge Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Real-time applications of Automatic Speech Recognition (ASR) on user devices on the edge require streaming processing. Conformer model has achieved state-of-the-art performance in … |
Haris Gulzar; Monikka Roslianna Busto; Takeharu Eda; Katsutoshi Itoyama; K. Nakadai; | Interspeech | 2023-08-20 |
388 | Whisper Features for Dysarthric Severity-Level Classification Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Dysarthria is a speech disorder caused by improper coordination between the brain and the muscles that produce intelligible speech. Accurately diagnosing the severity of … |
Siddharth Rathod; Monil Charola; Akshat Vora; Yash Jogi; H. Patil; | Interspeech | 2023-08-20 |
389 | TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TokenSplit, a speech separation model that acts on discrete token sequences. |
HAKAN ERDOGAN et. al. | arxiv-cs.SD | 2023-08-20 |
390 | Unsupervised Code-switched Text Generation from Parallel Text Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: There has been great interest in developing automatic speech recognition (ASR) systems that can handle code-switched (CS) speech to meet the needs of a growing bilingual … |
JI-EUN CHI et. al. | Interspeech | 2023-08-20 |
391 | Data Augmentation for Children ASR and Child-adult Speaker Classification Using Voice Conversion Methods Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Many young children prefer speech-based interfaces over text, as they are relatively slow and error-prone with text input. However, children’s ASR can be challenging due to the lack …
Shuyang Zhao; Mittul Singh; Abraham Woubie; Reima Karhila; | Interspeech | 2023-08-20 |
392 | Exploring Sources of Racial Bias in Automatic Speech Recognition Through The Lens of Rhythmic Variation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although studies have shown that one issue of bias in modern automatic speech recognition (ASR) technologies is degraded performance for African American English (AAE) speakers, … |
Li-Fang Lai; N. Holliday; | Interspeech | 2023-08-20 |
393 | Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing Based Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View |
ZHENG LIANG et. al. | Interspeech | 2023-08-20 |
394 | Bayes Risk Transducer: Transducer with Controllable Alignment Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, this work proposes Bayes Risk Transducer (BRT), which uses a Bayes risk function to set lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. |
JINCHUAN TIAN et. al. | arxiv-cs.CL | 2023-08-19 |
395 | Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A standard pipeline for automated spoken language assessment is to start with an automatic speech recognition (ASR) system and derive features that exploit transcriptions and … |
Stefano Bannò; K. Knill; M. Matassoni; Vyas Raina; M. Gales; | Slate | 2023-08-18 |
396 | An Ambient Intelligence-based Approach For Longitudinal Monitoring of Verbal and Vocal Depression Symptoms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Another major challenge in depression relapse research is the scarcity of publicly available datasets. To overcome these issues, we propose a one-shot learning framework for detecting depression relapse from speech. |
Alice Othmani; Muhammad Muzammel; | arxiv-cs.HC | 2023-08-16 |
397 | Accurate Synthesis of Dysarthric Speech for ASR Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. |
Mohammad Soleymanpour; Michael T. Johnson; Rahim Soleymanpour; Jeffrey Berry; | arxiv-cs.SD | 2023-08-16 |
398 | A Comprehensive Survey on Automatic Speech Recognition Using Neural Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Amandeep Singh Dhanjal; Williamjeet Singh; | Multim. Tools Appl. | 2023-08-15 |
399 | Radio2Text: Streaming Speech Recognition Using MmWave Radio Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Radio2Text, the first mmWave-based system for streaming automatic speech recognition (ASR) with a vocabulary size exceeding 13,000 words. |
Running Zhao; Jiangtao Yu; Hang Zhao; Edith C. H. Ngai; | arxiv-cs.SD | 2023-08-15 |
400 | Using Text Injection to Improve Recognition of Personal Identifiers in Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use text injection to improve the recognition of PII categories by including fake textual substitutes of PII categories in the training data. |
YOCHAI BLAU et. al. | arxiv-cs.CL | 2023-08-14 |
401 | Text Injection for Capitalization and Turn-Taking Prediction in Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. |
SHAAN BIJWADIA et. al. | arxiv-cs.CL | 2023-08-14 |
402 | A Novel Self-training Approach for Low-resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a self-training approach for automatic speech recognition (ASR) for low-resource settings. |
Satwinder Singh; Feng Hou; Ruili Wang; | arxiv-cs.CL | 2023-08-09 |
403 | Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). |
Yang Zhang; Krishna C. Puvvada; Vitaly Lavrukhin; Boris Ginsburg; | arxiv-cs.SD | 2023-08-09 |
404 | Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text generated by ASR output. |
JIAXIN FAN et. al. | arxiv-cs.CL | 2023-08-07 |
405 | Federated Representation Learning for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. |
GURUPRASAD V RAMESH et. al. | arxiv-cs.SD | 2023-08-03 |
406 | Inaudible Adversarial Perturbation: Manipulating The Recognition of User Speech in Real Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we seek to bridge the gap in existing research and extend the attack to user-present scenarios. |
XINFENG LI et. al. | arxiv-cs.CR | 2023-08-02 |
407 | ÌròyìnSpeech: A Multi-purpose Yorùbá Speech Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ÌròyìnSpeech, a new corpus influenced by the desire to increase the amount of high-quality, contemporary Yorùbá speech data, which can be used for both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) tasks. |
Tolulope Ogunremi; Kola Tubosun; Anuoluwapo Aremu; Iroro Orife; David Ifeoluwa Adelani; | arxiv-cs.CL | 2023-07-29 |
408 | The Timing Bottleneck: Why Timing and Overlap Are Mission-critical for Conversational User Interfaces, Speech Recognition and Dialogue Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). |
Andreas Liesenfeld; Alianda Lopez; Mark Dingemanse; | arxiv-cs.CL | 2023-07-28 |
409 | Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. |
Christophe Van Gysel; | sigir | 2023-07-25 |
410 | Adaptation of Whisper Models to Child Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly … |
Rishabh Jain; Andrei Barcovschi; Mariam Yiwere; Peter Corcoran; H. Cucu; | ArXiv | 2023-07-24 |
411 | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While punctuated texts are abundant from written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. |
VIET DAC LAI et. al. | arxiv-cs.CL | 2023-07-24 |
412 | Code-Switched Urdu ASR for Noisy Telephonic Environment Using Data Centric Approach with Hybrid HMM and CNN-TDNN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, this paper describes an implementation framework of a resource-efficient Automatic Speech Recognition / Speech-to-Text system in a noisy call-center environment using Chain Hybrid HMM and CNN-TDNN for Code-Switched Urdu Language. |
Muhammad Danyal Khan; Raheem Ali; Arshad Aziz; | arxiv-cs.CL | 2023-07-24 |
413 | Exploring The Integration of Speech Separation and Recognition with Self-Supervised Learning Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further improve multi-speaker recognition performance, we present a carefully designed training strategy for integrating speech separation and recognition with SSLR. |
YOSHIKI MASUYAMA et. al. | arxiv-cs.SD | 2023-07-23 |
414 | A Meta Learning Scheme for Fast Accent Domain Expansion in Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition, which expands the field of accents without deteriorating the performance of Mandarin ASR. |
Ziwei Zhu; Changhao Shan; Bihong Zhang; Jian Yu; | arxiv-cs.SD | 2023-07-23 |
415 | Robust Automatic Speech Recognition Via WavAugment Guided Phoneme Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developing a practically-robust automatic speech recognition (ASR) is challenging since the model should not only maintain the original performance on clean samples, but also achieve consistent efficacy under small volume perturbations and large domain shifts. To address this problem, we propose a novel WavAugment Guided Phoneme Adversarial Training (wapat). |
GEGE QI et. al. | arxiv-cs.SD | 2023-07-23 |
416 | Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome this issue, we propose a novel E2E SLU system that enhances robustness to ASR errors by fusing audio and text representations based on the estimated modality confidence of ASR hypotheses. We introduce two novel techniques: 1) an effective method to encode the quality of ASR hypotheses and 2) an effective approach to integrate them into E2E SLU models. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2023-07-22 |
417 | A Change of Heart: Improving Speech Emotion Recognition Through Speech-to-Text Modality Conversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. |
Zeinab Sadat Taghavi; Ali Satvaty; Hossein Sameti; | arxiv-cs.SD | 2023-07-21 |
418 | A Deep Dive Into The Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of ~9.8K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. |
Anand Kumar Rai; Siddharth D Jaiswal; Animesh Mukherjee; | arxiv-cs.CL | 2023-07-20 |
419 | Ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: We introduce ivrit.ai, a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) … |
Yanir Marmor; Kinneret Misgav; Y. Lifshitz; | ArXiv | 2023-07-17 |
420 | Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. |
Theresa Pekarek Rosin; Stefan Wermter; | arxiv-cs.CL | 2023-07-14 |
421 | SGGNet²: Speech-Scene Graph Grounding Network for Speech-guided Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel speech-scene graph grounding network (SGGNet²) that robustly grounds spoken utterances by leveraging the acoustic similarity between correctly recognized and misrecognized words obtained from automatic speech recognition (ASR) systems. |
DOHYUN KIM et. al. | arxiv-cs.RO | 2023-07-14 |
422 | SGGNet2: Speech-Scene Graph Grounding Network for Speech-guided Navigation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The spoken language serves as an accessible and efficient interface, enabling non-experts and disabled users to interact with complex assistant robots. However, accurately … |
DOHYUN KIM et. al. | 2023 32nd IEEE International Conference on Robot and Human … | 2023-07-14 |
423 | Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare our model with encoders pretrained on self-supervised learning (SSL), and show that ASR pretraining is much more effective than SSL for SICSF. |
He Huang; Jagadeesh Balam; Boris Ginsburg; | arxiv-cs.CL | 2023-07-13 |
424 | Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. |
Wenxuan Wang; Guodong Ma; Yuke Li; Binbin Du; | arxiv-cs.SD | 2023-07-12 |
425 | Exploring The Integration of Large Language Models Into Automatic Speech Recognition Systems: An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. |
Zeping Min; Jinbo Wang; | arxiv-cs.CL | 2023-07-12 |
426 | SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, proposes a novel linear-time alternative to self-attention. |
Titouan Parcollet; Rogier van Dalen; Shucong Zhang; Sourav Bhattacharya; | arxiv-cs.CL | 2023-07-12 |
427 | The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. |
KUN SONG et. al. | arxiv-cs.SD | 2023-07-10 |
428 | Introducing Semantics Into Speech Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. |
DEREK XU et. al. | acl | 2023-07-08 |
429 | Building Accurate Low Latency ASR for Streaming Voice Search in E-commerce Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build accurate LSTM, attention and CTC based streaming ASR models for large-scale Hinglish (blend of Hindi and English) Voice Search. |
Abhinav Goyal; Nikesh Garera; | acl | 2023-07-08 |
430 | DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Choosing an informative subset of speech samples that are most representative of the target accents becomes important for effective ASR finetuning. To address this problem, we propose DITTO (Data-efficient and faIr Targeted subseT selectiOn), which uses Submodular Mutual Information (SMI) functions as acquisition functions to find the most informative set of utterances matching a target accent within a fixed budget. |
SURAJ KOTHAWADE et. al. | acl | 2023-07-08 |
431 | Hybrid Transducer and Attention Based Encoder-Decoder Modeling for Speech-to-Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. |
YUN TANG et. al. | acl | 2023-07-08 |
432 | BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. |
MINGDA CHEN et. al. | acl | 2023-07-08 |
433 | STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present STT4SG-350, a corpus of Swiss German speech, annotated with Standard German text at the sentence level. |
MICHEL PLÜSS et. al. | acl | 2023-07-08 |
434 | A Theory of Unsupervised Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general theoretical framework to study the properties of unsupervised ASR (ASR-U) systems based on random matrix theory and the theory of neural tangent kernels. |
Liming Wang; Mark Hasegawa-Johnson; Chang Yoo; | acl | 2023-07-08 |
435 | Back Translation for Speech-to-text Translation Without Transcripts IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to utilize large amounts of target-side monolingual data to enhance ST without transcripts. |
Qingkai Fang; Yang Feng; | acl | 2023-07-08 |
436 | Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate whether data augmentation techniques could help improve low-resource ASR performance, focusing on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal). |
Martijn Bartelds; Nay San; Bradley McDonnell; Dan Jurafsky; Martijn Wieling; | acl | 2023-07-08 |
437 | Why Aren't We NER Yet? Artifacts of ASR Errors in Named Entity Recognition in Spontaneous Speech Transcripts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine in detail the complex relationship between ASR and NER errors which limit the ability of NER models to recover entity mentions from spontaneous speech transcripts. |
PIOTR SZYMANSKI et. al. | acl | 2023-07-08 |
438 | Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To produce ASR and ST content effectively with minimal latency, we propose a joint token-level serialized output training method that interleaves source and target words by leveraging an off-the-shelf textual aligner. |
SARA PAPI et. al. | arxiv-cs.CL | 2023-07-06 |
439 | Transcribing Educational Videos Using Whisper: A Preliminary Study on Using AI for Transcribing Educational Videos Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Videos are increasingly being used for e-learning, and transcripts are vital to enhance the learning experience. The costs and delays of generating transcripts can be alleviated … |
Ashwin Rao; | ArXiv | 2023-07-04 |
440 | Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. |
Guangzhi Sun; Chao Zhang; Ivan Vulić; Paweł Budzianowski; Philip C. Woodland; | arxiv-cs.CL | 2023-07-04 |
441 | Boosting Norwegian Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk. |
Javier de la Rosa; Rolv-Arild Braaten; Per Egil Kummervold; Freddy Wetjen; Svein Arne Brygfjeld; | arxiv-cs.CL | 2023-07-04 |
442 | Using Data Augmentations and VTLN to Reduce Bias in Dutch End-to-End Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to reduce bias against different age groups and non-native speakers of Dutch. |
Tanvina Patel; Odette Scharenborg; | arxiv-cs.CL | 2023-07-04 |
443 | Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a supervision loss for smoother training of the Contextual Adapters. |
Devang Kulshreshtha; Saket Dingliwal; Brady Houston; Sravan Bodapati; | arxiv-cs.CL | 2023-07-03 |
444 | Don’t Stop Self-Supervision: Accent Adaptation of Speech Representations Via Residual Adapters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose and investigate self-supervised adaptation of speech representations to such populations in a parameter-efficient way via training accent-specific residual adapters. |
ANSHU BHATIA et. al. | arxiv-cs.CL | 2023-07-01 |
445 | Trends and Developments in Automatic Speech Recognition Research IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
D. O’Shaughnessy; | Comput. Speech Lang. | 2023-07-01 |
446 | Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. |
Simone Wills; Yu Bai; Cristian Tejedor-Garcia; Catia Cucchiarini; Helmer Strik; | arxiv-cs.CL | 2023-06-29 |
447 | Accelerating Transducers Through Adjacent Token Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this design is inefficient, particularly for long speech signals due to the quadratic computation of self-attention. To address this, we propose a new method, Adjacent Token Merging (A-ToMe), which gradually combines adjacent tokens with high similarity scores between their key values. |
Yuang Li; Yu Wu; Jinyu Li; Shujie Liu; | arxiv-cs.CL | 2023-06-28 |
448 | Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different from these methods, in this work, with only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA, a 7-billion-parameter large language model (LLM). |
Yuang Li; Yu Wu; Jinyu Li; Shujie Liu; | arxiv-cs.CL | 2023-06-28 |
449 | Cascaded Encoders for Fine-tuning ASR Models on Overlapped Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an MT-ASR model formed by combining a well-trained foundation model with a multi-talker mask model in a cascaded RNN-T encoder configuration. |
Richard Rose; Oscar Chang; Olivier Siohan; | arxiv-cs.SD | 2023-06-28 |
450 | Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated from the spiking transformer to get the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). |
QINGYU WANG et. al. | aaai | 2023-06-26 |
451 | Don’t Be So Sure! Boosting ASR Decoding Via Confidence Relaxation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. |
Tomer Wullach; Shlomo E. Chazan; | aaai | 2023-06-26 |
452 | Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, different from audio forced alignment, it is challenging to develop a reliable visual forced alignment technology for the following two reasons: 1) Visual Speech Recognition (VSR) has a much lower performance compared to audio-based Automatic Speech Recognition (ASR), and 2) the translation from text to video is not reliable, so the method typically used for building audio forced alignment cannot be utilized in developing visual forced alignment. In order to alleviate these challenges, in this paper, we propose a new method that is appropriate for visual forced alignment, namely Deep Visual Forced Alignment (DVFA). |
Minsu Kim; Chae Won Kim; Yong Man Ro; | aaai | 2023-06-26 |
453 | Performance Disparities Between Accents in Automatic Speech Recognition (Student Abstract) Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this work, we expand the discussion of bias in Automatic Speech Recognition (ASR) through a large-scale audit. Using a large and global data set of speech, we perform an audit … |
Alex DiChristofano; Henry Shuster; Shefali Chandra; Neal Patwari; | AAAI Conference on Artificial Intelligence | 2023-06-26 |
454 | An Analysis of Personalized Speech Recognition System Development for The Deaf and Hard-of-Hearing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To do so, we analyze the use of openly-available automatic speech recognition (ASR) tools with a DHH Japanese speaker dataset. As these out-of-the-box ASR models typically do not perform well on DHH speech, we provide a thorough analysis of creating personalized ASR systems. |
Lester Phillip Violeta; Tomoki Toda; | arxiv-cs.SD | 2023-06-24 |
455 | Mixture Encoder for Joint Speech Separation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a middle-ground approach that leverages explicit speech separation similarly to the modular approach but also incorporates mixture speech information directly into the ASR module in order to mitigate the propagation of errors made by the speech separator. |
Simon Berger; Peter Vieting; Christoph Boeddeker; Ralf Schlüter; Reinhold Haeb-Umbach; | arxiv-cs.CL | 2023-06-21 |
456 | NoRefER: A Referenceless Quality Metric for Automatic Speech Recognition Via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces NoRefER, a novel referenceless quality metric for automatic speech recognition (ASR) systems. |
Kamer Ali Yuksel; Thiago Ferreira; Golara Javadi; Mohamed El-Badrashiny; Ahmet Gunduz; | arxiv-cs.CL | 2023-06-21 |
457 | Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a Conformer-based architecture, called Aformer, to leverage both the acoustic information from large non-accented and limited accented training data. |
Xuefei Wang; Yanhua Long; Yijie Li; Haoran Wei; | arxiv-cs.SD | 2023-06-20 |
458 | Improved Keyword Recognition Based on Aho-Corasick Automaton Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The recognition of out-of-vocabulary (OOV) words in many state-of-the-art automatic speech recognition (ASR) systems, which need to recognize words that have never been seen … |
Yachao Guo; Zhibin Qiu; Hao Huang; Chng Eng Siong; | 2023 International Joint Conference on Neural Networks … | 2023-06-18 |
459 | A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In collaborative learning environments, effective intelligent learning systems need to accurately analyze and understand the collaborative discourse between learners (i.e., group … |
JIE CAO et. al. | Proceedings of the 31st ACM Conference on User Modeling, … | 2023-06-18 |
460 | Research on An Improved Conformer End-to-end Speech Recognition Model with R-Drop Structure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue of poor generalization ability in end-to-end speech recognition models within deep learning, this study proposes a new Conformer-based speech recognition model called Conformer-R that incorporates the R-drop structure. |
Weidong Ji; Shijie Zan; Guohui Zhou; Xu Wang; | arxiv-cs.SD | 2023-06-14 |
461 | Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing Based Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current data augmentation methods mainly rely on audio splicing and text-to-speech (TTS) models, which might result in discontinuous, unrealistic, and less diversified speech. To mitigate these potential issues, we propose a novel data augmentation method by applying the text-based speech editing model. |
ZHENG LIANG et. al. | arxiv-cs.CL | 2023-06-14 |
462 | Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, a novel multilingual model fusion technique has been proposed where a model is trained to learn cross-lingual acoustic-phonetic similarities as a mapping function. |
Muhammad Umar Farooq; Thomas Hain; | arxiv-cs.CL | 2023-06-14 |
463 | IIITH-CSTD Corpus: Crowdsourced Strategies for The Collection of A Large-scale Telugu Speech Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Due to the lack of a large annotated speech corpus, many low-resource Indian languages struggle to utilize recent advancements in deep neural network architectures for Automatic … |
MIRISHKAR SAI GANESH et. al. | ACM Transactions on Asian and Low-Resource Language … | 2023-06-12 |
464 | Multimodal Audio-textual Architecture for Robust Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Because such approach relies on the ASR output, it often suffers from the so-called ASR error propagation. In this work, we investigate impacts of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa. |
Anderson R. Avila; Mehdi Rezagholizadeh; Chao Xing; | arxiv-cs.CL | 2023-06-11 |
465 | Adversarial Training For Low-Resource Disfluency Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC) that utilizes a small amount of labeled real disfluent data in conjunction with a large amount of unlabeled data. |
Vineet Bhat; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-06-10 |
466 | Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end (E2E) systems have shown comparable performance to hybrid systems for automatic speech recognition (ASR). Word timings, as a by-product of ASR, are essential in many … |
Xianzhao Chen; Yist Y. Lin; Kang Wang; Yi He; Zejun Ma; | ArXiv | 2023-06-09 |
467 | Developing Speech Processing Pipelines for Police Accountability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops. |
Anjalie Field; Prateek Verma; Nay San; Jennifer L. Eberhardt; Dan Jurafsky; | arxiv-cs.CL | 2023-06-09 |
468 | Latent Phrase Matching for Dysarthric Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Many consumer speech recognition systems are not tuned for people with speech disabilities, resulting in poor recognition and user experience, especially for severe speech … |
COLIN S. LEA et. al. | ArXiv | 2023-06-08 |
469 | Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. |
CLAYTONE SIKASOTE et. al. | arxiv-cs.CL | 2023-06-07 |
470 | An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. |
Yu Bai; Cristian Tejedor-Garcia; Ferdy Hubers; Catia Cucchiarini; Helmer Strik; | arxiv-cs.CL | 2023-06-07 |
471 | Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new lenient evaluation metric as a more defensible CER measure for Japanese ASR. |
Shigeki Karita; Richard Sproat; Haruko Ishikawa; | arxiv-cs.CL | 2023-06-07 |
472 | Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach. |
Massa Baali; Ibrahim Almakky; Shady Shehata; Fakhri Karray; | arxiv-cs.SD | 2023-06-07 |
473 | Label Aware Speech Representation Learning For Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task. |
SHIKHAR VASHISHTH et. al. | arxiv-cs.CL | 2023-06-07 |
474 | A Study on The Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that HuBERT is the most versatile feature extractor across dysarthria classification, word recognition, and intelligibility classification, achieving respectively +24.7%, +61%, and +7.2% accuracy compared to classical acoustic features. |
Xavier F. Cadet; Ranya Aloufi; Sara Ahmadi-Abhari; Hamed Haddadi; | arxiv-cs.CL | 2023-06-07 |
475 | Alzheimer Disease Classification Through ASR-based Transcriptions: Exploring The Impact of Punctuation and Pauses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we used the new state-of-the-art Automatic Speech Recognition (ASR) model Whisper to obtain the transcriptions, which also include automatic punctuation. |
LUCÍA GÓMEZ-ZARAGOZÁ et. al. | arxiv-cs.CL | 2023-06-06 |
476 | N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is not clear how Whisper would fare under diverse conditions even on languages it was evaluated on such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. |
Bashar Talafha; Abdul Waheed; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2023-06-05 |
477 | SpellMapper: A Non-autoregressive Neural Spellchecker for ASR Customization with Candidate Retrieval Based on N-gram Mappings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose: 1) a novel algorithm for candidate retrieval, based on misspelled n-gram mappings, which gives up to 90% recall with just the top 10 candidates on Spoken Wikipedia; 2) a non-autoregressive neural model based on BERT architecture, where the initial transcript and ten candidates are combined into one input. |
Alexandra Antonova; Evelina Bakhturina; Boris Ginsburg; | arxiv-cs.CL | 2023-06-04 |
478 | End-to-End Joint Target and Non-Target Speakers ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speakers’ speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. |
RYO MASUMURA et al. | arxiv-cs.CL | 2023-06-04 |
479 | A Reference-Less Quality Metric for Automatic Speech Recognition Via Contrastive-Learning of A Multi-Language Model with Self-Supervision Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The common standard for quality evaluation of automatic speech recognition (ASR) systems is reference-based metrics such as the Word Error Rate (WER), computed using manual … |
K. Yuksel; Thiago Castro Ferreira; Ahmet Gunduz; Mohamed Al-Badrashiny; Golara Javadi; | 2023 IEEE International Conference on Acoustics, Speech, … | 2023-06-04 |
480 | Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The limited availability of non-native speech datasets presents a major challenge in automatic speech recognition (ASR) to narrow the performance gap between native and non-native speakers. To address this, the focus of this study is on the efficient incorporation of the L2 phonemes, which in this work refer to Korean phonemes, through articulatory feature analysis. |
Jisung Wang; Haram Lee; Myungwoo Oh; | arxiv-cs.CL | 2023-06-04 |
481 | Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Combining several active learning paradigms and the core-set approach, we propose a new multi-round adaptation process that uses epistemic uncertainty to automate the annotation process, significantly reducing the associated costs and human labor. |
Bonaventure F. P. Dossou; | arxiv-cs.CL | 2023-06-03 |
482 | Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home deployment data with kids going through gamified math learning activities. |
Eda Okur; Roddy Fuentes Alba; Saurav Sahay; Lama Nachman; | arxiv-cs.CY | 2023-06-01 |
483 | Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in … |
DONGJI GAO et al. | ArXiv | 2023-06-01 |
484 | SlothSpeech: Denial-of-service Attack Against Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose SlothSpeech, a denial-of-service attack against ASR models, which exploits the dynamic behaviour of the model. |
MIRAZUL HAQUE et al. | arxiv-cs.SD | 2023-06-01 |
485 | Towards Hate Speech Detection in Low-resource Languages: Comparing ASR to Acoustic Word Embeddings on Wolof and Swahili Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We specifically use a multilingual AWE model trained on labelled data from well-resourced languages to spot keywords in data in the unseen target language. |
Christiaan Jacobs; Nathanaël Carraz Rakotonirina; Everlyn Asiko Chimoto; Bruce A. Bassett; Herman Kamper; | arxiv-cs.CL | 2023-06-01 |
486 | Adaptation and Optimization of Automatic Speech Recognition (ASR) for The Maritime Domain in The Field of VHF Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a multilingual automatic speech recognizer (ASR) for maritime radio communication that automatically converts received VHF radio signals into text. |
Emin Cagatay Nakilcioglu; Maximilian Reimann; Ole John; | arxiv-cs.SD | 2023-06-01 |
487 | Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST). |
Santosh Kesiraju; Marek Sarvas; Tomas Pavlicek; Cecile Macaire; Alejandro Ciuba; | arxiv-cs.CL | 2023-05-31 |
488 | The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, new approaches are explored and compared to improve the performance of the CLS-based multilingual ASR model. |
Kaousheik Jayakumar; Vrunda N. Sukhadia; A Arunkumar; S. Umesh; | arxiv-cs.CL | 2023-05-31 |
489 | Zero-Shot Automatic Pronunciation Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model, HuBERT. |
Hongfu Liu; Mingqian Shi; Ye Wang; | arxiv-cs.SD | 2023-05-31 |
490 | Accurate and Structured Pruning for Efficient Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance. |
HUIQIANG JIANG et al. | arxiv-cs.CL | 2023-05-31 |
491 | STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss German speech, annotated with Standard German text at the sentence level. |
MICHEL PLÜSS et al. | arxiv-cs.CL | 2023-05-30 |
492 | Towards Selection of Text-to-speech Data to Augment ASR Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic … |
SHUO LIU et al. | ArXiv | 2023-05-30 |
493 | Improving Textless Spoken Language Understanding with Discrete Units As Intermediate Target Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, inspired by the content-disentangled discrete units from self-supervised speech models, we proposed to use discrete units as intermediate guidance to improve textless SLU performance. |
Guan-Wei Wu; Guan-Ting Lin; Shang-Wen Li; Hung-yi Lee; | arxiv-cs.CL | 2023-05-29 |
494 | HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for long input sequences. Here, we address this by extending HyperMixer, an efficient alternative to attention exhibiting linear complexity, to the Conformer architecture for speech recognition, leading to HyperConformer. |
Florian Mai; Juan Zuluaga-Gomez; Titouan Parcollet; Petr Motlicek; | arxiv-cs.CL | 2023-05-29 |
495 | CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish). |
Juan Zuluaga-Gomez; Sara Ahmed; Danielius Visockas; Cem Subakan; | arxiv-cs.CL | 2023-05-29 |
496 | Building Accurate Low Latency ASR for Streaming Voice Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on developing accurate LSTM, attention, and CTC based streaming ASR models for large-scale Hinglish (a blend of Hindi and English) Voice Search. |
Abhinav Goyal; Nikesh Garera; | arxiv-cs.SD | 2023-05-29 |
497 | Exploration of Efficient End-to-End ASR Using Discretized Input from Self-Supervised Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new protocol that utilizes discretized token sequences in ASR tasks, which includes de-duplication and sub-word modeling to enhance the input sequence. |
Xuankai Chang; Brian Yan; Yuya Fujita; Takashi Maekaku; Shinji Watanabe; | arxiv-cs.SD | 2023-05-29 |
498 | Speech and Noise Dual-stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a dual-stream spectrogram refine network to simultaneously refine the speech and noise and decouple the noise from the noisy input. |
HAOYU LU et al. | arxiv-cs.SD | 2023-05-28 |
499 | Retraining-free Customized ASR for Enharmonic Words Based on A Named-Entity-Aware Model and Phoneme Similarity Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since such NE words tend to be important keywords, ASR easily loses user trust if it misrecognizes them. To solve these problems, this paper proposes a novel retraining-free customized method for E2E-ASRs based on a named-entity-aware E2E-ASR model and phoneme similarity estimation. |
Yui Sudo; Kazuya Hata; Kazuhiro Nakadai; | arxiv-cs.SD | 2023-05-28 |
500 | Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on The False Alarms in Automated Speech Recognition Testing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate false alarm occurrences in five popular ASR systems using synthetic audio generated from four TTS systems and human audio obtained from two commonly used datasets. |
JULIA KAIWEN LAU et al. | arxiv-cs.SE | 2023-05-27 |
501 | DisfluencyFixer: A Tool to Enhance Language Learning Through Speech To Speech Disfluency Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents DisfluencyFixer, a tool that performs speech-to-speech disfluency correction in English and Hindi using a pipeline of Automatic Speech Recognition (ASR), Disfluency Correction (DC) and Text-To-Speech (TTS) models. |
Vineet Bhat; Preethi Jyothi; Pushpak Bhattacharyya; | arxiv-cs.CL | 2023-05-26 |
502 | INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. … |
Eunseop Yoon; Hee Suk Yoon; John Harvill; M. Hasegawa-Johnson; C. Yoo; | Annual Meeting of the Association for Computational … | 2023-05-25 |
503 | Svarah: Evaluating English ASR Systems on Indian Accents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. |
TAHIR JAVED et al. | arxiv-cs.CL | 2023-05-25 |
504 | Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with A Sidecar Separator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A recent study proposed a cost-effective method to convert a single-talker automatic speech recognition (ASR) system into a multi-talker one, by inserting a Sidecar separator into the frozen well-trained ASR model. Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters. |
LINGWEI MENG et al. | arxiv-cs.SD | 2023-05-25 |
505 | Iteratively Improving Speech Recognition and Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel iterative way of improving both the ASR and VC models. |
Mayank Kumar Singh; Naoya Takahashi; Onoe Naoyuki; | arxiv-cs.SD | 2023-05-24 |
506 | InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods pay less attention to the interaction of local and global features, and their series architectures are rigid to reflect local and global relationships. To address these issues, this paper proposes InterFormer for interactive local and global features fusion to learn a better representation for ASR. |
ZHI-HAO LAI et al. | arxiv-cs.CL | 2023-05-24 |
507 | Evaluating OpenAI’s Whisper ASR for Punctuation Prediction and Topic Modeling of Life Histories of The Museum of The Person Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This chapter presents the first study on the performance of Whisper for punctuation prediction in the Portuguese language. |
LUCAS RAFAEL STEFANEL GRIS et al. | arxiv-cs.CL | 2023-05-23 |
508 | Personalized Predictive ASR for Latency Reduction in Voice Assistants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, thus saving latency. In this paper, we extend this idea by introducing predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance. |
Andreas Schwarz; Di He; Maarten Van Segbroeck; Mohammed Hethnawi; Ariya Rastrow; | arxiv-cs.CL | 2023-05-23 |
509 | SE-Bridge: Speech Enhancement with Consistent Brownian Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SE-Bridge, a novel method for speech enhancement (SE). |
Zhibin Qiu; Mengfan Fu; Fuchun Sun; Gulila Altenbek; Hao Huang; | arxiv-cs.SD | 2023-05-23 |
510 | Text Generation with Speech Synthesis for ASR Data Augmentation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work … |
ZHUANGQUN HUANG et al. | ArXiv | 2023-05-22 |
511 | GNCformer Enhanced Self-attention for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an Enhanced Self-Attention (ESA) mechanism has been put forward for robust feature extraction. The proposed ESA is integrated with the recursive gated convolution and self-attention mechanism. In particular, the former is used to capture multi-order feature interaction and the latter is for global feature extraction. In addition, the location of interest that is suitable for inserting the ESA is also worth being explored. In this paper, the ESA is embedded into the encoder layer of the Transformer network for automatic speech recognition (ASR) tasks, and this newly proposed model is named GNCformer. |
J. Li; Z. Duan; S. Li; X. Yu; G. Yang; | arxiv-cs.SD | 2023-05-22 |
512 | Self-supervised Representations in Speech-based Depression Detection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). |
Wen Wu; Chao Zhang; Philip C. Woodland; | arxiv-cs.CL | 2023-05-20 |
513 | Wavoice: A MmWave-assisted Noise-resistant Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As automatic speech recognition evolves, the deployment of voice user interfaces (VUIs) has expanded rapidly. Especially since the COVID-19 pandemic, VUI has gained more attention in … |
TIANTIAN LIU et al. | ACM Transactions on Sensor Networks | 2023-05-18 |
514 | A Lexical-aware Non-autoregressive Transformer-based ASR Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A series of experiments are conducted on the AISHELL-1, CSJ, and TEDLIUM 2 datasets. |
Chong-En Lin; Kuan-Yu Chen; | arxiv-cs.CL | 2023-05-18 |
515 | FunASR: A Fundamental End-to-End Speech Recognition Toolkit IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. |
ZHIFU GAO et al. | arxiv-cs.SD | 2023-05-18 |
516 | A Comparative Study on E-Branchformer Vs Conformer in Speech Recognition, Translation, and Understanding Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work compares E-Branchformer and Conformer through extensive experiments using different types of end-to-end sequence-to-sequence models. |
YIFAN PENG et al. | arxiv-cs.CL | 2023-05-18 |
517 | AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AVFormer, a simple method for augmenting audioonly models with visual information, at the same time performing lightweight domain adaptation. |
Paul Hongsuck Seo; Arsha Nagrani; Cordelia Schmid; | cvpr | 2023-05-17 |
518 | MmMIC: Multi-modal Speech Recognition Based on MmWave Radar IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the proliferation of voice assistants, microphone-based speech recognition technology usually cannot achieve good performance in the situation of multiple sound sources and … |
LONG FAN et al. | IEEE INFOCOM 2023 – IEEE Conference on Computer … | 2023-05-17 |
519 | OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, … |
FAZLE RAKIB et al. | ArXiv | 2023-05-15 |
520 | Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how HuBERT uses clustering to discover hidden acoustic units, we formulate a factor analysis (FA) model that uses the discovered hidden acoustic units to align the SSL features. |
Weiwei Lin; Chenhang He; Man-Wai Mak; Youzhi Tu; | arxiv-cs.SD | 2023-05-14 |
521 | Investigating The Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes. |
Emma O’Neill; Julie Carson-Berndsen; | arxiv-cs.CL | 2023-05-12 |
522 | Multi-Temporal Lip-Audio Memory for Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Multi-Temporal Lip-Audio Memory (MTLAM) that makes the best use of audio signals to complement insufficient information of lip movements. |
Jeong Hun Yeo; Minsu Kim; Yong Man Ro; | arxiv-cs.CV | 2023-05-08 |
523 | Hybrid Transducer and Attention Based Encoder-Decoder Modeling for Speech-to-Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. |
YUN TANG et. al. | arxiv-cs.CL | 2023-05-04 |
524 | TrojanModel: A Practical Trojan Attack Against Automatic Speech Recognition Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While deep learning techniques have achieved great success in modern digital products, researchers have shown that deep learning models are susceptible to Trojan attacks. In a … |
W. Zong; Yang-Wai Chow; Willy Susilo; Kien Do; S. Venkatesh; | 2023 IEEE Symposium on Security and Privacy (SP) | 2023-05-01 |
525 | Edge Computing Solutions Supporting Voice Recognition Services for Speakers with Dysarthria Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the framework of Automatic Speech Recognition (ASR), the synergism between edge computing and artificial intelligence has led to the development of intelligent objects that … |
Davide Mulfari; Lorenzo Carnevale; A. Galletta; M. Villari; | 2023 IEEE/ACM 23rd International Symposium on Cluster, … | 2023-05-01 |
526 | Automatic Speech Recognition of Portuguese Phonemes Using Neural Networks Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts View |
N. Nedjah; Alejandra D. Bonilla; Luiza de Macedo Mourelle; | Expert Syst. Appl. | 2023-05-01 |
527 | Building A Non-native Speech Corpus Featuring Chinese-English Bilingual Children: Compilation and Rationale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a non-native speech corpus consisting of narratives from fifty 5- to 6-year-old Chinese-English children. |
Hiuchung Hung; Andreas Maier; Thorsten Piske; | arxiv-cs.CL | 2023-04-30 |
528 | Enhancing Multilingual Speech Recognition in Air Traffic Control By Sentence-level Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a two-stage multilingual ASR framework. |
Peng Fan; Dongyue Guo; JianWei Zhang; Bo Yang; Yi Lin; | arxiv-cs.SD | 2023-04-29 |
529 | Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing multilingual hierarchical Softmax decoding. |
Q. LIU et al. | icassp | 2023-04-27 |
530 | Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel modeling framework for effective training of end-to-end automatic speech recognition (ASR) models on various sources of data from diverse domains: speech paired with clean ground truth transcripts, speech with noisy pseudo transcripts from semi-supervised decodes and unpaired text-only data. |
T. Fukuda; S. Thomas; | icassp | 2023-04-27 |
531 | Continual Learning for On-Device Speech Recognition Using Disentangled Conformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel compute-efficient continual learning algorithm called DisentangledCL. This algorithm produces ASR models consisting of a frozen ‘core’ network for general-purpose use and several tunable ‘augment’ networks for speaker-specific tuning. |
A. DIWAN et al. | icassp | 2023-04-27 |
532 | Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a speech summarization system that enables E2E summarization from 100 seconds, which is the limit of the conventional method, to up to 10 minutes (i.e., the duration of typical instructional videos on YouTube). |
T. KANO et al. | icassp | 2023-04-27 |
533 | DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for generative tasks such as speech enhancement and speech separation, most self-supervised speech representations did not show substantial improvements. To deal with this problem, in this paper, we propose data2vec-SG (Speech Generation), which is a teacher-student learning framework that addresses speech generation tasks. |
H. WANG et al. | icassp | 2023-04-27 |
534 | Stabilising and Accelerating Light Gated Recurrent Units for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the unbounded nature of its rectified linear unit on the candidate recurrent gate induces an exploding-gradient phenomenon that disrupts the training process and prevents it from being applied to medium-to-large ASR datasets. In this paper, we theoretically and empirically derive the necessary conditions for its stability, as well as engineering mechanisms to speed up its training time by a factor of five, hence introducing a novel version of this architecture named SLi-GRU. |
A. Moumen; T. Parcollet; | icassp | 2023-04-27 |
535 | Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores a series of approaches to integrate domain adapted Self-Supervised Learning (SSL) pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and domain adapted wav2vec2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec2.0 features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain adapted wav2vec2.0 models. |
S. HU et al. | icassp | 2023-04-27 |
536 | Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. |
S. Sadhu; H. Hermansky; | icassp | 2023-04-27 |
537 | A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains … |
L. MENG et al. | icassp | 2023-04-27 |
538 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. |
P. MA et al. | icassp | 2023-04-27 |
539 | Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. |
H. LIU et al. | icassp | 2023-04-27 |
540 | Improving Speech-to-Speech Translation Through Unlabeled Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data to improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
X. -P. NGUYEN et al. | icassp | 2023-04-27 |
541 | Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-to-End Automated Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the potential of leveraging external knowledge, particularly through off-policy generated text-to-speech key-value stores, to allow for flexible post-training adaptation to new data distributions. |
D. M. Chan; S. Ghosh; A. Rastrow; B. Hoffmeister; | icassp | 2023-04-27 |
542 | The Edinburgh International Accents of English Corpus: Towards The Democratization of English ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first release of The Edinburgh International Accents of English Corpus (EdAcc). |
R. SANABRIA et al. | icassp | 2023-04-27 |
543 | Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since these are E2E models operating on speech directly, there remains a potential to improve their performance using purely text based models like BERT, which have strong language understanding capabilities. In this paper, we propose a new training criteria for RNN-T based E2E ASR and SLU to transfer BERT’s knowledge into these systems. |
V. Sunder; S. Thomas; H. -K. J. Kuo; B. Kingsbury; E. Fosler-Lussier; | icassp | 2023-04-27 |
544 | Robust Audio-Visual ASR with Unified Cross-Modal Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new audio-visual speech recognition model with a unified cross-modal attention mechanism. |
J. Li; C. Li; Y. Wu; Y. Qian; | icassp | 2023-04-27 |
545 | Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The research community has produced many successful self-supervised speech representation learning methods over the past few years. |
A. ELKAHKY et. al. | icassp | 2023-04-27 |
546 | Transcription Free Filler Word Detection with Neural Semi-CRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate a filler word detection system that does not depend on ASR systems. |
G. Zhu; Y. Yan; J. -P. Caceres; Z. Duan; | icassp | 2023-04-27 |
547 | Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce multi-resolution LBT to estimate the complex spectrograms from low to high time and frequency resolutions. |
H. Taherian; D. Wang; | icassp | 2023-04-27 |
548 | An ASR-Free Fluency Scoring Approach with Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a novel ASR-free approach for automatic fluency assessment using self-supervised learning (SSL). |
W. LIU et. al. | icassp | 2023-04-27 |
549 | Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When training data is lacking in ASR, a large-scale pre-training and fine-tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pre-training and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain-shift gap between the pre-training and target data. |
L. P. Violeta; D. Ma; W. -C. Huang; T. Toda; | icassp | 2023-04-27 |
550 | Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to overcome CF for E2E ASR by inserting adapters, small modules with few parameters that allow a general model to be fine-tuned to a specific task, into our model. |
S. V. Eeckt; H. Van Hamme; | icassp | 2023-04-27 |
551 | Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. |
K. Yang; T. -Y. Hu; J. -H. R. Chang; H. Swetha Koppula; O. Tuzel; | icassp | 2023-04-27 |
552 | Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to employ a method built on pseudo target generation and domain adversarial training with an iterative training strategy to improve the intelligibility and naturalness of the speech recovered from silent tongue and lip articulation. |
R. -C. Zheng; Y. Ai; Z. -H. Ling; | icassp | 2023-04-27 |
553 | Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the effective fine-tuning of a large-scale pretrained model for automatic speech recognition (ASR) of low-resource languages with only a one-hour matched dataset. |
K. Soky; S. Li; C. Chu; T. Kawahara; | icassp | 2023-04-27 |
554 | UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. |
J. GUO et. al. | icassp | 2023-04-27 |
555 | Selective FiLM Conditioning with CTC-Based ASR Probability for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although prior studies have improved the performance, they are inefficient because the two networks are combined and require large model sizes. To address this limitation, we propose an efficient way to use feature-wise linear modulation (FiLM) conditioning with CTC-based ASR probabilities for the SE system. |
D. -H. Yang; J. -H. Chang; | icassp | 2023-04-27 |
556 | Leveraging Large Text Corpora For End-To-End Speech Summarization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two novel methods that leverage a large amount of external text summarization data for E2E SSum training. |
K. MATSUURA et. al. | icassp | 2023-04-27 |
557 | WeavSpeech: Data Augmentation Strategy For Automatic Speech Recognition Via Semantic-Aware Weaving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, if speech signals are indiscriminately mixed without considering semantics, the risk of generating nonsensical sentences arises. To address these issues, in this paper, we propose WeavSpeech, a simple yet effective cut-and-paste augmentation method for ASR tasks that weaves a pair of speech data considering semantics. |
K. Seo; J. Park; J. Song; E. Yang; | icassp | 2023-04-27 |
558 | Automatic Severity Classification of Dysarthric Speech By Using Self-Supervised Model with Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using the self-supervised model in conjunction with multi-task learning. |
E. J. Yeo; K. Choi; S. Kim; M. Chung; | icassp | 2023-04-27 |
559 | The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. |
P. Guo; H. Wang; B. Mu; A. Zhang; P. Chen; | icassp | 2023-04-27 |
560 | An Analysis of Degenerating Speech Due to Progressive Dysarthria on ASR Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aims of this study were to (1) analyze the change of performance of ASR over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition throughout disease progression. |
K. TOMANEK et. al. | icassp | 2023-04-27 |
561 | SAN: A Robust End-to-End ASR Model Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. |
Z. Min; Q. Ge; G. Huang; | icassp | 2023-04-27 |
562 | Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Focusing on End-to-End ASR, in this paper, we propose a simple yet effective method to overcome catastrophic forgetting: weight averaging. |
S. Vander Eeckt; H. Van Hamme; | icassp | 2023-04-27 |
563 | An Adapter Based Multi-Label Pre-Training for Speech Separation and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on HuBERT, this work investigates improving the SSL model for SS and SE. |
T. Wang; X. Chen; Z. Chen; S. Yu; W. Zhu; | icassp | 2023-04-27 |
564 | Ensemble Knowledge Distillation of Self-Supervised Speech Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On top of that, we propose a multiple prediction head method for student models to predict different layer outputs of multiple teacher models simultaneously. |
K. -P. HUANG et. al. | icassp | 2023-04-27 |
565 | Self-Supervised Learning-Based Source Separation for Meeting Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, seven SSL models were compared on both simulated and real-world corpora. |
Y. Li; X. Zheng; P. C. Woodland; | icassp | 2023-04-27 |
566 | Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the paper, we analyze the performance of features at different layers of a foundation model on the speech recognition task and propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. |
Z. HUO et. al. | icassp | 2023-04-27 |
567 | Slot-Triggered Contextual Biasing For Personalized Speech Recognition Using Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method whereby the E2E ASR model is trained to emit opening and closing tags around slot content which are used to both selectively enable biasing and decide which catalog to use for biasing. |
S. Tong; P. Harding; S. Wiesler; | icassp | 2023-04-27 |
568 | Avoid Overthinking in Self-Supervised Models for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then motivate further research in EE by computing an optimal bound for performance versus speed trade-offs. To approach this bound we propose two new strategies for ASR: (1) we adapt the recently proposed patience strategy to ASR; and (2) we design a new EE strategy specific to ASR that performs better than all strategies previously introduced. |
D. Berrebbi; B. Yan; S. Watanabe; | icassp | 2023-04-27 |
569 | Robust Multi-modal Speech Emotion Recognition with ASR Error Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an SER method robust to ASR errors. |
B. Lin; L. Wang; | icassp | 2023-04-27 |
570 | Towards Improved Room Impulse Response Estimation for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). |
A. RATNARAJAH et. al. | icassp | 2023-04-27 |
571 | Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In streaming ASR, high accuracy is assured by attending to look-ahead frames, which increases latency. To tackle this trade-off, we propose a multi-latency streaming ASR approach that achieves high accuracy with zero look-ahead. |
H. ZHAO et. al. | icassp | 2023-04-27 |
572 | Federated Self-Learning with Weak Supervision for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine transcriptions from a stronger ASR model. |
M. RAO et. al. | icassp | 2023-04-27 |
573 | Improving Noisy Student Training on Non-Target Domain Data for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. |
Y. Chen; W. Ding; J. Lai; | icassp | 2023-04-27 |
574 | Representation of Vocal Tract Length Transformation Based on Group Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the property of vocal tract length transformation (VTLT) that forms a group, and derive the novel speech representation VTL spectrum based on group theory analysis, where only the phase of the VTL spectrum is changed by VTLT, which is a simple linear shift. |
A. Miyashita; T. Toda; | icassp | 2023-04-27 |
575 | Towards Accurate and Real-Time End-of-Speech Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variant of the endpoint (EP) detection problem in automatic speech recognition (ASR), which we call the end-of-speech (EOS) estimation. |
Y. FAN et. al. | icassp | 2023-04-27 |
576 | Multi-Temporal Lip-Audio Memory for Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Multi-Temporal Lip-Audio Memory (MTLAM) that makes the best use of audio signals to complement insufficient information of lip movements. |
J. H. Yeo; M. Kim; Y. M. Ro; | icassp | 2023-04-27 |
577 | Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. |
Y. Hu; C. Chen; R. Li; Q. Zhu; E. S. Chng; | icassp | 2023-04-27 |
578 | Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized impaired speech. |
Z. JIN et. al. | icassp | 2023-04-27 |
579 | Multi-Blank Transducers for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In standard RNN-T, the emission of a blank symbol consumes exactly one input frame; in our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted. |
H. Xu; F. Jia; S. Majumdar; S. Watanabe; B. Ginsburg; | icassp | 2023-04-27 |
580 | De’HuBERT: Disentangling Noise in A Self-Supervised Model for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow’s redundancy-reduction principle. |
D. NG et. al. | icassp | 2023-04-27 |
581 | WL-MSR: Watch and Listen for Multimodal Subtitle Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Watch and Listen for Multimodal Subtitle Recognition (WL-MSR) framework to obtain comprehensive video subtitles, by fusing the information provided by Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) models. |
J. Liu; H. Wang; W. Wang; X. He; J. Liu; | icassp | 2023-04-27 |
582 | Understanding Shared Speech-Text Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand our understanding of the resulting shared speech-text representations with two types of analyses. |
G. WANG et. al. | icassp | 2023-04-27 |
583 | Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). |
Y. Zhang; K. C. Puvvada; V. Lavrukhin; B. Ginsburg; | icassp | 2023-04-27 |
584 | Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to factorize out the language component in the AED model, we propose the factorized attention-based encoder-decoder (Factorized AED) model whose decoder takes as input the posterior probabilities of a jointly trained LM. |
X. Gong; W. Wang; H. Shao; X. Chen; Y. Qian; | icassp | 2023-04-27 |
585 | Understanding Shared Speech-Text Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand our understanding of the resulting shared speech-text representations with two types of analyses. |
GARY WANG et. al. | arxiv-cs.CL | 2023-04-27 |
586 | Align, Write, Re-Order: Explainable End-to-End Speech Translation Via Operation Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we propose to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. |
M. Omachi; B. Yan; S. Dalmia; Y. Fujita; S. Watanabe; | icassp | 2023-04-27 |
587 | Code-Switching Text Generation and Injection in Mandarin-English ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate text generation and injection for improving the performance of a streaming model commonly used in industry, Transformer-Transducer (T-T), in Mandarin-English code-switching speech recognition. |
H. YU et. al. | icassp | 2023-04-27 |
588 | Improving Non-Autoregressive Speech Recognition with Autoregressive Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose AR pretraining to the NAR encoder to reduce the accuracy gap between AR and NAR models. |
Y. Li; L. Samarakoon; I. Fung; | icassp | 2023-04-27 |
589 | Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget. |
R. Gody; D. Harwath; | icassp | 2023-04-27 |
590 | Anchored Speech Recognition with Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech. |
D. RAJ et. al. | icassp | 2023-04-27 |
591 | Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a privacy-preserving approach to improve fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. |
I. -E. Veliche; P. Fung; | icassp | 2023-04-27 |
592 | Visual Information Matters for ASR Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The other is that the community lacks a high-quality benchmark where visual information matters for the EC models. Therefore, this paper provides 1) simple yet effective methods, namely gated fusion and image captions as prompts to incorporate visual information to help EC; 2) large-scale benchmark datasets, namely Visual-ASR-EC, where each item in the training data consists of visual, speech, and text information, and the test data are carefully selected by human annotators to ensure that even humans could make mistakes when visual information is missing. |
V. B. Kumar; S. Cheng; N. Peng; Y. Zhang; | icassp | 2023-04-27 |
593 | Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we improve the self-adaptive TTS using character-vocabulary level ASR feedback at higher granularity, considering the losses in the positive and negative classes. |
S. Novitasari; S. Sakti; S. Nakamura; | icassp | 2023-04-27 |
594 | LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor and then embeds token-level long-form features inside the vocabulary predictor, with a pre-trained contextual encoder RoBERTa to further boost the performance. |
X. GONG et. al. | icassp | 2023-04-27 |
595 | Self-Supervised Representations in Speech-Based Depression Detection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). |
W. Wu; C. Zhang; P. C. Woodland; | icassp | 2023-04-27 |
596 | Exploring Wav2vec 2.0 Fine Tuning for Improved Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that V-FT is able to outperform state-of-the-art models on the IEMOCAP dataset. |
L. -W. Chen; A. Rudnicky; | icassp | 2023-04-27 |
597 | Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. |
F. WU et. al. | icassp | 2023-04-27 |
598 | Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion. |
Z. HUANG et. al. | icassp | 2023-04-27 |
599 | Context-Aware Fine-Tuning of Self-Supervised Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the use of context, i.e., surrounding segments, during fine-tuning and propose a new approach called context-aware fine-tuning. |
S. SHON et. al. | icassp | 2023-04-27 |
600 | Towards Zero-Shot Code-Switched Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we seek to build effective code-switched (CS) automatic speech recognition (ASR) systems under the zero-shot setting where no transcribed CS speech data is available for training. |
B. Yan; M. Wiesner; O. Klejch; P. Jyothi; S. Watanabe; | icassp | 2023-04-27 |
601 | Enhancing Unsupervised Speech Recognition with Diffusion GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion GAN. |
X. Wu; | icassp | 2023-04-27 |
602 | MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel UDA approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which improves the model transferability by fine-grained inter-domain matching and discriminability by intra-domain contrastive discrimination simultaneously. |
J. Zhou; S. Zhao; N. Jiang; G. Zhao; Y. Qin; | icassp | 2023-04-27 |
603 | VarArray Meets T-SOT: Advancing The State of The Art of Streaming Distant Conversational Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. |
N. KANDA et. al. | icassp | 2023-04-27 |
604 | Database-Aware ASR Error Correction for Speech-to-SQL Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an ASR correction method, DBATI (DataBase-Aware TaggerILM). |
Y. Shao; A. Kumar; N. Nakashole; | icassp | 2023-04-27 |
605 | Improving Accented Speech Recognition with Multi-Domain Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. |
L. Maison; Y. Esteve; | icassp | 2023-04-27 |
606 | MoLE : Mixture Of Language Experts For Multi-Lingual Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a multi-lingual speech recognition network named Mixture-of-Language-Experts (MoLE), which digests speech in a variety of languages. |
Y. Kwon; S. -W. Chung; | icassp | 2023-04-27 |
607 | Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR). |
A. Ogawa; T. Moriya; N. Kamo; N. Tawara; M. Delcroix; | icassp | 2023-04-27 |
608 | A Speech Representation Anonymization Framework Via Selective Noise Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a speech anonymization framework that achieves privacy via noise perturbation to a selected subset of the high-utility representations extracted using a pre-trained speech encoder. |
M. Tran; M. Soleymani; | icassp | 2023-04-27 |
609 | From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize the other languages. |
C. -H. H. YANG et. al. | icassp | 2023-04-27 |
610 | Simulating Realistic Speech Overlaps Improves Multi-Talker ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an improved technique to simulate multi-talker overlapping speech with realistic speech overlaps, where an arbitrary pattern of speech overlaps is represented by a sequence of discrete tokens. |
M. YANG et. al. | icassp | 2023-04-27 |
611 | Context-Aware End-to-end ASR Using Self-Attentive Embedding and Tensor Fusion IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a context-aware end-to-end ASR model that injects the self-attentive context embedding into the decoder of the recurrent neural network transducer (RNN-T). |
S. -Y. Chang; C. Zhang; T. N. Sainath; B. Li; T. Strohman; | icassp | 2023-04-27 |
612 | Multi-modal ASR Error Correction with Joint ASR Error Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To include the audio information for better error correction, we propose a sequence-to-sequence multi-modal ASR error correction model. |
B. Lin; L. Wang; | icassp | 2023-04-27 |
613 | UML: A Universal Monolingual Output Layer For Multilingual ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For multilingual ASR, due to the differences in written scripts across languages, multilingual WPMs bring the challenges of having overly large output layers and scaling to more languages. In this work, we propose a universal monolingual output layer (UML) to address such problems. |
C. Zhang; B. Li; T. N. Sainath; T. Strohman; S. -Y. Chang; | icassp | 2023-04-27 |
614 | Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. |
J. SHI et. al. | icassp | 2023-04-27 |
615 | Investigation Into Phone-Based Subword Units for Multilingual End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the use of phone-based sub-words, specifically Byte Pair Encoding (BPE), as modeling units for multilingual end-to-end speech recognition. |
S. Yusuyin; H. Huang; J. Liu; C. Liu; | icassp | 2023-04-27 |
616 | Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Cleanformer, a streaming multichannel neural enhancement frontend for automatic speech recognition (ASR). |
J. Caroselli; A. Narayanan; N. Howard; T. O’Malley; | icassp | 2023-04-27 |
617 | Structured State Space Decoder for Speech Recognition and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks, respectively, by comparing it with the Transformer decoder. |
K. Miyazaki; M. Murata; T. Koriyama; | icassp | 2023-04-27 |
618 | Pretraining Conformer with ASR for Speaker Verification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to pretrain Conformer with an automatic speech recognition (ASR) task for speaker verification. |
D. Cai; W. Wang; M. Li; R. Xia; C. Huang; | icassp | 2023-04-27 |
619 | Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. |
Y. Tseng; C. -I. J. Lai; H. -Y. Lee; | icassp | 2023-04-27 |
620 | Learning ASR Pathways: A Sparse Multilingual ASR Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in multilingual ASR, language-agnostic pruning may lead to severe performance drops on some languages because language-agnostic pruning masks may not fit all languages and discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks (pathways), such that the parameters for each language are learned explicitly. |
M. YANG et al. | icassp | 2023-04-27 |
621 | Adaptive Multi-Corpora Language Model Training for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel adaptive multi-corpora training algorithm that dynamically learns and adjusts the sampling probability of each corpus along the training process. |
Y. Ma; Z. Liu; X. Zhang; | icassp | 2023-04-27 |
622 | Multi-Speaker Data Augmentation for Improved End-to-end Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While E2E ASR models achieve state-of-the-art performance on recognition tasks that match well with such training data, they are observed to fail on test recordings that contain multiple speakers, significant channel or background noise, or span longer durations than training data utterances. To mitigate these issues, we propose an on-the-fly data augmentation strategy that transforms single-speaker training data into multiple-speaker data by appending together multiple single-speaker utterances. |
S. Thomas; H. -K. J. Kuo; G. Saon; B. Kingsbury; | icassp | 2023-04-27 |
623 | Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a dual-stream spectrogram refine network to simultaneously refine the speech and noise and decouple the noise from the noisy input. |
H. LU et al. | icassp | 2023-04-27 |
624 | Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose to employ a novel bidirectional attention mechanism (BiAM) to jointly learn both ASR encoder (bottom layers) and text encoder with a multi-modal learning method. |
Y. Yang; H. Xu; H. Huang; E. S. Chng; S. Li; | icassp | 2023-04-27 |
625 | HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit Bert for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose HuBERT-AGG, a novel method that learns noise-invariant SSL representations for robust speech recognition by distilling aggregated layer-wise representations. |
W. Wang; Y. Qian; | icassp | 2023-04-27 |
626 | Joint Unsupervised and Supervised Learning for Context-Aware Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we need additional text labels to train the model to recognize speech, and acquiring such text labels is costly. To overcome this problem, we propose context-aware language identification using a combination of unsupervised and supervised learning without any text labels. |
J. PARK et. al. | icassp | 2023-04-27 |
627 | Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Virtuoso, a massively multilingual speech–text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. |
T. SAEKI et. al. | icassp | 2023-04-27 |
628 | Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the respective … |
Eduardo Medeiros; Leonel Corado; Luís Rato; P. Quaresma; Pedro Salgueiro; | Future Internet | 2023-04-24 |
629 | Using Automatic Speech Recognition to Measure The Intelligibility of Speech Synthesized From Brain Signals Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Brain-computer interfaces (BCIs) can potentially restore lost function in patients with neurological injury. A promising new application of BCI technology has focused on speech … |
Suvi Varshney; D. Farias; David M. Brandman; S. Stavisky; Lee M. Miller; | 2023 11th International IEEE/EMBS Conference on Neural … | 2023-04-24 |
630 | Situating Automatic Speech Recognition Development Within Communities of Under-heard Language Speakers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper we develop approaches to automatic speech recognition (ASR) development that suit the needs and functions of under-heard language speakers. Our novel contribution to … |
THOMAS REITMAIER et al. | Proceedings of the 2023 CHI Conference on Human Factors in … | 2023-04-19 |
631 | Collaboratively Mitigating Racial Disparities in Automated Speech Recognition and Language Technologies with African American English Speakers: Community-Collaborative and Equity-Centered Approaches Toward Designing Inclusive Natural Language Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automated speech recognition (ASR) systems that rely on natural language processing (NLP) techniques are becoming increasingly prevalent within people’s everyday lives. From … |
Jay L. Cunningham; | Extended Abstracts of the 2023 CHI Conference on Human … | 2023-04-19 |
632 | Improving Automatic Summarization for Browsing Longform Spoken Dialog IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Longform spoken dialog delivers rich streams of informative content through podcasts, interviews, debates, and meetings. While production of this medium has grown tremendously, … |
Daniel Li; Thomas Chen; Alec Zadikian; Albert Tung; Lydia B. Chilton; | Proceedings of the 2023 CHI Conference on Human Factors in … | 2023-04-19 |
633 | Speech Command Recognition Based on Convolutional Spiking Neural Networks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This article presents a new technique for speech recognition that combines Convolutional Neural Networks (CNNs) with Spiking Neural Networks (SNNs) to create an SNNCNN model. The … |
Erik Sadovsky; Maroš Jakubec; R. Jarina; | 2023 33rd International Conference Radioelektronika … | 2023-04-19 |
634 | Political Corpus Creation Through Automatic Speech Recognition on EU Debates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a transcribed corpus of the LIBE committee of the EU parliament, totalling 3.6 Million running words. |
Hugo de Vos; Suzan Verberne; | arxiv-cs.CL | 2023-04-17 |
635 | Speech2Spikes: Efficient Audio Encoding Pipeline for Real-time Neuromorphic Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Despite the maturity and availability of speech recognition systems, there are few available spiking speech recognition tasks that can be implemented with current neuromorphic … |
Kenneth Michael Stewart; Timothy M. Shea; Noah Pacik-Nelson; Eric M Gallo; Andreea Danielescu; | Proceedings of the 2023 Annual Neuro-Inspired Computational … | 2023-04-11 |
636 | Speech Recognition Method Based on Deep Learning of Artificial Intelligence: An Example of BLSTM-CTC Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the rapid development of information, networking, and intelligent technologies, China's intelligent technology and related fields have made great progress and …
Kangyu Chen; Zhiyuan Peng; | Proceedings of the 2023 5th International Symposium on … | 2023-03-24 |
637 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. |
PINGCHUAN MA et al. | arxiv-cs.CV | 2023-03-24 |
638 | Beyond Universal Transformer: Block Reusing with Adaptor in Transformer for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the drawback of universal Transformer models for the application of ASR on edge devices, we propose a solution that can reuse the block in Transformer models for the occasion of the small footprint ASR system, which meets the objective of accommodating resource limitations without compromising recognition accuracy. |
Haoyu Tang; Zhaoyi Liu; Chang Zeng; Xinfeng Li; | arxiv-cs.SD | 2023-03-23 |
639 | Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
M. Latha; M. Shivakumar; G. Manjula; M. Hemakumar; M. K. Kumar; | SN Computer Science | 2023-03-20 |
640 | A Deep Learning System for Domain-specific Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems are proposed. However, … |
Yanan Jia; | arxiv-cs.CL | 2023-03-18 |
641 | Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. |
Yuan Tseng; Cheng-I Lai; Hung-yi Lee; | arxiv-cs.CL | 2023-03-15 |
642 | Improving Accented Speech Recognition with Multi-Domain Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. |
Lucas Maison; Yannick Estève; | arxiv-cs.LG | 2023-03-14 |
643 | MIXPGD: Hybrid Adversarial Training for Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose mixPGD adversarial training method to improve the robustness of the model for ASR systems. |
Aminul Huq; Weiyi Zhang; Xiaolin Hu; | arxiv-cs.SD | 2023-03-10 |
644 | An Overview of Bengali Speech Recognition: Methods, Challenges, and Future Direction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the subject of human-computer interactions, speech recognition is an appealing technique that gives users the opportunity to interact with and control the machine. Currently, … |
NABILA TASNIA et al. | 2023 IEEE 13th Annual Computing and Communication Workshop … | 2023-03-08 |
645 | End-to-End Speech Recognition: A Survey Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, … |
Rohit Prabhavalkar; Takaaki Hori; Tara N. Sainath; R. Schluter; Shinji Watanabe; | ArXiv | 2023-03-03 |
646 | Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. |
YU ZHANG et al. | arxiv-cs.CL | 2023-03-02 |
647 | Leveraging Large Text Corpora for End-to-End Speech Summarization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two novel methods that leverage a large amount of external text summarization data for E2E SSum training. |
KOHEI MATSUURA et al. | arxiv-cs.CL | 2023-03-02 |
648 | MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages. |
MOHAMED ANWAR et al. | arxiv-cs.CL | 2023-03-01 |
649 | WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View |
T. Choudhary; Vishal Goyal; A. Bansal; | Big Data Min. Anal. | 2023-03-01 |
650 | Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation. |
PHILIPP KLUMPP et al. | arxiv-cs.CL | 2023-03-01 |
651 | N-best T5: Robust ASR Error Correction Using Multiple Input Hypotheses and Constrained Decoding Space IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. |
Rao Ma; Mark J. F. Gales; Kate M. Knill; Mengjie Qian; | arxiv-cs.CL | 2023-03-01 |
652 | Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores a series of approaches to integrate domain adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and domain adapted wav2vec2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec2.0 features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain adapted wav2vec2.0 models. |
SHUJIE HU et al. | arxiv-cs.SD | 2023-02-28 |
653 | Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In this paper, we propose a language-universal adapter learning framework based on a pre-trained model for end-to-end multilingual automatic speech recognition (ASR). For acoustic … |
Zhijie Shen; Wu Guo; Bin Gu; | ArXiv | 2023-02-28 |
654 | DeHuBERT: Disentangling Noise in A Self-supervised Model for Robust Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow’s redundancy-reduction principle. |
DIANWEN NG et al. | arxiv-cs.SD | 2023-02-28 |
655 | Deep Learning Methods for Arabic Autoencoder Speech Recognition System for Electro-Larynx Device Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent advances in speech recognition have achieved remarkable performance comparable with human transcribers’ abilities. But this significant performance is not the same for all … |
Z. J. M. Ameen; A. Kadhim; | Adv. Hum. Comput. Interact. | 2023-02-28 |
656 | Multimodal Speech Recognition for Language-Guided Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructions by considering the accompanying visual context. |
ALLEN CHANG et al. | arxiv-cs.CL | 2023-02-27 |
657 | Diacritic Recognition Performance in Arabic ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an analysis of diacritic recognition performance in Arabic Automatic Speech Recognition (ASR) systems. |
Hanan Aldarmaki; Ahmad Ghannam; | arxiv-cs.CL | 2023-02-27 |
658 | Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications … |
Jaeyoung Huh; Sangjoon Park; Jeonghyeon Lee; Jong-Chul Ye; | ArXiv | 2023-02-27 |
659 | A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we summarize and compare different data augmentation strategies using S3PRL toolkit. |
Mina Huh; Ruchira Ray; Corey Karnei; | arxiv-cs.SD | 2023-02-27 |
660 | Speech Corpora Divergence Based Unsupervised Data Selection for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora. |
Changfeng Gao; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan; | arxiv-cs.CL | 2023-02-25 |
661 | MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel UDA approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which improves the model transferability by fine-grained inter-domain matching and discriminability by intra-domain contrastive discrimination simultaneously. |
Jiaming Zhou; Shiwan Zhao; Ning Jiang; Guoqing Zhao; Yong Qin; | arxiv-cs.CL | 2023-02-22 |
662 | An Approach for Speech Enhancement with Dysarthric Speech Recognition Using Optimization Based Machine Learning Frameworks Related Papers Related Patents Related Grants Related Venues Related Experts View |
Bhuvaneshwari Jolad; Rajashri Khanai; | International Journal of Speech Technology | 2023-02-21 |
663 | Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the application of language and speech technology to open-ended questions in a Dutch panel survey. |
Henk van den Heuvel; Martijn Bentum; Simone Wills; Judith C. Koops; | arxiv-cs.CL | 2023-02-21 |
664 | A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains … |
LINGWEI MENG et al. | arxiv-cs.SD | 2023-02-20 |
665 | Chinese ASR and NER Improvement Based on Whisper Fine-Tuning IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Based on 680k hours of weakly supervised multilingual and multi-task speech transcription/translation data, Whisper [1] has developed a robust system for both Automated Speech … |
Hao Yang; Min Zhang; Shimin Tao; Miaomiao Ma; Ying Qin; | 2023 25th International Conference on Advanced … | 2023-02-19 |
666 | Reconsidering Read and Spontaneous Speech: Causal Perspectives on The Generation of Training Data for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Superficially, read and spontaneous speech—the two main kinds of training data for automatic speech recognition—appear as complementary, but are equal: pairs of texts and acoustic … |
Philipp Gabler; Bernhard C. Geiger; Barbara Schuppler; Roman Kern; | Inf. | 2023-02-19 |
667 | Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words. |
Leyuan Qu; Cornelius Weber; Stefan Wermter; | arxiv-cs.CL | 2023-02-19 |
668 | QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a lightweight VITS-based VC model that uses the HuBERT-Soft model to extract content information features without speaker information. |
Houjian Guo; Chaoran Liu; Carlos Toshinori Ishi; Hiroshi Ishiguro; | arxiv-cs.SD | 2023-02-16 |
669 | ASR Bundestag: A Large-Scale Political Debate Dataset in German Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ASR Bundestag, a dataset for automatic speech recognition in German, consisting of 610 hours of aligned audio-transcript pairs for supervised training as well as 1,038 hours of unlabeled audio snippets for self-supervised learning, based on raw audio data and transcriptions from plenary sessions and committee meetings of the German parliament. |
Johannes Wirth; René Peinl; | arxiv-cs.CL | 2023-02-12 |
670 | Leveraging Supplementary Text Data to Kick-start Automatic Speech Recognition System Development with Limited Transcriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the use of different amounts of text data, both for creating a lexicon that constrains ASR decoding to possible words (e.g. *dogz vs. dogs), and for training larger language models that bias the system toward probable word sequences (e.g. too dogs vs. two dogs). |
NAY SAN et al. | arxiv-cs.CL | 2023-02-09 |
671 | PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose PATCorrect-a novel non-autoregressive (NAR) approach based on multi-modal fusion leveraging representations from both text and phoneme modalities, to reduce word error rate (WER) and perform robustly with varying input transcription quality. |
Ziji Zhang; Zhehui Wang; Rajesh Kamma; Sharanya Eswaran; Narayanan Sadagopan; | arxiv-cs.CL | 2023-02-09 |
672 | MAC: A Unified Framework Boosting Low Resource Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). |
Zeping Min; Qian Ge; Zhong Li; Weinan E; | arxiv-cs.CL | 2023-02-05 |
673 | Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated from the spiking transformer to get the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). |
MINGLUN HAN et al. | arxiv-cs.NE | 2023-02-02 |
674 | Measuring The Intelligibility of Dysarthric Speech Through Automatic Speech Recognition in A Pluricentric Language Related Papers Related Patents Related Grants Related Venues Related Experts View |
Wei Xue; C. Cucchiarini; R. Hout; H. Strik; | Speech Commun. | 2023-02-01 |
675 | In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach (TOLSTOI) that imputes speech representations internal to a baseline RNN-T, starting from text-only inputs, and performs in-situ adaptation that results in higher adaptation accuracy without any runtime overheads during decoding. |
Ashish Mittal; Sunita Sarawagi; Preethi Jyothi; | iclr | 2023-02-01 |
676 | Improving Rare Words Recognition Through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the problem of low-resource Cantonese speech recognition, this paper presents a novel homophone extension method to integrate human knowledge of the homophone lexicon into the beam search decoding process with language model re-scoring. |
HOLAM CHUNG et al. | arxiv-cs.CL | 2023-02-01 |
677 | A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The proliferation of personal artificial intelligence (AI) -assistant technologies with speech-based conversational AI interfaces is driving the exponential growth in the consumer … |
THIERRY TAMBE et al. | IEEE Journal of Solid-State Circuits | 2023-02-01 |
678 | Attention-based Latent Features for Jointly Trained End-to-end Automatic Speech Recognition with Modified Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View |
Dali Yang; Joon‐Hyuk Chang; | J. King Saud Univ. Comput. Inf. Sci. | 2023-02-01 |
679 | Prioritizing Speech Test Cases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PROPHET (PRiOritizing sPeecH tEsT), a tool that predicts potential error-uncovering speech test cases only based on their reference texts. |
ZHOU YANG et al. | arxiv-cs.SE | 2023-02-01 |
680 | Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers Via Hierarchical Distillation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous works may be limited by the inflexible structures of PLMs and the insufficient utilization of PLMs. To alleviate these problems, we propose the hierarchical knowledge distillation (HKD) on the continuous integrate-and-fire (CIF) based ASR models. |
Minglun Han; Feilong Chen; Jing Shi; Shuang Xu; Bo Xu; | arxiv-cs.CL | 2023-01-30 |
681 | Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, to the best of our knowledge, there isn't a resource that brings together the research perspectives influencing Spoken Language Understanding (SLU) on these speech events. The aim of this article is to survey a breadth of perspectives in a holistic way; i.e. from considering underlying (psycho)linguistic theory, to their annotation and consideration in Automatic Speech Recognition (ASR) and SLU systems, to lastly, their study from a generation standpoint. |
Tanvi Dinkar; Chloé Clavel; Ioana Vasilescu; | arxiv-cs.CL | 2023-01-25 |
682 | From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize the other languages. |
CHAO-HAN HUCK YANG et. al. | arxiv-cs.SD | 2023-01-18 |
683 | Syllable Subword Tokens for Open Vocabulary Speech Recognition in Malayalam Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we investigate the impact of using syllables as subword tokens instead of words in Malayalam ASR, and evaluate the relative improvement in lexicon size, model memory requirement and word error rate. |
Kavya Manohar; A. R. Jayan; Rajeev Rajan; | arxiv-cs.CL | 2023-01-17 |
684 | Using Kaldi for Automatic Speech Recognition of Conversational Austrian German Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ASR experiments with read and conversational Austrian German as target. |
Julian Linke; Saskia Wepner; Gernot Kubin; Barbara Schuppler; | arxiv-cs.CL | 2023-01-16 |
685 | M2ASR-KIRGHIZ: A Free Kirghiz Speech Database and Accompanied Baselines Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep learning has significantly boosted the performance improvement of automatic speech recognition (ASR) with the cooperation of large amounts of data resources. For minority … |
Ikram Mamtimin; Wenqiang Du; A. Hamdulla; | Inf. | 2023-01-16 |
686 | Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Context within the segments produced by ASR decoders can be helpful but limiting in overall punctuation performance for a continuous speech session. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. |
Piyush Behre; Sharman Tan; Padma Varadharajan; Shuangyu Chang; | arxiv-cs.CL | 2023-01-10 |
687 | Exploring A Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We build a single automatic speech recognition (ASR) model for several south Indian languages using a common set of intermediary labels, which can be easily mapped to the desired … |
C. Anoop; A. Ramakrishnan; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
688 | Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper investigates an approach for adapting RNN-Transducer (RNN-T) based automatic speech recognition (ASR) model to improve the recognition of unseen words during training. … |
Sungjun Han; Deepak Baby; Valentin Mendelev; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
689 | Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Luxembourgish is a West Germanic language spoken by roughly 390,000 people, mainly in Luxembourg. It is one of Europe’s under-described and under-resourced languages, not … |
Le-Minh Nguyen; Shekhar Nayak; M. Coler; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
690 | A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) systems need to be accurate, have low latency, and effectively handle language switching in order to be useful for the 60% of the world … |
S. MAVANDADI et. al. | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
691 | Learning Mask Scalars for Improved Robust Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Improving robustness of streaming automatic speech recognition (ASR) systems using neural network based acoustic frontends is challenging because of causality constraints and the … |
A. Narayanan; James Walker; S. Panchapagesan; N. Howard; Yuma Koizumi; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
692 | Context-Aware Neural Confidence Estimation for Rare Word Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Confidence estimation for automatic speech recognition (ASR) is important for many downstream tasks. Recently, neural confidence estimation models (CEMs) have been shown to … |
David Qiu; Tsendsuren Munkhdalai; Yanzhang He; K. Sim; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
693 | ASBERT: ASR-Specific Self-Supervised Learning with Self-Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Pre-training of self-supervised learning (SSL) generally shows a good performance on various speech processing tasks. However, this pre-training scheme may lead to a sub-optimal … |
H. KIM et. al. | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
694 | Interdecoder: Using Attention Decoders As Intermediate Regularization for CTC-Based Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We propose InterDecoder: a new non-autoregressive automatic speech recognition (NAR-ASR) training method that injects the advantage of token-wise autoregressive decoders while … |
Tatsuya Komatsu; Yusuke Fujita; | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2023-01-09 |
695 | SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use. |
Heli Qi; Sashi Novitasari; Andros Tjandra; Sakriani Sakti; Satoshi Nakamura; | arxiv-cs.CL | 2023-01-07 |
696 | Uncovering The Potential for A Weakly Supervised End-to-End Model in Recognising Speech from Patient with Post-Stroke Aphasia Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Post-stroke speech and language deficits (aphasia) significantly impact patients’ quality of life. Many with mild symptoms remain undiagnosed, and the majority do not receive the … |
Giulia Sanguedolce; P. Naylor; F. Geranmayeh; | Clinical Natural Language Processing Workshop | 2023-01-01 |
697 | Submission of USTC’s System for The IWSLT 2023 – Offline Speech Translation Track Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the submissions of the research group USTC-NELSLIP to the 2023 IWSLT Offline Speech Translation competition, which involves translating spoken English into … |
XINYUAN ZHOU et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
698 | The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents the MineTrans English-to-Chinese speech translation systems developed for two challenge tracks of IWSLT 2023, i.e., Offline Speech Translation (S2T) and … |
YICHAO DU et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
699 | QUESPA Submission for The IWSLT 2023 Dialect and Low-resource Speech Translation Tasks IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This article describes the QUESPA team speech translation (ST) submissions for the Quechua to Spanish (QUE–SPA) track featured in the Evaluation Campaign of IWSLT 2023: … |
John E. Ortega; Rodolfo Zevallos; William Chen; | International Workshop on Spoken Language Translation | 2023-01-01 |
700 | Towards Training Bilingual and Code-Switched Speech Recognition Models from Monolingual Data Sources Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Multilingual Automatic Speech Recognition (ASR) models are capable of transcribing audio across multiple languages, eliminating the need for separate models. In addition, they … |
Kunal Dhawan; Dima Rekesh; Boris Ginsburg; | ArXiv | 2023-01-01 |
701 | Transfer Learning Using Whisper for Dysarthric Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Siddharth Rathod; Monil Charola; Hemant A. Patil; | International Conference on Speech and Computer | 2023-01-01 |
702 | Listen, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Existing supervised sign language recognition systems rely on an abundance of well-annotated data. Instead, an unsupervised speech-to-sign language recognition (SSR-U) system … |
LIMING WANG et. al. | Annual Meeting of the Association for Computational … | 2023-01-01 |
703 | Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Modern speech recognition systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down … |
Titouan Parcollet; R. V. Dalen; Shucong Zhang; Sourav Bhattacharya; | ArXiv | 2023-01-01 |
704 | SRI-B’s Systems for IWSLT 2023 Dialectal and Low-resource Track: Marathi-Hindi Speech Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the speech translation systems SRI-B developed for the IWSLT 2023 Evaluation Campaign Dialectal and Low-resource track: Marathi-Hindi Speech Translation. We … |
BALAJI RADHAKRISHNAN et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
705 | Query-Efficient Black-Box Adversarial Attacks on Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The susceptibility of Deep Neural Networks (DNNs) to adversarial attacks has raised concerns regarding their practical applications in real-world scenarios. Although the … |
CHUXUAN TONG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
706 | Query-Efficient Adversarial Attack With Low Perturbation Against End-to-End Speech Recognition Systems IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the widespread use of automated speech recognition (ASR) systems in modern consumer devices, attacks against ASR systems have become an attractive topic in recent years. … |
SHEN WANG et. al. | IEEE Transactions on Information Forensics and Security | 2023-01-01 |
707 | JHU IWSLT 2023 Dialect Speech Translation System Description Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents JHU’s submissions to the IWSLT 2023 dialectal and low-resource track of Tunisian Arabic to English speech translation. The Tunisian dialect lacks formal … |
A. HUSSEIN et. al. | International Workshop on Spoken Language Translation | 2023-01-01 |
708 | A Deep Diacritics-Based Recognition Model for Arabic Speech: Quranic Verses As Case Study Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Arabic is spoken by more than 422 million people worldwide. Although Classical Arabic is the language of the Quran, which 1.9 billion Muslims are required to recite, limited … |
Sarah S. Alrumiah; Amal A. Al-Shargabi; | IEEE Access | 2023-01-01 |
709 | A CIF-Based Speech Segmentation Method for Streaming E2E ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Long utterance segmentation is crucial in end-to-end (E2E) streaming automatic speech recognition (ASR). However, commonly used voice activity detection (VAD)-based and … |
Yuchun Shu; Haoneng Luo; Shiliang Zhang; Longbiao Wang; J. Dang; | IEEE Signal Processing Letters | 2023-01-01 |
710 | Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Training Automatic Speech Recognition (ASR) systems with sequentially incoming data from alternate domains is an essential milestone in order to reach human intelligibility level … |
Shahram Ghorbani; J. Hansen; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
711 | End-to-End Multi-Modal Speech Recognition on An Air and Bone Conducted Speech Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) has been significantly improved in the past years. However, most robust ASR systems are based on air-conducted (AC) speech, and their … |
Mou-Sheng Wang; Junqi Chen; Xiao-Lei Zhang; S. Rahardja; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
712 | Evaluating and Improving Automatic Speech Recognition Using Severity Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which takes into account only discrepancies at the word level. Although useful, WER is … |
Ryan Whetten; C. Kennington; | Workshop on Biomedical Natural Language Processing | 2023-01-01 |
713 | Recognition of English Speech – Using A Deep Learning Algorithm Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The accurate recognition of speech is beneficial to the fields of machine translation and intelligent human–computer interaction. After briefly introducing speech recognition … |
Shuyan Wang; | Journal of Intelligent Systems | 2023-01-01 |
714 | A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Though speech enhancement (SE) can be used to improve speech quality in noisy environments, it may also cause distortions that degrade the performance of automatic speech … |
Qiu-shi Zhu; J. Zhang; Zitian Zhang; Lirong Dai; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
715 | Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition of a target speaker in the presence of interfering speakers remains a challenging issue. One approach to tackle this problem is target-speaker speech … |
Takafumi Moriya; Hiroshi Sato; Tsubasa Ochiai; Marc Delcroix; T. Shinozaki; | IEEE Access | 2023-01-01 |
716 | AdvDDoS: Zero-Query Adversarial Attacks Against Commercial Speech Recognition Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) has been widely and commercially employed in health care, autonomous vehicles, and finance. Yet, recent studies have shown that universal … |
Yunjie Ge; Lingchen Zhao; Qian Wang; Yiheng Duan; Minxin Du; | IEEE Transactions on Information Forensics and Security | 2023-01-01 |
717 | A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Both unpaired speech and text have been shown to be beneficial for low-resource automatic speech recognition (ASR), which, however, were either separately used for pre-training, … |
Ye Du; J Zhang; Xin Fang; Ming Wu; Zhouwang Yang; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
718 | Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Owing to the linguistic richness of the Arabic language, which contains more than 6000 roots, building a reliable Arabic language model for Arabic speech recognition systems faces … |
Mona A. Azim; Wedad Hussein; N. Badr; | IEEE Access | 2023-01-01 |
719 | Spectral Analysis of EEG Signals for Automatic Imagined Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Brain–computer interface (BCI) systems are intended to provide a means of communication for both the healthy and those suffering from neurological disorders. Imagined speech … |
Ashwin Kamble; P. Ghare; Vinay Kumar; Ashwin Kothari; A. Keskar; | IEEE Transactions on Instrumentation and Measurement | 2023-01-01 |
720 | Speech Recognition for Minority Languages Using HuBERT and Model Adaptation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the field of speech recognition, models and datasets are becoming larger and larger. However, it is difficult to create large datasets for minority languages, which is an … |
Tomohiro Hattori; S. Tamura; | International Conference on Pattern Recognition … | 2023-01-01 |
721 | Augmentation Techniques for Adult-Speech to Generate Child-Like Speech Data Samples at Scale Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Technologies such as Text-To-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) have become important in providing speech-based Artificial Intelligence (AI) solutions … |
Mariam Yiwere; Andrei Barcovschi; Rishabh Jain; H. Cucu; Peter Corcoran; | IEEE Access | 2023-01-01 |
722 | Why Aren’t We NER Yet? Artifacts of ASR Errors in Named Entity Recognition in Spontaneous Speech Transcripts Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transcripts of spontaneous human speech present a significant obstacle for traditional NER models. The lack of grammatical structure of spoken utterances and word errors … |
PIOTR SZYMAŃSKI et. al. | Annual Meeting of the Association for Computational … | 2023-01-01 |
723 | Towards Recognition for Radio-Echo Speech in Air Traffic Control: Dataset and A Contrastive Learning Approach Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the air traffic control (ATC) domain, automatic speech recognition (ASR) suffers from radio speech echo, which cannot be addressed by existing echo cancellation due to … |
YI LIN et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2023-01-01 |
724 | Indonesian Automatic Speech Recognition with XLSR-53 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study focuses on the development of Indonesian Automatic Speech Recognition (ASR) using the XLSR-53 pre-trained model, the XLSR stands for cross-lingual speech … |
Panji Arisaputra; Amalia Zahra; | ArXiv | 2022-12-31 |
725 | Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. |
GEORGIOS PARASKEVOPOULOS et. al. | arxiv-cs.CL | 2022-12-31 |
726 | Can Visual Context Improve Automatic Speech Recognition for An Embodied Agent? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method to incorporate a robot's visual information into an ASR system and improve the recognition of a spoken utterance containing a visible entity. |
Pradip Pramanick; Chayan Sarkar; | emnlp | 2022-12-30 |
727 | RED-ACE: Robust Error Detection for ASR Using Confidence Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model’s encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. |
Zorik Gekhman; Dina Zverinski; Jonathan Mallinson; Genady Beryozkin; | emnlp | 2022-12-30 |
728 | Towards Relation Extraction from Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new listening information extraction task, i.e., speech relation extraction. |
TONGTONG WU et. al. | emnlp | 2022-12-30 |
729 | Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. |
CHEN WANG et. al. | emnlp | 2022-12-30 |
730 | Memory Augmented Lookup Dictionary Based Language Modeling for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new memory augmented lookup dictionary based Transformer architecture for LM. |
Yukun Feng; Ming Tu; Rui Xia; Chuanzeng Huang; Yuxuan Wang; | arxiv-cs.CL | 2022-12-30 |
731 | SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. |
ZIQIANG ZHANG et. al. | emnlp | 2022-12-30 |
732 | Don’t Be So Sure! Boosting ASR Decoding Via Confidence Relaxation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure that improves the performance of fine-tuned ASR models. |
Tomer Wullach; Shlomo E. Chazan; | arxiv-cs.CL | 2022-12-27 |
733 | Skit-S2I: An Indian Accented Speech to Intent Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we release the Skit-S2I dataset, the first publicly available Indian-accented SLU dataset in the banking domain in a conversational tonality. |
Shangeth Rajaa; Swaraj Dalmia; Kumarmanas Nethil; | arxiv-cs.CL | 2022-12-26 |
734 | I Spy You Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents iSpyU, a system that shows the feasibility of recognition of natural speech content played on a phone during conference calls (Skype, Zoom, etc) using a fusion … |
Shijia Zhang; Yilin Liu; Mahanth K. Gowda; | Proceedings of the ACM on Interactive, Mobile, Wearable and … | 2022-12-21 |
735 | End-to-End Automatic Speech Recognition Model for The Sudanese Dialect Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the field lacks wide support for many widely spoken languages and their dialects, even though most daily conversations are carried out in them. This paper inspects the viability of designing an Automatic Speech Recognition model for the Sudanese dialect, an Arabic dialect whose complexity is a product of historical and social conditions unique to its speakers. |
Ayman Mansour; Wafaa F. Mukhtar; | arxiv-cs.CL | 2022-12-21 |
736 | Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to better utilize the potential power of SSL models, in this work, we explore the effective fusion on multiple SSL models. |
Changli Tang; Yujin Wang; Xie Chen; Wei-Qiang Zhang; | arxiv-cs.SD | 2022-12-20 |
737 | Mu2SLAM: Multitask, Multilingual Speech and Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech … |
Yong Cheng; Yu Zhang; Melvin Johnson; Wolfgang Macherey; Ankur Bapna; | ArXiv | 2022-12-19 |
738 | Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. |
Yong Cheng; Yu Zhang; Melvin Johnson; Wolfgang Macherey; Ankur Bapna; | arxiv-cs.CL | 2022-12-19 |
739 | DSTC-11: Speech Aware Task-Oriented Dialog Modeling Track Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Most research on task oriented dialog modeling is based on written text input. However, users interact with practical dialog systems often using speech as input. Typically, … |
H. SOLTAU et. al. | DSTC | 2022-12-16 |
740 | Speech Aware Dialog System Technology Challenge (DSTC11) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These included ASR transcripts, word time stamps, and latent representations of the audio (audio encoder outputs). In this paper, we describe the corpus, report results from participating teams, provide preliminary analyses of their results, and summarize the current state-of-the-art in this domain. |
HAGEN SOLTAU et. al. | arxiv-cs.AI | 2022-12-16 |
741 | BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. |
MINGDA CHEN et. al. | arxiv-cs.CL | 2022-12-16 |
742 | Disentangling Prosody Representations with Unsupervised Speech Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. |
LEYUAN QU et. al. | arxiv-cs.SD | 2022-12-13 |
743 | End-to-End Speech Translation of Arabic to English Broadcast News Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system. |
Fethi Bougares; Salim Jouili; | arxiv-cs.CL | 2022-12-11 |
744 | Improving Speech Recognition with Augmented Synthesized Data and Conditional Model Training Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With recent advances in end-to-end text-to-speech (TTS), the quality of synthetic data has been significantly improved. Synthesized speech is becoming a feasible alternative to … |
Shaofei Xue; Jian Tang; Yazhu Liu; | 2022 13th International Symposium on Chinese Spoken … | 2022-12-11 |
745 | Efficient Conformer-Based CTC Model for Intelligent Cockpit Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we discuss the rationale of our work for automatic speech recognition (ASR) in the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge and provide a … |
Hanzhi Guo; Yunshu Chen; Xukang Xie; G. Xu; Wei Guo; | 2022 13th International Symposium on Chinese Spoken … | 2022-12-11 |
746 | Adaptive Attention Network with Domain Adversarial Training for Multi-Accent Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Spoken accents severely degrade the performance of automatic speech recognition (ASR) systems. Domain adversarial training (DAT) is widely adopted for generating domain-invariant … |
YANBING YANG et. al. | 2022 13th International Symposium on Chinese Spoken … | 2022-12-11 |
747 | An Automatic Speech Recognition System in Indian and Foreign Languages: A State-of-the-art Review Analysis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech Recognition is one of the prominent research topics in the field of Natural Language Processing (NLP). The Speech Recognition technique removes the barriers and makes the … |
Astha Gupta; Rakesh Kumar; Y. Kumar; | Intell. Decis. Technol. | 2022-12-02 |
748 | Robust Speech Recognition Using Teacher-Student Learning Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View |
Han Ma; Qiaoling Zhang; Roubing Tang; Lu Zhang; Yubo Jia; | IEICE Trans. Inf. Syst. | 2022-12-01 |
749 | Audio Adversarial Detection Through Classification Score on Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View |
Hyung-Min Kwon; Seung-Hun Nam; | Comput. Secur. | 2022-12-01 |
750 | MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In this paper, we propose a novel multi-modal multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR), which employs both … |
XIAOHUAN ZHOU et. al. | ArXiv | 2022-11-29 |
751 | TESSP: Text-Enhanced Self-Supervised Speech Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the distinct pre-training objectives make it challenging to jointly optimize the speech and text representation in the same model. To solve this problem, we propose Text-Enhanced Self-Supervised Speech Pre-training (TESSP), aiming to incorporate the linguistic information into speech pre-training. |
ZHUOYUAN YAO et. al. | arxiv-cs.SD | 2022-11-24 |
752 | Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation … |
INJY HAMED et. al. | 2022 IEEE Spoken Language Technology Workshop (SLT) | 2022-11-22 |
753 | SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited setting. |
RAPHAEL TANG et. al. | arxiv-cs.CL | 2022-11-21 |
754 | CORAA ASR: A Large Corpus of Spontaneous and Prepared Speech Manually Validated for Speech Recognition in Brazilian Portuguese Related Papers Related Patents Related Grants Related Venues Related Experts View |
ARNALDO CANDIDO JUNIOR et. al. | Language Resources and Evaluation | 2022-11-21 |
755 | LongFNT: Long-form Speech Recognition with Factorized Neural Transducer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor and then embeds token-level long-form features inside the vocabulary predictor, with a pre-trained contextual encoder RoBERTa to further boost the performance. |
XUN GONG et. al. | arxiv-cs.SD | 2022-11-17 |
756 | Hey ASR System! Why Aren’t You More Inclusive? Automatic Speech Recognition Systems’ Bias and Proposed Bias Mitigation Techniques. A Literature Review IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These systems do not work equally for everyone and actually hinder the productivity of some users. In this paper, we present research that addresses ASR biases against gender, race, and the sick and disabled, while exploring studies that propose ASR debiasing techniques for mitigating these discriminations. |
Mikel K. Ngueajio; Gloria Washington; | arxiv-cs.CL | 2022-11-17 |
757 | Introducing Semantics Into Speech Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. |
DEREK XU et. al. | arxiv-cs.CL | 2022-11-15 |
758 | Improving Children’s Speech Recognition By Fine-tuning Self-supervised Adult Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage self-supervised adult speech representations and use three well-known child speech corpora to build models for children’s speech recognition. |
Renee Lu; Mostafa Shahin; Beena Ahmed; | arxiv-cs.CL | 2022-11-14 |
759 | The Far Side of Failure: Investigating The Impact of Speech Recognition Errors on Subsequent Dementia Classification Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive … |
Changye Li; T. Cohen; Serguei V. S. Pakhomov; | ArXiv | 2022-11-11 |
760 | Align, Write, Re-order: Explainable End-to-End Speech Translation Via Operation Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A major challenge arises from the fact that translation is a non-monotonic sequence transduction task due to word ordering differences between languages — this clashes with the monotonic nature of ASR. Therefore, we propose to generate ST tokens out-of-order while remembering how to re-order them later. |
Motoi Omachi; Brian Yan; Siddharth Dalmia; Yuya Fujita; Shinji Watanabe; | arxiv-cs.CL | 2022-11-10 |
761 | A Study on The Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? |
YIFAN PENG et. al. | arxiv-cs.CL | 2022-11-10 |
762 | Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. |
Yu Chen; Wen Ding; Junjie Lai; | arxiv-cs.SD | 2022-11-09 |
763 | ATCO2 Corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. |
JUAN ZULUAGA-GOMEZ et. al. | arxiv-cs.CL | 2022-11-08 |
764 | Streaming, Fast and Accurate On-device Inverse Text Normalization for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe the development of an on-device ITN system that is streaming, lightweight & accurate. |
YASHESH GAUR et. al. | arxiv-cs.CL | 2022-11-07 |
765 | Towards Improved Room Impulse Response Estimation for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). |
ANTON RATNARAJAH et. al. | arxiv-cs.SD | 2022-11-07 |
766 | Non-Acoustic Speech Sensing System Based on Flexible Piezoelectric Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech is one of the most important biological signals to complement human-human and human-computer interaction. Traditional speech datasets were collected by air microphones, but … |
SHIJI YUAN et. al. | Proceedings of the 20th ACM Conference on Embedded … | 2022-11-06 |
767 | Global Normalization for Streaming Speech Recognition in A Modular Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. |
EHSAN VARIANI et. al. | nips | 2022-11-06 |
768 | Bridging Speech and Textual Pre-trained Models with Unsupervised ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. |
JIATONG SHI et. al. | arxiv-cs.CL | 2022-11-06 |
769 | Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel framework to finetune the connections of speech SSL models, instead of model weights, to empower efficient multilingual and multitask speech processing. |
YONGGAN FU et. al. | nips | 2022-11-06 |
770 | LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to … |
PEIDONG WANG et. al. | INTERSPEECH 2023 | 2022-11-05 |
771 | LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers. |
PEIDONG WANG et. al. | arxiv-cs.CL | 2022-11-05 |
772 | Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a linguistic perspective, using the French language as a case study toward the disambiguation of French homophones. |
Hannaneh B. Pasandi; Haniyeh B. Pasandi; | arxiv-cs.CL | 2022-11-05 |
773 | Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing … |
XIN ZHANG et. al. | ArXiv | 2022-11-04 |
774 | Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the performance of features at different layers of a foundation model on the speech recognition task and propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. |
ZHOUYUAN HUO et. al. | arxiv-cs.LG | 2022-11-04 |
775 | H_eval: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose H_eval, a new hybrid evaluation metric for ASR systems that considers both semantic correctness and error rate, and performs well in scenarios where WER and SD perform poorly. |
Zitha Sasindran; Harsha Yelchuri; T. V. Prabhakar; Supreeth Rao; | arxiv-cs.CL | 2022-11-03 |
776 | Probing Statistical Representations For End-To-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: End-to-End automatic speech recognition (ASR) models aim to learn a generalised speech representation to perform recognition. |
Anna Ollerenshaw; Md Asif Jalal; Thomas Hain; | arxiv-cs.CL | 2022-11-03 |
777 | The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC). |
AO ZHANG et. al. | arxiv-cs.SD | 2022-11-03 |
778 | Towards Zero-Shot Code-Switched Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. |
Brian Yan; Matthew Wiesner; Ondrej Klejch; Preethi Jyothi; Shinji Watanabe; | arxiv-cs.CL | 2022-11-02 |
779 | Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims to enhance the practical usage of speech SSL models towards a win-win in both enhanced efficiency and alleviated overfitting via our proposed S$^3$-Router framework, which for the first time discovers that simply discarding no more than 10\% of model weights via only finetuning model connections of speech SSL models can achieve better accuracy over standard weight finetuning on downstream speech processing tasks. |
YONGGAN FU et. al. | arxiv-cs.LG | 2022-11-02 |
780 | Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When training data is lacking in ASR, a large-scale pretraining and fine-tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. |
Lester Phillip Violeta; Ding Ma; Wen-Chin Huang; Tomoki Toda; | arxiv-cs.SD | 2022-11-02 |
781 | Conversation-oriented ASR with Multi-look-ahead CBS Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In streaming ASR, high accuracy is achieved by attending to look-ahead frames, which increases latency. To tackle this trade-off, we propose a multi-latency streaming ASR that achieves high accuracy with zero look-ahead. |
HUAIBO ZHAO et. al. | arxiv-cs.SD | 2022-11-01 |
782 | Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a method to jointly train the ASR and EP tasks in a single end-to-end (E2E) multitask model, improving EP quality by optionally leveraging information from the ASR audio encoder. |
SHAAN BIJWADIA et. al. | arxiv-cs.SD | 2022-11-01 |
783 | An Online Intelligent Electronic Medical Record System Via Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Traditional electronic medical record systems in hospitals rely on healthcare workers to manually enter patient information, resulting in healthcare workers having to spend a … |
Xin Xia; Yunlong Ma; Ye Luo; Jianwei Lu; | International Journal of Distributed Sensor Networks | 2022-11-01 |
784 | Improving Vietnamese Accent Recognition Using ASR Transfer Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Accent Recognition (AR) is a critical task in voice-controlled systems. If accent information is known in advance, voice-controlled systems can switch to a suitable … |
Bao Thang Ta; Xuan Vuong Dang; Quang Tien Duong; Nhat Minh Le; Van Hai Do; | 2022 25th Conference of the Oriental COCOSDA International … | 2022-11-01 |
785 | Structured State Space Decoder for Speech Recognition and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks by comparing it with the Transformer decoder. |
Koichi Miyazaki; Masato Murata; Tomoki Koriyama; | arxiv-cs.SD | 2022-10-31 |
786 | Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present our Joint Audio/Text training method for Transformer Rescorer, to leverage unpaired text-only data which is relatively cheaper than paired audio-text data. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2022-10-31 |
787 | On The Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Interacting with a speech interface to query a Question Answering (QA) system is becoming increasingly popular. Typically, QA systems rely on passage retrieval to select candidate … |
Georgios Sidiropoulos; Svitlana Vakulenko; Evangelos Kanoulas; | cikm | 2022-10-29 |
788 | XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a novel linear transformer by examining the properties of the key-query product within self-attentions. |
Roshan Sharma; Bhiksha Raj; | arxiv-cs.CL | 2022-10-29 |
789 | Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the transcription process and the development of a Kiswahili speech corpus, which includes both read-out texts and spontaneous speech data from native Kiswahili speakers. |
EBBIE AWINO et. al. | arxiv-cs.CL | 2022-10-29 |
790 | Filter and Evolve: Progressive Pseudo Label Refining for Semi-supervised Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Fine-tuning self-supervised pre-trained models using pseudo-labels can effectively improve speech recognition performance. However, low-quality pseudo-labels can misguide decision … |
ZEZHONG JIN et. al. | ArXiv | 2022-10-28 |
791 | Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. |
TAKAAKI SAEKI et. al. | arxiv-cs.SD | 2022-10-27 |
792 | Automatic Severity Classification of Dysarthric Speech By Using Self-supervised Model with Multi-task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using the self-supervised model in conjunction with multi-task learning. |
Eun Jung Yeo; Kwanghee Choi; Sunhee Kim; Minhwa Chung; | arxiv-cs.CL | 2022-10-27 |
793 | SAN: A Robust End-to-end ASR Model Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. |
Zeping Min; Qian Ge; Guanhua Huang; | arxiv-cs.SD | 2022-10-27 |
794 | Improving Speech-to-Speech Translation Through Unlabeled Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data to improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
XUAN-PHI NGUYEN et. al. | arxiv-cs.CL | 2022-10-26 |
795 | End-to-End Speech to Intent Prediction to Improve E-commerce Customer Support Voicebot in Hindi and English Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automation of on-call customer support relies heavily on accurate and efficient speech-to-intent (S2I) systems. Building such systems using multi-component pipelines can pose … |
Abhinav Goyal; Ashutosh Kumar Singh; Nikesh Garera; | ArXiv | 2022-10-26 |
796 | Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite joining four models into one, our unified tagging approach matches or outperforms task-specific models across all four tasks on benchmark test sets across several domains. |
Sharman Tan; Piyush Behre; Nick Kibre; Issac Alphonso; Shuangyu Chang; | arxiv-cs.CL | 2022-10-26 |
797 | ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To promote the development of multi-domain speech systems, we introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition (ASR) system across a broad set of speech datasets. |
Sanchit Gandhi; Patrick von Platen; Alexander M. Rush; | arxiv-cs.CL | 2022-10-24 |
798 | Investigating Self-supervised, Weakly Supervised and Fully Supervised Training Approaches for Multi-domain Automatic Speech Recognition: A Study on Bangladeshi Bangla Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the robustness of the state-of-the-art transfer learning approaches such as self-supervised wav2vec 2.0 and weakly supervised Whisper as well as fully supervised convolutional neural networks (CNNs) for multi-domain ASR. |
AHNAF MOZIB SAMIN et. al. | arxiv-cs.CL | 2022-10-23 |
799 | Beyond Subtitles: Captioning and Visualizing Non-speech Sounds to Improve Accessibility of User-Generated Videos Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Captioning provides access to sounds in audio-visual content for people who are Deaf or Hard-of-hearing (DHH). As user-generated content in online videos grows in prevalence, … |
Oliver Alonzo; Hijung Valentina Shin; Dingzeyu Li; | Proceedings of the 24th International ACM SIGACCESS … | 2022-10-22 |
800 | Guided Contrastive Self-supervised Pre-training for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC). |
Aparna Khare; Minhua Wu; Saurabhchand Bhati; Jasha Droppo; Roland Maas; | arxiv-cs.CL | 2022-10-21 |
801 | Deep LSTM Spoken Term Detection Using Wav2Vec 2.0 Recognizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use the Wav2Vec speech recognizer in the task of spoken term detection over a large set of spoken documents. |
Jan Švec; Jan Lehečka; Luboš Šmídl; | arxiv-cs.CL | 2022-10-21 |
802 | A Textless Metric for Speech-to-Speech Comparison Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. |
Laurent Besacier; Swen Ribeiro; Olivier Galibert; Ioan Calapodescu; | arxiv-cs.CL | 2022-10-21 |
803 | Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. |
THIEN NGUYEN et. al. | arxiv-cs.SD | 2022-10-21 |
804 | End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel end-to-end architecture by integrating dereverberation, beamforming, SSLR, and ASR within a single neural network. |
Yoshiki Masuyama; Xuankai Chang; Samuele Cornell; Shinji Watanabe; Nobutaka Ono; | arxiv-cs.SD | 2022-10-19 |
805 | Throat Microphone Speech Recognition Using Wav2vec 2.0 and Feature Mapping Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Throat microphones can record the voice while simultaneously suppressing the impact of external noise. This work aims to improve speech recognition performance using throat … |
Kohta Masuda; J. Ogata; M. Nishida; M. Nishimura; | 2022 IEEE 11th Global Conference on Consumer Electronics … | 2022-10-18 |
806 | HMM Vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). |
Tina Raissi; Wei Zhou; Simon Berger; Ralf Schlüter; Hermann Ney; | arxiv-cs.SD | 2022-10-18 |
807 | Maestro-U: Leveraging Joint Speech-text Representation Learning for Zero Supervised Speech ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that a modality-matched joint speech and text model can be leveraged to train a massively multilingual ASR model without any supervised (manually transcribed) speech for some languages. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2022-10-18 |
808 | Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, only a small amount of transcribed and aligned CS speech is available. To overcome this problem and train multilingual systems that can transcribe CS speech, we propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated. |
Enes Yavuz Ugan; Christian Huber; Juan Hussain; Alexander Waibel; | arxiv-cs.CL | 2022-10-17 |
809 | Experiments on Turkish ASR with Self-Supervised Speech Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this report, we present our findings on Turkish ASR with speech representation learning using HUBERT. |
Ali Safaya; Engin Erzin; | arxiv-cs.CL | 2022-10-13 |
810 | Summary on The ISCSLP 2022 Chinese-English Code-Switching ASR Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we will describe the datasets, the associated baselines system and the requirements, and summarize the CSASR challenge results and major techniques and tricks used in the submitted systems. |
SHUHAO DENG et. al. | arxiv-cs.CL | 2022-10-12 |
811 | A Context-aware Knowledge Transferring Strategy for CTC-based ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the challenge, we propose a context-aware knowledge transferring strategy, consisting of a knowledge transferring module and a context-aware training strategy, for CTC-based ASR. |
Ke-Han Lu; Kuan-Yu Chen; | arxiv-cs.CL | 2022-10-12 |
812 | Language Identification-Based Evaluation of Single Channel Speech Separation of Overlapped Speeches Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In multi-lingual, multi-speaker environments (e.g., international conference scenarios), speech, language, and background sounds can overlap. In real-world scenarios, source … |
Zuhragvl Aysa; Mijit Ablimit; Hankiz Yilahun; A. Hamdulla; | Inf. | 2022-10-11 |
813 | An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora. |
Chao-Han Huck Yang; I-Fan Chen; Andreas Stolcke; Sabato Marco Siniscalchi; Chin-Hui Lee; | arxiv-cs.SD | 2022-10-11 |
814 | Streaming Punctuation for Long-form Dictation with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and measure its impact on punctuation and segmentation accuracy across scenarios. |
Piyush Behre; Sharman Tan; Padma Varadharajan; Shuangyu Chang; | arxiv-cs.CL | 2022-10-11 |
815 | Automatic Speech Recognition of Low-Resource Languages Based on Chukchi Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The following paper presents a project focused on the research and creation of a new Automatic Speech Recognition (ASR) system for the Chukchi language. |
Anastasia Safonova; Tatiana Yudina; Emil Nadimanov; Cydnie Davenport; | arxiv-cs.CL | 2022-10-11 |
816 | CTC Alignments Improve Autoregressive Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework wherein CTC’s core properties can counteract several key weaknesses of pure-attention models during training and decoding. |
BRIAN YAN et. al. | arxiv-cs.CL | 2022-10-11 |
817 | A Platform for Deploying The TFE Ecosystem of Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Since data regulations such as the European Union’s General Data Protection Regulation (GDPR) have taken effect, the traditional two-step Automatic Speech Recognition (ASR) … |
YUANFENG SONG et. al. | Proceedings of the 30th ACM International Conference on … | 2022-10-10 |
818 | Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides an overall introduction of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. |
LEI WANG et. al. | arxiv-cs.CL | 2022-10-07 |
819 | Pronunciation Modeling of Foreign Words for Mandarin ASR By Considering The Effect of Language Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on examining the phonetic effect of language transfer in automatic speech recognition. |
Lei Wang; Rong Tong; | arxiv-cs.CL | 2022-10-07 |
820 | Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose several techniques such as a limited training strategy and regularized adapter modules for the Transducer encoder, prediction, and joiner network. |
Somshubra Majumdar; Shantanu Acharya; Vitaly Lavrukhin; Boris Ginsburg; | arxiv-cs.SD | 2022-10-06 |
821 | JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a … |
Mayumi Ohta; Julia Kreutzer; Stefan Riezler; | arxiv-cs.CL | 2022-10-05 |
822 | Code-Switching Without Switching: Language Agnostic End-to-End Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we eliminate the need for that, by treating speech recognition and translation as one unified end-to-end speech translation problem. |
Christian Huber; Enes Yavuz Ugan; Alexander Waibel; | arxiv-cs.CL | 2022-10-04 |
823 | IoT Device Control with Offline Automatic Speech Recognition on Edge Device Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) on edge devices is still barely used in industry. Most ASR systems, such as speech-to-text, commonly depend on network presence. This is the … |
Panji Setiawan; Rahadian Yusuf; | 2022 12th International Conference on System Engineering … | 2022-10-03 |
824 | Tamil Speech Recognition Using XLSR Wav2Vec2.0 & CTC Algorithm Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition is a promising research topic with lots of real-world applications like virtual assistants, aids for the physically challenged, etc. Tamil language speech … |
A. Akhilesh; Brinda P; Keerthana S; Deepa Gupta; Susmitha Vekkot; | 2022 13th International Conference on Computing … | 2022-10-03 |
825 | Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end automatic speech recognition (ASR) has become a popular alternative to traditional module-based systems, simplifying the model-building process with a single deep … |
Yosuke Higuchi; Niko Moritz; J. Le Roux; Takaaki Hori; | IEEE Journal of Selected Topics in Signal Processing | 2022-10-01 |
826 | Towards Better Domain Adaptation for Self-Supervised Models: A Case Study of Child ASR IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods … |
Ruchao Fan; Yunzheng Zhu; Jinhan Wang; A. Alwan; | IEEE Journal of Selected Topics in Signal Processing | 2022-10-01 |
827 | Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). |
CHENDONG ZHAO et. al. | arxiv-cs.CL | 2022-09-29 |
828 | DAMO-NLP at NLPCC-2022 Task 2: Knowledge Enhanced Robust NER for Speech Entity Linking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach called Knowledge Enhanced Named Entity Recognition (KENER), which focuses on improving robustness through painlessly incorporating proper knowledge in the entity recognition stage and thus improving the overall performance of entity linking. |
SHEN HUANG et. al. | arxiv-cs.CL | 2022-09-27 |
829 | Unsupervised Domain Adaptation for Speech Recognition with Unsupervised Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an unsupervised error correction method for unsupervised ASR domain adaption, aiming to recover transcription errors caused by domain mismatch. |
Long Mai; Julie Carson-Berndsen; | arxiv-cs.SD | 2022-09-24 |
830 | A Russian Continuous Speech Recognition System Based on The DTW Algorithm Under Artificial Intelligence Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To improve continuous speech recognition, this paper applies the DTW algorithm to construct a Russian continuous speech recognition system and proposes a … |
Chunping Yu; Xin Eric Wang; | J. Robotics | 2022-09-19 |
831 | Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Evaluating automatic speech recognition (ASR) systems is a classical but difficult and still open problem, which often boils down to focusing only on the word error rate (WER). … |
Thibault Bañeras-Roux; Mickael Rouvier; Jane Wottawa; Richard Dufour; | Interspeech | 2022-09-18 |
832 | Reducing Multilingual Context Confusion for End-to-end Code-switching Automatic Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Code-switching deals with alternating languages in the communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially … |
SHUAI ZHANG et. al. | Interspeech | 2022-09-18 |
833 | Articulatory Synthesis for Data Augmentation in Phoneme Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While numerous studies on automatic speech recognition have been published in recent years describing data augmentation strategies based on time or frequency domain signal … |
P. K. KRUG et. al. | Interspeech | 2022-09-18 |
834 | Mitigating Bias Against Non-native Accents IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) systems have seen substantial improvements in the past decade; however, not for all speaker groups. Recent research shows that bias exists …
Yuanyuan Zhang; Yixuan Zhang; B. Halpern; T. Patel; O. Scharenborg; | Interspeech | 2022-09-18 |
835 | Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Machine learning (ML) and Deep Neural Networks (DNN) have greatly aided the problem of Automatic Speech Recognition (ASR). However, accurate ASR for dysarthric speech remains a …
Chitralekha Bhat; Ashish Panda; H. Strik; | Interspeech | 2022-09-18 |
836 | Finer-grained Modeling Units-based Meta-Learning for Low-resource Tibetan Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Tibetan is a typical under-resourced language due to its relatively smaller population. Although a character-based end-to-end (E2E) automatic speech recognition (ASR) model with … |
Siqing Qin; Longbiao Wang; Sheng Li; Yuqin Lin; J. Dang; | Interspeech | 2022-09-18 |
837 | Enhancing Speech Privacy with Slicing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Privacy preservation calls for anonymization methods which hide the speaker’s identity in speech signals while minimizing the impact on downstream tasks such as automatic speech …
MOHAMED MAOUCHE et. al. | Interspeech | 2022-09-18 |
838 | External Text Based Data Augmentation for Low-Resource Speech Recognition in The Constrained Condition of OpenASR21 Challenge Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes our USTC NELSLIP system submitted to the Open Automatic Speech Recognition (OpenASR21) Challenge for the Constrained condition, where only a 10-hour speech … |
GUOLONG ZHONG et. al. | Interspeech | 2022-09-18 |
839 | Incremental Learning for RNN-Transducer Based Speech Recognition Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper investigates an incremental learning framework for a real-world voice assistant employing an RNN-Transducer based automatic speech recognition (ASR) model. Such a model …
Deepak Baby; Pasquale D’Alterio; Valentin Mendelev; | Interspeech | 2022-09-18 |
840 | Preventing Sensitive-word Recognition Using Self-supervised Learning to Preserve User-privacy for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Smart voice assistants that rely on automatic speech recognition (ASR) are widely used by people for multiple reasons. These devices, however, feature “always on” microphones that … |
Yuchen Liu; Apu Kapadia; D. Williamson; | Interspeech | 2022-09-18 |
841 | Generalized Keyword Spotting Using ASR Embeddings IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Keyword Spotting (KWS) detects a set of pre-defined spoken keywords. Building a KWS system for an arbitrary set requires massive training datasets. We propose to use the text … |
K. R.; V. Kurmi; Vinay Namboodiri; C. V. Jawahar; | Interspeech | 2022-09-18 |
842 | End-to-End Dependency Parsing of Spoken French Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Research efforts in syntactic parsing have focused on written texts. As a result, speech parsing is usually performed on transcriptions, either in unrealistic settings (gold … |
Adrien Pupier; Maximin Coavoux; B. Lecouteux; Jérôme Goulian; | Interspeech | 2022-09-18 |
843 | Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Conventional Automatic Speech Recognition (ASR) systems are susceptible to dialect variations within a language, thereby adversely affecting the ASR. Therefore, the current … |
Aditya Yadavalli; Ganesh S Mirishkar; A. Vuppala; | Interspeech | 2022-09-18 |
844 | Global RNN Transducer Models For Multi-dialect Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Constructing single, unified automatic speech recognition (ASR) models that work effectively across various dialects of a language is a challenging problem. Although many recently … |
TAKASHI FUKUDA et. al. | Interspeech | 2022-09-18 |
845 | Improving ASR Robustness in Noisy Condition Through VAD Integration Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) systems are often deployed together with a voice activity detection (VAD) system to run ASR only on the voiced acoustic signals. Although it can … |
Sashi Novitasari; Takashi Fukuda; Gakuto Kurata; | Interspeech | 2022-09-18 |
846 | End-to-End Spontaneous Speech Recognition Using Disfluency Labeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Spontaneous speech often contains disfluent acoustic features such as fillers and hesitations, which are major causes of errors during automatic speech recognition (ASR). In this … |
KOHARU HORII et. al. | Interspeech | 2022-09-18 |
847 | OpenASR21: The Second Open Challenge for Automatic Speech Recognition of Low-Resource Languages Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In 2021, the National Institute of Standards and Technology (NIST), in cooperation with the Intelligence Advanced Research Project Activity (IARPA), conducted OpenASR21, the … |
Kay Peterson; Audrey Tong; Yan Yu; | Interspeech | 2022-09-18 |
848 | Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The performance of automatic speech recognition (ASR) systems strongly deteriorates when the desired speech signal is contaminated with room reverberation and when the speech of … |
Mieszko Fraś; Marcin Witkowski; K. Kowalczyk; | Interspeech | 2022-09-18 |
849 | Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained … |
Geoffroy Vanderreydt; François Remy; Kris Demuynck; | Interspeech | 2022-09-18 |
850 | Ant Multilingual Recognition System for OLR 2021 Challenge Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a comprehensive description of the Ant multilingual recognition system for the 6th Oriental Language Recognition (OLR 2021) Challenge. Inspired by the transfer …
Anqi Lyu; Zhiming Wang; Huijia Zhu; | Interspeech | 2022-09-18 |
851 | Improving Recognition of Out-of-vocabulary Words in E2E Code-switching ASR By Fusing Speech Generation Methods Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Out-of-vocabulary (OOV) is a common problem for end-to-end (E2E) ASR. For code-switching (CS), the OOV problem on the embedded language is further aggravated and becomes a … |
LINGXUAN YE et. al. | Interspeech | 2022-09-18 |
852 | Prompt-based Re-ranking Language Model for ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In Automatic Speech Recognition (ASR), language model re-ranking based on unlabeled text can improve performance and realize flexible scene adaptation. The scheme of ASR …
Mengxi Nie; Ming Yan; Caixia Gong; | Interspeech | 2022-09-18 |
853 | Gram Vaani ASR Challenge on Spontaneous Telephone Speech Recordings in Regional Variations of Hindi IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the corpus and baseline systems for the Gram Vaani Automatic Speech Recognition (ASR) challenge in regional variations of Hindi. The corpus for this challenge … |
ANISH BHANUSHALI et. al. | Interspeech | 2022-09-18 |
854 | Analysis of The Effect of Audio Data Augmentation Techniques on Phone Digit Recognition For Algerian Arabic Dialect Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this study, we describe a solution for dealing with the problem of data scarcity in Speech Processing tasks involving low-resource languages, including Automatic Speech … |
Khaled Lounnas; Mohamed Lichouri; Mourad Abbas; | 2022 International Conference on Advanced Aspects of … | 2022-09-17 |
855 | Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that modern ASR architectures, specifically ones based on Self-Supervised Learning, are in fact vulnerable to transferability. |
Raphael Olivier; Hadi Abdullah; Bhiksha Raj; | arxiv-cs.LG | 2022-09-17 |
856 | MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a MVNet consisted of a memory assistance module which improves the performance of downstream ASR and a vocal reinforcement module which boosts the performance of ASV. |
JIANRONG WANG et. al. | arxiv-cs.SD | 2022-09-15 |
857 | Non-Parallel Voice Conversion for ASR Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that voice conversion can be used as a data augmentation technique to improve ASR performance, even on LibriSpeech, which contains 2,456 speakers. |
GARY WANG et. al. | arxiv-cs.SD | 2022-09-14 |
858 | A Hybrid TDNN-HMM Automatic Speech Recognizer for Filipino Children’s Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Previous studies presented in the literature in the recent years have shown the feasibility of developing an automatic speech recognition (ASR) system for Filipino-speaking … |
John Andrew Y. Ing; Ronald M. Pascual; Francis D. Dimzon; | 2022 IEEE International Conference on Artificial … | 2022-09-13 |
859 | Bengali Speech Recognition: An Overview Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study outlines the notable efforts of creating of automatic speech recognition (ASR) system in Bengali. It describes data from the Bengali language’s existing voice corpus … |
MASHUK AREFIN PRANJOL et. al. | 2022 IEEE International Conference on Artificial … | 2022-09-13 |
860 | Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although over 300M around the world speak Bangla, scant work has been done in improving Bangla voice-to-text transcription due to Bangla being a low-resource language. However, … |
Mohammed Rakib; Md. Ismail Hossain; Nabeel Mohammed; F. Rahman; | Proceedings of the 2023 12th International Conference on … | 2022-09-13 |
861 | Automatic Speech Recognition Systems: A Survey of Discriminative Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View |
Amrit Kaur; Ashutosh Kumar Singh; Rohit Sachdeva; Vinay Kukreja; | Multimedia Tools and Applications | 2022-09-09 |
862 | Conversion of Acoustic Signal (Speech) Into Text By Digital Filter Using Natural Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we created an interface that transforms speech and other auditory inputs into text using a digital filter. |
Abhiram Katuri; Sindhu Salugu; Gelli Tharuni; Challa Sri Gouri; | arxiv-cs.AI | 2022-09-09 |
863 | Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present a new way to group multiple low-resource locales together and optimize the performance of Multilingual Transformer LMs in ASR. |
Li Miao; Jian Wu; Piyush Behre; Shuangyu Chang; Sarangarajan Parthasarathy; | arxiv-cs.CL | 2022-09-08 |
864 | Distilling The Knowledge of BERT for CTC-based ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR. |
Hayato Futami; Hirofumi Inaguma; Masato Mimura; Shinsuke Sakai; Tatsuya Kawahara; | arxiv-cs.CL | 2022-09-05 |
865 | Predict-and-Update Network: Audio-Visual Speech Recognition Inspired By Human Speech Perception IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Human studies suggest that visual signal primes the listener in advance as to when and on which frequency to attend to. We propose a Predict-and-Update Network (P&U net), to simulate such a visual cueing mechanism for Audio-Visual Speech Recognition (AVSR). |
Jiadong Wang; Xinyuan Qian; Haizhou Li; | arxiv-cs.MM | 2022-09-05 |
866 | Deep Sparse Conformer for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We improve Conformer’s long-sequence representation ability in two directions, \emph{sparser} and \emph{deeper}. |
Xianchao Wu; | arxiv-cs.CL | 2022-09-01 |
867 | Visual Speech Recognition in A Driver Assistance System IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Visual speech recognition or automated lip-reading is a field of growing attention. Video data proved its usefulness in multimodal speech recognition, especially when acoustic … |
D. Ivanko; D. Ryumin; Alexey Kashevnik; A. Axyonov; Alexey Karpov; | 2022 30th European Signal Processing Conference (EUSIPCO) | 2022-08-29 |
868 | DualVoice: Speech Interaction That Discriminates Between Normal and Whispered Voice Input Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Interactions based on automatic speech recognition (ASR) have become widely used, with speech input being increasingly utilized to create documents. However, as there is no easy … |
J. Rekimoto; | Proceedings of the 35th Annual ACM Symposium on User … | 2022-08-22 |
869 | Audio-Driven Deformation Flow for Effective Lip Reading Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Lip reading, also known as visual speech recognition (VSR), is the task to recognize the speech content using only the visual modality. Inspired by the natural synchronization … |
Dalu Feng; Shuang Yang; S. Shan; Xilin Chen; | 2022 26th International Conference on Pattern Recognition … | 2022-08-21 |
870 | Synthesising Audio Adversarial Examples for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the first time, we propose the Speech Synthesising based Attack (SSA), a novel threat model that constructs audio adversarial examples entirely from scratch, i.e., without depending on any existing audio to fool cutting-edge ASR models. To this end, we introduce a conditional variational auto-encoder (CVAE) as the speech synthesiser. |
XINGHUA QU et. al. | kdd | 2022-08-12 |
871 | ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by these challenges, in this paper we use a cloud based framework for production systems to demonstrate insights from privacy preserving incremental learning for automatic speech recognition (ILASR). |
GOPINATH CHENNUPATI et. al. | kdd | 2022-08-12 |
872 | Thai Wav2Vec2.0 with CommonVoice V8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, most of the Thai ASR models are closed-sourced, and the performance of existing open-sourced models lacks robustness. To address this problem, we train a new ASR model on a pre-trained XLSR-Wav2Vec model with the Thai CommonVoice corpus V8 and train a trigram language model to boost the performance of our ASR model. |
Wannaphong Phatthiyaphaibun; Chompakorn Chaksangchaichot; Peerat Limkonchotiwat; Ekapol Chuangsuwanich; Sarana Nutanong; | arxiv-cs.CL | 2022-08-09 |
873 | Large Vocabulary Speech Recognition for Languages of Africa: Multilingual Modeling and Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data and collected data for 15 languages, and trained experimental models using these techniques. |
SANDY RITCHIE et. al. | arxiv-cs.CL | 2022-08-05 |
874 | Automatic Speech Recognition in German: A Detailed Error Analysis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The amount of freely available systems for automatic speech recognition (ASR) based on neural networks is growing steadily, with equally increasingly reliable predictions. … |
Johannes Wirth; R. Peinl; | 2022 IEEE International Conference on Omni-layer … | 2022-08-01 |
875 | Self-conducted Speech Audiometry Using Automatic Speech Recognition: Simulation Results for Listeners with Hearing Loss Related Papers Related Patents Related Grants Related Venues Related Experts View |
Jasper Ooster; Laura Tuschen; B. Meyer; | Comput. Speech Lang. | 2022-08-01 |
876 | Global Performance Disparities Between English-Language Accents in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we expand the discussion beyond bias as a function of the individual national origin of the speaker to look for bias as a function of the geopolitical orientation of their nation of origin. |
Alex DiChristofano; Henry Shuster; Shefali Chandra; Neal Patwari; | arxiv-cs.CL | 2022-08-01 |
877 | Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: Cumulating Speech Resources in A Pluricentric Language IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Wei Xue; C. Cucchiarini; R. Hout; H. Strik; | Speech Commun. | 2022-08-01 |
878 | Pronunciation-aware Unique Character Encoding for RNN Transducer-based Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to use a novel pronunciation-aware unique character encoding for building E2E RNN-T-based Mandarin ASR systems. |
Peng Shen; Xugang Lu; Hisashi Kawai; | arxiv-cs.CL | 2022-07-29 |
879 | Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new approach to perform unsupervised fine-tuning and self-training using unlabeled speech data for recurrent neural network (RNN)-Transducer (RNN-T) end-to-end (E2E) automatic speech recognition (ASR) systems. |
Cong-Thanh Do; Mohan Li; Rama Doddipatla; | arxiv-cs.CL | 2022-07-29 |
880 | Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents our efforts to build a robust ASR model for the shared task Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022). |
Alef Iury Siqueira Ferreira; Gustavo dos Reis Oliveira; | arxiv-cs.CL | 2022-07-28 |
881 | Automatic Speech Recognition Using Limited Vocabulary: A Survey IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support … |
Jean Louis Fendji Kedieng Ebongue; D. Tala; B. Yenke; M. Atemkeng; | Applied Artificial Intelligence | 2022-07-25 |
882 | Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this article, gesture recognition and speech recognition applications are implemented on embedded systems with Tiny Machine Learning (TinyML). The main benefit of using TinyML …
V. VISWANATHA et. al. | ArXiv | 2022-07-23 |
883 | Toward Fairness in Speech Recognition: Discovery and Mitigation of Performance Disparities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we report on initial findings with both discovery and mitigation of performance disparities using data from a product-scale AI assistant speech recognition system. |
PRANAV DHERAM et. al. | arxiv-cs.CL | 2022-07-22 |
884 | ASR Error Detection Via Audio-Transcript Entailment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. |
Nimshi Venkat Meripo; Sandeep Konam; | arxiv-cs.CL | 2022-07-21 |
885 | When Is TTS Augmentation Through A Pivot Language Useful? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an alternative: produce synthetic audio by running text from the target language through a trained TTS system for a higher-resource pivot language. |
Nathaniel Robinson; Perez Ogayo; Swetha Gangu; David R. Mortensen; Shinji Watanabe; | arxiv-cs.CL | 2022-07-20 |
886 | ASRTest: Automated Testing for Deep-neural-network-driven Speech Recognition Systems IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the rapid development of deep neural networks and end-to-end learning techniques, automatic speech recognition (ASR) systems have been deployed into our daily life and assist in …
Pin Ji; Yang Feng; Jia Liu; Zhihong Zhao; Zhenyu Chen; | Proceedings of the 31st ACM SIGSOFT International Symposium … | 2022-07-18 |
887 | Self-supervised Learning with Random-projection Quantizer for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple and effective self-supervised learning approach for speech recognition. |
Chung-Cheng Chiu; James Qin; Yu Zhang; Jiahui Yu; Yonghui Wu; | icml | 2022-07-15 |
888 | Real-Time End-to-End Speech Emotion Recognition with Cross-Domain Adaptation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Language resources are the main factor in speech-emotion-recognition (SER)-based deep learning models. Thai is a low-resource language that has a smaller data size than … |
K. Wongpatikaseree; Sattaya Singkul; Narit Hnoohom; Sumeth Yuenyong; | Big Data Cogn. Comput. | 2022-07-15 |
889 | Data Augmentation for Low-Resource Quechua ASR Improvement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe our data augmentation approach to improve the results of ASR models for low-resource and agglutinative languages. |
Rodolfo Zevallos; Nuria Bel; Guillermo Cámbara; Mireia Farrús; Jordi Luque; | arxiv-cs.SD | 2022-07-14 |
890 | Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature Enhancement module (V-CAFE) to enhance the input noisy audio speech with a help of audio-visual correspondence. |
Joanna Hong; Minsu Kim; Daehun Yoo; Yong Man Ro; | arxiv-cs.SD | 2022-07-13 |
891 | Huqariq: A Multilingual Speech Corpus of Native Languages of Peru For Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Huqariq corpus is a multilingual collection of speech from native Peruvian languages. The transcribed corpus is intended for the research and development of speech …
Rodolfo Zevallos; Luis Camacho; Nelsi Melgarejo; | ArXiv | 2022-07-12 |
892 | Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to verify the quality of the corpus, we present speech recognition experiments using 220 hours of fully transcribed audio. |
Rodolfo Zevallos; Luis Camacho; Nelsi Melgarejo; | arxiv-cs.CL | 2022-07-12 |
893 | Multi-Task Conformer with Multi-Feature Combination for Speech Emotion Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Along with automatic speech recognition, many researchers have been actively studying speech emotion recognition, since emotion information is as crucial as the textual … |
Jiyoung Seo; Bowon Lee; | Symmetry | 2022-07-12 |
894 | Speaker Consistency Loss and Step-wise Optimization for Semi-supervised Joint Training of TTS and ASR Using Unpaired Text Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the semi-supervised joint training of text to speech (TTS) and automatic speech recognition (ASR), where a small amount of paired data and a large amount of unpaired text data are available. |
Naoki Makishima; Satoshi Suzuki; Atsushi Ando; Ryo Masumura; | arxiv-cs.SD | 2022-07-11 |
895 | Online Continual Learning of End-to-End Speech Recognition Models IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic … |
Muqiao Yang; I. Lane; Shinji Watanabe; | Interspeech | 2022-07-11 |
896 | Non-Autoregressive Chinese ASR Error Correction with Phonological Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the errors introduced by ASR systems will impair the performance of downstream tasks, we introduce a post-processing error correction method, PhVEC, to correct errors in text space. |
Zheng Fang; Ruiqing Zhang; Zhongjun He; Hua Wu; Yanan Cao; | naacl | 2022-07-09 |
897 | Investigating The Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel data-driven approach is proposed to investigate the cross-lingual acoustic-phonetic similarities. |
Muhammad Umar Farooq; Thomas Hain; | arxiv-cs.CL | 2022-07-07 |
898 | Improving Transformer-based Conversational ASR By Inter-Sentential Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to explicitly model the inter-sentential information in a Transformer based end-to-end architecture for conversational speech recognition. |
Kun Wei; Pengcheng Guo; Ning Jiang; | arxiv-cs.SD | 2022-07-02 |
899 | Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR. |
Guangzhi Sun; Chao Zhang; Philip C. Woodland; | arxiv-cs.SD | 2022-07-02 |
900 | Adversarial Example Attacks Against ASR Systems: An Overview Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the development of hardware and algorithms, ASR (Automatic Speech Recognition) systems have evolved a lot. As the models get simpler, the difficulty of development and deployment …
XIAO ZHANG et. al. | 2022 7th IEEE International Conference on Data Science in … | 2022-07-01 |
901 | SpeechHide: A Hybrid Privacy-preserving Mechanism for Speech Content and Voiceprint in Speech Data Sharing Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the development of speech technology, huge amounts of speech data generated by users is collected by speech service providers and may be used for data sharing. However, … |
Yu Hu; Ran Li; Simin Wang; Fuqiang Tao; Zhe Sun; | 2022 7th IEEE International Conference on Data Science in … | 2022-07-01 |
902 | Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining Vs. Semi-Supervised Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate continued pretraining (CoPT) with unlabeled in-language audio data on the XLSR-53 pretrained model in several low-resource languages. |
Mitchell DeHaven; Jayadev Billa; | arxiv-cs.CL | 2022-07-01 |
903 | FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models. |
Szu-Jui Chen; Jiamin Xie; John H. L. Hansen; | arxiv-cs.SD | 2022-06-30 |
904 | A NLP-based Approach to Improve Speech Recognition Services for People with Speech Disorders Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Current speech recognition services are not suitable for people with speech disorders, who have difficulties in coordinating muscles and articulating words and sentences. In …
A. Celesti; M. Fazio; Lorenzo Carnevale; M. Villari; | 2022 IEEE Symposium on Computers and Communications (ISCC) | 2022-06-30 |
905 | Space-Efficient Representation of Entity-centric Query Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. |
Christophe Van Gysel; Mirko Hannemann; Ernest Pusateri; Youssef Oualil; Ilya Oparin; | arxiv-cs.CL | 2022-06-29 |
906 | Bengali Common Voice Speech Dataset for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present insights obtained from the dataset and discuss key linguistic challenges that need to be addressed in future versions. |
SAMIUL ALAM et. al. | arxiv-cs.CL | 2022-06-28 |
907 | TALCS: An Open-Source Mandarin-English Code-Switching Corpus and A Speech Recognition Baseline IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we will introduce the recording procedure in detail, including audio capturing devices and corpus environments. |
CHENGFEI LI et. al. | arxiv-cs.CL | 2022-06-27 |
908 | TEVR: Improving Speech Recognition By Token Entropy Variance Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents TEVR, a speech recognition model designed to minimize the variation in token entropy w.r.t. the language model.
Hajo Nils Krabbenhöft; Erhardt Barth; | arxiv-cs.CL | 2022-06-25 |
909 | Distilling A Pretrained Language Model to A Multilingual ASR Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method called the Distilling a Language model to a Speech model (Distill-L2S), which aligns the latent representations of two different modalities. |
Kwanghee Choi; Hyung-Min Park; | arxiv-cs.CL | 2022-06-25 |
910 | Pruned RNN-T for Fast, Memory-efficient ASR Training IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with … |
FANGJUN KUANG et. al. | Interspeech | 2022-06-23 |
911 | A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) has been dominated by deep learning-based end-to-end speech recognition models. These approaches require large amounts of labeled data in the … |
Raviraj Joshi; Ashutosh Kumar Singh; | ArXiv | 2022-06-22 |
912 | Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance and also the pre-training efficiency, either through decoding with a hybrid ASR system to generate phoneme-level alignments (named PBERT), or performing clustering on the supervised speech features extracted from an end-to-end CTC model (named CTC clustering). |
CHENGYI WANG et. al. | arxiv-cs.CL | 2022-06-21 |
913 | The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the Makerere Artificial Intelligence research lab releases a Luganda radio speech corpus of 155 hours.
Jonathan Mukiibi; Andrew Katumba; Joyce Nakatumba-Nabende; Ali Hussein; Josh Meyer; | arxiv-cs.CL | 2022-06-20 |
914 | Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This gap introduces serious problems for ASR systems, especially when training or evaluating ASR models on datasets containing a lot of colloquial speech, such as the MALACH project. In this paper, we are addressing this problem in the light of a new paradigm in end-to-end ASR systems — recently introduced self-supervised audio Transformers. |
Jan Lehečka; Josef V. Psutka; Josef Psutka; | arxiv-cs.CL | 2022-06-15 |
915 | AVATAR: Unconstrained Audiovisual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is particularly useful for unconstrained videos, where the speaker is not necessarily visible. To solve this task, we propose a new sequence-to-sequence AudioVisual ASR TrAnsformeR (AVATAR) which is trained end-to-end from spectrograms and full-frame RGB. |
VALENTIN GABEUR et. al. | arxiv-cs.CV | 2022-06-15 |
916 | Exploring Capabilities of Monolingual Audio Transformers Using Large Datasets in Automatic Speech Recognition of Czech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our progress in pretraining Czech monolingual audio transformers from a large dataset containing more than 80 thousand hours of unlabeled speech, and subsequently fine-tuning the model on automatic speech recognition tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. |
Jan Lehečka; Jan Švec; Aleš Pražák; Josef V. Psutka; | arxiv-cs.CL | 2022-06-15 |
917 | Jira: A Central Kurdish Speech Recognition System, Designing and Building Speech Corpus and Pronunciation Lexicon Related Papers Related Patents Related Grants Related Venues Related Experts View |
H. Veisi; Hawre Hosseini; Mohammad MohammadAmini; Wirya Fathy; A. Mahmudi; | Language Resources and Evaluation | 2022-06-14 |
918 | Joint Encoder-Decoder Self-Supervised Pre-training for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Usually, an encoder-decoder architecture works exceptionally well for a sequence-to-sequence task like ASR. Therefore, in this paper, we propose a new paradigm that exploits the power of a decoder during self-supervised learning.
Arunkumar A; Umesh S; | arxiv-cs.CL | 2022-06-09 |
919 | The Necessity of Emotion Recognition from Speech Signals for Natural and Effective Human-Robot Interaction in Society 5.0 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The history of humanity has reached Industry 4.0 that aims to the integration of information technologies and especially artificial intelligence with all life-sustaining … |
Yeşím Ülgen Sönmez; A. Varol; | 2022 10th International Symposium on Digital Forensics and … | 2022-06-06 |
920 | Lip-Listening: Mixing Senses to Understand Lips Using Cross Modality Knowledge Distillation for Word-Based Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize … |
Hadeel Mabrouk; Omar Abugabal; Nourhan Sakr; Hesham M. Eraqi; | ArXiv | 2022-06-05 |
921 | LAE: Language-Aware Encoder for Monolingual and Multilingual ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. |
JINCHUAN TIAN et. al. | arxiv-cs.CL | 2022-06-05 |
922 | Adaptive Activation Network For Low Resource Multilingual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduced an adaptive activation network into the upper layers of the ASR model, and applied different activation functions to different languages.
Jian Luo; Jianzong Wang; Ning Cheng; Zhenpeng Zheng; Jing Xiao; | arxiv-cs.CL | 2022-05-28 |
923 | Contextual Adapters for Personalized Speech Recognition in Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose training neural contextual adapters for personalization in neural transducer based ASR models. |
KANTHASHREE MYSORE SATHYENDRA et. al. | arxiv-cs.CL | 2022-05-26 |
924 | Global Normalization for Streaming Speech Recognition in A Modular Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. |
EHSAN VARIANI et. al. | arxiv-cs.LG | 2022-05-26 |
925 | On Building Spoken Language Understanding Systems for Low Resourced Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a series of experiments to explore extremely low-resourced settings where we perform intent classification with systems trained on as few as one data point per intent and with only one speaker in the dataset.
Akshat Gupta; | arxiv-cs.CL | 2022-05-25 |
926 | Heterogeneous Reservoir Computing Models for Persian Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the accuracy of the RC in ASR applications, we propose heterogeneous single and multi-layer ESNs to create non-linear transformations of the inputs that capture temporal context at different scales. |
Zohreh Ansari; Farzin Pourhoseini; Fatemeh Hadaeghi; | arxiv-cs.SD | 2022-05-25 |
927 | FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. |
ALEXIS CONNEAU et. al. | arxiv-cs.CL | 2022-05-24 |
928 | Improved Language Models for ASR Using Written Language Text Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The performance of an Automatic Speech Recognition (ASR) engine primarily depends on ($a$) the acoustic model (AM), (b) the language model (LM) and (c) the lexicon (Lx), While the … |
Kaustuv Mukherji; Meghna Pandharipande; Sunil Kumar Kopparapu; | 2022 National Conference on Communications (NCC) | 2022-05-24 |
929 | Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel method involving multi-level modeling units, which integrates multi-level information for Mandarin speech recognition.
Yuting Yang; Binbin Du; Yuke Li; | arxiv-cs.CL | 2022-05-24 |
930 | End-to-End ASR-Enhanced Neural Network for Alzheimer’s Disease Diagnosis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents an approach to Alzheimer’s disease (AD) diagnosis from spontaneous speech using an end-to-end ASR-enhanced neural network. Under the condition that only audio … |
Jiancheng Gui; Yikai Li; Kai Chen; Joanna Siebert; Qingcai Chen; | ICASSP 2022 – 2022 IEEE International Conference on … | 2022-05-23 |
931 | Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the effectiveness of multi-modal acoustic modelling for dysarthric speech recognition using acoustic features along with articulatory information. |
Z. Yue; E. Loweimi; Z. Cvetkovic; H. Christensen; J. Barker; | icassp | 2022-05-22 |
932 | Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generative adversarial network to simulate noisy spectrum from the clean spectrum (SimuGAN), where only 10 minutes of unparalleled in-domain noisy speech data is required as labels. |
C. Chen; N. Hou; Y. Hu; S. Shirol; E. S. Chng; | icassp | 2022-05-22 |
933 | Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the use of the NOS rescoring model on a first-pass multilingual model and show that similar to the first-pass model, the rescoring model can be made multilingual. |
N. GAUR et. al. | icassp | 2022-05-22 |
934 | A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a two-step callsign boosting approach: (1) at the 1st step (ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the decoding FST (lattices), (2) at the 2nd step (NLP), callsigns extracted from the improved recognition outputs with Named Entity Recognition (NER) are correlated with the surveillance data to select the most suitable one. |
I. Nigmatulina; J. Zuluaga-Gomez; A. Prasad; S. Saeed Sarfjoo; P. Motlicek; | icassp | 2022-05-22 |
935 | Joint Modeling of Code-Switched and Monolingual ASR Via Conditional Factorization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. |
B. Yan; et al. | icassp | 2022-05-22 |
936 | Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents initial Speech Recognition results on "Casual Conversations", a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.
C. Liu; et al. | icassp | 2022-05-22 |
937 | Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an embedding aligner and modality switch training to better align the speech and text latent spaces. |
W. Wang; et al. | icassp | 2022-05-22 |
938 | Model-Based Approach for Measuring The Fairness in ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce mixed-effects Poisson regression to better measure and interpret any WER difference among subgroups of interest. |
Z. Liu; I. -E. Veliche; F. Peng; | icassp | 2022-05-22 |
939 | Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). |
C. Zorila; R. Doddipatla; | icassp | 2022-05-22 |
940 | The Royalflush System of Speech Recognition for M2met Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. |
S. Ye; P. Wang; S. Chen; X. Hu; X. Xu; | icassp | 2022-05-22 |
941 | Unsupervised Speech Enhancement with Speech Recognition Embedding and Disentanglement Losses IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose an unsupervised loss function to tackle those two problems. |
V. A. Trinh; S. Braun; | icassp | 2022-05-22 |
942 | Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a DNN-based switching method that directly estimates whether ASR will perform better on the enhanced or observed signals. |
H. SATO et. al. | icassp | 2022-05-22 |
943 | End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. |
R. Kumar; A. Purushothaman; A. Sreeram; S. Ganapathy; | icassp | 2022-05-22 |
944 | Integrating Multiple ASR Systems Into NLP Backend with Attention Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reduce the impact of ASR errors on the NLP back-end by combining transcriptions from various ASR systems. |
T. Kano; A. Ogawa; M. Delcroix; S. Watanabe; | icassp | 2022-05-22 |
945 | TED Talk Teaser Generation with Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the challenge of automatically generating teasers for TED talks. |
G. Vico; J. Niehues; | icassp | 2022-05-22 |
946 | Improved Meta Learning for Low Resource Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new meta learning based framework for low resource speech recognition that improves the previous model agnostic meta learning (MAML) approach. |
S. Singh; R. Wang; F. Hou; | icassp | 2022-05-22 |
947 | Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training before being cross-domain adapted to the 102.7-hour UASpeech corpus and to produce articulatory features. |
S. Hu; et al. | icassp | 2022-05-22 |
948 | Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. |
A. RATNARAJAH et. al. | icassp | 2022-05-22 |
949 | Dementia Detection By Fusing Speech and Eye-Tracking Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a method of detecting dementia from the simultaneous speech and eye-tracking recordings of subjects in a picture description task. |
Z. Sheng; Z. Guo; X. Li; Y. Li; Z. Ling; | icassp | 2022-05-22 |
950 | Optimize Wav2vec2's Architecture for Small Training Set Through Analyzing Its Pre-Trained Model's Attention Pattern Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage two techniques, local attention mechanism and cross-block parameter sharing, with counter-intuitive configurations.
L. Chen; M. Asgari; H. H. Dodge; | icassp | 2022-05-22 |
951 | Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Transcribe-to-Diarize, a new approach for neural speaker diarization that uses an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR). |
N. Kanda; et al. | icassp | 2022-05-22 |
952 | Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of suppressing background noise with a conventional cascaded pipeline, we employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. |
H. Wang; et al. | icassp | 2022-05-22 |
953 | Punctuation Prediction for Streaming On-Device Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss one-pass models for both ASR and punctuation prediction to replace the conventional two-pass post-processing pipeline. |
Z. Zhou; T. Tan; Y. Qian; | icassp | 2022-05-22 |
954 | Contextual Adapters for Personalized Speech Recognition in Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose training neural contextual adapters for personalization in neural transducer based ASR models. |
K. M. Sathyendra; et al. | icassp | 2022-05-22 |
955 | Being Greedy Does Not Hurt: Sampling Strategies for End-To-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, the Optimal Completion Distillation (OCD) training method was proposed which attempts to address some of those issues. In this paper, we analyze whether the method is competitive with a strong MLE baseline and investigate its scalability towards large speech data beyond read speech, which to our knowledge is the first such attempt in the literature.
J. Heymann; E. Lakomkin; L. Rädel; | icassp | 2022-05-22 |
956 | Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks, and increase scalability of the model to multiple tasks or languages. |
B. Thomas; S. Kessler; S. Karout; | icassp | 2022-05-22 |
957 | Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. |
T. Munkhdalai; et al. | icassp | 2022-05-22 |
958 | Speech Pattern Based Black-Box Model Watermarking for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. |
H. CHEN et. al. | icassp | 2022-05-22 |
959 | Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages. |
S. Khurana; A. Laurent; J. Glass; | icassp | 2022-05-22 |
960 | Massively Multilingual ASR: A Lifelong Learning Solution IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the impact of adding more languages and propose a lifelong learning approach to build high quality MMASR systems. |
B. Li; et al. | icassp | 2022-05-22 |
961 | Analyzing The Robustness of Unsupervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Unsupervised speech recognition (unsupervised ASR) aims to learn the ASR system with non-parallel speech and text corpus only. Wav2vec-U [1] has shown promising results in … |
G. -T. Lin; C. -J. Hsu; D. -R. Liu; H. -Y. Lee; Y. Tsao; | icassp | 2022-05-22 |
962 | Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We trained personalized models for 195 individuals with different types and severities of speech impairment with training sets ranging in size from <1 minute to 18-20 minutes of speech data. |
J. Tobin; K. Tomanek; | icassp | 2022-05-22 |
963 | Fusing ASR Outputs in Joint Training for Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to fuse Automatic Speech Recognition (ASR) outputs into the pipeline for joint training SER. |
Y. Li; P. Bell; C. Lai; | icassp | 2022-05-22 |
964 | Multi-Turn RNN-T for Streaming Recognition of Multi-Party Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through an in-depth analysis, we discuss potential pitfalls of the proposed system as well as promising future research directions. |
I. Sklyar; A. Piunova; X. Zheng; Y. Liu; | icassp | 2022-05-22 |
965 | Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. |
A. Ogawa; N. Tawara; M. Delcroix; S. Araki; | icassp | 2022-05-22 |
966 | Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the over-suppression phenomenon in the enhanced speech might degrade the performance of the downstream automatic speech recognition (ASR) task due to missing latent information. To alleviate this problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and the original noisy feature.
Y. Hu; N. Hou; C. Chen; E. Siong Chng; | icassp | 2022-05-22 |
967 | End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a conformer-based multi-modal speech recognition system. |
J. Chen; M. Wang; X. -L. Zhang; Z. Huang; S. Rahardja; | icassp | 2022-05-22 |
968 | Conversational Speech Recognition By Learning Conversation-Level Characteristics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. |
K. Wei; Y. Zhang; S. Sun; L. Xie; L. Ma; | icassp | 2022-05-22 |
969 | LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, the sparsity of the supervised training data requires the model to be able to learn from limited data. To address these problems, we propose LatticeBART, a model that decodes the sequence from the lattice in an end-to-end fashion and can use the pre-trained language models' prior.
L. Dai; L. Chen; Z. Zhou; K. Yu; | icassp | 2022-05-22 |
970 | Integer-Only Zero-Shot Quantization for Efficient Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, they require training and/or validation data during quantization, which may not be available due to security or privacy concerns. To address these limitations, we propose an integer-only, zero-shot quantization scheme for ASR models.
S. Kim; et al. | icassp | 2022-05-22 |
971 | An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. |
S. Kessler; B. Thomas; S. Karout; | icassp | 2022-05-22 |
972 | Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel text representation and training framework for E2E ASR models. |
S. Thomas; B. Kingsbury; G. Saon; H. -K. J. Kuo; | icassp | 2022-05-22 |
973 | Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from previous one-piece models, in this paper we propose a novel and agile framework called CR-ID for ASR-error-robust intent detection with two plug-and-play modules, namely a semantic drift calibration module (SDCM) and a phonemic refinement module (PRM), which are both model-agnostic and thus can be easily integrated into any existing intent detection model without modifying its structure.
Peilin Zhou; Dading Chong; Helin Wang; Qingcheng Zeng; | arxiv-cs.CL | 2022-05-22 |
974 | Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach that combines their ideas for an end-to-end speech recognition model.
S. Ling; C. Shen; M. Cai; Z. Ma; | icassp | 2022-05-22 |
975 | End-to-End Speech Recognition from Federated Acoustic Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. |
Y. Gao; et al. | icassp | 2022-05-22 |
976 | DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming Dfsmn-San For Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a dual-path network for the far-field acoustic model, which uses the voice processing (VP) signal and the acoustic echo cancellation (AEC) signal as inputs.
D. MA et. al. | icassp | 2022-05-22 |
977 | Joint and Adversarial Training with ASR for Expressive Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to alleviate the entanglement problem by integrating a Text-To-Speech (TTS) model and an Automatic Speech Recognition (ASR) model with a shared-layer network for joint training, and using ASR adversarial training to eliminate the content information in the style information.
K. ZHANG et. al. | icassp | 2022-05-22 |
978 | An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore HuBERT with larger numbers of clusters and iterations in order to obtain better speech representation. |
T. Maekaku; X. Chang; Y. Fujita; S. Watanabe; | icassp | 2022-05-22 |
979 | Transformer-Based Streaming ASR with Cumulative Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR). |
M. Li; S. Zhang; C. Zorila; R. Doddipatla; | icassp | 2022-05-22 |
980 | Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel approach to introduce LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. |
J. Tian; et al. | icassp | 2022-05-22 |
981 | Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. |
M. Soleymanpour; M. T. Johnson; R. Soleymanpour; J. Berry; | icassp | 2022-05-22 |
982 | Joint Speech Recognition and Audio Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of AAC is to generate natural language descriptions of contents in audio samples. We propose several approaches for end-to-end joint modeling of ASR and AAC tasks and demonstrate their advantages over traditional approaches, which model these tasks independently. |
C. Narisetty et al. | icassp | 2022-05-22 |
983 | A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, we propose a time domain progressive learning (TDPL) approach for speech enhancement and ASR. |
Z. Nian; J. Du; Y. Ting Yeung; R. Wang; | icassp | 2022-05-22 |
984 | Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset. |
N. Tomashenko; S. Mdhaffar; M. Tommasi; Y. Estève; J. -F. Bonastre; | icassp | 2022-05-22 |
985 | Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all three stages of the system is proposed. |
G. Li; J. Yu; J. Deng; X. Liu; H. Meng; | icassp | 2022-05-22 |
986 | Channel-Wise AV-Fusion Attention for Multi-Channel Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our work for automatic speech recognition (ASR) in the Multimodal Information Based Speech Processing (MISP) Challenge 2021. |
G. Xu; et al. | icassp | 2022-05-22 |
987 | Exploring Machine Speech Chain For Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the TTS→ASR pipeline in machine speech chain to perform domain adaptation for both E2E ASR and neural TTS models with only text data from the target domain. |
F. Yue; Y. Deng; L. He; T. Ko; Y. Zhang; | icassp | 2022-05-22 |
988 | RescoreBERT: Discriminative Speech Recognition Rescoring With Bert IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. |
L. Xu; et al. | icassp | 2022-05-22 |
989 | Bilingual End-to-End ASR with Byte-Level Subwords Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). |
L. Deng; R. Hsiao; A. Ghoshal; | icassp | 2022-05-22 |
990 | Listen, Know and Spell: Knowledge-Infused Subword Modeling for Improving ASR Performance of OOV Named Entities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Knowledge-Infused Subword Model (KISM), a novel technique for incorporating semantic context from KGs into the ASR pipeline for improving the performance of OOV named entities. |
N. Das et al. | icassp | 2022-05-22 |
991 | Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning. |
Y. Wang et al. | icassp | 2022-05-22 |
992 | Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach for multi-modal emotion recognition from conversations using speech and text. |
S. Dutta; S. Ganapathy; | icassp | 2022-05-22 |
993 | End-to-End ASR-Enhanced Neural Network for Alzheimer's Disease Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an approach to Alzheimer's disease (AD) diagnosis from spontaneous speech using an end-to-end ASR-enhanced neural network. |
J. Gui; Y. Li; K. Chen; J. Siebert; Q. Chen; | icassp | 2022-05-22 |
994 | Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. |
Q. Li et al. | icassp | 2022-05-22 |
995 | Reference Microphone Selection and Low-Rank Approximation Based Multichannel Wiener Filter with Application to Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an experimental study on the low-rank approximation and reference microphone selection based MWF with application to noisy speech recognition. |
X. -Y. Chen; J. Zhang; L. -R. Dai; | icassp | 2022-05-22 |
996 | SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. |
S. Shon; et al. | icassp | 2022-05-22 |
997 | Improving Recognition-Synthesis Based Any-to-one Voice Conversion with Cyclic Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This inconsistency between conversion and training stages constrains the speaker similarity of converted speech. To address this issue, a cyclic training method is proposed in this paper. |
Y. -N. Chen; L. -J. Liu; Y. -J. Hu; Y. Jiang; Z. -H. Ling; | icassp | 2022-05-22 |
998 | Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast and accurate end-to-end (E2E) model, which executes automatic speech recognition (ASR) and downstream natural language processing (NLP) simultaneously. |
M. Omachi; Y. Fujita; S. Watanabe; T. Wang; | icassp | 2022-05-22 |
999 | Speaker-Targeted Audio-Visual Speech Recognition Using A Hybrid CTC/Attention Model with Interference Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although AV Align shows an improvement in recognition accuracy in background noise environments, we have observed that the recognition accuracy degrades significantly in interference speaker environments, where a target speech and an interfering speech overlap each other. In order to improve the speech recognition accuracy of the target speaker in such situations, we propose a method that combines the auxiliary loss function that maximizes the recognition accuracy of the interference speaker and the CTC loss function for training the AV-ASR model. |
R. Tsunoda; R. Aihara; R. Takashima; T. Takiguchi; Y. Imai; | icassp | 2022-05-22 |
1000 | Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fulfill the two demands, in this paper, we propose a NAR CTC/attention model utilizing both pre-trained acoustic and language models: wav2vec2.0 and BERT. |
K. Deng et al. | icassp | 2022-05-22 |
1001 | Caching Networks: Capitalizing on Common Speech for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. |
A. Alexandridis; et al. | icassp | 2022-05-22 |
1002 | Improving Spoken Language Understanding By Enhancing Text Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel model to train a language model, namely CapuBERT, that is able to handle spoken-form input from the ASR module. |
T. B. Nguyen; | icassp | 2022-05-22 |
1003 | Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a phone-informed post-processing network that refines Mel spectrograms without using the vocoder. |
S. Ueno; T. Kawahara; | icassp | 2022-05-22 |
1004 | Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages. |
X. Zhao; et al. | icassp | 2022-05-22 |
1005 | Sentiment-Aware Automatic Speech Recognition Pre-Training for Enhanced Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). |
A. Ghriss; B. Yang; V. Rozgic; E. Shriberg; C. Wang; | icassp | 2022-05-22 |
1006 | Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech remains one of the most challenging tasks to the speech community. In this paper, we look into this challenge by utilizing the location information of target speakers in the 3D space for the first time. |
Y. Shao; S. -X. Zhang; D. Yu; | icassp | 2022-05-22 |
1007 | Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A question that naturally arises is whether the dissemination of personalized acoustic models can leak personal information. In this paper, we show that it is possible to retrieve the gender of the speaker, but also his identity, by just exploiting the weight matrix changes of a neural acoustic model locally adapted to this speaker. |
S. Mdhaffar; J. -F. Bonastre; M. Tommasi; N. Tomashenko; Y. Estève; | icassp | 2022-05-22 |
1008 | Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network, which can be applied to a wider range of tasks. |
X. Chang; N. Moritz; T. Hori; S. Watanabe; J. L. Roux; | icassp | 2022-05-22 |
1009 | Factorized Neural Transducer for Efficient Language Model Adaptation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This drawback might prevent their potential applications in practice. In order to address this issue, we propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction, and adopting a standalone language model for the vocabulary prediction. |
X. Chen; Z. Meng; S. Parthasarathy; J. Li; | icassp | 2022-05-22 |
1010 | Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. |
C. -H. H. Yang; et al. | icassp | 2022-05-22 |
1011 | AISHELL-NER: Named Entity Recognition from Chinese Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new dataset, AISHELL-NER, for NER from Chinese speech. |
B. Chen et al. | icassp | 2022-05-22 |
1012 | Effect of Noise Suppression Losses on Speech Distortion and ASR Performance IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the introduced speech distortion and artifacts greatly harm speech quality and intelligibility, and often significantly degrade automatic speech recognition (ASR) rates. In this work, we shed light on the success of the spectral complex compressed mean squared error (MSE) loss, and how its magnitude and phase-aware terms relate to the speech distortion vs. noise reduction trade-off. |
S. Braun; H. Gamper; | icassp | 2022-05-22 |
1013 | Performance-Efficiency Trade-Offs in Unsupervised Pre-Training for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Putting together all our observations, we introduce SEW-D (Squeezed and Efficient Wav2vec with Disentangled Attention), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. |
F. Wu et al. | icassp | 2022-05-22 |
1014 | M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems. |
F. Yu; et al. | icassp | 2022-05-22 |
1015 | Enhance RNNLMs with Hierarchical Multi-Task Learning for ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, how to best share information among related tasks in MTL remains to be addressed. In this work, we propose a hierarchical multi-task learning (HMTL) approach to incorporate linguistic knowledge into recurrent neural network language models (RNNLM), instead of using linguistic features as word factors. |
M. Song; Y. Zhao; | icassp | 2022-05-22 |
1016 | Knowledge Transfer from Large-Scale Pretrained Language Models to End-To-End Speech Recognizers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since end-to-end models are also known to be severely data hungry, this constraint is crucial especially because obtaining transcribed utterances is costly and can possibly be impractical or impossible. This paper proposes a method for alleviating this issue by transferring knowledge from a language model neural network that can be pretrained with text-only data. |
Y. Kubo; S. Karita; M. Bacchiani; | icassp | 2022-05-22 |
1017 | Multi-Stage and Multi-Loss Training for Fullband Non-Personalized and Personalized Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep learning-based wideband (16kHz) speech enhancement approaches have surpassed traditional methods. This work further extends the existing wideband systems to enable full-band … |
L. Chen; et al. | icassp | 2022-05-22 |
1018 | WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total. |
B. Zhang; et al. | icassp | 2022-05-22 |
1019 | SYNT++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two novel techniques during training to mitigate the problems due to the distribution gap: (i) a rejection sampling algorithm and (ii) using separate batch normalization statistics for the real and the synthetic samples. |
T. -Y. Hu et al. | icassp | 2022-05-22 |
1020 | Curriculum Optimization for Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions. |
A. Kuznetsova; A. Kumar; J. D. Fox; F. M. Tyers; | icassp | 2022-05-22 |
1021 | Usted: Improving ASR with A Unified Speech and Text Encoder-Decoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose training an ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. |
B. Yusuf; A. Gandhe; A. Sokolov; | icassp | 2022-05-22 |
1022 | A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however brings a performance degradation on the clean test set. To avoid this issue, in this work we propose an enhanced wav2vec2.0 model. |
Q. -S. Zhu et al. | icassp | 2022-05-22 |
1023 | Best of Both Worlds: Multi-Task Audio-Visual Automatic Speech Recognition and Active Speaker Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent work has shown that we can solve both problems simultaneously by employing an attention mechanism over the competing video tracks of the speakers' faces, at the cost of sacrificing some accuracy on active speaker detection. This work closes this gap in active speaker detection accuracy by presenting a single model that can be jointly trained with a multi-task loss. |
O. Braga; O. Siohan; | icassp | 2022-05-22 |
1024 | SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the advantages of applying SRU++ in ASR tasks by comparing with Conformer across multiple ASR benchmarks and study how the benefits can be generalized to long-form speech inputs. |
J. Pan; T. Lei; K. Kim; K. J. Han; S. Watanabe; | icassp | 2022-05-22 |
1025 | Multi-Modal Pre-Training for Automated Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel approach that leverages a self-supervised learning technique based on masked language modeling to compute a global, multi-modal encoding of the environment in which the utterance occurs. |
D. M. Chan; S. Ghosh; D. Chakrabarty; B. Hoffmeister; | icassp | 2022-05-22 |
1026 | Endpoint Detection for Streaming End-to-End Multi-Talker ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the EP detection problem in the SURT framework by introducing an end-of-sentence token as an output unit, following the practice of single-talker end-to-end models. |
L. Lu; J. Li; Y. Gong; | icassp | 2022-05-22 |
1027 | Exploring Effective Data Utilization for Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a series of training strategies to explore more effective data utilization for low-resource speech recognition. |
Z. Zhou; W. Wang; W. Zhang; Y. Qian; | icassp | 2022-05-22 |
1028 | The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the SJTU system for ICASSP Multi-modal Information based Speech Processing Challenge (MISP) 2021. |
W. Wang; et al. | icassp | 2022-05-22 |
1029 | Spell My Name: Keyword Boosted Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recognition of uncommon words such as names and technical terminology is important to understanding conversations in context. However, the ability to recognise such words remains … |
N. Jung; G. Kim; J. S. Chung; | icassp | 2022-05-22 |
1030 | Towards Better Meta-Initialization with Task Augmentation for Kindergarten-Aged Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we validate the effectiveness of MI in children's ASR and attempt to alleviate the problem of learner overfitting. |
Y. Zhu; R. Fan; A. Alwan; | icassp | 2022-05-22 |
1031 | Summary on The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We briefly describe the released dataset, track setups, baselines and summarize the challenge results and major techniques used in the submissions. |
F. Yu; et al. | icassp | 2022-05-22 |
1032 | Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence and ASR Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). |
Z. Wang; et al. | icassp | 2022-05-22 |
1033 | Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-Box Acoustic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A deep neural network (DNN)-based speech enhancement (SE) aiming to maximize the performance of an automatic speech recognition (ASR) system is proposed in this paper. |
R. Sawata; Y. Kashiwagi; S. Takahashi; | icassp | 2022-05-22 |
1034 | Insights on Neural Representations for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: End-to-end automatic speech recognition (ASR) models aim to learn a generalised speech representation. |
Anna Ollerenshaw; Md Asif Jalal; Thomas Hain; | arxiv-cs.CL | 2022-05-19 |
1035 | Minimising Biasing Word Errors for Contextual ASR with The Tree-Constrained Pointer Generator IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-tail words obtained using external contextual information. |
Guangzhi Sun; Chao Zhang; Philip C Woodland; | arxiv-cs.CL | 2022-05-18 |
1036 | PriMock57: A Dataset Of Primary Care Mock Consultations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We detail the development of a public access, high quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. |
Alex Papadopoulos Korfiatis; Francesco Moramarco; Radmila Sarac; Aleksandar Savkov; | acl | 2022-05-17 |
1037 | Unified Speech-Text Pre-training for Speech Translation and Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition. |
Yun Tang et al. | acl | 2022-05-17 |
1038 | Deploying Self-supervised Learning in The Wild for Hybrid Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a full exploration of how to utilize uncurated audio data in SSL, from data pre-processing to deploying a streaming hybrid ASR model. |
Mostafa Karimi et al. | arxiv-cs.SD | 2022-05-17 |
1039 | Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR Via Speech Chain Reconstruction and Self-Transcribing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an improved consistency training paradigm of semi-supervised S2S ASR. |
Heli Qi; Sashi Novitasari; Sakriani Sakti; Satoshi Nakamura; | arxiv-cs.CL | 2022-05-14 |
1040 | LAS-Transformer: An Enhanced Transformer Based on The Local Attention Mechanism for Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, Transformer-based models have shown promising results in automatic speech recognition (ASR), outperforming models based on recurrent neural networks (RNNs) and … |
Pengbin Fu; Daxing Liu; Huirong Yang; | Inf. | 2022-05-13 |
1041 | DMS-SK/BLSTM-CTC Hybrid Network for Gesture/Speech Fusion and Its Application in Lunar Robot-Astronauts Interaction Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the future manned lunar exploration mission, astronauts would work with the lunar robots, which has a high requirement for human–robot interaction (HRI). As the accuracy of … |
Jianli Ding; Jin Liu; X. Ning; Z. Kang; | Int. J. Pattern Recognit. Artif. Intell. | 2022-05-12 |
1042 | MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Large-scale automatic speech recognition model has achieved impressive performance. However, huge computational resources and massive amount of data are required to train an ASR … |
Xing Wu; Yifan Jin; Jianjia Wang; Quan Qian; Yike Guo; | Algorithms | 2022-05-11 |
1043 | Hearing Voices at The National Library – A Speech Corpus and Acoustic Model for The Swedish Language Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper details our work in developing new acoustic models for automated speech recognition (ASR) at KBLab, the infrastructure for data-driven research at the National Library … |
Martin Malmsten; Chris Haffenden; Love Börjeson; | ArXiv | 2022-05-06 |
1044 | Hearing Voices at The National Library — A Speech Corpus and Acoustic Model for The Swedish Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate different approaches for a viable speech-to-text pipeline for audiovisual resources in Swedish, using the wav2vec 2.0 architecture in combination with speech corpora created from KB’s collections. |
Martin Malmsten; Chris Haffenden; Love Börjeson; | arxiv-cs.CL | 2022-05-06 |
1045 | Speaker Recognition in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a pipeline to find the number of speakers, as well as the audios belonging to each of these now identified speakers, in a source of audio data where the number of speakers or speaker labels are not known a priori. |
Neeraj Chhimwal et al. | arxiv-cs.SD | 2022-05-05 |
1046 | DLD: An Optimized Chinese Speech Recognition Model Based on Deep Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition technology has played an indispensable role in realizing human-computer intelligent interaction. However, most of the current Chinese speech recognition systems … |
Hong Lei; Yue Xiao; Yanchun Liang; Dalin Li; Heow Pueh Lee; | Complex. | 2022-05-02 |
1047 | Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. |
Felix Wu et al. | arxiv-cs.CL | 2022-05-02 |
1048 | Exploring AI-based Speaker Dependent Methods in Dysarthric Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present our recent improvements within the CapisciAMe project, an Italian initiative aimed at investigating the usage of deep learning strategies for automatic … |
Davide Mulfari; A. Celesti; M. Villari; | 2022 22nd IEEE International Symposium on Cluster, Cloud … | 2022-05-01 |
1049 | Bilingual End-to-End ASR with Byte-Level Subwords Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). |
Liuhui Deng; Roger Hsiao; Arnab Ghoshal; | arxiv-cs.CL | 2022-05-01 |
1050 | Opportunities and Challenges of Automatic Speech Recognition Systems for Low-Resource Language Speakers IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) researchers are turning their attention towards supporting low-resource languages, such as isiXhosa or Marathi, with only limited training … |
Thomas Reitmaier et al. | Proceedings of the 2022 CHI Conference on Human Factors in … | 2022-04-29 |
1051 | Stuttering Disfluency Detection Using Machine Learning Approaches Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Stuttering is a neurodevelopmental speech disorder wherein people suffer from disfluency in speech generation. Recent research has applied machine learning and deep learning … |
Abedal-Kareem Al-Banna; E. Edirisinghe; H. Fang; W. Hadi; | J. Inf. Knowl. Manag. | 2022-04-28 |
1052 | Why Does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study which factor leads to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. |
SANYUAN CHEN et. al. | arxiv-cs.CL | 2022-04-27 |
1053 | Improving Multimodal Speech Recognition By Data Augmentation and Speech Representations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate ways of improving the base speech recognition system by following similar techniques to the ones used for the visual encoder, namely, transferring representations and data augmentation. |
Dan Oneata; Horia Cucu; | arxiv-cs.SD | 2022-04-27 |
1054 | DualVoice: A Speech Interaction Method Using Whisper-Voice As Commands Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Applications based on speech recognition have become widely used, and speech input is increasingly being utilized to create documents. However, it is still difficult to correct … |
J. Rekimoto; | CHI Conference on Human Factors in Computing Systems … | 2022-04-27 |
1055 | E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to replace the VAD with an end-to-end ASR model capable of predicting segment boundaries in a streaming fashion, allowing the segmentation decision to be conditioned not only on better acoustic features but also on semantic features from the decoded text with negligible extra computation. |
W. RONNY HUANG et. al. | arxiv-cs.SD | 2022-04-22 |
1056 | Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which yields only limited improvements. In this work, we aim to tackle these problems with a novel layer-wise adaptation structure injected into the E2E ASR model encoder. |
Xun Gong; Yizhou Lu; Zhikai Zhou; Yanmin Qian; | arxiv-cs.SD | 2022-04-21 |
1057 | Personalized Taiwanese Speech Synthesis Using Cascaded ASR and TTS Framework Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To bring endangered Taiwanese language back to life, this paper leveraged a large-scale Taiwanese across Taiwan (TAT) corpus to construct cascaded automatic speech recognition … |
Y. LIAO et. al. | 2022 32nd International Conference Radioelektronika … | 2022-04-21 |
1058 | WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Wave BERT (WaBERT), a novel end-to-end model combining the speech model and the language model for SLU tasks. |
LIN YAO et. al. | arxiv-cs.CL | 2022-04-21 |
1059 | Disappeared Command: Spoofing Attack On Automatic Speech Recognition Systems with Sound Masking Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The development of deep learning technology has greatly promoted the performance improvement of automatic speech recognition (ASR) technology, which has demonstrated an ability … |
Jinghui Xu; Jifeng Zhu; Yong Yang; | arxiv-cs.SD | 2022-04-19 |
1060 | Automated Speech Tools for Helping Communities Process Restricted-access Corpora for Language Revival Efforts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely-used language such as English for meta-linguistic commentary and questions (e.g. |
NAY SAN et. al. | arxiv-cs.CL | 2022-04-14 |
1061 | HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the model efficiency, we propose an early exit scheme for ASR, namely HuBERT-EE, that allows the model to stop the inference dynamically. |
Ji Won Yoon; Beom Jun Woo; Nam Soo Kim; | arxiv-cs.CL | 2022-04-13 |
1062 | ASR in German: A Detailed Error Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a selection of ASR model architectures that are pretrained on the German language and evaluates them on a benchmark of diverse test datasets. |
Johannes Wirth; Rene Peinl; | arxiv-cs.CL | 2022-04-12 |
1063 | Large-Scale Streaming End-to-End Speech Translation with Neural Transducers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce it to streaming end-to-end speech translation (ST), which aims to convert audio signals to texts in other languages directly. |
Jian Xue; Peidong Wang; Jinyu Li; Matt Post; Yashesh Gaur; | arxiv-cs.CL | 2022-04-11 |
1064 | MAESTRO: Matched Speech Text Representations Through Modality Matching IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Maestro, a self-supervised training method to unify representations learnt from speech and text modalities. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2022-04-07 |
1065 | Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there exists little parallel S2ST data, compared to the amount of data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis. In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue. |
SRAVYA POPURI et. al. | arxiv-cs.CL | 2022-04-06 |
1066 | Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a learnable and interpretable framework to combine SF and SSL representations. |
DAN BERREBBI et. al. | arxiv-cs.CL | 2022-04-05 |
1067 | Towards End-to-end Unsupervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Similar to the trend of making supervised speech recognition end-to-end, we introduce wav2vec-U 2.0 which does away with all audio-side pre-processing and improves accuracy through better architecture. |
Alexander H. Liu; Wei-Ning Hsu; Michael Auli; Alexei Baevski; | arxiv-cs.CL | 2022-04-05 |
1068 | A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage unpaired data to train a general sequence-to-sequence model. |
YE-QIAN DU et. al. | arxiv-cs.SD | 2022-04-05 |
1069 | Audio-visual Multi-channel Speech Separation, Dereverberation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all three stages of the system is proposed. |
Guinan Li; Jianwei Yu; Jiajun Deng; Xunying Liu; Helen Meng; | arxiv-cs.SD | 2022-04-05 |
1070 | A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using French as our investigation language, we train and compare gender-specific wav2vec 2.0 models against models containing different degrees of gender balance in their pre-training data. |
Marcely Zanon Boito; Laurent Besacier; Natalia Tomashenko; Yannick Estève; | arxiv-cs.CL | 2022-04-04 |
1071 | Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model. |
ABNER HERNANDEZ et. al. | arxiv-cs.CL | 2022-04-04 |
1072 | Deliberation Model for On-Device Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR’s text and audio embeddings. |
DUC LE et. al. | arxiv-cs.CL | 2022-04-04 |
1073 | Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we have used a transfer learning approach with the most recent Deep Speech model, i.e., deepspeech-0.9.3, to develop an end-to-end speech recognition system for Indian-English accents. |
Priyank Dubey; Bilal Shah; | arxiv-cs.CL | 2022-04-02 |
1074 | Speaker Adaptation for Wav2vec2 Based Dysarthric ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple adaptation network for fine-tuning wav2vec2 using fMLLR features. |
MURALI KARTHICK BASKAR et. al. | arxiv-cs.SD | 2022-04-02 |
1075 | End-to-end Model for Named Entity Recognition from Speech Without Paired Training Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an approach to build an end-to-end neural model to extract semantic information in a scenario in which zero paired audio data is available. |
Salima Mdhaffar; Jarod Duret; Titouan Parcollet; Yannick Estève; | arxiv-cs.CL | 2022-04-02 |
1076 | Zero-Shot Cross-lingual Aphasia Detection Using Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations and are fine-tuned for our desired low-resource languages. |
GERASIMOS CHATZOUDIS et. al. | arxiv-cs.LG | 2022-04-01 |
1077 | End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS). |
Xuankai Chang; Takashi Maekaku; Yuya Fujita; Shinji Watanabe; | arxiv-cs.SD | 2022-04-01 |
1078 | End-to-end Multi-talker Audio-visual ASR Using An Active Speaker Attention Module Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new approach for end-to-end audio-visual multi-talker speech recognition. |
Richard Rose; Olivier Siohan; | arxiv-cs.SD | 2022-04-01 |
1079 | Text-To-Speech Data Augmentation for Low Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this research is to propose a new data augmentation method to improve ASR models for agglutinative and low-resource languages. |
Rodolfo Zevallos; | arxiv-cs.CL | 2022-04-01 |
1080 | A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct a comparative study on speaker-attributed automatic speech recognition (SA-ASR) in the multi-party meeting scenario, a topic with increasing attention in meeting rich transcription. |
Fan Yu; Zhihao Du; Shiliang Zhang; Yuxiao Lin; Lei Xie; | arxiv-cs.SD | 2022-03-31 |
1081 | Perceptive, Non-linear Speech Processing and Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. |
Jean Rouat; Ramin Pichevar; Stéphane Loiselle; | arxiv-cs.SD | 2022-03-31 |
1082 | Effectiveness of Text to Speech Pseudo Labels for Forced Alignment and Cross Lingual Pretrained Models for Low Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an approach to create labelled data for Maithili, Bhojpuri and Dogri by utilising pseudo labels from text to speech for forced alignment. |
ANIRUDH GUPTA et. al. | arxiv-cs.CL | 2022-03-31 |
1083 | Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. |
JUNYI AO et. al. | arxiv-cs.SD | 2022-03-31 |
1084 | Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the problems, we introduce explicit interaction between characters and syllables using Self-conditioned connectionist temporal classification (CTC), in which the upper layers are “self-conditioned” on the intermediate predictions from the lower layers. |
Yusuke Fujita; Tatsuya Komatsu; Yusuke Kida; | arxiv-cs.CL | 2022-03-31 |
1085 | Analyzing The Factors Affecting Usefulness of Self-Supervised Pre-trained Representations for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, as part of the Interspeech Gram Vaani ASR challenge, we study the effect of domain, language, dataset size, and other aspects of our upstream pre-training SSL data on the final performance of the low-resource downstream ASR task. |
Ashish Seth; Lodagala V S V Durga Prasad; Sreyan Ghosh; S. Umesh; | arxiv-cs.CL | 2022-03-31 |
1086 | CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel channel and temporal-wise attention RNN (CTA-RNN) architecture based on the intermediate representations of pre-trained ASR models. |
Chengxin Chen; Pengyuan Zhang; | arxiv-cs.SD | 2022-03-31 |
1087 | HiFi-VC: High Quality ASR-Based Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new any-to-any voice conversion pipeline. |
A. Kashkin; I. Karpukhin; S. Shishkin; | arxiv-cs.SD | 2022-03-31 |
1088 | Is Word Error Rate A Good Evaluation Metric for Speech Recognition in Indic Languages? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR). |
PRIYANSHI SHAH et. al. | arxiv-cs.CL | 2022-03-30 |
1089 | Code Switched and Code Mixed Speech Recognition for Indic Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare the performance of an end-to-end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID). |
HARVEEN SINGH CHADHA et. al. | arxiv-cs.CL | 2022-03-30 |
1090 | Improving Speech Recognition for Indic Languages Using Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages. |
ANKUR DHURIYA et. al. | arxiv-cs.CL | 2022-03-30 |
1091 | Vakyansh: ASR Toolkit for Low Resource Indic Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through Vakyansh, we introduce automatic data pipelines for data creation, model training, model evaluation and deployment. |
HARVEEN SINGH CHADHA et. al. | arxiv-cs.CL | 2022-03-30 |
1092 | Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generative adversarial network to simulate noisy spectrum from the clean spectrum (Simu-GAN), where only 10 minutes of unparalleled in-domain noisy speech data is required as labels. |
Chen Chen; Nana Hou; Yuchen Hu; Shashank Shirol; Eng Siong Chng; | arxiv-cs.SD | 2022-03-29 |
1093 | WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. |
BINBIN ZHANG et. al. | arxiv-cs.SD | 2022-03-29 |
1094 | Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate SSL frameworks such as the wav2vec 2.0 and WavLM models using different setups and compare their performance with different supervised pretraining setups, using two types of pathological speech, namely, Japanese electrolaryngeal and English dysarthric. |
Lester Phillip Violeta; Wen-Chin Huang; Tomoki Toda; | arxiv-cs.SD | 2022-03-29 |
1095 | Earnings-22: A Practical Benchmark for Accents in The Wild IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To ensure this type of speech is represented in ASR benchmarking, we present Earnings-22, a 125 file, 119 hour corpus of English-language earnings calls gathered from global companies. |
Miguel Del Rio; Peter Ha; Quinten McNamara; Corey Miller; Shipra Chandra; | arxiv-cs.CL | 2022-03-29 |
1096 | Finnish Parliament ASR Corpus – Analysis, Benchmarks and Statistics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we publish and analyse the Finnish parliament ASR corpus, the largest publicly available collection of manually transcribed speech data for Finnish with over 3000 hours of speech and 449 speakers for which it provides rich demographic metadata. |
Anja Virkkunen; Aku Rouhe; Nhan Phan; Mikko Kurimo; | arxiv-cs.CL | 2022-03-28 |
1097 | A Dataset for Speech Emotion Recognition in Greek Theatrical Plays Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Machine learning methodologies can be adopted in cultural applications and propose new ways to distribute or even present the cultural content to the public. For instance, speech … |
Maria Moutti; S. Eleftheriou; Panagiotis Koromilas; Theodoros Giannakopoulos; | International Conference on Language Resources and … | 2022-03-27 |
1098 | Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a modification of the conventional FDLP model that allows easy interpretability of the complex cepstrum as temporal modulations in an all-pole model approximation of the power of the speech signal. |
Samik Sadhu; Hynek Hermansky; | arxiv-cs.SD | 2022-03-24 |
1099 | Disentangling Content and Fine-grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages. |
XINTAO ZHAO et. al. | arxiv-cs.SD | 2022-03-23 |
1100 | BeParrot: Efficient Interface for Transcribing Unclear Speech Via Respeaking Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transcribing speech from audio files to text is an important task not only for exploring the audio content in text form but also for utilizing the transcribed data as a source to … |
Riku Arakawa; Hiromu Yakura; Masataka Goto; | 27th International Conference on Intelligent User Interfaces | 2022-03-22 |
1101 | Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence and ASR Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). |
ZEXUN WANG et. al. | arxiv-cs.CL | 2022-03-22 |
1102 | Inequity in Popular Speech Recognition Systems for Accented English Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Voice-enabled technology has become increasingly common in homes, businesses, and other parts of everyday life. The benefits of smart speakers, hands-free controllers, and digital … |
Chinaemere Ike; Seth Polsley; T. Hammond; | 27th International Conference on Intelligent User Interfaces | 2022-03-22 |
1103 | A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datasets. |
Rishabh Jain; Mariam Yiwere; Dan Bigioi; Peter Corcoran; Horia Cucu; | arxiv-cs.SD | 2022-03-22 |
1104 | Intelligent Stuttering Speech Recognition: A Succinct Review Related Papers Related Patents Related Grants Related Venues Related Experts View |
N. Banerjee; Samarjeet Borah; Nilambar Sethi; | Multimedia Tools and Applications | 2022-03-19 |
1105 | Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR. |
Abdul Hameed Azeemi; Ihsan Ayyub Qazi; Agha Ali Raza; | arxiv-cs.LG | 2022-03-18 |
1106 | Prediction of Speech Intelligibility with DNN-based Performance Measures IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. |
Angel Mario Castro Martinez; Constantin Spille; Jana Roßbach; Birger Kollmeier; Bernd T. Meyer; | arxiv-cs.SD | 2022-03-17 |
1107 | Data Analytics on Eco-Conditional Factors Affecting Speech Recognition Rate of Modern Interaction Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech-based Interaction systems contribute to the growing class of contemporary interactive techniques (Human-Computer Interactive system), which have emerged quickly in the last … |
A. C. KALADEVI et. al. | J. Mobile Multimedia | 2022-03-16 |
1108 | Modelling Word Learning and Recognition Using Visually Grounded Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Methods: We investigate the time-course of word recognition as simulated by the model using a gating paradigm to test whether its recognition is affected by well-known word-competition effects in human speech processing. |
Danny Merkx; Sebastiaan Scholten; Stefan L. Frank; Mirjam Ernestus; Odette Scharenborg; | arxiv-cs.CL | 2022-03-14 |
1109 | The Improving Effect of Intelligent Speech Recognition System on English Learning Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: To improve the effect of English learning in the context of smart education, this study combines speech coding to improve the intelligent speech recognition algorithm, builds an … |
Qinqin Luo; | Adv. Multim. | 2022-03-10 |
1110 | Adaptation of A Pronunciation Dictionary for Dysarthric Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the general framework of an automatic speech recognition system, a pronunciation dictionary, that is a mapping table from a phoneme sequence to a word, is used both in the … |
Yuya Sawa; R. Takashima; T. Takiguchi; | 2022 IEEE 4th Global Conference on Life Sciences and … | 2022-03-07 |
1111 | Data Augmentation for Dysarthric Speech Recognition Based on Text-to-Speech Synthesis Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the field of automatic speech recognition (ASR) for people with dysarthria, it is problematic that not enough training speech data can be collected from people with dysarthria. … |
Yuki Matsuzaka; R. Takashima; Chiho Sasaki; T. Takiguchi; | 2022 IEEE 4th Global Conference on Life Sciences and … | 2022-03-07 |
1112 | AaeCAPTCHA: The Design and Implementation of Audio Adversarial CAPTCHA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the robustness of audio CAPTCHAs against automated abuses, we present the design and implementation of an audio adversarial CAPTCHA (aaeCAPTCHA) system in this paper. |
Md Imran Hossen; Xiali Hei; | arxiv-cs.CR | 2022-03-05 |
1113 | Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. |
XIAOQIANG WANG et. al. | arxiv-cs.CL | 2022-03-02 |
1114 | Spike‐Enabled Audio Learning in Multilevel Synaptic Memristor Array‐Based Spiking Neural Network IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition involves the ability to learn the audios which are closely related to event sequence. Although speech recognition has been widely implemented in software neural … |
Xulei Wu; B. Dang; Hong Wang; Xiu-Qing Wu; Yuchao Yang; | Advanced Intelligent Systems | 2022-03-01 |
1115 | Uneven Success: Automatic Speech Recognition and Ethnicity-related Dialects IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Alicia B. Wassink; Cady Gansen; Isabel Bartholomew; | Speech Commun. | 2022-03-01 |
1116 | Multilingual Speech Recognition for GlobalPhone Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Martha Yifiru Tachbelie; S. Abate; Tanja Schultz; | Speech Commun. | 2022-03-01 |
1117 | A Conformer Based Acoustic Model for Robust Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acoustic model. |
Yufeng Yang; Peidong Wang; DeLiang Wang; | arxiv-cs.SD | 2022-03-01 |
1118 | Deep Neural Network Based Chinese Dialect Classification Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: With the recent advance of neural networks in automatic speech recognition (ASR), Deep Neural Network Based ASR has been widely used in multiple application scenarios such as smart … |
MIAO WAN et. al. | 2021 Ninth International Conference on Advanced Cloud and … | 2022-03-01 |
1119 | Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network, which can be applied to a wider range of tasks. |
Xuankai Chang; Niko Moritz; Takaaki Hori; Shinji Watanabe; Jonathan Le Roux; | arxiv-cs.SD | 2022-03-01 |
1120 | Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel text representation and training framework for E2E ASR models. |
Samuel Thomas; Brian Kingsbury; George Saon; Hong-Kwang J. Kuo; | arxiv-cs.CL | 2022-02-26 |
1121 | Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simpler self-supervised learning (SSL)-based method for language-independent speaker anonymization without any explicit language-dependent model, which can be easily used for other languages. |
Xiaoxiao Miao; Xin Wang; Erica Cooper; Junichi Yamagishi; Natalia Tomashenko; | arxiv-cs.SD | 2022-02-26 |
1122 | A Survey of Multilingual Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we survey the state of the art in multilingual ASR models that are built with cross-lingual transfer in mind. |
Hemant Yadav; Sunayana Sitaram; | arxiv-cs.CL | 2022-02-25 |
1123 | Language Technology Practitioners As Language Managers: Arbitrating Data Bias and Predictive Bias in ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we use the lens of language policy to analyse how current practices in training and testing ASR systems in industry lead to the data bias giving rise to these systematic error differences. |
Nina Markl; Stephen Joseph McNulty; | arxiv-cs.CL | 2022-02-25 |
1124 | Ask2Mask: Guided Data Selection for Masked Speech Modeling Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames which are randomly masked within an utterance. While these methods … |
M. Baskar; A. Rosenberg; B. Ramabhadran; Yu Zhang; P. Moreno; | IEEE Journal of Selected Topics in Signal Processing | 2022-02-24 |
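The random masking that MSM methods such as wav2vec2 apply can be sketched as follows: start frames are sampled uniformly with some probability, and a fixed-length span from each start is masked. The probability and span length below are illustrative defaults, not values from the paper; Ask2Mask's contribution is precisely to guide this selection with an external scorer rather than sample uniformly.

```python
import random

def mask_spans(num_frames, p=0.065, span=10, seed=0):
    """Sample start frames with probability p and mask a fixed-length
    span from each start (wav2vec2-style random masking).
    Returns the set of masked frame indices."""
    rng = random.Random(seed)  # seeded for reproducibility
    masked = set()
    for t in range(num_frames):
        if rng.random() < p:
            masked.update(range(t, min(t + span, num_frames)))
    return masked

masked = mask_spans(200)
```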
1125 | Korean Tokenization for Beam Search Rescoring in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Korean tokenization method for neural network-based LM used for Korean ASR. |
Kyuhong Shim; Hyewon Bae; Wonyong Sung; | arxiv-cs.CL | 2022-02-22 |
1126 | Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An adversarial attack might entail presenting a model with inaccurate or fabricated samples as its training data, or introducing maliciously designed data to deceive an already trained model. |
NGOC DUNG HUYNH et. al. | arxiv-cs.SD | 2022-02-21 |
1127 | End-to-End Contextual ASR Based on Posterior Distribution Adaptation for Hybrid CTC/Attention System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to add a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases. |
Zhengyi Zhang; Pan Zhou; | arxiv-cs.CL | 2022-02-17 |
1128 | AISHELL-NER: Named Entity Recognition from Chinese Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new dataset AISHELL-NER for NER from Chinese speech. |
BOLI CHEN et. al. | arxiv-cs.CL | 2022-02-17 |
1129 | Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a method for alleviating this issue by transferring knowledge from a language model neural network that can be pretrained with text-only data. |
Yotaro Kubo; Shigeki Karita; Michiel Bacchiani; | arxiv-cs.CL | 2022-02-16 |
1130 | Conversational Speech Recognition By Learning Conversation-level Characteristics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. |
Kun Wei; Yike Zhang; Sining Sun; Lei Xie; Long Ma; | arxiv-cs.SD | 2022-02-15 |
1131 | USTED: Improving ASR with A Unified Speech and Text Encoder-Decoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose training an ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. |
Bolaji Yusuf; Ankur Gandhe; Alex Sokolov; | arxiv-cs.CL | 2022-02-12 |
1132 | ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle those challenges by proposing ASRPU, a programmable accelerator for on-edge ASR. |
Dennis Pinto; Jose-María Arnau; Antonio González; | arxiv-cs.AR | 2022-02-10 |
1133 | Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We further \textbf{(ii)} incorporate language model decoding in the ASR system, along with the fine-tuning method. |
Peter Sullivan; Toshiko Shibano; Muhammad Abdul-Mageed; | arxiv-cs.CL | 2022-02-10 |
1134 | English Speech Emotion Recognition Method Based on Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Man Liu; | International Journal of Speech Technology | 2022-02-08 |
1135 | Understanding The Role of Self Attention for Efficient Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the role of self attention in Transformer-based speech recognition and present a practical technique to design a model that accelerates inference and improves performance. |
Kyuhong Shim; Jungwook Choi; Wonyong Sung; | iclr | 2022-02-08 |
1136 | A Two-step Approach to Leverage Contextual Data: Speech Recognition in Air-traffic Communications IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a two-step callsign boosting approach: (1) at the 1st step (ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the decoding FST (lattices), (2) at the 2nd step (NLP), callsigns extracted from the improved recognition outputs with Named Entity Recognition (NER) are correlated with the surveillance data to select the most suitable one. |
Iuliia Nigmatulina; Juan Zuluaga-Gomez; Amrutha Prasad; Seyyed Saeed Sarfjoo; Petr Motlicek; | arxiv-cs.CL | 2022-02-08 |
1137 | Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We briefly describe the released dataset, track setups, baselines and summarize the challenge results and major techniques used in the submissions. |
FAN YU et. al. | arxiv-cs.SD | 2022-02-08 |
1138 | Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks, and increase scalability of the model to multiple tasks or languages. |
Bethan Thomas; Samuel Kessler; Salah Karout; | arxiv-cs.CL | 2022-02-07 |
1139 | Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a key characteristic in audio-visual speech recognition (AVSR), relating linguistic information observed across visual and audio data has been a challenge, benefiting not only audio/visual speech recognition (ASR/VSR) but also the manipulation of data within/across modalities. In this paper, we present a feature disentanglement-based framework for jointly addressing the above tasks. |
Chih-Chun Yang; Wan-Cyuan Fan; Cheng-Fu Yang; Yu-Chiang Frank Wang; | aaai | 2022-02-07 |
1140 | Polyphonic Pitch Detection with Convolutional Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we outline an online polyphonic pitch detection system that streams audio to MIDI by ConvLSTMs. |
Carl Thomé; Sven Ahlbäck; | arxiv-cs.SD | 2022-02-04 |
1141 | The RoyalFlush System of Speech Recognition for M2MeT Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. |
Shuaishuai Ye; Peiyao Wang; Shunfei Chen; Xinhui Hu; Xinkang Xu; | arxiv-cs.SD | 2022-02-03 |
1142 | Error Correction in ASR Using Sequence-to-Sequence Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The outputs of an ASR system are largely prone to phonetic and spelling errors. In this paper, we propose to use a powerful pre-trained sequence-to-sequence model, BART, further adaptively trained to serve as a denoising model, to correct errors of such types. |
SAMRAT DUTTA et. al. | arxiv-cs.CL | 2022-02-02 |
1143 | Language Dependencies in Adversarial Attacks on Speech Recognition Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare the attackability of a German and an English ASR system, taking Deepspeech as an example. |
Karla Markert; Donika Mirdita; Konstantin Böttinger; | arxiv-cs.CL | 2022-02-01 |
1144 | A Bidirectional Context Embedding Transformer for Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transformers have become popular in building end-to-end automatic speech recognition (ASR) systems. However, transformer ASR systems are usually trained to give output sequences … |
L. LIAO et. al. | Inf. | 2022-01-29 |
1145 | Reducing Language Context Confusion for End-to-end Code-switching Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory. |
SHUAI ZHANG et. al. | arxiv-cs.CL | 2022-01-28 |
1146 | Sentiment-Aware Automatic Speech Recognition Pre-training for Enhanced Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). |
Ayoub Ghriss; Bo Yang; Viktor Rozgic; Elizabeth Shriberg; Chao Wang; | arxiv-cs.CL | 2022-01-27 |
1147 | Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. |
PIOTR ŻELASKO et. al. | arxiv-cs.SD | 2022-01-26 |
1148 | The Norwegian Parliamentary Speech Corpus IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To test the usefulness of this dataset, we have compared an ASR system trained on the NPSC with a baseline system trained on only manuscript-read speech. |
Per Erik Solberg; Pablo Ortiz; | arxiv-cs.CL | 2022-01-26 |
1149 | Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality as an additional source of information. In this work, the … |
Dmitriy Serdyuk; Otavio Braga; O. Siohan; | Interspeech | 2022-01-25 |
1150 | Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, the information contained in the motion of the speaker’s mouth is used to augment the audio features. |
Dmitriy Serdyuk; Otavio Braga; Olivier Siohan; | arxiv-cs.CV | 2022-01-25 |
1151 | Speech Recognition for Light Control on Raspberry Pi Using Python Programming Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The Internet of Things has been substantially developed for disabled and elderly persons in various domains. Speech recognition is an extremely challenging technique for … |
P. Netinant; Krairat Arpabusayapan; Meennapa Rukhiran; | Proceedings of the 2022 5th International Conference on … | 2022-01-21 |
1152 | Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech including accent, age or gender, when … |
MENGZHE GENG et. al. | arxiv-cs.SD | 2022-01-14 |
1153 | Investigation of Data Augmentation Techniques for Disordered Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. |
MENGZHE GENG et. al. | arxiv-cs.SD | 2022-01-14 |
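Speed perturbation, one of the augmentation techniques the paper investigates, resamples the waveform by a factor (commonly 0.9, 1.0, or 1.1), shifting both tempo and pitch. A minimal pure-Python sketch using linear interpolation (the resampling details are illustrative, not taken from the paper):

```python
def speed_perturb(samples, factor):
    """Resample a waveform by `factor` via linear interpolation:
    factor > 1.0 shortens the signal (faster speech),
    factor < 1.0 lengthens it (slower speech)."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                       # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

slow = speed_perturb([0.0, 1.0, 0.0, -1.0], 0.5)  # twice as long
```

Tempo perturbation differs in that it changes duration while preserving pitch, typically via time-scale modification rather than plain resampling.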
1154 | The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. |
Luke Prananta; Bence Mark Halpern; Siyuan Feng; Odette Scharenborg; | arxiv-cs.SD | 2022-01-13 |
1155 | Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a zero-shot learning methodology for CS-ASR by augmenting the monolingual data with artificially generating CS text. |
AMIR HUSSEIN et. al. | arxiv-cs.CL | 2022-01-07 |
1156 | End-to-End Speech to Braille Translation in Japanese Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study addresses an end-to-end braille translation approach from Japanese speech for the deaf-blind. In Japan, automatic Braille translation from spoken language is expected … |
A. Kobayashi; Junji Onishi; H. Nishizaki; N. Kitaoka; | 2022 IEEE International Conference on Consumer Electronics … | 2022-01-07 |
1157 | Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. |
TIEZHENG YU et. al. | arxiv-cs.CL | 2022-01-07 |
1158 | Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences for each partial hypothesis. |
Jinchuan Tian; Jianwei Yu; Chao Weng; Yuexian Zou; Dong Yu; | arxiv-cs.CL | 2022-01-06 |
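The core idea of bringing a word n-gram LM into end-to-end decoding can be illustrated by rescoring competing hypotheses with a toy bigram model. The counts and `lm_weight` below are hypothetical, and the paper's on-the-fly word-lattice construction is considerably more involved than this whole-hypothesis rescoring sketch:

```python
import math

# Toy counts standing in for a trained word n-gram LM (hypothetical data).
BIGRAMS = {("<s>", "speech"): 4, ("speech", "recognition"): 3,
           ("<s>", "speak"): 1, ("speak", "recondition"): 1}
UNIGRAMS = {"<s>": 5, "speech": 4, "speak": 1, "recognition": 3, "recondition": 1}

def lm_logprob(words, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a word sequence."""
    score, prev, vocab = 0.0, "<s>", len(UNIGRAMS)
    for w in words:
        score += math.log((BIGRAMS.get((prev, w), 0) + alpha)
                          / (UNIGRAMS.get(prev, 0) + alpha * vocab))
        prev = w
    return score

def rescore(hypotheses, lm_weight=0.5):
    """Pick the (acoustic_score, words) pair maximizing the fused score."""
    return max(hypotheses, key=lambda h: h[0] + lm_weight * lm_logprob(h[1]))

best = rescore([(-1.0, ["speak", "recondition"]),
                (-1.2, ["speech", "recognition"])])
```

Here the LM overturns the acoustically preferred hypothesis in favor of the more plausible word sequence, which is the effect shallow fusion aims for.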
1159 | Robust Self-Supervised Audio-Visual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a self-supervised AVSR framework built upon Audio-Visual HuBERT (AV-HuBERT), a state-of-the-art audio-visual speech representation learning model. |
Bowen Shi; Wei-Ning Hsu; Abdelrahman Mohamed; | arxiv-cs.SD | 2022-01-05 |
1160 | Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the easiest and most efficient way for … |
Yuanfeng Song; Raymond Chi-Wing Wong; Xuefang Zhao; Di Jiang; | arxiv-cs.DB | 2022-01-04 |
1161 | Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this study, an automatic end-to-end speech recognition system based on hybrid CTC-attention network for Korean language is proposed. Deep neural network/hidden Markov model … |
Hosung Park; Changmin Kim; Hyunsoo Son; Soonshin Seo; Ji-Hwan Kim; | J. Web Eng. | 2022-01-04 |
1162 | ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Spoken Language Understanding (SLU) aims to interpret the meanings of human speeches in order to support various human-machine interaction systems. A key technique for SLU is … |
CHENGYU WANG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1163 | Evaluation of Wav2Vec Speech Recognition for Speakers with Cognitive Disorders Related Papers Related Patents Related Grants Related Venues Related Experts View |
J. Svec; Filip Polák; A. Bartoš; M. Zapletalová; Martin Víta; | International Conference on Text, Speech and Dialogue | 2022-01-01 |
1164 | Deep Investigation of The Recent Advances in Dialectal Arabic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition systems play an important role in human–machine interactions. Many systems exist for Arabic speech; however, there are limited systems for dialectal Arabic … |
Hamzah A. Alsayadi; A. Abdelhamid; I. Hegazy; Bandar Alotaibi; Z. Fayed; | IEEE Access | 2022-01-01 |
1165 | Automatic Speech Recognition Post-Processing for Readability: Task, Dataset and A Two-Stage Pre-Trained Approach Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Nowadays Automatic Speech Recognition (ASR) systems can accurately recognize which words are said. However, due to disfluencies, grammatical errors, and other phenomena in … |
Junwei Liao; Yu Shi; Yong Xu; | IEEE Access | 2022-01-01 |
1166 | Multi-sequence Intermediate Conditioning for CTC-based ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: End-to-end automatic speech recognition (ASR) directly maps input speech to a character sequence without using pronunciation lexica. However, in languages with thousands of … |
Yusuke Fujita; Tatsuya Komatsu; Yusuke Kida; | ArXiv | 2022-01-01 |
1167 | Cleanformer: A Microphone Array Configuration-invariant, Streaming, Multichannel Neural Enhancement Frontend for ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This work introduces the Cleanformer, a streaming multichannel neural-based enhancement frontend for automatic speech recognition (ASR). This model has a conformer-based … |
J. Caroselli; A. Naranayan; Tom O’Malley; | ArXiv | 2022-01-01 |
1168 | An E2E-ASR-Based Iteratively-Trained Timestamp Estimator Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Text-to-speech alignment, also known as time alignment, is essential for automatic speech recognition (ASR) systems used for speech retrieval tasks, such as keyword search and … |
Runyan Yang; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan; | IEEE Signal Processing Letters | 2022-01-01 |
1169 | The Performance of Wearable Speech Enhancement System Under Noisy Environment: An Experimental Study Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Wearable speech enhancement can improve the recognition accuracy of speech signals in stationary noise environments at 0 dB to 60 dB signal-to-noise ratio. Beamforming, adaptive … |
Pavani Cherukuru; Mumtaz Begum Mustafa; Hema Subramaniam; | IEEE Access | 2022-01-01 |
1170 | Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech emotion recognition (SER) is essential for understanding a speaker’s intention. Recently, some groups have attempted to improve SER performance using a bidirectional long … |
Jennifer Santoso; Takeshi Yamada; K. Ishizuka; Taiichi Hashimoto; S. Makino; | IEEE Access | 2022-01-01 |
1171 | Estonian Speech Recognition and Transcription Editing Service IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the latest iteration of our Estonian speech recognition system and the publicly available transcription editing service. The system is now based on an … |
Aivo Olev; Tanel Alumäe; | Balt. J. Mod. Comput. | 2022-01-01 |
1172 | Deep Convolutional Neural Network for Arabic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
RAFIK AMARI et. al. | International Conference on Computational Collective … | 2022-01-01 |
1173 | On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic recognition of dysarthric and elderly speech remains a highly challenging task to date. Speaker-level heterogeneity attributed to accent or gender commonly found in normal … |
MENGZHE GENG et. al. | ArXiv | 2022-01-01 |
1174 | MTL-SLT: Multi-Task Learning for Spoken Language Tasks IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Language understanding in speech-based systems has attracted extensive interest from both academic and industrial communities in recent years with the growing demand for … |
ZHIQI HUANG et. al. | NLP4CONVAI | 2022-01-01 |
1175 | A Novel Approach of Audio-Visual Color Recognition Using KNN Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech is one of the most attractive research areas for scientists in the field of machine learning, and Automatic Speech Recognition systems have achieved considerable success. ASR … |
Bachchu Paul; Tanushree Dey; Debashri Das Adhikary; Sanchita Guchhai; Somnath Bera; | 2022-01-01 | |
1176 | Speech Recognition Technologies Based on Artificial Intelligence Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View |
M. Musaev; I. Khujayarov; M. Ochilov; | IEEE International Conference on Healthcare Informatics | 2022-01-01 |
1177 | IBGS: A Wearable Smart System to Assist Visually Challenged IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Traditional blind guide devices are expensive and large. In this study, an intelligent blind guide system (IBGS) was introduced. GD32 is used as the main control chip; it … |
Kun Xia; Xueyong Li; Haiyang Liu; Mingli Zhou; Kexin Zhu; | IEEE Access | 2022-01-01 |
1178 | Natural Backdoor Attacks on Speech Recognition Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Jinwen Xin; X. Lyu; Jing Ma; | International Conference on Machine Learning for Cyber … | 2022-01-01 |
1179 | A Hidden Markov Optimization Model for Processing and Recognition of English Speech Feature Signals Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech recognition plays an important role in human–computer interaction. The higher the accuracy and efficiency of speech recognition are, the larger the improvement of … |
Yinchun Chen; | Journal of Intelligent Systems | 2022-01-01 |
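HMM-based recognition of the kind described above ultimately relies on Viterbi decoding to recover the most likely hidden-state path for an observation sequence. A toy sketch with entirely hypothetical two-state parameters (real acoustic models use Gaussian mixture or neural emission probabilities over many phone states):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence
    under an HMM (classic Viterbi dynamic programming)."""
    # V[t][s] = (best probability of reaching s at time t, best path so far)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-1][prev][1] + [s])
                for prev in states)
            row[s] = (prob, path)
        V.append(row)
    return max(V[-1].values())[1]

# Hypothetical silence/speech HMM decoding three acoustic symbols.
states = ["sil", "sp"]
path = viterbi(["q", "v", "v"], states,
               {"sil": 0.6, "sp": 0.4},
               {"sil": {"sil": 0.7, "sp": 0.3}, "sp": {"sil": 0.2, "sp": 0.8}},
               {"sil": {"q": 0.8, "v": 0.2}, "sp": {"q": 0.1, "v": 0.9}})
```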
1180 | The BiLSTM-based Synthesized Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
Dmitry Efanov; P. Aleksandrov; Nikolay Karapetyants; | BICA*AI | 2022-01-01 |
1181 | Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: In this article we present the design and the development of a knowledge based computational linguistic tool, Mlphon for Malayalam language. Mlphon computationally models … |
K. Manohar; A. Jayan; R. Rajan; | IEEE Access | 2022-01-01 |
1182 | Emotional Speech Recognition Based on Lip-Reading Related Papers Related Patents Related Grants Related Venues Related Experts View |
E. Ryumina; D. Ivanko; | International Conference on Speech and Computer | 2022-01-01 |
1183 | Performance Disparities Between Accents in Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) services are ubiquitous, transforming speech into text for systems like Amazon’s Alexa, Google’s Assistant, and Microsoft’s Cortana. However, … |
Alex DiChristofano; Henry Shuster; Shefali Chandra; Neal Patwari; | ArXiv | 2022-01-01 |
1184 | Evaluation of Automatic Speech Recognition Approaches Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is essential for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. … |
R. P. MAGALHÃES et. al. | J. Inf. Data Manag. | 2022-01-01 |
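Evaluations like the one above typically report word error rate (WER): the word-level edit distance (substitutions + insertions + deletions) between reference and hypothesis, normalized by reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance,
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

score = wer("the cat sat", "the cat sat down")  # one insertion over 3 words
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions.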
1185 | ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Even though attention-based end-to-end (E2E) automatic speech recognition (ASR) models have been yielding state-of-the-art recognition accuracy, they still fall behind many of the … |
Gaofeng Cheng; Haoran Miao; Runyan Yang; Keqi Deng; Yonghong Yan; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1186 | Taris: An Online Speech Recognition Framework with Sequence to Sequence Neural Networks for Both Audio-only and Audio-visual Speech Related Papers Related Patents Related Grants Related Venues Related Experts View |
George Sterpu; N. Harte; | Comput. Speech Lang. | 2022-01-01 |
1187 | Findings of The Shared Task on Speech Recognition for Vulnerable Individuals in Tamil IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents an overview of the shared task on automatic speech recognition in the Tamil language. In the shared task, spontaneous Tamil speech data gathered from elderly … |
B. B et. al. | LTEDI | 2022-01-01 |
1188 | NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper provides an overview of NVIDIA NeMo’s speech translation systems for the IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) Conformer RNN-T … |
OLEKSII HRINCHUK et. al. | International Workshop on Spoken Language Translation | 2022-01-01 |
1189 | Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Time-frequency (TF) masks are widely used in speech enhancement (SE). However, accurately estimating TF masks from noisy speech remains a challenge to both statistical or neural … |
Suliang Bu; Yunxin Zhao; Tuo Zhao; Shaojun Wang; Mei Han; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1190 | The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the HW-TSC’s designation of the Offline Speech Translation System submitted for IWSLT 2022 Evaluation. We explored both cascade and end-to-end system on three … |
MINGHAN WANG et. al. | International Workshop on Spoken Language Translation | 2022-01-01 |
1191 | CMU’s IWSLT 2022 Dialect Speech Translation System IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes CMU’s submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional … |
BRIAN YAN et. al. | International Workshop on Spoken Language Translation | 2022-01-01 |
1192 | Generative Adversarial Networks for Speech Processing: A Review IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
AAMIR WALI et. al. | Computer Speech & Language | 2022-01-01 |
1193 | JHU IWSLT 2022 Dialect Speech Translation System Description Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper details the Johns Hopkins speech translation (ST) system used in the IWLST2022 dialect speech translation task. Our system uses a cascade of automatic speech … |
Jinyi Yang; A. Hussein; Matthew Wiesner; S. Khudanpur; | International Workshop on Spoken Language Translation | 2022-01-01 |
1194 | Speech Recognition Lab Related Papers Related Patents Related Grants Related Venues Related Experts View |
Alessia Cornaggia; Fahrettin Gökgöz; F. Kurth; Hans-Christian Schmitz; Kevin Wilkinghoff; | ICMCIS | 2022-01-01 |
1195 | A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments … |
Sashi Novitasari; S. Sakti; Satoshi Nakamura; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1196 | Bangla Spoken Numerals Recognition By Using HMM Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech is one of the most natural forms of vocalized communication. Nowadays, with the advancement of machine learning, many doors have opened for finding several … |
Bachchu Paul; Debashri Das Adhikary; Tanushree Dey; Sanchita Guchhait; Somnath Bera; | 2022-01-01 | |
1197 | Towards Representative Subset Selection for Self-Supervised Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Self-supervised speech recognition models require considerable labeled training data for learning high-fidelity representations for Automatic Speech Recognition (ASR) which is … |
Abdul Hameed Azeemi; I. Qazi; Agha Ali Raza; | ArXiv | 2022-01-01 |
1198 | SSNCSE_NLP@LT-EDI-ACL2022: Speech Recognition for Vulnerable Individuals in Tamil Using Pre-trained XLSR Models Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition is a tool used to transform human speech into a written form. It is used in a variety of avenues, such as in voice commands, customer service, and … |
Dhanya Srinivasan; B. Bharathi; Thenmozhi Durairaj; B. Senthilkumar; | LTEDI | 2022-01-01 |
1199 | End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Far-field multi-speaker automatic speech recognition (ASR) has drawn increasing attention in recent years. Most existing methods feature a signal processing frontend and an ASR … |
WANGYOU ZHANG et. al. | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1200 | Exploring The Effect of Dialect Mismatched Language Models in Telugu Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Previous research has found that Acoustic Models (AM) of an Automatic Speech Recognition (ASR) system are susceptible to dialect variations within a language, thereby adversely … |
Aditya Yadavalli; Mirishkar Sai Ganesh; A. Vuppala; | North American Chapter of the Association for Computational … | 2022-01-01 |
1201 | Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Transformer-based models have led to significant innovation in various classic and practical subjects, including speech processing, natural language processing, and computer … |
Fu-Hao Yu; Kuan-Yu Chen; Keda Lu; | IEEE/ACM Transactions on Audio, Speech, and Language … | 2022-01-01 |
1202 | Seamless Equal Accuracy Ratio for Inclusive CTC Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View |
HETING GAO et. al. | Speech Commun. | 2022-01-01 |
1203 | Multi-Variant Consistency Based Self-supervised Learning for Robust Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the proposed method on the commercially-motivated dataset, CHiME-4, and the meeting dataset, AMI. |
Changfeng Gao; Gaofeng Cheng; Pengyuan Zhang; | arxiv-cs.SD | 2021-12-23 |
1204 | Voice Quality and Pitch Features in Transformer-Based Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the effects of incorporating voice quality and pitch features altogether and separately to a Transformer-based ASR model, with the intuition that the attention mechanisms might exploit latent prosodic traits. |
Guillermo Cámbara; Jordi Luque; Mireia Farrús; | arxiv-cs.CL | 2021-12-21 |
1205 | JTubeSpeech: Corpus of Japanese Speech Collected from YouTube for Speech Recognition and Speaker Verification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct a new Japanese speech corpus called JTubeSpeech. |
Shinnosuke Takamichi; Ludwig Kürzinger; Takaaki Saeki; Sayaka Shiota; Shinji Watanabe; | arxiv-cs.SD | 2021-12-17 |
1206 | Improving Deep Learning Based Automatic Speech Recognition for Gujarati IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that … |
Deep Raval; Vyom Pathak; Muktan Patel; Brijesh Bhatt; | Transactions on Asian and Low-Resource Language Information … | 2021-12-14 |
1207 | Automatic Speech Recognition for Low-Resource Languages: The Thuee Systems for The IARPA Openasr20 Evaluation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The paper introduces our Automatic Speech Recognition (ASR) systems for the IARPA Open Automatic Speech Recognition Challenge (OpenASR20) as well as some post explorations with … |
Jing Zhao; Gui-Xin Shi; Guan-Bo Wang; Weiqiang Zhang; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1208 | Detecting Audio Adversarial Examples with Logit Noising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method to detect audio adversarial examples by adding noise to the logits before feeding them into the decoder of the ASR. |
Namgyu Park; Sangwoo Ji; Jong Kim; | arxiv-cs.CR | 2021-12-13 |
1209 | Far-Field Speech Recognition Based on Complex-Valued Neural Networks and Inter-Frame Similarity Difference Method Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Far-field automatic speech recognition (ASR) is a challenging task due to the background noise and reverberation. To address this issue, we introduce a novel end-to-end … |
Y. Guo; Yifan Chen; Gaofeng Cheng; Pengyuan Zhang; Yonghong Yan; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1210 | PM-MMUT: Boosted Phone-Mask Data Augmentation Using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost the performance of PMT, we propose multi-modeling unit training (MMUT) architecture fusion with PMT (PM-MMUT). |
Guodong Ma; Pengfei Hu; Nurmemet Yolwas; Shen Huang; Hao Huang; | arxiv-cs.SD | 2021-12-13 |
1211 | Data Augmentation for ASR Using TTS Via A Discrete Representation IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: While end-to-end automatic speech recognition (ASR) has achieved high performance, it requires a huge amount of paired speech and transcription data for training. Recently, data … |
Sei Ueno; M. Mimura; S. Sakai; Tatsuya Kawahara; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1212 | Improving ASR Error Correction Using N-Best Hypotheses IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the field of Automatic Speech Recognition (ASR), Grammatical Error Correction (GEC) can be used to correct errors in recognition results of ASR systems and whereby it further … |
Linchen Zhu; Wenjie Liu; Linquan Liu; Ed Lin; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-12-13 |
1213 | Improving Speech Recognition on Noisy Speech Via Speech Enhancement with Multi-Discriminators CycleGAN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method named Multi-discriminators CycleGAN to reduce the noise of input speech and thereby improve automatic speech recognition performance. |
Chia-Yu Li; Ngoc Thang Vu; | arxiv-cs.CL | 2021-12-12 |
1214 | Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we used a pre-trained acoustic model to generate a perceptual loss that makes speech enhancement more aware of the phonetic properties of the signal. |
Peter Plantinga; Deblin Bagchi; Eric Fosler-Lussier; | arxiv-cs.SD | 2021-12-11 |
1215 | Building A Great Multi-lingual Teacher with Sparsely-gated Mixture of Experts for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how multi-lingual Automatic Speech Recognition (ASR) networks can be scaled up with a simple routing algorithm in order to achieve better accuracy. |
KENICHI KUMATANI et. al. | arxiv-cs.CL | 2021-12-10 |
1216 | Sequence-level Self-learning with Multiple Hypotheses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). |
KENICHI KUMATANI et. al. | arxiv-cs.CL | 2021-12-10 |
1217 | Revisiting The Boundary Between ASR and NLU in The Age of Conversational Dialog Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of the observations we make in this paper, we argue that (1) NLU should be cognizant of the presence of ASR models being used upstream in a dialog system’s pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end datasets that provide semantic annotations on spoken input, (4) there should be stronger collaboration between ASR and NLU research communities. |
Manaal Faruqui; Dilek Hakkani-Tür; | arxiv-cs.CL | 2021-12-10 |
1218 | A Sequence-to-sequence Based Error Correction Model for Medical Automatic Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The use of Automatic Speech Recognition (ASR) systems in medical applications is receiving rapidly growing interest due to their ability to reduce distractions and the cognitive … |
Yu Jiang; C. Poellabauer; | 2021 IEEE International Conference on Bioinformatics and … | 2021-12-09 |
1219 | CommanderGabble: A Universal Attack Against ASR Systems Leveraging Fast Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated … |
Zhaohe Zhang; Edwin Yang; Song Fang; | Annual Computer Security Applications Conference | 2021-12-06 |
1220 | Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. |
JINCHUAN TIAN et. al. | arxiv-cs.AI | 2021-12-05 |
1221 | Multimodal N-best List Rescoring with Weakly Supervised Pre-training in Hybrid Speech Recognition Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: N-best list rescoring, an essential step in hybrid automatic speech recognition (ASR), aims to re-evaluate the N-best hypothesis list decoded by the acoustic model (AM) and … |
Yuanfeng Song; Xiaoling Huang; Xuefang Zhao; Di Jiang; Raymond Chi-Wing Wong; | 2021 IEEE International Conference on Data Mining (ICDM) | 2021-12-01 |
1222 | Joint Modeling of Code-Switched and Monolingual ASR Via Conditional Factorization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. |
BRIAN YAN et. al. | arxiv-cs.CL | 2021-11-29 |
1223 | Romanian Speech Recognition Experiments from The ROBIN Project Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents different speech recognition experiments with deep neural networks, focusing on producing fast models (under 100 ms latency from the network itself) that are still reliable. |
Andrei-Marius Avram; Vasile Păiş; Dan Tufiş; | arxiv-cs.CL | 2021-11-23 |
1224 | Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we look into this challenge by utilizing the location information of target speakers in the 3D space for the first time. |
Yiwen Shao; Shi-Xiong Zhang; Dong Yu; | arxiv-cs.SD | 2021-11-22 |
1225 | Speech-T: Transducer for Text to Speech and Beyond IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that monotonic alignments are also critical to text to speech (TTS) synthesis and streaming TTS is also an important application scenario, in this work, we explore the possibility of applying Transducer to TTS and more. |
JIAWEI CHEN et. al. | nips | 2021-11-20 |
1226 | PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results. |
CHENG-I JEFF LAI et. al. | nips | 2021-11-20 |
1227 | FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. |
YICHONG LENG et. al. | nips | 2021-11-20 |
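The FastCorrect entry above builds on edit alignment between an ASR hypothesis and its reference, i.e. identifying the insertions, deletions, and substitutions that separate the two. A minimal sketch of that alignment (the standard Levenshtein dynamic program, not the paper's NAR model; the function name and example are illustrative only):

```python
def edit_alignment(ref, hyp):
    """Align two token sequences with dynamic programming and count the
    insertions, deletions, and substitutions separating them (the
    classic Levenshtein alignment underlying WER and error correction)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = minimum edits to align ref[:i] with hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    # Backtrack to recover the operation counts.
    ops = {"ins": 0, "del": 0, "sub": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops["sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops["del"] += 1
            i -= 1
        else:
            ops["ins"] += 1
            j -= 1
    return ops

ops = edit_alignment("the cat sat".split(), "the hat sat down".split())
# one substitution (cat -> hat) and one insertion (down)
```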
1228 | SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. |
SUWON SHON et. al. | arxiv-cs.CL | 2021-11-19 |
1229 | Simultaneous Speech-to-Speech Translation System with Transformer-Based Incremental ASR, MT, and TTS Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this paper, we present an English-to-Japanese simultaneous speech-to-speech translation (S2ST) system. It has three Transformer-based incremental processing modules for S2ST: … |
RYO FUKUDA et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1230 | GAMVA: A Japanese Audio-Visual Multi-Angle Speech Corpus Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-visual speech recognition (AVSR) has contributed to improve Automatic Speech Recognition (ASR) accuracy in noisy environments. In real-world scenarios, a speaker does not … |
SHINNOSUKE ISOBE et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1231 | Khmer Speech Translation Corpus of The Extraordinary Chambers in The Courts of Cambodia (ECCC) Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech translation (ST) is a subject of rapidly increasing interest in the area of speech processing research. This interest is apparent from the increasing tools and corpora for … |
KAK SOKY et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1232 | M2ASR-MONGO: A Free Mongolian Speech Database and Accompanied Baselines Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep learning has significantly improved the performance of automatic speech recognition (ASR), in particular for major languages such as English and Chinese. However, for minor … |
Tiankai Zhi; Ying Shi; Wenqiang Du; Guanyu Li; Dong Wang; | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1233 | Investigation of A Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Due to the presence of distortion, most of the single-channel frequency-domain speech enhancement (SE) approaches are still challenging for downstream automatic speech recognition … |
MAHBUB E. NOOR et. al. | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1234 | A Multi-Genre Urdu Broadcast Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper reports the development of a multi-genre Urdu Broadcast (BC) corpus and a Large Vocabulary Continuous Speech Recognition (LVCSR) system. BC speech corpus of 98 hours … |
Erbaz Khan; Sahar Rauf; F. Adeeba; S. Hussain; | 2021 24th Conference of the Oriental COCOSDA International … | 2021-11-18 |
1235 | Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. |
Yi-Chang Chen; Chun-Yen Cheng; Chien-An Chen; Ming-Chieh Sung; Yi-Ren Yeh; | arxiv-cs.CL | 2021-11-16 |
1236 | Analysis of Data Augmentation Methods for Low-Resource Maltese ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. |
ANDREA DEMARCO et. al. | arxiv-cs.CL | 2021-11-15 |
1237 | Visualizing Automatic Speech Recognition – Means for A Better Understanding? Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic speech recognition (ASR) is improving ever more at mimicking human speech processing. The functioning of ASR, however, remains to a large extent obfuscated by the … |
KARLA MARKERT et. al. | ArXiv | 2021-11-10 |
1238 | Scaling ASR Improves Zero and Few Shot Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose data selection techniques to efficiently scale training data to find the most valuable samples in massive datasets. |
ALEX XIAO et. al. | arxiv-cs.CL | 2021-11-10 |
1239 | Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that it is possible to retrieve the gender of the speaker, but also his identity, by just exploiting the weight matrix changes of a neural acoustic model locally adapted to this speaker. |
Salima Mdhaffar; Jean-François Bonastre; Marc Tommasi; Natalia Tomashenko; Yannick Estève; | arxiv-cs.CL | 2021-11-07 |
1240 | Protection Method Based on Multiple Sub-Detectors Against Audio Adversarial Examples Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Applications with audio speech recognition usually involve personal and authentication information; therefore, security measurement for audio speech recognition is one of the most … |
Keiichi Tamura; Hajime Ito; | 2021 IEEE 12th International Workshop on Computational … | 2021-11-06 |
1241 | Context-Aware Transformer Transducer for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel context-aware transformer transducer (CATT) network that improves the state-of-the-art transformer-based ASR system by taking advantage of such contextual signals. |
FENG-JU CHANG et. al. | arxiv-cs.CL | 2021-11-05 |
1242 | Effective Cross-Utterance Language Modeling for Conversational Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To flesh out our ideas, we frame the ASR N-best hypothesis rescoring task as a prediction problem, leveraging BERT, an iconic pre-trained LM, as the ingredient vehicle to facilitate selection of the oracle hypothesis from a given N-best hypothesis list. |
Bi-Cheng Yan; Hsin-Wei Wang; Shih-Hsuan Chiu; Hsuan-Sheng Chiu; Berlin Chen; | arxiv-cs.CL | 2021-11-05 |
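Several entries above (e.g. 1221 and 1242) rely on N-best hypothesis rescoring: a list of candidate transcriptions from the first-pass decoder is re-ranked using an external language model. A toy sketch of the basic interpolation, assuming hypothetical scores in place of a real acoustic model and a real pre-trained LM:

```python
def rescore_nbest(nbest, lm_score, lam=0.5):
    """Re-rank an ASR N-best list by interpolating each hypothesis's
    first-pass log-score with an external language-model log-score.
    `nbest` is a list of (hypothesis, asr_log_score) pairs; `lm_score`
    maps a hypothesis string to a log-probability."""
    rescored = [(hyp, (1 - lam) * asr + lam * lm_score(hyp)) for hyp, asr in nbest]
    # Highest combined score wins.
    return max(rescored, key=lambda pair: pair[1])[0]

# Hypothetical stand-in for a pre-trained LM: prefers the fluent hypothesis.
toy_lm = {"recognize speech": -1.0, "wreck a nice beach": -5.0}.get
best = rescore_nbest([("wreck a nice beach", -2.0), ("recognize speech", -2.5)], toy_lm)
# the LM term outweighs the small acoustic gap, so "recognize speech" wins
```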
1243 | Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset. |
Natalia Tomashenko; Salima Mdhaffar; Marc Tommasi; Yannick Estève; Jean-François Bonastre; | arxiv-cs.CL | 2021-11-05 |
1244 | Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that by adding a relatively small number of extra parameters to the encoder layers via so-called residual adapter, we can achieve similar adaptation gains compared to model fine-tuning, while only updating a tiny fraction (less than 0.5%) of the model parameters. |
Katrin Tomanek; Vicky Zayats; Dirk Padfield; Kara Vaillancourt; Fadi Biadsy; | emnlp | 2021-11-05 |
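The residual-adapter entry above hinges on a parameter-count argument: a bottleneck adapter adds only a down-projection and an up-projection per layer, a tiny fraction of the full model. A back-of-the-envelope sketch of that count (the sizes below are illustrative, not the paper's configuration):

```python
def adapter_param_fraction(d_model, bottleneck, n_layers, total_params):
    """Rough parameter count for bottleneck residual adapters: each
    adapter adds a down-projection (d_model x bottleneck) and an
    up-projection (bottleneck x d_model), plus biases, per layer.
    Returns the added parameters as a fraction of the full model."""
    per_adapter = 2 * d_model * bottleneck + bottleneck + d_model
    added = n_layers * per_adapter
    return added / total_params

# Hypothetical encoder: 17 layers, width 512, bottleneck 16, 100M params.
frac = adapter_param_fraction(512, 16, 17, 100_000_000)
# well under the 0.5% of parameters quoted in the highlight
```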
1245 | Sequential Randomized Smoothing for Adversarially Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion. |
Raphael Olivier; Bhiksha Raj; | arxiv-cs.CL | 2021-11-05 |
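The randomized-smoothing entry above defends ASR by decoding many noise-perturbed copies of the input and aggregating the outputs. A minimal sketch of that idea with majority voting over transcripts (a toy recognizer stands in for a real ASR model; the sequential/ROVER-style aggregation of the paper is not reproduced):

```python
import random
from collections import Counter

def smoothed_transcribe(audio, recognize, sigma=0.1, n=25, seed=0):
    """Randomized-smoothing style decoding sketch: run the recognizer on
    several Gaussian-perturbed copies of the input and return the
    majority-vote transcript. `recognize` is any audio -> string model."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n):
        noisy = [x + rng.gauss(0.0, sigma) for x in audio]
        votes[recognize(noisy)] += 1
    transcript, _ = votes.most_common(1)[0]
    return transcript

# Hypothetical recognizer: thresholds the mean sample value.
toy_asr = lambda a: "yes" if sum(a) / len(a) > 0.0 else "no"
out = smoothed_transcribe([0.5, 0.6, 0.4], toy_asr, sigma=0.1)
```

Small adversarial perturbations get drowned out by the injected noise, which is what makes the aggregated prediction certifiably stable.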
1246 | A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explored partial fine-tuning and entire fine-tuning on wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. |
Yingzhi Wang; Abdelmoumene Boumadane; Abdelwahab Heba; | arxiv-cs.CL | 2021-11-04 |
1247 | Speech Recognition for Air Traffic Control Via Feature Learning and End-to-end Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. |
Peng Fan; Dongyue Guo; Yi Lin; Bo Yang; Jianwei Zhang; | arxiv-cs.SD | 2021-11-04 |
1248 | Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of suppressing background noise with a conventional cascaded pipeline, we employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. |
HEMING WANG et. al. | arxiv-cs.SD | 2021-10-28 |
1249 | WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. |
SANYUAN CHEN et. al. | arxiv-cs.CL | 2021-10-26 |
1250 | ViDA-MAN: Visual Dialog with Digital Humans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers realtime audio-visual responses to instant speech inquiries. |
TONG SHEN et. al. | arxiv-cs.CV | 2021-10-25 |
1251 | A Speech Recognition Algorithm of Speaker-Independent Chinese Isolated Words Based on RNN-LSTM and Attention Mechanism Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The speech recognition technology of isolated words is one of the widely used speech recognition technologies at present. The isolated Words Speech Recognition technology for … |
Qiuyun Hao; Fuqiang Wang; Xiaofeng Ma; Peng Zhang; | 2021 14th International Congress on Image and Signal … | 2021-10-23 |
1252 | Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an embedding aligner and modality switch training to better align the speech and text latent spaces. |
WEI WANG et. al. | arxiv-cs.SD | 2021-10-23 |
1253 | Research of Automatic Speech Recognition of Asante-Twi Dialect For Translation Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper presents a new way of building low-resourced dialect Automatic Speech Recognition (ASR) systems using a small database using the Asante-Twi dialect. Three different ASR … |
Adwoa Agyeiwaa Boakye-Yiadom; Mingwei Qin; Ren Jing; | Proceedings of the 2021 5th International Conference on … | 2021-10-22 |
1254 | A Preliminary Study on Wav2Vec 2.0 Embeddings for Text-to-Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Wav2Vec 2.0 (W2V), a self-supervised speech representation trained with massive unlabeled speech data, showed promising results on Automatic Speech Recognition (ASR). In spite of … |
Yohan Lim; Namhyeong Kim; Seung Yun; Sang-Hun Kim; Seung-Ik Lee; | 2021 International Conference on Information and … | 2021-10-20 |
1255 | An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. |
Huaibo Zhao; Yosuke Higuchi; Tetsuji Ogawa; Tetsunori Kobayashi; | arxiv-cs.SD | 2021-10-20 |
1256 | Speech Pattern Based Black-box Model Watermarking for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. |
HAOZHE CHEN et. al. | arxiv-cs.SD | 2021-10-19 |
1257 | SLAM: A Unified Encoder for Speech and Language Modeling Via Speech-Text Joint Pre-Training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. |
ANKUR BAPNA et. al. | arxiv-cs.CL | 2021-10-19 |
1258 | AequeVox: Automated Fairness Testing of Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce, AequeVox, an automated testing framework for evaluating the fairness of ASR systems. |
Sai Sathiesh Rajan; Sakshi Udeshi; Sudipta Chattopadhyay; | arxiv-cs.LG | 2021-10-19 |
1259 | Improving Model Stability and Training Efficiency in Fast, High Quality Expressive Voice Conversion System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Voice conversion (VC) systems have made significant progress owing to advanced deep learning methods. Current research is not only concerned with high-quality and fast audio … |
ZHIYUAN ZHAO et. al. | Companion Publication of the 2021 International Conference … | 2021-10-18 |
1260 | Measuring Frequency of Child-directed WH-Question Words for Alternate Preschool Locations Using Speech Recognition and Location Tracking Technologies Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Speech and language development in children are crucial for ensuring effective skills in their long-term learning ability. A child’s vocabulary size at the time of entry into … |
PRASANNA V. KOTHALKAR et. al. | Companion Publication of the 2021 International Conference … | 2021-10-18 |
1261 | Analysis of French Phonetic Idiosyncrasies for Accent Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using spectrograms of speech signals, we propose a multi-class classification framework for accent recognition. |
Pierre Berjon; Avishek Nag; Soumyabrata Dev; | arxiv-cs.CL | 2021-10-18 |
1262 | Multilingual Speech Recognition Using Knowledge Transfer Across Learning Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to enhance the multilingual ASR performance in two ways: 1) studying the impact of feeding a one-hot vector identifying the language, and 2) formulating the task with a meta-learning objective combined with self-supervised learning (SSL). |
Rimita Lahiri; Kenichi Kumatani; Eric Sun; Yao Qian; | arxiv-cs.CL | 2021-10-15 |
1263 | CORAA: A Large Corpus of Spontaneous and Prepared Speech Manually Validated for Speech Recognition in Brazilian Portuguese IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents CORAA (Corpus of Annotated Audios) v1. |
ARNALDO CANDIDO JUNIOR et. al. | arxiv-cs.CL | 2021-10-14 |
1264 | M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems. |
FAN YU et. al. | arxiv-cs.SD | 2021-10-14 |
1265 | A Chinese Speech Recognition System Based on Fusion Network Structure Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The purpose of an automatic speech recognition system is to convert speech into recognizable text. Chinese is a language in which the same pronunciation but different writing … |
LUNVI GUO et. al. | 2021 IEEE 21st International Conference on Communication … | 2021-10-13 |
1266 | Prompt-tuning in ASR Systems for Efficient Domain-adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we overcome the problem using prompt-tuning, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain. |
SAKET DINGLIWAL et. al. | arxiv-cs.CL | 2021-10-13 |
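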
1267 | Corpus Design and Automatic Speech Recognition for Deaf and Hard-of-Hearing People Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This study describes automatic speech recognition (ASR) for the deaf and hard-of-hearing people. In the relevant literature, ASR for the deaf has been studied in a manner similar … |
A. Kobayashi; K. Yasu; H. Nishizaki; N. Kitaoka; | 2021 IEEE 10th Global Conference on Consumer Electronics … | 2021-10-12 |
1268 | Emotion Recognition Combining Acoustic and Linguistic Features Based on Speech Recognition Results Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this study, a speech emotion recognition method that uses both acoustic and linguistic features is studied. Various emotion recognition methods using both the abovementioned … |
Misaki Sakurai; T. Kosaka; | 2021 IEEE 10th Global Conference on Consumer Electronics … | 2021-10-12 |
1269 | Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose evaluating ASR output hypotheses quality with SemDist that can measure semantic correctness by using the distance between the semantic vectors of the reference and hypothesis extracted from a pre-trained language model. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2021-10-11 |
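The SemDist entry above evaluates ASR quality by the distance between semantic vectors of the reference and the hypothesis rather than by literal token edits. The core distance is simply one minus cosine similarity; a minimal sketch (in a real system the vectors would come from a pre-trained language model, not be hand-supplied):

```python
import math

def sem_dist(ref_vec, hyp_vec):
    """Semantic-distance sketch in the spirit of SemDist: one minus the
    cosine similarity between sentence embeddings of the reference
    transcript and the ASR hypothesis."""
    dot = sum(r * h for r, h in zip(ref_vec, hyp_vec))
    norm = (math.sqrt(sum(r * r for r in ref_vec))
            * math.sqrt(sum(h * h for h in hyp_vec)))
    return 1.0 - dot / norm

# Identical embeddings give distance 0; orthogonal embeddings give 1,
# so a semantically faithful hypothesis scores near 0 even if its
# surface form (and thus its WER) differs from the reference.
```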
1270 | K-Wav2vec 2.0: Automatic Speech Recognition Based on Joint Decoding of Graphemes and Syllables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present K-Wav2Vec 2.0, which is a modified version of Wav2vec 2.0 designed for Korean automatic speech recognition by exploring and optimizing various factors of the original Wav2vec 2.0. |
Jounghee Kim; Pilsung Kang; | arxiv-cs.CL | 2021-10-11 |
1271 | Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning. |
YIMING WANG et. al. | arxiv-cs.CL | 2021-10-10 |
1272 | An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the general applications of pretrained speech representations, on advanced end-to-end automatic speech recognition (E2E-ASR) models. |
XUANKAI CHANG et. al. | arxiv-cs.CL | 2021-10-09 |
1273 | Magic Dust for Cross-lingual Adaptation of Monolingual Wav2vec-2.0 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages. |
Sameer Khurana; Antoine Laurent; James Glass; | arxiv-cs.CL | 2021-10-07 |
1274 | Explaining The Attention Mechanism of End-to-End Speech Recognition Using Decision Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we use decision trees to explain how the attention mechanism impacts itself in speech recognition. |
Yuanchao Wang; Wenji Du; Chenghao Cai; Yanyan Xu; | arxiv-cs.CL | 2021-10-07 |
1275 | WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, 22400+ hours in total. |
BINBIN ZHANG et. al. | arxiv-cs.SD | 2021-10-07 |
1276 | Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, this paper uses the recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS. |
Liang-Hsuan Tseng; Yu-Kuan Fu; Heng-Jui Chang; Hung-yi Lee; | arxiv-cs.CL | 2021-10-07 |
1277 | FAST-RIR: Fast Neural Diffuse Room Impulse Response Generator IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. |
ANTON RATNARAJAH et. al. | arxiv-cs.SD | 2021-10-07 |
1278 | Spell My Name: Keyword Boosted Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords, which in turn enables better readability of the results. |
Namkyu Jung; Geonmin Kim; Joon Son Chung; | arxiv-cs.SD | 2021-10-06 |
1279 | Integrating Categorical Features in End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we treat all these aspects as categorical information in an ASR system, and propose a simple yet effective way to integrate categorical features into E2E model. |
Rongqing Huang; | arxiv-cs.CL | 2021-10-06 |
1280 | Is Attention Always Needed? A Case Study on Language Identification from Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The present study introduces a convolutional recurrent neural network (CRNN)-based LID system, designed to operate on the Mel-frequency Cepstral Coefficient (MFCC) features of audio samples. |
Atanu Mandal; Santanu Pal; Indranil Dutta; Mahidas Bhattacharya; Sudip Kumar Naskar; | arxiv-cs.LG | 2021-10-05 |
1281 | BERT Attends The Conversation: Improving Low-Resource Conversational ASR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose new, data-efficient training tasks for BERT models that improve performance of automatic speech recognition (ASR) systems on conversational speech. |
Pablo Ortiz; Simen Burud; | arxiv-cs.CL | 2021-10-05 |
1282 | Evaluation of Automatic Speech Recognition Systems Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and … |
MATHEUS XAVIER SAMPAIO et. al. | SBBD | 2021-10-04 |
1283 | Audio Steganography with Speech Recognition System Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples that are intentionally crafted by adding small perturbations to the original input. Most works focus on … |
HAO TAN et. al. | 2021 IEEE Sixth International Conference on Data Science in … | 2021-10-01 |
1284 | SpliceOut: A Simple and Efficient Audio Augmentation Method IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SpliceOut, a simple modification to time masking which makes it computationally more efficient. |
Arjit Jain; Pranay Reddy Samala; Deepak Mittal; Preethi Jyoti; Maneesh Singh; | arxiv-cs.SD | 2021-09-30 |
1285 | FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. |
YICHONG LENG et. al. | arxiv-cs.CL | 2021-09-29 |
1286 | Challenges and Opportunities of Speech Recognition for Bengali Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research work, we thoroughly survey the current status of research efforts on Bengali ASR systems. |
M. F. Mridha; Abu Quwsar Ohi; Md. Abdul Hamid; Muhammad Mostafa Monowar; | arxiv-cs.CL | 2021-09-27 |
1287 | Audio-Visual Speech Recognition Is Worth $32\times 32\times 8$ Voxels Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Audio-visual automatic speech recognition (AV-ASR) introduces the video modality into the speech recognition process, often by relying on information conveyed by the motion of … |
Dmitriy Serdyuk; Otavio Braga; O. Siohan; | 2021 IEEE Automatic Speech Recognition and Understanding … | 2021-09-20 |
1288 | Audio-Visual Speech Recognition Is Worth 32$\times$32$\times$8 Voxels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to replace the 3D convolutional visual front-end with a video transformer front-end. |
Dmitriy Serdyuk; Otavio Braga; Olivier Siohan; | arxiv-cs.CV | 2021-09-20 |
1289 | Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. |
FELIX WU et. al. | arxiv-cs.CL | 2021-09-14 |
1290 | Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. |
Anoop C S; Prathosh A P; A G Ramakrishnan; | arxiv-cs.CL | 2021-09-12 |
1291 | Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the self-attention channel combinator (SACC) ASR frontend, which leverages the self-attention mechanism to combine multichannel audio signals in the magnitude spectral domain. |
RONG GONG et. al. | arxiv-cs.SD | 2021-09-10 |
1292 | Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Current ASR systems show poor performance in recognition of children’s speech in noisy environments because recognizers are typically trained with clean adults’ speech and … |
H. Kathania; Sudarsana Reddy Kadiri; P. Alku; M. Kurimo; | Applied Sciences | 2021-09-10 |
1293 | DeepEMO: Deep Learning for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an industry-level deep learning approach for the speech emotion recognition task. |
Enkhtogtokh Togootogtokh; Christian Klasen; | arxiv-cs.SD | 2021-09-09 |
1294 | Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel tree-constrained pointer generator (TCPGen) component is proposed that incorporates such knowledge as a list of biasing words into both attention-based encoder-decoder and transducer end-to-end ASR models in a neural-symbolic way. |
Guangzhi Sun; Chao Zhang; Philip C. Woodland; | arxiv-cs.CL | 2021-09-01 |
1295 | ETLT 2021: Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The paper presents the Second ASR Challenge for Non-native Children’s Speech proposed as a Special Session at Interspeech 2021, following the successful first challenge at … |
R. GRETTER et. al. | Interspeech | 2021-08-30 |
1296 | Adversarial Example Devastation and Detection on Speech Recognition System By Adding Random Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm for the devastation and detection of adversarial examples that can attack current advanced ASR systems. |
Mingyu Dong; Diqun Yan; Yongkang Gong; Rangding Wang; | arxiv-cs.SD | 2021-08-30 |
1297 | Weakly Supervised Construction of ASR Systems from Massive Video Data Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Despite the rapid development of deep learning models, for real-world applications, building large-scale Automatic Speech Recognition (ASR) systems from scratch is still … |
Mengli Cheng; Chengyu Wang; Jun Huang; Xiaobo Wang; | Interspeech | 2021-08-30 |
1298 | ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate research on ASR-robust general language understanding, in this paper we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR errors across 3 different levels of background noise and 6 speakers with various voice characteristics. |
LINGYUN FENG et. al. | arxiv-cs.CL | 2021-08-30 |
1299 | You Don’t Understand Me!: Comparing ASR Results for L1 and L2 Speakers of Swedish IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The performance of Automatic Speech Recognition (ASR) systems has constantly increased in state-of-the-art development. However, performance tends to decrease considerably in more … |
Ronald Cumbal; Birger Moell; José Lopes; Olov Engwall; | Interspeech | 2021-08-30 |
1300 | BERT-Based Semantic Model for Rescoring N-Best Speech Recognition List IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This work aims to improve automatic speech recognition (ASR) by modeling long-term semantic relations. We propose to perform this through rescoring the ASR N-best hypotheses list. … |
D. Fohr; I. Illina; | Interspeech | 2021-08-30 |
1301 | Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR). |
INJY HAMED et. al. | arxiv-cs.CL | 2021-08-29 |
1302 | Improving Callsign Recognition with Air-surveillance Data in Air-traffic Communication IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate two approaches: (1) G-boosting, when callsigns weights are adjusted at language model level (G) and followed by the dynamic decoder with an on-the-fly composition, and (2) lattice rescoring when callsign information is introduced on top of lattices generated using a conventional decoder. |
Iuliia Nigmatulina; Rudolf Braun; Juan Zuluaga-Gomez; Petr Motlicek; | arxiv-cs.CL | 2021-08-27 |
1303 | Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to (1) automatically segment the ATCO and pilot data based on an intuitive approach exploiting ASR transcripts and (2) subsequently consider an automatic recognition of ATCOs’ and pilots’ voice as two separate tasks. |
AMRUTHA PRASAD et. al. | arxiv-cs.CL | 2021-08-27 |
1304 | Injecting Text in Self-Supervised Speech Pretraining IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to jointly learn representations during pretraining from two different modalities: speech and text. |
ZHEHUAI CHEN et. al. | arxiv-cs.CL | 2021-08-27 |
1305 | Task-aware Warping Factors in Mask-based Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the use of two task-aware warping factors in mask-based speech enhancement (SE). |
Qiongqiong Wang; Kong Aik Lee; Takafumi Koshinaka; Koji Okabe; Hitoshi Yamamoto; | arxiv-cs.SD | 2021-08-27 |
1306 | Spontaneous Speech Summarization: Transformers All The Way Through Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper proposes a speech summarization system for spontaneous speech. The proposed system consists of speech segmentation, speech recognition, and extractive text … |
Tomoki Hayashi; Takenori Yoshimura; Masaya Inuzuka; Ibuki Kuroyanagi; Osamu Segawa; | 2021 29th European Signal Processing Conference (EUSIPCO) | 2021-08-23 |
1307 | Automatic Speech Recognition And Limited Vocabulary: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. |
Jean Louis K. E. Fendji; Diane C. M. Tala; Blaise O. Yenke; Marcellin Atemkeng; | arxiv-cs.AI | 2021-08-23 |
1308 | Few-Shot Learning for Frame-Wise Phoneme Recognition: Adaptation of Matching Networks Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recently, the topic of Few-Shot Learning (FSL) is emerging as a radical direction in machine learning, well established with a variety of paradigms and network realizations for … |
TIRTHANKAR BANERJEE et. al. | 2021 29th European Signal Processing Conference (EUSIPCO) | 2021-08-23 |
1309 | Data Augmentation Using CycleGAN for End-to-End Children ASR IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Recent deep learning algorithms are known to perform better for Automatic Speech Recognition (ASR) of adult speakers; however, it remains a challenge to recognize children’s … |
D. K. Singh; Preet P. Amin; Hardik B. Sailor; H. Patil; | 2021 29th European Signal Processing Conference (EUSIPCO) | 2021-08-23 |
1310 | Multilingual Speech Recognition for Low-Resource Indian Languages Using Multi-Task Conformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages. |
Krishna D N; | arxiv-cs.CL | 2021-08-22 |
1311 | Hierarchical Summarization for Longform Spoken Dialog IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we design a two-stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges. |
Daniel Li; Thomas Chen; Albert Tung; Lydia Chilton; | arxiv-cs.CL | 2021-08-21 |
1312 | A Light-weight Contextual Spelling Correction Model for Customizing Transducer-based Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. |
Xiaoqiang Wang; Yanqing Liu; Sheng Zhao; Jinyu Li; | arxiv-cs.CL | 2021-08-17 |
1313 | Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an approach, "Mondegreen", to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints or privacy or bandwidth (for example, some ASR systems run on-device) considerations. |
SUKHDEEP S. SODHI et. al. | kdd | 2021-08-12 |
1314 | Meaning Error Rate: ASR Domain-specific Metric Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we build a speech recognition quality evaluation framework that unifies feedback coming from different types of customers into a single metric. |
Ludmila Gordeeva; Vasily Ershov; Oleg Gulyaev; Igor Kuralenok; | kdd | 2021-08-12 |
1315 | On The Compensation Between Magnitude and Phase in Speech Separation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. |
Zhong-Qiu Wang; Gordon Wichern; Jonathan Le Roux; | arxiv-cs.SD | 2021-08-11 |
1316 | The HW-TSC’s Offline Speech Translation Systems for IWSLT 2021 Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our work in participation of the IWSLT-2021 offline speech translation task. |
MINGHAN WANG et. al. | arxiv-cs.CL | 2021-08-09 |
1317 | StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized By Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome this problem, we propose the use of automatic speech recognition to assist model training, to improve StarGAN-VC, especially in low-resource scenarios. |
Shoki Sakamoto; Akira Taniguchi; Tadahiro Taniguchi; Hirokazu Kameoka; | arxiv-cs.SD | 2021-08-09 |
1318 | Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we exploit the scope of the transformer distillation method that is specifically designed for knowledge distillation from a transformer based language model to a transformer based speech model. |
Yidi Jiang; Bidisha Sharma; Maulik Madhavi; Haizhou Li; | arxiv-cs.CL | 2021-08-05 |
1319 | Dyn-ASR: Compact, Multilingual Speech Recognition Via Spoken Language and Accent Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach to enable multilingual speech recognition on edge devices. |
Sangeeta Ghangam; Daniel Whitenack; Joshua Nemecek; | arxiv-cs.CL | 2021-08-04 |
1320 | An Intelligent Hybrid–Integrated System Using Speech Recognition and A 3D Display for Early Childhood Education IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the past few years, people’s attitudes toward early childhood education (PAUD) have undergone a complete transformation. Personalized and intelligent communication methods are … |
Kun Xia; Xinghao Xie; Hongliang Fan; Haiyang Liu; | Electronics | 2021-08-03 |
1321 | Improving Distinction Between ASR Errors and Speech Disfluencies with Feature Space Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a scheme to improve existing LM-based ASR error detection systems, both in terms of detection scores and resilience to such distracting auxiliary tasks. |
SEONGMIN PARK et. al. | arxiv-cs.CL | 2021-08-03 |
1322 | Decoupling Recognition and Transcription in Mandarin ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. |
JIAHONG YUAN et. al. | arxiv-cs.CL | 2021-08-02 |
1323 | The Role of Phonetic Units in Speech Emotion Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. |
Jiahong Yuan; Xingyu Cai; Renjie Zheng; Liang Huang; Kenneth Church; | arxiv-cs.CL | 2021-08-02 |
1324 | The History of Speech Recognition to The Year 2030 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: I attempt to forecast the state of speech recognition research and applications by the year 2030. |
Awni Hannun; | arxiv-cs.CL | 2021-07-30 |
1325 | Automatic Speech Recognition (ASR) Systems for Learning Arabic Language and Al-Quran Recitation: A Review Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper provides a literature survey about Automatic Speech Recognition (ASR) systems for learning Arabic language and Al-Quran Recitation. The growth in communication … |
Nazik O’mar Balula; M. Rashwan; S. Abdou; | International Journal of Computer Science and Mobile … | 2021-07-30 |
1326 | Lightweight Adapter Tuning for Multilingual Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). |
HANG LE et. al. | acl | 2021-07-26 |
1327 | VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VoxPopuli, a large-scale multilingual corpus providing 400K hours of unlabeled speech data in 23 languages. |
CHANGHAN WANG et. al. | acl | 2021-07-26 |
1328 | Stacked Acoustic-and-Textual Encoding: Integrating The Pre-trained Models Into Speech Translation Encoders IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Stacked Acoustic-and-Textual Encoding (SATE) method for speech translation. |
CHEN XU et. al. | acl | 2021-07-26 |
1329 | OLR 2021 Challenge: Datasets, Rules and Baselines IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the sixth Oriental Language Recognition (OLR) 2021 Challenge, which intends to improve the performance of language recognition systems and speech recognition systems within multilingual scenarios. |
BINLING WANG et. al. | arxiv-cs.CL | 2021-07-23 |
1330 | Brazilian Portuguese Speech Recognition Using Wav2vec 2.0 IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this sense, this work presents the development of a public Automatic Speech Recognition (ASR) system using only openly available audio data, by fine-tuning the Wav2vec 2.0 XLSR-53 model, pre-trained on many languages, on BP data. |
Lucas Rafael Stefanel Gris; Edresson Casanova; Frederico Santos de Oliveira; Anderson da Silva Soares; Arnaldo Candido Junior; | arxiv-cs.CL | 2021-07-23 |
1331 | On Prosody Modeling for ASR+TTS Based Voice Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, in this work, we propose to directly predict prosody from the linguistic representation in a target-speaker-dependent manner, referred to as target text prediction (TTP). |
Wen-Chin Huang; Tomoki Hayashi; Xinjian Li; Shinji Watanabe; Tomoki Toda; | arxiv-cs.SD | 2021-07-20 |
1332 | A Comparison of Methods for OOV-word Recognition on A New Public Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a new tool for calculating relevant performance metrics. We showcase very large improvements in OOV-word recognition and make both the data and code available. |
Rudolf A. Braun; Srikanth Madikeri; Petr Motlicek; | arxiv-cs.CL | 2021-07-16 |
1333 | Zero-shot Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These models tend to output the wrong language when performing zero-shot ST. We tackle this issue by including additional training data and an auxiliary loss function that minimizes the text-audio difference. |
Tu Anh Dinh; | arxiv-cs.CL | 2021-07-13 |
1334 | The IWSLT 2021 BUT Speech Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study their efficiency from the perspective of having a large amount of separate ASR training data and MT training data, and a smaller amount of speech-translation training data. |
Hari Krishna Vydana; Martin Karafiát; Lukáš Burget; Honza Černocký; | arxiv-cs.CL | 2021-07-13 |
1335 | Noisy Training Improves E2E ASR for The Edge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a simple yet effective noisy training strategy to further improve the E2E ASR model training. |
DILIN WANG et. al. | arxiv-cs.CL | 2021-07-09 |
1336 | Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize compared to a strong baseline. |
Christian Huber; Juan Hussain; Sebastian Stüker; Alexander Waibel; | arxiv-cs.CL | 2021-07-05 |
1337 | Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel modeling method for single-channel multi-talker overlapped automatic speech recognition (ASR) systems. |
RYO MASUMURA et. al. | arxiv-cs.CL | 2021-07-04 |
1338 | Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a cross-modal transformer-based neural correction model that refines the output of an automatic speech recognition (ASR) system so as to exclude ASR errors. |
TOMOHIRO TANAKA et. al. | arxiv-cs.CL | 2021-07-04 |
1339 | Arabic Code-Switching Speech Recognition Using Monolingual Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments. With this study, we release artificially generated development and test sets, along with an ecological code-switching test set, to benchmark ASR performance. |
Ahmed Ali; Shammur Chowdhury; Amir Hussein; Yasser Hifny; | arxiv-cs.CL | 2021-07-04 |
1340 | Developing Children’s Speech Recognition System for Low Resource Punjabi Language IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View |
Virender Kadyan; Syed Shanawazuddin; Amitoj Singh; | Applied Acoustics | 2021-07-01 |
1341 | IMS’ Systems for The IWSLT 2021 Low-Resource Speech Translation Task Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team. We utilize state-of-the-art models combined with several data … |
Pavel Denisov; Manuel Mager; Ngoc Thang Vu; | International Workshop on Spoken Language Translation | 2021-06-30 |
1342 | Word-Free Spoken Language Understanding for Mandarin-Chinese Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Transformer-based SLU system that works directly on phones. |
Zhiyuan Guo; Yuexin Li; Guo Chen; Xingyu Chen; Akshat Gupta; | arxiv-cs.CL | 2021-06-30 |
1343 | IMS’ Systems for The IWSLT 2021 Low-Resource Speech Translation Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by IMS team. |
Pavel Denisov; Manuel Mager; Ngoc Thang Vu; | arxiv-cs.CL | 2021-06-30 |
1344 | Alzheimer’s Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer’s Disease and to what degree, evaluating the ADReSSo challenge 2021 data. |
Morteza Rohanian; Julian Hough; Matthew Purver; | arxiv-cs.CL | 2021-06-29 |
1345 | Towards Multilingual End‐to‐end Speech Recognition for Air Traffic Control IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In this work, an end-to-end framework is proposed to achieve multilingual automatic speech recognition (ASR) in air traffic control (ATC) systems. Considering the standard ATC … |
Yi Lin; Bo Yang; Dongyue Guo; Peng Fan; | IET Intelligent Transport Systems | 2021-06-22 |
1346 | Using Heterogeneity in Semi-supervised Transcription Hypotheses to Improve Code-switched Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, we propose a semi-supervised approach for code-switched ASR. |
Andrew Slottje; Shannon Wotherspoon; William Hartmann; Matthew Snover; Owen Kimball; | arxiv-cs.CL | 2021-06-14 |
1347 | SynthASR: Unlocking Synthetic Data for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to utilize synthetic speech for ASR training (SynthASR) in applications where data is sparse or hard to get for ASR model training. |
AMIN FAZEL et. al. | arxiv-cs.LG | 2021-06-14 |
1348 | Assessing The Use of Prosody in Constituency Parsing of Imperfect Transcripts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work explores constituency parsing on automatically recognized transcripts of conversational speech. |
Trang Tran; Mari Ostendorf; | arxiv-cs.CL | 2021-06-14 |
1349 | GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. |
GUOGUO CHEN et. al. | arxiv-cs.SD | 2021-06-13 |
1350 | Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In view of this, we seek in this paper to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterance, global word interaction relationships. |
Shih-Hsuan Chiu; Tien-Hong Lo; Fu-An Chao; Berlin Chen; | arxiv-cs.CL | 2021-06-13 |
1351 | PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Prune-Adjust-Re-Prune (PARP), which discovers and finetunes subnetworks for much better performance, while only requiring a single downstream ASR finetuning run. |
CHENG-I JEFF LAI et. al. | arxiv-cs.CL | 2021-06-10 |
1352 | Unsupervised Automatic Speech Recognition: A Review IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. |
Hanan Aldarmaki; Asad Ullah; Nazar Zaki; | arxiv-cs.CL | 2021-06-09 |
1353 | Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To extract learnable and adaptive features and mitigate information loss, we propose a new encoder that adopts globally attentive locally recurrent (GALR) networks and directly takes raw waveform as input. |
Max W. Y. Lam; Jun Wang; Chao Weng; Dan Su; Dong Yu; | arxiv-cs.SD | 2021-06-08 |
1354 | Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech. |
Zixuan Peng; Yu Lu; Shengfeng Pan; Yunfeng Liu; | arxiv-cs.SD | 2021-06-08 |
1355 | Vowel Non-Vowel Based Spectral Warping and Time Scale Modification for Improvement in Children’s ASR Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Acoustic differences between children’s and adults’ speech cause the degradation in the automatic speech recognition system performance when the system is trained on adults’ speech and … |
H. Kathania; Avinash Kumar; M. Kurimo; | ICASSP 2021 – 2021 IEEE International Conference on … | 2021-06-06 |
1356 | Towards Data Selection on TTS Data for Children’s Speech Recognition IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Although great progress has been made on automatic speech recognition (ASR) systems, children’s speech recognition still remains a challenging task. General ASR systems for … |
WEI WANG et. al. | ICASSP 2021 – 2021 IEEE International Conference on … | 2021-06-06 |
1357 | Semantic-WER: A Unified Metric for The Evaluation of ASR Transcript for End Usability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The reason is that the WER works at the surface level and does not include any syntactic and semantic knowledge. The current work proposes Semantic-WER (SWER), a metric to evaluate the ASR transcripts for downstream applications in general. |
Somnath Roy; | arxiv-cs.CL | 2021-06-03 |
1358 | Attention-based Contextual Language Model Adaptation for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. |
RICHARD DIEHL MARTINEZ et. al. | arxiv-cs.CL | 2021-06-02 |
1359 | Arabic Speech Recognition Using End-to-end Deep Learning IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Arabic automatic speech recognition (ASR) methods with diacritics have the ability to be integrated with other systems better than Arabic ASR methods without diacritics. In this … |
Hamzah A. Alsayadi; A. Abdelhamid; I. Hegazy; Z. Fayed; | IET Signal Process. | 2021-06-02 |
1360 | Improving Low-resource ASR Performance with Untranscribed Out-of-domain Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the issue of low-resource ASR when only untranscribed out-of-domain speech data is readily available in the target language. |
Jayadev Billa; | arxiv-cs.CL | 2021-06-02 |
1361 | CrossASR++: A Modular Differential Testing Framework for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So in this accompanying tool demo paper, we devote more engineering and propose CrossASR++, an easy-to-use ASR testing tool that can be conveniently extended to incorporate different TTS and ASR systems, and failure estimators. |
Muhammad Hilmi Asyrofi; Zhou Yang; David Lo; | arxiv-cs.SE | 2021-05-31 |
1362 | End-to-end ASR to Jointly Predict Transcriptions and Linguistic Annotations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags. |
Motoi Omachi; Yuya Fujita; Shinji Watanabe; Matthew Wiesner; | naacl | 2021-05-23 |
1363 | Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an approach, Mondegreen, to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints or privacy or bandwidth (for example, some ASR systems run on-device) considerations. |
SUKHDEEP S. SODHI et. al. | arxiv-cs.SD | 2021-05-20 |
1364 | Development of The Cuhk Elderly Speech Recognition System for Neurocognitive Disorder Detection Using The Dementiabank Corpus IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the development of a state-of-the-art automatic speech recognition (ASR) system built on the Dementia-Bank Pitt corpus for automatic NCD detection. |
Z. Ye; et al. | icassp | 2021-05-16 |
1365 | Speech Acoustic Modelling from Raw Phase Spectrum IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time phase spectrum. |
E. Loweimi; Z. Cvetkovic; P. Bell; S. Renals; | icassp | 2021-05-16 |
1366 | Phoneme-Based Distribution Regularization for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. |
Y. Liu; X. Peng; Z. Xiong; Y. Lu; | icassp | 2021-05-16 |
1367 | Meta-Learning for Improving Rare Word Recognition in End-to-End ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta learning mechanism into an E2E ASR system, enabling few-shot adaptation. |
F. Lux; N. T. Vu; | icassp | 2021-05-16 |
1368 | Federated Acoustic Modeling for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate federated acoustic modeling using data from multiple clients. |
X. Cui; S. Lu; B. Kingsbury; | icassp | 2021-05-16 |
1369 | MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). |
L. MENG et. al. | icassp | 2021-05-16 |
1370 | A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer and Large Scale Synthetic Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since our goal is to recognize the input speech, we consider enhancements which improve word error rates (WERs) when the predicted speech signal is passed to an automatic speech recognition (ASR) model. |
N. Howard; A. Park; T. Z. Shabestary; A. Gruenstein; R. Prabhavalkar; | icassp | 2021-05-16 |
1371 | Task Aware Multi-Task Learning for Speech to Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a task modulation network which allows the model to learn task specific features, while learning the shared features simultaneously. |
S. Indurthi; et al. | icassp | 2021-05-16 |
1372 | Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first apply a known decoding technique that was developed to perform single-speaker ASR for long-form audio to our E2E SA-ASR task. Then, we propose a novel method using a sequence-to-sequence model, called hypothesis stitcher. |
X. CHANG et. al. | icassp | 2021-05-16 |
1373 | Recent Developments on Espnet Toolkit Boosted By Conformer IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present recent developments on ESPnet: End-to- End Speech Processing toolkit, which mainly involves a recently proposed architecture called Conformer, Convolution-augmented Transformer. |
P. Guo; et al. | icassp | 2021-05-16 |
1374 | Construction of A Large-Scale Japanese ASR Corpus on TV Recordings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new large-scale Japanese speech corpus for training automatic speech recognition (ASR) systems. |
S. Ando; H. Fujihara; | icassp | 2021-05-16 |
1375 | The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The Accented English Speech Recognition Challenge (AESRC2020) is designed for providing a common testbed and promoting accent-related research. |
X. Shi; et al. | icassp | 2021-05-16 |
1376 | Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the neural architecture search (NAS) for automatic speech recognition (ASR) systems. |
L. He; D. Su; D. Yu; | icassp | 2021-05-16 |
1377 | End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have, therefore, conducted ML E2E ASR experiments for four less-resourced Ethiopian languages using different language and acoustic modelling units. |
S. T. Abate; M. Y. Tachbelie; T. Schultz; | icassp | 2021-05-16 |
1378 | Code-Switch Speech Rescoring with Monolingual Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the code-switch speech recognition in mainland China, which is obviously different from the Hong Kong and Southeast Asia area in linguistic characteristics. |
G. Liu; L. Cao; | icassp | 2021-05-16 |
1379 | Analysis of X-Vectors for Low-Resource Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper presents a study of usability of x-vectors for adaptation of automatic speech recognition (ASR) systems. |
M. Karafiát; et al. | icassp | 2021-05-16 |
1380 | AISpeech-SJTU ASR System for The Accented English Speech Recognition Challenge IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the AISpeech-SJTU ASR system for the Interspeech-2020 Accented English Speech Recognition Challenge (AESRC). |
T. TAN et. al. | icassp | 2021-05-16 |
1381 | Federated Marginal Personalization for ASR Rescoring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce federated marginal personalization (FMP), a novel method for continuously updating personalized neural network language models (NNLMs) on private devices using federated learning (FL). |
Z. Liu; F. Peng; | icassp | 2021-05-16 |
1382 | Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. |
Q. Li; et al. | icassp | 2021-05-16 |
1383 | Partially Overlapped Inference for Long-Form Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a more effective way of overlapped inference by aligning partially matched hypotheses. |
T. G. Kang; H. -G. Kim; M. -J. Lee; J. Lee; H. Lee; | icassp | 2021-05-16 |
1384 | Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an extended Parrotron model: a single, end-to-end network that enables voice conversion and recognition simultaneously. |
R. Doshi; et al. | icassp | 2021-05-16 |
1385 | A Comparison of Methods for OOV-Word Recognition on A New Public Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We showcase very large improvements in OOV-word recognition and make both the data and code available. |
R. A. Braun; S. Madikeri; P. Motlicek; | icassp | 2021-05-16 |
1386 | A Sequential Contrastive Learning Framework for Robust Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a contrastive learning framework for robust dysarthric speech recognition (DSR) by capturing the dysarthric speech variability. |
L. Wu; D. Zong; S. Sun; J. Zhao; | icassp | 2021-05-16 |
1387 | A Causal Deep Learning Framework for Classifying Phonemes in Cochlear Implants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a causal deep learning framework for classifying phonemes using features extracted at the time-frequency resolution of a CI processor. |
K. Chu; L. Collins; B. Mainsah; | icassp | 2021-05-16 |
1388 | End-To-End Multi-Accent Speech Recognition with Unsupervised Accent Modelling IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to grapple with such an issue, we first investigate and improve the current mainstream end-to-end multi-accent speech recognition technologies. In addition, we propose two unsupervised accent modelling methods, which convert accent information into a global embedding, and use it to improve the performance of the end-to-end multi-accent speech recognition systems. |
S. LI et. al. | icassp | 2021-05-16 |
1389 | An End-to-End Speech Accent Recognition Method Based on Hybrid CTC/Attention Transformer ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel accent recognition system in the framework of a transformer-based end-to-end speech recognition system. |
Q. Gao; H. Wu; Y. Sun; Y. Duan; | icassp | 2021-05-16 |
1390 | Dynamic Sparsity Neural Networks for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time. |
Z. WU et. al. | icassp | 2021-05-16 |
1391 | Top-Down Attention in End-to-End Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this insight, we propose Top-Down SLU (TD-SLU), a new transformer-based E2E SLU model that uses top-down attention and an attention gate to fuse high-level NLU features with low-level ASR features, which leads to a better optimization of both tasks. |
Y. Chen; et al. | icassp | 2021-05-16 |
1392 | Towards Data Selection on TTS Data for Children's Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we adopt text-to-speech data augmentation to improve the performance of children's speech recognition system. |
W. WANG et. al. | icassp | 2021-05-16 |
1393 | Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models. |
T. Doutre; et al. | icassp | 2021-05-16 |
1394 | Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet efficient neural network architecture to exploit both acoustic and lexical information from speech. |
Z. Peng; Y. Lu; S. Pan; Y. Liu; | icassp | 2021-05-16 |
1395 | BLSTM-Based Confidence Estimation for End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we perform confidence estimation for end-to-end (E2E) ASR hypotheses. |
A. Ogawa; N. Tawara; T. Kano; M. Delcroix; | icassp | 2021-05-16 |
1396 | Towards An ASR Approach Using Acoustic and Language Models for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to modify the speech estimation process, by treating speech enhancement as a classification problem in an ASR-style manner. |
K. M. Nayem; D. S. Williamson; | icassp | 2021-05-16 |
1397 | Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end (E2E) neural network manner, called directional automatic speech recognition (D-ASR), which explicitly models source speaker locations. |
A. S. Subramanian; et al. | icassp | 2021-05-16 |
1398 | Improved Mask-CTC for Non-Autoregressive End-to-End ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost the performance of Mask-CTC, we first propose to enhance the encoder network architecture by employing a recently proposed architecture called Conformer. Next, we propose new training and decoding methods by introducing auxiliary objective to predict the length of a partial target sequence, which allows the model to delete or insert tokens during inference. |
Y. Higuchi; H. Inaguma; S. Watanabe; T. Ogawa; T. Kobayashi; | icassp | 2021-05-16 |
1399 | Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes a novel technique for child ASR using both feature normalization and data augmentation methods based on the relationship between formants and fundamental frequency (fo). |
G. Yeung; R. Fan; A. Alwan; | icassp | 2021-05-16 |
1400 | Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training labels. |
N. Moritz; T. Hori; J. L. Roux; | icassp | 2021-05-16 |
1401 | Meta-Adapter: Efficient Cross-Lingual Adaptation With Meta-Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to combine the adapter module with meta-learning algorithms to achieve high recognition performance under low-resource settings and improve the parameter-efficiency of the model. |
W. Hou; Y. Wang; S. Gao; T. Shinozaki; | icassp | 2021-05-16 |
1402 | Synthesis of New Words for Improved Dysarthric Speech Recognition on An Expanded Vocabulary IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a data augmentation method using voice conversion that allows dysarthric ASR systems to accurately recognize words outside of the training set vocabulary. |
J. Harvill; D. Issa; M. Hasegawa-Johnson; C. Yoo; | icassp | 2021-05-16 |
1403 | Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Memory-Self-Attention (MSA), which adds history information into the Restricted-Self-Attention unit. |
J. Luo; J. Wang; N. Cheng; J. Xiao; | icassp | 2021-05-16 |
1404 | Content-Aware Speaker Embeddings for Speaker Diarisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the content-aware speaker embeddings (CASE) approach is proposed, which extends the input of the speaker classifier to include not only acoustic features but also their corresponding speech content, via phone, character, and word embeddings. |
G. Sun; D. Liu; C. Zhang; P. C. Woodland; | icassp | 2021-05-16 |
1405 | ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel all deep learning MVDR framework, where the matrix inversion and eigenvalue decomposition are replaced by two recurrent neural networks (RNNs), to resolve both issues at the same time. |
Z. ZHANG et. al. | icassp | 2021-05-16 |
1406 | Joint Masked CPC And CTC Training For ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we demonstrate a single-stage training of ASR models that can utilize both unlabeled and labeled data. |
C. Talnikar; T. Likhomanenko; R. Collobert; G. Synnaeve; | icassp | 2021-05-16 |
1407 | A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a progressive learning-based adaptive noise and speech estimation (PL-ANSE) method for speech preprocessing in noisy speech recognition, leveraging upon a frame-level noise tracking capability of improved minima controlled recursive averaging (IMCRA) and an utterance-level deep progressive learning of nonlinear interactions between speech and noise. |
Z. Nian; Y. -H. Tu; J. Du; C. -H. Lee; | icassp | 2021-05-16 |
1408 | End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the multichannel multi-speaker reverberant condition, and propose to extend our previous framework for end-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend subnetworks including voice activity detection like masks. |
W. Zhang; et al. | icassp | 2021-05-16 |
1409 | Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a method to pre-train transformer-based encoder-decoder automatic speech recognition (ASR) models using sufficient target-domain text. |
C. GAO et. al. | icassp | 2021-05-16 |
1410 | Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and its effective training method based on knowledge distillation. |
R. MASUMURA et. al. | icassp | 2021-05-16 |
1411 | Echo State Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose automatic speech recognition (ASR) models inspired by echo state network (ESN) [1], in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained. |
H. Shrivastava; A. Garg; Y. Cao; Y. Zhang; T. Sainath; | icassp | 2021-05-16 |
1412 | Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed for improved inference time latency on speech recognition tasks. |
J. Macoskey; G. P. Strimel; A. Rastrow; | icassp | 2021-05-16 |
1413 | Transformer-Transducers for Code-Switched Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. |
S. Dalmia; Y. Liu; S. Ronanki; K. Kirchhoff; | icassp | 2021-05-16 |
1414 | A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a general multi-task learning framework to leverage text data for ASR and ST tasks. |
Y. Tang; J. Pino; C. Wang; X. Ma; D. Genzel; | icassp | 2021-05-16 |
1415 | Multilingual Phonetic Dataset for Low Resource Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a large multilingual phonetic dataset, which is preprocessed and aligned from the UCLA phonetic dataset. |
X. Li; D. R. Mortensen; F. Metze; A. W. Black; | icassp | 2021-05-16 |
1416 | Using Synthetic Audio to Improve The Recognition of Out-of-Vocabulary Words in End-to-End Asr Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use a text-to-speech (TTS) engine to provide synthetic audio for out-of-vocabulary (OOV) words. |
X. Zheng; Y. Liu; D. Gunceler; D. Willett; | icassp | 2021-05-16 |
1417 | Unsupervised Domain Adaptation for Speech Recognition Via Uncertainty Driven Self-Training IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that self-training (ST) combined with an uncertainty-based pseudo-label filtering approach can be effectively used for domain adaptation. |
S. Khurana; N. Moritz; T. Hori; J. L. Roux; | icassp | 2021-05-16 |
1418 | Speech Recognition By Simply Fine-Tuning Bert IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. |
W. -C. HUANG et. al. | icassp | 2021-05-16 |
1419 | Emotion Recognition By Fusing Time Synchronous and Time Asynchronous Representations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB). |
W. Wu; C. Zhang; P. C. Woodland; | icassp | 2021-05-16 |
1420 | SEQ-CPC : Sequential Contrastive Predictive Coding for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the contrastive predictive coding (CPC), we propose a feature representation scheme for automatic speech recognition (ASR), which encodes sequential dependency information from raw audio signals. |
Y. Chen; et al. | icassp | 2021-05-16 |
1421 | Cascaded Models with Cyclic Feedback for Direct Speech Translation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data in addition to out-of-domain MT and ASR data. |
T. K. Lam; S. Schamoni; S. Riezler; | icassp | 2021-05-16 |
1422 | Vowel Non-Vowel Based Spectral Warping and Time Scale Modification for Improvement in Children's ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a linear prediction based spectral warping method that uses the knowledge of vowel and non-vowel regions in speech signals to mitigate the formant frequency differences between child and adult speakers. |
H. Kathania; A. Kumar; M. Kurimo; | icassp | 2021-05-16 |
1423 | Cascaded Encoders for Unifying Streaming and Non-Streaming ASR IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents cascaded encoders for building a single E2E ASR model that can operate in both these modes simultaneously. |
A. Narayanan; et al. | icassp | 2021-05-16 |
1424 | An Investigation of End-to-End Models for Robust Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address this gap and present a detailed comparison of speech enhancement-based techniques and three different model-based adaptation techniques covering data augmentation, multi-task learning, and adversarial learning for robust ASR. |
A. Prasad; P. Jyothi; R. Velmurugan; | icassp | 2021-05-16 |
1425 | Reducing Spelling Inconsistencies in Code-Switching ASR Using Contextualized CTC Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies of a character-based non-autoregressive ASR which allows for faster inference. |
B. Naowarat; T. Kongthaworn; K. Karunratanakul; S. H. Wu; E. Chuangsuwanich; | icassp | 2021-05-16 |
1426 | How Phonotactics Affect Multilingual and Zero-Shot ASR Performance IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that the gain from modeling crosslingual phonotactics is limited, and imposing a too strong model can hurt the zero-shot transfer. |
S. Feng; et al. | icassp | 2021-05-16 |
1427 | Refining Automatic Speech Recognition System for Older Adults Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With 12 hours of training data, we attempt to develop an ASR system for socially isolated seniors (80+ years old) with possible cognitive impairments. |
L. Chen; M. Asgari; | icassp | 2021-05-16 |
1428 | Multiple-Hypothesis CTC-Based Semi-Supervised Adaptation of End-to-End Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an adaptation method for end-to-end speech recognition. |
C. -T. Do; R. Doddipatla; T. Hain; | icassp | 2021-05-16 |
1429 | Improved Robustness to Disfluencies in Rnn-Transducer Based Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies with a focus on partial words. |
V. Mendelev; T. Raissi; G. Camporese; M. Giollo; | icassp | 2021-05-16 |
1430 | Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a speaker-attributed minimum Bayes risk (SA-MBR) training method where the parameters are trained to directly minimize the expected SA-WER over the training data. |
N. Kanda; et al. | icassp | 2021-05-16 |
1431 | Multi-Task Transformer with Input Feature Reconstruction for Dysarthric Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a multi-task Transformer with input feature reconstruction as an auxiliary task, where the main task of DSR and the auxiliary reconstruction task share the same encoder network. |
C. Ding; S. Sun; J. Zhao; | icassp | 2021-05-16 |
1432 | Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we propose an enhanced ASR-TTS (EAT) model that incorporates two main features: 1) The ASR→TTS direction is equipped with a language model reward to penalize the ASR hypotheses before forwarding it to TTS. |
M. K. Baskar; L. Burget; S. Watanabe; R. F. Astudillo; J. Černocký; | icassp | 2021-05-16 |
1433 | Streaming Multi-Speaker ASR with RNN-T IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate two approaches to multi-speaker model training of the RNN-T: deterministic output-target assignment and permutation invariant training. |
I. Sklyar; A. Piunova; Y. Liu; | icassp | 2021-05-16 |
1434 | Non-Intrusive Binaural Prediction of Speech Intelligibility Based on Phoneme Classification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. |
J. Roßbach; S. Röttges; C. F. Hauth; T. Brand; B. T. Meyer; | icassp | 2021-05-16 |
1435 | Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-Training and Its Application to Children's ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a bidirectional unsupervised model pre-training (UPT) method and apply it to children's automatic speech recognition (ASR). |
R. Fan; A. Afshan; A. Alwan; | icassp | 2021-05-16 |
1436 | ASR N-Best Fusion Nets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a fusion network to jointly consider ASR n-best hypotheses for enhanced robustness to ASR errors. |
X. Liu; et al. | icassp | 2021-05-16 |
1437 | Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. |
J. Liao; et al. | icassp | 2021-05-16 |
1438 | Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. |
Khin Me Me Chit; Laet Laet Lin; | arxiv-cs.LG | 2021-05-13 |
1439 | What Shall We Do with An Hour of Data? Speech Recognition for The Un- and Under-served Languages of Common Voice IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This technical report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project. |
Francis M. Tyers; Josh Meyer; | arxiv-cs.CL | 2021-05-10 |
1440 | FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. |
YICHONG LENG et. al. | arxiv-cs.CL | 2021-05-09 |
1441 | End-to-End Speech Recognition from Federated Acoustic Models IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. |
YAN GAO et. al. | arxiv-cs.SD | 2021-04-29 |
1442 | Using Radio Archives for Low-Resource Speech Recognition: Towards An Intelligent Virtual Assistant for Illiterate Users IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the effectiveness of unsupervised speech representation learning on noisy radio broadcasting archives, which are abundant even in low-resource languages. First, we release two datasets to the research community. |
Moussa Doumbouya; Lisa Einstein; Chris Piech; | arxiv-cs.LG | 2021-04-27 |
1443 | LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. |
SOLENE EVAIN et. al. | arxiv-cs.CL | 2021-04-23 |
1444 | Earnings-21: A Practical Benchmark for ASR in The Wild IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. |
MIGUEL DEL RIO et. al. | arxiv-cs.CL | 2021-04-22 |
1445 | Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming at exploring the full potentials of phonetic information to improve SLU robustness to ASR errors. |
Qian Chen; Wen Wang; Qinglin Zhang; | arxiv-cs.CL | 2021-04-21 |
1446 | Accented Speech Recognition: A Survey IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a survey of current promising approaches to accented speech recognition and highlight the key challenges in the space. |
ARTHUR HINSVARK et. al. | arxiv-cs.CL | 2021-04-21 |
1447 | On The Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for The Deep Learning Era IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to tackle the above issues, we create transcripts from the original speech by applying three modern ASR systems, including an end-to-end model trained with recurrent neural network-transducer loss, a model with connectionist temporal classification loss, and a wav2vec framework for self-supervised learning. |
SHAHIN AMIRIPARIAN et. al. | arxiv-cs.SD | 2021-04-20 |
1448 | Discriminative Self-training for Punctuation Prediction IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. |
Qian Chen; Wen Wang; Mengzhe Chen; Qinglin Zhang; | arxiv-cs.CL | 2021-04-20 |
1449 | Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention. |
Takaaki Hori; Niko Moritz; Chiori Hori; Jonathan Le Roux; | arxiv-cs.CL | 2021-04-19 |
1450 | Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an acoustic data-driven subword modeling (ADSM) approach that adapts the advantages of several text-based and acoustic-based subword methods into one pipeline. |
Wei Zhou; Mohammad Zeineldeen; Zuoyun Zheng; Ralf Schlüter; Hermann Ney; | arxiv-cs.CL | 2021-04-19 |
1451 | Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on the unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains. |
Wenxin Hou; Jindong Wang; Xu Tan; Tao Qin; Takahiro Shinozaki; | arxiv-cs.SD | 2021-04-15 |
1452 | Experiments of ASR-based Mispronunciation Detection for Children and Adult English Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, to detect mispronunciations, we used a phone-based ASR implemented using Kaldi. |
Nina Hosseini-Kivanani; Roberto Gretter; Marco Matassoni; Giuseppe Daniele Falavigna; | arxiv-cs.CL | 2021-04-13 |
1453 | Comparing The Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the effect of varying pre-processing, the speaker embedding and input encoding of the TTS system w.r.t. the effectiveness of the synthesized data for AED-ASR training. |
Nick Rossenbach; Mohammad Zeineldeen; Benedikt Hilmes; Ralf Schlüter; Hermann Ney; | arxiv-cs.CL | 2021-04-12 |
1454 | Non-autoregressive Transformer-based End-to-end ASR Using BERT IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, to not only inherit the advantages of non-autoregressive ASR models but also enjoy the benefits of a pre-trained language model (e.g., BERT), we propose a non-autoregressive Transformer-based end-to-end ASR model based on BERT. |
Fu-Hao Yu; Kuan-Yu Chen; | arxiv-cs.CL | 2021-04-10 |
1455 | Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the TTS->ASR pipeline in speech chain to do domain adaptation for both neural TTS and E2E ASR models, with only text data from target domain. |
Fengpeng Yue; Yan Deng; Lei He; Tom Ko; | arxiv-cs.CL | 2021-04-08 |
1456 | WNARS: WFST Based Non-autoregressive Streaming End-to-End Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, namely WNARS, using hybrid CTC-attention AED models and weighted finite-state transducers (WFST) to solve these problems together. |
Zhichao Wang; Wenwen Yang; Pan Zhou; Wei Chen; | arxiv-cs.SD | 2021-04-08 |
1457 | BSTC: A Large-Scale Chinese-English Speech Translation Dataset IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-English speech translation dataset. |
RUIQING ZHANG et. al. | arxiv-cs.CL | 2021-04-08 |
1458 | Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the first challenge, we propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both. |
SUJEONG CHA et. al. | arxiv-cs.CL | 2021-04-07 |
1459 | Towards An Automatic Speech-Based Diagnostic Test for Alzheimer’s Disease IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Automatic Speech Recognition (ASR) is widely used in many applications and tools. Smartphones, video games, and cars are a few examples where people use ASR routinely and often … |
Roozbeh Sadeghian; J. Schaffer; S. Zahorian; | Frontiers of Computer Science | 2021-04-07 |
1460 | EasyCall Corpus: A Dysarthric Speech Dataset IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a new dysarthric speech command dataset in Italian, called EasyCall corpus. |
ROSANNA TURRISI et. al. | arxiv-cs.CL | 2021-04-06 |
1461 | Non-autoregressive Mandarin-English Code-switching Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate these methods on the SEAME corpus and achieved exciting results. |
Shun-Po Chuang; Heng-Jui Chang; Sung-Feng Huang; Hung-yi Lee; | arxiv-cs.CL | 2021-04-05 |
1462 | Streaming Multi-talker Speech Recognition with Joint Speaker Identification IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Streaming Unmixing, Recognition and Identification Transducer (SURIT) — a new framework that deals with this problem in an end-to-end streaming fashion. |
Liang Lu; Naoyuki Kanda; Jinyu Li; Yifan Gong; | arxiv-cs.SD | 2021-04-05 |
1463 | Dissecting User-Perceived Latency of On-Device E2E Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work examines the impact of various techniques – model architectures, training criteria, decoding hyperparameters, and endpointer parameters – on UPL. |
YUAN SHANGGUAN et. al. | arxiv-cs.SD | 2021-04-05 |
1464 | Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a novel Semantic Distance (SemDist) measure as an alternative evaluation metric for ASR systems to address this issue. |
SUYOUN KIM et. al. | arxiv-cs.CL | 2021-04-05 |
1465 | Talk, Don’t Write: A Study of Direct Speech-Based Image Retrieval IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extensively study and expand choices of encoder architectures, training methodology (including unimodal and multimodal pretraining), and other factors. |
Ramon Sanabria; Austin Waters; Jason Baldridge; | arxiv-cs.CL | 2021-04-05 |
1466 | On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. |
Tsz Kin Lam; Mayumi Ohta; Shigehiko Schamoni; Stefan Riezler; | arxiv-cs.CL | 2021-04-03 |
1467 | Tusom2021: A Phonetically Transcribed Speech Dataset from An Endangered Language for Universal Phone Recognition Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a publicly available, phonetically transcribed corpus of 2255 utterances (words and short phrases) in the endangered Tangkhulic language East Tusom (no ISO 639-3 code), a Tibeto-Burman language variety spoken mostly in India. |
David R. Mortensen; Jordan Picone; Xinjian Li; Kathleen Siminyu; | arxiv-cs.CL | 2021-04-01 |
1468 | Configurable Privacy-Preserving Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate whether modular automatic speech recognition (ASR) can improve privacy in voice assistive systems by combining independently trained separation, recognition, and discretization modules to design configurable privacy-preserving ASR systems. |
Ranya Aloufi; Hamed Haddadi; David Boyle; | arxiv-cs.CL | 2021-04-01 |
1469 | Multilingual and Code-switching ASR Challenges for Low Resource Indian Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English. |
ANUJ DIWAN et. al. | arxiv-cs.CL | 2021-03-31 |
1470 | Multiple-hypothesis CTC-based Semi-supervised Adaptation of End-to-end Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an adaptation method for end-to-end speech recognition. |
Cong-Thanh Do; Rama Doddipatla; Thomas Hain; | arxiv-cs.CL | 2021-03-29 |
1471 | An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results compared with the ASR models trained with a small amount of … |
J. Jeong; S. I. M. M. R. Mondol; Y. Kim; Sangmin Lee; | Electronics | 2021-03-29 |
1472 | Construction of A Large-scale Japanese ASR Corpus on TV Recordings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new large-scale Japanese speech corpus for training automatic speech recognition (ASR) systems. |
Shintaro Ando; Hiromasa Fujihara; | arxiv-cs.SD | 2021-03-26 |
1473 | Leveraging Pre-trained Representations to Improve Access to Untranscribed Speech from Endangered Languages IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using data from 7 Australian Aboriginal languages and a regional variety of Dutch, all of which are endangered or vulnerable, we show that QbE-STD can be improved by leveraging representations developed for ASR (wav2vec 2.0: the English monolingual model and XLSR53 multilingual model). |
NAY SAN et. al. | arxiv-cs.CL | 2021-03-26 |
1474 | An Approach to Improve Robustness of NLP Systems Against ASR Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we utilize the prevalent pre-trained language model to generate training samples with ASR-plausible noise. |
Tong Cui; Jinghui Xiao; Liangyou Li; Xin Jiang; Qun Liu; | arxiv-cs.CL | 2021-03-25 |
1475 | Real-time Low-resource Phoneme Recognition on Edge Devices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The method presented in this paper shows how to create and train models for speech recognition in any language which are not only highly accurate, but also require very little storage, memory and training data when compared with traditional models. |
Yonatan Alon; | arxiv-cs.CL | 2021-03-25 |
1476 | Hallucination of Speech Recognition Errors with Sequence to Sequence Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present novel end-to-end models to directly predict hallucinated ASR word sequence outputs, conditioning on an input word sequence as well as a corresponding phoneme sequence. |
Prashant Serai; Vishal Sunder; Eric Fosler-Lussier; | arxiv-cs.CL | 2021-03-22 |
1477 | SoK: A Modularized Approach to Study The Security of Automatic Speech Recognition Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present our systematization of knowledge for ASR security and provide a comprehensive taxonomy for existing work based on a modularized workflow. |
YUXUAN CHEN et. al. | arxiv-cs.CR | 2021-03-19 |
1478 | Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore different ways to incorporate context into a LSTM based NLM in order to model long range dependencies and improve speech recognition. |
Ashish Shenoy; Sravan Bodapati; Katrin Kirchhoff; | arxiv-cs.CL | 2021-03-18 |
1479 | Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Transformer-based ASR model with a time reduction layer, in which we incorporate the time reduction layer inside the Transformer encoder layers in addition to traditional sub-sampling methods applied to input features, further reducing the frame rate. |
Md Akmal Haidar; Chao Xing; Mehdi Rezagholizadeh; | arxiv-cs.AI | 2021-03-17 |
1480 | Fast Development of ASR in African Languages Using Self Supervised Speech Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. |
Jama Hussein Mohamud; Lloyd Acquaye Thompson; Aissatou Ndoye; Laurent Besacier; | arxiv-cs.SD | 2021-03-16 |
1481 | OkwuGbé: End-to-End Speech Recognition for Fon and Igbo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the growing awareness and effort to include more low-resourced languages in NLP research, African languages have recently been a major subject of research in machine translation, and other text-based areas of NLP. |
Bonaventure F. P. Dossou; Chris C. Emezue; | arxiv-cs.CL | 2021-03-13 |
1482 | Continuous Speech Separation with Ad Hoc Microphone Arrays IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several techniques are introduced to enable speech separation for real continuous recordings. |
DONGMEI WANG et. al. | arxiv-cs.SD | 2021-03-03 |
1483 | Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a deep learning-based algorithm to improve the performance of automatic speech recognition (ASR) systems for aphasia, apraxia, and dysarthria speech by utilizing electroencephalography (EEG) features recorded synchronously with aphasia, apraxia, and dysarthria speech. |
GAUTAM KRISHNA et. al. | arxiv-cs.SD | 2021-02-27 |
1484 | MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). |
LINGHUI MENG et. al. | arxiv-cs.CL | 2021-02-24 |
1485 | Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. |
JUNWEI LIAO et. al. | arxiv-cs.CL | 2021-02-22 |
1486 | The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The variety of accents has posed a big challenge to speech recognition. The Accented English Speech Recognition Challenge (AESRC2020) is designed for providing a common testbed … |
XIAN SHI et. al. | arxiv-cs.SD | 2021-02-19 |
1487 | Echo State Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose automatic speech recognition (ASR) models inspired by echo state network (ESN), in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained. |
Harsh Shrivastava; Ankush Garg; Yuan Cao; Yu Zhang; Tara Sainath; | arxiv-cs.CL | 2021-02-17 |
1488 | ATCSpeechNet: A Multilingual End-to-end Speech Recognition Framework for Air Traffic Control Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. |
YI LIN et. al. | arxiv-cs.CL | 2021-02-16 |
1489 | Improving Speech Recognition Models with Small Samples for Air Traffic Control Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. |
YI LIN et. al. | arxiv-cs.SD | 2021-02-16 |
1490 | Jira: A Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first large vocabulary speech recognition system (LVSR) for the Central Kurdish language, named Jira. To fill this gap, we introduce the first speech corpus and pronunciation lexicon for the Kurdish language. |
Hadi Veisi; Hawre Hosseini; Mohammad Mohammadamini; Wirya Fathy; Aso Mahmudi; | arxiv-cs.AI | 2021-02-15 |
1491 | Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and its effective training method based on knowledge distillation. |
RYO MASUMURA et. al. | arxiv-cs.CL | 2021-02-15 |
1492 | Thank You for Attention: A Survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey paper, a comprehensive review of the different attention models used in developing automatic speech recognition systems is provided. |
Priyabrata Karmakar; Shyh Wei Teng; Guojun Lu; | arxiv-cs.SD | 2021-02-14 |
1493 | Exploring Transfer Learning For End-to-End Spoken Language Understanding IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an E2E system that is designed to jointly train on multiple speech-to-text tasks, such as ASR (speech-transcription) and SLU (speech-hypothesis), and text-to-text tasks, such as NLU (text-hypothesis). |
SUBENDHU RONGALI et. al. | aaai | 2021-02-09 |
1494 | BembaSpeech: A Speech Recognition Corpus for The Bemba Language IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting of over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia. |
Claytone Sikasote; Antonios Anastasopoulos; | arxiv-cs.CL | 2021-02-09 |
1495 | Federated Acoustic Modeling For Automatic Speech Recognition IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate federated acoustic modeling using data from multiple clients. |
Xiaodong Cui; Songtao Lu; Brian Kingsbury; | arxiv-cs.SD | 2021-02-08 |
1496 | Effects of Layer Freezing on Transferring A Speech Recognition System to Under-resourced Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the effect of layer freezing on the effectiveness of model transfer in the area of automatic speech recognition. |
Onno Eberhard; Torsten Zesch; | arxiv-cs.CL | 2021-02-08 |
1497 | A Bandit Approach to Curriculum Generation for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an approach to mitigate the lack of training data by employing Automated Curriculum Learning in combination with an adversarial bandit approach inspired by Reinforcement learning. |
Anastasia Kuznetsova; Anurag Kumar; Francis M. Tyers; | arxiv-cs.CL | 2021-02-06 |
1498 | Effects of Number of Filters of Convolutional Layers on Speech Recognition Model Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the progress of the End-to-End approach [1], this paper systematically studies the effects of Number of Filters of convolutional layers on the model prediction accuracy of CNN+RNN (Convolutional Neural Networks adding to Recurrent Neural Networks) for ASR Models (Automatic Speech Recognition). |
James Mou; Jun Li; | arxiv-cs.LG | 2021-02-03 |
1499 | WeNet: Production Oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. |
ZHUOYUAN YAO et. al. | arxiv-cs.SD | 2021-02-02 |
1500 | The Multilingual TEDx Corpus for Speech Recognition and Translation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. |
ELIZABETH SALESKY et. al. | arxiv-cs.CL | 2021-02-02 |