Paper Digest: ICASSP 2023 Highlights
The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is one of the top signal processing conferences in the world. In 2023, it is to be held on the island of Rhodes, Greece.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. Such models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICASSP 2023 Highlights
Paper | Author(s) | |
---|---|---|
1 | 2DSBG: A 2D Semi Bi-Gaussian Filter Adapted for Adjacent and Multi-Scale Line Feature Detection. Highlight: In this paper, a new filter composed of a bi-Gaussian and a semi-Gaussian kernel is proposed, capable of highlighting complex linear structures such as ridges and valleys of different widths, with noise robustness. |
B. Magnier; G. S. Shokouh; L. Berthier; M. Pie; A. Ruggiero; |
2 | 3D Audio Signal Processing Systems for Speech Enhancement and Sound Localization and Detection. Highlight: In this paper, we propose a two-stage system based on DPRNN and UNet for the SE task and a Conformer-based system for the SELD task. |
J. Bai; S. Huang; H. Yin; Y. Jia; M. Wang; J. Chen; |
3 | 3D Point Cloud Completion Based on Multi-Scale Degradation. Highlight: To explore unsupervised 3D point cloud completion methods that give attention to both, we propose a multi-resolution completion net (MRC-Net), which introduces a multi-scale degradation (KM-mask) and a multi-discriminator into the GAN inversion paradigm. |
J. Long; Q. Zhu; H. He; Z. Yu; Q. Zhang; Z. Zhang; |
4 | 6G Integrated Sensing and Communication – Sensing Assisted Environmental Reconstruction and Communication. Highlight: We propose a multiple transmission reception point (TRP) sensing architecture based on the scatter polygon assumption to improve environment sensing accuracy. |
Z. Zhou; X. Li; J. He; X. Bi; Y. Chen; G. Wang; P. Zhu; |
5 | A2S-NAS: Asymmetric Spectral-Spatial Neural Architecture Search for Hyperspectral Image Classification. Highlight: Meanwhile, many previous works ignore the asymmetric spectral-spatial dimensions in HSI. To address these issues, we propose a multi-stage search architecture to overcome asymmetric spectral-spatial dimensions and capture significant features. |
L. Zhan; J. Fan; P. Ye; J. Cao; |
6 | A 3D-Assisted Framework to Evaluate The Quality of Head Motion Replication By Reenactment DEEPFAKE Generators. Highlight: In this article, we focus on the quality of head motion replication by deepfake generators that use a pilot video of a particular person to animate a single source image of another person. |
S. Husseini; J. -L. Dugelay; F. Aili; E. Nars; |
7 | A3S: Adversarial Learning of Semantic Representations for Scene-Text Spotting. Highlight: Text in natural scene images tends not to be a random string of characters but a meaningful one, i.e., a word. Therefore, we propose adversarial learning of semantic representations for scene text spotting (A3S) to improve end-to-end accuracy, including text recognition. |
M. Fujitake; |
8 | A Bandit Online Convex Optimization Approach To Distributed Energy Management In Networked Systems. Highlight: In this work, we study the energy-sharing problem in a system consisting of several distributed energy resources (DERs). |
I. Tsetis; X. Cheng; S. Maghsudi; |
9 | A Bayesian Perspective for Determinant Minimization Based Robust Structured Matrix Factorization. Highlight: We show that the corresponding maximum a posteriori estimation problem boils down to the robust determinant minimization approach for structured matrix factorization, providing insights about parameter selections and potential algorithmic extensions. |
G. Tatli; A. T. Erdogan; |
10 | A Bayesian Perspective on Noise2Noise: Theory and Extensions. Highlight: This paper presents a Bayesian counterpart to the original Noise2Noise formulation, with a fully stochastic treatment of the latent variable. |
S. Miller; C. Karam; A. Idoughi; K. Kikuchi; K. Hirakawa; |
11 | A Benchmark for Evaluating Robustness of Spoken Language Understanding Models in Slot Filling. Highlight: Our experiments and analysis reveal that all six SLU models suffer significant performance degradation on NASE. |
M. Peng; X. Jia; M. Peng; |
12 | A Bidirectional Joint Model for Spoken Language Understanding. Highlight: In this paper, we propose a bidirectional joint model for SLU that explicitly incorporates intent information into slot filling and slot information into intent detection. |
N. A. Tu; D. Xuan Hieu; T. M. Phuong; N. Xuan Bach; |
13 | Absolute Decision Corrupts Absolutely: Conservative Online Speaker Diarisation. Highlight: Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains. |
Y. Kwon; H. -S. Heo; B. -J. Lee; Y. J. Kim; J. -W. Jung; |
14 | Abstract Representation for Multi-Intent Spoken Language Understanding. Highlight: In this study, we propose a new way to project annotations into an abstract structure with more compositional expressive power, and a model that directly generates this abstract structure. |
R. Abrougui; G. Damnati; J. Heinecke; F. Béchet; |
15 | Abusive Activity Detection with Multi-Modality Based on Convolutional Neural Network. Highlight: However, abusive activity is difficult to detect because it takes various forms and is not easy to define. Therefore, in this study, we attempt to detect it using a Convolutional Neural Network (CNN). |
J. Kim; H. Ahn; B. Yoo; |
16 | A Causal Convolutional Approach for Packet Loss Concealment in Low Powered Devices. Highlight: This paper presents a deep learning model for audio Packet Loss Concealment (PLC) in real-time communications that is accurate and lightweight, with a low inference time suitable for low-powered mobile handsets. |
S. Davy; N. Belton; J. Tobin; O. B. Zuber; L. Dong; Y. Xuewen; |
17 | Accelerated Distributed Stochastic Non-Convex Optimization Over Time-Varying Directed Networks. Highlight: The network nodes, which can access only their local objectives and query a stochastic first-order oracle for the gradient estimates, collaborate by exchanging messages with their neighbors to minimize a global objective function. We propose an algorithm for non-convex optimization problems in such settings that leverages stochastic gradient descent with momentum and gradient tracking. |
Y. Chen; A. Hashemi; H. Vikalo; |
18 | Accelerated Massive MIMO Detector Based on Annealed Underdamped Langevin Dynamics. Highlight: We propose a multiple-input multiple-output (MIMO) detector based on an annealed version of the underdamped Langevin (stochastic) dynamics. |
N. Zilberstein; C. Dick; R. Doost-Mohammady; A. Sabharwal; S. Segarra; |
19 | Accelerating Matrix Trace Estimation By Aitken’s Δ² Process. Highlight: We present an algorithm to estimate the trace of symmetric matrices that are available only via matrix-vector multiplication. |
V. Kalantzis; G. Kollias; S. Ubaru; T. Salonidis; |
20 | Accelerating RNN-T Training and Inference Using CTC Guidance. Highlight: We propose a novel method to accelerate the training and inference of the recurrent neural network transducer (RNN-T) based on guidance from a co-trained connectionist temporal classification (CTC) model. |
Y. Wang; Z. Chen; C. Zheng; Y. Zhang; W. Han; P. Haghani; |
21 | Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models. Highlight: In this paper, we extend previous self-supervised approaches for language identification by experimenting with a Conformer-based architecture in a multilingual pre-training paradigm. |
T. M. Bartley; F. Jia; K. C. Puvvada; S. Kriman; B. Ginsburg; |
22 | ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations. Highlight: In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning. |
S. Hussain; P. Neekhara; J. Huang; J. Li; B. Ginsburg; |
23 | ACF: Aligned Contrastive Finetuning For Language and Vision Tasks. Highlight: We propose a novel aligned contrastive finetuning (ACF) approach in this work. |
W. Zhu; P. Wang; X. Wang; Y. Ni; G. Xie; |
24 | Achievable Error Exponents for Almost Fixed-Length M-Ary Hypothesis Testing. Highlight: We revisit multiple hypothesis testing and propose a two-phase test, where each phase is a fixed-length test and the second phase proceeds only if a reject option is decided in the first phase. |
J. Diao; L. Zhou; L. Bai; |
25 | Achieving Fair Speech Emotion Recognition Via Perceptual Fairness. Highlight: In this work, we propose a two-stage framework, which produces debiased representations by using a fairness-constrained adversarial framework in the first stage. |
W. -S. Chien; C. -C. Lee; |
26 | A Closer Look At Scoring Functions And Generalization Prediction. Highlight: However, GEPs often utilize disparate mechanisms (e.g., regressors, thresholding functions, calibration datasets, etc.) to derive such error estimates, which can obfuscate the benefits of a particular scoring function. Therefore, in this work, we rigorously study the effectiveness of popular scoring functions (confidence, local manifold smoothness, model agreement), independent of mechanism choice. |
P. Trivedi; D. Koutra; J. J. Thiagarajan; |
27 | A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale. Highlight: In this work, we compare three state-of-the-art semi-supervised methods encompassing both unpaired text and audio as well as several of their combinations in a controlled setting using joint training. |
C. Peyser; M. Picheny; K. Cho; R. Prabhavalkar; W. R. Huang; T. N. Sainath; |
28 | A Compensated Shrinkage Affine Projection Algorithm for Debiased Sparse Adaptive Filtering. Highlight: In this paper, we propose a novel sparse adaptive filtering algorithm termed compensated shrinkage affine projection algorithm (CS-APA). |
Y. Zhang; I. Yamada; |
29 | A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution. Highlight: In this paper, we find that different projection methods have a great impact on the performance of DNNs. |
H. Pi; S. Tian; M. Lu; J. Liu; Y. Guo; S. Zhang; |
30 | A Computationally Efficient Algorithm for Distributed Adaptive Signal Fusion Based on Fractional Programs. Highlight: In this paper, we focus on Dinkelbach’s iterative procedure to solve fractional programs, i.e., problems whose objective function is a ratio of two continuous functions. |
C. A. Musluoglu; A. Bertrand; |
31 | A Content Adaptive Learnable Time-Frequency Representation for Audio Signal Processing. Highlight: In this work, we propose a way of computing a content-adaptive learnable time-frequency representation. |
P. Verma; C. Chafe; |
32 | A Content-Based Multi-Scale Network for Single Image Super-Resolution. Highlight: A novel content-based multi-scale network (CMNet) is proposed in this paper for conducting single image super-resolution (SISR). |
J. Ji; B. Zhong; K. -K. Mu; |
33 | A Context-Aware Computational Approach for Measuring Vocal Entrainment in Dyadic Conversations. Highlight: We propose a context-aware approach for measuring vocal entrainment in dyadic conversations. |
R. Lahiri; M. Nasir; C. Lord; S. H. Kim; S. Narayanan; |
34 | A Contrastive Embedding-Based Domain Adaptation Method for Lung Sound Recognition in Children Community-Acquired Pneumonia. Highlight: Data scarcity further exacerbates this problem. Therefore, we propose a contrastive embedding-based domain adaptation network (CEDANN) to eliminate individual differences and alleviate data scarcity, thereby improving generalization ability. |
D. Huang; L. Wang; H. Lu; W. Wang; |
35 | A Contrastive Framework to Enhance Unsupervised Sentence Representation Learning. Highlight: However, these models often suffer from semantic monotonicity, sampling bias, and a training effect that depends on batch size. To solve these problems, this paper proposes a contrastive framework (CEUR) to enhance unsupervised sentence representation learning. |
H. Ma; Z. Li; H. Guo; |
36 | A Contrastive Knowledge Transfer Framework for Model Compression and Transfer Learning. Highlight: However, these works overlook the high-dimensional structural knowledge in the intermediate representations of the teacher, which limits their effectiveness, and they are motivated by various heuristic intuitions, which makes them difficult to generalize. This paper proposes a novel Contrastive Knowledge Transfer Framework (CKTF), which enables the transfer of sufficient structural knowledge from the teacher to the student by optimizing multiple contrastive objectives across the intermediate representations between them. |
K. Zhao; Y. Chen; M. Zhao; |
37 | A Controllable Lifestyle Simulator for Use in Deep Reinforcement Learning Algorithms. Highlight: In this paper, we present a novel and highly generalizable simulation system based on state machines associated with probabilistic transitions to simulate the user’s lifestyle. |
L. G. Braz; A. Susaiyah; |
38 | Acoustically-Driven Phoneme Removal That Preserves Vocal Affect Cues. Highlight: In this paper, we propose a method for removing linguistic information from speech for the purpose of isolating paralinguistic indicators of affect. |
C. Noufi; J. Berger; K. J. Parker; D. L. Bowling; |
39 | Acoustic Source Localization in The Spherical Harmonics Domain Exploiting Low-Rank Approximations. Highlight: In this paper, we propose a simple yet effective method to localize prominent acoustic sources in adverse acoustic scenarios. |
M. Cobos; M. Pezzoli; F. Antonacci; A. Sarti; |
40 | A Critical Look at Recent Trends in Compression of Channel State Information. Highlight: In this paper, we challenge the current view on state-of-the-art deep learning-based methods for compressing wireless channel state information and show that traditional methods can be highly competitive on commonly used open-source benchmarks. |
M. V. Örnhag; S. Adalbjörnsson; P. Güler; M. Mahdavi; |
41 | Active Beam Tracking with Reconfigurable Intelligent Surface. Highlight: This is an active sensing problem which is analytically intractable. This paper proposes a deep learning framework to solve this problem. |
H. Han; T. Jiang; W. Yu; |
42 | Active IRS-Assisted MIMO Channel Estimation and Prediction. Highlight: Our objective is to estimate and predict the user-IRS channels by exploiting a small number of sparsely distributed active elements with a low pilot overhead. |
M. A. Haider; S. R. Pavel; Y. D. Zhang; E. Aboutanios; |
43 | Active Learning for Efficient Few-Shot Classification. Highlight: We introduce the problem of Active Few-Shot Classification (AFSC), where the objective is to classify a small, initially unlabeled dataset given a very limited labeling budget. |
A. Abdali; V. Gripon; L. Drumetz; B. Boguslawski; |
44 | Active Learning of Non-Semantic Speech Tasks with Pretrained Models. Highlight: In this work, we propose ALOE, a novel system for improving the data- and label-efficiency of non-semantic speech tasks with active learning (AL). |
H. Lee; A. Saeed; A. L. Bertozzi; |
45 | Active Noise Control Over 3D Space: A Realistic Error Microphone Geometry Design. Highlight: In this paper, we optimize the aforementioned system in terms of the error microphone geometry. |
H. Sun; P. Samarasinghe; T. Abhayapala; |
46 | Active Perception System for Enhanced Visual Signal Recovery Using Deep Reinforcement Learning. Highlight: However, these models can only recognize and predict segmentation masks with great accuracy when RGB data have sufficient information about the objects of interest. In this paper, we suggest an intelligent, active perception system that can adjust its 3D position to improve signal acquisition. |
G. Chaudhary; L. Behera; T. Sandhan; |
47 | Active Selection of Source Patients in Transfer Learning for Epileptic Seizure Detection Using Riemannian Manifold. Highlight: Thus, we introduce an active-learning-based training data selection and modification method using Riemannian geometry, centroid alignment, tangent space mapping, and a support vector machine classifier. |
T. Orihara; K. M. Hassan; T. Tanaka; |
48 | Active Subsampling Using Deep Generative Models By Maximizing Expected Information Gain. Highlight: We introduce an adaptive, fully probabilistic pipeline for optimized signal subsampling in sampling-budget constrained systems. |
K. C. E. van de Camp; H. Joudeh; D. J. Antunes; R. J. G. van Sloun; |
49 | Activity-Informed Industrial Audio Anomaly Detection Via Source Separation. Highlight: This is particularly challenging since the interfering sounds are virtually indistinguishable from the target machine without additional information. To overcome these challenges, we fully exploit the information on machine activity or control, which is easy to obtain in the industrial environment, and propose a framework of source separation (SS) followed by anomaly detection (AD), the so-called SSAD. |
J. Kim; Y. Lee; H. M. Cho; D. W. Kim; C. H. Song; J. Ok; |
50 | AdapITN: A Fast, Reliable, and Dynamic Adaptive Inverse Text Normalization. Highlight: In this study, we introduce a novel end-to-end model, named AdapITN, that can handle both semiotic phrases (SEP) and phonetization phrases (PHP). |
T. -B. Nguyen; L. D. M. Nhat; Q. M. Nguyen; Q. T. Do; C. M. Luong; A. Waibel; |
51 | Adaptable End-to-End ASR Models Using Replaceable Internal LMs and Residual Softmax. Highlight: However, end-to-end (E2E) ASR still suffers from domain shifts from training to testing, and domain adaptation remains challenging. To alleviate this problem, this paper designs a replaceable internal language model (RILM) method, which makes it feasible to directly replace the internal language model (LM) of E2E ASR models with a target-domain LM in the decoding stage when a domain shift is encountered. |
K. Deng; P. C. Woodland; |
52 | Adapted Multimodal BERT with Layer-Wise Fusion for Sentiment Analysis. Highlight: In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. |
O. S. Chlapanis; G. Paraskevopoulos; A. Potamianos; |
53 | Adapter Tuning With Task-Aware Attention Mechanism. Highlight: Thus, this paper proposes the task-aware attention mechanism (TAM) to enhance adapter tuning. |
J. Lu; F. Jin; J. Zhang; |
54 | Adapting A Self-Supervised Speech Representation for Noisy Speech Emotion Recognition By Using Contrastive Teacher-Student Learning. Highlight: For adaptation, it is essential to balance between acquiring new knowledge from noisy speech and keeping the previous knowledge acquired during the pre-training and fine-tuning of the model. Therefore, we propose a contrastive teacher-student learning framework to retrain a self-supervised speech representation model for noisy SER. |
S. -G. Leem; D. Fulford; J. -P. Onnela; D. Gard; C. Busso; |
55 | Adapting Exploratory Behaviour in Active Inference for Autonomous Driving. Highlight: In this paper, we integrate imitation learning with active inference to minimize the expected free energy under the supervision of an expert model. |
S. Nozari; A. Krayani; P. Marin; L. Marcenaro; D. Martin; C. Regazzoni; |
56 | Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings. Highlight: In this paper, we investigate the adaptation of upstream SSL models to the multi-talker automatic speech recognition (ASR) task under two conditions. |
Z. Huang; D. Raj; P. García; S. Khudanpur; |
57 | Adaptive Axonal Delays in Feedforward Spiking Neural Networks for Accurate Spoken Word Recognition. Highlight: In this work, we consider a learnable axonal delay capped at a maximum value, which can be adapted according to the axonal delay distribution in each network layer. |
P. Sun; E. Eqlimi; Y. Chua; P. Devos; D. Botteldooren; |
58 | Adaptive CSI Feedback with Hidden Semantic Information Transfer. Highlight: In this paper, we propose a deep-learning-empowered adaptive CSI feedback compression and quantization based on the information-bottleneck principle, where the sensory data transmission is hidden within the CSI feedback to eliminate extra communication cost and preserve the data privacy at the same time. |
J. Cao; L. Lian; Y. Mao; B. Clerckx; |
59 | Adaptive Data Augmentation for Contrastive Learning. Highlight: In this work, we propose AdDA, which adds a closed-loop feedback structure to a generic contrastive learning network. |
Y. Zhang; H. Zhu; S. Yu; |
60 | Adaptive ECCM for Mitigating Smart Jammers. Highlight: This paper considers adaptive radar electronic counter-countermeasures (ECCM) to mitigate electronic countermeasures (ECM) by an adversarial jammer. |
S. Jain; K. Pattanayak; V. Krishnamurthy; C. Berry; |
61 | Adaptive Endpointing with Deep Contextual Multi-Armed Bandits. Highlight: In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search. |
D. J. Min; A. Stolcke; A. Raju; C. Vaz; D. He; V. Ravichandran; V. A. Trinh; |
62 | Adaptive Filtering Algorithms For Set-Valued Observations-Symmetric Measurement Approach To Unlabeled And Anonymized Data. Highlight: By using symmetric polynomials, we formulate a symmetric measurement equation that maps the observation set to a unique vector. |
V. Krishnamurthy; |
63 | Adaptive Gaussian Nested Filter for Parameter Estimation and State Tracking in Dynamical Systems. Highlight: We introduce the adaptive Gaussian nested filter (AGNesF), the first nested method that adapts the number of samples to estimate both the static parameters and the dynamical variables of a state-space model. |
S. Pérez-Vieites; V. Elvira; |
64 | Adaptive Knowledge Distillation Between Text and Speech Pre-Trained Models. Highlight: Since the semantic and granularity gap between text and speech has been overlooked in the literature, which impairs the distillation, we propose the Prior-informed Adaptive knowledge Distillation (PAD) that adaptively leverages text/speech units of variable granularity and prior distributions to achieve better global and local alignments between text and speech pre-trained models. |
J. Ni; Y. Ma; W. Wang; Q. Chen; D. Ng; H. Lei; T. H. Nguyen; C. Zhang; B. Ma; E. Cambria; |
65 | Adaptive Large Margin Fine-Tuning For Robust Speaker Verification. Highlight: In our experiments, we also find that LMFT fails in short duration and other verification scenarios. To solve this problem, we propose the duration-based and similarity-based adaptive large margin fine-tuning (ALMFT) strategy. |
L. Zhang; Z. Chen; Y. Qian; |
66 | Adaptive Mask Co-Optimization for Modal Dependence in Multimodal Learning. Highlight: However, multimodal models may tend to rely on the modalities that are easier to learn while underfitting the other modalities, which leads to sub-optimal results. To address this problem, we propose a novel plug-in module, Adaptive Mask Co-optimization (AMCo), which can be inserted into advanced models. |
Y. Zhou; X. Liang; S. Zheng; H. Xuan; T. Kumada; |
67 | Adaptive Multi-Corpora Language Model Training for Speech Recognition. Highlight: In this paper, we introduce a novel adaptive multi-corpora training algorithm that dynamically learns and adjusts the sampling probability of each corpus along the training process. |
Y. Ma; Z. Liu; X. Zhang; |
68 | Adaptive Noise Canceller Algorithm with SNR-Based Stepsize and Data-Dependent Averaging. Highlight: This paper proposes an adaptive noise canceller algorithm with an SNR-based stepsize and data-dependent averaging. |
A. Sugiyama; |
69 | Adaptive Non-Local Generative Adversarial Networks for Low-Dose CT Image Denoising. Highlight: For different input images, conventional neural networks always adopt a fixed number of channels, which limits the performance of deep networks. To address these problems, we propose a channel-adaptive convolution and patch selection (CAPS) module to enhance the feature extraction of our network. |
L. Yang; H. Liu; F. Shang; Y. Liu; |
70 | Adaptive Scale and Spatial Aggregation for Real-Time Object Detection. Highlight: The accuracy of detection may be limited by their insufficient capability to obtain powerful feature representations, which is a notoriously onerous task in machine vision applications. To address this problem, this study proposes a method of adaptive aggregation of features at both the scale and spatial levels in an anchor-free framework: 1) at the scale level, a Multi-scale Point Feature Fusion (MPFF) module is proposed to fuse point features from multiple scales in a self-adaptive re-weighting manner; 2) at the spatial level, a Restrained Deformable Convolution (R-DCN) is designed to focus on the most informative features in a pre-defined region while avoiding remote feature distraction. |
W. Chen; Y. He; Z. Liang; Y. Guo; |
71 | Adaptive Semantic Fusion Framework for Unsupervised Monocular Depth Estimation. Highlight: Nevertheless, numerous existing methods relying on photometric consistency are excessively susceptible to variations in illumination and suffer in regions with strong reflection. To overcome this limitation, we propose a novel unsupervised depth estimation framework named ColorDepth, which forces the model to explore object semantics to infer depth. |
R. Li; H. Yu; K. Du; Z. Xiao; B. Yan; Z. Yuan; |
72 | Adaptive Simulated Annealing Through Alternating Rényi Divergence Minimization. Highlight: We propose a new simulated annealing algorithm with an adaptive cooling schedule, which draws samples from variational approximations of the Boltzmann distributions. |
T. Guilmeau; E. Chouzenoux; V. Elvira; |
73 | Adaptive Step-Size Methods for Compressed SGD. Highlight: In particular, we introduce a scaling technique for the descent step, which we use to establish order-optimal convergence rates for convex-smooth and strongly convex-smooth objectives under an interpolation condition, and for non-convex objectives under a strong growth condition. |
A. M. Subramaniam; A. Magesh; V. V. Veeravalli; |
74 | Adaptive Submanifold-Preserving Sparse Regression for Feature Selection And Multiclass Classification. Highlight: In this paper, we propose a novel embedded feature selection method that selects informative and discriminative features while preserving the underlying intra-class submanifolds of the data, so as to improve classification performance. |
R. Xu; X. Liang; |
75 | Adaptive Time-Scale Modification for Improving Speech Intelligibility Based On Phoneme Clustering For Streaming Services. Highlight: This study proposes an adaptive time-scale modification algorithm (ATSM) that adaptively varies the speaking rate for each phoneme cluster of speech to improve speech intelligibility. |
S. Jang; J. Kim; Y. -J. Kim; J. -H. Chang; |
76 | A Database for Multi-Modal Short Video Quality Assessment. Highlight: In this paper, we establish a novel database, dubbed MMSVD-Douyin, for assessing multi-modal short video quality under three evaluation criteria. |
Y. Zhang; C. Wang; S. Zhang; X. Cao; |
77 | A Dataset for Audio-Visual Sound Event Detection in Movies. Highlight: In this work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds (SAM-S). |
R. Hebbar; D. Bose; K. Somandepalli; V. Vijai; S. Narayanan; |
78 | A Deep Disentangled Approach for Interpretable Hyperspectral Unmixing. Highlight: In this paper, we propose a physically interpretable deep learning method for hyperspectral unmixing accounting for nonlinearity and the variability of the endmembers. |
R. A. Borsoi; T. Imbiriba; D. Erdoğmuş; |
79 | A Deep Fusion Rule for Infrared and Visible Image Fusion: Feature Communication for Importance Assessment. Highlight: Existing fusion rules may not extract the most useful information and cannot effectively retain important information. To solve this problem, we propose a novel deep learning-based fusion rule. |
X. Lv; J. Cheng; G. Lv; Z. Wei; |
80 | A Deep Temporal Factor Analysis Method for Large Scale Financial Portfolio Selection. Highlight: In this paper, we present a neural network temporal factor analysis (NN-TFA) model for dimensionality reduction, which enables us to build a scalable deep reinforcement learning method for large-scale portfolio management. |
Y. Zhou; R. Su; S. Tu; L. Xu; |
81 | ADHD Classification with Biomarker Identification Using A Triplet Loss Attention Auto-Encoding Network. Highlight: Here, we propose an attention auto-encoding network with triplet loss (Tri-Att-AENet) for both ADHD classification and biomarker identification. |
Y. Tang; Y. Chen; Y. Gao; A. Jiang; L. Zhou; |
82 | A Discriminative Multi-Channel Noise Feature Representation Model for Image Manipulation Localization. Highlight: In this paper, we explore the ability of different noise feature modules to localize different manipulation types. |
Y. Zhou; H. Wang; Q. Zeng; R. Zhang; S. Meng; |
83 | A Distributed Adaptive Algorithm for Non-Smooth Spatial Filtering Problems. Highlight: However, the DASF algorithm has only been shown to converge for filtering problems that can be expressed as smooth optimization problems. In this paper, we explore an extension of the DASF algorithm to a family of non-smooth spatial filtering problems, allowing non-smooth regularizers to be added to the optimization problem. Such regularizers could, for example, be used to perform node selection and eliminate nodes that do not contribute to the filter objective, thereby further reducing communication costs. |
C. Hovine; A. Bertrand; |
84 | A DNN-Based Hearing-Aid Strategy For Real-Time Processing: One Size Fits All. Highlight: Here, we present a deep-neural-network (DNN) hearing-aid (HA) processing strategy that can provide individualised sound processing for the audiogram of a listener using a single model architecture. |
F. Drakopoulos; A. Van Den Broucke; S. Verhulst; |
85 | A DNN Based Normalized Time-Frequency Weighted Criterion for Robust Wideband DoA Estimation. Highlight: To improve robustness against interference, we propose a DNN-based normalized time-frequency (T-F) weighted criterion that minimizes the distance between the candidate steering vectors and the filtered snapshots in the T-F domain. |
K. -L. Chen; C. -H. Lee; B. D. Rao; H. Garudadri; |
86 | A Dual-Branch Adaptive Distribution Fusion Framework for Real-World Facial Expression Recognition. Highlight: In this paper, we address the facial expression recognition (FER) task via the label distribution learning paradigm and develop a dual-branch Adaptive Distribution Fusion (AdaDF) framework. |
S. Liu; Y. Xu; T. Wan; X. Kui; |
87 | A Dual-Path Transformer Network for Scene Text Detection. Highlight: In this paper, we propose DPTNet (Dual-Path Transformer Network), a simple yet effective network that utilizes both global and local information for the scene text detection task. |
J. Lin; Y. Yan; H. Wang; |
88 | Advancing The Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity. Highlight: The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. |
Y. J. Kim; H. -S. Heo; J. -W. Jung; Y. Kwon; B. -J. Lee; J. S. Chung; |
89 | Adversarial Attacks on Genotype Sequences. Highlight: Such studies commonly include steps that analyze the structure of genomic sequences using dimensionality reduction techniques and ancestry inference methods. In this paper, we show how white-box gradient-based adversarial attacks can be used to corrupt the output of genomic analyses, and we explore different machine learning techniques to detect such manipulations. |
D. M. Montserrat; A. G. Ioannidis; |
90 | Adversarial Contrastive Distillation with Adaptive Denoising. Highlight: To this end, we propose a novel structured ARD method called Contrastive Relationship DeNoise Distillation (CRDND). |
Y. Wang; Z. Chen; D. Yang; Y. Liu; S. Liu; W. Zhang; L. Qi; |
91 | Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition. Highlight: This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized impaired speech. |
Z. Jin; X. Xie; M. Geng; T. Wang; S. Hu; J. Deng; G. Li; X. Liu; |
92 | Adversarial Guitar Amplifier Modelling with Unpaired Data. Highlight: We propose an audio effects processing framework that learns to emulate a target electric guitar tone from a recording. |
A. Wright; V. Välimäki; L. Juvela; |
93 | Adversarially Robust Fairness-Aware Regression. Highlight: In this paper, using a minimax framework, we aim to design an adversarially robust fair regression model that achieves optimal performance in the presence of an attacker who is able to perform a rank-one attack on the dataset. |
Y. Jin; L. Lai; |
94 | Adversarial Network Pruning By Filter Robustness Estimation. Highlight: Previous studies maintain the robustness of pruned networks by combining adversarial training and network pruning, but they neglect to preserve robustness at high sparsity ratios in structured pruning. To address this problem, we propose an effective filter importance criterion, Filter Robustness Estimation (FRE), which evaluates the importance of filters by estimating their contribution to the adversarial training loss. |
X. Zhuang; Y. Ge; B. Zheng; Q. Wang; |
95 | Adversarial Permutation Invariant Training for Universal Sound Separation. Highlight: In this work, we complement permutation invariant training (PIT) with adversarial losses, but find this challenging with the standard formulation used in speech source separation. |
E. Postolache; J. Pons; S. Pascual; J. Serrà; |
96 | A Dynamic Cross-Scale Transformer with Dual-Compound Representation for 3D Medical Image Segmentation. Highlight: Furthermore, single-scale attention fails to balance feature representation and semantic information. To address these problems, we propose a window-based dynamic cross-scale cross-attention transformer (DCS-Former) for precise representation of diverse features. |
R. Zhang; Z. Wang; Z. Wang; J. Xin; |
97 | A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding. Highlight: Specifically, we propose a novel approach to construct the interactive graph based on the injection of label semantics, which can automatically update the graph to better alleviate error propagation. |
Z. Zhu; W. Xu; X. Cheng; T. Song; Y. Zou; |
98 | AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection. Highlight: To address the issue, we propose an angular-distance-based method for multiple sound event localization and detection (SELD), AD-YOLO, which adapts the You Only Look Once (YOLO) algorithm to SELD. |
J. S. Kim; H. Joon Park; W. Shin; S. W. Han; |
99 | AE-Flow: Autoencoder Normalizing Flow. Highlight: In this paper, we introduce supervision to the training process of normalizing flows, without the need for parallel data. |
J. Mosiński; P. Biliński; T. Merritt; A. Ezzerg; D. Korzekwa; |
100 | AERO: Audio Super Resolution in The Spectral Domain. Highlight: We present AERO, an audio super-resolution model that processes speech and music signals in the spectral domain. |
M. Mandel; O. Tal; Y. Adi; |
101 | A Fast and Accurate Pitch Estimation Algorithm Based on The Pseudo Wigner-Ville Distribution. Highlight: In this paper, we capitalize on the high time and frequency resolution of the pseudo Wigner-Ville distribution (PWVD) and propose a new PWVD-based pitch estimation method. |
Y. Liu; P. Wu; A. W. Black; G. K. Anumanchipalli; |
102 | A Few Shot Learning of Singing Technique Conversion Based on Cycle Consistency Generative Adversarial Networks. Highlight: We evaluate the proposed methods on three datasets covering singing techniques commonly used in pop songs: breathy voice, vibrato, and vocal fry. |
P. -W. Chen; V. -W. Soo; |
103 | Affinity Learning With Blind-Spot Self-Supervision for Image Denoising. Highlight: In this paper, we extend blind-spot-based self-supervised denoising by using affinity learning to remove noise from affected pixels. |
Y. Zhou; L. Zhou; I. H. Laradji; T. Lun Lam; Y. Xu; |
104 | A Flow-Guided Non-Local Alignment Network for Video Compressive Sensing Reconstruction. Highlight: In this paper, we develop a flow-guided non-local alignment network (FNLAN), which can build accurate temporal dependencies among adjacent frames to help video recovery. |
C. Zhou; C. Chen; D. Zhang; |
105 | A Framework for Unified Real-Time Personalized and Non-Personalized Speech Enhancement. Highlight: In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. |
Z. Wang; R. Giri; D. Shah; J. -M. Valin; M. M. Goodwin; P. Smaragdis; |
106 | A Frequency-Domain Recursive Least-Squares Adaptive Filtering Algorithm Based On A Kronecker Product Decomposition. Highlight: This paper proposes a frequency-domain recursive least-squares (RLS) adaptive filtering algorithm for identifying time-varying acoustic systems in noisy environments. |
H. He; J. Chen; J. Benesty; Y. Yu; |
107 | A Frequency-Weighted Leaky FxLMS Algorithm with Application to Feedback Active Noise Control Systems. Highlight: Based on the traditional leaky filtered-x least mean square (FxLMS) algorithm, this paper proposes a frequency-weighted leaky FxLMS algorithm whose weight factors are characterized in the frequency domain and can be calculated directly by solving a constrained optimization problem. |
Y. Tang; H. Zhang; |
108 | A Fusion-Based and Multi-Layer Method for Low Light Image Enhancement. Highlight: This paper proposes a low light image enhancement algorithm using a fusion-based and multi-layer model. |
X. Zhou; J. Guo; H. Liu; C. Wang; |
109 | A Game of Snakes and GANs. Highlight: In this paper, we establish a connection between active contour models (snakes) and GANs. |
S. Asokan; F. S. Mohammed; C. Sekhar Seelamantula; |
110 | A Gaussian Latent Variable Model for Incomplete Mixed Type Data. Highlight: This paper proposes a Gaussian process framework that efficiently captures the information from mixed numerical and categorical data and effectively incorporates missing variables. |
M. Ajirak; P. M. Djurić; |
111 | A Generalized Subspace Distribution Adaptation Framework for Cross-Corpus Speech Emotion Recognition. Highlight: In this paper, we propose a novel transfer learning framework, named generalized subspace distribution adaptation (GSDA), to tackle the challenging cross-corpus speech emotion recognition problem. |
S. Li; P. Song; L. Ji; Y. Jin; W. Zheng; |
112 | A Geometric Surrogate for Simulation Calibration. Highlight: In this work, we employ a machine learning-based approach to perform calibration faster and more accurately, with two components: a surrogate model of the simulation that is easy to obtain but not physically interpretable, and a bridge model that maps the surrogate to the calibrated parameters. |
L. S. Souza; B. Batalo; K. Yamazaki; |
113 | Agile Radio Map Prediction Using Deep Learning. Highlight: In this paper, we introduce a runtime-efficient radio frequency (RF) map prediction method based on UNet convolutional neural networks (CNNs), trained on a large-scale dataset of 3D maps. |
E. Krijestorac; H. Sallouha; S. Sarkar; D. Cabric; |
114 | A Graph Neural Network Multi-Task Learning-Based Approach for Detection and Localization of Cyberattacks in Smart Grids. Highlight: In this paper, we propose a multi-task learning-based approach that performs detection and localization simultaneously using a graph neural network (GNN) with stacked convolutional Chebyshev graph layers. |
A. Takiddin; R. Atat; M. Ismail; K. Davis; E. Serpedin; |
115 | A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition. Highlight: In this work, we propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts (VBs) that explicitly considers multiple relationships: (i) between emotional states and diverse cultures; (ii) between low-dimensional (arousal & valence) and high-dimensional (10 emotion classes) emotion spaces; and (iii) between various emotion classes within the high-dimensional space. |
J. Li; X. Wu; K. Song; D. Li; X. Liu; H. Meng; |
116 | A Highly Interpretable Deep Equilibrium Network for Hyperspectral Image Deconvolution. Highlight: In this paper, a novel technique for the hyperspectral image deconvolution problem is developed. |
A. Gkillas; D. Ampeliotis; K. Berberidis; |
117 | A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation. Highlight: In this work, we propose a holistic cascade system for expressive speech-to-speech translation (S2ST), combining multiple prosody transfer techniques previously considered only in isolation. |
W. -C. Huang; B. Peloquin; J. Kao; C. Wang; H. Gong; E. Salesky; Y. Adi; A. Lee; P. -J. Chen; |
118 | A Hybrid Deep Neural Network for Nonlinear Causality Analysis in Complex Industrial Control System. Highlight: This paper proposes a novel neural causality analysis network with a directed acyclic graph to locate the root cause in complex industrial systems. |
T. Feng; Q. Chen; Y. Shi; X. Lang; L. Xie; H. Su; |
119 | Aiding Speech Harmonic Recovery in DNN-Based Single Channel Noise Reduction Using Cepstral Excitation Manipulation (CEM) Components. Highlight: In this paper, inspired by previous work on speech harmonic enhancement using statistical methods, we present a loss function component we term cepstral excitation manipulation (CEM) loss, which is constructed based on the fundamental frequency-related cepstral coefficients. |
Y. Song; N. Madhu; |
120 | A Knowledge-Driven Vowel-Based Approach of Depression Classification from Speech Using Data Augmentation. Highlight: We propose a novel explainable machine learning (ML) model that identifies depression from speech, by modeling the temporal dependencies across utterances and utilizing the spectrotemporal information at the vowel level. |
K. Feng; T. Chaspari; |
121 | A Large-Scale Pretrained Deep Model for Phishing URL Detection. Highlight: In this paper, we propose PhishBERT, a veritable pretrained deep transformer network model for phishing URL detection. |
Y. Wang; W. Zhu; H. Xu; Z. Qin; K. Ren; W. Ma; |
122 | A Learnable Spatial Mapping for Decoding The Directional Focus of Auditory Attention Using EEG. Highlight: In this paper, we propose a learnable spatial mapping (LSM) mechanism to transform EEG channels into a 2D form, which can be combined with the spatial attention mechanism to better extract the inherent coherence among the electrodes. |
Y. Zhang; H. Ruan; Z. Yuan; H. Du; X. Gao; J. Lu; |
123 | Aleatoric Uncertainty Estimation of Overnight Sleep Statistics Through Posterior Sampling Using Conditional Normalizing Flows. Highlight: Instead of factorizing, we propose to jointly model the sequence of sleep stages, by introducing U-Flow, a conditional normalizing flow network. |
H. v. Gorp; M. M. van Gilst; P. Fonseca; S. Overeem; R. J. G. van Sloun; |
124 | Algebraic Convolutional Filters on Lie Group Algebras. Highlight: Taking an algebraic signal processing perspective, we propose a novel convolutional filter from the Lie group algebra directly, thereby removing the need to lift altogether. |
H. Kumar; A. Parada-Mayorga; A. Ribeiro; |
125 | A Lightweight Convolutional Neural Network Using Feature Filtering Module. Highlight: At the same time, if a large number of channel connections are used to fuse feature layers, the number of parameters increases dramatically. In this work, we propose a new network architecture with dense connections and feature filtering to tackle this problem. |
N. Jing; Y. Zhang; |
126 | A Lightweight Fourier Convolutional Attention Encoder for Multi-Channel Speech Enhancement. Highlight: Spectral-spatial cues are crucial for estimating beamforming weights; however, many existing works fail to predict the beamforming weights optimally because they do not adequately learn spectral-spatial information. To tackle this challenge, we propose a Fourier convolutional attention encoder (FCAE) to provide a global receptive field over the frequency axis and boost the learning of spectral contexts and cross-channel features. |
S. Sun; J. Jin; Z. Han; X. Xia; L. Chen; Y. Xiao; P. Ding; S. Song; R. Togneri; H. Zhang; |
127 | Alignment Entropy Regularization. Highlight: In this paper, we use entropy to measure a model’s uncertainty, i.e., how it chooses to distribute the probability mass over the set of allowed alignments. |
E. Variani; K. Wu; D. Rybach; C. Allauzen; M. Riley; |
128 | Align, Write, Re-Order: Explainable End-to-End Speech Translation Via Operation Sequence Generation. Highlight: The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we propose to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. |
M. Omachi; B. Yan; S. Dalmia; Y. Fujita; S. Watanabe; |
129 | A Low-Latency Deep Hierarchical Fusion Network for Fullband Acoustic Echo Cancellation. Highlight: This paper describes our submission to the fourth Acoustic Echo Cancellation (AEC) Challenge, which is part of the ICASSP 2023 Signal Processing Grand Challenge. |
H. Zhao; N. Li; R. Han; X. Zheng; C. Zhang; L. Guo; B. Yu; |
130 | A Low-Latency Hybrid Multi-Channel Speech Enhancement System For Hearing Aids. Highlight: This paper summarizes a hybrid multi-channel speech enhancement system for the ICASSP Signal Processing Grand Challenge: Clarity Challenge (Speech Enhancement for Hearing Aids) 2023. |
T. Lei; Z. Hou; Y. Hu; W. Yang; T. Sun; X. Rong; D. Wang; K. Chen; J. Lu; |
131 | Alternating Constrained Minimization Based Approximate Message Passing. Highlight: In this paper, we revisit the GAMP algorithm (as applied, e.g., to sparse Bayesian learning (SBL)) by more rigorously applying an alternating constrained minimization strategy to an appropriately reparameterized LSL BFE. |
C. K. Thomas; D. Slock; |
132 | Alternating Phase Langevin Sampling with Implicit Denoiser Priors for Phase Retrieval. Highlight: We present a way of leveraging the prior implicitly learned by a denoiser to solve phase retrieval problems by incorporating it in a classical alternating minimization framework. |
R. Agrawal; O. Leong; |
133 | A Magnetic Framelet-Based Convolutional Neural Network for Directed Graphs. Highlight: In this paper, we introduce Framelet-MagNet, a magnetic framelet-based spectral GCNN for directed graphs (digraphs). |
L. Lin; J. Gao; |
134 | A Mathematical Model for Neuronal Activity and Brain Information Processing Capacity. Highlight: In this paper, we introduce an information conservation law for regional brain activation, and establish a mathematical model to quantify the relationship between the information processing capacity, input storage capacity, the arrival rate of exogenous information, and the neuronal activity of a brain region—referred to as the brain information processing capacity (IPC) model. |
Y. Zheng; D. Zhu; J. Ren; T. Liu; K. Friston; T. Li; |
135 | AMC-Net: An Effective Network for Automatic Modulation Classification. Highlight: To address the drawback, we propose a novel AMC-Net that improves recognition by denoising the input signal in the frequency domain while performing multi-scale and effective feature extraction. |
J. Zhang; T. Wang; Z. Feng; S. Yang; |
136 | A Memory-Free Evolving Bipolar Neural Network for Efficient Multi-Label Stream Learning. Highlight: This work proposes an Evolving Bipolar Network architecture called EBN-MSL consisting of two parallel layers trained in a maximum margin framework to learn efficiently in a continual multi-label learning scenario without utilizing any samples stored from previous tasks. |
S. Mishra; S. Sundaram; |
137 | A Meta-GNN Approach to Personalized Seizure Detection and Classification. Highlight: In this paper, we propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples. |
A. Rahmani; A. Venkitaraman; P. Frossard; |
138 | A Method of Constructing and Automatically Labeling Radio Frequency Signal Training Dataset for UAV. Highlight: Since the UAV signal dataset cannot be directly applied to object detection, we propose a method using time-frequency domain filtering and automatic labeling to construct a large-scale time-frequency spectrogram dataset. |
C. Liu; R. Ma; Z. Si; M. Chi; |
139 | Amicable Aid: Perturbing Images to Improve Classification Performance. Highlight: We show that by taking the opposite search direction of perturbation, an image can be modified to yield higher classification confidence and even a misclassified image can be made correctly classified. |
J. Kim; J. -H. Choi; S. Jang; J. -S. Lee; |
140 | A Model-Based Hearing Compensation Method Using A Self-Supervised Framework. Highlight: This paper proposes a model-based hearing compensation method using a self-supervised framework with a given auditory model. |
Y. Niu; N. Li; X. Wu; J. Chen; |
141 | A Momentum Two-Gradient Direction Algorithm with Variable Step Size Applied to Solve Practical Output Constraint Issue for Active Noise Control. Highlight: This paper proposes a two-gradient direction active noise control (ANC) algorithm with a momentum factor to resolve the saturation problem with faster convergence. |
X. Shen; D. Shi; Z. Luo; J. Ji; W. -S. Gan; |
142 | AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation. Highlight: However, the single-frame method still needs to model the physically connected relations among joints because the feature representations transformed only by global relations via the Transformer neglect information on the human skeleton. To deal with this problem, we propose a novel method in which the Transformer encoder and GCN blocks are alternately stacked, namely AMPose, to combine the global and physically connected relations among joints towards HPE. |
H. Lin; Y. Chiu; P. Wu; |
143 | A Multi-Channel Aggregation Framework for Object Detection in Large-Scale SAR Image. Highlight: In this paper, large-scale images are first cut into multiple sets of slices using slicers of various sizes. |
C. Yang; C. Zhang; Z. Fan; Z. Yu; Q. Sun; M. Dai; |
144 | A Multi-Modal Approach For Context-Aware Network Traffic Classification. Highlight: In this paper, we present a Multi-Modal Classification method named MTCM to systematically exploit the context for the classification task. |
B. Pang; Y. Fu; S. Ren; S. Shen; Y. Wang; Q. Liao; Y. Jia; |
145 | A Multi-Scale Feature Aggregation Based Lightweight Network for Audio-Visual Speech Enhancement. Highlight: However, most existing audio-visual speech enhancement (AVSE) models are heavyweight in terms of parameter count, which makes them unsuitable for deployment in practical applications. In this paper, we therefore present a lightweight AVSE approach (called M3Net) that incorporates several multi-modality, multi-scale and multi-branch strategies. |
H. Xu; L. Wei; J. Zhang; J. Yang; Y. Wang; T. Gao; X. Fang; L. Dai; |
146 | A Multi-Signal Perception Network for Textile Composition Identification. Highlight: This paper proposes a Multi-Signal Perception Network (MSPNet) for nondestructive textile composition identification, allowing the model to benefit from the advantages of multimodal data. |
B. Peng; L. He; D. Wu; M. Chi; J. Chen; |
147 | A Multi-Stage Hierarchical Relational Graph Neural Network for Multimodal Sentiment Analysis. Highlight: In this paper, we propose a multi-stage hierarchical relational graph neural network (MHRG), catering to intra- and inter-modal dynamics learning with modality calibration. |
P. Gong; J. Liu; X. Zhang; X. Li; |
148 | A Multi-Stage Low-Latency Enhancement System for Hearing Aids. Highlight: In this work, we introduce four major novelties: (1) a novel multi-stage system in both the magnitude and complex domains to better utilize phase information; (2) an asymmetric window pair to achieve higher frequency resolution under the 5 ms latency constraint; (3) the integration of head rotation information and the mixture signals to achieve better enhancement; (4) a post-processing module that achieves higher hearing aid speech perception index (HASPI) scores with the hearing aid amplification stage provided by the baseline system. |
C. Ouyang; K. Fei; H. Zhou; C. Lu; L. Li; |
149 | A Multi-Stage Triple-Path Method For Speech Separation in Noisy and Reverberant Environments. Highlight: In noisy and reverberant environments, the performance of deep learning-based speech separation methods drops dramatically because previous methods are not designed and optimized for such situations. To address this issue, we propose a multi-stage end-to-end learning method that decouples the difficult speech separation problem in noisy and reverberant environments into three sub-problems: speech denoising, separation, and de-reverberation. |
Z. Mu; X. Yang; X. Yang; W. Zhu; |
150 | A Mutual Implicit Sentiment Analysis Model with Bundle-Aware Contrastive Learning. Highlight: Unlike prior methods, the core idea of this paper is to form explicit-implicit bundles that ensure each batch contains both expressions, without relying on external resources. |
S. Cai; J. Yuan; L. Li; |
151 | An Adapter Based Multi-Label Pre-Training for Speech Separation and Enhancement. Highlight: Based on HuBERT, this work investigates improving the SSL model for SS and SE. |
T. Wang; X. Chen; Z. Chen; S. Yu; W. Zhu; |
152 | An Adaptive DFE Using Light-Pattern-Protection Algorithm in 12 nm CMOS Technology. Highlight: This article proposes a novel light-pattern-protection (LPP) algorithm to achieve robustness. |
S. Xing; C. Lin; Y. Li; H. Wang; |
153 | An Adaptive Enhancement Method for Gastrointestinal Low-Light Images of Capsule Endoscope. Highlight: To this end, we propose an adaptive enhancement method for WCE images. |
P. Liu; Y. Wang; J. Yang; W. Li; |
154 | An Adaptive Plug-and-Play Network for Few-Shot Learning. Highlight: The bottleneck is that deep networks and complex metrics tend to induce overfitting in FSL, making it difficult to further improve the performance. Towards this, we propose a plug-and-play model-adaptive resizer (MAR) and an adaptive similarity metric (ASM) without any other losses. |
H. Li; L. Li; Y. Huang; N. Li; Y. Zhang; |
155 | Analysing Diffusion-based Generative Approaches Versus Discriminative Approaches for Speech Restoration. Highlight: In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks. |
J. -M. Lemercier; J. Richter; S. Welker; T. Gerkmann; |
156 | Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling. Highlight: This work thoroughly analyzes discrete self-supervised speech representations (units) through the lens of Generative Spoken Language Modeling (GSLM). Based on the findings of this analysis, we propose practical improvements to the discrete units for GSLM. |
A. Sicherman; Y. Adi; |
157 | Analysing The Masked Predictive Coding Training Criterion for Pre-Training A Speech Representation Model. Highlight: In this study, we investigate the impact of MPC loss on the type of information learnt at various layers in the HuBERT model, using nine probing tasks. |
H. Yadav; S. Sitaram; R. R. Shah; |
158 | Analysis and Re-Synthesis of Natural Cricket Sounds Assessing The Perceptual Relevance of Idiosyncratic Parameters. Highlight: This paper describes cricket sounds from a parametric point of view, characterizes their main temporal and spectral features, namely jitter, shimmer and frequency sweeps, and explains a re-synthesis process generating modified natural cricket sounds. |
M. Oliveira; V. Almeida; J. Silva; A. Ferreira; |
159 | Analysis and Transformation of Voice Level in Singing Voice. Highlight: We introduce a neural auto-encoder that transforms the musical dynamic in recordings of singing voice via changes in voice level. |
F. Bous; A. Roebel; |
160 | Analysis Of Noisy-Target Training For DNN-Based Speech Enhancement. Highlight: In this paper, we conduct various analyses to deepen our understanding of noisy-target training (NyTT). |
T. Fujimura; T. Toda; |
161 | Analyzing Acoustic Word Embeddings from Pre-Trained Self-Supervised Speech Models. Highlight: In this work, we study several pre-trained models and pooling methods for constructing AWEs with self-supervised representations. |
R. Sanabria; H. Tang; S. Goldwater; |
162 | An Analysis of Degenerating Speech Due to Progressive Dysarthria on ASR Performance. Highlight: The aims of this study were to (1) analyze how ASR performance changes over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition throughout disease progression. |
K. Tomanek; K. Seaver; P. -P. Jiang; R. Cave; L. Harrell; J. R. Green; |
163 | An Antispoofing Approach in Biometric Authentication System for A Smartcard. Highlight: To meet low-power constraints for smartcards, we propose a simple convolutional neural network-based architecture and dedicated hardware to handle the problem. |
H. -S. Lee; M. -K. Song; J. Lee; Y. Seong; D. Kim; K. Bae; S. Song; |
164 | An Application of Quantum Mechanics to Attention Methods in Computer Vision. Highlight: This work proposes the quantum-state-based mapping (QSM) for machine learning. |
J. Zhang; Y. Luo; P. Cheng; Z. Li; H. Wu; K. Yu; W. An; J. Zhou; |
165 | An Approach to Ontological Learning from Weak Labels. Highlight: Ontologies encompass a formal representation of knowledge through the definition of concepts or properties of a domain, and the relationships between those concepts. In this work, we investigate whether using this ontological information improves learning from weakly labeled data, which is easier to collect since only the presence or absence of an event needs to be known. |
A. Shah; L. Tang; P. H. Chou; Y. Y. Zheng; Z. Ge; B. Raj; |
166 | An ASR-Free Fluency Scoring Approach with Self-Supervised Learning. Highlight: This paper describes a novel ASR-free approach for automatic fluency assessment using self-supervised learning (SSL). |
W. Liu; K. Fu; X. Tian; S. Shi; W. Li; Z. Ma; T. Lee; |
167 | An Asynchronous Updating Reinforcement Learning Framework for Task-Oriented Dialog System. Highlight: The errors from DST might misguide the dialog policy, and the system action brings extra difficulties for the DST module. To alleviate this problem, we propose the Asynchronous Updating Reinforcement Learning framework (AURL), which updates the DST module and the DP module asynchronously under a cooperative setting. |
S. Zhang; Y. Hu; X. Wang; C. Yuan; |
168 | An Attention-Based Approach to Hierarchical Multi-Label Music Instrument Classification. Highlight: For effective joint training in the multi-label setting, we propose two methods to model the connection between fine- and coarse-level tags: one uses rule-based grouped max-pooling, and the other uses an attention mechanism learned in a data-driven manner. |
Z. Zhong; M. Hirano; K. Shimada; K. Tateishi; S. Takahashi; Y. Mitsufuji; |
169 | An Augmented Gaussian Sum Filter Through A Mixture Decomposition. Highlight: In this paper, we propose a way of controlling the covariances of the underlying Gaussian mixture. |
K. Tsampourakis; V. Elvira; |
170 | An Auto-Encoder Based Method for Camera Fingerprint Compression. Highlight: In this paper, we introduce a new method to compress high-dimensional floating-point fingerprints into low-dimensional binary features, saving storage while maintaining their representative ability. |
K. Zhang; Z. Liu; J. Hu; S. Wang; |
171 | An Automotive Radar Dataset For Object Classification. Highlight: We present a novel 77 GHz automotive radar dataset of static and moving objects. |
A. Shyam; K. Komalavally; M. Gautam; V. Kancharla; V. Gudisa; V. Patil; A. Balasubramanian; S. Channappayya; |
172 | Anchored Speech Recognition with Neural Transducers. Highlight: In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech. |
D. Raj; J. Jia; J. Mahadeokar; C. Wu; N. Moritz; X. Zhang; O. Kalinli; |
173 | Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision. Highlight: In this paper, we propose a novel augmentation method for ancient Chinese WSG and POS tagging data using distant supervision over a parallel corpus. |
S. Feng; P. Li; |
174 | An Edge Alignment-Based Orientation Selection Method for Neutron Tomography. Highlight: In this paper, we propose an adaptive orientation selection method in which an MBIR reconstruction on previously-acquired measurements is used to define an objective function on orientations that balances a data-fitting term promoting edge alignment and a regularization term promoting orientation diversity. |
D. Yang; S. Tang; S. V. Venkatakrishnan; M. S. N. Chowdhury; Y. Zhang; H. Z. Bilheux; G. T. Buzzard; C. A. Bouman; |
175 | An Effective Anomalous Sound Detection Method Based on Representation Learning with Simulated Anomalies. Highlight: In this paper, we propose an effective anomalous sound detection (ASD) method based on representation learning with simulated anomalies. |
H. Chen; Y. Song; Z. Zhuo; Y. Zhou; Y. -H. Li; H. Xue; I. McLoughlin; |
176 | An Efficient Beam-Sharing Algorithm for RIS-aided Simultaneous Wireless Information and Power Transfer Applications. Highlight: In this paper, we propose an efficient beam-sharing algorithm for RIS-aided SWIPT systems. |
N. M. Tran; M. M. Amri; J. H. Park; D. I. Kim; K. W. Choi; |
177 | An Efficient Relay Selection Scheme for Relay-assisted HARQ. Highlight: Unlike previous works, in this work each RN itself decides whether to participate in the transmission, thus reducing the overhead. |
W. Ding; M. Shikh-Bahaei; |
178 | An Empirical Study and Improvement for Speech Emotion Recognition. Highlight: Prior works mainly focus on exploiting advanced networks to model and fuse different modality information to improve performance, while neglecting the effect of different fusion strategies on emotion recognition. In this work, we consider a simple yet important problem: which way of fusing audio and text modality information is most helpful for this multimodal task. |
Z. Wu; Y. Lu; X. Dai; |
179 | An Empirical Study of Backdoor Attacks on Masked Auto Encoders. Highlight: However, as a representation learning method, the backdoor pitfall of MAE, and its impact on downstream tasks, have not been fully investigated. In this paper, we use several common triggers to perform backdoor attacks on the pre-training phase of MAE and test them on downstream tasks. |
S. Zhuang; P. Xia; B. Li; |
180 | An Empirical Study on Speech Restoration Guided By Self-Supervised Speech Representation. Highlight: This paper focuses on exploring the impact of self-supervised speech representation learning on the speech restoration task. |
J. Byun; Y. Ji; S. -W. Chung; S. Choe; M. -S. Choi; |
181 | An End-to-End Framework for Partial View-Aligned Clustering with Graph Structure. Highlight: In this paper, we propose a novel method to tackle this problem, termed An End-to-end Framework for Partial View-aligned Clustering with Graph structure (EGPVC). |
L. Zhao; Q. Xie; S. Wu; S. Ma; |
182 | An End-to-End Neural Network for Image-to-Audio Transformation. Highlight: This paper describes an end-to-end (E2E) neural architecture for the audio rendering of small portions of display content on low resource personal computing devices. |
L. Chen; M. Deisher; M. Georges; |
183 | A Nested Ensemble Method to Bilevel Machine Learning. Highlight: Such problems involve a nested relation between inner- and outer-level problems, which often have suboptimal solutions with poor generalization ability. To address this issue, this paper proposes an ensemble method tailored to bilevel learning. |
L. Chen; M. Abbas; T. Chen; |
184 | An Evaluation Platform to Scope Performance of Synthetic Environments in Autonomous Ground Vehicles Simulation. Highlight: In this paper we present our Scoping Autonomous Vehicle Simulation (SAVeS) platform for benchmarking the performance of simulated environments for autonomous ground vehicle testing. |
X. Bai; L. Jiang; Y. Luo; A. Gupta; P. Kaveti; H. Singh; S. Ostadabbas; |
185 | A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters. Highlight: In this paper, we introduce a new method based on affine combinations of adaptive filters to extract FECG signals. |
Y. Xuan; X. Zhang; S. S. Li; Z. Shen; X. Xie; L. P. Garcia; R. Togneri; |
186 | A New Personalized Efficacy Atlas for Pallidal Deep Brain Stimulation. Highlight: This paper proposes to create a novel personalized efficacy atlas that warps functional scales and estimates activation volume by modeling the electric field to characterize the link between electrode location and neurosurgical performance. |
X. Luo; |
187 | A New Probabilistic Distance Metric with Application in Gaussian Mixture Reduction. Highlight: This paper presents a new distance metric to compare two continuous probability density functions. |
A. Sajedi; Y. A. Lawryshyn; K. N. Plataniotis; |
188 | A New Semi-Supervised Classification Method Using A Supervised Autoencoder for Biomedical Applications. Highlight: In this paper, we present a new approach to solve semi-supervised classification tasks for biomedical applications, involving a supervised autoencoder network. |
C. Gille; F. Guyard; M. Barlaud; |
189 | An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions. Highlight: We study four data augmentation (DA) techniques and two model architectures on realistic data for sound event localization and detection (SELD). |
S. Niu; J. Du; Q. Wang; L. Chai; H. Wu; Z. Nian; L. Sun; Y. Fang; J. Pan; C. -H. Lee; |
190 | Angle-Of-Arrival Target Tracking Using A Mobile UAV In External Signal-Denied Environment. Highlight: This paper focuses on the angle-of-arrival (AOA) target tracking problem using a mobile unmanned aerial vehicle (UAV) equipped with an AOA sensor to observe targets in an external signal-denied environment (no global positioning system or inertial navigation system aid). |
B. Zhu; S. Xu; F. Rice; K. Doğançay; |
191 | Animal Re-Identification Algorithm for Posture Diversity. Highlight: In this paper, a Multi-pose Feature Fusion Network (MPFNet) is proposed to improve Re-ID performance. |
Z. He; J. Qian; D. Yan; C. Wang; Y. Xin; |
192 | An Implicit Gradient Method for Constrained Bilevel Problems Using Barrier Approximation. Highlight: In this work, we propose algorithms for solving a class of Bilevel Optimization (BLO) problems, with applications in areas such as signal processing, networking and machine learning. |
I. Tsaknakis; P. Khanduri; M. Hong; |
193 | An Improved Optimal Transport Kernel Embedding Method with Gating Mechanism for Singing Voice Separation and Speaker Identification. Highlight: Since the essential features of a signal are well reflected in the latent geometric structure of its feature distribution, a natural way to address SVS/SI is to extract geometry-aware and distribution-related features of the target signal. To do this, this work introduces the concept of optimal transport (OT) to SVS/SI and proposes an improved optimal transport kernel embedding (iOTKE) to extract the target-distribution-related features. |
W. Yuan; Y. Bian; S. Wang; M. Unoki; W. Wang; |
194 | An Interpretable Model Using Evidence Information for Multi-Hop Question Answering Over Long Texts. Highlight: To better use evidence information, we propose a loss function considering answer groups, which improves the reasoning ability of the reader in the Retriever-Reader architecture. |
Y. Chen; R. Liu; X. Liu; Y. Shi; G. Bai; |
195 | An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on The Zero Resource Speech Challenge 2021 Framework. Highlight: Here, we propose using hidden-unit BERT (HuBERT) self-supervised representation learning, and we provide detailed analyses and comparisons of the isotropy of their embedding spaces, which might influence performance. |
J. Chen; S. Sakti; |
196 | Anomalous Signal Detection for Cyber-Physical Systems Using Interpretable Causal Neural Network. Highlight: This work proposes a novel time series anomalous signal detection model based on neural system identification and causal inference to track the dynamics of CPS in a dynamical state-space and avoid absorbing spurious correlation caused by confounding bias generated by system noise, which improves the stability, security and interpretability in detection of anomalous signals from CPS. |
S. Zhang; J. Liu; |
197 | Anomalous Sound Detection Using Audio Representation with Machine ID Based Contrastive Learning Pretraining. Highlight: This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample. |
J. Guan; F. Xiao; Y. Liu; Q. Zhu; W. Wang; |
198 | Anomaly Detection in Optical Spectra via Joint Optimization. Highlight: We propose a method based on a joint optimization procedure for estimating the major trends that characterize the spectrum, enabling the detection of anomalies even in the presence of few channels and heavy distortions. |
A. M. Rizzo; L. Magri; P. Invernizzi; E. Sozio; S. Piciaccia; A. Tanzi; S. Binetti; C. Alippi; G. Boracchi; |
199 | A Non-contact SpO2 Estimation Using Video Magnification and Infrared Data. Highlight: The Eulerian Video Magnification (EVM) technique was used to enhance the subtle differences in skin pixel intensity in the facial area. |
T. Stogiannopoulos; G. -A. Cheimariotis; N. Mitianoudis; |
200 | An Online Algorithm for Chance Constrained Resource Allocation. Highlight: To model their uncertainties, we take the chance constraints into consideration. |
Y. Chen; Z. Deng; Y. Zhou; Z. Chen; Y. Chen; H. Hu; |
201 | An Online Algorithm for Contrastive Principal Component Analysis. Highlight: Here, we introduce a modified cPCA method, which we denote cPCA∗, that is more interpretable and less sensitive to the choice of hyper-parameter. |
S. Golkar; D. Lipshutz; T. Tesileanu; D. B. Chklovskii; |
202 | A Novel Approach Based on Voronoï Cells to Classify Spectrogram Zeros of Multicomponent Signals. Highlight: In this paper, we propose a novel approach to classify the spectrogram zeros (SZs) of multicomponent signals based on the analysis of the Voronoï cells associated with these zeros. |
N. Laurent; S. Meignen; M. A. Colominas; J. M. Miramont; F. Auger; |
203 | A Novel Cross-Component Context Model for End-to-End Wavelet Image Coding. Highlight: In this paper, we explore a promising alternative approach for neural compression, with an autoencoder whose latent space represents a nonlinear wavelet decomposition. |
A. Meyer; A. Kaup; |
204 | A Novel Efficient Multi-View Traffic-Related Object Detection Framework. Highlight: Accordingly, we propose a novel traffic-related framework named CEVAS to achieve efficient object detection using multi-view video data. |
K. Yang; J. Liu; D. Yang; H. Wang; P. Sun; Y. Zhang; Y. Liu; L. Song; |
205 | A Novel Extrapolation Technique to Accelerate WMMSE. Highlight: In this work, we introduce a novel extrapolation technique to further accelerate WMMSE. |
K. Zhou; Z. Chen; G. Liu; Z. Chen; |
206 | A Novel Heart Rate Estimation Method Exploiting Heartbeat Second Harmonic Reconstruction Via Millimeter Wave Radar. Highlight: At present, the interference of the second and third harmonics of respiration has become a significant problem that hinders further improvement of heart rate estimation accuracy. To handle this problem, we propose a novel method to estimate heart rate based on reconstructing the heartbeat second harmonic. |
T. Li; H. Shou; Y. Deng; Y. Zhou; C. Shi; P. Chen; |
207 | A Novel Metric For Evaluating Audio Caption Similarity. Highlight: In this paper, we propose a novel metric based on Text-to-Audio Grounding (TAG) to incorporate acoustic semantics. |
S. Bhosale; R. Chakraborty; S. K. Kopparapu; |
208 | A Novel Mode Selection-Based Fast Intra Prediction Algorithm for Spatial SHVC. Highlight: In this paper, we propose a novel Mode Selection-Based Fast Intra Prediction algorithm for SSHVC. |
D. Wang; Y. Sun; W. Li; L. Xie; X. Lu; F. Dufaux; C. Zhu; |
209 | A Novel State Connection Strategy for Quantum Computing to Represent and Compress Digital Images. Highlight: In this paper, we propose a new SCMFRQI (state connection modification FRQI) approach for further reducing the required bits by modifying the state connection using a reset gate rather than repeating the use of the same Toffoli gate connection as a reset gate. |
M. E. Haque; M. Paul; A. Ulhaq; T. Debnath; |
210 | A Novel Transformer-Based Pipeline for Lung Cytopathological Whole Slide Image Classification. Highlight: We propose a novel three-stage Transformer-based methodology for entire cytopathological whole slide image (WSI) classification. |
G. Li; Q. Liu; H. Liu; Y. Liang; |
211 | Antenna Impedance Estimation in Correlated Rayleigh Fading Channels. Highlight: We formulate antenna impedance estimation in a classical estimation framework under correlated Rayleigh fading channels. |
S. Wu; B. L. Hughes; |
212 | Any-to-Any Voice Conversion with F0 and Timbre Disentanglement and Novel Timbre Conditioning. Highlight: In this work, we present a PPG-based VC model that directly decodes waveforms. |
S. Kovela; R. Valle; A. Dantrey; B. Catanzaro; |
213 | A Parallel Attention Mechanism for Image Manipulation Detection and Localization. Highlight: In this paper, we propose a parallel attention mechanism based network to localize tampered regions, which tends to generalize better while having higher model capacity. |
Q. Zeng; H. Wang; Y. Zhou; R. Zhang; S. Meng; |
214 | A Patient Invariant Model Towards The Prediction of Freezing of Gait. Highlight: This paper presents a novel algorithm to predict the onset of FoG using a single ankle accelerometer sensor. |
N. Ahmed; S. Singhal; A. Sinha; A. Ghose; |
215 | A Perceptual Neural Audio Coder with A Mean-Scale Hyperprior. Highlight: This paper proposes an end-to-end neural audio coder based on a mean-scale hyperprior model together with a perceptual optimization using a psychoacoustic model (PAM)-based loss function. |
J. Byun; S. Shin; Y. Park; J. Sung; S. Beack; |
216 | A Person Identification System for The ICASSP 2023 E-Prevention Challenge. Highlight: This paper describes SRCB-LUL team’s person identification system submitted to track 1 of the ICASSP 2023 Person Identification and Relapse Detection from Continuous Recordings of Biosignals (e-Prevention) challenge, which aims to identify the wearer of the smartwatch. |
J. Wu; M. Tu; |
217 | A Perturbation-Based Policy Distillation Framework with Generative Adversarial Nets. Highlight: In this paper, we propose deep imitation learning through a guidance-based policy distillation (GIL) algorithm. |
L. Zhang; Q. Liu; X. Zhang; Y. Xu; |
218 | APGP: Accuracy-Preserving Generative Perturbation for Defending Against Model Cloning Attacks. Highlight: In this paper, we propose a novel formulation to defend against model cloning attacks. |
A. Cheng; J. Cheng; |
219 | A Phoneme-Informed Neural Network Model For Note-Level Singing Transcription. Highlight: In this paper, we propose a method of finding note onsets of singing voice more accurately by leveraging the linguistic characteristics of singing, which are not seen in other instruments. |
S. Yong; L. Su; J. Nam; |
220 | A Physically Explainable Framework for Human-Related Anomaly Detection. Highlight: In this paper, we introduce physically explainable dynamics to enhance visual representations. |
Y. Jiang; H. Li; C. Li; |
221 | A Point Is A Wave: Point-Wave Network for Place Recognition. Highlight: Existing methods concentrate on multi-layer perceptrons with intricate architectures, needing many parameters to learn with limited gains. Unlike these methods, we propose an innovative method by designing a point-wave module, modeling a point as a wave function to avoid losing the information of origin points. |
G. Li; R. Zhang; |
222 | Applying Independent Vector Analysis on EEG-Based Motor Imagery Classification. Highlight: In this work, we propose an original approach of IVA as a feature extraction step for Brain-Computer Interfaces, focused on the Motor Imagery (MI) paradigm. |
C. P. A. Moraes; B. Aristimunha; L. H. Dos Santos; W. H. L. Pinaya; R. Y. de Camargo; D. G. Fantinato; A. Neves; |
223 | Applying Symmetrical Component Transform for Industrial Appliance Classification in Non-Intrusive Load Monitoring. Highlight: This work presents a load recognition technique for NILM applying the low-complexity Fortescue Transform (FT). |
A. Faustine; L. Pereira; |
224 | Approximation Error Back-Propagation for Q-Function in Scalable Reinforcement Learning with Tree Dependence Structure. Highlight: This paper applies the exponential decay property of scalable RL theory to a specific scenario where the network structure is a tree, and uses KL (Kullback-Leibler) divergence to analyze the propagation of approximation error along the structure over time, in order to quantify its backtracking result. |
Y. Yan; Y. Dong; K. Ma; Y. Shen; |
225 | A Practical Distributed Active Noise Control Algorithm Overcoming Communication Restrictions. Highlight: Hence, this paper develops a novel DMCANC algorithm that utilizes the compensation filters and neighbour nodes’ information to counterbalance the cross-talk effect between channels while maintaining independent weight updating. |
J. Ji; D. Shi; Z. Luo; X. Shen; W. -S. Gan; |
226 | A Principled Approach to Model Validation in Domain Generalization. Highlight: However, when it comes to model selection, most of these methods rely on traditional validation routines that select models solely based on the lowest classification risk on the validation set. In this paper, we theoretically demonstrate a trade-off between minimizing classification risk and mitigating domain discrepancy, i.e., it is impossible to achieve the minimum of these two objectives simultaneously. |
B. Lyu; T. Nguyen; M. Scheutz; P. Ishwar; S. Aeron; |
227 | A Privacy-Preserving Trajectory Mining Model. Highlight: In this paper, we consider user privacy issues in location-based social networks (LBSNs). |
Z. Wang; S. X. Wu; J. Zhu; Y. Zhu; |
228 | A Probabilistic Framework for Pruning Transformers Via A Finite Admixture of Keys. Highlight: In this paper, we develop a novel probabilistic framework for pruning attention scores and keys in transformers. |
T. M. Nguyen; T. Nguyen; L. Bui; H. Do; D. K. Nguyen; D. D. Le; H. Tran-The; N. Ho; S. J. Osher; R. G. Baraniuk; |
229 | A Processing Framework to Access Large Quantities of Whispered Speech Found in ASMR. Highlight: We describe our processing pipeline and a method for improved whispered activity detection (WAD) in the ASMR data. |
P. P. Zarazaga; G. Eje Henter; Z. Malisz; |
230 | A Progressive Image Dehazing Framework with Inter and Intra Contrastive Learning. Highlight: In addition, it is hard to train end-to-end dehazing networks due to the enormous gap between hazy images and corresponding clear images. In this paper, we propose a novel progressive image dehazing framework with inter and intra contrastive learning to solve the above problems. |
H. Xu; S. Liu; Y. Shu; F. Jiang; |
231 | A Progressive Neural Network for Acoustic Echo Cancellation. Highlight: In this paper, we propose a hybrid signal processing and deep echo cancellation method, in which a two-stage neural network is designed to remove residual echo progressively. |
Z. Chen; X. Xia; S. Sun; Z. Wang; C. Chen; G. Xie; P. Zhang; Y. Xiao; |
232 | A Prototypical Semantic Decoupling Method Via Joint Contrastive Learning for Few-Shot Named Entity Recognition. Highlight: In this paper, we propose a Prototypical Semantic Decoupling method via joint Contrastive learning (PSDC) for few-shot NER. |
G. Dong; Z. Wang; L. Wang; D. Guo; D. Fu; Y. Wu; C. Zeng; X. Li; T. Hui; K. He; X. Cui; Q. Gao; W. Xu; |
233 | A Proximal Approach to IVA-G with Convergence Guarantees. Highlight: In this paper, we present a penalized maximum-likelihood framework for the problem, which enables us to derive a non-convex cost function that depends on the precision matrices of the source component vectors, the main mechanism by which IVA-G leverages correlation across the datasets. |
C. Cosserat; B. Gabrielson; E. Chouzenoux; J. -C. Pesquet; T. Adali; |
234 | A Quantum Approach for Stochastic Constrained Binary Optimization. Highlight: To this end, this work puts forth a quantum heuristic to cope with stochastic binary quadratically constrained quadratic programs (QCQP). |
S. Gupta; V. Kekatos; |
235 | A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition. Highlight: We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios. |
C. -H. H. Yang; B. Li; Y. Zhang; N. Chen; T. N. Sainath; S. Marco Siniscalchi; C. -H. Lee; |
236 | A Radar-Jammer Zero-Sum Repeated Bayesian Game. Highlight: We consider an instance of a radar jamming countermeasure problem, where the radar and the jammer have uncertainty about the radar environment (for instance, about the noise variance and the radar cross section variance), and they have to account for these uncertainties with statistical priors. |
S. Suvorova; A. Pezeshki; R. Kyprianou; B. Moran; |
237 | A Reality Check and A Practical Baseline for Semantic Speech Embedding. Abstract: Generating spoken word embeddings that possess semantic information has attracted lots of research interest. Among them, Speech2vec, as one of the most influential works, has … |
G. Chen; Y. Cao; |
238 | A Robust Kalman Filter Based Approach for Indoor Robot Positioning with Multi-Path Contaminated UWB Data. Highlight: However, UWB performance suffers from multi-path outliers when signals reflect on surfaces or encounter obstacles. This paper describes an approach to mitigate this issue, based on an M-Estimation Robust Kalman Filter (M-RKF) and leveraging an adaptive empirical variance model for UWB signals. |
J. Cano; Y. Ding; G. Pages; E. Chaumette; J. Le Ny; |
239 | A Role Engineering Approach Based on Spectral Clustering Analysis for Restful Permissions in Cloud. Highlight: Generally, encryption methods are used to ensure privacy, which may result in high computation and communication overheads. |
Y. Xia; Y. Luo; W. Luo; Q. Shen; Y. Yang; Z. Wu; |
240 | Articulation GAN: Unsupervised Modeling of Articulatory Learning. Highlight: We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new unsupervised generative model of speech production/synthesis. |
G. Beguš; A. Zhou; P. Wu; G. K. Anumanchipalli; |
241 | Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization. Highlight: In this work, we propose a novel articulatory representation decomposition algorithm that takes advantage of guided factor analysis to derive the articulatory-specific factors and factor scores. |
J. Lian; A. W. Black; Y. Lu; L. Goldstein; S. Watanabe; G. K. Anumanchipalli; |
242 | A Sentiment and Syntactic-Aware Graph Convolutional Network for Aspect-Level Sentiment Classification. Highlight: However, constructing more accurate syntactic trees by introducing external knowledge brings limited improvement on ungrammatical informal texts and leads to over-parameterization of the model. To alleviate this problem, we propose a sentiment and syntactic-aware graph convolutional network (SaS-GCN) that combines syntactic and sentiment relations. |
Y. Yang; X. Sun; Q. Lu; R. Sutcliffe; J. Feng; |
243 | A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One. Abstract: Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains … |
L. Meng; J. Kang; M. Cui; Y. Wang; X. Wu; H. Meng; |
244 | A Simple Scheme for Coupled Factorization for Hyperspectral Super-Resolution: Exploiting Sparsity in An Easy Way. Highlight: In this paper we develop a simple scheme for a coupled matrix factorization problem arising in the topic of hyperspectral super-resolution (HSR). |
Y. Li; W. -K. Ma; R. Wu; H. Liu; |
245 | A Simple Yet Effective Approach to Structured Knowledge Distillation. Highlight: Structured prediction models aim at solving tasks where the output is a complex structure, rather than a single variable. |
W. Lin; Y. Li; L. Liu; S. Shi; H. -T. Zheng; |
246 | A Simulation-Based Framework for Urban Traffic Accident Detection. Highlight: In this paper, we propose a framework to synthesize traffic videos containing both normal traffic and accident events by simulating the real urban traffic scenarios. |
H. Luo; F. Wang; |
247 | A Slot-Shared Span Prediction-Based Neural Network for Multi-Domain Dialogue State Tracking. Highlight: Besides, the slot-independent design leads to poor scalability. In this paper, we propose a Slot-shared Span Prediction based Network (SSNet) with a general value extraction module for all slots to tackle these problems. |
A. Atawulla; X. Zhou; Y. Yang; B. Ma; F. Yang; |
248 | A Spatial-Temporal ECG Emotion Recognition Model Based on Dynamic Feature Fusion. Highlight: In this paper, we propose a novel ECG emotion recognition method, which adopts a spatial and temporal ECG emotion recognition model based on dynamic feature fusion (DFF-STM) to learn spatial-temporal representations of different ECG areas. |
S. Xiao; X. Qiu; C. Tang; Z. Huang; |
249 | A Spatio-Temporal Decomposition Network for Compressed Video Quality Enhancement. Highlight: In this paper, we propose a Spatio-Temporal Decomposition Network (STDN) to reduce compression distortion with motion classification and frequency separation. |
K. Wang; F. Chen; Z. Ye; L. Wang; X. Wu; S. Pu; |
250 | A Speech Representation Anonymization Framework Via Selective Noise Perturbation. Highlight: In this paper, we propose a speech anonymization framework that achieves privacy via noise perturbation to a selected subset of the high-utility representations extracted using a pre-trained speech encoder. |
M. Tran; M. Soleymani; |
251 | ASSD: Synthetic Speech Detection in The AAC Compressed Domain. Highlight: Our goal is to study if a small set of coding metadata contained in the AAC compressed bit stream is sufficient to detect synthetic speech. |
A. K. Singh Yadav; Z. Xiang; E. R. Bartusiak; P. Bestagini; S. Tubaro; E. J. Delp; |
252 | Assessing The Robustness of Deep Learning-Assisted Pathological Image Analysis Under Practical Variables of Imaging System. Highlight: In this paper, we construct an evaluation pathway to assess the stability and consistency of deep learning models under various customized scanner parameters. |
Y. Sun; C. Zhu; Y. Zhang; H. Li; P. Chen; L. Yang; |
253 | Assisted RTF-Vector-Based Binaural Direction of Arrival Estimation Exploiting A Calibrated External Microphone Array. Highlight: In this paper, we assume the availability of a calibrated array of external microphones, which is characterized by a second database of anechoic prototype RTF vectors. |
D. Fejgin; S. Doclo; |
254 | Associative Learning Network for Coherent Visual Storytelling. Highlight: This paper introduces a novel Associative Learning Network for Coherent Visual Storytelling to explore the model’s association ability while telling a new story. |
X. Li; C. Liu; Y. Ji; |
255 | A Statistical Interpretation of The Maximum Subarray Problem. Highlight: Maximum subarray is a classical problem in computer science that, given an array of numbers, aims to find a contiguous subarray with the largest sum. We focus on its use for a noisy statistical problem of localizing an interval with a mean different from the background. |
D. Wei; D. M. Malioutov; |
256 | AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer. Highlight: In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for audio tagging (AT) task, termed AST-SED. |
K. Li; Y. Song; L. -R. Dai; I. McLoughlin; X. Fang; L. Liu; |
257 | A Study of Audio Mixing Methods for Piano Transcription in Violin-Piano Ensembles. Highlight: This study aims to analyze the impact of different data augmentation methods on piano transcription performance, specifically focusing on mixing techniques applied to violin-piano ensembles. |
H. Kim; J. Park; T. Kwon; D. Jeong; J. Nam; |
258 | A Study on Bias and Fairness in Deep Speaker Recognition. Highlight: In this paper we study the notion of fairness in recent SR systems based on 3 popular and relevant definitions, namely Statistical Parity, Equalized Odds, and Equal Opportunity. |
A. Hajavi; A. Etemad; |
259 | A Study on The Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge. Highlight: In this paper, we describe our proposed spoken semantic parsing system for the quality track (Track 1) in Spoken Language Understanding Grand Challenge which is part of ICASSP Signal Processing Grand Challenge 2023. |
S. Arora; H. Futami; S. -L. Wu; J. Huynh; Y. Peng; Y. Kashiwagi; E. Tsunoo; B. Yan; S. Watanabe; |
260 | A Study on The Invariance in Security Whatever The Dimension of Images for The Steganalysis By Deep-Learning. Highlight: In this paper, we study the performance invariance of convolutional neural networks when confronted with variable image sizes in the context of a more wild steganalysis. |
K. Planolles; M. Chaumont; F. Comby; |
261 | Asymmetric Polynomial Loss for Multi-Label Classification. Highlight: Besides, the imbalance between redundant negative samples and rare positive samples could degrade the model performance. In this paper, we propose an effective Asymmetric Polynomial Loss (APL) to mitigate the above issues. |
Y. Huang; J. Qi; X. Wang; Z. Lin; |
262 | Asymptotically Optimal Nonparametric Classification Rules for Spike Train Data. Highlight: In this paper we consider the nonparametric classification problem for a class of spike train data characterized by nonparametrically specified intensity functions. |
M. Pawlak; M. Pabian; D. Rzepka; |
263 | Asymptotic Bias and Variance of Kernel Ridge Regression. Highlight: Kernel ridge regression is widely used but the theory of its performance has never been fully developed. |
V. Solo; |
264 | Asymptotic Distribution of Stochastic Mirror Descent Iterates in Average Ensemble Models. Highlight: In this paper, we explore the performance of stochastic mirror descent (SMD) on mean-field ensemble models and generalize earlier results obtained for SGD. |
T. Kargin; F. Salehi; B. Hassibi; |
265 | Asynchronous Federated Learning for Real-Time Multiple Licence Plate Recognition Through Semantic Communication. Highlight: In this paper, a federated learning framework is introduced to simultaneously detect multiple license plates over different network cameras through semantic communication. |
R. Xie; C. Li; X. Zhou; Z. Dong; |
266 | Asynchronous Social Learning. Highlight: In this work, we analyze belief convergence and steady-state learning performance for both traditional and adaptive formulations of social learning under asynchronous behavior by the agents, where some of the agents may decide to abstain from sharing any information with the network at some time instants. |
M. Cemri; V. Bordignon; M. Kayaalp; V. Shumovskaia; A. H. Sayed; |
267 | A Synthetic Corpus Generation Method for Neural Vocoder Training. Highlight: In this work, we propose a synthetic corpus generation method for neural vocoder training, which can easily generate an unlimited amount of synthetic audio at nearly no cost. |
Z. Wang; P. Liu; J. Chen; S. Li; J. Bai; G. He; Z. Wu; H. Meng; |
268 | A Targeted Sampling Strategy for Compressive Cryo Focused Ion Beam Scanning Electron Microscopy. Highlight: In this work, we present a compressive sensing variant of cryo FIB-SEM capable of reducing the operational electron dose and increasing speed. |
D. Nicholls; J. Wells; A. W. Robinson; A. Moshtaghpour; M. Kobylynska; R. A. Fleck; A. I. Kirkland; N. D. Browning; |
269 | A Template Matching Approach for Reference Picture Padding in Video Coding. Highlight: The paper shows that template matching also improves coding performance when applied in the context of reference picture padding. |
N. Horst; P. Das; M. Wien; |
270 | A Token-Level Contrastive Framework for Sign Language Translation. Highlight: However, the publicly available SLT corpus is very limited, which causes the collapse of the token representations and the inaccuracy of the generated tokens. To alleviate this issue, we propose Con-SLT, a novel token-level Contrastive learning framework for Sign Language Translation, which learns effective token representations by incorporating token-level contrastive learning into the SLT decoding process. |
B. Fu; P. Ye; L. Zhang; P. Yu; C. Hu; X. Shi; Y. Chen; |
271 | A Topic-Enhanced Approach for Emotion Distribution Forecasting in Conversations. Highlight: In this work, we propose a new task, Emotion Distribution Forecasting in Conversations (EDFC), which aims to predict the emotion distribution of the next utterance. |
X. Lu; W. Zhao; Y. Zhao; B. Qin; Z. Zhang; J. Wen; |
272 | A Transformer-Based E2E SLU Model for Improved Semantic Parsing. Highlight: This paper demonstrates our contribution to the Spoken Language Understanding Grand Challenge at ICASSP 2023. |
O. Istaiteh; Y. Kussad; Y. Daqour; M. Habib; M. Habash; D. Gowda; |
273 | Attention Based Relation Network for Facial Action Units Recognition. Highlight: In this paper, we propose a novel Attention Based Relation Network (ABRNet) for AU recognition, which can automatically capture AU relations without unnecessary or even disturbing predefined rules. |
Y. Wei; H. Wang; M. Sun; J. Liu; |
274 | Attention-Guided Deep Learning Framework For Movement Quality Assessment. Highlight: In this paper, we explore an attention-guided transformer-based architecture for MQA. |
A. Kanade; M. Sharma; M. Muniyandi; |
275 | Attention Localness in Shared Encoder-Decoder Model For Text Summarization. Highlight: In this study, we propose a localness attention network, with simplicity and feasibility in mind, which circles different local regions in the source article as contributors in different decoding steps. |
L. Huang; H. Wu; Q. Gao; G. Liu; |
276 | Attention Mixup: An Accurate Mixup Scheme Based On Interpretable Attention Mechanism for Multi-Label Audio Classification. Highlight: This paper proposes Attention MixUp (AMU), which only selects those segments that contain sound events for mixup, rather than simply mixing the entire sample. |
W. Liu; Y. Ren; J. Wang; |
277 | A Two-Branch Network for Video Anomaly Detection with Spatio-Temporal Feature Learning. Highlight: In this work, we propose a two-branch network to obtain the global and each local object’s action information of the clip respectively, where the local objects are extracted by a pre-trained object detector. |
G. Li; S. Chen; Y. Yang; Z. Guo; |
278 | A Two-Stage System for Spoken Language Understanding. Highlight: In this paper, we propose a Two-Stage system for SLU, which consists of Automatic Speech Recognition (ASR) tasks and Natural Language Understanding (NLU) tasks. |
G. Zhang; S. Miao; L. Tang; P. Qian; |
279 | Audio Barlow Twins: Self-Supervised Audio Representation Learning. Highlight: As such, we present Audio Barlow Twins, a novel self-supervised audio representation learning approach, adapting Barlow Twins to the audio domain. |
J. Anton; H. Coppock; P. Shukla; B. W. Schuller; |
280 | Audio Coding With Unified Noise Shaping And Phase Contrast Control. Highlight: In this paper, a unified noise-shaping (UNS) framework including FDNS and complex LPC-based TNS (CTNS) in the DFT domain is proposed to overcome the aliasing issues. |
B. Jo; S. Beack; T. Lee; |
281 | Audio Cross Verification Using Dual Alignment Likelihood Ratio Test. Highlight: We propose a method for cross verifying a short audio query against a reference recording from which it was taken. |
H. Lei; A. Wonghirundacha; I. Bukey; T. J. Tsai; |
282 | Audiodec: An Open-Source Streaming High-Fidelity Neural Audio Codec. Highlight: In this work, we propose an open-source, streamable, and real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency. |
Y. -C. Wu; I. D. Gebru; D. Marković; A. Richard; |
283 | Audio-Driven Facial Landmark Generation in Violin Performance Using 3DCNN Network with Self Attention Model. Highlight: In this paper, we compile a violin soundtrack and facial expression dataset (VSFE) for modeling facial expressions in violin performance. |
T. -W. Lin; C. -L. Liu; L. Su; |
284 | Audio-Driven High Definition and Lip-Synchronized Talking Face Generation Based on Face Reenactment. Highlight: In this paper, a novel audio-driven talking face generation method is proposed, which subtly converts the problem of improving video definition into the problem of face reenactment to produce both lip-synchronized and high-definition face video. |
X. Wang; Y. Zhang; W. He; Y. Wang; M. Li; Y. Wang; J. Zhang; S. Zhou; Z. Zhang; |
285 | Audio-Driven Talking Head Video Generation with Diffusion Model. Highlight: In this paper, we propose a novel audio-driven diffusion method for generating high-resolution realistic videos of talking heads with the help of the denoising diffusion model. |
Y. Zhu; C. Zhang; Q. Liu; X. Zhou;
286 | Audio Quality Assessment of Vinyl Music Collections Using Self-Supervised Learning. Highlight: In this paper, we show that the self-supervised learning (SSL) model wav2vec 2.0 can be successfully used to predict the perceived audio quality of archive music collections. |
A. Ragano; E. Benetos; A. Hines; |
287 | Audio Signal Enhancement with Learning from Positive and Unlabeled Data. Highlight: Here we explore SE using non-parallel training data consisting of noisy signals and noise, which can be easily recorded. |
N. Ito; M. Sugiyama; |
288 | Audio-Text Models Do Not Yet Leverage Natural Language. Highlight: In this work, we show that state-of-the-art audio-text models do not yet really understand natural language, especially contextual concepts such as sequential or concurrent ordering of sound events. |
H. -H. Wu; O. Nieto; J. P. Bello; J. Salomon; |
289 | Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR. Highlight: To this end, we present a novel approach to predict the user intention (whether the user is speaking to the device or not) directly from acoustic and textual information encoded at subword tokens which are obtained via an end-to-end (E2E) ASR model. |
P. Dighe; P. Nayak; O. Rudovic; E. Marchi; X. Niu; A. Tewfik; |
290 | Audio-Visual Inpainting: Reconstructing Missing Visual Information with Sound. Highlight: To this end, we propose a multimodal, audio-visual inpainting method (AVIN), and show how to leverage sound to reconstruct semantically consistent images. |
V. Sanguineti; S. Thakur; P. Morerio; A. Del Bue; V. Murino; |
291 | Audio-Visual Speaker Diarization in The Framework of Multi-User Human-Robot Interaction. Highlight: In this paper, we introduce a temporal audio-visual fusion model for multi-user speaker diarization, with low computing requirements, good robustness, and no training phase. |
T. Dhaussy; B. Jabaian; F. Lefèvre; R. Horaud; |
292 | Audio-Visual Speech Enhancement with A Deep Kalman Filter Generative Model. Highlight: In this paper, we present an audio-visual deep Kalman filter (AV-DKF) generative model which assumes a first-order Markov chain model for the latent variables and effectively fuses audio-visual data. |
A. Golmakani; M. Sadeghi; R. Serizel; |
293 | Augmentation Robust Self-Supervised Learning for Human Activity Recognition. Highlight: We empirically verify our approaches on three public HAR datasets. |
C. Xu; Y. Li; D. Lee; D. Hoon Park; H. Mao; H. Do; J. Chung; D. Nair; |
294 | Augmenting Transformer-Transducer Based Speaker Change Detection with Token-Level Training Loss. Highlight: In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. |
G. Zhao; Q. Wang; H. Lu; Y. Huang; I. L. Moreno; |
295 | AugTarget Data Augmentation for Infrared Small Target Detection. Highlight: The shortage of small target samples, one of the main limitations, hampers further improvement of target detection performance. In this paper, we propose a simple and effective data augmentation scheme, AugTarget, to address this shortage. |
S. Chen; J. Zhu; L. Ji; H. Pan; Y. Xu; |
296 | A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units. Highlight: We present a unified system to realize one-shot voice conversion (VC) on the pitch, rhythm, and speaker attributes. |
L. -W. Chen; S. Watanabe; A. Rudnicky; |
297 | A Unified Uncertainty-Aware Exploration: Combining Epistemic and Aleatory Uncertainty. Highlight: In this paper, we propose an algorithm that clarifies the theoretical connection between aleatory and epistemic uncertainty, unifies aleatory and epistemic uncertainty estimation, and quantifies the combined effect of both uncertainties for a risk-sensitive exploration. |
P. Malekzadeh; M. Hou; K. N. Plataniotis; |
298 | A Unitary Transform Based Generalized Approximate Message Passing. Highlight: Based on the unitary transform approximate message passing (UAMP) and expectation propagation, a unitary transform based generalized AMP (GUAMP) algorithm is proposed for general measurement matrices, in particular highly correlated matrices. |
J. Zhu; X. Meng; X. Lei; Q. Guo; |
299 | AURA: Privacy-Preserving Augmentation to Improve Test Set Diversity in Speech Enhancement. Highlight: Speech enhancement models running in production environments are commonly trained on publicly available data. |
X. Gitiaux; A. Khant; E. Beyrami; C. Reddy; J. Gupchup; R. Cutler; |
300 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels. Highlight: Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. |
P. Ma; A. Haliassos; A. Fernandez-Lopez; H. Chen; S. Petridis; M. Pantic; |
301 | AutoGCF: Personalized Aggregation on Neural Graph Collaborative Filtering. Highlight: Recent studies have shown that the effectiveness of existing NeuGCFs largely relies on the selection of optimal aggregation steps, which makes the performance on various recommendation scenarios unsatisfactory. To tackle this, we propose, for the first time, a framework to achieve personalized aggregation step assignment on NeuGCF. |
X. You; C. Li; J. Xu; M. Zhang; |
302 | Automatic Camera Pose Estimation By Key-Point Matching of Reference Objects. Highlight: In this paper, we aim to design an automatic camera pose estimation pipeline for clinical spaces such as catheterization laboratories. |
J. Zeng; R. Butler; J. J. van den Dobbelsteen; B. H. W. Hendriks; M. V. der Elst; J. Dauwels; |
303 | Automatic Classification of Vocal Intensity Category from Speech. Highlight: In the current study, we investigate machine learning and deep learning-based methods for the automatic classification of vocal intensity category when the input speech is expressed using an arbitrary amplitude scale. |
M. Kodali; S. R. Kadiri; L. Laaksonen; P. Alku; |
304 | Automatic Error Detection in Integrated Circuits Image Segmentation: A Data-Driven Approach. Highlight: In this paper, we present the first data-driven automatic error detection approach that targets two types of IC segmentation errors: wire and via errors. |
Z. Zhang; B. M. Trindade; M. Green; Z. Yu; C. Pawlowicz; F. Ren; |
305 | Automatic Segmentation of Nasopharyngeal Carcinoma in CT Images Using Dual Attention and Edge Detection. Highlight: However, the uneven grayscale values and hazy boundaries of nasopharyngeal carcinoma (NPC) regions make accurate NPC segmentation particularly challenging. To address these problems, we propose an accurate and effective NPC segmentation method using a Dual Attention and Edge Detection Convolutional Neural Network (DAED-Net). |
Q. Wang; W. Huang; Y. Zhang; X. Li; X. Ye; K. Hu; |
306 | Automatic Severity Classification of Dysarthric Speech By Using Self-Supervised Model with Multi-Task Learning. Highlight: To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech, using a self-supervised model in conjunction with multi-task learning. |
E. J. Yeo; K. Choi; S. Kim; M. Chung; |
307 | Autonomous Navigation of A Robotic Swarm in Space Exploration Missions. Highlight: In this paper, we propose a kinematics-aware information seeking algorithm for swarm navigation. |
S. Zhang; T. Baumgartner; E. Staudinger; R. Pöhlmann; F. Broghammer; A. Dammann; |
308 | Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-Linked Inputs. Highlight: Hence, we propose modular modifications to an existing attention-based deep neural network, to allow early, mid-level, and late feature fusion of participant-linked, visual, and acoustic features. |
K. Ooi; K. N. Watcharasupat; B. Lam; Z. -T. Ong; W. -S. Gan; |
309 | AutoTTS: End-to-End Text-to-Speech Synthesis Through Differentiable Duration Modeling. Highlight: Our method is based on a soft-duration mechanism that optimizes a stochastic process in expectation. Using this differentiable duration method, we introduce AutoTTS, a direct text-to-waveform speech synthesis model. |
B. Nguyen; F. Cardinaux; S. Uhlich; |
310 | Autovocoder: Fast Waveform Generation from A Learned Speech Representation Using Differentiable Digital Signal Processing. Highlight: We use machine learning to obtain a representation that replaces the mel-spectrogram, and that can be inverted back to a waveform using simple, fast operations including a differentiable implementation of the inverse STFT. The autovocoder generates a waveform 5 times faster than the DSP-based Griffin-Lim algorithm, and 14 times faster than the neural vocoder HiFi-GAN. |
J. J. Webber; C. Valentini-Botinhao; E. Williams; G. E. Henter; S. King; |
311 | Auxiliary Pooling Layer For Spoken Language Understanding. Highlight: In this paper, we address the variable granularity in transferring knowledge from texts to speech representation via APLY, an auxiliary pooling layer that fuses global information with adaptively encoded local context. |
Y. Ma; T. H. Nguyen; J. Ni; W. Wang; Q. Chen; C. Zhang; B. Ma; |
312 | A Variational Inequality Model for Learning Neural Networks. Highlight: We present an alternative approach that forgoes the optimization framework and adopts a variational inequality formalism. |
P. L. Combettes; J. -C. Pesquet; A. Repetti; |
313 | AVES: Animal Vocalization Encoder Based on Self-Supervision. Highlight: In order to leverage a large amount of unannotated audio data, we propose AVES (Animal Vocalization Encoder based on Self-Supervision), a self-supervised, transformer-based audio representation model for encoding animal vocalizations. |
M. Hagiwara; |
314 | A Video Anomaly Detection Framework Based on Appearance-Motion Semantics Representation Consistency. Highlight: The prevalent methods mainly investigate the reconstruction difference between normal and abnormal patterns but ignore the semantic consistency between the appearance and motion information of behavior patterns, making the results highly dependent on the local context of frame sequences and lacking an understanding of behavior semantics. To address this issue, we propose a framework of Appearance-Motion Semantics Representation Consistency that uses the gap in appearance and motion semantic representation consistency between normal and abnormal data. |
X. Huang; C. Zhao; Z. Wu; |
315 | Avoid Overthinking in Self-Supervised Models for Speech Recognition. Highlight: We then motivate further research in early exiting (EE) by computing an optimal bound for performance versus speed trade-offs. To approach this bound, we propose two new strategies for ASR: (1) we adapt the recently proposed patience strategy to ASR; and (2) we design a new EE strategy specific to ASR that performs better than all strategies previously introduced. |
D. Berrebbi; B. Yan; S. Watanabe; |
316 | AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction. Highlight: In this paper, we propose AV-SepFormer, a SepFormer-based attention dual-scale model that utilizes cross- and self-attention to fuse and model features from the audio and visual modalities. |
J. Lin; X. Cai; H. Dinkel; J. Chen; Z. Yan; Y. Wang; J. Zhang; Z. Wu; Y. Wang; H. Meng; |
317 | AV-TAD: Audio-Visual Temporal Action Detection With Transformer. Highlight: However, current works mainly tackle this task with visual information, while neglecting to explore the potential of the audio modality. To address this challenge, in this paper, we propose a simple yet effective Audio-Visual Temporal Action Detection Transformer named AV-TAD, which performs early fusion on audio and visual modalities in an end-to-end fashion. |
Y. Li; Z. Yu; S. Xiang; T. Liu; Y. Fu; |
318 | A Wavelet Scattering Approach for Load Identification with Limited Amount of Training Data. Highlight: In this article, a two-dimensional wavelet scattering approach for load identification is presented. |
P. A. Schirmer; I. Mporas; |
319 | Backdoor Attack Against Automatic Speaker Verification Models in Federated Learning. Highlight: During the training process of federated learning (FL), we make full use of the advantages of FL and design a two-stage training strategy. In addition, we propose a Global Spectral Cluster (GSC) method to alleviate the problem of insufficient trigger generalization, which is caused by the constraint that the attacker can only reach and poison its own data. |
D. Meng; X. Wang; J. Wang; |
320 | Backdoor Defense Via Suppressing Model Shortcuts. Highlight: In this paper, we explore the backdoor mechanism from the angle of the model structure. |
S. Yang; Y. Li; Y. Jiang; S. -T. Xia; |
321 | Background Disturbance Mitigation for Video Captioning Via Entity-Action Relocation. Highlight: However, they focus on exploiting foreground semantics, ignoring the potential negative impact of video background disturbance on caption generation, i.e., the entities and the actions are misjudged because of a similar video background. To ameliorate this issue, we propose Entity-Action Relocation (EAR) to enhance the adaptability of entities and actions to various backgrounds by giving them the background. |
Z. Li; X. Zhong; S. Chen; W. Liu; W. Huang; L. Li; |
322 | Background-Weakening Consistency Regularization for Semi-Supervised Video Action Detection. Highlight: Thus, we propose a Background-Weakening with Calibration Constraint (BWCC) framework, which highlights the negative impact of information in the background of false detection by calculating the consistency of the predictions of the background-weakened video and the original video. |
X. Zhong; A. Yi; W. Liu; W. Huang; C. Zou; Z. Wang; |
323 | BadRes: Reveal The Backdoors Through Residual Connection. Highlight: In this paper, we propose a simple yet strong backdoor attack method called BadRes, in which the residual connections act as a turnstile that is deterministic on clean inputs but unpredictable on poisoned ones. |
M. He; T. Chen; H. Zhou; S. Zhang; J. Li; |
324 | Bagging R-CNN: Ensemble for Object Detection in Complex Traffic Scenes. Highlight: The existing methods are not robust enough to be extended to new complex traffic scenes. To address this issue, we leverage the idea of ensemble learning for strong robustness and propose a novel Bagging R-CNN framework. |
P. Li; Y. He; D. Yin; F. R. Yu; P. Song; |
325 | Bag of Tricks with Quantized Convolutional Neural Networks for Image Classification. Highlight: We evaluate the effectiveness of our proposed method with two popular models, ResNet50 and MobileNetV2, on the ImageNet dataset. |
J. Hu; M. Zeng; E. Wu; |
326 | Balanced Deep CCA for Bird Vocalization Detection. Highlight: The key objective of this work is to learn useful embeddings associated with high performance in downstream event detection tasks when labeled data is scarce and the audio events of interest (songbird vocalizations) are sparse. |
S. Kumar; B. Anshuman; L. Rüttimann; R. H. R. Hahnloser; V. Arora; |
327 | Balanced Mixup Loss for Long-Tailed Visual Recognition. Highlight: In this paper, we detail the theoretical analysis of the data imbalance caused by Mixup, and propose a novel Balanced Mixup (BaMix) loss function from the output perspective. |
H. Ye; F. Zhou; X. Li; Q. Zhang; |
328 | BAT: Bi-Alignment Based On Transformation in Multi-Target Domain Adaptation for Semantic Segmentation. Highlight: As a result, it is impossible for existing methods to handle the more realistic multi-target domain adaptive semantic segmentation (MT-DASS) tasks. To solve this problem, we propose a Bi-Alignment framework based on Transformation (BAT). |
X. Zhong; W. Li; L. Liao; J. Xiao; W. Liu; W. Huang; Z. Wang; |
329 | Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection. Highlight: In this paper we propose an uncertainty quantification approach by modeling data distributions in feature spaces. |
X. Chen; Y. Li; Y. Yang; |
330 | Batch Normalization Damages Federated Learning on Non-IID Data: Analysis and Remedy. Highlight: In this paper, we present the first convergence analysis showing that the mismatched local and global statistical parameters due to non-i.i.d. data cause gradient deviation, which leads the algorithm to converge to a biased solution at a slower rate. |
Y. Wang; Q. Shi; T. -H. Chang; |
331 | BATT: Backdoor Attack with Transformation-Based Triggers. Highlight: In this paper, we explore the previous findings from another side. |
T. Xu; Y. Li; Y. Jiang; S. -T. Xia; |
332 | BAUENet: Boundary-Aware Uncertainty Enhanced Network for Infrared Small Target Detection. Highlight: Existing infrared small target detection (ISTD) methods can discover regularly-shaped and clear objects well, but tend to overlook the tough-to-detect ones, such as targets with irregular shapes or blurry boundaries, causing inaccurate segmentation and missed detection. Considering that boundary areas contain rich uncertainty information, we propose the Boundary-Aware Uncertainty Enhanced Network (BAUENet), in which Uncertainty Enhanced Context Refinement (UECR) and Adaptive Feature Fusion Modules (AFFM) are devised to address this problem. |
T. Chen; Q. Chu; Z. Tan; B. Liu; N. Yu; |
333 | Bayesian Cramér-Rao Bound Estimation With Score-Based Models. Highlight: In this paper, we introduce a novel data-driven method for Bayesian CRB estimation, leveraging state-of-the-art score estimation and deep generative modeling techniques. |
E. S. Crafts; B. Zhao; |
334 | Bayesian Methods for Optical Flow Estimation Using A Variational Approximation, with Applications to Ultrasound. Highlight: We develop a unified Bayesian framework for optical flow (OF) estimation that uses a variational lower bound to obtain a variational approximation of the posterior probability distribution. |
J. Dorazil; B. H. Fleury; F. Hlawatsch; |
335 | Bayesian Network Modeling and Prediction of Transitions Within The Homelessness System. Abstract: Administrative data collected by homeless service providers offer a unique opportunity to understand how homeless individuals navigate the homeless system towards securing stable … |
K. S. Rahman; D. -S. Zois; C. Chelmis; |
336 | Bayesian Optimization with Ensemble Learning Models and Adaptive Expected Improvement. Highlight: Optimizing a black-box function that is expensive to evaluate arises in a wide range of machine learning and artificial intelligence applications, including drug discovery, policy optimization in robotics, and hyperparameter tuning of learning models, to name a few. |
K. D. Polyzos; Q. Lu; G. B. Giannakis; |
337 | Beamformer-Guided Target Speaker Extraction. Highlight: We propose a Beamformer-guided Target Speaker Extraction (BG-TSE) method to extract a target speaker’s voice from a multi-channel recording informed by the direction of arrival of the target. |
M. Elminshawi; S. Raj Chetupalli; E. A. P. Habets; |
338 | Beamforming Optimization in RIS-Aided MIMO Systems Under Multiple-Reflection Effects. Highlight: We propose a novel method to optimize the RIS phase shifters considering the effect of multiple reflections using a more physically-consistent model (exact RIS model). |
D. Wijekoon; A. Mezghani; E. Hossain; |
339 | BEANS: The Benchmark of Animal Sounds. Highlight: Here, we propose BEANS (the BEnchmark of ANimal Sounds), a collection of bioacoustics tasks and public datasets, specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics. |
M. Hagiwara; B. Hoffman; J. -Y. Liu; M. Cusimano; F. Effenberger; K. Zacarian; |
340 | BEBERT: Efficient And Robust Binary Ensemble BERT. Highlight: In this paper, we propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap. |
J. Tian; C. Fang; H. Wang; Z. Wang; |
341 | BECTRA: Transducer-Based End-To-End ASR with BERT-Enhanced Encoder. Highlight: We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. |
Y. Higuchi; T. Ogawa; T. Kobayashi; S. Watanabe; |
342 | Benchmarking Convolutional Neural Network Inference on Low-Power Edge Devices. Highlight: Motivated by the availability of low-complexity AI models and low-power embedded systems on the market, this paper provides a comparative study of the inference performance of convolutional neural networks on different edge devices, exploiting low-power GPUs and dedicated AI hardware. |
O. Ferraz; H. Araujo; V. Silva; G. Falcao; |
343 | Benchmarking Cross-Domain Face Recognition with Avatars, Caricatures and Sketches. Highlight: Yet, various relevant domains have hardly been explored and a lack of public databases hampers the development of new algorithms. In this work, we introduce the HDA Cross-Domain (HDA-CD) face image database comprising 1,400 face images from three different domains including avatars, caricatures, and sketches. |
A. Foroughi; C. Rathgeb; M. Ibsen; C. Busch; |
344 | Benchmarking White Blood Cell Classification Under Domain Shift. Highlight: In this paper, we establish a benchmark for white blood cell (WBC) recognition. |
S. Tsutsui; Z. Su; B. Wen; |
345 | Benchmark of Physiological Model Based and Deep Learning Based Remote Photoplethysmography in Automotive Applications. Highlight: Remote photoplethysmography (rPPG) can be used to monitor a driver’s cardio-respiratory functions in automotive settings to improve driving safety. To understand the challenges of rPPG in this application, we created a benchmark of the latest rPPG algorithms based on the MR-NIRP Car dataset, selecting representative methods from both the physiological model based (PBV and DIS) and deep learning based (Supervised Learning and Contrastive Learning) approaches. |
Z. Wang; X. Yang; H. Lu; C. Shan; W. Wang; |
346 | BER-Aware Dynamic Resource Management for Edge-Assisted Goal-Oriented Communications. Highlight: Exploiting Lyapunov optimization, we propose a minimum-energy strategy, which trades information rates for BER, under delay and classification accuracy constraints. |
F. Binucci; P. Banelli; |
347 | BERT Is Robust! A Case Against Word Substitution-Based Adversarial Attacks. Highlight: In this work, we investigate the robustness of BERT using four word substitution-based attacks. |
J. Hauser; Z. Meng; D. Pascual; R. Wattenhofer; |
348 | Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV. Highlight: We propose to combine dialogue separation (DS) and Voice Activity Detection (VAD), both recently proposed for TV audio. |
M. Torcoli; E. A. P. Habets; |
349 | Beyond Neural-on-Neural Approaches to Speaker Gender Protection. Highlight: Currently, the common practice for developing and testing gender protection algorithms is neural-on-neural, i.e., perturbations are generated and tested with a neural network. In this paper, we propose to go beyond this practice to strengthen the study of gender protection. |
L. van Bemmel; Z. Liu; N. Vaessen; M. Larson; |
350 | Beyond Rate Coding: Signal Coding and Reconstruction Using Lean Spike Trains. Highlight: Recent years have seen a growing interest in spike-based encoding of continuous-time signals, a hallmark of biological computation. In this context, we present a mathematical framework for signal representation, leveraging a simple but robust mechanistic model of a biologically plausible spiking neuron. |
A. Chattopadhyay; A. Banerjee; |
351 | BHE-DARTS: Bilevel Optimization Based on Hypergradient Estimation for Differentiable Architecture Search. Highlight: In this paper, we propose a stochastic bilevel optimization approach based on a hypergradient estimator, called BHE-DARTS, as a remedy for the tendency of the bilevel optimization model in Differentiable Architecture Search (DARTS) to find locally optimal structures rather than globally optimal ones. |
Z. Cai; L. Chen; H. -L. Liu; |
352 | Bias Identification with RankPix Saliency. Highlight: We introduce RankPix, a novel saliency method for visual bias identification in image classification tasks. |
S. Konate; L. Lebrat; R. S. Cruz; C. Fookes; A. Bradley; O. Salvado; |
353 | Bias Reduced Semidefinite Relaxation Method for Multistatic Localization in The Absence of Transmitter Position And Its Synchronization. Highlight: Using the time delay measurements from the direct and indirect paths, we propose to jointly estimate the object and transmitter positions together with the clock offset. |
J. Pei; G. Wang; K. C. Ho; L. Huang; |
354 | Bilateral Coarse-to-Fine Network for Point Cloud Completion. Highlight: Existing methods often directly infer the missing points from the partial shape, but they suffer from limited structural information. To address this, we propose the Bilateral Coarse-to-Fine Network (BCF-Net), which leverages 2D images as guidance to compensate for structural information loss. |
T. T. Phong Nguyen; S. Lam Phung; V. Gopaldasani; J. Whitelaw; |
355 | Bimodal Fusion Network for Basic Taste Sensation Recognition from Electroencephalography and Electromyography. Highlight: This paper proposes a bimodal fusion network (Bi-FusionNet) for recognizing basic taste sensations (sour, sweet, bitter, salty, umami, and blank). |
H. Gao; S. Zhao; H. Li; L. Liu; Y. Wang; R. Hu; J. Zhang; G. Li; |
356 | Binary Image Fast Perfect Recovery from Sparse 2D-DFT Coefficients. Highlight: We derive a lower bound on the number of coefficients required for perfect image recovery and propose a reconstruction algorithm. |
S. -C. Pei; K. -W. Chang; |
357 | Binary Sequence Set Optimization for CDMA Applications Via Mixed-Integer Quadratic Programming. Highlight: In this paper, we show that the integrated sidelobe level (ISL) minimization problem may be formulated as a mixed-integer quadratic program (MIQP). |
A. Yang; T. Mina; G. Gao; |
358 | Binauralization Robust To Camera Rotation Using 360° Videos. Highlight: We propose a novel binauralization method that is robust to camera rotation. |
M. Yoshida; R. Togo; T. Ogawa; M. Haseyama; |
359 | Biologically-Inspired Continual Learning of Human Motion Sequences. Highlight: This work proposes a model for continual learning on tasks involving temporal sequences, specifically, human motions. |
J. Ott; S. -C. Liu; |
360 | Bipartite Graph Convolutional Networks with Adversarial Domain Transfer. Highlight: In this paper, we propose a novel graph convolution operation that propagates on bipartite graphs with lower spatial and temporal complexity, and two mapping functions with adversarial constraints to transfer features between two domains. |
D. Wu; B. Liang; X. Liu; X. Zang; M. Chi; |
361 | BIRD-PCC: Bi-Directional Range Image-Based Deep Lidar Point Cloud Compression. Highlight: Moreover, their handcrafted design of residual coding methods could not fully exploit spatial redundancy. To remedy this, we propose a coding framework BIRD-PCC. |
C. -S. Liu; J. -F. Yeh; H. Hsu; H. -T. Su; M. -S. Lee; W. H. Hsu; |
362 | BISVP: Building Footprint Extraction Via Bidirectional Serialized Vertex Prediction. Highlight: In this paper, we introduce a new refinement-free and end-to-end building footprint extraction method, which is conceptually intuitive, simple, and effective. |
M. Zhang; Y. Du; Z. Hu; Q. Liu; Y. Wang; |
363 | Bit Error and Block Error Rate Training for ML-Assisted Communication. Highlight: We propose new loss functions targeted at minimizing the block error rate and SNR deweighting, a novel method that trains communication systems for optimal performance over a range of signal-to-noise ratios. |
R. Wiesmayr; G. Marti; C. Dick; H. Song; C. Studer; |
364 | Blind Acoustic Room Parameter Estimation Using Phase Features. Highlight: Inspired by recent works in speech enhancement, we propose utilizing phase-related features to extend recent approaches to blindly estimate the so-called reverberation fingerprint parameters, namely, volume and RT60. |
C. Ick; A. Mehrabi; W. Jin; |
365 | Blind Estimation of Audio Processing Graph. Highlight: We apply our model to singing voice effects and drum mixing estimation tasks. |
S. Lee; J. Park; S. Paik; K. Lee; |
366 | Blind Polynomial Regression. Highlight: However, in many applications, the input may be partially known or not known at all, rendering conventional regression approaches inapplicable. In this paper, we formally state the (potentially partial) blind regression problem, illustrate some of its theoretical properties, and propose an algorithmic approach to solve it. |
A. Natali; G. Leus; |
367 | Blind Source Counting and Separation with Relative Harmonic Coefficients. Highlight: In this paper, we transfer a previous relative transfer function based separation method into the wave domain by utilizing a higher-order microphone for the mixture recording. |
H. Sun; P. Samarasinghe; T. Abhayapala; |
368 | Block-Based Color Constancy: The Deviation of Salient Pixels. Highlight: During experiments, we observed that some pixels decrease the performance of the method. In this work, the algorithm is modified to eliminate the impact of these pixels. |
O. Ulucan; D. Ulucan; M. Ebner; |
369 | Blood Oxygen Saturation Estimation from Facial Video Via DC and AC Components of Spatio-Temporal Map. Highlight: In this paper, we propose an SpO2 estimation method from facial videos based on convolutional neural networks (CNN). |
Y. Akamatsu; Y. Onishi; H. Imaoka; |
370 | Body Prior Guided Graph Convolutional Neural Network for Skeleton-Based Action Recognition. Highlight: Taking full advantage of body prior knowledge, this paper presents a Body Prior Guided Graph Convolutional Network (BPG-GCN) to jointly meet the demand for large-scale training data and effective model architecture. |
Q. Hu; H. Liu; H. -Q. Wang; M. Liu; |
371 | Boosting BERT Subnets with Neural Grafting. Highlight: In this work, we propose neural grafting to boost BERT subnets, especially the larger ones. |
T. Hu; C. Meinel; H. Yang; |
372 | Boosting Face Recognition Performance with Synthetic Data and Limited Real Data. Highlight: In this paper, we attempt to boost face recognition simultaneously using synthetic data and limited real data. |
W. Wang; L. Zhang; C. -M. Pun; J. -C. Xie; |
373 | Boosting Fine-Grained Sketch-Based Image Retrieval with Self-Supervised Learning. Highlight: In this paper, we propose a better self-supervised pre-trained FG-SBIR model which does not depend on large-scale annotated datasets. |
Z. Zhang; Y. Chen; Y. Zhang; R. Feng; T. Zhang; |
374 | Boosting No-Reference Super-Resolution Image Quality Assessment with Knowledge Distillation and Extension. Highlight: To this end, we propose a novel Knowledge Extension Super-Resolution Image Quality Assessment (KE-SR-IQA) framework to predict SR image quality by leveraging a semi-supervised knowledge distillation (KD) strategy. |
H. Zhang; S. Su; Y. Zhu; J. Sun; Y. Zhang; |
375 | Boosting Person Re-Identification with Viewpoint Contrastive Learning and Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite significant progress in person ReID, viewpoint variation remains an obstacle to extracting discriminative features for retrieval. To address this problem, we propose a Viewpoint-Robust Network (VRN) based on contrastive learning and adversarial training to boost person ReID. |
X. Shi; H. Liu; W. Shi; Z. Zhou; Y. Li; |
376 | Boosting Prompt-Based Few-Shot Learners Through Out-of-Domain Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Boost-Distiller, the first few-shot KD algorithm for prompt-tuned PLMs, which draws on out-of-domain data. |
X. Chen; C. Wang; J. Dong; M. Qiu; L. Feng; J. Huang; |
377 | Boosting Semi-Supervised Federated Learning with Model Personalization and Client-Variance-Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we boost the semi-supervised FL by addressing the issue using model personalization and client-variance-reduction. |
S. Wang; Y. Xu; Y. Yuan; X. Wang; T. Q. S. Quek; |
378 | Boosting Signal Modulation Few-Shot Learning with Pre-Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Modulated Signal Pre-transformation (MSP), a parameterized radio signal transformation framework that encourages signals with the same semantics to have similar representations. |
P. Sun; J. Su; Z. Wen; Y. Zhou; Z. Hong; S. Yu; H. Zhou; |
379 | Boosting The Accuracy of SRAM-Based In-Memory Architectures Via Maximum Likelihood-Based Error Compensation Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a Maximum Likelihood (ML)-based statistical Error Compensation (MLEC) method to enhance the accuracy of binary DPs in a 6T SRAM-based IMC. |
H. Kim; N. Shanbhag; |
380 | Boosting Transferability of Adversarial Example Via An Enhanced Euler’s Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we try to develop a better base attack to boost the transferability of adversarial examples. |
A. Peng; Z. Lin; H. Zeng; W. Yu; X. Kang; |
381 | Boundary Cue Guidance and Contextual Feature Mining for Glass Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a boundary cue guidance and contextual feature mining network (BCNet) to accurately and efficiently segment glass. |
Q. Xiao; Y. Zhang; X. Li; K. Hu; |
382 | BrainNetFormer: Decoding Brain Cognitive States with Spatial-Temporal Cross Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose BrainNetFormer to incorporate both static and dynamic properties for human behavior prediction. |
L. Sheng; W. Wang; Z. Shi; J. Zhan; Y. Kong; |
383 | Brain Network Features Differentiate Intentions from Different Emotional Expressions of The Same Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To identify effective brain features that were most relevant to intent recognition improvement, we compared the event-related spectral perturbation and effective brain connectivity patterns on two intent conditions (praise vs. irony). |
Z. Li; B. Zhao; G. Zhang; J. Dang; |
384 | Breaking The Trade-Off in Personalized Speech Enhancement With Cross-Task Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that existing PSE methods suffer from a trade-off between speech over-suppression and interference leakage by addressing one problem at the expense of the other. We propose a new PSE model training framework using cross-task knowledge distillation to mitigate this trade-off. |
H. Taherian; S. Emre Eskimez; T. Yoshioka; |
385 | BreathIE: Estimating Breathing Inhale Exhale Ratio Using Motion Sensor Data from Consumer Earbuds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel algorithm, BreathIE, to estimate breathing rate and IE ratio using a low-power motion sensor embedded in consumer-grade earbuds. |
N. Rashid; M. M. Rahman; T. Ahmed; J. Kuang; J. A. Gao; |
386 | Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. |
J. Shi; C. -J. Hsu; H. Chung; D. Gao; P. Garcia; S. Watanabe; A. Lee; H. -Y. Lee; |
387 | BTS-E: Audio Deepfake Detection Using Breathing-Talking-Silence Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Countermeasure (CM) systems have been developed recently to help ASV combat synthetic speech. In this work, we propose BTS-E, a framework to evaluate the correlation between Breathing, Talking (speech), and Silence sounds in an audio clip, then use this information for deepfake detection tasks. |
T. -P. Doan; L. Nguyen-Vu; S. Jung; K. Hong; |
388 | Building Blocks for A Complex-Valued Transformer Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the Fourier transform of signals is complex-valued and has numerous applications. We aim to make deep learning directly applicable to these complex-valued signals without using projections into ? |
F. Eilers; X. Jiang; |
389 | Building Change Detection Using Cross-Temporal Feature Interaction Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill the gap, we propose a cross-temporal feature interaction network to effectively derive the change representations. |
Y. Feng; J. Jiang; H. Xu; J. Zheng; |
390 | Building Keyword Search System from End-To-End ASR Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe a general KWS pipeline, applicable to any ASR model that generates N-best lists. |
R. Huang; M. Wiesner; L. P. Garcia-Perera; D. Povey; J. Trmal; S. Khudanpur; |
391 | Burst Perception-Distortion Tradeoff: Analysis and Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the perception-distortion tradeoff theory by introducing multiple-frame information. |
D. Xue; L. Herranz; J. V. Corral; Y. Zhang; |
392 | ByteCover3: Accurate Cover Song Identification On Short Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we upgrade the previous ByteCover systems to ByteCover3, which utilizes local features to further improve the identification performance of short music queries. |
X. Du; Z. Wang; X. Liang; H. Liang; B. Zhu; Z. Ma; |
393 | Byzantine-Robust and Communication-Efficient Personalized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a projected stochastic block gradient descent method to address the robustness issue. |
X. He; J. Zhang; Q. Ling; |
394 | C2BN: Cross-Modality and Cross-Scale Balance Network for Multi-Modal 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, the multi-level features from images also suffer from imbalance problems in receptive fields. To address the above problems, we propose two novel networks: cross-modality balance network (CMN) and cross-scale balance network (CSN). |
B. Ding; J. Xie; J. Nie; |
395 | C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval. |
A. Rouditchenko; Y. -S. Chuang; N. Shvetsova; S. Thomas; R. Feris; B. Kingsbury; L. Karlinsky; D. Harwath; H. Kuehne; J. Glass; |
396 | CADET: Control-Aware Dynamic Edge Computing for Real-Time Target Tracking in UAV Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an innovative approach – CADET – to control where sensor signals are processed in the system. |
L. F. Florenzan Reyes; F. Smarra; A. D’Innocenzo; M. Levorato; |
397 | CAENet: Using Collaborative Attention Transformer and Add-Boost Strategy for Single Image Deraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the rain streaks in an image are usually complex and diverse, while few methods fully explore the richness of this information, which may improve the network’s feature representation ability. To solve the above issues, we propose a novel Collaborative Attention Enhanced Network (CAENet) for single image deraining. |
S. Qin; S. Zhang; Y. Zhang; H. Gao; |
398 | Calibrating AI Models for Few-Shot Demodulation Via Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to leverage the conformal prediction framework to obtain data-driven set predictions whose calibration properties hold irrespective of the data distribution. |
K. M. Cohen; S. Park; O. Simeone; S. Shamai Shitz; |
399 | CAN2V: CAN-Bus Data-Based Seq2seq Model for Vehicle Velocity Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model named CAN2V, which effectively analyzes the vehicle characteristics and driving patterns in the encoder through multi-task learning. |
J. -H. Cho; J. -H. Chang; |
400 | Cancelling Intermodulation Distortions for Otoacoustic Emission Measurements with Earbuds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel cancellation method of Intermodulation Distortions (IMDs) for earbud speakers used to measure Distortion Product Otoacoustic Emissions (DPOAE). |
B. U. Demirel; K. Al-Naimi; F. Kawsar; A. Montanari; |
401 | CANDY: Category-Kernelized Dynamic Convolution for Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the comparable performance between local-based and global-based approaches, the AP results of objects on different scales vary significantly. In this paper, we first point out that the key factor to bridging such a gap lies in the utilization of local RoI information for global mask prediction. |
Y. Lu; Z. Chen; Z. Chen; J. Hu; L. Cao; S. Zhang; |
402 | CANet: Curved Guide Line Network with Adaptive Decoder for Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many solutions have been proposed, but they cannot deal with corner lanes well. To address this problem, this paper proposes a new top-down deep learning lane detection approach, CANet. |
Z. Yang; C. Shen; W. Shao; T. Xing; R. Hu; P. Xu; H. Chai; R. Xue; |
403 | Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and training strategy, aiming to synthesize highly natural-sounding audio. |
X. Shi; E. Cooper; X. Wang; J. Yamagishi; S. Narayanan; |
404 | Can Spoofing Countermeasure And Speaker Verification Systems Be Jointly Optimised? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using only a modest quantity of auxiliary data collected from new speakers, we show that joint optimisation degrades the performance of separate CM and ASV sub-systems, but that it nonetheless improves complementarity, thereby delivering superior SASV performance. |
W. Ge; H. Tak; M. Todisco; N. Evans; |
405 | Capacity Maximization for Active RIS Assisted Outdoor-to-Indoor Communication System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to implement outdoor-to-indoor communication with the aid of an active reconfigurable intelligent surface (active-RIS), where the active-RIS allows the incoming signal from an outdoor base station (BS) to pass through the surface and be received by indoor users (UEs) after shifting phase and magnifying amplitude. |
C. He; W. Gong; Y. Dong; X. Xie; Z. J. Wang; |
406 | Capturing Cross-Scale Disparity for Stereo Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on how to effectively exploit the disparity information between stereo viewpoints and proposes a cross-scale parallax-attention network (CSPAN) for stereo image SR. |
K. He; C. Li; D. Zhang; J. Shao; |
407 | Cardiac Disease Diagnosis on Imbalanced Electrocardiography Data Through Optimal Transport Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on a new method of data augmentation to solve the data imbalance problem within imbalanced ECG datasets to improve the robustness and accuracy of heart disease detection. |
J. Qiu; J. Zhu; M. Xu; P. Huang; M. Rosenberg; D. Weber; E. Liu; D. Zhao; |
408 | Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. |
Y. Tseng; C. -I. J. Lai; H. -Y. Lee; |
409 | CAT: Causal Audio Transformer for Audio Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a Causal Audio Transformer (CAT) consisting of a Multi-Resolution Multi-Feature (MRMF) feature extraction with an acoustic attention block for more optimized audio modeling. |
X. Liu; H. Lu; J. Yuan; X. Li; |
410 | Causal Discovery and Causal Inference Based Counterfactual Fairness in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework named Structural Causal Fairness Framework (SCFF) to achieve counterfactual fairness without the assumptions required by previous works. |
Y. Wang; Z. Luo; |
411 | CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer. |
Y. Xu; B. Liu; Q. Huang; X. Song; Z. Wu; S. Kang; H. Meng; |
412 | CC-PoseNet: Towards Human Pose Estimation in Crowded Classrooms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on improving human pose estimation in crowded classrooms from the perspective of crowd detection and pose refinement. |
Z. Yu; Y. Hu; S. Xiang; T. Liu; Y. Fu; |
413 | CD-FSOD: A Benchmark For Cross-Domain Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a study of the cross-domain few-shot object detection (CD-FSOD) benchmark, consisting of image data from a diverse data domain. |
W. Xiong; |
414 | CDHD: Contrastive Dreamer for Hint Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such homogeneous images further hinder the knowledge distillation process when regularising only the deeper layers close to the output, resulting in catastrophic forgetting. To address these issues, we present CDHD: a contrastive dreamer for hint distillation. |
L. Yu; T. Hua; W. Yang; P. Ye; Q. Liao; |
415 | Centralized Cascade Multi-Channel Noise Reduction and Acoustic Feedback Cancellation in A Wireless Acoustic Sensor And Actuator Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a centralized cascade multi-channel noise reduction (NR) and acoustic feedback cancellation (AFC) algorithm for speech applications in a wireless acoustic sensor and actuator network (WASAN). |
S. Ruiz; T. van Waterschoot; M. Moonen; |
416 | Central Nodes Detection from Partially Observed Graph Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on detecting the central nodes in a graph from partially observed graph signals with unknown graph topology. |
Y. He; H. -T. Wai; |
417 | Centroid Distance Distillation for Effective Rehearsal in Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on tackling the continual domain drift problem with centroid distance distillation. |
D. Liu; F. Lyu; L. Li; Z. Xia; F. Hu; |
418 | Certified Robustness of Quantum Classifiers Against Adversarial Examples Through Quantum Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first theoretical study showing that adding quantum random rotation noise can improve the robustness of quantum classifiers against adversarial attacks. |
J. -C. Huang; Y. -L. Tsai; C. -H. H. Yang; C. -F. Su; C. -M. Yu; P. -Y. Chen; S. -Y. Kuo; |
419 | CFFMixer: Multi-Dimensional Feature Fusion for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that different modules are applicable to different dimensions, we propose an object detector named CFFMixer, which uses a hybrid architecture to achieve multi-dimensional feature fusion. |
H. Xie; W. Yuan; B. Kang; S. Du; |
420 | CF-VTON: Multi-Pose Virtual Try-on with Cross-Domain Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works in this field have encountered issues such as unnatural garment alignment and difficulty in preserving the person’s identity, arising from the weak mapping relationship between different feature crosses. To address these challenges, this paper proposes a novel multi-pose virtual try-on network named CF-VTON. |
C. Du; S. Xiong; |
421 | Change Point Detection with Neural Online Density-Ratio Estimator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a change point detection method using an online approach based on neural networks to directly estimate the density-ratio between current and reference windows of the data stream. |
X. Wang; R. A. Borsoi; C. Richard; J. Chen; |
422 | Channel-Driven Decentralized Bayesian Federated Learning for Trustworthy Decision Making in D2D Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the observation that DSGLD applies random Gaussian perturbations to the model parameters, we propose to leverage channel noise on the D2D links as a mechanism for MCMC sampling. |
L. Barbieri; O. Simeone; M. Nicoli; |
423 | Channel Estimation in Massive MIMO with Heavy-Tailed Noise: Gaussian-Mixture Versus Cauchy Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we compare two types of massive multiple-input multiple-output (MIMO) receivers, namely those based on a Gaussian-mixture assumption and those based on a Cauchy assumption, in terms of channel estimation quality, when the noise is impulsive. |
Z. Gülgün; E. G. Larsson; |
424 | Channel Estimation with Tightly-Coupled Antenna Arrays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper develops a linear minimum mean-square error (LMMSE) channel estimator that takes advantage of the mutual coupling in antenna arrays. |
B. Tadele; V. Shyianov; F. Bellili; A. Mezghani; |
425 | Channel State Information-Free Artificial Noise-Aided Location-Privacy Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an artificial noise-aided strategy is presented for location-privacy preservation. |
J. Li; U. Mitra; |
426 | Choice Fusion As Knowledge For Zero-Shot Dialogue State Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although prior works have leveraged question-answering (QA) data to reduce the need for in-domain training in DST, they fail to explicitly model knowledge transfer and fusion for tracking dialogue states. To address this issue, we propose CoFunDST, which is trained on domain-agnostic QA datasets and directly uses candidate choices of slot-values as knowledge for zero-shot dialogue-state generation, based on a T5 pre-trained language model. |
R. Su; J. Yang; T. -W. Wu; B. -H. Juang; |
427 | Chord-Conditioned Melody Harmonization With Controllable Harmonicity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Melody harmonization has long been closely associated with chorales composed by Johann Sebastian Bach. Previous works rarely emphasised chorale generation conditioned on chord … |
S. Wu; X. Li; M. Sun; |
428 | CLAP: Learning Audio Concepts from Natural Language Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose to learn audio concepts from natural language supervision. |
B. Elizalde; S. Deshmukh; M. A. Ismail; H. Wang; |
429 | ClassA Entropy for The Analysis of Structural Complexity of Physiological Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the recent theoretical boom in Sample Entropy based algorithms for the analysis of physiological and pathological systems, the major issue which prevents their more widespread use remains that of large computational load, particularly in the studies of quantification of structural richness in data. |
H. Xiao; L. Li; D. P. Mandic; |
430 | Class-Aware Contextual Information for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a CACINet, which consists of a Semantic Affinity Module (SAM) and a Class Association Module (CAM), to generate class-aware contextual information among pixels on a fine-grained level. |
H. Tang; Y. Zhao; Y. Jiang; Z. Gan; Q. Wu; |
431 | Class-Aware Shared Gaussian Process Dynamic Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A new method of Gaussian process dynamic model (GPDM), named class-aware shared GPDM (CSGPDM), is presented in this paper. |
R. Sawata; T. Ogawa; M. Haseyama; |
432 | Class-Guided Triple Head Prediction Network for Long-Tail Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate this problem, we devise a novel Class-guided Triple Head Prediction Network (CTHNet). Considering that the long-tail LVIS dataset contains frequent, common and rare classes, we propose Triple Box Heads (TBH) to deal with these three classes, enhancing discriminative representations for all classes. |
X. Liu; Y. Zheng; |
433 | Classification-Based Dynamic Network for Efficient Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To accelerate network inference under resource constraints, we propose a classification-based dynamic network for efficient super-resolution (CDNSR), which combines the classification and SR networks in a unified framework. |
Q. Wang; W. Fang; M. Wang; Y. Cheng; |
434 | Classification of Synthetic Facial Attributes By Means of Hybrid Classification/Localization Patch-Based Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new architecture whose objective is to identify the altered facial attributes of synthetic face images. |
J. Wang; B. Tondi; M. Barni; |
435 | Classification of The Cervical Vertebrae Maturation (CVM) Stages Using The Tripod Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel deep learning method for fully automated detection and classification of the Cervical Vertebrae Maturation (CVM) stages. |
S. Atici; H. Pan; M. H. Elnagar; V. Allareddy; O. Suhaym; R. Ansari; A. E. Cetin; |
436 | Classification Via Subspace Learning Machine (SLM): Methodology and Performance Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the decision learning process of multilayer perceptron (MLP) and decision tree (DT), a new classification model, named the subspace learning machine (SLM), is proposed in this work. |
H. Fu; Y. Yang; V. K. Mishra; C. -C. Jay Kuo; |
437 | Classifying Non-Individual Head-Related Transfer Functions with A Computational Auditory Model: Calibration And Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). |
R. Daugintis; R. Barumerli; L. Picinali; M. Geronazzo; |
438 | Classifying Pathological Images Based on Multi-Instance Learning and End-to-End Attention Pooling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to address the issue that previous deep learning methods for classifying pathological images cannot adaptively learn features, we propose an end-to-end attention pooling method based on a multi-instance learning patch scoring model. |
Y. Chen; J. Liu; Z. Zuo; P. Jiang; Y. Jin; G. Wu; |
439 | Class-Incremental Learning on Multivariate Time Series Via Shape-Aligned Temporal Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on practical privacy-sensitive circumstances, we propose a novel distillation-based strategy using a single-headed classifier without saving historical samples. |
Z. Qiao; M. Hu; X. Jiang; P. N. Suganthan; R. Savitha; |
440 | Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Cleanformer, a streaming multichannel neural enhancement frontend for automatic speech recognition (ASR). |
J. Caroselli; A. Narayanan; N. Howard; T. O’Malley; |
441 | Clean Sample Guided Self-Knowledge Distillation for Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, for online Self-knowledge Distillation (SD), DA is not always beneficial because of the absence of a trustworthy teacher model. To address this issue, this paper proposes an SD method named Clean sample guided Self-knowledge Distillation (CleanSD), in which the original clean sample is used as a guide when the model is trained with the augmented samples. |
J. Wang; Y. Li; Q. He; W. Xie; |
442 | Clicker: Attention-Based Cross-Lingual Commonsense Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While mPTMs show the potential to encode commonsense knowledge for different languages, transferring commonsense knowledge learned in large-scale English corpus to other languages is challenging. To address this problem, we propose the attention-based Cross-LIngual Commonsense Knowledge transfER (CLICKER) framework, which minimizes the performance gaps between English and non-English languages in commonsense question-answering tasks. |
R. Su; Z. Sun; S. Lu; C. Ma; C. Guo; |
443 | Client Selection for Generalization in Accelerated Federated Learning: A Bandit Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel multi-armed bandit (MAB)-based approach for CS to minimize the training latency without harming the ability of the model to generalize, i.e., to give reliable predictions for new observations. |
D. B. Ami; K. Cohen; Q. Zhao; |
444 | CLIP4VideoCap: Rethinking CLIP for Video Captioning with Multiscale Temporal Fusion and Commonsense Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CLIP4VideoCap for video captioning based on large-scale pre-trained CLIP image and text encoders together with multi-scale temporal reasoning and commonsense knowledge. |
T. Mahmud; F. Liang; Y. Qing; D. Marculescu; |
445 | CLMAE: A Liter and Faster Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, pre-training on big datasets suffers from a lengthy training schedule and large memory consumption. To alleviate these problems, we propose a lightweight model called Convolutional Lite Masked AutoEncoder (CLMAE). |
Y. Song; L. Ma; |
446 | Clustered Greedy Algorithm For Large-Scale Sensor Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a clustering-based solution called clustered greedy selection (CGS) which not only reduces the problem size, but also achieves a similar performance to GS. |
K. Majumder; S. R. B. Pillai; S. Mulleti; |
447 | Clustering-Based Supervised Contrastive Learning for Identifying Risk Items on Heterogeneous Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Clustering-based Supervised Contrastive Learning (CSCL) to address the two challenges. |
A. Li; Y. Ji; G. Chu; X. Wang; D. Li; C. Shi; |
448 | CM-CS: Cross-Modal Common-Specific Feature Learning For Audio-Visual Video Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel cross-modal common-specific feature learning method (cm-CS) to map the modal features into modality-common and modality-specific subspaces. |
H. Chen; D. Zhu; G. Zhang; W. Shi; X. Zhang; J. Li; |
449 | CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Research on Video to Speech Synthesis (VTS) has surged recently, and the focus is gradually shifting from small-vocabulary short-phrase VTS to large-vocabulary continuous VTS (LVC-VTS). A large-scale dataset with sufficient speakers and utterances is a prerequisite for such research, and the database is inherently language dependent. In this paper, we introduce CN-CVS, a large-scale Mandarin continuous visual-speech dataset, to support LVC-VTS research. |
C. Chen; D. Wang; T. F. Zheng; |
450 | CNEG-VC: Contrastive Learning Using Hard Negative Example In Non-Parallel Voice Conversion. Highlight: A positive example could not be effectively pushed toward the query examples. To solve this problem, we apply contrastive learning with hard negative examples to non-parallel voice conversion. |
B. Prihasto; Y. -X. Lin; P. T. Le; C. -L. Huang; J. -C. Wang; |
451 | CNN Filter for RPR-Based SR in VVC with Wavelet Decomposition. Highlight: In this paper, we propose a convolutional neural network (CNN) filter for reference picture resampling (RPR)-based super-resolution (SR) with wavelet decomposition. |
H. Lan; C. Jung; Y. Liu; M. Li; |
452 | CNN Filter for Super-Resolution with RPR Functionality in VVC. Highlight: In this paper, we propose a convolutional neural network (CNN) filter for super-resolution (SR) with the RPR functionality in VVC. |
S. Huang; C. Jung; Y. Liu; M. Li; |
453 | Coarse-to-Fine Covid-19 Segmentation Via Vision-Language Alignment. Highlight: However, there are few relevant studies due to the lack of detailed information and high-quality annotation in COVID-19 datasets. To solve this problem, we propose C2FVL, a Coarse-to-Fine segmentation framework via Vision-Language alignment that merges text information, containing the number of lesions and their specific locations, with image information. |
D. Shan; Z. Li; W. Chen; Q. Li; J. Tian; Q. Hong; |
454 | Coarse-To-Fine Knowledge Selection for Document Grounded Dialogs. Highlight: This paper proposes Re3G, which aims to optimize both coarse-grained knowledge retrieval and fine-grained knowledge extraction in a unified framework. |
Y. Zhang; H. Fu; C. Fu; H. Yu; Y. Li; C. -T. Nguyen; |
455 | Cochlear Decomposition: A Novel Bio-Inspired Multiscale Analysis Framework. Highlight: However, issues such as mode mixing for signals with closely spaced modes have been identified. To address such problems, we propose a novel spatial auditory decomposition framework for non-stationary signals, namely the Cochlear Decomposition (CD). |
H. Alfalahi; A. Khandoker; G. Alhussein; L. Hadjileontiadis; |
456 | Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech. Highlight: This paper presents Cocktail HuBERT, a self-supervised learning framework that generalizes to mixture speech using a masked pseudo source separation objective. |
M. Fazel-Zarandi; W. -N. Hsu; |
457 | Codebook-Based User Tracking in IRS-Assisted mmWave Communication Networks. Highlight: In this paper, we present a novel mobile user tracking (UT) scheme for codebook-based intelligent reflecting surface (IRS)-aided millimeter wave (mmWave) systems. |
M. Garkisch; V. Jamali; R. Schober; |
458 | Coded Matrix Computations for D2D-Enabled Linearized Federated Learning. Highlight: In this paper, we propose a novel straggler-optimal approach for coded matrix computations which can significantly reduce the communication delay and privacy issues introduced by D2D data transmissions in FL. |
A. B. Das; A. Ramamoorthy; D. J. Love; C. G. Brinton; |
459 | Code-Enhanced Fine-Grained Semantic Matching For Tag Recommendation In Software Information Sites. Highlight: Most existing methods ignore the semantic information of code snippets in software information sites. To tackle this issue, we regard code as a semantic enhancement signal and propose a novel Code-Enhanced fine-grained semantic matching method for Tag Recommendation in software information sites (CETR) to learn the matching score between tags and software objects. |
L. Li; P. Wang; X. Zheng; Q. Xie; |
460 | Codes Correcting Burst and Arbitrary Erasures for Reliable and Low-Latency Communication. Highlight: Motivated by modern network communication applications which require low latency, we study codes that correct erasures with low decoding delay. |
S. Kas Hanna; Z. Tan; W. Xu; A. Wachter-Zeh; |
461 | Co-Design for MIMO Radar and MIMO Communication Aided By Reconfigurable Intelligent Surface. Highlight: To achieve this goal, we develop a cyclic framework based on semi-definite programming (SDP), semi-definite relaxation (SDR), and the alternating direction method of multipliers (ADMM) to jointly optimize the radar transmit waveforms, the receive filters, the communication codebook, and the RIS coefficients. |
D. Li; B. Tang; L. Xue; |
462 | Code-Switching Speech Synthesis Based on Self-Supervised Learning and Domain Adaptive Speaker Encoder. Highlight: However, it remains challenging for these methods to synthesize highly natural speech. To solve these problems, we introduce self-supervised learning and frame-level domain adversarial training into the speaker encoder, based on the speaker verification task, so that the speaker vectors of different languages keep a consistent distribution in the speaker space, thereby improving speech synthesis performance. |
Y. -X. Lin; C. -H. Pai; P. T. Le; B. Prihasto; C. -L. Huang; J. C. Wang; |
463 | Code-Switching Text Generation and Injection in Mandarin-English ASR. Highlight: In this study, we investigate text generation and injection for improving the performance of Transformer-Transducer (T-T), a streaming model commonly used in industry, in Mandarin-English code-switching speech recognition. |
H. Yu; Y. Hu; Y. Qian; M. Jin; L. Liu; S. Liu; Y. Shi; Y. Qian; E. Lin; M. Zeng; |
464 | Cold Diffusion for Speech Enhancement. Highlight: The unique mathematical properties of the sampling process from cold diffusion could be utilized to restore high-quality samples from arbitrary degradations. Based on these properties, we propose an improved training algorithm and objective to help the model generalize better during the sampling process. |
H. Yen; F. G. Germain; G. Wichern; J. L. Roux; |
465 | Collaborative Audio-Visual Event Localization Based on Sequential Decision and Cross-Modal Consistency. Highlight: However, current approaches model the AVE localization task as a sequential classification process, in which event-relevant video segments cannot accurately collaborate with each other. Therefore, we propose the Collaborative Segments Decision (CSD), which enables collaboration among event-relevant video segments by modeling the AVE localization task as a sequential decision process. |
Y. Kuang; X. Fan; |
466 | Color Guided Depth Map Super-Resolution with Nonlocal Autoregressive Modeling. Highlight: In this paper, we propose a color guided depth map super-resolution method with nonlocal autoregressive modeling. |
W. Xu; N. Qi; Q. Zhu; J. Qi; L. Huang; K. Cao; Y. Bao; Q. Wang; |
467 | Column-Based Matrix Approximation with Quasi-Polynomial Structure. Highlight: This work designs the first provable matrix approximation algorithm using just column samples. |
J. Chae; P. Narayanamurthy; S. Bac; S. M. Sharada; U. Mitra; |
468 | Combining Dual-Tree Wavelet Analysis and Proximal Optimization for Anisotropic Scale-Free Texture Segmentation. Highlight: To minimize the corresponding functional, a primal-dual proximal convergent algorithm is devised and accelerated by taking advantage of the strong convexity of the data-fidelity term. |
L. Davy; N. Pustelnik; P. Abry; |
469 | Combining Loss Reweighting and Sample Resampling for Long-Tailed Instance Segmentation. Highlight: Recent long-tailed solutions only consider loss reweighting or sample resampling, which still suffer from the gradient imbalance between positive and negative samples and the overfitting risk of the tail classes. To address these problems, we propose a novel reweighting method, named Foreground and Background Separation Loss (FBSL), to alleviate the imbalance problem of the tail classes being suppressed by the overwhelming foreground and background during the learning process of the model. |
Y. Zhao; S. Chen; Q. Chen; Z. Hu; |
470 | Combining The Silhouette and Skeleton Data for Gait Recognition. Highlight: However, appearance-based methods are greatly affected by clothes-changing and carrying conditions, while model-based methods are limited by the accuracy of pose estimation. To tackle this challenge, a simple yet effective two-branch network is proposed in this paper, which contains a CNN-based branch taking silhouettes as input and a GCN-based branch taking skeletons as input. |
L. Wang; R. Han; W. Feng; |
471 | CommDRE: Document-Level Relation Extraction with Self-Supervised Commonsense Learning. Highlight: In this paper, we propose a self-supervised commonsense-enhanced DocRE model, called CommDRE, without external knowledge. |
R. Li; J. Zhong; Z. Xue; Q. Dai; C. Wang; X. Li; |
472 | Communication-Constrained Exchange of Zeroth-Order Information with Application to Collaborative Target Tracking. Highlight: In this paper, we study a communication-constrained multi-agent zeroth-order online optimization problem within the federated learning (FL) setting with application to target tracking where multiple agents have access only to the knowledge of their current distances to their respective targets. |
E. C. Kaya; M. Berk Sahin; A. Hashemi; |
473 | Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization. Highlight: We propose a novel graph-based clustering approach called Community Detection Graph Convolutional Network (CDGCN) to improve the performance of the speaker diarization system. |
J. Wang; Z. Chen; H. Zhou; L. Li; Q. Hong; |
474 | Comparative Layer-Wise Analysis of Self-Supervised Speech Models. Highlight: In this work, we examine the intermediate representations for a variety of recent models. |
A. Pasad; B. Shi; K. Livescu; |
475 | Comparative Study of IRS Assisted Opportunistic Communications Over i.i.d. and LoS Channels. Highlight: In this paper, we consider intelligent reflecting surface (IRS) assisted opportunistic communications (OC), and present a comparative analysis of the system throughput over independent and identically distributed (i.i.d.) and line-of-sight (LoS) channels. |
L. Yashvanth; C. R. Murthy; |
476 | Comparing Decentralized Gradient Descent Approaches and Guarantees. Highlight: In recent work, we presented constructive convergence guarantees for Dec-AltProjGDmin under simple assumptions. |
S. Moothedath; N. Vaswani; |
477 | Comparison of Soft and Hard Target RNN-T Distillation for Large-Scale ASR. Highlight: In this paper, we focus on knowledge distillation for the RNN-T model, which is widely used in state-of-the-art (SoTA) automatic speech recognition (ASR). |
D. Hwang; K. Chai Sim; Y. Zhang; T. Strohman; |
478 | Compensatory Debiasing For Gender Imbalances In Language Models. Highlight: It is particularly challenging to detach and remove biased representations in the embedding space because the learned linguistic knowledge entails bias. To address this problem, we propose a compensatory debiasing strategy to reduce gender bias while preserving linguistic knowledge. |
T. -J. Woo; W. -J. Nam; Y. -J. Ju; S. -W. Lee; |
479 | Complementary Learning System Based Intrinsic Reward in Reinforcement Learning. Highlight: Inspired by the fact that humans evaluate curiosity by comparing current observations with historical information, we propose a novel intrinsic reward, namely CLS-IR, which aims to address the problems caused by sparse extrinsic rewards. |
Z. Gao; K. Xu; H. Jia; T. Wan; B. Ding; D. Feng; X. Mao; H. Wang; |
480 | #NAME? Highlight: This paper presents a low-rank approximation algorithm of singular value decomposition (SVD) for large-scale matrices in tensor train format (TT-format). |
J. -C. Chi; C. -E. Chen; Y. -H. Huang; |
481 | Compose & Embellish: Well-Structured Piano Performance Generation Via A Two-Stage Approach. Highlight: Observing the above, we devise a two-stage Transformer-based framework that Composes a lead sheet first, and then Embellishes it with accompaniment and expressive touches. |
S. -L. Wu; Y. -H. Yang; |
482 | Composition of Motion from Video Animation Through Learning Local Transformations. Highlight: In this work, we solve the problem of motion representation in videos, according to local transformations applied to specific keypoints extracted from the static images. |
M. Vrigkas; V. Tagka; M. E. Plissiti; C. Nikou; |
483 | Comprehensive Complexity Assessment of Emerging Learned Image Compression on CPU and GPU. Highlight: The reported results (1) quantify the complexity of LC methods, (2) fairly compare different methods, and (3) identify and quantify the key factors affecting the complexity, which is a major contribution of the work. |
F. Pakdaman; M. Gabbouj; |
484 | Compressed Distributed Regression Over Adaptive Networks. Highlight: We examine the learning performance achievable by a network of agents that solve a distributed regression problem using the recently proposed ACTC (Adapt-Compress-Then-Combine) diffusion strategy. |
M. Carpentiero; V. Matta; A. H. Sayed; |
485 | Compressed-Sensing-Based 3D Localization with Distributed Passive Reconfigurable Intelligent Surfaces. Highlight: In this paper, the programmable signal propagation paradigm, enabled by Reconfigurable Intelligent Surfaces (RISs), is exploited for high accuracy 3-Dimensional (3D) user localization with a single multi-antenna base station. |
J. He; A. Fakhreddine; H. Wymeersch; G. C. Alexandropoulos; |
486 | Compressing Cross-Domain Representation Via Lifelong Knowledge Distillation. Highlight: In this paper, we address a more challenging scenario in which different tasks are presented sequentially, at different times, and the learning goal is to transfer the generative factors of visual concepts learned by a Teacher module to a compact latent space represented by a Student module. |
F. Ye; A. G. Bors; |
487 | Compressive Channel Estimation for IRS-Aided Millimeter-Wave Systems Via Two-Stage LAMP Network. Highlight: By exploiting the low-rank nature of mmWave channels in the virtual angular domain (VAD) and the powerful learned approximate message passing (LAMP) network, we propose a two-stage LAMP network with row compression (RCTS-LAMP). |
W. -C. Tsai; C. -W. Chen; A. -Y. A. Wu; |
488 | Compressive Estimation of Near Field Channels for Ultra Massive-MIMO Wideband THz Systems. Highlight: In this paper, we develop a channel estimation strategy for terahertz (THz) ultra-massive multiple-input multiple-output (UM-MIMO) system with a sub-connected array-of-subarrays architecture, in which one subarray (SA) is connected to one RF chain exclusively. |
S. Tarboush; A. Ali; T. Y. Al-Naffouri; |
489 | Compressive Sensing with Tensorized Autoencoder. Highlight: In this paper, our goal is to recover images without access to the ground-truth (clean) images, using the articulations as a structural prior of the data. |
R. Hyder; M. S. Asif; |
490 | Conditional Conformer: Improving Speaker Modulation For Single And Multi-User Speech Enhancement. Highlight: Recently, Feature-wise Linear Modulation (FiLM) has been shown to outperform other approaches to incorporate speaker embedding into speech separation and VoiceFilter models. We propose an improved method of incorporating such embeddings into a VoiceFilter frontend for automatic speech recognition (ASR) and text-independent speaker verification (TI-SV). |
T. O’Malley; S. Ding; A. Narayanan; Q. Wang; R. Rikhye; Q. Liang; Y. He; I. McGraw; |
491 | Conditional LS-GAN Based Skylight Polarization Image Restoration and Application in Meridian Localization. Highlight: This paper introduces a deep learning-based methodology for restoring skylight polarization images (SPIs) and utilizing the restored images for navigation. |
T. Yang; H. Bo; X. Yang; J. Gao; Z. Shi; |
492 | Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution. Highlight: In this paper, we propose a novel sampling algorithm that communicates the information of the low-resolution audio via the reverse sampling process of diffusion models (DMs). |
C. -Y. Yu; S. -L. Yeh; G. Fazekas; H. Tang; |
493 | CO-NET: Classification-Oriented Point Cloud Sampling Via Informative Feature Learning and Non-Overlapped Local Adjustment. Highlight: In this paper, we present a classification-oriented sampling network named CO-Net, aiming to learn informative sampled points that benefit downstream classification tasks. |
Y. Lin; K. Chen; S. Zhou; Y. Huang; Y. Lei; |
494 | Confidence-Based Event-Centric Online Video Question Answering on A Newly Constructed ATBS Dataset. Highlight: To address the challenges of VideoQA on long videos of unknown length, we define a new set of problems called Online Open-ended Video Question Answering (O2VQA). |
W. Kong; S. Ye; C. Yao; J. Ren; |
495 | Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio. Highlight: We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). |
Y. Zhang; K. C. Puvvada; V. Lavrukhin; B. Ginsburg; |
496 | CONSEN: Complementary and Simultaneous Ensemble for Alzheimer’s Disease Detection and MMSE Score Prediction. Highlight: This paper proposes a novel method for Alzheimer’s disease detection and MMSE prediction using a complementary and simultaneous ensemble (CONSEN) algorithm based on multilingual spontaneous speech. |
L. Jin; Y. Oh; H. Kim; H. Jung; H. J. Jon; J. E. Shin; E. Y. Kim; |
497 | Consistent Estimators of A New Class of Covariance Matrix Distances in The Large Dimensional Regime. Abstract: The problem of estimating the distance between two covariance matrices is considered. A general estimator is provided for a class of metrics, the estimator of which has never been … |
R. Pereira; X. Mestre; D. Gregoratti; |
498 | Constrained Dynamical Neural ODE for Time Series Modelling: A Case Study on Continuous Emotion Prediction. Highlight: In this paper, we propose constrained dynamical neural ordinary differential equation (CD-NODE) models, which treat the desired time series as a dynamic process that can be described by an ODE. |
T. Dang; A. Dimitriadis; J. Wu; V. Sethu; E. Ambikairajah; |
499 | Constrained Independent Component Analysis Based on Entropy Bound Minimization for Subgroup Identification from Multi-subject fMRI Data. Highlight: We propose a constrained independent component analysis algorithm based on minimizing the entropy bound (c-EBM) to overcome the computational complexity limitation of IVA. |
H. Yang; F. Ghayem; B. Gabrielson; M. A. B. S. Akhonda; V. D. Calhoun; T. Adali; |
500 | Constrained Non-negative PARAFAC2 for Electromyogram Separation. Highlight: The objective of this paper is to reduce the crosstalk during the simultaneous extension of the muscles of the index finger and the little finger from a matrix of surface electromyography (sEMG) signals. |
A. Magbonde; F. Quaine; B. Rivet; |
501 | Content-Insensitive Dynamic Lip Feature Extraction for Visual Speaker Authentication Against Deepfake Attacks. Highlight: However, with emerging deepfake technology, attackers can make high-fidelity talking videos of a user, thus posing a great threat to these systems. Confronted with this threat, we propose a new deep neural network for lip-based visual speaker authentication against human imposters and deepfake attacks. |
Z. Guo; S. Wang; |
502 | Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis. Highlight: In this paper, we propose a context-aware coherent speaking style prediction method for audiobook speech synthesis. |
S. Lei; Y. Zhou; L. Chen; Z. Wu; S. Kang; H. Meng; |
503 | Context-Aware End-to-end ASR Using Self-Attentive Embedding and Tensor Fusion. Highlight: In this work, we propose a context-aware end-to-end ASR model that injects the self-attentive context embedding into the decoder of the recurrent neural network transducer (RNN-T). |
S. -Y. Chang; C. Zhang; T. N. Sainath; B. Li; T. Strohman; |
504 | Context-Aware Face Clustering with Graph Convolutional Networks. Highlight: In this paper, we propose a Context-Aware Graph Convolutional Network (CAGCN) to explicitly consider both the global and local information. |
D. Zhang; J. Guo; Z. Jin; |
505 | Context-Aware Fine-Tuning of Self-Supervised Speech Models. Highlight: In this paper, we study the use of context, i.e., surrounding segments, during fine-tuning and propose a new approach called context-aware fine-tuning. |
S. Shon; F. Wu; K. Kim; P. Sridhar; K. Livescu; S. Watanabe; |
506 | Contextually-Rich Human Affect Perception Using Multimodal Scene Information. Highlight: In this work, we leverage pretrained vision-language (VLN) models to extract descriptions of foreground context from images. |
D. Bose; R. Hebbar; K. Somandepalli; S. Narayanan; |
507 | Contextual Similarity Is More Valuable Than Character Similarity: An Empirical Study for Chinese Spell Checking. Highlight: To make better use of contextual information, we propose a simple yet effective Curriculum Learning (CL) framework for the CSC task. |
D. Zhang; Y. Li; Q. Zhou; S. Ma; Y. Li; Y. Cao; H. -T. Zheng; |
508 | ContiNILM: A Continual Learning Scheme for Non-Intrusive Load Monitoring. Highlight: This work alleviates the aforementioned limitation by introducing ContiNILM, a continual learning scheme for NILM to build robust models that track environmental/seasonal alterations with direct impact on several appliances’ operation. |
S. Sykiotis; M. Kaselimi; A. Doulamis; N. Doulamis; |
509 | Continual Cell Instance Segmentation of Microscopy Images. Highlight: In addition, as acquiring annotations is labor-intensive, cell images can be partially labeled. In this paper, we present iMRCNN, which extends Mask R-CNN with knowledge distillation and pseudo labeling, to address these challenges. |
T. -T. Chuang; T. -Y. Wei; Y. -H. Hsieh; C. -S. Chen; H. -F. Yang; |
510 | Continual Learning for On-Device Speech Recognition Using Disentangled Conformers. Highlight: This algorithm produces ASR models consisting of a frozen ‘core’ network for general-purpose use and several tunable ‘augment’ networks for speaker-specific tuning. Using such models, we propose a novel compute-efficient continual learning algorithm called DisentangledCL. |
A. Diwan; C. -F. Yeh; W. -N. Hsu; P. Tomasello; E. Choi; D. Harwath; A. Mohamed; |
511 | Continuous Action Space-Based Spoken Language Acquisition Agent Using Residual Sentence Embedding and Transformer Decoder. Highlight: Studies on spoken language acquisition agents aim to understand the mechanism of human language learning and to realize it on computers. |
R. Komatsu; Y. Kimura; T. Okamoto; T. Shinozaki; |
512 | Continuous Descriptor-Based Control for Deep Audio Synthesis. Highlight: We assess the performance of our method on a wide variety of sounds including instrumental, percussive and speech recordings while providing both timbre and attributes transfer, allowing new ways of generating sounds. |
N. Devis; N. Demerlé; S. Nabi; D. Genova; P. Esling; |
513 | Continuous Interaction with A Smart Speaker Via Low-Dimensional Embeddings of Dynamic Hand Pose. Highlight: This paper presents a new continuous interaction strategy with visual feedback of hand pose and mid-air gesture recognition and control for a smart music speaker, which utilizes only 2 video frames to recognize gestures. |
S. Xu; C. Kaul; X. Ge; R. Murray-Smith; |
514 | Continuous Learning for Blind Image Quality Assessment with Contrastive Transformer. Highlight: In this paper, we propose a Transformer-based BIQA contrastive continual learning approach to improve model transfer performance. |
J. Yang; Z. Wang; B. Huang; L. Deng; |
515 | Contrastive Domain Adaptation Via Delimitation Discriminator. Highlight: To this end, we propose contrastive domain adaptation via delimitation discriminator (CDVD), which addresses the inconsistency problem of optimizing contrastive learning and classification tasks. |
X. Wei; B. Wen; L. Chen; Y. Liu; C. Zhao; Y. Lu; |
516 | Contrastive Learning at The Relation and Event Level for Rumor Detection. Highlight: However, manual data annotation in realistic cases is very expensive and time-consuming. In this paper, we propose a novel self-supervised Relation-Event based Contrastive Learning (RECL) framework for rumor detection to address the above issue. |
Y. Xu; J. Hu; J. Ge; Y. Wu; T. Li; H. Li; |
517 | Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages. Highlight: In this paper, we instead use a contrastive learning procedure that derives cross-modal embeddings linking the audio and text domains. |
S. Durand; D. Stoller; S. Ewert; |
518 | Contrastive Learning of Functionality-Aware Code Embeddings. Highlight: In this paper, we present Functionality-aware Code Embeddings (FaCE) in terms of contrastive learning. |
Y. Li; H. Wu; H. Zhao; |
519 | Contrastive Learning of Sentence Embeddings in Product Search. Highlight: In this paper, we propose WS-SimCSE, a weak supervision approach based on graph neural networks, which utilizes user behavior data to model the relevance relationship between queries and items in a heterogeneous graph. |
B. -W. Zhang; Y. Yan; J. Yu; |
520 | Contrastive Learning with Dialogue Attributes for Neural Dialogue Generation. Highlight: This paper proposes to guide the response generation with attribute-aware contrastive learning to improve the overall quality of the generated responses, where contrastive learning samples are generated according to various important dialogue attributes each specializing in a different principle of conversation. |
J. Tan; H. Cai; H. Chen; H. Cheng; H. Meng; Z. Ding; |
521 | Contrastive Representation Learning for Acoustic Parameter Estimation. Highlight: A study is presented in which a contrastive learning approach is used to extract low-dimensional representations of the acoustic environment from single-channel, reverberant speech signals. |
P. Götz; C. Tuna; A. Walther; E. A. P. Habets; |
522 | Contrastive Self-Supervised Learning for Automated Multi-Modal Dance Performance Assessment. Highlight: A fundamental challenge of analyzing human motion is to effectively represent human movements both spatially and temporally. We propose a contrastive self-supervised strategy to tackle this challenge. |
Y. Zhong; F. Zhang; Y. Demiris; |
523 | Contrastive Speech Mixup for Low-Resource Keyword Spotting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, with the rising demand for smart devices to become more personalized, KWS models need to adapt quickly to smaller user samples. To tackle this challenge, we propose a contrastive speech mixup (CosMix) learning algorithm for low-resource KWS. |
D. Ng; R. Zhang; J. Q. Yip; C. Zhang; Y. Ma; T. H. Nguyen; C. Ni; E. S. Chng; B. Ma; |
524 | Contrast-PLC: Contrastive Learning for Packet Loss Concealment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to use contrastive learning to learn a loss-robust semantic representation for PLC. |
H. Xue; X. Peng; Y. Lu; |
525 | Controllable Music Inpainting with Mixed-Level and Disentangled Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we contribute a controllable inpainting model by combining the high expressivity of mixed-level, disentangled music representations and the strong predictive power of masked language modeling. |
S. Wei; Z. Wang; W. Gao; G. Xia; |
526 | Convergence Analysis of Graphical Game-Based Nash Q-Learning Using The Interaction Detection Signal of N-Step Return Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we adopt the ${\mathcal{N}}$-step return signal to detect interactions between agents and build the Markov graphical game based on it. |
Y. Zhuang; S. Yang; W. Li; Y. Gao; |
527 | Convergence of Stochastic PDMM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the primal-dual method of multipliers (PDMM), a promising distributed optimisation algorithm suited to heterogeneous networks. |
S. O. Jordan; T. W. Sherson; R. Heusdens; |
528 | Conversational Text-to-SQL: An Odyssey Into State-of-the-Art and Challenges Ahead Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With multi-tasking (MT) over coherent tasks with discrete prompts during training, we improve over specialized text-to-SQL T5-family models. |
S. Hari Krishnan Parthasarathi; L. Zeng; D. Hakkani-Tür; |
529 | Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In streaming ASR, high accuracy is achieved by attending to look-ahead frames, which increases latency. To tackle this trade-off, we propose a multiple latency streaming ASR to achieve high accuracy with zero look-ahead. |
H. Zhao; S. Fujie; T. Ogawa; J. Sakuma; Y. Kida; T. Kobayashi; |
530 | Convex Optimization of Deep Polynomial and ReLU Activation Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider training multi-layer neural networks with polynomial and ReLU activation functions. |
B. Bartan; M. Pilanci; |
531 | Convolutional Filtering on Sampled Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Effective linear information processing on the manifold requires quantifying the error incurred when approximating manifold convolutions with graph convolutions. In this paper, we derive a non-asymptotic error bound for this approximation, showing that convolutional filtering on the sampled manifold converges to continuous manifold filtering. |
Z. Wang; L. Ruiz; A. Ribeiro; |
532 | Convolutional Recurrent MetricGAN With Spectral Dimension Compression For Full-Band Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend it to full-band enhancement by combining it with our recently proposed learnable spectral dimension compression mapping strategy. |
Z. Hou; Q. Hu; T. Sun; Y. Hu; C. Zhu; K. Chen; |
533 | Convolutional Recurrent Neural Networks for The Classification of Cetacean Bioacoustic Patterns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we focus on the development of a convolutional recurrent neural network (CRNN) to categorize biosignals collected in the Hellenic Trench, generated by two cetacean species, sperm whales (Physeter macrocephalus) and striped dolphins (Stenella coeruleoalba). |
D. N. Makropoulos; A. Tsiami; A. Prospathopoulos; D. Kassis; A. Frantzis; E. Skarsoulis; G. Piperakis; P. Maragos; |
534 | Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an efficient two-dimensional convolution-based attention module, namely C2D-Att. |
J. Li; Y. Tian; T. Lee; |
535 | Convolutive NTF for Ambisonic Source Separation Under Reverberant Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a Non-negative Tensor Factorization (NTF) based sound source separation method with a novel convolutive Spatial Covariance Matrix (SCM) model, which is suitable for use with reverberant Ambisonic signals. |
M. Guzik; K. Kowalczyk; |
536 | Co-Operative CNN for Visual Saliency Prediction on WCE Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we present a novel and robust gaze estimation methodology based on physicians’ eye fixations, using convolutional neural networks (CNNs) trained according to a novel co-operative scheme, on medical images acquired during Wireless Capsule Endoscopy (WCE). |
G. Dimas; A. Koulaouzidis; D. K. Iakovidis; |
537 | Cooperative Five Degrees Of Freedom Motion Estimation For A Swarm Of Autonomous Vehicles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel cooperative-based system that enables each autonomous vehicle in the swarm to be fully aware of its 5 degrees of freedom (DOF) motion, i.e., 3D translation and 2D rotation, a very important task for autonomous navigation, also known as simultaneous localization and mapping (SLAM). |
N. Piperigkos; A. S. Lalos; K. Berberidis; C. Anagnostopoulos; |
538 | CoRe: Transferable Long-Range Time Series Forecasting Enhanced By Covariates-Guided Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, existing methods only take a window of the near past as input, which prevents the models from learning persistent historical patterns. To tackle these problems, we propose CoRe, a novel transferable long-term forecasting method enhanced by Covariates-guided Representation. |
X. -Y. Li; P. -N. Zhong; D. Chen; Y. -B. Yang; |
539 | CORSD: Class-Oriented Relational Self Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address the limitations. |
M. Yu; S. H. Tan; K. Wu; R. Dong; L. Zhang; K. Ma; |
540 | Cosmopolite Sound Monitoring (CoSMo): A Study of Urban Sound Event Detection Systems Generalizing to Multiple Cities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an in-depth study of the behaviour of state-of-the-art SED systems well suited to our problem, combining three far-field real-recording datasets which can be used jointly during training. |
F. Angulo; S. Essid; G. Peeters; C. Mietlicki; |
541 | Cough Detection Using Millimeter-Wave FMCW Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a signal processing method to detect human cough signals with a millimeter-wave frequency-modulated continuous-wave (FMCW) radar. |
K. Han; S. Hong; |
542 | Could The BubbleView Metaphor Be Used to Infer Visual Attention on 3D Graphical Content? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we question whether the BubbleView method can provide a reliable proxy for visual attention in the context of 3D graphical objects. |
A. Bruckert; M. Abid; M. P. Da Silva; P. Le Callet; |
543 | Counterfactual Explanation for Multivariate Time Series Using A Contrastive Variational Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model to understand abnormal class features in multivariate time series. |
W. Todo; M. Selmani; B. Laurent; J. -M. Loubes; |
544 | Counterfactual Two-Stage Debiasing For Video Corpus Moment Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a Counterfactual Two-stage Debiasing Learning (CTDL), which incorporates a counterfactual bias network that intentionally learns the retrieval bias by providing a shortcut to learn the spurious correlation between keyword and scene, and performs two-stage debiasing learning that mitigates the bias via contrasting factual retrievals with counterfactually biased retrievals. |
S. Yoon; J. W. Hong; S. Eom; H. S. Yoon; E. Yoon; D. Kim; J. Kim; C. Kim; C. D. Yoo; |
545 | Coupled CP Tensor Decomposition with Shared and Distinct Components for Multi-Task fMRI Data Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a tensor-based framework for multi-task fMRI data fusion, using a partially constrained canonical polyadic (CP) decomposition model. |
R. A. Borsoi; I. Lehmann; M. A. B. S. Akhonda; V. D. Calhoun; K. Usevich; D. Brie; T. Adali; |
546 | Covariance Regularization for Probabilistic Linear Discriminant Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores two alternative covariance regularization approaches, namely, interpolated PLDA and sparse PLDA, to tackle the problem. |
Z. Peng; M. Shao; X. He; X. Li; T. Lee; K. Ding; G. Wan; |
547 | COVID-19 Detection from Speech in Noisy Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore the integration of audio enhancement into a speech-based COVID-19 detection system in an attempt to make speech captured in noisy environments from everyday life useful for the detection of the virus. |
S. Liu; A. Mallol-Ragolta; B. W. Schuller; |
548 | Cov Loss: Covariance-Based Loss for Deep Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an optimized approach for large-scale face recognition. |
I. Alkanhal; A. Almansour; L. Alsalloom; R. Aljadaany; M. Savvides; |
549 | CPA: Compressed Private Aggregation for Scalable Federated Learning Over Massive Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we present compressed private aggregation (CPA), which allows massive deployments to simultaneously communicate at extremely low bit-rates while achieving privacy, anonymity, and resilience to malicious users. |
N. Lang; E. Sofer; N. Shlezinger; R. G. L. D’Oliveira; S. El Rouayheb; |
550 | CPD-GAN: Cascaded Pyramid Deformation GAN for Pose Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing work often fails to transfer complex textures to generated images well. To solve this problem, we propose a novel network for this task. |
Y. Huang; Y. Tang; X. Zheng; J. Tang; |
551 | Cramér-Rao Bound on Lie Groups with Observations on Lie Groups: Application to SE(2) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this communication, we derive a new intrinsic Cramér-Rao bound for both parameters and observations lying on Lie groups. |
S. Labsir; A. Renaux; J. Vilà-Valls; É. Chaumette; |
552 | CRFAST: CLIP-Based Reference-Guided Facial Image Semantic Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new task for CLIP-based reference-guided facial image semantic transfer: the source facial image is translated to the output image with the high-level semantic attributes from the reference image while maintaining identity preservation. |
A. Li; L. Zhao; Z. Zuo; Z. Wang; W. Xing; D. Lu; |
553 | Cross-Device Federated Learning for Mobile Health Diagnostics: A First Study on COVID-19 Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FedLoss, a novel cross-device FL framework for health diagnostics. |
T. Xia; J. Han; A. Ghosh; C. Mascolo; |
554 | Cross-Domain Diffusion Based Speech Enhancement for Very Noisy Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose to incorporate diffusion-based learning into an enhancement model and improve robustness in extremely noisy conditions. |
H. Wang; D. Wang; |
555 | Cross-Domain Learning with Normalizing Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One big challenge in cross-domain learning is to effectively synergize the knowledge learning between domains. In this paper, we propose a new solution to address this challenge using normalizing flow, named DomainFlow, which works as a learned mapping to establish knowledge sharing between source and target domains. |
C. Wang; J. Gao; Y. Hua; H. Wang; |
556 | Cross-Domain Object Classification Via Successive Subspace Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing SSL-based methods rely heavily on data-centric subspace representations, leading to potential performance degradation under domain shift between the training (a.k.a., source domain) and testing (a.k.a., target domain) data. To address this limitation, we propose an effective successive subspace learning method based on existing SSL-based methods. |
K. Chen; H. Li; H. Yan; |
557 | Cross-Head Supervision for Crowd Counting with Noisy Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These noisy annotations severely affect the model training, especially for density map-based methods. To alleviate the negative impact of noisy annotations, we propose a novel crowd counting model with one convolution head and one transformer head, in which these two heads can supervise each other in noisy areas, called Cross-Head Supervision. |
M. Dai; Z. Huang; J. Gao; H. Shan; J. Zhang; |
558 | Cross-Lingual Alzheimer’s Disease Detection Based on Paralinguistic and Pre-Trained Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present our submission to the ICASSP-SPGC-2023 ADReSS-M Challenge Task, which aims to investigate which acoustic features can be generalized and transferred across languages for Alzheimer’s Disease (AD) prediction. |
X. Chen; Y. Pu; J. Li; W. -Q. Zhang; |
559 | Cross-Lingual Transfer Learning for Alzheimer’s Detection from Spontaneous Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill this gap, the ADReSS-M challenge was organized. This paper presents our submission to this ICASSP-2023 Signal Processing Grand Challenge (SPGC). |
B. Tamm; R. Vandenberghe; H. Van Hamme; |
560 | Cross-Modal Adversarial Contrastive Learning for Multi-Modal Rumor Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Cross-Modal Adversarial Contrastive (CMAC) fusion strategy, in which adversarial learning is used to align the latent feature distribution of text and image, and contrastive learning is used to align the feature distribution among multi-modal samples of the same category. |
T. Zou; Z. Qian; P. Li; Q. Zhu; |
561 | Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production. This paper investigates this correlation and proposes a cross-modal speech co-learning paradigm. |
M. Liu; K. A. Lee; L. Wang; H. Zhang; C. Zeng; J. Dang; |
562 | Cross-Modal Fusion Techniques for Utterance-Level Emotion Recognition from Text and Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Cross-Modal RoBERTa (CM-RoBERTa) model for emotion detection from spoken audio and corresponding transcripts. |
J. Luo; H. Phan; J. Reiss; |
563 | Cross-Modality Depth Estimation Via Unsupervised Stereo RGB-to-infrared Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our core idea is to first develop an unsupervised RGB-to-IR translation (RIT) network with proposed Fourier domain adaptation and multi-space warping regularization to synthesize stereo IR images from their corresponding stereo RGB images. |
S. Tang; X. Ye; F. Xue; R. Xu; |
564 | Cross Modality Knowledge Distillation for Robust Pedestrian Detection in Low Light and Adverse Weather Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework that utilizes Cross Modality Knowledge Distillation (CMKD) to improve the performance of RGB-only pedestrian detection in low light and adverse weather conditions. |
M. Hnewa; A. Rahimpour; J. Miller; D. Upadhyay; H. Radha; |
565 | Cross-Modal Matching and Adaptive Graph Attention Network for RGB-D Scene Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, cross-modal features have not been considered in most existing methods. To address these concerns, we propose to integrate the tasks of cross-modal matching and modal-specific recognition, termed Matching-to-Recognition Network (MRNet). |
Y. Guo; X. Liang; J. T. Kwok; X. Zheng; B. Wu; Y. Ma; |
566 | Cross-Modal Mutual Learning for Cued Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the asynchronous modalities (i.e., lip, hand shape and hand position) in CS may cause interference for feature concatenation. To address this challenge, we propose a transformer based cross-modal mutual learning framework to promote multi-modal interaction. |
L. Liu; L. Liu; |
567 | Cross-Modal Optical Flow Estimation Via Modality Compensation and Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a modality compensation module (MCM) to extract complementary features from different modalities adaptively. |
M. Zhai; K. Ni; J. Xie; H. Gao; |
568 | Cross-Site Generalization for Imbalanced Epileptic Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the current detection techniques generalize poorly to other patient populations. To address these issues, we present in this paper a hybrid CNN-LSTM model robust to cross-site variability. |
T. Abdallah; N. Jrad; F. Abdallah; A. Humeau-Heurtier; P. Van Bogaert; |
569 | Cross-Speaker Emotion Transfer By Manipulating Speech Style Latents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic in latent style space. |
S. Jo; Y. Lee; Y. Shin; Y. Hwang; T. Kim; |
570 | CrossSpeech: Speaker-Independent Acoustic Representation for Cross-Lingual Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CrossSpeech, which improves the quality of cross-lingual speech by effectively disentangling speaker and language information at the level of the acoustic feature space. |
J. -H. Kim; H. -S. Yang; Y. -C. Ju; I. -H. Kim; B. -Y. Kim; |
571 | Cross-Subject Mental Fatigue Detection Based on Separable Spatio-Temporal Feature Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An inevitable issue of such a paradigm is that samples near the decision boundary are easily misclassified. To address this issue, we propose a Separable Spatio-temporal Feature Aggregation (SSFA) that consists of a Spatio-temporal Feature Extractor (SFE) and a Separable Feature Aggregation mechanism (SFA). |
Y. Ye; Y. He; W. Huang; Q. Dong; C. Wang; G. Wang; |
572 | Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a solution, we propose cross-training: instead of training one network with two losses, we train two separate networks, each with a different loss; we then tie the parameters of the networks by minimizing an additional L2 loss between the parameters. |
S. Khorram; A. Tripathi; J. Kim; H. Lu; Q. Zhang; R. Prabhavalkar; H. Sak; |
573 | Cross-Utterance ASR Rescoring with Graph-Based Label Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. |
S. Tankasala; L. Chen; A. Stolcke; A. Raju; Q. Deng; C. Chandak; A. Khare; R. Maas; V. Ravichandran; |
574 | CryoSWD: Sliced Wasserstein Distance Minimization for 3D Reconstruction in Cryo-electron Microscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sliced Wasserstein distance (SWD), on the other hand, has shown desirable training stability and ease of computation. Therefore, we propose to replace Wasserstein-1 distance with SWD in the CryoGAN framework, hence the name CryoSWD. |
M. Zehni; Z. Zhao; |
575 | CSM In Motion Vector Steganalysis: The Effect of Coders on Motion Vectors in H.264 Video Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper compares the outputs of compression operations of a range of coders according to specific MV statistics. |
V. Lachner; K. Schaar; R. Zimmermann; |
576 | CTCBERT: Advancing Hidden-Unit BERT with CTC Objectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a simple but effective method, CTCBERT, for advancing hidden-unit BERT (HuBERT). |
R. Fan; Y. Wang; Y. Gaur; J. Li; |
577 | CTTSR: A Hybrid CNN-Transformer Network for Scene Text Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a CNN-Transformer Text Super Resolution Network (CTTSR) to capture the semantic features of text images by the multi-head attention mechanism of the transformer. |
K. Dai; N. Kang; L. Kuang; |
578 | Cumulative Attention Based Streaming Transformer ASR with Internal Language Model Joint Training and Rescoring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an approach to improve the performance of streaming Transformer ASR by introducing an internal language model (ILM) as a part of the decoder layers. |
M. Li; C. -T. Do; R. Doddipatla; |
579 | Customized Automatic Face Beautification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better meet the esthetic preferences of users, we devise a customized automatic face beautification task that can retouch the face adaptively to match the user-entered target score whilst preserving the ID information as much as possible. To accomplish this task, we propose a Human Esthetics Guided StyleGAN Inversion method to retouch each face in the embedding space using StyleGAN inversion. |
W. Chen; P. Chen; W. Chen; L. Lin; |
580 | Cutting Through The Noise: An Empirical Comparison of Psycho-Acoustic and Envelope-based Features for Machinery Fault Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Limited attention has been paid to improving the robustness of fault detection against industrial environmental noise. Therefore, we present the Lenze production background-noise (LPBN) real-world dataset and an automated and noise-robust auditory inspection (ARAI) system for the end-of-line inspection of geared motors. |
P. Wißbrock; Y. Richter; D. Pelkmann; Z. Ren; G. Palmer; |
581 | CyFi-TTS: Cyclic Normalizing Flow with Fine-Grained Representation for End-to-End Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Usually, the generated speech tends to be mispronounced because the one-to-many problem creates an information gap between the text and speech. To address these problems, we propose a cyclic normalizing flow with fine-grained representation for end-to-end text-to-speech (CyFi-TTS), which generates natural-sounding speech by bridging the information gap. |
I. -S. Hwang; Y. -S. Han; B. -K. Jeon; |
582 | CyPMLI: WISL-Minimized Unimodular Sequence Design Via Power Method-Like Iterations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an efficient approach to WISL minimization for unimodular sequence design that takes advantage of the low-cost and easily implementable power method-like iterations. |
A. Eamaz; F. Yeganegi; M. Soltanalian; |
583 | D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network Using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many recent studies focus on either complex masking or complex spectral mapping, ignoring their performance boundaries. To address the above issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former) using joint complex masking and complex spectral mapping for monaural speech enhancement. |
S. Zhao; B. Ma; |
584 | D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end framework for oriented object detection, which simplifies the model pipeline and obtains superior performance. |
Q. Zhou; C. Yu; Z. Wang; F. Wang; |
585 | D-3DLD: Depth-Aware Voxel Space Mapping for Monocular 3D Lane Detection with Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design a new lane representation combined with uncertainties and predict the confidence intervals of 3D lane points using Laplace loss. |
N. Kim; M. Byeon; D. Ji; D. Oh; |
586 | Daily Mental Health Monitoring from Speech: A Real-World Japanese Dataset and Multitask Learning Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Japanese Daily Speech Dataset (JDSD), a large in-the-wild daily speech emotion dataset consisting of 20,827 speech samples from 342 speakers and 54 hours of total duration. |
M. Song; A. Triantafyllopoulos; Z. Yang; H. Takeuchi; T. Nakamura; A. Kishi; T. Ishizawa; K. Yoshiuchi; X. Jing; V. Karas; Z. Zhao; K. Qian; B. Hu; B. W. Schuller; Y. Yamamoto; |
587 | DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DailyTalk, a high-quality conversational speech dataset designed for conversational TTS. |
K. Lee; K. Park; D. Kim; |
588 | DAIS: The Delft Database of EEG Recordings of Dutch Articulated and Imagined Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present an open database consisting of electroencephalography (EEG) and speech data from 20 participants recorded during the covert (imagined) and actual articulation of 15 Dutch prompts. |
B. Dekker; A. C. Schouten; O. Scharenborg; |
589 | DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification, which can generate diversified training samples in speaker embedding space with negligible extra computing cost. |
Y. Wang; Y. Zhang; Z. Wu; Z. Yang; T. Wei; K. Zou; H. Meng; |
590 | DasFormer: Deep Alternating Spectrogram Transformer For Multi/Single-Channel Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the task of speech separation, previous studies usually treat multi-channel and single-channel scenarios as two research tracks, with specialized solutions developed for each. Instead, we propose a simple and unified architecture, DasFormer (Deep alternating spectrogram transFormer), to handle both of them in challenging reverberant environments. |
S. Wang; X. Kong; X. Peng; H. Movassagh; V. Prakash; Y. Lu; |
591 | data2vec-aqc: Search for The Right Teaching Assistant in The Teacher-Student Training Setup Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data. |
V. S. Lodagala; S. Ghosh; S. Umesh; |
592 | data2vec-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for generative tasks such as speech enhancement and speech separation, most self-supervised speech representations did not show substantial improvements. To deal with this problem, in this paper, we propose data2vec-SG (Speech Generation), which is a teacher-student learning framework that addresses speech generation tasks. |
H. Wang; Y. Qian; H. Yang; N. Kanda; P. Wang; T. Yoshioka; X. Wang; Y. Wang; S. Liu; Z. Chen; D. Wang; M. Zeng; |
593 | Data Augmentation Based On Invariant Shape Blending For Deep Learning Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we introduce a new technique for augmenting 2D shape datasets based on a planar blending. |
E. Ghorbel; M. Ghorbel; S. Mhiri; |
594 | Data-Aware Zero-Shot Neural Architecture Search for Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a generator to generate data for a score calculation with affordable overhead, and adopt contrastive learning to optimize the generator for a more stable score. |
Y. Fan; Z. -H. Niu; Y. -B. Yang; |
595 | Database-Aware ASR Error Correction for Speech-to-SQL Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an ASR correction method, DBATI (DataBase-Aware TaggerILM). |
Y. Shao; A. Kumar; N. Nakashole; |
596 | Data-Driven Graph Convolutional Neural Networks for Power System Contingency Analysis. Highlight: We develop a graph convolutional neural network for power system contingency analysis. |
V. Bolz; J. Rue; A. Zell; |
597 | Data Driven Joint Sensor Fusion and Regression Based on Geometric Mean Squared Error. Highlight: An efficient data-driven algorithm is proposed to obtain the best linear sensor combiner; its performance is analyzed numerically and compared with the Cramér-Rao lower bound of the estimated parameters. |
C. A. Lopez; J. Riba; |
598 | Data-Driven Quickest Change Detection in Markov Models. Highlight: A kernel-based, data-driven algorithm is developed that applies to general state spaces and is recursive and computationally efficient. |
Q. Zhang; Z. Sun; L. C. Herrera; S. Zou; |
599 | Data Leakage in Cross-Modal Retrieval Training: A Case Study. Highlight: We propose new training, validation, and testing splits for the dataset that we make available online. |
B. Weck; X. Serra; |
600 | Dataset Balancing Can Hurt Model Performance. Highlight: We find, however, that while balancing improves performance on the public AudioSet evaluation data, it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions. By varying the degree of balancing, we show that its benefits are fragile and depend on the evaluation set. |
R. C. Moore; D. P. W. Ellis; E. Fonseca; S. Hershey; A. Jansen; M. Plakal; |
601 | DB-UNet: MLP Based Dual Branch UNet for Accurate Vessel Segmentation in OCTA Images. Highlight: In this work, we propose a dual branch UNet (DB-UNet), which has a pure-convolutional branch to extract detailed features such as microvessels, and a UNet branch to extract high-level features. |
C. Wang; H. Ning; X. Chen; S. Li; |
602 | D-CONFORMER: Deformable Sparse Transformer Augmented Convolution for Voxel-Based 3D Object Detection. Highlight: In this paper, we propose to fuse convolution and transformer. Because non-empty voxels at different positions in 3D space contribute differently to object detection, applying standard convolution and transformer directly to voxels does not account for these differences. |
X. Zhao; L. Su; X. Zhang; D. Yang; M. Sun; S. Wang; P. Zhai; L. Zhang; |
603 | DDN: Dynamic Aggregation Enhanced Dual-Stream Network for Medical Image Classification. Highlight: In this paper, we propose a dynamic aggregation enhanced dual-stream network, termed DDN, that takes advantage of both ViT and CNN to enrich the feature representation of medical images. |
L. Wang; J. Liu; P. Jiang; D. Cao; B. Pang; |
604 | Decaying Contrast for Fine-Grained Video Representation Learning. Highlight: In this paper, we propose a decaying strategy to grasp the gradual evolution along the temporal dimension for fine-grained spatiotemporal representation learning, which consists of two novel contrastive losses. |
H. Zhang; B. Su; |
605 | Decoding Auditory EEG Responses Using An Adapted WaveNet. Highlight: We introduce a WaveNet-based model that placed second in the regression subtask of the Auditory EEG Challenge, part of the ICASSP Signal Processing Grand Challenge 2023. |
B. Van Dyck; L. Yang; M. M. Van Hulle; |
606 | Decoding Musical Pitch from Human Brain Activity with Automatic Voxel-Wise Whole-Brain fMRI Feature Selection. Highlight: Here, we propose a two-stage thresholding approach that automatically pools relevant voxels from the whole brain to enhance decoding performance. |
V. K. M. Cheung; Y. -P. Peng; J. -H. Lin; L. Su; |
607 | DecomFormer: Decompose Self-Attention Via Fourier Transform for VHR Aerial Image Scene Classification. Highlight: Although transformer-based models have demonstrated strong ability in natural image classification, transformer-based methods have received little attention for very-high-resolution (VHR) aerial image tasks, because the complexity of self-attention in the transformer grows quadratically with the image resolution. To address this issue, we decompose the self-attention via Fourier Transform and propose a novel Fourier self-attention (FSA) mechanism. |
Y. Zhang; X. Gao; X. Pu; T. Wang; X. Gao; |
608 | Decomposition, Interaction, Reconstruction Meets Global Context Learning In Visual Tracking. Highlight: Tensor decomposition and reconstruction attention is a promising global context learning approach because it can remain efficient while avoiding feature compression. |
H. Tan; K. Hu; M. Cao; M. Wang; L. Xu; W. Yang; |
609 | Decontamination Transformer For Blind Image Inpainting. Highlight: Motivated by the observation that most existing inpainting methods suffer from complex contamination, we propose a model that explicitly predicts the real-valued alpha mask and contaminant to eliminate the contamination from the corrupted image, thus improving inpainting performance. |
C. -Y. Li; Y. -Y. Lin; W. -C. Chiu; |
610 | Decorrelating Language Model Embeddings for Speech-Based Prediction of Cognitive Impairment. Highlight: In this work, we propose a new regularization scheme to penalize correlated embeddings during fine-tuning of BERT and apply the approach to speech-based assessment of cognitive impairment. |
L. Xu; K. D. Mueller; J. Liss; V. Berisha; |
611 | Decoupled Non-Parametric Knowledge Distillation for End-to-End Speech Translation. Highlight: In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD) from a data perspective to improve data efficiency. |
H. Zhang; N. Si; Y. Chen; W. Zhang; X. Yang; D. Qu; Z. Li; |
612 | Decoupled Visual Causality for Robust Detection. Highlight: In this paper, we propose a disentangled visual causal model to eliminate the effects of confounders while preserving the corresponding mediators. |
P. Jiang; X. Deng; S. Zhang; |
613 | Deep3DSketch: 3D Modeling from Free-Hand Sketches with View- and Structural-Aware Adversarial Training. Highlight: In this work, we propose a view- and structural-aware deep learning approach, Deep3DSketch, which tackles the ambiguity and fully uses the sparse information of sketches, emphasizing the structural information. |
T. Chen; C. Fu; L. Zhu; P. Mao; J. Zhang; Y. Zang; L. Sun; |
614 | Deep Adaptive Superpixels For Hadamard Single Pixel Imaging In Near-Infrared Spectrum. Highlight: In this work, we propose an adaptive end-to-end sensing methodology for the HSI sensing matrix design, based on deep superpixel estimation, that couples the sensing and recovery of near-infrared spectral images. |
B. Monroy; J. Bacca; H. Arguello; |
615 | Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression. Highlight: In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it. |
H. Zhang; M. Yu; D. Yu; |
616 | Deep Architecture for DOA Trajectory Localization. Highlight: We propose a data-based joint localization and tracking task, called trajectory localization, in which source trajectories are identified for a block (multiple measurements) of array data. |
S. Jaiswal; R. Pandey; S. Nannuru; |
617 | Deep Autoencoding One-Class Time Series Anomaly Detection. Highlight: In this paper, we propose a new anomaly detection (AD) method called deep Autoencoding One-Class (AOC), which learns features with an autoencoder (AE). |
X. Mou; R. Wang; T. Wang; J. Sun; B. Li; T. Wo; X. Liu; |
618 | Deep Born Operator Learning for Reflection Tomographic Imaging. Highlight: In this paper, we propose a physics-inspired, machine learning-based method to learn the wave-matter interaction in the ground-penetrating radar (GPR) setting. |
Q. Zhao; Y. Ma; P. Boufounos; S. Nabi; H. Mansour; |
619 | Deep Double Self-Expressive Subspace Clustering. Highlight: In this paper, we propose a double self-expressive subspace clustering algorithm. |
L. Zhao; Y. Ma; S. Chen; J. Zhou; |
620 | Deep Feature Aggregation for Lightweight Single Image Super-Resolution. Highlight: In this paper, we propose a lightweight deep feature aggregation network (DFAnet), which fuses the outputs of all the deep feature aggregation blocks (DFAB) through the designed nonlinear global feature fusion (NGFF) module. |
Y. Li; X. He; S. Tian; Z. Li; S. Long; |
621 | Deep Fusion of Multi-Object Densities Using Transformer. Highlight: In this paper, we demonstrate that deep learning-based methods can be used to fuse multi-object densities. |
L. Li; C. Dai; Y. Xia; L. Svensson; |
622 | Deep Generative Fixed-Filter Active Noise Control. Highlight: Nonetheless, the limited number of pre-trained control filters may affect noise reduction performance, especially when the incoming noise differs greatly from the noises used during pre-training. Therefore, a generative fixed-filter active noise control (GFANC) method is proposed in this paper to overcome this limitation. |
Z. Luo; D. Shi; X. Shen; J. Ji; W. -S. Gan; |
623 | Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition. Highlight: In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) to deal with the cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. |
Y. Zhao; J. Wang; Y. Zong; W. Zheng; H. Lian; L. Zhao; |
624 | Deep Learning-Based Compressive Sampling Optimization in Massive MIMO Systems. Highlight: In this paper, we develop a deep learning framework to optimize the compressive sampling matrix in a massive multiple-input multiple-output (MIMO) system. |
S. R. Pavel; Y. D. Zhang; M. S. Greco; F. Gini; |
625 | Deep Learning-Based Path Loss Prediction for Outdoor Wireless Communication Systems. Highlight: This paper presents how a deep convolutional encoder-decoder, namely a path loss prediction net (PPNet) based on SegNet, can be trained to transform information about an outdoor propagation environment into a path loss (PL) heatmap. |
K. Qiu; S. Bakirtzis; H. Song; I. Wassell; J. Zhang; |
626 | Deep Learning-Based Stereo Camera Multi-Video Synchronization. Highlight: A software-based synchronization method would reduce the cost, weight, and size of the entire system and allow more flexibility when building such systems. With this goal in mind, we compare different deep learning-based systems and show that some are efficient and generalizable enough for the task. |
N. Boizard; K. E. Haddad; T. Ravet; F. Cresson; T. Dutoit; |
627 | Deep Learning for Lagrangian Drift Simulation at The Sea Surface. Highlight: We introduce a novel architecture, referred to as DriftNet, inspired by the Eulerian Fokker-Planck representation of Lagrangian dynamics. |
D. Botvynko; C. Granero-Belinchon; S. v. Gennip; A. Benzinou; R. Fablet; |
628 | Deep Learning Sparse Array Design Using Binary Switching Configurations. Highlight: In this paper, we design sparse arrays using binary switching per RF chain for optimum beamforming that maximizes signal-to-interference-and-noise ratio (SINR). |
S. A. Hamza; K. Juretus; M. G. Amin; F. Ahmad; |
629 | Deep Low Light Image Enhancement Via Multi-Scale Recursive Feature Enhancement and Curve Adjustment. Highlight: Enhancing low-light images tends to amplify noise. To address this problem, we propose a Multi-Scale Recursive Feature Enhancement (MSRFE) network for low-light image enhancement. |
H. Jin; D. Wei; H. Su; |
630 | Deep Manifold Graph Auto-Encoder For Attributed Graph Embedding. Highlight: This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) method for attributed graph data that improves the stability and quality of learned representations and tackles the crowding problem. |
B. Hu; Z. Zang; J. Xia; L. Wu; C. Tan; S. Z. Li; |
631 | Deep Network Series for Large-Scale High-Dynamic Range Imaging. Highlight: We propose a new approach for large-scale high-dynamic range computational imaging. |
A. Aghabiglou; M. Terris; A. Jackson; Y. Wiaux; |
632 | Deep Neural Mel-Subband Beamformer for In-Car Speech Separation. Highlight: In this paper, we propose a deep learning (DL)-based mel-subband spatio-temporal beamformer to perform speech separation in a car environment with reduced computation cost and inference time. |
V. Kothapally; Y. Xu; M. Yu; S. -X. Zhang; D. Yu; |
633 | Deep Plug-and-Play for Tensor Robust Principal Component Analysis. Highlight: To restore the data more accurately, we propose a new TRPCA method that combines model-based and data-driven approaches to preserve both the global structure and fine local information. |
H. Tan; J. Wang; W. Kong; |
634 | Deep Probabilistic Model for Lossless Scalable Point Cloud Attribute Compression. Highlight: In this work, we build an end-to-end multiscale point cloud attribute coding method (MNeT) that progressively projects the attributes onto multiscale latent spaces. |
D. T. Nguyen; K. G. Nambiar; A. Kaup; |
635 | Deep Proximal Gradient Method for Learned Convex Regularizers. Highlight: Our goal is to apply the Accelerated Proximal Gradient Method (APGM) using a learned proximity operator in place of the true proximity operator of the learned penalty function. |
A. Berk; Y. Ma; P. Boufounos; P. Wang; H. Mansour; |
636 | Deep Quantigraphic Image Enhancement Via Comparametric Equations. Highlight: The former is usually less efficient, and the latter is constrained by a strong assumption regarding image reflectance as the desired enhancement result. To alleviate this constraint while retaining high efficiency, we propose a novel trainable module that diversifies the conversion from the low-light image and illumination map to the enhanced image. |
X. Wu; Y. Sun; A. Kimura; |
637 | Deep Reinforcement Learning for Green UAV-Assisted Data Collection. Highlight: We then leverage the deep reinforcement learning (DRL) framework based on a deep deterministic policy gradient (DDPG) algorithm to learn the UAV’s trajectory. |
A. Mondal; D. Mishra; G. Prasad; A. Hossain; |
638 | Deep Root-MUSIC Algorithm for Data-Driven DoA Estimation. Highlight: In this work, we propose Deep Root-MUSIC (DR-MUSIC), a data-driven DoA estimator that augments Root-MUSIC with a deep neural network applied to the empirical autocorrelation of the input. |
D. H. Shmuel; J. P. Merkofer; G. Revach; R. J. G. van Sloun; N. Shlezinger; |
639 | DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement. Highlight: In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided dialog enhancement (DE). |
A. Master; L. Lu; J. Samuelsson; H. -M. Lehtonen; S. Norcross; N. Swedlow; A. Howard; |
640 | Deep Spatio-Temporal Multiplex Graph Learning for Cardiac Imaging Classification. Highlight: In this work, we propose a novel representation of spatio-temporal cardiac data as a multiplex graph and develop a multi-level message passing neural network to classify clinical groups corresponding to different cardiovascular diseases. |
J. Banus; A. Ogier; R. Hullin; P. Meyer; R. B. van Heeswijk; J. Richiardi; |
641 | Deep Spectrum Cartography Using Quantized Measurements. Highlight: Spectrum cartography (SC) techniques craft multi-domain (e.g., space and frequency) radio maps from limited measurements, which is an ill-posed inverse problem. |
S. Timilsina; S. Shrestha; X. Fu; |
642 | Deep Subband Network for Joint Suppression of Echo, Noise and Reverberation in Real-Time Fullband Speech Communication. Highlight: This paper presents a deep and lightweight subband neural network which jointly suppresses the common interference in real-time fullband speech communication: echo, noise and reverberation. |
F. Xiong; M. Dong; K. Zhou; H. Zhu; J. Feng; |
643 | Deep Survival Analysis and Counterfactual Inference Using Balanced Representations. Highlight: Due to the nature of observational data, this framework is prone to treatment selection bias and censoring bias. We address these issues and propose the novel SurvCI framework, which consists of a counterfactual inference (CI) algorithm for data with time-to-event outcomes. |
M. Gupta; G. Kannan; R. Prasad; G. Gupta; |
644 | Deep Triple-Supervision Learning Unannotated Surgical Endoscopic Video Data for Monocular Dense Depth Estimation. Highlight: Unfortunately, such a dense depth recovery suffers from illumination variation, weak texture, and occlusion. To address these problems, this work proposes a new triple-supervision self-learning strategy that uses unannotated endoscopic video data to predict monocular endoscopic dense depth information. |
W. Fan; K. Zhang; H. Shi; J. Chen; Y. Chen; X. Luo; |
645 | Deep-Unfolded Adaptive Projected Subgradient Method For MIMO Detection. Highlight: In this paper, we propose deep-unfolded versions of the recently proposed superiorized adaptive projected subgradient method for MIMO detection. |
J. Fink; R. L. G. Cavalcante; Z. Utkovski; S. Stańczak; |
646 | Deep Unfolded Tensor Robust PCA With Self-Supervised Learning. Highlight: In this paper, we describe a fast and simple self-supervised model for tensor RPCA that uses deep unfolding and learns only four hyperparameters. |
H. Dong; M. Shah; S. Donegan; Y. Chi; |
647 | Deep Unfolding-Enabled Hybrid Beamforming Design for mmWave Massive MIMO Systems. Highlight: In this paper, we propose an efficient deep unfolding-based hybrid beamforming (HBF) scheme, referred to as ManNet-HBF, that approximately maximizes the system spectral efficiency (SE). |
N. Nguyen; M. Ma; N. Shlezinger; Y. C. Eldar; A. L. Swindlehurst; M. Juntti; |
648 | Defending Against Universal Patch Attacks By Restricting Token Attention in Vision Transformers. Highlight: In this paper, we empirically reveal and mathematically explain that the shallow tokens in the transformer and the attention of the network can largely influence the classification result. |
H. Yu; J. Chen; H. Ma; C. Yu; X. Ding; |
649 | Defense Against Black-Box Adversarial Attacks Via Heterogeneous Fusion Features. Highlight: This paper presents an effective approach to adversarial defense, named heterogeneous feature fusion network (HFFN). |
J. Zhang; K. Maeda; T. Ogawa; M. Haseyama; |
650 | Deformable Cross Attention for Learning Optical Flow. Highlight: However, the Transformer suffers from excessive attention computation and still brings irrelevant parts into the region of interest. Therefore, we propose a deformable cross-attention for optical flow estimation, which provides two important advantages: connecting the parts of the image globally while deforming the attention to the objects’ shapes in the image, and reducing memory consumption. |
R. Abdein; X. Xiang; N. Lv; A. E. Saddik; |
651 | Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation. Highlight: In this work, deformable convolution is proposed as a solution to allow temporal convolutional network (TCN) models to have dynamic receptive fields (RFs) that can adapt to various reverberation times for reverberant speech separation. |
W. Ravenscroft; S. Goetze; T. Hain; |
652 | DEHRFormer: Real-Time Transformer for Depth Estimation and Haze Removal from Varicolored Haze Scenes. Highlight: In this paper, we propose a real-time transformer for simultaneous single image Depth Estimation and Haze Removal (DEHRFormer). |
S. Chen; T. Ye; J. Shi; Y. Liu; J. Jiang; E. Chen; P. Chen; |
653 | De’HuBERT: Disentangling Noise in A Self-Supervised Model for Robust Speech Recognition. Highlight: In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow’s redundancy-reduction principle. |
D. Ng; R. Zhang; J. Q. Yip; Z. Yang; J. Ni; C. Zhang; Y. Ma; C. Ni; E. S. Chng; B. Ma; |
654 | Delay-Aware Backpressure Routing Using Graph Neural Networks. Highlight: We propose a throughput-optimal biased backpressure (BP) algorithm for routing, where the bias is learned through a graph neural network that seeks to minimize end-to-end delay. |
Z. Zhao; B. Radojicic; G. Verma; A. Swami; S. Segarra; |
655 | Delay-Penalized Transducer for Low-Latency Streaming ASR. Highlight: In this paper, we propose a simple way to penalize symbol delay in transducer models, so that we can balance the trade-off between symbol delay and accuracy for streaming models without external alignments. |
W. Kang; Z. Yao; F. Kuang; L. Guo; X. Yang; L. Lin; P. Żelasko; D. Povey; |
656 | Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints. Highlight: In this work, a novel VC model, referred to as MFC-StyleVC, is proposed for the low-resource VC task. |
Z. Wang; X. Wang; L. Xie; Y. Chen; Q. Tian; Y. Wang; |
657 | Dense Adversarial Transfer Learning Based On Class-Invariance. Highlight: This work proposes dense adversarial transfer learning based on class invariance, a novel, unsupervised, conditional adversarial domain adaptation approach. |
B. -T. Pham; T. -Y. Wang; P. L. Thi; K. -T. Nguyen; Y. -S. Lee; T. -C. Tai; J. -C. Wang; |
658 | DensityToken: Weakly-Supervised Crowd Counting with Density Classification. Highlight: Therefore, we design a density token to perceive the crowd distribution in the scenes. Based on this, we propose a Dual Supervision Transformer (DSFormer) to perform weakly-supervised crowd counting in the double supervision of the total count. |
Z. Hu; B. Wang; X. Li; |
659 | Depth Estimation for A Single Omnidirectional Image with Reversed-Gradient Warming-up Thresholds Discriminator. Highlight: To address the difficulty of generating labelled real-world datasets and of obtaining stable performance, we propose an architecture with the Reverse-gradient Warming-up Threshold Discriminator (RWTD) to estimate real-world depth maps from synthetic ground truth. |
Y. Wu; Y. Heng; M. Niranjan; H. Kim; |
660 | DepthFormer: Multimodal Positional Encodings and Cross-Input Attention for Transformer-based Segmentation Networks. Highlight: In this work, we focus on transformer-based deep learning architectures, which have achieved state-of-the-art performance on the segmentation task, and we propose to employ depth information by embedding it in the positional encoding. |
F. Barbato; G. Rizzoli; P. Zanuttigh; |
661 | Dereverberation in Acoustic Sensor Networks Using Weighted Prediction Error with Microphone-Dependent Prediction Delays. Highlight: In recent decades, several multi-microphone speech dereverberation algorithms have been proposed, including the weighted prediction error (WPE) algorithm. |
A. Lohmann; T. van Waterschoot; J. Bitzer; S. Doclo; |
662 | Design and Performance of The Low-Power Noise Reduction Algorithm of The Med-El Sonnet 2™ Cochlear Implant Audio Processor. Highlight: In this paper, the performance of a very-low-complexity single-channel noise reduction (NR) algorithm implemented in commercially available cochlear implant (CI) audio processors (Sonnet 2™ and Rondo 3™ from Med-El GesmbH) is discussed. |
E. Aschbacher; F. Frühauf; A. Kurz; P. Nopp; |
663 | Design Choices for Learning Embeddings from Auxiliary Tasks for Domain Generalization in Anomalous Sound Detection. Highlight: In this work, we present a conceptually simple, state-of-the-art anomalous sound detection (ASD) system based on embeddings learned through auxiliary tasks that generalizes to multiple data domains. |
K. Wilkinghoff; |
664 | Designing A 3D-Aware StyleNeRF Encoder for Face Editing. Highlight: However, 3D-aware GAN inversion remains under-explored. To tackle this problem, we propose a 3D-aware (3Da) encoder for GAN inversion and face editing based on the powerful StyleNeRF model. |
S. Yang; W. Wang; B. Peng; J. Dong; |
665 | Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP. Abstract: There is an imminent need for guidelines and standard test sets to allow direct and fair comparisons of speech emotion recognition (SER). While resources, such as the Interactive … |
N. Antoniou; A. Katsamanis; T. Giannakopoulos; S. Narayanan; |
666 | Designing Transformer Networks for Sparse Recovery of Sequential Data Using Deep Unfolding. Highlight: Deep unfolding models are designed by unrolling an optimization algorithm into a deep learning network. |
B. D. Weerdt; Y. C. Eldar; N. Deligiannis; |
667 | Detail-Aware Uncalibrated Photometric Stereo. Highlight: That is mainly due to the non-convex and non-linear nature of the problem, which requires the best possible initialization. In this context, we propose a fully interpretable formulation that combines a physically-aware image formation model under perspective projection with a minimal detail-aware initialization and that can handle general lighting. |
A. Agudo; |
668 | Detecting Malicious Migration on Edge to Prevent Running Data Leakage. Highlight: In this paper, we propose a live migration detection model that simulates the migration process by observing indicator values that can be obtained without high authority and calculating the probability of state transitions. |
Y. Wong; Q. Shen; C. Li; C. Liu; T. Ai; |
669 | Detecting Out-of-Distribution Examples Via Class-Conditional Impressions Reappearing. Highlight: In this paper, we propose a data-free method without training on natural data, called Class-Conditional Impressions Reappearing (C2IR), which utilizes image impressions from the fixed model to recover class-conditional feature statistics. |
J. Chen; X. Qu; J. Li; J. Wang; J. Wan; J. Xiao; |
670 | Detection of Real-Time Deepfakes in Video Conferencing with Active Probing and Corneal Reflection. Highlight: In this paper, we describe a new active forensic method to detect real-time DeepFakes. |
H. Guo; X. Wang; S. Lyu; |
671 | Dewarping Documents Using C2 Continuous Boundary Estimation. Highlight: In this paper, we propose a novel document dewarping algorithm that works as follows. |
P. Mondal; A. Pant; S. Soni; |
672 | DGN: Descriptor Generation Network for Feature Matching in Monocular Endoscopy 3D Reconstruction. Highlight: In this paper, we propose an effective feature matching framework for monocular endoscopy 3D reconstruction. |
K. Zhang; W. Fan; Y. Chen; X. Luo; |
673 | Diabetic Retinopathy Grading with Weakly-Supervised Lesion Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel weakly-supervised lesion-aware network for DR grading, which enhances the discriminative features with lesion priors using only image-level supervision. |
J. Hou; F. Xiao; J. Xu; R. Feng; Y. Zhang; H. Zou; L. Lu; W. Xue; |
674 | Diagonal State Space Augmented Transformers for Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models. |
G. Saon; A. Gupta; X. Cui; |
675 | Dialog Act Guided Contextual Adapter for Personalized Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a dialog act guided contextual adapter network. |
F. -J. Chang; T. Muniyappa; K. M. Sathyendra; K. Wei; G. P. Strimel; R. McGowan; |
676 | DialogMI: A Dialogue Model Based on Enhancing Dialogue Mutual Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, we propose two methods to represent the loss function of the novel task and a method to enhance the effect of the mutual information loss. |
Y. Zhang; P. Gong; Z. Wang; Z. Li; X. Yang; |
677 | Dialogue Context Modelling for Action Item Detection: Solution for ICASSP 2023 Mug Challenge Track 5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Action item detection aims at recognizing sentences containing information about actionable tasks, which can help people quickly grasp core tasks in the meeting without going through the redundant meeting contents. Therefore, in this paper, we thoroughly describe our carefully designed solution for the Action Item Detection Track of the General Meeting Understanding and Generation (MUG) challenge in the ICASSP 2023 Signal Processing Grand Challenge. |
J. Huang; X. Feng; Y. Ye; L. Zhao; X. Feng; B. Qin; T. Liu; |
678 | Dialogue System with Missing Observation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the task of online dialogue orchestration where the user feedback associated with the dialogue agent may not always be observed. |
D. Bouneffouf; M. Agarwal; I. Rish; |
679 | Dictionary Learning on Graph Data with Weisfeiler-Lehman Sub-Tree Kernel and KSVD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we utilize sparse dictionary learning techniques as a graph embedding solution. |
K. Liyanage; R. Pearsall; C. Izurieta; B. M. Whitaker; |
680 | Difference Coarrays of Rational Arrays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we discuss further advantages of rational arrays, by considering their difference coarrays. |
P. Kulkarni; P. P. Vaidyanathan; |
681 | Difference Guided VHR Remote Sensing Image Change Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem of the same object exhibiting different spectra, caused by environmental changes such as seasonal alternation, bad weather, and shadow, is the biggest challenge in multitemporal image change detection, and it is more prominent in VHR images. To address this problem, a novel difference guided VHR image change detection (DGCD) method is proposed in this paper. |
J. Sun; G. Liu; X. Li; Y. Yuan; |
682 | Differentiable Adaptive Short-Time Fourier Transform with Respect to The Window Length Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a gradient-based method for on-the-fly optimization for both per-frame and per-frequency window length of the short-time Fourier transform (STFT), related to previous work in which we developed a differentiable version of STFT by making the window length a continuous parameter. |
M. Leiber; Y. Marnissi; A. Barrau; M. E. Badaoui; |
683 | Differential Analysis for Networks Obeying Conservation Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: That is, given two sets of node potential observations, the goal is to estimate the structural differences between the underlying networks. We formulate this novel differential network analysis problem for systems obeying conservation laws and devise a convex estimator to learn the edge changes directly from node potentials. |
A. Rayas; R. Anguluri; J. Cheng; G. Dasarathy; |
684 | Difficulty-Aware Data Augmentor for Scene Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel difficulty-aware data augmentation framework for scene text recognition, which jointly considers the difficulty of samples and the strength of augmentations. |
G. Meng; T. Dai; B. Chen; N. Li; Y. Jiang; S. -T. Xia; |
685 | DiffPhase: Generative Diffusion-Based STFT Phase Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we build upon previous work in the speech domain, adapting a speech enhancement diffusion model specifically for STFT phase retrieval. |
T. Peer; S. Welker; T. Gerkmann; |
686 | DiffRoll: Diffusion-Based Generative Music Transcription with Unsupervised Pretraining Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). |
K. W. Cheuk; R. Sawata; T. Uesaka; N. Murata; N. Takahashi; S. Takahashi; D. Herremans; Y. Mitsufuji; |
687 | Diffusion-Based Generative Speech Source Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). |
R. Scheibler; Y. Ji; S. -W. Chung; J. Byun; S. Choe; M. -S. Choi; |
688 | Diffusion-Based Sound Source Localization Using Networks of Planar Microphone Arrays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach for distributed 3D sound source localization and tracking based on networks of planar microphone arrays, each of which estimates a 2D Direction Of Arrival (DOA). |
D. Albertini; G. Greco; A. Bernardini; A. Sarti; |
689 | Diffusion Motion: Generate Text-Guided 3D Human Motion By Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and novel method for generating 3D human motion from complex natural language sentences, which describe different velocity, direction and composition of all kinds of actions. |
Z. Ren; Z. Pan; X. Zhou; L. Kang; |
690 | DiffusionNet: An Efficient Framework to Classify Single-Molecule Images with Latent Entropy Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a spatial and channel attention-based convolutional neural network (CNN) architecture with latent entropy minimization that efficiently classifies individual single-molecule images by the imaged molecules’ diffusion coefficients. |
S. Guha; O. d. Cuba; A. Gahlmann; S. T. Acton; |
691 | Diffusion Probabilistic Modeling for Fine-Grained Urban Traffic Flow Inference with Relaxed Structural Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a diffusion probabilistic augmentation-based network for considering the uncertainties of urban flows with a relaxed structural constraint and a disentangled scheme for flow map and external factor learning. |
X. Xu; Y. Wei; P. Wang; X. Luo; F. Zhou; G. Trajcevski; |
692 | DiffVoice: Text-to-Speech with Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion. |
Z. Liu; Y. Guo; K. Yu; |
693 | Digital Phenotype Representation By Statistical, Information Theory, Data-Driven Approach with Digital Health Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes the development of digital phenotype profile (DPP) to represent a user’s physical and behavioural health baseline through systematic investigations with an emphasis on robustness and explainability. |
B. Nguyen; M. Nigro; A. Rueda; V. Bhat; S. Krishnan; |
694 | Direction Aware Positional and Structural Encoding for Directed Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method for computing joint 2-node structural representations for link prediction in directed graphs. |
Y. Sium; G. Kollias; T. Idé; P. Das; N. Abe; A. Lozano; Q. Li; |
695 | Direction-of-Arrival Estimation Using Gaussian Process Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that an array sampled with spacing larger than a half wavelength can benefit from GP interpolation, providing a smaller root mean squared error in comparison to the error of conventional beamforming for DOA estimation. |
I. D. Khurjekar; P. Gerstoft; C. F. Mecklenbräuker; Z. -H. Michalopoulou; |
696 | Direct Position Determination with One-Bit Signal for Multiple Targets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The traditional direct position determination (DPD) for multiple targets usually requires transmitting raw data to the fusion center (FC), which occupies large transmission bandwidth and hardware resources. To solve this problem, we adopt one-bit analog-to-digital converters (ADCs) for a distributed subarray (DS) system, and propose a one-bit DPD method with multiple signal classification (1-bit DPD-MUSIC). |
L. Ni; D. Zhang; T. Xing; M. Ran; N. Liu; Q. Wan; |
697 | Disambiguation of Cognitive Impairment Diagnosis with EEG-Based Dual-Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we proposed a framework containing a dual-contrastive learning structure and a multi-level temporal-spectral EEG encoder, which transformed EEG signals into embeddings and automatically updated the ambiguous labels through intra-subject and cross-subject contrastive learning. |
Z. Song; Z. Pei; H. Ren; L. Zhu; Y. Guo; Z. Zhang; |
698 | DisCoHead: Audio-and-Video-Driven Talking Head Generation By Disentangled Control of Head Pose and Facial Expressions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For realistic talking head generation, creating natural head motion while maintaining accurate lip synchronization is essential. To fulfill this challenging task, we propose DisCoHead, a novel method to disentangle and control head pose and facial expressions without supervision. |
G. Hwang; S. Hong; S. Lee; S. Park; G. Chae; |
699 | Discriminative Speaker Representation Via Contrastive Learning with Class-Aware Attention in Angular Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the first challenge, we propose a contrastive learning SV framework incorporating an additive angular margin into the supervised contrastive loss in which the margin improves the speaker representation’s discrimination ability. |
Z. Li; M. -W. Mak; H. M. -L. Meng; |
700 | Discriminative Vector Learning with Application to Single Channel Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a discriminative vector learning method and apply it to single-channel speech separation. |
H. M. Tan; K. -W. Liang; J. -C. Wang; |
701 | Disentangled and Robust Representation Learning for Bragging Classification in Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the drawback, we propose a novel bragging classification method with disentangle-based representation augmentation and domain-aware adversarial strategy. |
X. Li; Y. Zhou; |
702 | Disentangled Feature Learning for Real-Time Neural Speech Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, instead of blind end-to-end learning, we propose to learn disentangled features for real-time neural speech coding. |
X. Jiang; X. Peng; Y. Zhang; Y. Lu; |
703 | Disentangled Training with Adversarial Examples for Robust Small-Footprint Keyword Spotting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. |
Z. Wang; L. Wan; B. Zhang; Y. Huang; S. -W. Li; M. Sun; X. Lei; Z. Yang; |
704 | Disentangling Speech from Surroundings with Neural Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. |
A. Omran; N. Zeghidour; Z. Borsos; F. de Chaumont Quitry; M. Slaney; M. Tagliasacchi; |
705 | Disentangling The Horowitz Factor: Learning Content and Style From Expressive Piano Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel framework for learning representations that disentangle musical content and performance style from expressive piano performances in an unsupervised manner. |
H. Zhang; S. Dixon; |
706 | Distance-Based Online Label Inference Attacks Against Split Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on private labels and proposes three label inference attacks based on the similarities between exchanged gradients/smashed data and sample points. |
J. Liu; X. Lyu; |
707 | Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a weight transfer regularization (WTR) loss to constrain the distance of the weights between the pre-trained model and the fine-tuned model. |
L. Zhang; Q. Wang; H. Wang; Y. Li; W. Rao; Y. Wang; L. Xie; |
708 | Distill-Quantize-Tune – Leveraging Large Teachers for Low-Footprint Efficient Multilingual NLU on Edge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes Distill-Quantize-Tune (DQT), a pipeline to create viable small-footprint multilingual models that can perform NLU on extremely resource-constrained Edge devices. |
P. Kharazmi; Z. Zhao; C. Chung; S. Choudhary; |
709 | Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the framework ranked first in the VoicePrivacy 2022 Challenge, the anonymization was imperfect, since the speaker distinguishability of the anonymized speech deteriorated. To address this issue, in this paper, we directly model the formant distribution and fundamental frequency (F0) to represent speaker identity and anonymize the source speech by uniformly scaling the formants and F0. |
J. Yao; Q. Wang; Y. Lei; P. Guo; L. Xie; N. Wang; J. Liu; |
710 | Distortion-Aware Convolutional Neural Network-Based Interpolation Filter for AVS3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a distortion-aware convolutional neural network-based interpolation filter (DA-NNIF) to further improve the interpolation prediction accuracy of sub-pixels with one model. |
Y. Zhang; L. Wen; L. Wang; Y. Piao; W. Shi; K. P. Choi; |
711 | Distributed Adaptive Norm Estimation for Blind System Identification in Wireless Sensor Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this contribution, we extend a distributed adaptive algorithm for blind system identification that relies on the estimation of a stacked network-wide consensus vector at each node, the computation of which requires either broadcasting or relaying of node-specific values (i.e., local vector norms) to all other nodes. |
M. Blochberger; F. Elvander; R. Ali; J. Østergaard; J. Jensen; M. Moonen; T. v. Waterschoot; |
712 | Distributed ADMM with Limited Communications Via Deep Unfolding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose unfolded D-ADMM, which facilitates the application of D-ADMM with limited communications using the emerging deep unfolding methodology. |
Y. Noah; N. Shlezinger; |
713 | Distributed Bayesian Tracking on The Special Euclidean Group Using Lie Algebra Parametric Approximations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes new distributed particle filters for tracking the state of a dynamic system that evolves on the Special Euclidean Group. |
C. J. Bordin; C. G. de Figueredo; M. G. S. Bruno; |
714 | Distributed Gaussian Process Hyperparameter Optimization for Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, exploiting edge-based constraints, we propose two fully-distributed algorithms pxADMMfd and pxADMMfd,fast for a network of multi-agent systems, which do not rely on a central station. |
P. Zhai; R. T. Rajan; |
715 | Distributed Online Learning With Adversarial Participants In An Adversarial Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Interestingly, when the environment is not fully adversarial so that the losses of the honest participants are i.i.d. (independent and identically distributed), we show that sublinear stochastic regret, in contrast to the aforementioned adversarial regret, is possible. We develop a Byzantine-robust distributed online gradient descent algorithm with momentum to attain such a sublinear stochastic regret bound. |
X. Dong; Z. Wu; Q. Ling; Z. Tian; |
716 | Distributed Quantum Sensing Network with Geographically Constrained Measurement Strategies Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Distributed quantum sensing network has the potential of enhancing the precision in estimating a global function of local parameters by utilizing an entangled probe, compared with … |
Y. Cao; X. Wu; |
717 | Distributed Signal Processing for Out-of-System Interference Suppression in Cell-Free Massive MIMO Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The OoS interference differs from the in-system interference from other serving users in that for OoS interference, the associated pilot signals are unknown or non-existent, which makes estimation of the OoS interferer channel difficult. In this paper, we propose a novel sequential algorithm for the suppression of OoS interference for uplink CF-mMIMO with a stripe (daisy-chain) topology. |
Z. H. Shaik; E. G. Larsson; |
718 | Distributionally Robust Multiclass Classification and Applications in Deep Image Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We apply the proposed method in rendering deep Vision Transformer (ViT)-based [1] image classifiers robust to random and adversarial attacks. |
R. Chen; B. Hao; I. C. Paschalidis; |
719 | DivCon: Learning Concept Sequences for Semantically Diverse Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel two-step method for diverse image captioning, generating descriptions with more diverse semantic concepts (DivCon). |
Y. Zheng; Y. -L. Li; S. Wang; |
720 | Diverse and Vivid Sound Generation from Text Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we go beyond specific audio generation by using natural language description as a clue to generate broad sounds. |
G. Li; X. Xu; L. Dai; M. Wu; K. Yu; |
721 | Diversifying Message Aggregation in Multi-Agent Communication Via Normalized Tensor Nuclear Norm Regularization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While successful, GAT can lead to homogeneity in the strategies of message aggregation, which can severely limit multi-agent coordination. To address this challenge, we study the adjacency tensor of the communication graph. |
Y. Zhai; K. Xu; B. Ding; D. Feng; Z. Gao; H. Wang; |
722 | DL-NET: Dilation Location Network for Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods mainly use global features for boundary matching or predefine all possible proposals, while ignoring long-range context information and local action boundary features, which reduces detection accuracy. To fill this gap, in this paper we propose a Dilation Location Network (DL-Net) model that generates more precise action boundaries by enhancing the boundary features of actions and aggregating long-range contextual information. |
D. You; H. Wang; B. Liu; Y. Yu; Z. Li; |
723 | DMFormer: Closing The Gap Between CNN and Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Dynamic Multi-level Attention mechanism (DMA), which captures different patterns of input images by multiple kernel sizes and enables input-adaptive weights with a gating mechanism. |
Z. Wei; H. Pan; L. Li; M. Lu; X. Niu; P. Dong; D. Li; |
724 | DMSA: Dynamic Multi-Scale Unsupervised Semantic Segmentation Based On Adaptive Affinity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes DMSA, an end-to-end unsupervised semantic segmentation architecture based on four loss functions. |
K. Yang; J. Lu; |
725 | Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The research community has produced many successful self-supervised speech representation learning methods over the past few years. |
A. Elkahky; W. -N. Hsu; P. Tomasello; T. -A. Nguyen; R. Algayres; Y. Adi; J. Copet; E. Dupoux; A. Mohamed; |
726 | DocRED-FE: A Document-Level Fine-Grained Entity and Relation Extraction Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we construct a large-scale document-level fine-grained JERE dataset DocRED-FE, which improves DocRED with Fine-Grained Entity Type. |
H. Wang; W. Xiong; Y. Song; D. Zhu; Y. Xia; S. Li; |
727 | Does A Quieter City Mean Fewer Complaints? The Sounds of New York City During Covid-19 Lockdown Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Small-scale studies have shown this change in noise levels across different locations around the globe. In this work, we extend these studies by using historical audio data from the SONYC sensor network deployed in New York City. |
M. Cartwright; M. Fuentes; C. Mydlarz; F. Miranda; J. P. Bello; |
728 | Does Human Speech Follow Benford’s Law? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we demonstrate that human speech spectra also follow Benford’s Law, on average. |
L. Hsu; V. Berisha; |
729 | Does Your Model Think Like An Engineer? Explainable AI for Bearing Fault Detection with Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the specific task of detecting faults in rolling element bearings from vibration signals. |
T. Decker; M. Lebacher; V. Tresp; |
730 | DO-FAM: Disentangled Non-Linear Latent Navigation For Facial Attribute Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, most existing FAM methods struggle to meet at least one of two requirements: high reconstruction quality and high irrelevance preservation. To alleviate these two limitations, we propose a novel Disentangled nOn-linear latent navigation framework for FAM, termed DO-FAM. |
Y. Yuan; S. Ma; H. Shan; J. Zhang; |
731 | Domain Adaptation with External Off-Policy Acoustic Catalogs for Scalable Contextual End-to-End Automated Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the potential of leveraging external knowledge, particularly through off-policy generated text-to-speech key-value stores, to allow for flexible post-training adaptation to new data distributions. |
D. M. Chan; S. Ghosh; A. Rastrow; B. Hoffmeister; |
732 | Domain Adaptation Without Catastrophic Forgetting on A Small-Scale Partially-Labeled Corpus for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we proposed a method that leverages domain adversarial multi-task learning to reconcile the definitions of emotion classes across domains and noisy student training to utilize unlabeled data. |
Z. Zhu; Y. Sato; |
733 | Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the effective finetuning of a large-scale pretrained model for automatic speech recognition (ASR) of low-resource languages with only a one-hour matched dataset. |
K. Soky; S. Li; C. Chu; T. Kawahara; |
734 | Domain Generalized Fundus Image Segmentation Via Dual-Level Mixing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many existing methods require the help of auxiliary tasks, bringing extra computation cost. Aiming at this pitfall, this study proposes Dual-Level Mixing (DLM) to boost the diversity of the single source domain and enhance the generalization performance. |
X. Luo; W. Chen; C. Li; B. Zhou; Y. Tan; |
735 | Doppler-Coded Joint Division Multiple Access Waveform for Automotive MIMO Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Doppler-coded joint division multiple access orthogonal waveform for automotive MIMO radar. |
Y. Wang; Q. Pei; X. Hu; J. Long; H. Yu; L. Zheng; |
736 | Do Prosody Transfer Models Transfer Prosody? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: During training, the reference utterance is identical to the target utterance. Yet, during synthesis, these models are often used to transfer prosody from a reference that differs from the text or speaker being synthesized. To address this inconsistency, we propose to use a different, but prosodically-related, utterance during training too. |
A. T. Sigurgeirsson; S. King; |
737 | Double Compression Detection Based on The De-Blocking Filtering of HEVC Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, this paper proposes an algorithm based on the de-blocking filtering feature mode to detect RI frames in the double compressed HEVC videos with shifted GOP structure. |
X. Kang; P. Su; Z. Huang; Y. Chen; J. Wang; |
738 | Downlink Covariance Estimation in URA FDD Massive MIMO Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a low-complexity downlink channel covariance matrix estimation for massive multiple-input multiple-output systems in which the base station (BS) is equipped with a uniform rectangular antenna array (URA). |
S. Bameri; K. Almahrog; R. H. Gohary; A. El-Keyi; Y. Ahmed; |
739 | DPP-Based Client Selection for Federated Learning with Non-IID Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL’s data heterogeneity issue. |
Y. Zhang; C. Xu; H. H. Yang; X. Wang; T. Q. S. Quek; |
740 | DQFormer: Dynamic Query Transformer for Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a top-down method called Dynamic Query Transformer (DQFormer), which uses a Dynamic Lane Queries (DLQs) module to predict lane shapes. |
H. Yang; S. Lin; R. Jiang; Y. Lu; H. Wang; |
741 | DRL Path Planning for UAV-Aided V2X Networks: Comparing Discrete to Continuous Action Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a Deep Reinforcement Learning (DRL)-based solution, where a novel reward function is proposed with the aim of offering a continuous service to vehicles. |
L. Spampinato; A. Tarozzi; C. Buratti; R. Marini; |
742 | Drone-vs-Bird: Drone Detection Using YOLOv7 with CSRT Tracker Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a new drone detection scheme that uses a customized YOLOv7 combined with a tracker. |
S. K. Mistry; S. Chatterjee; A. K. Verma; V. Jakhetiya; B. N. Subudhi; S. Jaiswal; |
743 | DSPGAN: A GAN-Based Universal Vocoder for High-Fidelity TTS By Time-Frequency Domain Supervision from DSP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DSPGAN, a GAN-based universal vocoder for high-fidelity speech synthesis by applying the time-frequency domain supervision from digital signal processing (DSP). |
K. Song; Y. Zhang; Y. Lei; J. Cong; H. Li; L. Xie; G. He; J. Bai; |
744 | DST: Deformable Speech Transformer for Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Deformable Speech Transformer, named DST, for the SER task. |
W. Chen; X. Xing; X. Xu; J. Pang; L. Du; |
745 | DTTR: Detecting Text with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel transformer-based model, named detecting text with transformers (DTTR), for scene text detection. |
J. Yang; Z. You; Z. Zhong; P. Liu; L. Mei; S. Huang; |
746 | Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. |
S. Y. Sahai; J. Liu; T. Muniyappa; K. M. Sathyendra; A. Alexandridis; G. P. Strimel; R. McGowan; A. Rastrow; F. -J. Chang; A. Mouchtaris; S. Kunzmann; |
747 | Dual-Based Online Learning of Dynamic Network Topologies. Highlight: Aligned with reproducible research practices, we share the code developed to produce all figures included in this paper. |
S. S. Saboksayr; G. Mateos; |
748 | Dual Collaborative Visual-Semantic Mapping for Multi-Label Zero-Shot Image Recognition. Highlight: In this paper, we propose a novel dual collaborative visual-semantic mapping framework, constructing rich connection relationships by exploring two mapping streams, i.e., visual-to-semantic (V2S) mapping and semantic-to-visual (S2V) mapping. |
Y. Hu; X. Jin; X. Chen; Y. Zhang; |
749 | Dual-Cycle: Self-Supervised Dual-View Fluorescence Microscopy Image Reconstruction Using CycleGAN. Highlight: Three-dimensional fluorescence microscopy often suffers from anisotropy, where the resolution along the axial direction is lower than that within the lateral imaging plane. We address this issue by presenting Dual-Cycle, a new framework for joint deconvolution and fusion of dual-view fluorescence images. |
T. Kerepecky; J. Liu; X. W. Ng; D. W. Piston; U. S. Kamilov; |
750 | Dual-Feature Enhancement for Weakly Supervised Temporal Action Localization. Highlight: In this paper, we propose a novel Dual-Feature Enhancement (DFE) method for weakly supervised temporal action localization (WTAL), which can utilize both intra- and inter-video information. |
S. Liu; Q. Liu; Q. Chu; B. Liu; N. Yu; |
751 | Dual-graph Co-representation Learning for Knowledge-Graph Enhanced Recommendation. Highlight: We design a dual-graph framework, named DGCR, with two graph neural networks propagating over the user-item graph and the knowledge graph, respectively, to extract topology information from both graphs. |
X. Liu; B. Liang; J. Niu; C. Sha; D. Wu; |
752 | Dual-Head Fusion Network for Image Enhancement. Highlight: However, most existing methods tend to construct a uniform enhancer for the color transformation of all pixels and ignore the local context information that is significant for photographs, causing unsatisfactory results. To solve these issues, we propose a novel dual-head fusion network for image enhancement, which jointly considers both global scene and local content information. |
Y. Zhang; H. Zhang; L. Song; R. Xie; W. Zhang; |
753 | Dual Meta Calibration Mix for Improving Generalization in Meta-Learning. Highlight: However, the lack of a large number of diverse and high-quality tasks is the bottleneck of current meta-learning, which can easily lead to overfitting and therefore seriously hurt generalization. In this paper, to address this challenge, we propose Dual Meta Calibration Mix (DMCM) to improve the diversity and quality of tasks with more data. |
Z. -Y. Mi; Y. -B. Yang; |
754 | Dual-Path Cross-Modal Attention for Better Audio-Visual Speech Extraction. Highlight: This paper proposes a dual-path attention architecture in which the audio chunk length is comparable to the duration of a video frame. |
Z. Xu; X. Fan; M. Hasegawa-Johnson; |
755 | Dual-Path Dilated Convolutional Recurrent Network with Group Attention for Multi-Channel Speech Enhancement. Highlight: This paper proposes a dual-path dilated convolutional recurrent network with group attention for the L3DAS23 Challenge of the ICASSP Signal Processing Grand Challenge. |
J. Cheng; C. Pang; R. Liang; J. Fan; L. Zhao; |
756 | Dual Path Modeling for Semantic Matching By Perceiving Subtle Conflicts. Highlight: The modification, addition and deletion of words in sentence pairs may make it difficult for the model to predict their relationship. To alleviate this problem, we propose a novel Dual Path Modeling Framework to enhance the model’s ability to perceive subtle differences in sentence pairs by separately modeling affinity and difference semantics. |
C. Xue; D. Liang; S. Wang; J. Zhang; W. Wu; |
757 | Dual-Stage Graph Convolution Network With Graph Learning For Traffic Prediction. Highlight: Moreover, few solutions perform well on both long- and short-term prediction tasks. In this paper, we propose a novel dual-stage graph convolution network based on graph learning (DSGCN) to address these challenges. |
Z. Li; Q. Ren; L. Chen; J. Sun; |
758 | Dual-Stream Siamese Vision Transformer With Mutual Attention For Radar Gait Verification. Highlight: In this work, a Dual-stream Siamese Vision Transformer with Mutual Attention is proposed to verify whether a pair of radar gait sequences originate from the same person or not. |
R. Ji; J. Li; W. He; J. Ren; X. Jiang; |
759 | Dual-Uncertainty Guided Curriculum Learning and Part-Aware Feature Refinement for Domain Adaptive Person Re-Identification. Highlight: Although current clustering-based methods have achieved promising success, they neglect the model’s tolerance to different levels of noise, which may cause the model to memorize incorrect patterns caused by label noise and rapidly overfit to them in the early stages. In this paper, we introduce a novel Dual Uncertainty guided Curriculum Learning (DUCL) method to tackle the above problems. |
Z. Liu; B. Liu; Z. Zhao; Q. Chu; N. Yu; |
760 | Dual-Use Signal Design for MIMO Radcom with Inter-Pulse Index Modulation. Highlight: To tackle the resulting nonconvex and NP-hard optimization problem, the Linearized Approximation Alternating Direction Penalty Method (LA-ADPM) is proposed. |
X. Yao; G. Cui; X. Yu; |
761 | Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech. Highlight: To this end, we propose more powerful pause insertion frameworks based on a pre-trained language model. |
D. Yang; T. Koriyama; Y. Saito; T. Saeki; D. Xin; H. Saruwatari; |
762 | DVQVC: An Unsupervised Zero-Shot Voice Conversion Framework. Highlight: Zero-shot voice conversion (VC) aims to convert speech from one speaker to a target speaker while preserving the original linguistic information, given only one reference speech clip of the unseen target speaker. This work proposes a new VC model whose key idea is to achieve thorough speaker and content disentanglement by adopting an advanced speech encoder plus vector quantization (VQ) as a content encoder, and an advanced speaker encoder for accurate speaker embedding. |
D. Li; X. Li; X. Li; |
763 | DWFormer: Dynamic Window Transformer for Speech Emotion Recognition. Highlight: To address the issue, we propose Dynamic Window transFormer (DWFormer), a new architecture that leverages temporal importance by dynamically splitting samples into windows. |
S. Chen; X. Xing; W. Zhang; W. Chen; X. Xu; |
764 | DyLiteRADHAR: Dynamic Lightweight Slowfast Network for Human Activity Recognition Using mmWave Radar. Highlight: In this paper, we propose a dynamic lightweight SlowFast network named DyLiteRADHAR, which can efficiently extract spatial-temporal features and substantially reduce resource consumption for human activity recognition. |
B. Sheng; Y. Bao; F. Xiao; L. Gui; |
765 | Dynamic Alignment Mask CTC: Improved Mask CTC With Aligned Cross Entropy. Highlight: In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), which finds the monotonic alignment that minimizes the cross-entropy loss through dynamic programming, and (2) Dynamic Rectification, which creates new training samples by replacing some masks with model-predicted tokens. |
X. Zhang; H. Tang; J. Wang; N. Cheng; J. Luo; J. Xiao; |
766 | Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR. Highlight: However, the performance gap still remains relatively large between non-streaming and a full-contextual model trained independently. To address this, we propose a dynamic chunk-based convolution replacing the causal convolution in a hybrid Connectionist Temporal Classification (CTC)-Attention Conformer architecture. |
X. Li; G. Huybrechts; S. Ronanki; J. Farris; S. Bodapati; |
767 | Dynamic Distributed Convex Optimization Over-The-Air In Decentralized Wireless Networks. Highlight: We propose a truly decentralized algorithm for solving distributed convex optimization problems with possibly time-varying objectives and dynamic networks. |
N. Agrawal; R. L. G. Cavalcante; S. Stańczak; |
768 | Dynamic Fair Node Representation Learning. Highlight: Furthermore, while the fairness of algorithms is essential for their deployment in real-world systems, this issue has never been considered in the context of dynamic graphs to the best of our knowledge. Motivated by this, the present study proposes an efficient online node representation learning framework over dynamic graphs that can also mitigate bias. |
O. D. Kose; Y. Shen; |
769 | Dynamic Independent Component Extraction with Blending Mixing Vector: Lower Bound on Mean Interference-to-Signal Ratio. Highlight: We present a new source extraction model called CvxCSV which is a parameter-reduced modification of the recent Constant Separation Vector (CSV) mixing model. |
J. Čmejla; Z. Koldovský; V. Kautský; T. Adali; |
770 | Dynamic Local and Global Context Exploration for Small Object Detection. Highlight: In this paper, we propose a novel context-based approach called Dynamic Local and Global Context Exploration (DCE) for small object detection. |
Z. Zhang; P. Gong; H. Sun; P. Wu; X. Yang; |
771 | Dynamic Multi-View Scene Reconstruction Using Neural Implicit Surface. Highlight: To bridge the gap between rendering dynamic scenes and recovering static surface geometry, we propose a template-free method to reconstruct surface geometry and appearance using neural implicit representations from multi-view videos. |
D. Chen; H. Lu; I. Feldmann; O. Schreer; P. Eisert; |
772 | Dynamic Scalable Self-Attention Ensemble for Task-Free Continual Learning. Highlight: In this paper, we address a more challenging learning paradigm called Task-Free Continual Learning (TFCL), in which task information is missing during training. |
F. Ye; A. G. Bors; |
773 | Dynamic Selection of P-norm in Linear Adaptive Filtering Via Online Kernel-based Reinforcement Learning. Highlight: This study addresses the problem of dynamically selecting, at each time instant, the optimal p-norm to combat outliers in linear adaptive filtering without any knowledge of the potentially time-varying probability density function of the outliers. |
M. Vu; Y. Akiyama; K. Slavakis; |
774 | Dynamic Signed Graph Learning. Highlight: In this paper, we propose a dynamic signed GL (dynSGL) method based on the assumptions that (i) at each time point signals are smooth with respect to the signed graph, i.e., signal values at two nodes connected with a positive (negative) edge are similar (dissimilar), and (ii) the evolution of the graph structures is smooth across time. |
A. Karaaslanli; S. Aviyente; |
775 | Dynamic Speech Endpoint Detection with Regression Targets. Highlight: Traditionally, speech end-pointing is based on pure classification methods along with arbitrary binary targets. In this paper, we propose a novel regression-based speech end-pointing model, which enables an end-pointer to adjust its detection behavior based on the context of user queries. |
D. Liang; H. Su; T. Singh; J. Mahadeokar; S. Puri; J. Zhu; E. Thomaz; M. Seltzer; |
776 | Dynamic Split Computing for Efficient Deep Edge Intelligence. Highlight: Here, we introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel. |
A. Bakhtiarnia; N. Milošević; Q. Zhang; D. Bajović; A. Iosifidis; |
777 | Dynamic TF-TDNN: Dynamic Time Delay Neural Network Based on Temporal-Frequency Attention for Dialect Recognition. Highlight: In contrast, we introduce a hybrid attention mechanism in both the temporal and frequency domains, called the TF-attention module, which adaptively pays more attention to the truly important frames and the frame-level important information under different receptive fields for dialect recognition. |
C. Liao; J. Huang; H. Yuan; P. Yao; J. Tan; D. Zhang; F. Deng; X. Wang; C. Song; |
778 | Dynamic Vehicle Graph Interaction for Trajectory Prediction Based on Video Signals. Highlight: To this end, we design a dynamic vehicle graph to represent the dynamic interaction between vehicles for trajectory prediction. |
J. Chen; W. Wang; J. Chen; M. Cai; |
779 | E2E Segmentation in A Two-Pass Cascaded Encoder ASR Model. Highlight: We propose a design where the neural segmenter is integrated with the causal first-pass decoder to emit an end-of-segment (EOS) signal in real time. |
W. R. Huang; S. -Y. Chang; T. N. Sainath; Y. He; D. Rybach; R. David; R. Prabhavalkar; C. Allauzen; C. Peyser; T. D. Strohman; |
780 | Early Detection of Cognitive Decline Using Voice Assistant Commands. Highlight: In this paper, we develop multiple unique feature sets from VAS data that can be used in the training of machine learning models. |
E. Kurtz; Y. Zhu; T. Driesse; B. Tran; J. A. Batsis; R. M. Roth; X. Liang; |
781 | EBEN: Extreme Bandwidth Extension Network Applied To Speech Signals Captured With Noise-Resilient Body-Conduction Microphones. Highlight: In this paper, we present Extreme Bandwidth Extension Network (EBEN), a generative adversarial network (GAN) that enhances audio measured with body-conduction microphones. |
J. Hauret; T. Joubaud; V. Zimpfer; É. Bavu; |
782 | E-Branchformer-Based E2E SLU Toward Stop On-Device Challenge. Highlight: In this paper, we report our team’s study on track 2 of the Spoken Language Understanding Grand Challenge, which is a component of the ICASSP Signal Processing Grand Challenge 2023. |
Y. Kashiwagi; S. Arora; H. Futami; J. Huynh; S. -L. Wu; Y. Peng; B. Yan; E. Tsunoo; S. Watanabe; |
783 | ECG Artifact Removal from Single-Channel Surface EMG Using Fully Convolutional Networks. Highlight: This study proposes a novel denoising method to eliminate ECG artifacts from single-channel sEMG signals using fully convolutional networks (FCN). |
K. -C. Wang; K. -C. Liu; S. -Y. Peng; Y. Tsao; |
784 | ECGT2T: Towards Synthesizing Twelve-Lead Electrocardiograms from Two Asynchronous Leads. Highlight: Consequently, data generated from these devices may be insufficient for accurately diagnosing more complex cardiac conditions. To bridge this gap, we propose ECGT2T, a deep generative model that synthesizes ten leads from an asynchronous Lead I and Lead II input to simulate a 12-lead ECG. |
Y. -Y. Jo; Y. S. Choi; J. -H. Jang; J. -M. Kwon; |
785 | EEG2IMAGE: Image Reconstruction from EEG Brain Signals. Highlight: In this work, we propose a framework for synthesizing images from brain activity recorded by an electroencephalogram (EEG) using small-size EEG datasets. |
P. Singh; P. Pandey; K. Miyapuram; S. Raman; |
786 | EEG Emotion Recognition Via Ensemble Learning Representations. Highlight: In this work, we focus on developing a deep learning model that makes use of the spatial and temporal representations of the EEG signal to generate EEG embeddings for emotion recognition. |
B. Taha; D. Y. Hwang; D. Hatzinakos; |
787 | Effective Graph-Based Modeling of Articulation Traits for Mispronunciation Detection and Diagnosis. Highlight: Second, prior knowledge about the articulation traits of the canonical phones in the text prompt is not fully utilized in MDD. On account of this, we propose a novel end-to-end MDD method that can streamline the dictation process and the alignment process in a non-autoregressive manner. |
B. -C. Yan; H. -W. Wang; Y. -C. Wang; B. Chen; |
788 | Effectiveness of Inter- and Intra-Subarray Spatial Features for Acoustic Scene Classification. Highlight: In this paper, we investigate the effectiveness of spatial features for acoustic scene classification (ASC) with distributed microphones. |
T. Kawamura; Y. Kinoshita; N. Ono; R. Scheibler; |
789 | Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. Highlight: In this work, we demonstrate an inexpensive and effective alternative by mining text and audio pairs for Indian languages from public sources, specifically from the public archives of All India Radio. |
K. Bhogale; A. Raman; T. Javed; S. Doddapaneni; A. Kunchukuttan; P. Kumar; M. M. Khapra; |
790 | Effectiveness of Text, Acoustic, and Lattice-Based Representations in Spoken Language Understanding Tasks. Highlight: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. |
E. Villatoro-Tello; S. Madikeri; J. Zuluaga-Gomez; B. Sharma; S. Saeed Sarfjoo; I. Nigmatulina; P. Motlicek; A. V. Ivanov; A. Ganapathiraju; |
791 | Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data. Highlight: This paper proposes a novel modeling framework for effective training of end-to-end automatic speech recognition (ASR) models on various sources of data from diverse domains: speech paired with clean ground truth transcripts, speech with noisy pseudo transcripts from semi-supervised decodes and unpaired text-only data. |
T. Fukuda; S. Thomas; |
792 | Effect of Lossy Compression Algorithms on Face Image Quality and Recognition. Highlight: This work investigates the effect of lossy image compression on a state-of-the-art face recognition model, and on multiple face image quality assessment models. |
T. Schlett; S. Schachner; C. Rathgeb; J. Tapia; C. Busch; |
793 | Efficent Large-Scale Multi-Unimodular Waveform Design with Good Correlation Properties Via Direct Phase Optimizations. Highlight: In this paper, we propose an efficient algorithm for designing large-scale multi-unimodular waveforms with low correlations. |
X. Zhao; Y. Li; R. Tao; |
794 | Efficient and Effective Multi-Camera Pose Estimation with Weighted M-Estimate Sample Consensus. Highlight: This paper proposes a multi-camera pose estimation method by leveraging point and line correspondences with non-negligible outliers, in which a weighted M-Estimate Sample Consensus (w-MSAC) based on the customized weights and the coarse pose prior is introduced to improve the efficiency and accuracy of pose estimation. |
X. Lin; Y. Zhou; X. Zhang; Y. Liu; C. Zhu; |
795 | Efficient Compressed Video Action Recognition Via Late Fusion with A Single Network. Highlight: This study explores another approach to reduce the computational complexity, by using a single network instead of multiple networks to process compressed video features. |
H. Terao; W. Noguchi; H. Iizuka; M. Yamamoto; |
796 | Efficient Data Loading with Quantum Autoencoder. Highlight: In this work, we propose an efficient quantum autoencoder architecture that can construct a quantum state approximating the unknown classical distribution with high precision and with only linear circuit depth. |
S. -R. Wu; C. -T. Li; H. -C. Cheng; |
797 | Efficient Domain Adaptation for Speech Foundation Models. Highlight: In this paper, we present a pioneering study towards building an efficient solution for foundation model (FM)-based speech recognition systems. |
B. Li; D. Hwang; Z. Huo; J. Bai; G. Prakash; T. N. Sainath; K. Chai Sim; Y. Zhang; W. Han; T. Strohman; F. Beaufays; |
798 | Efficient Feature Extraction for Non-Maximum Suppression in Visual Person Detection. Highlight: In visual person detection, most non-maximum suppression (NMS) methods suffer when analyzing crowded scenes with heavy mutual occlusion. This paper proposes a modification of a deep neural architecture for NMS that is suitable for such cases and capable of efficiently cooperating with recent neural object detectors. |
C. Symeonidis; I. Mademlis; I. Pitas; N. Nikolaidis; |
799 | Efficient Feature Fusion for Learning-Based Photometric Stereo. Highlight: In this paper, we explore how to efficiently fuse features from a variable number of input images. |
Y. Ju; K. -M. Lam; J. Xiao; C. Zhang; C. Yang; J. Dong; |
800 | Efficient Implementation of Robust CUSUM Algorithm to Characterize Nanogaps Measurements with Heavy-Tailed Noise. Highlight: This paper suggests an approximation in the likelihood ratio step of the CUSUM algorithm that is more robust than the simple Gaussian noise assumption and, at the same time, is computationally more efficient than computing the fitted true likelihoods. |
J. Kipen; J. Jaldén; S. N. Raja; S. Jain; |
801 | Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement. Highlight: We propose a new method for human speech intelligibility evaluation based on keyword spotting. |
C. Valentini-Botinhao; A. L. Aldana Blanco; O. Klejch; P. Bell; |
802 | Efficient Large-Scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation. Highlight: We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex transformers. |
F. Schmid; K. Koutini; G. Widmer; |
803 | Efficient Learning of Balanced Signature Graphs. Highlight: Contrary to existing two-stage approaches that consist of graph learning followed by graph clustering, we propose a one-step procedure that directly learns a perfectly clustered graph. We describe the algorithmic constituents for our approach and illustrate its superiority via numerical simulations. |
G. Matz; C. Verardo; T. Dittrich; |
804 | Efficiently Fusing Sparse Lidar for Enhanced Self-Supervised Monocular Depth Estimation. Highlight: In this paper, based on the “less is more” philosophy (i.e., focusing only on valid pixels in sparse LiDAR), we propose a novel framework, Efficient Sparse Depth (EffisDepth), for predicting dense depth. |
Y. Wang; M. Gong; L. Xia; Q. Zhang; J. Cheng; |
805 | Efficient Monaural Speech Enhancement with Universal Sample Rate Band-Split RNN. Highlight: In this paper, we extend the usage of a recently proposed frequency-domain source separation model, the band-split RNN (BSRNN), to the task of universal-sample-rate, resource-efficient speech enhancement. |
J. Yu; Y. Luo; |
806 | Efficient Multi-Scale Attention Module with Cross-Spatial Learning. Highlight: In this paper, a novel efficient multi-scale attention (EMA) module is proposed. |
D. Ouyang; S. He; G. Zhang; M. Luo; H. Guo; J. Zhan; Z. Huang; |
807 | Efficient Online Convolutional Dictionary Learning Using Approximate Sparse Components. Highlight: In this paper, we propose an online convolutional dictionary learning (OCDL) approach that incorporates decomposed sparse approximations instead of the training samples and substantially reduces the computational cost of existing convolutional dictionary learning (CDL) methods. |
F. G. Veshki; S. A. Vorobyov; |
808 | Efficient Personalized Federated Learning on Selective Model Training. Highlight: In this paper, we find that the vector magnitude, i.e., the parameter stability, can further promote personalized federated learning (FL). |
Y. Guo; F. Liu; T. Zhou; Z. Cai; N. Xiao; |
809 | Efficient Practices for Profile-to-Frontal Face Synthesis and Recognition. Highlight: In this study, we propose three efficient practices to improve the performance of profile-to-frontal face synthesis and recognition. |
H. Wang; X. Yang; |
810 | Efficient Privacy Preserving Graph Neural Network for Node Classification. Highlight: In this paper, we develop a privacy-preserving GNN to enforce privacy preservation, which utilizes a private Functional Mechanism (FM) to train the learning model. |
X. Pei; X. Deng; S. Tian; K. Xue; |
811 | Efficient Protein Structural Class Prediction Via Chaos Game Representation and Recurrent Neural Networks. Highlight: In this work, we introduce an efficient and accurate classification scheme based on chaos game representation and recurrent neural networks. |
M. A. Zervou; E. Doutsi; P. Tsakalides; |
812 | Efficient Quantized Constant Envelope Precoding for Multiuser Downlink Massive MIMO Systems. Highlight: In this paper, we consider the quantized constant envelope (QCE) precoding problem for a massive MIMO system with phase shift keying (PSK) modulation and develop an efficient approach for solving the constructive interference (CI) based problem formulation. |
Z. Wu; Y. -F. Liu; B. Jiang; Y. -H. Dai; |
813 | Efficient Siamese Network for UAV Tracking. Highlight: In this work, we propose an efficient Siamese-based tracker (ESTrack) for aerial visual object tracking using dual global correlation and accurate center localization. |
X. Zhang; D. Wang; X. Ma; |
814 | Efficient Similarity-Based Passive Filter Pruning for Compressing CNNs. Highlight: However, the computational complexity of computing the pairwise similarity matrix is high, particularly when a convolutional layer has many filters. To reduce the computational complexity in obtaining the pairwise similarity matrix, we propose to use an efficient method where the complete pairwise similarity matrix is approximated from only a few of its columns by using a Nyström approximation method. |
A. Singh; M. D. Plumbley; |
815 | EfficientSpeech: An On-Device Text to Speech Model. Highlight: In this work, we propose EfficientSpeech, an efficient neural TTS model that synthesizes speech in real time on an ARM CPU. |
R. Atienza; |
816 | Efficient Speech Quality Assessment Using Self-Supervised Framewise Embeddings. Highlight: This paper proposes an efficient system with results comparable to the best performing model in the ConferencingSpeech 2022 challenge. |
K. El Hajal; Z. Wu; N. Scheidwasser-Clow; G. Elbanna; M. Cernak; |
817 | Efficient Speech Translation with Dynamic Latent Perceivers. Highlight: Instead, in this research, we propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. |
I. Tsiamas; G. I. Gállego; J. A. R. Fonollosa; M. R. Costa-jussà; |
818 | Efficient Stuttering Event Detection Using Siamese Networks. Highlight: In this work, we consolidate the corpora of publicly available speech disfluency datasets with and without labels and propose DisfluentSiam – an efficient siamese network-based small-scale pretraining pipeline using task-specific data from multiple domains with only 10M trainable parameters. |
P. Mohapatra; B. Islam; M. T. Islam; R. Jiao; Q. Zhu; |
819 | Efficient Super-Resolution for Compression Of Gaming Videos. Highlight: This paper proposes a super-resolution framework that improves the coding efficiency of computer-generated gaming videos at low bitrates. |
Y. Wang; L. Murn; L. Herranz; F. Yang; M. Mrak; W. Zhang; S. Wan; M. G. Blanch; |
820 | Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though basic calibration methods like Monte Carlo Dropout and Ensemble can calibrate well, these methods are time-consuming in the training or inference stages. To tackle these challenges, we propose an efficient uncertainty calibration framework GPF-BERT for BERT-based conversational search, which employs a Gaussian Process layer and the focal loss on top of the BERT architecture to achieve a high-quality neural ranker. |
T. Ye; Z. Li; J. Wang; N. Cheng; J. Xiao; |
821 | EGAN: A Neural Excitation Generation Model Based on Generative Adversarial Networks with Harmonics and Noise Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a speech synthesis method based on source-filter modeling. |
Y. -T. Lin; C. -Y. Chiang; |
822 | Egocentric Action Anticipation for Personal Health Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The egocentric action anticipation task consists of predicting future (unobserved) actions based on input from a wearable first-person view camera. In this work, we focus on applying egocentric action anticipation methods to the personal health domain, i.e., using them to analyze routine dietary and hygiene activities and to prevent undesirable behavior (such as tasting food before washing hands or adding sugar). |
I. Rodin; A. Furnari; D. Mavroeidis; G. M. Farinella; |
823 | Egocentric Audio-Visual Noise Suppression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare object recognition and action classification-based visual feature extractors and investigate methods to align audio and visual representations. |
R. Sharma; W. He; J. Lin; E. Lakomkin; Y. Liu; K. Kalgaonkar; |
824 | EH-Enabled Distributed Detection Over Temporally Correlated Markovian MIMO Channels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider limited feedback of the channel gain, defined as the Frobenius norm of the MIMO channel matrix, from the fusion center (FC) to the sensors at a fixed feedback frequency (e.g., every T time slots). Modeling the randomly arriving energy units as a Poisson process, and the quantized channel gain and the battery dynamics as homogeneous finite-state Markov chains, we propose an adaptive transmit power control strategy that maximizes the J-divergence-based detection metric at the FC, subject to a per-sensor average transmit power constraint. |
G. Ardeshiri; A. Vosoughi; |
825 | EI2SR: Learning An Enhanced Intra-Instance Semantic Relationship for Arbitrary-Shaped Scene Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards arbitrary-shaped text detection, fracture detection is the major concern due to the lack of semantic relationship within an instance in existing methods. To circumvent this dilemma, we propose a novel network to learn an Enhanced Intra-Instance Semantic Relationship (EI2SR) which consists of Text-Specific Attention Mechanism (TAM) and Border Attraction Grouping (BAG). |
Y. Shu; S. Liu; Y. Zhou; H. Xu; F. Jiang; |
826 | Eigen-Decomposition-Free Directed Graph Sampling Via Gershgorin Disc Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Graph sampling is the problem of choosing a node subset via sampling matrix $H \in \{0, 1\}^{K \times N}$ to collect samples $y = Hx \in \mathbb{R}^{K}$, $K < N$, so that the target signal $x \in \mathbb{R}^{N}$ can be ... |
Y. Li; H. Vicky Zhao; G. Cheung; |
827 | Elastic Graph Transformer Networks for EEG-Based Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To extract more informative representations, we propose an elastic Graph Transformer network for emotion recognition (EmoGT) inspired by the advantages of Transformer in time-series analysis and the superior performance of graph convolutional networks in topological analysis. |
W. -B. Jiang; X. Yan; W. -L. Zheng; B. -L. Lu; |
828 | Electric Network Frequency Detection Using Least Absolute Deviations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel ENF detector based on Least Absolute Deviations, coined the LAD-Likelihood Ratio Test (LAD-LRT), is proposed. |
C. Korgialas; C. Kotropoulos; |
829 | Element Selection with Wide Class of Optimization Criteria Using Non-Convex Sparse Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these algorithms are applicable to only a specific class of optimization criteria such as the minimization of the squared error loss between the original and restored data. To overcome this limitation, we propose an element selection algorithm based on non-convex sparse optimization that can be used with a wider class of optimization criteria than conventional algorithms. |
T. Kawamura; N. Ueno; N. Ono; |
830 | Elliptical Wishart Distribution: Maximum Likelihood Estimator from Information Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work deals with elliptical Wishart distributions on the set of symmetric positive definite matrices. |
I. Ayadi; F. Bouchard; F. Pascal; |
831 | Embedding A Differentiable Mel-Cepstral Synthesis Filter to A Neural Speech Synthesis System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. |
T. Yoshimura; S. Takaki; K. Nakamura; K. Oura; Y. Hono; K. Hashimoto; Y. Nankaku; K. Tokuda; |
832 | Embrace Smaller Attention: Efficient Cross-Modal Matching with Dual Gated Attention Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Dual Gated Attention Fusion (DGAF) unit to save cross-modal matching from heavy attention computation. |
W. Guo; X. Kong; |
833 | EMC2-Net: Joint Equalization and Modulation Classification Based on Constellation Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel modulation classification (MC) technique dubbed Joint Equalization and Modulation Classification based on Constellation Network (EMC2-Net). |
H. Ryu; J. Choi; |
834 | EMCLR: Expectation Maximization Contrastive Learning Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the complexity, inspired by the traditional EM algorithm, we derive the embedding matrix of each batch with optimally uniform distribution and discard the uniformity part in objectives. |
M. Liu; R. Yi; L. Ma; |
835 | EMIX: A Data Augmentation Method for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel data augmentation (DA) method for the SER problem, namely EMix, which is simple but effective. |
A. Dang; T. H. Vu; L. Dinh Nguyen; J. -C. Wang; |
836 | EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose EmoDiff, a diffusion-based TTS model where emotion intensity can be manipulated by a proposed soft-label guidance technique derived from classifier guidance. |
Y. Guo; C. Du; X. Chen; K. Yu; |
837 | Emotion Recognition in Conversation from Variable-Length Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we explore the benefits of variable-length context and propose a more effective approach to emotion recognition in conversation (ERC). |
M. Zhang; X. Zhou; W. Chen; M. Zhang; |
838 | Empathetic Response Generation Via Emotion Cause Transition Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an emotion cause transition graph to explicitly model the natural transition of emotion causes between two adjacent turns in empathetic dialogue. |
Y. Qian; B. Wang; T. -E. Lin; Y. Zheng; Y. Zhu; D. Zhao; Y. Hou; Y. Wu; Y. Li; |
839 | Enabling Large-Scale Image Search with Co-Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an efficient and effective query-sensitive co-attention mechanism for large-scale content-based image retrieval (CBIR) tasks. |
Z. Hu; A. G. Bors; |
840 | Encoder-Decoder Graph Convolutional Network for Automatic Timed-Up-and-Go and Sit-to-Stand Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel Encoder-Decoder Graph Convolutional Network (ED-GCN) to perform auto-segmentation on two widely accepted clinical tests for human mobility and balance assessment: the Timed-Up-and-Go (TUG) test and the Sit-to-Stand (STS) test. |
B. Wen; C. Du; T. Q. Nguyen; |
841 | End-to-End Amp Modeling: from Data to Controllable Guitar Amplifier Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a data-driven approach to creating real-time neural network models of guitar amplifiers, recreating the amplifiers’ sonic response to arbitrary inputs across the full range of controls present on the physical device. |
L. Juvela; E. -P. Damskägg; A. Peussa; J. Mäkinen; T. Sherson; S. I. Mimilakis; K. Rauhanen; A. Gotsopoulos; |
842 | End-to-End Classification of Cell-Cycle Stages with Center-Cell Focus Tracker Using Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method for automatic detection of cell-cycle stages using a recurrent neural network (RNN). |
A. Jose; R. Roy; D. Eschweiler; I. Laube; R. Azad; D. Moreno-Andrés; J. Stegmaier; |
843 | End-to-End Neural Audio Coding in The MDCT Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts the modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of variables. |
H. Lim; J. Lee; B. H. Kim; I. Jang; H. -G. Kang; |
844 | End-to-End Non-Autoregressive Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a semantic retrieval module that uses image features to retrieve semantic information as input to the non-autoregressive decoder, narrowing the performance gap between non-autoregressive and autoregressive models. |
H. Yu; Y. Liu; B. Qi; Z. Hu; H. Liu; |
845 | End-to-End Spoken Language Understanding Using Joint CTC Loss and Self-Supervised, Pretrained Acoustic Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage self-supervised acoustic encoders fine-tuned with Connectionist Temporal Classification (CTC) to extract textual embeddings and use joint CTC and SLU losses for utterance-level SLU tasks. |
J. Wang; M. Radfar; K. Wei; C. Chung; |
846 | End-to-End Spoken Language Understanding with Tree-Constrained Pointer Generator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper exploits contextual biasing, a technique to improve the speech recognition of rare words, in end-to-end SLU systems. |
G. Sun; C. Zhang; P. C. Woodland; |
847 | End-to-End Unsupervised Sketch to Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More importantly, the generated images are of low quality with distorted textures and colors, and shape deformation. In order to overcome the above challenges, we propose an end-to-end method to accomplish the freehand sketch-to-image task, and the proposed architecture is based on an unsupervised network. |
X. Lv; L. Wu; Z. Cheng; X. Meng; |
848 | End-to-End Word-Level Disfluency Detection and Classification in Children’s Reading Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a novel attention-based model to perform word-level disfluency detection and classification in a fully end-to-end (E2E) manner, making it fast and easy to use. |
L. Venkatasubramaniam; V. Sunder; E. Fosler-Lussier; |
849 | Energy Efficiency Maximization in RIS-aided Networks with Global Reflection Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Two optimization methods are proposed to optimize the mobile users’ powers, the RIS coefficients, and the linear receive filters. |
R. K. Fotock; A. Zappone; M. D. Renzo; |
850 | Energy Regularized RNNs for Solving Non-stationary Bandit Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to balance between exploration and exploitation, we present an energy minimization term that prevents the neural network from becoming too confident in support of a certain action. |
M. Rotman; L. Wolf; |
851 | Enhanced Coprime Array Configuration for DoA Estimation of Non-Circular Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, both fundamental criteria, high degrees of freedom (DoFs) and reduced mutual coupling, are considered in the design of the proposed array configuration for direction-of-arrival (DoA) estimation of non-circular signals. |
N. Mohsen; A. Hawbani; X. Wang; B. Bairrington; L. Zhao; S. Alsamhi; |
852 | Enhanced DCF Tracker Regularized By Reliable Sample Construction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rethink the reliability problem of input samples in advance during template updating and propose an enhanced DCF tracking method regularized by a novel sparse representation based reliable sample construction term, called enhanced sparse correlation filter (ESCF). |
K. Hu; M. Cao; M. Wang; L. Lan; W. Yang; H. Tan; |
853 | Enhanced Embeddings in Zero-Shot Learning for Environmental Audio Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study instead uses a modified YAMNet network to obtain semantic audio embeddings for zero-shot learning. |
Y. Sims; A. Mendes; S. Chalup; |
854 | Enhanced GM-PHD Filter for Real Time Satellite Multi-Target Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a real-time multi-object tracker using an enhanced version of the Gaussian mixture probability hypothesis density (GM-PHD) filter to track detections of a state-of-the-art convolutional neural network (CNN). |
C. Aguilar; M. Ortner; J. Zerubia; |
855 | Enhanced Low-Resolution LiDAR-Camera Calibration Via Depth Interpolation and Supervised Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the increasing application of low-resolution LiDAR, we target the problem of low-resolution LiDAR-camera calibration in this work. |
Z. Zhang; Z. Yu; S. You; R. Rao; S. Agarwal; F. Ren; |
856 | Enhancement of Text-Predicting Style Token With Generative Adversarial Network for Expressive Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes an advanced text-predicting style embedding for expressive speech synthesis. |
H. Kanagawa; Y. Ijima; |
857 | Enhance Transferability of Adversarial Examples with Model Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we suggest alleviating the overfitting issue from a novel perspective, i.e., designing a fitted model architecture. |
M. Fan; W. Guo; Z. Ying; X. Liu; |
858 | Enhancing and Adversarial: Improve ASR with Speaker Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. |
W. Zhou; H. Wu; J. Xu; M. Zeineldeen; C. Lüscher; R. Schlüter; H. Ney; |
859 | Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel transformer structure with contrastive learning to align different modalities. |
Y. Wei; S. Yuan; M. Chen; L. Wang; |
860 | Enhancing Ontology Translation Through Cross-Lingual Agreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce cross-lingual agreement to alleviate the aforementioned issues. |
M. Tian; F. Giunchiglia; R. Song; X. Chen; H. Xu; |
861 | Enhancing Representation Learning with Deep Classifiers in Presence of Shortcut Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method to improve the representations learned by deep neural image classifiers in spite of a shortcut in upstream data. |
A. Ahmadian; F. Lindsten; |
862 | Enhancing Robustness and Imperceptibility of Blind Watermarking with Improved Message Processor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, MBRS tends to generate chessboard artifacts, which makes the embedded watermark easy for the human eye to detect. Therefore, we construct a more generalized watermarking framework and propose an improved blind watermarking method. |
Y. Wu; B. Wang; C. Dai; Y. Yuan; B. Li; W. Zheng; H. Wu; |
863 | Enhancing Spatio-Spectral Regularization By Structure Tensor Modeling for Hyperspectral Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new regularization function, named Spatio-Spectral Structure Tensor Total Variation (S3TTV), for hyperspectral image (HSI) denoising. |
S. Takemoto; S. Ono; |
864 | Enhancing Speech-To-Speech Translation with Multiple TTS Targets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we analyze the effect of changing synthesized target speech for direct S2ST models. |
J. Shi; Y. Tang; A. Lee; H. Inaguma; C. Wang; J. Pino; S. Watanabe; |
865 | Enhancing The Accuracy of Resistive In-Memory Architectures Using Adaptive Signal Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through behavioral modeling and simulation, we demonstrate that ASC enhances the compute signal-to-noise ratio (SNR) by 8.6 dB to 13.9 dB with negligible energy overhead. |
H. -M. Ou; N. R. Shanbhag; |
866 | Enhancing The Efficiency of WMMSE and FP for Beamforming By Minorization-Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To obtain the optimal Lagrange multiplier, we must repeatedly invert an M × M matrix, where M is the number of transmit antennas, which incurs considerable complexity. To address this issue, this work explores the connection of WMMSE and FP to minorization-maximization (MM), thereby modifying the two methods to get rid of the Lagrange multiplier. |
Z. Zhang; Z. Zhao; K. Shen; |
867 | Enhancing The Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on our previous work, this work proposes a melody-unsupervised multi-speaker pretraining method, conducted on a multi-singer dataset, to enhance the vocal range of the single speaker without degrading timbre similarity. |
S. Zhou; X. Li; Z. Wu; Y. Shan; H. Meng; |
868 | Enhancing Unsupervised Speech Recognition with Diffusion GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusionGAN. |
X. Wu; |
869 | Enlightening The Student in Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the student may have a hard time absorbing the knowledge from a sophisticated teacher due to the capacity and confidence gaps between them. To address this issue, a new knowledge distillation and refinement (KDrefine) framework is proposed to enlighten the student by expanding and refining its network structure. |
Y. Zheng; C. Wang; Y. Chen; J. Qian; J. Wang; J. Wu; |
870 | Enrollment Rate Prediction in Clinical Trials Based on CDF Sketching and Tensor Factorization Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We can always sketch a multivariate CDF in terms of a multidimensional empirical cumulative probability array, i.e., a finite grid-sampled CDF tensor, and introduce a low-rank parametrization via a Canonical Polyadic Decomposition (CPD) model. |
M. Amiridi; C. Qian; N. D. Sidiropoulos; L. M. Glass; |
871 | Ensemble and Personalized Transformer Models for Subject Identification and Relapse Detection in E-Prevention Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this short paper, we present the devised solutions for the subject identification and relapse detection tasks, which are part of the e-Prevention Challenge hosted at the ICASSP 2023 conference [1] [2] [3]. |
S. Calcagno; R. Mineo; D. Giordano; C. Spampinato; |
872 | Ensemble Graph Q-Learning for Large Scale Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Numerical results show that the proposed algorithm reduces the policy error by 60% and the runtime by 80% compared with other state-of-the-art Q-learning algorithms. |
T. Bozkus; U. Mitra; |
873 | Ensemble Knowledge Distillation of Self-Supervised Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On top of that, we proposed a multiple prediction head method for student models to predict different layer outputs of multiple teacher models simultaneously. |
K. -P. Huang; T. -H. Feng; Y. -K. Fu; T. -Y. Hsu; P. -C. Yen; W. -C. Tseng; K. -W. Chang; H. -Y. Lee; |
874 | Ensemble of Deep Neural Network Models for MOS Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our contribution to the ongoing research: a system for automatic prediction of the mean opinion score (MOS) given by human listeners. |
M. Kunešová; J. Matoušek; J. Lehečka; J. Švec; J. Michálek; D. Tihelka; M. Bulín; Z. Hanzlíček; M. Řezáčková; |
875 | Ensemble Prosody Prediction For Expressive Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We construct simple ensembles of prosody predictors by varying either model architecture or model parameter values. To automatically select amongst the models in the ensemble when performing Text-to-Speech, we propose a novel, and computationally trivial, variance-based criterion. |
T. H. Teh; V. Hu; D. S. Ram Mohan; Z. Hodari; C. G. R. Wallis; T. Gómez Ibarrondo; A. Torresquintero; J. Leoni; M. Gales; S. King; |
876 | Entropy Based Feature Regularization to Improve Transferability of Deep Learning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we are interested in the problem of solving fine-grain classification or regression, using a model trained on coarse-grain labels only. |
R. Baena; L. Drumetz; V. Gripon; |
877 | Epic-Sounds: A Large-Scale Dataset of Actions That Sound Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos from EPIC-KITCHENS-100. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. |
J. Huh; J. Chalk; E. Kazakos; D. Damen; A. Zisserman; |
878 | Equivalence of Aperture Reduction in Element Space and Constrained Combination of DFT Beams in Beamspace Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an analytical proof of equivalence of the signal processing in the reduced aperture element space and in beamspace produced by the combination of multiple adjacent DFT beams with a subsequent constraining of the resulting magnitudes. |
D. Rakhimov; M. Haardt; |
879 | ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we argue that the essence of the long-tailed problem in SGG is that the classifier is seriously affected by the long-tailed data. To handle this issue, we propose a novel network named ERBNet, which contains a relation feature fusion (RFF) encoder to construct effective representations of relations between objects, and a nearest class mean (NCM) classifier to conduct relation prediction based on relation feature similarities. |
W. Ma; T. Hou; Q. Di; Z. Qi; Y. Shan; H. Wang; |
880 | Error Analysis of Convolutional Beamspace Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the theoretical MSE of convolutional beamspace (CBS) is given for the cases where MUSIC or root-MUSIC is used. |
P. -C. Chen; P. P. Vaidyanathan; |
881 | ERSAM: Neural Architecture Search for Energy-Efficient and Real-Time Social Ambiance Measurement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). |
C. Li; W. Chen; J. Yuan; Y. C. Lin; A. Sabharwal; |
882 | ESCL: Equivariant Self-Contrastive Learning for Sentence Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous contrastive learning methods for sentence representations often focus on insensitive transformations to produce positive pairs, but neglect the role of sensitive transformations that are harmful to semantic representations. Therefore, we propose an Equivariant Self-Contrastive Learning (ESCL) method to make full use of sensitive transformations, which encourages the learned representations to be sensitive to certain types of transformations with an additional equivariant learning task. |
J. Liu; Y. Liu; X. Han; C. Deng; J. Feng; |
883 | Estimating Acoustic Direction of Arrival Using A Single Structural Sensor on A Resonant Surface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, sensors are affixed to an acrylic panel and used to record acoustic noise signals at various angles of incidence. |
T. DiPassio; M. C. Heilemann; B. Thompson; M. F. Bocko; |
884 | Estimating and Analyzing Neural Information Flow Using Signal Processing on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We apply the diffusion model to micro-electrocorticography (ECoG) recordings from sensorimotor cortex of two non-human primates to estimate the neural communication flow during excitatory optogenetics. |
F. Schwock; J. Bloch; L. Atlas; S. Abadi; A. Yazdan-Shahmorad; |
885 | Estimating Inharmonic Signals with Optimal Transport Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the problem of estimating the frequency content of inharmonic signals, i.e., sinusoidal mixtures whose components are close to forming a harmonic set. |
F. Elvander; |
886 | Estimating Normalized Graph Laplacians in Financial Markets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More precisely, we design an optimization algorithm to learn precision matrices that are modeled as normalized graph Laplacians. |
J. V. de M. Cardoso; J. Ying; S. Kumar; D. P. Palomar; |
887 | Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for using the proxy acoustic model to estimate Shapley values for variable-length utterances and demonstrate that the Shapley values provide a signal of example quality. |
A. Raza Syed; M. I. Mandel; |
888 | Estimating Uncertainty On Video Quality Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, whether such improvement or fault-detection is worth a significant increase in power consumption is questionable. Therefore, the goal of this work is to propose a method to predict the uncertainty of the quality metric. |
P. David; P. L. Callet; S. Ling; H. Wang; I. Katsavounidis; Z. Shahid; C. Stejerean; |
889 | Estimation of Cardiac Fibre Direction Based on Activation Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel method to estimate the fibre direction from electrograms (EGMs) is presented. |
J. W. de Vries; M. Sun; N. M. S. de Groot; R. C. Hendriks; |
890 | Estimation of High-Dimensional Differential Graphs from Multi-Attribute Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data. |
J. K. Tugnait; |
891 | Estimation of Time-Varying Graph Topologies from Graph Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe these signals, and the interest is in finding the time-varying topology of the graphs. We propose two Bayesian methods for estimating these topologies without assuming any specific functional relationships among the signals on the graphs. |
Y. Liu; C. Cui; M. Ajirak; P. M. Djurić; |
892 | Estimation of Visual Contents from Human Brain Signals Via VQA Based on Brain-Specific Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a method for estimation of visual cognitive contents from human brain signals via a newly derived visual question answering (VQA) model. |
R. Shichida; R. Togo; K. Maeda; T. Ogawa; M. Haseyama; |
893 | EURO: ESPnet Unsupervised ASR Open-Source Toolkit Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). |
D. Gao; J. Shi; S. -P. Chuang; L. P. Garcia; H. -Y. Lee; S. Watanabe; S. Khudanpur; |
894 | Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding. Highlight: In this paper, we introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning for various speech processing tasks. |
Y. Li; A. Mehrish; R. Bhardwaj; N. Majumder; B. Cheng; S. Zhao; A. Zadeh; R. Mihalcea; S. Poria; |
895 | Evaluating Speech–Phoneme Alignment and Its Impact on Neural Text-To-Speech Synthesis. Highlight: In this study, we perform experiments with five state-of-the-art speech–phoneme aligners and evaluate their output with objective and subjective measures. |
F. Zalkow; P. Govalkar; M. Müller; E. A. P. Habets; C. Dittmar; |
896 | Evaluating Variants of Wav2vec 2.0 on Affective Vocal Burst Tasks. Highlight: Previous studies focused on predicting affective state from speech; this study explores various tasks on affective vocal bursts. |
B. T. Atmaja; A. Sasou; |
897 | Evaluation of Categorical Generative Models – Bridging The Gap Between Real and Synthetic Data. Highlight: We focus on categorical data and introduce an appropriately scalable evaluation method. |
F. Regol; A. Kroon; M. Coates; |
898 | Event-Based Visual Microphone. Highlight: We propose event-based visual microphone (EBVM), a passive electro-optical technique for capturing audio signals remotely using an event camera. |
M. Howard; K. Hirakawa; |
899 | Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech. Highlight: Although previous analyses have extensively covered acoustic, phonetic, and semantic perspectives, physical grounding in speech production has not yet received full attention. To bridge this gap, we conduct a comprehensive analysis to link speech representations to articulatory trajectories measured by electromagnetic articulography (EMA). |
C. J. Cho; P. Wu; A. Mohamed; G. K. Anumanchipalli; |
900 | EvoPose: A Recursive Transformer for 3D Human Pose Estimation with Kinematic Structure Priors. Highlight: In this paper, we propose a novel transformer-based model, EvoPose, to effectively incorporate human body prior knowledge into 3D human pose estimation. |
Y. Zhang; Y. Lu; B. Liu; Z. Zhao; Q. Chu; N. Yu; |
901 | Expectation Propagation on Factor Graphs Based on Matrix Decomposition. Highlight: In this paper, we investigate new factor graph representations induced by the use of the Golub-Kahan bi-diagonal Decomposition (GKD) and of the Singular Value Decomposition (SVD). |
A. Mekhiche; A. M. Cipriano; C. Poulliat; |
902 | Explainable Audio Classification of Playing Techniques with Layer-wise Relevance Propagation. Highlight: In this article, we propose a data-driven approach to explain audio classification in terms of physical attributes in sound production. |
C. Wang; V. Lostanlen; M. Lagrange; |
903 | Explanations for Automatic Speech Recognition. Highlight: We address quality assessment for neural network based ASR by providing explanations that help increase our understanding of the system and ultimately help build trust in the system. |
X. Wu; P. Bell; A. Rajan; |
904 | Explicit and Implicit Knowledge Distillation Via Unlabeled Data. Highlight: Experimental results show that our method can quickly converge and obtain higher accuracy than other state-of-the-art methods. |
Y. Wang; Z. Ge; Z. Chen; X. Liu; C. Ma; Y. Sun; L. Qi; |
905 | Explicit Ziv-Zakai Bound For Multiple Sources DOA Estimation. Abstract: In direction-of-arrival (DOA) estimation, the Cramér-Rao bound is widely used to lower-bound the mean square error (MSE); however, it is a local bound. As a global bound, existing … |
Z. Zhang; Y. Gu; Z. Shi; |
906 | Exploiting 3D Human Recovery for Action Recognition with Spatio-Temporal Bifurcation Fusion. Highlight: In this paper, we propose a novel action recognition method with 3D human recovery and spatio-temporal bifurcation fusion. |
N. Jiang; W. Quan; Q. Geng; Z. Shi; P. Xu; |
907 | Exploiting CCTV Cameras for Hand Hygiene Recognition in ICU. Highlight: In this paper, we created a clinical dataset using CCTV cameras installed in an ICU to explore the feasibility of recognizing the hand-washing steps of clinicians. |
W. Huang; J. Huang; G. Wang; H. Lu; M. He; W. Wang; |
908 | Exploiting Interactivity and Heterogeneity for Sleep Stage Classification Via Heterogeneous Graph Neural Network. Highlight: However, those methods neglect the significance of simultaneously capturing the interactivity and heterogeneity of physiological signals. In this paper, we propose a novel Sleep Heterogeneous Graph Neural Network (SleepHGNN) to exploit these essential features. |
Z. Jia; Y. Lin; Y. Zhou; X. Cai; P. Zheng; Q. Li; J. Wang; |
909 | Exploiting Modality-Invariant Feature for Robust Multimodal Emotion Recognition with Missing Modalities. Highlight: In studies that predict missing data across modalities, the inherent difference between heterogeneous modalities, namely the modality gap, presents a challenge. |
H. Zuo; R. Liu; J. Zhao; G. Gao; H. Li; |
910 | Exploiting Multi-Decision and Deep Refinement for Ultrasound Image Segmentation. Highlight: In this paper, we propose a novel convolutional neural network (MDR-Net) for ultrasound image segmentation by exploiting multi-decision and deep refinement of the target. |
W. Liu; X. Li; K. Hu; X. Gao; |
911 | Exploiting One-Class Classification Optimization Objectives for Increasing Adversarial Robustness. Highlight: We attribute their well-known lack of robustness to the geometric properties of the deep neural network embedding space, derived from standard optimization options, which allow minor changes in the intermediate activation values to trigger dramatic changes to the decision values in the final layer. To counteract this effect, we explore optimization criteria that supervise the distribution of the intermediate embedding spaces on a class-specific basis, by introducing and leveraging one-class classification objectives. |
V. Mygdalis; I. Pitas; |
912 | Exploiting PRNU and Linear Patterns in Forensic Camera Attribution Under Complex Lens Distortion Correction. Highlight: In this paper, we show that the two main existing techniques, namely PRNU (Photo Response Non-Uniformity)-based and linear-pattern-based, can be successfully combined to improve performance. |
A. Montibeller; F. Pérez-González; |
913 | Exploiting Prompt Learning with Pre-Trained Language Models for Alzheimer’s Disease Detection. Highlight: To this end, this paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function. |
Y. Wang; J. Deng; T. Wang; B. Zheng; S. Hu; X. Liu; H. Meng; |
914 | Exploiting Semantic Attributes for Transductive Zero-Shot Learning. Highlight: However, they neglect the semantic information in the unlabeled unseen data and thus fail to generate high-fidelity attribute-consistent unseen features. To address this issue, we present a novel transductive ZSL method that produces semantic attributes of the unseen data and imposes them on the generative process. |
Z. Wang; J. Liang; Z. Wang; T. Tan; |
915 | Exploiting Sparse Recovery Algorithms for Semi-Supervised Training of Deep Neural Networks for Direction-of-Arrival Estimation. Highlight: This paper proposes a semi-supervised training approach for direction-of-arrival (DoA) estimation based on a convolutional neural network (CNN). |
M. Ali; A. A. Nugraha; K. Nathwani; |
916 | Exploiting Spatial Information with The Informed Complex-Valued Spatial Autoencoder for Target Speaker Extraction. Highlight: In contrast, it has been shown that for neural spatial filtering a joint approach of spectro-spatial filtering is more beneficial. In this contribution, we investigate the spatial filtering performed by such a time-varying spectro-spatial filter. |
A. Briegleb; M. M. Halimeh; W. Kellermann; |
917 | Exploiting Speaker Embeddings for Improved Microphone Clustering and Speech Separation in Ad-hoc Microphone Arrays. Highlight: We propose features generated by the state-of-the-art ECAPA-TDNN speaker verification model for microphone clustering. |
S. Kindt; J. Thienpondt; N. Madhu; |
918 | Exploiting Virtual Array Diversity for Accurate Radar Detection. Highlight: This paper introduces Radatron++, a system that leverages cascaded MIMO (Multiple-Input Multiple-Output) radar to achieve accurate vehicle detection for self-driving cars. |
J. Guan; S. Madani; W. Ahmed; S. Hussein; S. Gupta; H. Hassanieh; |
919 | Exploration Into Translation-Equivariant Image Quantization. Highlight: Instead of focusing on anti-aliasing, we propose a simple yet effective way to achieve translation-equivariant image quantization by enforcing orthogonality among the codebook embeddings. |
W. Shin; G. Lee; J. Lee; E. Lyou; J. Lee; E. Choi; |
920 | Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models. Highlight: However, since the two settings have generally been studied individually, there has been little research on how effective a cross-lingual model is in comparison with a monolingual model. In this paper, we investigate this fundamental question empirically with Japanese automatic speech recognition (ASR) tasks. |
T. Ashihara; T. Moriya; K. Matsuura; T. Tanaka; |
921 | Exploring Approaches to Multi-Task Automatic Synthesizer Programming. Highlight: In this paper, we expand the current literature by exploring approaches to automatic synthesizer programming for multiple virtual instruments. |
D. Faronbi; I. Roman; J. P. Bello; |
922 | Exploring Attention Mechanisms for Multimodal Emotion Recognition in An Emergency Call Center Corpus. Highlight: The experiments conducted in this paper use the CEMO corpus, which was collected in a French emergency call center. |
T. Deschamps-Berger; L. Lamel; L. Devillers; |
923 | Exploring Binary Classification Loss for Speaker Verification. Highlight: In this work, we introduce the SphereFace2 framework, which uses several binary classifiers to train the speaker model in a pair-wise manner instead of performing multi-class classification. |
B. Han; Z. Chen; Y. Qian; |
924 | Exploring Complementary Features in Multi-Modal Speech Emotion Recognition. Highlight: The primary challenge is how to effectively extract the complementary emotional information implied in the pre-trained features of the respective modality. To tackle this challenge, we propose a novel modality-sensitive multimodal speech emotion recognition framework. |
S. Wang; Y. Ma; Y. Ding; |
925 | Exploring Instance Relation for Decentralized Multi-Source Domain Adaptation. Highlight: To reduce the domain shift between the decentralized source domains and the target domain, we propose an instance relation consistency method for decentralized multi-source domain adaptation. |
Y. Wei; Y. Han; |
926 | Exploring Language-Agnostic Speech Representations Using Domain Knowledge for Detecting Alzheimer’s Dementia. Highlight: In particular, we describe our approach to the ICASSP 2023 Signal Processing Grand Challenge, which involves extrapolating models learned from English speech samples to Greek speech samples to determine which subjects have AD. |
Z. Shah; S. -A. Qi; F. Wang; M. Farrokh; M. Tasnim; E. Stroulia; R. Greiner; M. Plitsis; A. Katsamanis; |
927 | Exploring Progressive Hybrid-Degraded Image Processing for Homography Estimation. Highlight: Recent studies have shown that machine models do not agree with human perception of image quality, so mainstream image enhancement methods are not always compatible with downstream tasks. To ameliorate this issue, this paper targets homography estimation, which is a fundamental step in image interpretation, to explore a hybrid-degraded image enhancement approach. |
Y. Lin; X. Su; F. Wu; J. Zhao; |
928 | Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition. Highlight: This paper explores a series of approaches to integrate domain adapted Self-Supervised Learning (SSL) pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and domain adapted wav2vec2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec2.0 features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain adapted wav2vec2.0 models. |
S. Hu; X. Xie; Z. Jin; M. Geng; Y. Wang; M. Cui; J. Deng; X. Liu; H. Meng; |
929 | Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting. Highlight: In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. |
B. Labrador; G. Zhao; I. L. Moreno; A. Scorza Scarpati; L. Fowl; Q. Wang; |
930 | Exploring Subgroup Performance in End-to-End Speech Models. Highlight: End-to-End Spoken Language Understanding models are generally evaluated according to their overall accuracy, or separately on (a priori defined) data subgroups of interest. We propose a technique for analyzing model performance at the subgroup level, which considers all subgroups that can be defined via a given set of metadata and are above a specified minimum size. |
A. Koudounas; E. Pastor; G. Attanasio; V. Mazzia; M. Giollo; T. Gueudre; L. Cagliero; L. de Alfaro; E. Baralis; D. Amberti; |
931 | Exploring The Role of Fricatives in Classifying Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis and Parkinson’s Disease. Highlight: This paper examines three sustained voiceless fricatives (/s/, /sh/ and /f/), compared with three sustained vowels (/a/, /i/ and /o/), for classifying patients with ALS/PD and Healthy Controls (HC). |
T. Bhattacharjee; Y. Belur; A. Nalini; R. Yadav; P. K. Ghosh; |
932 | Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features. Highlight: This paper proposes Slingua, a dataset of over 3200 hours for singing language identification. |
X. Wang; H. Wu; C. Ding; C. Huang; M. Li; |
933 | Exploring Vision Transformer Layer Choosing for Semantic Segmentation. Highlight: In this paper, unlike previous encoder and decoder work, we design a neck network, called ViT-Controller, for adaptive fusion and feature selection. |
F. Lin; Y. Ma; S. Tian; |
934 | Exploring Wav2vec 2.0 Fine Tuning for Improved Speech Emotion Recognition. Highlight: We show that V-FT is able to outperform state-of-the-art models on the IEMOCAP dataset. |
L. -W. Chen; A. Rudnicky; |
935 | Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features. Highlight: Current approaches struggle with the balance between speaker similarity, intelligibility, and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that leverages advantages from both the neural bottleneck feature (BNF) approach and the information perturbation approach. |
Z. Ning; Q. Xie; P. Zhu; Z. Wang; L. Xue; J. Yao; L. Xie; M. Bi; |
936 | Extended Expectation Maximization for Under-Fitted Models. Highlight: In this paper, we generalize the well-known Expectation Maximization (EM) algorithm using the α-divergence for Gaussian Mixture Models (GMMs). |
A. M. Rekavandi; A. -K. Seghouane; F. Boussaid; M. Bennamoun; |
937 | Extended Kalman Filter for Graph Signals in Nonlinear Dynamic Systems. Highlight: The Extended Kalman filter (EKF) is a suitable estimator for such dynamics, but its implementation tends to be complex and possibly unstable when tracking high-dimensional graph signals. To tackle this, we propose the graph signal processing (GSP)-EKF, which replaces the Kalman gain in the EKF with a graph filter that aims to minimize the computed prediction error. |
G. Sagi; N. Shlezinger; T. Routtenberg; |
938 | Extracting The Brain-Like Representation By An Improved Self-Organizing Map for Image Classification. Highlight: This paper proposes an improved SOM with multi-winner, multi-code, and local receptive field, named mlSOM. |
J. Zhang; L. Cao; M. Zhang; W. Fu; |
939 | Extreme Audio Time Stretching Using Neural Synthesis. Highlight: A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. |
L. Fierro; A. Wright; V. Välimäki; M. Hämäläinen; |
940 | F0 Estimation From Telephone Speech Using Deep Feature Loss. Highlight: In this work, a method is proposed to synthesize the EGG signal from telephone speech using a deep feature loss network; the pitch contour is then derived from the synthesized EGG (SEGG) signal. |
S. M. Shetty; S. Revankar; N. C. Iyer; K. T. Deepak; |
941 | Face Recognition on Point Cloud with Cgan-Top for Denoising. Highlight: In this paper, an end-to-end 3D face recognition method for noisy point clouds is proposed, which synergistically integrates the denoising and recognition modules. |
J. Liu; J. Ren; H. Sun; X. Jiang; |
942 | Facial Texure Perceiver: Towards High-Fidelity Facial Texture Recovery with Input-Level Inductive Biased Perceiver IO. Highlight: This paper presents a new method, called Facial Texture Perceiver. |
S. Lee; |
943 | Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR. Highlight: In order to factorize out the language component in the AED model, we propose the factorized attention-based encoder-decoder (Factorized AED) model whose decoder takes as input the posterior probabilities of a jointly trained LM. |
X. Gong; W. Wang; H. Shao; X. Chen; Y. Qian; |
944 | Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers. Highlight: We propose to utilize HAT-style joiner factorization for the purpose of skipping the more expensive non-blank computation when the blank probability exceeds a certain threshold. |
D. Le; F. Seide; Y. Wang; Y. Li; K. Schubert; O. Kalinli; M. L. Seltzer; |
945 | Factorized Projection-Domain Spatio-Temporal Regularization for Dynamic Tomography. Highlight: In this paper, we propose an object-domain recovery algorithm using a variational formulation that combines a partially separable spatio-temporal prior with a basic total-variation spatial regularization for improved performance, while preserving full interpretability. |
B. Iskender; M. L. Klasky; B. M. Patterson; Y. Bresler; |
946 | False Alarm Regulation for Off-Grid Target Detection With The Matched Filter. Highlight: In this article, an asymptotic PFA-threshold relationship for the popular Matched Filter is derived in the off-grid case under the complex white Gaussian noise hypothesis using expected Euler characteristics. |
P. Develter; J. Bosse; O. Rabaste; P. Forster; J. -P. Ovarlez;
947 | FAN-Net: Fourier-Based Adaptive Normalization for Cross-Domain Stroke Lesion Segmentation. Highlight: Thus, we propose a novel FAN-Net, a U-Net–based segmentation network incorporating a Fourier-based adaptive normalization (FAN) and a domain classifier with a gradient reversal layer. |
W. Yu; Y. Lei; H. Shan; |
948 | FAPM: Fast Adaptive Patch Memory for Real-Time Industrial Anomaly Detection. Highlight: However, some methods do not meet the speed requirements of real-time inference, which is crucial for real-world applications. To address this issue, we propose a new method called Fast Adaptive Patch Memory (FAPM) for real-time industrial anomaly detection. |
D. Kim; C. Park; S. Cho; S. Lee; |
949 | Fast 3D Human Pose Estimation Using RF Signals. Highlight: In this paper, we introduce a lightweight RF-based 3D human pose estimation model, called Fast RFPose, to enable real-time human pose estimation. |
C. Yu; D. Zhang; Z. Wu; C. Xie; Z. Lu; Y. Hu; Y. Chen; |
950 | Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models. Highlight: We propose several methods to improve the performance of the factorized neural transducer (FNT) model. |
R. Zhao; J. Xue; P. Parthasarathy; V. Miljanic; J. Li; |
951 | Fast and Efficient Speech Enhancement with Variational Autoencoders. Highlight: In this paper, we propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors. |
M. Sadeghi; R. Serizel; |
952 | Fast and Exact Enumeration of Deep Networks Partitions Regions. Highlight: In this paper, we provide the first parallel algorithm that exactly enumerates the partition regions of a deep network (DN). |
R. Balestriero; Y. LeCun; |
953 | Fast and Parallel Decoding for Transducer. Highlight: The experimental results show a slight word error rate (WER) improvement as well as a significant speedup in decoding. |
W. Kang; L. Guo; F. Kuang; L. Lin; M. Luo; Z. Yao; X. Yang; P. Żelasko; D. Povey; |
954 | Fast Convolution Algorithm for Real-Valued Finite Length Sequences. Highlight: The arithmetic analysis reveals that the algorithm efficiently reduces the operation counts. |
W. Wang; V. DeBrunner; L. S. DeBrunner; |
955 | Fast Cross-Correlation for TDoA Estimation on Small Aperture Microphone Arrays. Highlight: This paper introduces the Fast Cross-Correlation (FCC) method for Time Difference of Arrival (TDoA) estimation for pairs of microphones on a small aperture microphone array. |
F. Grondin; M. -A. Maheux; J. -S. Lauzon; J. Vincent; F. Michaud; |
956 | Faster Than Fast: Accelerating The Griffin-Lim Algorithm. Highlight: In this paper, we present an accelerated version of the well-known Fast Griffin-Lim algorithm (FGLA) for the phase retrieval problem in a general setting. |
R. Nenov; D. -K. Nguyen; P. Balazs; |
957 | Fast Low-Latency Convolution By Low-Rank Tensor Approximation. Highlight: In this paper, we consider fast time-domain convolution, exploiting low-rank properties of an impulse response (IR). |
M. Jälmby; F. Elvander; T. van Waterschoot; |
958 | Fast Multiscale 3D Reconstruction Using Single-Photon Lidar Data. Highlight: This paper presents a reconstruction algorithm that exploits data statistics and multi-scale information to deliver clean depth and reflectivity images together with associated uncertainty maps. |
S. Plosz; I. Gyongy; J. Leach; S. McLaughlin; G. S. Buller; A. Halimi; |
959 | Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis. Highlight: We address the problem of separating moving sources using online independent vector analysis (IVA). |
T. Nakashima; R. Ikeshita; N. Ono; S. Araki; T. Nakatani; |
960 | Fast Robust Principle Component Analysis Using Gauss-Newton Iterations. Highlight: In this work, we propose a computation protocol that leverages Gauss-Newton iterations to speed up the sequential computation of SVDs and accelerate the entire RPCA process. |
W. Chettleburgh; Z. Huang; M. Yan; |
961 | Fast Single-Person 2D Human Pose Estimation Using Multi-Task Convolutional Neural Networks. Highlight: This paper presents a novel neural module for enhancing existing fast and lightweight 2D human pose estimation CNNs, in order to increase their accuracy. |
C. Papaioannidis; I. Mademlis; I. Pitas; |
962 | Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames. Highlight: In this paper, we present fast-U2++, an enhanced version of U2++ that further reduces partial latency. |
C. Liang; X. -L. Zhang; B. Zhang; D. Wu; S. Li; X. Song; Z. Peng; F. Pan; |
963 | Fast Yet Effective Speech Emotion Recognition with Self-Distillation. Highlight: In this paper, we argue that achieving a fast yet effective SER is possible with self-distillation, a method of simultaneously fine-tuning a pretrained model and training shallower versions of itself. |
Z. Ren; T. T. Nguyen; Y. Chang; B. W. Schuller; |
964 | FCIR: Rethink Aerial Image Super Resolution with Fourier Analysis. Highlight: In this paper, we rethink the aerial image super-resolution (AISR) task from the perspective of Fourier analysis. |
Y. Zhang; P. Zheng; J. Jiang; P. Xiao; X. Gao; |
965 | Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification. Highlight: In this work, we propose feature-rich audio model inversion (FRAMI), a data-free knowledge distillation framework for general sound classification tasks. |
Z. Kang; Y. He; J. Wang; J. Peng; X. Qu; J. Xiao; |
966 | Feature Selection and Text Embedding for Detecting Dementia from Spontaneous Cantonese. Highlight: We investigate Transformer-based features and propose an end-to-end system for dementia detection. |
X. Ke; M. -W. Mak; H. M. Meng; |
967 | Feature Space Recovery for Incomplete Multi-View Clustering. Highlight: In this paper, we propose a feature space recovery based IMVC method, in which low-rank feature space recovery and consensus representation learning of inter/intra-views are integrated into a unified framework. |
Z. Long; C. Zhu; P. Comon; Y. Liu; |
968 | FED-3DA: A Dynamic and Personalized Federated Learning Framework. Highlight: This paper proposes a novel approach, named Fed-3DA, that reduces the impact of the dynamic distribution on the local model based on meta-learning and distribution distance measurement. |
H. Wang; J. Sun; T. Wo; X. Liu; |
969 | FedAudio: A Federated Learning Benchmark for Audio Tasks. Highlight: While a number of FL benchmarks have been developed to facilitate FL research, none of them include audio data and audio-related tasks. In this paper, we fill this critical gap by introducing a new FL benchmark for audio tasks which we refer to as FedAudio. |
T. Zhang; T. Feng; S. Alam; S. Lee; M. Zhang; S. S. Narayanan; S. Avestimehr; |
970 | FedEEG: Federated EEG Decoding Via Inter-Subject Structure Matching Highlight: However, sending each individual’s EEG data directly to a centralized server might cause privacy leakage. To overcome this issue, we present an inter-subject structure matching-based federated EEG decoding (FedEEG) framework. |
W. Hang; J. Li; S. Liang; Y. Wu; B. Lei; J. Qin; Y. Zhang; K. -S. Choi; |
971 | Federated Intelligent Terminals Facilitate Stuttering Monitoring Highlight: To this end, we propose federated intelligent terminals for automatic monitoring of stuttering speech in different contexts. |
Y. Yu; W. Qiu; C. Quan; K. Qian; Z. Wang; Y. Ma; B. Hu; B. W. Schuller; Y. Yamamoto; |
972 | Federated Learning for ASR Based on Wav2vec 2.0 Highlight: This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self-supervision. |
T. Nguyen; S. Mdhaffar; N. Tomashenko; J. -F. Bonastre; Y. Estève; |
973 | Federated Self-Learning with Weak Supervision for Speech Recognition Highlight: We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine transcriptions from a stronger ASR model. |
M. Rao; G. Chennupati; G. Tiwari; A. Kumar Sahu; A. Raju; A. Rastrow; J. Droppo; |
974 | Federated Semi-Supervised Learning for Object Detection in Autonomous Driving Highlight: In this paper, we propose a unified semi-supervised and federated learning (FL) approach that is designed to offer cost-efficient and practical training of deep learning object detection models for autonomous driving. |
F. Chi; Y. Wang; P. Nasiopoulos; V. C. M. Leung; M. T. Pourazad; |
975 | FedPrompt: Communication-Efficient and Privacy-Preserving Prompt Tuning in Federated Learning Highlight: Prompt tuning, which tunes soft prompts without modifying pre-trained language models (PLMs), has achieved excellent performance as a new learning paradigm. In this paper, we combine prompt tuning with federated learning (FL) and explore its effect under FL. |
H. Zhao; W. Du; F. Li; P. Li; G. Liu; |
976 | FedRPO: Federated Relaxed Pareto Optimization for Acoustic Event Classification Highlight: In this work, we propose a novel Federated Relaxed Pareto Optimization (FedRPO) method for semi-supervised federated learning (FL) with heterogeneous client data. |
M. Feng; C. -C. Kao; Q. Tang; A. Solomon; V. Rozgic; C. Wang; |
977 | FedSD: A New Federated Learning Structure Used in Non-iid Data Highlight: We propose FedSD, a novel structure that accelerates model convergence. |
M. Yi; H. Ning; P. Liu; |
978 | FedVMR: A New Federated Learning Method for Video Moment Retrieval Highlight: However, in real-world applications, due to the inherent nature of data generation and privacy concerns, data are often distributed across different silos, bringing huge challenges to effective large-scale training. In this work, we try to overcome this limitation by leveraging the recent success of federated learning. |
Y. Wang; X. Luo; Z. -D. Chen; P. -F. Zhang; M. Liu; X. -S. Xu; |
979 | Few But Informative Local Hash Code Matching for Image Retrieval Highlight: In this research study, we propose an expressive local feature extraction pipeline and a many-to-many local feature matching method for large-scale CBIR. |
Z. Hu; A. G. Bors; |
980 | Few-Shot Continual Learning with Weight Alignment and Positive Enhancement for Bioacoustic Event Detection Highlight: In this paper, we propose a new continual learning framework for few-shot bioacoustic event detection (BED). |
X. Wu; D. Xu; H. Wei; Y. Long; |
981 | FFedCL: Fair Federated Learning with Contrastive Learning Highlight: To this end, we propose a real-time fairness adjustment algorithm for the global model based on model-level contrastive learning, called FFedCL. |
X. Shi; L. Yi; X. Liu; G. Wang; |
982 | FFFN: Fashion Feature Fusion Network By Co-Attention Model for Fashion Recommendation Highlight: To effectively exploit both high-level and low-level image features, we propose a Fashion Feature Fusion Network (FFFN) for complementary fashion recommendation, which extracts features of different dimensions from the neural network and combines them into a fused feature. |
Z. Lin; X. Zhang; |
983 | Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss Highlight: Previous studies and our preliminary experiments reveal that the biggest challenge in filler word detection is that fillers can easily be confused with other hard categories such as “a” or “I”. In this paper, we propose a novel filler word detection method that effectively addresses this challenge by adding auxiliary categories dynamically and applying an additional inter-category focal loss. |
Z. Zhao; L. Wu; C. Tang; D. Yin; Y. Zhao; C. Luo; |
984 | Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting Highlight: In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for keyword spotting (KWS) when the number of filterbank channels is severely reduced. |
I. López-Espejo; R. C. M. C. Shekar; Z. -H. Tan; J. Jensen; J. H. L. Hansen; |
985 | Filter Pruning Via Filters Similarity in Consecutive Layers Highlight: In this paper, we propose a novel pruning method that explicitly leverages the Filters Similarity in Consecutive Layers (FSCL). |
X. Wang; J. Wang; X. Tang; P. Gao; R. Fang; G. Xie; |
986 | FindAdaptNet: Find and Insert Adapters By Learned Layer Importance Highlight: In this paper, we propose a technique that achieves automatic insertion of adapters for downstream automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. |
J. Huang; K. Ganesan; S. Maiti; Y. Min Kim; X. Chang; P. Liang; S. Watanabe; |
987 | Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers Highlight: This work proposes an analytical framework that optimizes the numerical format of each matrix multiplication in vision transformers (ViTs) for mixed-format sub-8-bit quantization. |
J. Lee; Y. Hwang; J. Choi; |
988 | Fine-Grained Blind Face Inpainting with 3D Face Component Disentanglement Highlight: In this paper, we propose a novel fine-grained blind face inpainting framework that combines 3D face component disentanglement with a generative network. |
Y. Bai; R. He; W. Tan; B. Yan; Y. Lin; |
989 | Fine-Grained Emotional Control of Text-to-Speech: Learning to Rank Inter- and Intra-Class Emotion Intensities Highlight: In this paper, we propose a fine-grained controllable emotional TTS model that considers both inter- and intra-class distances and is able to synthesize speech with recognizable intensity differences. |
S. Wang; J. Guðnason; D. Borth; |
990 | Fine-Grained Private Knowledge Distillation Highlight: To attain fine-grained privacy accounting and improve utility, this work proposes a model-free reverse k-NN labeling method for record-level private knowledge distillation, where each private record is employed for labeling at most k queries. |
Y. Li; S. Wang; Y. Wang; J. Li; Y. Qian; B. Xin; W. Yang; |
991 | Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding Highlight: Since these are E2E models operating on speech directly, there remains potential to improve their performance using purely text-based models like BERT, which have strong language understanding capabilities. In this paper, we propose a new training criterion for RNN-T based E2E ASR and SLU to transfer BERT’s knowledge into these systems. |
V. Sunder; S. Thomas; H. -K. J. Kuo; B. Kingsbury; E. Fosler-Lussier; |
992 | Finer-Grained Decomposition for Parallel Quantum MIMO Processing Highlight: In this paper, we discuss the parallelization of quantum MIMO processing and investigate a spin-level preprocessing method for relatively finer-grained decomposition that can support more flexible parallel quantum signal processing, compared to the recently reported symbol-level decomposition method. |
M. Kim; K. Jamieson; |
993 | Fixed-Point Quantization Aware Training for On-Device Keyword-Spotting Highlight: We propose a novel method to train and obtain fixed-point (FXP) convolutional keyword-spotting (KWS) models. |
S. Macha; O. Oza; A. Escott; F. Calivá; R. Armitano; S. K. Cheekatmalla; S. Hari Krishnan Parthasarathi; Y. Liu; |
994 | Flexible Beam Design for Vital Sign Monitoring Using A Phased Array Equipped With Double-Phase Shifters Highlight: Recognizing the low-cost advantage of phased arrays, we investigate the improvement of the beamforming capability of a phased array via the use of double-phase shifters (DPS), i.e., each antenna is fed with the sum of the outputs of two phase shifters. |
Z. Xu; D. Gao; S. Li; C. -T. M. Wu; A. Petropulu; |
995 | FlowGrad: Using Motion for Visual Sound Source Localization Highlight: This work introduces temporal context into the state-of-the-art methods for sound source localization in urban scenes using optical flow to encode motion information. |
R. Singh; P. Zinemanas; X. Serra; J. P. Bello; M. Fuentes; |
996 | Flow-Guided Deformable Alignment Network with Self-Supervision for Video Inpainting Highlight: In this paper, we propose a self-supervised Flow-Guided Deformable Alignment (FGDA) network for aligning reference frames at the feature level. |
Z. Wu; K. Zhang; C. Sun; H. Xuan; Y. Yan; |
997 | FlowPose: Conditional Normalizing Flows for 3D Human Pose and Shape Estimation from Monocular Videos Highlight: Most existing methods model human motion by learning a deterministic mapping from the input videos to the human body parameters, while the uncertainties such as occlusions and depth ambiguities are ignored. To address this problem, we propose a probabilistic model based on conditional normalizing flows called FlowPose to learn the distribution of feasible 3D human motion. |
Y. Du; Z. Zhang; Z. Li; P. Wei; Q. Liao; W. Yang; |
998 | FlowReg: Latent Space Regularization Using Normalizing Flow For Limited Samples Learning Highlight: In this paper, we propose FlowReg, a new learnable latent space regularization for limited sample problems. |
C. Wang; J. Gao; Y. Hua; H. Wang; |
999 | Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity Via Integrated Full- and Sub-band Modeling Highlight: We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. |
Z. -Q. Wang; S. Cornell; S. Choi; Y. Lee; B. -Y. Kim; S. Watanabe; |
1000 | Focusing on Targets for Improving Weakly Supervised Visual Grounding Highlight: In this paper, we propose two simple but efficient methods for improving this approach. |
V. -Q. Pham; N. Mishima; |
1001 | Forecasting of Breathing Events from Speech for Respiratory Support Highlight: In this paper, we provide the first evidence that it is possible to forecast the next inhale moment from the speech of the talker using deep learning techniques. |
A. Härmä; U. Großekathöfer; O. Ouweltjes; V. S. Nallanthighal; |
1002 | Forensics for Adversarial Machine Learning Through Attack Mapping Identification Highlight: We present an attack mapping identification method that utilizes a pre-attack example recovery mechanism as a feature extraction method. |
A. Yan; J. Kim; R. Raich; |
1003 | F-PABEE: Flexible-Patience-Based Early Exiting For Single-Label and Multi-Label Text Classification Tasks Highlight: Computational complexity and overthinking problems have become the bottlenecks for pre-trained language models (PLMs) with millions or even trillions of parameters. A Flexible-Patience-Based Early Exiting method (F-PABEE) has been proposed to alleviate these problems for single-label classification (SLC) and multi-label classification (MLC) tasks. |
X. Gao; W. Zhu; J. Gao; C. Yin; |
1004 | Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism Highlight: In this paper, we formulate playing technique detection as a frame-level multi-label classification problem and apply it to the Guzheng, a Chinese plucked string instrument. |
D. Li; M. Che; W. Meng; Y. Wu; Y. Yu; F. Xia; W. Li; |
1005 | Frame-Wise and Overlap-Robust Speaker Embeddings for Meeting Diarization Highlight: Using a teacher-student training approach, we developed a speaker embedding extraction system that outputs embeddings at frame rate. |
T. Cord-Landwehr; C. Boeddeker; C. Zorilă; R. Doddipatla; R. Haeb-Umbach; |
1006 | Framewise Multiple Sound Source Localization and Counting Using Binaural Spatial Audio Signals Highlight: In this paper, we propose a binaural multiple sound source localization network (BMSSLnet) model, which can predict framewise azimuths in a binaural audio signal without prior knowledge of the number of sound sources. |
L. Wang; Z. Jiao; Q. Zhao; J. Zhu; Y. Fu; |
1007 | Framewise WaveGAN: High Speed Adversarial Vocoder In Time Domain With Very Low Computational Complexity Highlight: In this work, we propose a new architecture for GAN vocoders that mainly depends on recurrent and fully-connected networks to directly generate the time domain signal in a framewise manner. |
A. Mustafa; J. -M. Valin; J. Büthe; P. Smaragdis; M. Goodwin; |
1008 | FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion Highlight: In this paper, we adopt the end-to-end framework of VITS for high-quality waveform reconstruction, and propose strategies for clean content information extraction without text annotation. |
J. Li; W. Tu; L. Xiao; |
1009 | Free-View Expressive Talking Head Video Editing Highlight: We present a novel framework for talking head video editing, allowing users to freely edit head pose, emotion, and eye blink while maintaining audio-visual synchronization. |
Y. Huang; S. Iizuka; K. Fukui; |
1010 | Frequency and Scale Perspectives of Feature Extraction Highlight: In this paper, by analyzing the sensitivity of neural networks to frequencies and scales, we find that neural networks not only have low- and medium-frequency biases but also prefer different frequency bands for different classes, and that the scale of objects influences the preferred frequency bands. |
L. Zhang; Y. Luo; X. Cao; H. Shen; T. Wang; |
1011 | Frequency-Aware Attentional Feature Fusion for Deepfake Detection Highlight: Although recent works achieve significant performance in deepfake detection, they still suffer from overfitting. To deal with this problem, we propose a novel framework that aggregates diverse information from both the RGB and frequency domains for deepfake detection. |
C. Tian; Z. Luo; G. Shi; S. Li; |
1012 | Frequency Bin-Wise Single Channel Speech Presence Probability Estimation Using Multiple DNNs Highlight: In this work, we propose a frequency bin-wise method to estimate the single-channel speech presence probability (SPP) with multiple deep neural networks (DNNs) in the short-time Fourier transform domain. |
S. Tao; H. Reddy; J. R. Jensen; M. G. Christensen; |
1013 | Frequency Reciprocal Action and Fusion for Single Image Super-Resolution Highlight: To address the limitations, we propose a novel Frequency Reciprocal Action and Fusion Network (FRAF) that explores various frequency correlations and differences. |
S. Dong; F. Lu; C. Yuan; |
1014 | Frequency-Selective Hybrid Beamforming For mmWave Full-Duplex Highlight: We study beamforming cancellation for a full-duplex (FD) wideband millimeter wave (mmWave) point-to-point bidirectional link in which both multicarrier-based nodes transmit and receive simultaneously and on the same frequency. |
A. Guamo-Morocho; R. López-Valcarce; |
1015 | FretNet: Continuous-Valued Pitch Contour Streaming For Polyphonic Guitar Tablature Transcription Highlight: Contemporary approaches to automatic music transcription (AMT) do not adequately address pitch modulation, and reduce quantization only at the expense of greater model complexity. In this paper, we present a guitar tablature transcription (GTT) formulation that estimates continuous-valued pitch contours, grouping them according to their string and fret of origin. |
F. Cwitkowitz; T. Hirvonen; A. Klapuri; |
1016 | From Easy to Hard: Two-Stage Selector and Reader for Multi-Hop Question Answering Highlight: Existing works commonly introduce techniques such as graph modeling and question decomposition to explore precise intermediate results of multi-hop reasoning, leading to complexity growth and error accumulation. In this paper, we propose FE2H, a simple yet effective framework without extra tasks to address these problems. |
X. -Y. Li; W. -J. Lei; Y. -B. Yang; |
1017 | From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition Highlight: In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. |
C. -H. H. Yang; B. Li; Y. Zhang; N. Chen; R. Prabhavalkar; T. N. Sainath; T. Strohman; |
1018 | Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition Highlight: Keeping the front-end consistent with the one used in pre-training may cause problems when the optimal front-end differs from it. In this paper, we propose a simple but effective front-end adapter to address this front-end discrepancy. |
X. Chen; Z. Ma; C. Tang; Y. Wang; Z. Zheng; |
1019 | Full-Band General Audio Synthesis with Score-Based Diffusion Highlight: In this work, we propose a diffusion-based generative model for general audio synthesis, named DAG, which deals with full-band signals end-to-end in the waveform domain. |
S. Pascual; G. Bhattacharya; C. Yeh; J. Pons; J. Serrà; |
1020 | Fully Complex-Valued Deep Learning Model for Visual Perception Highlight: This paper proposes that operating entirely in the complex domain increases the overall performance of complex-valued models. |
A. Sikdar; S. Udupa; S. Sundaram; |
1021 | Fully Distributed Federated Learning with Efficient Local Cooperations Highlight: In this work, we propose a communication-efficient, fully distributed, diffusion-based learning algorithm that does not require a parameter server and propose an adaptive combination rule for the cooperation of the devices. |
E. Georgatos; C. Mavrokefalidis; K. Berberidis; |
1022 | Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model Highlight: In this paper, we propose to combine self-supervised representation learning (SSRL) methods as a component of spoken term discovery and probabilistic topic models. |
T. Maekaku; Y. Fujita; X. Chang; S. Watanabe; |
1023 | G2CNN: Geometric Prior Based GCNN for Single-View 3D Reconstruction with Loop Subdivision Highlight: In this paper, we propose a geometric prior based graph convolutional neural network model (named G2CNN) for single-view 3D reconstruction with Loop subdivision. |
K. Cao; N. Qi; W. Xu; Q. Zhu; S. Xu; C. Pan; |
1024 | G2PL: Lexicon Enhanced Chinese Polyphone Disambiguation Using BERT Adapter with A New Dataset Highlight: Moreover, due to the double long-tail distribution of polyphones, the ratio of pronunciation data for most polyphones is extremely unbalanced after sampling. To solve these problems, we propose a new dataset of 57,000 sentences from various domains, collected with a new sampling strategy. |
H. Zhao; H. Wan; L. Huang; M. Cao; |
1025 | GaitCoTr: Improved Spatial-Temporal Representation for Gait Recognition with A Hybrid Convolution-Transformer Framework Highlight: This work presents a novel hybrid convolution-transformer framework for gait recognition, termed GaitCoTr. |
J. Li; Y. Zhang; H. Shan; J. Zhang; |
1026 | GaitMixer: Skeleton-Based Gait Representation Learning Via Wide-Spectrum Multi-Axial Mixer Highlight: However, the performance of skeleton-based solutions still lags far behind that of appearance-based ones. This paper aims to close this performance gap by proposing a novel network model, GaitMixer, to learn more discriminative gait representations from skeleton sequence data. |
E. Pinyoanuntapong; A. Ali; P. Wang; M. Lee; C. Chen; |
1027 | GANStrument: Adversarial Instrument Sound Synthesis with Pitch-Invariant Instance Conditioning Highlight: We propose GANStrument, a generative adversarial model for instrument sound synthesis. |
G. Narita; J. Shimizu; T. Akama; |
1028 | GaPP: Multi-Target Tracking with Gaussian Processes Highlight: This paper introduces a novel, flexible, multi-target tracking approach based upon a Gaussian process as a dynamical model, coupled with a non-homogeneous Poisson process for the observation model. |
F. Goodyer; B. I. Ahmad; S. Godsill; |
1029 | GAPter: Gray-Box Data Protector for Deep Learning Inference Services at User Side Highlight: In this work, we propose GAPter, a fully automatic user-side data privacy enhancement solution for deep learning inference services (DLISes). |
H. Wu; B. Yang; X. Ke; S. He; F. Xu; S. Zhong; |
1030 | Gated Contextual Adapters For Selective Contextual Biasing In Neural Transducers Highlight: In this paper, we propose gated contextual biasing models that can estimate at runtime when contextual biasing is needed and can toggle it on or off. |
A. Alexandridis; K. M. Sathyendra; G. P. Strimel; F. -J. Chang; A. Rastrow; N. Susanj; A. Mouchtaris; |
1031 | Gated Enhanced RPN and Hybrid-View for Few-Shot Object Detection Highlight: In previous work, the quality of proposals generated by the region proposal network (RPN) is not yet high due to the lack of a fine-grained matching strategy, and the detector, which performs feature integration only at the spatial location level, is limited. Therefore, in this paper, we propose a new method for few-shot object detection that obtains high-quality proposals with the Gated Enhanced RPN (GRPN). |
X. Wei; Z. Zhou; P. Guo; W. Zhang; |
1032 | GATOR: Graph-Aware Transformer with Motion-Disentangled Regression for Human Mesh Recovery from A 2D Pose Highlight: However, it is hard for existing methods to simultaneously capture the multiple relations during the evolution from skeleton to mesh, including joint-joint, joint-vertex and vertex-vertex relations, which often leads to implausible results. To address this issue, we propose a novel solution, called GATOR, that contains an encoder of Graph-Aware Transformer (GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these multiple relations. |
Y. You; H. Liu; X. Li; W. Li; T. Wang; R. Ding; |
1033 | Gaussian Prior Reinforcement Learning for Nested Named Entity Recognition Highlight: Existing works for nested NER ignore the recognition order and boundary position relation of nested entities. To address these issues, we propose a novel seq2seq model named GPRL, which formulates the nested NER task as an entity triplet sequence generation process. |
Y. Yang; X. Hu; F. Ma; S. Li; A. Liu; L. Wen; P. S. Yu; |
1034 | Gaussian Process Dynamical Modeling for Adaptive Inference Over Graphs Highlight: With nodal observations arriving on the fly, the proposed method simultaneously estimates the missing nodal values and selects the fitted dynamical model via data-adaptive weights. |
Q. Lu; K. D. Polyzos; |
1035 | Gaze Pre-Train For Improving Disparity Estimation Networks Highlight: In this work, we used pre-training in the automotive domain where the setup was composed of a camera aimed outside the vehicle and an eye tracking system observing the driver. |
R. M. Hecht; O. Rahamim; S. Oron; A. Forgacs; G. Celniker; D. Levi; O. Tsimhoni; |
1036 | GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios Highlight: In this paper, we address the problem of localizing only the target speaker in multi-speaker scenarios and propose a target speaker localization algorithm, called GCC-speaker. |
G. Li; W. Xue; W. Liu; J. Yi; J. Tao; |
1037 | GCT: Gated Contextual Transformer for Sequential Audio Tagging Highlight: This paper proposes a gated contextual Transformer (GCT) with forward-backward inference (FBI). |
Y. Hou; Y. Wang; W. Wang; D. Botteldooren; |
1038 | Gender-Cartoon: Image Cartoonization Method Based on Gender Classification Highlight: However, current cartoonization methods cannot classify Qinqiang portraits by gender and cartoonize them accordingly. Therefore, we propose Gender-Cartoon, which produces gender-specific portrait cartoons. |
L. Feng; G. Geng; C. Guo; L. Yan; X. Ma; Z. Li; K. Li; |
1039 | General Category Network: Handwritten Mathematical Expression Recognition with Coarse-Grained Recognition Task Highlight: Several works introduce tasks related to handwritten mathematical expression recognition (HMER) that enhance HMER performance. |
X. Zhang; H. Ying; Y. Tao; Y. Xing; G. Feng; |
1040 | Generalized Invariant Matching Property Via Lasso Highlight: In this work, by formulating a high-dimensional problem with intrinsic sparsity, we generalize the invariant matching property for an important setting when only the target is intervened. |
K. Du; Y. Xiang; |
1041 | Generalized Relative Harmonic Coefficients Highlight: In contrast, this paper proposes a multi-channel feature, denoted generalized relative harmonic coefficients (generalized RHC), in the spherical harmonics domain, which can equally localize both far- and near-field sound sources without requiring any adjustments. |
Y. Hu; S. Gannot; T. D. Abhayapala; |
1042 | Generalized Two-Stage Particle Filter for High Dimensions Highlight: We propose a new filter, inspired by the two-stage particle filter (TPF) principle as well as multiple particle filtering (MPF), that can be applied to any setup and that provides a posterior distribution of the tempering coefficient that is updated recursively. |
M. Iloska; M. F. Bugallo; |
1043 | General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition Highlight: In this work, we investigate the effectiveness of existing rigorous privacy-enhancing techniques, i.e., user-level differential privacy (UDP) and Voice-Indistinguishability (Voice-Ind), for enhancing FL in the scenario of Speech Emotion Recognition (SER), against gender inference attacks. |
C. Tan; Y. Cao; S. Li; M. Yoshikawa; |
1044 | Generative Model Based Highly Efficient Semantic Communication Approach for Image Transmission Highlight: In this paper, we propose a generative-model-based semantic communication approach to further improve the efficiency of image transmission and protect private information. |
T. Han; J. Tang; Q. Yang; Y. Duan; Z. Zhang; Z. Shi; |
1045 | Generative Modeling Based Manifold Learning for Adaptive Filtering Guidance Highlight: In this paper, we propose a novel approach to learn the manifold of a set of impulse responses and subsequently employ that learned manifold in an adaptation algorithm for system identification. |
K. Helwani; P. Smaragdis; M. M. Goodwin; |
1046 | Generic Dependency Modeling for Multi-Party Conversation Highlight: To model the dependencies between utterances in multi-party conversations, we propose a simple and generic framework based on the dependency parsing results of utterances. |
W. Shen; X. Quan; K. Yang; |
1047 | GeoGCN: Geometric Dual-Domain Graph Convolution Network For Point Cloud Denoising Highlight: We propose GeoGCN, a novel geometric dual-domain graph convolution network for point cloud denoising (PCD). |
Z. Chen; P. Li; Z. Wei; H. Chen; H. Xie; M. Wei; F. L. Wang; |
1048 | Geometric Matrix Completion with Collaborative Routing Between Capsules Highlight: A novel graph-based matrix completion model based on Collaborative Routing (CRMC) is proposed, which can exploit the user-item graph from both implicit and explicit perspectives to learn better representations, and preserve the relationships between users and items latent in the user-item graph. |
X. Li; L. Zhang; |
1049 | Geometry-Aware DOA Estimation Using A Deep Neural Network with Mixed-Data Input Features Highlight: Unlike model-based direction of arrival (DoA) estimation algorithms, supervised learning-based DoA estimation algorithms based on deep neural networks (DNNs) are usually trained for one specific microphone array geometry, resulting in poor performance when applied to a different array geometry. In this paper we illustrate the fundamental difference between supervised learning-based and model-based algorithms leading to this sensitivity. |
U. Kowalk; S. Doclo; J. Bitzer; |
1050 | Gesper: A Unified Framework for General Speech Restoration Highlight: This paper describes the legends-tencent team’s real-time General Speech Restoration (Gesper) system submitted to the ICASSP 2023 Speech Signal Improvement (SSI) Challenge. |
J. Chen; Y. Shi; W. Liu; W. Rao; S. He; A. Li; Y. Wang; Z. Wu; S. Shang; C. Zheng; |
1051 | Glacier: Glass-Box Transformer for Interpretable Dynamic Neuroimaging Highlight: The lack of interpretability discourages the application of deep learning in fields such as neuroimaging, where results must be transparent and interpretable. Therefore, we present a ‘glass-box’ deep learning model and apply it to the field of neuroimaging. |
U. Mahmood; Z. Fu; V. Calhoun; S. Plis; |
1052 | Global and Nodal Mutual Information Maximization in Heterogeneous Graphs Highlight: In this work, we propose a self-supervised method that learns representations by relying on mutual information maximization among different graph structures (metapaths). |
C. Mavromatis; G. Karypis; |
1053 | Global-Context Aware Generative Protein Design Highlight: Thus, we propose the Global-Context Aware generative de novo protein design method (GCA), consisting of local modules and global modules. |
C. Tan; Z. Gao; J. Xia; B. Hu; S. Z. Li; |
1054 | Global HRTF Interpolation Via Learned Affine Transformation of Hyper-Conditioned Features Highlight: For data-driven approaches in particular, existing HRTF datasets differ in the spatial sampling distributions of source positions, which poses a major problem when generalizing a method across multiple datasets. To alleviate this, we propose a deep learning method based on a novel conditioning architecture. |
J. W. Lee; S. Lee; K. Lee; |
1055 | Global Localisation in Continuous Magnetic Vector Fields Using Gaussian Processes Highlight: This paper proposes the use of continuous vector fields provided by a Gaussian Process (GP) with a divergence-free kernel that follows the magnetic flux to localise a mobile robot moving in a 2D space. |
W. McDonald; C. Le Gentil; T. Vidal-Calleja; |
1056 | Global Matching-Optimization Network for Stereo Depth Estimation Highlight: However, accurately estimating disparity in occluded and textureless regions remains a challenge. To address this challenge, we present the Global Matching-Optimization Stereo Network (GMOStereo), which contains three components: Conv-Trans Feature Extraction Module (C-TFEM), Global Matching Module (GMM), and scene-aware disparity optimization. |
Y. Zhang; W. Huang; W. Yang; |
1057 | Gluformer: Transformer-based Personalized Glucose Forecasting with Uncertainty Quantification Highlight: In this work, we propose to model the future glucose trajectory conditioned on the past as an infinite mixture of basis distributions (e.g., Gaussian, Laplace). |
R. Sergazinov; M. Armandpour; I. Gaynanova; |
1058 | Going in Style: Audio Backdoors Through Stylistic Transformations Highlight: This work explores stylistic triggers for backdoor attacks in the audio domain: dynamic transformations of malicious samples through guitar effects. |
S. Koffas; L. Pajola; S. Picek; M. Conti; |
1059 | Good Neighbors Are All You Need for Chinese Grapheme-To-Phoneme Conversion Highlight: However, the system exhibits inconsistency in the segmentation of word boundaries which consequently degrades the performance of the G2P system. To address these issues, we propose the Reinforcer that provides strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations. |
J. Kim; C. Han; G. Nam; G. Chae; |
1060 | GOP-Based Latent Refinement for Learned Video Coding Highlight: This paper presents a method allowing learned video encoders to apply arbitrary latent refinement strategies to serve as Rate-Distortion Optimization (RDO) at the time of encoding. |
M. Abdoli; G. Clare; F. Henry; |
1061 | Grad-CAM-Inspired Interpretation of Nearfield Acoustic Holography Using Physics-Informed Explainable Neural Network Highlight: In this manuscript we propose a Grad-CAM-inspired approach for the visual explanation of neural network architectures for regression problems. |
H. Kafri; M. Olivieri; F. Antonacci; M. Moradi; A. Sarti; S. Gannot; |
1062 | Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition Highlight: In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. |
Y. Hu; C. Chen; R. Li; Q. Zhu; E. S. Chng; |
1063 | Grad-StyleSpeech: Any-Speaker Adaptive Text-to-Speech Synthesis with Diffusion Models Highlight: In this work, we present Grad-StyleSpeech, an any-speaker adaptive TTS framework based on a diffusion model that can generate highly natural speech with very high similarity to a target speaker’s voice, given a few seconds of reference speech. |
M. Kang; D. Min; S. J. Hwang; |
1064 | Graph-Based Point Cloud Color Denoising with 3-Dimensional Patch-Based Similarity Highlight: In this paper, we propose a point cloud denoising method for color signals. |
R. Watanabe; K. Nonaka; E. Pavez; T. Kobayashi; A. Ortega; |
1065 | Graph Based Semantic Ensemble of Riemannian Neural Structured Learning for BCI-EEG Signal Classification Highlight: The proposed model is assessed on the standard BCI IV 2a dataset. |
V. Gupta; L. Behera; T. Sandhan; |
1066 | Graph-Based Spectro-Temporal Dependency Modeling for Anti-Spoofing Highlight: The proposed method achieves an equal error rate of 0.58% on the ASVspoof 2019 LA dataset and outperforms all competing systems. |
F. Chen; S. Deng; T. Zheng; Y. He; J. Han; |
1067 | Graph Contrastive Learning with Learnable Graph Augmentation Highlight: Here, we propose a Graph Contrastive learning framework with Learnable graph Augmentation called GraphCLA. |
X. Pu; K. Zhang; H. Shu; J. L. Coatrieux; Y. Kong; |
1068 | Graph-Graph Context Dependency Attention for Graph Edit Distance Highlight: In this paper, we propose a deep network architecture GED-CDA by introducing a graph-graph context dependency attention module to enhance embeddings. |
R. Jia; X. Feng; X. Lyu; Z. Tang; |
1069 | GraphIT: Iterative Reweighted ℓ1 Algorithm for Sparse Graph Inference in State-Space Models Highlight: In this work, we propose GraphIT, a majorization-minimization (MM) algorithm for estimating the linear operator in the state equation of an LG-SSM under sparse prior. |
É. Chouzenoux; V. Elvira; |
1070 | Graph Learning from Gaussian and Stationary Graph Signals Highlight: By assuming that the observations are both Gaussian and stationary in the sought graph, this paper proposes a new scheme to learn the network from nodal observations. |
A. Buciulea; A. G. Marques; |
1071 | GraphMAD: Graph Mixup for Data Augmentation Using Data-Driven Convex Clustering Highlight: Hence, a promising approach for graph mixup is to first project the graphs onto a common latent feature space and then explore linear and nonlinear mixup strategies in this latent space. In this context, we propose to (i) project graphs onto the latent space of continuous random graph models known as graphons, (ii) leverage convex clustering in this latent space to generate nonlinear data-driven mixup functions, and (iii) investigate the use of different mixup functions for labels and data samples. |
M. Navarro; S. Segarra; |
1072 | Graph Neural Networks for Object Type Classification Based on Automotive Radar Point Clouds and Spectra Highlight: In this work, we propose graph neural networks (GNN) for radar OTC, which jointly process the radar reflection list and spectra. |
L. Saini; A. Acosta; G. Hakobyan; |
1073 | Graph Neural Networks for Sound Source Localization on Distributed Microphone Networks Highlight: We present a localization method using the Relation Network GNN, which we show shares many similarities to classical signal processing algorithms for Sound Source Localization (SSL). |
E. Grinstein; M. Brookes; P. A. Naylor; |
1074 | Graph Representation Learning For Stroke Recurrence Prediction Highlight: Motivated by the recent success of graph learning methods on medical tasks, we introduce a graph representation framework for stroke recurrence prediction (GraSReP) based on patient data. |
N. Glaze; A. Bayer; X. Jiang; S. Savitz; S. Segarra; |
1075 | Graph Signal Processing for Narrowband Direction of Arrival Estimation Highlight: To improve the performance, a new GSP-based DOA estimation method is proposed. |
D. Li; W. Liu; Y. Zakharov; P. D. Mitchell; |
1076 | Graph Signal Processing For Neuroimaging to Reveal Dynamics of Brain Structure-Function Coupling Highlight: Here, we leverage the GSP framework in a sliding-window setting to investigate the dynamics of brain structure-function coupling during resting-state at the node- and edge-wise levels. |
M. G. Preti; T. A. W. Bolton; A. Griffa; D. Van De Ville; |
1077 | Graph Wavelet-Based Point Cloud Geometric Denoising with Surface-Consistent Non-Negative Kernel Regression Highlight: We propose a novel graph construction method, surface-consistent non-negative kernel regression (SC-NNK), that can achieve more accurate denoising of geometry information in combination with spectral graph wavelet transforms (SGWTs). |
R. Watanabe; K. Nonaka; E. Pavez; T. Kobayashi; A. Ortega; |
1078 | Gridless Target Localization for FDA-MIMO Radar with Sparse Arrays Abstract: Most studies of frequency diverse array multiple-input multiple-output (FDA-MIMO) radar are based on uniform linear array (ULA), and thus the extended aperture characteristic of … |
X. Wu; Y. Liu; X. Jia; |
1079 | Group Personalized Federated Learning Highlight: In this paper, we present the group personalization approach for applications of FL in which there exist inherent partitions over clients that are significantly distinct. |
Z. Liu; Y. Hui; F. Peng; |
1080 | Group-Wise Co-Salient Object Detection with Siamese Transformers Via Brownian Distance Covariance Matching Highlight: 2) Their models lack discriminability to differentiate semantic differences between different groups since only one group of images with the same semantic category has been taken into account for model training. To address these issues, this paper presents a Siamese Transformer architecture for CoSOD that can fully mine the group-wise semantic contrast information for more discriminative feature learning. |
Y. Wu; H. Zhang; L. Liang; Y. Zhao; K. Zhang; |
1081 | gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window Highlight: These architectures have recently been attracting attention as alternatives to traditional CNNs, and many Vision Transformers and Vision MLPs have been proposed. By fusing these two streams, this paper proposes gSwin, a novel vision model that considers spatial hierarchy and locality thanks to a network structure similar to the Swin Transformer, and is parameter-efficient thanks to its gated MLP-based architecture. |
M. Go; H. Tachibana; |
1082 | GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network Highlight: In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genre-consistent dance generation framework, GTN-Bailando. |
H. Zhuang; S. Lei; L. Xiao; W. Li; L. Chen; S. Yang; Z. Wu; S. Kang; H. Meng; |
1083 | Guide and Select: A Transformer-Based Multimodal Fusion Method for Points of Interest Description Generation Highlight: We propose a novel Guide-Select multimodal fusion method that combines the guiding and selecting process to fuse various POI-related information efficiently. |
H. Liu; W. Wang; N. Hu; H. -T. Zheng; R. Xie; W. Wu; Y. Bai; |
1084 | Guided Speech Enhancement Network Highlight: In this work, we propose a speech enhancement solution that takes both the raw microphone and beamformer outputs as the input for an ML model. |
Y. Yang; S. -F. Shih; H. Erdogan; J. Menjay Lin; C. Lee; Y. Li; G. Sung; M. Grundmann; |
1085 | Hadamard Layer to Improve Semantic Segmentation Abstract: The Hadamard Layer, a simple and computationally efficient way to improve results in semantic segmentation tasks, is presented. This layer has no free parameters that require to … |
A. Hoyos; M. Rivera; |
1086 | HAG: Hierarchical Attention with Graph Network for Dialogue Act Classification in Conversation Highlight: In this study, we proposed a hierarchical attention model with a graph neural network (HAG) to consider the contextual interconnections as well as the semantics carried by the sentence itself. |
C. Fu; Z. Chen; J. Shi; B. Wu; C. Liu; C. T. Ishi; H. Ishiguro; |
1087 | Half-Temporal and Half-Frequency Attention U2Net for Speech Signal Improvement Highlight: This paper proposes a half-temporal and half-frequency attention U2Net for improving full-band speech signals. |
Z. Zhang; S. Xu; X. Zhuang; Y. Qian; L. Zhou; M. Wang; |
1088 | HalluAudio: Hallucinate Frequency As Concepts For Few-Shot Audio Classification Highlight: Few-shot audio classification is an emerging topic that is attracting increasing attention from the research community. Most existing work ignores the specific form of the audio spectrogram and focuses largely on an embedding space borrowed from image tasks; in this work, we aim to take advantage of this special audio format and propose a new method that hallucinates high-frequency and low-frequency parts as structured concepts. |
Z. Yu; S. Wang; L. Chen; Z. Cheng; |
1089 | Hankel Structured Low Rank and Sparse Representation Via L0-Norm Optimization for Compressed Ultrasound Plane Wave Signal Reconstruction Highlight: In this study, an L0-norm-based Hankel structured low-rank and sparse model is proposed to reduce the channel data. |
M. Zhang; J. Chen; X. Fu; G. Xin; J. Zhang; N. Jiang; J. D’Hooge; |
1090 | HappyQuokka System for ICASSP 2023 Auditory EEG Challenge Highlight: This report describes our submission to Task 2 of the Auditory EEG Decoding Challenge at ICASSP 2023 Signal Processing Grand Challenge (SPGC). |
Z. Piao; M. Kim; H. Yoon; H. -G. Kang; |
1091 | Hardware Friendly Spline Sketched Lidar Highlight: In this paper, an efficient photon acquisition approach is proposed that exploits the simplicity of piecewise polynomial splines to form a hardware-friendly compressed statistic, or spline sketch, of the ToF data. |
M. P. Sheehan; J. Tachella; M. E. Davies; |
1092 | Hardware-Limited Non-Uniform Task-Based Quantizers Highlight: In this paper, we propose a new framework based on generalized Bussgang decomposition that enables the design and analysis of hardware-limited task-based quantizers that are equipped with non-uniform scalar quantizers or have inputs with unbounded support. |
N. I. Bernardo; J. Zhu; Y. C. Eldar; J. Evans; |
1093 | HARQ Delay Minimization of 5G Wireless Network with Imperfect Feedback Highlight: In this work, we consider various delay components in incremental redundancy (IR) HARQ systems and minimize the average delay by applying asymmetric feedback detection (AFD) and find the optimal transmission length for each transmission attempt. |
W. Ding; M. Shikh-Bahaei; |
1094 | HDNet: Hierarchical Dynamic Network for Gait Recognition Using Millimeter-wave Radar Highlight: In this research, we propose a Hierarchical Dynamic Network (HDNet) for gait recognition using mmWave radar. |
Y. Huang; Y. Wang; K. Shi; C. Gu; Y. Fu; C. Zhuo; Z. Shi; |
1095 | HealthCall Corpus and Transformer Embeddings from Healthcare Customer-Agent Conversations Highlight: We present the corpus called HealthCall, which was recorded in real-life conditions in the call center of Malakoff Humanis, a health insurance company. |
N. Lackovic; C. Montacié; C. Lequilliec; M. -J. Caraty; |
1096 | Hearing and Seeing Abnormality: Self-Supervised Audio-Visual Mutual Learning for Deepfake Detection Highlight: With proper supervised pretraining on auxiliary tasks as prior, the situation can be improved, but the requirement to collect a large number of additional annotations for these tasks may restrict the further development of a generalized deepfake detector. To address this issue, we propose an Audio-Visual Temporal Synchronization for Deepfake Detection framework that maintains reasonable detection capability for unseen deepfakes. |
C. -S. Sung; J. -C. Chen; C. -S. Chen; |
1097 | Heart Rate Estimation and Performance Analysis Using MIMO Radar with Dispersed Antennas Highlight: In this paper, a multiple-input multiple-output (MIMO) radar with dispersed antennas is employed to monitor the physiological signals caused by respiration and heartbeat for non-contact HR estimation. |
P. Wang; Q. He; |
1098 | Heart Rate Extraction from Abdominal Audio Signals Highlight: As heart sounds collected from the abdomen suffer from significant noise from the GI and respiratory tracts, we leverage wavelet denoising for improved heartbeat detection. |
J. Stuchbury-Wass; E. Bondareva; K. -J. Butkow; S. Šćepanović; Z. Radivojevic; C. Mascolo; |
1099 | HeartToHeart: The Arts of Infant Versus Adult-Directed Speech Classification Highlight: In this research, we exploit multiple datasets, combined to form a larger corpus for training. |
N. D. Al Futaisi; A. Cristia; B. W. Schuller; |
1100 | HE-GAN: Differentially Private GAN Using Hamiltonian Monte Carlo Based Exponential Mechanism Highlight: We propose HE-GAN, a DP generative framework that eliminates noise addition by using Exponential Mechanism (EM) on the privacy-factor-adjusted posterior predictive distribution of a classifier trained on the private data. |
U. Hassan; D. Chen; S. -C. S. Cheung; C. -N. Chuah; |
1101 | HEiMDaL: Highly Efficient Method for Detection and Localization of Wake-Words Highlight: We propose a low-footprint CNN model, called HEiMDaL, to detect and localize keywords in streaming conditions. |
A. Kundu; M. Samragh; M. Cho; P. Padmanabhan; D. Naik; |
1102 | HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise Highlight: This paper proposes a heteroscedastic mixtures of probabilistic PCA technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM) algorithm to jointly estimate the unknown underlying factors, means, and noise variances under a heteroscedastic noise setting. |
A. S. Xu; L. Balzano; J. A. Fessler; |
1103 | Heterogeneous Graph Learning for Acoustic Event Classification Highlight: Graphs for audiovisual data are constructed manually which is both difficult and sub-optimal. In this work, we address this problem by (i) proposing a parametric graph construction strategy for the intra-modal edges, and (ii) learning the crossmodal edges. |
A. Shirian; M. Ahmadian; K. Somandepalli; T. Guha; |
1104 | Heuristic Masking for Text Representation Pretraining Highlight: In this paper, a heuristic token masking scheme is studied, in which tokens on which deep and shallow networks make inconsistent predictions are more likely to be masked. |
Y. Zhuang; |
1105 | Hiding Speaker’s Sex in Speech Using Zero-Evidence Speaker Representation in An Analysis/Synthesis Pipeline Highlight: Here, we propose to transform the speaker embedding and the pitch in order to hide the sex of the speaker. |
P. -G. Noé; X. Miao; X. Wang; J. Yamagishi; J. -F. Bonastre; D. Matrouf; |
1106 | Hierarchical Diffusion Models for Singing Voice Neural Vocoder Highlight: In this work, we propose a hierarchical diffusion model for singing voice neural vocoders. |
N. Takahashi; M. K. Singh; Y. Mitsufuji; |
1107 | Hierarchical Filtering With Online Learned Priors for ECG Denoising Highlight: This paper presents HKF, a hierarchical Kalman filtering method, that leverages a patient-specific learned structured prior of the ECG signal, and integrates it into a state space model to yield filters that capture both intra- and inter-heartbeat dynamics. |
T. Locher; G. Revach; N. Shlezinger; R. J. G. van Sloun; R. Vullings; |
1108 | Hierarchical Graph Learning for Stock Market Prediction Via A Domain-Aware Graph Pooling Operator Abstract: The utility of Graph Neural Networks (GNN) for the paradigm of forecasting short-term stock price movements is investigated. In particular, a finance-specific graph pooling … |
A. N. Arya; Y. Lei Xu; L. Stankovic; D. P. Mandic; |
1109 | Hierarchical Hypergraph Recurrent Attention Network for Temporal Knowledge Graph Reasoning Highlight: In this work, we aim to capture higher-order interactions of entities for TKG reasoning. |
J. Guo; M. Chen; Y. Zhang; J. Huang; Z. Liu; |
1110 | Hierarchical Interactive Reconstruction Network for Video Compressive Sensing Highlight: In this paper, we propose a novel Hierarchical InTeractive Video CS Reconstruction Network (HIT-VCSNet), which can cooperatively exploit the deep priors in both spatial and temporal domains to improve the reconstruction quality. |
T. Zhang; W. Cui; C. Hui; F. Jiang; |
1111 | Hierarchical Multi-Agent Reinforcement Learning with Intrinsic Reward Rectification Highlight: In this paper, we propose a value decomposition based hierarchical multi-agent reinforcement learning method with intrinsic reward rectification, which can determine the effectiveness of macro actions and correct the intrinsic rewards. |
Z. Liu; Z. Xu; G. Fan; |
1112 | Hierarchical Multi-Task Learning for Fabric Component Analysis Based on NIR Spectral Signals Highlight: The paper proposes a hierarchical architecture for Multi-Label Classification (MLC) and Multi-Output Regression (MOR) to simultaneously identify fabric classes and their contents, i.e., fabric component analysis. |
J. Kim; D. Wu; M. Chi; G. Xu; |
1113 | Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition Highlight: In this paper, we propose a hierarchical network, called DKDFMH, which employs decoupled knowledge distillation in a deep convolutional neural network with a fused multi-head attention mechanism. |
Z. Zhao; H. Wang; H. Wang; B. Schuller; |
1114 | Hierarchical Pronunciation Assessment with Multi-Aspect Attention Highlight: In this paper, we propose a Hierarchical Pronunciation Assessment with Multi-aspect Attention (HiPAMA) model, which hierarchically represents the granularity levels to directly capture their linguistic structures and introduces multi-aspect attention that reflects associations across aspects at the same level to create more connotative representations. |
H. Do; Y. Kim; G. G. Lee; |
1115 | Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition Highlight: In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing multilingual hierarchical Softmax decoding. |
Q. Liu; Z. Gong; Z. Yang; Y. Yang; S. Li; C. Ding; N. Minematsu; H. Huang; F. Cheng; C. Chu; S. Kurohashi; |
1116 | Hierarchical Spatial-Temporal Transformer with Motion Trajectory for Individual Action and Group Activity Recognition Highlight: In this paper, we propose a novel reasoning network, Hierarchical Spatial-Temporal Transformer termed HSTT, for individual action and group activity recognition, which focuses on capturing the various degrees of spatial-temporal dynamic interactions adaptively and jointly among actors. |
X. Zhu; D. Wang; Y. Zhou; |
1117 | Hierarchical Spatiotemporal Feature Fusion Network For Video Saliency Prediction Highlight: Current video saliency prediction methods have made great progress relying on the feature extraction capability of CNN, but there are still many defects in hierarchical feature fusion, limiting the further improvement of accuracy. To address this issue, we propose a 3D convolutional Hierarchical Spatiotemporal Feature Fusion Network (HSFF-Net). |
Y. Zhang; T. Zhang; C. Wu; Y. Zheng; |
1118 | Hierarchical Transformer for Multi-Label Trailer Genre Classification Highlight: Inspired by these, we propose a Hierarchical Transformer (HT). |
Z. Cai; H. Ding; X. Wu; M. Xu; X. Cui; |
1119 | HIFI++: A Unified Framework for Bandwidth Extension and Speech Enhancement Highlight: Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. |
P. Andreev; A. Alanov; O. Ivanov; D. Vetrov; |
1120 | High-Acoustic Fidelity Text To Speech Synthesis With Fine-Grained Control Of Speech Attributes Highlight: We propose a model with fine-grained attribute control that also has better acoustic fidelity (attributes of the output that we want to control do not deviate from the control signals) than previously proposed models, as shown in our experiments. |
R. Valle; J. F. Santos; K. J. Shih; R. Badlani; B. Catanzaro; |
1121 | High-Dimensional Confidence Regions in Sparse MRI Highlight: The purpose of this work is to extend the method to the MRI case in order to construct confidence intervals for each pixel of an MR image. |
F. Hoppe; F. Krahmer; C. Mayrink Verdun; M. I. Menzel; H. Rauhut; |
1122 | High-Dynamic Range ADC for Finite-Rate-of-Innovation Signals Highlight: In this work, we consider a modulo sampling framework for finite rate of innovation (FRI) signals, which are used to model signals in time-of-flight imaging. |
S. Mulleti; Y. C. Eldar; |
1123 | Higher-Order Link Prediction Via Learnable Maximum Mean Discrepancy Highlight: Existing methods either make restrictive assumptions regarding the emergence of higher-order links (HOLs) or rely on reduced-dimensionality models of limited expressiveness. To overcome these limitations, the higher-order link prediction (HOLP) approach developed here leverages distribution similarities across embeddings, as captured by a learnable probability metric. |
G. V. Karanikolas; A. Pagès-Zamora; G. B. Giannakis; |
1124 | Higher-Order Sparse Convolutions in Graph Neural Networks Highlight: In this work, we introduce a new higher-order sparse convolution based on the Sobolev norm of graph signals. |
J. H. Giraldo; S. Javed; A. Mahmood; F. D. Malliaros; T. Bouwmans; |
1125 | Higher-Order Spatio-Temporal Neural Networks for Covid-19 Forecasting Highlight: In this paper, we present the COVID-19 spatio-temporal graph (COV19-STG) datasets, i.e., spatio-temporal United States COVID-19 graph datasets at the county level. |
Y. Chen; S. Batsakis; H. V. Poor; |
1126 | High-Frequency Transformer Network Based on Window Cross-Attention for Pansharpening Highlight: Inspired by the vision transformer’s powerful ability to capture long-distance dependencies, we propose a novel high-frequency transformer network based on window cross-attention that fuses panchromatic (PAN) and multispectral (MS) images to obtain a high-resolution MS image. |
C. Ke; H. Liang; D. Li; X. Tian; |
1127 | High-Level Feature Fusion Network for Session-Based Social Recommendation Highlight: Most previous studies focus on modeling complex transitions to obtain item embeddings in various ways and neglect the importance of users’ roles in the social network. Therefore, we design a High-level Feature Fusion Network to address these issues. |
L. Wang; M. Li; H. -T. Zheng; |
1128 | High Quality Audio Coding with MDCTNet Highlight: We propose a neural audio generative model, MDCTNet, operating in the perceptually weighted domain of an adaptive modified discrete cosine transform (MDCT). |
G. Davidson; M. Vinton; P. Ekstrand; C. Zhou; L. Villemoes; L. Lu; |
1129 | High-Resolution Embedding Extractor for Speaker Diarisation Highlight: This study proposes a novel embedding extractor architecture, referred to as a high-resolution embedding extractor (HEE), which extracts multiple high-resolution embeddings from each speech segment. |
H. -S. Heo; Y. Kwon; B. -J. Lee; Y. J. Kim; J. -W. Jung; |
1130 | High-Resolution Neural Network Processing of LFM Radar Pulses Highlight: This work presents an alternative processing scheme for oversampled radar signals based on small-sized neural networks. |
J. Akhtar; |
1131 | High-Speed Drone Detection Based On YOLO-V8 Highlight: In this paper, we modify the state-of-the-art YOLO-V8 to achieve fast and reliable drone detection. |
J. -H. Kim; N. Kim; C. S. Won; |
1132 | Hindi As A Second Language: Improving Visually Grounded Speech with Semantically Similar Samples Highlight: The objective of this work is to explore the learning of visually grounded speech (VGS) models from a multilingual perspective. |
H. Ryu; A. Senocak; I. So Kweon; J. Son Chung; |
1133 | Hint-Dynamic Knowledge Distillation Highlight: In this paper, we propose Hint-dynamic Knowledge Distillation, dubbed HKD, which extracts knowledge from the teacher’s hints in a dynamic scheme. |
Y. Liu; C. Li; X. Tu; X. Ding; Y. Huang; |
1134 | HIPI: A Hierarchical Performer Identification Model Based on Symbolic Representation of Music Highlight: In this study, we apply a Recurrent Neural Network (RNN) model to classify the most likely music performers from their interpretative styles. |
S. R. Mahmud Rafee; G. Fazekas; G. Wiggins; |
1135 | HiSSNet: Sound Event Detection and Speaker Identification Via Hierarchical Prototypical Networks for Low-Resource Headphones Highlight: Finally, such models should have a small memory footprint to run on low-power headphones with limited on-chip memory. In this paper, we propose addressing these challenges using HiSSNet (Hierarchical SED and SID Network). |
N. Shashaank; B. Banar; M. R. Izadi; J. Kemmerer; S. Zhang; C. -C. J. Huang; |
1136 | History, Present and Future: Enhancing Dialogue Generation with Few-Shot History-Future Prompt Highlight: In this paper, we propose a novel lightweight dialogue generation framework named few-shot history-future prompt that utilizes useful histories and simulated futures to help generate informative responses, without the need for fine-tuning or adding extra parameters. |
Y. Wang; Y. Li; Y. Wang; F. Mi; P. Zhou; J. Liu; X. Jiang; Q. Liu; |
1137 | HITSZ TMG at ICASSP 2023 SPGC Shared Task: Leveraging Pre-Training and Distillation Method for Title Generation with Limited Resource Highlight: In this paper, we present our proposed method for the shared task of the ICASSP 2023 Signal Processing Grand Challenge (SPGC). |
T. Xu; Z. Zheng; X. Hu; Z. Sun; Y. Zhao; B. Hu; |
1138 | How to Push The Fastest Model 50x Faster: Streaming Non-Autoregressive Speech Synthesis on Resource-Limited Devices Highlight: In this paper, we propose the FastStreamSpeech model, which combines the advantages of advanced approaches in neural speech synthesis. |
V. -T. Nguyen; H. -C. Pham; D. -K. Mac; |
1139 | HPFTN: Hierarchical Progressive Fusion Transformer Network for Video Denoising Highlight: This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video denoising. |
S. Zhang; Y. Zhang; Z. Zhao; D. Xie; S. Pu; |
1140 | HQP-MVS: High-Quality Plane Priors Assisted Multi-View Stereo for Low-Textured Areas Highlight: In this paper, we develop a novel framework to obtain high-quality planar priors. |
Z. Tian; R. Wang; Z. Wang; R. Wang; |
1141 | HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields Highlight: In this work, we propose to use neural fields, a differentiable representation of functions through neural networks, to model HRTFs with arbitrary spatial sampling schemes. |
Y. Zhang; Y. Wang; Z. Duan; |
1142 | HTNet: Human Topology Aware Network for 3D Human Pose Estimation Highlight: Further considering the hierarchy of the human topology, joint-level and body-level dependencies are captured via graph convolutional networks and self-attentions, respectively. Based on these designs, we propose a novel Human Topology aware Network (HTNet), which adopts a channel-split progressive strategy to sequentially learn the structural priors of the human topology from multiple semantic levels: joint, part, and body. |
J. Cai; H. Liu; R. Ding; W. Li; J. Wu; M. Ban; |
1143 | HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition Highlight: In this paper, we propose HuBERT-AGG, a novel method that learns noise-invariant SSL representations for robust speech recognition by distilling aggregated layer-wise representations. |
W. Wang; Y. Qian; |
1144 | Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-Temporal Masked Transformers Highlight: However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. |
V. Davoodnia; A. Etemad; |
1145 | Hybridformer: Improving Squeezeformer with Hybrid Attention and NSR Mechanism Highlight: In this paper, we propose a novel method HybridFormer to improve SqueezeFormer in a fast and efficient way. |
Y. Yang; Y. Pan; J. Yin; J. Han; L. Ma; H. Lu; |
1146 | Hybrid Neural Network with Cross- and Self-Module Attention Pooling for Text-Independent Speaker Verification Highlight: In this contribution, to extract speaker discriminant utterance level embeddings, we propose a hybrid neural network that employs both cross- and self-module attention pooling mechanisms. |
J. Alam; W. H. Kang; A. Fathan; |
1147 | Hybrid RIS-Assisted Interference Mitigation for Spectrum Sharing Highlight: This paper explores reconfigurable intelligent surfaces (RIS) for mitigating cross-system interference in spectrum sharing applications. |
F. Wang; A. L. Swindlehurst; |
1148 | Hybrid Transformers for Music Source Separation Highlight: In this work, we introduce Hybrid Transformer Demucs (HT Demucs), a hybrid temporal/spectral bi-U-Net based on Hybrid Demucs [2], where the innermost layers are replaced by a cross-domain Transformer Encoder, using self-attention within one domain, and cross-attention across domains. |
S. Rouard; F. Massa; A. Défossez; |
1149 | HYDRA-HGR: A Hybrid Transformer-Based Architecture for Fusion of Macroscopic and Microscopic Neural Drive Information Highlight: However, due to the complexity of sEMG decomposition and the added computational overhead, HGR at the microscopic level is less explored than its macroscopic-level, DNN-based counterparts. In this regard, we propose the HYDRA-HGR framework, a hybrid model for HGR that simultaneously extracts a set of temporal and spatial features through its two independent Vision Transformer (ViT)-based parallel architectures (the so-called Macro and Micro paths). |
M. Montazerin; E. Rahimian; F. Naderkhani; S. F. Atashzar; H. Alinejad-Rokny; A. Mohammadi; |
1150 | Hyneter: Hybrid Network Transformer for Object Detection Highlight: In this paper, we point out that the essential difference between CNN-based and Transformer-based detectors, which causes worse small-object performance in Transformer-based methods, is the gap between local information and global dependencies in feature extraction and propagation. |
D. Chen; D. Miao; X. Zhao; |
1151 | Hyperbolic Audio Source Separation Highlight: We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features. |
D. Petermann; G. Wichern; A. Subramanian; J. Le Roux; |
1152 | Hypernetwork-Based Adaptive Image Restoration Highlight: We present an approach that is highly accurate and allows a significant reduction in the number of parameters. |
S. Aharon; G. Ben-Artzi; |
1153 | Hyperspectral Image Denoising Via Nonlocal Rank Residual Modeling Highlight: In this paper, we propose a novel nonlocal rank residual (NRR) approach for highly effective HSI denoising, which progressively approximates the underlying low-rank (L-R) tensor by minimizing the rank residual. |
Z. Zha; B. Wen; X. Yuan; J. Zhou; C. Zhu; |
1154 | HyperSteg: Hyperbolic Learning for Deep Steganography Highlight: Through extensive experiments over image and audio datasets, we establish HyperSteg as a practical, model- and modality-agnostic approach for information hiding. |
S. Agarwal; R. Soun; R. Shivani; V. Varanasi; N. Gill; R. Sawhney; |
1155 | Hypothesis Test for Leakage Detection in Water Pipelines with High-Dimensional Sensor Signals Highlight: We present a detection method for high-dimensional settings, which employs a regularized covariance matrix estimate. |
L. Yang; M. R. McKay; X. Wang; |
1156 | I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition Highlight: We propose a novel Transformer encoder with Input-Dependent Dynamic Depth (I3D) to achieve strong performance-efficiency trade-offs. |
Y. Peng; J. Lee; S. Watanabe; |
1157 | IAST: Instance Association Relying on Spatio-Temporal Features for Video Instance Segmentation Highlight: Most offline video instance segmentation (VIS) methods lack consideration for multi-scale spatio-temporal features, which leads to unstable instance association across frames. To address this problem, we propose IAST that builds Instance Association relying on Spatio-Temporal features for video instance segmentation. |
J. Chen; S. Liu; R. Chen; B. Guo; F. Zhang; |
1158 | ICASSP 2023 Breaker Page |
|
1159 | ICASSP 2023 Cover Page |
|
1160 | ICCRN: Inplace Cepstral Convolutional Recurrent Neural Network for Monaural Speech Enhancement Highlight: In this study, we propose a neural network for monaural speech enhancement in the time-frequency cepstral space, implemented by inserting a cepstral frequency block into our inplace convolutional recurrent network. |
J. Liu; X. Zhang; |
1161 | ICEL: Learning with Inconsistent Explanations Highlight: Inspired by CGC (Contrastive GradCAM consistency), we propose the ICEL (InConsistent Explanation Learning) method, which introduces an inconsistent explanation loss measured by cosine similarity between heatmaps. |
B. Liu; X. Wu; B. Yuan; |
1162 | ICStega: Image Captioning-based Semantically Controllable Linguistic Steganography Highlight: In this paper, we put forward a novel image captioning-based stegosystem, where the secret messages are embedded into the generated captions. |
X. Wang; Y. Wang; K. Chen; J. Ding; W. Zhang; N. Yu; |
1163 | Ideal: Improved Dense Local Contrastive Learning For Semi-Supervised Medical Image Segmentation Highlight: To this end, we extend the concept of metric learning to the segmentation task, using a dense (dis)similarity learning for pre-training a deep encoder network, and employing a semi-supervised paradigm to fine-tune for the downstream task. Specifically, we propose a simple convolutional projection head for obtaining dense pixel-level features, and a new contrastive loss to utilize these dense projections thereby improving the local representations. |
H. Basak; S. Chattopadhyay; R. Kundu; S. Nag; R. Mallipeddi; |
1164 | Identifiable Bounded Component Analysis Via Minimum Volume Enclosing Parallelotope Highlight: In this paper, we revisit bounded component analysis (BCA) and formulate it as a geometric problem of finding the minimum volume enclosing parallelotope (MVEP) of a set of data points in the Euclidean space. |
J. Hu; K. Huang; |
1165 | Identifying Coordination in A Cognitive Radar Network – A Multi-Objective Inverse Reinforcement Learning Approach Highlight: By ‘coordination’ we mean that the radar emissions satisfy Pareto optimality with respect to multiobjective optimization over each radar’s utility. This paper provides a novel multi-objective inverse reinforcement learning approach which allows for both detection of such Pareto optimal (‘coordinating’) behavior and subsequent reconstruction of each radar’s utility function, given a finite dataset of radar network emissions. |
L. Snow; V. Krishnamurthy; B. M. Sadler; |
1166 | Identifying Entrainment in Task-Oriented Conversations Highlight: In this paper, we investigate short task-oriented Wizard-of-Oz conversations for acoustic-prosodic and lexical entrainment. |
R. Chen; S. Kim; A. Papangelis; J. Hirschberg; Y. Liu; D. Hakkani-Tür; |
1167 | Identifying Opinion Influencers Over Social Networks Highlight: In this study, we show that a sequence of publicly exchanged beliefs allows users to discover rich information about the underlying model. |
V. Shumovskaia; M. Kayaalp; A. H. Sayed; |
1168 | Identifying Source Speakers for Voice Conversion Based Spoofing Attacks on Speaker Verification Systems Highlight: In this paper, we investigate the problem of source speaker identification – inferring the identity of the source speaker given the voice converted speech. |
D. Cai; Z. Cai; M. Li; |
1169 | IfUNet++: Iterative Feedback UNet++ for Infrared Small Target Detection Highlight: In this paper, we propose an iterative feedback UNet++ for infrared small target detection, dubbed ifUNet++. |
Z. Weng; P. Li; X. Zhuang; X. Yan; L. Gong; H. Xie; M. Wei; |
1170 | I Hear Your True Colors: Image Guided Audio Generation Highlight: We propose IM2WAV, an image guided open-domain audio generation system. |
R. Sheffer; Y. Adi; |
1171 | Image Adversarial Steganography Based on Joint Distortion Highlight: In this paper, incorporating adversarial steganography and joint distortion assignment, we present a novel adversarial steganographic scheme named JAS (Joint Adversarial Steganography). |
Z. Fan; K. Chen; C. Qin; K. Zeng; W. Zhang; N. Yu; |
1172 | Image Completion Via Dual-Path Cooperative Filtering Highlight: In particular, a Dual-path Cooperative Filtering (DCF) model is proposed, where one path predicts dynamic kernels, and the other path extracts multi-level features by using Fast Fourier Convolution to yield semantically coherent reconstructions. |
P. Shamsolmoali; M. Zareapoor; E. Granger; |
1173 | Image Fusion Via Slice-Based Convolutional Sparse Representation Highlight: To extract texture information of images more effectively, a slice-based convolutional sparse representation (SCSR) model is proposed, which is solved by an inertial proximal gradient method with dry friction (IPGM-DF) algorithm in the signal domain. |
J. Xu; Y. Zhang; Z. Li; J. Wang; |
1174 | Image Generation Is May All You Need for VQA Highlight: However, image generation for VQA has so far been applied only in a limited way, modifying only certain parts of the original image in order to control quality and uncertainty. In this paper, to address this gap, we propose a method that uses a diffusion model, pre-trained on various tasks and images, to inject prior knowledge into generated images and secure diversity without losing generality about the answer. |
K. Kim; J. Lee; J. Lee; |
1175 | Image Inpainting with Semantic-Aware Transformer Highlight: We propose a new Semantic-Aware Transformer, which in addition to including a self-attention block like previous vision transformers, also has a block for learning semantics from QSVM. |
S. Chen; W. Yu; Q. Wang; J. Gong; P. Chen; |
1176 | Image Reconstruction Without Explicit Priors Highlight: The framework we propose can handle general forward model corruptions, and we show that measurements derived from only a few ground-truth images (O(10)) are sufficient for image reconstruction without explicit priors. |
A. F. Gao; O. Leong; H. Sun; K. L. Bouman; |
1177 | Image Segmentation for Improved Lossless Screen Content Compression Highlight: This paper proposes a segmentation method that identifies natural regions enabling better adaptive treatment. |
S. R. Uddehal; T. Strutz; H. Och; A. Kaup; |
1178 | Image Sharing Chain Detection Via Sequence-To-Sequence Model Highlight: In this paper, we suggest a new sharing chain detection framework via Sequence-to-Sequence (Seq2Seq) model. |
J. You; Y. Li; R. Liang; Y. Tan; J. Zhou; X. Li; |
1179 | Image Source Method Based on The Directional Impulse Responses Highlight: This paper presents the image source method for simulating the observed signals in the time-domain on the boundary of a spherical listening region. |
J. Wang; P. Samarasinghe; T. Abhayapala; J. A. Zhang; |
1180 | Imaginary Voice: Face-Styled Diffusion Model for Text-to-Speech Highlight: The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices learnt from facial characteristics. |
J. Lee; J. Son Chung; S. -W. Chung; |
1181 | ImagineNet: Target Speaker Extraction with Intermittent Visual Cue Through Embedding Inpainting Highlight: In this paper, we study audio-visual speaker extraction algorithms with intermittent visual cues. |
Z. Pan; W. Wang; M. Borsdorf; H. Li; |
1182 | Immersive Enhancement and Removal of Loudspeaker Sound Using Wireless Assistive Listening Systems and Binaural Hearing Devices Highlight: We demonstrate the proposed system using several commercial assistive listening devices (ALDs) and assess the effects of delay, bandwidth, distortion, and noise on real-world system performance. |
R. M. Corey; A. C. Singer; |
1183 | Implementing Continuous HRTF Measurement in Near-Field Highlight: In this paper, we propose a continuous measurement system targeted at the near field (NF) that efficiently captures HRTFs in the horizontal plane within 45 s. |
E. -L. Tan; S. Peksi; W. -S. Gan; |
1184 | Implicit Bayes Adaptation: A Collaborative Transport Approach Highlight: In the present work, we posit that domain adaptation robustness is rooted in the intrinsic (latent) representations of the respective data, which inherently lie on a non-linear submanifold embedded in a higher-dimensional Euclidean space. |
B. Jiang; H. Krim; T. Wu; D. Cansever; |
1185 | Implicitly Rotation Equivariant Neural Networks Highlight: We propose Implicitly Equivariant Network (IEN) which induces approximate equivariance in the different layers of a standard CNN by optimizing a multi-objective loss function. |
N. Khetan; T. Arora; S. u. Rehman; D. K. Gupta; |
1186 | Implicit Vehicle Positioning with Cooperative Lidar Sensing Highlight: This paper considers the problem of cooperative localization of passive objects in a vehicular environment through the fusion of lidar point clouds collected at different moving vehicles and sent to the road infrastructure. |
L. Barbieri; B. C. Tedeschini; M. Brambilla; M. Nicoli; |
1187 | Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives Highlight: How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. |
S. Sadhu; H. Hermansky; |
1188 | Improved Acoustic-to-Articulatory Inversion Using Representations from Pretrained Self-Supervised Learning Models Highlight: In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic to articulatory inversion (AAI). |
S. Udupa; S. C; P. K. Ghosh; |
1189 | Improved Appliance Transient Feature Extraction Via Template Matching Highlight: However, existing event detection methods based on general prior knowledge may detect long appliance transients incompletely, which makes the extracted transient feature samples inconsistent and thus degrades non-intrusive load monitoring (NILM) performance. To fill this gap, assuming that the true starting point of an appliance state transition lies in a temporal neighborhood centered on the starting point of the detected event, an improved appliance transient feature extraction method via appliance-specific template matching is proposed for event-based NILM approaches. |
B. Liu; F. Chang; W. Luan; B. Zhao; |
1190 | Improved Belief Propagation Decoding of Turbo Codes Highlight: In this paper, we propose two polynomial-based methods to optimize the parity-check matrix of Turbo codes by improving the sparsity while also removing 4-cycles and even 6-cycles compared to the original matrix. |
Y. Shen; Y. Ren; A. T. Kristensen; X. You; C. Zhang; A. Burg; |
1191 | Improved Deep Speaker Localization and Tracking: Revised Training Paradigm and Controlled Latency Highlight: In this paper, we extend this framework by using small jumps between neighboring discrete DOAs to simulate gradual movements. |
A. Bohlender; L. Roelens; N. Madhu; |
1192 | Improved Indoor Localization With NLOS Signal Propagations Highlight: Although centimeter-level accuracy can be achieved using ultrawideband (UWB) signal sources, NLOS propagation due to reflection and refraction of signals as well as multipath effect may lead to significant errors in UWB-based localization systems. To mitigate this issue, a novel and simple NLOS error compensation method is proposed in this paper, which introduces a correction factor that takes into account the influence of NLOS conditions on the UWB distance measurements. |
W. Huang; Y. Zhao; X. Wu; L. Yin; |
1193 | Improved Mask-Based Neural Beamforming for Multichannel Speech Enhancement By Snapshot Matching Masking Highlight: In this paper, we propose a new masking strategy to predict the Snapshot Matching Mask (SMM) that aims to minimize the distance between the predicted and the true signal snapshots, thereby estimating the PSD matrices in a more systematic way. |
C. -H. Lee; C. Yang; Y. Shen; H. Jin; |
1194 | Improved Projection Learning for Lower Dimensional Feature Maps Highlight: In this work, we explore an improved method for compressing all feature maps of pre-trained CNNs to below a specified limit. |
I. Price; J. Tanner; |
1195 | Improved Small Sample Hypothesis Testing Using The Uncertain Likelihood Ratio Highlight: However, with limited training data and observations, the generalized likelihood ratio (GLR) test is suboptimal. To overcome this, we propose to use the Uncertain Likelihood Ratio (ULR) test statistic to accurately account for the uncertainty. |
J. Z. Hare; L. M. Kaplan; |
1196 | Improved Training Of Mixture-Of-Experts Language GANs. Highlight: An empirical study on synthetic and real benchmarks shows superior quantitative performance and demonstrates the effectiveness of our approach to adversarial text generation. |
Y. Chai; Q. Yin; J. Zhang; |
1197 | Improved WiFi-Based Respiration Tracking Via Contrast Enhancement. Highlight: In this paper, we propose WiResP, a practical and innovative WiFi-based respiration tracking system that utilizes a contrast enhancement technique to improve the detection of respiration. |
W. -H. Wang; X. Zeng; B. Wang; Y. Cao; K. J. Ray Liu; |
1198 | Improved WordPCFG for Passwords with Maximum Probability Segmentation. Highlight: To address the ambiguity, we improve WordPCFG by maximum probability segmentation with an A*-like pruning algorithm. |
W. Li; J. Yang; H. Cheng; P. Wang; K. Liang; |
1199 | Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings. Highlight: In this paper, we contribute two methods that improve the accuracy of embedding-matching A2W. |
H. Yen; W. Jeon; |
1200 | Improving Accented Speech Recognition with Multi-Domain Training. Highlight: In this work, we use speech audio representing four different French accents to create fine-tuning datasets that improve the robustness of pre-trained ASR models. |
L. Maison; Y. Esteve; |
1201 | Improving Acoustic Echo Cancellation By Mixing Speech Local and Global Features with Transformer. Highlight: We propose MiT-Net, a novel mix-transformer neural network with a pyramid encoder operating in the time domain, for the task of acoustic echo cancellation. |
Y. Liu; X. Xu; W. Tu; Y. Yang; L. Xiao; |
1202 | Improving Adversarial Robustness with Hypersphere Embedding and Angular-Based Regularizations. Highlight: In this paper, we propose integrating hypersphere embedding (HE) into adversarial training (AT) with regularization terms that exploit the rich angular information available in the HE framework. |
O. Fakorede; A. Nirala; M. Atsague; J. Tian; |
1203 | Improving Audio Captioning Using Semantic Similarity Metrics. Highlight: In this work, we consider a metric measuring semantic similarities between predicted and reference captions instead of measuring exact word overlap. |
R. Mahfuz; Y. Guo; E. Visser; |
1204 | Improving Automatic Sleep Staging Via Temporal Smoothness Regularization. Highlight: We propose a regularization method, called temporal smoothness regularization, for training deep neural networks for automatic sleep staging in small-data settings. |
H. Phan; E. Heremans; O. Y. Chén; P. Koch; A. Mertins; M. De Vos; |
1205 | Improving BERT Fine-Tuning Via Stabilizing Cross-Layer Mutual Information. Highlight: Nevertheless, very little work explores mining the reliable part of pre-learned information that can help stabilize fine-tuning. To address this challenge, we introduce a novel solution that fine-tunes BERT with stabilized cross-layer mutual information. |
J. Li; X. Li; T. Wang; S. Wang; Y. Cao; C. Xu; D. Dou; |
1206 | Improving Contextual Biasing with Text Injection. Highlight: In this work, we present a model-based approach to improving contextual biasing that improves quality without drastically increasing model computation during inference. |
T. N. Sainath; R. Prabhavalkar; D. Caseiro; P. Rondon; C. Allauzen; |
1207 | Improving Contextual Spelling Correction By External Acoustics Attention and Semantic Aware Data Augmentation. Abstract: We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as … |
X. Wang; Y. Liu; J. Li; S. Zhao; |
1208 | Improving CTC-Based ASR Models With Gated Interlayer Collaboration. Highlight: In this paper, we present a Gated Interlayer Collaboration (GIC) mechanism to improve the performance of CTC-based models, which introduces textual information into the model and thus relaxes the conditional independence assumption of CTC-based models. |
Y. Yang; Y. Li; B. Du; |
1209 | Improving Disfluency Detection with Multi-Scale Self Attention and Contrastive Learning. Highlight: However, they rely heavily on hand-crafted features or word-to-word match patterns, which are insufficient to precisely capture such rough copies and cause under-tagging and over-tagging problems. To alleviate these problems, we propose a multi-scale self-attention mechanism (MSAT) and design a contrastive learning (CL) loss for this task. |
P. Wang; C. Duan; M. Chen; X. He; |
1210 | Improving Dropout in Graph Convolutional Networks for Recommendation Via Contrastive Loss. Highlight: We propose a novel graph convolutional network (GCN)-based recommendation model that incorporates a contrastive loss. |
H. Okamura; K. Maeda; R. Togo; T. Ogawa; M. Haseyama; |
1211 | Improving EEG-based Emotion Recognition By Fusing Time-Frequency and Spatial Representations. Highlight: We propose an EEG signal classification network based on cross-domain feature fusion, which uses a multi-domain attention mechanism to make the network focus on the features most related to brain activity and changes in thinking. |
K. Zhu; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
1212 | Improving Electric Load Demand Forecasting with Anchor-Based Forecasting Method. Highlight: In this paper, we address the problem of Electric Load Demand Forecasting (ELDF) for the Greek Energy Market. |
M. Tzelepi; P. Nousi; A. Tefas; |
1213 | Improving Fairness and Robustness in End-to-End Speech Recognition Through Unsupervised Clustering. Highlight: In this paper, we present a privacy preserving approach to improve fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. |
I. -E. Veliche; P. Fung; |
1214 | Improving Fast-slow Encoder Based Transducer with Streaming Deliberation. Highlight: This paper introduces a fast-slow encoder based transducer with streaming deliberation for end-to-end automatic speech recognition. |
K. Li; J. Mahadeokar; J. Guo; Y. Shi; G. Keren; O. Kalinli; M. L. Seltzer; D. Le; |
1215 | Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. Highlight: The lack of data resources results in poor synthesis quality. To alleviate this issue, we propose using TTS (Text-To-Speech) for data augmentation to improve the few-shot ability of the talking face system. |
Q. Chen; Z. Ma; T. Liu; X. Tan; Q. Lu; K. Yu; X. Chen; |
1216 | Improving Heart Rate and Heart Rate Variability Estimation from Video Through A HR-RR-Tuned Filter. Highlight: This paper presents algorithms to improve the estimation of heart rate (HR) and heart rate variability (HRV) from smartphone video. |
M. Chan; L. Zhu; K. Vatanparvar; H. Jung; J. Kuang; A. Gao; |
1217 | Improving Image Captioning with Control Signal of Sentence Quality. Highlight: In this paper, we propose a new control signal of sentence quality, which is taken as an additional input to the captioning model. |
Z. Zhu; S. Wang; H. Qu; |
1218 | Improving Knowledge Distillation for Non-Intrusive Load Monitoring Through Explainability Guided Learning. Highlight: In this work, we aim to address the aforementioned issue by placing the KD NILM approach in a framework of explainable AI (XAI). |
D. Batic; G. Tanoni; L. Stankovic; V. Stankovic; E. Principi; |
1219 | Improving Learning Objectives for Speaker Verification from The Perspective of Score Comparison. Highlight: In this paper, we investigate the limitations of conventional learning objectives in speaker verification and propose a new learning objective designed from the perspective of similarity scores. |
M. H. Han; S. Hwan Mun; M. Kim; M. Jeong; S. H. Ahn; N. Soo Kim; |
1220 | Improving Massively Multilingual ASR with Auxiliary CTC Objectives. Highlight: In this paper, we introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark, by conditioning the entire model on language identity (LID). |
W. Chen; B. Yan; J. Shi; Y. Peng; S. Maiti; S. Watanabe; |
1221 | Improving Music Genre Classification from Multi-modal Properties of Music and Genre Correlations Perspective. Highlight: In addition, as genres normally co-occur in a music track, it is desirable to capture and model the genre correlations to improve the performance of multi-label music genre classification. To solve these issues, we present a novel multi-modal method leveraging an audio-lyrics contrastive loss and two symmetric cross-modal attention modules to align and fuse features from audio and lyrics. |
G. Ru; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
1222 | Improving Noisy Student Training on Non-Target Domain Data for Automatic Speech Recognition. Highlight: In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. |
Y. Chen; W. Ding; J. Lai; |
1223 | Improving Non-Autoregressive Speech Recognition with Autoregressive Pretraining. Highlight: In this work, we propose applying autoregressive (AR) pretraining to the non-autoregressive (NAR) encoder to reduce the accuracy gap between AR and NAR models. |
Y. Li; L. Samarakoon; I. Fung; |
1224 | Improving Occluded Human Pose Estimation Via Linked Joints. Highlight: We attribute the failure mainly to the inability of keypoint heatmaps to distinguish the joints of two occluded bodies. Therefore, in this paper, we propose a method termed SkeletonMap (SMap), which introduces prior knowledge of body structure to constrain the relative connections of joints. |
S. Ye; Z. Hong; J. Zheng; S. Zhang; |
1225 | Improving Performance of Real-Time Full-Band Blind Packet-Loss Concealment with Predictive Network. Highlight: We propose a real-time recurrent method that leverages previous outputs to mitigate the artefacts of lost packets without prior knowledge of the loss mask. |
V. -A. Nguyen; A. H. T. Nguyen; A. W. H. Khong; |
1226 | Improving Phase-Vocoder-Based Time Stretching By Time-Directional Spectrogram Squeezing. Highlight: In this paper, to prevent percussion smearing, we propose a preprocessing step for time stretching. |
N. Akaishi; K. Yatabe; Y. Oikawa; |
1227 | Improving Prosody for Cross-Speaker Style Transfer By Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis. Highlight: We find that better style transfer can be achieved by using the source speaker’s prosody features that are easily predicted. |
C. Qiang; P. Yang; H. Che; Y. Zhang; X. Wang; Z. Wang; |
1228 | Improving Retrieval-Based Dialogue System Via Syntax-Informed Attention. Highlight: To this end, we propose SIA, Syntax-Informed Attention, considering both intra- and inter-sentence syntax information. |
T. Song; N. Chen; J. Jiang; Z. Zhu; Y. Zou; |
1229 | Improving Scheduled Sampling for Neural Transducer-Based ASR. Highlight: In this paper, we propose scheduled sampling (SS) approaches suited for RNNT. |
T. Moriya; T. Ashihara; H. Sato; K. Matsuura; T. Tanaka; R. Masumura; |
1230 | Improving Self-Supervised Learning for Audio Representations By Feature Diversity and Decorrelation. Highlight: To this end, we introduce SELFIE, a novel Self-supervised Learning approach for audio representation via Feature Diversity and Decorrelation. |
B. Nguyen; S. Uhlich; F. Cardinaux; |
1231 | Improving Sentence Similarity Estimation for Unsupervised Extractive Summarization. Highlight: In this paper, we propose two novel strategies to improve sentence similarity estimation for unsupervised extractive summarization. |
S. Sun; R. Yuan; W. Li; S. Li; |
1232 | Improving Speech Enhancement Via Event-Based Query. Highlight: In this paper, we perceive speech and noises as different types of sound events and propose an event-based query method for SE. |
Y. Xin; X. Peng; Y. Lu; |
1233 | Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts. Highlight: We present a multi-speaker Japanese audiobook text-to-speech (TTS) system that leverages multimodal context information of preceding acoustic context and bilateral textual context to improve the prosody of synthetic speech. |
D. Xin; S. Adavanne; F. Ang; A. Kulkarni; S. Takamichi; H. Saruwatari; |
1234 | Improving Speech-to-Speech Translation Through Unlabeled Text. Highlight: We propose an effective way to utilize the massive amount of existing unlabeled text in different languages to create large amounts of S2ST data, and we improve S2ST performance by applying various acoustic effects to the generated synthetic data. |
X. -P. Nguyen; S. Popuri; C. Wang; Y. Tang; I. Kulikov; H. Gong; |
1235 | Improving Spoken Language Identification with Map-Mix. Highlight: We present a new data augmentation method that leverages model training dynamics of individual data points to improve sampling for the latent mixup. |
S. Rajaa; K. Anandan; S. Dalmia; T. Gupta; E. S. Chng; |
1236 | Improving Text-Audio Retrieval By Text-Aware Attention Pooling and Prior Matrix Revised Loss. Highlight: Yet existing works aggregate the entire audio without considering the text (e.g., by mean-pooling over the frames), which is likely to encode misleading audio information not described in the given text. In this paper, we present a text-aware attention pooling (TAP) module for text-audio retrieval (TAR), which is essentially a scaled dot-product attention that lets a text attend to its most semantically similar frames. |
Y. Xin; D. Yang; Y. Zou; |
1237 | Improving The Modality Representation with Multi-view Contrastive Learning for Multimodal Sentiment Analysis. Highlight: In this study, we explore improving representations and devise a three-stage framework with multi-view contrastive learning to refine representations for the specific objectives. |
P. Liu; X. Zheng; H. Li; J. Liu; Y. Ren; H. Zhu; L. Sun; |
1238 | Improving The Out-of-Distribution Generalization Capability of Language Models: Counterfactually-Augmented Data Is Not Enough. Highlight: In this paper, we attribute the inefficiency to the Myopia Phenomenon caused by counterfactually-augmented data (CAD): language models focus only on the causal features that are edited in the augmentation and exclude other, non-edited causal features. |
C. Fan; W. Chen; J. Tian; Y. Li; H. He; Y. Jin; |
1239 | Improving The Stochastic Gradient Descent’s Test Accuracy By Manipulating The ℓ∞ Norm of Its Gradient Approximation. Highlight: Moreover, from an algorithmic point of view, there is a clear incremental improvement path that relates all of them: from simple alternatives such as SG clipping (SGC), to the well-known variance correction (Adagrad), followed by an exponential moving average, or EMA (RMSprop), to further refinements such as Newton-like corrections (AdaDelta) or bias correction along with different EMA options for the gradient itself (Adam, AdaMax, AdaBelief, etc.). In this paper, inspired by previous non-stochastic results on how to avoid divergence for ill-chosen step sizes (SS) in the accelerated proximal gradient algorithm, instead of directly using the standard SGD gradient’s EMA $\bar{\mathbf{g}}_k$, we propose to modify its entries so as to force the moving average of $\{\|\bar{\mathbf{g}}_k\|_\infty\}$ to be non-increasing. |
P. Rodriguez; |
1240 | Improving Transformer-Based End-to-End Speaker Diarization By Assigning Auxiliary Losses to Attention Heads. Highlight: In this study, to enhance the training effectiveness of SA-EEND models, we propose the use of auxiliary losses for the SA heads of the transformer layers. |
Y. -R. Jeoung; J. -Y. Yang; J. -H. Choi; J. -H. Chang; |
1241 | Improving Transformer-Based Networks with Locality for Automatic Speaker Verification. Highlight: In this study, we improve the locality modeling of the Transformer in two directions. |
M. Sang; Y. Zhao; G. Liu; J. H. L. Hansen; J. Wu; |
1242 | Improving Weakly Supervised Sound Event Detection with Causal Intervention. Highlight: Based on the causal analysis, we propose a causal intervention (CI) method for WSSED that removes the negative impact of co-occurrence confounders by iteratively accumulating every possible context of each class and then re-projecting the contexts onto the frame-level features to make event boundaries clearer. |
Y. Xin; D. Yang; F. Cui; Y. Wang; Y. Zou; |
1243 | Incorporating Lip Features Into Audio-Visual Multi-Speaker DOA Estimation By Gated Fusion. Highlight: In this paper, we present a novel audio-visual multi-speaker DOA estimation network, which for the first time incorporates multi-speaker lip features to adapt to complex overlapping and noisy scenarios. |
Y. Jiang; H. Chen; J. Du; Q. Wang; C. -H. Lee; |
1244 | Incorporating Reliability in Graph Information Propagation By Fluid Dynamics Diffusion: A Case of Multimodal Semisupervised Deep Learning. Highlight: In this paper, we propose structuring graph neural networks on a new graph representation based on fluid dynamics diffusion that allows us to incorporate the reliability of the features used to characterise each sample within the graph structure itself. |
A. Marinoni; M. Mercier; Q. Shi; S. Selvakumaran; M. Girolami; |
1245 | Incorporating Uncertainty from Speaker Embedding Estimation to Speaker Verification. Highlight: This paper aims to incorporate the uncertainty estimate produced in the xi-vector network front-end with a probabilistic linear discriminant analysis (PLDA) back-end scoring for speaker verification. |
Q. Wang; K. A. Lee; T. Liu; |
1246 | Incorporating Visual Information Reconstruction Into Progressive Learning for Optimizing Audio-visual Speech Enhancement. Highlight: To reduce the large SNR gap between the learning target and input noisy speech, we propose a novel mask-based audio-visual progressive learning speech enhancement (AVPL) framework with visual information reconstruction (VIR) to increase SNRs gradually. |
C. -Y. Zhang; H. Chen; J. Du; B. -C. Yin; J. Pan; C. -H. Lee; |
1247 | Independent Vector Analysis with Multivariate Gaussian Model: A Scalable Method By Multilinear Regression. Highlight: In this paper, we introduce an efficient method for large-scale joint blind source separation (JBSS) by multilinear regression. |
B. Gabrielson; M. Sun; M. A. B. S. Akhonda; V. D. Calhoun; T. Adali; |
1248 | Individual Sub-Band Estimation Approach to Bandwidth Extension and Enhancement of Coded Speech. Highlight: In this work, we propose to utilize two streaming SEANets in parallel, which are dedicated to the narrowband CSE and the generation of the upper band speech signal, respectively. |
Y. Choi; E. Lee; I. Jang; J. Won Shin; |
1249 | Inductive Relation Prediction from Relational Paths and Context with Hierarchical Transformers. Highlight: This paper proposes a novel method that captures both connections between entities and the intrinsic nature of entities, by simultaneously aggregating RElational Paths and cOntext with a unified hieRarchical Transformer framework, namely REPORT. |
J. Li; Q. Wang; Z. Mao; |
1250 | Information and Sensing Beamforming Optimization for Multi-User Multi-Target MIMO ISAC Systems. Highlight: In this paper, we consider the joint beamforming design for simultaneous sensing and communication in a wireless multi-user system. |
M. Zhu; L. Li; S. Xia; T. -H. Chang; |
1251 | Information Extraction from Pill Bottle Images Via Text Stitching. Highlight: In this work, we propose an end-to-end framework for recognising entities from a sequence of images of pill bottles by using a combination of image and text features. |
R. K. Gupta; S. Roy; S. Jos; U. V. S.; L. Lavoie; F. Medous; W. Smith; |
1252 | InfoShape: Task-Based Neural Data Shaping Via Mutual Information. Highlight: In this paper, we propose InfoShape, a task-based encoder that aims to remove unnecessary sensitive information from training data while maintaining enough relevant information for a particular ML training task. |
H. Esfahanizadeh; W. Wu; M. Ghobadi; R. Barzilay; M. Médard; |
1253 | Infrared and Visible Image Fusion By Using Multi-Scale Transformation and Fractional-Order Gradient Information. Highlight: To better highlight the target, we use MDLatLRR to extract the base layer of the pre-fusion image and use it as the base layer of the fused image. |
S. Wu; K. Zhang; X. Yuan; C. Zhao; |
1254 | Inplace Cepstral Speech Enhancement System for The ICASSP 2023 Clarity Challenge. Highlight: This report summarizes our system submission to the ICASSP 2023 Clarity Challenge. |
J. Liu; X. Zhang; |
1255 | Input-Dependent Dynamical Channel Association For Knowledge Distillation. Highlight: Existing methods generally adopt handcrafted matching or an input-independent association matrix, which leads to semantic mismatch and thus suboptimal performance. To resolve this problem, we present an input-dependent channel association module. |
Q. Tang; Y. Zhang; X. Xu; J. Wang; Y. Guo; |
1256 | In Search of Strong Embedding Extractors for Speaker Diarisation. Highlight: We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input. |
J. -W. Jung; H. -S. Heo; B. -J. Lee; J. Huh; A. Brown; Y. Kwon; S. Watanabe; J. S. Chung; |
1257 | In-Sensor & Neuromorphic Computing Are All You Need for Energy Efficient Computer Vision. Highlight: Although the overhead of the first-layer MACs with direct encoding is negligible for deep SNNs, and CV processing is efficient with SNNs, the data transfer between the image sensors and the downstream processing consumes significant bandwidth and may dominate the total energy. To mitigate this concern, we propose an in-sensor computing hardware-software co-design framework for SNNs targeting image recognition tasks. |
G. Datta; Z. Liu; M. A. -A. Kaiser; S. Kundu; J. Mathai; Z. Yin; A. P. Jacob; A. R. Jaiswal; P. A. Beerel; |
1258 | Instance-Aware Hierarchical Structured Policy for Prompt Learning in Vision-Language Models. Highlight: However, enhancing methods are often time-consuming and inflexible because 1) class-specific prompts are inefficient in certain situations; 2) instance-specific prompts are put in a fixed position. To address these issues, inspired by the coarse-to-fine decision-making paradigm of humans, we propose an Instance-Aware Hierarchical-Structured Policy (IAHSP) that integrates instance-specific prompt selection and appropriate position selection in a reinforcement learning fashion. |
X. Wu; G. Wang; Z. Liu; X. Dang; Z. Qin; |
1259 | Integrated Sensing and Full-Duplex Communication: Joint Transceiver Beamforming and Power Allocation. Highlight: In this paper, we investigate the beamforming design for an integrated sensing and communication (ISAC) system involving full-duplex (FD) communications. |
Z. He; W. Xu; H. Shen; D. W. Kwan Ng; Y. C. Eldar; X. You; |
1260 | Integrating Syntactic and Semantic Knowledge in AMR Parsing with Heterogeneous Graph Attention Network. Highlight: In this work, we propose a novel linguistic knowledge-enhanced AMR parsing model, which augments the pre-trained Transformer with syntactic dependency and semantic role labeling structures of sentences. |
Y. Sataer; C. Shi; M. Gao; Y. Fan; B. Li; Z. Gao; |
1261 | Integrating The Sensing and Radio Communications Channel Modelling From Radar Mutual Interference. Highlight: This paper proposes a generalist dual channel model description for ISAC and a technique to relate the sensing and the communications channel, which is described analytically and verified with measurements. |
N. Cardona; J. S. Romero; W. Yang; J. Li; |
1262 | Intent Does Matter! Propagating High-Order Relations for Exploring Interest Preferences. Highlight: To this end, we propose a Hyper-relation alignment hyperGraph Convolutional Network, called Hyra-GCN, for better inferring the user preference of the current session. |
X. Zheng; X. Liang; B. Wu; J. Feng; Y. Guo; S. Zhang; |
1263 | Interaction-Assisted Multi-Modal Representation Learning for Recommendation. Highlight: To this end, we propose Interaction-Assisted Multi-Modal Representation Learning for Recommendation (IRL) to inject the information of user interactions into item multi-modal representation learning. |
H. Wu; J. Wang; Z. Zu; |
1264 | Interference Leakage Minimization in RIS-Assisted MIMO Interference Channels. Highlight: We describe an iterative algorithm based on block coordinate descent to minimize the interference leakage (IL) cost function. |
I. Santamaria; M. Soleymani; E. Jorswieck; J. Gutiérrez; |
1265 | Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition. Highlight: When training data is lacking in ASR, a large-scale pre-training and fine-tuning framework is often sufficient to achieve high recognition rates; however, for electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement in recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. |
L. P. Violeta; D. Ma; W. -C. Huang; T. Toda; |
1266 | InterMPL: Momentum Pseudo-Labeling With Intermediate CTC Loss. Highlight: This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision. |
Y. Higuchi; T. Ogawa; T. Kobayashi; S. Watanabe; |
1267 | Internal Language Model Estimation Based Adaptive Language Model Fusion for Domain Adaptation. Highlight: In this paper, we propose an adaptive LM fusion approach called internal language model estimation based adaptive domain adaptation (ILME-ADA). |
R. Ma; X. Wu; J. Qiu; Y. Qin; H. Xu; P. Wu; Z. Ma; |
1268 | Interpolation Filter Model For Ramanujan Subspace Signals. Highlight: In this paper, we theoretically investigate an ideal interpolation filter model for Ramanujan subspace signals wherein an expander $\uparrow M$ is followed by the ideal $q$-th Ramanujan filter $C_q(e^{j\omega})$. |
P. Kulkarni; P. P. Vaidyanathan; |
1269 | Interpolation of Spatial Room Impulse Responses Using Partial Optimal Transport. Highlight: Instead of finding an exact map, we propose using partial optimal transport to find a coupling between reflections requiring neither estimation of the room geometry nor explicit knowledge of the source-receiver configuration. |
A. Geldert; N. Meyer-Kahlen; S. J. Schlecht; |
1270 | Interpretability in The Context of Sequential Cost-Sensitive Feature Acquisition. Highlight: Despite the popularity of complex machine learning models, domain experts often struggle to understand them and are reluctant to trust them because their predictions lack intuition and explanation. |
Y. W. Liyanage; D. Zois; |
1271 | Interpretable Multi-Scale Neural Network for Granger Causality Discovery. Highlight: Specifically, we propose a consistency-based thresholding algorithm for binary causal structure inference, an effect sign detection method to distinguish the positive causal effects from the negative ones, and a self-adaptive lag discovery algorithm to identify the lagged time points. |
C. Fan; Y. Wang; Y. Zhang; W. Ouyang; |
1272 | Interpretable Nonnegative Incoherent Deep Dictionary Learning for fMRI Data Analysis. Highlight: Current techniques still present several limitations; some ignore relevant aspects of brain function or lack interpretability. In an effort to overcome such limitations, we introduce an extension of the sparse matrix factorization approach to a multilinear decomposition. |
M. Morante; J. Østergaard; S. Theodoridis; |
1273 | Interpretable, Unrolled Deep Radar Beampattern Design. Highlight: We present a fast, learned, and – for the first time – interpretable (FLI) deep learning approach by unrolling a state-of-the-art iterative optimization approach. |
K. Metwaly; J. Kweon; K. Alhujaili; M. Greco; F. Gini; V. Monga; |
1274 | Interpretation of Neural Networks Is Susceptible to Universal Adversarial Perturbations. Highlight: In this paper, we show the existence of a Universal Perturbation for Interpretation (UPI) for standard image datasets, which can alter a gradient-based feature map of neural networks over a significant fraction of test samples. To design such a UPI, we propose a gradient-based optimization method as well as a principal component analysis (PCA)-based approach to compute a UPI which can effectively alter a neural network’s gradient-based interpretation on different samples. |
H. E. Oskouie; F. Farnia; |
1275 | Inter-Pulse Estimation for Sperm Whale Click Detection. Highlight: In this paper, we propose a new methodology for detecting and extracting features from the inter-pulse interval (IPI) of sperm whale clicks. |
G. Gubnitsky; R. Diamant; |
1276 | Inter-Scale SURE-LET Denoise with Structured Deep Image Prior: Interpretable Self-Supervised Learning. Highlight: This work proposes a novel image restoration technique inspired by Ulyanov’s deep image prior (DIP) method. |
J. Li; S. Muramatsu; |
1277 | Inter-SubNet: Speech Enhancement with Subband Interaction. Highlight: To this end, this paper introduces subband interaction as a new way to complement the subband model with global spectral information, such as cross-band dependencies and global spectral patterns, and proposes a new lightweight single-channel speech enhancement framework called Interactive Subband Network (Inter-SubNet). |
J. Chen; W. Rao; Z. Wang; J. Lin; Z. Wu; Y. Wang; S. Shang; H. Meng; |
1278 | Interweaved Graph and Attention Network for 3D Human Pose Estimation. Highlight: Despite substantial progress in 3D human pose estimation from a single-view image, prior works rarely explore global and local correlations, leading to insufficient learning of human skeleton representations. To address this issue, we propose a novel Interweaved Graph and Attention Network (IGANet) that allows bidirectional communication between graph convolutional networks (GCNs) and attention modules. |
T. Wang; H. Liu; R. Ding; W. Li; Y. You; X. Li; |
1279 | Int-GNN: A User Intention Aware Graph Neural Network for Session-Based Recommendation. Highlight: Beyond these, we take user preference, a biased user intention, into account in the prediction stage. Combining these, we propose a model named user Intention aware Graph Neural Network (Int-GNN) that aims to capture user intention. |
G. Xu; J. Yang; J. Guo; Z. Huang; B. Zhang; |
1280 | Introducing Topography in Convolutional Neural Networks. Highlight: Thus, in this work, inspired by the neuroscience literature, we propose a new topographic inductive bias in Convolutional Neural Networks (CNNs). |
M. Poli; E. Dupoux; R. Riad; |
1281 | Invariant Adversarial Imitation Learning From Visual Inputs. Highlight: In this paper, we propose an invariant model-based adversarial imitation learning (IMAIL) method to improve generalization. |
H. Zhang; Y. Tian; L. Yuan; Y. Lu; |
1282 | Inverse Quadratic Transform for Minimizing A Sum of Ratios. Highlight: We propose a novel technique called inverse quadratic transform for the sum-of-ratios minimization problem. |
Y. Chen; L. Zhao; Y. Zhang; K. Shen; |
1283 | Inverse Reinforcement Learning with Graph Neural Networks for IoT Resource Allocation. Highlight: We propose inverse reinforcement learning with graph neural networks (GNNIRL) to generate a new variable selection policy that closely matches the FSB variable selection. |
G. Wang; P. Cheng; Z. Chen; W. Xiang; B. Vucetic; Y. Li; |
1284 | Investigating Content-Aware Neural Text-to-Speech MOS Prediction Using Prosodic and Linguistic Features. Highlight: In modern high-quality neural TTS systems, prosodic appropriateness with regard to the spoken content is a decisive factor for speech naturalness. For this reason, we propose to include prosodic and linguistic features as additional inputs in MOS prediction systems, and evaluate their impact on the prediction outcome. |
A. Vioni; G. Maniati; N. Ellinas; J. Sig Sung; I. Hwang; A. Chalamandaris; P. Tsiakoulis; |
1285 | Investigating SINDy As A Tool for Causal Discovery in Time Series Signals. Highlight: The SINDy algorithm has been successfully used to identify the governing equations of dynamical systems from time series data. In this paper, we argue that this makes SINDy a potentially useful tool for causal discovery, and that existing tools for causal discovery can be used to dramatically improve the performance of SINDy as a tool for robust sparse modeling and system identification. |
A. O’Brien; R. Weber; E. Kim; |
1286 | Investigation Into Phone-Based Subword Units for Multilingual End-to-End Speech Recognition. Highlight: In this paper, we investigate the use of phone-based subwords, specifically Byte Pair Encoding (BPE), as modeling units for multilingual end-to-end speech recognition. |
S. Yusuyin; H. Huang; J. Liu; C. Liu; |
1287 | Inv-Senet: Invariant Self Expression Network for Clustering Under Biased Data. Highlight: To this end, we propose a novel framework for jointly removing confounding attributes while learning to cluster data points in individual subspaces. |
A. Singh; A. Singh; A. Masoomi; T. Imbiriba; E. Learned-Miller; D. Erdoğmuş; |
1288 | IoU-Aware Multi-Expert Cascade Network Via Dynamic Ensemble for Long-Tailed Object Detection. Highlight: We introduce a Multi-Expert Cascade (MEC) framework that readjusts the weight of each category in the training process via a multi-expert loss. |
W. -C. Fan; C. -Y. Hong; Y. -C. Hsu; T. -L. Liu; |
1289 | IQGAN: Robust Quantum Generative Adversarial Network for Image Synthesis On NISQ Devices. Highlight: In this work, we propose IQGAN, a quantum Generative Adversarial Network (GAN) framework for multi-qubit image synthesis that can be efficiently implemented on Noisy Intermediate Scale Quantum (NISQ) devices. |
C. Chu; G. Skipper; M. Swany; F. Chen; |
1290 | IR-ECG: Invertible Reconstruction of ECG. Highlight: In this paper, we present an invertible neural network, called IR-ECG (invertible reconstruction of ECG), to model the process of ECG reconstruction. |
P. Wang; X. Huang; L. Cui; |
1291 | I See What You Hear: A Vision-Inspired Method to Localize Words. Highlight: Since an audio signal can be interpreted as a 1-dimensional image, object localization techniques can be fundamentally useful for word localization. Building upon this idea, we propose a lightweight solution for word detection and localization. |
M. Samragh; A. Kundu; T. -Y. Hu; A. Chadha; A. Srivastava; M. Cho; O. Tuzel; D. Naik; |
1292 | iSmallNet: Densely Nested Network with Label Decoupling for Infrared Small Target Detection. Highlight: To this end, we propose iSmallNet, a multi-stream densely nested network with label decoupling for infrared small object detection. |
Z. Hu; Y. Wang; P. Li; J. Qin; H. Xie; M. Wei; |
1293 | Is Multi-Task Learning An Upper Bound for Continual Learning? Highlight: This paper introduces a novel continual self-supervised learning approach, where each task involves learning an invariant representation for a specific class of data augmentations. |
Z. Wu; H. Tran; H. Pirsiavash; S. Kolouri; |
1294 | Is Quality Enough? Integrating Energy Consumption in A Large-Scale Evaluation of Neural Audio Synthesis Models. Highlight: In this paper, we suggest relying on a multi-objective metric based on Pareto optimality, which gives equal weight to the accuracy and the energy consumption of a model. |
C. Douwes; G. Bindi; A. Caillon; P. Esling; J. -P. Briot; |
1295 | Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition. Highlight: We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR). |
A. Ogawa; T. Moriya; N. Kamo; N. Tawara; M. Delcroix; |
1296 | Iterative Water-Filling Power and Subcarrier Allocation for Multicarrier NOMA Downlink. Highlight: For the first time, we present closed-form water-filling formulae that quantify exactly the following effects of multiple access interference for optimal power allocation in the MC-NOMA downlink: (i) each user should treat the sum of interference from other stronger users plus the receiver Gaussian noise as an equivalent interference in each subcarrier; (ii) the sum of each user’s interference to other weaker users should be accounted for as factors that determine the distinct water levels in each subcarrier of each of the other weaker users. |
C. Choy Chai; X. -P. Zhang; |
1297 | ITER-SIS: Robust Unlimited Sampling Via Iterative Signal Sieving. Highlight: Depending on the severity of spectral leakage, Fourier domain algorithms may fail to reconstruct. To overcome this bottleneck, in this paper, we propose an Iterative Signal Sieving Algorithm (ITER-SIS) that solely operates in the time domain. |
R. Guo; A. Bhandari; |
1298 | I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning. Highlight: Recent studies focus on scaling up the model size and the amount of training data, which significantly increases the cost of model training. In contrast to these heavy-cost models, we introduce a lightweight image captioning framework (I-Tuning), which contains a small number of trainable parameters. |
Z. Luo; Z. Hu; Y. Xi; R. Zhang; J. Ma; |
1299 | JaCappella Corpus: A Japanese A Cappella Vocal Ensemble Corpus. Highlight: We construct a corpus of Japanese a cappella vocal ensembles (jaCappella corpus) for vocal ensemble separation and synthesis. |
T. Nakamura; S. Takamichi; N. Tanji; S. Fukayama; H. Saruwatari; |
1300 | Jamming Source Localization Using Augmented Physics-Based Model. Highlight: We propose to augment the pathloss model with a data-driven component, able to explain the complexities of the propagation channel. |
A. Nardin; T. Imbiriba; P. Closas; |
1301 | Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research. Highlight: This paper introduces the jazznet dataset, a dataset of fundamental jazz piano music patterns for developing machine learning (ML) algorithms in music information retrieval (MIR). |
T. Adegbija; |
1302 | Jeffreys Divergence-Based Regularization of Neural Network Output Distribution Applied to Speaker Recognition. Highlight: This objective function provides highly discriminative features. Beyond this effect, we propose a theoretical justification of its effectiveness and try to understand how this loss function affects the model, in particular its impact on different dataset types (i.e., in-domain or out-of-domain with respect to the training corpus). |
P. -M. Bousquet; M. Rouvier; |
1303 | JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition. Highlight: We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method to inject large-scale unpaired text into ILM during E2E training which improves rare-word speech recognition. |
Z. Meng; W. Wang; R. Prabhavalkar; T. N. Sainath; T. Chen; E. Variani; Y. Zhang; B. Li; A. Rosenberg; B. Ramabhadran; |
1304 | JNDMix: JND-Based Data Augmentation for No-Reference Image Quality Assessment. Highlight: Additionally, although only a few data augmentation methods are available for the NR-IQA task, their ability to enrich dataset diversity is still insufficient. To address these issues, we propose an effective and general data augmentation method based on just noticeable difference (JND) noise mixing for the NR-IQA task, named JNDMix. |
J. Sheng; J. Fan; P. Ye; J. Cao; |
1305 | Joint Angle and Respiration Estimation for Passive and Device-Free Respiration Monitoring. Highlight: Moreover, prior studies used a phase calibration technique that is restricted to Orthogonal Frequency Division Multiplexing (OFDM) systems such as WiFi. In this contribution, we propose a completely new approach for such applications: as supported, e.g., by recent versions of the Bluetooth standard, we propose using an antenna array at the receiver for sensing purposes. |
G. Maus; D. Brückmann; |
1306 | Joint ANN-SNN Co-training for Object Localization and Image Segmentation. Highlight: Spiking neural networks (SNNs) have recently emerged as a low-power alternative to ANNs due to their sparse nature. In this work, we propose a novel hybrid ANN-SNN co-training framework to improve the performance of converted SNNs. |
M. Baltes; N. Abuhajar; Y. Yue; C. D. Smith; J. Liu; |
1307 | Joint Antenna Selection and Beamforming in Integrated Automotive Radar Sensing-Communications with Quantized Double Phase Shifters. Highlight: We propose a novel, low-cost, low-power-consumption, and low-computation approach for designing a beam that can simultaneously reach the radar target of interest and the desired communication destination. |
L. Xu; S. Sun; Y. D. Zhang; A. Petropulu; |
1308 | Joint Channel and Direction Estimation for Ground-to-UAV Communications Enabled By A Simultaneous Reflecting and Sensing RIS. Highlight: In this paper, we focus on HRIS-enabled Unmanned Aerial Vehicle (UAV) networks and design the HRIS parameters (phase profile, reception combining, and the power splitting between the two functionalities) for jointly estimating the individual UAV-HRIS and HRIS-base-station channels as well as the Angle of Arrival (AoA) of the Line-of-Sight (LoS) component of the UAV-HRIS channel. |
J. He; A. Fakhreddine; G. C. Alexandropoulos; |
1309 | Joint Compression and Demosaicking For Satellite Images. Abstract: Image sensors used in real camera systems are equipped with colour filter arrays which sample the light rays in different spectral bands. Each colour channel can thus be obtained … |
P. Bacchus; R. Fraisse; A. Roumy; C. Guillemot; |
1310 | Joint Cryo-ET Alignment and Reconstruction with Neural Deformation Fields. Highlight: We propose a framework to jointly determine the deformation parameters and reconstruct the unknown volume in electron cryotomography (cryo-ET). |
V. Debarnot; S. Gupta; K. Kothari; I. Dokmanić; |
1311 | Joint Data Association, NLOS Mitigation, and Clutter Suppression for Networked Device-Free Sensing in 6G Cellular Network. Highlight: Inspired by the success of cooperative communication in cloud radio access network, this paper considers a networked device-free sensing architecture based on base station (BS) cooperation to transform the cellular network into a huge sensor that can provide ubiquitous sensing services. |
Q. Shi; L. Liu; S. Zhang; |
1312 | Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition. Highlight: We propose a novel domain adaptation framework for the E2E model that uses only text from the target domain. |
H. Shao; T. Tan; W. Wang; X. Gong; Y. Qian; |
1313 | Joint Estimation of Clustered User Activity and Correlated Channels with Unknown Covariance in mMTC. Highlight: We derive a computationally-efficient algorithm based on alternating direction method of multipliers (ADMM) to solve the MAP problem iteratively via a sequence of closed-form updates. |
H. Djelouat; M. Leinonen; M. Juntti; |
1314 | Joint Estimation of DOA and Distance in Noisy Reverberant Conditions. Highlight: In this work, we propose a novel method to jointly estimate the DOA and distance in noisy and reverberant environments. |
S. Bu; T. Zhao; Y. Zhao; |
1315 | Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection. Highlight: In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD). |
X. -M. Zeng; Y. Song; Z. Zhuo; Y. Zhou; Y. -H. Li; H. Xue; L. -R. Dai; I. McLoughlin; |
1316 | Joint Human Orientation-Activity Recognition Using WiFi Signals for Human-Machine Interaction. Highlight: In this paper, we propose a data collection setup and machine learning models for joint human orientation and activity recognition using WiFi signals from a single access point (AP) or multiple APs. |
H. Salehinejad; N. Hasanzadeh; R. Djogo; S. Valaee; |
1317 | Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos. Highlight: However, almost all existing TSLV approaches suffer from the same limitations: (1) they only focus on either frame-level or object-level visual representation learning and corresponding correlation reasoning, but fail to integrate both; (2) they neglect to leverage rich semantic contexts to further benefit query reasoning. To address these issues, in this paper, we propose a novel Hierarchical Visual- and Semantic-Aware Reasoning Network (HVSARN), which enables both visual- and semantic-aware query reasoning from object-level to frame-level. |
D. Liu; P. Zhou; |
1318 | Joint Microstrip Selection and Beamforming Design for mmWave Systems with Dynamic Metasurface Antennas. Highlight: In this paper, we study downlink millimeter wave (mmWave) DMA systems with a limited number of radio frequency (RF) chains. |
W. Huang; H. Zhang; N. Shlezinger; Y. C. Eldar; |
1319 | Joint Millimeter-Wave AoD and AoA Estimation Using One OFDM Symbol and Frequency-Dependent Beams. Highlight: In this work, we develop an algorithm for joint AoD and AoA estimation using only one Orthogonal Frequency Division Multiplexing (OFDM) symbol and frequency-dependent beams that can be synthesized by fully digital and true-time-delay arrays. |
V. Boljanovic; D. Cabric; |
1320 | Joint Modeling for ASR Correction and Dialog State Tracking. Highlight: In this paper, we propose a multi-task method that performs dialog state tracking (DST) jointly with ASR correction to improve the performance of both tasks. |
D. Wang; T. Zhang; C. Yuan; X. Wang; |
1321 | Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. Highlight: However, separate models are used for each SLU task, leading to an increase in inference time and computation cost. Motivated by this, we ask: can we jointly model all the SLU tasks while incorporating context to facilitate low-latency and lightweight inference? |
S. Arora; H. Futami; E. Tsunoo; B. Yan; S. Watanabe; |
1322 | Joint Multi-Level Feature Network for Lightweight Person Re-Identification. Highlight: Although existing methods have made significant progress, utilizing multi-level information to obtain fine-grained features has not been explored in this field. To alleviate this issue, we propose a lightweight person Re-ID method named Joint Multi-Level Feature Network (JMLFNet) to obtain robust feature representation for the Re-ID task. |
Y. Zhang; W. Kang; Y. Liu; P. Zhu; |
1323 | Joint Neural Representation for Multiple Light Fields. Highlight: However, exploiting the features shared between different scenes remains an understudied problem. We provide a step towards this end by presenting a method for sharing the representation between thousands of light fields, splitting the representation into a part that is shared by all light fields and a part that varies individually from one light field to another. |
G. L. Guludec; C. Guillemot; |
1324 | Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement. Highlight: Although SE methods can suppress the noise contained in the speaker’s voice, they cannot deal with the noise that is physically present on the listener side. To address such a complicated but common scenario, we investigate a deep learning-based joint framework integrating noise reduction (NR) with listening enhancement (LE), in which the NR module first suppresses noise and the LE module then modifies the denoised speech, i.e., the output of the NR module, to further improve speech intelligibility. |
H. Li; Y. Liu; J. Yamagishi; |
1325 | Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation. Highlight: However, direct S2ST suffers from data scarcity because corpora pairing speech in the source language with speech in the target language are very rare. To address this issue, we propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks. |
K. Wei; L. Zhou; Z. Zhang; L. Chen; S. Liu; L. He; J. Li; F. Wei; |
1326 | Joint Robust Representation And Generalization Enhancement For Cross-Modality Person Re-Identification. Highlight: Most existing methods ignore data bias due to different cameras and views and overlook the strong dependence between feature maps that hinders modal alignment. In this paper, we propose a unified method named Joint Robust Representation and Generalization Enhancement (RRGE) to alleviate the above issues. |
H. Cheng; Y. Feng; M. Zhou; X. Xiong; Y. Wang; B. Qiang; |
1327 | Joint Symbol-Level Precoding and Sub-Block-Level RIS Design for Dual-Function Radar-Communications. Highlight: In this paper, we consider the RIS-aided dual-function radar-communication (DFRC) system and investigate the benefit of increasing the RIS updating frequency. |
L. Wu; B. Wang; Z. Cheng; B. S. M. R; B. Ottersten; |
1328 | Joint Training and Decoding for Multilingual End-to-End Simultaneous Speech Translation. Highlight: Recent studies on end-to-end speech translation (ST) have facilitated the exploration of multilingual end-to-end ST and end-to-end simultaneous ST. In this paper, we investigate end-to-end simultaneous speech translation in a one-to-many multilingual setting, which is closer to real-world applications. |
W. Huang; R. Jin; W. Zhang; J. Luan; B. Wang; D. Xiong; |
1329 | Joint Training of Hierarchical GANs and Semantic Segmentation for Expression Translation. Highlight: As a result, the solution becomes suboptimal and introduces unwanted artifacts. |
R. Bodur; B. Bhattarai; T. -K. Kim; |
1330 | Joint Unmixing And Demosaicing Methods For Snapshot Spectral Images. Highlight: In this paper, we introduce the problem of joint demosaicing and unmixing for hyperspectral images acquired by snapshot spectral imaging (SSI) cameras, and we formulate it as a low-rank matrix factorization and completion problem. |
K. Abbas; M. Puigt; G. Delmaire; G. Roussel; |
1331 | Joint Unsupervised and Supervised Learning for Context-Aware Language Identification. Highlight: However, additional text labels are needed to train the model to recognize speech, and acquiring them is costly. To overcome this problem, we propose context-aware language identification using a combination of unsupervised and supervised learning without any text labels. |
J. Park; H. Y. Kim; J. Park; B. -Y. Kim; S. Choi; Y. Lim; |
1332 | Joint Waveform and Passive Beamformer Design in Multi-IRS-Aided Radar. Highlight: To deal with the NP-hard nature of both UQPs, we propose unimodular waveform and beamforming design for multi-IRS radar (UBeR) algorithm that takes advantage of the low-cost power method-like iterations. |
Z. Esmaeilbeig; A. Eamaz; K. V. Mishra; M. Soltanalian; |
1333 | JPEG Pleno Call for Proposals Responses Quality Assessment. Highlight: In this paper, the quality evaluation of the responses to the Call for Proposals (CfP) of JPEG Pleno Point Cloud Coding is presented. |
J. Prazeres; Z. Luo; A. M. G. Pinheiro; L. A. da Silva Cruz; S. Perry; |
1334 | JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models. Highlight: This paper proposes a variational autoencoder (VAE)-based method for voice conversion (VC) on arbitrary source-target speaker pairs without parallel corpora, i.e., non-parallel any-to-any VC. |
S. Seki; H. Kameoka; K. Tanaka; T. Kaneko; |
1335 | K2NN: Self-Supervised Learning with Hierarchical Nearest Neighbors for Remote Sensing. Highlight: In this work, we develop a novel pixel-level task that leverages an ensemble of nearest neighbors from multiple images to explore diverse objects in each image, especially for remote sensing data. |
J. Yuan; Y. Xu; Z. Wang; |
1336 | KalmanBOT: KalmanNet-Aided Bollinger Bands for Pairs Trading. Highlight: In this work, we propose KalmanNet Bollinger Trading (KalmanBOT), a data-aided policy that preserves the advantages of KF-aided BB policies while leveraging data to overcome the approximated nature of the SS model. |
H. Deng; G. Revach; H. Morgenstern; N. Shlezinger; |
1337 | KEPS-NET: Robust Parking Slot Detection Based Keypoint Estimation for High Localization Accuracy. Highlight: In this paper, we study the parking slot detection problem for automated parking. |
J. Lee; K. Sung; D. Park; Y. Jeon; |
1338 | Kernel Estimation and Deconvolution for Blind Image Super-Resolution. Highlight: In this paper, we propose a more accurate kernel estimation module (KEM) and a deconvolution module (DM). |
J. Gong; H. Gao; J. Chao; Z. Zhou; Z. Yang; Z. Zeng; |
1339 | Kernel Interpolation of Acoustic Transfer Functions with Adaptive Kernel for Directed and Residual Reverberations. Highlight: An interpolation method for region-to-region acoustic transfer functions (ATFs) based on kernel ridge regression with an adaptive kernel is proposed. |
J. G. C. Ribeiro; S. Koyama; H. Saruwatari; |
1340 | Kernel Ridge Regression for Generalized Graph Signal Processing. Highlight: To perform non-linear filtering and regression under the GGSP framework, we formulate an operator-valued kernel ridge regression (KRR) filtering approach. |
X. Jian; W. P. Tay; |
1341 | Keyword-Specific Acoustic Model Pruning for Open-Vocabulary Keyword Spotting. Highlight: In this paper, we design a dynamic acoustic model with input-dependent parameters. |
Y. Yang; K. Zhang; Z. Wu; H. Meng; |
1342 | KG-ECO: Knowledge Graph Enhanced Entity Correction For Query Rewriting. Highlight: In this work, we propose KG-ECO: Knowledge Graph enhanced Entity COrrection for query rewriting, an entity correction system with corrupt entity span detection and entity retrieval/re-ranking functionalities. To boost the model performance, we incorporate a knowledge graph (KG) to provide entity structural information (neighboring entities encoded by graph neural networks) and textual information (KG entity descriptions encoded by RoBERTa). |
J. Cai; M. Li; Z. Jiang; E. Cho; Z. Chen; Y. Liu; X. Fan; C. Guo; |
1343 | Knowledge-Augmented Frame Semantic Parsing with Hybrid Prompt-Tuning. Highlight: We propose a novel Knowledge-Augmented Frame Semantic Parsing Architecture (KAF-SPA) to enhance semantic representation by incorporating accurate frame knowledge into pre-trained language models (PLMs) during frame semantic parsing. |
R. Zhang; Y. Sun; J. Yang; W. Peng; |
1344 | Knowledge-Aware Bayesian Co-Attention for Multimodal Emotion Recognition. Highlight: However, most existing models that are based on attention mechanisms have difficulty in learning emotionally relevant parts on their own. To solve this problem, we propose to incorporate external emotion-related knowledge in the co-attention based fusion of pre-trained models. |
Z. Zhao; Y. Wang; Y. Wang; |
1345 | Knowledge-Aware Few Shot Learning for Event Detection from Short Texts. Highlight: To address these problems, we propose a knowledge-aware event detector that incorporates external knowledge to detect events from only a few examples. |
J. Guo; Z. Huang; G. Xu; B. Zhang; C. Duan; |
1346 | Knowledge-Aware Graph Convolutional Network with Utterance-Specific Window Search for Emotion Recognition In Conversations. Highlight: In this paper, we propose a knowledge-aware graph convolutional network (KGCN-ERC) by introducing a knowledge graph into node connection of graph neural networks for the first time. |
X. Zhang; P. He; H. Liu; Z. Yin; X. Liu; X. Zhang; |
1347 | Knowledge Distillation with Active Exploration and Self-Attention Based Inter-Class Variation Transfer for Image Segmentation. Highlight: This paper proposes a novel approach with active exploration and passive transfer (AEPT) and self-attention-based inter-class feature variation (AIFV) distillation for the cardiac image segmentation task. |
Y. Zhang; S. Li; X. Yang; |
1348 | Knowledge-Graph Augmented Music Representation for Genre Classification. Highlight: In this paper, we propose KGenre, a knowledge-embedded music representation learning framework for improved genre classification. |
H. Ding; W. Song; C. Zhao; F. Wang; G. Wang; W. Xi; J. Zhao; |
1349 | Knowledge Transfer for On-Device Speech Emotion Recognition With Neural Structured Learning. Highlight: We propose a neural structured learning (NSL) framework that builds synthesized graphs. |
Y. Chang; Z. Ren; T. T. Nguyen; K. Qian; B. W. Schuller; |
1350 | LABANet: Lead-Assisting Backbone Attention Network for Oral Multi-Pathology Segmentation. Highlight: This paper presents a Lead-Assisting Backbone Attention Network (LABANet), which is able to perform multi-pathology instance segmentation of dental panoramic X-rays. |
H. Chen; X. Huang; Q. Li; J. Wang; B. Fang; J. Chen; |
1351 | Label-Efficient and Robust Learning from Multiple Experts. Highlight: Learning from multiple sources of labeled data presents unique challenges, including the cost of using many annotators at training, test, and production time, and their limited reliability. In this paper, we analyze the problem of label-efficient learning and propose a method for training a classification system from data labeled by multiple annotators, where only a small subset of them is chosen adaptively to reduce later communication effort and the effect of malicious annotators, especially in the validation and test phase. |
B. Kolosnjaji; A. Zarras; |
1352 | Label-Guided Contrastive Learning for Out-of-Domain Detection. Highlight: However, the SSCL methods require complex data augmentation and are vulnerable to intrinsic false-negative pairs. To address the issues above and leverage both types of CL, we propose a novel Label-Guided Contrastive Learning (LGCL) framework. |
S. Zhang; T. Li; J. Bai; Z. Li; |
1353 | Large Covariance Matrix Estimation with Oracle Statistical Rate. Highlight: In this paper, to eliminate the estimation bias, we propose to estimate large sparse covariance matrices using a non-convex penalty. |
Q. Wei; Z. Zhao; |
1354 | Large Dimensional Analysis of LS-SVM Transfer Learning: Application to PolSAR Classification. Highlight: This article analyzes a kernel-based transfer learning method under a k-class Gaussian mixture model for the input data. |
C. Doz; C. Ren; J. -P. Ovarlez; R. Couillet; |
1355 | Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation. Highlight: In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. |
Y. Wu; K. Chen; T. Zhang; Y. Hui; T. Berg-Kirkpatrick; S. Dubnov; |
1356 | Large-Scale Language Model Rescoring on Long-Form Data. Highlight: In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. |
T. Chen; C. Allauzen; Y. Huang; D. Park; D. Rybach; W. R. Huang; R. Cabrera; K. Audhkhasi; B. Ramabhadran; P. J. Moreno; M. Riley; |
1357 | Large-Scale Nonverbal Vocalization Detection Using Transformers. Highlight: In this paper, we present, to the best of our knowledge, the first nonverbal vocalization detection model trained to detect as many as 67 types of emotional vocalizations. |
P. Tzirakis; A. Baird; J. Brooks; C. Gagne; L. Kim; M. Opara; C. Gregory; J. Metrick; G. Boseck; V. Tiruvadi; B. Schuller; D. Keltner; A. Cowen; |
1358 | Laryngeal Leukoplakia Classification Via Dense Multiscale Feature Extraction in White Light Endoscopy Images. Highlight: The objective of this paper is to classify laryngeal leukoplakia in white light endoscopy images into six classes: normal tissues, inflammatory keratosis, mild dysplasia, moderate dysplasia, severe dysplasia and squamous cell carcinoma. |
Z. You; Y. Yan; Z. Shi; M. Zhao; J. Yan; H. Liu; X. Hei; X. Ren; |
1359 | Lasso-Based Fast Residual Recovery For Modulo Sampling. Highlight: In this paper, we propose a fast and robust algorithm for unfolding. |
S. B. Shah; S. Mulleti; Y. C. Eldar; |
1360 | LAST: Scalable Lattice-Based Speech Modelling in JAX. Highlight: We introduce LAST, a LAttice-based Speech Transducer library in JAX. |
K. Wu; E. Variani; T. Bagby; M. Riley; |
1361 | Latent Iterative Refinement for Modular Source Separation. Highlight: Traditional source separation approaches train deep neural network models end-to-end with all the data available at once by minimizing the empirical risk on the whole training set. |
D. Bralios; E. Tzinis; G. Wichern; P. Smaragdis; J. Le Roux; |
1362 | Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers. Highlight: In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. |
Z. Yang; W. Zhou; R. Schlüter; H. Ney; |
1363 | LA-VocE: Low-SNR Audio-Visual Speech Enhancement Using Neural Vocoders. Highlight: Despite recent advances in speech synthesis, most audio-visual approaches continue to use spectral mapping/masking to reproduce the clean audio, often resulting in visual backbones added to existing speech enhancement architectures. In this work, we propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture, and then converts them into waveform audio using a neural vocoder (HiFi-GAN). |
R. Mira; B. Xu; J. Donley; A. Kumar; S. Petridis; V. K. Ithapu; M. Pantic; |
1364 | LDTSF: A Label-Decoupling Teacher-Student Framework for Semi-Supervised Echocardiography Segmentation. Highlight: To fully leverage the easily accessible unlabeled data, we propose a label-decoupling teacher-student framework (LDTSF) based on semi-supervised learning. |
J. Zhang; Y. Wang; Z. Pan; Z. Tang; L. Chen; J. Liu; |
1365 | LeanSpeech: The Microsoft Lightweight Speech Synthesis System for LIMMITS Challenge 2023. Highlight: We propose a lightweight encoder-decoder acoustic model composed of 1-D convolution and LSTM blocks, which is trained with knowledge distillation from a multi-speaker multi-lingual teacher model, DelightfulTTS [1]. |
C. Zhang; S. Bansal; A. Lakhera; J. Li; G. Wang; S. Satpal; S. Zhao; L. He; |
1366 | LEAPT: Learning Adaptive Prefix-to-Prefix Translation For Simultaneous Machine Translation. Highlight: Inspired by strategies utilized by human interpreters and wait policies, we propose a novel adaptive prefix-to-prefix training policy called LEAPT, which allows our machine translation model to learn how to translate source sentence prefixes and make use of the future context. |
L. Lin; S. Li; X. Shi; |
1367 | Learnable Flow Model Conditioned on Graph Representation Memory for Anomaly Detection. Highlight: Although invertible flow models have been developed to accomplish unsupervised anomaly detection, they are usually hard to train and have a limited capability to accurately model the distribution of normal samples. To address this problem, we propose a novel enhanced flow model conditioned on graph representation memory (FlowGRM) for visual surface defect detection. |
Z. Zhu; W. Liu; Z. Deng; |
1368 | Learnable Frontends That Do Not Learn: Quantifying Sensitivity To Filterbank Initialisation. Highlight: In this study, we focus specifically on learnable filterbanks. |
M. Anderson; T. Kinnunen; N. Harte; |
1369 | Learned Generative Misspecified Lower Bound. Highlight: In this paper, we present a method for approximating the misspecified Cramér-Rao bound (MCRB) using a generative model, referred to as a Generative Misspecified Lower Bound (GMLB), in which we train a generative model on data from the true measurement distribution. |
H. V. Habi; H. Messer; Y. Bresler; |
1370 | Learned Kalman Filtering in Latent Space with High-Dimensional Data. Highlight: In this work, we tackle the challenges associated with tracking from high-dimensional measurements by jointly learning the KF along with the latent space mapping. |
I. Buchnik; D. Steger; G. Revach; R. J. G. van Sloun; T. Routtenberg; N. Shlezinger; |
1371 | Learned Video Coding with Motion Compensation Mixture Model. Highlight: To avoid the problem of carrying false edges/details caused by inaccurate optical flow in the predicted frame to the residual, we propose a dynamic mixture of explicit and implicit motion compensations, where implicitness means that the encoding and decoding of the original frame are conditioned on the predicted frame in pixel and latent domains, respectively. |
K. Q. Dinh; K. Pyo Choi; |
1372 | Learning 3D Human Pose and Shape Estimation Using Uncertainty-Aware Body Part Segmentation. Highlight: Moreover, noisy pixels introduced by inaccurate segmentation annotations also prevent the model from improving the reconstruction performance further. To address these problems, we propose a novel generalizable framework called Uncertainty-aware body Part Segmentation (UPS), which penalizes different body parts with data uncertainty estimation. |
Z. Wang; H. Yu; X. Zhu; Z. Li; C. Chen; L. Song; |
1373 | Learning ASR Pathways: A Sparse Multilingual ASR Model. Highlight: However, in multilingual ASR, language-agnostic pruning may lead to severe performance drops on some languages because language-agnostic pruning masks may not fit all languages and may discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks (pathways), such that the parameters for each language are learned explicitly. |
M. Yang; A. Tjandra; C. Liu; D. Zhang; D. Le; O. Kalinli; |
1374 | Learning Audio-Visual Dereverberation. Highlight: Our idea is to learn to dereverberate speech from audio-visual observations. |
C. Chen; W. Sun; D. Harwath; K. Grauman; |
1375 | Learning A Weight Map for Weakly-Supervised Localization. Highlight: We propose employing an image classifier f and training a generative network g that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image. |
T. Shaharabany; L. Wolf; |
1376 | Learning Causal Representations for Generalizable Face Anti Spoofing. Highlight: In this paper, we propose a novel method that learns Causal Representations for Face Anti-Spoofing (CRFAS). |
G. Zheng; Y. Liu; W. Dai; C. Li; J. Zou; H. Xiong; |
1377 | Learning Cross-Lingual Visual Speech Representations. Highlight: In this work, we study cross-lingual self-supervised visual representation learning. |
A. Zinonos; A. Haliassos; P. Ma; S. Petridis; M. Pantic; |
1378 | Learning Cross-Modal Audiovisual Representations with Ladder Networks for Emotion Recognition. Highlight: Focusing on emotion recognition, this study proposes novel cross-modal ladder networks to capture modality-specific information while building strong cross-modal representations. |
L. Goncalves; C. Busso; |
1379 | Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models. Highlight: In this work, we assume Markovian dependencies among latent variables, and propose to learn speech representations with neural hidden Markov models. |
S. -L. Yeh; H. Tang; |
1380 | Learning Dynamic Graphs Under Partial Observability. Highlight: This work examines the problem of learning a network graph from signals emitted by the network nodes, according to a diffusion model ruled by a Laplacian combination policy. |
M. Cirillo; V. Matta; A. H. Sayed; |
1381 | Learning Environmental Structure Using Acoustic Probes with A Deep Neural Network. Highlight: Furthermore, many algorithms in the literature also require a correct assignment of echoes to boundaries, which is combinatorially hard. To evade these limitations, we develop a convolutional neural network method for robust 2D boundary estimation, given known emitter and receiver locations. |
T. Arikan; A. Weiss; H. Vishnu; G. Deane; A. Singer; G. Wornell; |
1382 | Learning Expressive And Generalizable Motion Features For Face Forgery Detection. Highlight: To this end, we propose an effective sequence-based forgery detection framework based on an existing video classification method. |
J. Zhang; P. Zhang; J. Wang; D. Xie; S. Pu; |
1383 | Learning From Label Proportion with Online Pseudo-Label Decision By Regret Minimization. Highlight: This paper proposes a novel and efficient method for Learning from Label Proportions (LLP), whose goal is to train a classifier only by using the class label proportions of instance sets, called bags. |
S. Matsuo; R. Bise; S. Uchida; D. Suehiro; |
1384 | Learning From Positive and Unlabeled Data Using Observer-GAN. Highlight: By including an additional classifier in the GAN architecture, we describe a novel GAN-based approach. |
O. Zamzam; H. Akrami; R. Leahy; |
1385 | Learning From Single-Expert Annotated Labels for Automatic Sleep Staging. Highlight: In this work, we treat labels mislabeled by a single expert as noisy labels and first propose SE-ASS, an automatic sleep staging learning framework based on single-expert annotated data. |
Z. Luan; Y. Ren; L. Peng; X. Chen; X. Yang; W. Tu; Y. Yang; |
1386 | Learning from The Raw Domain: Cross Modality Distillation for Compressed Video Action Recognition. Highlight: In order to solve the problem, we propose a cross-modality knowledge distillation method to force the compressed-domain (CD) model to learn the knowledge from the raw-domain (RD) model. |
Y. Liu; J. Cao; W. Bai; B. Li; W. Hu; |
1387 | Learning From Yourself: A Self-Distillation Method For Fake Speech Detection. Highlight: In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. |
J. Xue; C. Fan; J. Yi; C. Wang; Z. Wen; D. Zhang; Z. Lv; |
1388 | Learning Generalizable Light Field Networks from Few Images. Highlight: We explore a new strategy for few-shot novel view synthesis based on a neural light field representation. |
Q. Li; F. Multon; A. Boukhayma; |
1389 | Learning Gradients of Convex Functions with Monotone Gradient Networks. Highlight: In this work, we propose C-MGN and M-MGN, two monotone gradient neural network architectures for directly learning the gradients of convex functions. |
S. Chaudhari; S. Pranav; J. M. F. Moura; |
1390 | Learning Graph Laplacian from Intrinsic Patterns Via Gaussian Process. Highlight: In this paper, we present a novel scheme, named Laplacian constrained Gaussian process (LCGP), to learn a graph topology. |
K. Watanabe; K. Maeda; T. Ogawa; M. Haseyama; |
1391 | Learning How to Learn Domain-Invariant Parameters for Domain Generalization. Highlight: Motivated by the insight that only some parameters of DNNs are optimized to extract domain-invariant representations, we expect a general model that is capable of perceiving these domain-invariant parameters well and updating them with emphasis. In this paper, we propose two modules, Domain Decoupling and Combination (DDC) and Domain-invariance-guided Backpropagation (DIGB), which can encourage such a general model to focus on the parameters that have a unified optimization direction between pairs of contrastive samples. |
F. Hou; Y. Zhang; Y. Liu; J. Yuan; C. Zhong; Y. Zhang; Z. Shi; J. Fan; Z. He; |
1392 | Learning Hybrid Representations of Semantics and Distortion for Blind Image Quality Assessment. Highlight: Therefore, it is virtually infeasible to learn the representations of image semantics and distortion by co-guiding with semantic and distortion labels. To address this issue, we propose a dual-perception network (DPNet) via an end-to-end multi-task learning method, where knowledge distillation is leveraged as a semantic label-free strategy. |
X. Wang; J. Xiong; B. Li; J. Suo; H. Gao; |
1393 | Learning Hypergraphs From Signals With Dual Smoothness Prior. Highlight: In this paper, for the first challenge, we adopt the assumption that the ideal hypergraph structure can be derived from a learnable graph structure that captures the pairwise relations within signals. |
B. Tang; S. Chen; X. Dong; |
1394 | Learning Interpretable Filters In Wav-UNet For Speech Enhancement. Highlight: For instance, the internal filters used in each layer are chosen in an ad hoc manner, with only a loose relation to the nature of the processed signal. We propose in this paper an approach to learn interpretable filters within a specific neural architecture, which allows us to better understand the behaviour of the neural network and to reduce its complexity. |
F. Mathieu; T. Courtat; G. Richard; G. Peeters; |
1395 | Learning on Entropy Coded Images with CNN. Highlight: We propose an empirical study to see whether learning with convolutional neural networks (CNNs) on entropy coded data is possible. |
R. Piau; T. Maugey; A. Roumy; |
1396 | Learning on Graphs Under Label Noise. Highlight: Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. |
J. Yuan; X. Luo; Y. Qin; Y. Zhao; W. Ju; M. Zhang; |
1397 | Learning Properties of Holomorphic Neural Networks of Dual Variables. Highlight: In this paper, we make a step in an unusual direction: we propose to use neural networks based on dual numbers. |
D. Kozlov; M. Bakulin; S. Pavlov; A. Zuev; M. Krylova; I. Kharchikov; |
1398 | Learning Quantum Entanglement Distillation With Noisy Classical Communications. Highlight: In this paper, we study the case in which communication takes place over noisy binary symmetric channels. |
H. H. Suthan Chittoor; O. Simeone; |
1399 | Learning Robust Self-Attention Features for Speech Emotion Recognition with Label-Adaptive Mixup. Highlight: We propose a self-attention based method with combined use of label-adaptive mixup and center loss. |
L. Kang; L. Zhang; D. Jiang; |
1400 | Learning Scene Flow from 3D Point Clouds with Cross-Transformer and Global Motion Cues. Highlight: In this paper, we introduce a cross-transformer to capture more reliable dependencies for point pairs. |
M. Zhai; K. Ni; J. Xie; H. Gao; |
1401 | Learning Silhouettes with Group Sparse Autoencoders. Highlight: Contemporary deep learning architectures have been used to model neural activity, inspired by signal processing algorithms; however sparse coding architectures are not able to explain the higher-order categorization that has been empirically observed at the neural level. In this work, we propose a novel model-based architecture, termed group-sparse autoencoder, that produces sparse activity patterns in line with neural modeling, but showcases a higher-level order in its activation maps. |
E. Theodosis; D. Ba; |
1402 | Learning Sparse Alignments Via Optimal Transport for Cross-Domain Fake News Detection. Highlight: Previous methods focus on excavating distinguishable features from news contents in a single domain with deep models, which are difficult to generalize to other domains. To solve this problem, News Optimal Transport (NOT) is proposed to learn transferable features across domains by aligning the source and target news using Optimal Transport (OT) techniques. |
W. Tang; Z. Ma; H. Sun; J. Wang; |
1403 | Learning Sparse Auto-Encoders for Green AI Image Coding. Highlight: We design an algorithm and test it on three constraints: the classical ℓ1 constraint, the ℓ1,∞ constraint, and the new ℓ1,1 constraint. |
C. Gille; F. Guyard; M. Antonini; M. Barlaud; |
1404 | Learning Speech Representations with Flexible Hidden Feature Dimensions. Highlight: This paper proposes a new voice conversion framework that uses only one encoder to obtain timbre and content information by partitioning the latent space in the channel dimension. |
H. Tang; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
1405 | Learning Supervised Covariation Projection Through General Covariance. Highlight: In this paper, we address the preceding problem and propose two novel CCA approaches in a supervised manner by using a general covariance metric. |
X. Bao; Y. -H. Yuan; Y. Li; J. Qiang; Y. Zhu; |
1406 | Learning Task-Aligned Mask Query for Instance Segmentation. Highlight: In this work, we propose a novel instance segmentation method, termed AlignMask, which effectively learns task-aligned mask queries for instance segmentation end-to-end. |
B. Fu; H. He; P. Wei; J. Chen; |
1407 | Learning to Auto-Correct for High-Quality Spectrograms. Highlight: In this work, we propose a new architecture ReAct-Speech to explicitly learn to auto-correct for high-quality spectrograms. |
Z. Zhou; S. Liu; |
1408 | Learning to Balance The Global Coherence and Informativeness in Knowledge-Grounded Dialogue Generation. Highlight: Previous work mainly focuses on retrieving diverse knowledge to assist response generation, which results in rough dialogue transitions. To alleviate this issue, we propose a History-Adapted Knowledge Copy (HAKC) network to adaptively select context-aware knowledge to ensure the coherence of dialogue. |
C. Niu; Y. Hu; W. Peng; Y. Xie; |
1409 | Learning to Build Reasoning Chains By Reliable Path Retrieval. Highlight: In this work, we propose a ReliAble Path-retrieval (RAP) method to generate variable-length evidence chains iteratively. |
M. Zhu; Y. Weng; S. He; C. Wang; K. Liu; L. Cai; J. Zhao; |
1410 | Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations. Highlight: We specifically address the task of few shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pretraining suitable representations, and methods which transfer them to our few shot learning scenario. |
V. Kowtha; M. Espi Marques; J. Huang; Y. Zhang; C. Avendano; |
1411 | Learning to Explain: A Gradient-based Attribution Method for Interpreting Super-Resolution Networks. Highlight: In this paper, we propose a gradient-based attribution method, L2X (Learning to eXplain), to provide post-hoc visualization and interpretation for SR models by quantifying the attribution of individual features with regard to the SR output and generating a heatmap in pixel/input space. |
A. Yu; Y. -B. Yang; |
1412 | Learning To Generate 3D Representations of Building Roofs Using Single-View Aerial Imagery. Highlight: We present a novel pipeline for learning the conditional distribution of a building roof mesh given pixels from an aerial image, under the assumption that roof geometry follows a set of regular patterns. |
M. Khomiakov; A. V. Mahou; A. R. Sánchez; J. Frellsen; M. R. Andersen; |
1413 | Learning to Locate The Text Forgery in Smartphone Screenshots. Highlight: In this paper, we present the Screenshot Text Forgery Dataset (STFD), which is the first public dataset for the smartphone screenshot text forgery localization task. |
Z. Yu; B. Li; Y. Lin; J. Zeng; J. Zeng; |
1414 | Learning To Locate Visual Answer In Video Corpus Using Question Highlight: In this paper, we propose a cross-modal contrastive global-span (CCGS) method for the VCVAL, jointly training the video corpus retrieval and visual answer localization subtasks with the global-span matrix. |
B. Li; Y. Weng; B. Sun; S. Li; |
1415 | Learning to Personalize Equalization for High-Fidelity Spatial Audio Reproduction Highlight: We introduce an approach to assess the level of personalization achieved and benchmark the improvements delivered by the proposed algorithm relative to a generic solution. |
A. Gupta; P. F. Hoffmann; S. Prepelitǎ; P. Robinson; V. K. Ithapu; D. L. Alon; |
1416 | Learning to Reconnect Interrupted Trajectories for Weakly Supervised Multi-Object Tracking Highlight: To effectively reconnect the interrupted trajectories caused by noisy pseudo labels, we propose a novel weakly supervised MOT method based on a Trajectory-Reconnecting Transformer (TRTMOT). |
Y. -L. Li; Y. Lu; J. Li; H. Wang; |
1417 | Learning To Regularized Resource Allocation with Budget Constraints Highlight: Our goal is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer without violating resource budget constraints. |
S. Fang; Q. Liu; L. Xu; W. Wu; |
1418 | Learning Unbiased Rewards with Mutual Information in Adversarial Imitation Learning Highlight: Thus, we theoretically analyze the bias in the AIL reward function and find that balancing the performance of a generator and a discriminator is not necessary when we recover an unbiased reward function. |
L. Zhang; Q. Liu; Z. Huang; L. Wu; |
1419 | Learning with Multigraph Convolutional Filters Highlight: In this paper, we introduce a convolutional architecture to perform learning when information is supported on multigraphs. |
L. Butler; A. Parada-Mayorga; A. Ribeiro; |
1420 | Learnt Mutual Feature Compression for Machine Vision Highlight: Unfortunately, the existing ICM methods separately compress features at each scale, neglecting the redundancy across multi-scale features. To address this issue, this paper proposes an end-to-end mutual compression framework for the ICM, such that the compression efficiency can be significantly improved by removing the cross-scale redundancy. |
T. Liu; M. Xu; S. Li; C. Chen; L. Yang; Z. Lv; |
1421 | Learn Topological Representation with Flexible Manifold Layer Highlight: However, the widely used softmax decision layer, originating from the conventional softmax regression for multi-class classification, fails to guide the powerful feature extractor to explore the topological structure hidden in data, which limits the quality of produced representations. Therefore, in this paper we propose a flexible manifold layer for better representation learning, rather than adding regularization losses to introduce extra mechanisms. |
Z. Jiao; H. Zhang; X. Li; |
1422 | LED: Label Correlation Enhanced Decoder for Multi-Label Text Classification Highlight: However, existing methods tend to take up extra input length and ignore the significance of taxonomic hierarchy. For this reason, we introduce a label correlation enhanced decoder (LED) for multi-label text classification. |
K. Ma; Z. Huang; X. Deng; J. Guo; W. Qiu; |
1423 | LE-DTA: Local Extrema Convolution for Drug Target Affinity Prediction Highlight: In this work, we develop a new graph-based prediction model, termed LE-DTA, that utilizes local extrema convolutions for effective feature extraction. |
T. Langore; T. -C. Hsu; Y. -H. Hsieh; C. Lin; |
1424 | Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR Highlight: Though sparse, we show that the Lego-Features are powerful when tested with RNN-T or LAS decoders, maintaining high-quality downstream performance. |
R. Botros; R. Prabhavalkar; J. Schalkwyk; C. Chelba; T. N. Sainath; F. Beaufays; |
1425 | Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types Highlight: To this end, we propose a Unified Acoustic Detector (UAD) for FTM when multiple invocation options are available on device. |
O. Rudovic; W. Chang; V. Garg; P. Dighe; P. Simha; J. Berkowitz; A. H. Abdelaziz; S. Kajarekar; E. Marchi; S. Adya; |
1426 | Level-Line Guided Edge Drawing for Robust Line Segment Detection Highlight: Based on the observation that line segments should lie on edge points with both consistent coordinates and consistent level-line information, i.e., the unit vector perpendicular to the gradient orientation, this paper proposes a level-line guided edge drawing method for robust line segment detection (GEDRLSD). |
X. Lin; Y. Zhou; Y. Liu; C. Zhu; |
1427 | Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-Channel Speech Enhancement Highlight: In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. |
K. -L. Chen; D. D. E. Wong; K. Tan; B. Xu; A. Kumar; V. K. Ithapu; |
1428 | Leveraging Label Correlations in A Multi-Label Setting: A Case Study in Emotion Highlight: In this work, we investigate ways to exploit label correlations in multi-label emotion recognition models to improve emotion detection. |
G. Chochlakis; G. Mahajan; S. Baruah; K. Burghardt; K. Lerman; S. Narayanan; |
1429 | Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning Highlight: In this paper, we propose novel cross-lingual self-supervised speech representation learning methods that explicitly consider language information. |
T. Tanaka; R. Masumura; M. Ihori; H. Sato; T. Yamane; T. Ashihara; K. Matsuura; T. Moriya; |
1430 | Leveraging Large Text Corpora For End-To-End Speech Summarization Highlight: In this paper, we present two novel methods that leverage a large amount of external text summarization data for E2E SSum training. |
K. Matsuura; T. Ashihara; T. Moriya; T. Tanaka; A. Ogawa; M. Delcroix; R. Masumura; |
1431 | Leveraging Multiple Sources in Automatic African American English Dialect Detection for Adults and Children Highlight: We show robust, explainable performance across recording conditions for different features for adult speech, but fusing multiple features is important for good results on children’s speech. |
A. Johnson; V. M. Shetty; M. Ostendorf; A. Alwan; |
1432 | Leveraging Neural Koopman Operators to Learn Continuous Representations of Dynamical Systems from Scarce Data Highlight: Here, we propose a new deep Koopman framework that represents dynamics in an intrinsically continuous way, leading to better performance on limited training data, as exemplified on several datasets arising from dynamical systems. |
A. Frion; L. Drumetz; M. D. Mura; G. Tochon; A. Aïssa-El-Bey; |
1433 | Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring Highlight: Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or concatenation of reference phone embedding and actual pronunciation of the target phone as the phone-level pronunciation quality representation. In this paper, we propose to use linguistic-acoustic similarity to explicitly measure the deviation of non-native production from its native reference for pronunciation assessment. |
W. Liu; K. Fu; X. Tian; S. Shi; W. Li; Z. Ma; T. Lee; |
1434 | Leveraging Positional-Related Local-Global Dependency for Synthetic Speech Detection Highlight: To this end, we propose the Rawformer that leverages positional-related local-global dependency for synthetic speech detection. |
X. Liu; M. Liu; L. Wang; K. A. Lee; H. Zhang; J. Dang; |
1435 | Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer’s Disease Detection Highlight: This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features. |
J. Li; K. Song; J. Li; B. Zheng; D. Li; X. Wu; X. Liu; H. Meng; |
1436 | Leveraging Sparsity with Spiking Recurrent Neural Networks for Energy-Efficient Keyword Spotting Highlight: In this work, we compare the trade-off between accuracy and energy-efficiency of a gated recurrent SNN (Spik-GRU) with a standard Gated Recurrent Unit (GRU) on the Google Speech Command Dataset (GSCD) v2. |
M. Dampfhoffer; T. Mesquida; E. Hardy; A. Valentian; L. Anghel; |
1437 | Lexicon-injected Semantic Parsing for Task-Oriented Dialog Abstract: Recently, semantic parsing using hierarchical representations for dialog systems has captured substantial attention. Task-Oriented Parse (TOP), a tree representation with intents … |
X. Meng; W. Dai; Y. Wang; B. Wang; Z. Wu; X. Jiang; Q. Liu; |
1438 | LGVIT: Local-Global Vision Transformer for Breast Cancer Histopathological Image Classification Highlight: However, due to their limited receptive field, CNNs have difficulty learning the global information of breast cancer histopathological images, hindering further improvement on this task. To solve this problem, we apply the self-attention mechanism to this task and propose a new network called Local-Global Vision Transformer (LGViT), which utilizes CNNs to capture local features and self-attention to learn global features of histopathological images. |
L. Wang; J. Liu; P. Jiang; D. Cao; B. Pang; |
1439 | Light Field Compression Via Compact Neural Scene Representation Highlight: In this paper, we propose a novel light field compression method based on a low rank-constrained neural scene representation. |
J. Shi; C. Guillemot; |
1440 | LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech Highlight: In this work, we present LightGrad, a lightweight DPM for TTS. |
J. Chen; X. Song; Z. Peng; B. Zhang; F. Pan; Z. Wu; |
1441 | Light Projection-Based Physical-World Vanishing Attack Against Car Detection Highlight: In this paper, we propose a stealthy physical adversarial attack by taking advantage of the transient nature of light projection. |
H. Wen; S. Chang; L. Zhou; |
1442 | Lightvessel: Exploring Lightweight Coronary Artery Vessel Segmentation Via Similarity Knowledge Distillation Highlight: However, it is difficult to deploy complicated models in clinical scenarios since high-performance approaches have excessive parameters and high computation costs. To tackle this problem, we propose LightVessel, a Similarity Knowledge Distillation Framework, for lightweight coronary artery vessel segmentation. |
H. Dang; Y. Zhang; X. Qi; W. Zhou; M. Sun; |
1443 | Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform Highlight: We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform. |
M. Kawamura; Y. Shirahata; R. Yamamoto; K. Tachibana; |
1444 | Lightweight Annotation and Class Weight Training for Automatic Estimation of Alarm Audibility in Noise Highlight: We thus explore class weighting to train a model that allows for a more robust decision threshold selection, ensuring a low false positive rate. |
F. Effa; R. Serizel; J. -P. Arz; N. Grimault; |
1445 | Light-Weight CNN-Attention Based Architecture for Hand Gesture Recognition Via Electromyography Highlight: In this work, we propose a light-weight hybrid architecture (HDCAM) based on Convolutional Neural Network (CNN) and attention mechanism to effectively extract local and global representations of the input. |
S. Zabihi; E. Rahimian; A. Asif; A. Mohammadi; |
1446 | Lightweight Feature Encoder for Wake-Up Word Detection Based on Self-Supervised Speech Representation Highlight: In this work, we propose LiteFEW, a lightweight feature encoder for wake-up word detection that preserves the inherent ability of wav2vec 2.0 at a minimal scale. |
H. Lim; Y. Kim; K. Yeom; E. Seo; H. Lee; S. J. Choi; H. Lee; |
1447 | Lightweight Fisher Vector Transfer Learning for Video Deduplication Highlight: In this work, we develop a lightweight and robust deduplication feature based on the Fisher vector aggregation of Scale-Invariant Feature Transform (SIFT) keypoints. |
C. Henry; R. Liao; R. Lin; Z. Zhang; H. Sun; Z. Li; |
1448 | Lightweight Machine Learning for Seizure Detection on Wearable Devices Highlight: In the context of the ICASSP 2023 Seizure Detection Challenge, we propose a lightweight machine-learning framework for real-time epilepsy monitoring on wearable devices. |
B. Huang; A. Abtahi; A. Aminifar; |
1449 | Lightweight Portrait Segmentation Via Edge-Optimized Attention Highlight: However, current portrait segmentation architectures cannot meet the requirements of being lightweight and edge-friendly. To overcome this limitation, we build an architecture with 0.06G FLOPs and 0.02M parameters. |
X. Zhang; G. Wang; L. Yang; C. Chen; |
1450 | Lightweight Prosody-TTS for Multi-Lingual Multi-Speaker Scenario Highlight: This work presents a lightweight end-to-end text-to-speech (TTS) synthesis for the multi-lingual multi-speaker (ML-MS) scenario. |
G. Pamisetty; S. C. Varun; K. S. R. Murty; |
1451 | Light-Weight Sequential SBL Algorithm: An Alternative to OMP Highlight: We present a Light-Weight Sequential Sparse Bayesian Learning (LWS-SBL) algorithm as an alternative to the orthogonal matching pursuit (OMP) algorithm for the general sparse signal recovery problem. |
R. R. Pote; B. D. Rao; |
1452 | LIMI-VC: A Light Weight Voice Conversion Model with Mutual Information Disentanglement Highlight: In this paper, we propose LIMI-VC, which reduces the redundancy between the linguistic content and the timbre information with mutual information disentanglement. |
L. Huang; T. Yuan; Y. Liang; Z. Chen; C. Wen; Y. Xie; J. Zhang; D. Ke; |
1453 | Linear Microphone Array Parallel to The Driving Direction for In-Car Speech Enhancement Highlight: This paper proposes a linear microphone array parallel to the driving direction for in-car speech enhancement. |
M. Tsujikawa; A. Sugiyama; K. Hanazawa; Y. Kajikawa; |
1454 | Line Segment Matching Based on Intersection-Enhanced Point Correspondences Highlight: However, each existing method only relies on one specific kind of feature points to establish the correspondences, and could yield inferior performance when insufficient high-quality point correspondences are supplied. To address this fundamental problem, a novel line segment matching method is proposed. |
Z. Liu; B. Zhong; |
1455 | LINK: Linguistic Steganalysis Framework with External Knowledge Highlight: However, even with the most advanced linguistic steganography methods, the random and uncontrollable message bits can cause steganographic texts to express content that contradicts common-sense knowledge. To exploit this weakness of linguistic steganography, we propose LINK, a novel Linguistic steganalysis framework with the help of external Knowledge. |
J. Yang; Z. Yang; X. Ge; J. Zou; Y. Gao; Y. Huang; |
1456 | LiNuIQA: Lightweight No-Reference Image Quality Assessment Based on Non-Uniform Weighting Highlight: In this paper, we propose an NR-IQA network named Lightweight Non-uniform Weighting-based NR-IQA (LiNuIQA) that adopts an efficient network as a feature extractor for resource-constrained environments and harnesses non-uniformly self-weighted local (from each patch) and global (from all patches) information to overcome the low performance that inherently stems from using a lightweight feature extractor. |
W. -H. Kim; C. -H. Hahm; A. Baijal; N. Kim; I. Cho; J. Koo; |
1457 | Lip-to-Speech Synthesis in The Wild with Multi-Task Learning Highlight: Distinct from the previous methods, in this paper, we develop a powerful Lip2Speech method that can reconstruct speech with correct contents from the input lip movements, even in a wild environment. |
M. Kim; J. Hong; Y. M. Ro; |
1458 | LiQuiD-MIMO Radar: Distributed MIMO Radar with Low-Bit Quantization Highlight: This work proposes a low-bit quantized distributed MIMO (LiQuiD-MIMO) radar to significantly reduce the burden of signal acquisition and data transmission. |
Y. Xiang; F. Xi; S. Chen; |
1459 | LiteG2P: A Fast, Light and High Accuracy Model for Grapheme-to-Phoneme Conversion Highlight: In this paper, we integrate the advantages of both expert knowledge and connectionist temporal classification (CTC) based neural network and propose a novel method named LiteG2P which is fast, light and theoretically parallel. |
C. Wang; P. Huang; Y. Zou; H. Zhang; S. Liu; X. Yin; Z. Ma; |
1460 | Lit The Darkness: Three-Stage Zero-Shot Learning for Low-Light Enhancement with Multi-Neighbor Enhancement Factors Highlight: Also, it is challenging to enhance images and adapt to different illumination conditions. To address this problem, we introduce a zero-shot learning approach. |
M. Saeed; M. Torki; |
1461 | Liveness Score-Based Regression Neural Networks for Face Anti-Spoofing Highlight: In this paper, we propose a liveness score-based regression network for overcoming the dependency on third-party networks and users. |
Y. Kwak; M. Jung; H. Yoo; J. Shin; C. Kim; |
1462 | LMBAO: A Landmark Map for Bundle Adjustment Odometry in LiDAR SLAM Highlight: Furthermore, as an effective joint optimization mechanism, bundle adjustment (BA) cannot be directly introduced into odometry due to the intensive computation of global landmarks. Therefore, this paper designs a landmark map for bundle adjustment odometry (LMBAO) in LiDAR SLAM. |
L. Zhang; J. Wang; L. Jie; N. Chen; X. Tan; Z. Duan; |
1463 | LMCodec: A Low Bitrate Speech Codec with Causal Transformer Models Highlight: We introduce LMCodec, a causal neural speech codec that provides high quality audio at very low bitrates. |
T. Jenrungrot; M. Chinen; W. B. Kleijn; J. Skoglund; Z. Borsos; N. Zeghidour; M. Tagliasacchi; |
1464 | Locale Encoding for Scalable Multilingual Keyword Spotting Models Highlight: Conventional monolingual KWS approaches do not scale well to multilingual scenarios because of high development/maintenance costs and lack of resource sharing. To overcome this limit, we propose two locale-conditioned universal models with locale feature concatenation and feature-wise linear modulation (FiLM). |
P. Zhu; H. J. Park; A. Park; A. S. Scarpati; I. Lopez Moreno; |
1465 | Local Feature Enhanced Adversarial Network for The Blind Image Quality Assessment Highlight: Although existing BIQA methods based on convolutional neural networks have made significant progress in evaluating synthetic distortion, they still do not extend well to authentic distortion and algorithm-related distortion. Therefore, this paper proposes a BIQA adversarial network with local feature enhancement to deal with this challenge. |
X. Shi; M. Zhang; S. Xia; R. Zhang; J. Feng; |
1466 | Local-Global Progressive U-Transformers for Accurate Hepatic and Portal Veins Segmentation in Abdominal MR Images Highlight: This work presents a new deep learning method called local-global progressive U-Transformers for precise extraction of hepatic and portal veins. |
Y. Wu; D. Shen; J. Jin; G. Xu; Y. Chen; X. Luo; |
1467 | Local-Global Siamese Network with Efficient Inter-Scale Feature Learning for Change Detection in VHR Remote Sensing Images Highlight: Second, these networks often have a large number of parameters and high computational costs due to complex network architecture. To address the above issues, we propose a local-global siamese network (LGS-Net) for CD in VHR RS images. |
Y. Zhang; T. Lei; S. Han; Y. Xu; A. K. Nandi; |
1468 | Local Graph-Homomorphic Processing for Privatized Distributed Systems Highlight: We propose a method that we refer to as local graph-homomorphic processing; it relies on the construction of particular noises over the edges to ensure a certain level of differential privacy. |
E. Rizk; S. Vlaski; A. H. Sayed; |
1469 | Locality Preserving Multiview Graph Hashing For Large Scale Remote Sensing Image Search Highlight: This article proposes a multiview hashing with learnable parameters to retrieve the queried images for a large-scale remote sensing dataset. |
W. Li; G. Zhong; X. Lu; C. -M. Pun; |
1470 | Local to Global Prior Learning for Blind Unsupervised Image Super Resolution Highlight: This study proposes a cooperative local to global prior learning (LoGPT) framework for blind unsupervised image super resolution by jointly modeling the local connectivity with convolution operations and global context with transformer block. |
K. Yamawaki; X. -H. Han; |
1471 | Log-Can: Local-Global Class-Aware Network For Semantic Segmentation of Remote Sensing Images Highlight: We present LoG-CAN, a multi-scale semantic segmentation network for remote sensing images with a global class-aware (GCA) module and local class-aware (LCA) modules. |
X. Ma; M. Ma; C. Hu; Z. Song; Z. Zhao; T. Feng; W. Zhang; |
1472 | Logo-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition Highlight: Transformer-based methods for DFER can achieve better performances but result in higher FLOPs and computational costs. |
F. Ma; B. Sun; S. Li; |
1473 | Logovit: Local-Global Vision Transformer for Object Re-Identification Highlight: This paper introduces a local-global vision transformer (LoGoViT) for object re-identification by learning a hierarchical-level representation from fine-grained (local) to general (global) context features. |
N. Phan; T. D. Huy; S. T. M. Duong; N. T. Hoang; S. Tran; D. H. Hung; C. D. T. Nguyen; T. Bui; S. Q. H. Truong; |
1474 | LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer Highlight: We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor and then embeds token-level long-form features inside the vocabulary predictor, with a pre-trained contextual encoder RoBERTa to further boost the performance. |
X. Gong; Y. Wu; J. Li; S. Liu; R. Zhao; X. Chen; Y. Qian; |
1475 | Long-Memory Message-Passing for Spatially Coupled Systems Highlight: This paper addresses the reconstruction of sparse signals from spatially coupled, linear, and noisy measurements. |
K. Takeuchi; |
1476 | Long Range Imaging Using Multispectral Fusion of RGB and NIR Images Highlight: In this paper, we propose long-range imaging using multispectral fusion of RGB and NIR images. |
H. Zhang; L. Mei; C. Jung; |
1477 | Long-Short Attention Network For The Spectral Super-Resolution Of Multispectral Images Highlight: To learn the global relationships among all bands and the correlations between adjacent bands simultaneously, this paper proposes a long-short attention network (LSA-Net) for the spectral super-resolution of MS images. |
K. Zhang; T. Jin; F. Zhang; J. Sun; |
1478 | Longshortnet: Exploring Temporal and Semantic Features Fusion In Streaming Perception Highlight: However, current methods for streaming perception are limited as they rely only on the current and adjacent two frames to learn movement patterns, which restricts their ability to model complex scenes, often leading to poor detection results. To address this limitation, we propose LongShortNet, a novel dual-path network that captures long-term temporal motion and integrates it with short-term spatial semantics for real-time perception. |
C. Li; Z. -Q. Cheng; J. -Y. He; P. Li; B. Luo; H. Chen; Y. Geng; J. -P. Lan; X. Xie; |
1479 | Long-Tailed Image Recognition with Dynamic Re-Weighting Highlight: However, existing re-weighting methods typically adopt a static weighting scheme, which usually hurts the accuracy of head categories. To deal with this issue, this paper proposes a progress-relevant weighting scheme called dynamic re-weighting, in which the weight assigned to a particular category first increases and then decreases, proportional to the number of samples that have been used in that category. |
X. Li; Y. Wang; J. Kato; |
1480 | Long-Tailed Recognition with Causal Invariant Transformation Highlight: In this paper, a comprehensive structural causal model is developed to excavate the intrinsic causal mechanism between data and labels. |
Y. Zhang; S. Shi; C. Fan; Y. Wang; W. Ouyang; W. Fan; J. Fan; |
1481 | Long-Term Synchronization of Wireless Acoustic Sensor Networks with Nonpersistent Acoustic Activity Using Coherence State Highlight: While metrics of spatio-temporal sensor utility are key to successful aggregation of sensor nodes, for instance, to perform sound localization or beamforming, the same is true for waveform-based assessment and compensation of sampling-rate offset (SRO). This paper therefore proposes an acoustic coherence state (ACS) metric to support systems for SRO estimation to integrate estimations of various utility due to nonpersistent acoustic activity and geometrical diversity. |
A. Chinaev; N. Knaepper; G. Enzner; |
1482 | Look and Think: Intrinsic Unification of Self-Attention and Convolution for Spatial-Channel Specificity Highlight: In this work, we consider their intrinsic properties in spatial and channel domains for vision representation. |
X. Gao; H. Lin; Y. Li; R. Fang; X. Zhang; |
1483 | Loss Function Design for DNN-Based Sound Event Localization and Detection on Low-Resource Realistic Data Highlight: We evaluate our proposed methods on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Task 3 dataset, and the results demonstrate consistent improvements in SELD performance. |
Q. Wang; J. Du; Z. Nian; S. Niu; L. Chai; H. Wu; J. Pan; C. -H. Lee; |
1484 | Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation Highlight: In this paper, we present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation. |
N. Bhandari; P. -Y. Chen; |
1485 | Low-Bitrate Redundancy Coding of Speech Using A Rate-Distortion-Optimized Variational Autoencoder Highlight: We propose a neural speech coder specifically optimized to transmit a large amount of overlapping redundancy at a very low bitrate, up to 50x redundancy using less than 32 kb/s. |
J. -M. Valin; J. Büthe; A. Mustafa; |
1486 | Low-Complexity Acoustic Echo Cancellation with Neural Kalman Filtering. Highlight: In this paper, we propose neural Kalman filtering (NKF), which uses neural networks to implicitly model the covariance of the state noise and observation noise and to output the Kalman gain in real time. | D. Yang; F. Jiang; W. Wu; X. Fang; M. Cao |
1487 | Low-Dose CT Reconstruction Via Optimization-Inspired GAN. Highlight: In this paper, a Proximal Linear ADMM framework-based Generative Adversarial Network (PLA-GAN) is proposed. | J. Jiang; Y. Feng; H. Xu; J. Zheng |
1488 | Low in Resolution, High in Precision: UAV Detection with Super-Resolution and Motion Information Extraction. Highlight: In addition, to detect small UAVs in video streams, the motion information of the target is also a noteworthy feature. In this regard, we propose a feature super-resolution-based UAV detector with a motion information extractor. | H. Wang; X. Wang; C. Zhou; W. Meng; Z. Shi |
1489 | Low-Latency Electrolaryngeal Speech Enhancement Based on FastSpeech2-Based Voice Conversion and Self-Supervised Speech Representation. Highlight: In this paper, we propose a low-latency sequence-to-sequence speech enhancement technique for electrolaryngeal (EL) speech. | K. Kobayashi; T. Hayashi; T. Toda |
1490 | Low Precision Representations for High Dimensional Models. Highlight: The large memory footprint of high-dimensional models requires quantization to a lower precision for deployment on resource-constrained edge devices. With this motivation, we consider the problems of learning (i) a linear regressor and (ii) a linear classifier from a given training dataset, and quantizing the learned model parameters subject to a pre-specified bit budget. | R. Saha; M. Pilanci; A. J. Goldsmith |
1491 | Low-Rank Constrained Memory Autoencoder for Hyperspectral Anomaly Detection. Highlight: The process of constructing the dictionary in these methods is complex, and the stability of the models is hard to maintain. To address these problems, in this paper, we propose a low-rank constrained memory autoencoder (LRMAE) for hyperspectral anomaly detection (HAD). | Y. Lian; Y. Zhang; X. Feng; X. Jiang; Z. Cai |
1492 | Low-Rank Plus Sparse Trajectory Decomposition for Direct Exoplanet Imaging. Highlight: We propose a direct imaging method for the detection of exoplanets based on a combined low-rank plus structured sparse model. | S. Vary; H. Daglayan; L. Jacques; P. -A. Absil |
1493 | Low-Rank Tensor Decompositions for Quaternion Multiway Arrays. Highlight: After reviewing the theoretical difficulties related to quaternion tensor algebra, we propose the first construction of quaternion tensors as representations of dedicated quaternion multilinear forms. | O. Imhogiemhe; J. Flamant; X. Luciani; Y. Zniyed; S. Miron |
1494 | Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming. Highlight: In this work, we introduce a novel method for leveraging pre-trained speech models for low-resource music classification based on the concept of Neural Model Reprogramming (NMR). | Y. -N. Hung; C. -H. H. Yang; P. -Y. Chen; A. Lerch |
1495 | LP-IOANet: Efficient High Resolution Document Shadow Removal. Highlight: Assuming that the majority of practical document shadow removal scenarios require real-time, accurate models that can produce high-resolution outputs in the wild, we propose Laplacian Pyramid with Input/Output Attention Network (LP-IOANet), a novel pipeline with a lightweight architecture and an upsampling module. | K. Georgiadis; M. K. Yucel; E. Skartados; V. Dimaridou; A. Drosou; A. Saà-Garriga; B. Manganelli |
1496 | LQGNet: Hybrid Model-Based and Data-Driven Linear Quadratic Stochastic Control. Highlight: Here, we present LQGNet, a stochastic controller that leverages data to operate under partially known dynamics. | S. G. Casspi; O. Hüsser; G. Revach; N. Shlezinger |
1497 | LSSED: A Robust Segmentation Network for Inflamed Appendix from CT Images. Highlight: These properties require high robustness and generalization capability of inflamed appendix segmentation networks. In this paper, we propose a CNN-Transformer-based encoder-decoder segmentation network (LSSED) equipped with a localized stochastic sensitivity (LSS) loss function and residual dilated paths (RD-Paths) to solve the above problems. | W. W. Y. Ng; P. Zheng; T. Wang; J. Zhang; Y. Liang; H. Zhou; D. Liang; G. Li; X. Wei |
1498 | LSTM-Based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls. Highlight: However, they do not consider temporal distortions (e.g., frame freezes or skips) that occur during videoconferencing calls. In this paper, we present a data-driven approach for modeling such distortions automatically by training an LSTM with subjective quality ratings labeled via crowdsourcing. | G. Mittag; B. Naderi; V. Gopal; R. Cutler |
1499 | Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered By Reconfigurable Intelligent Surfaces. Highlight: In this paper, we propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge, in the context of 6G networks endowed with reconfigurable intelligent surfaces (RISs). | K. Stylianopoulos; M. Merluzzi; P. Di Lorenzo; G. C. Alexandropoulos |
1500 | M22: Rate-Distortion Inspired Gradient Compression. Highlight: This paper proposes M22, a rate-distortion inspired approach to model update compression for distributed training of deep neural networks (DNNs). | Y. Liu; S. Salehkalaibar; S. Rini; J. Chen |
1501 | M2-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis. Highlight: Hence, we propose M2-CTTS, an end-to-end multi-scale multi-modal conversational text-to-speech system, aiming to comprehensively utilize historical conversation and enhance prosodic expression. | J. Xue; Y. Deng; F. Wang; Y. Li; Y. Gao; J. Tao; J. Sun; J. Liang |
1502 | M2TSR: Multi-Range and Mix-Grained Transformer for Single Image Super-Resolution. Highlight: However, existing methods typically calculate MSA in a single range and granularity, preventing the model from capturing sufficient relationships between pixels, thus leading to inferior representation ability. To address this issue, we propose Multi-range and Mix-grained Transformer (M2TSR) for accurate image SR. | Z. -H. Niu; Q. -L. Zhang; Y. Fan; Y. -B. Yang |
1503 | M3ST: Mix at Three Levels for Speech Translation. Highlight: In this paper, we propose Mix at three levels for Speech Translation (M3ST) method to increase the diversity of the augmented training corpus. | X. Cheng; Q. Dong; F. Yue; T. Ko; M. Wang; Y. Zou |
1504 | MABNet: Master Assistant Buddy Network With Hybrid Learning for Image Retrieval. Highlight: We present a novel Master Assistant Buddy Network (MAB-Net) for image retrieval which incorporates both learning mechanisms. | R. Agarwal; G. Das; S. Aggarwal; A. Horsch; D. K. Prasad |
1505 | Machine Learning-Aided Piece-Wise Modeling Technique of Power Amplifier for Digital Predistortion. Highlight: We propose a new machine learning (ML)-aided piece-wise (PW) behavioral modeling approach for power amplifiers (PAs) to characterize and compensate for the signal-quality-degrading effects induced by a PA. | S. S. Krishna Chaitanya Bulusu; N. Tervo; P. Susarla; M. J. Sillanpää; O. Silvén; M. Juntti; A. Pärssinen |
1506 | Machine Learning Based Early Debris Detection Using Automotive Low Level Radar Data. Highlight: In this paper, we propose a radar-based stationary object detection system that combines signal processing techniques with machine learning to detect stationary in-path objects from the low-level spectra of front-looking radars. | K. Tyagi; S. Zhang; Y. Zhang; J. Kirkwood; S. Song; N. Manukian |
1507 | MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition. Highlight: In this paper, we propose a novel unsupervised domain adaptation (UDA) approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which simultaneously improves model transferability through fine-grained inter-domain matching and discriminability through intra-domain contrastive discrimination. | J. Zhou; S. Zhao; N. Jiang; G. Zhao; Y. Qin |
1508 | MAID: A Conditional Diffusion Model for Long Music Audio Inpainting. Highlight: However, the information about these segments may differ significantly from the original. To solve this problem, we propose MAID (Music Audio Inpainting DDPM), a model for music audio inpainting based on DDPM (Denoising Diffusion Probabilistic Model). | K. Liu; W. Gan; C. Yuan |
1509 | Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation. Highlight: We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. | T. K. Lam; S. Schamoni; S. Riezler |
1510 | Make Your Enemy Your Friend: Improving Image Rotation Angle Estimation with Harmonics. Highlight: This paper revisits the harmonics caused by rotation in the cyclic spectrum and points out that such harmonics can help our rotation angle estimation if used effectively. Based on this observation, we propose aggregating the magnitudes of the candidate peaks and their harmonics in detecting the rotation-specific peak. | K. Yu; M. D. M. Hosseini; A. Peng; H. Zeng; M. Goljan |
1511 | Making Synchrosqueezing Locally Adaptive in The Time-Frequency Plane. Highlight: In this work, we explore the problem of making the synchrosqueezing transform adaptive. | M. A. Colominas; S. Meignen |
1512 | Managing Information Updating with Edge Computing: A Distributed and Learning Approach. Highlight: In this paper, we investigate information updating scheduling with multiple users in MEC. | J. He; D. Zhang; S. Liu; Y. Zhou; Y. Zhang |
1513 | Margin-Mixup: A Method for Robust Speaker Verification In Multi-Speaker Audio. Highlight: In this paper, we demonstrate that current speaker verification systems are not robust against audio with noticeable speaker overlap. | J. Thienpondt; N. Madhu; K. Demuynck |
1514 | MarginNCE: Robust Sound Localization with A Negative Margin. Highlight: The goal of this work is to localize sound sources in visual scenes with a self-supervised approach. | S. Park; A. Senocak; J. S. Chung |
1515 | MaskDUL: Data Uncertainty Learning in Masked Face Recognition. Highlight: Moreover, Data Uncertainty Learning (DUL) fails to achieve reasonable performance in masked face recognition (MFR). Therefore, we propose a novel two-stream convolutional network, masked face data uncertainty learning (MaskDUL), that solves the problems by sampling uncertainty and intra-class distribution learning in MFR. | L. Zhang; W. Xiong; K. Zhao; K. Chen; M. Zhong |
1516 | MASKED-AP: Attention Pyramid Convolutional Neural Network with Mask for Cervical Cell Classification. Highlight: The difference between classes is small while the difference within a class is large, so it is difficult to capture the discriminative features between different classes of cells for classification. To address this problem, this paper proposes an attention pyramid model (Masked-AP) for cervical cell classification. | Y. Jin; J. Liu; H. Chen; W. Duan; D. Cao; B. Pang |
1517 | Masked Autoencoders Are Articulatory Learners. Highlight: In this work, we present a deep learning based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. | A. A. Attia; C. Y. Espy-Wilson |
1518 | Masked Modeling Duo: Learning Representations By Encouraging Both Networks to Model The Input. Highlight: We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches. | D. Niizumi; D. Takeuchi; Y. Ohishi; N. Harada; K. Kashino |
1519 | Masked Spectrogram Prediction for Self-Supervised Audio Pre-Training. Highlight: In this paper, we present a novel self-supervised learning method for transformer-based audio models, called masked spectrogram prediction (MaskSpec), to learn powerful audio representations from unlabeled audio data (AudioSet used in this paper). | D. Chong; H. Wang; P. Zhou; Q. Zeng |
1520 | Masked Token Similarity Transfer for Compressing Transformer-Based ASR Models. Highlight: However, setting the embedding dimensions of the teacher and student networks to different values makes it difficult to transfer token embeddings for better performance. To mitigate this issue, we present a novel knowledge distillation (KD) method in which the student mimics the prediction vector of the teacher under our proposed masked token similarity transfer (MTST) loss, where the temporal relation between a token and the other unmasked ones is encoded into a dimension-agnostic token similarity vector. | E. Choi; Y. Lim; B. -Y. Kim; H. Y. Kim; H. Lee; Y. Lim; S. W. Yu; S. Yoo |
1521 | Mask Guided Selective Context Decoding for Handwritten Chinese Text Recognition. Highlight: This article proposes a multi-modal attention-based framework for offline HCTR capable of visual and semantic reasoning. | T. Li; S. Wu; Z. Wang |
1522 | Masking Speech Contents By Random Splicing: Is Emotional Expression Preserved? Highlight: We discuss the influence of random splicing on the perception of emotional expression in speech signals. | F. Burkhardt; A. Derington; M. Kahlau; K. Scherer; F. Eyben; B. Schuller |
1523 | Mask The Bias: Improving Domain-Adaptive Generalization of CTC-Based ASR with Internal Language Model Estimation. Highlight: In this work, we propose a novel ILME technique for CTC-based ASR models. | N. Das; M. Sunkara; S. Bodapati; J. Cai; D. Kulshreshtha; J. Farris; K. Kirchhoff |
1524 | Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities. Highlight: This paper explores large-scale multilingual ASR models on 70 languages. | A. Tjandra; N. Singhal; D. Zhang; O. Kalinli; A. Mohamed; D. Le; M. L. Seltzer |
1525 | Massively Multilingual Shallow Fusion with Large Language Models. Highlight: In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages. | K. Hu; T. N. Sainath; B. Li; N. Du; Y. Huang; A. M. Dai; Y. Zhang; R. Cabrera; Z. Chen; T. Strohman |
1526 | MAST: Multiscale Audio Spectrogram Transformers. Highlight: We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST) [1]. | S. Ghosh; A. Seth; S. Umesh; D. Manocha |
1527 | Matching-Based Term Semantics Pre-Training for Spoken Patient Query Understanding. Highlight: In this work, we formalize MSF as a matching problem and propose a Term Semantics Pre-trained Matching Network (TSPMN) that takes both terms and queries as input to model their semantic interaction. | Z. Hu; X. Chen; H. Wu; M. Han; Z. Ni; J. Shi; S. Xu; B. Xu |
1528 | Matrix Low-Rank Approximation for Policy Gradient Methods. Highlight: In this paper, we put forth low-rank matrix-based models to efficiently estimate the parameters of policy gradient (PG) algorithms. | S. Rozada; A. G. Marques |
1529 | Matrix Recovery Using Deep Generative Priors with Low-Rank Deviations. Highlight: But such methods require that the recovered data be limited to the scope of the generator; otherwise, the recovery error is large. To circumvent this problem, in this paper, a framework for matrix recovery from limited measurements is proposed, which employs low-rank approximation to characterize the deviation of the generator, referred to as Low-Rank-Gen. | P. Yu; J. Wang; C. Xu |
1530 | Matrix Resolvent Eigenembeddings for Dynamic Graphs. Highlight: In this work, we introduce a novel matrix resolvent expansion-based projection scheme to update eigenvector embeddings of dynamic graphs. | V. Kalantzis; P. A. Traganitis |
1531 | Maximum Likelihood Distillation for Robust Modulation Classification. Highlight: In this work, we draw on knowledge distillation ideas and adversarial training to build more robust automatic modulation classification (AMC) systems. | J. Maroto; G. Bovet; P. Frossard |
1532 | MCKD: Mutually Collaborative Knowledge Distillation For Federated Domain Adaptation And Generalization. Highlight: Adopting a different approach in this work, we devised a data-free semantic collaborative distillation strategy to learn domain-invariant representations for both federated unsupervised domain adaptation (UDA) and domain generalization (DG). | Z. Niu; H. Wang; H. Sun; S. Ouyang; Y. -w. Chen; L. Lin |
1533 | McNet: Fuse Multiple Cues for Multichannel Speech Enhancement. Highlight: How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrowband spatial, sub-band spectral, and full-band spectral information. | Y. Yang; C. Quan; X. Li |
1534 | MCNeT: Measurement-Consistent Networks Via A Deep Implicit Layer For Solving Inverse Problems. Highlight: Such instabilities in DNNs can be explained by the fact that they ignore the forward measurement model during deployment, and thus fail to enforce consistency between their output and the input measurements. To overcome this, we propose a framework that transforms any DNN for inverse problems into a measurement-consistent one. | R. Mourya; J. F. C. Mota |
1535 | MCROOD: Multi-Class Radar Out-Of-Distribution Detection. Highlight: This work proposes a reconstruction-based multi-class OOD detector that operates on radar range-Doppler images (RDIs). | S. M. Kahya; M. Sami Yavuz; E. Steinbach |
1536 | M-CTRL: A Continual Representation Learning Framework with Slowly Improving Past Pre-Trained Model. Highlight: In this paper, we present momentum continual representation learning (M-CTRL), a framework that slowly updates the offline model with an exponential moving average of the online model. | J. -S. Choi; J. -H. Lee; C. -W. Lee; J. -H. Chang |
1537 | MDR-MFI: Multi-Branch Decoupled Regression and Multi-Scale Feature Interaction for Partial-to-Partial Cloud Registration. Highlight: In addition, previous methods extract and interact features at a single scale, which ignores the rich information from multiple scales. To address the above issues, in this paper, we propose a multi-branch decoupled regression and multi-scale feature interaction (MDR-MFI) framework for point cloud registration. | W. Dai; X. Yan; J. Wang; D. Xie; S. Pu |
1538 | Measure and Countermeasure of The Capsulation Attack Against Backdoor-Based Deep Neural Network Watermarks. Highlight: We propose a metric to measure a backdoor-based watermarking scheme’s security against the capsulation attack, and design a new backdoor-based deep neural network watermarking scheme that is secure against the capsulation attack by inverting the encoding process. | F. -Q. Li; S. -L. Wang; Y. Zhu |
1539 | Measuring Deviation from Stochasticity in Time-Series Using Autoencoder Based Time-Invariant Representation: Application to Black Hole Data. Highlight: We propose a novel approach to quantify deviation from stochasticity (DS) in a time-series. | C. S. Pradeep; N. Sinha; B. Mukhopadhyay |
1540 | Measuring The Transferability of ℓ∞ Attacks By The ℓ2 Norm. Highlight: Since larger perturbations naturally lead to better transferability, we therefore advocate that the strength of attacks should be measured simultaneously by both the ℓ∞ and ℓ2 norms. | S. Chen; Q. Tao; Z. Ye; X. Huang |
1541 | MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation. Highlight: In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. | C. -B. Jeon; H. Moon; K. Choi; B. S. Chon; K. Lee |
1542 | MEET: A Monte Carlo Exploration-Exploitation Trade-Off for Buffer Sampling. Highlight: Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. | J. Ott; L. Servadei; J. Arjona-Medina; E. Rinaldi; G. Mauro; D. S. Lopera; M. Stephan; T. Stadelmayer; A. Santra; R. Wille |
1543 | Meeting Action Item Detection with Regularized Context Modeling. Highlight: We construct and release the first Chinese meeting corpus with manual action item annotations. | J. Liu; C. Deng; Q. Zhang; Q. Chen; W. Wang |
1544 | Memory-Augmented Contrastive Learning for Talking Head Generation. Highlight: The same speech clip can generate multiple possible lip and head movements, that is, there is no one-to-one mapping relationship between them. To overcome this problem, we propose a Speech Feature Extractor (SFE) based on memory-augmented self-supervised contrastive learning, which introduces the memory module to store multiple different speech mapping results. | J. Wang; Y. Zhao; H. Fan; T. Xu; Q. Li; S. Li; L. Liu |
1545 | Memory-Augmented U-Transformer For Multivariate Time Series Anomaly Detection. Highlight: However, these methods can easily suffer from overfitting or lack of discrimination between normal and abnormal samples. In this work, we propose a memory-augmented U-Transformer framework to address these issues. | S. Qin; Y. Luo; G. Tao |
1546 | Mendam: Multi-Expert Network with Distribution-Aware Momentum for Long-Tailed Recognition. Highlight: The existing deferred re-balancing methods suffer from low accuracy for tail classes due to their bias towards head classes. In this paper, we find that properly adjusting the momentum can significantly improve the performance for class-imbalanced tasks, which provides a novel perspective to solve this thorny problem. | Q. Zhang; H. Ye; K. Yu |
1547 | Meta-DAG: Meta Causal Discovery Via Bilevel Optimization. Highlight: In this work, we propose a general causal learning model inspired by meta-learning, which aims at finding an invariant DAG over multiple domains and increasing the generalization performance of DAG structure discovery. | S. Lu; T. Gao |
1548 | Meta Learning for Domain Agnostic Soft Prompt. Highlight: However, neither applying hard prompts to sentences by defining a collection of human-engineered prompt templates nor directly optimizing the soft (continuous) prompt with labeled data may generalize well to unseen domain data. To cope with this issue, this paper presents a new prompt-based unsupervised domain adaptation method in which the learned soft prompt boosts the frozen pre-trained language model to handle input tokens from unseen domains. | M. -Y. Chen; M. Rohmatillah; C. -H. Lee; J. -T. Chien |
1549 | Meta-Learning for Image-Guided Millimeter-Wave Beam Selection in Unseen Environments. Highlight: In this paper, we propose to use the Model-Agnostic Meta-Learning (MAML) framework on the image data of the mmWave vehicle-to-infrastructure beam selection FLASH dataset, to overcome the generalization issues of a pre-trained model in unseen non-line-of-sight (NLOS) connectivity environments. | J. Gu; L. Collins; D. Roy; A. Mokhtari; S. Shakkottai; K. R. Chowdhury |
1550 | Meta Learning with Adaptive Loss Weight for Low-Resource Speech Recognition. Highlight: In this paper, we propose to apply a loss weight adaptation method to MAML using a Convolutional Neural Network (CNN) with homoscedastic uncertainty. | Q. Wang; W. Hu; L. Li; Q. Hong |
1551 | Meta++ Network for Few-Shot Aerospace Crack Segmentation. Highlight: To solve the problem of sample scarcity in the industrial field, this paper proposes Meta++, a UNet++-based few-shot segmentation network for fatigue cracks in aerospace metal structural components. | C. Xu; K. Liu; X. Li |
1552 | Metric Learning for User-Defined Keyword Spotting. Highlight: The goal of this work is to detect new spoken terms defined by users. | J. Jung; Y. Kim; J. Park; Y. Lim; B. -Y. Kim; Y. Jang; J. S. Chung |
1553 | Metric-Oriented Speech Enhancement Using Diffusion Probabilistic Model. Highlight: This mismatch between the training objective and evaluation metric likely results in sub-optimal performance. To alleviate it, we propose a metric-oriented speech enhancement method (MOSE), which leverages the recent advances in the diffusion probabilistic model and integrates a metric-oriented training strategy into its reverse process. | C. Chen; Y. Hu; W. Weng; E. S. Chng |
1554 | MFAT: A Multi-Level Feature Aggregated Transformer for Person Re-Identification. Highlight: Meanwhile, we find that the Transformer’s lower-level information is also helpful for the recognition accuracy of the query person, especially when the scene changes greatly. Therefore, we propose a Multi-level Feature Aggregated Transformer for person re-identification (MFAT) with high performance. | B. Tan; L. Xu; Z. Qiu; Q. Wu; F. Meng |
1555 | MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning. Highlight: In this paper, we introduce MFCCGAN as a novel speech synthesizer based on adversarial learning that adopts MFCCs as input and generates raw speech waveforms. | M. R. Hasanabadi; M. Behdad; D. Gharavian |
1556 | MGAT: Multi-Granularity Attention Based Transformers for Multi-Modal Emotion Recognition. Highlight: Furthermore, multi-modal data are temporally misaligned, so a single fixed window size can hardly describe cross-modal information. In this paper, we put these two issues into a unified framework and propose the multi-granularity attention based Transformers (MGAT). | W. Fan; X. Xing; B. Cai; X. Xu |
1557 | MHLAT: Multi-Hop Label-Wise Attention Model for Automatic ICD Coding. Highlight: Moreover, although pretrained language models have been used to address these problems, they suffer from huge memory usage. To address the above problems, we propose a simple but effective model called the Multi-Hop Label-wise ATtention (MHLAT), in which multi-hop label-wise attention is deployed to get more precise and informative representations. | J. Duan; H. Jiang; Y. Yu |
1558 | MHSCNET: A Multimodal Hierarchical Shot-Aware Convolutional Network for Video Summarization. Highlight: In this paper, we propose to construct a more powerful and robust frame-wise representation and predict the frame-level importance score in a fair and comprehensive manner. |
W. Xu; R. Wang; X. Guo; S. Li; Q. Ma; Y. Zhao; S. Guo; Z. Zhu; J. Yan; |
1559 | Mid-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models. Highlight: In this paper, we propose a method for intermediating multiple speakers’ attributes and diversifying their voice characteristics in “speaker generation,” an emerging task that aims to synthesize a nonexistent speaker’s natural-sounding voice. |
A. Watanabe; S. Takamichi; Y. Saito; D. Xin; H. Saruwatari; |
1560 | MIMO Radar Transmit Beampattern Matching Via Manifold Optimization. Highlight: After that, an efficient Riemannian conjugate gradient algorithm is developed to solve it. |
W. Xiong; J. Hu; K. Zhong; |
1561 | Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-Trained Representations. Highlight: In this paper, we propose a temporal shift module that mingles channel-wise information without introducing any additional parameters or FLOPs. |
S. Shen; F. Liu; A. Zhou; |
1562 | Minimising Distortion for GAN-Based Facial Attribute Manipulation. Highlight: Evaluations demonstrate that our method outperforms the state of the art in terms of both accuracy and fidelity. |
M. Shao; L. Lu; Y. Ding; Q. Liao; |
1563 | Misspecified Cramér-Rao Bound of RIS-Aided Localization Under Geometry Mismatch. Highlight: In this work, we derive the misspecified Cramér-Rao bound (MCRB) for a single-input-single-output RIS-aided localization system with RIS geometry mismatch. |
P. Zheng; H. Chen; T. Ballal; H. Wymeersch; T. Y. Al-Naffouri; |
1564 | Mitigating Domain Dependency for Improved Speech Enhancement Via SNR Loss Boosting. Highlight: However, models trained with these losses depend heavily on specific domain properties, i.e., speaker, noise type, and signal-to-noise ratio (SNR). In this paper, we first validate this assumption by visually analyzing the model’s internal representation; these dependencies result in severe performance degradation in unseen situations. |
L. Yin; D. Wu; Z. Qiu; H. Huang; |
1565 | Mitigating Unintended Memorization in Language Models Via Alternating Teaching. Highlight: We employ a teacher-student framework and propose a novel approach called alternating teaching to mitigate unintended memorization in sequential modeling. |
Z. Liu; X. Zhang; F. Peng; |
1566 | Mixed Far-field and Near-field Source Localization Based on Low-Rank Matrix Reconstruction. Highlight: In this paper, we present a mixed far-field (FF) and near-field (NF) source localization algorithm based on low-rank matrix reconstruction (LRMR). |
Y. Liu; H. Jiang; Q. Zhang; |
1567 | Mixed Sample Augmentation for Online Distillation. Highlight: To bridge the gap, we make the first attempt at incorporating CutMix into online distillation, where we empirically observe a significant improvement. Encouraged by this result, we propose an even stronger MSR designed specifically for online distillation, named CutnMix. |
Y. Shen; L. Xu; Y. Yang; Y. Li; Y. Guo; |
1568 | Mixer: DNN Watermarking Using Image Mixup. Highlight: The DNN should perform two main tasks: its primary task and the watermarking task. This paper proposes a lightweight, reliable, and secure DNN watermarking scheme that attempts to establish strong ties between these two tasks. |
K. Kallas; T. Furon; |
1569 | MLCGAN: Multi-Lead ECG Synthesis with Multi Label Conditional Generative Adversarial Network. Highlight: For ECG synthesis, to the best of our knowledge, no existing model can generate ECGs corresponding to clinical data, owing to the constraints imposed by time sequences and multiple labels. In this paper, we present a novel multi-label conditional generative adversarial network, named MLCGAN. |
J. Wu; L. Wang; H. Pan; B. Wang; |
1570 | MLP-GAN for Brain Vessel Image Segmentation. Highlight: One successful approach is to treat segmentation as an image-to-image translation task and use a conditional Generative Adversarial Network (cGAN) to learn a transformation between two distributions. In this paper, we present a novel multi-view approach, MLP-GAN, which splits a 3D volumetric brain vessel image into 2D images along three different dimensions (i.e., sagittal, coronal, and axial) and then feeds them into three different 2D cGANs. |
B. Xie; H. Tang; B. Duan; D. Cai; Y. Yan; |
1571 | MMATR: A Lightweight Approach for Multimodal Sentiment Analysis Based on Tensor Methods. Highlight: Despite the considerable research output on Multimodal Learning for Affect-related tasks, most current methods are very complex in terms of the number of trainable parameters, and thus do not constitute effective solutions for real-life applications. In this work, we try to fill this gap in the literature by introducing the Multimodal Attention Tensor Regression (MMATR) network, a lightweight model that is based on: (i) a static input representation (2D matrix of dimensions time × features) for each modality, which helps to avoid heavily parameterized sequential models by incorporating a CNN, (ii) the replacement of the usual pooling and flattening operations, as well as the linear layers, by tensor contraction and tensor regression layers that are able to reduce the number of parameters while keeping the high-order structure of the multimodal data, and (iii) a bimodal attention layer that learns multimodal co-occurrences. |
P. Koromilas; M. A. Nicolaou; T. Giannakopoulos; Y. Panagakis; |
1572 | MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning. Highlight: Fueled by the success of cosine loss, which builds hyperspherical feature spaces and achieves lower intra-class angular variability, this paper proposes the Multi-Modal Cosine loss, MMCosine. |
R. Xu; R. Feng; S. -X. Zhang; D. Hu; |
1573 | mmSense: Detecting Concealed Weapons with a Miniature Radar Sensor. Highlight: Our paper proposes mmSense, an end-to-end, portable, miniaturised real-time system that can accurately detect the presence of concealed metallic objects on persons in a discreet, privacy-preserving modality. |
K. Mitchell; K. Kassem; C. Kaul; V. Kapitany; P. Binner; A. Ramsay; D. Faccio; R. Murray-Smith; |
1574 | mmWave Wi-Fi Trajectory Estimation with Continuous-Time Neural Dynamic Learning. Highlight: We propose a dual-decoder neural dynamic learning framework to simultaneously reconstruct Wi-Fi beam training measurements at irregular time instances and learn the unknown dynamics over the latent space in a continuous-time fashion by enforcing strong supervision at both the coordinate and measurement levels. |
C. J. Vaca-Rubio; P. Wang; T. Koike-Akino; Y. Wang; P. Boufounos; P. Popovski; |
1575 | Möbius Total Variation for Directed Acyclic Graphs. Highlight: We propose a novel definition of total variation (TV) specifically for directed acyclic graphs (DAGs). |
V. Mihal; M. Püschel; |
1576 | ModalDrop: Modality-Aware Regularization for Temporal-Spectral Fusion in Human Activity Recognition. Highlight: We then delve into the intrinsic mechanism of the multi-view representation fusion, and propose ModalDrop as a novel modality-aware regularization method to learn and exploit representations of both views effectively. |
X. Zeng; Y. Chen; B. Xu; T. Zhang; |
1577 | ModEFormer: Modality-Preserving Embedding For Audio-Video Synchronization Using Transformers. Highlight: We propose ModEFormer, which independently extracts audio and video embeddings using modality-specific transformers. |
A. Gupta; R. Tripathi; W. Jang; |
1578 | Model-Based Spectral Reconstruction Of Interferometric Acquisitions. Highlight: In this paper, we represent the system through an ∞-wave model. |
M. Jouni; D. Picone; M. D. Mura; |
1579 | Model-based Vs. Data-driven Approaches for Predicting Rain-induced Attenuation in Commercial Microwave Links: A Comparative Empirical Study. Highlight: This paper presents an empirical study of model-based and data-driven techniques applied to multi-step predictions of rain attenuation in terrestrial microwave links. |
D. Jacoby; J. Ostrometzky; H. Messer; |
1580 | Model Fingerprinting with Benign Inputs. Highlight: In this paper, we propose a fingerprinting scheme (coined FBI) that is resilient to significant modifications of the models. |
T. Maho; T. Furon; E. Le Merrer; |
1581 | Model-Free Learning of Optimal Beamformers for Passive IRS-Assisted Sumrate Maximization. Highlight: Assuming fully passive, sensing-free IRS operation, we introduce a new data-driven Zeroth-order Stochastic Gradient Ascent (ZoSGA) algorithm for sumrate optimization in an IRS-aided downlink setting. |
H. Hashmi; S. Pougkakiotis; D. S. Kalogerias; |
1582 | Model-Free Online Learning for Waveform Optimization In Integrated Sensing And Communications. Highlight: This paper considers waveform optimization problems for managing and mitigating interference in integrated sensing and communications (ISAC) systems. |
P. Pulkkinen; V. Koivunen; |
1583 | Modeling Global Latent Semantic in Multi-Turn Conversations with Random Context Reconstruction. Highlight: In this paper, we propose a Global semantic-guided Variational Dialog (GVDialog) model, which introduces a Variational Autoencoder (VAE) into basic Transformer-based hierarchical dialogue models and uses a Random Context Reconstruction (RCR) task to compress global semantics into the latent space without any time-consuming human annotation. |
C. Zhang; D. Wu; |
1584 | Modeling The Wave Equation Using Physics-Informed Neural Networks Enhanced With Attention to Loss Weights. Highlight: In this paper, we propose an enhancement that focuses on the weights assigned to the PINN loss function terms in order to model the wave PDE more accurately in homogeneous and inhomogeneous domains and with a higher-frequency wave source function. |
S. Alkhadhr; M. Almekkawy; |
1585 | Modeling Turn-Taking in Human-To-Human Spoken Dialogue Datasets Using Self-Supervised Features. Highlight: Due to that, in this paper we introduce a modular End-to-End system based on an Upstream + Downstream architecture paradigm, which allows easy use and integration of a large variety of self-supervised features to model the specific Turn-Taking task of End-of-Turn Detection (EOTD). |
E. Morais; M. Damasceno; H. Aronowitz; A. Satt; R. Hoory; |
1586 | Modelling Black-Box Audio Effects with Time-Varying Feature Modulation. Highlight: While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression. To address this, we propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones, an approach that enables learnable adaptation of the intermediate activations. |
M. Comunità; C. J. Steinmetz; H. Phan; J. D. Reiss; |
1587 | Modelling Low-Resource Accents Without Accent-Specific TTS Frontend. Highlight: This work focuses on modelling a speaker’s accent that does not have a dedicated text-to-speech (TTS) frontend, including a grapheme-to-phoneme (G2P) module. |
G. Tinchev; M. Czarnowska; K. Deja; K. Yanagisawa; M. Cotescu; |
1588 | Model-Matching Principle Applied to The Design of An Array-Based All-Neural Binaural Rendering System for Audio Telepresence. Highlight: In this contribution, we propose an array-based binaural rendering system that converts the array microphone signals into head-related transfer function (HRTF)-filtered output signals for headphone rendering. |
Y. Hsu; C. Ma; M. R. Bai; |
1589 | MODIFY: Model-Driven Face Stylization Without Style Images. Highlight: Existing face stylization methods always require the presence of the target (style) domain during the translation process, which violates privacy regulations and limits their applicability in real-world systems. To address this issue, we propose a new method called MODel-drIven Face stYlization (MODIFY), which relies on a generative model to bypass the dependence on target images. |
Y. Ding; J. Liang; J. Cao; A. Zheng; R. He; |
1590 | Modular Conformer Training for Flexible End-to-End ASR. Highlight: We propose an alternative approach, called Modular Conformer training, which splits the Conformer model into a backbone convolutional model and attention submodels, which are added at each layer. |
K. Audhkhasi; B. Farris; B. Ramabhadran; P. J. Moreno; |
1591 | Modulation-Based Center Alignment and Motion Mining for Spatial Temporal Action Detection. Highlight: In this paper, we propose Modulation-based Center Alignment (MCA) and Sparse Valuable Motion Mining (SVMM) for more accurate action detection: with deformable convolution, key-frame based modulation is first designed to align the action center between temporal frames; then motion-region-guided sparse self-attention is developed for valuable motion mining. |
W. Zhao; K. Huang; C. Zhang; |
1592 | Modulo EEG Signal Recovery Using Transformer. Highlight: We propose a deep learning method for modulo signal recovery, which can be applied to recover folded EEG signals. |
T. Geng; F. Ji; Pratibha; W. P. Tay; |
1593 | MoLE: Mixture Of Language Experts For Multi-Lingual Automatic Speech Recognition. Highlight: In this paper, we present a multi-lingual speech recognition network named Mixture-of-Language-Experts (MoLE), which digests speech in a variety of languages. |
Y. Kwon; S. -W. Chung; |
1594 | Monocular 3D Human Pose Estimation Based on Global Temporal-Attentive and Joints-Attention In Video. Highlight: However, existing methods mainly rely on recurrent or convolutional operations to model such temporal information, which limits their ability to capture non-local contextual relations of human motion and ignores human joint hierarchies. To address this problem, we propose a Global Temporal-Attentive and Joints-Attention network (GTAJA-Net). |
R. He; S. Xiang; P. Tao; Y. Yu; |
1595 | More Speaking or More Speakers? Highlight: In this work, we aim to analyse the effect of the number of speakers in the training data on a recent SSL algorithm (wav2vec 2.0) and a recent ST algorithm (slimIPL). |
D. Berrebbi; R. Collobert; N. Jaitly; T. Likhomanenko; |
1596 | MossFormer: Pushing The Performance Limit of Monaural Speech Separation Using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions. Highlight: In this work, we achieve the upper bound by proposing a gated single-head transformer architecture with convolution-augmented joint self-attentions, named MossFormer (Monaural speech separation TransFormer). |
S. Zhao; B. Ma; |
1597 | Motion-Aware Video Paragraph Captioning Via Exploring Object-Centered Internal Knowledge. Highlight: Existing works rarely focus on modeling the dynamically changing states of the objects in videos, so the activities occurring in videos are poorly or wrongly depicted in the paragraphs. To address this problem, we propose a novel Object State Tracking Network, which can capture the temporal state changes of objects. |
Y. Hu; G. Yu; Y. Zhang; R. Feng; T. Zhang; X. Lu; S. Gao; |
1598 | Motion Matters: A Novel Motion Modeling for Cross-View Gait Feature Learning. Highlight: This paper proposes a novel motion modeling method to extract the discriminative and robust representation. |
J. Li; J. Gao; Y. Zhang; H. Shan; J. Zhang; |
1599 | Motor Activity Recognition Using EEG Data and Ensemble of Stacked BLSTM-LSTM Network and Transformer Model. Highlight: This work proposes a Stacked BLSTM-LSTM, an EEG-Transformer, and their ensemble network to predict real-life motor activities of individuals using electroencephalogram (EEG) data. |
P. Kaushik; I. Tripathi; P. P. Roy; |
1600 | Mouth Breathing Detection Using Audio Captured Through Earbuds. Highlight: This study presents a machine-learning approach using audio captured by commercially available earbuds to detect mouth breathing. |
T. Ahmed; M. M. Rahman; E. Nemati; J. Kuang; A. Gao; |
1601 | MovieNet-PS: A Large-Scale Person Search Dataset in The Wild. Highlight: In this paper, we study a more general and realistic task in the wild, where we aim to search target persons with a much higher degree of diversity. |
J. Qin; P. Zheng; Y. Yan; R. Quan; X. Cheng; B. Ni; |
1602 | Moving Towards Non-Binary Gender Identification Via Analysis of System Errors in Binary Gender Classification. Highlight: This paper aims to analyse human perceptions of gender in speech signals, focusing on signals that binary gender classification methods misclassify and examining which features make speech signals more likely to be misclassified, or to be classified as either non-binary or unclassifiable. |
S. Ellis; S. Goetze; H. Christensen; |
1603 | MPE4G: Multimodal Pretrained Encoder for Co-Speech Gesture Generation. Highlight: To acquire robust and generalized encodings, we propose a novel framework with a multimodal pre-trained encoder for co-speech gesture generation. |
G. Kim; S. Noh; I. Ham; H. Ko; |
1604 | MPS-AMS: Masked Patches Selection and Adaptive Masking Strategy Based Self-Supervised Medical Image Segmentation. Highlight: In addition, their fixed high masking strategy limits the upper bound of conditional mutual information, and the considerable gradient noise reduces the information in the learned representation. Motivated by these limitations, in this paper, we propose a self-supervised medical image segmentation method based on masked patches selection and an adaptive masking strategy, named MPS-AMS. |
X. Wang; R. Wang; B. Tian; J. Zhang; S. Zhang; J. Chen; T. Lukasiewicz; Z. Xu; |
1605 | MRML: Multimodal Rumor Detection By Deep Metric Learning. Highlight: However, existing methods mainly focus on the multimodal fusion process while paying little attention to the intra-modal relationships. To address these limitations, we propose a multimodal rumor detection method with deep metric learning (MRML) to effectively extract multimodal relationships of news for detecting rumors. |
L. Peng; S. Jian; D. Li; S. Shen; |
1606 | MRNet: Multi-Refinement Network for Dual-Pixel Images Defocus Deblurring. Highlight: Though many methods have been proposed, the problem is still challenging because existing methods suffer from low deblurring performance and long processing times. To solve this problem, we propose an efficient Multi-Refinement Network (MRNet) for dual-pixel image defocus deblurring. |
D. Zhang; X. Wang; Z. Jin; |
1607 | MSFormer: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching. Highlight: This paper proposes MSFormer, which uses Transformers situated in different branches to obtain feature descriptors. |
D. Li; Y. Yan; D. Liang; S. Du; |
1608 | MSNet: A Deep Architecture Using Multi-Sentiment Semantics for Sentiment-Aware Image Style Transfer. Highlight: To incorporate the sentiment information into the image style transfer task for better sentiment-aware performance, we introduce a new task named sentiment-aware image style transfer. |
S. Sun; J. Jia; H. Wu; Z. Ye; J. Xing; |
1609 | MSN-net: Multi-Scale Normality Network for Video Anomaly Detection. Highlight: In this paper, we propose a simple yet effective Multi-Scale Normality network (MSN-net) that uses hierarchical memories to learn multi-level prototypical spatial-temporal patterns of normal events. |
Y. Liu; D. Li; W. Zhu; D. Yang; J. Liu; L. Song; |
1610 | M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval. Highlight: This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval. |
L. Berry; Y. -J. Shih; H. -F. Wang; H. -J. Chang; H. -Y. Lee; D. Harwath; |
1611 | MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing. Highlight: In order to handle the diverse scenes, we propose a multi-scale projection transformer (MSP-Former), which understands and covers a variety of snow degradation features in a multi-path manner, and integrates comprehensive scene context information for clean reconstruction via self-attention. |
S. Chen; T. Ye; Y. Liu; T. Liao; J. Jiang; E. Chen; P. Chen; |
1612 | MTDL-NET: Morphological and Temporal Discriminative Learning for Heartbeat Classification. Highlight: However, previous methods on ECG mainly emphasize extracting the optimal hand-crafted or deep features, while ignoring the potential of morphological and temporal representations to further boost the performance of the heartbeat classification task. To address this challenge, in this work, we propose two main modules: (1) masked attention embedding for extracting discriminative morphological features; (2) a temporal feature enhanced mechanism for enhancing the temporal representation of heartbeats. |
C. Han; S. Xiang; D. Qian; |
1613 | MTFD: Multi-Teacher Fusion Distillation for Compressed Video Action Recognition. Highlight: Current work uses the keyframes, residuals, and motion information retained in compressed video, which greatly reduces the computational effort but still cannot meet the requirements of real-time applications. Therefore, we propose a multi-teacher fusion distillation framework for compressed video action recognition (MTFD). |
J. Guo; J. Zhang; S. Li; X. Zhang; M. Ma; |
1614 | MUG: A General Meeting Understanding and Generation Benchmark. Highlight: To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and facilitates most SLP tasks. In this paper, we provide a detailed introduction to this corpus, the SLP tasks and evaluation methods, and the baseline systems and their performance. |
Q. Zhang; C. Deng; J. Liu; H. Yu; Q. Chen; W. Wang; Z. Yan; J. Liu; Y. Ren; Z. Zhao; |
1615 | Multi-Agent Adversarial Training Using Diffusion Learning. Highlight: We propose a general adversarial training framework for multi-agent systems using diffusion learning. |
Y. Cao; E. Rizk; S. Vlaski; A. H. Sayed; |
1616 | Multi-Agent Reinforcement Learning for Covert Semantic Communications Over Wireless Networks. Highlight: In this paper, a covert semantic communication framework is proposed for image transmission over wireless networks. |
Y. Wang; Y. Hu; H. Du; T. Luo; D. Niyato; |
1617 | Multi-Aspect Interest Neighbor-Augmented Network for Next-Basket Recommendation. Highlight: Limited by the sparsity of short-term user interaction behavior, existing next-basket recommendation (NBR) methods typically fail to mine fine-grained and complete representations of user interests, resulting in unsatisfactory recommendation performance. To address this issue, we propose a novel Multi-aspect Interest Neighbor-augmented Network (MINN) to capture fine-grained and complete representations of user interests for next-basket prediction. |
Z. Deng; J. Li; Z. Guo; G. Li; |
1618 | Multi-Blank Transducers for Speech Recognition. Highlight: In standard RNN-T, the emission of a blank symbol consumes exactly one input frame; in our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted. |
H. Xu; F. Jia; S. Majumdar; S. Watanabe; B. Ginsburg; |
1619 | Multi-Carrier Wideband OCDM-Based THz Automotive Radar. Highlight: In this paper, we propose a bistatic THz automotive radar that employs the recently proposed orthogonal chirp division multiplexing (OCDM) multi-carrier waveform. |
S. Bhattacharjee; K. V. Mishra; R. Annavajjala; C. R. Murthy; |
1620 | Multicast Beamformer Design for MIMO Coded Caching Systems. Highlight: In this paper, we focus on improving the finite-SNR performance of MIMOCC systems. |
M. Salehi; M. NaseriTehrani; A. Tölli; |
1621 | Multi-Channel Audio Signal Generation. Highlight: We present a multi-channel audio signal generation scheme based on machine learning and probabilistic modeling. |
W. B. Kleijn; M. Chinen; F. S. C. Lim; J. Skoglund; |
1622 | Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge. Highlight: In this work we detail our submission to the Clarity ICASSP 2023 grand challenge, in which participants have to develop a strong target speech enhancement system for hearing-aid (HA) devices in noisy-reverberant environments. |
S. Cornell; Z. -Q. Wang; Y. Masuyama; S. Watanabe; M. Pariente; N. Ono; S. Squartini; |
1623 | Multichannel Time-Encoding of Finite-Rate-of-Innovation Signals. Highlight: In this paper, we propose multichannel time-encoding of signals with a finite-rate-of-innovation (FRI) in single-input-multi-output (SIMO) and multi-input-multi-output (MIMO) configurations using the integrate-and-fire model. |
A. J. Kamath; C. Sekhar Seelamantula; |
1624 | Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized By Discriminative Learning. Highlight: In this paper, we present a novel approach named Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation. |
Z. Mu; X. Yang; W. Zhu; |
1625 | Multi-Dimensional Frequency Dynamic Convolution with Confident Mean Teacher for Sound Event Detection. Highlight: However, traditional convolution is deficient in learning time-frequency domain representation of different sound events. To address this issue, we propose multi-dimensional frequency dynamic convolution (MFDConv), a new design that endows convolutional kernels with frequency-adaptive dynamic properties along multiple dimensions. |
S. Xiao; X. Zhang; P. Zhang; |
1626 | Multi-Dimensional Signal Recovery Using Low-Rank Deconvolution. Highlight: In this work we present Low-rank Deconvolution, a powerful framework for low-level feature-map learning for efficient signal representation with application to signal recovery. |
D. Reixach; |
1627 | Multi-Functional Reconfigurable Intelligent Surface. Highlight: In this paper, we propose a new multi-functional reconfigurable intelligent surface (MF-RIS) architecture. |
W. Wang; W. Ni; H. Tian; Y. C. Eldar; |
1628 | Multi-Head Attention and GRU for Improved Match-Mismatch Classification of Speech Stimulus and EEG Response. Highlight: We demonstrate the benefits of using mel-spectrograms instead of speech envelopes as input features as well as the effectiveness of Multi-Head Attention and GRU for EEG and speech processing. |
M. Borsdorf; S. Pahuja; G. Ivucic; S. Cai; H. Li; T. Schultz; |
1629 | Multi-Head Feature Pyramid Networks for Breast Mass Detection. Highlight: In this paper, we propose the multi-head feature pyramid module (MHFPN) to solve the problem of unbalanced focus of target boxes during feature map fusion and design a multi-head breast mass detection network (MBMDnet). |
H. Zhang; Z. Xu; D. Yao; S. Zhang; J. Chen; T. Lukasiewicz; |
1630 | Multi-Head Uncertainty Inference for Adversarial Attack Detection. Highlight: In this paper, we propose a multi-head uncertainty inference (MH-UI) framework for detecting adversarial attack examples. |
Y. Yang; S. Yang; J. Xie; Z. Si; K. Guo; K. Zhang; K. Liang; |
1631 | Multi-Label Temporal Evidential Neural Networks for Early Event Detection. Highlight: To this end, we propose a novel framework, Multi-Label Temporal Evidential Neural Network (MTENN), for multi-label uncertainty estimation in temporal data. |
X. Zhao; X. Zhang; C. Zhao; J. -H. Cho; L. Kaplan; D. H. Jeong; A. Jøsang; H. Chen; F. Chen; |
1632 | Multi-Layer Feature Division Transferable Adversarial Attack. Highlight: By contrast, we propose the Multi-layer Feature Division Attack (MFDA), which aggregates multi-layer feature information on the basis of feature division to craft the attack. |
Z. Jin; C. Yin; P. Li; L. Zhou; L. Fang; X. Chang; Z. Liu; |
1633 | Multi-Layer Seasonal Perception Network for Time Series Forecasting. Highlight: In this paper, we propose a neural network model called Multilayer Seasonal Perception Network (MSPNet) to predict seasonal time series. |
R. Wang; S. Miao; D. Liu; X. Jin; W. Zhang; |
1634 | Multilayer Subspace Learning With Self-Sparse Robustness for Two-Dimensional Feature Extraction. Highlight: In this work, we propose a novel bilinear subspace learning model to achieve flexible and robust two-dimensional feature extraction. |
H. Zhang; M. Gong; F. Nie; X. Li; |
1635 | Multilevel FISTA for Image Restoration. Highlight: This paper presents a multilevel fast iterative soft thresholding algorithm (FISTA) that uses the Moreau envelope, which is easy to compute when the explicit form of the proximal operator of the considered functions is known, to incorporate corrections from coarse models. |
G. Lauga; E. Riccietti; N. Pustelnik; P. Gonçalves; |
1636 | Multi-Level Fusion for Burst Super-Resolution with Deep Permutation-Invariant Conditioning. Highlight: In this work, we introduce a neural network architecture for burst SR, called MLB-FuseNet (Multi-Level Burst Fusion Network), that is capable of extracting features in a manner that is invariant to permutations in the burst and to progressively condition features extracted from a reference image. |
M. Cilia; D. Valsesia; G. Fracastoro; E. Magli; |
1637 | Multilevel Transformer for Multimodal Emotion Recognition. Highlight: Inspired by Transformer TTS, we propose a multilevel transformer model to perform fine-grained multimodal emotion recognition. |
J. He; M. Wu; M. Li; X. Zhu; F. Ye; |
1638 | Multilingual End-To-End Spoken Language Understanding For Ultra-Low Footprint Applications. Highlight: In this work, we propose an extension to TinyS2I and train a multilingual system supporting several languages. |
M. Müller; A. Alexandridis; Z. Trozenski; J. Whiteman; G. Strimel; N. Susanj; A. Mouchtaris; S. Kunzmann; |
1639 | Multi-Lingual Pronunciation Assessment with Unified Phoneme Set and Language-Specific Embeddings. Highlight: In this paper, we propose a unified method to take advantage of multi-lingual data for multi-lingual pronunciation assessment. |
B. Lin; L. Wang; |
1640 | Multilingual Query-by-Example Keyword Spotting with Metric Learning and Phoneme-to-Embedding Mapping. Highlight: In this paper, we propose a multilingual query-by-example keyword spotting (KWS) system based on a residual neural network. |
P. M. Reuter; C. Rollwage; B. T. Meyer; |
1641 | Multilingual Word Error Rate Estimation: e-WER3. Highlight: In this paper, we propose a novel multilingual framework, eWER3, jointly trained on acoustic and lexical representations to estimate word error rate. |
S. A. Chowdhury; A. Ali; |
1642 | Multi-Local Attention for Speech-Based Depression Detection. Highlight: This article shows that an attention mechanism, the Multi-Local Attention, can improve a depression detection approach based on Long Short-Term Memory Networks. |
F. Tao; X. Ge; W. Ma; A. Esposito; A. Vinciarelli; |
1643 | Multi-Microphone Speaker Separation By Spatial Regions. Highlight: We propose a data-driven approach using a modified version of a state-of-the-art network, where different layers model spatial and spectro-temporal information. |
J. Wechsler; S. R. Chetupalli; W. Mack; E. A. P. Habets; |
1644 | Multi-modal ASR Error Correction with Joint ASR Error Detection. Highlight: To include the audio information for better error correction, we propose a sequence-to-sequence multi-modal ASR error correction model. |
B. Lin; L. Wang; |
1645 | Multi-Modal Domain Generalization for Cross-Scene Hyperspectral Image Classification. Abstract: Large-scale pre-trained image-text foundation models have excelled in a number of downstream applications. The majority of domain generalization techniques, however, have … |
Y. Zhang; M. Zhang; W. Li; R. Tao; |
1646 | Multimodal Dyadic Impression Recognition Via Listener Adaptive Cross-Domain Fusion. Highlight: In this paper, we perform impression recognition using a proposed listener adaptive cross-domain architecture, which consists of a listener adaptation function to model the causality between speaker and listener behaviors and a cross-domain fusion function to strengthen their connection. |
Y. Li; P. Bell; C. Lai; |
1647 | Multimodal Emotion Recognition Based on Deep Temporal Features Using Cross-Modal Transformer and Self-Attention. Highlight: However, in multimodal approaches, the interactions between different modalities of speech representations when building emotion recognition models have not yet been well investigated. To address this issue, we introduce a new approach to capturing the deep temporal features of audio and text. |
B. Maji; M. Swain; R. Guha; A. Routray; |
1648 | Multimodal Facial Action Unit Detection with Physiological Signals. Highlight: In this paper, we propose deep networks to extract temporal features from periodic and non-periodic time-series signals and design an informativeness-based feature fusion module to handle the signal noise. |
Z. Li; L. Yin; |
1649 | Multi-Modal Food Classification in A Diet Tracking System with Spoken and Visual Inputs. Highlight: In this paper, we present multi-modal approaches to diet tracking. |
S. Gowda; Y. Hu; M. Korpusik; |
1650 | Multimodal Knowledge Distillation for Arbitrary-Oriented Object Detection in Aerial Images. Highlight: In this paper, a multimodal knowledge distillation (MKD) method is proposed for arbitrary-oriented object detection (AOOD) in aerial images. |
Z. Huang; W. Li; R. Tao; |
1651 | Multimodal Microscopy Image Alignment Using Spatial and Shape Information and A Branch-and-Bound Algorithm. Highlight: We cast multimodal microscopy alignment as a cell subset matching problem. To solve this non-convex problem, we introduce an efficient and globally optimal branch-and-bound algorithm to find subsets of point clouds that are in rotational alignment with each other. |
S. Chen; B. Y. Rao; S. Herrlinger; A. Losonczy; L. Paninski; E. Varol; |
1652 | Multimodal Propaganda Detection Via Anti-Persuasion Prompt Enhanced Contrastive Learning. Highlight: To this end, we propose a novel propaganda detection model called Antipersuasion Prompt Enhanced Contrastive Learning (abbreviated as APCL). |
J. Cui; L. Li; X. Zhang; J. Yuan; |
1653 | Multi-Object Localization and Irrelevant-Semantic Separation for Nuclei Segmentation in Histopathology Images. Highlight: In this paper, we propose an effective nuclei segmentation method for histopathology images based on a novel neural network for multi-object localization and irrelevant-semantic separation (MI-Net), which includes a multi-object localization module (MOLM), a deep boundary awareness module (DBAM), and an irrelevant semantic separation module (ISSM). |
Y. Tang; X. Ye; X. Li; Z. Chen; |
1654 | Multi-Observation Hidden Semi-Markov Model for Photoplethysmogram Signal Semantic Segmentation. Highlight: However, existing algorithms for analyzing these signals do not serve the purpose effectively and accurately, particularly for abnormal signals. This paper proposes a multi-observation hidden semi-Markov model (HSMM) for photoplethysmogram (PPG) signal semantic segmentation, which simultaneously leverages the information available in the raw signal and its first and second derivatives. |
N. Hasanzadeh; S. Valaee; H. Salehinejad; |
1655 | Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks. Highlight: We propose a multi-output joint network architecture for the RNN transducer (RNN-T), for multi-task modeling of ASR and auxiliary tasks that rely on ASR outputs. |
W. Wang; D. Zhao; S. Ding; H. Zhang; S. -Y. Chang; D. Rybach; T. N. Sainath; Y. He; I. McGraw; S. Kumar; |
1656 | Multiple Access Computation Offloading for The K-User Case. Highlight: We consider the problem of optimizing the rates and powers of the users transmitting in each time slot, and the time slot lengths, so as to minimize the energy expended by the users. |
X. Liu; C. Schaible; T. N. Davidson; |
1657 | Multiple Acoustic Features Speech Emotion Recognition Using Cross-Attention Transformer. Highlight: In this paper, we attempt to use the cross-attention transformer (CAT) to handle bi-source input. |
Y. He; N. Minematsu; D. Saito; |
1658 | Multiple Contrastive Learning for Multimodal Sentiment Analysis. Highlight: We propose a Multimodal fine-grained interaction with Multiple Contrastive Learning (M2CL) model for image-text multimodal sentiment detection. |
X. Yang; S. Feng; D. Wang; P. Hong; S. Poria; |
1659 | Multiple Domain-Adversarial Ensemble Learning for Domain Generalization. Highlight: In this paper, we propose a unified framework for domain generalization (DG) that combines multiple research lines to enhance the generalization ability. |
Z. -Y. Mi; K. Long; Y. -B. Yang; |
1660 | Multiple Signed Graph Learning for Gene Regulatory Network Inference. Highlight: In this paper, we propose a framework (mvSGL) for joint estimation of multiple related signed graphs. |
A. Karaaslanli; S. Saha; T. Maiti; S. Aviyente; |
1661 | Multiple Target Measurements: Bayesian Framework for Moving Object Detection in MIMO Radar. Highlight: In this paper, we propose a new scheme called multiple target measurements (MTM). |
B. Eisele; A. Bereyhi; R. Müller; |
1662 | Multi-Rate Adaptive Transform Coding for Video Compression. Highlight: In this study, we propose learned transforms and entropy coding that may either serve as (non)linear drop-in replacements, or enhancements for linear transforms in existing codecs. |
L. R. Duong; B. Li; C. Chen; J. Han; |
1663 | Multi-Resolution Convolutional Dictionary Learning for Riverbed Dynamics Modeling. Highlight: This work proposes a novel formulation of convolutional-sparse-coded dynamic mode decomposition (CSC-DMD) incorporating a deep learning framework. |
E. Kobayashi; H. Yasuda; K. Hayasaka; Y. Otake; S. Ono; S. Muramatsu; |
1664 | Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation. Highlight: We introduce multi-resolution location-based training (LBT) to estimate the complex spectrograms from low to high time and frequency resolutions. |
H. Taherian; D. Wang; |
1665 | Multi-Resolution Sequence Aggregation and Model-Agnostic Framework for Time-Series Forecasting. Highlight: Moreover, these methods merge multi-resolution inputs without careful attention to the chronological order of the time series, which is very important for time-series data. To overcome this challenge, we propose a framework that fully utilizes multi-resolution time-series signals at upscaled, original, and downscaled resolutions and aggregates them sequentially, named the multi-resolution sequence aggregation and model-agnostic (MAMA) framework. |
J. Lyu; J. Yang; J. Kim; W. Lim; W. Ahn; D. Kang; M. Kim; N. S. Kim; |
1666 | Multiresolution Signal Processing of Financial Market Objects. Highlight: In this paper, we combine neural networks, known to capture non-linear associations, with a multiscale decomposition to facilitate a better understanding of financial market data substructures. |
I. Boier; |
1667 | Multiscale Audio Spectrogram Transformer for Efficient Audio Classification. Highlight: In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification. |
W. Zhu; M. Omar; |
1668 | Multi-Scale Compositional Constraints for Representation Learning on Videos. Highlight: In this work, we extract video representations by combining multi-scale processing with compositional constraints, i.e., we constrain the latent space created by the network so that coarse-grained video features are composed from a set of fine-grained video features using simple functions. |
G. Paraskevopoulos; C. Lavania; L. Chum; S. Sundaram; |
1669 | Multi-Scale Receptive Field Graph Model for Emotion Recognition in Conversations. Highlight: In this paper, we propose a Multi-Scale Receptive Field Graph model (MSRFG) to tackle the challenges of emotion recognition in conversations (ERC). |
J. Wei; G. Hu; L. A. Tuan; X. Yang; W. Zhu; |
1670 | Multi-Source Templates Learning for Real-Time Aerial Tracking. Highlight: In this paper, we propose a novel multi-source templates learning method to alleviate the paradox of efficiency and effectiveness for aerial tracking. |
Y. Sun; Y. Li; C. Wang; |
1671 | Multi-Speaker and Wide-Band Simulated Conversations As Training Data for End-to-End Neural Diarization. Highlight: The compromise solution consists in generating synthetic data and the recently proposed simulated conversations (SC) have shown remarkable improvements over the original simulated mixtures (SM). In this work, we create SC with multiple speakers per conversation and show that they allow for substantially better performance than SM, also reducing the dependence on a fine-tuning stage. |
F. Landini; M. Diez; A. Lozano-Diez; L. Burget; |
1672 | Multi-Speaker Data Augmentation for Improved End-to-end Automatic Speech Recognition. Highlight: While E2E ASR models achieve state-of-the-art performance on recognition tasks that match well with such training data, they are observed to fail on test recordings that contain multiple speakers, significant channel or background noise or span longer durations than training data utterances. To mitigate these issues, we propose an on-the-fly data augmentation strategy that transforms single speaker training data into multiple speaker data by appending together multiple single speaker utterances. |
S. Thomas; H. -K. J. Kuo; G. Saon; B. Kingsbury; |
1673 | Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for The MISP 2022 Challenge. Highlight: This paper presents the design and implementation of our system for Track 1 of the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. |
T. Liu; Z. Chen; Y. Qian; K. Yu; |
1674 | Multi-Speaker Expressive Speech Synthesis Via Multiple Factors Decoupling. Highlight: To better transfer the fine-grained expression from references to the target speaker in non-parallel transfer, we introduce a reference-candidate pool and propose an attention-based reference selection approach. |
X. Zhu; Y. Lei; K. Song; Y. Zhang; T. Li; L. Xie; |
1675 | Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge. Highlight: In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, and we mainly focus on the winning system on naturalness for Track 1. |
C. Du; Y. Guo; F. Shen; K. Yu; |
1676 | Multi-Speaker Speech Synthesis from Electromyographic Signals By Soft Speech Unit Prediction. Highlight: We propose Speech-Unit-based EMG-to-Speech (SU-E2S), a system which relies on EMG to synthesize speech which contains the articulated content but is vocalized in another voice, determined by an acoustic reference utterance. |
K. Scheck; T. Schultz; |
1677 | Multispectral Image Fusion Based on Super Pixel Segmentation. Highlight: For applications such as dehazing and object detection, there is a need to offer solutions that can perform in real-time on any type of scene. |
N. Ofir; |
1678 | Multi-Stage Aggregation Transformer for Medical Image Segmentation. Highlight: In this paper, we explore how to fully utilize the advantages of convolutional neural networks (CNNs) and Transformers, and propose a novel multi-stage aggregation architecture named MA-Transformer for accurate segmentation of medical images with large variations and blurs. |
X. Wang; M. Shao; D. Guo; Y. Cui; X. Huang; M. Xia; C. Bai; |
1679 | Multistage Spatial Context Models for Learned Image Compression. Highlight: We present a series of multistage spatial context models allowing both fast decoding and better rate-distortion (RD) performance. |
F. Lin; H. Sun; J. Liu; J. Katto; |
1680 | Multi-Stream Facial Adaptive Network for Expression Recognition from A Single Image. Highlight: We observe that unrelated surrounding regions in the rough facial image prevent deep neural networks from learning discriminative facial features. To solve this problem, we present a Facial Adaptive Network (FAN) that adaptively selects a region of interest from the given facial image and is thus less affected by unrelated regions. |
B. Zhang; F. Meng; R. Ding; M. Liu; |
1681 | Multi-Task Bias-Variance Trade-Off Through Functional Constraints. Highlight: In this paper, we draw intuition from the two extreme learning scenarios (a single function for all tasks, and a task-specific function that ignores dependencies on the other tasks) to propose a bias-variance trade-off. |
J. Cerviño; J. A. Bazerque; M. Calvo-Fullana; A. Ribeiro; |
1682 | Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using Wav2vec 2.0. Highlight: In this paper, we explore the effectiveness of this model on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection. |
M. Kunešová; Z. Zajíc; |
1683 | Multi-Task Sub-Band Network For Deep Residual Echo Suppression. Highlight: This paper introduces the SWANT team’s entry to the ICASSP 2023 AEC Challenge. |
J. Sun; D. Luo; Z. Li; J. Li; Y. Ju; Y. Li; |
1684 | Multi-Task Transformer with Relation-Attention and Type-Attention for Named Entity Recognition. Highlight: This paper proposes a multi-task Transformer, which incorporates an entity boundary detection task into the named entity recognition task. |
Y. Mo; H. Tang; J. Liu; Q. Wang; Z. Xu; J. Wang; W. Wu; Z. Li; |
1685 | Multi-Temporal Lip-Audio Memory for Visual Speech Recognition. Highlight: In this paper, we present a Multi-Temporal Lip-Audio Memory (MTLAM) that makes the best use of audio signals to complement the insufficient information in lip movements. |
J. H. Yeo; M. Kim; Y. M. Ro; |
1686 | Multitrack Music Transcription with A Time-Frequency Perceiver. Highlight: In this paper, we propose a novel deep neural network architecture, Perceiver TF, to model the time-frequency representation of audio input for multitrack transcription. |
W. -T. Lu; J. -C. Wang; Y. -N. Hung; |
1687 | Multitrack Music Transformer. Highlight: In this work, we propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length. |
H. -W. Dong; K. Chen; S. Dubnov; J. McAuley; T. Berg-Kirkpatrick; |
1688 | Multi-User Data Detection in Massive MIMO with 1-Bit ADCs. Highlight: In this paper, we consider a multi-UE setting with correlated Rayleigh fading, where the soft-estimated symbols are obtained by means of maximum ratio combining based on imperfectly estimated channels. |
A. Radbord; I. Atzeni; A. Tölli; |
1689 | Multi-User Methods for Vibrational Radar Backscatter Communications. Highlight: In this paper, we compare the performance between two multi-user processing methods in vibrational radar backscatter communications (VRBC). |
J. Centers; J. Krolik; |
1690 | Multi-View Graph Regularized Deep Autoencoder-Like NMF Framework. Highlight: Because nonnegative matrix factorization (NMF) yields features that are easy to interpret, NMF-based multi-view clustering (MVC) is usually a good choice for multi-view data and achieves promising results. Inspired by this, we propose a multi-view graph regularized deep autoencoder-like NMF (MGANMF) framework in this paper for multi-view clustering. |
L. Zhao; Z. Wang; Z. Wang; Z. Chen; |
1691 | Multi-View K-Means with Laplacian Embedding. Highlight: Besides, some two-stage strategies cannot achieve ideal results because they fail to capture the correlation between views. In view of this, we propose Multi-View K-means with Laplacian Embedding (MVKLE), which is capable of clustering multi-view data in the learned embedding space. |
Z. Hao; Z. Lu; F. Nie; R. Wang; X. Li; |
1692 | Multi-View Learning for Speech Emotion Recognition with Categorical Emotion, Categorical Sentiment, and Dimensional Scores. Highlight: In this paper, we evaluate and quantify the predictive power of the dimensional scores towards categorical emotions and sentiment for two publicly available speech emotion datasets. |
D. Tompkins; D. Emmanouilidou; S. Deshmukh; B. Elizalde; |
1693 | Multi-View Millimeter-Wave Imaging Over Wireless Cellular Network. Highlight: In this paper, based on the uplink channels of the wireless cellular network, we propose a multi-view mmWave imaging architecture. |
X. Tong; Z. Zhang; Z. Yang; |
1694 | Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects. Highlight: We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. |
J. Koo; M. A. Martínez-Ramírez; W. -H. Liao; S. Uhlich; K. Lee; Y. Mitsufuji; |
1695 | Music Rearrangement Using Hierarchical Segmentation. Highlight: In this paper, we propose a method for automatically rearranging music recordings that takes into account the hierarchical structure of the recording. |
C. Plachouras; M. Miron; |
1696 | Mutual Information Based Reweighting for Precipitation Nowcasting. Highlight: In this paper, we find that if the imbalance ratio is fixed, tasks with higher mutual information make the nowcasting model more robust to the data imbalance problem. |
Y. Cao; D. Zhang; X. Zheng; H. Shan; J. Zhang; |
1697 | Mutually Guided Few-Shot Learning For Relational Triple Extraction. Highlight: Performance drops dramatically when only a small amount of labeled data is available. To tackle this problem, we propose the Mutually Guided Few-shot learning framework for Relational Triple Extraction (MG-FTE). |
C. Yang; S. Jiang; B. He; C. Ma; L. He; |
1698 | MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation. Highlight: However, the existing medical report generation methods cannot exploit the rich multi-view mutual information of medical images. Therefore, in this work, we propose the first multi-view medical report generation model, called MvCo-DoT. |
R. Wang; X. Wang; Z. Xu; W. Xu; J. Chen; T. Lukasiewicz; |
1699 | N2MVSNet: Non-Local Neighbors Aware Multi-View Stereo Network. Highlight: However, current works are limited to using fixed-size convolution kernels, leading to suboptimal features that lack anisotropy in low-textured regions and tend to produce invalid depth blending at the edge of the foreground and background. In this paper, we propose N2MVSNet, which learns adaptive non-local neighbors matching (ANNM) and their spatial impact to overcome these deficiencies. |
Z. Zhang; H. Gao; Y. Hu; R. Wang; |
1700 | Named Entity Detection and Injection for Direct Speech Translation. Highlight: In this work, we explore how to leverage dictionaries of named entities (NEs) that are likely to appear in a given context to improve speech-to-text (S2T) model outputs. |
M. Gaido; Y. Tang; I. Kulikov; R. Huang; H. Gong; H. Inaguma; |
1701 | Narrow Down Before Selection: A Dynamic Exclusion Model for Multiple-Choice QA. Highlight: In this work, we propose a dynamic exclusion model for MCQA named ExcMC, which mimics human thinking in selection. |
X. Liu; Y. Shi; R. Liu; G. Bai; Y. Chen; |
1702 | NAS-DYMC: NAS-Based Dynamic Multi-Scale Convolutional Neural Network for Sound Event Detection. Highlight: In this paper, to enhance the representation ability of CNN, we propose NAS-DYMC, a NAS-based dynamic multi-scale convolutional neural network to extract a more effective acoustic representation. |
J. Wang; P. Yao; F. Deng; J. Tan; C. Song; X. Wang; |
1703 | Nasty-SFDA: Source Free Domain Adaptation from A Nasty Model. Highlight: A challenging problem called Nasty Source Free Domain Adaptation (Nasty-SFDA) is proposed in this work, where only a nasty source model and unlabeled target samples are available for DA. |
J. Cao; Y. Liu; W. Bai; J. Ding; L. Li; |
1704 | Native Multi-Band Audio Coding Within Hyper-Autoencoded Reconstruction Propagation Networks. Highlight: In this work, we present a novel neural audio coding network that natively supports a multi-band coding paradigm. |
D. Petermann; I. Jang; M. Kim; |
1705 | Naturalistic Head Motion Generation from Speech. Highlight: In this work, we study the variation in the perceptual quality of head motions sampled from a generative model. |
T. Mittal; Z. Aldeneh; M. Fedzechkina; A. Ranjan; B. -J. Theobald; |
1706 | Navigating and Reaching Therapeutic Goals with Dynamical Systems in Conversation-Based Interventions. Highlight: Modern human behavioral signal processing and machine-learning methods have introduced novel ways for representing and estimating internal states of people in goal-based conversational interactions, such as psychotherapy. |
V. Ardulov; S. Narayanan; |
1707 | NBA-OMP: Near-Field Beam-Split-Aware Orthogonal Matching Pursuit for Wideband THz Channel Estimation. Highlight: We design an NBA dictionary of near-field steering vectors by exploiting the corresponding angular and range deviation. |
A. M. Elbir; K. Vijay Mishra; S. Chatzinotas; |
1708 | NCL: Textual Backdoor Defense Using Noise-Augmented Contrastive Learning. Highlight: In this paper, we propose a Noise-augmented Contrastive Learning (NCL) framework to defend against textual backdoor attacks when training models with untrustworthy data. |
S. Zhai; Q. Shen; X. Chen; W. Wang; C. Li; Y. Fang; Z. Wu; |
1709 | NC-WAMKD: Neighborhood Correction Weight-Adaptive Multi-Teacher Knowledge Distillation for Graph-Based Semi-Supervised Node Classification. Highlight: In addition, they rely on a large amount of labeled data, which is inconsistent with semi-supervised learning requirements. To address these limitations, we propose a Neighborhood Correction Weight-Adaptive Multi-teacher Knowledge Distillation (NC-WAMKD) framework, which involves the knowledge distillation strategy WAMKD and the student model with Neighborhood Correction Label Propagation and Feature Transformation (NCLF). |
J. Liu; P. Guo; Y. Song; |
1710 | Near-field Localization with Dynamic Metasurface Antennas. Highlight: We use a direct positioning estimation method based on curvature-of-arrival of the impinging wavefront to obtain the user location, and characterize the effects of DMA tuning on the estimation accuracy. |
Q. Yang; A. Guerra; F. Guidi; N. Shlezinger; H. Zhang; D. Dardari; B. Wang; Y. C. Eldar; |
1711 | Neighborhood Information-Based Label Refinement for Person Re-Identification with Label Noise. Highlight: This paper proposes a label refinement module based on neighborhood information (LRNI) for person Re-ID with label noise. |
X. Zhong; S. Su; W. Liu; X. Jia; W. Huang; M. Wang; |
1712 | Nested Attention Network with Graph Filtering for Visual Question and Answering. Abstract: Recently, Visual Question Answering (VQA), which requires generating the answer by understanding both visual and textual content, has attracted considerable research interest. … |
J. Lu; C. Wu; L. Wang; S. Yuan; J. Wu; |
1713 | Networked Policy Gradient Play in Markov Potential Games. Highlight: We propose a networked policy gradient play algorithm for solving Markov potential games. |
S. Aydın; C. Eksin; |
1714 | Neural-AFC: Learning-Based Step-Size Control for Adaptive Feedback Cancellation with Closed-Loop Model Training. Highlight: However, model-based and more recently learning-based AF methods typically neglect this correlation in their derivation, leading to sub-optimal performance in closed-loop scenarios. In this paper, we propose Neural-AFC, a recurrent neural network (RNN) designed for AFC step-size control optimization, that addresses this problem by including not only the adaptive filter but also the acoustic feedback path and the system gain in the recurrence model. |
B. Soleimani; H. Schepker; M. Mirbagheri; |
1715 | Neural Architecture of Speech. Highlight: Inspired by the recent progress on deep learning models for speech, we present a first systematic study on understanding human speech processing by probing neural speech models to predict both language and auditory brain region activations. |
S. R. Oota; K. Pahwa; M. Marreddy; M. Gupta; B. S. Raju; |
1716 | Neural Architecture Search with Multimodal Fusion Methods for Diagnosing Dementia. Highlight: We perform extensive experiments on the ADReSS Challenge dataset and show the effectiveness of our approach over state-of-the-art methods. |
M. Chatzianastasis; L. Ilias; D. Askounis; M. Vazirgiannis; |
1717 | Neural Band-to-Piano Score Arrangement with Stepless Difficulty Control. Highlight: This paper describes a music arrangement method for popular music that can convert a band score into a piano score with a steplessly specified level of performance difficulty. |
M. Terao; E. Nakamura; K. Yoshii; |
1718 | Neural Diarization with Non-Autoregressive Intermediate Attractors. Highlight: In this work, we propose a novel EEND model that introduces the label dependency between frames. |
Y. Fujita; T. Komatsu; R. Scheibler; Y. Kida; T. Ogawa; |
1719 | Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding. Highlight: This paper introduces additional coding efficiency in speech coding by reducing the temporal redundancy existing in the frame-level feature sequence via a feature predictor. |
H. Yang; W. Lim; M. Kim; |
1720 | Neural Fourier Shift for Binaural Speech Rendering. Highlight: We present a neural network for rendering binaural speech from given monaural audio, position, and orientation of the source. |
J. Woo Lee; K. Lee; |
1721 | Neurally Augmented State Space Model for Simultaneous Communication and Tracking with Low Complexity Receivers. Highlight: In this paper, we propose an integrated sensing and communications (ISAC) system where a base station (BS) equipped with an antenna array and a co-located radar receiver transmits data packets while simultaneously tracking the position of users. |
F. Pedraza; G. Caire; |
1722 | Neural Maximum-a-Posteriori Beamforming for Ultrasound Imaging. Highlight: Deep learning based reconstruction methods have demonstrated impressive results over the past years, but often lack interpretability and require vast amounts of data. In this work, we present a neural MAP beamforming technique, which efficiently combines deep learning with the MAP beamforming framework. |
B. Luijten; B. W. Ossenkoppele; N. de Jong; M. D. Verweij; Y. C. Eldar; M. Mischi; R. J. G. van Sloun; |
1723 | Neural Mode Estimation. Highlight: To this end, we leverage the neural mode decomposition technique and propose an open-source Neural Mode Estimation (NME) to deliver a large speedup (at least 50×) while maintaining accuracy. |
P. Sun; Z. Wen; Y. Zhou; Z. Hong; T. Lin; |
1724 | Neural Network Models with Integrated Training and Adaptation For Nonlinear Acoustic System Identification. Abstract: System identification is instrumental in various tasks of acoustics, including acoustic measurement and acoustic echo cancellation. Adaptive filtering is proven to be successful … |
S. Voit; G. Enzner; |
1725 | Neural Networks with Quantization Constraints. Highlight: In this work, we present a constrained learning approach to quantization-aware training. |
I. Hounie; J. Elenter; A. Ribeiro; |
1726 | Neural Optimization Of Geometry And Fixed Beamformer For Linear Microphone Arrays. Highlight: This paper addresses the issue by jointly optimizing the array geometry and spatial filters through a neural network based model. |
L. Yan; W. Huang; W. B. Kleijn; T. D. Abhayapala; |
1727 | Neural Source Coding For Bandwidth-Efficient Brain-Computer Interfacing With Wireless Neuro-Sensor Networks. Highlight: In this paper, our goal is to investigate the use of NSC in so-called neuro-sensor networks, i.e., a type of body-sensor network consisting of a collection of wireless sensor nodes that record brain activity at different scalp locations, e.g., via electroencephalography (EEG) sensors. |
T. Strypsteen; A. Bertrand; |
1728 | Neural Speech Phase Prediction Based on Parallel Estimation Architecture and Anti-Wrapping Losses. Highlight: This paper presents a novel speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra by neural networks. |
Y. Ai; Z. -H. Ling; |
1729 | Neural Transducer Training: Reduced Memory Consumption with Sample-Wise Computation. Highlight: We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. |
S. Braun; E. McDermott; R. Hsiao; |
1730 | New Interpretable Patterns and Discriminative Features from Brain Functional Network Connectivity Using Dictionary Learning. Highlight: In this paper, we present a new method that leverages ICA and DL for the identification of directly interpretable patterns to discriminate between the HC and Sz groups. |
F. Ghayem; H. Yang; F. Kantar; S. -J. Kim; V. D. Calhoun; T. Adali; |
1731 | Newton-Based Trainable Learning Rate. Highlight: In this work, we propose an algorithm for automatically adjusting the learning rate during the training process, assuming a gradient descent formulation. |
G. Retsinas; G. Sfikas; P. P. Filntisis; P. Maragos; |
1732 | Next-Speaker Prediction Based on Non-Verbal Information in Multi-Party Video Conversation. Highlight: We propose a method for next-speaker prediction, a task to predict who speaks in the next turn among multiple current listeners, in multi-party video conversation. |
S. Mizuno; N. Hojo; S. Kobashikawa; R. Masumura; |
1733 | NF-PCAC: Normalizing Flow Based Point Cloud Attribute Compression. Highlight: In this paper, we propose a different and novel approach to compress PC attributes by using normalizing flows. |
R. B. Pinheiro; J. -E. Marvie; G. Valenzise; F. Dufaux; |
1734 | NL-DSE: Non-Local Neural Network with Decoder-Squeeze-and-Excitation for Monocular Depth Estimation. Highlight: In this paper, we propose an SE-Net-based module for the decoder part in the encoder-decoder architecture to improve the result. |
T. -H. Tsai; W. -C. Wan; |
1735 | NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit. Highlight: This paper describes the design of NNSVS, an open-source software for neural network-based singing voice synthesis research. |
R. Yamamoto; R. Yoneyama; T. Toda; |
1736 | Node-Wise Domain Adaptation Based on Transferable Attention for Recognizing Road Rage Via EEG. Highlight: In this paper, based on the biological topology of multi-channel electroencephalogram (EEG) signals, we propose a model which combines transferable attention (TA) and regularized graph neural network (RGNN). |
X. Gao; C. Xu; Y. Song; J. Hu; J. Xiao; Z. Meng; |
1737 | Noise-Aware Target Extension with Self-Distillation for Robust Speech Recognition. Highlight: In this paper, we propose a noise-aware target extension (NATE) that extends the senone target to contain noise awareness by jointly classifying the senone and noise in a single branch. |
J. -S. Seong; J. -H. Choi; J. Kyung; Y. -R. Jeoung; J. -H. Chang; |
1738 | Noise-Disentanglement Metric Learning for Robust Speaker Verification. Highlight: Automatic speaker verification (ASV) suffers from performance degradation in noisy environments. To solve this problem, we propose the noise-disentanglement metric learning to reduce the speaker-irrelevant noisy components and build a noise-invariant embedding space. |
Y. Sun; H. Zhang; L. Wang; K. A. Lee; M. Liu; J. Dang; |
1739 | Noise PSD Insensitive RTF Estimation in A Reverberant and Noisy Environment. Abstract: Spatial filtering techniques typically rely on estimates of the target relative transfer function (RTF). However, the target speech signal is typically corrupted by late … |
C. Li; R. C. Hendriks; |
1740 | Noncoherent Multiuser Grassmannian Constellations for The MIMO Multiple Access Channel. Highlight: We consider the design of multiuser constellations for a multiple access channel (MAC) with K users, with M antennas each, that transmit simultaneously to a receiver equipped with N antennas through a Rayleigh block-fading channel, when no channel state information (CSI) is available to either the transmitter or the receiver. |
J. Álvarez-Vizoso; D. Cuevas; C. Beltrán; I. Santamaria; V. Tuček; G. Peters; |
1741 | Non-Convex Approaches for Low-Rank Tensor Completion Under Tubal Sampling. Highlight: In this work, we investigate a specific sampling strategy, referred to as tubal sampling. |
Z. Tan; L. Huang; H. Cai; Y. Lou; |
1742 | Nonnegative Block-Term Decomposition with The β-Divergence: Joint Data Fusion and Blind Spectral Unmixing. Highlight: We present a new method for simultaneously solving hyperspectral super-resolution and spectral unmixing of the unknown super-resolution image. |
C. Prévost; V. Leplat; |
1743 | Nonparallel Emotional Voice Conversion for Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing. Highlight: In this paper, we tackle the problem of converting the emotion of speakers for whom only neutral data are available during training and testing (i.e., unseen speaker-emotion combinations). |
N. Shah; M. Singh; N. Takahashi; N. Onoe; |
1744 | Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs. Highlight: Although these methods achieve highly accurate super-resolution if the acoustic characteristics of the input data are similar to those of the training data, challenges remain: the models suffer from quality degradation for out-of-domain data, and paired data are required for training. To address these problems, we propose Dual-CycleGAN, a high-quality audio super-resolution method that can utilize unpaired data based on two connected cycle consistent generative adversarial networks (CycleGAN). |
R. Yoneyama; R. Yamamoto; K. Tachibana; |
1745 | NORD: Non-Matching Reference Based Relative Depth Estimation from Binaural Speech. Highlight: We propose NORD: a novel framework for estimating the relative depth between two binaural speech recordings. |
P. Manocha; I. D. Gebru; A. Kumar; D. Markovic; A. Richard; |
1746 | No Reference Quality Assessment for Screen Content Images Based on Entire and High-Influence Regions. Highlight: Considering the impact of local distortions on the visual quality of the entire SCIs, this paper proposes a no-reference quality assessment method based on entire and high-influence regions. |
Z. Xu; Y. Yang; Z. Zhang; W. Zhang; |
1747 | Not All Classes Are Equal: Adaptively Focus-Aware Confidence for Semi-Supervised Object Detection. Highlight: In this paper, we propose adaptively focus-aware confidence, which treats object classes differently. |
H. Zhu; Y. Lu; H. Zhao; G. Zhao; X. Zhao; |
1748 | Note and Playing Technique Transcription of Electric Guitar Solos in Real-World Music Performance. Abstract: Transcribing electric guitar solos in real-world performance is challenging because of the interference of background accompaniments, the strong coupling between music pitch and … |
T. -S. Huang; P. -C. Yu; L. Su; |
1749 | Nowcasting of Extreme Precipitation Using Deep Generative Models. Highlight: In this paper, novel deep generative models are proposed for precipitation nowcasting. |
H. Bi; M. Kyryliuk; Z. Wang; C. Meo; Y. Wang; R. Imhoff; R. Uijlenhoet; J. Dauwels; |
1750 | NRTSI: Non-Recurrent Time Series Imputation. Highlight: In this work, we reformulate time series as sets and propose a novel non-recurrent imputation model, Non-Recurrent Time Series Imputation (NRTSI), that does not impose any recurrent structures. |
S. Shan; Y. Li; J. B. Oliva; |
1751 | NSV-TTS: Non-Speech Vocalization Modeling And Transfer In Emotional Text-To-Speech. Highlight: We propose an emotional TTS system (NSV-TTS) to model NSV and emotional speech. |
H. Zhang; X. Yu; Y. Lin; |
1752 | Numerical Semantic Modeling for Implicit Discourse Relation Recognition. Highlight: In this work, we attach importance to numerical semantics and design a numerical logic reasoning module specifically for the numeric tokens in discourse arguments to enhance the discourse logic inferring. |
C. Wang; P. Jian; H. Wang; |
1753 | NVOC-22: A Low Cost Mel Spectrogram Vocoder for Mobile Devices. Highlight: The model is trained as a GAN and demonstrates stable training behavior. |
R. Iyer; |
1754 | OAFormer: Learning Occlusion Distinguishable Feature for Amodal Instance Segmentation. Highlight: In this paper, a simple yet effective Occlusion-aware transformer-based model, OAFormer, is proposed for accurate amodal instance segmentation. |
Z. Li; R. Shi; T. Huang; T. Jiang; |
1755 | OCT Image Blind Despeckling Based on Gradient Guided Filter with Speckle Statistical Prior. Highlight: However, speckles occur in OCT images due to the property of coherent imaging, inevitably affecting the visual quality and clinical analysis. To alleviate this problem, we propose a novel gradient-guided speckle image filtering method (GGSF) with structure enhancement for directly removing speckles in OCT images. |
S. Li; M. Xiong; B. Yang; X. Zhang; R. Higashita; J. Liu; |
1756 | On Adversarial Robustness of Audio Classifiers. Highlight: We make three contributions to improve adversarial robustness of audio classifiers. |
K. Lu; M. C. Nguyen; X. Xu; C. S. Foo; |
1757 | On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems. Highlight: We find that using a small batch size during training improves performance in both conditions for all batching strategies. |
P. Gonzalez; T. Sonne Alstrøm; T. May; |
1758 | On Bidirectional Preestimates and Their Application to Identification of Fast Time-Varying Systems. Highlight: We propose a novel preestimator, called bidirectional, which further improves performance of the fLBF scheme. |
M. Niedźwiecki; A. Gańcza; L. Shen; Y. Zakharov; |
1759 | Once-for-All Sequence Compression for Self-Supervised Speech Models. Highlight: In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compression rates. |
H. -J. Chen; Y. Meng; H. -y. Lee; |
1760 | On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks. Highlight: Despite its effectiveness in saving computational resources, OTFusion requires the input networks to have the same number of layers. To address this issue, we propose a novel model fusion framework, named CLAFusion, to fuse neural networks with different numbers of layers, which we refer to as heterogeneous neural networks, via cross-layer alignment. |
D. Nguyen; T. Nguyen; K. Nguyen; D. Phung; H. Bui; N. Ho; |
1761 | On Crowdsourcing-Design with Comparison Category Rating for Evaluating Speech Enhancement Algorithms. Highlight: In this paper, a user evaluation based on crowdsourcing (subjective) and the Comparison Category Rating (CCR) method is compared against the DNS-MOS, ViSQOL and 3QUEST (objective) metrics. |
A. S. Z. Suárez; C. Laroche; L. H. Clemmensen; S. Das; |
1762 | On Designing A 3D Imaging Summer Project For Ontario’s High School Students During COVID-19 Pandemic. Highlight: We document our design methodology and experiences in the project, as well as feedback and evaluations from participants at all levels. |
F. Lan; G. Cheung; P. Arora; D. Richard-Koko; L. Cole; |
1763 | On Designing Light-Weight Object Trackers Through Network Pruning: Use CNNs or Transformers? Highlight: This paper demonstrates how highly compressed light-weight object trackers can be designed using neural architectural pruning of large CNN and Transformer based trackers. |
S. Aggarwal; T. Gupta; P. K. Sahu; A. Chavan; R. Tiwari; D. K. Prasad; D. K. Gupta; |
1764 | One-Shot Action Detection Via Attention Zooming In. Highlight: This work focuses on the one-shot image scenario and introduces the attention zooming in strategy to effectively and progressively carry out support-query cross-attention while generating proposals. |
H. -Y. Hsieh; D. -J. Chen; C. -W. Chang; T. -L. Liu; |
1765 | One-Shot Medical Action Recognition With A Cross-Attention Mechanism And Dynamic Time Warping. Highlight: In this paper, we address the classification of medical actions with only a single sample by developing a novel one-shot learning framework which contains both cross-attention and dynamic time warping (DTW) modules. |
L. Xie; Y. Yang; Z. Fu; S. M. Naqvi; |
1766 | One-Shot Neural Band Selection for Spectral Recovery. Highlight: In this paper, we present a novel one-shot Neural Band Selection (NBS) framework for spectral recovery. |
H. -M. Hu; Z. Xu; W. Xu; Y. Song; Y. Zhang; L. Liu; Z. Han; A. Meng; |
1767 | Online Binaural Speech Separation Of Moving Speakers With A Wavesplit Network. Highlight: Here, we describe a real-time binaural speech separation model based on a Wavesplit network to mitigate the speaker swap problem for moving speaker separation. |
C. Han; N. Mesgarani; |
1768 | Online Caching with Fetching Cost for Arbitrary Demand Pattern: A Drift-Plus-Penalty Approach. Highlight: In this paper, we investigate the problem of caching in a single server setting from the stochastic optimization viewpoint. |
S. P; B. N. Bharath; |
1769 | Online Edge Flow Prediction Over Expanding Simplicial Complexes. Highlight: To handle the streaming nature of data, we propose an online prediction for edge flows which generalizes to other higher-order simplicial signals. |
M. Yang; B. Das; E. Isufi; |
1770 | Online Learning-Based Waveform Selection for Improved Vehicle Recognition in Automotive Radar. Highlight: We present a novel learning approach based on satisficing Thompson sampling, which quickly identifies a waveform expected to yield satisfactory classification performance. |
C. E. Thornton; W. W. Howard; R. M. Buehrer; |
1771 | Online Model Compression for Federated Learning with Large Models. Highlight: The proposed Online Model Compression (OMC) provides a framework that stores model parameters in a compressed format and decompresses them only when needed. We use quantization as the compression method in this paper and propose three methods, (1) per-variable transformation, (2) weight-matrix-only quantization, and (3) partial variable quantization, to minimize its impact on model accuracy. |
T. -J. Yang; Y. Xiao; G. Motta; F. Beaufays; R. Mathews; M. Chen; |
1772 | Online Residual-Based Key Frame Sampling with Self-Coach Mechanism and Adaptive Multi-Level Feature Fusion. Highlight: This paper presents ORSampler, an adaptive Online Residual-based key frame Sampler. |
R. Zhang; Y. Hua; T. Song; Z. Xue; R. Ma; H. Guan; |
1773 | Online Vector Autoregressive Models Over Expanding Graphs. Highlight: However, these methods work principally on graphs with a fixed size, whereas in several applications there are expanding graphs where new nodes join the network; e.g., new sensors joining a sensor network or new users joining a recommender system. This paper focuses on the non-trivial extension of spatiotemporal methods to this setting, where now it is key to jointly capture both the topological and signal dynamics. |
B. Das; E. Isufi; |
1774 | On Minimal Variations for Unsupervised Representation Learning. Highlight: Our main contribution is to unveil minimal variations as a guiding principle behind unsupervised representation learning, which paves the way to better practical guidelines for self-supervised learning algorithms. |
V. Cabannes; A. Bietti; R. Balestriero; |
1775 | On Multiple-Input/Binaural-Output Antiphasic Speaker Signal Extraction. Highlight: This paper studies the problem of target speaker signal extraction and antiphasic rendering with an array of microphones in scenarios where there are two active speakers. Based on the important findings achieved in the psychoacoustic field as well as our recent works on single-channel speech enhancement, we present a rendering based approach in which a temporal convolutional network (TCN) is trained to take the multiple signals observed by the microphone array as its inputs and generate two output (binaural) signals. |
X. Wang; N. Pan; J. Benesty; J. Chen; |
1776 | On Negative Sampling for Contrastive Audio-Text Retrieval. Highlight: This paper investigates negative sampling for contrastive learning in the context of audio-text retrieval. |
H. Xie; O. Räsänen; T. Virtanen; |
1777 | On Neural Architectures for Deep Learning-Based Source Separation of Co-Channel OFDM Signals. Highlight: In this work, through a prototype problem based on the OFDM source model, we assess, and question, the efficacy of using audio-oriented neural architectures in separating signals based on features pertinent to communication waveforms. |
G. C. F. Lee; A. Weiss; A. Lancho; Y. Polyanskiy; G. W. Wornell; |
1778 | On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors. Highlight: This is despite the fact that audio is a central modality for many tasks, such as speaker diarization, automatic speech recognition, and sound event detection. To address this, we propose to leverage the feature space of the model with deep k-nearest neighbors to detect OOD samples. |
Z. Bukhsh; A. Saeed; |
1779 | On Parametric Misspecified Bayesian Cramér-Rao Bound: An Application to Linear/Gaussian Systems. Highlight: In reality, however, the true model generating the data is either unknown or simplified when deriving estimators, which motivates work on deriving estimation bounds under modeling mismatch. |
S. Tang; G. LaMountain; T. Imbiriba; P. Closas; |
1780 | On Super-Resolution with Separation Prior. Highlight: Separation-aware super-resolution algorithms are studied in this paper. |
X. Mao; H. Qiao; |
1781 | On The Design and Training Strategies for RNN-Based Online Neural Speech Separation Systems. Highlight: In this paper, we investigate how RNN-based offline neural speech separation systems can be changed into their online counterparts while mitigating the performance degradation. |
K. Li; Y. Luo; |
1782 | On The Detection of Synthetic Images Generated By Diffusion Models. Highlight: With this work, we seek to understand how difficult it is to distinguish synthetic images generated by diffusion models from pristine ones and whether current state-of-the-art detectors are suitable for the task. |
R. Corvi; D. Cozzolino; G. Zingarini; G. Poggi; K. Nagano; L. Verdoliva; |
1783 | On The Effectiveness of Monoaural Target Source Extraction for Distant End-to-end Automatic Speech Recognition Highlight: In this paper we investigate the effectiveness of target source extraction for improving the robustness of end-to-end automatic speech recognition in noisy and reverberant conditions. |
C. Zorilă; R. Doddipatla; |
1784 | On The Fairness of Multitask Representation Learning Highlight: In this work, we consider a novel fairness scenario where $T$ tasks can be split into majority and minority groups of sizes $T_{\mathrm{maj}}$ and $T_{\mathrm{min}}$, respectively. The group assignments are unknown during multitask learning (MTL), and the ratio $T_{\mathrm{min}}/T_{\mathrm{maj}}$ corresponds to the imbalance level of the problem. |
Y. Li; S. Oymak; |
1785 | On-the-Fly Text Retrieval for End-to-end ASR Adaptation Highlight: In this work, we propose augmenting a transducer-based ASR model with a retrieval language model, which directly retrieves from an external text corpus plausible completions for a partial ASR hypothesis. |
B. Yusuf; A. Gourav; A. Gandhe; I. Bulyko; |
1786 | On The Importance of Different Cough Phases for COVID-19 Detection Highlight: In this study, our aim is two-fold. |
Y. Zhu; M. H. Shaik; T. H. Falk; |
1787 | On The Joint Estimation of Phase Noise and Time-Varying Channels for OFDM Under High-Mobility Conditions Highlight: In this paper, we use separate sets of Basis Expansion Model (BEM) coefficients to model the time variation of the channel paths and of the phase noise process over intervals of several OFDM symbols. |
F. Linsalata; N. Ksairi; |
1788 | On The Minimum Perimeter Criterion for Bounded Component Analysis Highlight: This contribution is focused on the statistical justification and practical implementation of the minimum perimeter criterion, a bounded component analysis technique that can be used for the extraction of complex signals. |
S. Cruces; |
1789 | On The Primal and Dual Formulations Of The Discrete Mumford-Shah Functional Highlight: This work focuses on the discrete Mumford-Shah (D-MS) functional, which aims to jointly perform image reconstruction and contour detection, at the price of minimizing a non-convex objective function. |
N. Pustelnik; |
1790 | On The Quantization of Recurrent Neural Networks for SMILES Generation Highlight: In this study, we observed good performance even for 4-bit models using LSTM and GRU layers, and we concluded that quantizing Simple RNN layers is not worth the effort. |
A. Durao; J. P. Arrais; B. Ribeiro; G. Falcao; |
1791 | On The Reduction of Large-Scale Room Acoustic Models Highlight: Moment-Matching (MM) techniques are well established and can be directly applied to the resulting sound diffusion equation. In this paper, we propose a computationally efficient MM algorithm based on the extended Krylov subspace method that can generate very compact models, which can then be simulated efficiently across many time steps. |
P. Stoikos; O. Axelou; G. Floros; N. Evmorfopoulos; G. Stamoulis; |
1792 | On The Relevance of The Differences Between HRTF Measurement Setups for Machine Learning Highlight: We perform a simple experiment to test the relevance of the remaining differences between datasets when applying machine learning techniques. |
J. Pauwels; L. Picinali; |
1793 | On The Robustness of Non-Intrusive Speech Quality Model By Adversarial Examples Highlight: This work shows that deep speech quality predictors can be vulnerable to adversarial perturbations, where the prediction can be changed drastically by unnoticeable perturbations as small as −30 dB compared with speech inputs. |
H. -Y. Lin; H. -H. Tseng; Y. Tsao; |
1794 | On The Role of Lip Articulation in Visual Speech Perception Highlight: In this work, we focus on the degree of articulation and run a series of experiments to study how articulation strength impacts human perception of lip motion accompanying speech. |
Z. Aldeneh; M. Fedzechkina; S. Seto; K. Metcalf; M. Sarabia; N. Apostoloff; B. -J. Theobald; |
1795 | On The Role of Visual Context in Enriching Music Representations Highlight: In this study, we propose VCMR – Video-Conditioned Music Representations, a contrastive learning framework that learns music representations from audio and the accompanying music videos. |
K. Avramidis; S. Stewart; S. Narayanan; |
1796 | On The Value of Stochastic Side Information in Online Learning Highlight: We study the effectiveness of stochastic side information in deterministic online learning scenarios. We propose a forecaster to predict a deterministic sequence where its performance is evaluated against an expert class. |
J. Jia; X. Wu; J. Evans; J. Zhu; |
1797 | Ontology-Aware Network for Zero-Shot Sketch-Based Image Retrieval Highlight: Pioneering work on zero-shot sketch-based image retrieval focused on the modal gap but ignored inter-class information. |
H. Zhang; H. Jiang; Z. Wang; D. Cheng; |
1798 | On Tracking A Stochastically Time-Varying Subspace Highlight: Tracking a stochastically time-varying subspace turns out to be a very hard problem, so we tackle the case of a rank-1 subspace. |
V. Solo; |
1799 | On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration Highlight: In this work, we show that DUST’s pseudo-label (PL) filtering, as initially used, fails under severe source and target domain mismatch. |
N. Dawalatabad; S. Khurana; A. Laurent; J. Glass; |
1800 | On Using The UA-Speech and TORGO Databases to Validate Automatic Dysarthric Speech Classification Approaches Highlight: Such approaches typically rely on the underlying assumption that recordings from control and dysarthric speakers are collected in the same noiseless environment using the same recording setup. In this paper, we show that this assumption is violated for the UA-Speech and TORGO databases. |
G. Schu; P. Janbakhshi; I. Kodrasi; |
1801 | On Weighted Cross-Entropy for Label-Imbalanced Separable Data: An Algorithmic-Stability Study Highlight: In this paper, we study generalization under label imbalances. |
P. Deora; C. Thrampoulidis; |
1802 | On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems Highlight: We propose a general framework to compute the word error rate (WER) of ASR systems that process recordings containing multiple speakers at their input and that produce multiple output word sequences (MIMO). |
T. von Neumann; C. Boeddeker; K. Kinoshita; M. Delcroix; R. Haeb-Umbach; |
1803 | Open-Set Automatic Target Recognition Highlight: To this end, we propose an Open-set Automatic Target Recognition framework where we enable open-set recognition capability for ATR algorithms. |
B. Safaei; V. VS; C. M. de Melo; S. Hu; V. M. Patel; |
1804 | Optimal Carrier Frequency Design for Frequency Diverse Array MIMO Radar Highlight: In this work, we introduce a novel approach for designing the transmit frequency offset scheme based on Cramér-Rao lower bound (CRLB) minimization for a frequency diverse array multiple-input multiple-output (FDA-MIMO) radar. |
J. Cheng; M. Juhlin; W. -Q. Wang; A. Jakobsson; |
1805 | Optimal Compression for Minimizing Classification Error Probability: An Information-Theoretic Approach Highlight: We formulate the problem of performing optimal data compression under the constraint that the compressed data can be used for accurate classification in machine learning. |
J. Gao; A. Tang; W. Xu; |
1806 | Optimal Condition Training for Target Source Separation Highlight: In this work, we propose a new optimal condition training (OCT) method for single-channel target source separation, based on greedy parameter updates using the highest performing condition among equivalent conditions associated with a given target source. |
E. Tzinis; G. Wichern; P. Smaragdis; J. Le Roux; |
1807 | Optimal Kernel for Real-Time Arbitrary-Shaped Text Detection Highlight: Motivated by these issues, we design an efficient framework for arbitrary-shaped text detection, built on Optimal Kernel Representation (OKR) and a Pixel Enhancement Module (PEM). |
H. Ma; C. Yang; Y. Yuan; Q. Wang; |
1808 | Optimal Mixed-ADC Arrangement for DOA Estimation Via CRB Using ULA Highlight: We consider a mixed analog-to-digital converter (ADC) based architecture for direction of arrival (DOA) estimation using a uniform linear array (ULA). |
X. Zhang; Y. Cheng; X. Shang; J. Liu; |
1809 | Optimal Transport in Diffusion Modeling for Conversion Tasks in Audio Domain Highlight: In this paper, we empirically show that applying the optimal transport point of view to diffusion modeling makes it possible to choose a good noise sample from which the reverse diffusion starts generating. |
V. Popov; A. Amatov; M. Kudinov; V. Gogoryan; T. Sadekova; I. Vovk; |
1810 | Optimal Transport with A Diversified Memory Bank for Cross-Domain Speaker Verification Highlight: However, in scenarios involving a massive number of categories (speakers) or samples that are hard to discriminate, optimal transport (OT) often has difficulty computing effective transports. To address this challenge, we propose an OT-based unsupervised domain adaptation (UDA) framework for speaker verification (SV), OT with a diversified memory bank (DMB-OT), which ensures the accuracy of transfers through two strategies: (1) it regularizes the solution space of OT, which attempts to plan transformations between audio samples from the same speaker with high confidence; (2) it integrates a dynamic curriculum learning algorithm, preventing OT from calculating transport couplings based on hard-to-discriminate samples in the early stage of UDA. |
R. Zhang; J. Wei; X. Lu; W. Lu; D. Jin; L. Zhang; J. Xu; |
1811 | Optimising Different Feature Types for Inpainting-Based Image Representations Highlight: We propose a fully automatic algorithm that aims at finding the optimal features from a given collection as well as their locations and their function values within a specified total feature density. |
F. Jost; V. Chizhov; J. Weickert; |
1812 | Optimization for Robustness Evaluation Beyond ℓp Metrics Highlight: In this paper, we introduce a novel algorithmic framework, PyGRANSO With Constraint-Folding (PWCF), that blends the general-purpose constrained-optimization solver PyGRANSO with constraint folding to add reliability and generality to robustness evaluation. |
H. Liang; B. Liang; Y. Cui; T. Mitchell; J. Sun; |
1813 | Optimization of Sensor Configurations for Fault Identification in Smart Buildings Highlight: Fault identification relies directly on the sensor configurations (e.g., sampling rate, coding, quantization) and on the fault detection algorithm. To address this, we introduce a co-design framework and an algorithm for jointly optimizing the sensor configurations and the accuracy of the fault detection classifier. |
N. Ahmad; M. Egan; J. -M. Gorce; J. S. Dibangoye; F. Le Mouël; |
1814 | Optimization of The Deep Neural Networks for Seizure Detection Highlight: The goal of the present study was to optimize model selection and data preparation procedures for seizure detection in patients with epilepsy on wearable EEG data for the ICASSP Signal Processing Grand Challenge. |
A. Shovkun; A. Kiryasov; I. Zakharov; M. Khayretdinova; |
1815 | Optimized Dithering for Quantization Index Modulation Highlight: From the perspective of lattices, this work shows that using fixed optimized dithering is beneficial for achieving a smaller amount of distortion to the cover object. |
S. Lyu; |
1816 | Optimized Quality Feature Learning for Video Quality Assessment Highlight: This paper proposes optimized quality feature learning via a multi-channel convolutional neural network (CNN) with a gated recurrent unit (GRU) for no-reference (NR) video quality assessment (VQA). |
N. -W. Kwong; Y. -L. Chan; S. -H. Tsang; D. P. -K. Lun; |
1817 | Optimizing Distributed Multi-Sensor Multi-Target Tracking Algorithm Based On Labeled Multi-Bernoulli Filter Highlight: In this paper, we propose an improved distributed fusion algorithm under the Labeled multi-Bernoulli (LMB) filter framework. |
H. Liu; J. Yang; Y. Xu; L. Yang; |
1818 | Optimizing Quantum Federated Learning Based on Federated Quantum Natural Gradient Descent Highlight: In this work, we propose an efficient optimization algorithm, namely federated quantum natural gradient descent (FQNGD), and further apply it to a quantum federated learning (QFL) framework composed of variational quantum circuit (VQC)-based quantum neural networks (QNNs). |
J. Qi; X. -L. Zhang; J. Tejedor; |
1819 | Optimizing Vision Transformers for Medical Image Segmentation Highlight: Transformers do not generalize well on small medical imaging datasets and, due to their limited inductive biases, rely on large-scale pre-training. To address these problems, we design a compact and accurate Transformer network for MISS, CS-Unet, which introduces convolutions in a multi-stage design to hierarchically enhance the spatial and local modeling ability of Transformers. |
Q. Liu; C. Kaul; J. Wang; C. Anagnostopoulos; R. Murray-Smith; F. Deligianni; |
1820 | OPT: One-shot Pose-Controllable Talking Head Generation Highlight: To solve the identity mismatch problem and achieve high-quality free pose control, we present One-shot Pose-controllable Talking head generation network (OPT). |
J. Liu; X. Wang; X. Fu; Y. Chai; C. Yu; J. Dai; J. Han; |
1821 | Order Reduction of Multi-Channel FIR Filters By Balanced Truncation Highlight: In this contribution, we present a balanced truncation (BT) algorithm that is specifically tailored to approximating MIMO FIR filters and avoids most of the computations usually needed. |
F. Hilgemann; P. Jax; |
1822 | OTW: Optimal Transport Warping for Time Series Highlight: The high computational cost of dynamic time warping (DTW) hinders its use in deep learning architectures, where layers involving DTW computations cause severe bottlenecks. To alleviate these issues, we introduce a new metric for time series data based on the Optimal Transport (OT) framework, called Optimal Transport Warping (OTW). |
F. Latorre; C. Liu; D. Sahoo; S. C. H. Hoi; |
1823 | Outlier-Insensitive Kalman Filtering Using NUV Priors Highlight: In this work, an outlier-insensitive KF (OIKF) is proposed, where robustness is achieved by modeling a potential outlier as a normally distributed random variable with unknown variance (NUV). |
S. Truzman; G. Revach; N. Shlezinger; I. Klein; |
1824 | Output-Dependent Gaussian Process State-Space Model Highlight: To jointly learn the output-dependent GPSSM and infer the latent states, we propose a variational sparse GP-based learning method that only gently increases the computational complexity. |
Z. Lin; L. Cheng; F. Yin; L. Xu; S. Cui; |
1825 | Outside Knowledge Visual Question Answering Version 2.0 Highlight: This paper describes the analysis of the original OK-VQA dataset and the corrections and removals made as a result, and presents a new dataset: OK-VQA Version 2.0. |
B. Z. Reichman; A. Sundar; C. Richardson; T. Zubatiy; P. Chowdhury; A. Shah; J. Truxal; M. Grimes; D. Shah; W. J. Chee; S. Punjwani; A. Jain; L. Heck; |
1826 | Overcoming Posterior Collapse in Variational Autoencoders Via EM-Type Training Highlight: Under this framework, we propose a novel EM-type training algorithm that yields a controllable optimization process and allows for further extensions, e.g., employing implicit distribution models. |
Y. Li; L. Cheng; F. Yin; M. M. Zhang; S. Theodoridis; |
1827 | Overcoming The Seesaw in Monocular 3D Object Detection Via Language Knowledge Transferring Highlight: In this paper, we propose Language Knowledge Transferring, termed MonoLT, to introduce language information into monocular 3D object detection. |
W. Xu; T. Fu; |
1828 | Overlay Cognitive Radio Using Symbol Level Precoding With Quantized CSI Highlight: We apply the additive quantization noise model to describe the statistics of the quantized PBS CSI and employ a stochastic constraint to formulate the optimization problem, which is then converted into a deterministic one. |
L. Liu; A. L. Swindlehurst; |
1829 | PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement Highlight: We propose a learning objective that formalizes differences in perceptual quality, by using domain knowledge of acoustic-phonetics. |
M. Yang; J. Konan; D. Bick; Y. Zeng; S. Han; A. Kumar; S. Watanabe; B. Raj; |
1830 | PAGE: A Position-Aware Graph-Based Model for Emotion Cause Entailment in Conversation Highlight: To address the issue, we devise a novel position-aware graph to encode the entire conversation, fully modeling causal relations among utterances. |
X. Gu; R. Lou; L. Sun; S. Li; |
1831 | Pair DETR: Toward Faster Convergent DETR Highlight: In this paper, we present a simple approach to address the main problem of DETR, its slow convergence, by using a representation learning technique. |
S. M. Iranmanesh; S. X. Chen; K. -C. Lien; |
1832 | Papez: Resource-Efficient Speech Separation with Auditory Working Memory Highlight: We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. |
H. Oh; J. Yi; Y. Lee; |
1833 | PARAFAC2-Based Coupled Matrix and Tensor Factorizations Highlight: In this paper, we propose an algorithmic framework for fitting PARAFAC2-based CMTF models with the possibility of imposing various constraints on all modes and linear couplings, using Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM). |
C. Schenker; X. Wang; E. Acar; |
1834 | Parallel 2D Seismic Ray Tracing Using CUDA on A Jetson Nano Highlight: We present a parallel implementation of a 2D seismic ray tracer on the graphics processing unit of NVIDIA's compact Jetson Nano. |
B. -S. Shin; L. Wientgens; D. Shutin; |
1835 | Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios Highlight: We also notice the high latency of autoregressive sentence-level explanation generation, which makes explanations lag behind predictions. Therefore, we propose a non-autoregressive interpretable model to facilitate parallel explanation generation and simultaneous prediction. |
Y. Liu; X. Chen; Q. Dai; |
1836 | Parameter Efficient Transfer Learning for Various Speech Processing Tasks Highlight: Thus, we propose a new adapter architecture to acquire feature representations more flexibly for various speech tasks. |
S. Otake; R. Kawakami; N. Inoue; |
1837 | Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters Highlight: In this paper, we conduct a comprehensive analysis of applying parameter-efficient transfer learning (PETL) methods to reduce the required learnable parameters for adapting to speaker verification tasks. |
J. Peng; T. Stafylakis; R. Gu; O. Plchot; L. Mošner; L. Burget; J. Černocký; |
1838 | Parasympathetic-Sympathetic Causal Interactions and Perceived Workload for Varying Difficulty Affective Computing Tasks Highlight: This paper proposes a Granger causality (GC)-based ad-hoc statistical framework to analyze the causality between parasympathetic (PNS) and sympathetic (SNS) nervous system responses in affective computing tasks. |
P. Lavanuru; S. Pratiher; K. P. Sahoo; M. Acharya; S. S; N. Ghosh; A. Patra; |
1839 | Partially Adaptive Multichannel Joint Reduction of Ego-Noise and Environmental Noise Highlight: In this paper, we propose a multichannel partially adaptive scheme to jointly model ego-noise and environmental noise utilizing the VAE-NMF framework, where we take advantage of spatially and spectrally structured characteristics of ego-noise by pre-training the ego-noise model, while retaining the ability to adapt to unknown environmental noise. |
H. Fang; N. Wittmer; J. Twiefel; S. Wermter; T. Gerkmann; |
1840 | Particle Flow Gaussian Sum Particle Filter Highlight: In this paper, we use a bank of PFGPF filters to construct a Particle flow Gaussian sum particle filter (PFGSPF), which approximates the prediction and posterior densities as Gaussian mixture models. |
K. Comandur; Y. Li; S. Nannuru; |
1841 | Passive Acoustic Tracking of Whales in 3-D Highlight: In this paper, we propose a data processing chain that can detect and track multiple whales in 3-D from passively recorded underwater acoustic signals. |
J. Jang; F. Meyer; E. R. Snyder; S. M. Wiggins; S. Baumann-Pickering; J. A. Hildebrand; |
1842 | Passive Detection of Rank-One Gaussian Signals for Known Channel Subspaces and Arbitrary Noise Highlight: This paper addresses the passive detection of a common signal in two multi-sensor arrays. For this problem, we derive a detector based on likelihood theory for the case of one-antenna transmitters, independent Gaussian noises with arbitrary spatial structure, Gaussian signals, and known channel subspaces. |
D. Ramírez; I. Santamaria; L. L. Scharf; |
1843 | PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification Highlight: To further improve ECAPA-TDNN, we propose a progressive channel fusion strategy that splits the spectrogram across the feature channel and gradually expands the receptive field through the network. |
Z. Zhao; Z. Li; W. Wang; P. Zhang; |
1844 | PCQA-Graphpoint: Efficient Deep-Based Graph Metric for Point Cloud Quality Assessment Highlight: Yet most work in this research area ignores the local geometric structures between points. In this paper, we overcome this limitation by introducing a novel and efficient objective metric for point cloud quality assessment that learns local intrinsic dependencies using a Graph Neural Network (GNN). |
M. Tliba; A. Chetouani; G. Valenzise; F. Dufaux; |
1845 | PCSalMix: Gradient Saliency-Based Mix Augmentation for Point Cloud Classification Highlight: However, in existing mix-based augmentation, the selection of mixed regions is entirely random, ignoring point cloud saliency. To address this deficiency, we propose PCSalMix: a novel Saliency-based Mix augmentation for Point Cloud classification. |
T. Hong; Z. Zhang; J. Ma; |
1846 | Peak-First CTC: Reducing The Peak Latency of CTC Models By Applying Peak-First Regularization Highlight: To reduce the peak latency, we propose a simple and novel method named peak-first regularization, which utilizes a frame-wise knowledge distillation function to force the probability distribution of the CTC model to shift left along the time axis instead of directly modifying the calculation process of CTC loss and gradients. |
Z. Tian; H. Xiang; M. Li; F. Lin; K. Ding; G. Wan; |
1847 | Perceive and Predict: Self-Supervised Speech Representation Based Loss Functions for Speech Enhancement Highlight: In this work, it is shown that the distance between the feature encodings of clean and noisy speech correlates strongly with psychoacoustically motivated measures of speech quality and intelligibility, as well as with human Mean Opinion Score (MOS) ratings. |
G. Close; W. Ravenscroft; T. Hain; S. Goetze; |
1848 | Perceptual Analysis of Speaker Embeddings for Voice Discrimination Between Machine And Human Listening Highlight: This study investigates the information captured by speaker embeddings with relevance to human speech perception. |
I. Thoidis; C. Gaultier; T. Goehring; |
1849 | Perceptual–Neural–Physical Sound Matching Highlight: Yet spectral loss is a poor predictor of pitch intervals, and its gradient may be computationally expensive, which slows convergence. To address this, we present Perceptual-Neural-Physical loss (PNP). |
H. Han; V. Lostanlen; M. Lagrange; |
1850 | Perceptual Quality Assessment for Digital Human Heads Highlight: However, the quality assessment of digital humans has fallen behind. To tackle this challenge, we propose the first large-scale quality assessment database for three-dimensional (3D) scanned digital human heads (DHHs). |
Z. Zhang; Y. Zhou; W. Sun; X. Min; Y. Wu; G. Zhai; |
1851 | Performance Above All? Energy Consumption Vs. Performance, A Study on Sound Event Detection with Heterogeneous Data Highlight: In this paper we perform an extensive study using the DCASE task 4 baseline system and monitor energy consumption and training time for different GPU types and batch sizes. |
R. Serizel; S. Cornell; N. Turpault; |
1852 | Performance Comparison of TTS Models for Brazilian Portuguese to Establish A Baseline Highlight: This paper compares the performance of three text-to-speech (TTS) models released from June 2021 to January 2022 in order to establish a baseline for Brazilian Portuguese. |
W. Lobato; F. Farias; W. Cruz; M. Amadeus; |
1853 | Performance of Social Machine Learning Under Limited Data Highlight: This paper studies the non-asymptotic classification performance of the social machine learning strategy. |
P. Hu; V. Bordignon; M. Kayaalp; A. H. Sayed; |
1854 | Performing Neural Architecture Search Without Gradients Highlight: In this paper we address the problem of training-free (or zero-shot) search. |
P. Rumiantsev; M. Coates; |
1855 | Period VITS: Variational Inference with Explicit Pitch Modeling for End-To-End Emotional Speech Synthesis Highlight: However, existing end-to-end TTS models often generate unstable pitch contours with audible artifacts when the dataset contains emotional attributes, i.e., a large diversity of pronunciation and prosody. To address this problem, we propose Period VITS, a novel end-to-end TTS model that incorporates an explicit periodicity generator. |
Y. Shirahata; R. Yamamoto; E. Song; R. Terashima; J. -M. Kim; K. Tachibana; |
1856 | Permutation Invariant Training for Paraphrase Identification Highlight: Although currently popular cross-encoder solutions with pre-trained language models as backbones have achieved remarkable performance, they lack permutation invariance, or symmetry, which is one of the most important inductive biases for this task. To alleviate this issue, we propose a permutation invariant training framework that introduces a symmetry regularization during training, forcing the model to produce the same predictions for input sentence pairs in both forward and backward directions. |
J. Bai; C. Yin; H. Hong; J. Zhang; C. Li; Y. Wang; W. Rong; |
1857 | Personalized Federated Learning on Long-Tailed Data Via Adversarial Feature Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The joint problem of data heterogeneity and long-tail distribution in the FL environment is more challenging and severely affects the performance of personalized models. In this paper, we propose a PFL method called Federated Learning with Adversarial Feature Augmentation (FedAFA) to address this joint problem in PFL. |
Y. Lu; P. Qian; G. Huang; H. Wang; |
1858 | Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose applying trainable structured pruning to voice cloning. |
S. -F. Huang; C. -P. Chen; Z. -S. Chen; Y. -P. Tsai; H. -Y. Lee; |
1859 | Personalized Speech Enhancement Combining Band-Split RNN and Speaker Attentive Module Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a speaker attentive module to calculate the attention scores between the speaker embedding and the intermediate features, which are used to rescale the features. |
X. Le; L. Chen; C. He; Y. Guo; C. Chen; X. Xia; J. Lu; |
1860 | Personalized Task Load Prediction in Speech Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a framework that isolates quality-dependent changes and controls most outside influencing factors like personal preference in a simulated conversational environment. |
R. P. Spang; K. El Hajal; S. Möller; M. Cernak; |
1861 | Personalizing Federated Learning with Over-The-Air Computations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a distributed training paradigm that employs analog over-the-air computation to alleviate the communication bottleneck. |
Z. Chen; Z. Li; H. H. Yang; T. Q. S. Quek; |
1862 | Person Identification with Wearable Sensing Using Missing Feature Encoding and Multi-Stage Modality Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a missingness-aware fusion network (MAFN) to identify a person’s digital phenotype from continuously measured longitudinal multi-modal wearable data. |
P. Mohapatra; A. Pandey; S. Keten; W. Chen; Q. Zhu; |
1863 | Perspective Projection-Based 3D CT Reconstruction from Biplanar X-Rays Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PerX2CT, a novel CT reconstruction framework from X-ray that reflects the perspective projection scheme. |
D. Kyung; K. Jo; J. Choo; J. Lee; E. Choi; |
1864 | PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further unleash the potential of binocular images, in this letter, we propose a novel Transformer-based parallax fusion module called Parallax Fusion Transformer (PFT). |
H. Guo; J. Li; G. Gao; Z. Li; T. Zeng; |
1865 | PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present PhaseAug, the first differentiable augmentation for speech synthesis that rotates the phase of each frequency bin to simulate one-to-many mapping. |
J. Lee; S. Han; H. Cho; W. Jung; |
1866 | Phase-Aware Spoof Speech Detection Based On Res2Net with Phase Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we discovered that the randomness difference between magnitude and phase features is large, which can interrupt the feature-level fusion via backend neural network. |
J. Kim; S. M. Ban; |
1867 | Phase Retrieval for Rydberg Quantum Arrays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In phased array or synthetic aperture applications, if a Rydberg atom probe is used in place of an antenna, then measurements of only electric field intensity are possible at each spatial sample. In this paper, we cast the extraction of useful information from these Rydberg probe measurements as a novel phase retrieval problem. |
P. Vouras; K. Vijay Mishra; A. Artusio-Glimpse; |
1868 | Phase Unwrapping in Correlated Noise for FMCW Lidar Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an algorithm that performs frequency estimation via phase unwrapping by explicitly accounting for correlations in the phase noise. |
A. Ulvog; J. Rapp; T. Koike-Akino; H. Mansour; P. Boufounos; K. Parsons; |
1869 | Phonation Mode Detection in Singing: A Singer Adapted Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define the phonation mode detection (PMD) problem, which entails the prediction of phonation mode labels as well as their onset and offset timestamps. |
Y. Wang; W. Wei; Y. Wang; |
1870 | PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an acoustic feature processing strategy, named PHONEix, with a phoneme distribution predictor, to alleviate the gap between the music score and the singing voice, which can be easily adopted in different SVS systems. |
Y. Wu; J. Shi; T. Qian; D. Gao; Q. Jin; |
1871 | Phoneme-Level BERT for Enhanced Prosody of Text-To-Speech with Grapheme Predictions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. |
Y. A. Li; C. Han; X. Jiang; N. Mesgarani; |
1872 | Phonetic Anchor-Based Transfer Learning to Facilitate Unsupervised Cross-Lingual Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study focuses on domain adaptation in cross-lingual scenarios using phonetic constraints. |
S. G. Upadhyay; L. Martinez-Lucas; B. -H. Su; W. -C. Lin; W. -S. Chien; Y. -T. Wu; W. Katz; C. Busso; C. -C. Lee; |
1873 | Phonetic RNN-Transducer for Mispronunciation Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify two important knowledge gaps in MDD that have not been well studied in existing MDD research. |
D. Y. Zhang; S. Saha; S. Campbell; |
1874 | Physics-Informed Transfer Learning for Voltage Stability Margin Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, one may find related easy auxiliary tasks, such as voltage stability verification, that can aid in training for the hard task. This paper develops a novel approach for such settings by leveraging transfer learning. |
M. K. Singh; K. D. Polyzos; P. A. Traganitis; S. V. Dhople; G. B. Giannakis; |
1875 | Picking The Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the attention head selection and manipulation strategy for feature injection from a network pruning perspective, and conduct a case study on dialogue summarization. |
Z. Liu; N. F. Chen; |
1876 | Piecewise Position Encoding in Convolutional Neural Network for Cough-Based COVID-19 Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even convolutional neural networks that are capable of learning position information may be affected by small transformations of input features. Therefore, we propose piecewise position encoding added to time-frequency features to provide supplementary position information explicitly. |
J. Shen; X. Zhang; P. Zhang; Y. Yan; S. Zhang; Z. Huang; Y. Tang; Y. Wang; F. Zhang; A. Sun; |
1877 | Pitch Mark Detection from Noisy Speech Waveform Using Wave-U-Net Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Pitch mark (PM) is a time point corresponding to the closing time of vocal fold in voiced speech. PMs are useful for real-life speech processing because of their noise immunity. … |
H. -J. Nam; H. -J. Park; |
1878 | PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel generative adversarial network, PI-Trans, which mainly consists of a novel Parallel-ConvMLP module and an Implicit Transformation module at multiple semantic levels. |
B. Ren; H. Tang; Y. Wang; X. Li; W. Wang; M. Sebe; |
1879 | Play It Back: Iterative Attention For Audio Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an end-to-end attention-based architecture that through selective repetition attends over the most discriminative sounds across the audio sequence. |
A. Stergiou; D. Damen; |
1880 | PMMSD: Development of The Matrix Sentence Intelligibility Dataset for Mandarin with Lombard Effect Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a Paired Mandarin Matrix Sentence Dataset (PMMSD), which will be available after publication. |
H. Pei; Y. Yang; X. Chen; Q. Liu; H. Chen; W. Tu; S. Lin; |
1881 | PMNet: Large-Scale Channel Prediction System for ICASSP 2023 First Pathloss Radio Map Prediction Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our pathloss prediction system submitted to the ICASSP 2023 First Pathloss Radio Map Prediction Challenge. |
J. -H. Lee; J. Lee; S. -H. Lee; A. F. Molisch; |
1882 | PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation Under Adversarial Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To train the self-supervised contrastive learning framework adversarially, we introduce our robust aware loss function. |
J. Huang; J. Yuan; C. Qiao; Y. An; C. Lu; C. Bai; |
1883 | Polarized Signal Singular Spectrum Analysis with Complex SSA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the analysis of bivariate signals using complex Singular Spectrum Analysis (SSA). |
S. Journé; N. L. Bihan; F. Chatelain; J. Flamant; |
1884 | POLICE: Provably Optimal Linear Constraint Enforcement For Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose the first provable affine constraint enforcement method for DNNs that only requires minimal changes to a given DNN's forward pass, that is computationally friendly, and that leaves the optimization of the DNN's parameters unconstrained, i.e., standard gradient-based methods can be employed. |
R. Balestriero; Y. LeCun; |
1885 | Pondering About Task Spatial Misalignment: Classification-Localization Equilibrated Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the anchor label misjudgment issue in irregular-shaped object detection, we define a new classification-aware IoU metric to assign anchors intelligently. |
Y. Zhang; W. Lu; X. Wang; P. Wang; Y. Wang; |
1886 | Pooling Strategies for Simplicial Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to introduce pooling strategies for simplicial convolutional neural networks. |
D. M. Cinque; C. Battiloro; P. Di Lorenzo; |
1887 | Pop2Piano: Pop Audio-Based Piano Cover Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Pop2Piano, a Transformer network that generates piano covers given waveforms of pop music. |
J. Choi; K. Lee; |
1888 | Position-Aware Graph-Based Learning of Whole Slide Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, classification of cancer from WSI is performed with positional embedding and graph attention. |
M. Aryal; N. Y. Soltani; |
1889 | Positive-Pair Redundancy Reduction Regularisation for Speech-Based Asthma Diagnosis Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider another avenue, that of modelling self-recorded voice samples made using regular smartphones, along with self-reported clinical diagnosis annotations; specifically of asthma. |
G. Rizos; R. A. Calvo; B. W. Schuller; |
1890 | Possibilistic Bernoulli Filter for Extended Target Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper formulates the analog of the BF-X in the framework of possibility theory, where uncertainty is represented using possibility functions, rather than probability distributions. |
Z. Chen; B. Ristic; D. Y. Kim; |
1891 | Post-Trained Language Model Adaptive to Extractive Summarization of Long Spoken Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a post-trained DeBERTa that not only adapts to spoken language but also handles long documents. |
H. Ok; S. -B. Park; |
1892 | Powerful and Extensible WFST Framework for RNN-Transducer Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. |
A. Laptev; V. Bataev; I. Gitman; B. Ginsburg; |
1893 | PQLM – Multilingual Decentralized Portable Quantum Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a highly portable quantum language model (PQLM) that can easily transmit information to downstream tasks on classical machines. |
S. S. Li; X. Zhang; S. Zhou; H. Shu; R. Liang; H. Liu; L. P. Garcia; |
1894 | Practice of The Conformer Enhanced Audio-Visual HuBERT on Mandarin and English Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared with the baseline AV-HuBERT system, the proposed conformer-enhanced AV-HuBERT brings CER reductions of 7% on MISP and 6% on CMLR. |
X. Ren; C. Li; S. Wang; B. Li; |
1895 | Precognition in Contextual Spoken Language Understanding Via Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to jointly model historical and future information using knowledge distillation methods to address the discrepancy between offline and online information in dialogue understanding. |
N. Su; B. Du; Y. Zhang; C. Liu; Y. Wang; H. Chen; X. Lu; |
1896 | Predicting Brain Age Using Transferable Covariance Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate the utility of VNNs in inferring brain age using cortical thickness data. |
S. Sihag; G. Mateos; C. McMillan; A. Ribeiro; |
1897 | Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we reformulate the generation of teacher label as a codec problem. |
L. Guo; X. Yang; Q. Wang; Y. Kong; Z. Yao; F. Cui; F. Kuang; W. Kang; L. Lin; M. Luo; P. Żelasko; D. Povey; |
1898 | “Prediction of Sleepiness Ratings from Voice By Man and Machine”: A Perceptual Experiment Replication Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the release of the SLEEP corpus during the Interspeech 2019 paralinguistic continuous sleepiness estimation challenge, a paper presented at Interspeech 2020 by Huckvale et al. examined the reasons for the poor performance of the models proposed for this task. Careful analyses of the corpus led to the conclusion that its bias makes it hazardous to use for training machine learning systems, but a perceptual experiment on a subset of this corpus seemed to indicate that human hearing is however able to estimate sleepiness on this corpus. In this study, we present the results of the Endymion replication study, in which the same samples were rated by thirty French-speaking naive listeners. |
V. P. Martin; A. Ferron; J. -L. Rouas; P. Philip; |
1899 | Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we apply the contrastive predictive coding (CPC) method to the previously proposed online Skipping Memory (SkiM) speech separation model, which is a low-latency model for online speech separation. |
C. Li; Y. Wu; Y. Qian; |
1900 | PreFallKD: Pre-Impact Fall Detection Via CNN-ViT Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel pre-impact fall detection via CNN-ViT knowledge distillation, namely PreFallKD, to strike a balance between detection performance and computational complexity. |
T. -H. Chi; K. -C. Liu; C. -Y. Hsieh; Y. Tsao; C. -T. Chan; |
1901 | Prefix-Level Detection and Autocorrection of Keyboard Input Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, this paper proposes a prefix autocorrection framework, in which error detection and correction occur at the prefix level for a more immediate response. |
J. R. Bellegarda; |
1902 | Prefix Tuning for Automated Audio Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple yet effective method of dealing with small-scale datasets by leveraging a pre-trained language model. |
M. Kim; K. Sung-Bin; T. -H. Oh; |
1903 | Preformer: Predictive Transformer with Multi-Scale Segment-Wise Correlations for Long-Term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a predictive Transformer-based model called Preformer. |
D. Du; B. Su; Z. Wei; |
1904 | Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end framework via multitask learning which sequentially cascades a source separation (SS) module, a bottleneck feature extraction module and a VC module. |
J. Yao; Y. Lei; Q. Wang; P. Guo; Z. Ning; L. Xie; H. Li; J. Liu; D. Xie; |
1905 | Pre-Trained Model Representations and Their Robustness Against Noise for Speech Emotion Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we used multi-modal fusion representations from pre-trained models to generate state-of-the-art speech emotion estimation, and we showed a 100% and 30% relative improvement in concordance correlation coefficient (CCC) on valence estimation compared to standard acoustic and lexical baselines. |
V. Mitra; V. Kowtha; H. -Y. S. Chien; E. Azemi; C. Avendano; |
1906 | Pretrained Transformers for Seizure Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce a transformer-based model pretrained using annotated EEG scalp data that can detect the presence of seizures in a behind-the-ear wearable device setup for the 2023 ICASSP Signal Processing Grand Challenge. |
S. Panchavati; S. V. Dussen; H. Semwal; A. Ali; J. Chen; H. Li; C. Arnold; W. Speier; |
1907 | Pretraining Conformer with ASR for Speaker Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to pretrain Conformer with automatic speech recognition (ASR) task for speaker verification. |
D. Cai; W. Wang; M. Li; R. Xia; C. Huang; |
1908 | Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. |
P. Alonso-Jiménez; X. Favory; H. Foroughmand; G. Bourdalas; X. Serra; T. Lidy; D. Bogdanov; |
1909 | PRIME: 3D Human Pose and Body Shape Recovery with Perspective Projection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, weak perspective projection inevitably introduces prediction biases. To address this issue, we propose a plug-and-play Perspective Residual Log-likelihood on Monocular 3D HPS Estimation (PRIME) module to significantly improve the accuracy of monocular 3D HPS estimation with trivial sacrifice on running time. |
B. Xu; S. Fang; Z. Li; S. Yang; D. Xie; S. Pu; |
1910 | Prior-Enhanced Temporal Action Localization Using Subject-Aware Spatial Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a prior-enhanced temporal action localization method (PETAL), which only takes in RGB input and incorporates action subjects as priors. |
Y. Liu; Y. Tang; N. Zhang; R. -S. Lin; H. Wang; |
1911 | Privacy-Enhanced Federated Learning Against Attribute Inference Attack for Speech Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Federated learning-based (FL) Speech Emotion Recognition (SER) frameworks aim to protect data privacy when characterizing emotions. |
H. Zhao; H. Chen; Y. Xiao; Z. Zhang; |
1912 | Privacy-Preserving Automatic Speaker Diarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, to the best of our knowledge, the development of privacy-preserving ASD systems has been overlooked thus far. In this work, we tackle this problem using a combination of two cryptographic techniques, Secure Multiparty Computation (SMC) and Secure Modular Hashing, and apply them to the two main steps of a cascaded ASD system: speaker embedding extraction and agglomerative hierarchical clustering. |
F. Teixeira; A. Abad; B. Raj; I. Trancoso; |
1913 | Privacy Preserving Face Recognition with Lensless Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a face recognition system that works without compromising user privacy. |
C. Henry; M. S. Asif; Z. Li; |
1914 | Privacy-Preserving Occupancy Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an audio-based framework for occupancy estimation, including a new public dataset, and evaluate occupancy in a ‘cocktail party’ scenario where the party is simulated by mixing audio to produce speech with overlapping talkers (1-10 people). |
J. Williams; V. Yazdanpanah; S. Stein; |
1915 | Priv-Aug-Shap-ECGResNet: Privacy Preserving Shapley-Value Attributed Augmented Resnet for Practical Single-Lead Electrocardiogram Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to build an effective automated single-lead Electrocardiogram (ECG) classification system to enable remote and timely screening of critical cardiovascular diseases like heart attack. |
A. Ukil; L. Marin; A. J. Jara; |
1916 | Probabilistic Back-ends for Online Speaker Recognition and Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an online clustering algorithm that lets us benefit from properties of the PLDA model, such as its ability to handle uncertainty and its better score calibration. |
A. Sholokhov; N. Kuzmin; K. A. Lee; E. S. Chng; |
1917 | ProContEXT: Exploring Progressive Context Transformer for Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as … |
J. -P. Lan; Z. -Q. Cheng; J. -Y. He; C. Li; B. Luo; X. Bao; W. Xiang; Y. Geng; X. Xie; |
1918 | PROCTER: Pronunciation-Aware Contextual Adapter For Personalized Speech Recognition In Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a PROnunCiation-aware conTextual adaptER (PROCTER) that dynamically injects lexicon knowledge into an RNN-T model by adding a phonemic embedding along with a textual embedding. |
R. Pandey; R. Ren; Q. Luo; J. Liu; A. Rastrow; A. Gandhe; D. Filimonov; G. Strimel; A. Stolcke; I. Bulyko; |
1919 | Product Graph Learning From Multi-Attribute Graph Signals with Inter-Layer Coupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Focusing on a product graph setting with homogeneous layers, we propose a bivariate polynomial graph filter model. |
C. Zhang; Y. He; H. -T. Wai; |
1920 | Progressive Diversifying Policy for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, one of the main bottleneck challenges for MARL is the sparsity of the team reward, which can lead to the homogenization of agents’ behaviors. To address these issues, we propose a Progressive Diversifying Policy (PDP) algorithm in this paper. |
S. Sun; Y. Zhai; K. Xu; D. Feng; B. Ding; |
1921 | Progressive Meta-Pooling Learning for Lightweight Image Classification Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Meta-Pooling framework to make the receptive field learnable for a lightweight network, which consists of parameterized pooling-based operations. |
P. Dong; X. Niu; Z. Tian; L. Li; X. Wang; Z. Wei; H. Pan; D. Li; |
1922 | Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we improve the efficiency of the progressive multi-stage neural audio codec (PR-Codec) by utilizing perceptually motivated training criteria. |
B. H. Kim; H. Lim; J. Lee; I. Jang; H. -G. Kang; |
1923 | Progressive Perception Learning for Distribution Modulation in Siamese Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 2) The background data may affect the total feature distribution of the search branch. To address these issues, we proposed a plug-and-play component named Progressive Perception Learning Module (P2LM) to modulate the distribution using three feature normalization blocks successively, i.e., Self-Aware Block (SAB), Target-Aware Block (TAB), and Region-Aware Block (RAB). |
K. Hu; X. Zhou; M. Cao; M. Wang; G. Gao; W. Yang; H. Tan; |
1924 | Progressive Refinement Learning Based on Feature Cross Perception for Residential Areas Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we proposed a semantic segmentation method for residential areas by progressive refinement learning. |
X. Lyu; L. Zhang; |
1925 | Projected Hierarchical ALS for Generalized Boolean Matrix Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a versatile approach for Boolean factorization of binary data matrices based on a projected hierarchical alternating least squares method. |
R. C. Farias; S. Miron; |
1926 | Promoting Cooperation in Multi-Agent Reinforcement Learning Via Mutual Help Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel algorithm Mutual-Help-based MARL (MH-MARL) to instruct agents to help each other in order to promote cooperation. |
Y. Qiu; Y. Jin; L. Yu; J. Wang; X. Zhang; |
1927 | Prompt-Distiller: Few-Shot Knowledge Distillation for Prompt-Based Language Learners with Dual Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, distilling prompt-tuned PLMs in the few-shot learning setting is a non-trivial problem due to the lack of task-specific training data and KD techniques for the new prompting paradigm. We propose Prompt-Distiller, the first few-shot KD algorithm for prompt-tuned PLMs, which forces the student model to learn from both its pre-trained and prompt-tuned teacher models to alleviate the model overfitting problem. |
B. Hou; C. Wang; X. Chen; M. Qiu; L. Feng; J. Huang; |
1928 | Prompt Makes Mask Language Models Better Adversarial Attackers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ProAttacker which uses Prompt to make the mask language models better adversarial Attackers. |
H. Zhu; C. Li; H. Yang; Y. Wang; W. Huang; |
1929 | PromptTTS: Controllable Text-To-Speech With Text Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond text and image generation, in this work, we explore the possibility of utilizing text descriptions to guide speech synthesis. |
Z. Guo; Y. Leng; Y. Wu; S. Zhao; X. Tan; |
1930 | Prosody-Aware SpeechT5 for Expressive Neural TTS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we enhance SpeechT5 by adding a new sub-task on prosody modeling (prosody-aware SpeechT5) for neural text-to-speech (TTS), which can improve the model capability to learn richer contextual representations through multi-task learning. |
Y. Deng; L. Zhou; Y. Yi; S. Liu; L. He; |
1931 | Prosody-Controllable Spontaneous TTS with Neural HMMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a TTS architecture that can rapidly learn to speak from small and irregular datasets, while also reproducing the diversity of expressive phenomena present in spontaneous speech. |
H. Lameris; S. Mehta; G. E. Henter; J. Gustafson; É. Székely; |
1932 | Prosody Is Not Identity: A Speaker Anonymization Approach Using Prosody Cloning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a system that extends a speech-to-text-to-speech anonymization pipeline with prosody cloning and show how to control the cloning by multiplying pitch and energy sequences with random offset values. |
S. Meyer; F. Lux; J. Koch; P. Denisov; P. Tilli; N. T. Vu; |
1933 | Prototype-Based Layered Federated Cross-Modal Hashing. Highlight: In this paper, we propose a novel method called prototype-based layered federated cross-modal hashing. |
J. Liu; Y. -W. Zhan; X. Luo; Z. -D. Chen; Y. Wang; X. -S. Xu; |
1934 | Prototype Knowledge Distillation for Medical Segmentation with Missing Modality. Highlight: In this paper, we propose a prototype knowledge distillation (ProtoKD) method to tackle the challenging problem, especially for the toughest scenario when only single modal data can be accessed. |
S. Wang; Z. Yan; D. Zhang; H. Wei; Z. Li; R. Li; |
1935 | Provable Computational and Statistical Guarantees for Efficient Learning of Continuous-Action Graphical Games. Highlight: In this paper, we study the problem of learning the set of pure strategy Nash equilibria and the exact structure of a continuous-action graphical game with parametric payoffs by observing a small set of perturbed equilibria. |
A. Barik; J. Honorio; |
1936 | Provably Convergent Plug & Play Linearized ADMM, Applied to Deblurring Spatially Varying Kernels. Highlight: In this paper, we propose a Plug & Play framework based on linearized ADMM that allows us to bypass the computation of intractable proximal operators. |
C. Laroche; A. Almansa; E. Coupeté; M. Tassano; |
1937 | PRRD: Pixel-Region Relation Distillation For Efficient Semantic Segmentation. Highlight: This paper proposes a novel pixel-region relation distillation (PRRD) method to transfer the multi-scale pixel-region relation (PRR) from the teacher to the student. |
C. Wang; J. Zhong; Q. Dai; Y. Qi; R. Li; Q. Lei; B. Fang; X. Li; |
1938 | Prune Then Distill: Dataset Distillation with Importance Sampling. Highlight: In this work, we investigate whether all samples in large datasets contribute equally to better model accuracy. |
A. S. Sundar; G. Keskin; C. Chandak; I. -F. Chen; P. Ghahremani; S. Ghosh; |
1939 | Pseudo-Inverted Bottleneck Convolution for DARTS Search Space. Highlight: Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. |
A. Ahmadian; L. S. P. Liu; Y. Fei; K. N. Plataniotis; M. S. Hosseini; |
1940 | Pseudo Multi-Source Domain Extension and Selective Pseudo-Labeling for Unsupervised Domain Adaptive Medical Image Segmentation. Highlight: In this work, we propose a novel unsupervised domain adaptation framework named Domain Expansion and Pseudo-Labeling (DEPL). |
X. Liu; Z. Wang; K. Hu; X. Gao; |
1941 | Pseudo-Query Generation For Semi-Supervised Visual Grounding With Knowledge Distillation. Highlight: However, collecting queries in natural language is labor-intensive, which limits the application scenarios of these methods. To overcome this weakness, we propose a novel semi-supervised visual grounding framework. |
J. Jin; J. Ye; X. Lin; L. He; |
1942 | PU-EdgeFormer: Edge Transformer for Dense Prediction in Point Cloud Upsampling. Highlight: Despite the recent development of deep learning-based point cloud upsampling, most MLP-based point cloud upsampling methods are limited in that it is difficult for them to learn the local and global structure of the point cloud at the same time. To solve this problem, we present a combined graph convolution and transformer for point cloud upsampling, denoted PU-EdgeFormer. |
D. Kim; M. Shin; J. Paik; |
1943 | PUFFIN: Pitch-Synchronous Neural Waveform Generation for Fullband Speech on Modest Devices. Highlight: We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. |
O. Watts; L. Wihlborg; C. Valentini-Botinhao; |
1944 | Pushing The Limits of Self-Supervised Speaker Verification Using Regularized Distillation Framework. Highlight: In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to embeddings in DINO. |
Y. Chen; S. Zheng; H. Wang; L. Cheng; Q. Chen; |
1945 | Pyramid Dynamic Inference: Encouraging Faster Inference Via Early Exit Boosting. Highlight: In order to optimize the trade-off between model accuracy and latency, we propose Pyramid Dynamic Inference (PDI), a scheme that encourages fast inference via boosting the performance of early exit heads. |
E. Banijamali; P. Kharazmi; S. Eghbali; J. Wang; C. Chung; S. Choudhary; |
1946 | Pyramid Spatial Feature Transform and Shared-Offsets Deformable Alignment Based Convolutional Network for HDR Imaging. Highlight: To generate ghost-free high dynamic range (HDR) images by merging multiple differently exposed low dynamic range (LDR) images, the key is to handle ill-exposed areas in the input LDR images and misalignment among them. In this paper, a Pyramid Spatial Feature Transform and shared-offsets Deformable convolutional Network (PSFTDNet) is proposed to achieve this target. |
J. Liao; Q. Liu; T. Ikenaga; |
1947 | QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis. Highlight: In this paper, we propose QI-TTS which aims to better transfer and control intonation to further deliver the speaker’s questioning intention while transferring emotion from reference speech. |
H. Tang; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
1948 | QTrojan: A Circuit Backdoor Against Quantum Neural Networks. Highlight: We propose a circuit-level backdoor attack, QTrojan, against Quantum Neural Networks (QNNs) in this paper. |
C. Chu; L. Jiang; M. Swany; F. Chen; |
1949 | Quantifying Catastrophic Forgetting in Continual Federated Learning. Highlight: In this work, we formally quantify catastrophic forgetting in a continual federated learning (CFL) setup, establish links to training optimization and evaluate different episodic replay approaches for CFL on a large-scale real-world NLP dataset. |
C. Dupuy; J. Majmudar; J. Wang; T. G. Roosta; R. Gupta; C. Chung; J. Ding; S. Avestimehr; |
1950 | Quantile Online Learning for Semiconductor Failure Analysis. Highlight: This paper focuses on novel quantile online learning for semiconductor failure analysis. |
B. Zhou; P. Jieming; M. Sivan; A. V. -Y. Thean; J. Senthilnath; |
1951 | Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation. Highlight: In this paper, we look into several important but overlooked aspects of the enrollment embeddings, including the suitability of the widely used speaker identification embeddings, the introduction of the log-mel filterbank and self-supervised embeddings, and the embeddings’ cross-dataset generalization capability. |
X. Liu; X. Li; J. Serrà; |
1952 | Quantized Precoding and RIS-Assisted Modulation for Integrated Sensing and Communications Systems. Highlight: In this paper, we present a novel reconfigurable intelligent surface (RIS)-assisted integrated sensing and communication (ISAC) system with 1-bit quantization at the ISAC base station. |
R. S. Prasobh Sankar; S. Prabhakar Chepuri; |
1953 | QuantPipe: Applying Adaptive Post-Training Quantization For Distributed Transformer Pipelines In Dynamic Edge Environments. Highlight: Unlike in cloud scenarios with high-speed and stable network interconnects, dynamic bandwidth in edge systems can degrade distributed pipeline performance. We address this issue with QuantPipe, a communication-efficient distributed edge system that introduces post-training quantization (PTQ) to compress the communicated tensors. |
H. Wang; C. Imes; S. Kundu; P. A. Beerel; S. P. Crago; J. Paul Walters; |
1954 | Quantum Deep Recurrent Reinforcement Learning. Highlight: One of the challenges yet to be solved is how to train quantum RL in partially observable environments. In this paper, we approach this challenge by building QRL agents with quantum recurrent neural networks (QRNN). |
S. Y. -C. Chen; |
1955 | Quantum Graph Transformers. Highlight: We propose Quantum Graph Transformers (QGT), a novel approach for realizing the Transformer architecture for graph learning with quantum processors. |
G. Kollias; V. Kalantzis; T. Salonidis; S. Ubaru; |
1956 | Quantum Transfer Learning Using The Large-Scale Unsupervised Pre-Trained Model WavLM-Large for Synthetic Speech Detection. Highlight: This work proposes a classical-to-quantum transfer learning system based on the large-scale unsupervised pre-trained model to demonstrate the competitive performance of quantum transfer learning for synthetic speech detection. |
R. Wang; J. Du; T. Gao; |
1957 | Quantum Variational Bayes on Manifolds. Highlight: This work proposes using a quantum natural gradient within the manifold VB framework. |
A. Lopatnikova; M. -N. Tran; |
1958 | Quaternion Orthogonal Transformer for Facial Expression Recognition in The Wild. Highlight: However, ViT cannot fully utilize emotional features extracted from raw images and requires a lot of computing resources. To overcome these problems, we propose a quaternion orthogonal transformer (QOT) for FER. |
Y. Zhou; L. Guo; L. Jin; |
1959 | Query-Utterance Attention With Joint Modeling For Query-Focused Meeting Summarization. Highlight: In this paper, we propose a query-aware framework that jointly models tokens and utterances based on Query-Utterance Attention. |
X. Liu; B. Duan; B. Xiao; Y. Xu; |
1960 | Question Answering System with Sparse and Noisy Feedback. Highlight: Motivated by the practical need of a question answering system to process these two types of rewards, this paper investigates and proposes a new stochastic multi-armed bandit model in which each action has a noisy reward and a sparse reward. We study this problem in the contextual bandit setting, and propose and analyze efficient algorithms based on the LinUCB framework. |
D. Bouneffouf; O. Alkan; R. Feraud; B. Lin; |
1961 | Quickest Change Detection with Leave-one-out Density Estimation. Abstract: The problem of quickest change detection in a sequence of independent observations is considered. The pre-change distribution is assumed to be known, while the post-change … |
Y. Liang; V. V. Veeravalli; |
1962 | Radar Clutter Covariance Estimation: A Nonlinear Spectral Shrinkage Approach. Highlight: In this paper, we exploit the spiked covariance structure of the clutter plus noise covariance matrix for adaptive radar signal processing. |
S. Jain; V. Krishnamurthy; M. Rangaswamy; B. Kang; S. Gogineni; |
1963 | Radio-Astronomy Imaging and Interference Excision Using Tensor Decomposition and Canonical Correlation Analysis. Highlight: We propose a new multi-frequency covariance matrix model for radio astronomical imaging that exploits spectral variability of the astronomical sources. |
M. Sørensen; N. D. Sidiropoulos; |
1964 | Radio Map Based UAV Target Localization. Highlight: In this paper, we propose target localization by UAV with radio maps as prior knowledge, which are generated from real topographic maps. |
C. He; W. Gong; Y. Dong; X. Xie; Z. J. Wang; |
1965 | Radio Sensing with Large Intelligent Surface for 6G. Highlight: As an exemplary use case, we evaluate this method for passive multi-human detection in an indoor setting. |
C. J. Vaca-Rubio; P. Ramirez-Espinosa; K. Kansanen; Z. -H. Tan; E. d. Carvalho; |
1966 | Rain2Avoid: Self-Supervised Single Image Deraining. Highlight: Thus, we present Rain2Avoid (R2A), a training scheme that requires only rainy images for image deraining. |
Y. -T. Peng; W. -H. Li; |
1967 | Raising The Limit of Image Rescaling Using Auxiliary Encoding. Highlight: Hence, in place of random sampling of z, we propose auxiliary encoding modules to further push the limit of image rescaling performance. |
C. Yin; Z. Pan; X. Zhou; L. Kang; P. Bogdan; |
1968 | RandMasking Augment: A Simple and Randomized Data Augmentation For Acoustic Scene Classification. Highlight: In this work, we describe RandMasking Augment as an effective data augmentation method for acoustic scene classification research. |
J. Han; M. Matuszewski; O. Sikorski; H. Sung; H. Cho; |
1969 | Random Projector: Efficient Deep Image Prior. Highlight: However, its optimization is extremely sluggish, which inevitably hinders its practical usage when there are hard time constraints. To mitigate this issue, we propose a more compact and efficient model, dubbed random projector (RP), and freeze the convolutional layers of the neural network to prevent slow learning. |
T. Li; Z. Zhuang; H. Wang; J. Sun; |
1970 | Range-ISL Minimization and Spectral Shaping in MIMO Radar Systems Via Waveform Design. Highlight: In this paper, we look at a waveform design problem for colocated Multiple-Input Multiple-Output (MIMO) radar systems. |
E. Raei; M. Alaee-Kerahroodi; B. Shankar; B. Ottersten; |
1971 | Rapid Audiometric Evaluation for Personalized Headphone Listening. Highlight: In experiment 1, the developed modified Békésy audiometric approach was performed in the laboratory to show that the approach could rapidly and reliably obtain hearing threshold information from 100 to 16,000 Hz. |
M. J. Goupell; M. Davoodian; S. Weinstein; D. Gadzinski; D. N. Zotkin; K. Sethunath; R. Duraiswami; |
1972 | Rate-Distortion Optimization with Alternative References for UGC Video Compression. Highlight: In this paper, we study the saturation problem in UGC compression, where the goal is to identify and avoid, during encoding, the coding parameters and rates that lead to quality saturation. |
X. Xiong; E. Pavez; A. Ortega; B. Adsumilli; |
1973 | Rate-Distortion Optimized Variable-Node-Size Trisoup for Point Cloud Coding. Highlight: To maximize the coding performance of the variable-node-size method, we propose a new cost function considering both bit rates and distortions. |
K. Unno; K. Matsuzaki; S. Komorita; K. Kawamura; |
1974 | Rate Region Characterization for Semantics and Bits Based Multiuser Communications. Abstract: The coexistence of semantic communication (SemCom) and bit-based communication (BitCom) towards next-generation wireless networks is investigated. First, a semantic and bit uplink … |
X. Mu; Y. Liu; |
1975 | Rate Splitting and Precoding Strategies for Multi-User MIMO Broadcast Channels with Common and Private Streams. Highlight: In this paper, we present a precoder design for multi-user multiple-input multiple-output (MU-MIMO) broadcast systems with rate splitting at the transmitter. |
L. Khamidullina; A. L. F. de Almeida; M. Haardt; |
1976 | RAT: Radial Attention Transformer for Singing Technique Recognition. Highlight: Recognizing types of singing techniques can be quite challenging because 1) the time-frequency features in singing are highly dynamic and may span a long range of the audio signal; 2) different singing techniques such as vibrato and trill tend to have similar local features; 3) the distribution of singing technique datasets is long-tailed. To manage these problems, we propose a novel Radial Attention Transformer (RAT) with a Radial Attention (RA) Module that can capture the fine-grained local features as well as the long-range inter-dependency of audio features. |
G. -Y. Chen; Y. -F. Yeh; V. -W. Soo; |
1977 | Raw Ultrasound-Based Phonetic Segments Classification Via Mask Modeling. Highlight: Despite achieving satisfactory performance, deep models rely on a large amount of manually labeled data, which is often difficult to obtain in practical settings. To address this issue, this paper focuses on how to utilize a large amount of unlabeled ultrasound tongue imaging (UTI) data to improve the performance of the UTI classification task. |
K. You; B. Liu; K. Xu; Y. Xiong; Q. Xu; M. Feng; T. G. Csapó; B. Zhu; |
1978 | RCDPT: Radar-Camera Fusion Dense Prediction Transformer. Highlight: However, the performance of using readout tokens in a vision transformer is limited. Therefore, we propose a novel fusion strategy to integrate radar data into a dense prediction transformer network by reassembling camera representations with radar representations. |
C. -C. Lo; P. Vandewalle; |
1979 | RD-NAS: Enhancing One-Shot Supernet Ranking Ability Via Ranking Distillation From Zero-Cost Proxies. Highlight: In this paper, we propose Ranking Distillation one-shot NAS (RD-NAS) to enhance ranking consistency, which utilizes zero-cost proxies as the cheap teacher and adopts the margin ranking loss to distill the ranking knowledge. |
P. Dong; X. Niu; L. Li; Z. Tian; X. Wang; Z. Wei; H. Pan; D. Li; |
1980 | RDO Candidate Selection for Maximizing Coding Efficiency in A Practical HEVC Encoder. Highlight: This paper explores the effectiveness of the following universally applicable RDO techniques: 1) rough mode decision for intra RDO candidate selection; 2) number of intra and inter RDO search candidates; and 3) accurate bit cost estimation in entropy coding. |
J. Sainio; A. Mercat; J. Vanne; |
1981 | Real-Time Audio-Visual End-To-End Speech Enhancement. Highlight: In this paper, we propose a low-latency real-time audio-visual end-to-end enhancement (AV-E3Net) model based on the recently proposed end-to-end enhancement network (E3Net). |
Z. Zhu; H. Yang; M. Tang; Z. Yang; S. E. Eskimez; H. Wang; |
1982 | Real-Time Human Reconstruction Based on Human Pose Prior and Epipolar Refinement. Highlight: In this paper, we propose a real-time human performance capture framework to generate high-quality surface geometry from multi-view RGB inputs. |
K. Luo; Z. Li; |
1983 | Real-Time Modelling of Observation Filter in The Remote Microphone Technique for An Active Noise Control Application. Highlight: This paper proposes a method for combining the remote microphone technique (RMT) with a new source-localization technique to estimate the source ratio parameter. |
C. K. Lai; B. Lam; D. Shi; W. -S. Gan; |
1984 | Real-Time MRI Video Synthesis from Time Aligned Phonemes with Sequence-to-Sequence Networks. Highlight: In this work, we focus on estimating utterance-level rtMRI video from the spoken phoneme sequence. |
S. Udupa; P. K. Ghosh; |
1985 | Real-Time Multichannel Speech Separation and Enhancement Using A Beamspace-Domain-Based Lightweight CNN. Highlight: In this paper, we propose a real-time multichannel speech separation and enhancement technique, which is based on the combination of a directional representation of the sound field, denoted as beamspace, with a lightweight Convolutional Neural Network (CNN). |
M. Olivieri; L. Comanducci; M. Pezzoli; D. Balsarri; L. Menescardi; M. Buccoli; S. Pecorino; A. Grosso; F. Antonacci; A. Sarti; |
1986 | Real-Time Speech Enhancement with Dynamic Attention Span. Highlight: We propose to adaptively change the receptive field according to the input signal in a deep neural network based speech enhancement (SE) model. |
C. Zheng; Y. Zhou; X. Peng; Y. Zhang; Y. Lu; |
1987 | Real-Time Speech Interruption Analysis: from Cloud to Client Deployment. Highlight: We have recently developed the first speech interruption analysis model WavLM_SI, which detects failed speech interruptions, shows very promising performance, and is being deployed in the cloud. To deliver this feature in a more cost-efficient and environment-friendly way, we reduced the model complexity and size to ship the WavLM_SI model in client devices. |
Q. Fu; S. -W. Fu; Y. Fan; Y. Wu; Z. Chen; J. Gupchup; R. Cutler; |
1988 | Real-Time Target Sound Extraction. Highlight: We present the first neural network model to achieve real-time and streaming target sound extraction. |
B. Veluri; J. Chan; M. Itani; T. Chen; T. Yoshioka; S. Gollakota; |
1989 | Real-Time Wireless ECG-Derived Respiration Rate Estimation Using An Autoencoder with A DCT Layer. Highlight: In this paper, we present a wireless ECG-derived Respiration Rate (RR) estimation using an autoencoder with a DCT Layer. |
H. Pan; X. Zhu; Z. Ye; P. -Y. Chen; A. E. Cetin; |
1990 | Received Power Maximization with Practical Phase-Dependent Amplitude Response in RIS-Aided OFDM Wireless Communications. Highlight: In this paper, based on a practical phase-shift model, we formulate an optimization problem to maximize the power received by the user subject to a frequency-selective fading channel under Orthogonal Frequency Division Multiplexing (OFDM) transmission. |
D. Kompostiotis; D. Vordonis; V. Paliouras; |
1991 | Receptive Field Reliant Zero-Cost Proxies for Neural Architecture Search. Highlight: In this work, we propose a set of receptive field reliant zero-cost proxies which need only one iteration of training and thereby reduce the computational time associated with the evaluation criterion during NAS. |
P. Keserwani; S. S. Miriyala; V. N. Rajendiran; P. N. Shivamurthappa; |
1992 | Recouple Event Field Via Probabilistic Bias for Event Extraction. Highlight: To this end, we propose a Probabilistic reCoupling model enhanced Event extraction framework (ProCE). |
X. Bai; T. Wu; H. Guo; Z. Zhao; X. Yang; J. Li; W. Liu; Q. Ju; W. Guo; Y. Yang; |
1993 | Recurrent Fine-Grained Self-Attention Network for Video Crowd Counting. Highlight: In this paper, we propose a Recurrent Fine-Grained Self-Attention Network (RFSNet) to achieve efficient and accurate counting in video scenes via the self-attention mechanism and a recurrent fine-tuning strategy. |
J. Zhang; Z. Wu; X. Zhang; G. Song; Y. Wang; J. Chen; |
1994 | Recursive Estimation of User Intent From Noninvasive Electroencephalography Using Discriminative Models. Highlight: We study the problem of inferring user intent from noninvasive electroencephalography (EEG) to restore communication for people with severe speech and physical impairments (SSPI). |
N. Smedemark-Margulies; B. Celik; T. Imbiriba; A. Kocanaogullari; D. Erdoğmuş; |
1995 | Recursive/Iterative Unique Projection-Aggregation Decoding of Reed-Muller Codes. Highlight: We describe recursive unique projection-aggregation (RUPA) decoding and iterative unique projection-aggregation (IUPA) decoding of Reed-Muller (RM) codes, which remove non-unique projections from the recursive projection-aggregation (RPA) and iterative projection-aggregation (IPA) algorithms respectively. |
M. Hashemipour-Nazari; R. Debets; K. Goossens; A. Balatsoukas-Stimming; |
1996 | Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition. Highlight: In this paper, a recursive joint attention model is proposed along with long short-term memory (LSTM) modules for the fusion of vocal and facial expressions in regression-based emotion recognition (ER). |
R. G. Praveen; E. Granger; P. Cardinal; |
1997 | Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization. Highlight: We address the problem of language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. |
H. Liu; H. Xu; L. P. Garcia; A. W. H. Khong; Y. He; S. Khudanpur; |
1998 | Reducing The Communication and Computational Cost of Random Fourier Features Kernel LMS in Diffusion Networks. Highlight: In this paper, we propose a censoring algorithm for adaptive kernel diffusion networks based on random Fourier features that locally adapts the number of nodes censored according to the estimation error. |
D. G. Tiglea; R. Candido; L. A. Azpicueta-Ruiz; M. T. M. Silva; |
1999 | Reducing The Computational Complexity of Learning with Random Convolutional Features. Highlight: Since these methods are data-independent, many of the extracted features are redundant. To address this problem, we propose a simple and efficient feature selection method based on knee/elbow detection in the curve of ordered coefficients in linear regression. |
M. A. Omidi; B. Seyfe; S. Valaee; |
2000 | Reducing The Gap Between Streaming and Non-Streaming Transducer-Based ASR By Adaptive Two-Stage Knowledge Distillation. Highlight: In this paper, we propose an adaptive two-stage knowledge distillation method consisting of hidden layer learning and output layer learning. |
H. Tang; Y. Fu; L. Sun; J. Xue; D. Liu; Y. Li; Z. Ma; M. Wu; J. Pan; G. Wan; M. Zhao; |
2001 | Refined Pseudo Labeling for Source-Free Domain Adaptive Object Detection. Highlight: In this work, we propose a refined pseudo labeling framework for source-free DAOD. |
S. Zhang; L. Zhang; Z. Liu; |
2002 | Region-Awared Transformer with Asymmetric Loss in Multi-Label Classification. Highlight: By leveraging the attention mechanism in transformer, we propose a region-awared transformer to focus on top related regions and neglect background interference. |
L. Zhang; J. Liu; Y. Bao; J. Wang; |
2003 | Regression to Classification: Waveform Encoding for Neural Field-Based Audio Signal Representation. Highlight: To reduce noise and improve representation quality, we propose using waveform encoding in the neural field. |
T. Kim; D. Rho; G. Lee; J. Park; J. H. Ko; |
2004 | Regularized Deep Generative Model Learning for Real-Time Massive MIMO Channel Tracking. Highlight: In this paper, we propose a real-time compressive channel tracking (CT) algorithm based on regularized deep generative model (DGM) to recover time-varying channels recursively. |
L. Lian; B. Wang; |
2005 | Regularized EM Algorithm. Highlight: On the other hand, in many signal processing problems, a priori information can be available indicating certain structures for different cluster covariance matrices. In this paper, we present a regularized EM algorithm for GMMs that can make efficient use of such prior knowledge as well as cope with LSS situations. |
P. Houdouin; E. Ollila; F. Pascal; |
2006 | Regularized Neural Detection for Millimeter Wave Massive MIMO Communication Systems with One-Bit ADCs. Highlight: Addressing these, we introduce a new framework to ensure equitable per-user performance, in spite of joint multi-user detection. |
A. Sant; B. D. Rao; |
2007 | Relapse Detection in Patients with Psychotic Disorders Using Unsupervised Learning on Smartwatch Signals. Highlight: In this paper, we present our solution for the ICASSP Signal Processing Grand Challenge e-Prevention track 2 Relapse Detection. |
S. Hamieh; V. Heiries; H. Al Osman; C. Godin; |
2008 | Relapse Prediction from Long-Term Wearable Data Using Self-Supervised Learning and Survival Analysis Highlight: In this study, we use long-term data acquired from commercial smartwatches, including kinetic and physiological signals, to extract information-thick descriptors that are used for the prediction of subsequent relapses in patients in the psychotic spectrum. |
E. Fekas; A. Zlatintsi; P. P. Filntisis; C. Garoufis; N. Efthymiou; P. Maragos; |
2009 | Relate Auditory Speech To EEG By Shallow-Deep Attention-Based Network Highlight: In this paper, we propose a novel Shallow-Deep Attention-based Network (SDANet) to classify the correct auditory stimulus evoking the EEG signal. |
F. Cui; L. Guo; L. He; J. Liu; E. Pei; Y. Wang; D. Jiang; |
2010 | Relating EEG Recordings to Speech Using Envelope Tracking and The Speech-FFR Highlight: We report on an approach developed for the ICASSP 2023 ‘Auditory EEG Decoding’ Signal Processing Grand Challenge. |
M. Thornton; D. Mandic; T. Reichenbach; |
2011 | Relational Representation Learning for Zero-Shot Relation Extraction with Instance Prompting and Prototype Rectification Highlight: In this paper, we propose a novel method based on Instance Prompting and Prototype Rectification (IPPR) to conduct relational representation learning for zero-shot relation extraction. |
B. Duan; X. Liu; S. Wang; Y. Xu; B. Xiao; |
2012 | Relative Dynamic Time Warping Comparison for Pronunciation Errors Highlight: We propose using a dynamic time warping (DTW) difference-to-sum ratio to classify speech as either matching or diverging from a linguistic standard. |
C. Richter; J. Guðnason; |
2013 | Relevance Propagation Through Deep Conditional Random Fields Highlight: However, there is a lack of work on post-hoc explanation approaches to CRFs, especially when the model is softmax-activated like the deep mean field network (DMFN). In this paper, we bridge this gap by proposing a layer-wise relevance propagation (LRP) method based on deep Taylor decomposition to explain CRFs, especially the DMFN model. |
X. Yang; B. Joukovsky; N. Deligiannis; |
2014 | Reliability Estimation for Synthetic Speech Detection Highlight: In this paper, we propose a method for estimating the reliability of a prediction performed by a speech deepfake detector. |
D. Salvi; P. Bestagini; S. Tubaro; |
2015 | Reliable Beamforming at Terahertz Bands: Are Causal Representations The Way Forward? Highlight: Herein, a dynamic, semantically aware beamforming solution is proposed for the first time, utilizing novel artificial intelligence algorithms in variational causal inference to compute the time-varying dynamics of the causal representation of multi-modal data and the beamforming. |
C. K. Thomas; W. Saad; |
2016 | Reliable Cluster-Based Framework for Open Set Domain Adaptation Highlight: Instead, we propose a Reliable Cluster-based Framework (RCF), including Coarse Target Clustering, Structured Matching Strategy, and Reliable Pseudo-Label Training modules, as a general framework to mitigate the impact of faulty pseudo-labels for cluster-based OSDA methods. |
X. Zheng; Y. Huang; J. Tang; |
2017 | Removing Radio Frequency Interference From Auroral Kilometric Radiation With Stacked Autoencoders Highlight: In this study, we extend recent developments in deep learning algorithms to astronomy data. |
A. Chang; M. Knapp; J. LaBelle; J. Swoboda; R. Volz; P. J. Erickson; |
2018 | RepackagingAugment: Overcoming Prediction Error Amplification in Weight-Averaged Speech Recognition Models Subject to Self-Training Highlight: Weight-averaging methods have been employed to refine the pseudo-labels in a variety of studies; however, these methods amplify the prediction errors of each self-trained model. To alleviate this problem, we propose RepackagingAugment, a data augmentation method that improves the diversity of models while preventing the same incorrect labels from recursively occurring in every epoch. |
J. -H. Lee; D. -H. Kim; J. -H. Chang; |
2019 | Repetition Counting from Compressed Videos Using Sparse Residual Similarity Highlight: We propose an approach that directly utilizes the components of a compressed video for predicting the count of a repeating action occurring in the video. |
R. Khurana; J. R. Vachhani; S. Vasant Gothe; P. Kashyap; |
2020 | Representation Learning of Clinical Multivariate Time Series with Random Filter Banks Highlight: This paper introduces the Random Frequency Butchering (RFB) method to enhance the generalization performance of classification tasks on limited time series in health care. |
A. Keshavarzian; H. Salehinejad; S. Valaee; |
2021 | Representation of Vocal Tract Length Transformation Based on Group Theory Highlight: In this paper, we focus on the property of vocal tract length transformation (VTLT) that forms a group, and derive the novel speech representation VTL spectrum based on group theory analysis, where only the phase of the VTL spectrum is changed by VTLT, which is a simple linear shift. |
A. Miyashita; T. Toda; |
2022 | Residual Hybrid Attention Network for Compression Artifact Reduction Highlight: Specifically, to remove the compression artifacts, we propose a hybrid attention block (HAB) to adaptively restore the loss of high-frequency components, which predicts attention maps in parallel along two separate dimensions: spatial and frequency spectra. |
B. Luo; W. Yu; |
2023 | Residual Squeeze-and-Excitation U-Shaped Network for Minutia Extraction in Contactless Fingerprint Images Highlight: This work proposes and analyzes a residual squeeze-and-excitation U-shaped deep learning model for extracting minutiae in contactless fingerprint images. |
A. N. Cotrim; H. Pedrini; |
2024 | Resolving Doppler Ambiguity Via Spread Phase Alignment in FDA-MIMO Radar Highlight: In our modeling stage, a spread phase alignment (SPA) method is proposed by utilizing the pulse-dependent transmit spatial frequency. |
Y. Wang; S. Zhu; G. Liao; L. Lan; Z. Chen; F. Liu; |
2025 | Resource Allocation for UAV-Enabled Integrated Sensing and Communication (ISAC) Via Multi-Objective Optimization Highlight: In this paper, we consider an integrated sensing and communication (ISAC) system with wireless power transfer (WPT) where an unmanned aerial vehicle (UAV)-based radar serves a group of energy-limited communication users in addition to its sensing functionality. |
O. Rezaei; M. M. Naghsh; S. M. Karbasi; M. M. Nayebi; |
2026 | Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion Highlight: In this paper, we analyze the performance of features at different layers of a foundation model on the speech recognition task and propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models. |
Z. Huo; K. C. Sim; B. Li; D. Hwang; T. N. Sainath; T. Strohman; |
2027 | Restoration of Time-Varying Graph Signals Using Deep Algorithm Unrolling Highlight: In this paper, we propose a restoration method of time-varying graph signals, i.e., signals on a graph whose signal values change over time, using deep algorithm unrolling. |
H. Kojima; H. Noguchi; K. Yamada; Y. Tanaka; |
2028 | Rethinking Implicit Neural Representations For Vision Learners Highlight: Specifically, we present three key designs for basic blocks in INRN along with two different stacking ways and corresponding loss functions. |
Y. Song; Q. Zhou; L. Ma; |
2029 | Rethinking Learning-Based Method for Lossless Genome Compression Highlight: In this paper, we redesign the deep learning model and propose a simple yet effective position-driven transformer for genome data compression. |
H. Yang; F. Gu; J. Ye; |
2030 | Rethinking Random Walk in Graph Representation Learning Highlight: In this paper, we attempt to provide a graph neural network architecture that simultaneously addresses expressiveness, complexity and real-world performance. |
D. Zeng; W. Chen; W. Liu; L. Zhou; H. Qu; |
2031 | Rethinking Rule-Based Approaches in Session-Based Recommendation Highlight: However, in recent years, few studies have tried to build traditional models. In this paper, we investigate this issue and propose a concise rule-based method for session-based recommendation. |
L. Wang; M. Li; H. -T. Zheng; |
2032 | Rethinking The Reasonability of The Test Set for Simultaneous Machine Translation Highlight: In this paper, we manually annotate a monotonic test set based on the MuST-C English-Chinese test set, denoted as SiMuST-C. |
M. Liu; W. Zhang; X. Li; J. Luan; B. Wang; Y. Guo; S. Chen; |
2033 | Rethink Long-Tailed Recognition with Vision Transformers Highlight: In this paper, we revisit recent LTR methods with promising Vision Transformers (ViT). |
Z. Xu; S. Yang; X. Wang; C. Yuan; |
2034 | Rethink Pair-Wise Self-Supervised Cross-Modal Retrieval From A Contrastive Learning Perspective Highlight: Existing self-supervised cross-modal approaches still suffer from the faulty negative sample selection strategy and the lack of reliable high-level semantic discriminative guidance. Therefore, we propose a robust self-supervised co-training instance and semantic discrimination learning method (RCL) for cross-modal retrieval. |
T. Gong; J. Wang; L. Zhang; |
2035 | Retiformer: Retinex-Based Enhancement In Transformer For Low-Light Image Highlight: Transformer-based methods have shown impressive potential in many low-level vision tasks but are rarely used for low-light image enhancement (LLIE). |
J. Ruan; X. Kong; W. Huang; W. Yang; |
2036 | Retinal Biomarkers for Detecting Diabetic Retinopathy Using Smartphone-Based Deep Learning Frameworks Highlight: After training a CNN on original fundus images from diverse datasets, we evaluate the trained model on various regions of the retina (the fovea, the optic disc, the center of fovea and optic disc, the center of lower fovea and optic disc, and center of upper fovea and optic disc) that could be most effective to determine DR. Our experiments show that retinal images from smartphone-based systems with a narrower FoV (40%) that covers the area around the fovea provided very close performance to original images with 0.963 AUC (within 0.99 of the optimum). |
M. Karakaya; R. S. Aygun; |
2037 | Retrieval-Based Natural 3D Human Motion Generation Highlight: In this work, context-aware retrieval-based approaches are proposed for predicting motion lengths and generating proper 3D motions (C-MO). |
Z. Tan; W. Yang; S. Wu; |
2038 | Reverberation As Supervision For Speech Separation Highlight: This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation. |
R. Aralikatti; C. Boeddeker; G. Wichern; A. Subramanian; J. Le Roux; |
2039 | Revisit Out-Of-Vocabulary Problem For Slot Filling: A Unified Contrastive Framework With Multi-Level Data Augmentations Highlight: We present a unified contrastive learning framework, which pulls the representations of the original sample and its augmented samples together, to make the model resistant to OOV problems. |
D. Guo; G. Dong; D. Fu; Y. Wu; C. Zeng; T. Hui; L. Wang; X. Li; Z. Wang; K. He; X. Cui; W. Xu; |
2040 | Revisit Sampling Theory of Bandlimited Graph Signals: One Bridge Between GSP and DSP Highlight: Although many sampling objectives and algorithms have been proposed for bandlimited (BL) graph signals, the essential relationship between those sampling objectives in GSP and the Nyquist sampling theorem in DSP has remained unexplored. In this paper, we bridge this gap by revisiting sampling theory in GSP thoroughly. |
F. Wang; T. Li; X. Zhang; |
2041 | RGB-D Based Pose-Invariant Face Recognition Via Attention Decomposition Module Highlight: In this work, we propose an RGB-D based pose-invariant face recognition model which is light enough to meet the demands of edge devices. |
W. -C. Lin; C. -T. Chiu; K. -C. Shih; |
2042 | Rigid-Body Sound Synthesis with Differentiable Modal Resonators Highlight: In this work, we present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material using a bank of differentiable IIR filters. |
R. Diaz; B. Hayes; C. Saitis; G. Fazekas; M. Sandler; |
2043 | Ripple Sparse Self-Attention for Monaural Speech Enhancement Highlight: This study presents a simple yet effective sparse self-attention for speech enhancement, called ripple attention, which simultaneously performs fine- and coarse-grained modeling for local and global dependencies, respectively. |
Q. Zhang; H. Zhu; Q. Song; X. Qian; Z. Ni; H. Li; |
2044 | RIS-Aided Wideband DFRC with Reconfigurable Holographic Surface Highlight: In this paper, we propose a novel frequency-selective RIS-assisted wideband DFRC system that is also equipped with an RHS at the transceiver. |
T. Wei; L. Wu; K. V. Mishra; M. R. Bhavani Shankar; |
2045 | RIS Reflection and Placement Optimisation for Underlay D2D Communications in Cognitive Cellular Networks Highlight: An RIS-aided device-to-device (D2D) communication system operating in underlay mode is considered in this work. |
S. Ghose; D. Mishra; S. P. Maity; G. C. Alexandropoulos; |
2046 | RL-IFF: Indoor Localization Via Reinforcement Learning-Based Information Fusion Highlight: The paper is motivated by the importance of the Smart Cities (SC) concept for future management of global urbanization. |
M. Salimibeni; A. Mohammadi; |
2047 | RNN-Based Step-Size Estimation for The RLS Algorithm with Application to Acoustic Echo Cancellation Highlight: In this paper, we propose a recurrent neural network (RNN)-based step-size estimation for the recursive least squares (RLS) algorithm with application to acoustic echo cancellation (AEC). |
O. Schwartz; A. Schwartz; |
2048 | Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition Highlight: In this work, we propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing guided by acoustic similarity between the audio and the contextual entities (termed acoustic biasing). |
X. Fu; K. M. Sathyendra; A. Gandhe; J. Liu; G. P. Strimel; R. McGowan; A. Mouchtaris; |
2049 | Robust Adaptive Beamforming with Proximal Method Highlight: A first-order method is proposed to solve for the beamformers of large arrays. |
R. Li; D. Cabric; |
2050 | Robust and Globally Sparse PCA Via Majorization-Minimization and Variable Splitting Highlight: To solve it, we propose to leverage variable splitting methods, with the crucial step then lying on the Stiefel manifold. |
H. Brehier; A. Breloy; M. N. El Korso; S. Kumar; |
2051 | Robust and Parallelizable Tensor Completion Based on Tensor Factorization and Maximum Correntropy Criterion Highlight: In this paper, we propose a new robust and parallelizable tensor completion method using the tubal rank model. |
Y. He; G. K. Atia; |
2052 | Robust Angle Estimation for Hybrid MmWave Systems Highlight: In this paper, we consider robust channel estimation for a millimeter wave (mmWave) massive MIMO system that is equipped with uniform linear arrays (ULA). |
Y. -P. Lin; T. -M. Yang; |
2053 | Robust Audio-Visual ASR with Unified Cross-Modal Attention Highlight: In this paper, we present a new audio-visual speech recognition model with a unified cross-modal attention mechanism. |
J. Li; C. Li; Y. Wu; Y. Qian; |
2054 | Robust Autoencoders for Collective Corruption Removal Highlight: In this paper, we propose ℓ1- and scaling-invariant ℓ1/ℓ2-robust autoencoders based on a surprisingly compact formulation built on the intuition that deep autoencoders perform manifold learning. |
T. Li; H. Wang; P. Le; X. Tang; J. Sun; |
2055 | Robust Binary Component Decompositions Highlight: In this paper, we extend their approach in the presence of noise. |
C. Kolomvakis; N. Gillis; |
2056 | Robust Binaural Sound Localisation with Temporal Attention Highlight: Despite there being clear evidence for attentional effects in biological spatial hearing, relatively few machine hearing systems exploit attention in binaural sound localisation. This paper addresses this issue by proposing a novel binaural machine hearing system with temporal attention for robust localisation of sound sources in noisy and reverberant conditions. |
Q. Hu; N. Ma; G. J. Brown; |
2057 | Robust Content-Variant Reference Image Quality Assessment Via Similar Patch Matching Highlight: To effectively utilize CVR images and make the algorithm more robust, we propose a CVR IQA scheme based on similar patch matching. |
W. Shi; W. Yang; Q. Liao; |
2058 | Robust Data2VEC: Noise-Robust Speech Representation Learning for ASR By Combining Regression and Improved Contrastive Learning Highlight: In this paper, we propose a noise-robust data2vec for self-supervised speech representation learning by jointly optimizing the contrastive learning and regression tasks in the pre-training stage. |
Q. -S. Zhu; L. Zhou; J. Zhang; S. -J. Liu; Y. -C. Hu; L. -R. Dai; |
2059 | Robust Data-Driven Accelerated Mirror Descent Highlight: One such approach is based on the classical mirror descent algorithm, where the mirror map is modelled using input-convex neural networks. In this work, we extend this functional parameterization approach by introducing momentum into the iterations, based on the classical accelerated mirror descent. |
H. Y. Tan; S. Mukherjee; J. Tang; A. Hauptmann; C. -B. Schönlieb; |
2060 | RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness Highlight: Their large sizes, in turn, make them infeasible for edge applications. In this work, we propose a knowledge distillation methodology termed RobustDistiller which compresses universal representations while making them more robust against environmental artifacts via a multi-task learning objective. |
H. R. Guimarães; A. Pimentel; A. R. Avila; M. Rezagholizadeh; B. Chen; T. H. Falk; |
2061 | Robust Dominant Periodicity Detection for Time Series with Missing Data Highlight: In this paper, we propose a robust and effective periodicity detection algorithm for time series with block missing data. |
Q. Wen; L. Yang; L. Sun; |
2062 | Robust FIR Filters for Wireless Low-Frequency Sound Zones Highlight: In this paper, we propose robust FIR filters for low-frequency sound zone systems by incorporating information about the expected packet losses into the design. |
M. Zhou; M. B. Møller; C. S. Pedersen; J. Østergaard; |
2063 | Robust GMM Parameter Estimation Via The K-BM Algorithm Highlight: In this paper, we develop an expectation-maximization (EM)-like scheme, called ${\mathcal{K}}$-BM, for iterative numerical computation of the minimum ${\mathcal{K}}$-divergence estimator (M${\mathcal{K}}$DE). |
O. Kenig; K. Todros; T. Adali; |
2064 | Robust Hyperspectral Anomaly Detection with Simultaneous Mixed Noise Removal Via Constrained Convex Optimization Highlight: In this paper, we propose a method to achieve robust anomaly detection even when HS images contain various types of noise. |
K. Sato; S. Ono; |
2065 | Robust Hypothesis Testing With Moment Constrained Uncertainty Sets Abstract: The problem of robust binary hypothesis testing is studied. Under both hypotheses, the data-generating distributions are assumed to belong to uncertainty sets constructed through … |
A. Magesh; Z. Sun; V. V. Veeravalli; S. Zou; |
2066 | Robust Iterative Solution for Linear Array-Based 3-D Localization By Message Passing Highlight: This paper reformulates the maximum likelihood (ML) estimation of the 3-D localization problem using the factor graph model, where an effective algorithm is designed through message passing. |
Y. Sun; K. C. Ho; Y. Yang; L. Zhang; L. Chen; |
2067 | Robust Knowledge Distillation from RNN-T Models with Noisy Training Labels Using Full-Sum Loss Highlight: In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. |
M. Zeineldeen; K. Audhkhasi; M. K. Baskar; B. Ramabhadran; |
2068 | Robust Log-Based Anomaly Detection with Hierarchical Contrastive Learning Highlight: However, logs often suffer from perturbations, which makes existing log-based anomaly detection methods unstable. In this paper, we aim to solve this problem from the perspective of contrastive learning, by which the intrinsic and robust representations of logs are learned for anomaly detection. |
Y. Zhao; R. Yang; N. Yang; T. Lin; Q. Fu; Y. Ma; |
2069 | Robust M-Estimation Based Distributed Expectation Maximization Algorithm with Robust Aggregation Highlight: Thus, we propose a robust distributed expectation maximization (EM) algorithm based on Real Elliptically Symmetric (RES) distributions, which is highly adaptive to outliers and moreover is combined with a robust data aggregation step which provides robustness against malicious nodes. |
C. A. Schroth; S. Vlaski; A. M. Zoubir; |
2070 | Robust Monocular Localization of Drones By Adapting Domain Maps to Depth Prediction Inaccuracies Highlight: We present a novel monocular localization framework by jointly training deep learning-based depth prediction and Bayesian filtering-based pose reasoning. |
P. Shukla; S. Sureshkumar; A. C. Stutts; S. Ravi; T. Tulabandhula; A. R. Trivedi; |
2071 | Robust Multi-modal Speech Emotion Recognition with ASR Error Adaptation Highlight: This paper proposes an SER method robust to ASR errors. |
B. Lin; L. Wang; |
2072 | Robust Multi-Object Tracking With Spatial Uncertainty Highlight: In this paper, spatial uncertainty is proposed for MOT. |
P. -J. Liao; Y. -C. Huang; C. -K. Chiang; S. -H. Lai; |
2073 | Robustness and Convergence of Mirror Descent for Blind Deconvolution Highlight: The algorithm that emerges has nice convergence guarantees and is provably robust in a sense we formalize in the paper. |
R. Mehta; S. N. Ravi; V. Singh; |
2074 | Robustness of Deep Equilibrium Architectures to Changes in The Measurement Model Highlight: Deep model-based architectures (DMBAs) are widely used in imaging inverse problems to integrate physical measurement models and learned image priors. |
J. Hu; S. Shoushtari; Z. Zou; J. Liu; Z. Sun; U. S. Kamilov; |
2075 | Robustness-Preserving Lifelong Learning Via Dataset Condensation Highlight: Yet, it is also known that machine learning (ML) models can be vulnerable in the sense that even tiny, adversarial input perturbations can deceive the models into producing erroneous predictions. This motivates the research objective of this paper – specification of a new LL framework that can salvage model robustness (against adversarial attacks) from catastrophic forgetting. |
J. Jia; Y. Zhang; D. Song; S. Liu; A. Hero; |
2076 | Robust Network Topologies for Distributed Learning Highlight: In particular, the majority of neighbors of any benign agent must be benign, and the subgraph of benign agents must be connected. In this work, we propose a scheme for the design of such topologies based on prior information of the risk profile of participating agents. |
C. Wang; S. Vlaski; |
2077 | Robust Online Multiband Drift Estimation in Electrophysiology Data Highlight: High-density electrophysiology probes have opened new possibilities for systems neuroscience in human and non-human animals, but probe motion poses a challenge for downstream analyses, particularly in human recordings. We improve on the state of the art for tracking this motion with four major contributions. First, we extend previous decentralized methods to use multiband information, leveraging the local field potential (LFP) in addition to spikes. |
C. Windolf; A. C. Paulk; Y. Kfir; E. Trautmann; D. Meszéna; W. Muñoz; I. Caprara; M. Jamali; J. Boussard; Z. M. Williams; S. S. Cash; L. Paninski; E. Varol; |
2078 | Robust Self-Guided Deep Image Prior Highlight: In this work, we study the deep image prior (DIP) for reconstruction problems in magnetic resonance imaging (MRI). |
E. Bell; S. Liang; Q. Qu; S. Ravishankar; |
2079 | Robust Spatiotemporal Fusion of Satellite Images Via Convex Optimization Highlight: In this paper, we propose an optimization-based ST fusion method that is robust to noise. |
R. Isono; K. Naganuma; S. Ono; |
2080 | Robust Subspace Tracking with Contamination Mitigation Via α-Divergence Highlight: We studied the problem of robust subspace tracking (RST) in contaminated environments. |
L. T. Thanh; A. M. Rekavandi; A. -K. Seghouane; K. Abed-Meraim; |
2081 | Robust Time Series Recovery and Classification Using Test-Time Noise Simulator Networks Highlight: In this paper, we present a general framework and models for time-series that can make use of (unlabeled) test samples to estimate the noise model, entirely at test time. |
E. S. Jeon; S. Lohit; R. Anirudh; P. Turaga; |
2082 | Robust Video Anomaly Detection Framework Via Prior Knowledge and Multi-Path Frame Prediction Abstract: Video anomaly detection aims to automatically detect abnormal objects or behaviors. Most existing methods tackle the problem by minimizing the reconstruction errors stemming from … |
M. Zhang; J. Wang; J. Wang; Q. Qi; Z. Zhuang; H. Sun; N. Xiao; |
2083 | Robust Video Object Segmentation with Restricted Attention Highlight: We propose the Robust Video Object Segmentation With Restricted Attention (RVOSR), which can suppress the effects caused by similar objects and filter out noise confusion from other irrelevant regions. |
H. Zhang; P. Guo; Z. Le; W. Zhang; |
2084 | Robust Watermarking Scheme in Encrypted Domain Based on Integer Lifting Wavelet Transform and Compressed Sensing Highlight: Based on the above, we propose a robust watermarking scheme in encrypted domain based on integer lifting wavelet transform (LWT) and compressed sensing (CS). |
D. Xiao; Q. Tang; A. Zhao; M. Li; |
2085 | ROI-Based Deep Image Compression with Swin Transformers Highlight: In this paper, we propose an ROI-based image compression framework with Swin transformers as the main building blocks of the autoencoder network. |
B. Li; J. Liang; H. Fu; J. Han; |
2086 | Role of Bias Terms in Dot-Product Attention Highlight: This attention module consists of three linear transformations, namely the query, key, and value linear transformations, each of which has a bias term. In this work, we study the role of these bias terms and mathematically show that the bias term of the key linear transformation is redundant and can be omitted without any impact on the attention module. |
M. Namazifar; D. Hazarika; D. Hakkani-Tür; |
2087 | Role of Lexical Boundary Information in Chunk-Level Segmentation for Speech Emotion Recognition Highlight: This study analyzes the role of lexical boundary information by exploring alternative segmentation strategies for chunk-level SER. |
W. -C. Lin; C. Busso; |
2088 | Room Impulse Response Reconstruction Based on Spatio-Temporal-Spectral Features Learned from A Spherical Microphone Array Measurement Highlight: This paper proposes a method to reconstruct RIRs based on reflection source locations and time-frequency-direction-dependent reflection magnitude response estimated from a single spherical microphone array measurement. |
A. Bastine; T. D. Abhayapala; J. A. Zhang; |
2089 | RØROS: Building A Responsive Online Recommender System Via Meta-Gradients Updating Highlight: In this paper, we present the first study on the responsiveness of recommender systems and present the Responsive Online RecOmmender System (RØROS) based on Meta-Gradients Update (MGU) techniques, which helps improve recommendation quality for both existing and new users when the system observes only a limited number of incoming interactions. |
X. Pan; M. Zhang; D. Wu; |
2090 | Row Conditional-TGAN for Generating Synthetic Relational Databases Highlight: In this paper, we propose the Row Conditional–Tabular Generative Adversarial Network (RC-TGAN), a novel generative adversarial network (GAN) model that extends the tabular GAN to support modeling and synthesizing relational databases. |
M. Gueye; Y. Attabi; M. Dumas; |
2091 | Rumor Detection Via Assessing The Spreading Propensity of Users Highlight: User context, such as historical posts, offers extensive details about propensities for spreading rumors and thus has great potential to improve rumor detection. Therefore, we explore a new feature space by extracting the spreading propensity from user context and combine it with social interaction information to construct a novel detection algorithm. |
P. Zheng; Z. Huang; Y. Dou; Y. Yan; |
2092 | Runtime Prediction of Machine Learning Algorithms in AutoML Systems Highlight: In this paper, we introduce a metalearning-based methodology for predicting the training runtime of various machine learning algorithms. |
P. Dube; T. Salonidis; P. Ram; A. Verma; |
2093 | S3I-PointHop: SO(3)-Invariant PointHop for 3D Point Cloud Classification Highlight: When input point clouds are not aligned, classification performance drops significantly. In this work, we focus on a mathematically transparent point cloud classification method called PointHop, analyze why it fails under pose variations, and solve the problem by replacing its pose-dependent modules with rotation-invariant counterparts. |
P. Kadam; H. Prajapati; M. Zhang; J. Xue; S. Liu; C. -C. J. Kuo; |
2094 | SADE: A Self-Adaptive Expert for Multi-Dataset Question Answering Highlight: This approach, however, has its limitations when generalized to an unseen new distribution, and the number of extra parameters will increase with the number of training datasets. In this paper, we devise Self-ADaptive Expert (SADE), the key idea of which is to train a single expert that can be automatically adapted to each individual instance according to its gradients. |
Y. Peng; Q. Wang; Z. Mao; Y. Zhang; |
2095 | SADI: A Self-Adaptive Decomposed Interpretable Framework for Electric Load Forecasting Under Extreme Events Highlight: In this paper, we address the electric load forecasting problem under extreme events such as scorching heat. |
H. Liu; Z. Ma; L. Yang; T. Zhou; R. Xia; Y. Wang; Q. Wen; L. Sun; |
2096 | SafeDeep: A Scalable Robustness Verification Framework for Deep Neural Networks Highlight: This has been demonstrated by adversarial examples in the deep learning domain. To address this challenge, here, we propose a scalable robustness verification framework for Deep Neural Networks (DNNs). |
A. Baninajjar; K. Hosseini; A. Rezine; A. Aminifar; |
2097 | Saliency-Driven Hierarchical Learned Image Coding for Machines Highlight: We propose to employ a saliency-driven hierarchical neural image compression network for a machine-to-machine communication scenario following the compress-then-analyze paradigm. |
K. Fischer; F. Brand; C. Blum; A. Kaup; |
2098 | Salient Co-Speech Gesture Synthesizing with Discrete Motion Representation Highlight: Most previous research efforts, however, ignore this nature of co-speech gestures and synthesize deterministic results, producing over-smoothed movements with limited expressiveness. To address this issue, we propose a new co-speech gesture generation approach that produces high-quality salient gesticulations. |
Z. Ye; J. Jia; H. Wu; S. Huang; S. Sun; J. Xing; |
2099 | SAMO: Speaker Attractor Multi-Center One-Class Learning For Voice Anti-Spoofing Highlight: In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. |
S. Ding; Y. Zhang; Z. Duan; |
2100 | Sample-Adapt Fusion Network for RGB-D Hand Detection in The Wild Highlight: This inter-sample variance cannot be effectively perceived by static modeling operations shared across all samples. To address this problem, we propose a Sample-Adapt Fusion Network (SAFNet) with Channel Dynamic Refinement Module (CDRM) and Spatial Dynamic Aggregation Module (SDAM) to adaptively model the channel-wise and spatial-wise cross-modal correlation. |
X. Liu; P. Ren; Y. Chen; C. Liu; J. Wang; H. Sun; Q. Qi; J. Wang; |
2101 | Sample-Aware Knowledge Distillation for Long-Tailed Learning Highlight: From the perspective of solving imbalance at the sample level, we propose a simple but effective method, named Sample-aware Knowledge Distillation (SAKD), which includes a Selective Knowledge Distillation module and a Stable Feature Center Learning module. |
S. Zheng; Y. Zhang; H. Huang; Y. Qu; |
2102 | Sample-Efficient Robust MMV Recovery Algorithm Highlight: We propose a robust rank-aware algorithm to tackle noisy scenarios and show that it results in lower errors in the estimation of sparse vectors compared to the existing approaches. |
Y. Singh; J. S. Rohela; S. Mulleti; |
2103 | Sampling Order-Limited Signals on The Sphere Highlight: In this work, we propose a sampling method and develop an associated spherical harmonic transform (SHT) that reduce both the required number of samples and the computation cost while maintaining signal reconstruction accuracy for order-limited signals. |
M. S. A. Khan; S. Nadeem; Z. Khalid; |
2104 | SAN: A Robust End-to-End ASR Model Architecture Highlight: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. |
Z. Min; Q. Ge; G. Huang; |
2105 | Sandformer: CNN and Transformer Under Gated Fusion for Sand Dust Image Restoration Highlight: In this paper, we introduce an effective hybrid architecture for sand image restoration tasks, which leverages local features from a CNN and long-range dependencies captured by a transformer to further improve the results. |
J. Shi; B. Wei; G. Zhou; L. Zhang; |
2106 | SANet: Spatial Attention Network with Global Average Contrast Learning for Infrared Small Target Detection Highlight: To improve overall detection performance, this paper proposes a Spatial Attention Network (SANet) with global average contrast learning designed specifically for infrared small targets. |
J. Zhu; S. Chen; L. Li; L. Ji; |
2107 | SARdBScene: Dataset and ResNet Baseline for Audio Scene Source Counting and Analysis Highlight: This paper introduces a first-of-its-kind dataset for audio scene analysis (ASA) and presents a baseline approach for audio source counting. |
M. Nigro; S. Krishnan; |
2108 | SAR Image Despeckling with Residual-in-Residual Dense Generative Adversarial Network Highlight: In this paper, a novel residual-in-residual dense generative adversarial network is proposed to effectively suppress SAR image speckle while retaining rich spatial information. |
Y. Bai; Y. Xiao; X. Hou; Y. Li; C. Shang; Q. Shen; |
2109 | Scalable and Secure Federated XGBoost Highlight: To this end, we propose a novel formulation, termed splitting matrix, in the context of federated XGBoost that mathematically characterizes the role of the passive party (PP), which has been neglected in the literature. |
Q. M. Nguyen; N. Khanh Le; L. M. Nguyen; |
2110 | Scalable Multi-Task Semantic Communication System with Feature Importance Ranking Highlight: In this paper, a novel scalable multi-task semantic communication system with feature importance ranking (SMSC-FIR) is explored. |
J. Hu; F. Wang; W. Xu; H. Gao; P. Zhang; |
2111 | Scalable Weight Reparametrization for Efficient Transfer Learning Highlight: This paper proposes a novel transfer learning method, called Scalable Weight Reparametrization (SWR), that is efficient and effective for multiple downstream tasks. |
B. Kim; J. -T. Lee; S. Yang; S. Chang; |
2112 | Scale-Adaptive Tiny Object Detection Enhanced By Across-Scale and Shape-Preserved Semantic Location Highlight: To address these problems, we propose an Instance-level, Scale-adaptive, Shape-preserved, and Semantic-consistent Supervision (I4S) module for better locating tiny objects. |
Y. He; R. Huang; Y. Shi; G. Xiao; B. Yang; Y. Li; |
2113 | ScaleMix: Intra- And Inter-Layer Multiscale Feature Combination for Change Detection Highlight: In this paper, we propose to mix intra- and inter-layer multiscale features to generate more complete change regions. |
R. Huang; Q. Zhao; R. Wang; C. Liu; S. Gao; Y. Zhang; W. Fan; |
2114 | Scaling Law Analysis for Covariance Based Activity Detection in Cooperative Multi-Cell Massive MIMO Highlight: This paper aims to analyze the scaling law of covariance based activity detection in the multi-cell massive MIMO system. |
Z. Wang; Y. -F. Liu; Z. Wang; W. Yu; |
2115 | SCA: Streaming Cross-Attention Alignment For Echo Cancellation Highlight: This paper introduces an end-to-end echo cancellation network with a streaming cross-attention alignment (SCA). |
Y. Liu; Y. Shi; Y. Li; K. Kalgaonkar; S. Srinivasan; X. Lei; |
2116 | SC-Net: Salient Point and Curvature Based Adversarial Point Cloud Generation Network Highlight: To overcome the shortcomings mentioned above, we propose a method called SC-Net, which can generate an adversarial point cloud in a single forward pass. |
Z. Zhang; N. Sang; X. Wang; M. Cai; |
2117 | Scoreformer: Score Fusion-Based Transformers for Weakly-Supervised Violence Detection Highlight: This paper proposes a score fusion-based transformer framework, named Scoreformer. |
Y. Xiao; L. Wang; T. Wang; H. Lai; |
2118 | SCSGNet: Spatial-Correlated and Shape-Guided Network for Breast Mass Segmentation Highlight: However, it has been a challenging task for two main reasons: (1) breast masses are diverse, and (2) the boundaries of masses are ambiguous. To address these problems, we propose a Spatial-Correlated and Shape-Guided Network (SCSGNet), which combines global context extraction with local boundary refinement. |
Q. Li; J. Xu; R. Yuan; Y. Zhang; R. Feng; |
2119 | SDG-L: A Semiparametric Deep Gaussian Process Based Framework for Battery Capacity Prediction Highlight: In this paper, we propose a semiparametric deep Gaussian process regression framework named SDG-L to give predictions based on the modeling of time series battery state data. |
H. Liu; Y. Wu; Y. Li; E. E. Kuruoglu; X. Zhang; |
2120 | SD-PINN: Physics Informed Neural Networks for Spatially Dependent PDEs Highlight: In this paper, we propose a modification of PINN, named SD-PINN, which can recover the coefficients in spatially-dependent PDEs using only one neural network without requiring domain-specific physical knowledge. |
R. Liu; P. Gerstoft; |
2121 | SDRNet: Shape Decoupled Regression Network for 3D Face Reconstruction Highlight: In this paper, we propose a Shape Decoupled Regression Network (SDRNet) consisting of an identity-focused branch and an expression-focused branch with focused criteria for representation learning, which interact with the union branch to achieve the final 3DMM parameter regression for improved shape reconstruction. |
S. Zhang; F. Song; G. Song; M. Yang; |
2122 | SDTN: Speaker Dynamics Tracking Network for Emotion Recognition in Conversation Highlight: The speakers’ emotional states are independent but influence each other during the conversation. To address the above issues, we propose a Speaker Dynamics Tracking Network (SDTN) for ERC. |
J. Chen; P. Huang; G. Huang; Q. Li; Y. Xu; |
2123 | Search for Efficient Deep Visual-Inertial Odometry Through Neural Architecture Search Highlight: However, it is difficult to deploy such VIO models directly on energy-constrained mobile platforms in real-time due to the extensive complexity of existing deep neural network (DNN) models. To address this issue, we propose to adopt the neural architecture search (NAS) technique to search for the most efficient VIO network architecture. |
Y. Chen; M. Yang; H. -S. Kim; |
2124 | Second-Order Statistic Deviation to Model Anomalies in The Design of Unsupervised Detectors Highlight: Here, we propose a tool to generate anomalies as a statistical deviation from the characterization of the signal representing normal behavior. |
A. Enttsel; F. Martinini; A. Marchioni; M. Mangia; R. Rovatti; G. Setti; |
2125 | Selecting Language Models Features Via Software-Hardware Co-Design Highlight: Here, we propose a method to fully utilize the intermediate outputs of the popular large pre-trained models in natural language processing when used as frozen feature extractors, and further close the gap between their fine-tuning and more computationally efficient solutions. |
V. Pandelea; E. Ragusa; P. Gastaldo; E. Cambria; |
2126 | Selective FiLM Conditioning with CTC-Based ASR Probability for Speech Enhancement Highlight: Although prior studies have improved the performance, they are inefficient because the two networks are combined and require large model sizes. To address this limitation, we propose an efficient way to use feature-wise linear modulation (FiLM) conditioning with CTC-based ASR probabilities for the SE system. |
D. -H. Yang; J. -H. Chang; |
2127 | Select The Best: Enhancing Graph Representation with Adaptive Negative Sample Selection Highlight: In this paper, we study the impact of negative samples on learning graph-level representations and propose Reinforcement Graph Contrastive Learning (ReinGCL) for negative sample selection. |
X. Zheng; X. Liang; B. Wu; |
2128 | Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition Highlight: In this paper, we improve the self-adaptive TTS using character-vocabulary level ASR feedback at higher granularity, considering the losses in the positive and negative classes. |
S. Novitasari; S. Sakti; S. Nakamura; |
2129 | Self-Adaptive Reasoning on Sub-Questions for Multi-Hop Question Answering Highlight: In this paper, we present the Self-Adapting Reasoning Model (SAR) for solving multi-hop question answering (MHQA) tasks, where the QA system must find the correct answer to a multi-hop question within multiple given documents. |
Z. Li; W. Peng; |
2130 | Self-Attention Based Action Segmentation Using Intra- and Inter-Segment Representations Highlight: We propose a self-attention-based model that refines initial segmentations by separately considering intra- as well as inter-segment relations between predicted action segments. |
C. Patsch; E. Steinbach; |
2131 | Self-Attention for Enhanced OAMP Detection in MIMO Systems Highlight: In particular, we introduce a self-attention model for the enhancement of the iterative Orthogonal Approximate Message Passing (OAMP)-based decoding algorithm. |
A. Fuchs; C. Knoll; N. N. Moghadam; A. Pak; J. Huang; E. Leitinger; F. Pernkopf; |
2132 | Self-Convolution for Automatic Speech Recognition Highlight: Accordingly, we combine their complementary advantages and propose a new module, namely self-convolution, to compensate for their individual limitations. |
T. -H. Zhang; Q. Liu; X. Qian; S. -L. Chen; F. Chen; X. -C. Yin; |
2133 | Self-Distillation Hashing for Efficient Hamming Space Retrieval Highlight: In this paper, we propose Self-Distillation Hashing (SeDH), which improves the image retrieval performance without introducing a complex teacher model and significantly reduces the overall computation costs. |
H. Zhai; H. Li; H. Zhang; H. Bao; G. Zhang; |
2134 | Self-Healing Through Error Detection, Attribution, and Retraining Highlight: Therefore, when negative feedback is received, it can be difficult to attribute the system error to a specific sub-component. In this work, we address this challenge by building a system for error attribution and correction. |
A. MacLaughlin; A. Rumshisky; R. Khaziev; A. Ramakrishna; Y. Merhav; R. Gupta; |
2135 | Self-Paced Partial Domain-Aware Learning for Face Anti-Spoofing Highlight: However, they neglect that domain-related information may also contain helpful features for the classification task. To address this issue, we propose a self-paced partial domain-aware framework (SPDA) to preserve domain-related features helpful for the discrimination of fake and real faces, thereby increasing generalization for unseen domains. |
Z. Chen; Y. Lu; X. Deng; J. Meng; S. Zhang; L. Cao; |
2136 | Self-Remixing: Unsupervised Speech Separation Via Separation and Remixing Highlight: We present Self-Remixing, a novel self-supervised speech separation method, which refines a pre-trained separation model in an unsupervised manner. |
K. Saijo; T. Ogawa; |
2137 | Self-Similarity Is All You Need for Fast and Light-Weight Generic Event Boundary Detection Highlight: We introduce a projected SSM with cosine distance that produces an efficient representation of SSM that can be interpreted using lighter transformer decoders. |
S. Vasant Gothe; J. Rajkumar Vachhani; R. Khurana; P. Kashyap; |
2138 | Self-Sufficient Framework for Continuous Sign Language Recognition Highlight: The goal of this work is to develop a self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. |
Y. Jang; Y. Oh; J. W. Cho; M. Kim; D. -J. Kim; I. S. Kweon; J. Son Chung; |
2139 | Self-Supervised Accent Learning for Under-Resourced Accents Using Native Language Data Highlight: In this paper, we propose a novel method to improve the accuracy of an English speech recognizer for a target accent using the corresponding native language data. |
M. Kumar; J. Kim; D. Gowda; A. Garg; C. Kim; |
2140 | Self-Supervised Adversarial Training for Contrastive Sentence Embedding Highlight: This paper presents a novel method to reformulate contrastive learning (CL) to meet a self-supervised classification objective. |
J. -T. Chien; Y. -A. Chen; |
2141 | Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning Highlight: We observe that the complementary information in different modalities ensures a robust supervisory signal for audio and visual representation learning. This motivates us to propose an audio-visual self-supervised learning framework named Co-Meta Learning. |
H. Chen; H. Zhang; L. Wang; K. A. Lee; M. Liu; J. Dang; |
2142 | Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation Highlight: In this work, we present a novel method, named AV2vec, for learning audio-visual speech representations by multimodal self-distillation. |
J. -X. Zhang; G. Wan; Z. -H. Ling; J. Pan; J. Gao; C. Liu; |
2143 | Self-Supervised BERT for Legal Text Classification Highlight: Legal text classification faces two key problems: labeling legal data is a sensitive process that can only be carried out by skilled professionals, and legal text is prone to privacy issues, so not all the data can be made publicly available. This means that the textual data has limited diversity, and to account for this data paucity, we propose a self-supervision approach to train Legal-BERT classifiers. |
A. Pal; S. Rajanala; R. C. -W. Phan; K. Wong; |
2144 | Self-Supervised Facial Action Unit Detection with Region and Relation Learning Highlight: In this paper, we propose a novel self-supervised framework for action unit (AU) detection with region and relation learning. |
J. Song; Z. Liu; |
2145 | Self-Supervised Guided Hypergraph Feature Propagation for Semi-Supervised Classification with Missing Node Features Highlight: Although these methods have achieved superior performance, how to exactly exploit the complex data correlations among nodes to reconstruct missing node features is still a great challenge. To solve the above problem, we propose a self-supervised guided hypergraph feature propagation (SGHFP). |
C. Lei; S. Fu; Y. Wang; W. Qiu; Y. Hu; Q. Peng; X. You; |
2146 | Self-Supervised Hierarchical Metrical Structure Modeling Highlight: We propose a novel method to model hierarchical metrical structures for both symbolic music and audio signals in a self-supervised manner with minimal domain knowledge. |
J. Jiang; G. Xia; |
2147 | Self-Supervised Learning-Based Source Separation for Meeting Data Highlight: In this paper, seven SSL models were compared on both simulated and real-world corpora. |
Y. Li; X. Zheng; P. C. Woodland; |
2148 | Self-Supervised Learning for Speech Enhancement Through Synthesis Highlight: In this paper, we propose a denoising vocoder (DeVo) approach, where a vocoder accepts noisy representations and learns to directly synthesize clean speech. |
B. Irvin; M. Stamenovic; M. Kegler; L. -C. Yang; |
2149 | Self-Supervised Learning of Audio Representations Using Angular Contrastive Loss Highlight: To improve the discriminative ability of feature embeddings in SSL, we propose a new loss function called Angular Contrastive Loss (ACL), a linear combination of angular margin and contrastive loss. |
S. Wang; S. Tripathy; A. Mesaros; |
2150 | Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition Highlight: In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion. |
Z. Huang; Z. Chen; N. Kanda; J. Wu; Y. Wang; J. Li; T. Yoshioka; X. Wang; P. Wang; |
2151 | Self-Supervised Learning with Explorative Knowledge Distillation Highlight: In this work, we devise a self-supervised explorative distillation (SSED) algorithm to improve the representation quality of the lightweight models. |
T. Su; J. Zhang; G. Wang; X. Liu; |
2152 | Self-Supervised Representations for Singing Voice Conversion Highlight: In this paper, we circumvent disentanglement training and propose a new model that leverages ASR fine-tuned self-supervised representations as inputs to a HiFi-GAN neural vocoder for singing voice conversion. |
T. Jayashankar; J. Wu; L. Sari; D. Kant; V. Manohar; Q. He; |
2153 | Self-Supervised Representations in Speech-Based Depression Detection Highlight: This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). |
W. Wu; C. Zhang; P. C. Woodland; |
2154 | Self-Supervised Speech Representation Learning for Keyword-Spotting With Light-Weight Transformers Highlight: Self-supervised speech representation learning (S3RL) is revolutionizing the way we leverage the ever-growing availability of data. |
C. Gao; Y. Gu; F. Caliva; Y. Liu; |
2155 | Self-Transcriber: Few-Shot Lyrics Transcription With Self-Training Highlight: We propose the first semi-supervised lyrics transcription paradigm, Self-Transcriber, which leverages unlabeled data using self-training with noisy student augmentation. |
X. Gao; X. Yue; H. Li; |
2156 | SeliNet: A Lightweight Model for Single Channel Speech Separation Highlight: In this paper, we introduce a lightweight yet effective network for speech separation, namely SeliNet. |
H. M. Tan; D. -Q. Vu; J. -C. Wang; |
2157 | SemanticAC: Semantics-Assisted Framework for Audio Classification Highlight: In this paper, we propose SemanticAC, a semantics-assisted framework for Audio Classification to better leverage the semantic information. |
Y. Xiao; Y. Ma; S. Li; H. Zhou; R. Liao; X. Li; |
2158 | Semantically-Informed Deep Neural Networks For Sound Recognition Highlight: Cognitive neuroscience research, however, suggests that human listeners automatically exploit higher-level semantic information on the sources besides acoustic information. Inspired by this notion, we introduce here a DNN that learns to recognize sounds and simultaneously learns the semantic relation between the sources (semDNN). |
M. Esposito; G. Valente; Y. Plasencia-Calaña; M. Dumontier; B. L. Giordano; E. Formisano; |
2159 | Semantic-Aware Gated Fusion Network For Interactive Colorization Highlight: To this end, we propose a novel interactive colorization network, which explicitly builds input-semantic correspondences with an attention mechanism and uses a gated feature fusion module to balance the influences of global and local inputs. |
J. Zhang; Y. Xiao; Y. Zheng; Z. Wang; C. -S. Leung; |
2160 | Semantic Centralized Contrastive Learning for Unsupervised Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Semantic Centralized Contrastive Hashing (SCCH) method, which draws the learned features closer to their semantic centers and makes them more suitable for hashing. |
F. Liang; C. Fan; B. Xiao; K. Liang; |
2161 | Semantic Memory Guided Image Representation for Polyp Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a memory-based feature enhancement module to capture the cross-image contextual relations. |
Z. Yin; R. Wei; K. Liang; Y. Lin; W. Liu; Z. Ma; M. Min; J. Guo; |
2162 | Semantic Preprocessor for Image Compression for Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an image preprocessor that optimizes the input image for machine consumption prior to encoding by an off-the-shelf codec designed for human consumption. |
M. Yang; L. Herranz; F. Yang; L. Murn; M. G. Blanch; S. Wan; F. Yang; M. Mrak; |
2163 | Semantic-Preserving Augmentation for Robust Image-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel image-text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of novel image-based and text-based augmentation techniques called semantic-preserving augmentation for image (SPAug-I) and text (SPAug-T). |
S. Kim; K. Shim; L. T. Nguyen; B. Shim; |
2164 | Semantic Preserving Learning for Task-Oriented Point Cloud Downsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general semantic-preserving downsampling framework (SPDF) for point clouds by exploiting the rich knowledge inherent in the task network. |
J. Xiong; T. Dai; Y. Zha; X. Wang; S. -T. Xia; |
2165 | Semantics-Aware Gamma Correction for Unsupervised Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a semantics-aware yet unsupervised low-light enhancement model based on gamma correction. |
Y. -H. Chen; F. -C. Pan; Y. -C. Liao; J. -H. Kao; Y. -C. F. Wang; |
2166 | Semantics-Disentangled Contrastive Embedding for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel contrastive disentanglement learning framework for the GZSL task (SDCE-GZSL), where the original and generated visual features are factorized into semantic-consistent and semantic-unrelated representations via a novel mutual information (MI)-based constraint. |
J. Ni; Y. Liao; |
2167 | Semantics-Guided Object Removal for Facial Images: with Broad Applicability and Robust Style Preservation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a Semantics-Guided Inpainting Network (SGIN), which strikes a desirable trade-off between those two methods and can be applied to any form of occluding mask while maintaining a consistent style and preserving high-fidelity details of the original image. |
J. Song; Y. Chang; S. Park; N. Kwak; |
2168 | SemGeo: Semantic Keywords for Cross-View Image Geo-Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Exploring information other than color statistics is important to enable geo-localization for them. We present SemGeo, a ground-to-aerial cross-view image geo-localization framework that considers semantic keywords from images while geo-localizing them. |
R. Rodrigues; M. Tani; |
2169 | Semi-Federated Learning for Edge Intelligence with Imperfect SIC Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a semi-federated learning (SemiFL) framework that allows computing-limited clients to collaboratively train a shared model with resource-abundant clients. |
W. Ni; J. Zheng; Y. C. Eldar; C. You; K. Huang; |
2170 | Semi-Supervised Contrastive Learning with Soft Mask Attention for Facial Action Unit Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel facial action unit (AU) detection method that simultaneously improves the discriminative ability of AU features and alleviates the AU data scarcity problem. |
Z. Liu; R. Liu; Z. Shi; L. Liu; X. Mi; K. Murase; |
2171 | Semi-Supervised Domain Generalization with Graph-Based Classifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared to domain generalization, SSDG represents a realistic and challenging goal, which only requires a few labels from source domains. To tackle this problem, this work presents a novel pseudo-labeling method that facilitates incremental learning on a large amount of unlabeled data. |
M. Ye; Y. Zhang; S. Zhu; A. Xie; S. Xiang; |
2172 | Semi-Supervised Graph Ultra-Sparsifier Using Reweighted ℓ1 Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, graph sparsification cannot generate ultra-sparse graphs while simultaneously maintaining the performance of the GCN family. To address this problem, we propose Graph Ultra-sparsifier, a semi-supervised graph sparsifier with dynamically-updated regularization terms based on the graph convolution. |
J. Li; T. Zhang; S. Jin; R. Zafarani; |
2173 | Semi-Supervised Learning with Per-Class Adaptive Confidence Scores for Acoustic Environment Classification with Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we concentrate on the per-class accuracy of neural network-based classification in the context of identifying acoustic environments. |
L. V. Fiorio; B. Karanov; J. David; W. v. Houtum; F. Widdershoven; R. M. Aarts; |
2174 | Semi-Supervised Local Structured Feature Learning with Dynamic Maximum Entropy Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel semi-supervised dimensionality reduction method based on local structured feature learning with dynamic maximum entropy graph. |
R. Xu; X. Liang; |
2175 | Semi-Supervised Remote Sensing Image Change Detection Using Mean Teacher Model for Constructing Pseudo-Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a semi-supervised training scheme that uses the mean teacher model to construct pseudo-labels, improving the generalizability of a model trained with only a handful of labeled data. |
Z. Mao; X. Tong; Z. Luo; |
2176 | Semi-Supervised Semantic Segmentation with Structured Output Space Adaption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method for semi-supervised semantic segmentation. |
W. Huang; F. Zhang; |
2177 | Semi-Supervised Sound Event Detection with Pre-Trained Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the semi-supervised SED task and use a pre-trained model from another field to help improve detection performance. |
L. Xu; L. Wang; S. Bi; H. Liu; J. Wang; |
2178 | Semi-Supervised Speech Enhancement Based On Speech Purity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in reality, most of them are a mix of both. In this paper, we propose a semi-supervised speech enhancement framework to enhance such typical speech datasets. |
Z. Cui; S. Zhang; Y. Chen; Y. Gao; C. Deng; J. Feng; |
2179 | Semi-Swinderain: Semi-Supervised Image Deraining Network Using Swin Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deraining real rain images remains a challenge. To address this problem, we propose a semi-supervised image deraining network using Swin Transformer, which exploits features of both synthetic and real data to achieve better results. |
C. Ren; D. Yan; Y. Cai; Y. Li; |
2180 | SENER: Sentiment Element Named Entity Recognition for Aspect-Based Sentiment Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the generative methods are designed to capture sentence-level semantic information, they are inappropriate for explicit comprehension of sentiment structure. In order to address this issue, we propose sentiment element named entity recognition (SENER) for ABSA. |
S. -K. Lee; J. -H. Kim; |
2181 | Sensor Selection for Angle of Arrival Estimation Based on The Two-Target Cramér-Rao Bound Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to tackle the sensor selection problem for angle of arrival estimation using the worst-case Cramér-Rao bound of two uncorrelated sources. |
C. A. Kokke; M. Coutino; L. Anitori; R. Heusdens; G. Leus; |
2182 | SepDiff: Speech Separation Based on Denoising Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SepDiff, a monaural speech separation method based on the denoising diffusion model. |
B. Chen; C. Wu; W. Zhao; |
2183 | Sequence-Based Device-Free Gesture Recognition Framework for Multi-Channel Acoustic Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a time-sequence-based deep learning framework that can exploit the spatio-temporal information of sensing signals. |
Z. Yang; X. Wang; D. Xia; W. Wang; H. Dai; |
2184 | Sequential Datum–Wise Joint Feature Selection and Classification in The Presence of External Classifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a supervised machine learning framework for sequential datum–wise joint feature selection and classification. |
S. P. Ekanayake; D. Zois; C. Chelmis; |
2185 | Sequential Invariant Information Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a new framework termed the sequential invariant information bottleneck (seq-IIB) to improve the generalization ability of learning agents in sequential environments. |
Y. Zhang; S. Yu; B. Chen; |
2186 | Seri: Sketching-Reasoning-Integrating Progressive Workflow for Empathetic Response Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Seri, a SkEtching-Reasoning-Integrating framework for empathetic response generation. |
G. Bi; Y. Cao; P. Li; Y. Xie; F. Fang; Z. Lin; |
2187 | S-Feature Pyramid Network and Attention Model for Drone Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, small drones can be misused for illegal activities, and the threat they pose is on the rise. Driven by this situation, we used the data provided by the ICASSP Drone-vs-Bird Detection Grand Challenge and added a shallow feature pyramid network and an attention model to SSD [1] (SFA-SSD) to address drone detection in the competition. |
P. Dong; C. Wang; Z. Lu; K. Zhang; W. Wan; J. Sun; |
2188 | SFEMGN: Image Denoising with Shallow Feature Enhancement Network and Multi-Scale ConvGRU Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an image denoising algorithm based on a feature enhancement network and multi-scale convGRU, named a shallow feature enhancement and multi-scale convGRU denoising network (SFEMGN), through an in-depth study of convolutional networks and GRU networks. |
Q. Wang; L. Guo; S. Ding; J. Zhang; X. Xu; |
2189 | SFR: Semantic-Aware Feature Rendering of Point Cloud Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a semantic-aware and task-oriented differentiable feature rendering (SFR), which reduces the information loss during projection by generating rendered images with more point cloud semantic information for downstream tasks. |
Y. Zha; R. Li; T. Dai; J. Xiong; X. Wang; S. -T. Xia; |
2190 | SG-VAD: Stochastic Gates Based Speech Activity Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel voice activity detection (VAD) model for low-resource environments. |
J. Svirsky; O. Lindenbaum; |
2191 | ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Transformer-based model for document shadow removal that utilizes shadow context encoding and decoding in both shadow and shadow-free regions. |
X. Chen; X. Cun; C. -M. Pun; S. Wang; |
2192 | Shadow Removal of Text Document Images Using Background Estimation and Adaptive Text Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a simple yet effective method to remove shadows from text document images. |
W. Liu; B. Wang; J. Zheng; W. Wang; |
2193 | Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. |
S. M. Hernandez; D. Zhao; S. Ding; A. Bruguier; R. Prabhavalkar; T. N. Sainath; Y. He; I. McGraw; |
2194 | Shift to Your Device: Data Augmentation for Device-Independent Speaker Verification Anti-Spoofing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel Deconvolution-enhanced data Augmentation method, DeAug, for ultrasonic-based speaker verification anti-spoofing systems to detect the liveness of voice sources in physical access, which aims to improve the performance of liveness detection on unseen devices where no data is collected yet. |
J. Wang; L. Lu; Z. Ba; F. Lin; K. Ren; |
2195 | Short-Segment Speaker Verification Using ECAPA-TDNN with Multi-Resolution Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to incorporate multi-resolution time-domain information into the ECAPA-TDNN speaker verification system. |
S. Han; Y. Ahn; K. Kang; J. W. Shin; |
2196 | Show Me The Instruments: Musical Instrument Retrieval From Mixture Audio Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To search the musical instrument samples or virtual instruments that make one’s desired sound, music producers use their ears to listen and compare each instrument sample in their collection, which is time-consuming and inefficient. In this paper, we call this task Musical Instrument Retrieval and propose a method for retrieving desired musical instruments using reference mixture audio as a query. |
K. Kim; M. Park; H. Joung; Y. Chae; Y. Hong; S. Go; K. Lee; |
2197 | ShuffleAugment: A Data Augmentation Method Using Time Shuffling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ShuffleAugment, a data augmentation method for speech processing that randomly shuffles data in the time direction. |
Y. Sato; N. Ikeda; H. Takahashi; |
2198 | Shuffled Autoregression for Motion Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When applied to motion interpolation, these deep learning methods have limited performance since they do not leverage the flexible dependencies between interpolation frames as the original geometric formulas do. To realize this interpolation characteristic, we propose a novel framework, referred to as Shuffled AutoRegression, which expands the autoregression to generate in arbitrary (shuffled) order and models any inter-frame dependencies as a directed acyclic graph. |
S. Huang; J. Jia; Z. Yang; W. Wang; H. Wu; Y. Yang; J. Xing; |
2199 | SIAST: A Slot Imbalance-Aware Self-Training Scheme for Semi-Supervised Slot Filling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods could exacerbate label imbalance during learning iterations, resulting in poorer performance on minority slot classes, which are crucial in many dialogue system applications. To solve this, we propose a novel self-training scheme for imbalanced slot filling that aims to learn unbiased margins between slot classes while mitigating potential slot confusion, and adaptively samples pseudo-labelled data to balance the slot distribution of the training set. |
J. Liu; S. Xiong; Y. He; T. Zhou; L. Wang; X. Li; B. Xiao; |
2200 | Signal Analysis-Synthesis Using The Quantum Fourier Transform Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the development of Quantum Fourier transform (QFT) education tools in the object-oriented Java-DSP (J-DSP) simulation environment. |
A. Sharma; G. Uehara; V. Narayanaswamy; L. Miller; A. Spanias; |
2201 | Signal Processing And Quantum State Tomography on Noisy Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Quantum State Tomography (QST) is a fundamental tool for quantum signal processing. However, in real noisy quantum devices construction of the state’s density matrix via QST can … |
W. Shi; R. Malaney; |
2202 | Signal Processing Grand Challenge 2023 – E-Prevention: Sleep Behavior As An Indicator of Relapses in Psychotic Patients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the approach and results of USC SAIL’s submission to the Signal Processing Grand Challenge 2023 – e-Prevention (Task 2), on detecting relapses in psychotic patients. |
K. Avramidis; K. Adsul; D. Bose; S. Narayanan; |
2203 | Signal Processing On Product Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We establish a framework for signal processing on product spaces of simplicial and cellular complexes. |
T. M. Roddenberry; V. P. Grande; F. Frantzen; M. T. Schaub; S. Segarra; |
2204 | Signal Processing with Optical Quadratic Random Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this context, the possibility of performing data processing (such as pattern detection or classification) directly in the sketched domain without accessing the original data was previously achieved for linear random sketching methods and compressive sensing. In this work, we show how to estimate simple signal processing tasks (such as deducing local variations in an image) directly using random quadratic projections achieved by an optical processing unit. |
R. Delogne; V. Schellekens; L. Daudet; L. Jacques; |
2205 | Signal Reconstruction for FMCW Radar Interference Mitigation Using Deep Unfolding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Removal of frequency-modulated continuous wave (FMCW) interference by zeroing corrupted samples causes significant distortions and peak power losses in the range-Doppler map. Existing methods aim to diminish these distortions by using data from a single dimension to reconstruct the corrupted samples; however, they perform poorly when many samples suffer interference and have difficulty recovering weak target signals. In this paper, we present model-based deep learning interference mitigation algorithms, called ALISTA and ALFISTA, that reduce these artifacts by leveraging the full integration gain of all uncorrupted fast-time and slow-time samples. |
J. Overdevest; A. G. C. Koppelaar; M. J. G. Bekooij; J. Youn; R. J. G. v. Sloun; |
2206 | Sign Language Recognition Via Deformable 3D Convolutions and Modulated Graph Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address SI isolated SLR from RGB video, proposing an innovative deep-learning framework that leverages multi-modal appearance- and skeleton-based information. |
K. Papadimitriou; G. Potamianos; |
2207 | SigVIC: Spatial Importance Guided Variable-Rate Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed for adaptively learning a spatial importance mask. |
J. Liang; M. Liu; C. Yao; C. Lin; Y. Zhao; |
2208 | Similarity Relation Preserving Cross-Modal Learning for Multispectral Pedestrian Detection Against Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new method that can preserve the similarity relation between candidates against adversarial attacks using multispectral knowledge. |
J. U. Kim; Y. Man Ro; |
2209 | Simple Pooling Front-Ends for Efficient Audio Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that instead of reducing model size using complex methods, eliminating the temporal redundancy in the input audio features (e.g., mel-spectrogram) could be an effective approach for efficient audio classification. |
X. Liu; H. Liu; Q. Kong; X. Mei; M. D. Plumbley; W. Wang; |
2210 | Simplicial Vector Autoregressive Model For Streaming Edge Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simplicial VAR model to mitigate the curse of dimensionality of the VAR models when the time series are defined over higher-order network structures such as edges, triangles, etc. |
J. Krishnan; R. Money; B. Beferull-Lozano; E. Isufi; |
2211 | Simulating Realistic Speech Overlaps Improves Multi-Talker ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an improved technique to simulate multi-talker overlapping speech with realistic speech overlaps, where an arbitrary pattern of speech overlaps is represented by a sequence of discrete tokens. |
M. Yang; N. Kanda; X. Wang; J. Wu; S. Sivasankaran; Z. Chen; J. Li; T. Yoshioka; |
2212 | Simultaneous Acoustic Echo Sorting and 3-D Room Geometry Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A low-complexity iterative method is proposed for determining the boundaries in convex polygonal rooms using common tangent planes to sets of ellipsoids. |
K. MacWilliam; F. Elvander; T. v. Waterschoot; |
2213 | Simultaneous Estimation of Direction of Arrival and Sound Speed Using A Non-Uniform Sensor Array Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an algorithm that can estimate DOA and sound speed simultaneously. |
R. Nishimura; K. Takizawa; |
2214 | Simultaneously Learning Robust Audio Embeddings and Balanced Hash Codes for Query-by-Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods generate imbalanced hash codes, leading to their suboptimal performance. Therefore, we propose a self-supervised learning framework to compute fingerprints and balanced hash codes in an end-to-end manner to achieve both fast and accurate retrieval performance. |
A. Singh; K. Demuynck; V. Arora; |
2215 | Simultaneous Reconstruction and Uncertainty Quantification for Tomography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, in the absence of ground truth, quantifying the solution quality is highly desirable but under-explored. In this work, we address this challenge through Gaussian process modeling to flexibly and explicitly incorporate prior knowledge of sample features and experimental noise through the choices of the kernels and noise models. |
A. Dasgupta; C. Graziani; Z. W. Di; |
2216 | SINCO: A Novel Structural Regularizer for Image Compression Using Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present structural regularization for INR compression (SINCO) as a novel INR method for image compression. |
H. Gao; W. Gan; Z. Sun; U. S. Kamilov; |
2217 | Sine: Similarity-Regularized Intra-Class Exploitation for Cross-Granularity Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we locate the root cause of the intrinsic conflict. |
J. Yang; H. Yang; |
2218 | Singing Voice Synthesis Based on A Musical Note Position-Aware Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). |
Y. Hono; K. Hashimoto; Y. Nankaku; K. Tokuda; |
2219 | Single-Anchor UWB Localization Using Channel Impulse Response Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on single-anchor UWB localization methods that learn statistical features of the channel impulse response (CIR) in different location areas using a Gaussian mixture model (GMM). |
S. Li; A. Balatsoukas-Stimming; A. Burg; |
2220 | Single-branch Network for Multimodal Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we propose a novel single-branch network capable of learning discriminative representations for unimodal as well as multimodal tasks without changing the network. |
M. S. Saeed; S. Nawaz; M. H. Khan; M. Zaigham Zaheer; K. Nandakumar; M. H. Yousaf; A. Mahmood; |
2221 | Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to extend the deep complex U-Network architecture for speech enhancement by incorporating a probabilistic (i.e., variational) latent space model. |
E. J. Nustede; J. Anemüller; |
2222 | Single Domain Dynamic Generalization for Iris Presentation Attack Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, we usually face a more realistic scenario with only one single domain available for training. To tackle the above issues, we propose a Single Domain Dynamic Generalization (SDDG) framework, which simultaneously exploits domain-invariant and domain-specific features on a per-sample basis and learns to generalize to various unseen domains with numerous natural images. |
Y. Li; J. Wang; Y. Chen; D. Xie; S. Pu; |
2223 | Single-Particle Tracking By Graph Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Third, the high density of molecules causes frequent ID switches. Therefore, we propose Particle Tracking via Graph Transformer (PTGT), which takes into account the relationships among molecules, to solve these problems. |
S. Kamiya; K. Hotta; T. -a. Tsunoyama; A. Kusumi; |
2224 | Single-Photon Image Super-Resolution Via Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By extending Equivariant Imaging (EI) to volumetric single-photon data, we propose a self-supervised learning framework for the SPISR task. |
Y. Chen; C. Jiang; Y. Pan; |
2225 | Single-Sample Direction-of-Arrival Estimation for Fast and Robust 3D Localization With Real Measurements from A Massive MIMO System Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Fast, robust, high-accuracy localization is a key enabler for future location-aware applications in streetscape communication networks and next-generation networked autonomous … |
S. Mazokha; S. Naderi; G. I. Orfanidis; G. Sklivanitis; D. A. Pados; J. O. Hallstrom; |
2226 | Single-Shot Domain Adaptation Via Target-Aware Generative Augmentations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the challenging setting of single-shot adaptation and explore the design of augmentation strategies. |
R. Subramanyam; K. Thopalli; S. Berman; P. Turaga; J. J. Thiagarajan; |
2227 | Single-Shot Fractional Fourier Phase Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an efficient phase retrieval technique from the single fractional Fourier transform (FrFT) magnitude measurement. |
Y. Yang; R. Tao; |
2228 | SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel dynamic particle filtering approach that incorporates offline historical data to correct the online inference by using a variable number of particles. |
M. Heydari; J. -C. Wang; Z. Duan; |
2229 | Sinusoidal Frequency Estimation By Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents a technique for joint sinusoidal frequency and amplitude estimation using the Wirtinger derivatives of a complex exponential surrogate and any first order gradient-based optimizer, enabling end-to-end training of neural network controllers for unconstrained sinusoidal models. |
B. Hayes; C. Saitis; G. Fazekas; |
2230 | Sketch Less Face Image Retrieval: A New Challenge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a new task named sketch less face image retrieval (SLFIR), in which retrieval is carried out at each stroke with the aim of retrieving the target face photo using a partial sketch with as few strokes as possible (see Fig. 1). |
D. Dai; Y. Li; L. Wang; S. Fu; S. Xia; G. Wang; |
2231 | SkillNet-NLG: General-Purpose Natural Language Generation with A Sparsely Activated Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SkillNet-NLG, a sparsely activated approach that handles many natural language generation tasks with one model. |
J. Liao; D. Tang; F. Zhang; S. Shi; |
2232 | SLBERT: A Novel Pre-Training Framework for Joint Speech and Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SLBERT (Speech and Language pre-training framework for BERT), an end-to-end trainable framework for learning joint representations of speech and language modalities. |
O. Susladkar; P. Gatti; S. Kumar Yadav; |
2233 | SLICER: Learning Universal Audio Representations Using Low-Resource Self-Supervised Pre-Training. Highlight: We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification. |
A. Seth; S. Ghosh; S. Umesh; D. Manocha; |
2234 | SL-MoE: A Two-Stage Mixture-of-Experts Sequence Learning Framework for Forecasting Rapid Intensification of Tropical Cyclone. Highlight: Due to its contingency, the class distribution between positive (RI) and negative (non-RI) samples fluctuates greatly from year to year. To address this issue, we propose a novel two-stage mixture-of-experts sequence learning framework (SL-MoE) that handles the class-imbalanced distribution with two decoupled learning stages, thereby improving RI forecasting. In the representation learning stage, the shared sequence learning backbone is trained to extract general features from class-imbalanced data, and the TC life flag is included to lessen the impact of the TC dying period. |
J. Xu; Y. Lei; G. Zhu; Y. Feng; B. Xiao; Q. Qian; Y. Xu; |
2235 | Slot-Triggered Contextual Biasing For Personalized Speech Recognition Using Neural Transducers. Highlight: We propose a method whereby the E2E ASR model is trained to emit opening and closing tags around slot content, which are used both to selectively enable biasing and to decide which catalog to use for biasing. |
S. Tong; P. Harding; S. Wiesler; |
2236 | Small-Footprint Slimmable Networks for Keyword Spotting. Highlight: In this work, we present Slimmable Neural Networks applied to the problem of small-footprint keyword spotting. |
Z. Akhtar; M. O. Khursheed; D. Du; Y. Liu; |
2237 | Smart Split-Federated Learning Over Noisy Channels for Embryo Image Segmentation. Highlight: In this paper, we study the effects of noise in the communication channels on the learning process and the quality of the final model. |
Z. H. Kafshgari; I. V. Bajić; P. Saeedi; |
2238 | SMCL: Saliency Masked Contrastive Learning for Long-Tailed Visual Recognition. Highlight: In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate the long-tailed recognition problem and improve the generalizability of a model. |
S. Park; S. -W. Hwang; J. So; |
2239 | Smoothing Complex-Valued Signals on Graphs with Monte-Carlo. Highlight: We introduce new smoothing estimators for complex signals on graphs, based on a recently studied Determinantal Point Process (DPP). |
H. Jaquard; M. Fanuel; P. -O. Amblard; R. Bardenet; S. Barthelmé; N. Tremblay; |
2240 | Smoothing Point Adjustment-Based Evaluation of Time Series Anomaly Detection. Highlight: This paper proposes smoothing point adjustment, a novel range-based evaluation protocol for time series anomaly detection. |
M. Liu; Y. Wang; H. Xu; X. Zhou; B. Li; Y. Wang; |
2241 | SMUG: Towards Robust MRI Reconstruction By Smoothed Unrolling. Highlight: This raises the question of how to design robust DL methods for MRI reconstruction. To address this problem, we propose a novel image reconstruction framework, termed Smoothed Unrolling (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning operation. |
H. Li; J. Jia; S. Liang; Y. Yao; S. Ravishankar; S. Liu; |
2242 | Soft 2D-to-3D Delivery Using Deep Graph Neural Networks for Holographic-Type Communication. Highlight: In this paper, we propose a novel soft delivery scheme to realize efficient 3D content delivery. |
T. Fujihashi; T. Koike-Akino; T. Watanabe; |
2243 | Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond. Highlight: In this article, we show how soft dynamic time warping (SoftDTW), a differentiable variant of classical DTW, can be used as an alternative to connectionist temporal classification (CTC). |
M. Krause; C. Weiß; M. Müller; |
2244 | Soft Label Coding for End-to-end Sound Source Localization with Ad-hoc Microphone Arrays. Highlight: In this paper, we propose a new soft label coding method, named label smoothing, for classification-based two-dimensional sound source localization with ad-hoc microphone arrays. |
L. Feng; Y. Gong; X. -L. Zhang; |
2245 | Solving Audio Inverse Problems with a Diffusion Model. Highlight: This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can be used to solve various audio inverse problems in a problem-agnostic setting. |
E. Moliner; J. Lehtinen; V. Välimäki; |
2246 | Solving Jigsaw Puzzle of Large Eroded Gaps Using Puzzlet Discriminant Network. Highlight: In this paper, we solve Jigsaw Puzzles of Large Eroded Gaps (JPLEG), where boundary similarities are weak and image semantics are the only feasible clues. |
X. Song; X. Yang; J. Ren; R. Bai; X. Jiang; |
2247 | Sora: Scalable Black-Box Reachability Analyser on Neural Networks. Highlight: Recent work on robustness verification of DNNs not only lacks scalability but also imposes severe restrictions on the architecture (layers, activation functions, etc.). To address these limitations, we propose a novel framework, SORA, for scalable black-box reachability analysis of DNNs. |
P. Xu; F. Wang; W. Ruan; C. Zhang; X. Huang; |
2248 | Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder. Highlight: To realize a fast and pitch-controllable high-fidelity neural vocoder, we introduce the source-filter theory into HiFi-GAN by hierarchically conditioning the resonance filtering network on well-estimated source excitation information. |
R. Yoneyama; Y. -C. Wu; T. Toda; |
2249 | Source-Free Unsupervised Domain Adaptation for Question Answering. Highlight: In this paper, we introduce a Source-Free Domain Adaptation Framework for QA (denoted as SFQA), which allows access only to trained source models, not source data, for target learning, thereby better protecting data privacy. |
Z. Zhao; Y. Xie; J. Xie; Z. Lin; Y. Li; Y. Shen; |
2250 | Source Localization for Extremely Large-Scale Antenna Arrays with Spatial Non-Stationarity. Highlight: In this paper, we show that under the exact steering vector model of an extremely large-scale antenna array (ELAA), the eigenvectors of the signal subspace and the steering vectors are approximately collinear in most scenarios. |
X. Wu; J. Sun; X. Jia; S. Wang; |
2251 | Space-Time Graph Neural Networks with Stochastic Graph Perturbations. Highlight: Space-time graph neural networks (ST-GNNs) are particularly useful in multi-agent systems, due to their stability properties and their ability to respect communication delays between the agents. In this paper, we revisit the stability properties of ST-GNNs and prove that they are stable to stochastic graph perturbations. |
S. Hadou; C. I. Kanatsoulis; A. Ribeiro; |
2252 | Space-Time Variable Density Samplings for Sparse Bandlimited Graph Signals Driven By Diffusion Operators. Highlight: In this paper, we develop a sampling framework consisting of selecting a small subset of space-time nodes at random according to some probability distribution, generalizing the classical variable density sampling to the heat diffusion field. |
Q. Yao; L. Huang; S. Tang; |
2253 | SPADE: Self-Supervised Pretraining for Acoustic Disentanglement. Highlight: In this work, we introduce a self-supervised approach to disentangle room acoustics from speech and use the acoustic representation on the downstream task of device arbitration. |
J. Harvill; J. Barber; A. Nair; R. Pishehvar; |
2254 | Spammer Detection on Short Video Applications: A New Challenge and Baselines. Highlight: In this paper, we introduce a new challenge of spammer detection on short video applications, where the multi-modal information of videos and reviews plays a more critical role than the spam relation graph. |
M. Yi; D. Liang; R. Wang; Y. Ding; H. Lu; |
2255 | Sparse Aggregation-Based Channel Estimation For Massive MIMO Systems With Decentralized Baseband Processing. Highlight: To cope with the bottlenecks of high computational complexity and excessive inter-connection communication in the conventional centralized baseband processing architecture, the decentralized baseband processing (DBP) architecture has been proposed, in which the antennas are partitioned into multiple clusters, each connected to a local baseband unit (BBU). In this paper, we are interested in distributed channel estimation (DCE) under such a DBP architecture, which is rarely studied in the literature. |
Y. Xu; E. Song; Q. Shi; T. -H. Chang; |
2256 | Sparse and Structured Modelling of Underwater Acoustic Channel Impulse Responses. Highlight: In this paper, we consider real-time modelling of an underwater acoustic channel impulse response (CIR), exploiting the inherent structure and sparsity of such channels. |
C. Yang; Q. Ling; X. Sheng; M. Mu; A. Jakobsson; |
2257 | Sparse Asynchronous Samples from Networks of TEMs for Reconstruction of Classes of Non-Bandlimited Signals. Highlight: We present a signal-driven multi-channel time encoding system for sampling signals with finite rate of innovation (FRI). |
M. Hilton; P. L. Dragotti; |
2258 | Sparse Bayesian Learning Assisted Decision Fusion in Millimeter Wave Massive MIMO Sensor Networks. Highlight: We present low-complexity fusion rules based on the hybrid combining architecture for the considered framework. |
A. Chawla; D. Ciuonzo; P. S. Rossi; |
2259 | Sparse Bayesian Learning Based Three-Dimensional Imaging for Antenna Array Radar. Highlight: In this work, we propose a 3-D imaging method based on a sparse Bayesian learning (SBL) framework for antenna array radar. |
Y. Li; J. R. Jensen; M. Fu; Z. Deng; M. G. Christensen; |
2260 | Sparse Black-Box Inversion Attack with Limited Information. Highlight: At the same time, query-based black-box inversion attacks still suffer from low image quality and high computational costs. To bridge these gaps, in this paper, we propose BMI-S, a sparse black-box inversion attack against face recognition models. |
Y. Xu; X. Liu; T. Hu; B. Xin; R. Yang; |
2261 | Sparse Convolution Based Octree Feature Propagation for LiDAR Point Cloud Compression. Highlight: This nature of point cloud data makes it considerably challenging not only to store point clouds but also to understand them and extract the topology of objects from them. In this regard, our work presents a point cloud compression procedure that leverages sparse 3D convolutions to extract features at various octree scales for lossless compression of the octree representation of point clouds. |
M. A. Lodhi; J. Pang; D. Tian; |
2262 | Sparse Delay-Doppler Channel Estimation for OTFS Modulation Using 2D-MUSIC. Highlight: In this paper, we address the problem of estimating the delays and Doppler shifts introduced by a sparse wireless channel for orthogonal time frequency space (OTFS) modulation. |
A. S. Bondre; C. D. Richmond; A. Alkhateeb; N. Michelusi; |
2263 | Sparse Error Correction for Power Network Parameters. Highlight: In this paper, we leverage the sparse nature of parameter errors and propose an iterative greedy algorithm for nonlinear sparse error correction of parameter errors based on SCADA measurements. |
D. Senaratne; J. Kim; |
2264 | Sparse Graph Learning with Spectrum Prior for Deep Graph Convolutional Networks. Highlight: In this paper, we propose a sparse graph learning algorithm incorporating a new spectrum prior to compute a graph topology that circumvents over-smoothing while preserving pairwise correlations inherent in data. |
J. Zeng; Y. Liu; G. Cheung; W. Hu; |
2265 | Sparse Mixture Once-for-all Adversarial Training for Efficient In-situ Trade-off Between Accuracy and Robustness of DNNs. Highlight: To that end, we present sparse mixture once-for-all adversarial training (SMART), which allows a model to be trained once and then trade off accuracy and robustness in situ, with reduced compute and parameter overhead. |
S. Kundu; S. Sundaresan; S. N. Sridhar; S. Lu; H. Tang; P. A. Beerel; |
2266 | Sparse Non-Contact Multiple People Localization and Vital Signs Monitoring Via FMCW Radar. Highlight: Our approach offers excellent performance in a medical application where high accuracy is required. |
Y. Eder; Z. Liu; Y. C. Eldar; |
2267 | Sparse Representations with Cone Atoms. Highlight: We extend the notion of sparse representation to the case where the atoms are not vectors, but cones, hence infinite sets. |
D. C. Ilie-Ablachim; A. Băltoiu; B. Dumitrescu; |
2268 | Sparsity Constraint Implementation for the Joint Eigenvalue Decomposition of Matrices. Highlight: In addition, the subset sequence is arbitrarily chosen and remains the same through the iterations. In this paper, we propose a more general algorithmic framework that overcomes these limitations. |
R. André; X. Luciani; |
2269 | Sparsity-Driven Joint Blind Deconvolution-Demodulation with Application to Motor Fault Detection. Highlight: In this paper, we develop a sparsity-driven joint blind deconvolution-demodulation approach to extract small fault signatures of motors operating at a varying load. |
V. A. Kelkar; D. Liu; H. Inoue; M. Kanemaru; |
2270 | Sparsity-Smoothness-Aware Power Spectral Density Estimation with Application to Phased Array Weather Radar. Highlight: In this paper, we propose a sparsity and smoothness regularized model for the estimation of nonnegative power spectral densities (PSDs) of complex-valued random processes from mixtures of realizations. |
H. Kuroda; D. Kitahara; E. Yoshikawa; H. Kikuchi; T. Ushio; |
2271 | SPASHT: Semantic and Pragmatic Speech Features for Automatic Assessment of Autism. Highlight: In this work, we explore the semantic and pragmatic language features in children with autism (CwA) to understand their significance in the diagnosis of autism. |
A. B; V. Narayan; J. Shukla; |
2272 | Spatial Active Noise Control Method Based on Sound Field Interpolation from Reference Microphone Signals. Highlight: We propose to interpolate the sound field using reference microphones, which are normally placed outside the target region, instead of the error microphones. |
K. Arikawa; S. Koyama; H. Saruwatari; |
2273 | Spatial Correlation Fusion Network for Few-Shot Segmentation. Highlight: We propose a Spatial Correlation Fusion Network (SCFNet) for few-shot segmentation to address these issues. |
X. Wang; W. Huang; W. Yang; Q. Liao; |
2274 | Spatial Cross-Attention for Transformer-Based Image Captioning. Highlight: In this paper, we introduce a novel cross-attention architecture that utilizes spatial information from coordinate differences between relevant image patches. |
K. Anh Ngo; K. Shim; B. Shim; |
2275 | Spatial-Domain Object Detection Under MIMO-FMCW Automotive Radar Interference. Highlight: This paper considers spatial-domain detector design for mutual interference mitigation among automotive MIMO-FMCW radars. |
S. Jin; P. Wang; P. Boufounos; R. Takahashi; S. Roy; |
2276 | Spatial Graph Signal Interpolation with an Application for Merging BCI Datasets with Various Dimensionalities. Highlight: To this end, we introduce a spatial graph signal interpolation technique that allows multiple electrodes to be interpolated efficiently. |
Y. E. Ouahidi; L. Drumetz; G. Lioi; N. Farrugia; B. Pasdeloup; V. Gripon; |
2277 | Spatial Inference Using Censored Multiple Testing with FDR Control. Abstract: A wireless sensor network performs spatial inference on a physical phenomenon of interest. The areas in which this phenomenon exhibits interesting or anomalous behavior are … |
M. Gölz; A. M. Zoubir; V. Koivunen; |
2278 | Spatially Informed Independent Vector Analysis for Source Extraction Based on the Convolutive Transfer Function Model. Highlight: The underlying reason is that those methods are derived from the multiplicative transfer function model with a rank-1 assumption, which does not hold if reverberation is strong. To circumvent this issue, this paper proposes using the convolutive transfer function (CTF) model to improve source extraction performance and develops a spatially informed IVA algorithm. |
X. Wang; A. Brendel; G. Huang; Y. Yang; W. Kellermann; J. Chen; |
2279 | Spatially Selective Deep Non-Linear Filters For Speaker Extraction. Highlight: In this work, we develop a deep joint spatial-spectral non-linear filter that can be steered to an arbitrary target direction. |
K. Tesch; T. Gerkmann; |
2280 | Spatial Similarity Guidance for Few-Shot Segmentation. Highlight: In this work, we address the challenging issue of few-shot segmentation. |
X. Luo; Z. Duan; T. Zhang; |
2281 | Spatial-Temporal Graph Convolutional Network Boosted Flow-Frame Prediction For Video Anomaly Detection. Highlight: In this paper, we propose STGCN-FFP, an unsupervised Flow-Frame Prediction model boosted by a Spatial-Temporal Graph Convolutional Network (STGCN). |
K. Cheng; X. Zeng; Y. Liu; M. Zhao; C. Pang; X. Hu; |
2282 | Spatio-Temporal Attention in Multi-Granular Brain Chronnectomes For Detection of Autism Spectrum Disorder. Highlight: We introduce IMAGIN, a multI-granular, Multi-Atlas spatio-temporal attention Graph Isomorphism Network, which we use to learn graph representations of dynamic functional brain connectivity (chronnectome), as opposed to static connectivity (connectome). |
J. Orme-Rogers; A. Srivastava; |
2283 | Spatio-Temporal Hybrid Fusion of CAE and SWin Transformers for Lung Cancer Malignancy Prediction. Highlight: The paper proposes a novel hybrid discovery Radiomics framework that simultaneously integrates temporal and spatial features extracted from non-thin chest Computed Tomography (CT) slices to predict Lung Adenocarcinoma (LUAC) malignancy with minimum expert involvement. |
S. Khademi; S. Heidarian; P. Afshar; F. Naderkhani; A. Oikonomou; K. N. Plataniotis; A. Mohammadi; |
2284 | Spatio-Temporal Structure Consistency for Semi-Supervised Medical Image Classification. Highlight: To efficiently leverage abundant unlabeled data, we propose a novel Spatio-Temporal Structure Consistent (STSC) learning framework to combine both spatial and temporal structure consistency. |
W. Lei; L. Liu; L. Liu; |
2285 | SpeakerAugment: Data Augmentation for Generalizable Source Separation Via Speaker Parameter Manipulation. Highlight: In this paper, we propose SpeakerAugment (SA), a data augmentation method for generalizable speech separation that aims to increase the diversity of speaker identities in the training data in order to mitigate speaker mismatch, one form of domain mismatch. |
K. Wang; Y. Yang; H. Huang; Y. Hu; S. Li; |
2286 | Speaker-Aware Hierarchical Transformer For Personality Recognition In Multiparty Dialogues. Highlight: In this paper, we create a multiparty dialogue-based personality dataset derived from CPED containing 1,195 data samples. |
W. Han; Y. Chen; X. Xing; G. Zhou; X. Xu; |
2287 | Speaker Change Detection For Transformer Transducer ASR. Highlight: Existing SCD solutions either require an additional ensembling step to combine time-based decisions with recognized word sequences, or implement a tight integration between ASR and SCD, limiting the achievable performance on both tasks. To address these issues, we propose a novel framework for the SCD task, in which an additional SCD module is built on top of an existing Transformer Transducer ASR (TT-ASR) network. |
J. Wu; Z. Chen; M. Hu; X. Xiao; J. Li; |
2288 | Speaker Diaphragm Excursion Prediction: Deep Attention and Online Adaptation. Highlight: This paper proposes efficient deep learning (DL) solutions to accurately model and predict the nonlinear excursion, which is challenging for conventional solutions. |
Y. Ren; M. Zivney; Y. Huang; E. Choy; C. Patel; H. Xu; |
2289 | Speaker-Independent Acoustic-to-Articulatory Speech Inversion. Highlight: To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages autoregression, adversarial training, and self-supervision to generalize to unseen speakers. |
P. Wu; L. -W. Chen; C. J. Cho; S. Watanabe; L. Goldstein; A. W. Black; G. K. Anumanchipalli; |
2290 | Speaker Recognition with Two-Step Multi-Modal Deep Cleansing. Highlight: In this paper, we propose a two-step audio-visual deep cleansing framework to eliminate the effect of noisy labels in speaker representation learning. |
R. Tao; K. A. Lee; Z. Shi; H. Li; |
2291 | Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation. Highlight: In this paper, we propose the method of spectral clustering-aware learning of embeddings (SCALE) to address the mismatch. |
E. P. C. Lee; G. Sun; C. Zhang; P. C. Woodland; |
2292 | Spectral Super-Resolution on the Unit Circle Via Gradient Descent. Highlight: We propose a nonconvex method, termed HT-GD, composed of a Hankel-Toeplitz matrix factorization model and a gradient descent algorithm. |
X. Wu; Z. Yang; J. -F. Cai; Z. Xu; |
2293 | SPECTRANET-SO(3): Learning Satellite Orientation from Optical Spectra By Implicitly Modeling Mutually Exclusive Probability Distributions on the Rotation Manifold. Highlight: In this work, we explore the utility of using the recently proposed Implicit-PDF network to implicitly learn challenging probability distributions associated with satellite orientations using only raw optical spectra as input. |
M. Phelps; T. Swindle; J. Z. Gazak; A. Vandenberg; J. Fletcher; |
2294 | Spectro-Temporal Post-Filtering Via Short-Time Target Cancellation for Directional Speech Enhancement in a Dual-Microphone Hearing Aid. Highlight: The evaluation is carried out in both semi-anechoic and reverberant rooms using instrumental measures of speech enhancement. As demonstrated in this brief study, our spatial spectro-temporal filtering based on short-time target cancellation (STTC) can be implemented with hearing aid microphones and can be used to enhance established hearing aid processing. |
M. A. Cantu; V. Hohmann; |
2295 | Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition. Highlight: In this paper, we propose a dual-stream spectrogram refine network to simultaneously refine the speech and noise and decouple the noise from the noisy input. |
H. Lu; N. Li; T. Song; L. Wang; J. Dang; X. Wang; S. Zhang; |
2296 | Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. Highlight: In this paper, we propose a novel approach to attentive pooling based on correlations between the representations’ coefficients combined with label smoothing, a method aiming to reduce the confidence of the classifier on the training labels. |
S. Kakouros; T. Stafylakis; L. Mošner; L. Burget; |
2297 | Speech Dereverberation with a Reverberation Time Shortening Target. Highlight: This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. |
R. Zhou; W. Zhu; X. Li; |
2298 | Speech Emotion Recognition Based on Low-Level Auto-Extracted Time-Frequency Features. Highlight: In this paper, we propose a novel low-level feature extraction method based on the Time-Frequency Attention (TFA) module and Time-Frequency Weighting (TFW) module. |
K. Liu; J. Hu; J. Feng; |
2299 | Speech Emotion Recognition Via Heterogeneous Feature Learning. Highlight: In this paper, we propose a novel multi-level attention method to effectively learn heterogeneous information from hand-crafted features (MFCC) and features extracted from a pre-trained model (W2V2). |
K. Liu; D. Wu; D. Wang; J. Feng; |
2300 | Speech Emotion Recognition Via Two-Stream Pooling Attention With Discriminative Channel Weighting. Highlight: In this paper, we propose a novel method to learn effective emotion-related information from two feature views. |
K. Liu; D. Wang; D. Wu; J. Feng; |
2301 | Speech Enhancement with Intelligent Neural Homomorphic Synthesis. Highlight: In this work, we propose a neural source-filter network for speech enhancement. |
S. He; W. Rao; J. Liu; J. Chen; Y. Ju; X. Zhang; Y. Wang; S. Shang; |
2302 | Speech Intelligibility Classifiers from 550k Disordered Speech Samples. Highlight: We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. |
S. Venugopalan; J. Tobin; S. J. Yang; K. Seaver; R. J. N. Cave; P. -P. Jiang; N. Zeghidour; R. Heywood; J. Green; M. P. Brenner; |
2303 | SpeechLMScore: Evaluating Speech Generation Using Speech Language Model. Highlight: We propose SpeechLMScore, an unsupervised metric to evaluate generated speech using a speech language model. |
S. Maiti; Y. Peng; T. Saeki; S. Watanabe; |
2304 | Speech Modeling with a Hierarchical Transformer Dynamical VAE. Highlight: In this paper, we propose to model speech signals with the Hierarchical Transformer DVAE (HiT-DVAE), which is a DVAE with two levels of latent variables (sequence-wise and frame-wise) and in which the temporal dependencies are implemented with the Transformer architecture. |
X. Lin; X. Bie; S. Leglaive; L. Girin; X. Alameda-Pineda; |
2305 | Speech MOS Multi-Task Learning and Rater Bias Correction. Highlight: Here we propose a multitask framework to include additional labels and data in training to improve the performance of a blind MOS estimation model. |
H. Akrami; H. Gamper; |
2306 | Speech Privacy Leakage from Shared Gradients in Distributed Learning. Highlight: In this paper, we explore methods for recovering private speech/speaker information from the shared gradients in distributed learning settings. |
Z. Li; J. Zhang; J. Liu; |
2307 | Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training. Highlight: We propose to employ a method built on pseudo target generation and domain adversarial training with an iterative training strategy to improve the intelligibility and naturalness of the speech recovered from silent tongue and lip articulation. |
R. -C. Zheng; Y. Ai; Z. -H. Ling; |
2308 | Speech Separation with Large-Scale Self-Supervised Learning Highlight: In this work, we extend the exploration of the SSL-based SS by massively scaling up both the pre-training data (more than 300K hours) and fine-tuning data (10K hours). |
Z. Chen; N. Kanda; J. Wu; Y. Wu; X. Wang; T. Yoshioka; J. Li; S. Sivasankaran; S. E. Eskimez; |
2309 | Speech Signal Improvement Using Causal Generative Diffusion Models Highlight: In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. |
J. Richter; S. Welker; J. -M. Lemercier; B. Lay; T. Peer; T. Gerkmann; |
2310 | Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders Highlight: We propose a speech summarization system that enables E2E summarization from 100 seconds, which is the limit of the conventional method, to up to 10 minutes (i.e., the duration of typical instructional videos on YouTube). |
T. Kano; A. Ogawa; M. Delcroix; R. Sharma; K. Matsuura; S. Watanabe; |
2311 | Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition Highlight: In this paper, we propose to employ a novel bidirectional attention mechanism (BiAM) to jointly learn both the ASR encoder (bottom layers) and the text encoder with a multi-modal learning method. |
Y. Yang; H. Xu; H. Huang; E. S. Chng; S. Li; |
2312 | Spherical Sector Harmonics Based Soundfield Radial Extrapolation And Robustness Analysis Highlight: This paper presents a radial extrapolation method for spherical sector soundfields. |
H. Bi; T. D. Abhayapala; F. Ma; P. N. Samarasinghe; |
2313 | Spherical Vector Quantization for Spatial Direction Coding Highlight: We address in this paper the problem of designing high bit-rate spherical vector quantization (VQ) to achieve transparent joint coding of angular values (azimuth and elevation) representing source directions in parametric spatial audio coding. |
S. Ragot; A. Vasilache; |
2314 | SPICE+: Evaluation of Automatic Audio Captioning Systems with Pre-Trained Language Models Highlight: In this paper, we propose SPICE+, a modification of SPICE that improves caption annotation and comparison with pre-trained language models. |
F. Gontier; R. Serizel; C. Cerisara; |
2315 | Spike-Based Optical Flow Estimation Via Contrastive Learning Highlight: However, using only the flow reconstruction loss cannot effectively capture the details of motion, which may lead to noise and blur in the estimated flow fields. To address this issue, we introduce a contrastive loss into spike-based optical flow estimation, which exploits the information of both positive and negative samples. |
M. Zhai; K. Ni; J. Xie; H. Gao; |
2316 | Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders Highlight: To make better use of pairs of bona fide and spoofed data, this study introduces a contrastive feature loss that can be plugged into the standard training criterion. |
X. Wang; J. Yamagishi; |
2317 | SPTEAE: A Soft Prompt Transfer Model for Zero-Shot Cross-Lingual Event Argument Extraction Highlight: However, these prompts have several problems, including the use of suboptimal prompts and the difficulty of transferring them from the source language to the target language. To overcome these issues, we propose a method called SPTEAE (A Soft Prompt Transfer model for zero-shot cross-lingual Event Argument Extraction). |
H. Ma; Q. Tang; N. Zhang; R. Xu; Y. Shao; W. Yan; Y. Wang; |
2318 | SQA: Strong Guidance Query with Self-Selected Attention for Human-Object Interaction Detection Highlight: In this paper, we propose a strong guidance query model with self-selected attention called SQA. |
F. Zhang; L. Sheng; B. Guo; R. Chen; J. Chen; |
2319 | SQuId: Measuring Speech Naturalness in Many Languages Highlight: This incurs heavy costs and slows down the development process, especially in heavily multilingual applications where recruiting and polling annotators can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 locales, the largest effort of this type to date. |
T. Sellam; A. Bapna; J. Camp; D. Mackinnon; A. P. Parikh; J. Riesa; |
2320 | SR-init: An Interpretable Layer Pruning Method Highlight: However, the interpretation of existing pruning criteria is always overlooked. To counter this issue, we propose a novel layer pruning method by exploring Stochastic Re-initialization. |
H. Tang; Y. Lu; Q. Xuan; |
2321 | SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement Highlight: In this paper, we use the diffusion model as a module for stochastic refinement. |
Z. Qiu; M. Fu; Y. Yu; L. Yin; F. Sun; H. Huang; |
2322 | SS-ADMM: Stationary and Sparse Granger Causal Discovery for Cortico-Muscular Coupling Highlight: However, inferring significant causal relationships between motor cortex electroencephalogram (EEG) and surface electromyogram (sEMG) of concurrently active muscles is challenging since relevant processes involved in muscle control are relatively weak compared to additive noise and background activities. In this paper, a framework for identification of cortico-muscular linear time-invariant communication is proposed that simultaneously estimates model order and its parameters by enforcing sparsity and stationarity conditions in a convex optimization program. |
F. Abbas; V. McClelland; Z. Cvetkovic; W. Dai; |
2323 | SSGD: A Smartphone Screen Glass Dataset for Defect Detection Highlight: While it is desirable to develop effective defect detection technologies to optimize automatic touch screen production lines, the development of these technologies suffers from the lack of publicly available datasets. To address this issue, in this paper we propose a dedicated touch screen glass defect dataset which includes seven types of defects and consists of 2504 images captured in various scenarios. |
H. Han; R. Yang; S. Li; R. Hu; X. Li; |
2324 | SSI-Net: A Multi-Stage Speech Signal Improvement System for ICASSP 2023 SSI Challenge Highlight: In this paper, we introduce the speech signal improvement network (SSI-Net) submitted to the ICASSP 2023 SSI Challenge, which satisfies the real-time condition. |
W. Zhu; Z. Wang; J. Lin; C. Zeng; T. Yu; |
2325 | SSVMR: Saliency-Based Self-Training for Video-Music Retrieval Highlight: In this paper, we propose a novel saliency-based self-training framework, which is termed SSVMR. |
X. Cheng; Z. Zhu; H. Li; Y. Li; Y. Zou; |
2326 | ST360IQ: No-Reference Omnidirectional Image Quality Assessment With Spherical Vision Transformers Highlight: In this study, we present a method for no-reference 360° image quality assessment. |
N. J. Tofighi; M. Hedi Elfkir; N. Imamoglu; C. Ozcinar; E. Erdem; A. Erdem; |
2327 | Stabilising and Accelerating Light Gated Recurrent Units for Automatic Speech Recognition Highlight: However, the unbounded nature of its rectified linear unit on the candidate recurrent gate induces a gradient exploding phenomenon that disrupts the training process and prevents it from being applied to medium to large ASR datasets. In this paper, we theoretically and empirically derive the necessary conditions for its stability as well as engineering mechanisms that speed up its training time by a factor of five, hence introducing a novel version of this architecture named SLi-GRU. |
A. Moumen; T. Parcollet; |
2328 | Stacking-Based Attention Temporal Convolutional Network for Action Segmentation Highlight: However, higher layers in TCNs have coarser access to video features, resulting in the loss of fine-grained information for frame-wise action classification. To address this issue, we propose a novel Attention-based Temporal Convolution (ATC) block that uses a self-attention mechanism to capture fine-grained temporal dependencies for frame-wise action classification. |
L. Yang; Y. Jiang; J. Hong; Z. Wu; Z. Yang; J. Long; |
2329 | STACKMAPS: A Visualization Technique for Diabetic Retinopathy Grading Highlight: In this paper, we address the model interpretation problem for the Diabetic Retinopathy grading task. |
I. El-Yamany; A. Wael; N. Adly; M. Torki; |
2330 | StarGAN-VC Based Cross-Domain Data Augmentation for Speaker Verification Highlight: To this end, we propose a cross-domain data generation method to obtain a domain-invariant ASV system. |
H. -R. Hu; Y. Song; J. -T. Zhang; L. -R. Dai; I. McLoughlin; Z. Zhuo; Y. Zhou; Y. -H. Li; H. Xue; |
2331 | Static and Dynamic Source and Filter Cues for Classification of Amyotrophic Lateral Sclerosis Patients and Healthy Subjects Highlight: Such cues can further be attributed to the vocal cord (source) and vocal tract (filter) involved in speech production. This paper analyzes the relative contributions of these static (captured through average spectral characteristics) and dynamic (captured through spectral variations over time) source and filter cues toward automatic classification of ALS patients and healthy subjects using sustained utterances of /a/, /i/, /o/ and /u/. |
T. Bhattacharjee; C. V. Thirumala Kumar; Y. Belur; A. Nalini; R. Yadav; P. K. Ghosh; |
2332 | Static-Scene Constrained Optimization for Matrix/Tensor-Decomposition-free Foreground-Background Separation Highlight: We propose an efficient foreground-background separation (FBS) method for (possibly noisy) video data. |
K. Naganuma; S. Ono; |
2333 | Statistical Analysis of Speech Disorder Specific Features to Characterise Dysarthria Severity Level Highlight: With the aim of reducing the subjectivity in clinical evaluations, many automated systems have been proposed in the literature to assess the dysarthria severity level using these features. This work aims to analyse the suitability of these features in determining the severity level. |
A. A. Joshy; P. N. Parameswaran; S. R. Nair; R. Rajan; |
2334 | Stay In The Middle: A Semi-Supervised Model for CT Metal Artifact Reduction Highlight: In this paper, we propose a novel semi-supervised framework for MAR, termed SemiMAR. |
T. Wang; H. Yu; Z. Lu; Z. Zhang; J. Zhou; Y. Zhang; |
2335 | Step Restriction for Improving Adversarial Attacks Highlight: We propose an algorithm to automatically restrict the step size in the iterative optimization process with an application to adversarial attacks on speaker verification models. |
K. Goto; S. Otake; R. Kawakami; N. Inoue; |
2336 | Stereoscopic Video Retargeting Based on Camera Motion Classification Highlight: Existing stereo video retargeting algorithms commonly apply the same methodology to resize all videos without considering their differing features, which leads to low-quality reconstructed videos. To address this issue, we propose a stereo video retargeting method based on camera motion classification, which employs different retargeting strategies to rescale stereo videos. |
L. Cai; Z. Tang; |
2337 | ST-MVDNet++: Improve Vehicle Detection with Lidar-Radar Geometrical Augmentation Via Self-Training Highlight: We aim to improve the performance of the vehicle detection model with Lidar-Radar fusion and data augmentation. |
Y. -J. Li; M. O’Toole; K. Kitani; |
2338 | Stochastic Optimization of Vector Quantization Methods in Application to Speech and Image Processing Highlight: Vector quantization (VQ) methods have been used in a wide range of applications for speech, image, and video data. |
M. H. Vali; T. Bäckström; |
2339 | Stochastic Super-Resolution For Gaussian Textures Highlight: The goal of this paper is to show that stochastic SR is a well-posed and solvable problem when restricting to Gaussian stationary textures. |
É. Pierret; B. Galerne; |
2340 | Strategies for Enhanced Signal Modulation Classifications Under Unknown Symbol Rates and Noise Conditions Highlight: In this paper, we examine the performance of automatic modulation classification (AMC) under varying sampling rates and signal-to-noise ratio (SNR). |
R. Wang; Y. Qi; M. Vaezi; X. Jiao; M. Amin; |
2341 | Stream Attention Based U-Net for L3DAS23 Challenge Highlight: In this paper, we propose a stream attention based U-Net to remove background noise and reverberation, developed for the ICASSP Signal Processing Grand Challenge 2023 L3DAS23 Challenge (audio-only track, Task 1: 3D Speech Enhancement). |
H. Wang; Y. Fu; J. Li; M. Ge; L. Wang; X. Qian; |
2342 | Streaming Joint Speech Recognition and Disfluency Detection Highlight: In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection, which work in a streaming manner. |
H. Futami; E. Tsunoo; K. Shibata; Y. Kashiwagi; T. Okuda; S. Arora; S. Watanabe; |
2343 | Streaming Multi-Channel Speech Separation with Online Time-Domain Generalized Wiener Filter Highlight: In this paper, we modify the offline time-domain generalized Wiener filter (TD-GWF) to an online counterpart via a Sherman-Morrison formula-based approximation and introduce how we simplify and stabilize the training phase. |
Y. Luo; |
2344 | Streaming Stroke Classification of Online Handwriting Highlight: In this paper, we propose Multiple Stroke State Transformer (MSST), a novel framework to enable simultaneous real-time classification and modifiability of previous predictions. |
J. -Y. Liu; Y. -M. Zhang; F. Yin; C. -L. Liu; |
2345 | Streaming Voice Conversion Via Intermediate Bottleneck Features and Non-Streaming Teacher Guidance Highlight: In this paper, we propose to use intermediate bottleneck features (IBFs) to replace phonetic posteriorgrams (PPGs). |
Y. Chen; M. Tu; T. Li; X. Li; Q. Kong; J. Li; Z. Wang; Q. Tian; Y. Wang; Y. Wang; |
2346 | StreamSpeech: Low-Latency Neural Architecture for High-Quality On-Device Speech Synthesis Highlight: In this paper, we describe StreamSpeech – an optimized architecture of a complete TTS system that produces high-quality speech and runs faster than real time with imperceptible latency on resource-constrained devices by utilizing a single CPU core. |
G. Shopov; S. Gerdjikov; S. Mihov; |
2347 | String-Based Molecule Generation Via Multi-Decoder VAE Highlight: In this study, we investigate the problem of string-based molecular generation via variational autoencoders (VAEs), which have served as a popular generative approach for various tasks in artificial intelligence. |
K. Kwon; K. Jeong; J. Park; H. Na; J. Shin; |
2348 | Structural Optimization of Factor Graphs for Symbol Detection Via Continuous Clustering and Machine Learning Highlight: We propose a novel method to optimize the structure of factor graphs for graph-based inference. |
L. Rapp; L. Schmid; A. Rode; L. Schmalen; |
2349 | Structural Reparameterization Lightweight Network for Video Action Recognition Highlight: This paper proposes a novel approach to reduce the model size while preserving accuracy by combining lightweight networks with structural reparameterization. |
A. Zhu; Y. Wang; W. Li; P. Qian; |
2350 | Structure-Aware Multi-Feature Co-Learning for Dual Branch Face Super Resolution Highlight: However, effective co-learning of both has been limiting the performance improvement of existing state-of-the-art methods. Therefore, we focus on the texture and structure of images in this paper, and design a two-branch network containing a texture network (T-Net) and a structure network (S-Net) to jointly explore texture and structure information for co-learning. |
K. Zeng; Z. Wang; T. Lu; J. Chen; |
2351 | Structure-Aware Sparse Bayesian Learning-Based Channel Estimation for Intelligent Reflecting Surface-Aided MIMO Highlight: This paper presents novel cascaded channel estimation techniques for an intelligent reflecting surface-aided multiple-input multiple-output system. |
Y. He; G. Joseph; |
2352 | Structured-Anchor Projected Clustering for Hyperspectral Images Highlight: However, these methods typically disregard noisy bands and require post-processing. To tackle these issues, we propose a structured-anchor projected clustering (SAPC) model for HSIs. |
G. Jiang; J. Zhang; Y. Zhang; X. Jiang; Z. Cai; |
2353 | Structured Errors-in-Variables Modelling for Cortico-Muscular Coherence Enhancement Highlight: This study proposes an approach based on structured errors-in-variables (EIV) modelling to estimate components of the cortex and muscle signals involved in movement control from noisy EEG and EMG signals for the purpose of coherence estimation. |
Z. Guo; V. M. McClelland; W. Dai; Z. Cvetkovic; |
2354 | Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding Highlight: This frontend has a small size but a heavy computational cost. In this work, we propose three task-specific structured pruning methods to deal with such heterogeneous networks. |
Y. Peng; K. Kim; F. Wu; P. Sridhar; S. Watanabe; |
2355 | Structured State Space Decoder for Speech Recognition and Synthesis Highlight: In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks and compared it with the Transformer decoder. |
K. Miyazaki; M. Murata; T. Koriyama; |
2356 | Structure-Preserving and Redundancy-Free Features Refinement for Generalized Zero-Shot Learning Highlight: Most models achieve competitive performance but still suffer from two problems: (1) Topological structure neglection; (2) Redundant information interference. In this paper, we propose a Structure-preserving and Redundancy-free Features Refinement model (referred to as SP-RFFR) to address these problems correspondingly in two modules: (1) Structure-preserving, to explicitly incorporate the topological structure into the learning of the latent space and the generator; (2) Redundancy-free features refinement, to remove the redundant information from the visual features and learn class- and semantically-relevant representations. |
J. Ni; Y. Liao; |
2357 | StuArt: Individualized Classroom Observation of Students with Automatic Behavior Recognition And Tracking Highlight: In this paper, we present StuArt, a novel automatic system designed for individualized classroom observation, which enables instructors to keep track of the learning status of each student. |
H. Zhou; F. Jiang; J. Si; L. Xiong; H. Lu; |
2358 | Study and Design of Robust Personal Sound Zones with VAST Using Low Rank RIRs Highlight: In this work, we study the feasibility and effectiveness of using low rank approximations of RIRs to calculate sound zone control filters, to improve the robustness of sound zone control algorithms to perturbations in the bright zone (BZ) microphones. |
S. S. Bhattacharjee; L. Shi; G. Ping; X. Shen; M. G. Christensen; |
2359 | Study of Manifold Geometry Using Multiscale Non-Negative Kernel Graphs Highlight: In this work, we propose a framework to study the geometric structure of the data. |
C. Hurtado; S. Shekkizhar; J. Ruiz-Hidalgo; A. Ortega; |
2360 | Study on The Fairness of Speaker Verification Systems Across Accent and Gender Groups Highlight: In this work, we analyze the performance of two X-vector-based SV systems across groups defined by gender and accent of the speakers when speaking English. |
M. Estevez; L. Ferrer; |
2361 | Style Modeling for Multi-Speaker Articulation-to-Speech Highlight: In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting. |
M. Kim; Z. Piao; J. Lee; H. -G. Kang; |
2362 | STYX: Adaptive Poisoning Attacks Against Byzantine-Robust Defenses in Federated Learning Highlight: In this work, we argue that a range of popular robust aggregation strategies, when applied to neural networks, can be trivially circumvented through simple adaptive attacks. |
Y. Wen; J. Geiping; M. Goldblum; T. Goldstein; |
2363 | Sub-Band Contrastive Learning-Based Knowledge Distillation For Sound Classification Highlight: To this end, we propose a new KD loss function that enables a student network to learn an informative contrastive distribution and fine-grained information from the spectrogram representation of a signal, thus enhancing the performance of the student network on the sound classification task. |
A. M. Tripathi; A. Mishra; |
2364 | Subband Dependency Modeling for Sound Event Detection Highlight: In this paper, we propose a subband dependency model (SDM) to enhance the capability of CRNN in modeling subband dependencies from the input spectrogram. |
Y. Guan; G. Zheng; J. Han; H. Wang; |
2365 | Subgradient Descent Learning with Over-the-Air Computation Highlight: We consider a distributed learning problem in a communication network, consisting of N distributed nodes and a central parameter server (PS). |
T. L. S. Gez; K. Cohen; |
2366 | Subject-Specific Adaptation for A Causally-Trained Auditory-Attention Decoding System Highlight: Performance in either approach is limited: group models suffer due to the variability across subjects and time, while individual models are constrained by the limited data samples available. To overcome this challenge, we introduce a subject-specific adaptive form of auditory attention decoding (AAD) over short time windows to account for the variability across EEG recording sessions. |
C. Beauchene; M. Brandstein; S. Haro; T. F. Quatieri; C. J. Smalt; |
2367 | Subspace-Based Detector For Distributed mmWave MIMO Radar Sensors Highlight: We propose a general signal model for distributed connected MIMO radar sensors that collect unwanted and interference signals in a low-rank subspace based on their angle and Doppler frequency spread with different subspace coefficients. |
M. Ahmadi; M. Alaee-Kerahroodi; B. S. M. R.; B. Ottersten; |
2368 | Subspace Hybrid Beamforming for Head-Worn Microphone Arrays Highlight: A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Components Analysis (PCA) denoising. |
S. Hafezi; A. H. Moore; P. Guiraud; P. A. Naylor; J. Donley; V. Tourbabin; T. Lunner; |
2369 | Subspace Modeling Enabled High-Sensitivity X-Ray Chemical Imaging Highlight: In this work, by exploiting the intrinsic properties and subspace modeling of the TXM-XANES imaging data, we introduce a simple and robust denoising approach to improve the image quality, which enables fast and high-sensitivity chemical characterization. |
J. Li; B. Chen; G. Zan; G. Qian; P. Pianetta; Y. Liu; |
2370 | Suffix Retrieval-Augmented Language Modeling Highlight: In this paper, we propose a novel language model, SUffix REtrieval-Augmented LM (SUREALM), that simulates a bi-directional contextual effect in an autoregressive manner. |
Z. Wang; Y. -C. Tam; |
2371 | SuperCM: Revisiting Clustering for Semi-Supervised Learning Highlight: In this work, we instead propose a novel approach that explicitly incorporates the underlying clustering assumption in SSL through extending a recently proposed differentiable clustering module. |
D. Singh; A. Boubekki; R. Jenssen; M. C. Kampffmeyer; |
2372 | Super Dilated Nested Arrays with Ideal Critical Weights and Increased Degrees of Freedom Highlight: In this paper, we introduce two further dilations of the recently introduced dilated nested arrays (DNAs), which possess an equal virtual ULA part to that of the nested arrays but possess two dense physical ULAs with the critical spacing (2 × λ/2). |
A. M. A. Shaalan; J. Du; |
2373 | Super-Resolution for Macro X-Ray Fluorescence Data Collected from Old Master Paintings Highlight: However, this quality is limited by the acquisition time for the XRF datacube, resulting in a trade-off between signal-to-noise ratio (SNR) and spatial resolution. To solve this problem, we propose to enhance the spatial resolution of the XRF datacube of a painting by leveraging a corresponding high-resolution (HR) RGB image. |
S. Yarn; H. Verinaz-Jadan; J. -J. Huang; N. Daly; C. Higgitt; P. L. Dragotti; |
2374 | Super-Resolution Harmonic Retrieval of Non-Circular Signals Highlight: This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance present Toeplitz and Hankel structures, respectively. |
Y. Zhang; Y. Wang; Z. Tian; G. Leus; G. Zhang; |
2375 | Super-Resolution Information Enhancement for Crowd Counting Highlight: We propose a more elegant method termed Multi-Scale Super-Resolution Module (MSSRM). |
J. Xie; W. Xu; D. Liang; Z. Ma; K. Liang; W. Liu; R. Wang; L. Jin; |
2376 | Supervised Contrastive Learning As Multi-Objective Optimization for Fine-Tuning Large Pre-Trained Language Models Highlight: In this work, we formulate the supervised contrastive learning (SCL) problem as a Multi-Objective Optimization (MOO) problem for the fine-tuning phase of the RoBERTa language model. |
Y. Moukafih; M. Ghogho; K. Smaili; |
2377 | Supervised Hierarchical Clustering Using Graph Neural Networks for Speaker Diarization Highlight: In this paper, we propose a novel Supervised HierArchical gRaph Clustering algorithm (SHARC) for speaker diarization where we introduce a hierarchical structure using Graph Neural Network (GNN) to perform supervised clustering. |
P. Singh; A. Kaul; S. Ganapathy; |
2378 | Surface-Sampling Based Objective Quality Assessment Metrics for Meshes Highlight: In this paper, we prove that it is feasible to perform mesh quality assessment by sampling the mesh into a point cloud. |
C. Fu; X. Zhang; T. Nguyen-Canh; X. Xu; G. Li; S. Liu; |
2379 | Surrogate Based Post-Hoc Calibration for Distributional Shift Highlight: Based on the finding, we propose a simple yet effective approach named Surrogate Based Temperature Scaling (SBTS), where the surrogate model is trained to map the relationship between temperature and the shifting intensity. |
J. Zhang; |
2380 | SUVR: A Search-Based Approach to Unsupervised Visual Representation Learning Highlight: In this work, we propose Search-based Unsupervised Visual Representation Learning (SUVR) to learn better image representations in an unsupervised manner. |
Y. -Z. Xu; C. -Y. Chen; C. -T. Li; |
2381 | SVMV: Spatiotemporal Variance-Supervised Motion Volume for Video Frame Interpolation. Highlight: Specifically, we design a lightweight neural network to construct motion volumes via ensembles of offset approximations, and propose a spatiotemporal variance-aware loss to supervise the network learning. |
Y. Luo; J. Pan; J. Tang; |
2382 | Switching Kronecker Product Linear Filtering for Multispeaker Adaptive Speech Dereverberation. Highlight: This work studies how to deal with reverberation in multiple-speaker scenarios, which remains a challenging problem. |
G. Huang; J. Benesty; I. Cohen; E. Winebrand; J. Chen; W. Kellermann; |
2383 | SW-WaveNet: Learning Representation from Spectrogram and Wavegram Using WaveNet for Anomalous Sound Detection. Highlight: In this paper, we propose a new framework for anomalous sound detection (ASD) named Spectrogram-Wavegram WaveNet (SW-WaveNet), which segments the 2-D log-mel spectrogram into 1-D waveform signals of different frequency bands and combines the representation vectors extracted by WaveNet from the segmented log-mel spectrograms and from Wavegrams, respectively. |
H. Chen; L. Ran; X. Sun; C. Cai; |
2384 | Symbol Level Precoding in the RF Domain for Low Hardware Complexity RIS-Assisted MU-MISO Systems. Highlight: In this paper, a radio-frequency (RF) domain symbol level precoding technique is developed for reconfigurable intelligent surface (RIS)-assisted downlink multiuser multiple-input single-output (MU-MISO) systems. |
C. G. Tsinos; T. A. Tsiftsis; R. Schober; |
2385 | Symbol-Level Precoding Is Related to Parameter Estimation from Quantized Data. Highlight: In particular, much attention has been paid to the formulation and optimization aspects. In this paper, we contribute to these aspects by drawing a connection between symbol-level precoding (SLP) and a seemingly unrelated topic, namely parameter estimation from quantized data. |
M. Shao; W. -K. Ma; Y. Liu; |
2386 | SyncNet: Correlating Objective for Time Delay Estimation in Audio Signals. Highlight: In contrast to popular signal-processing-based methods, this paper proposes to transform the input signals, using a deep neural network, into another pair of sequences that show high cross-correlation at the actual time delay. |
A. Raina; V. Arora; |
2387 | SynGen: A Syntactic Plug-and-Play Module for Generative Aspect-Based Sentiment Analysis. Highlight: In this paper, we propose SynGen, a plug-and-play module that is aware of syntactic information. |
C. Yu; T. Wu; J. Li; X. Bai; Y. Yang; |
2388 | SYNTACC: Synthesizing Multi-Accent Speech by Weight Factorization. Highlight: Our method uses the YourTTS model and involves a novel multi-accent training mechanism. |
T. -N. Nguyen; N. -Q. Pham; A. Waibel; |
2389 | Synthesizer Preset Interpolation Using Transformer Auto-Encoders. Highlight: After training, the proposed model can be integrated into commercial synthesizers for live interpolation or sound design tasks. |
G. L. Vaillant; T. Dutoit; |
2390 | Synthesizing Speech from ECoG with a Combination of Transformer-Based Encoder and Neural Vocoder. Highlight: This paper reports on a novel invasive brain–computer interface (BCI) paradigm that has successfully reconstructed spoken sentences from invasive electrocorticogram (ECoG) signals using deep-neural-network-based encoders and a pre-trained neural vocoder. |
K. Shigemi; S. Komeiji; T. Mitsuhashi; Y. Iimura; H. Suzuki; H. Sugano; K. Shinoda; K. Yatabe; T. Tanaka; |
2391 | Synthetic Pseudo Anomalies for Unsupervised Video Anomaly Detection: A Simple Yet Efficient Framework Based on Masked Autoencoder. Highlight: However, even when trained only on normal data, autoencoders (AEs) often reconstruct anomalies well, which degrades their anomaly detection performance. To alleviate this issue, we propose a simple yet efficient framework for video anomaly detection. |
X. Huang; C. Zhao; C. Gao; L. Chen; Z. Wu; |
2392 | T5lephone: Bridging Speech and Text Self-Supervised Models for Spoken Language Understanding via Phoneme Level T5. Highlight: In this work, we conduct extensive studies on how pre-trained language models (PLMs) with different tokenization strategies affect spoken language understanding tasks, including spoken question answering (SQA) and speech translation (ST). |
C. -J. Hsu; H. -L. Chung; H. -Y. Lee; Y. Tsao; |
2393 | T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing. Highlight: However, compared with abstract-syntax-tree-based SQL generation, seq2seq semantic parsers face many more challenges, including poor quality in schema information prediction and poor semantic coherence between natural language queries and SQL queries. This paper analyses these difficulties and proposes a seq2seq-oriented decoding strategy called SR, which includes a new intermediate representation, SSQL, and a reranking method with a score re-estimator to address the two obstacles, respectively. |
Y. Li; Z. Su; Y. Li; H. Zhang; S. Wang; W. Wu; Y. Zhang; |
2394 | TableIE: Capturing the Interactions Among Sub-Tasks in Information Extraction via Double Tables. Highlight: In this paper, we propose a double-table framework, TableIE, to capture the interactions among information extraction (IE) sub-tasks and to improve model efficiency. |
J. Lin; R. Xu; B. Chang; |
2395 | TAMformer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction. Highlight: In this work, we focus on pedestrians’ early intention prediction in which, from a current observation of an urban scene, the model predicts the future activity of pedestrians that approach the street. |
N. Osman; G. Camporese; L. Ballan; |
2396 | Tangent Bundle Filters and Neural Networks: From Manifolds to Cellular Sheaves and Back. Highlight: In this work, we introduce a convolution operation over the tangent bundle of Riemannian manifolds that exploits the Connection Laplacian operator. |
C. Battiloro; Z. Wang; H. Riess; P. D. Lorenzo; A. Ribeiro; |
2397 | TAPE: An End-to-End Timbre-Aware Pitch Estimator. Highlight: As an alternative to this approach, we introduce a timbre-aware pitch estimator (TAPE), which estimates the pitch of a target source in an end-to-end manner without the need for an explicit source separation step. |
N. C. Tamer; Y. Özer; M. Müller; X. Serra; |
2398 | TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. Highlight: We propose an objective for perceptual quality based on temporal acoustic parameters. |
Y. Zeng; J. Konan; S. Han; D. Bick; M. Yang; A. Kumar; S. Watanabe; B. Raj; |
2399 | Targeted Adversarial Attacks Against Neural Machine Translation. Highlight: In this paper, we propose a new targeted adversarial attack against NMT models. |
S. Sadrizadeh; A. D. Aghdam; L. Dolamic; P. Frossard; |
2400 | Target Sound Extraction with Variable Cross-Modality Clues. Highlight: To leverage a variable number of clues across modalities available at inference time, including a video, a sound event class, and a text caption, we propose a unified transformer-based target sound extraction (TSE) model architecture, in which a multi-clue attention module integrates all the clues across the modalities. |
C. Li; Y. Qian; Z. Chen; D. Wang; T. Yoshioka; S. Liu; Y. Qian; M. Zeng; |
2401 | Target Speaker Extraction with Ultra-Short Reference Speech by VE-VE Framework. Highlight: In this paper, we propose a Voice Extractor-Voice Extractor (VE-VE) framework for the target speaker extraction (TSE) task with ultra-short enrollment speech. |
L. Yang; W. Liu; L. Tan; J. Yang; H. -G. Moon; |
2402 | Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction. Highlight: This paper presents a novel Sequence-to-Sequence Target-Speaker Voice Activity Detection (Seq2Seq-TSVAD) method that can efficiently address the joint modeling of a large number of speakers and predict high-resolution voice activities. |
M. Cheng; W. Wang; Y. Zhang; X. Qin; M. Li; |
2403 | Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. Highlight: This paper describes a speaker diarization model based on target speaker voice activity detection (TS-VAD) using transformers. |
D. Wang; X. Xiao; N. Kanda; T. Yoshioka; J. Wu; |
2404 | Target Velocity Estimation for Quantization-Based Cooperative MIMO Radar and Communications System. Highlight: We derive the corresponding distributed maximum likelihood (ML) estimators and Cramér-Rao bounds (CRBs). |
Z. Wang; X. Yan; Q. He; R. S. Blum; |
2405 | TaylorAECNet: A Taylor Style Neural Network for Full-Band Echo Cancellation. Highlight: This paper describes the aecX team’s entry to the ICASSP 2023 acoustic echo cancellation (AEC) challenge. |
W. Xu; Z. Guo; |
2406 | TDMA-Based Multi-User Binary Computation Offloading in the Finite-Block-Length Regime. Highlight: Since the devices require timely results, that allocation ought to be guided by the fundamental rate limits for finite block lengths rather than the classical (asymptotic) limits. We develop an efficient algorithm for such an allocation. |
M. A. Manouchehrpour; H. Lehal; M. Salmani; T. N. Davidson; |
2407 | TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2023 DNS Challenge. Highlight: This paper introduces the Unbeatable Team’s submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. |
Y. Ju; J. Chen; S. Zhang; S. He; W. Rao; W. Zhu; Y. Wang; T. Yu; S. Shang; |
2408 | TeAw: Text-Aware Few-Shot Remote Sensing Image Scene Classification. Highlight: In this work, we propose a text-aware framework for few-shot remote sensing image scene classification (TeAw). |
K. Cheng; C. Yang; Z. Fan; D. Wu; N. Guan; |
2409 | TEFISTA-NET: GTD Parameter Estimation of Low-Frequency Ultra-Wideband Radar via Model-Based Deep Learning. Highlight: In this paper, we propose a new model-based deep learning method for GTD parameter estimation. |
R. Li; X. Wang; G. Li; X. -P. Zhang; |
2410 | Tell Model Where to Attend: Improving Interpretability of Aspect-Based Sentiment Classification via Small Explanation Annotations. Highlight: In this paper, we propose an Interpretation-Enhanced Gradient-based framework for aspect-based sentiment classification (ABSC) via a small number of explanation annotations, namely IGA. |
Z. Cheng; J. Zhou; W. Wu; Q. Chen; L. He; |
2411 | Temporal Contrastive Learning with Curriculum. Highlight: We present ConCur, a contrastive video representation learning method that uses curriculum learning to impose a dynamic sampling strategy in contrastive training. |
S. Roy; A. Etemad; |
2412 | Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition. Highlight: Towards that goal, we introduce a novel temporal emotional modeling approach for SER, termed Temporal-aware bI-direction Multi-scale Network (TIM-Net), which learns multi-scale contextual affective representations from various time scales. |
J. Ye; X. -C. Wen; Y. Wei; Y. Xu; K. Liu; H. Shan; |
2413 | Tempo vs. Pitch: Understanding Self-Supervised Tempo Estimation. Highlight: Particularly in the context of music, there are few insights into the fragility of these models under different data distributions, and into how it could be mitigated. In this paper, we explore these questions by dissecting a self-supervised model for pitch estimation adapted for tempo estimation via rigorous experimentation with synthetic data. |
G. Morais; M. E. P. Davies; M. Queiroz; M. Fuentes; |
2414 | Tensor-based Complex-valued Graph Neural Network for Dynamic Coupling Multimodal Brain Networks. Highlight: This study proposes a Tensor-based Complex-valued Graph Neural Network (TC-GNN) to model multimodal neuroimages as complex-valued tensor graphs by investigating underlying complementary associations and cross-modality message aggregation. |
Y. Yang; G. Cai; C. Ye; Y. Xiang; T. Ma; |
2415 | Tensor Completion for Efficient and Accurate Hyperparameter Optimisation in Large-Scale Statistical Learning. Highlight: To this end, we introduce a completely different approach for hyperparameter optimisation, based on low-rank tensor completion. |
A. Rebello; K. Konstantinidis; Y. L. Xu; D. P. Mandic; |
2416 | Tensor Decomposition Based Latent Feature Clustering for Hyperspectral Band Selection. Highlight: To solve the proposed model, we present an effective optimization algorithm. |
J. Qi; J. Zhang; Y. Zhang; X. Jiang; Z. Cai; |
2417 | Tensorized LSSVMs for Multitask Regression. Highlight: High-order tensors are capable of providing efficient representations for such tasks while preserving structural task relations. In this paper, a new multitask learning (MTL) method is proposed by leveraging low-rank tensor analysis and constructing tensorized Least Squares Support Vector Machines, namely the tLSSVM-MTL, where multilinear modelling and its nonlinear extensions can be flexibly exerted. |
J. Liu; Q. Tao; C. Zhu; Y. Liu; J. A. K. Suykens; |
2418 | Tensorized Neural Layer Decomposition for 2-D DOA Estimation. Highlight: Simulation results demonstrate that the proposed method reduces the number of trained parameters by a factor of more than 122,000 compared to the matrix-based neural network while maintaining moderate accuracy. |
H. Zheng; C. Zhou; S. A. Vorobyov; Z. Shi; |
2419 | Tensor Low Rank Column-Wise Compressive Sensing for Dynamic Imaging. Highlight: Secondly, for image or volume image sequences, it requires vectorizing the image or volume as one column of a matrix, which ignores the inherent 2D or 3D structure of the images or volumes. To address these limitations, in this work, we explore the use of a tensor low-rank (LR) model on the image sequence, develop a fast and memory-efficient gradient descent (GD) based recovery algorithm, and evaluate it experimentally. |
S. Babu; S. Aviyente; N. Vaswani; |
2420 | Terminology-Aware Medical Dialogue Generation. Highlight: In this paper, we propose a novel framework to improve medical dialogue generation by considering features centered on domain-specific terminology. |
C. Tang; H. Zhang; T. Loakman; C. Lin; F. Guerin; |
2421 | Ternary Weight Networks. Highlight: We present memory- and computation-efficient ternary weight networks (TWNs), with weights constrained to +1, 0, and -1. |
B. Liu; F. Li; X. Wang; B. Zhang; J. Yan; |
2422 | Test-Time Training-Free Domain Adaptation. Highlight: By exploiting spatial activation that was previously overlooked and simply averaged out, we propose a simple method based on Feature Statistics Transformation (FST) on-the-fly for each test example. |
Y. Feng; W. He; K. You; B. Liu; Z. Zhang; Y. Wang; M. Li; Y. Lou; J. Li; G. Li; J. Liao; |
2423 | Test Your Samples Jointly: Pseudo-Reference for Image Quality Evaluation. Highlight: In this paper, we address the well-known image quality assessment problem, but in contrast to existing approaches that predict the quality of each image independently, we propose to jointly model different images depicting the same content to improve the precision of quality estimation. |
M. Tworski; S. Lathuilière; |
2424 | Text Classification in the Wild: A Large-Scale Long-Tailed Name Normalization Dataset. Highlight: We construct our test set from four different subsets: many-, medium-, and few-shot sets, as well as a zero-shot open set, which are meant to isolate the few-shot and zero-shot learning scenarios from the massive many-shot classes. |
J. Qi; S. Li; Z. Guo; Y. Huang; C. Zhou; W. Zhang; X. Wang; Z. Lin; |
2425 | Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis. Highlight: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. |
K. Yang; T. -Y. Hu; J. -H. R. Chang; H. Swetha Koppula; O. Tuzel; |
2426 | Textless Direct Speech-to-Speech Translation with Discrete Speech Representation. Highlight: In this work, we propose a novel model, Textless Translatotron, which is based on Translatotron 2 [1], for training an end-to-end direct speech-to-speech translation (S2ST) model without any textual supervision. |
X. Li; Y. Jia; C. -C. Chiu; |
2427 | Textless Speech-to-Music Retrieval Using Emotion Similarity. Highlight: We introduce a framework that recommends music based on the emotions of speech. |
S. Doh; M. Won; K. Choi; J. Nam; |
2428 | Text-to-ECG: 12-Lead Electrocardiogram Synthesis Conditioned on Clinical Text Reports. Highlight: The diagnosis classes of ECGs are insufficient to capture the intricate differences between ECGs that depend on various features (e.g., patient demographic details, co-existing diagnosis classes, etc.). To alleviate these challenges, we present a text-to-ECG task, in which textual inputs are used to produce ECG outputs. |
H. Chung; J. Kim; J. -M. Kwon; K. -H. Jeon; M. S. Lee; E. Choi; |
2429 | Text-To-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder. Highlight: We propose a TTS method based on latent variable conversion using a diffusion probabilistic model and the variational autoencoder (VAE). |
Y. Yasuda; T. Toda; |
2430 | TFCNet: Time-Frequency Domain Corrector for Speech Separation. Highlight: Considering the robustness of time-frequency (T-F) domain methods, we propose an innovative network architecture called Time-Frequency Domain Corrector Network (TFCNet), which consists of a time-domain separator and a specially designed T-F domain corrector. |
W. Tong; J. Zhu; J. Chen; Z. Wu; S. Kang; H. Meng; |
2431 | TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. Highlight: We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-frequency (T-F) domain, for monaural talker-independent speaker separation in anechoic conditions. |
Z. -Q. Wang; S. Cornell; S. Choi; Y. Lee; B. -Y. Kim; S. Watanabe; |
2432 | TG-Critic: A Timbre-Guided Model for Reference-Independent Singing Evaluation. Highlight: In this paper, a data-driven model, TG-Critic, is proposed that introduces timbre embeddings as one of the model inputs to guide the evaluation of singing quality. |
X. Sun; Y. Gao; H. Lin; H. Liu; |
2433 | The 2nd Clarity Enhancement Challenge for Hearing Aid Speech Intelligibility Enhancement: Overview and Outcomes. Highlight: This paper reports on the design and outcomes of the 2nd Clarity Enhancement Challenge (CEC2), a challenge for stimulating novel approaches to hearing-aid speech intelligibility enhancement. |
M. A. Akeroyd; W. Bailey; J. Barker; T. J. Cox; J. F. Culling; S. Graetzer; G. Naylor; Z. Podwińska; Z. Tu; |
2434 | The Ajmide Topic Segmentation System for the ICASSP 2023 General Meeting Understanding and Generation Challenge. Highlight: This paper describes our topic segmentation (TS) system submitted to the ICASSP 2023 Signal Processing Grand Challenge – General Meeting Understanding and Generation challenge (MUG). |
B. Hu; Q. Li; X. Xia; |
2435 | The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis. Abstract: This paper further explores our previous wake word spotting system, which ranked 2nd in Track 1 of the MISP Challenge 2021. First, we investigate a robust unimodal approach based on 3D … |
H. Wang; M. Cheng; Q. Fu; M. Li; |
2436 | The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR. Highlight: We present the first release of The Edinburgh International Accents of English Corpus (EdAcc). |
R. Sanabria; N. Bogoychev; N. Markl; A. Carmantini; O. Klejch; P. Bell; |
2437 | The MBSTOI Binaural Intelligibility Metric Using a Close-Talking Microphone Reference. Highlight: In this paper, the deep correlation modified binaural short-time objective intelligibility metric (Dcor-MBSTOI) is evaluated with a single-channel close-talking microphone signal as the reference. |
P. Guiraud; A. H. Moore; R. R. Vos; P. A. Naylor; M. Brookes; |
2438 | The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. Highlight: This paper introduces the dataset, track settings, and baselines of the MISP2022 challenge. |
Z. Wang; S. Wu; H. Chen; M. -K. He; J. Du; C. -H. Lee; J. Chen; S. Watanabe; S. Siniscalchi; O. Scharenborg; D. Liu; B. Yin; J. Pan; J. Gao; C. Liu; |
2439 | The NERCSLIP-USTC System for the L3DAS23 Challenge Task2: 3D Sound Event Localization and Detection (SELD). Highlight: In this work, a robust network architecture with data augmentation techniques is proposed to improve SELD performance, where ResNet and Conformer blocks are combined to model both local and global patterns. |
H. Yan; H. Xu; Q. Wang; J. Zhang; |
2440 | The NIO System for Audio-Visual Diarization and Recognition in MISP Challenge 2022. Highlight: This paper describes the NIO system for audio-visual diarization and recognition in the Multimodal Information Based Speech Processing (MISP) Challenge 2022. |
G. Xu; X. Wang; S. Wang; J. Yuan; W. Guo; W. Li; J. Gao; |
2441 | The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge. Highlight: This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. |
P. Guo; H. Wang; B. Mu; A. Zhang; P. Chen; |
2442 | The NPU-Elevoc Personalized Speech Enhancement System for ICASSP 2023 DNS Challenge. Highlight: This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge [1] at ICASSP 2023. |
X. Yan; Y. Yang; Z. Guo; L. Peng; L. Xie; |
2443 | The Pipeline System of ASR and NLU with MLM-based Data Augmentation Toward STOP Low-Resource Challenge. Highlight: This paper describes our system for the low-resource domain adaptation track (Track 3) in the Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal Processing Grand Challenge 2023. |
H. Futami; J. Huynh; S. Arora; S. -L. Wu; Y. Kashiwagi; Y. Peng; B. Yan; E. Tsunoo; S. Watanabe; |
2444 | The Potential of Neural Speech Synthesis-Based Data Augmentation for Personalized Speech Enhancement. Highlight: To achieve the personalization goal while dealing with the typical lack of personal data, we investigate the effect of data augmentation based on neural speech synthesis (NSS). In the proposed method, we show that the quality of the NSS system’s synthetic data matters, and that, if it is good enough, the augmented dataset can improve the personalized speech enhancement (PSE) system so that it outperforms the speaker-agnostic baseline. |
A. Kuznetsova; A. Sivaraman; M. Kim; |
2445 | The R3VIVAL Dataset: Repository of Room Responses and 360 Videos of a Variable Acoustics Lab. Highlight: This paper presents a dataset of spatial room impulse responses (SRIRs) and 360° stereoscopic video captures of a variable acoustics laboratory. |
F. Klein; S. V. Amengual Garí; |
2446 | Thermal Infrared Image Inpainting via Edge-Aware Guidance. Highlight: In this paper, we propose a novel task, Thermal Infrared Image Inpainting, which aims to reconstruct missing regions of thermal infrared (TIR) images. |
Z. Wang; H. Shen; C. Men; Q. Sun; K. Huang; |
2447 | The Role of Initial Entanglement in Adaptive Gibbs State Preparation on Quantum Computers. Highlight: Here we explore the role of the initial quantum correlations (i.e., entanglement) built into the ‘data’ quantum bits (those that will be sampled) and the ancilla quantum bits (those that assist in preparing the Gibbs state) in the run-time of the adaptive state-preparation algorithm. |
S. E. Economou; A. Warren; E. Barnes; |
2448 | The Role of Memory in Social Learning When Sharing Partial Opinions. Highlight: This introduces significant challenges as compared to the standard case of full opinion sharing. We propose a novel strategy where each agent forms a valid belief by completing the partial beliefs received from its neighbors. |
M. Cirillo; V. Bordignon; V. Matta; H. Sayed; |
2449 | The Secret Source: Incorporating Source Features to Improve Acoustic-To-Articulatory Speech Inversion. Highlight: In this work, we incorporated acoustically derived source features (aperiodicity, periodicity, and pitch) as additional targets to an acoustic-to-articulatory speech inversion (SI) system. |
Y. M. Siriwardena; C. Espy-Wilson; |
2450 | The Uniqueness Problem of Physical Law Learning. Highlight: One key problem consists in the fact that the governing equations might not be uniquely determined by the given data. We will study this problem in the common situation that a physical law is described by an ordinary or partial differential equation. |
P. Scholl; A. Bacho; H. Boche; G. Kutyniok; |
2451 | The USTC System for ADReSS-M Challenge. Highlight: This paper describes our submission to the ICASSP 2023 Signal Processing Grand Challenge (SPGC), which focuses on multilingual Alzheimer’s disease (AD) recognition through spontaneous speech. |
K. Mei; X. Ding; Y. Liu; Z. Guo; F. Xu; X. Li; T. Naren; J. Yuan; Z. Ling; |
2452 | The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP 2022 Challenge. Highlight: This paper describes the system developed by the WHU-Alibaba team for the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. |
M. Cheng; H. Wang; Z. Wang; Q. Fu; M. Li; |
2453 | The XMU System for Audio-Visual Diarization and Recognition in MISP Challenge 2022. Highlight: In this paper, we present our work in track 2 of the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. |
T. Li; H. Zhou; J. Wang; Q. Hong; L. Li; |
2454 | Think Before You Speak: Concept-Guided Explicit Persona Reasoning for Personalized Dialogue Generation. Highlight: In this paper, we propose the Think-Before-You-Speak (TBYS) model, consisting of Concept-guided Persona Reasoning module and Consistent Dialogue Generation module, to explicitly select persona sentences semantically relevant to the current turn and generate responses based on the selection results. |
Y. Li; Y. Hu; W. Peng; Y. Xie; |
2455 | This Changes to That: Combining Causal and Non-Causal Explanations to Generate Disease Progression in Capsule Endoscopy. Highlight: In this work, we propose a unified explanation approach that, given an instance, combines both model-dependent and model-agnostic explanations to produce an explanation set. |
A. Vats; A. Mohammed; M. Pedersen; N. Wiratunga; |
2456 | Time-Aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering. Highlight: Most existing methods are built on pre-trained language models, which might not be capable of learning temporal-specific representations of entities for the temporal KGQA task. To alleviate this problem, we propose a novel Time-aware Multiway Adaptive (TMA) fusion network. |
Y. Liu; D. Liang; F. Fang; S. Wang; W. Wu; R. Jiang; |
2457 | Time-Domain Speech Enhancement Assisted By Multi-Resolution Frequency Encoder and Decoder. Highlight: Experiments on the Voice-Bank dataset show that the proposed method obtained a 0.14 PESQ improvement. |
H. Shi; M. Mimura; L. Wang; J. Dang; T. Kawahara; |
2458 | Time-Frequency Awareness Network For Human Mesh Recovery From Videos. Highlight: In this work, we propose a Time-Frequency Awareness Network for human mesh recovery. |
B. Zhang; S. Wu; M. Jia; |
2459 | Time-Resolved fMRI Shared Response Model Using Gaussian Process Factor Analysis. Highlight: Some recent work has implemented probabilistic models to extract a shared representation in task fMRI. In the present work, we improve upon these models by incorporating temporal information in the common latent structures. |
M. Ebrahimi; N. Calarco; C. Hawco; A. Voineskos; A. Khisti; |
2460 | Time-Varying Signals Recovery Via Graph Neural Networks. Highlight: Nevertheless, this smoothness assumption could result in a degradation of performance in the corresponding application when the prior does not hold. In this work, we relax the requirement of this hypothesis by including a learning module. |
J. A. Castro-Correa; J. H. Giraldo; A. Mondal; M. Badiey; T. Bouwmans; F. D. Malliaros; |
2461 | Time-Weighted Frequency Domain Audio Representation with GMM Estimator for Anomalous Sound Detection. Highlight: This paper presents Time-Weighted Frequency Domain Representation (TWFR) with the GMM method (TWFR-GMM) for anomalous sound detection. |
J. Guan; Y. Liu; Q. Zhu; T. Zheng; J. Han; W. Wang; |
2462 | TINYCOD: Tiny and Effective Model for Camouflaged Object Detection. Highlight: This paper introduces an effective and tiny model for real-time Camouflaged Object Detection (COD) named Tiny-COD. |
H. Xing; S. Gao; H. Tang; T. Q. Mok; Y. Kang; W. Zhang; |
2463 | TinyOOD: Effective Out-of-Distribution Detection for TinyML. Highlight: In this paper, we propose a novel effective out-of-distribution detection method for TinyML (TinyOOD), which exploits cascading early exit and channel-attention-based neural mean discrepancy (CA-NMD) for dynamic and efficient OOD detection on microcontroller units (MCUs). |
Y. Li; J. Jia; Y. Zuo; W. Zhu; |
2464 | Token2vec: A Joint Self-Supervised Pre-Training Framework Using Unpaired Speech and Text. Highlight: Specifically, we introduce two modality-specific tokenizers for speech and text. |
X. Yue; J. Ao; X. Gao; H. Li; |
2465 | TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization. Highlight: In EEND, speaker diarization is formulated as a multi-label prediction problem, where speaker activities are estimated independently and their dependencies are not well considered. To overcome these disadvantages, we employ the power set encoding to reformulate speaker diarization as a single-label classification problem and propose the overlap-aware EEND (EEND-OLA) model, in which speaker overlaps and dependencies can be modeled explicitly. |
J. Wang; Z. Du; S. Zhang; |
2466 | TopGFormer: Topological-Based Graph Transformer for Mapping Brain Structural Connectivity to Functional Connectivity. Highlight: To address the issue, we propose a novel Topological-based Graph Transformer (TopGFormer) to generate functional connectivity from structural connectivity with sufficient consideration of the topological properties of brain connectivity. |
D. Guo; K. Zhang; J. Li; Y. Kong; |
2467 | Top-K Visual Tokens Transformer: Selecting Tokens for Visible-Infrared Person Re-Identification. Highlight: However, the features extracted by CNN may contain useless identity-irrelevant information, which inevitably reduces their discriminability. To address this issue, this paper introduces a Top-K Visual Tokens Transformer (TVTR) framework which utilizes a top-k visual tokens selection module to accurately select top-k discriminative visual patches for reducing the distraction of identity-irrelevant information and learning discriminative features. |
B. Yang; J. Chen; M. Ye; |
2468 | Topological Signal Processing Over Weighted Simplicial Complexes. Highlight: Our goal in this paper is to present topological signal processing tools for weighted simplicial complexes. |
C. Battiloro; S. Sardellitti; S. Barbarossa; P. Di Lorenzo; |
2469 | Topological Slepians: Maximally Localized Representations of Signals Over Simplicial Complexes. Highlight: This paper introduces topological Slepians, i.e., a novel class of signals defined over topological spaces (e.g., simplicial complexes) that are maximally concentrated on the topological domain (e.g., over a set of nodes, edges, triangles, etc.) and perfectly localized on the dual domain (e.g., a set of frequencies). |
C. Battiloro; P. Di Lorenzo; S. Barbarossa; |
2470 | Topology Uncertainty Modeling For Imbalanced Node Classification on Graphs. Highlight: Hence, we propose Graph Topology Uncertainty (GraphTU), a novel probabilistic class-imbalanced solution specifically for graphs. |
J. Gao; J. Li; K. Zhang; Y. Kong; |
2471 | Topo-MLP: A Simplicial Network Without Message Passing. Highlight: While powerful, message passing can have disadvantages during inference, particularly when the higher-order connectivity information is missing or corrupted. To overcome such limitations, we propose Topo-MLP, a purely MLP-based simplicial neural network algorithm to learn the representation of elements in a simplicial complex without explicitly relying on message passing. |
K. N. Ramamurthy; A. Guzmán-Sáenz; M. Hajij; |
2472 | TorchAudio-Squim: Reference-Less Speech Quality and Intelligibility Measures in TorchAudio. Highlight: To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed. In this paper, we introduce tools and a set of models to estimate such known metrics using deep neural networks. |
A. Kumar; K. Tan; Z. Ni; P. Manocha; X. Zhang; E. Henderson; B. Xu; |
2473 | To Regularize or Not to Regularize: The Role of Positivity in Sparse Array Interpolation with A Single Snapshot. Highlight: In this work, we strengthen the sufficiency results by proving that, in the case of positive sources, it is possible to interpolate the nested array by performing a simple convex feasibility search instead of solving a rank minimization problem. |
M. C. Hücümenoğlu; P. Sarangi; R. Rajamäki; P. Pal; |
2474 | Toroidal Probabilistic Spherical Discriminant Analysis. Highlight: In this paper, we present toroidal PSDA (T-PSDA). |
A. Silnova; N. Brümmer; A. Swart; L. Burget; |
2475 | To Wake-Up or Not to Wake-Up: Reducing Keyword False Alarm By Successive Refinement. Highlight: One of the most challenging tasks in designing such systems is to reduce False Alarm (FA) which happens when the system falsely registers a keyword despite the keyword not being uttered. In this paper, we propose a simple yet elegant solution to this problem that follows from the law of total probability. |
Y. M. Saidutta; R. Sharma Srinivasa; C. -H. Lee; C. Yang; Y. Shen; H. Jin; |
2476 | Toward A Multimodal Approach for Disfluency Detection and Categorization. Highlight: In this paper, we evaluate the impact of using automatic speech recognition (ASR) transcripts for disfluency detection and categorization. |
A. Romana; K. Koishida; |
2477 | Toward Asymptotic Optimality: Sequential Unsupervised Regression of Density Ratio for Early Classification. Highlight: Theoretically-inspired sequential density ratio estimation (SDRE) algorithms are proposed for the early classification of time series. |
A. F. Ebihara; T. Miyagawa; K. Sakurai; H. Imaoka; |
2478 | Toward Auto-Evaluation With Confidence-Based Category Relation-Aware Regression. Highlight: In this paper, we propose a Confidence-based Category Relation-aware Regression (C2R2) method. |
J. Wang; J. Chen; B. Su; |
2479 | Toward Privacy-Enhancing Ambulatory-Based Well-Being Monitoring: Investigating User Re-Identification Risk in Multimodal Data. Highlight: The sensitivity of data collected via ambulatory monitoring, which regularly involves the recording of speech signals and sensor information, can raise serious privacy concerns. We investigate user re-identification risk in a corpus of such data collected to observe the interplay between behavior, physiology, and well-being of healthcare workers in their daily life. |
R. Pranjal; R. Seshadri; R. Kumar Sanath Kumar Kadaba; T. Feng; S. S. Narayanan; T. Chaspari; |
2480 | Towards Accurate and Real-Time End-of-Speech Estimation. Highlight: We introduce a variant of the endpoint (EP) detection problem in automatic speech recognition (ASR), which we call the end-of-speech (EOS) estimation. |
Y. Fan; C. Vaz; D. He; J. Heymann; V. A. Trinh; Z. Zhang; V. Ravichandran; |
2481 | Towards Adversarially Robust Continual Learning. Highlight: To fill in this research gap, we are the first to study adversarial robustness in continual learning and propose a novel method called Task-Aware Boundary Augmentation (TABA) to boost the robustness of continual learning models. |
T. Bai; C. Chen; L. Lyu; J. Zhao; B. Wen; |
2482 | Towards A More Stable and General Subgraph Information Bottleneck. Abstract: Graph Neural Networks (GNNs) have been widely applied to graph-structured data. However, the lack of interpretability impedes their practical deployment especially in high-risk … |
H. Liu; K. Zheng; S. Yu; B. Chen; |
2483 | Towards A Robust and Efficient Classifier for Real World Radio Signal Modulation Classification. Highlight: In this work, we propose a lightweight deep learning model that accurately and quickly classifies the modulation of signals having different types of distortions, without the need to be trained using distorted signals. |
D. Liu; K. Ergun; T. Š. Rosing; |
2484 | Towards A Unified Conformer Structure: from ASR to ASV Task. Highlight: Length-Scaled Attention (LSA) method and Sharpness-Aware Minimization (SAM) are adopted to improve model generalization. |
D. Liao; T. Jiang; F. Wang; L. Li; Q. Hong; |
2485 | Towards A Unified Training for Levenshtein Transformer. Highlight: By carefully designing experiments, our work reveals that the deletion module is under-trained while the insertion module is over-trained due to the imbalanced training signals for the two refinement modules. |
K. Zheng; L. Wang; Z. Wang; B. Chen; M. Zhang; Z. Tu; |
2486 | Towards Bandwidth Estimation for Graph Signal Reconstruction. Highlight: In this paper, we consider data on graphs modeled as bandlimited graph signals. |
A. Jayawant; A. Ortega; |
2487 | Towards Building Text-to-Speech Systems for The Next Billion Users. Highlight: In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. |
G. K. Kumar; P. S V; P. Kumar; M. M. Khapra; K. Nandakumar; |
2488 | Towards Controllable Audio Texture Morphing. Highlight: In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on soft-labels distilled from the penultimate layer of an audio classifier trained on a target set of audio texture classes. |
C. Gupta; P. Kamath; Y. Wei; Z. Li; S. Nanayakkara; L. Wyse; |
2489 | Towards Dialogue Modeling Beyond Text. Highlight: In this paper, we model aspects of communication beyond the words that are said. |
T. Wu; Y. Zhou; W. Ling; H. Yang; J. Veloso; L. Sun; R. Huang; N. Guimaraes; S. Sanner; |
2490 | Towards Diverse and Coherent Augmentation for Time-Series Forecasting. Highlight: As time-series data generated by real-life physical processes exhibit characteristics in both the time and frequency domains, we propose to combine Spectral and Time Augmentation (STAug) for generating more diverse and coherent samples. |
X. Zhang; R. Roy Chowdhury; J. Shang; R. Gupta; D. Hong; |
2491 | Towards Domain Generalisation in ASR with Elitist Sampling and Ensemble Knowledge Distillation. Highlight: This paper proposes to use an elitist sampling strategy at the output of ensemble teacher models to select the best-decoded utterance generated by completely out-of-domain teacher models, in order to generalize to unseen domains. |
R. Ahmad; M. A. Jalal; M. Umar Farooq; A. Ollerenshaw; T. Hain; |
2492 | Towards Efficient and Optimal Joint Beamforming and Antenna Selection: A Machine Learning Approach. Highlight: To avoid sub-optimal solutions, an effective branch and bound (B&B) algorithm is proposed. |
S. Shrestha; X. Fu; M. Hong; |
2493 | Towards Explainable Recommendation Via BERT-Guided Explanation Generator. Highlight: However, such information is not given in the practical scenario. To address this issue, we propose a novel Explainable recommender system with BERT-guided explanation generator, named ExBERT, to generate reliable explanations with finer granularity. |
H. Zhan; L. Li; S. Li; W. Liu; M. Gupta; A. C. Kot; |
2494 | Towards Hyperbolic Regularizers For Point Cloud Part Segmentation. Highlight: In this paper, we extend our earlier work, which showed how to use regularizers in the hyperbolic space to improve the performance of point cloud classification models, to the problem of part segmentation. |
A. Montanaro; D. Valsesia; E. Magli; |
2495 | Towards Improved Room Impulse Response Estimation for Speech Recognition. Highlight: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). |
A. Ratnarajah; I. Ananthabhotla; V. K. Ithapu; P. Hoffmann; D. Manocha; P. Calamia; |
2496 | Towards Improved Sonar Performance Using Environment-Informed Sparse Sub-Array Processing. Highlight: In this paper, we propose to enhance the approach in two ways. |
A. L’Her; A. Drémeau; F. L. Courtois; G. Real; X. Cristol; Y. Stéphan; |
2497 | Towards Interpretable Seizure Detection Using Wearables. Highlight: We propose SeizFt, a robust seizure detection framework using EEG from a wearable device. |
I. Al-Hussaini; C. S. Mitchell; |
2498 | Towards Learning Emotion Information from Short Segments of Speech. Highlight: While this approach has been successful, there is a growing interest in modelling speech emotion information at the short segment level, at around 250ms-500ms (e.g. the 2021-22 MuSe Challenges). This paper investigates both hand-crafted feature-based and end-to-end raw waveform DNN approaches for modelling speech emotion information in such short segments. |
T. Purohit; S. Yadav; B. Vlasenko; S. P. Dubagunta; M. Magimai.-Doss; |
2499 | Towards Low-Power Heart Rate Estimation Based on User’s Demographics and Activity Level For Wearables. Highlight: In this work, we propose a model based on linear regression and a Proportional–Integral–Derivative (PID) controller that uses an accelerometer and the user’s demographics to estimate HR. |
A. G. C. Pacheco; F. A. C. Cabello; A. M. O. Fonoff; P. G. Rodrigues; O. A. B. Penatti; P. R. Pinto; |
2500 | Towards Making A Trojan-Horse Attack on Text-to-Image Retrieval. Highlight: By contrast, we present in this paper the first study of a threat that occurs at the back end of a text-to-image retrieval (T2IR) system. |
F. Hu; A. Chen; X. Li; |
2501 | Towards Polymorphic Adversarial Examples Generation for Short Text. Highlight: Short texts are more susceptible to word substitution than long texts, which makes semantic shifting more likely to occur, and the number of words in a short text that can be modified is small, which makes the attack difficult to succeed and its naturalness and fluency hard to guarantee. To tackle the above problems, we present the Polymorphic Adversarial Examples Generation (PAEG) attack, a generative method that combines the pre-trained language model BERT with a Variational Autoencoder. |
Y. Liang; Z. Lin; F. Yuan; H. Zhang; L. Wang; W. Wang; |
2502 | Towards Practical Edge Inference Attacks Against Graph Neural Networks. Highlight: In this paper, we propose an edge inference attack in a more realistic and practical setting. |
K. Li; J. Sun; R. Chen; W. Ding; K. Yu; J. Li; C. Wu; |
2503 | Towards Privacy and Utility in Tourette Tic Detection Through Pretraining Based on Publicly Available Video Data of Healthy Subjects. Highlight: In this work, we aim to detect tics based on video data of patients with Gilles de la Tourette syndrome. |
N. Sophie Brügge; E. Mohammadi; A. Münchau; T. Bäumer; C. Frings; C. Beste; V. Roessner; H. Handels; |
2504 | Towards Realizing The Value of Labeled Target Samples: A Two-Stage Approach for Semi-Supervised Domain Adaptation. Highlight: Semi-Supervised Domain Adaptation (SSDA) is a recently emerging research topic that extends from the widely-investigated Unsupervised Domain Adaptation (UDA) by further having a few target samples labeled, i.e., the model is trained with labeled source samples, unlabeled target samples as well as a few labeled target samples. |
M. Jin; K. Li; S. Li; C. He; X. Li; |
2505 | Towards Real-Time Person Search with Invariant Feature Learning. Highlight: In this paper, we propose a novel real-time framework for both effective and efficient person search, termed InvarPS. |
C. Jia; M. Luo; Z. Dang; X. Chang; Q. Zheng; |
2506 | Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments. Highlight: In this paper, we explore low-complexity, resource-efficient, causal DNN architectures for real-time separation of two or more simultaneous speakers. |
J. Neri; S. Braun; |
2507 | Towards Reducing Patient Effort for The Automatic Prediction of Speech Intelligibility in Head and Neck Cancers. Highlight: We propose an automatic way to regress an intelligibility score based on a recurrent model with a self-attention mechanism. |
S. Quintas; A. Abad; J. Mauclair; V. Woisard; J. Pinquier; |
2508 | Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance. Highlight: Concretely, we propose a Depth-Guided Outpainting Network to model different feature representations of two modalities and learn the structure-aware cross-modal fusion. |
L. Zhang; C. Lin; K. Liao; Y. Zhao; |
2509 | Towards Robust Audio-Based Vehicle Detection Via Importance-Aware Audio-Visual Learning. Highlight: In this paper, we present a new audio-based vehicle detector that can transfer multimodal knowledge of vehicles to the audio modality during training. |
J. U. Kim; S. Tae Kim; |
2510 | Towards Robust Data-Driven Underwater Acoustic Localization: A Deep CNN Solution with Performance Guarantees for Model Mismatch. Highlight: In this work, we consider the recently proposed data-driven method [18] based on a deep convolutional neural network, and demonstrate that it can learn to localize in complex and mismatched environments. |
A. Weiss; A. C. Singer; G. W. Wornell; |
2511 | Towards Scale Adaptive Underwater Detection Through Refined Pyramid Grid. Highlight: Although most object detection methods have achieved impressive performance on several public benchmarks, detecting marine targets in underwater detection tasks remains challenging because of the inherent illumination inhomogeneity in underwater images. |
X. Deng; L. Liao; P. Jiang; Y. Qian; |
2512 | Towards Simultaneous Segmentation Of Liver Tumors And Intrahepatic Vessels Via Cross-Attention Mechanism. Highlight: To this end, we collect the first liver tumor and vessel segmentation benchmark dataset, containing 52 portal vein phase computed tomography images with liver, liver tumor, and vessel annotations. On this basis, we propose a 3D U-shaped Cross-Attention Network (UCA-Net) that utilizes a tailored cross-attention mechanism instead of the traditional skip connection to effectively model the encoder and decoder features. |
H. Kuang; D. Yang; S. Wang; X. Wang; L. Zhang; |
2513 | Towards Trustworthy Multi-Label Sewer Defect Classification Via Evidential Deep Learning. Highlight: In this paper, we propose a trustworthy multi-label sewer defect classification (TMSDC) method, which can quantify the uncertainty of sewer defect prediction via evidential deep learning. |
C. Zhao; C. Hu; H. Shao; Z. Wang; Y. Wang; |
2514 | Towards Trustworthy Phoneme Boundary Detection with Autoregressive Model and Improved Evaluation Metric. Highlight: Phoneme boundary detection has been studied due to its central role in various speech applications. In this work, we point out that this task needs to be addressed not only algorithmically but also through its evaluation metric. |
H. Kim; H. -S. Choi; |
2515 | Towards Zero-Shot Code-Switched Speech Recognition. Highlight: In this work, we seek to build effective code-switched (CS) automatic speech recognition (ASR) systems under the zero-shot setting where no transcribed CS speech data is available for training. |
B. Yan; M. Wiesner; O. Klejch; P. Jyothi; S. Watanabe; |
2516 | Towards Zero-Shot Personalized Table-to-Text Generation with Contrastive Persona Distillation. Highlight: However, few of them shed light on generating personalized expressions, which often requires well-aligned persona-table-text datasets that are difficult to obtain. To overcome these obstacles, we explore personalized table-to-text generation under a zero-shot setting, by assuming no well-aligned persona-table-text triples are required during training. |
H. Zhan; X. Lin; S. Cui; Z. Zhao; W. Zhou; H. Chen; |
2517 | Toward Universal Text-To-Music Retrieval. Highlight: This paper introduces effective design choices for text-to-music retrieval systems. |
S. Doh; M. Won; K. Choi; J. Nam; |
2518 | Tracking Objects and Activities with Attention for Temporal Sentence Grounding. Highlight: In this paper, we introduce a new perspective to address the TSG task by tracking pivotal objects and activities to learn more fine-grained spatio-temporal behaviors. |
Z. Xiong; D. Liu; P. Zhou; J. Zhu; |
2519 | Tracking Targets in Hyper-Scale Cameras Using Movement Predication. Highlight: At its core, HyMOT builds a probabilistic target movement graph with tempo-spatial correlation knowledge extracted from historical statistics. With the graph, we formulate tracking scheduling as an optimization problem with efficiency-accuracy tradeoff constraints and solve this NP-hard problem with a greedy strategy. |
J. Yu; T. Zhou; Z. Cai; W. Kuang; |
2520 | Training Graph Neural Networks on Growing Stochastic Graphs. Highlight: In this paper, we propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon. |
J. Cerviño; L. Ruiz; A. Ribeiro; |
2521 | Training Large-Vocabulary Neural Language Models By Private Federated Learning for Resource-Constrained Devices. Highlight: We propose Partial Embedding Updates (PEU), a novel technique to reduce the impact of DP-noise by decreasing payload size. |
M. Xu; C. Song; Y. Tian; N. Agrawal; F. Granqvist; R. van Dalen; X. Zhang; A. Argueta; S. Han; Y. Deng; L. Liu; A. Walia; A. Jin; |
2522 | Training Neural Networks for Sequential Change-Point Detection. Highlight: We introduce a novel approach for online change-point detection using neural networks. |
J. Lee; Y. Xie; X. Cheng; |
2523 | Training Robust Spiking Neural Networks on Neuromorphic Data with Spatiotemporal Fragments. Highlight: In this paper, we propose a novel Event Spatio Temporal Fragments (ESTF) augmentation method. |
H. Shen; Y. Luo; X. Cao; L. Zhang; J. Xiao; T. Wang; |
2524 | Training Robust Spiking Neural Networks with Viewpoint Transform and Spatiotemporal Stretching. Highlight: In this paper, we propose a novel data augmentation method, View-Point Transform and SpatioTemporal Stretching (VPT-STS). |
H. Shen; J. Xiao; Y. Luo; X. Cao; L. Zhang; T. Wang; |
2525 | Training Set Cleansing of Backdoor Poisoning By Self-Supervised Representation Learning. Highlight: Here we focus on image classification tasks and show that supervised training may build stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin. |
H. Wang; S. Karami; O. Dia; H. Ritter; E. Emamjomeh-Zadeh; J. Chen; Z. Xiang; D. J. Miller; G. Kesidis; |
2526 | Training Sound Event Detection with Soft Labels from Crowdsourced Annotations. Highlight: In this paper, we study the use of soft labels to train a system for sound event detection (SED). |
I. Martín-Morató; M. Harju; P. Ahokas; A. Mesaros; |
2527 | Training Stronger Spiking Neural Networks with Biomimetic Adaptive Internal Association Neurons. Highlight: In this paper, we propose a novel Adaptive Internal Association (AIA) neuron model to establish previously ignored influences within neurons. |
H. Shen; Y. Luo; X. Cao; L. Zhang; J. Xiao; T. Wang; |
2528 | TransAdapt: A Transformative Framework for Online Test Time Adaptive Semantic Segmentation. Highlight: To tackle online settings, we propose TransAdapt, a framework that uses transformer and input transformations to improve segmentation performance. |
D. Das; S. Borse; H. Park; K. Azarian; H. Cai; R. Garrepalli; F. Porikli; |
2529 | TransAudio: Towards The Transferable Adversarial Audio Attack Via Learning Contextualized Perturbations. Highlight: Existing attack methods are mostly investigated in voice assistant scenarios with restricted voice commands, which limits their applicability to more general ASR-related applications. To tackle this challenge, we propose a novel contextualized attack with deletion, insertion, and substitution adversarial behaviors, namely TransAudio, which achieves arbitrary word-level attacks based on the proposed two-stage framework. |
G. Qi; Y. Chen; Y. Zhu; B. Hui; X. Li; X. Mao; R. Zhang; H. Xue; |
2530 | Transceiver Design for MIMO-DFRC Systems. Highlight: This paper addresses joint design of the transmitting waveform and the receivers of a dual-function radar-communication (DFRC) system that enables both multiple-input multiple-output (MIMO) radar sensing and multi-user multiple-input single-output (MU-MISO) communications. |
C. Wen; T. N. Davidson; |
2531 | Transcription Free Filler Word Detection with Neural Semi-CRFs. Highlight: In this work, we investigate a filler word detection system that does not depend on ASR systems. |
G. Zhu; Y. Yan; J. -P. Caceres; Z. Duan; |
2532 | Transductive Matrix Completion with Calibration for Multi-Task Learning. Highlight: In this paper, we propose a transductive matrix completion algorithm that incorporates a calibration constraint for the features under the multi-task learning framework. |
H. Wang; Y. Zhang; X. Mao; Z. Wang; |
2533 | Transferring Quantified Emotion Knowledge for The Detection of Depression in Alzheimer’s Disease Using ForestNets. Highlight: This paper presents a transfer learning strategy for automatically detecting Alzheimer’s disease (AD) and depression in AD patients using acoustic information and ForestNet, an artificial neural network that allows computing the contribution of a set of features to a model’s decision. |
P. A. Pérez-Toro; D. Rodríguez-Salas; T. Arias-Vergara; S. P. Bayerl; P. Klumpp; K. Riedhammer; M. Schuster; E. Nöth; A. Maier; J. R. Orozco-Arroyave; |
2534 | Transformer-Based Bioacoustic Sound Event Detection on Few-Shot Learning Tasks. Highlight: We present an approach that combines the audio spectrogram transformer (AST), a data augmentation regime and transductive inference to detect sound events on the DCASE2022 (Task 5) dataset. |
L. You; E. P. Coyotl; S. Gunturu; M. Van Segbroeck; |
2535 | Transformer-Based Deep Hashing Method for Multi-Scale Feature Fusion. Highlight: This results in inefficient model computation and the neglect of useful image information. Therefore, this paper proposes a Transformer-based deep hashing method for multi-scale feature fusion (TDH). |
C. He; H. Wei; |
2536 | Transformer-Based Multi-Prototype Approach for Diabetic Macular Edema Analysis in OCT Images. Highlight: Thus, in this work, we propose a novel approach based on multi-prototype networks with vision transformers to obtain an example-based explainable classification. |
P. L. Vidal; J. de Moura; J. Novo; M. Ortega; J. S. Cardoso; |
2537 | Transformer-based Tracking Network for Maneuvering Targets. Highlight: To solve the state estimation problem of strong maneuvering targets, we propose a transformer-based tracking network, named TrTNet. |
Y. Zhang; G. Li; X. -P. Zhang; Y. He; |
2538 | Transient Dictionary Learning for Compressed Time-of-Flight Imaging. Highlight: In this work, we step aside from mainstream deep learning methods to invert the problem and propose exploiting underlying sparsity in an appropriate basis, in combination with compressive sampling schemes. |
M. H. Conde; |
2539 | TransLink: Transformer-Based Embedding for Tracklets’ Global Link. Highlight: In this paper, we propose a Transformer-based tracklet linking method called TransLink to mitigate the association failures. |
Y. Zhang; S. Wang; Y. Fan; G. Wang; C. Yan; |
2540 | Transmit Energy Focusing For Parameter Estimation in Transmit Beamspace Slow-Time MIMO Radar. Highlight: Recently, the Parallel Factor-Direct (PARAFAC-Direct) method has been proposed for parameter estimation, including velocity disambiguation, for Doppler Division Multiple Access (DDMA) Multiple-Input Multiple-Output (MIMO) radar. |
T. Zhang; F. Xu; S. A. Vorobyov; |
2541 | TransPlayer: Timbre Style Transfer with Flexible Timbre Control. Highlight: Inspired by the practice in voice conversion, we propose TransPlayer, which uses an autoencoder model with one-hot representations of instruments as the condition, and a DiffWave model trained specifically for music synthesis. |
Y. Wu; Y. He; X. Liu; Y. Wang; R. B. Dannenberg; |
2542 | TransWnet: Integrating Transformers Into CNNs Via Row and Column Attention for Abdominal Multi-Organ Segmentation. Highlight: Most existing U-shaped structure methods use feature fusion to address these two challenges, but still lack the ability to balance capturing global relationships and local details. To address these issues, we propose a novel multi-organ segmentation framework called TransWnet to mine global relationships and local details from both intra- and inter-scale perspectives. |
Y. Xie; Y. Huang; Y. Zhang; X. Li; X. Ye; K. Hu; |
2543 | Tree-Like Interaction Learning for Bundle Recommendation. Highlight: We argue that hyperbolic space provides a promising way to obtain accurate entity embeddings, and accordingly propose a novel bundle recommendation model. |
H. Ke; L. Li; P. Wang; J. Yuan; X. Tao; |
2544 | TreeXGNN: Can Gradient-boosted Decision Trees Help Boost Heterogeneous Graph Neural Networks? Highlight: In this study, we propose a novel framework, the tree-boosted heterogeneous graph neural network (TreeXGNN), which can efficiently and automatically extract target node features via gradient-boosted decision trees (GBDT). |
M. -Y. Hong; S. -Y. Chang; H. -W. Hsu; Y. -H. Huang; C. -Y. Wang; C. Lin; |
2545 | TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion. Highlight: In this study, we propose Triple Adaptive Attention Normalization VC (TriAAN-VC), comprising an encoder-decoder and an attention-based adaptive normalization block, that can be applied to non-parallel any-to-any VC. |
H. J. Park; S. Woo Yang; J. S. Kim; W. Shin; S. W. Han; |
2546 | TriCL: Triplet Continual Learning. Highlight: Besides, the prototypes rapidly become outdated as the agent adapts to new data sequentially, and the previous example embeddings spread out in an unforeseen way, which exacerbates forgetting (i.e., concept drift). Based on this observation, we propose a replay-based method, called TriCL, which gathers embeddings near the prototype of their own class and separates them from the prototypes of other classes. |
X. Zhang; G. Wang; X. Zhang; H. Liu; Z. Yin; W. Yang; |
2547 | TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty. Highlight: In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models. |
X. Song; D. Wu; Z. Wu; B. Zhang; Y. Zhang; Z. Peng; W. Li; F. Pan; C. Zhu; |
2548 | TriNet: Stabilizing Self-Supervised Learning From Complete or Slow Collapse. Highlight: We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pretraining. |
L. Cao; J. Wang; B. Yang; D. Su; D. Yu; |
2549 | TrOMR: Transformer-Based Polyphonic Optical Music Recognition. Highlight: In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. |
Y. Li; H. Liu; Q. Jin; M. Cai; P. Li; |
2550 | TRUSTERA: A Live Conversation Redaction System. Highlight: We introduce Trustera, the first functional system that redacts personally identifiable information (PII) in real-time spoken conversations, so that agents no longer need to hear sensitive information, while preserving the naturalness of live customer-agent conversations. |
E. Gouvêa; A. Dadgar; S. Jalalvand; R. Chengalvarayan; B. Jayakumar; R. Price; N. Ruiz; J. McGovern; S. Bangalore; B. Stern; |
2551 | Trust Your Partner’s Friends: Hierarchical Cross-Modal Contrastive Pre-Training for Video-Text Retrieval. Highlight: In this work, we propose to leverage the well-represented information of each original modality and exploit complementary information in two views of the same video, i.e., video clips and captions, by using one view to obtain positive samples with the neighboring samples of the other. |
Y. Xiang; K. Liu; S. Tang; L. Bai; F. Zhu; R. Zhao; X. Lin; |
2552 | TSpeech-AI System Description to The 5th Deep Noise Suppression (DNS) Challenge. Highlight: This report presents the development of Tencent AI Lab’s personalized speech enhancement system for the 2023 ICASSP Signal Processing Grand Challenge – deep noise suppression (DNS) challenge, which includes the use of a modified band-split recurrent neural network (BSRNN) and a multi-resolution spectrogram discriminator to improve perceptual quality metrics. |
J. Yu; H. Chen; Y. Luo; R. Gu; W. Li; C. Weng; |
2553 | TSPTQ-ViT: Two-Scaled Post-Training Quantization for Vision Transformer. Highlight: To reduce the quantization loss and improve classification accuracy, we propose a two-scaled post-training quantization scheme for vision transformer (TSPTQ-ViT). |
Y. -S. Tai; M. -G. Lin; A. -Y. A. Wu; |
2554 | TT-Net: Dual-Path Transformer Based Sound Field Translation in The Spherical Harmonic Domain. Highlight: Due to the problems mentioned above, we propose a neural network scheme based on the dual-path transformer. |
Y. Wang; Z. Lan; X. Wu; T. Qu; |
2555 | Twitter Stance Detection Via Neural Production Systems. Highlight: In this paper, we develop an interpretable neural production system for stance detection (NPS4SD). |
B. Zhang; D. Ding; G. Xu; J. Guo; Z. Huang; X. Huang; |
2556 | Two-Branch Multi-Scale Deep Neural Network for Generalized Document Recapture Attack Detection. Highlight: Since current learning-based methods suffer from serious over-fitting, in this paper we propose a novel two-branch deep neural network that mines better-generalized recapture artifacts with a designed frequency filter bank and a multi-scale cross-attention fusion module. |
J. Li; C. Kong; S. Wang; H. Li; |
2557 | Two-Phase Prototypical Contrastive Domain Generalization for Cross-Subject EEG-Based Emotion Recognition. Highlight: In this paper, a two-phase prototypical contrastive domain generalization framework (PCDG) is proposed for cross-subject EEG-based emotion recognition, which mainly consists of a new convolutional neural network based on a residual block and a CBAM block and a two-phase prototypical representation-based contrastive learning method. |
H. Cai; J. Pan; |
2558 | Two-Stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge. Highlight: In the ICASSP 2023 speech signal improvement challenge, we developed a dual-stage neural model that improves speech signal quality degraded by different distortions in a stage-wise, divide-and-conquer fashion. |
M. Liu; S. Lv; Z. Zhang; R. Han; X. Hao; X. Xia; L. Chen; Y. Xiao; L. Xie; |
2559 | Two-Stage UNet with Multi-Axis Gated Multilayer Perceptron for Monaural Noisy-Reverberant Speech Enhancement. Highlight: To combine advantages and improve speech enhancement performance, we propose a two-stage UNet (TSUNet) to estimate complex spectral masking and complex spectral mapping. |
Z. Zhang; S. Xu; X. Zhuang; L. Zhou; H. Li; M. Wang; |
2560 | Two-Stage Video De-Raining with Spatio-Temporal Fusion and Illumination-Invariant Detail Preservation. Highlight: Although numerous video de-raining methods have been developed with encouraging performance, two major challenges remain unsatisfactorily solved: 1) how to sufficiently exploit useful spatio-temporal information from adjacent rainy frames to facilitate rain removal, and 2) how to preserve background details even in videos with illumination variance. To address these challenges, this paper develops a new two-stage video de-raining method that integrates two modules beneficial for the video de-raining task, namely a Spatio-Temporal Fusion (STF) module and an Illumination-Invariant Detail Preservation (IIDP) module. |
Y. Tan; Y. Xiang; L. Cai; P. Wang; Y. Zhang; Y. Fu; |
2561 | Two-Step Band-Split Neural Network Approach For Full-Band Residual Echo Suppression. Highlight: This paper describes a Two-step Band-split Neural Network (TBNN) approach for full-band acoustic echo cancellation. |
Z. Zhang; S. Zhang; M. Liu; Y. Leng; Z. Han; L. Chen; L. Xie; |
2562 | Two-Stream Decoder Feature Normality Estimating Network for Industrial Anomaly Detection. Highlight: Moreover, these approaches are not explicitly optimized for distinguishable anomalies. To address these problems, we propose a two-stream decoder network (TSDN), designed to learn both normal and abnormal features. |
C. Park; M. Lee; S. Cho; D. Kim; S. Lee; |
2563 | Two-Stream Joint-Training for Speaker Independent Acoustic-to-Articulatory Inversion. Highlight: To this end, we propose a novel network called SPN that uses two different streams to carry out the AAI task. |
J. Wang; J. Liu; X. Li; M. Yu; J. Gao; Q. Fang; L. Liu; |
2564 | UAV Local Path Planning Based on Improved Proximal Policy Optimization Algorithm. Highlight: Moreover, the Actor-Critic framework of PPO suffers from high variance. To address the above issues, we propose a Delayed-policy-update PPO with a Prioritized Replay of Recent experience (DPPO-PR2) for local path planning. |
J. Xu; X. Yan; C. Peng; X. Wu; L. Gu; Y. Niu; |
2565 | UAV Remote Sensing Image Dehazing Based on Multi-Dimensional Saliency Awareness Unequal Network. Highlight: In this paper, we propose a multi-dimensional saliency awareness unequal network to avoid texture loss and color distortions. |
R. Zheng; L. Zhang; |
2566 | U-Beat: A Multi-Scale Beat Tracking Model Based on Wave-U-Net. Highlight: In this paper, we propose a multi-scale model for beat tracking based on the Wave-U-Net model. |
T. Cheng; M. Goto; |
2567 | Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition. Highlight: This work proposes a new Uconv-Conformer architecture based on the standard Conformer model. |
A. Andrusenko; R. Nasretdinov; A. Romanenko; |
2568 | UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction. Highlight: In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. |
J. Guo; M. Wang; X. Qiao; D. Wei; H. Shang; Z. Li; Z. Yu; Y. Li; C. Su; M. Zhang; S. Tao; H. Yang; |
2569 | UFO2: A Unified Pre-Training Framework for Online and Offline Speech Recognition. Highlight: In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) simplifies the two separate training workflows for online and offline modes into one process, and 2) improves the Word Error Rate (WER) performance with limited utterance annotations. |
L. Fu; S. Li; Q. Li; L. Deng; F. Li; L. Fan; M. Chen; X. He; |
2570 | Ultimate Negative Sampling for Contrastive Learning. Highlight: In this paper, we propose a novel ultimate negative sampling method for contrastive learning. |
H. Guo; L. Shi; |
2571 | Ultra Real-Time Portrait Matting Via Parallel Semantic Guidance. Highlight: In this paper, we propose an ultra-lightweight portrait matting network via parallel semantic guidance (PSGNet) for real-time portrait matting without any auxiliary inputs. |
X. Huang; J. Xie; B. Xu; H. Huang; Z. Li; C. Lu; Y. Guo; Y. Tang; |
2572 | Ultrasound Image Quality Control Using Speech-Assisted Switchable CycleGAN. Highlight: Unfortunately, there are many cases where such real-time control of IQ is difficult, especially in the intensive care unit (ICU) or operating room (OR), since the operator must simultaneously treat patients under sterile conditions and adjust the system parameters. To address this, inspired by the recent success of Switchable CycleGAN using Adaptive Instance Normalization (AdaIN) layers, we propose a novel speech-assisted Switchable CycleGAN architecture that can be controlled by the operator’s verbal commands. |
J. Huh; S. Khan; E. Sun Lee; J. Chul Ye; |
2573 | UML: A Universal Monolingual Output Layer For Multilingual ASR. Highlight: For multilingual ASR, due to the differences in written scripts across languages, multilingual wordpiece models (WPMs) lead to overly large output layers and make it difficult to scale to more languages. In this work, we propose a universal monolingual output layer (UML) to address such problems. |
C. Zhang; B. Li; T. N. Sainath; T. Strohman; S. -Y. Chang; |
2574 | Unbiased Unsupervised Stimulus Reconstruction for EEG-Based Auditory Attention Decoding. Highlight: The vast majority of research has focused on developing supervised AAD algorithms in which the decoder is trained based on ground truth labels about the attention to each speaker. |
N. Heintz; S. Geirnaert; T. Francart; A. Bertrand; |
2575 | Uncer2Natural: Uncertainty-Aware Unsupervised Image Denoising. Highlight: The presence of aleatoric uncertainty degrades the reconstructed target pixels, resulting in high uncertainty (i.e., low confidence) for these pixels, which in turn leads to sub-optimal denoising results. To address this problem, we propose a novel uncertainty-aware unsupervised image denoising method named Uncer2Natural (U2N). |
C. Huang; W. Tan; J. Shi; Z. Xing; B. Yan; |
2576 | Uncertainty-Aware Few-Shot Class-Incremental Learning. Highlight: Besides, in order to imitate the way humans learn and improve the continuous representation ability, we propose a pseudo-incremental task construction mechanism based on uncertainty estimation, in which the model learns to recognize samples from simple to difficult. |
J. Zhu; J. Zhao; J. Zhou; L. He; J. Yang; Z. Zhang; |
2577 | Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models. Highlight: Instead, in this work, we propose to quantify the uncertainty associated with clean speech estimates in neural network-based speech enhancement. |
H. Fang; T. Gerkmann; |
2578 | Understandable ReLU Neural Network For Signal Classification. Highlight: This paper proposes a constrained neural network model that replaces polyhedra with orthotopes: each hidden neuron processes only a single component of the input signal. |
M. Guyomard; S. Barbosa; L. Fillatre; |
2579 | Understanding Shared Speech-Text Representations. Highlight: In this paper, we expand our understanding of the resulting shared speech-text representations with two types of analyses. |
G. Wang; K. Kastner; A. Bapna; Z. Chen; A. Rosenberg; B. Ramabhadran; Y. Zhang; |
2580 | Underwater Image Restoration with Light-Aware Progressive Network. Highlight: It is difficult to obtain images with both color equalization and rich texture in various underwater scenes. To alleviate these issues, this paper proposes a reflected light-aware multi-scale progressive restoration network. |
J. Yang; C. Li; X. Li; |
2581 | UNeXt: A Low-Dose CT Denoising UNet Model with The Modified ConvNeXt Block. Highlight: In this paper, low-dose CT (LDCT) images are denoised with a novel UNet-based convolutional neural network (CNN) architecture and compared with normal-dose CT (NDCT) images. |
F. N. Mazandarani; P. Babyn; J. Alirezaie; |
2582 | Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers. Highlight: This work introduces three mobile-device deployable models named Unified Transformers (UiT). |
H. Dinkel; Y. Wang; Z. Yan; J. Zhang; Y. Wang; |
2583 | Unified Prompt Learning Makes Pre-Trained Language Models Better Few-Shot Learners. Highlight: In this paper, we propose an efficient few-shot learning method to dynamically decide the degree to which task-specific and instance-dependent information are incorporated according to different task and instance characteristics, enriching the prompt with task-specific and instance-dependent information. |
F. Jin; J. Lu; J. Zhang; |
2584 | Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation. Highlight: However, they degrade significantly under realistic noisy conditions, since background noise can be mistaken for a speaker’s speech and thus interfere with the separated sources. To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness. |
Y. Hu; C. Chen; H. Zou; X. Zhong; E. S. Chng; |
2585 | Unique Bispectrum Inversion for Signals with Finite Spectral/Temporal Support. Highlight: Conventional methods do not provide an accurate inversion of the bispectrum to the underlying signal. In this paper, we present an approach that uniquely recovers signals with finite spectral support (band-limited signals) from at least 3B measurements of their bispectrum function (BF), where B is the signal’s bandwidth. |
S. Pinilla; K. V. Mishra; B. M. Sadler; |
2586 | Unitary ESPRIT for Coprime Arrays. Highlight: In this paper, ESPRIT is applied separately to each ULA to yield a set of residues of DOAs, and residues from the ULAs are paired to resolve DOAs. |
P. -C. Chen; P. P. Vaidyanathan; |
2587 | Universal Speaker Recognition Encoders for Different Speech Segments Duration. Highlight: We describe our simple recipe for training a universal speaker encoder for any selected neural network architecture. |
S. Novoselov; V. Volokhov; G. Lavrentyeva; |
2588 | Unlimited Sampling in Phase Space. Highlight: Our main contribution in this paper is a novel modulo acquisition pipeline in phase space and a mathematically guaranteed recovery algorithm that is also backwards compatible with Fourier domain theory. |
P. Zhang; A. Bhandari; |
2589 | Unlimited Sampling of FRI Signals Independent of Sampling Rate. Highlight: In contrast, in this paper, we consider non-bandlimited signals, in particular, sparse inputs with finite-rate-of-innovation (FRI). |
R. Guo; A. Bhandari; |
2590 | Unlimited Sampling Radar: Life Below The Quantization Noise. Highlight: In this paper, the trade-off between the quantization noise and the dynamic range of ADCs used to acquire radar signals is revisited using the Unlimited Sensing Framework (USF) in a practical setting. |
T. Feuillen; B. Shankar MRR; A. Bhandari; |
2591 | Unobtrusive Respiratory Monitoring System for Intensive Care. Abstract: The video-based non-contact respiration detection technology can be used in many application scenarios to unobtrusively and ubiquitously monitor the physical state of living … |
X. Tan; M. Hu; G. Zhai; Y. Zhu; W. Li; X. Zhang; |
2592 | Unrestricted Anchor Graph Based GCN for Incomplete Multi-View Clustering. Highlight: It is a real challenge to capture the graph structure of incomplete views for a GCN to process, especially when the missing rate is high. To address this issue, this paper proposes a novel and effective graph construction method called the unrestricted anchor graph (UAG). |
L. Zhao; Z. Wang; Y. Yuan; F. Ding; |
2593 | Unrolled Fourier Disparity Layer Optimization for Scene Reconstruction from Few-Shots Focal Stacks. Highlight: This paper presents a novel unrolled optimization method to reconstruct a dense light field from a focal stack containing only very few images captured with different focus. |
B. L. Bon; M. Le Pendu; C. Guillemot; |
2594 | Unsupervised Action Segmentation of Untrimmed Egocentric Videos. Highlight: This work proposes a novel approach for unsupervised activity segmentation that detects frames corrupted by ego-motion and estimates action boundaries using kernel change-point detection. |
S. Perochon; L. Oudre; |
2595 | Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-Based Approach. Highlight: In this paper, we propose AEGAN-AD, a fully unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input spectrograms. |
A. Jiang; W. -Q. Zhang; Y. Deng; P. Fan; J. Liu; |
2596 | Unsupervised Deep Digital Staining for Microscopic Cell Images Via Knowledge Distillation. Highlight: In this work, we propose a novel unsupervised deep learning framework for the digital staining of cell images using knowledge distillation and generative adversarial networks (GANs). |
Z. Xu; L. Guo; S. Zhang; A. C. Kot; B. Wen; |
2597 | Unsupervised Domain Adaptation for Preference Learning Based Speech Emotion Recognition. Highlight: It is desirable to build a preference learning framework that ranks speech samples according to emotional attribute values that generalize well to new domains. |
A. R. Naini; M. A. Kohler; C. Busso; |
2598 | Unsupervised Domain Adaptation Via Subspace Interpolating Deep Dictionary Learning: A Case Study in Machine Inspection. Highlight: For settings where labeled data is available only in the source domain, this work presents an unsupervised domain adaptation method based on subspace interpolation using deep dictionary learning. |
K. Kumar; A. Majumdar; A. A. Kumar; M. Girish Chandra; |
2599 | Unsupervised Extractive Summarization With Heterogeneous Graph Embeddings for Chinese Documents. Highlight: In this paper, we are the first to propose an unsupervised extractive summarization method with heterogeneous graph embeddings (HGEs) for Chinese documents. |
C. Lin; Y. Liu; S. An; D. Yin; |
2600 | Unsupervised Feature Selection with Self-Weighted and ℓ2,0-Norm Constraint. Highlight: However, most of them assume that all features are equally important. To address this problem, we propose a novel feature selection module that simultaneously learns a feature weight matrix, a similarity graph structure and a projection matrix, so that the local structure after feature weighting and sparse subspace projection is preserved. |
Y. Yuan; Z. Wang; F. Nie; X. Li; |
2601 | Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models. Highlight: Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget. |
R. Gody; D. Harwath; |
2602 | Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition. Highlight: In this paper, the learning hidden unit contributions (LHUC) based adaptation techniques with compact speaker dependent (SD) parameters are used to facilitate both speaker adaptive training (SAT) and unsupervised test-time speaker adaptation for end-to-end (E2E) lattice-free MMI (LF-MMI) models. |
X. Xie; X. Liu; H. Chen; H. Wang; |
2603 | Unsupervised Noise Adaptation Using Data Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the unsupervised noise adaptation problem in speech enhancement, where the ground truth of target domain data is completely unavailable. |
C. Chen; Y. Hu; H. Zou; L. Sun; E. S. Chng; |
2604 | Unsupervised Out-of-Distribution Detection Using Few In-Distribution Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Finally, we present an extensive evaluation on three in-distribution (ID) datasets and three out-of-distribution (OOD) datasets. |
C. Gautam; A. Kane; S. Ramasamy; S. Sundaram; |
2605 | Unsupervised Pre-Training for Data-Efficient Text-to-Speech on Low Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data. |
S. Park; M. Song; B. Kim; T. -H. Oh; |
2606 | Unsupervised Speaker Verification Using Pre-Trained Model and Label Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present a novel strategy for unsupervised speaker verification using the Sub-structure of Pre-Trained Model (Sub-PTM), which consists of a CNN-based feature extractor and several Transformer blocks. |
Z. Chen; J. Wang; W. Hu; L. Li; Q. Hong; |
2607 | Unsupervised Video Anomaly Detection For Stereotypical Behaviours in Autism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Correspondingly, we propose a Dual Stream deep model for Stereotypical Behaviours Detection, DS-SBD, based on the temporal trajectory of human poses and the repetition patterns of human actions. |
J. Gao; X. Jiang; Y. Yang; D. Li; L. Qiu; |
2608 | Unsupervised Vocal Dereverberation with Diffusion-Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent supervised dereverberation methods may fail because they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training in order to be generalizable to unseen observations during inference. To resolve these problems, we propose an unsupervised method that can remove a general kind of artificial reverb for music without requiring pairs of data for training. |
K. Saito; N. Murata; T. Uesaka; C. -H. Lai; Y. Takida; T. Fukui; Y. Mitsufuji; |
2609 | Unsupervised Voice Type Discrimination Score Adaptation Using X-Vector Clusters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing work has described methods for performing the VTD task. This paper presents a method for adapting the output of these existing methods in an unsupervised manner via x-vector clustering and correlation. |
M. Lindsey; T. Vuong; R. M. Stern; |
2610 | Unsupervised Word Segmentation Based on Word Influence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Combined with the fine-tuning word segmentation task, a multilingual unsupervised word segmentation model was proposed. |
R. Yan; H. Zhang; W. Silamu; A. Hamdulla; |
2611 | Unsupervised Word Segmentation Using Temporal Gradient Pseudo-Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To extend their effectiveness to unsupervised word segmentation, we propose a pseudo-labeling strategy. |
T. S. Fuchs; Y. Hoshen; |
2612 | UNTAG: Learning Generic Features for Unsupervised Type-Agnostic Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel framework for unsupervised type-agnostic deepfake detection called UNTAG. |
N. Mejri; E. Ghorbel; D. Aouada; |
2613 | Untargeted Backdoor Attack Against Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Currently, most existing backdoor attacks target image classification in a targeted manner. In this paper, we reveal that these threats can also arise in object detection, posing serious risks to many mission-critical applications (e.g., pedestrian detection and intelligent surveillance systems). |
C. Luo; Y. Li; Y. Jiang; S. -T. Xia; |
2614 | UPGLADE: Unplugged Plug-and-Play Audio Declipper Based on Consensus Equilibrium of DNN and Sparse Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel audio declipping method that fuses sparse-optimization-based and deep neural network (DNN)-based methods. |
T. Tanaka; K. Yatabe; Y. Oikawa; |
2615 | URM4DMU: An User Representation Model for Darknet Markets Users Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works mainly use CNNs to model the text of posts, but fail to effectively model posts whose length varies frequently within an episode. To address these problems, we propose a model named URM4DMU (User Representation Model for Darknet Markets Users), which mainly improves the post representation by augmenting convolutional operators and self-attention with an adaptive gate mechanism. |
H. Liu; J. Zhao; Y. Huo; Y. Wang; C. Liao; L. Shen; S. Cui; J. Shi; |
2616 | U-Shiftformer: Brain Tumor Segmentation Using A Shifted Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we proposed a network structure based on the shifted attention mechanism, namely U-Shiftformer, to overcome the limitation of existing convolution neural networks (CNNs) in brain tumor segmentation that lacks multimodal information interaction. |
C. -W. Lin; Z. Chen; |
2617 | Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to overcome catastrophic forgetting (CF) in E2E ASR by inserting adapters into our model: small modules with few parameters that allow a general model to be fine-tuned to a specific task. |
S. Vander Eeckt; H. Van Hamme; |
2618 | Using Auxiliary Tasks In Multimodal Fusion of Wav2vec 2.0 And Bert for Multimodal Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The lack of data and the difficulty of multimodal fusion have always been challenges for multimodal emotion recognition (MER). In this paper, we propose to use pre-trained models as upstream networks (wav2vec 2.0 for the audio modality and BERT for the text modality) and fine-tune them on the downstream MER task to cope with the lack of data. |
D. Sun; Y. He; J. Han; |
2619 | Using Emotion Embeddings to Transfer Knowledge Between Emotions, Languages, and Annotation Formats Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A shared model between different configurations would enable the sharing of knowledge and a decrease in training costs, and would simplify the process of deploying emotion recognition models in novel environments. In this work, we study how we can build a single model that can transition between these different configurations by leveraging multilingual models and Demux, a transformer-based model whose input includes the emotions of interest, enabling us to dynamically change the emotions predicted by the model. |
G. Chochlakis; G. Mahajan; S. Baruah; K. Burghardt; K. Lerman; S. Narayanan; |
2620 | Using Machine Learning to Understand The Relationships Between Audiometric Data, Speech Perception, Temporal Processing, And Cognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We constructed ensemble models for 443 participants who varied in age and hearing loss. |
R. M. Khalil; A. Papanicolaou; R. T. Chou; B. E. Gibbs; S. Anderson; S. Gordon-Salant; M. P. Cummings; M. J. Goupell; |
2621 | Using Modified Adult Speech As Data Augmentation for Child Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel way of applying data augmentation for child speech recognition in the low data resource scenario. |
Z. Fan; X. Cao; G. Salvi; T. Svendsen; |
2622 | Using Received Power in Microphone Arrays to Estimate Direction of Arrival Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem is recast into a linear regression framework where the least squares method applies, and the main drawback is that different sound sources are not readily separable. Our proposed approach is based on a training phase where the directional sensitivity of each microphone element is estimated. |
G. Zetterqvist; F. Gustafsson; G. Hendeby; |
2623 | Utility Pole Localization By Learning from Ambient Traces on Distributed Acoustic Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a pole localization solution that learns from ambient data collected by a distributed acoustic sensing (DAS) system, i.e., vibration patterns excited by random ambient events such as wind and nearby traffic. |
Z. Jiang; Y. Tian; Y. Ding; S. Ozharar; T. Wang; |
2624 | Utilization of Bessel Beams in Wideband Sub Terahertz Communication Systems to Mitigate Beamsplit Effects in The Near-field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the near field, the efficiency of conventional far-field beamforming is reduced, while state-of-the-art near-field beamfocusing requires perfect positioning and channel state information to work efficiently. In this paper, first, experimental measurements of wireless data transmission above 100 GHz in indoor scenarios are presented to highlight the need for VLAAs that will likely operate in the near field. |
A. Singh; V. Petrov; J. M. Jornet; |
2625 | Utilizing Wav2Vec In Database-Independent Voice Disorder Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to use a pre-trained wav2vec 2.0 model as a feature extractor to build automatic detection systems for voice disorders. |
S. Tirronen; F. Javanmardi; M. Kodali; S. Reddy Kadiri; P. Alku; |
2626 | UWB Localization-of-Things Via Soft Information: Network Experimentation in Indoor Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper demonstrates real-time SI-based LoT using ultrawideband (UWB) radios. We consider two data collection approaches and evaluate them via network experimentation in an indoor environment. |
C. A. Gómez-Vega; M. Z. Win; A. Conti; |
2627 | UX-Net: Filter-and-Process-Based Improved U-Net for Real-time Time-domain Audio Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents UX-Net, a time-domain audio separation network (TasNet) based on a modified U-Net architecture. |
K. Patel; A. Kovalyov; I. Panahi; |
2628 | VAN-ICP: GPU-Accelerated Approximate Nearest Neighbor Search for ICP Registration Via Voxel Dilation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the bottleneck, we propose a novel GPU-friendly approximate nearest neighbor search (ANNS) acceleration scheme, named Voxel dilAtioN (VAN), which can efficiently convert the global search to local … |
W. Wang; Q. Chang; |
2629 | Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system. |
R. Badlani; A. Arora; S. Ghosh; R. Valle; K. J. Shih; J. Felipe Santos; B. Ginsburg; B. Catanzaro; |
2630 | Vararray Meets T-Sot: Advancing The State of The Art of Streaming Distant Conversational Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. |
N. Kanda; J. Wu; X. Wang; Z. Chen; J. Li; T. Yoshioka; |
2631 | Variable Attention Masking for Configurable Transformer Transducer Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. |
P. Swietojanski; S. Braun; D. Can; T. F. Da Silva; A. Ghoshal; T. Hori; R. Hsiao; H. Mason; E. McDermott; H. Silovsky; R. Travadi; X. Zhuang; |
2632 | Variable Rate Allocation for Vector-Quantized Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as a neural compression method, they lack the possibility to allocate a variable number of bits to each image location, e.g. according to the semantic content or local saliency. In this paper, we address this limitation in a simple yet effective way. |
F. Baldassarre; A. El-Nouby; H. Jégou; |
2633 | Variational Bayesian Channel Estimation in Wideband Multi-Scale Multi-Lag Channels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of channel estimation for VBMC based communications over wideband MSML channels. |
N. Halder; A. K. P.; C. R. Murthy; |
2634 | Variational Inference Aided Estimation of Time Varying Channels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new DVAE architecture, called k-MemoryMarkovVAE (k-MMVAE), whose sparsity can be controlled by an additional memory parameter. |
B. Böck; M. Baur; V. Rizzello; W. Utschick; |
2635 | Variational Message Passing-Based Respiratory Motion Estimation and Detection Using Radar Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a variational message passing (VMP)-based approach to detect the presence of a person based on their respiratory chest motion using multistatic ultra-wideband (UWB) radar. |
J. Möderl; E. Leitinger; F. Pernkopf; K. Witrisal; |
2636 | VarietySound: Timbre-Controllable Video to Sound Generation Via Unsupervised Information Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the task of generating sound with a specific timbre given a silent video input and a reference audio sample. |
C. Cui; Z. Zhao; Y. Ren; J. Liu; R. Huang; F. Chen; Z. Wang; B. Huai; F. Wu; |
2637 | Various Performance Bounds on The Estimation of Low-Rank Probability Mass Function Tensors from Partial Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we derive theoretical bounds on the attainable performance under this model assumption. |
T. Hershkovitz; M. Haardt; A. Yeredor; |
2638 | Vehicle View Synthesis By Generative Adversarial Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel view synthesis method is proposed based on Generative Adversarial Networks (GANs), named PTGAN. |
C. -S. Hu; S. -W. Tseng; X. -Y. Fan; C. -K. Chiang; |
2639 | VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel visual modality enhanced end-to-end KWS framework (VE-KWS), which fuses audio and visual modalities from two aspects. |
A. Zhang; H. Wang; P. Guo; Y. Fu; L. Xie; Y. Gao; S. Zhang; J. Feng; |
2640 | VF-Taco2: Towards Fast and Lightweight Synthesis for Autoregressive Models with Variation Autoencoder and Feature Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new fast and lightweight TTS framework named VF-Taco2, which can quickly synthesize speech without GPUs. |
Y. Liu; C. Gong; L. Wang; X. Wu; Q. Liu; J. Dang; |
2641 | Video Captioning Via Relation-Aware Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel relation-aware graph learning framework. |
Y. Zheng; H. Jing; Q. Xie; Y. Zhang; R. Feng; T. Zhang; S. Gao; |
2642 | Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Virtuoso, a massively multilingual speech–text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. |
T. Saeki; H. Zen; Z. Chen; N. Morioka; G. Wang; Y. Zhang; A. Bapna; A. Rosenberg; B. Ramabhadran; |
2643 | Vision2Touch: Imaging Estimation of Surface Tactile Physical Properties Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use the GAN framework to propose a cross-modal imaging method for estimating the values of tactile physical properties based on the Gramian Summation Angular Field, combined with visual-tactile embedding cluster fusion and feature matching methods. |
J. Chen; S. Zhou; |
2644 | Vision, Deduction and Alignment: An Empirical Study on Multi-Modal Knowledge Graph Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Entity alignment (EA) for knowledge graphs (KGs) plays a critical role in knowledge engineering. Existing EA methods mostly focus on utilizing the graph structures and entity … |
Y. Li; J. Chen; Y. Li; Y. Xiang; X. Chen; H. -T. Zheng; |
2645 | Vision Transformer-Based Feature Extraction for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we put forth a new GZSL technique exploiting Vision Transformer (ViT) to maximize the attribute-related information contained in the image feature. |
J. Kim; K. Shim; J. Kim; B. Shim; |
2646 | Vision Transformer with Progressive Tokenization for CT Metal Artifact Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Also, transformer-based methods have been harnessed in CT image denoising. Nevertheless, these methods have been little explored in MAR. |
S. Zheng; D. Zhang; C. Yu; D. Zhu; L. Zhu; H. Liu; Z. Huang; |
2647 | Visual Answer Localization with Cross-Modal Mutual Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a cross-modal mutual knowledge transfer span localization (MutualSL) method to reduce the knowledge deviation. |
Y. Weng; B. Li; |
2648 | Visual-Aware Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and sequential visual feedback (e.g., nod, smile) of the listener in face-to-face communication. |
M. Zhou; Y. Bai; W. Zhang; T. Yao; T. Zhao; T. Mei; |
2649 | Visual Graph Reasoning Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Image segmentation by grid features leads to the fragmentation of meaningful visual regions, limiting the cross-modal alignment capability of the model. Therefore, we proposed a more flexible method called Visual Graph. |
D. Li; X. Lin; H. Cai; W. Chen; |
2650 | Visual Information Matters for ASR Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The other is that the community lacks a high-quality benchmark where visual information matters for the EC models. Therefore, this paper provides 1) simple yet effective methods, namely gated fusion and image captions as prompts to incorporate visual information to help EC; 2) large-scale benchmark datasets, namely Visual-ASR-EC, where each item in the training data consists of visual, speech, and text information, and the test data are carefully selected by human annotators to ensure that even humans could make mistakes when visual information is missing. |
V. B. Kumar; S. Cheng; N. Peng; Y. Zhang; |
2651 | Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for synthesizing environmental sounds from visually represented onomatopoeias and sound sources. |
H. Ohnaka; S. Takamichi; K. Imoto; Y. Okamoto; K. Fujii; H. Saruwatari; |
2652 | Visual Prompting for Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage visual prompting (VP) to improve adversarial robustness of a fixed, pre-trained model at test time. |
A. Chen; P. Lorenz; Y. Yao; P. -Y. Chen; S. Liu; |
2653 | Vitasd: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the use of the Vision Transformer (ViT) for the computational analysis of pediatric ASD. |
X. Cao; W. Ye; E. Sizikova; X. Bai; M. Coffee; H. Zeng; J. Cao; |
2654 | ViT-Cat: Parallel Vision Transformers With Cross Attention Fusion for Popularity Prediction in MEC Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This necessitates an urgent quest to develop and design a new and innovative popularity prediction architecture to tackle this critical challenge. The paper addresses this gap by proposing a novel hybrid caching framework based on the attention mechanism. |
Z. HajiAkhondi-Meybodi; A. Mohammadi; M. Hou; J. Abouei; K. N. Plataniotis; |
2655 | VLKP: Video Instance Segmentation with Visual-Linguistic Knowledge Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the observation that incorporating linguistic knowledge can significantly improve the model’s contextual understanding of the video, in this paper, we present a Video Instance Segmentation approach with Visual-Linguistic Knowledge Prompts (VLKP), a novel paradigm for offline video instance segmentation. |
R. Chen; S. Liu; J. Chen; B. Guo; F. Zhang; |
2656 | Voice Conversion Using Feature Specific Loss Function Based Self-Attentive Generative Adversarial Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, speech samples generated by existing VC models still differ substantially from the corresponding natural human speech. Therefore, in this work, a GAN-based VC model is proposed that incorporates a self-attention (SA)-based generator network to efficiently obtain the formant distribution of the target mel-spectrogram. |
S. Dhar; P. Banerjee; N. D. Jana; S. Das; |
2657 | Voice-Preserving Zero-Shot Multiple Accent Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we use adversarial learning to disentangle accent-dependent features while retaining other acoustic characteristics. |
M. Jin; P. Serai; J. Wu; A. Tjandra; V. Manohar; Q. He; |
2658 | Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent success in the identifiability of nonnegative matrix factorization, the goal of this work is to achieve similar results for nonnegative Tucker decomposition (NTD). |
Y. Sun; K. Huang; |
2659 | Volumetric 3D Reconstruction with Window-Wise Global Feature Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as such approaches utilize discrete feature voxels to encode the observed scenes, the global feature interaction within and across different voxels is ignored, leading to imperfect reconstructions. To solve this problem, we propose a novel volumetric 3D reconstruction method named VolGARecon. |
S. Ren; Y. Ding; J. Liao; X. Li; J. Guo; W. Feng; X. Wang; |
2660 | Volumetric Attribute Compression for 3D Point Clouds Using Feedforward Network with Geometric Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Extending a previous work Region Adaptive Hierarchical Transform (RAHT) that employs piecewise constant functions to span a nested sequence of function spaces, we propose a feedforward linear network that implements higher-order B-spline bases spanning function spaces without eigen-decomposition. |
T. T. Do; P. A. Chou; G. Cheung; |
2661 | VPPT: Visual Pre-Trained Prompt Tuning Framework for Few-Shot Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Extensive experiments show that our VPPT framework achieves a 16.08% absolute improvement in average accuracy in the 1-shot setting on five fine-grained visual classification datasets, compared with previous PETuning techniques (e.g., VPT) for few-shot image classification. |
Z. Song; K. Yang; N. Guan; J. Zhu; P. Qiao; Q. Hu; |
2662 | VQ-CL: Learning Disentangled Speech Representations with Contrastive Learning and Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes VQ-CL, a novel zero-shot voice conversion framework that uses contrastive learning and vector quantization to bring frame-level hidden features closer to phoneme-level linguistic information. |
H. Tang; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
2663 | W2KPE: Keyphrase Extraction with Word-Word Relation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes our submission to ICASSP 2023 MUG Challenge Track 4, Keyphrase Extraction, which aims to extract keyphrases most relevant to the conference theme from conference materials. |
W. Cheng; S. Dong; W. Wang; |
2664 | Wasserstein GAN Synthesis for Time Series with Complex Temporal Dynamics: Frugal Architectures and Arbitrary Sample-Size Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work can thus be considered a contribution towards sustainable Artificial Intelligence. |
T. Beroud; P. Abry; Y. Malevergne; M. Senneret; G. Perrin; J. Macq; |
2665 | Water Leak Detection and Localization Using Convolutional Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method for leak detection and localization. |
D. U. Leonzio; P. Bestagini; M. Marcon; G. P. Quarta; S. Tubaro; |
2666 | Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. |
F. Wu; K. Kim; S. Watanabe; K. J. Han; R. McDonald; K. Q. Weinberger; Y. Artzi; |
2667 | Wav2vec-Based Detection and Severity Level Classification of Dysarthria From Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, the pre-trained wav2vec 2.0 model is studied as a feature extractor to build detection and severity level classification systems for dysarthric speech. |
F. Javanmardi; S. Tirronen; M. Kodali; S. R. Kadiri; P. Alku; |
2668 | Waveform Boundary Detection for Partially Spoofed Audio Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The present paper proposes a waveform boundary detection system for audio spoofing attacks containing partially manipulated segments. |
Z. Cai; W. Wang; M. Li; |
2669 | Waveform Design to Improve The Estimation of Target Parameters Using The Fourier Transform Method in A MIMO OFDM DFRC System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the local maxima do not necessarily correspond to the values of interest. To avoid this problem, we present an operation mode making it possible to address the estimations of the DOAs separately. |
S. Bhogavalli; E. Grivel; K. V. S. Hari; V. Corretja; |
2670 | WAVELET2VEC: A Filter Bank Masked Autoencoder for EEG-Based Seizure Subtype Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing deep learning approaches face two challenges in such applications: 1) convolutional or recurrent neural network based models have difficulty learning long-term dependencies; and, 2) there are not enough labeled seizure sub-type data for training such models. This paper proposes a Transformer-based self-supervised learning model for EEG-based seizure subtype classification, which copes well with these two challenges. |
R. Peng; C. Zhao; Y. Xu; J. Jiang; G. Kuang; J. Shao; D. Wu; |
2671 | Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Alternatively, this study proposes a Wave-U-Net discriminator, which is a single but expressive discriminator with Wave-U-Net architecture. |
T. Kaneko; H. Kameoka; K. Tanaka; S. Seki; |
2672 | Wavsyncswap: End-To-End Portrait-Customized Audio-Driven Talking Face Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using state-of-the-art methods Wav2Lip and SimSwap for this purpose, we meet some issues: affected mouth synchronization, lost texture information, and slow inference speed. To resolve these issues, we propose an end-to-end model that combines the advantages of both approaches. |
W. Bao; L. Chen; C. Zhou; S. Yang; Z. Wu; |
2673 | Weakly- and Semi-Supervised Object Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Weakly supervised object localization deals with the lack of location-level labels to train localization models. |
Z. -T. Huang; Y. -H. Chen; M. -C. Yeh; |
2674 | Weakly-Supervised Scene-Specific Crowd Counting Using Real-Synthetic Hybrid Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a weakly-supervised method with real-synthetic hybrid data which only requires a small portion of unlabelled real images and auto-generated synthetic labelled images for training. |
Y. Fan; J. Wan; Y. Yuan; Q. Wang; |
2675 | Weavspeech: Data Augmentation Strategy For Automatic Speech Recognition Via Semantic-Aware Weaving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, if speech signals are indiscriminately mixed without considering semantics, there is a risk of generating nonsensical sentences. To address these issues, in this paper, we propose WeavSpeech, a simple yet effective cut-and-paste augmentation method for ASR tasks that weaves together a pair of speech samples while considering semantics. |
K. Seo; J. Park; J. Song; E. Yang; |
2676 | Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition Highlight: Focusing on end-to-end ASR, we propose a simple yet effective method to overcome catastrophic forgetting: weight averaging. |
S. Vander Eeckt; H. Van Hamme; |
2677 | Weight-Based Mask For Domain Adaptation Highlight: The second is that previous approaches align image-level features regardless of foreground and background, although the classifier requires foreground features. To solve these problems, we introduce Weight-based Mask Network (WEMNet) composed of Domain Ignore Module (DIM) and Semantic Enhancement Module (SEM). |
E. Lee; I. Kim; D. Kim; |
2678 | Weighted Sampling for Masked Language Modeling Highlight: Representation learning for rare tokens is poor, and PLMs have limited performance on downstream tasks. To alleviate this frequency bias, we propose two simple and effective weighted sampling strategies for masking tokens based on token frequency and training loss. |
L. Zhang; Q. Chen; W. Wang; C. Deng; X. Cao; K. Hao; Y. Jiang; W. Wang; |
2679 | Weight-Sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints Highlight: In this paper, we introduce a Once-For-All (OFA) Neural Architecture Search (NAS) framework for AEC. |
G. -T. Lin; Q. Tang; C. -C. Kao; V. Rozgic; C. Wang; |
2680 | WeKws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit Highlight: In this paper, we introduce WeKws, a production-quality, easy-to-build, and easy-to-apply E2E KWS toolkit. |
J. Wang; M. Xu; J. Hou; B. Zhang; X. -L. Zhang; L. Xie; F. Pan; |
2681 | WeSinger 2: Fully Parallel Singing Voice Synthesis Via Multi-Singer Conditional Adversarial Training Highlight: This paper introduces a robust singing voice synthesis (SVS) system that efficiently produces highly natural and realistic singing voices by leveraging an adversarial training strategy. |
Z. Zhang; Y. Zheng; X. Li; L. Lu; |
2682 | Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit Highlight: This paper introduces Wespeaker, a research- and production-oriented speaker embedding learning toolkit. |
H. Wang; C. Liang; S. Wang; Z. Chen; B. Zhang; X. Xiang; Y. Deng; Y. Qian; |
2683 | WHC: Weighted Hybrid Criterion for Filter Pruning on Convolutional Neural Networks Abstract: Filter pruning has attracted increasing attention in recent years for its ability to compress and accelerate convolutional neural networks. Various data-independent … |
S. Chen; W. Sun; L. Huang; |
2684 | When Is MIMO Massive in Radar? Highlight: Recently, it has been demonstrated that as N grows to infinity, one can fully characterize the false alarm and detection probabilities with minimal assumptions on the disturbance vector. In this work, these results are partially refined, and a lower bound on the probability of detection is provided for any fixed, finite N under certain randomness models for the noise. |
J. Shah; M. Cardone; A. Dytso; C. Rush; |
2685 | Whether Contribution of Features Differ Between Video-Mediated and In-Person Meetings in Important Utterance Estimation Highlight: This study investigated how the contributions of various features to important utterance estimation differ between in-person (IP) and video-mediated (VM) meetings. |
F. Nihei; R. Ishii; Y. I. Nakano; A. Fukayama; T. Nakamura; |
2686 | Which Country Is This Picture From? New Data and Methods For DNN-Based Country Recognition Highlight: Yet, from a semantic and forensic point of view, recognizing the country in which an image was taken could be more critical than estimating its spatial coordinates. Within this framework, this paper makes two contributions. First, we introduce the VIPPGeo dataset, containing 3.8 million geo-tagged images. |
O. Alamayreh; G. M. Dimitri; J. Wang; B. Tondi; M. Barni; |
2687 | Wiener Filtering Without Covariance Matrix Inversion Highlight: This paper presents several approximate formulas for the Wiener filter (WF), the optimal linear filter minimizing the mean-squared error. |
P. U. Damale; E. K. P. Chong; L. L. Scharf; |
2688 | WiFi-Based Robust Child Presence Detection for Smart Cars Highlight: In this paper, we propose a robust WiFi-based child presence detection (CPD) system consisting of a motion detector and a breathing detector. |
S. S. Jayaweera; B. Wang; X. Zeng; W. -H. Wang; K. J. Ray Liu; |
2689 | Windowed Fourier Analysis for Signal Processing on Graph Bundles Highlight: Motivated by the locality of this procedure, we demonstrate that bases for the signal spaces of the components of the graph bundle can be lifted in the same way, yielding a basis for the signal space of the total graph. We demonstrate this construction on synthetic graphs and in an analysis of the energy landscape of conformational manifolds in stereochemistry. |
T. M. Roddenberry; S. Segarra; |
2690 | Wireless Deep Speech Semantic Transmission Highlight: In this paper, we propose a new class of high-efficiency semantic coded transmission methods to realize end-to-end speech transmission over wireless channels. |
Z. Xiao; S. Yao; J. Dai; S. Wang; K. Niu; P. Zhang; |
2691 | Wireless Location Tracking Via Complex-Domain Super MDS with Time Series Self-Localization Information Highlight: We propose a wireless localization algorithm based on complex-domain super multidimensional scaling (CD-SMDS) augmented with a self-localization (SL) component, whereby each target tracks its own motion by incorporating bearing information obtained, e.g., from integrated inertial sensors. |
Y. Nishi; T. Takahashi; H. Iimori; G. Abreu; S. Ibi; S. Sampei; |
2692 | Wireless Power Transfer Using Chirp Waveforms Highlight: In this work, we investigate a superimposed chirp waveform for wireless power transfer (WPT) applications. |
A. Roy; C. Psomas; I. Krikidis; |
2693 | Wireless Sensing for Simultaneous Human Vocal Sound and Heart Sound Recognition Highlight: We investigate human vocal sound and heart sound detection and separation using a single millimeter-wave radar sensor and advanced array processing techniques to achieve superior motion sensitivity. |
Y. Rong; K. V. Mishra; D. W. Bliss; |
2694 | WITT: A Wireless Image Transmission Transformer for Semantic Communications Highlight: In this paper, we aim to redesign the vision Transformer (ViT) as a new backbone to realize semantic image transmission, termed wireless image transmission transformer (WITT). |
K. Yang; S. Wang; J. Dai; K. Tan; K. Niu; P. Zhang; |
2695 | WL-MSR: Watch and Listen for Multimodal Subtitle Recognition Highlight: In this paper, we propose a novel Watch and Listen for Multimodal Subtitle Recognition (WL-MSR) framework to obtain comprehensive video subtitles, by fusing the information provided by Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) models. |
J. Liu; H. Wang; W. Wang; X. He; J. Liu; |
2696 | WordReg: Mitigating The Gap Between Training and Inference with Worst-Case Drop Regularization Highlight: Although effective, the sub-model sampled by random dropout during training is inconsistent with the full model (without dropout) used during inference. To mitigate this undesirable gap, we propose WordReg, a simple yet effective regularization built on dropout that enforces consistency between the outputs of different sub-models sampled by dropout. |
J. Xia; G. Wang; B. Hu; C. Tan; J. Zheng; Y. Xu; S. Z. Li; |
2697 | WUDA: Unsupervised Domain Adaptation Based on Weak Source Domain Labels Highlight: To explore solutions for WUDA, this paper proposes two intuitive frameworks and conducts comparative experiments. |
S. Liu; C. Zhu; Y. Li; W. Tang; |
2698 | X-SepFormer: End-To-End Speaker Extraction Network with Explicit Optimization on Speaker Confusion Highlight: Both loss schemes aim to encourage a TSE network to attend to the SC chunks based on this distribution information. On this basis, we present X-SepFormer, an end-to-end TSE model with the proposed loss schemes and a SepFormer backbone. |
K. Liu; Z. Du; X. Wan; H. Zhou; |
2699 | YOLO-Based Lightweight Object Detection With Structure Simplification And Attention Enhancement Highlight: In this paper, we propose a lightweight object detector by optimizing the structure of YOLOv3. |
S. Sun; X. Yang; J. Peng; |
2700 | YOLOX-B: A Better YOLOX Model for Real-Time Driver Behavior Detection Highlight: In coal transportation scenarios, object detection models for driver behavior detection generally suffer from inaccurate localization and difficulty in detecting small objects. To address this, we propose a new model, YOLOX-B, which introduces a serialized atrous spatial pyramid pooling (S-ASPP) structure that obtains receptive-field information at different scales through serialized atrous convolution, avoids the information loss of max-pooling, and maximizes the efficiency of atrous convolution. |
X. Guo; M. Ma; J. Zhang; S. Li; |
2701 | Your Camera Improves Your Point Cloud Compression Highlight: Although the fusion of camera and LiDAR for vision perception has been well studied, it remains unexplored how cross-modal information from cameras can improve the compression of LiDAR point cloud data. In this paper, we propose a multi-modality compression framework for LiDAR point clouds that exploits the depth information predicted from the paired image. |
Y. Lin; T. Xu; Z. Zhu; Y. Li; Z. Wang; Y. Wang; |
2702 | Zephyr: Zero-Shot Punctuation Restoration Highlight: In this paper, we propose the Zephyr algorithm, which utilizes PLMs to perform zero-shot and few-shot punctuation restoration for both offline and streaming scenarios. |
M. Wang; Y. Li; J. Guo; X. Qiao; C. Su; M. Zhang; S. Tao; H. Yang; |
2703 | Zero-Shot Anomalous Sound Detection in Domestic Environments Using Large-Scale Pretrained Audio Pattern Recognition Models Highlight: At the same time, it is not feasible to expect a demanding labeling effort from the end user. To address these problems, we present a novel zero-shot method relying on an auxiliary large-scale pretrained audio neural network in support of an unsupervised anomaly detector. |
A. Ilic Mezza; G. Zanetti; M. Cobos; F. Antonacci; |
2704 | Zero-Shot Domain Adaptation of Anomalous Samples for Semi-Supervised Anomaly Detection Highlight: In practical situations, SSAD methods struggle to adapt to domain shifts, since anomalous data are unlikely to be available for the target domain in the training phase. To solve this problem, we propose a domain adaptation method for SSAD where no anomalous data are available for the target domain. |
T. Nishida; T. Endo; Y. Kawaguchi; |
2705 | Zero-Shot Personalized Lip-To-Speech Synthesis with Face Image Based Voice Control Highlight: In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities. |
Z. -Y. Sheng; Y. Ai; Z. -H. Ling; |
2706 | Zero-Shot Sound Event Classification Using A Sound Attribute Vector with Global and Local Feature Learning Highlight: In this paper, we propose a new ZS-SEC method that can learn discriminative global features and local features simultaneously to enhance SAV-based ZS-SEC. |
Y. -H. Lin; X. Chen; R. Takashima; T. Takiguchi; |
2707 | Zero-Shot Speech Emotion Recognition Using Generative Learning with Reconstructed Prototypes Highlight: However, existing works on zero-shot SER directly employ original prototypes and only consider inter-domain knowledge transfer through learning unseen-emotional classifiers. To address this, we propose a zero-shot SER approach using generative learning with reconstructed prototypes. |
X. Xu; J. Deng; Z. Zhang; Z. Yang; B. W. Schuller; |
2708 | ZO-DARTS: Differentiable Architecture Search with Zeroth-Order Approximation Highlight: In this paper, we propose a novel NAS framework to address the differentiable neural architecture search problem by inspecting the bi-level problem formulation from scratch. |
L. Xie; K. Huang; F. Xu; Q. Shi; |
2709 | Zone Plate Virtual Lenses for Memory-Constrained NLOS Imaging Highlight: In particular, we propose using zone plates (ZP), which require significantly less memory. |
P. Luesia-Lahoz; D. Gutierrez; A. Muñoz; |
2710 | Ψ-Net: Point Structural Information Network for No-Reference Point Cloud Quality Assessment Highlight: This paper proposes a point structural information (PSI) network (ψ-Net) for no-reference point cloud quality assessment (PCQA). |
J. Xiong; S. Wu; W. Luo; J. Suo; H. Gao; |