Paper Digest: ICASSP 2022 Highlights
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper. Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services on ranking, search, tracking and automatic literature review.
If you do not want to miss interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICASSP 2022 Highlights
No. | Paper | Author(s) |
---|---|---|
1 | CoughTrigger: Earbuds IMU Based Cough Detection Activator Using An Energy-Efficient Sensitivity-Prioritized Time Series Classifier Highlight: We present CoughTrigger, which utilizes a lower-power sensor, an inertial measurement unit (IMU), in earbuds as a cough detection activator to trigger a higher-power sensor for audio processing and classification. |
S. Zhang; et al. |
2 | Non-Invasive Blood Pressure Monitoring with Multi-Modal In-Ear Sensing Highlight: We propose a measurement technique based on the vascular transit time which utilises the time difference between the S1 heart sound and the PPG upstroke in one pulse cycle. |
H. Truong; A. Montanari; F. Kawsar; |
3 | Intelligent Wi-Fi Based Child Presence Detection System Highlight: This paper presents the first-of-its-kind intelligent CPD system using commodity Wi-Fi. |
X. Zeng; B. Wang; C. Wu; S. Deepika Regani; K. J. Ray Liu; |
4 | Real-Time Fall Detection Using mmWave Radar Highlight: In this paper, we propose mmFall, a real-time fall detection system using millimeter-wave signals which can achieve impressive accuracy with low computational complexity. |
W. Li; et al. |
5 | Hierarchical Deep Learning Model with Inertial and Physiological Sensors Fusion for Wearable-Based Human Activity Recognition Highlight: This paper presents a human activity recognition (HAR) system with wearable devices. |
D. Y. Hwang; et al. |
6 | Speech Recovery For Real-World Self-Powered Intermittent Devices Highlight: This paper presents a novel intermittent speech recovery (ISR) system for real-world self-powered intermittent devices. |
Y. -C. Lin; et al. |
7 | Phase Control of Parametric Array Loudspeaker By Optimizing Sideband Weights Highlight: In this paper, we propose a method for controlling the directivity of parametric array loudspeakers (PAL) by optimizing the weights for the sideband signals. |
A. Okano; Y. Kajikawa; |
8 | Low-Latency Human-Computer Auditory Interface Based on Real-Time Vision Analysis Highlight: This paper proposes a visuo-auditory substitution method to assist visually impaired people in scene understanding. |
F. Scalvini; C. Bordeau; M. Ambard; C. Migniot; J. Dubois; |
9 | Robust Adaptive Noise Canceller Algorithm with SNR-Based Stepsize Control and Noise-Path Gain Compensation Highlight: This paper proposes a robust adaptive noise canceller algorithm with SNR-based stepsize control and noise-path gain compensation. |
A. Sugiyama; |
10 | NearTracker: Acoustic 2-D Target Tracking with Nearby Reflector in SISO System Highlight: In this paper, we propose NearTracker, a contactless acoustic tracking system that achieves 2-D target tracking with only one speaker and one microphone (i.e., Single Input Single Output, SISO). |
C. Liu; L. Gao; R. Jiang; |
11 | An Efficient Method For Generic DSP Implementation Of Dilated Convolution Highlight: In this paper, we propose a scheme that allows efficient/generic implementation of 2D dilated convolution and stride on typical DSPs where the instruction sets are well tuned for standard 1D and 2D filtering and convolution operations. |
H. E. V; S. Ghanekar; |
12 | Compression-Aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations Highlight: Existing works in activation compression propose to transform feature maps for higher compressibility, thus enabling dimension reduction. |
Y. -S. Tai; C. -F. Teng; C. -Y. Chang; A. -Y. A. Wu; |
13 | Optimizing The Consumption Of Spiking Neural Networks With Activity Regularization Highlight: In this work, we look into different techniques to enforce sparsity on the neural network activation maps and compare the effect of different training regularizers on the efficiency of the optimized DNNs and SNNs. |
S. Narduzzi; S. A. Bigdeli; S. -C. Liu; L. A. Dunbar; |
14 | IMPQ: Reduced Complexity Neural Networks Via Granular Precision Assignment Highlight: However, the problem of granular precision assignment is challenging due to an exponentially large search space, and efficient methods for such precision assignment are lacking. To address this problem, we introduce the iterative mixed-precision quantization (IMPQ) framework to allocate precision at variable granularity. |
S. K. Gonugondla; N. R. Shanbhag; |
15 | Rate Coding Or Direct Coding: Which One Is Better For Accurate, Robust, And Energy-Efficient Spiking Neural Networks? Highlight: In this paper, we conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency. |
Y. Kim; H. Park; A. Moitra; A. Bhattacharjee; Y. Venkatesha; P. Panda; |
16 | PYXIS: An Open-Source Performance Dataset Of Sparse Accelerators Highlight: In this work, we present PYXIS, a performance dataset for customized accelerators on sparse data. |
L. Song; Y. Chi; J. Cong; |
17 | Fast Fault Diagnosis Method Of Rolling Bearings In Multi-Sensor Measurement Enviroment Highlight: In this paper, a fast bearing state detection method based on multi-sensor signal fusion and compression feature extraction is proposed. |
Z. Pan; Z. Lin; Y. Zheng; Z. Meng; |
18 | Detecting Anomaly in Chemical Sensors Via Regularized Contrastive Learning Highlight: In this work, we present a method for detecting anomalous chemical sensors using a contrastive learning-based framework. |
D. Badawi; I. Bassi; S. Ozev; A. E. Cetin; |
19 | Evolutionary Neural Architecture Design of Liquid State Machine for Image Classification Highlight: Manually defining a neural architecture will be ineffective and laborious in most cases. Therefore, based on a state-of-the-art differential evolution algorithm, an evolutionary neural architecture design methodology is proposed to automatically build suitable model topologies for LSM in this study, without any prior knowledge. |
C. Tang; J. Ji; Q. Lin; Y. Zhou; |
20 | Invisible and Efficient Backdoor Attacks for Compressed Deep Neural Networks Highlight: In this paper, we study the feasibility of practical backdoor attacks for the compressed DNNs. |
H. Phan; Y. Xie; J. Liu; Y. Chen; B. Yuan; |
21 | Tensor-Based Orthogonal Matching Pursuit with Phase Rotation for Channel Estimation In Hybrid Beamforming MIMO-OFDM Systems Highlight: We propose to incorporate phase rotation in the factor matrices of the tensor-based orthogonal matching pursuit (T-OMP) algorithm to solve the energy leakage problem caused by the grid constraint. |
C. -H. Lo; P. -Y. Tsai; |
22 | Spain-Net: Spatially-Informed Stereophonic Music Source Separation Highlight: While existing deep learning models implicitly absorb the spatial information conveyed by the multi-channel input signals, we argue that a more explicit and active use of spatial information could not only improve the separation process but also provide an entry-point for many user-interaction based tools. To this end, we introduce a control method based on the stereophonic location of the sources of interest, expressed as the panning angle. |
D. Petermann; M. Kim; |
23 | Improved Singing Voice Separation with Chromagram-Based Pitch-Aware Remixing Highlight: We propose a novel data augmentation technique, chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed. |
S. Yuan; et al. |
24 | Don't Separate, Learn To Remix: End-To-End Neural Remixing With Joint Optimization Highlight: In this work, we re-purpose Conv-TasNet, a well-known source separation model, into two neural remixing architectures that learn to remix directly rather than just to separate sources. |
H. Yang; S. Firodiya; N. J. Bryan; M. Kim; |
25 | Few-Shot Musical Source Separation Highlight: Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. |
Y. Wang; D. Stoller; R. M. Bittner; J. Pablo Bello; |
26 | Source Separation By Steering Pretrained Music Models Highlight: We use OpenAI's Jukebox as the pretrained generative model, and we couple it with four kinds of pretrained music taggers (two architectures and two tagging datasets). |
E. Manilow; P. O'Reilly; P. Seetharaman; B. Pardo; |
27 | Infant Crying Detection In Real-World Environments Highlight: In this paper, we evaluate several established machine learning approaches including a model leveraging both deep spectrum and acoustic features. |
X. Yao; M. Micheletti; M. Johnson; E. Thomaz; K. de Barbaro; |
28 | Wikitag: Wikipedia-Based Knowledge Embeddings Towards Improved Acoustic Event Classification Highlight: We describe how to extract label embeddings from multiple Wikipedia texts, and formulate the multi-view aligned AEC problem based on the VGGish model. |
Q. Zhang; Q. Tang; C. -C. Kao; M. Sun; Y. Liu; C. Wang; |
29 | Urban Sound & Sight: Dataset And Benchmark For Audio-Visual Urban Scene Understanding Highlight: Yet, the lack of well-curated resources for training and evaluating models to research in this area hinders their development. To address this, we present a curated audio-visual dataset, Urban Sound & Sight (Urbansas), developed for investigating the detection and localization of sounding vehicles in the wild. |
M. Fuentes; et al. |
30 | Real-World On-Board UAV Audio Data Set For Propeller Anomalies Highlight: In this work, we present a novel real-world audio data set of propeller anomalies, and use several deep learning models to classify the damage. |
S. S. Katta; K. Vuojärvi; S. Nandyala; U. -M. Kovalainen; L. Baddeley; |
31 | VocalSound: A Dataset for Improving Human Vocal Sounds Recognition Highlight: To support research on building robust and accurate vocal sound recognition, we have created a VocalSound dataset consisting of over 21,000 crowdsourced recordings of laughter, sighs, coughs, throat clearing, sneezes, and sniffs from 3,365 unique subjects. |
Y. Gong; J. Yu; J. Glass; |
32 | Wearable SELD Dataset: Dataset For Sound Event Localization And Detection Using Wearable Devices Around Head Highlight: Some applications (including those for pedestrians that perform SELD while walking) require a wearable microphone array whose geometry can be designed to suit the task. In this paper, for the development of such a wearable SELD system, we propose a dataset named the Wearable SELD dataset. |
K. Nagatomo; M. Yasuda; K. Yatabe; S. Saito; Y. Oikawa; |
33 | TUNet: A Block-Online Bandwidth Extension Model Based On Transformers And Self-Supervised Pretraining Highlight: We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. |
V. -A. Nguyen; A. H. T. Nguyen; A. W. H. Khong; |
34 | DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation Highlight: In this paper, a massive deployment of RNNs along the time dimension is realized by using a channel-wise long short-term memory neural network. |
J. Liu; X. Zhang; |
35 | Customizable End-To-End Optimization Of Online Neural Network-Supported Dereverberation For Hearing Devices Highlight: In this paper, we propose an end-to-end approach specialized for online processing that directly optimizes the dereverberated output signal. |
J. -M. Lemercier; J. Thiemann; R. Koning; T. Gerkmann; |
36 | Importance of Switch Optimization Criterion in Switching WPE Dereverberation Highlight: We thus propose a new SwWPE processing flow that enables us to optimize switching parameters based on an arbitrary optimization criterion. |
N. Kamo; R. Ikeshita; K. Kinoshita; T. Nakatani; |
37 | Audio-To-Symbolic Arrangement Via Cross-Modal Music Representation Learning Highlight: Could we automatically derive the score of a piano accompaniment based on the audio of a pop song? This is the audio-to-symbolic arrangement problem we tackle in this paper. |
Z. Wang; D. Xu; G. Xia; Y. Shan; |
38 | Music Phrase Inpainting Using Long-Term Representation and Contrastive Loss Highlight: In this study, we tackle the problem of long-term, phrase-level symbolic melody inpainting by equipping a sequence prediction model with phrase-level representation (as an extra condition) and contrastive loss (as an extra optimization term). |
S. Wei; G. Xia; Y. Zhang; L. Lin; W. Gao; |
39 | MELONS: Generating Melody With Long-Term Structure Using Transformers And Structure Graph Highlight: In this paper, we propose MELONS, a melody generation framework based on a graph representation of music structure which consists of eight types of bar-level relations. |
Y. Zou; P. Zou; Y. Zhao; K. Zhang; R. Zhang; X. Wang; |
40 | Difficulty-Aware Neural Band-to-Piano Score Arrangement Based on Note- and Statistic-Level Criteria Highlight: This paper describes a neural music arrangement method that converts a given band score into a piano score with an elementary or advanced level. |
M. Terao; Y. Hiramatsu; R. Ishizuka; Y. Wu; K. Yoshii; |
41 | Score Difficulty Analysis for Piano Performance Education Based on Fingering Highlight: In this paper, we introduce score difficulty classification as a sub-task of music information retrieval (MIR), which may be used in music education technologies, for personalised curriculum generation, and score retrieval. |
P. Ramoneda; N. C. Tamer; V. Eremenko; X. Serra; M. Miron; |
42 | A Neural Network-based Howling Detection Method for Real-Time Communication Applications Highlight: This paper proposes a convolutional recurrent neural network (CRNN) based method for howling detection in RTC applications, achieving excellent accuracy with low false-alarm rates. |
Z. Chen; Y. Hao; Y. Chen; G. Chen; L. Ruan; |
43 | Alarm Sound Detection Using Topological Signal Processing Highlight: We present a novel approach to alarm sound detection using topological data analysis. |
T. Fireaizen; S. Ron; O. Bobrowski; |
44 | A Method For Estimating The Grouping Of Participants In Classroom Group Work Using Only Audio Information Highlight: This paper proposes a novel method for estimating which microphones belong to the same group in a situation where there are multiple discussion groups in one room, using only audio information. |
O. Ichikawa; Y. Shima; T. Nakayama; H. Shirouzu; |
45 | Environmental Sound Extraction Using Onomatopoeic Words Highlight: We propose an environmental-sound-extraction method using onomatopoeic words to specify the target sound to be extracted. |
Y. Okamoto; S. Horiguchi; M. Yamamoto; K. Imoto; Y. Kawaguchi; |
46 | Echo-Aware Adaptation of Sound Event Localization and Detection in Unknown Environments Highlight: In this study, we propose echo-aware feature refinement (EAR) for SELD, which suppresses environmental effects at the feature level by using additional spatial cues of the unknown environment obtained through measuring acoustic echoes. |
M. Yasuda; Y. Ohishi; S. Saito; |
47 | On Adversarial Robustness Of Large-Scale Audio Visual Learning Highlight: This work aims to study several key questions related to multi-modal learning through the lens of robustness: 1) Are multi-modal models necessarily more robust than uni-modal models? |
J. B. Li; S. Qu; X. Li; P. -Y. B. Huang; F. Metze; |
48 | Adversarial Sample Detection for Speaker Verification By Neural Vocoders Highlight: In this paper, we adopt neural vocoders to spot adversarial samples for ASV. |
H. Wu; et al. |
49 | Amicable Examples for Informed Source Separation Highlight: In contrast, in this work, we improve the performance of a pre-trained separation model that does not use any side-information. |
N. Takahashi; Y. Mitsufuji; |
50 | Multi-Modal Pre-Training for Automated Speech Recognition Highlight: In this work, we introduce a novel approach that leverages a self-supervised learning technique based on masked language modeling to compute a global, multi-modal encoding of the environment in which the utterance occurs. |
D. M. Chan; S. Ghosh; D. Chakrabarty; B. Hoffmeister; |
51 | Speaker-Targeted Audio-Visual Speech Recognition Using A Hybrid CTC/Attention Model with Interference Loss Highlight: Although AV Align shows an improvement in recognition accuracy in background noise environments, we have observed that the recognition accuracy degrades significantly in interference speaker environments, where a target speech and an interfering speech overlap each other. In order to improve the speech recognition accuracy of the target speaker in such situations, we propose a method that combines the auxiliary loss function that maximizes the recognition accuracy of the interference speaker and the CTC loss function for training the AV-ASR model. |
R. Tsunoda; R. Aihara; R. Takashima; T. Takiguchi; Y. Imai; |
52 | Time-Domain Audio-Visual Speech Separation on Low Quality Videos Highlight: In this paper, we propose a new structure to fuse the audio and visual features, which uses the audio feature to select relevant visual features by utilizing the attention mechanism. |
Y. Wu; C. Li; J. Bai; Z. Wu; Y. Qian; |
53 | Complex-Valued Spatial Autoencoders for Multichannel Speech Enhancement Highlight: In this contribution, we present a novel online approach to multichannel speech enhancement. |
M. M. Halimeh; W. Kellermann; |
54 | Multichannel Noise Reduction Using Dilated Multichannel U-Net and Pre-Trained Single-Channel Network Highlight: We propose a transfer learning approach that leverages existing pre-trained single-channel neural networks for the optimization of multichannel neural networks. |
Z. -W. Tan; A. H. T. Nguyen; Y. Liu; A. W. H. Khong; |
55 | One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement Highlight: Single-channel personalized speech enhancement (PSE) methods show promising results compared with the unconditional speech enhancement (SE) methods in these scenarios due to their ability to remove interfering speech in addition to the environmental noise. In this work, we leverage spatial information afforded by microphone arrays to improve such systems' performance further. |
H. Taherian; S. E. Eskimez; T. Yoshioka; H. Wang; Z. Chen; X. Huang; |
56 | Multi-Channel Speech Denoising for Machine Ears Highlight: We propose an MCSDN-Beamforming-MCSDN framework in the inference stage. |
C. Han; E. M. Kaya; K. Hoefer; M. Slaney; S. Carlile; |
57 | Localization Based Sequential Grouping for Continuous Speech Separation Highlight: Given a block of frames with at most two speakers, we apply a two-speaker separation model to separate (and enhance) the speakers, estimate the DOA of each separated speaker, and group the separation results across blocks based on the DOA estimates. |
Z. -Q. Wang; D. Wang; |
58 | Convolutional Weighted Minimum Mean Square Error Filter for Joint Source Separation and Dereverberation Highlight: In this paper, we derive a convolutional multichannel filter which performs jointly optimum dereverberation and desired source signal extraction. |
M. Fras; M. Witkowski; K. Kowalczyk; |
59 | Improving Source Separation By Explicitly Modeling Dependencies Between Sources Highlight: We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. |
E. Manilow; C. Hawthorne; C. -Z. A. Huang; B. Pardo; J. Engel; |
60 | Music Source Separation With Deep Equilibrium Models Highlight: Hence, in this paper we propose an architecture and training scheme for MSS with DEQ. |
Y. Koyama; N. Murata; S. Uhlich; G. Fabbro; S. Takahashi; Y. Mitsufuji; |
61 | Harmonic and Percussive Sound Separation Based on Mixed Partial Derivative of Phase Spectrogram Highlight: In this paper, we propose a novel HPSS method named MipDroP that relies only on phase and does not use information of magnitude spectrograms. |
N. Akaishi; K. Yatabe; Y. Oikawa; |
62 | On Loss Functions and Evaluation Metrics for Music Source Separation Highlight: We investigate which loss functions provide better separations via benchmarking an extensive set of those for music source separation. |
E. Gusó; J. Pons; S. Pascual; J. Serrà; |
63 | Time-Balanced Focal Loss for Audio Event Detection Highlight: This variability results in an inherent disproportional representation of effective training samples. To address this compounded imbalance issue, this work proposes a balanced focal learning function that introduces a novel time-sensitive classwise weight. |
S. Park; M. Elhilali; |
64 | Multi-ACCDOA: Localizing And Detecting Overlapping Sounds From The Same Class With Auxiliary Duplicating Permutation Invariant Training Highlight: However, there is still a challenge in detecting the same event class from multiple locations. To overcome this problem while maintaining the advantages of the class-wise format, we extended ACCDOA to a multi one and proposed auxiliary duplicating permutation invariant training (ADPIT). |
K. Shimada; Y. Koyama; S. Takahashi; N. Takahashi; E. Tsunoo; Y. Mitsufuji; |
65 | Improved Representation Learning For Acoustic Event Classification Using Tree-Structured Ontology Highlight: In this work, we propose a structure-aware semi-supervised learning framework for acoustic event classification (AEC). |
A. Zharmagambetov; et al. |
66 | Temporal Contrastive-Loss for Audio Event Detection Highlight: In this study, we propose coherence-based learning, formulated as a contrastive loss, to train event detection models whereby embeddings driven by acoustic events are coherently constrained to maximize discriminability across events. |
S. Kothinti; M. Elhilali; |
67 | A Frame Loss of Multiple Instance Learning for Weakly Supervised Sound Event Detection Highlight: However, the general MIL method only optimizes the global loss calculated from the aggregated clip-wise predictions and weak clip labels, lacking a direct constraint on the frame-wise predictions, which leads to a large number of unreasonable prediction values. To address this issue, we explore the deterministic information that can be used to constrain the frame-wise predictions, based on which we design a frame loss with two terms. |
X. Wang; X. Zhang; Y. Zi; S. Xiong; |
68 | Pseudo Strong Labels for Large Scale Weakly Supervised Audio Tagging Highlight: This work proposes pseudo strong labels (PSL), a simple label augmentation framework that enhances the supervision quality for large-scale weakly supervised audio tagging. |
H. Dinkel; Z. Yan; Y. Wang; J. Zhang; Y. Wang; |
69 | Individualized Hear-Through For Acoustic Transparency Using PCA-Based Sound Pressure Estimation At The Eardrum Highlight: A particular challenge is the transfer function between the hearing device receiver and the eardrum, which is difficult to obtain in practice as it requires additional probe-tube measurements. In this work, we address this issue by proposing an individualized hear-through equalization filter design that leverages the measurement of the so-called secondary path to predict the sound pressure at the eardrum using a principal component analysis-based estimator. |
W. Jin; T. Schoof; H. Schepker; |
70 | On Spectral and Temporal Sparsification of Speech Signals for The Improvement of Speech Perception in CI Listeners Highlight: In this study, two methods inspired by music simplification approaches were developed and evaluated through instrumental measures and in listening tests with adult CI listeners. |
B. Lentz; R. Martin; K. Oberländer; C. Völter; |
71 | A Differentiable Optimisation Framework for The Design of Individualised DNN-based Hearing-Aid Strategies Highlight: Current hearing aids mostly provide sound amplification fittings based on individual hearing thresholds or perceived loudness, even though it is known that sensorineural hearing damage is functionally complex, and requires different treatment strategies. To meet this demand, we propose an optimisation framework for the design of individualised hearing-aid signal processing based on simulated (hearing-impaired) auditory-nerve responses. |
F. Drakopoulos; S. Verhulst; |
72 | Personalized Speech Enhancement: New Models and Comprehensive Evaluation Highlight: In this work, we propose two neural networks for PSE that achieve superior performance to the previously proposed VoiceFilter. |
S. E. Eskimez; T. Yoshioka; H. Wang; X. Wang; Z. Chen; X. Huang; |
73 | Dynamic Sliding Window for Realtime Denoising Networks Highlight: In response, we propose a new sliding window strategy and a lightweight neural network to leverage it. |
J. Xiang; Y. Zhu; R. Wu; R. Xu; Y. Ishiwaka; C. Zheng; |
74 | BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a blockwise optimization method for masking-based networks (BLOOM-Net) for training scalable speech enhancement networks. |
S. Kim; M. Kim; |
75 | HGCN: Harmonic Gated Compensation Network for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, it is hard for most models to handle the situation when harmonics are partially masked by noise. To tackle this challenge, we propose a harmonic gated compensation network (HGCN). |
T. Wang; W. Zhu; Y. Gao; J. Feng; S. Zhang; |
76 | Speech Enhancement with Neural Homomorphic Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a new speech enhancement method based on neural homomorphic synthesis. |
W. Jiang; Z. Liu; K. Yu; F. Wen; |
77 | A Bayesian Permutation Training Deep Representation Learning Method for Speech Enhancement with Variational Autoencoder Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on Bayesian theory, this paper derives a novel variational lower bound for VAE, which ensures that VAE can be trained in supervision, and can disentangle speech and noise latent variables from the observed signal. |
Y. Xiang; J. L. Højvang; M. H. Rasmussen; M. G. Christensen; |
78 | Integrating Statistical Uncertainty Into Neural Network-Based Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the benefits of modeling uncertainty in neural network-based speech enhancement. |
H. Fang; T. Peer; S. Wermter; T. Gerkmann; |
79 | Unsupervised Speech Enhancement with Speech Recognition Embedding and Disentanglement Losses Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, we propose an unsupervised loss function to tackle those two problems. |
V. A. Trinh; S. Braun; |
80 | MusicYOLO: A Sight-Singing Onset/Offset Detection Framework Based on Object Detection Instead of Spectrum Frames Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose MusicYOLO based on object detection to detect the onset and offset in singing for the first time. |
X. Wang; W. Xu; W. Yang; W. Cheng; |
81 | Modeling Beats and Downbeats with A Time-Frequency Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel Transformer-based approach to tackle beat and downbeat tracking. |
Y. -N. Hung; J. -C. Wang; X. Song; W. -T. Lu; M. Won; |
82 | Hierarchical Classification of Singing Activity, Gender, and Type in Complex Music Recordings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Traditionally, work on singing voice detection has focused on identifying singing activity in music recordings. In this work, our aim is to extend this task towards simultaneously detecting the presence of singing voice as well as determining singer gender and voice type. |
M. Krause; M. Müller; |
83 | DeepChorus: A Hybrid Model of Multi-Scale Convolution And Self-Attention for Chorus Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To solve the problem, in this paper we propose an end-to-end chorus detection model DeepChorus, reducing the engineering effort and the need for prior knowledge. |
Q. He; X. Sun; Y. Yu; W. Li; |
84 | To Catch A Chorus, Verse, Intro, or Anything Else: Analyzing A Song with Structural Functions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, explicitly identifying the function of each segment (e.g., "verse" or "chorus") is rarely attempted, but has many applications. We introduce a multi-task deep learning framework to model these structural semantic labels directly from audio by estimating verseness, chorusness, and so forth, as a function of time. |
J. -C. Wang; Y. -N. Hung; J. B. L. Smith; |
85 | A Novel 1D State Space for Efficient Music Rhythmic Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper proposes a new state space and a semi-Markov model for music time structure analysis. |
M. Heydari; M. McCallum; A. Ehmann; Z. Duan; |
86 | Upmixing Via Style Transfer: A Variational Autoencoder for Disentangling Spatial Images And Musical Content Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a modified variational autoencoder model that learns a latent space to describe the spatial images in multichannel music. |
H. Yang; S. Wager; S. Russell; M. Luo; M. Kim; W. Kim; |
87 | Spatial Mixup: Directional Loudness Modification As Data Augmentation for Sound Event Localization and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose Spatial Mixup, as an application of parametric spatial audio effects for data augmentation, which modifies the directional properties of a multi-channel spatial audio signal encoded in the ambisonics domain. |
R. Falcón-Pérez; K. Shimada; Y. Koyama; S. Takahashi; Y. Mitsufuji; |
88 | Towards Faster Continuous Multi-Channel HRTF Measurements Based On Learning System Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To cope with faster rotations, we present a novel continuous HRTF measurement method. |
T. Kabzinski; P. Jax; |
89 | Towards Fast And Convenient End-To-End HRTF Personalization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose and evaluate a system utilizing this model to generate an individualized HRTF using a minimal set of easily obtainable measurements: single photographs of both ears, as well as head and ear scale for matching interaural time difference (ITD). |
B. Zhi; D. N. Zotkin; R. Duraiswami; |
90 | Wishart Localization Prior On Spatial Covariance Matrix In Ambisonic Source Separation Using Non-Negative Tensor Factorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents an extension of the existing Non-negative Tensor Factorization (NTF) based method for sound source separation under reverberant conditions, formulated for Ambisonic microphone mixture signals. |
M. Guzik; K. Kowalczyk; |
91 | Improving Lyrics Alignment Through Joint Pitch Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a multi-task learning approach for lyrics alignment that incorporates pitch and thus can make use of a new source of highly accurate temporal information. |
J. Huang; E. Benetos; S. Ewert; |
92 | Learning Music Audio Representations Via Weak Language Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we pose the question of whether it may be possible to exploit weakly aligned text as the only supervisory signal to learn general-purpose music audio representations. |
I. Manco; E. Benetos; E. Quinton; G. Fazekas; |
93 | On The Prediction of The Frequency Response of A Wooden Plate from Its Mechanical Parameters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by deep learning applications in structural mechanics, we focus on how to train two predictors to model the relation between the vibrational response of a prescribed point of a wooden plate and its material properties. |
D. G. Badiane; R. Malvermi; S. Gonzalez; F. Antonacci; A. Sarti; |
94 | Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore a data-driven approach that uses a generative adversarial network to create the song transition by learning from real-world DJ mixes. |
B. -Y. Chen; W. -H. Hsu; W. -H. Liao; M. A. M. Ramírez; Y. Mitsufuji; Y. -H. Yang; |
95 | Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a self-supervised representation learning method is proposed for anomalous sound detection (ASD). |
H. Chen; Y. Song; L. -R. Dai; I. McLoughlin; L. Liu; |
96 | Federated Self-Training for Data-Efficient Audio Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose FedSTAR, a self-training approach to exploit large-scale on-device unlabeled data to improve the generalization of audio recognition models. |
V. Tsouvalas; A. Saeed; T. Ozcelebi; |
97 | Federated Self-Supervised Learning for Acoustic Event Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate the feasibility of applying FL to improve AEC performance while no customer data can be directly uploaded to the server. |
M. Feng; et al. |
98 | Temporal Knowledge Distillation for On-device Audio Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new knowledge distillation method designed to incorporate the temporal knowledge embedded in attention weights of large transformer-based models into on-device models. |
K. Choi; M. Kersner; J. Morton; B. Chang; |
99 | Streaming On-Device Detection of Device Directed Speech from Voice and Touch-Based Invocation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in many cases, the VA can accidentally be invoked by the keyword-like speech or accidental button press, which may have implications on user experience and privacy. To this end, we propose an acoustic false-trigger-mitigation (FTM) approach for on-device device-directed speech detection that simultaneously handles the voice-trigger and touch-based invocation. |
O. O. Rudovic; A. Bindal; V. Garg; P. Simha; P. Dighe; S. Kajarekar; |
100 | Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new extension of FCA, aiming to improve BSS performance for mixtures in which the length of reverberation exceeds the analysis frame. |
H. Sawada; R. Ikeshita; K. Kinoshita; T. Nakatani; |
101 | Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes a blind source separation method for multichannel audio signals, called NF-FastMNMF, based on the integration of the normalizing flow (NF) into the multichannel nonnegative matrix factorization with jointly-diagonalizable spatial covariance matrices, a.k.a. FastMNMF. |
A. A. Nugraha; K. Sekiguchi; M. Fontaine; Y. Bando; K. Yoshii; |
102 | Harvesting Partially-Disjoint Time-Frequency Information for Improving Degenerate Unmixing Estimation Technique Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, to avoid the erroneous retention, instead of masking, we propose to use multiple linear spatial filters (e.g., the minimum variance distortionless response filter) to extract the desired signals. |
Y. He; H. Wang; Q. Chen; R. H. Y. So; |
103 | Investigation And Comparison of Optimization Methods for Variational Autoencoder-Based Underdetermined Multichannel Source Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate two algorithms for variational autoencoder (VAE)-based underdetermined multichannel source separation. |
S. Seki; H. Kameoka; L. Li; |
104 | HBP: An Efficient Block Permutation Solver Using Hungarian Algorithm and Spectrogram Inpainting for Multichannel Audio Source Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a method called Hungarian Block Permutation (HBP) to solve the block permutation problem in frequency-domain multichannel audio source separation. |
L. Li; H. Kameoka; S. Seki; |
105 | EAD-Conformer: A Conformer-Based Encoder-Attention-Decoder-Network for Multi-Task Audio Source Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Conformer-based network to improve the performance of multi-task audio source separation. |
C. Li; Y. Wang; F. Deng; Z. Zhang; X. Wang; Z. Wang; |
106 | The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, separating an audio mixture (e.g., movie soundtrack) into the three broad categories of speech, music, and sound effects (understood to include ambient noise and natural sound events) has been left largely unexplored, despite a wide range of potential applications. This paper formalizes this task as the cocktail fork problem, and presents the Divide and Remaster (DnR) dataset to foster research on this topic. |
D. Petermann; G. Wichern; Z. -Q. Wang; J. Le Roux; |
107 | Phase Shifted Bedrosian Filterbank: An Interpretable Audio Front-End for Time-Domain Audio Source Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This type of filters also allows a potential reduction of the computational cost since larger encoder filters can be used. In this work, we propose to build a new parameterization of such encoder filter-bank which allows gaining interpretability while keeping flexibility. |
F. Mathieu; T. Courtat; G. Richard; G. Peeters; |
108 | Harmonicity Plays A Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We find that performance deteriorates significantly if one source is even slightly harmonically jittered, e.g., an imperceptible 3% harmonic jitter degrades performance of Conv-TasNet from 15.4 dB to 0.70 dB. |
R. Parikh; I. Kavalerov; C. Espy-Wilson; S. Shamma; |
109 | Multi-Channel Narrow-Band Deep Speech Separation with Full-Band Permutation Invariant Training Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper addresses the problem of multi-channel multi-speech separation based on deep learning techniques. In the short-time Fourier transform domain, we propose an end-to-end narrow-band network that directly takes as input the multi-channel mixture signals of one frequency, and outputs the separated signals of this frequency. |
C. Quan; X. Li; |
110 | CSENet: Complex Squeeze-and-Excitation Network for Speech Depression Level Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In order to make full use of speech information, this paper proposes a complex squeeze-and-excitation network (CSENet) for SDLP. |
C. Fan; Z. Lv; S. Pei; M. Niu; |
111 | Ubilung: Multi-Modal Passive-Based Lung Health Assessment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Lung health assessment is traditionally done mainly through X-ray images and spirometry tests which are time-consuming, cumbersome, and costly. In this paper, we investigate the potential of passively recordable contents such as speech, cough and heart signal for such an assessment. |
E. Nemati; et al. |
112 | The Second DiCOVA Challenge: Dataset and Performance Analysis for Diagnosis of COVID-19 Using Acoustics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an overview of the challenge, the rationale for the data collection and the baseline system. |
N. K. Sharma; S. R. Chetupalli; D. Bhattacharya; D. Dutta; P. Mote; S. Ganapathy; |
113 | Supervised and Self-Supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a bi-directional long short-term memory (BiLSTM) network based COVID-19 detection method using breath/speech/cough signals. |
X. -Y. Chen; Q. -S. Zhu; J. Zhang; L. -R. Dai; |
114 | Exploring Auditory Acoustic Features for The Diagnosis of COVID-19 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work presents the details of the automatic system for COVID-19 detection using breath, cough and speech recordings. |
M. R. Kamble; J. Patino; M. A. Zuluaga; M. Todisco; |
115 | FAST-RIR: Fast Neural Diffuse Room Impulse Response Generator Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. |
A. Ratnarajah; S. -X. Zhang; M. Yu; Z. Tang; D. Manocha; D. Yu; |
116 | Region-to-Region Kernel Interpolation of Acoustic Transfer Function with Directional Weighting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A method of interpolating the acoustic transfer function (ATF) between regions that takes into account both the physical properties of the ATF and the directionality of region configurations is proposed. |
J. G. C. Ribeiro; S. Koyama; H. Saruwatari; |
117 | Blind Reverberation Time Estimation in Dynamic Acoustic Conditions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Previously proposed methods involving deep neural networks were mostly designed and tested under the assumption of static acoustic conditions. In this work, we show that these approaches can perform poorly in dynamically evolving acoustic environments. |
P. Götz; C. Tuna; A. Walther; E. A. P. Habets; |
118 | Sparse Modeling of The Early Part of Noisy Room Impulse Responses with Sparse Bayesian Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to reconstruct the sparse model for the early part of RIRs with sparse Bayesian learning (SBL). |
M. Fu; J. R. Jensen; Y. Li; M. G. Christensen; |
119 | Improved Simulation of Realistically-Spatialised Simultaneous Speech Using Multi-Camera Analysis in The CHiME-5 Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In earlier work, we analysed a 50-hour audio-visual dataset of multiparty recordings made in real homes to estimate typical angular separations between speakers. |
J. Deadman; J. Barker; |
120 | A Data-Driven Approach for Acoustic Parameter Similarity Estimation of Speech Recording Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose two methods to estimate acoustic parameter similarity between a speech recording under analysis and a reference one. |
M. Papa; C. Borrelli; P. Bestagini; F. Antonacci; A. Sarti; S. Tubaro; |
121 | Violinist Identification Using Note-Level Timbre Feature Distributions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To verify if timbre features can describe a performer's style adequately, we examine a violinist identification method based on note-level timbre feature distributions. |
Y. Zhao; G. Fazekas; M. Sandler; |
122 | S3T: Self-Supervised Pre-Training with Swin Transformer For Music Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose S3T, a self-supervised pre-training method with Swin Transformer for music classification, aiming to learn meaningful music representations from massive easily accessible unlabeled music data. |
H. Zhao; C. Zhang; B. Zhu; Z. Ma; K. Zhang; |
123 | Ambiguity Modelling with Label Distribution Learning for Music Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we address the issue of ambiguity that can arise in many classification problems. |
M. Buisson; P. Alonso-Jiménez; D. Bogdanov; |
124 | ByteCover2: Towards Dimensionality Reduction of Latent Embedding for Efficient Cover Song Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an upgraded version of ByteCover, termed ByteCover2, which further improves ByteCover in both identification performance and efficiency. |
X. Du; K. Chen; Z. Wang; B. Zhu; Z. Ma; |
125 | TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture. |
K. Chen; S. Yu; C. -i. Wang; W. Li; T. Berg-Kirkpatrick; S. Dubnov; |
126 | Hierarchical Graph-Based Neural Network for Singing Melody Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel hierarchical graph-based network for singing melody extraction. |
S. Yu; X. Chen; W. Li; |
127 | On The Impact of Normalization Strategies in Unsupervised Adversarial Domain Adaptation for Acoustic Scene Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address a more practical scenario where parallel data are not available. |
M. Olvera; E. Vincent; G. Gasso; |
128 | Improving Bird Classification with Unsupervised Sound Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we demonstrate improved separation quality when training a MixIT model specifically for birdsong data, outperforming a general audio separation model by over 5 dB in SI-SNR improvement of reconstructed mixtures. |
T. Denton; S. Wisdom; J. R. Hershey; |
129 | Scalable Neural Architectures for End-to-End Environmental Sound Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose novel neural architectures based on PhiNets for real-time acoustic event detection on microcontroller units. |
F. Paissan; A. Ancilotto; A. Brutti; E. Farella; |
130 | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in audio tasks. To combat these problems, we introduce HTS-AT: an audio transformer with a hierarchical structure to reduce the model size and training time. |
K. Chen; X. Du; B. Zhu; Z. Ma; T. Berg-Kirkpatrick; S. Dubnov; |
131 | Hybrid Attention-Based Prototypical Networks for Few-Shot Sound Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a hybrid attention module and combine it with prototypical networks for few-shot sound classification. |
Y. Wang; D. V. Anderson; |
132 | End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we exploit the offset-compensating property of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. |
K. N. Watcharasupat; T. N. T. Nguyen; W. -S. Gan; S. Zhao; B. Ma; |
133 | NN3A: Neural Network Supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a neural network supported algorithm for RTC, namely NN3A, which incorporates an adaptive filter and a multi-task model for residual echo suppression, noise reduction and near-end speech activity detection. |
Z. Wang; Y. Na; B. Tian; Q. Fu; |
134 | Deep Residual Echo Suppression and Noise Reduction: A Multi-Input FCRN Approach in A Hybrid Speech Enhancement System Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Using the fully convolutional recurrent network (FCRN) architecture that is among state-of-the-art topologies for noise reduction, we present a novel deep residual echo suppression and noise reduction with up to four input signals as part of a hybrid speech enhancement system with a linear frequency domain adaptive Kalman filter AEC. |
J. Franzen; T. Fingscheidt; |
135 | Neural Cascade Architecture for Joint Acoustic Echo and Noise Suppression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a neural cascade architecture for joint acoustic echo and noise suppression. |
H. Zhang; D. Wang; |
136 | Cascade Multi-Channel Noise Reduction and Acoustic Feedback Cancellation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a cascade noise reduction (NR) and acoustic feedback cancellation (AFC) algorithm is presented for speech applications where a multi-channel Wiener filter (MWF) based NR is applied first followed by a single-channel prediction-error method (PEM) based adaptive feedback cancellation stage. |
S. Ruiz; T. van Waterschoot; M. Moonen; |
137 | SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We proposed a simple yet efficient model named Skipping Memory (SkiM) for the long sequence modeling. |
C. Li; L. Yang; W. Wang; Y. Qian; |
138 | Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate using MixIT to adapt a separation model on real far-field overlapping reverberant and noisy speech data from the AMI Corpus. |
A. Sivaraman; S. Wisdom; H. Erdogan; J. R. Hershey; |
139 | Quantifying Discriminability Between NMF Bases Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a quantitative measure to calculate how discriminative two NMF bases are. |
E. Konno; D. Saito; N. Minematsu; |
140 | Location-Based Training for Multi-Channel Talker-Independent Speaker Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Leveraging spatial information afforded by microphone arrays, we propose a new training approach to resolving permutation ambiguities for multi-channel speaker separation. |
H. Taherian; K. Tan; D. Wang; |
141 | SDR – Medium Rare with Fast Computations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a fast algorithm fixing shortcomings of publicly available implementations. |
R. Scheibler; |
142 | AttentionPIT: Soft Permutation Invariant Training for Audio Source Separation with Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: [...], which makes it infeasible as J increases, and the other is that it is prone to getting stuck in bad local optimal solutions due to the hard output-target assignment process. To overcome these problems simultaneously, in this paper, we propose AttentionPIT, which uses an attention mechanism to find soft output-target assignments for separation network training, and can be run in polynomial time in J, as with the recently proposed fast PIT variants such as SinkPIT and HungarianPIT. |
H. Kameoka; S. Seki; L. Li; C. Watanabe; |
143 | Locate This, Not That: Class-Conditioned Sound Event DOA Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an alternative class-conditioned SELD model for situations where we may not be interested in localizing all classes all of the time. |
O. Slizovskaia; G. Wichern; Z. -Q. Wang; J. Le Roux; |
144 | SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we introduce SALSA-Lite, a fast and effective feature for polyphonic SELD using microphone array inputs. |
T. N. Tho Nguyen; D. L. Jones; K. N. Watcharasupat; H. Phan; W. -S. Gan; |
145 | SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to use deep learning techniques to learn competing and time-varying direct-path phase differences for localizing multiple moving sound sources. |
B. Yang; H. Liu; X. Li; |
146 | Closed-Form Single Source Direction-of-Arrival Estimator Using First-Order Relative Harmonic Coefficients Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, this paper utilizes the first-order RHC to propose a closed-form DOA estimator by deriving a direction vector, which points towards the desired source direction. |
Y. Hu; S. Gannot; |
147 | A Slide-Save Based Framework for Multi-Source DOA Extraction with Closely Spaced Sources Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a slide-save based framework to address the problem of extracting multi-source DOAs for closely spaced sources. |
J. Geng; S. Wang; X. Lou; |
148 | An End-to-End Deep Learning Framework For Multiple Audio Source Separation And Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an end-to-end deep learning framework to separate and localize multiple audio sources from the mixture of multi-channels. |
Y. Chen; B. Liu; Z. Zhang; H. -S. Kim; |
149 | Deep Adaptation Control for Acoustic Echo Cancellation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a general framework for adaptation control using deep neural networks (NNs) and apply it to acoustic echo cancellation (AEC). |
A. Ivry; I. Cohen; B. Berdugo; |
150 | Off-the-Shelf Deep Integration For Residual-Echo Suppression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we fine-tune three pre-trained deep learning-based systems originally designed for RES, SS, and SE, and show that the best performing system for the task of RES varies with respect to the acoustic conditions. |
A. Ivry; I. Cohen; B. Berdugo; |
151 | A Complex Spectral Mapping with Inplace Convolution Recurrent Neural Networks For Acoustic Echo Cancellation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Different from most methods, which process the entire frequency band, we propose inplace convolution recurrent neural networks (ICRN) for end-to-end AEC, which utilize inplace convolution and channel-wise temporal modeling to ensure that the near-end signal information is preserved. |
C. Zhang; J. Liu; X. Zhang; |
152 | Deep Adaptive AEC: Hybrid of Deep Learning and Adaptive Acoustic Echo Cancellation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we integrate classic adaptive filtering algorithms with modern deep learning to propose a new approach called deep adaptive AEC. |
H. Zhang; S. Kandadai; H. Rao; M. Kim; T. Pruthi; T. Kristjansson; |
153 | Computationally Efficient Fixed-Filter ANC for Speech Based on Long-Term Prediction for Headphone Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose to solve the causality problem in feedforward fixed-filter ANC systems by integrating a long-term linear prediction filter to predict the incoming disturbance, here speech, by the same amount of samples ahead in time, as the non-causal delay. |
Y. Iotov; S. M. Nørholm; V. Belyi; M. Dyrholm; M. G. Christensen; |
154 | End-To-End Deep Learning-Based Adaptation Control for Frequency-Domain Adaptive System Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel end-to-end deep learning-based adaptation control algorithm for frequency-domain adaptive system identification. |
T. Haubner; A. Brendel; W. Kellermann; |
155 | A Few-Sample Strategy for Guitar Tablature Transcription Based on Inharmonicity Analysis and Playability Constraints Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The current work combines the two aforementioned strategies in an explicit manner by employing two discrete components for string-fret classification. |
G. Bastas; S. Koutoupis; M. Kaliakatsos-Papakostas; V. Katsouros; P. Maragos; |
156 | Exploring Transformer's Potential on Automatic Piano Transcription Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Most recent research about automatic music transcription (AMT) uses convolutional neural networks and recurrent neural networks to model the mapping from music signals to symbolic notation. |
L. Ou; Z. Guo; E. Benetos; J. Han; Y. Wang; |
157 | A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a lightweight neural network for musical instrument transcription, which supports polyphonic outputs and generalizes to a wide variety of instruments (including vocals). |
R. M. Bittner; J. J. Bosch; D. Rubinstein; G. Meseguer-Brocal; S. Ewert; |
158 | Towards Automatic Transcription of Polyphonic Electric Guitar Music: A New Dataset and A Multi-Loss Transformer Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new dataset named EGDB, that contains transcriptions of the electric guitar performance of 240 tablatures rendered with different tones. |
Y. -H. Chen; W. -Y. Hsiao; T. -K. Hsieh; J. -S. R. Jang; Y. -H. Yang; |
159 | Genre-Conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. |
X. Gao; C. Gupta; H. Li; |
160 | Pseudo-Label Transfer from Frame-Level to Note-Level in A Teacher-Student Framework for Singing Transcription from Polyphonic Music Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We address the issue by using pseudo labels from vocal pitch estimation models given unlabeled data. |
S. Kum; J. Lee; K. L. Kim; T. Kim; J. Nam; |
161 | Sound Event Detection Guided By Semantic Contexts of Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is because one-hot representations of pre-defined scenes are exploited as prior contexts for such conventional methods. To alleviate this problem, we propose scene-informed SED where pre-defined scene-agnostic contexts are available for more accurate SED. |
N. Tonami; K. Imoto; R. Nagase; Y. Okamoto; T. Fukumori; Y. Yamashita; |
162 | CNN-Transformer with Self-Attention Network for Sound Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To construct a model with high prediction accuracy while capturing the properties of acoustic signals well, we propose an architecture called a CNN-SAN-Transformer, which retains CNN in the blocks close to the input and uses SAN in all remaining blocks. |
K. Wakayama; S. Saito; |
163 | A Mutual Learning Framework for Few-Shot Sound Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Secondly, the feature extractor is task-agnostic (or class-agnostic): the feature extractor is trained with base-class data and directly applied to unseen-class data. To address these issues, we present a novel mutual learning framework with transductive learning, which aims at iteratively updating the class prototypes and feature extractor. |
D. Yang; H. Wang; Y. Zou; Z. Ye; W. Wang; |
164 | Anomalous Sound Detection Using Spectral-Temporal Information Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper proposes a spectral-temporal fusion based self-supervised method to model the feature of the normal sound, which improves the stability and performance consistency in detection of anomalous sounds from individual machines, even of the same type. |
Y. Liu; J. Guan; Q. Zhu; W. Wang; |
165 | Sparse Self-Attention for Semi-Supervised Sound Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a sparse self-attention mechanism to alleviate the impact. |
Y. Guan; J. Xue; G. Zheng; J. Han; |
166 | Peer Collaborative Learning for Polyphonic Sound Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes how semi-supervised learning, called peer collaborative learning (PCL), can be applied to the polyphonic sound event detection (PSED) task, which is one of the tasks in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. |
H. Endo; H. Nishizaki; |
167 | PostGAN: A GAN-Based Post-Processor to Enhance The Quality of Coded Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose PostGAN, a GAN-based neural post-processor that operates in the sub-band domain and relies on the U-Net architecture and a learned affine transform. |
S. Korse; N. Pia; K. Gupta; G. Fuchs; |
168 | A DNN Based Post-Filter to Enhance The Quality of Coded Speech in MDCT Domain Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a mask-based post-filter operating directly in MDCT domain of the codec, inducing no extra delay. |
K. Gupta; S. Korse; B. Edler; G. Fuchs; |
169 | A Two-Stage U-Net for High-Fidelity Denoising of Historical Recordings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel denoising method based on a fully-convolutional deep neural network. |
E. Moliner; V. Välimäki; |
170 | Experts Versus All-Rounders: Target Language Extraction for Multiple Target Languages Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we extend the Single-TLE framework to Multi-TLE. |
M. Borsdorf; K. Scheck; H. Li; T. Schultz; |
171 | Category-Adapted Sound Event Enhancement with Weakly Labeled Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a category-adapted system to enable enhancement on any selected sound category, where we first familiarize the model with all common sound classes and then apply a category-specific fine-tuning procedure to enhance the targeted sound class. |
G. Li; X. Xu; H. Dinkel; M. Wu; K. Yu; |
172 | Sequential MCMC Methods for Audio Signal Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: With the aim of addressing audio signal restoration as a sequential inference problem, we build upon Gabor regression to propose a state-space model for audio time series. |
R. M. Clavería; S. J. Godsill; |
173 | Architecture for Variable Bitrate Neural Speech Codec with Configurable Computation Complexity Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present a new neural speech codec that: 1) supports variable bitrates 2) supports packet losses of up to 120 ms and 3) can operate at low-compute and high-compute modes. |
T. Jayashankar; et al. |
174 | End-to-End Neural Speech Coding for Real-Time Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes the TFNet, an end-to-end neural speech codec with low latency for RTC. |
X. Jiang; X. Peng; C. Zheng; H. Xue; Y. Zhang; Y. Lu; |
175 | Deep Neural Network (DNN) Audio Coder Using A Perceptually Improved Training Method Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The distortion in the perceptual domain is measured using the psychoacoustic model (PAM), and a loss function is obtained through the two-stage compensation approach. |
S. Shin; J. Byun; Y. Park; J. Sung; S. Beack; |
176 | Progressive Multi-Stage Neural Audio Coding with Guided References Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an effective multi-stage neural audio coding algorithm that encodes full-band audio signals (up to 20 kHz) using an end-to-end training criterion. |
C. Lee; H. Lim; J. Lee; I. Jang; H. -G. Kang; |
177 | VocBench: A Neural Vocoder Benchmark for Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, it becomes more challenging to assess these new vocoders and compare their performance to previous ones. To address this problem, we present VocBench, a framework that benchmarks the performance of state-of-the-art neural vocoders. |
E. A. AlBadawy; A. Gibiansky; Q. He; J. Wu; M. -C. Chang; S. Lyu; |
178 | DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we train an objective metric based on P.835 human ratings that outputs 3 scores: i) speech quality (SIG), ii) background noise quality (BAK), and iii) the overall quality (OVRL) of the audio. |
C. K. A. Reddy; V. Gopal; R. Cutler; |
179 | SQAPP: No-Reference Speech Quality Assessment Via Pairwise Preference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a learning framework for estimating the quality of a recording without any reference, and without any human judgments. |
P. Manocha; Z. Jin; A. Finkelstein; |
180 | LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we present LDNet, a unified framework for mean opinion score (MOS) prediction that predicts the listener-wise perceived quality given the input speech and the listener identity. |
W. -C. Huang; E. Cooper; J. Yamagishi; T. Toda; |
181 | AECMOS: A Speech Quality Assessment Metric for Echo Impairment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: More precisely, we develop a neural network model to evaluate call quality degradations in two separate categories: echo and degradations from other sources. |
M. Purin; S. Sootla; M. Sponza; A. Saabas; R. Cutler; |
182 | MOS Predictor for Synthetic Speech with I-Vector Inputs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a neural-network-based model that splices the deep features extracted by convolutional neural network (CNN) and i-vector on the time axis and uses Transformer encoder as time sequence model. |
M. Liu; J. Wang; S. Li; F. Xiang; Y. Yao; L. Yang; |
183 | Wave-Domain Approach for Cancelling Noise Entering Open Windows Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a wave-domain approach that converges instantaneously, operates with low computational effort and does not require error microphones. |
D. Ratering; W. B. Kleijn; J. Gonzalez Silva; R. M. G. Ferrari; |
184 | On Synchronization of Wireless Acoustic Sensor Networks in The Presence of Time-Varying Sampling Rate Offsets and Speaker Changes Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: A wireless acoustic sensor network records audio signals with sampling time and sampling rate offsets between the audio streams, if the analog-digital converters (ADCs) of the network devices are not synchronized. Here, we introduce a new sampling rate offset model to simulate time-varying sampling frequencies caused, for example, by temperature changes of ADC crystal oscillators, and propose an estimation algorithm to handle this dynamic aspect in combination with changing acoustic source positions. |
T. Gburrek; J. Schmalenstroeer; R. Haeb-Umbach; |
185 | PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes PickNet, a neural network model for real-time channel selection using an ad hoc microphone array. |
T. Yoshioka; X. Wang; D. Wang; |
186 | End-To-End Alexa Device Arbitration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a variant of the speaker localization problem, which we call device arbitration. |
J. Barber; Y. Fan; T. Zhang; |
187 | Instantaneous Linear Dimensionality Reduction of Multichannel Time-Series Signal for Array Signal Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a frequency-independent, i.e., instantaneous, linear dimensionality reduction method that achieves low computational cost and latency and high restoration accuracy. |
N. Ueno; N. Ono; |
188 | Generalized Time Domain Velocity Vector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce and analyze Generalized Time Domain Velocity Vector (GTVV), an extension of the previously presented acoustic multipath footprint extracted from the Ambisonic recordings. |
S. Kitic; J. Daniel; |
189 | Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a model (DDSP mixture model) that represents a mixture as the sum of the outputs of multiple pretrained DDSP autoencoders. |
M. Kawamura; T. Nakamura; D. Kitamura; H. Saruwatari; Y. Takahashi; K. Kondo; |
190 | The MirrorNet: Learning Audio Synthesizer Controls Inspired By Sensorimotor Interaction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, the MirrorNet is applied to learn, in an unsupervised manner, the controls of a specific audio synthesizer (DIVA) to produce melodies only from their auditory spectrograms. |
Y. M. Siriwardena; G. Marion; S. Shamma; |
191 | Deep Performer: Score-to-Audio Music Performance Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hence, we propose two new techniques for handling polyphonic inputs and providing a fine-grained conditioning in a transformer encoder-decoder model. |
H. -W. Dong; C. Zhou; T. Berg-Kirkpatrick; J. McAuley; |
192 | KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE Using Mel-Spectrograms Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel neural network model called KaraSinger for a less-studied singing voice synthesis (SVS) task named score-free SVS, in which the prosody and melody are spontaneously decided by machine. |
C. -F. Liao; J. -Y. Liu; Y. -H. Yang; |
193 | Adversarial Audio Synthesis Using A Harmonic-Percussive Discriminator Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a discriminator design scheme for generative adversarial network-based audio signal generation. |
J. Lee; H. Lim; C. Lee; I. Jang; H. -G. Kang; |
194 | SleepGAN: Towards Personalized Sleep Therapy Music Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we take the first step towards generating personalized sleep therapy music. |
J. Yang; C. Min; A. Mathur; F. Kawsar; |
195 | Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel neural conditional captioning model to balance the diversity and accuracy trade-off. |
X. Xu; M. Wu; K. Yu; |
196 | AudioCLIP: Extending CLIP to Image, Text and Audio Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Utilizing the AudioSet dataset, our proposed model incorporates the ESResNeXt audio-model into the CLIP framework, thus enabling it to perform multimodal classification and keeping CLIP's zero-shot capabilities. AudioCLIP achieves new state-of-the-art results in the Environmental Sound Classification (ESC) task and out-performs others by reaching accuracies of 97.15 % on ESC-50 and 90.07 % on UrbanSound8K. |
A. Guzhov; F. Raue; J. Hees; A. Dengel; |
197 | Can Audio Captions Be Evaluated With Image Caption Metrics? Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To overcome their limitations, we propose a metric named FENSE, where we combine the strength of Sentence-BERT in capturing similarity, and a novel Error Detector to penalize erroneous sentences for robustness. |
Z. Zhou; Z. Zhang; X. Xu; Z. Xie; M. Wu; K. Q. Zhu; |
198 | A Data-Driven Cognitive Salience Model for Objective Perceptual Audio Quality Assessment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel data-driven salience model that informs the quality mapping stage by explicitly estimating the cognitive/degradation metric interactions using a salience measure. |
P. M. Delgado; J. Herre; |
199 | Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-Box Acoustic Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A deep neural network (DNN)-based speech enhancement (SE) aiming to maximize the performance of an automatic speech recognition (ASR) system is proposed in this paper. |
R. Sawata; Y. Kashiwagi; S. Takahashi; |
200 | Effect of Noise Suppression Losses on Speech Distortion and ASR Performance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Furthermore, the introduced speech distortion and artifacts greatly harm speech quality and intelligibility, and often significantly degrade automatic speech recognition (ASR) rates. In this work, we shed light on the success of the spectral complex compressed mean squared error (MSE) loss, and how its magnitude and phase-aware terms are related to the speech distortion vs. noise reduction trade off. |
S. Braun; H. Gamper; |
201 | Increasing Loudness in Audio Signals: A Perceptually Motivated Approach to Preserve Audio Quality Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method to maintain the subjective perception of volume of audio signals and, at the same time, reduce their absolute peak value. |
A. Jeannerot; N. de Koeijer; P. Martínez-Nuevo; M. B. Møller; J. Dyreby; P. Prandoni; |
202 | Audio Peak Reduction Using A Synced Allpass Filter Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a new technique for linear peak amplitude reduction is proposed based on a Schroeder allpass filter, whose delay line and gain parameters are synced to match peaks of the signal's auto-correlation function. |
S. J. Schlecht; L. Fierro; V. Välimäki; J. Backman; |
203 | APPLADE: Adjustable Plug-and-Play Audio Declipper Combining DNN with Sparse Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an audio declipping method that takes advantage of both sparse optimization and deep learning. |
T. Tanaka; K. Yatabe; M. Yasuda; Y. Oikawa; |
204 | Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, and Pretraining: An Ablation Study Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents an ablation study that analyzes which components contribute to the boost in performance and training time. |
D. Tompkins; K. Kumar; J. Wu; |
205 | Threshold Independent Evaluation of Sound Event Detection Scores Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Performing an adequate evaluation of sound event detection (SED) systems is far from trivial and is still subject to ongoing research. |
J. Ebbers; R. Haeb-Umbach; R. Serizel; |
206 | Multimodal Evaluation Method for Sound Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel multimodal method to evaluate SED systems from multiple perspectives such as detection, total duration, relative duration, and uniformity. |
S. M. R. Modaresi; A. Osmani; M. Razzazi; A. Chibani; |
207 | A Benchmark of State-of-the-Art Sound Event Detection Systems Evaluated on Synthetic Soundscapes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a benchmark of submissions to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4, representing a sampling of the state of the art in the Sound Event Detection task. |
F. Ronchini; R. Serizel; |
208 | Attentive Max Feature Map and Joint Training for Acoustic Scene Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the attentive max feature map that combines two effective techniques, attention and a max feature map, to further elaborate the attention mechanism and mitigate the above-mentioned phenomenon. |
H. -j. Shim; J. -w. Jung; J. -h. Kim; H. -J. Yu; |
209 | A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. |
H. Hu; S. M. Siniscalchi; C. -H. H. Yang; C. -H. Lee; |
210 | ORCA-PARTY: An Automatic Killer Whale Sound Type Separation Toolkit Using Deep Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The current study is the first introducing a fully-automated deep signal separation approach for overlapping orca vocalizations, addressing all of the previously mentioned challenges, together with one of the largest bioacoustic data archives recorded on killer whales (Orcinus Orca). |
C. Bergler; M. Schmitt; A. Maier; R. X. Cheng; V. Barth; E. Nöth; |
211 | Sparsity-Based Sound Field Separation in The Spherical Harmonics Domain Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Sound field analysis and reconstruction has been a topic of intense research in the last decades for its multiple applications in spatial audio processing tasks. In this context, the identification of the direct and reverberant sound field components is a problem of great interest, where several solutions exploiting spherical harmonics representations have already been proposed. |
M. Pezzoli; M. Cobos; F. Antonacci; A. Sarti; |
212 | Spatial Active Noise Control Based on Individual Kernel Interpolation of Primary and Secondary Sound Fields Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, whereas the sound field to be interpolated is a superposition of primary and secondary sound fields, the directional weight for the primary noise source was applied to the total sound field in previous work; therefore, the performance improvement was limited. We propose a method of individually interpolating the primary and secondary sound fields and formulate a normalized least-mean-square algorithm based on this interpolation method. |
K. Arikawa; S. Koyama; H. Saruwatari; |
213 | Time-Domain Acoustic Contrast Control with A Spatial Uniformity Constraint for Personal Audio Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a spatial uniformity constraint on time-domain broadband ACC in addition to the frequency response trend estimation constraint with the aim of ensuring a uniform sound field distribution in the bright zone. |
S. Zhao; I. S. Burnett; |
214 | Generation of Personal Sound Fields in Reverberant Environments Using Interframe Correlation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a personal sound field control approach that exploits interframe correlation. |
L. Shi; G. Ping; X. Shen; M. G. Christensen; |
215 | Variable Span Trade-Off Filter for Sound Zone Control with Kernel Interpolation Weighting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A sound zone control method is proposed, based on the frequency domain variable span trade-off filter (VAST). |
J. Brunnström; S. Koyama; M. Moonen; |
216 | Time Domain Radial Filter Design for Spherical Waves Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, the time-domain radial functions for spherical waves are realized as FIR filters. |
N. Hahn; F. Schultz; S. Spors; |
217 | Feature Space Message Passing Network for Medical Image Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To solve both problems, we propose a novel feature space message passing network (FSMPN) framework. |
J. Sun; K. Zhang; S. Niu; Y. Zhang; Y. Kong; |
218 | Cross-Domain Few-Shot Learning for Rare-Disease Skin Lesion Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a cross-domain few-shot segmentation (CD-FSS) framework, which enables the model to leverage the learning ability obtained from the natural domain, to facilitate rare-disease skin lesion segmentation with limited data of common diseases. |
Y. Wang; et al. |
219 | Adaptive Pseudo Labeling for Source-Free Domain Adaptation in Medical Image Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we combine the dual-classifiers consistency and predictive category-aware confidence to form a novel regularization for pseudo-label denoising. |
C. Li; W. Chen; X. Luo; Y. He; Y. Tan; |
220 | Object Detection and Tracking in Ultrasound Scans Using An Optical Flow and Semantic Segmentation Framework Based on Convolutional Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a framework to autonomously detect, localize and track anatomical structures in ultrasound scans during scanning and therapeutic sessions in real-time. |
A. F. Al-Battal; I. R. Lerman; T. Q. Nguyen; |
221 | Heuristic Dropout: An Efficient Regularization Method for Medical Image Segmentation Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This manuscript goes deep into the research of the Dropout algorithm, which is commonly used in neural networks to alleviate the overfitting problem. From the perspective of solving the co-adaptation problem, this manuscript explains the basic principles of the Dropout algorithm and discusses the existing limitations of its derivative methods. |
D. Shi; R. Liu; L. Tao; C. Yuan; |
222 | Superresolution and Segmentation of OCT Scans Using Multi-Stage Adversarial Guided Attention Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work aims to segment the OCT images automatically; however, it is a challenging task due to various issues such as the speckle noise, small target region, and unfavorable imaging conditions. |
P. Jeihouni; O. Dehzangi; A. Amireskandari; A. Dabouei; A. Rezai; N. M. Nasrabadi; |
223 | Heart Rate and Oxygen Saturation Estimation from Facial Video with Multimodal Physiological Data Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a method to estimate heart rate and oxygen saturation from facial videos with multimodal physiological data generation. |
Y. Akamatsu; Y. Onishi; H. Imaoka; |
224 | EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel EMGSE framework for multimodal SE, which integrates audio and facial electromyography (EMG) signals. |
K. -C. Wang; K. -C. Liu; H. -M. Wang; Y. Tsao; |
225 | A Dilated Residual Vision Transformer for Atrial Fibrillation Detection from Stacked Time-Frequency ECG Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a new vision transformer (ViT) variant, namely, Dilated Residual ViT (DiResViT), by replacing the original patchify stem in ViT with dilated convolutional stem having residual connections for improved AF detection from an ensemble of ECG time-frequency representations. |
S. Pratiher; A. Srivastava; Y. B. Priyatha; N. Ghosh; A. Patra; |
226 | Contrastive Heartbeats: Contrastive Learning for Self-Supervised ECG Representation and Phenotyping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hence, we propose a new self-supervised representation learning framework, contrastive heartbeats (CT-HB), which learns general and robust electrocardiogram representations for efficient training on various downstream tasks. |
C. T. Wei; M. -E. Hsieh; C. -L. Liu; V. S. Tseng; |
227 | Ubiquitous Physiological Prediction of SUD Patients' Wellness State Using Memory-Based Convolutional Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Using wearable sensors, we aim to evaluate the impact of changes in heart rate (HR) and heart rate variability (HRV) signals on SUD wellness development using long-term and ubiquitous monitoring and machine learning and collected data from 10 subjects over an extended period of time. |
O. Dehzangi; P. Jeihouni; J. Ramadan; V. Finomore; N. M. Nasrabadi; A. Rezai; |
228 | Joint Hypoglycemia Prediction and Glucose Forecasting Via Deep Multi-Task Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a multitask learning approach to the problem of hypoglycemia (HG) prediction in diabetes. |
M. Yang; D. Dave; M. Erraguntla; G. L. Cote; R. Gutierrez-Osuna; |
229 | SegNet-Based Deep Representation Learning for Dysphagia Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This article presents a SegNet-based method for classifying healthy and dysphagic swallow signals by learning mel-spectrogram features. |
S. Subramani; A. R. M. V; A. Roy; P. S. Hegde; P. Kumar Ghosh; |
230 | Robust Collaborative Learning for Sequence Modelling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By constructing model-agnostic robustness checks and reusing features obtained from both architectures, we build a collaborative framework that improves performance and stability. |
F. Buet-Golfouse; H. Roggeman; I. Utyagulov; |
231 | A Self-Supervised Pre-Training Framework for Vision-Based Seizure Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a method to classify ES and PNES based on clinical signs in the seizure videos. |
J. -C. Hou; A. McGonigal; F. Bartolomei; M. Thonnat; |
232 | Design of Real-Time System Based on Machine Learning for Snoring and OSA Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we place a microphone under the patient's bed and combine it with full-night polysomnography to record audio signals. |
H. Luo; L. Zhang; L. Zhou; X. Lin; Z. Zhang; M. Wang; |
233 | Parametric Modeling of Human Wrist for Bioimpedance-Based Physiological Sensing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study provides a parametric model of the human wrist that involves different tissue layers (i.e., skin, fat, artery, muscle, bone) with complex dielectric properties built based on the human wrist anatomy. |
K. Sel; N. Huerta; M. S. Sacks; R. Jafari; |
234 | Preliminary Results on The Generation of Artificial Handwriting Data Using A Decomposition-Recombination Strategy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes the use of data augmentation techniques to improve the accuracy of a Long short-term memory system in the diagnosis of essential tremor. |
J. F. Adrán Otero; O. Soláns Caballer; P. Marti-Puig; Z. Sun; T. Tanaka; J. Solé-Casals; |
235 | A Style Transfer Mapping and Fine-Tuning Subject Transfer Framework Using Convolutional Neural Networks for Surface Electromyogram Pattern Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a style transfer mapping (STM) and fine-tuning (FT) subject transfer framework using convolutional neural networks (CNNs). |
S. Kanoga; T. Hoshino; M. Tada; |
236 | Feature-Based Sensing Matrix Design for Analog to Information Converters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel sensing matrix design for the pulse-width modulation (PWM)-based analog-to-information converter (AIC), which obtains the digital feature of an analog signal rather than its sparse coefficients. |
C. Guo; H. Qian; B. Hong; |
237 | ALSNet: A Dilated 1-D CNN for Identifying ALS from Raw EMG Signal Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a dilated one dimensional convolutional neural network, named ALSNet, is proposed for identifying ALS from raw EMG signal. |
K. M. Naimul Hassan; et al. |
238 | Joint Model Order Estimation for Multiple Tensors with A Coupled Mode and Applications to The Joint Decomposition of EEG, MEG Magnetometer, and Gradiometer Tensors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we extend the rank estimation techniques, designed for a single tensor, to noise-corrupted coupled low-rank tensors that share one of their factor matrices. |
B. Ahmad; L. Khamidullina; A. A. Korobkov; A. Manina; J. Haueisen; M. Haardt; |
239 | An Experimental Study on Transferring Data-Driven Image Compressive Sensing to Bioelectric Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we conduct an experimental study on transferring existing data-driven image CS methods to bioelectric signals. |
Z. Zhang; J. Zhao; F. Ren; |
240 | Hand Gesture Recognition Using Temporal Convolutions and Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such data-driven models, however, have been challenged by their need for a large number of trainable parameters and their structural complexity. Here we propose the novel Temporal Convolutions-based Hand Gesture Recognition architecture (TC-HGR) to reduce this computational burden. |
E. Rahimian; S. Zabihi; A. Asif; D. Farina; S. F. Atashzar; A. Mohammadi; |
241 | Combining Multiple Style Transfer Networks and Transfer Learning For LGE-CMR Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents an algorithm for segmenting late gadolinium enhancement cardiac magnetic resonance (LGE-CMR) in the absence of labeled training data. |
B. Fang; J. Chen; W. Wang; Y. Zhou; |
242 | Multi-Domain Unpaired Ultrasound Image Artifact Removal Using A Single Convolutional Neural Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the recent success of multi-domain image transfer, herein, we propose a novel unpaired deep learning approach where a single neural network can deal with different types of US artifacts simply by changing a mask vector that switches between different target domains. |
J. Huh; S. Khan; J. C. Ye; |
243 | Improving Ultrasound Image Classification with Local Texture Quantisation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel image classification framework for small-scaled and noisy ultrasound image datasets. |
X. Li; H. Liang; S. Nagala; J. Chen; |
244 | Accelerated Intravascular Ultrasound Imaging Using Deep Reinforcement Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To efficiently accelerate IVUS imaging, we propose a framework that utilizes deep reinforcement learning for an optimal adaptive acquisition policy on a per-frame basis enabled by actor-critic methods and Gumbel top-K sampling. |
T. S. W. Stevens; N. Chennakeshava; F. J. de Bruijn; M. Pekar; R. J. G. van Sloun; |
245 | Deep Proximal Unfolding For Image Recovery from Under-Sampled Channel Data in Intravascular Ultrasound Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose a model-based deep learning solution that aims to reconstruct images from data that has been beamformed by under-sampling the number of channels by a factor of 4. |
N. Chennakeshava; et al. |
246 | Multiview Long-Short Spatial Contrastive Learning For 3D Medical Image Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we extend the contrastive learning framework to 3D volumetric medical imaging. |
G. Cao; Y. Wang; M. Zhang; J. Zhang; G. Kang; X. Xu; |
247 | Composing Graphical Models with Generative Adversarial Networks for EEG Signal Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose a generative and inference approach that combines the complementary benefits of probabilistic graphical models and generative adversarial networks (GANs) for EEG signal modeling. |
K. Vo; M. Vishwanath; R. Srinivasan; N. Dutt; H. Cao; |
248 | Domain-Invariant Representation Learning from EEG with Private Encoders Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To that end, we propose a multi-source learning architecture where we extract domain-invariant representations from dataset-specific private encoders. |
D. Bethge; et al. |
249 | Holistic Semi-Supervised Approaches for EEG Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we adapt three state-of-the-art holistic semi-supervised approaches, namely MixMatch [1], FixMatch [2], and AdaMatch [3], as well as five classical semi-supervised methods for EEG learning. |
G. Zhang; A. Etemad; |
250 | Music Identification Using Brain Responses to Initial Snippets Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We examine EEG encoding of naturalistic musical patterns employing the NMED-T and MUSIN-G datasets. |
P. Pandey; G. Sharma; K. P. Miyapuram; R. Subramanian; D. Lomas; |
251 | Multi-Level Spatial-Temporal Adaptation Network for Motor Imagery Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: And this variance is more significant across subjects and sessions, which imposes limitations on the cross-domain MI tasks. To address this problem, we propose a Multi-level Spatial-Temporal Adaptation Network (MSTAN), extracting domain-invariant multi-level spatial-temporal features to overcome domain differences. |
W. Xu; J. Wang; Z. Jia; Z. Hong; Y. Li; Y. Lin; |
252 | Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we adapt factorized hierarchical variational autoencoders to exploit parallel EEG recordings of the same stimuli. |
L. Bollens; T. Francart; H. V. Hamme; |
253 | Unsupervised Hierarchical Translation-Based Model for Multi-Modal Medical Image Registration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an unsupervised hierarchical translation-based model to perform a coarse to fine registration of multi-modal medical images. |
X. Dai; T. Ma; H. Cai; Y. Wen; |
254 | FAZ-BV: A Diabetic Macular Ischemia Grading Framework Combining FAZ Attention Network and Blood Vessel Enhancement Filters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, none of the existing methods can effectively segment the damaged foveal avascular zone (FAZ) and blood vessels (BV) of DMI patients. To avoid this disadvantage, this study proposes a DMI grading framework, i.e. FAZ-BV, combining accurate FAZ and vessel segmentation designed for DMI. |
Z. Chen; H. Lan; Y. Meng; Y. Xiong; J. Luo; H. Shen; |
255 | Fracture Detection and Localization in Chest X-Rays Using Semi-Supervised Learning with Dynamic Sharpening Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a low-cost and efficient method for training a rib and clavicle fracture detection model for chest X-ray (CXR) in a semi-supervised setting where only a small portion of the training data has location annotations. |
L. Lu; S. Miao; L. Ye; |
256 | HistoKT: Cross Knowledge Transfer in Computational Pathology Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we take a data-centric approach to the transfer learning problem and examine the existence of generalizable knowledge between histopathological datasets. |
R. Zhang; et al. |
257 | Unsupervised Deep Learning Network for Deformable Fundus Image Registration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Aiming at addressing the retina registration problem from the Deep Learning perspective, in this paper we introduce an end-to-end framework capable of learning the registration task in a fully unsupervised way. |
G. A. Benvenuto; M. Colnago; W. Casaca; |
258 | A Minimally Supervised Approach for Medical Image Quality Assessment in Domain Shift Settings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a minimally-supervised image quality assessment (MIQA) approach that can learn effectively with small datasets and limited labels in class-imbalanced domain shift scenarios. |
H. Yang; et al. |
259 | A Channel Attention Based MLP-Mixer Network for Motor Imagery Decoding With EEG Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Convolutional neural networks (CNNs) and their variants have been successfully applied to the electroencephalogram (EEG) based motor imagery (MI) decoding task. However, these … |
Y. He; Z. Lu; J. Wang; J. Shi; |
260 | Towards Closed-Loop Speech Synthesis from Stereotactic EEG: A Unit Selection Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The present study aims to address both challenges. |
M. Angrick; et al. |
261 | Enhancing Contextual Encoding With Stage-Confusion and Stage-Transition Estimation for EEG-Based Sleep Staging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel network architecture that takes advantage of two auxiliary classification tasks and exploits their outputs to adapt feature representations, thus effectively discriminating confusing stages. |
J. Phyo; W. Ko; E. Jeon; H. -I. Suk; |
262 | Improving BCI-based Color Vision Assessment Using Gaussian Process Regression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present metamer identification plus (metaID+), an algorithm that enhances the performance of brain-computer interface (BCI)-based color vision assessment. |
H. Habibzadeh; K. J. Long; A. E. Atkins; D. -S. Zois; J. J. S. Norton; |
263 | Transformer-Based Estimation of Spoken Sentences Using Electrocorticography Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Invasive brain-machine interfaces (BMIs) are a promising neurotechnological venture for achieving direct speech communication from a human brain, but they face many challenges. In this paper, we measured the invasive electrocorticogram (ECoG) signals from seven participating epilepsy patients as they spoke a sentence consisting of multiple phrases. |
S. Komeiji; et al. |
264 | Boost Ensemble Learning for Classification of CTG Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In practice, we face highly imbalanced data, where the hypoxic fetuses are significantly underrepresented. We propose to address this problem by boost ensemble learning, where for learning, we use the distribution of classification error over the dataset. |
M. Ajirak; C. Heiselman; J. G. Quirk; P. M. Djuric; |
265 | Multi-View Learning Based on Non-Redundant Fusion for ICU Patient Mortality Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Those predicting from a single perspective cannot fully apply multiple sources of information, while the fusion of multiple perspectives may produce much redundant information. Therefore, this paper proposes a multi-view fusion method based on non-redundant information learning, applying it to ICU patient mortality prediction. |
Y. Wang; Y. Lan; |
266 | Improving Phase-Rectified Signal Averaging for Fetal Heart Rate Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we examine PRSA-based methods through the lens of dynamical systems theory and reveal the intrinsic connection between state space reconstruction and PRSA. |
T. Chen; G. Feng; C. Heiselman; J. G. Quirk; P. M. Djuric; |
267 | Unsupervised Clustering and Analysis of Contraction-Dependent Fetal Heart Rate Segments Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we provide a complete method for FHR-UC segment clustering and analysis via the Gaussian process latent variable model, and density-based spatial clustering. |
L. Yang; C. Heiselman; J. G. Quirk; P. M. Djuric; |
268 | A Method for Detecting Coronary Artery Disease Using Noisy Ultrashort Electrocardiogram Recordings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The current study aims at creating an algorithm able to detect Coronary Artery Disease (CAD), using ultrashort (duration of 30 seconds) one-lead ECG recordings. |
O. Apostolou; V. Charisis; G. Apostolidis; L. J. Hadjileontiadis; |
269 | Multi-Task Gaussian Process Regression for The Detection of Sleep Cycles in Premature Infants Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Studies on neonatal sleep suggest that the pattern of their sleep stages is determined by an endogenous ultradian rhythm, superimposed by other rhythms and external influences. In this article, we propose the use of multi-task Gaussian process regression as a flexible nonparametric approach to analyze this kind of sleep data while incorporating prior knowledge, such as of correlations between signals, signal periodicity, information from manual annotations and certain other signal properties. |
N. S. Brügge; J. Grasshoff; A. Weigenand; P. Rostalski; |
270 | Fast Low Rank Column-Wise Compressive Sensing For Accelerated Dynamic MRI Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Accelerated dynamic MRI is a key application where this problem occurs. In this work, we show the power of our approach (and of its modification for the MRI setting) for four very different highly undersampled dynamic MRI applications. |
S. Babu; S. S. Nayer; S. G. Lingala; N. Vaswani; |
271 | MRI Recovery with A Self-Calibrated Denoiser Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a PnP-inspired recovery method that does not require data beyond the single, incomplete set of measurements. |
S. Liu; P. Schniter; R. Ahmad; |
272 | 3D Cross-Scale Feature Transformer Network for Brain MR Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a 3D cross-scale feature transformer network (CFTN) to utilize the cross-scale priors within MR features. |
W. Zhang; L. Wang; W. Chen; Y. Jia; Z. He; J. Du; |
273 | Data Efficient Support Vector Machine Training Using The Minimum Description Length Principle Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a novel approach to training SVMs which does not suffer from the aforementioned limitation, which is at the same time much more rigorous in nature, being built upon solid information theoretic grounds. |
H. Singh; O. Arandjelovic; |
274 | Multiple Instance Learning with Task-Specific Multi-Level Features for Weakly Annotated Histopathological Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, three major challenges including lack of data efficiency because MIL approaches rely on task-agnostic feature extractor, overfitting challenges caused by high data imbalance between tumor and normal tissues, and the similarity between tumor and normal patches, are to be tackled. We proposed a three-stage deep MIL approach to address these challenges. |
Y. Zhou; Y. Lu; |
275 | Self-Knowledge Distillation Based Self-Supervised Learning for COVID-19 Detection from Chest X-Ray Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel self-knowledge distillation based self-supervised learning method for COVID-19 detection from chest X-ray images. |
G. Li; R. Togo; T. Ogawa; M. Haseyama; |
276 | Pixel-Level and Affinity-Level Knowledge Distillation for Unsupervised Segmentation of COVID-19 Lesions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although an unsupervised method based on anomaly detection has shown promising results in [1], its performance is relatively poor. We address this problem by proposing a pixel-level and affinity-level knowledge distillation method. |
R. Xu; et al. |
277 | Data Shapley Value for Handling Noisy Labels: An Application in Screening COVID-19 Pneumonia from Chest CT Scans Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the effects of utilizing different evaluation metrics for computation of the SV, detecting the noisy labels, and measuring the data points' importance have not yet been thoroughly investigated. In this context, we performed a series of comparative analyses to assess SV's capabilities to detect noisy input labels when measured by different evaluation metrics. |
N. Enshaei; M. J. Rafiee; A. Mohammadi; F. Naderkhani; |
278 | Accurate Multiscale Selective Fusion of CT and Video Images for Real-Time Endoscopic Camera 3D Tracking in Robotic Surgery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an accurate multiscale selective fusion framework to register 2D endoscopic video images to 3D pre-operative CT data for endoscope 3D tracking. |
X. Luo; |
279 | Learning Deep Pathological Features for WSI-Level Cervical Cancer Grading Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As WSIs are in gigapixel resolution, it is impossible to train a deep classification neural network with the entire WSIs as inputs. To bypass this problem, we propose a two-stage learning framework. |
R. Geng; Q. Liu; S. Feng; Y. Liang; |
280 | Selective Scale Cascade Attention Network for Breast Cancer Histopathology Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a selective scale cascade attention network (SSCA) to learn discriminative features for breast histopathological image classification. |
B. Xu; W. Zhang; |
281 | Frequency-Specific Non-Linear Granger Causality in A Network of Brain Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel algorithm to extract frequency-band specific and non-linear Granger causality (Spectral NLGC) connections between components of a multivariate time series. |
A. Biswas; H. Ombao; |
282 | Epileptic Spike Detection By Recurrent Neural Networks with Self-Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper thus considers a scenario where candidates are not detected; that is, we propose a recurrent neural network (RNN)-based self-attention model that can be fitted from the EEG segments generated without spike candidates being detected. |
K. Fukumori; N. Yoshida; H. Sugano; M. Nakajima; T. Tanaka; |
283 | Topological Correlation of Brain Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a new correlation measure for EEG signals by correlating topological features across multiple directions. |
J. Yin; Y. Wang; |
284 | Online Detection of Scalp-Invisible Mesial-Temporal Brain Interictal Epileptiform Discharges from EEG Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose a method namely temporal components analysis (TCA) to detect the IEDs from ongoing sEEG and iEEG signals recorded simultaneously. |
B. Abdi-Sargezeh; A. Valentin; G. Alarcon; S. Sanei; |
285 | Leveraging Sparse Coding for EEG Based Emotion Recognition in Shooting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we collected EEG of novice shooters and high-level shooters in different emotion states, and established two shooting datasets. |
Y. Wang; Y. Sun; L. Fang; C. Zhang; |
286 | A Novel Unsupervised Autoencoder-Based HFOs Detector in Intracranial EEG Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of the existing HFOs detectors are based on manual feature extraction and supervised learning, which incur laborious feature selection and time-consuming labeling process. In order to tackle these issues, we propose an automatic unsupervised HFOs detector based on convolutional variational autoencoder (CVAE). |
W. Li; L. Zhong; W. Xiang; T. Kang; D. Lai; |
287 | A Novel Convolutional Neural Network Based on Adaptive Multi-Scale Aggregation and Boundary-Aware for Lateral Ventricle Segmentation on MR Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel convolutional neural network based on adaptive multi-scale feature aggregation and boundary-aware for lateral ventricle segmentation (MB-Net), which mainly includes three parts, i.e., an adaptive multi-scale feature aggregation module (AMSFM), an embedded boundary refinement module (EBRM), and a local feature extraction module (LFM). |
F. Ye; Z. Wang; S. Zhu; X. Li; K. Hu; |
288 | Multiscale Attention Aggregation Network for 2D Vessel Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel multiscale attention aggregation network (MAA-Net) for vessel segmentation. |
W. Liu; H. Yang; T. Tian; X. Pan; W. Xu; |
289 | TCRNet: Make Transformer, CNN and RNN Complement Each Other Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel encoder-decoder network named TCRNet, which makes Transformer, Convolutional neural network (CNN) and Recurrent neural network (RNN) complement each other. |
X. Shan; T. Ma; A. Gu; H. Cai; Y. Wen; |
290 | Double Noise Mean Teacher Self-Ensembling Model for Semi-Supervised Tumor Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel double noise mean teacher self-ensembling model for semi-supervised 2D tumor segmentation. |
K. Zheng; J. Xu; J. Wei; |
291 | Rethinking Computer-Aided Pelvis Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Some mainstream segmentation algorithms are trained and evaluated on the proposed PCT14K dataset and serve as baselines for future research. |
S. Yuan; Q. Liu; S. Liao; F. Han; H. Wei; Y. Zhang; |
292 | Vision Transformer-Based Retina Vessel Segmentation with Deep Adaptive Gamma Correction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the complexity of edge structural information and the changeable intensity distribution depending on retina images reduce the performance of the segmentation tasks. This paper proposes two novel deep learning-based modules, channel attention vision transformer (CAViT) and deep adaptive gamma correction (DAGC), to tackle these issues. |
H. Yu; J. -h. Shim; J. Kwak; J. W. Song; S. -J. Kang; |
293 | Spectral Permutation Test on Persistence Diagrams Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a novel spectral permutation test on PDs by permuting Fourier coefficients from heat kernel estimation of the PDs. |
Y. Wang; M. K. Chung; J. Fridriksson; |
294 | Multi-Task FMRI Data Fusion Using IVA and PARAFAC2 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Various formulations of coupled matrix factorizations have been proposed, each with its own modeling assumptions. In this paper, we study two such methods, namely Independent Vector Analysis (IVA), i.e., extension of Independent Component Analysis (ICA) to multiple datasets, and PARAFAC2, a tensor factorization approach. |
I. Lehmann; et al. |
295 | Independent Vector Analysis Based Subgroup Identification from Multisubject FMRI Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a completely data-driven approach, subgroup identification using independent vector analysis (SI-IVA), which leverages the desirable properties of IVA to uncover the relationship across subjects along with the discovery of subgroup structures revealed by Gershgorin disc theorem. |
H. Yang; M. A. B. S. Akhonda; F. Ghayem; Q. Long; V. D. Calhoun; T. Adali; |
296 | Improving Brain Decoding Methods and Evaluation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to directly classify an fMRI scan, mapping it to the corresponding word within a fixed vocabulary. |
D. Pascual; B. Egressy; N. Affolter; Y. Cai; O. Richter; R. Wattenhofer; |
297 | CMRI2Spec: Cine MRI Sequence to Spectrogram Synthesis Via A Pairwise Heterogeneous Translator Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose a new synthesis framework to translate from cine MRI sequences to spectrograms with a limited dataset size. |
X. Liu; et al. |
298 | Spatio-Temporal Attention Graph Convolution Network for Functional Connectome Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a novel Spatio-Temporal Attention Graph Convolution Network (STAGCN) for FC classification. |
W. Wang; Y. Kong; Z. Hou; C. Yang; Y. Yuan; |
299 | Bilevel Learning of L1 Regularizers with Closed-Form Gradients (BLORC) Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method for supervised learning of sparsity-promoting regularizers, which are a key ingredient in many modern signal reconstruction approaches. |
A. Ghosh; M. T. Mccann; S. Ravishankar; |
300 | Multiband Image Fusion with Controllable Error Guarantees Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In classical variational techniques, this problem is formulated as the minimization of an objective function consisting of two quadratic data-fidelity terms and an edge-preserving regularizer; the former account for blur, resolution mismatch and additive noise. In this work, we explore a constrained formulation of this problem where the regularization function is minimized subject to hard constraints on the data fidelity. |
U. V. S.; R. G. Gavaskar; K. N. Chaudhury; |
301 | Weighted Graph Embedded Low-Rank Projection Learning for Feature Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To solve those problems, in this paper a weighted graph embedded low-rank projection (WGE_LRP) method is proposed. |
Z. Huang; S. Zhao; L. Fei; J. Wu; |
302 | ADMM-DAD Net: A Deep Unfolding Network for Analysis Compressed Sensing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a new deep unfolding neural network based on the ADMM algorithm for analysis Compressed Sensing. |
V. Kouni; G. Paraskevopoulos; H. Rauhut; G. C. Alexandropoulos; |
303 | High-Dimensional Sparse Bayesian Learning Without Covariance Matrices Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a new inference scheme that avoids explicit construction of the covariance matrix by solving multiple linear systems in parallel to obtain the posterior moments for SBL. |
A. Lin; A. H. Song; B. Bilgic; D. Ba; |
304 | A Trainable Bounded Denoiser Using Double Tight Frame Network for Snapshot Compressive Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, the PnP-GAP algorithm has achieved remarkable reconstruction quality for snapshot compressive imaging (SCI), and its convergence has been proven based on the condition of diminishing noise levels and the assumption of bounded denoisers. |
B. Shi; Y. Wang; Q. Lian; |
305 | Progressive Image Super-Resolution Via Neural Differential Equation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new approach for the image super-resolution (SR) task that progressively restores a high-resolution (HR) image from an input low-resolution (LR) image on the basis of a neural ordinary differential equation. |
S. Park; T. H. Kim; |
306 | High-Quality Self-Supervised Snapshot Hyperspectral Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper leverages the image priors encoded in untrained neural networks (NNs) to have a self-supervised learning method which is free from training datasets while adaptive to the statistics of a test sample. |
Y. Quan; X. Qin; M. Chen; Y. Huang; |
307 | Robust Bayesian Reconstruction of Multispectral Single-Photon 3D Lidar Data with Non-Uniform Background Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a new Bayesian algorithm for the robust reconstruction of multispectral single-photon Lidar data acquired in extreme conditions. |
A. Halimi; J. Koo; R. A. Lamb; G. S. Buller; S. McLaughlin; |
308 | Joint Calibration and Mapping of Satellite Altimetry Data Using Trainable Variational Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we show how a data-driven variational data assimilation framework could be used to jointly learn a calibration operator and an interpolator from non-calibrated data. |
Q. Febvre; R. Fablet; J. L. Sommer; C. Ubelmann; |
309 | 4D Convolutional Neural Networks for Multi-Spectral and Multi-Temporal Remote Sensing Data Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose the extension of current fully-convolutional models for multi-temporal remote sensing data classification to their high-dimensional analogs, which can naturally capture multi-dimensional dependencies and correlations. |
M. Giannopoulos; G. Tsagkatakis; P. Tsakalides; |
310 | A New Deep Learning Method for Multispectral Image Time Series Completion Using Hyperspectral Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new deep learning approach to that end. |
C. T. Cissé; et al. |
311 | Image Denoising with Deep Unfolding And Normalizing Flows Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, current proximal mappings based on (predominantly convolutional) neural networks only implicitly learn such image priors. In this paper, we propose to make these image priors fully explicit by embedding deep generative models in the form of normalizing flows within the unfolded proximal gradient algorithm, and training the entire algorithm in an end-to-end fashion. |
X. Wei; H. van Gorp; L. G. Carabarin; D. Freedman; Y. C. Eldar; R. J. G. van Sloun; |
312 | 3D Texture Super Resolution Via The Rendering Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the nature of rendering 3D models, 2D SR methods applied directly to 3D object texture may not be a good approach. In this paper, we propose a rendering loss derived from the rendering of a 3D model and demonstrate its application to the SR task in the context of 3D texturing. |
R. Ranade; Y. Liang; S. Wang; D. Bai; J. Lee; |
313 | Bundle ICP with Virtual Depth for Hand-Held 3D Scanner Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a general-purpose hand-held 3D scan system that combines an iterative closest point (ICP) algorithm based on a large amount of virtual information for accuracy with the advantage of a graph-based reconstruction system for robustness. |
C. Sung; B. Kim; |
314 | Sketched RT3D: How to Reconstruct Billions of Photons Per Second Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In particular, we propose a sketched version of a recent state-of-the-art algorithm which uses point cloud denoisers to provide spatially regularized reconstructions. |
J. Tachella; M. P. Sheehan; M. E. Davies; |
315 | A Generic Method to Estimate Camera Extrinsic Parameters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, an approach to self-calibrate an outward-looking camera from camera images is presented. |
N. Kuruba; N. Badadare; V. Narayan; S. Putta; |
316 | Photon-Limited Deblurring Using Algorithm Unrolling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present an algorithm unrolling approach that unrolls a Plug-and-Play algorithm using a fixed-iteration network. |
Y. Sanghvi; A. Gnanasambandan; S. H. Chan; |
317 | NEX+: Novel View Synthesis with Neural Regularisation Over Multi-Plane Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Overfitting to training data is a common challenge for all learning-based models. We propose a novel solution for resolving this issue in the context of NVS with signal denoising-motivated operations over the alpha coefficients of the MPI, without any additional requirements for supervision. |
W. Xing; J. Chen; |
318 | Compressive Scanning Transmission Electron Microscopy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a scanning method based on the theory of Compressive Sensing (CS) and subsampling the electron probe locations using a line hop sampling scheme that significantly reduces the electron beam damage. |
D. Nicholls; et al. |
319 | Deep Iterative Phase Retrieval for Ptychography Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we specifically consider ptychography, a sub-field of diffractive imaging, where objects are reconstructed from multiple overlapping diffraction images. |
S. Welker; T. Peer; H. N. Chapman; T. Gerkmann; |
320 | Compressive Phase Retrieval Based On Sparse Latent Generative Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to introduce structure on the signal by enforcing sparsity in the latent-space via proximal method while training the generator. |
V. Killedar; C. S. Seelamantula; |
321 | Model-Based Reconstruction for Collimated Beam Ultrasound Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such systems include a transmitter and multiple receivers to capture reflected signals. Common algorithms for ultrasound reconstruction use delay-and-sum (DAS) approaches; these have low computational complexity but produce inaccurate images in the presence of complex structures and specialized geometries such as collimated beams. In this paper, we propose a multi-layer, ultrasonic, model-based iterative reconstruction algorithm designed for collimated beam systems. |
A. Alanazi; S. Venkatakrishnan; H. Santos-Villalobos; G. Buzzard; C. Bouman; |
322 | Learned Acoustic Reconstruction Using Synthetic Aperture Focusing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Many algorithmic approaches to 3D acoustic imaging have been devised which rely on a large abundance of receiving elements to produce images with delay-and-sum techniques, but these have found little use in air due to hardware complexity and low accuracy. |
T. Straubinger; R. Xiao; H. Rhodin; |
323 | SDETR: Attention-Guided Salient Object Detection with Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a two-stage predict-refine SDETR model to leverage both benefits of transformer and CNN layers that can produce results with accurate saliency prediction and fine-grained local details. |
G. Liu; B. Xu; H. Huang; C. Lu; Y. Guo; |
324 | Evaluation of Video Coding for Machines Without Ground Truth Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, current methods have to either evaluate their codecs on still images or on already compressed data. To mitigate this problem, we propose an evaluation method based on pseudo ground-truth data from the field of semantic segmentation to the evaluation of video coding for machines. |
K. Fischer; M. Hofbauer; C. Kuhn; E. Steinbach; A. Kaup; |
325 | Raw Plenoptic Video Coding Under Hexagonal Lattice Resolution of Motion Vectors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A study in this paper shows that motion vectors are highly concentrated at hexagonal lattice points, leading to use of the proposed resolution in the context of video compression. |
T. N. Huu; V. Duong Van; J. Yim; B. Jeon; |
326 | Comparison of Boundary Artifact Removal Methods in Coding of Generalized Cubemap Projection Using VVC Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigated the effect of different methods in Versatile Video Coding standard and other pre- and post-processing algorithms for removing boundary artifacts by introducing a new objective quality metric for systematic comparison. |
K. Jafari; A. Aminlou; M. M. Hannuksela; |
327 | Low-Complexity Multi-Model CNN In-Loop Filter for AVS3 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a low-complexity multi-model CNN in-loop filtering scheme is proposed for AVS3. |
S. Wang; Y. Fu; C. Zhu; L. Song; W. Zhang; |
328 | Unified Matrix Coding for NN Originated MIP in H.266/VVC Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper designs an efficient algorithm to determine the input vector of MIP, with which the range of the matrices can be minimized, and all matrices can be converted to integers with a unified shift and a unified offset. |
J. Huo; Y. Sun; H. Wang; S. Wan; F. Yang; M. Li; |
329 | FOV-Based Coding Optimization for 360-Degree Virtual Reality Videos Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an FoV-based coding scheme for 360-degree videos, which allocates more bits to tiles of the predicted FoV area than other tiles. |
Y. Xu; T. Yang; Z. Tan; H. Lan; |
330 | Multi-Hierarchy Proxy Structure for Deep Metric Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these details are meaningful for capturing features of the class. Therefore, we propose a multi-hierarchy proxy (MHP) structure to extract the hierarchical details and regular features hidden in the embedding space. |
J. Wang; X. Li; W. Song; Z. Zhang; W. Guo; |
331 | Exploiting Caption Diversity for Unsupervised Video Summarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a novel DPP-based regularizer is proposed that exploits a pretrained DNN-based image captioner in order to additionally enforce maximal key-frame diversity from the perspective of textual semantic content. |
M. Kaseris; I. Mademlis; I. Pitas; |
332 | Clustering and Separating Similarities for Deep Unsupervised Hashing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These fixed features are, however, neither designed originally for retrieval nor updated adaptively during training. In this paper, we propose a novel deep Unsupervised Cluster and Separate Hashing (UCSH) to address these issues. |
W. Zhang; D. Wu; C. Yang; B. Li; W. Wang; |
333 | Enhancing Prototypical Few-Shot Learning By Leveraging The Local-Level Strategy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the problem, this paper returns the perspective to the local-level feature and proposes a series of local-level strategies. |
J. Huang; F. Chen; K. Wang; L. Lin; D. Zhang; |
334 | Blind Unmixing Using A Double Deep Image Prior Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel network structure to solve the blind hyperspectral unmixing problem using a double Deep Image Prior (DIP). |
C. Zhou; M. R. D. Rodrigues; |
335 | A New Framework for Multiple Deep Correlation Filters Based Object Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: According to this framework, we design each component step by step. |
Y. Liu; Y. Liang; Q. Wu; L. Zhang; H. Wang; |
336 | Adaptive Actor-Critic Bilateral Filter Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, while most prior works analyze the adaptation of the range kernel in one-step manner, in this paper we take a more constructive view towards multi-step framework with the goal of unveiling the vulnerability of bilateral filtering. To this end, we adaptively model the width setting of range kernel as a multi-agent reinforcement learning problem and learn an adaptive actor-critic bilateral filter from local image context during successive bilateral filtering operations. |
B. -H. Chen; H. -Y. Cheng; J. -L. Yin; |
337 | Domain Decomposition Algorithms for Real-Time Homogeneous Diffusion Inpainting in 4K Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This prevents them from being applicable to time-critical scenarios such as real-time inpainting of 4K images. As a remedy, we adapt state-of-the-art numerical algorithms of domain decomposition type to this problem. |
N. Kämper; J. Weickert; |
338 | Deep Temporal Interpolation of Radar-Based Precipitation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study optical flow-based interpolation of globally available weather radar images from satellites. |
M. Tatsubori; et al. |
339 | A Nonlinear Steerable Complex Wavelet Decomposition of Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a two-dimensional nonlinear transform that uses only two subbands to achieve rotation invariance property, and enjoys a mirror reconstruction making it similar to a tight frame. |
Z. Sun; T. Blu; |
340 | Kernel Estimation Network for Blind Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these methods suffer a severe performance drop when the real degradations deviate from this assumption. To address this issue, this paper proposes a novel kernel estimation network (KENet) for kernel prediction. |
X. Cao; H. Shen; L. Zhang; Y. Luo; T. Wang; |
341 | Terahertz Image Restoration Benchmarking Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The paper introduces a new terahertz (THz) image benchmarking dataset for THz imaging. |
Y. Zhang; Z. Su; F. Qi; J. Zhou; X. -P. Zhang; |
342 | Binary Dense Predictors for Human Pose Estimation Based on Dynamic Thresholds and Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two approaches to conduct image-aware and pixel-aware dynamic binarization in a model for human pose estimation. |
X. Xing; et al. |
343 | Self-Supervised Learning for Sentiment Analysis Via Image-Text Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: There is often a resemblance in the sentiment expressed in social media posts (text) and their accompanying images. In this paper, we leverage this sentiment congruence for self-supervised representation learning for sentiment analysis. |
H. Zhu; Z. Zheng; M. Soleymani; R. Nevatia; |
344 | Domain-Agnostic Meta-Learning for Cross-Domain Few-Shot Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we tackle the challenging task of cross-domain few-shot classification and propose Domain-Agnostic Meta-Learning (DAML) algorithm. |
W. -Y. Lee; J. -Y. Wang; Y. -C. F. Wang; |
345 | Semantic Association Network for Video Corpus Moment Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Extensive ablation studies and qualitative analyses show the effectiveness of the proposed model. |
D. Kim; S. Yoon; J. W. Hong; C. D. Yoo; |
346 | Statistical, Spectral and Graph Representations for Video-Based Facial Expression Recognition in Children Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose the first approach that (i) constructs video-level heterogeneous graph representation for facial expression recognition in children, and (ii) predicts children's facial expressions using the automatically detected Action Units (AUs). |
N. I. Abbasi; S. Song; H. Gunes; |
347 | Deriving Explainable Discriminative Attributes Using Confusion About Counterfactual Class Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel counterfactual explanation method, Discriminative Gradients (DiscGrad) that derives explainable discriminative attributes by considering not only the predicted class but also the counterfactual classes. |
N. Yang; T. Kang; K. Jung; |
348 | Realistic Monocular-To-3D Virtual Try-On Via Multi-Scale Characteristics Capture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In prior methods, the fundamental problems lie in the limitations on texture retention during garment deformation and the lack of feature context capture during depth estimation. To address these problems, we propose a new 3D virtual try-on network via multi-scale characteristic capture (VTON-MC), which can produce an exact 3D model with the generated photo-realistic monocular image. |
C. Du; et al. |
349 | Optimizing Latent Space Directions for GAN-Based Local Image Editing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We thus present a novel objective function to evaluate the locality of an image edit. |
E. Pajouheshgar; T. Zhang; S. Süsstrunk; |
350 | Towards Using Clothes Style Transfer for Scenario-Aware Person Video Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To further improve the generation performance, we propose a novel framework with disentangled multi-branch encoders and a shared decoder. |
J. Xu; et al. |
351 | Multi-Domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Despite the impressive results, they mainly focus on the I2I translation between two domains, so the multi-domain I2I translation still remains a challenge. To address this problem, we propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework that leverages the decomposed content feature and appearance adaptive convolution to translate an image into a target appearance while preserving the given geometric content. |
S. Jeong; J. Lee; K. Sohn; |
352 | VR-FAM: Variance-Reduced Encoder with Nonlinear Transformation for Facial Attribute Manipulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing works suffer from the entanglement of facial attributes, leading to unexpected artifacts and the loss of facial identity information after editing. To alleviate these issues, we propose a novel FAM framework based on StyleGAN, termed VR-FAM, which can meet the requirements of FAM: editing ability, distortion, and fidelity. |
Y. Yuan; S. Ma; J. Zhang; |
353 | Wavelet-Based Unsupervised Label-to-Image Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: State-of-the-art conditional Generative Adversarial Networks (GANs) need a huge amount of paired data to accomplish this task while generic un-paired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and learn correspondences in appearance instead of semantic content. Starting from the assumption that a high quality generated image should be segmented back to its semantic layout, we propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole image wavelet based discrimination. |
G. Eskandar; M. Abdelsamad; K. Armanious; S. Zhang; B. Yang; |
354 | Fast Graph Sampling for Short Video Summarization Using Gershgorin Disc Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. |
S. Sahami; G. Cheung; C. -W. Lin; |
355 | Towards Practical and Efficient Long Video Summary Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we find that the Kernel Temporal Segmentation (KTS) method designed for detecting the shot boundaries in SOTA VS methods is time-consuming while handling long videos. |
X. Ke; B. Chang; H. Wu; F. Xu; S. Zhong; |
356 | Cut And Continuous Paste Towards Real-Time Deep Fall Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a simple and efficient framework to detect falls through a single and small-sized convolutional neural network. |
S. Hwang; M. Ki; S. -H. Lee; S. Park; B. -K. Jeon; |
357 | ManNet: A Large-Scale Manipulated Image Detection Dataset And Baseline Evaluations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, no large-scale dataset having manipulated images generated using both handcrafted and deep learning algorithms is available. Therefore, in this research, we have proposed a large dataset with more than 5.5 million images, termed as ManNet dataset. |
A. Singh; S. Chhabra; P. Majumdar; R. Singh; M. Vatsa; |
358 | Approaches Toward Physical and General Video Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We introduce the Physical Anomalous Trajectory or Motion (PHANTOM) dataset, which contains six different video classes. |
L. Kart; N. Cohen; |
359 | Considering User Agreement in Learning to Predict The Aesthetic Quality Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we thus propose (1) a re-adapted multi-task attention network to predict both the mean opinion score and the standard deviation in an end-to-end manner; (2) a brand-new confidence interval ranking loss that encourages the model to focus on image-pairs that are less certain about the difference of their aesthetic scores. |
S. Ling; A. Pastor; J. Wang; P. L. Callet; |
360 | No-Reference Quality Assessment of Variable Frame-Rate Videos Using Temporal Bandpass Statistics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose a first-of-a-kind blind VQA model for evaluating HFR videos, which we dub the Framerate-Aware Videos Evaluator w/o Reference (FAVER). |
Q. Zheng; Z. Tu; Y. Fan; X. Zeng; A. C. Bovik; |
361 | Towards Joint Frame-Level and MOS Quality Predictions with Low-Complexity Objective Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Consequently, an original way to train the models, using jointly the subjective scores and the frame level scores of a full-reference metric, is proposed. |
J. Jung; A. Giraud; M. Song; S. Li; X. Li; S. Liu; |
362 | Teaching CNNs to Mimic Human Visual Cognitive Process & Regularise Texture-Shape Bias Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose CognitiveCNN, a new intuitive architecture, inspired by feature integration theory in psychology, to utilise human-interpretable features such as shape, texture, and edges to reconstruct and classify the image. |
S. Mohla; A. Nasery; B. Banerjee; |
363 | Subjective And Objective Quality Assessment Of Mobile Gaming Video Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Abstract: Nowadays, with the vigorous expansion and development of gaming video streaming techniques and services, the expectation of users, especially the mobile phone users, for higher … |
S. Wen; S. Ling; J. Wang; X. Chen; Y. Jing; P. L. Callet; |
364 | ER-PIQA: A Task-Guided Pedestrian Image Quality Assessment Via Embedding Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a novel task-guided method is proposed to measure pedestrian image quality based on embedding reconstruction without the involvement of subjective labels. |
Y. Zhong; H. Pan; B. Tang; Z. Liu; Y. Zhu; J. Yin; |
365 | Multiscale Crowd Counting and Localization By Multitask Point Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose a multitask approach for crowd counting and person localization in a unified framework. |
M. Zand; H. Damirchi; A. Farley; M. Molahasani; M. Greenspan; A. Etemad; |
366 | Super-Resolution of Satellite Images By Two-Dimensional RRDB and Edge-Enhancement Generative Adversarial Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We use Kaggle and AID open experimental datasets to test and compare the results among different methods. |
Y. -Z. Chen; T. -J. Liu; K. -H. Liu; |
367 | Leveraging Local Temporal Information for Multimodal Scene Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel self-attention block that leverages both local and global temporal relationships between the video frames to obtain better contextualized representations for the individual frames. |
S. Sahu; P. Goyal; |
368 | Predicting Human Motion Using Key Subsequences Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Usually, human motion tends to repeat itself and follows patterns that are well-represented by a few short key subsequences. Based on the above observations, we propose an attention-based feed-forward network, which is explicitly guided by the key subsequences, for human motion prediction. |
M. Li; M. Pei; W. Liang; |
369 | Dynamic Texture Recognition Using PDV Hashing and Dictionary Learning on Multi-Scale Volume Local Binary Pattern Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: STLBP often encounters the high-dimension problem as its dimension increases exponentially, so that STLBP could only utilize a small neighborhood. To tackle this problem, we propose a method for dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary pattern (PHD-MVLBP). |
R. Ding; J. Ren; H. Yu; J. Li; |
370 | Do You Live A Healthy Life? Analyzing Lifestyle By Visual Life Logging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate the problem of lifestyle analysis and build a visual lifelogging dataset for lifestyle analysis (VLDLA). |
Q. Gao; M. Pei; H. Shen; |
371 | Weighted Wavelet-Based Spectral-Spatial Transforms For CFA-Sampled Raw Camera Image Compression Considering Image Features Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study introduces weighted WSSTs (WWSSTs) that work especially well for CFA-sampled raw images with many edges. |
L. Huang; T. Suzuki; |
372 | JMPNet: Joint Motion Prediction for Learning-Based Video Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, problems such as tail shadow and background distortion in the predicted frame remain unsolved. To tackle these problems, JMPNet is introduced in this paper to provide more accurate motion information by using both optical flow and dynamic local filter as well as an attention map to further fuse this motion information in a smarter way. |
D. Li; et al. |
373 | A Low-Parametric Model for Bit-Rate Estimation of VVC Residual Coding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a set of four features together with a linear model, which is able to estimate the rate of arbitrary residual blocks which were compressed using the VVC standard. |
F. Brand; C. Herglotz; A. Kaup; |
374 | OPTE: Online Per-Title Encoding for Live Video Streaming Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces an online per-title encoding scheme (OPTE) for live video streaming applications. |
V. V. Menon; H. Amirpour; M. Ghanbari; C. Timmerer; |
375 | SADN: Learned Light Field Image Compression with Spatial-Angular Decorrelation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel end-to-end spatial-angular-decorrelated network (SADN) for high-efficiency light field image compression. |
K. Tong; X. Jin; C. Wang; F. Jiang; |
376 | Hierarchical Feature Aggregation Network for Deep Image Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing CNN-based methods for image compression extract features through serially connected high-to-low (encoder) or low-to-high (decoder) resolution stages, leading to insufficient utilization of hierarchical features. To solve this problem, we present a hierarchical feature aggregation network (HFAN) for generating more informative latent representations. |
W. Li; Z. Du; H. He; J. Tang; G. Wu; |
377 | Accurate Instance Segmentation Via Collaborative Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an instance segmentation model, named CoMask, that effectively alleviates the scale variation issue and addresses the precise localization. |
T. Chen; X. Hu; J. Xiao; G. Zhang; S. Wang; |
378 | Dynamic Binary Neural Network By Learning Channel-Wise Thresholds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This process limits representation capacity of BNNs since different samples may adapt to unequal thresholds. To address this problem, we propose a dynamic BNN (DyBNN) incorporating dynamic learnable channel-wise thresholds of Sign function and shift parameters of PReLU. |
J. Zhang; Z. Su; Y. Feng; X. Lu; M. Pietikäinen; L. Liu; |
379 | Self-Supervised Learning on A Lightweight Low-Light Image Enhancement Model with Curve Refinement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Another challenge for paired training networks is the limited generalization capacity caused by the sample bias. To overcome these two challenges, we propose a lightweight self-supervised low-light image enhancement method, that trains with low light images only. |
W. Wu; W. Wang; K. Jiang; X. Xu; R. Hu; |
380 | Semantically Proportional Patchmix for Few-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although excelling at distinguishing training data, these models are not well generalized to unseen data, probably due to insufficient feature representations on evaluation. To tackle this issue, we propose Semantically Proportional Patchmix (SePPMix), in which patches are cut and pasted among training images and the ground truth labels are mixed proportionally to the semantic information of the patches. |
J. Wang; J. Xu; Y. Pan; Z. Xu; |
381 | Noise Suppression for Improved Few-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we identify that noise suppression is important to improve the performance of FSL algorithms. |
Z. Chen; T. Ji; S. Zhang; F. Zhong; |
382 | Online Continual Learning Using Enhanced Random Vector Functional Link Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an online continual learning algorithm based on an enhanced Random Vector Functional Link Network (OCL-eRVFL), that learns a sequence of tasks continually, where each task is defined by streaming data with each sample arriving once and only once. |
C. S. Yin Wong; G. Yang; A. Ambikapathi; R. Savitha; |
383 | A Generalized Kernel Risk Sensitive Loss for Robust Two-Dimensional Singular Value Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, 2DSVD algorithm is based on the squared error loss, which may exaggerate the projection errors with the presence of outliers. To solve this problem, we propose a generalized kernel risk sensitive loss for measuring the projection error in 2DSVD, which automatically eliminates the outlier information during optimization. |
M. Zhang; Y. Gao; J. Zhou; |
384 | Video Frame Interpolation Via Local Lightweight Bidirectional Encoding with Channel Attention Cascade Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a lightweight-driven video frame interpolation network (L2BEC2) is proposed. |
X. Ding; P. Huang; D. Zhang; X. Zhao; |
385 | SAIN: Similarity-Aware Video Frame Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Since moving objects usually have similarities in consecutive frames, we propose a similarity-aware video frame interpolation method (SAIN) that searches patches with similar texture in the embedding space from input frames to extract features and capture image details. |
Y. Lv; W. Yang; W. Zuo; Q. Liao; R. Zhu; |
386 | Self-Learned Video Super-Resolution with Augmented Spatial and Temporal Context Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issue and get rid of the synthetic paired data, in this paper, we make exploration in utilizing the internal self-similarity redundancy within the video to build a Self-Learned Video Super-Resolution (SLVSR) method, which only needs to be trained on the input testing video itself. |
Z. Fan; J. Liu; W. Yang; W. Xiang; Z. Guo; |
387 | Deformable Convolution Dense Network for Compressed Video Quality Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a Multi-frame Residual Dense Network (MRDN) with deformable convolution is developed to improve the quality of the compressed video, by utilizing high-quality frame to compensate the low-quality frame. |
J. Liu; M. Zhou; M. Xiao; |
388 | Convolutional ISTA Network with Temporal Consistency Constraints for Video Reconstruction from Event Cameras Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Current deep networks achieve high-quality video reconstruction from events, but most of them are large and difficult to interpret. In this work, we present a solution to this problem by systematically designing a deep network based on sparse representation. |
S. Liu; R. Alexandru; P. L. Dragotti; |
389 | PMP-NET: Rethinking Visual Context for Scene Graph Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we revisit the concept of incorporating visual context via a randomly ordered bidirectional Long Short Temporal Memory (biLSTM) based baseline, and show that noisy estimation is worse than random. |
X. Tong; R. Wang; C. Wang; S. Zhang; X. Cao; |
390 | Improve Image Captioning Via Relation Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel approach that combines scene graphs with Transformer, which we call SGT, to explicitly encode available visual relationships between detected objects. |
F. Huang; Z. Li; |
391 | Equal Loss: A Simple Loss Function for Noise Robust Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that DNN learning with Cross Entropy is not robust to label noise and exhibits imbalance between the gradient of clean and noisy samples. |
L. Cui; H. Peng; Y. Li; C. Li; X. Xing; |
392 | Informative Attention Supervision for Grounded Video Description Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the prevailing attention loss functions enforce the GVDMs to focus equally on all sampled regions when the GVDMs generate words, which may make it difficult for the model to attend to informative regions and thus degrade the quality of the generated sentences. To alleviate the above problems, we propose an informative attention supervision method including a novel attention groundtruth sampling method and a group-based weak grounding supervision. |
B. Wan; W. Jiang; Y. Fang; |
393 | Spatial-Context-Aware Deep Neural Network for Multi-Class Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Over the past few decades, solutions exploring relationships between semantic labels have made great progress. |
J. Zhang; Q. Zhang; J. Ren; Y. Zhao; J. Liu; |
394 | TranSTL: Spatial-Temporal Localization Transformer for Multi-Label Video Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Generally, there exist many complex action labels in real-world videos and these actions are with inherent dependencies at both spatial and temporal domains. Motivated by this observation, we propose TranSTL, a spatial-temporal localization Transformer framework for MLVC task. |
H. Wu; M. Li; Y. Liu; H. Liu; C. Xu; X. Li; |
395 | Deep Video Inpainting Guided By Audio-Visual Self-Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of video inpainting. |
K. Kim; J. Jung; W. J. Kim; S. -E. Yoon; |
396 | Navigating Audio-Visual Event Detection Across Mismatched Modalities Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We focus on AV parsing on fully unconstrained data where the audio and visual events do not necessarily co-present. |
G. Li; X. Xu; M. Wu; K. Yu; |
397 | Look, Listen and Pay More Attention: Fusing Multi-Modal Information for Video Violence Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Most existing works focus on single modal data analysis, which is not effective when multi-modality is available. Therefore, we propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts multiple instance learning strategies to refine video-level hard labels into clip-level soft labels, and 2) the next stage uses multi-modal information fused attention module to achieve fusion, and supervised learning is carried out using the soft labels generated at the first stage. |
D. -L. Wei; C. -G. Liu; Y. Liu; J. Liu; X. -G. Zhu; X. -H. Zeng; |
398 | Multi-Modal Learning with Text Merging for TextVQA Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a Multi-Modal Learning framework with Text Merging (MML&TM in short) for TextVQA, where we develop a text merging (TM) algorithm, which can effectively merge the word-level text obtained from the text recognition module to construct line-level and paragraph-level texts for enhancing semantic context, which is crucial to visual text understanding. |
C. Xu; Z. Xu; Y. He; S. Zhou; J. Guan; |
399 | A Novel Part Feature Integration and Fusion Method for Fine-Grained Vehicle Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel light-weight feature integration and fusion method to enhance the discriminative ability of deep convolutional features for the task of fine-grained vehicle recognition. |
P. Wang; Y. Cao; L. Lu; |
400 | Monocular Vehicle 3D Bounding Box Estimation Using Homography and Geometry in Traffic Scene Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel vehicle 3D bounding box estimation method making use of the 3D-2D geometry consistency and homography transformation. |
Y. Chen; F. Liu; K. Pei; |
401 | FSM: Feature Sampling Module for Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Towards enhancing the quality of the features, we propose a Feature Sampling Module (FSM), which learns multiple two-dimensional Gaussian distributions by the sampling network (SN) and applies those Gaussian masks to extract valid information of the features. |
X. Yi; B. Ma; J. Wu; |
402 | Rethinking Two-B-Real Net for Real-Time Salient Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Its backbone, borrowed from image classification tasks, may be inefficient for SOD due to the deficiency of task-specific design. To handle these problems, we propose a novel and efficient structure named short-range concatenate module (SRCM) by removing structure redundancy. |
S. Kuang; S. Meng; B. Xiao; L. Tang; B. Li; |
403 | Balanced Ranking and Sorting For Class Incremental Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose balanced ranking and sorting (BRS), to tackle the catastrophic forgetting and data imbalance problems for CIOD. |
B. Cui; H. Qu; X. Huang; S. Yu; |
404 | Multi-Scale Reinforcement Learning Strategy for Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Multi-scale Reinforcement Learning Strategy (MRLS) for balanced multi-scale training. |
Y. Luo; X. Cao; J. Zhang; L. Pan; T. Wang; Q. Feng; |
405 | Deep Object Detection with Example Attribute Based Prediction Modulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Deep object detectors suffer from the gradient contribution imbalance during training. In this paper, we point out that such imbalance can be ascribed to the imbalance in example attributes, e.g., difficulty and shape variation degree. |
Z. Wu; C. Liu; C. Huang; J. Wen; Y. Xu; |
406 | Universal Efficient Variable-Rate Neural Image Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, computational complexity and rate flexibility are still two major challenges for its practical deployment. To tackle these problems, this paper proposes two universal modules named Energy-based Channel Gating (ECG) and Bit-rate Modulator (BM), which can be directly embedded into existing end-to-end image compression models. |
S. Yin; C. Li; Y. Bao; Y. Liang; F. Meng; W. Liu; |
407 | AdderIC: Towards Low Computation Cost Image Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although numerous progress has been made in learned image compression, the computation cost is still at a high level. To address this problem, we propose AdderIC, which utilizes adder neural networks (AdderNet) to construct an image compression framework. |
B. Li; Y. Xin; C. Li; Y. Bao; F. Meng; Y. Liang; |
408 | DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. |
S. Zhang; L. Herranz; M. Mrak; M. G. Blanch; S. Wan; F. Yang; |
409 | Specialised Video Quality Model For Enhanced User Generated Content (UGC) With Special Effects Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to conduct a benchmark on existing full-reference, non-reference, and aesthetic quality metrics for UGC with special effects. |
A. -F. Perrin; Y. Xie; T. Zhang; Y. Liao; J. Li; P. Le Callet; |
410 | Improving Maximum Likelihood Difference Scaling Method To Measure Inter Content Scale Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The goal of most subjective studies is to place a set of stimuli on a perceptual scale. |
A. Pastor; L. Krasula; X. Zhu; Z. Li; P. Le Callet; |
411 | Texture Information Boosts Video Quality Assessment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we deeply investigate three elements of HVS, including texture masking, content-dependency, and temporal-memory effects from an experimental perspective. |
A. -X. Zhang; Y. -G. Wang; |
412 | Plug-and-Play and Relay Regularizations on Noisy Low Rank Tensor Completion for Snapshot Multispectral Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To improve the restoration performance, we introduce two regularizations in a Plug-and-Play (PnP) manner. |
K. Ozawa; |
413 | LERPS: Lighting Estimation and Relighting for Photometric Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a deep learning framework to perform three tasks jointly: (i) lighting estimation, (ii) image relighting, and (iii) surface normal estimation, all from a single input image of an object with non-Lambertian surface and general reflectance. |
A. Tiwari; S. Raman; |
414 | A Unified Two-Stage Model for Separating Superimposed Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a human vision-inspired framework for separating superimposed images. |
H. Duan; X. Min; W. Shen; G. Zhai; |
415 | Parameter-Free Style Projection for Arbitrary Image Style Transfer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing feature transformation algorithms often suffer from loss of content or style details, non-natural stroke patterns, and unstable training. To mitigate these issues, this paper proposes a new feature-level style transformation technique, named Style Projection, for parameter-free, fast, and effective content-style transformation. |
S. Huang; et al. |
416 | Optimization of Compressive Light Field Display in Dual-Guided Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Conventionally, the excessive processing time impacts its practical value in commercial, along with the severe degradation of display brightness. Therefore, in this paper, we propose a learning-based factorization framework to promote the visual results and expedite the layer decomposition and display adaption. |
Y. Sun; Z. Li; L. Li; S. Wang; W. Gao; |
417 | ARM 4-BIT PQ: SIMD-Based Acceleration for Approximate Nearest Neighbor Search on ARM Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We then apply shuffle operations for each using the ARM-specific NEON instruction. By making this simple but critical modification, we achieve a dramatic speedup for the 4-bit PQ on an ARM architecture. |
Y. Matsui; Y. Imaizumi; N. Miyamoto; N. Yoshifuji; |
418 | Iterative Learning for Distorted Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the influence of different learning schemes on fitting capability and tackle the problem by proposing a novel iterative learning scheme. |
C. Wang; et al. |
419 | JE2Net: Joint Exploitation and Exploration in Reinforcement Learning Based Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, we argue that these agents rely on pre-trained RL models with fixed-length paths for restoration, which performs poorly in the case of unknown distortions. To address these issues, we propose a joint exploitation and exploration reinforcement learning network (JE2Net). |
X. Zhang; W. Gao; H. Yuan; G. Li; |
420 | Multiple Patch-Aware Network for Faster Real-World Image Dehazing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, we propose a novel data enhancement method called Concentration Sampling Enhancement (CSE), which generates new training samples by haze concentration sampling based on hazy images and clear images. |
K. Yang; J. Zhang; X. Lang; |
421 | Learning to Fuse Heterogeneous Features for Low-Light Image Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To break down the limitation, we propose a new classification-driven enhancement method with heterogeneous feature fusion. |
Z. Tang; L. Ma; X. Shang; X. Fan; |
422 | Deep Scale-Aware Image Smoothing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a deep-learning-based scale-aware image smoothing method, which is built on a downscaling-upscaling mechanism with attention. |
J. Li; K. Qin; R. Xu; H. Ji; |
423 | A Multiscale Gradient-Backpropagation Optimization Framework for Deformable Convolution Based Compressed Video Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a multiscale gradient-backpropagation optimization framework is proposed for the deformable convolution based compressed video quality enhancement. |
Y. Gao; M. Jia; S. Li; X. Cai; M. Ye; F. Dufaux; |
424 | Downstream Augmentation Generation For Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we aim at improving the augmentation process and propose an augmentation generator, a network that learns to augment images for contrastive learning. |
T. Hayase; S. Yasutomi; N. Inoue; |
425 | Few-Shot Learning with Improved Local Representations Via Bias Rectify Module Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Deep Bias Rectify Network (DBRN) to fully exploit the spatial information that exists in the structure of the feature representations. |
C. Dong; Q. Ye; W. Meng; K. Yang; |
426 | Image-to-Video Re-Identification Via Mutual Discriminative Knowledge Transfer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a mutual discriminative knowledge distillation framework to transfer a video-based richer representation to an image based representation more effectively. |
P. Wang; F. Wang; H. Li; |
427 | DynSNN: A Dynamic Approach to Reduce Redundancy in Spiking Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, inspired by the topology of neuronal co-activity in the neural system, we propose a dynamic pruning framework (dubbed DynSNN) for SNNs, enabling us to seamlessly optimize network topology on the fly almost without accuracy loss. |
F. Liu; W. Zhao; Y. Chen; Z. Wang; F. Dai; |
428 | MEJIGCLU: More Effective Jigsaw Clustering For Unsupervised Visual Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To achieve competitive results to contrastive learning with low computational overhead, we propose a new unsupervised representation learning method with jigsaw clustering and classification as pretext tasks to motivate the network to learn discriminative features. |
Y. Zhang; Q. Liu; Y. Zhao; Y. Liang; |
429 | Ganet: Unary Attention Reaches Pairwise Attention Via Implicit Group Clustering in Light-Weight CNNs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The two groups of attention, unary and pair-wise attention, seem like being incompatible as fire and water due to the completely different operations. In this paper, we propose a Group Attention (GA) block to bridge the gap between these two attentions and merely leverage unary attention to lightweightly reach the effect of pairwise attention, based on the implicit group clustering of light-weight CNNs. |
C. Zhuang; Y. Sun; |
430 | Find The Way Back: Invertible Kernel Estimator For Blind Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We address the task of zero-shot blind image super-resolution, where it aims to recover the high-resolution details from the low-resolution input image under a challenging problem setting of having no external training data, no prior assumption on the downsampling kernel, and no pre-training components used for estimating the downsampling kernel. |
T. -W. Chang; W. -C. Chiu; C. -C. Huang; |
431 | Fine-Grained Dynamic Loss for Accurate Single-Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Developing new loss function provides a promising SISR solution, i.e. one should be beyond the existing regression loss functions, which encounter problem in reconstructing the image texture details. For such goal, this paper proposes a dynamic fine-grained loss function. |
H. Wang; G. Zhang; Z. Lei; |
432 | Multi-Frame Super-Resolution With Raw Images Via Modified Deformable Convolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a novel model towards multi-frame super-resolution, which leverages multiple RAW images and yields a super-resolved RGB image. |
G. Li; L. Qiu; H. Zhang; F. Xie; Z. Jiang; |
433 | Local-Global Feature Aggregation for Light Field Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, due to the limitations of CNNs, these methods can't fully model the global spatial properties of the whole LF images. In this paper, we propose a network with Local-Global Feature Aggregation (LF-LGFA) to handle these problems for LF image SR. |
Y. Wang; Y. Lu; S. Wang; W. Zhang; Z. Wang; |
434 | Pyramid Fusion Attention Network For Single Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these methods exclusively consider interdependencies among channels or spatials, leading to equal treatment of channel-wise or spatial-wise features thus hindering the power of AM. In this paper, we propose a pyramid fusion attention network (PFAN) to tackle this problem. |
H. He; Z. Du; W. Li; J. Tang; G. Wu; |
435 | VCD: View-Constraint Disentanglement for Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the View-Constraint Disentanglement (VCD) framework for cross-view action recognition. |
X. Zhong; et al. |
436 | Privacy-Preserving Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose to use unified actor score (UAS) to enhance the action recognition accuracy. |
C. Zou; D. Yuan; L. Lan; H. Chi; |
437 | Spatio-Temporal Motion Aggregation Network for Video Action Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the Spatio-Temporal Motion Aggregation mechanism for integrating the local motion feature from a short term snippet and the longer spatio-temporal information to predict the action category. |
H. Zhang; X. Zhao; |
438 | TP-ViT: A Two-Pathway Vision Transformer for Video Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: How to use multiple pathways and multiple streams with Transformer for action recognition has not been studied. To address this issue, we present a novel structure namely Two-Pathway Vision Transformer (TP-ViT). |
Y. Jing; F. Wang; |
439 | Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a spatial-temporal attention mechanism to learn inter- and intra-correlations of video clips, and the boosted features are encouraged to be task-specific via the mutual cosine embedding loss. |
Y. Liu; J. Liu; X. Zhu; D. Wei; X. Huang; L. Song; |
440 | W-ART: Action Relation Transformer for Weakly-Supervised Temporal Action Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose W-ART, a relation Transformer to explicitly capture the relationships between action segments. |
M. Li; H. Wu; Y. Liu; H. Liu; C. Xu; X. Li; |
441 | MS-ROCANet: Multi-Scale Residual Orthogonal-Channel Attention Network for Scene Text Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a Multi-scale Residual Orthogonal-Channel Attention Network (MS-ROCANet) is proposed to improve the recall and accuracy of scene text detection. |
J. Liu; S. Wu; D. He; G. Xiao; |
442 | Bi-Directional Normalization and Color Attention-Guided Generative Adversarial Network for Image Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a bi-directional normalization and color attention-guided generative adversarial network (BNCAGAN) for unsupervised image enhancement. |
S. Liu; G. Xiao; X. Xu; S. Wu; |
443 | Dual-Attention Network for Few-Shot Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issue, we propose a Dual-Attention Network (DANet) for few-shot segmentation. |
Z. Chen; H. Wang; S. Zhang; F. Zhong; |
444 | Attention Guided Invariance Selection for Local Feature Descriptors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, we propose a novel parallel self-attention module to get meta descriptors with the global receptive field, which guides the invariance selection more correctly. |
J. Li; G. Li; T. H. Li; |
445 | Attention Probe: Vision Transformer Distillation in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to effectively compress ViTs using the unlabeled data in the wild, consisting of two stages. |
J. Wang; M. Cao; S. Shi; B. Wu; Y. Yang; |
446 | Stacked Multi-Scale Attention Network for Image Colorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a stacked multi-scale attention network (SMSANet) for image colorization. |
B. Jiang; F. Xu; J. Xia; C. Yang; W. Huang; Y. Huang; |
447 | CRPN: Distinguish Novel Categories Via Class-Relevant Region Proposal Network for Few-Shot Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a Class-relevant Region Proposal Network (CRPN). |
H. Wang; Y. Li; S. Wang; |
448 | An Efficient Framework for Detection and Recognition of Numerical Traffic Signs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this letter, we fully explore the relationship between different traffic signs with digital characters and transform the category objects into multi-level classes to alleviate the uneven distribution of samples. |
Z. Li; M. Chen; Y. He; L. Xie; H. Su; |
449 | Divergence-Guided Feature Alignment for Cross-Domain Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To remedy the defects, in this paper, we propose a novel divergence-guided feature alignment method for cross-domain object detection. |
Z. Li; R. Togo; T. Ogawa; M. Haseyama; |
450 | PGTRNET: Two-Phase Weakly Supervised Object Detection with Pseudo Ground Truth Refinement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, we propose a novel online PGT refinement approach to steadily improve the quality of PGT by fully taking advantage of the power of FSD during the second-phase training, decoupling the first and second-phase models. |
J. Wang; H. Zhou; X. Yu; |
451 | Novel Instance Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, a new instance mining model is proposed in this paper to excavate the novel samples from the base set. |
W. Liu; C. Wang; S. Yu; C. Tao; J. Wang; J. Wu; |
452 | BiP-Net: Bidirectional Perspective Strategy Based Arbitrary-Shaped Text Detection Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, to detect arbitrary-shaped text instances with high detection accuracy and speed simultaneously, we propose a Bidirectional Perspective strategy based Network (BiP-Net). |
C. Yang; M. Chen; Y. Yuan; Q. Wang; |
453 | A Novel Lightweight Network for Fast Monocular Depth Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a lightweight network which leverages the advantages of dimension-wise convolutions and depthwise separable convolutions to reduce complexity in the architecture. |
T. Heydrich; Y. Yang; X. Ma; Y. Liu; S. Du; |
454 | A Lightweight Self-Supervised Training Framework for Monocular Depth Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a lightweight self-supervised training framework which utilizes computationally cheap methods to compute ground truth approximations. |
T. Heydrich; Y. Yang; S. Du; |
455 | PU-Refiner: A Geometry Refiner with Adversarial Learning for Point Cloud Upsampling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present PU-Refiner, a generative adversarial network for point cloud upsampling. |
H. Liu; H. Yuan; R. Hamzaoui; W. Gao; S. Li; |
456 | CF-Net: Complementary Fusion Network for Rotation Invariant Point Cloud Completion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our CF-Net can achieve competitive results both geometrically and semantically as demonstrated in this paper. |
B. -F. Chen; Y. -M. Yeh; Y. -C. Lu; |
457 | TH-Net: A Method Of Single 3D Object Tracking Based On Transformers And Hausdorff Distance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new 3D object tracking method called Transformer-Hausdorff Net (TH-Net). |
Z. Zhang; N. Sang; X. Wang; |
458 | Enrich Features for Few-Shot Point Cloud Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these methods require a lot of labeled data as support, which is challenging to obtain. To alleviate this problem, we propose a novel few-shot point cloud classification method to classify new categories given a few labeled samples. |
H. Feng; W. Liu; Y. Wang; B. Liu; |
459 | Semi-Supervised 360° Depth Estimation from Multiple Fisheye Cameras with Pixel-Level Selective Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study a practical omnidirectional depth estimation with neural networks that enables effective learning on real world data obtained using wide-baseline multiple fish-eye cameras. |
J. Lee; D. Park; D. Lee; D. Ji; |
460 | Underwater Stereo Matching Via Unsupervised Appearance And Feature Adaptation Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the domain gap also leads to the failure of directly applying existing models of terrestrial scenes to underwater scenes. Therefore, this paper proposes a novel underwater depth estimation network which can infer depth maps from real underwater stereo images in an unsupervised adaptation manner. |
W. Zhong; Y. Yuan; X. Ye; D. Zheng; R. Xu; |
461 | Domain Adaptation Via Mutual Information Maximization for Handwriting Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To improve the model's generalization ability for sequence modeling task, this paper proposes to use domain adaptation with statistical distribution alignment and entropy regularization. |
P. Tang; et al. |
462 | Attribute-Conditioned Face Swapping Network for Low-Resolution Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Attribute-Conditioned Face Swapping Network (AFSNet) to preserve attributes and handle low resolution images. |
A. Li; J. Hu; C. Fu; X. Zhang; J. Zhou; |
463 | Learning Multiple Explainable and Generalizable Cues for Face Anti-Spoofing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, many other generalizable cues are unexplored for face anti-spoofing, which limits their performance under cross-dataset testing. To this end, we propose a novel framework to learn multiple explainable and generalizable cues (MEGC) for face anti-spoofing. |
Y. Bian; P. Zhang; J. Wang; C. Wang; S. Pu; |
464 | Off-The-Grid Covariance-Based Super-Resolution Fluctuation Microscopy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a gridless problem accounting for the independence of fluctuations. |
B. Laville; L. Blanc-Féraud; G. Aubert; |
465 | Simultaneous Nonlocal Low-Rank And Deep Priors For Poisson Denoising Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach using simultaneous nonlocal low-rank and deep priors (SNLDP) for Poisson denoising. |
Z. Zha; B. Wen; X. Yuan; J. Zhou; C. Zhu; |
466 | Double Closed-Loop Network for Image Deblurring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a deep learning network with double closed-loop structure is introduced to tackle the image deblurring problem. |
Y. Liu; Y. Zhang; Q. Li; J. Kong; M. Qi; J. Wang; |
467 | Single Image De-Raining with High-Low Frequency Guidance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new High-Low-Frequency Guided De-raining (HLFGD) method to remove the rain streaks clearly while reserve the image details. |
Y. Zhang; Y. Xiang; L. Cai; Y. Fu; W. Huo; J. Xia; |
468 | Detail Generation and Fusion Networks for Image Inpainting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel detail generation and fusion network (DGFNet) to strengthen the generation of texture details for image inpainting, which includes a dual-stream texture generation network and a multi-scale difference perception fusion network. |
W. Yang; W. Shi; |
469 | Adaptive Weighted Network With Edge Enhancement Module For Monocular Self-Supervised Depth Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, factors such as occlusion and texture sparsity can lead to the failure of the photometric consistency, affecting the prediction performance. To overcome these deficiencies, an adaptive weighted monocular self-supervised depth estimation framework that exploits enhanced edge information and texture sparsity based adaptive weights is proposed. |
H. Liu; Y. Zhu; G. Hua; W. Huang; R. Ding; |
470 | Pas-Mef: Multi-Exposure Image Fusion Based On Principal Component Analysis, Adaptive Well-Exposedness And Saliency Map Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To minimize the information loss and produce high quality HDR-like images for LDR screens, this study proposes an efficient multi-exposure fusion (MEF) approach with a simple yet effective weight extraction method relying on principal component analysis, adaptive well-exposedness and saliency maps. |
D. Karakaya; O. Ulucan; M. Turkan; |
471 | PDD-Net: A Precise Defect Detection Network Based on Point Set Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: iii) Extreme imbalance problem between defects and background classes during training. To address these issues, we propose a novel anchor-free defect detection network named PDD-Net. |
M. Ban; R. Ding; J. Zhang; T. Guo; T. Wang; |
472 | Solving The Long-Tailed Problem Via Intra- And Inter-Category Balance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel gradient harmonized mechanism with category-wise adaptive precision to decouple the difficulty and sample size imbalance in the long-tailed problem, which are correspondingly solved via intra- and inter-category balance strategies. |
R. Zhang; T. Lin; R. Zhang; Y. Xu; |
473 | Extracting and Distilling Direction-Adaptive Knowledge for Lightweight Object Detection in Remote Sensing Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, some lightweight convolutional neural network (CNN) models have been proposed for airborne or spaceborne remote sensing object detection (RSOD) tasks. |
Z. Huang; W. Li; R. Tao; |
474 | Pseudo-Interacting Guided Network for Few-Shot Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel network that combines a universal cross-guided branch with a new pseudo-interacting guided branch. |
X. Luo; J. Luo; Z. Duan; J. Tan; T. Zhang; |
475 | Few-Shot Generation By Modeling Stereoscopic Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a few-shot generative network which leverages 3D priors to improve the diversity and quality of generated images. |
Y. Wang; Q. Wang; D. Zhang; |
476 | Relative Viewpoint Estimation Based on Structured 3D Representation Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a relative viewpoint estimation method using an end-to-end trainable network that learns structured 3D representations. |
K. Matsuzaki; K. Kawamura; |
477 | Deep Markov Clustering for Panoptic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we adopt a box-free strategy and incorporate a graph-based clustering method to merge repetitive kernel weights for object instances. |
M. Ye; Y. Zhang; S. Zhu; A. Xie; D. Zhang; |
478 | Multi-Task Learning Improves The Brain Stoke Lesion Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel multi-task learning framework to achieve enhanced segmentation of stroke lesions. |
L. Liu; C. Huang; C. Cai; X. Zhang; Q. Hu; |
479 | Mixed Transformer U-Net for Medical Image Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations of the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra- affinities learning. |
H. Wang; et al. |
480 | Contrastive Translation Learning For Medical Image Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes an advantageous domain translation mechanism to improve the perceptual ability of the network for accurate unlabeled target data segmentation. |
W. Zeng; W. Fan; D. Shen; Y. Chen; X. Luo; |
481 | Fast Video Object Segmentation Via Dynamic YOLACT Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: VOS can be considered an extension of semantic segmentation from a static image to a dynamic image sequence. Following this idea, we propose a fast VOS framework based on YOLACT, a real-time static image segmentation framework. |
T. Meng; W. Zhang; |
482 | Depth Removal Distillation for RGB-D Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, it is extremely challenging to take full advantage of RGB-D semantic segmentation methods for segmenting RGB images without the depth input. To address this challenge, a general depth removal distillation method is proposed to remove depth dependence from RGB-D semantic segmentation model by knowledge distillation, which can be employed to any CNN-based segmentation network structure. |
T. Fang; Z. Liang; X. Shao; Z. Dong; J. Li; |
483 | Mask-Based Attention Parallel Network for In-the-Wild Facial Expression Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: But most previous attention-based methods are inadequate in locating crucial expression-related regions precisely and capturing useful facial expression features comprehensively. For these reasons, we present a novel mask-based attention parallel network (MAPNet). |
L. Ju; X. Zhao; |
484 | SDNET: Lightweight Facial Expression Recognition For Sample Disequilibrium Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Facial expression recognition (FER) based on the convolutional neural network (CNN) in the wild have numerous challenges. For instance, the complexity of the network model makes … |
L. Zhou; S. Li; Y. Wang; J. Liu; |
485 | A Novel Micro-Expression Recognition Approach Using Attention-Based Magnification-Adaptive Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the single fixed magnification strategy, widely used in existing works of MER, is not appropriate for different subjects, because each subject has specific expression intensity corresponding to different MEs. To cope with this issue, we propose a novel Attention-based Magnification-Adaptive Network (AMAN) to learn adaptive magnification levels for the ME representation. |
M. Wei; W. Zheng; Y. Zong; X. Jiang; C. Lu; J. Liu; |
486 | Lipreading Model Based On Whole-Part Collaborative Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on the full use of spatial information in lipreading tasks. |
W. Tian; H. Zhang; C. Peng; Z. -Q. Zhao; |
487 | What Is The Patient Looking At? Robust Gaze-Scene Intersection Under Free-Viewing Conditions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We demonstrate the utility of the proposed algorithm in regressing the PoR from scenes captured in the Intensive Care Unit (ICU) at Chelsea & Westminster Hospital NHS Foundation Trust. |
A. Al-Hindawi; M. P. Vizcaychipi; Y. Demiris; |
488 | GAZEATTENTIONNET: Gaze Estimation with Attentions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel structure named GazeAttentionNet. |
H. Huang; L. Ren; Z. Yang; Y. Zhan; Q. Zhang; J. Lv; |
489 | Low-Light Image Enhancement Via Feature Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Most existing Retinex-based methods deal with the noise and color distortion via some careful designs to denoising and/or color correction. In this paper, we propose a simple yet effective network from the perspective of feature map restoration to mitigate such issues without constructing any explicit modules. |
Y. Yang; Y. Zhang; X. Guo; |
490 | HIRL: Hybrid Image Restoration Based on Hierarchical Deep Reinforcement Learning Via Two-Step Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, each tool adopted inevitably introduces additional noise and will affect the subsequent recovery results. To address this issue, in this paper, we propose a hierarchical deep reinforcement learning framework (HIRL), which balance both benefits and noises brought by each tool and select the appropriate type and degree tools. |
X. Zhang; W. Gao; |
491 | High-Fidelity Portrait Editing Via Exploring Differentiable Guided Sketches from The Latent Space Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Nonetheless, passing sketch information to the generating model directly is nontrivial. To this end, we present an algorithm that addresses the problem of well controlling the generation process via differentiable guided sketches from latent space. |
C. Wang; C. Cao; Y. Fu; X. Xue; |
492 | Learning Adjustable Image Rescaling with Joint Optimization of Perception and Distortion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on the invertible rescaling net (IRN) which learns image downscaling and upscaling together, we propose a joint optimization method to train just one model that could achieve adjustable trade-off between perception and distortion for upscaling at inference time. |
Z. Pan; |
493 | FSOINET: Feature-Space Optimization-Inspired Network For Image Compressive Sensing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose the idea of achieving information flow phase by phase in feature space and design a Feature-Space Optimization-Inspired Network (dubbed FSOINet) to implement it by mapping both steps of proximal gradient descent algorithm from pixel space to feature space. |
W. Chen; C. Yang; X. Yang; |
494 | Disentangled Feature-Guided Multi-Exposure High Dynamic Range Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a disentangled feature-guided HDR network (DFGNet) to alleviate the above-stated problems. |
K. Lee; Y. I. Jang; N. I. Cho; |
495 | Defending Against Universal Attack Via Curvature-Aware Category Adversarial Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a curvature-aware category adversarial training method to avoid excessive perturbations. |
P. Du; X. Zheng; L. Liu; H. Ma; |
496 | SP Attack: Single-Perspective Attack for Generating Adversarial Omnidirectional Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The safety of Deep Neural Networks (DNNs) processing omnidirectional images (ODIs) is an under-researched topic. In this paper, we propose a novel sparse attack, named Single-Perspective (SP) Attack, towards fooling these models by perturbing only one perspective image (PI) rendered from the target ODI. |
Y. Zhang; Y. Liu; J. Liu; P. Zhan; L. Wang; Z. Xu; |
497 | Few-Shot One-Class Domain Adaptation Based On Frequency For Iris Presentation Attack Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We thus define a new domain adaptation setting called Few-shot One-class Domain Adaptation (FODA), where adaptation only relies on a limited number of target bonafide samples. To address this problem, we propose a novel FODA framework based on the expressive power of frequency information. |
Y. Li; Y. Lian; J. Wang; Y. Chen; C. Wang; S. Pu; |
498 | PixInWav: Residual Steganography for Hiding Pixels in Audio Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. |
M. Geleta; C. Puntí; K. McGuinness; J. Pons; C. Canton; X. Giro-i-Nieto; |
499 | A Semi-Handcrafted Keypoint Detector with Discriminative Feature Encoding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: And yet, the intrinsic relationships of key-points have not been explored actively, which may lead to the ambiguity of feature codes for further analysis. To tackle this problem, in this paper, we introduce a novel semi-handcrafted keypoint detector through a scheme of discriminative feature representations (SDFR). |
Y. Xie; L. Guan; |
500 | Safari from Visual Signals: Recovering Volumetric 3D Shapes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a convex approach for recovering a detailed 3D volumetric geometry of several objects from visual signals. |
A. Agudo; |
501 | Coupled Feature Learning Via Structured Convolutional Sparse Coding for Multimodal Image Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel method for learning correlated features in multimodal images based on convolutional sparse coding with applications to image fusion is presented. |
F. G. Veshki; S. A. Vorobyov; |
502 | DOMAINDESC: Learning Local Descriptors With Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel descriptor DomainDesc which is invariant as much as possible by learning local Descriptor with Domain adaptation. |
R. Xu; et al. |
503 | Multi-Head Relu Implicit Neural Representation Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, a novel multi-head multi-layer perceptron (MLP) structure is presented for implicit neural representation (INR). |
A. Aftab; A. Morsali; S. Ghaemmaghami; |
504 | An Efficient Method for Model Pruning Using Knowledge Distillation with Few Samples Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present Progressive Feature Distribution Distillation (PFDD) without modifying network structures, which surpasses FSKD. |
Z. Zhou; Y. Zhou; Z. Jiang; A. Men; H. Wang; |
505 | Adaptive Intra-Group Aggregation for Co-Saliency Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the feature aggregation between group feature representation and individual feature representation is still a challenging issue. In this work, we propose a novel adaptive intra-group aggregation (AIGA) method, which provides a new perspective to investigate the interaction relationship between group and single-image features and aggregate these features in an adaptive way. |
G. Ren; T. Dai; T. Stathaki; |
506 | Novel Class Discovery: A Dependency Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we look at the problem where the model is required to discover novel classes never encountered in the labeled set. |
T. Mukherjee; N. Deligiannis; |
507 | Single-Shot Balanced Detector for Geospatial Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, to achieve excellent speed/accuracy trade-off for geospatial object detection, a single-shot balanced detector is presented. |
Y. Liu; Q. Li; Y. Yuan; Q. Wang; |
508 | Regularized Latent Space Exploration for Discriminative Face Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a regularized latent space exploration approach to facilitate self-supervised face super-resolution. |
R. Shi; J. Zhang; Y. Li; S. Ge; |
509 | Enhancing and Dissecting Crowd Counting By Synthetic Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this article, we propose a simulated crowd counting dataset CrowdX, which has a large scale, accurate labeling, parameterized realization, and high fidelity. |
Y. Hou; et al. |
510 | Multi-Pose Virtual Try-On Via Self-Adaptive Feature Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Prior methods lack an effective geometric deformation to maintain the original image details resulting in many details loss in the head and garment. To address this problem, we propose a new multi-pose virtual try-on network, which can fit a garment to the corresponding area of a person in arbitrary poses. |
C. Du; F. Yu; M. Jiang; X. Wei; T. Peng; X. Hu; |
511 | Histogram-Guided Semantic-Aware Colorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel histogram-guided semantic-aware colorization method, which explicitly builds the correspondences between global colors and local features with an attention mechanism and uses a differentiable histogram loss to impose the histogram of the results. |
J. Zhang; Y. Xiao; G. Chen; Q. Sun; F. Xu; C. -S. Leung; |
512 | Content Preserving Scale Space Network for Fast Image Restoration from Noisy-Blurry Pairs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a fast method to estimate a latent image given a pair of noisy-blurry images. |
G. R. K S; N. Krishnan; B. H. Pawan Prasad; S. Lomte; |
513 | Flow-Based Point Cloud Completion Network with Adversarial Refinement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a coarse-to-fine approach to complete the partial point cloud with two stages: 1) Flow-based Completion Network, a principled probabilistic model that built on continuous normalizing flow to generate coarse completions conditioned on partial inputs. |
R. Bao; Y. Ren; G. Li; W. Gao; S. Liu; |
514 | Weakly Supervised Point Cloud Upsampling Via Optimal Transport Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing learning-based methods usually train a point cloud upsampling model with synthesized, paired sparse-dense point clouds. |
Z. Li; W. Wang; N. Lei; R. Wang; |
515 | Point Cloud Denoising Using Normal Vector-Based Graph Wavelet Shrinkage Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel graph-based point cloud denoising approach using the spectral graph wavelet transform (SGWT) and graph wavelet shrinkage. |
R. Watanabe; K. Nonaka; H. Kato; E. Pavez; T. Kobayashi; A. Ortega; |
516 | Dynamic Point Cloud Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a first point cloud interpolation framework for photorealistic dynamic point clouds. |
A. Akhtar; Z. Li; G. Van der Auwera; J. Chen; |
517 | Point Cloud Attribute Compression Via Chroma Subsampling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we develop a framework to incorporate chroma subsampling into geometry-based point cloud encoders, such as region adaptive hierarchical transform (RAHT) and region adaptive graph Fourier transform (RAGFT). |
S. N. Sridhara; E. Pavez; A. Ortega; R. Watanabe; K. Nonaka; |
518 | Rangeinet: Fast Lidar Point Cloud Temporal Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the existing methods heavily depend on 3D scene flow or 2D flow estimation, which yield huge computational complexity and obstacles in real-time applications. To resolve this issue, we propose a fast and non-flow involved method, which analyzes the LiDAR point cloud by exploiting its corresponding 2D range images (RIs). |
L. Zhao; X. Lin; W. Wang; K. -K. Ma; J. Chen; |
519 | MBNet: A Multi-Resolution Branch Network for Semantic Segmentation Of Ultra-High Resolution Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Semantic segmentation of ultra-high resolution images is more challenging than ordinary images since high-resolution images need to be cropped into patches in training due to GPU memory limitation. To solve this problem, we design a multibranch structure to deal with multi-resolution inputs, called Multi-resolution Branch Network (MBNet). |
L. Shan; W. Wang; |
520 | BSOLO: Boundary-Aware One-Stage Instance Segmentation SOLO Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a boundary-aware method to refine boundary information, called BSOLO. |
Y. Zhang; W. Yang; |
521 | CS-GResNet: A Simple and Highly Efficient Network for Facial Expression Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a highly efficient Channel-Shift Gabor-ResNet (CS-GResNet) to capture the crucial visual properties in facial images. |
S. Jiang; X. Xu; F. Liu; X. Xing; L. Wang; |
522 | RCANet: Row-Column Attention Network for Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a Row-Column Attention Network (RCANet) to encode globally contextual information. |
B. Lu; Q. Hu; Y. Wang; G. Hu; |
523 | Exploring Category Consistency for Weakly Supervised Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Generating a reliable and detailed pseudo mask label is the main challenge for improving the quality of predicted mask. In this paper, we propose Category Consistency Mask Refinement (CCMR) to explore the category consistency cued with the input image, and inject such information to mask refinement, guaranteeing the completeness of the refined mask. |
Z. Xie; H. Lu; |
524 | Vision Transformer Equipped With Neural Resizer On Facial Expression Recognition Task Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel training framework, Neural Resizer, to support Transformer by compensating information and downscaling in a data-driven manner trained with loss function balancing the noisiness and imbalance. |
H. Hwang; S. Kim; W. -J. Park; J. Seo; K. Ko; H. Yeo; |
525 | ISDA: Position-Aware Instance Segmentation with Deformable Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we propose a novel end-to-end instance segmentation method termed ISDA. |
K. Ying; Z. Wang; C. Bai; P. Zhou; |
526 | Improving Class Activation Map for Weakly Supervised Object Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We revisit current CAM-based WSOL approaches and propose a pipeline to: (1) refine the CAM map using Weighted Global Average Pooling (WGAP), (2) recombine weights to make use of the negative features, (3) adaptively select a suitable threshold to achieve better object localization. |
Z. Zhang; M. -C. Chang; T. D. Bui; |
527 | A Robust Object Segmentation Network for UnderWater Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The challenges of underwater object segmentation originate from two aspects, 1) the complex underwater environment and 2) the camouflage characteristics of marine animals. In this paper, we propose WaterSNet, an underwater object segmentation network to address these challenges. |
R. Chen; Z. Fu; Y. Huang; E. Cheng; X. Ding; |
528 | A Fast and Efficient Network for Single Image Shadow Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel multi-level feature-aware network, called TransShadow, which uses Transformer to capture both local and global context from a single image for shadow detection. |
L. Jie; H. Zhang; |
529 | Importance Sampling CAMs For Weakly-Supervised Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we approach both problems with two contributions for improving CAM learning. |
A. Jonnarth; M. Felsberg; |
530 | DeepGBASS: Deep Guided Boundary-Aware Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To improve the semantic boundary accuracy, we propose low complexity Deep Guided Decoder (DGD) networks, trained with a novel Semantic Boundary-Aware Learning (SBAL) strategy. |
Q. Liu; H. Su; M. El-Khamy; K. -B. Song; |
531 | Camera Calibration Through Camera Projection Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work we propose a novel method to predict extrinsic (baseline, pitch, and translation), intrinsic (focal length and principal point offset) parameters using an image pair. |
T. H. Butt; M. Taj; |
532 | Inferring Camera Intrinsics Based on Surfaces of Revolution: A Single Image Geometric Network Approach for Camera Calibration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the problem of calibrating from a single image of a surface of revolution (SOR) based on deep learning, in order to determine the camera intrinsic parameters. |
C. Walker; Y. Wang; Y. Lu; G. Lu; |
533 | Text2video: Text-Driven Talking-Head Video Synthesis with Personalized Phoneme – Pose Dictionary Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel approach to synthesize video from the text. |
S. Zhang; J. Yuan; M. Liao; L. Zhang; |
534 | Towards Accurate Cross-Domain In-Bed Human Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a novel learning strategy comprising two-fold data augmentation to reduce the cross-domain discrepancy and knowledge distillation to learn the distribution of unlabeled images in real world conditions. |
M. Afham; U. Haputhanthri; J. Pradeepkumar; M. Anandakumar; A. De Silva; C. U. S. Edussooriya; |
535 | Learning Monocular Mesh Recovery of Multiple Body Parts Via Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on simultaneously recovering the 3D mesh of multiple body parts from a single RGB image. |
Y. Sun; T. Huang; Q. Bao; W. Liu; W. Gao; Y. Fu; |
536 | LightPose: A Lightweight and Efficient Model with Transformer for Human Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: When keypoint prediction is performed on low-resolution heatmaps, performance is unsatisfactory due to serious quantization errors. To solve this contradiction, we propose to perform joint training of the heatmap and center offset on low-resolution heatmaps to reduce quantization errors, which could achieve performance comparable to the high-resolution heatmap and reduce the computational complexity. |
X. Liu; P. Li; D. Ni; Y. Wang; H. Xue; |
537 | On The Observability in Visual SLAM Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Observability is an essential aspect for the performance of a visual Simultaneous Localization and Mapping (SLAM) network. This paper presents a statistical perspective to evaluate the observability of visual SLAM networks and its dependence on network structure. |
Q. An; Y. Shen; |
538 | Variational Bayesian Framework for Advanced Image Generation with Domain-Related Variables Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, it remains challenging for existing methods to address advanced conditional generative problems without annotations, which can enable multiple applications like image-to-image translation and image editing. We present a unified Bayesian framework for such problems, which introduces an inference stage on latent variables within the learning process. |
Y. Li; S. Mazuelas; Y. Shen; |
539 | The Impact of JPEG Compression on Prior Image Noise Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a model predicting how the noise power is affected by JPEG compression. |
M. Gardella; T. Nikoukhah; Y. Li; Q. Bammey; |
540 | On The Use of Component Structural Characteristics for Voxel Segmentation in Semicon 3D Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Detecting defects buried inside chips is critical for failure analysis in semiconductor manufacturing. In this paper, we perform 3D voxel segmentation on 2.5D semicon chips to locate and identify defects that may be present in them. |
T. L. Nwe; et al. |
541 | Blind Source Separation Via A Weak Exclusion Principle Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a generalized Blind Source Separation (BSS) method using a novel assumption which we call the "weak exclusion" principle. |
Z. Zihan; T. Blu; |
542 | Graph Convolution for Re-Ranking in Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric. |
Y. Zhang; et al. |
543 | Multi-Level Relation Aware Network for Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Person attribute or pose information has improved person re-identification performance, however, inaccurate pose or attribute module will damage the final identification performance. Based on this, we propose a multi-scale relation aware network (MSRA) for person re-identification. |
J. Yang; C. Zhang; Z. Li; Y. Tang; |
544 | Progressive-Granularity Retrieval Via Hierarchical Feature Alignment for Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: It is a challenging task because of the feature misalignment problem caused by occlusion. In this paper, inspired by the coarse-to-fine nature of human perception, we propose a novel Progressive-Granularity Retrieval (PGR) method to tackle this issue. |
Z. Dou; Z. Wang; Y. Li; S. Wang; |
545 | Occluded Person Re-Identification Via Relational Adaptive Feature Correction Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Most existing methods utilize the off-the-shelf pose or parsing networks as pseudo labels, which are prone to error. To address these issues, we propose a novel Occlusion Correction Network (OCNet) that corrects features through relational-weight learning and obtains diverse and representative features without using external networks. |
M. Kim; M. Cho; H. Lee; S. Cho; S. Lee; |
546 | Learning Semantic-Aligned Feature Representation for Text-Based Person Search Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a semantic-aligned embedding method for text-based person search, in which the feature alignment across modalities is achieved by automatically learning the semantic-aligned visual features and textual features. |
S. Li; M. Cao; M. Zhang; |
547 | Transformer-Based Person Search Model with Symmetric Online Instance Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we adopt Swin Transformer as the backbone network to extract discriminative features. |
X. Xiang; N. Lv; Y. Qiao; |
548 | WasserTrain: An Adversarial Training Framework Against Wasserstein Adversarial Attacks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents an adversarial training framework, WasserTrain, for improving model robustness against adversarial attacks in terms of the Wasserstein distance. |
Q. Zhao; X. Chen; Z. Zhao; E. Tang; X. Li; |
549 | Efficient Universal Shuffle Attack for Visual Object Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, existing attacks are difficult to implement in reality due to the real-time nature of tracking and the re-initialization mechanism. To address these issues, we propose an offline universal adversarial attack called Efficient Universal Shuffle Attack. |
S. Liu; et al. |
550 | Non-Rigid Transformation Based Adversarial Attack Against 3D Object Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a non-rigid transformation based adversarial attack method against 3D object tracking. |
R. Cheng; N. Sang; Y. Zhou; X. Wang; |
551 | Adversary Distillation for One-Shot Attacks on 3D Target Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an efficient generation based adversarial attack, termed Adversary Distillation Network (AD-Net), which is able to distract a victim tracker in a single shot. |
Z. Wang; X. Wang; F. Sohel; M. Bennamoun; Y. Liao; J. Yu; |
552 | AdverFacial: Privacy-Preserving Universal Adversarial Perturbation Against Facial Micro-Expression Leakages Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the novel universal adversarial perturbation-based approach – AdverFacial – for privacy concealment against automated micro-expression analysis via deep learning techniques. |
Y. -Y. Low; A. Tanvy; R. C. -W. Phan; X. Chang; |
553 | Interpretable Image Classification Using Sparse Oblique Decision Trees Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a simple yet effective method to interpret the image datasets. |
S. Singh Hada; M. Á. Carreira-Perpiñán; |
554 | Underwater Image Enhancement Via Learning Water Type Desensitized Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present a novel underwater image enhancement method termed SCNet to improve the image quality while coping with the degradation diversity caused by the water. |
Z. Fu; X. Lin; W. Wang; Y. Huang; X. Ding; |
555 | A Wavelet-Based Dual-Stream Network for Underwater Image Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a wavelet-based dual-stream network that addresses color cast and blurry details in underwater images. |
Z. Ma; C. Oh; |
556 | Unsupervised and Untrained Underwater Image Restoration Based on Physical Image Formation Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Training a deep neural network to restore underwater images is challenging due to the labor-intensive data collection and the lack of paired data. To this end, we propose an unsupervised and untrained underwater image restoration method based on the layer disentanglement and the underwater image formation model. |
S. Chai; Z. Fu; Y. Huang; X. Tu; X. Ding; |
557 | AGCycleGAN: Attention-Guided CycleGAN for Single Underwater Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel network architecture based on CycleGAN. |
Z. Wang; W. Liu; Y. Wang; B. Liu; |
558 | Underwater Small Target Detection Based on Deformable Convolutional Pyramid Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a two-stage Underwater Small Target Detection (USTD) network. |
S. Qi; et al. |
559 | Towards Controllable and Physical Interpretable Underwater Scene Simulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we try to fill in this research gap by proposing an Underwater Scene Simulation approach, namely USSim, which especially focuses on the influence of ocean water. |
K. Chen; L. Zhang; Y. Shen; Y. Zhou; |
560 | Graph Learning Based Autoencoder for Hyperspectral Band Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a graph learning based autoencoder (GLAE) to achieve unsupervised hyperspectral band selection. |
Y. Zhang; X. Wang; Z. Wang; X. Jiang; Y. Zhou; |
561 | Multitask Sparse Neural Network for Hyperspectral Image Denoising Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Consequently, they require complex network architectures and a large number of training samples. To address the above issues, this paper introduces a multitask sparse neural network (MTSNN) which bridges the sparsity prior of HSIs with data-driven deep learning for HSI denoising. |
F. Xiong; M. Ye; J. Zhou; J. Lu; Y. Qian; |
562 | Hyperspectral Image Classification Based on Co-Learning Through Dual-Architecture Ensemble Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Some deep learning methods have been introduced and achieved good results, such as the CNN-based architecture, which focuses on local and hierarchical feature extraction to obtain visual information from shallow to deep. |
C. Xiaoyue; C. Xianghai; |
563 | Material-Guided Siamese Fusion Network for Hyperspectral Object Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to limited HSVs for training, most current hyperspectral trackers are based on hand-crafted features rather than deeply learned ones, resulting in poor tracking performance. This paper introduces a material-guided Siamese fusion network (SiamF) for hyperspectral object tracking to bridge this gap. |
Z. Li; F. Xiong; J. Lu; J. Zhou; Y. Qian; |
564 | Hyperspectral Image Super-Resolution with Deep Priors and Degradation Model Inversion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This technique aims to fuse a low-resolution (LR) HSI and a conventional high-resolution (HR) RGB image in order to obtain an HR HSI. |
X. Wang; J. Chen; C. Richard; |
565 | Geometric Low-Rank Tensor Approximation for Remotely Sensed Hyperspectral And Multispectral Imagery Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study proposes to estimate the high-resolution HS2I via low-rank tensor approximation with geometry proximity as side information learned from MSI and HSI by defined graph signals, which we name GLRTA. |
N. Liu; W. Li; R. Tao; |
566 | Dilated Convolutional Neural Network-Based Deep Reference Picture Generation for Video Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: The experimental results demonstrate that our proposed method achieves on average 9.7% bit saving compared with VVC under low-delay P configuration. |
H. Tian; P. Gao; R. Wei; M. Paul; |
567 | Rate Control for Learned Video Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present the first rate control scheme tailored for learned video compression. |
Y. Li; et al. |
568 | Global Optimization Solution for Dynamic Adaptive 360-Degree Streaming Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a global optimization solution that combines different types of perceptual information to further improve transmission efficiency. |
X. Wei; M. Zhou; W. Jia; |
569 | Collaborative Object Detectors Adaptive to Bandwidth and Computation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we help to bridge that gap, introducing the first configurable solution for collaborative object detection that manages the triple communication-computation-accuracy trade-off with a single set of weights. |
J. S. Assine; J. C. S. Santos Filho; E. Valle; |
570 | MA-NET: Multi-Scale Attention-Aware Network for Optical Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods for optical flow estimation can perform well on images with small offsets of large objects. |
M. Li; B. Zhong; K. -K. Ma; |
571 | Modeling Human Memory in Multi-Object Tracking with Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Memory-based Multi-object Tracking with Transformers (MMTT) to mimic human behavior in multi-object tracking. |
Y. Li; C. Lu; |
572 | Real-World Adversarial Examples Via Makeup Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Herein, we propose a physical adversarial attack with the use of full-face makeup. |
C. -S. Lin; C. -Y. Hsu; P. -Y. Chen; C. -M. Yu; |
573 | In Pursuit of Preserving The Fidelity of Adversarial Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we find adversarial examples that better match the natural distribution of the input domain by integrating signal processing techniques into the attack framework, dynamically altering the allowed perturbation with a Rule Adjustable Distance (RAD). |
J. Clements; Y. Lao; |
574 | Object-Oriented Backdoor Attack Against Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore backdoor attack towards image captioning models by poisoning training data. |
M. Li; N. Zhong; X. Zhang; Z. Qian; S. Li; |
575 | Towards Robust Speech-to-Text Adversarial Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a novel adversarial algorithm for attacking the advanced speech-to-text transcription systems. |
M. Esmaeilpour; P. Cardinal; A. L. Koerich; |
576 | Sparse Adversarial Attack For Video Via Gradient-Based Keyframe Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a gradient-based method for self-adaptive white-box video keyframe selection and video adversarial example generation, taking advantage of the fact that perturbations are transferable between video frames. |
Y. Xu; X. Liu; M. Yin; T. Hu; K. Ding; |
577 | How Secure Are The Adversarial Examples Themselves? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the concept of adversarial example security, defined as how unlikely the examples themselves are to be detected. |
H. Zeng; K. Deng; B. Chen; A. Peng; |
578 | Exploring Complementarity of Global and Local Spatiotemporal Information for Fake Face Video Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a novel spatiotemporal network is proposed which can better utilize the implicit complementary advantages of global and local information. |
X. Zhao; Y. Yu; R. Ni; Y. Zhao; |
579 | Panchromatic Imagery Copy-Paste Localization Through Data-Driven Sensor Attribution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we consider the problem of localizing copy-paste forgeries on panchromatic images acquired with different satellites. |
E. D. Cannas; J. Horváth; S. Baireddy; P. Bestagini; E. J. Delp; S. Tubaro; |
580 | Robust Video Hashing Based on Local Fluctuation Preserving for Tracking Deep Fake Videos Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a robust video hashing algorithm based on local fluctuation preserving is proposed. |
L. Chen; D. Ye; Y. Shang; J. Huang; |
581 | ADT: Anti-Deepfake Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To get out of the predicament of prior methods, in this paper, we propose a novel transformer-based framework to model both global and local information and analyze anomalies of face images. |
P. Wang; et al. |
582 | Eyes Tell All: Irregular Pupil Shapes Reveal GAN-Generated Faces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we show that GAN-generated faces can be exposed via irregular pupil shapes. |
H. Guo; S. Hu; X. Wang; M. -C. Chang; S. Lyu; |
583 | Explainable Artificial Intelligence for Authorship Attribution on Social Media Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we extend upon LIME – a model-agnostic interpretability technique – to improve the explanations of the state-of-the-art methods for authorship attribution on social media posts. |
A. Theophilo; R. Padilha; F. A. Andaló; A. Rocha; |
584 | Dual-Domain Low-Rank Fusion Deep Metric Learning for Off-the-Person ECG Biometrics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To reduce the dynamic morphological variability, this paper introduces deep metric learning into ECG biometrics to learn intra-individual compact features. |
G. Zhu; M. Ma; Y. Huang; K. Wang; G. Yang; |
585 | A Robust Deep Audio Splicing Detection Method Via Singularity Detection Feature Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on the characteristic that the tampering operation will cause singularities at high-frequency components, we propose a high-frequency singularity detection feature obtained by wavelet transform. |
K. Zhang; et al. |
586 | Online ECG Biometrics Via Hadamard Code Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, it is inefficient and impractical for them to handle the online scenario where new data may continually come. To overcome the above limitation, we propose a novel ECG biometrics framework, termed Online ECG Biometrics based on Hadamard Codes. |
K. Wang; G. Yang; Y. Huang; L. Yang; Y. Yin; |
587 | Forensic Analysis and Localization of Multiply Compressed MP3 Audio Using Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the scenario of audio signal manipulation done by temporal splicing of compressed and uncompressed audio signals. |
Z. Xiang; P. Bestagini; S. Tubaro; E. J. Delp; |
588 | Adaptive Matching Strategy for Multi-Target Multi-Camera Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these works require human participation when formulating matching strategies, which becomes infeasible as the scale of the camera system increases. To tackle this problem, we propose an adaptive matching strategy to replace manual rules when guiding the matching between cameras. |
C. Liu; Y. Zhang; W. Chen; F. Wang; H. Li; Y. -D. Shen; |
589 | Generalized Face Anti-Spoofing Via Cross-Adversarial Disentanglement with Mixing Augmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The spoof-specific features are further used for live vs. spoof classification. To minimize correlation among these two features, we present a cross-adversarial training scheme, which requires each branch to act as adversarial supervision for the other branch. |
H. Huang; et al. |
590 | Free Lunch for Cross-Domain Occluded Face Recognition Without Source Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a source data-free domain adaptive occluded face recognition framework to optimize the network in the target domain via redefining it as a pseudo labels denoising problem. |
T. Zhang; Y. Xiang; X. Li; Z. Weng; Z. Chen; Y. Fu; |
591 | Coneface: Approximate Pairwise Loss for Face Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an approximate pairwise loss (APL) to encourage inter-class separability as well as intra-class compactness. |
Z. Zhuang; H. Lu; |
592 | Depth-Based Ensemble Learning Network For Face Anti-Spoofing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike previous methods, we assume that the data distribution of the target domain will be similar to one of the training domains. Based on this hypothesis, we draw on the idea of ensemble learning and propose a generalized framework with multiple domain-specific modules. |
J. Jiang; Y. Sun; |
593 | Are GAN-based Morphs Threatening Face Recognition? Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Research in generation of face morphs and their detection is developing rapidly, however very few datasets with morphing attacks and open-source detection toolkits are publicly available. This paper bridges this gap by providing two datasets and the corresponding code for four types of morphing attacks: two that rely on facial landmarks based on OpenCV and FaceMorpher, and two that use StyleGAN 2 to generate synthetic morphs. |
E. Sarkar; P. Korshunov; L. Colbois; S. Marcel; |
594 | Privacy Protection In Learning Fair Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we develop a framework to achieve a desirable trade-off between fairness, inference accuracy and privacy protection in the inference as service scenario. |
Y. Jin; L. Lai; |
595 | Stealthy Backdoor Attack with Adversarial Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, we propose adversarial training to bypass neural cleanse detection. |
L. Feng; S. Li; Z. Qian; X. Zhang; |
596 | Fldp: Flexible Strategy For Local Differential Privacy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the challenge of providing weakened but flexible protection where each value only needs to be indistinguishable from part of the domain after perturbation. |
D. Zhao; H. Chen; S. Zhao; R. Liu; C. Li; X. Zhang; |
597 | Enhancing Utility In The Watchdog Privacy Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper is concerned with enhancing data utility in the privacy watchdog method for attaining information-theoretic privacy. |
M. A. Zarrabian; N. Ding; P. Sadeghi; T. Rakotoarivelo; |
598 | Cyber-Threat Propagation Over Network-Slicing Architectures Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Exploiting the multi-dimensional Birth-Death-Immigration model, we examine threat percolation from a vulnerable slice to a virtually secured slice. |
M. Cirillo; M. Di Mauro; V. Matta; G. Basileo; |
599 | Privacy-Aware Communication Over A Wiretap Channel with Generative Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Since we usually do not have access to true distributions, we propose a data-driven approach using variational autoencoder (VAE)-based joint source channel coding (JSCC). |
E. Erdemir; P. L. Dragotti; D. Gündüz; |
600 | Encrypted Image Visual Security Index Via Non-Local Recognizable Degree Evaluation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: They ignore the fact that humans can reorganize partial local information to infer global information and local information can possibly be leaked in any position of the encrypted image. To address this problem, we propose a block-based non-local recognizable degree measure with a global structure similarity measure as a visual security index. |
R. Shi; J. Xiong; T. Qiao; |
601 | Against Backdoor Attacks In Federated Learning With Differential Privacy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Previous works showed that differential privacy (DP) can be used to defend against backdoor attacks, yet at the cost of vastly losing model utility. To address this issue, in this paper we propose a defense method based on differential privacy, called Clip Norm Decay (CND), to maintain utility when defending against backdoor attacks with DP. |
L. Miao; W. Yang; R. Hu; L. Li; L. Huang; |
602 | SecMPNN: 3-Party Privacy-Preserving Molecular Structure Properties Inference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In order to solve the privacy protection problem of computing on cloud servers, we propose a 3-party molecular structure properties inference privacy protection framework SecMPNN based on additive secret sharing. |
X. Liao; J. Xue; S. Yu; X. Liu; J. Shu; |
603 | Compressed Data Sharing Based On Information Bottleneck Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider privacy-preserving compressed image sharing, where the goal is to release compressed data whilst satisfying some privacy/secrecy constraints yet ensuring image reconstruction with a defined fidelity. |
B. Razeghi; S. Rezaeifar; S. Ferdowsi; T. Holotyak; S. Voloshynovskiy; |
604 | Randomized Smoothing Under Attack: How Good Is It in Practice? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While it indeed provides a theoretical robustness against adversarial attacks, the dimensionality of current classifiers necessarily imposes Monte Carlo approaches for its application in practice. This paper questions the effectiveness of randomized smoothing as a defense, against state of the art black-box attacks. This is a novel perspective, as previous research works considered the certification as an unquestionable guarantee. |
T. Maho; T. Furon; E. L. Merrer; |
605 | Training Privacy-Preserving Video Analytics Pipelines By Suppressing Features That Reveal Information About Private Attributes Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, the features extracted by a deep neural network that was trained to predict a specific, consensual attribute (e.g. emotion) may also encode and thus reveal information about private, protected attributes (e.g. age or gender). In this work, we focus on such leakage of private information at inference time. |
C. Y. Li; A. Cavallaro; |
606 | Unsupervised Anomaly Detection for Container Cloud Via BILSTM-Based Variational Auto-Encoder Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a real-time unsupervised anomaly detection system for monitoring system calls in container cloud via BiLSTM-based variational auto-encoder (VAE). |
Y. Wang; X. Chen; Q. Wang; R. Yang; B. Xin; |
607 | Applying Deep Learning to Known-Plaintext Attack on Chaotic Image Encryption Schemes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we demonstrate that traditional chaotic encryption schemes are vulnerable to the known-plaintext attack (KPA) with deep learning. |
F. Wang; J. Sang; C. Huang; B. Cai; H. Xiang; N. Sang; |
608 | WordMarkov: A New Password Probability Model of Semantics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new password probability model for semantic information based on Markov Chain with both generalization and accuracy, called WordMarkov, that can capture the semantic essence of password samples. |
J. Xie; H. Cheng; R. Zhu; P. Wang; K. Liang; |
609 | Efficient Identity-Based Chameleon Hash for Mobile Devices Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an efficient IB-CH scheme in the standard model, significantly reducing the computational costs of all the algorithms and the size of public parameters compared with Xie's scheme. |
C. Li; Q. Shen; Z. Xie; J. Dong; Y. Fang; Z. Wu; |
610 | Passtrans: An Improved Password Reuse Model Based on Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a password reuse model PassTrans and simulate credential tweaking attacks. |
X. He; H. Cheng; J. Xie; P. Wang; K. Liang; |
611 | Fostering The Robustness Of White-Box Deep Neural Network Watermarks By Neuron Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To enhance the robustness of white-box DNN watermarking schemes, this paper presents a procedure that aligns neurons into the same order as when the watermark is embedded, so the watermark can be correctly recognized. |
F. -Q. Li; S. -L. Wang; Y. Zhu; |
612 | Watermarking Images in Self-Supervised Latent Spaces Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time. |
P. Fernandez; A. Sablayrolles; T. Furon; H. Jégou; M. Douze; |
613 | Speech Pattern Based Black-Box Model Watermarking for Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the first black-box model watermarking framework for protecting the IP of ASR models. |
H. Chen; W. Zhang; K. Liu; K. Chen; H. Fang; N. Yu; |
614 | Encryption Resistant Deep Neural Network Watermarking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an encryption resistant DNN watermarking scheme, which is able to resist the parameter shuffling based DNN encryption. |
G. Li; S. Li; Z. Qian; X. Zhang; |
615 | Attributable Watermarking of Speech Generative Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper investigates a solution for model attribution, i.e., the classification of synthetic contents by their source models via watermarks embedded in the contents. |
Y. Cho; C. Kim; Y. Yang; Y. Ren; |
616 | Exploiting Language Model For Efficient Linguistic Steganalysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Through further experiments, we conclude that this ability can be transplanted to a text classifier by pre-training and fine-tuning to improve the detection performance. Motivated by this insight, we propose two methods for efficient linguistic steganalysis. |
B. Yi; H. Wu; G. Feng; X. Zhang; |
617 | Patch Steganalysis: A Sampling Based Defense Against Adversarial Steganography Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel sampling based defense method for steganalysis. |
C. Qin; N. Zhao; W. Zhang; N. Yu; |
618 | An Effective Steganalysis for Robust Steganography with Repetitive JPEG Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simple and effective method to detect the mentioned steganography by chasing both steganographic perturbations as well as continuous compression artifacts. |
J. Feng; Y. Wang; K. Chen; W. Zhang; N. Yu; |
619 | Image Steganalysis with Convolutional Vision Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Convolutional Vision Transformer for image steganalysis, which can capture both local and global dependencies among noise features. |
G. Luo; P. Wei; S. Zhu; X. Zhang; Z. Qian; S. Li; |
620 | A Bridge Between Features and Evidence for Binary Attribute-Driven Perfect Privacy Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This work presents an approach based on normalizing flow that maps a feature vector into a latent space where the evidence, related to the binary attribute, and an independent residual are disentangled. |
P. -G. Noé; A. Nautsch; D. Matrouf; P. -M. Bousquet; J. -F. Bonastre; |
621 | Preserving Trajectory Privacy in Driving Data Release Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a privacy preservation framework based on the Hilbert Schmidt Independence Criterion (HSIC) to sanitize driving data to protect the vehicle's trajectory from adversarial inference while ensuring the data is still useful for driver behavior detection. |
Y. Xu; C. X. Wang; Y. Song; W. P. Tay; |
622 | Direct Design of Biquad Filter Cascades with Deep Learning By Sampling Random Polynomials Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to initial conditions, requiring manual tuning. In this work, we address some of these limitations by learning a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters. |
J. T. Colonel; C. J. Steinmetz; M. Michelen; J. D. Reiss; |
623 | An End-to-End Deep Learning Speech Coding and Denoising Strategy for Cochlear Implants Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Cochlear implant (CI) users struggle to understand speech in noisy conditions. To address this problem, we propose a deep learning speech denoising sound coding strategy that estimates the CI electric stimulation patterns out of the raw audio data captured by the microphone, performing end-to-end CI processing. |
T. Gajecki; W. Nogueira; |
624 | Exploiting Hybrid Models of Tensor-Train Networks For Spoken Command Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work aims to design a low complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. |
J. Qi; J. Tejedor; |
625 | Learnable Wavelet Packet Transform for Data-Adapted Spectrograms Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This can be time-consuming and the performance is often dependent on the choice of parameters. To address these limitations, we propose a deep learning framework for learnable wavelet packet transforms, enabling to learn features automatically from data and optimise them with respect to the defined objective function. |
F. Gaëtan; F. Olga; |
626 | Music Enhancement Via Image Translation and Vocoding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ. This paper presents a deep learning approach to enhance low-quality music recordings by combining (i) an image-to-image translation model for manipulating audio in its mel-spectrogram representation and (ii) a music vocoding model for mapping synthetically generated mel-spectrograms to perceptually realistic waveforms. |
N. Kandpal; O. Nieto; Z. Jin; |
627 | Progressive Teacher-Student Training Framework for Music Tagging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a progressive two-stage teacher-student training framework to prevent the music tagging model from overfitting label noise. |
R. Lu; B. Zheng; J. Hai; F. Tao; Z. Duan; J. Liu; |
628 | Joint Dual-Domain Matrix Factorization for ECG Biometric Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel ECG biometrics framework termed Joint Dual-domain Matrix Factorization (JDMF). |
K. Wang; G. Yang; Y. Huang; L. Yang; Y. Yin; |
629 | Iterative Self Knowledge Distillation - from Pothole Classification to Fine-Grained and Covid Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Given the limited computational power and fixed number of training epochs, we propose iterative self knowledge distillation (ISKD) to train lightweight pothole classifiers. |
K. -C. Peng; |
630 | Attention-based Adversarial Partial Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, previous vanilla domain adaptation methods generally assume the same label space, and such an assumption is no longer valid for a more realistic scenario where it requires adaptation from a larger and more diverse source domain to a smaller target domain with less number of classes. To handle this problem, we propose an attention-based adversarial partial domain adaptation (AADA). |
M. Wang; et al. |
631 | Group-Wise Feature Selection for Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two techniques to solve the problem: the first applies K-Means Clustering to the instance-wise feature selection algorithm; the second uses the mixture of experts model with Gumbel-Softmax to learn group membership and feature selector simultaneously. |
Q. Xiao; H. Li; J. Tian; Z. Wang; |
632 | A Light Weight Model for Video Shot Occlusion Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a neural network module, named SAT module, which can effectively extract spatio-temporal information with fewer parameters. |
J. Liao; H. Duan; W. Zhao; Y. Yang; L. Chen; |
633 | Detecting Backdoor Attacks Against Point Cloud Classifiers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such PC BAs are not detectable by existing BA defenses due to their special BP embedding mechanism. In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set or to any clean classifiers for reference. |
Z. Xiang; D. J. Miller; S. Chen; X. Li; G. Kesidis; |
634 | Characterizing The Adversarial Vulnerability of Speech Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As the paradigm of the self-supervised learning upstream model followed by downstream tasks arouses more attention in the speech community, characterizing the adversarial robustness of such paradigm is of high priority. In this paper, we make the first attempt to investigate the adversarial vulnerability of such paradigm under the attacks from both zero-knowledge adversaries and limited-knowledge adversaries. |
H. Wu; B. Zheng; X. Li; X. Wu; H. -Y. Lee; H. Meng; |
635 | Universal Paralinguistic Speech Representations Using Self-Supervised Conformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of a 600M+ parameter Conformer-based architecture. |
J. Shor; A. Jansen; W. Han; D. Park; Y. Zhang; |
636 | A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however brings a performance degradation on the clean test set. To avoid this issue, in this work we propose an enhanced wav2vec2.0 model. |
Q. -S. Zhu; J. Zhang; Z. -Q. Zhang; M. -H. Wu; X. Fang; L. -R. Dai; |
637 | An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method for transferring pre-trained self-supervised (SSL) speech representations to multiple languages. |
S. Kessler; B. Thomas; S. Karout; |
638 | DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the Disentangled Representation Voice Conversion (DRVC) model to address the issue. |
Q. Wang; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
639 | Contrastive Prediction Strategies for Unsupervised Segmentation and Categorization of Phonemes and Words Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Our experiments suggest that context building networks, albeit necessary for high performance on categorization tasks, harm segmentation performance by causing a temporal shift on the learned representations. Aiming to tackle this trade-off, we take inspiration from the leading approaches on segmentation and propose multi-level Aligned CPC (mACPC). |
S. Cuervo; et al. |
640 | Uncertainty in Data-Driven Kalman Filtering for Partially Known State-Space Models Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we investigate the ability of KalmanNet, a recently proposed hybrid model-based deep state tracking algorithm, to estimate an uncertainty measure. |
I. Klein; G. Revach; N. Shlezinger; J. E. Mehr; R. J. G. van Sloun; Y. C. Eldar; |
641 | Deep Piecewise Hashing for Efficient Hamming Space Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel method named Deep Piecewise Hashing (DPH), for Efficient Hamming Space Retrieval. |
J. Gu; D. Wu; P. Fu; B. Li; W. Wang; |
642 | SODA: Self-Organizing Data Augmentation in Deep Neural Networks Application to Biomedical Image Segmentation Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: When using several types of data augmentation, the budget is usually uniformly distributed over the set of augmentations but one can wonder if this budget should not be allocated to each type in a more efficient way. This paper leverages online learning to allocate on the fly this budget as part of neural network training. |
A. Deleruyelle; J. Klein; C. Versari; |
643 | Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning. |
A. Richard; P. Dodds; V. K. Ithapu; |
644 | Joint Temporal Convolutional Networks and Adversarial Discriminative Domain Adaptation for EEG-Based Cross-Subject Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To guarantee the constancy of feature representations across domains and to eliminate differences between domains, we explored the feasibility of combining temporal convolutional networks (TCNs) and adversarial discriminative domain adaptation (ADDA) algorithms in solving the problem of domain shift in EEG-based cross-subject emotion recognition. |
Z. He; Y. Zhong; J. Pan; |
645 | Gradient Variance Loss for Structure-Enhanced Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: During our research, we observe that gradient maps of images generated by the models trained with the L1 or L2 loss have significantly lower variance than the gradient maps of the original high-resolution images. In this work, we propose to alleviate the above issue by introducing a structure-enhancing loss function, coined Gradient Variance (GV) loss, and generate textures with perceptual-pleasant details. |
L. Abrahamyan; A. M. Truong; W. Philips; N. Deligiannis; |
646 | Label-Occurrence-Balanced Mixup for Long-Tailed Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The suppression effect may further aggravate the problem of data imbalance and lead to a poor performance on tail classes. To address this problem, we propose Label-Occurrence-Balanced Mixup to augment data while keeping the label occurrence for each class statistically balanced. |
S. Zhang; C. Chen; X. Zhang; S. Peng; |
647 | TNTC: Two-Stream Network with Transformer-Based Complementarity for Gait-Based Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Meanwhile, the long range dependencies in both spatial and temporal domains of the gait sequence are scarcely considered. To address these issues, we propose a novel two-stream network with transformer-based complementarity, termed as TNTC. |
C. Hu; W. Sheng; B. Dong; X. Li; |
648 | A Free Lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To capture region attention without box annotations and compensate for ViT shortcomings in FGVR, we propose a novel method named Adaptive attention multi-scale Fusion Transformer (AFTrans). |
Y. Zhang; et al. |
649 | Self-Supervised Contrastive Learning for Cross-Domain Hyperspectral Image Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a self-supervised learning framework suitable for hyperspectral images that are inherently challenging to annotate. |
H. Lee; H. Kwon; |
650 | GOS: A Large-Scale Annotated Outdoor Scene Synthetic Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing datasets cannot satisfy the demands of large amounts of diverse data and rich semantic annotations at the same time, which makes the existing method difficult to edit on the content of outdoor scene images. To address these problems, we propose a large-scale, diverse synthetic dataset called GOS dataset generated based on a video game, which contains fine-grained semantic annotations. |
M. Xie; T. Liu; Y. Fu; |
651 | Out-Of-Distribution As A Target Class in Semi-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes an effective way of controlling the behavior of a neural network in the presence of out-of-distribution examples. |
A. Tadros; S. Drouyer; R. G. von Gioi; |
652 | Self-Supervised Acoustic Anomaly Detection Via Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an acoustic anomaly detection algorithm based on the framework of contrastive learning. |
H. Hojjati; N. Armanfard; |
653 | Don’t Speak Too Fast: The Impact of Data Bias on Self-Supervised Speech Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, how pretraining data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these pre-trained S3Ms on selected downstream tasks in SUPERB Benchmark. |
Y. Meng; Y. -H. Chou; A. T. Liu; H. -y. Lee; |
654 | Self-Supervised Learning Method Using Multiple Sampling Strategies for General-Purpose Audio Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a self-supervised learning method using multiple sampling strategies to obtain general-purpose audio representation. |
I. Kuroyanagi; T. Komatsu; |
655 | Self Supervised Representation Learning with Deep Clustering for Acoustic Unit Discovery from Raw Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel neural network paradigm that uses the deep clustering loss along with the autoregressive contrastive predictive coding (CPC) loss. |
V. Krishna; S. Ganapathy; |
656 | T-NGA: Temporal Network Grafting Algorithm for Learning to Process Spiking Audio Sensor Events Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a self-supervised method called Temporal Network Grafting Algorithm (T-NGA), which grafts a recurrent network pretrained on spectrogram features so that the network works with the cochlea event features. |
S. Wang; Y. Hu; S. -C. Liu; |
657 | Contrastive Knowledge Graph Attention Network for Request-Based Recipe Recommendation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we alleviate the foregoing issues by proposing contrastive knowledge graph attention network for recipe recommendation, where a knowledge graph attention-based recommender helps learn fine-grained user and recipe embeddings by modeling diversified user preferences from user behaviors. |
X. Ma; Z. Gao; Q. Hu; M. Abdelhady; |
658 | TargetDrop: A Targeted Regularization Method for Convolutional Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a targeted regularization method, TargetDrop, which incorporates the attention mechanism to drop several discriminative feature units. |
H. Zhu; X. Zhao; |
659 | Coarse-To-Fine Unsupervised Change Detection for Remote Sensing Images Via Object-Based MRF and Inception UNET Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to propose an unsupervised change detection method based on Object-based Markov Random Field (OMRF) and Inception UNet (IUNet). |
X. Hou; Y. Bai; H. Shi; Y. Li; |
660 | Combating False Sense of Security: Breaking The Defense of Adversarial Training Via Non-Gradient Adversarial Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, in this work, we propose a novel Non-Gradient Attack (NGA) to overcome the above-mentioned problem. |
M. Fan; Y. Liu; C. Chen; S. Yu; W. Guo; X. Liu; |
661 | Dynamically Pruning Segformer for Efficient Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we seek to design a lightweight SegFormer for efficient semantic segmentation. |
H. Bai; H. Mao; D. Nair; |
662 | Deformable VisTR: Spatio Temporal Deformable Attention for Video Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To improve the training efficiency, we propose Deformable VisTR, leveraging spatio-temporal deformable attention module that only attends to a small fixed set of key spatio-temporal sampling points around a reference point. |
S. Yarram; J. Wu; P. Ji; Y. Xu; J. Yuan; |
663 | Attentional Gated Res2net for Multivariate Time Series Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two types of attention modules, namely channel-wise attention and block-wise attention, to leverage the multi-granular temporal patterns. |
C. Yang; X. Wang; L. Yao; G. Long; J. Jiang; G. Xu; |
664 | Convex Clustering for Autocorrelated Time Series Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we develop a convex clustering algorithm suited to auto-correlated time series and compare it with a state of the art method. |
M. Revay; V. Solo; |
665 | Investigating The Potential of Auxiliary-Classifier Gans for Image Classification in Low Data Regimes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we examine the potential for Auxiliary-Classifier GANs (AC-GANs) as a "one-stop-shop" architecture for image classification, particularly in low data regimes. |
A. Dravid; F. Schiffers; Y. Wu; O. Cossairt; A. K. Katsaggelos; |
666 | Feature Augmentation Learning for Few-Shot Palmprint Image Recognition With Unconstrained Acquisition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel feature augmentation network (FAN) for few-shot unconstrained palmprint recognition. |
K. Jing; X. Zhang; Z. Yang; B. Wen; |
667 | Prime Knowledge with Local Pattern Consistency for Knowledge Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing works mainly focus on formulating beneficial knowledge for transferring, but ignore the contribution discrepancy of the knowledge to promote performance. To tackle this issue, we propose a simple Importance-based Knowledge Reweighting mechanism, which dynamically measures the importance of knowledge spatially and channel-wise for teacher-student pairs. |
Q. Tang; X. Xu; J. Wang; |
668 | Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an in-flight unsupervised defense against backdoor attacks on image classification that 1) detects use of a backdoor trigger at test-time; and 2) infers the class of origin (source class) for a detected trigger example. |
X. Li; Z. Xiang; D. J. Miller; G. Kesidis; |
669 | Multi-View Data Representation Via Deep Autoencoder-Like Nonnegative Matrix Factorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We focus on unsupervised multi-view data representation in this paper and propose a novel framework termed Deep Autoencoder-like NMF (DANMF-MDR), which learns an intact representation by simultaneously exploring multi-view complementary and consistent information. |
H. Huang; Y. Luo; G. Zhou; Q. Zhao; |
670 | On Identifiable Polytope Characterization for Polytopic Matrix Factorization Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this article, we investigate the problem of determining the identifiability of a polytope. |
B. Bozkurt; A. T. Erdogan; |
671 | Fast Learning of Fast Transforms, with Guarantees Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a hierarchical approach that recursively factorizes the considered matrix into two factors. |
Q. -T. Le; L. Zheng; E. Riccietti; R. Gribonval; |
672 | Regression Assisted Matrix Completion for Reconstructing A Propagation Field with Application to Source Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes to employ local polynomial regression to increase the accuracy of matrix completion. |
H. Sun; J. Chen; |
673 | Matrix Decomposition on Graphs: A Simplified Functional View Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a simplified functional view of matrix decomposition problems on graphs such as geometric matrix completion. |
A. Sharma; M. Ovsjanikov; |
674 | Learning to Sample for Sparse Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a model-based deep learning approach to jointly design the subsampling and reconstruction of FRI signals. |
S. Mulleti; H. Zhang; Y. C. Eldar; |
675 | Mixture Model Auto-Encoders: Deep Clustering Through Dictionary Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We introduce Mixture Model Auto-Encoders (MixMate), a novel architecture that clusters data by performing inference on a generative model. |
A. Lin; A. H. Song; D. Ba; |
676 | Exploring The Effect of L0/l2 Regularization in Neural Network Pruning Using The LC Toolkit Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we utilize the LC toolkit's common algorithmic base to take a deeper look into l0-constrained pruning problems defined as follows: given a budget of ? |
Y. Idelbayev; M. Á. Carreira-Perpiñán; |
677 | Dictionary Learning with Uniform Sparse Representations for Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper we use a particular DL formulation that seeks uniform sparse representations model to detect the underlying subspace of the majority of samples in a dataset, using a K-SVD-type algorithm. |
P. Irofti; C. Rusu; A. Patrascu; |
678 | Data-Driven Spatially Dependent PDE Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a data-driven partial differential equation (PDE) identification scheme based on l1-norm minimization which can identify spatially-dependent PDEs from measurements. |
R. Liu; M. J. Bianco; P. Gerstoft; B. D. Rao; |
679 | Sparsity Improves Unsupervised Attribute Discovery in StyleGAN Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we identify a new criterion, representation sparsity, that allows us to produce extremely efficient yet diverse semantic directions in GAN (generative adversarial network) latent spaces. |
S. Liu; R. Anirudh; J. J. Thiagarajan; P. -T. Bremer; |
680 | Image-to-Graph Transformers for Chemical Structure Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a deep learning model to extract molecular structures from images. |
S. Yoo; O. Kwon; H. Lee; |
681 | A Simple Hybrid Filter Pruning for Efficient Edge Inference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel hybrid filter pruning method that prunes both redundant and insignificant filters at the same time. |
S. H. Shabbeer Basha; S. N. Gowda; J. Dakala; |
682 | An Enhanced Deep Learning Approach for Tectonic Fault and Fracture Extraction in Very High Resolution Optical Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show that training the model with a realistic knowledge of fracture and fault uneven distributions and trends, and using a loss function that operates at both pixel and larger scales through the combined use of weighted Binary Cross Entropy and Intersection over Union, greatly improves the predictions, both qualitatively and quantitatively. As we apply the model to a site differing from those used for training, we demonstrate its enhanced generalization capacity. |
B. Kanoun; M. A. Cherif; I. Manighetti; Y. Tarabalka; J. Zerubia; |
683 | Joint Learning of Feature Extraction and Cost Aggregation for Semantic Correspondence Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel framework for jointly learning feature extraction and cost aggregation for semantic correspondence. |
J. Kim; Y. Min; M. Kim; S. Kim; |
684 | Generalized Zero-Shot Learning Using Conditional Wasserstein Autoencoder Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new conditional generative model that improves the GZSL performance greatly. |
J. Kim; B. Shim; |
685 | MBA-RainGAN: A Multi-Branch Attention Generative Adversarial Network for Mixture of Rain Removal Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we originally consider that the overall object visibility is determined by MOR, and enrich the RainCityscapes by considering real-world raindrops to construct the MOR dataset, named RainCityscapes++. |
Y. Shen; et al. |
686 | End-to-End Keyword Spotting Using Neural Architecture Search and Quantization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces neural architecture search (NAS) for the automatic discovery of end-to-end keyword spotting (KWS) models for limited resource environments. |
D. Peter; W. Roth; F. Pernkopf; |
687 | Synpose: A Large-Scale and Densely Annotated Synthetic Dataset for Human Pose Estimation in Classroom Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Deep learning-based methods for human pose estimation require large volumes of training data to achieve superior performance. However, data acquisition in classroom environments … |
Z. Yu; Y. Li; Y. Liu; T. Liu; Y. Fu; |
688 | STPoint-GCN: Spatial Temporal Graph Convolutional Network for Multiple People Recognition Using Millimeter-Wave Radar Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an end-to-end STPoint-GCN structure, which can extract and aggregate the features of sparse point clouds collected by millimeter-wave radar from the dimensions of space and time. |
C. Wang; P. Gong; L. Zhang; |
689 | Multiple Temporal Context Embedding Networks for Unsupervised Time Series Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, various patterns of anomalous data, especially those lasting for varying periods, are hard for plain networks to capture. To tackle this problem, we propose a multiple temporal context embedding method. |
H. Li; X. Peng; H. Zhuang; Z. Lin; |
690 | InterMix: An Interference-Based Data Augmentation and Regularization Technique for Automatic Deep Sound Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present InterMix, an interference-based regularization and data augmentation strategy for automatic sound classification. |
R. Sawhney; A. T. Neerkaje; |
691 | Cross-Layer Aggregation with Transformers for Multi-Label Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, transformers have utilized multi-head attention to extract feature with long range dependencies. Inspired by this, this paper proposes a Cross-layer Aggregation with Transformers (CAT) framework, which leverages transformers to capture the long range dependencies of CNN-based features with Long Range Dependencies module and aggregate the features layer by layer with Cross-Layer Fusion module. |
W. Zhang; F. Zhu; J. Han; T. Guo; S. Hu; |
692 | Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we are the first to question if self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. |
P. Bhattacharyya; C. Li; X. Zhao; I. Fehérvári; J. Sun; |
693 | TriBYOL: Triplet BYOL for Self-Supervised Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel self-supervised learning method for learning better representations with small batch sizes. |
G. Li; R. Togo; T. Ogawa; M. Haseyama; |
694 | SAGA: Self-Augmentation with Guided Attention for Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For example, considering an image comprising a boat on the sea, one augmented view is cropped solely from the boat and the other from the sea, whereas linking these two to form a positive pair could be misleading. To resolve this issue, we introduce a Self-Augmentation with Guided Attention (SAGA) strategy, which augments input data based on predictive attention to learn representations rather than simply applying off-the-shelf augmentation schemes. |
C. -H. Yeh; C. -Y. Hong; Y. -C. Hsu; T. -L. Liu; |
695 | An Anomaly Detection Method Based on Self-Supervised Learning with Soft Label Assignment for Defect Visual Inspection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The reason is that the conventional method to generate labels ignores the differences of images between before and after the transformation. To address this issue, we propose soft label assignment (SLA) to construct soft labels via measuring the similarity between the original and transformed images. |
C. Hu; Y. Wang; |
696 | Contrastive Predictive Coding for Anomaly Detection of Fetal Health from The Cardiotocogram Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A method for computerized interpretation of the CTG, based on Contrastive Predictive Coding (CPC) is presented here. |
I. R. de Vries; I. A. M. Huijben; R. D. Kok; R. J. G. van Sloun; R. Vullings; |
697 | Graph Fine-Grained Contrastive Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the design of graph CL schemes is coarse-grained and difficult to capture the universal and intrinsic properties across intermediate layers. To address this problem, we propose a novel fine-grained graph contrastive learning model (FGCL), which decomposes graph CL into global-to-local levels and disentangles the two graph views into hierarchical graphs by pooling operation to capture both global and local dependencies across views and across layers. |
H. Tang; X. Liang; Y. Guo; X. Zheng; B. Wu; |
698 | Position-Invariant Adversarial Attacks on Neural Modulation Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In the physical signal communication scenario, the adversarial signal transmitted by the adversary is affected by the channel, resulting in a random time delay with the original signal and causing decay on the attack performance. To address this issue, we propose the Position-Invariant adversarial attack Method (PIM) that generates the position-invariant adversarial signal by averaging the adversarial signals generated by shifted input signals to mitigate the channel effect on time delay. |
Z. Yu; Y. Xiong; K. He; S. Huang; Y. Zhao; J. Gu; |
699 | Using A Single Input to Forecast Human Action Keystates in Everyday Pick and Place Actions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a method tailored for everyday pick and place actions where the object of interest is known. |
H. Razali; Y. Demiris; |
700 | Adversarial Robustness By Design Through Analog Computing And Synthetic Gradients Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. |
A. Cappelli; R. Ohana; J. Launay; L. Meunier; I. Poli; F. Krzakala; |
701 | Differentiable Programming A La Moreau Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We define a compositional calculus adapted to Moreau envelopes and show how to apply it to deep networks, and, more broadly, to learning systems equipped with automatic differentiation and implemented in the spirit of differentiable programming. |
V. Roulet; Z. Harchaoui; |
702 | Data Agnostic Filter Gating For Efficient Deep Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a data-agnostic filter pruning method that uses an auxiliary network named Dagger module to induce pruning with the pre-trained weights as input. |
H. Xu; et al. |
703 | Nearest Subspace Search in The Signed Cumulative Distribution Transform Space For 1d Signal Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper presents a new method to classify 1D signals using the signed cumulative distribution transform (SCDT). |
A. H. Mohammad Rubaiyat; M. Shifat-E-Rabbi; Y. Zhuang; S. Li; G. K. Rohde; |
704 | Energy Alignment for Bias Rectification in Class Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, from the perspective of energy-based model, we demonstrate that the free energies of categories are aligned with the label distribution theoretically, thus the energies of different classes are expected to be close to each other when aiming for balanced performance. |
B. Zhao; C. Chen; X. Xiao; Q. Ju; S. Xia; |
705 | A Two-Stage Contrastive Learning Framework For Imbalanced Aerial Scene Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore a novel two-stage contrastive learning framework, which aims to take care of representation learning and classifier learning, thereby boosting aerial scene recognition. |
L. Huang; et al. |
706 | A Maximal Correlation Approach to Imposing Fairness in Machine Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The maximal correlation framework is introduced for expressing fairness constraints and shown to be capable of deriving regularizers that enforce independence and separation-based fairness criteria, which admit optimization algorithms that are more computationally efficient than existing algorithms. |
J. Lee; et al. |
707 | Boundary-Aware Bias Loss for Transformer-Based Aerial Image Segmentation Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the tremendous success of the transformer-based model in natural language processing (NLP), many efforts introduce the transformer-based model into the image processing tasks. |
Y. Zhang; X. Jiang; S. Liu; B. Hu; X. Gao; |
708 | Investigating Robustness of Biological Vs. Backprop Based Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate the robustness of biologically inspired Hebbian learning algorithm in depth. |
Y. Zhou; M. Wang; M. Gupta; A. Ambikapathi; P. N. Suganthan; S. Ramasamy; |
709 | Semi-Supervised Gaussian Mixture Variational Autoencoder for Pulse Shape Discrimination Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We address the problem of pulse shape discrimination (PSD) for radiation sources characterization by leveraging a Gaussian mixture variational autoencoder (GMVAE). |
A. Abdulaziz; J. Zhou; A. Di Fulvio; Y. Altmann; S. McLaughlin; |
710 | How Neural Processes Improve Graph Link Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose a meta-learning approach with graph neural networks for link prediction: Neural Processes for Graph Neural Networks (NPGNN), which can not only perform both transductive and inductive learning tasks, but also generalize well when only training on a small subgraph. |
H. Liang; J. Gao; |
711 | Uncertainty Estimation with A VAE-Classifier Hybrid Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a hybrid model that combines a generative unit and a discriminative classifier to quantify uncertainty in a classification task. |
S. Lin; R. Clark; N. Trigoni; S. Roberts; |
712 | Context-Aware Graph-Based Self-Supervised Learning of Whole Slide Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, WSIs for prostate cancer are validated and the model performance is evaluated based on diagnosis and grading of prostate cancer and compared with ResNet50 as a traditional convolutional neural network (CNN) and multi-instance learning (MIL) as a leading approach in WSI diagnosis. |
M. Aryal; N. Y. Soltani; |
713 | Contrastive Sensor Transformer for Predictive Maintenance of Industrial Assets Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose Contrastive Sensor Transformer (CST), a novel approach for learning useful representations for robust fault identification without using task-specific labels. |
Z. Bukhsh; |
714 | Improving Anomaly Detection with A Self-Supervised Task Based on Generative Adversarial Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We assume that the main reason is that these methods ignore the diversity of patterns in normal samples. To alleviate the above issue, this paper proposes a novel anomaly detection framework based on generative adversarial network, called ADe-GAN. |
H. Chai; W. Su; S. Tang; Y. Ding; B. Fang; Q. Liao; |
715 | STGAT-MAD: Spatial-Temporal Graph Attention Network For Multivariate Time Series Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel unsupervised multi-scale stacked spatial-temporal graph attention network for multivariate time series anomaly detection (STGAT-MAD). |
J. Zhan; et al. |
716 | Dual Graph Cross-Domain Few-Shot Learning for Hyperspectral Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, domain alignment is carried out based on local spatial information in most methods, rarely taking into account the non-local spatial information (non-local relationships) with strong correspondence. A Dual Graph Cross-domain Few-shot Learning (DG-CFSL) framework is proposed, trying to make up for the above shortcomings by combining Few-shot Learning (FSL) with domain alignment. |
Y. Zhang; W. Li; M. Zhang; R. Tao; |
717 | Personalized PageRank Graph Attention Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we incorporate the limit distribution of Personalized PageRank (PPR) into graph attention networks (GATs) to reflect the larger neighbor information without introducing over-smoothing. |
J. Choi; |
718 | Multi-Relation Message Passing for Multi-Label Text Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a novel method, entitled Multi-relation Message Passing (MrMP), for the multi-label classification problem. |
M. Ozmen; H. Zhang; P. Wang; M. Coates; |
719 | Adaptive Attention Graph Capsule Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the existing GCNs using simple average or sum aggregation may neglect the characteristics of each node and the topology between nodes, resulting in a large amount of early-stage information lost during the graph convolution step. To tackle the above challenge, we innovatively propose an adaptive attention graph capsule network, named AA-GCN, for graph classification. |
X. Zheng; X. Liang; B. Wu; Y. Guo; H. Tang; |
720 | Graph Convolutional Networks With Autoencoder-Based Compression And Multi-Layer Graph Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work aims to propose a novel architecture and training strategy for graph convolutional networks (GCN). |
L. Giusti; C. Battiloro; P. Di Lorenzo; S. Barbarossa; |
721 | Deep Augmented MUSIC Algorithm for Data-Driven DoA Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a new hybrid MB/DD DoA estimation architecture, based on the classical multiple signal classification (MUSIC) algorithm. |
J. P. Merkofer; G. Revach; N. Shlezinger; R. J. G. van Sloun; |
722 | Convmixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-Field Keyword Spotting Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In a real-world application, the user environment is typically noisy and may contain reverberations. We propose a novel feature interactive convolutional model with merely 100K parameters to tackle this under the noisy far-field condition. |
D. Ng; Y. Chen; B. Tian; Q. Fu; E. S. Chng; |
723 | Semi-Supervised Source Localization With Residual Physical Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a hybrid approach to ML-based source localization, which uses both SSL and conventional, analytic signal processing approaches to obtain source location estimates. |
M. J. Bianco; P. Gerstoft; |
724 | Automated Prosody Classification for Oral Reading Fluency with Quadratic Kappa Loss and Attentive X-Vectors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Automated prosody classification in the context of oral reading fluency is a critical area for the objective evaluation of students' reading proficiency. In this work, we present the largest dataset to date in this domain. |
G. Sammit; et al. |
725 | Seed: Sound Event Early Detection Via Evidential Uncertainty Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To solve the problem, we propose a novel Polyphonic Evidential Neural Network (PENet) to model the evidential uncertainty of the class probability with Beta distribution. |
X. Zhao; et al. |
726 | Rank-Based Loss For Learning Hierarchical Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Previously, triplet loss has been proposed to address this problem, however it presents some issues like requiring the careful construction of the triplets, and being limited in the extent of hierarchical information it uses at each iteration. In this work we propose a rank based loss function that uses hierarchical information and translates this into a rank ordering of target distances between the examples. |
I. Nolasco; D. Stowell; |
727 | On The Relaxation of Orthogonal Tensor Rank and Its Nonconvex Riemannian Optimization for Tensor Completion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Natural extension of matrix rank has attracted interest toward a parsimonious representation and completion of a tensor with partial observation. In this paper, we focus on orthogonal tensor rank and discuss its nonconvex relaxation and minimization. |
K. Ozawa; |
728 | Robust High-Order Tensor Recovery Via Nonconvex Low-Rank Approximation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The latest tensor recovery methods based on tensor Singular Value Decomposition (t-SVD) mainly utilize the tensor nuclear norm (TNN) as a convex surrogate of the rank function. |
W. Qin; H. Wang; W. Ma; J. Wang; |
729 | Variational Bayesian Tensor Networks with Structured Posteriors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Tensor network (TN) methods have proven their considerable potential in deterministic regression and classification related paradigms, but remain underexplored in probabilistic settings. To this end, we introduce a variational inference framework for supervised learning in the context of TNs, referred to as the Bayesian Tensor Network (BTN). |
K. Konstantinidis; Y. L. Xu; Q. Zhao; D. P. Mandic; |
730 | Low-Rank Phase Retrieval with Structured Tensor Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an algorithm called Tucker-Structured Phase Retrieval (TSPR) that models the sequence of images as a tensor rather than a matrix that we factorize using the Tucker decomposition. |
S. M. Kwon; X. Li; A. D. Sarwate; |
731 | HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new algorithm called higher-order QR iteration (HO-QRI) for computing the Tucker decomposition of large and sparse tensors. |
Y. Sun; K. Huang; |
732 | A Multi-Resolution Low-Rank Tensor Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the latter, in this work we propose a multi-resolution low-rank tensor decomposition to describe (approximate) a tensor in a hierarchical fashion. |
S. Rozada; A. G. Marques; |
733 | Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. |
L. Zhou; D. Guliani; A. Kabel; G. Motta; F. Beaufays; |
734 | Probabilistic Fine-Grained Urban Flow Inference with Normalizing Flows Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose UFI-Flow (Urban Flow Inference via normalizing Flow), a novel model for addressing the FUFI problem in a principled manner by using a single probabilistic loss. |
T. Zhong; H. Yu; R. Li; X. Xu; X. Luo; F. Zhou; |
735 | Attention-Based Dual-Stream Vision Transformer for Radar Gait Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While the former shows the time-frequency patterns, the latter encodes the repetitive frequency patterns. In this work, a dual-stream network with attention-based fusion is proposed to fully aggregate the discriminant information from these two representations. |
S. Chen; W. He; J. Ren; X. Jiang; |
736 | Deep-MLE: Fusion Between A Neural Network and MLE for A Single Snapshot DOA Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel framework called DeepMLE, which gives a solution to the single-snapshot Direction Of Arrival (DOA) estimation problem, up to 4 distinct targets, using a radar equipped with a Minimum Redundancy antenna Array (MRA). |
M. L. L. de Oliveira; M. J. G. Bekooij; |
737 | Selective Mutual Learning: An Efficient Approach for Single Channel Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel mutual learning approach, namely selective mutual learning. |
H. M. Tan; D. -Q. Vu; C. -T. Lee; Y. -H. Li; J. -C. Wang; |
738 | Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we explore the possibility of leveraging three audio modalities: cough, breathing, and speech to determine COVID-19 status. |
J. Harvill; et al. |
739 | Self-Critical Sequence Training for Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an optimization method called self-critical sequence training (SCST) to make the training procedure much closer to the testing phase. |
C. Chen; Y. Hu; N. Hou; X. Qi; H. Zou; E. S. Chng; |
740 | FastAudio: A Learnable Audio Front-End For Spoof Speech Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Spoof speech can be used to try and fool speaker verification systems that determine the identity of the speaker based on voice characteristics. This paper compares popular learnable front-ends on this task. |
Q. Fu; Z. Teng; J. White; M. E. Powell; D. C. Schmidt; |
741 | Complex IRM-Aware Training for Voice Activity Detection Using Attention Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a novel attention model-based deep neural network (DNN) architecture for VAD which takes advantage of complex Ideal Ratio Mask (cIRM). |
Y. Zhao; Y. Attabi; B. Champagne; W. -P. Zhu; |
742 | Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To obtain a continuous representation of audio and enable super resolution for arbitrary scale factor, we propose a method of implicit neural representation, coined Local Implicit representation for Super resolution of Arbitrary scale (LISA). |
J. Kim; Y. Lee; S. Hong; J. Ok; |
743 | An Investigation of The Effectiveness of Phase for Audio Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this study, we extensively investigated the effectiveness of including phase information of signals for eight audio classification tasks. |
S. Hidaka; K. Wakamiya; T. Kaburagi; |
744 | Study of Positional Encoding Approaches for Audio Spectrogram Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we study one component of the AST, the positional encoding, and propose several variants to improve the performance of ASTs trained from scratch, without ImageNet pretraining. |
L. Pepino; P. Riera; L. Ferrer; |
745 | Few-Shot Object Detection with Local Correspondence RPN and Attentive Head Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel few-shot object detection method named GCN-FSOD. |
J. Han; Y. Li; S. Wang; |
746 | Natural-Looking Adversarial Examples from Freehand Sketches Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we present a novel freehand sketch-based natural-looking adversarial example generator that we call SketchAdv. |
H. G. Kim; D. Nanni; S. Süsstrunk; |
747 | Video Anomaly Detection Via Prediction Network with Enhanced Spatio-Temporal Memory Exchange Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Many approaches investigate the reconstruction difference between normal and abnormal patterns, but neglect that anomalies do not necessarily correspond to large reconstruction errors. To address this issue, we design a Convolutional LSTM Auto-Encoder prediction framework with enhanced spatiotemporal memory exchange using bi-directionality and a higher-order mechanism. |
G. Shen; Y. Ouyang; V. Sanchez; |
748 | Signal Compression Via Neural Implicit Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose using neural implicit representations as a novel paradigm for signal compression with neural networks, where the compact representation of the signal is defined by the very weights of the network. |
F. Pistilli; D. Valsesia; G. Fracastoro; E. Magli; |
749 | Hybrid Weighting Loss for Precipitation Nowcasting from Radar Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the uneven distribution of precipitation nowcasting data, we propose a novel data reweighting strategy, termed Hybrid Weighting, which combines reweighting and non-weighting strategies, boosting the precipitation nowcasting performance. |
Y. Cao; L. Chen; D. Zhang; L. Ma; H. Shan; |
750 | Adversarial Learning Enhancement for 3D Human Pose and Shape Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hence we aim to improve the performance of adversarial learning in 3D human pose and shape estimation. |
Y. Sun; J. Zhang; W. Wang; |
751 | Domain Generalized Few-Shot Image Classification Via Meta Regularization Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the challenging domain generalized few-shot image classification problem. |
M. Zhang; S. Huang; D. Wang; |
752 | Generation for Unsupervised Domain Adaptation: A GAN-Based Approach for Object Classification with 3D Point Cloud Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead of aligning features between source data and target data, we propose a method that uses a Generative Adversarial Network (GAN) to generate synthetic data from the source domain so that the output is close to the target domain. |
J. Huang; J. Yuan; C. Qiao; |
753 | Exploring Transferability Measures and Domain Selection in Cross-Domain Slot Filling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Originally, this paper explores several ways to measure transferability across slot filling domains and finds that the shared slot number could serve as an efficient and effective estimator. |
X. -C. Li; Y. -J. Wang; L. Gan; D. -C. Zhan; |
754 | Maximum Batch Frobenius Norm for Multi-Domain Text Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address this issue, we first investigate the structure of the batch classification output matrix and theoretically justify that the discriminability of the learned features has a positive correlation with the Frobenius norm of the batch output matrix. Based on this finding, we propose a maximum batch Frobenius norm (MBF) method to boost the feature discriminability for MDTC. |
Y. Wu; D. Inkpen; A. El-Roby; |
755 | Joint Global-Local Alignment for Domain Adaptive Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, global alignment might be hindered by the noisy outputs corresponding to background pixels in the source domain. To address this limitation, we propose a local output alignment. |
S. Yarram; M. Yang; J. Yuan; C. Qiao; |
756 | Category-Adaptive Domain Adaptation for Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on UDA for semantic segmentation tasks. |
Z. Wang; Y. Luo; D. Huang; N. Ge; J. Lu; |
757 | Simpler Is Better: Spectral Regularization and Up-Sampling Techniques for Variational Autoencoders Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a 2D Fourier transform-based spectral regularization loss and evaluate it on the variational autoencoder. |
S. Björk; J. N. Myhre; T. Haugland Johansen; |
758 | Augmenting Molecular Deep Generative Models with Topological Data Analysis Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we propose augmentation of deep generative models with topological data analysis (TDA) representations, known as persistence images, for robust encoding of 3D molecular geometry. |
Y. Schiff; V. Chenthamarakshan; S. C. Hoffman; K. Natesan Ramamurthy; P. Das; |
759 | StyleGAN-Induced Data-Driven Regularization for Inverse Problems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our proposed approach, which we refer to as learned Bayesian reconstruction with generative models (L-BRGM), entails joint optimization over the style-code and the input latent code, and enhances the expressive power of a pre-trained StyleGAN2 generator by allowing the style-codes to be different for different generator layers. |
A. Conmy; S. Mukherjee; C. -B. Schönlieb; |
760 | A Closer Look at Autoencoders for Unsupervised Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Consequently, our exposition in this paper is to investigate the extent to which different latent space attributes of AEs impact their performances for anomaly detection tasks. |
O. K. Oyedotun; D. Aouada; |
761 | NFT-K: Non-Fungible Tangent Kernels Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We demonstrate the interpretability of this model on two datasets, showing that the multiple kernels model elucidates the interplay between the layers and predictions. |
S. Alemohammad; et al. |
762 | FDSNeT: An Accurate Real-Time Surface Defect Segmentation Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the current segmentation networks are not effective in dealing with defect boundary details, local similarity of different defects and low contrast between defect and background. To this end, we propose a real-time surface defect segmentation network (FDSNet) based on two-branch architecture, in which two corresponding auxiliary tasks are introduced to encode more boundary details and semantic context. |
J. Zhang; R. Ding; M. Ban; T. Guo; |
763 | Path Signatures for Non-Intrusive Load Monitoring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper demonstrates a systematic method of feature generation called the path signature which has recently been applied in machine learning, often with notable success. |
P. Moore; T. -M. Iliant; F. -A. Ion; Y. Wu; T. Lyons; |
764 | Data-Driven Approach for The Floquet Propagator Inverse Problem Solution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The paper shows the data-driven approach to inverse displacement data to the theoretical propagation constant approximation and consequently the material parameters using machine learning methods, namely a data-driven symbolic regression. |
A. Hvatov; |
765 | Chunkfusion: A Learning-Based RGB-D 3D Reconstruction Framework Via Chunk-Wise Integration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we devote our efforts to try to fill in this research gap by proposing a scalable and robust RGB-D 3D reconstruction framework, namely Chunk-Fusion. |
C. Guo; L. Zhang; Y. Shen; Y. Zhou; |
766 | Closing The Sim-to-Real Gap in Guided Wave Damage Detection with Adversarial Training of Variational Auto-Encoders Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While deep learning models can be an alternative detection method, there is often a lack of real-world training datasets. In this work, we counter this challenge by training an ensemble of variational autoencoders only on simulation data with a wave physics-guided adversarial component. |
I. D. Khurjekar; J. B. Harley; |
767 | Deep Learning on The Sphere for Multi-model Ensembling of Significant Wave Height Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: At the same time, region-specific dynamics that deviate from the general behavior across the globe also need to be accounted for. Addressing these two necessities, we propose the first Deep Learning approach for multi-model ensembling that operates directly on the sphere. |
A. Littardi; A. Hildeman; M. A. Nicolaou; |
768 | Local and Global Alignments for Generalizable Sensor-Based Human Activity Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Local And Global alignment (LAG) for generalized sensor-based HAR. |
W. Lu; J. Wang; Y. Chen; |
769 | Study on Time-of-Flight Estimation in Ultrasonic Well Logging Tool: Model-Driven Transfer Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposed a method that boosted accuracy of ToFs estimation in a complex geological environment. |
W. Zhang; Z. Li; Y. Guo; A. Qiu; Y. Li; Y. Shi; |
770 | Simulation-and-Mining: Towards Accurate Source-Free Unsupervised Domain Adaptive Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, we propose a Simulation-and-Mining (S&M) framework, which simulates false negatives by augmenting true positives and mines back false negatives alternatively and iteratively. |
P. Yuan; et al. |
771 | Target-Aware Auto-Augmentation for Unsupervised Domain Adaptive Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, for the first time, we propose an auto-augmentation problem under unsupervised domain adaptation scenarios. To solve this problem, we propose a simple yet effective target-aware auto-augmentation technique to search for an optimal data augmentation strategy on labeled source data, so as to boost the detection ability on the given unlabeled target data. |
Z. Li; L. Zhao; W. Chen; S. Yang; D. Xie; S. Pu; |
772 | Self-Ensemble Variance Regularization for Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the Self-Ensemble Variance Regularization for Domain Adaptation (VRDA) method to rectify the learning with pseudo labels. |
X. Liu; T. Dai; S. -T. Xia; Y. Jiang; |
773 | Transductive CLIP with Class-Conditional Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch. |
J. Huang; W. Chen; S. Yang; D. Xie; S. Pu; Y. Zhuang; |
774 | Controlling The Fréchet Variance Improves Batch Normalization on The Symmetric Positive Definite Manifold Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we extend upon previous works and propose a batch normalization algorithm for the SPD manifold that can be readily combined with SPD neural networks and unlike previous works controls both the Fréchet mean and variance on the SPD manifold. |
R. J. Kobler; J. -i. Hirayama; M. Kawanabe; |
775 | Subspace Clustering Using Unsupervised Data Augmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the remarkable influence of data augmentation on the performance of neural networks, we propose a scalable approach that employs data augmentation within subspace clustering. |
M. Abdolali; N. Gillis; |
776 | Private Learning Via Knowledge Transfer with High-Dimensional Targets Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While differential privacy (DP) offers mathematically rigorous protection, the high output dimensionality of segmentation tasks prevents the direct application of state-of-the-art algorithms such as Private Aggregation of Teacher Ensembles (PATE). In order to alleviate this problem, we propose to learn dimensionality-reducing transformations to map the prediction target into a bounded lower-dimensional space to reduce the required noise level during the aggregation stage. |
D. Fay; J. Sjölund; T. J. Oechtering; |
777 | Deep Deterministic Independent Component Analysis for Hyperspectral Unmixing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We develop a new neural network based independent component analysis (ICA) method by directly minimizing the dependence amongst all extracted components. |
H. Li; S. Yu; J. C. Príncipe; |
778 | Label-Aware Ranked Loss for Robust People Counting Using Automotive In-Cabin Radar Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we introduce the Label-Aware Ranked loss, a novel metric loss function. |
L. Servadei; et al. |
779 | DeepHull: Fast Convex Hull Approximation in High Dimensions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose DeepHull, a new convex hull approximation algorithm based on convex deep networks (DNs) with continuous piecewise-affine nonlinearities and nonnegative weights. |
R. Balestriero; Z. Wang; R. G. Baraniuk; |
780 | Neighbor-Augmented Transformer-Based Embedding for Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the graph-based methods suffer from over-smoothing, while sequential models are largely influenced by data sparseness. To alleviate these issues, we propose NATM, a novel embedding-based method in large-scale learning incorporating both graph-based and sequential information. |
J. Zhang; F. Lin; W. Jiang; C. Yang; G. Liu; |
781 | Sentiment-Aware Distillation for Bitcoin Trend Forecasting Under Partial Observability Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The main contribution of this paper is a method that exploits sentiment information as a source of additional supervision during the training process, allowing for improving the profitability of the developed strategies compared to baseline agents, while also allowing for operating the agent under partial observability, i.e., without requiring sentiment information as input during inference. |
G. Panagiotatos; N. Passalis; A. Tsantekidis; A. Tefas; |
782 | Robust Nonparametric Distribution Forecast with Backtest-Based Bootstrap and Adaptive Residual Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a practical and robust distribution forecast framework that relies on backtest-based bootstrap and adaptive residual selection. |
L. Wang; et al. |
783 | Variational Bayesian Graph Convolutional Network for Robust Collaborative Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the data used in real-world applications (e.g., video streaming services) are often incomplete and unreliable. To deal with this realistic situation, we newly introduce the probabilistic model based on variational Bayesian inference to GCN-based recommendation. |
N. Onodera; K. Maeda; T. Ogawa; M. Haseyama; |
784 | FINT: Field-Aware Interaction Neural Network for Click-Through Rate Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we proposed a novel prediction method, named FINT, that employs the Field-aware INTeraction layer which explicitly captures high-order feature interactions while retaining the low-order field information. |
Z. Zhao; S. Yang; G. Liu; D. Feng; K. Xu; |
785 | Making The Unknown More Certain: A Stacked Ensemble Classifier for Open Gesture Recognition with A Social Robot Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a novel stacked ensemble classifier for the unconstrained recognition of known and unknown gestural input data in nonverbal communication with a social robot. |
H. Brock; R. Gomez; |
786 | Applying Differential Privacy to Tensor Completion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, utilization of the observed tensors often raises serious privacy concerns in many practical scenarios. To address this issue, we propose a solid and unified framework that contains several approaches for applying differential privacy to the two most widely used tensor decomposition methods: i) CANDECOMP/PARAFAC and ii) Tucker decompositions. |
Z. Wei; Z. Li; X. Mao; J. Wang; |
787 | Low-Complexity Attention Modelling Via Graph Tensor Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, current attention models rely on flat-view matrix methods to process tokens embedded in vector spaces; this results in exceedingly high parameter complexity which is prohibitive for practical applications. To this end, we introduce a novel Tensorized Graph Attention (TGA) mechanism, which leverages on the recent Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention based graph filters. |
Y. L. Xu; K. Konstantinidis; S. Li; L. Stankovic; D. P. Mandic; |
788 | An Accelerated Rank-(L,L,1,1) Block Term Decomposition Of Multi-Subject fMRI Data Under Spatial Orthonormality Constraint Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, since the number of whole-brain voxels is very large and rank L is larger than 1, the rank-(L,L,1,1) BTD requires high computation and memory. Therefore, we propose an accelerated rank-(L,L,1,1) BTD algorithm based upon the method of alternating least squares (ALS). |
L. -D. Kuang; et al. |
789 | Improving Dynamic Graph Convolutional Network with Fine-Grained Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these methods must use the node information during the entire timeline and ignore two subtle factors: the influence of nodes changes with time and is related to the frequency of events. Therefore, we propose a stable and scalable dynamic GCN method using a fine-grained attention mechanism named FADGC. |
B. Wu; X. Liang; X. Zheng; Y. Guo; H. Tang; |
790 | AdaPID: An Adaptive PID Optimizer for Training Deep Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Nonetheless, such algorithms often suffer from slow convergence, sizable fluctuations, and abundant local solutions, to name a few. In this context, the present paper draws ideas from adaptive control of dynamical systems, and develops an adaptive proportional-integral-derivative (AdaPID) solver for fast, stable, and effective training of DNNs. |
B. Weng; J. Sun; A. Sadeghi; G. Wang; |
791 | Memory in Echo State Networks and The Controllability Matrix Rank Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We examine the rank behavior of minimal task-effective ESNs predicting the chaotic Lorenz 1963 system for single and multi-variable input/output. |
B. Whiteaker; P. Gerstoft; |
792 | OT Cleaner: Label Correction As Optimal Transport Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing label correction methods cannot handle heavy noise or datasets with samples of many categories well. We explain the reasons and introduce a global label distribution regularization to remedy these deficiencies. |
J. Xia; C. Tan; L. Wu; Y. Xu; S. Z. Li; |
793 | Demon: Improved Neural Network Training With Momentum Decay Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Here, we propose a decaying momentum (DEMON) hyperparameter rule. |
J. Chen; C. Wolfe; Z. Li; A. Kyrillidis; |
794 | Depth Pruning with Auxiliary Networks for TinyML Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Depth pruning is a form of pruning that requires no specialized hardware but suffers from a large accuracy falloff. To improve this, we propose a modification that utilizes a highly efficient auxiliary network as an effective interpreter of intermediate feature maps. |
J. D. De Leon; R. Atienza; |
795 | Glassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. |
Y. Zheng; C. Hu; G. Lin; M. Yue; B. Wang; J. Xin; |
796 | SAR-ShipNet: SAR-Ship Detection Neural Network Via Bidirectional Coordinate Attention and Multi-Resolution Feature Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We broadly extract different types of SAR image features and raise the intriguing question of whether these extracted features are beneficial to (1) suppress data variations (e.g., complex land-sea backgrounds, scattered noise) of real-world SAR images, and (2) enhance the features of ships that are small objects and have different aspect (length-width) ratios, therefore resulting in the improvement of ship detection. To answer this question, we propose a SAR-ship detection neural network (called SAR-ShipNet for short), by newly developing Bidirectional Coordinate Attention (BCA) and Multi-resolution Feature Fusion (MRF) based on CenterNet. |
Y. Deng; D. Guan; Y. Chen; W. Yuan; J. Ji; M. Wei; |
797 | Spatio-Temporal PRRS Epidemic Forecasting Via Factorized Deep Generative Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an epidemic analysis framework for the outbreak prediction in the livestock industry, focusing on the study of the most costly and viral infectious disease in the swine industry: the PRRS virus. |
M. Shamsabardeh; B. Azari; B. Martínez-López; |
798 | Fusion-ID: A Photoplethysmography and Motion Sensor Fusion Biometric Authenticator With Few-Shot On-Boarding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose Fusion-ID, which uses wrist-worn PPG sensors fused with motion sensor data as a way to do bio authentication on wrist-worn devices. |
H. Kumar; H. S. Mousavi; B. Shahsavari; |
799 | DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a model, termed DynImp, to handle missingness at different time points with nearest neighbors along the feature axis and then feed the data into an LSTM-based denoising autoencoder which can reconstruct missingness along the time axis. |
Z. Huo; et al. |
800 | Incremental Context Aware Attentive Knowledge Tracing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We first empirically demonstrate an evolving Knowledge Tracing (eKT) scenario with distinct distribution of learner performances and diversity of questions from similar concepts. |
C. S. Yin Wong; G. Yang; N. F. Chen; R. Savitha; |
801 | Robust and Efficient Uncertainty Aware Biosignal Classification Via Early Exit Ensembles Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing ensemble approaches, however, introduce a high computational and memory cost limiting their applicability to real-time biosignal applications (e.g. ECG, EEG). To address these issues, we propose early exit ensembles (EEEs) for estimating predictive uncertainty via an implicit ensemble of early exits. |
A. Campbell; L. Qendro; P. Liò; C. Mascolo; |
802 | Temporal Cross-Graph Network for Brain Functional Activity Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Temporal Cross-Graph Network (TCGN) for predicting brain functional activity, which can comprehensively exploit multi-modal spatial dependence and temporal patterns. |
X. Yuan; W. Wang; Y. Kong; J. Wu; G. Yang; H. Shu; |
803 | POPO: Pessimistic Offline Policy Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we show that there exists an estimation gap of value-based deep RL algorithms in the offline setting. |
Q. He; X. Hou; Y. Liu; |
804 | Byzantine-Robust Federated Deep Deterministic Policy Gradient Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, a number of malicious agents may deliberately modify the messages transmitted to the central server so as to hinder the learning process, which is often described by the Byzantine attacks model. To address this issue, we propose to employ robust aggregation to replace the simple average aggregation rule in FRL and enhance Byzantine robustness. |
Q. Lin; Q. Ling; |
805 | Improving Actor-Critic Reinforcement Learning Via Hamiltonian Monte Carlo Method Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, inspired by the previous use of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate the policy network of actor-critic RL with HMC, which is termed as Hamiltonian Policy. |
D. Xu; F. Fekri; |
806 | Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the exploration-exploitation dilemma of reinforcement learning algorithms. |
M. Chen; X. Xiao; W. Zhang; X. Gao; |
807 | Hypergraph-Based Reinforcement Learning for Stock Portfolio Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a hypergraph-based reinforcement learning method for stock portfolio selection, in which the fundamental issue is to learn a policy function generating appropriate trading actions given the current environments. |
X. Li; C. Cui; D. Cao; J. Du; C. Zhang; |
808 | Memory-Based Message Passing: Decoupling The Message for Propagation from Discrimination Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A revised message passing method needs to maintain each node's discriminative ability when aggregating the message from neighbors. To this end, we propose a Memory-based Message Passing (MMP) method to decouple the message of each node into a self-embedding part for discrimination and a memory part for propagation. |
J. Chen; W. Liu; J. Pu; |
809 | PEAR: Photographic Embedding for Aesthetic Rating Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the Photographic Embedding for Aesthetic Rating (PEAR) framework to assimilate their advantages. |
H. Wu; J. Yao; |
810 | Gradient-Weighted Class Activation Mapping for Spatio Temporal Graph Convolutional Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we design an extension of Grad-CAMs for spatio temporal graph convolution (STG-Grad-CAM) to improve the interpretability of STGCNs. |
P. Das; A. Ortega; |
811 | Deep Learning Based Off-Angle Iris Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, CNNs trained with the triplet loss function are applied to extract features for iris recognition. |
E. Jalilian; G. Wimmer; A. Uhl; M. Karakaya; |
812 | Towards Robust Visual Transformer Networks Via K-Sparse Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we introduce K-sparse attention to preserve low inductive bias, while robustifying transformers against adversarial attacks. |
S. Amini; S. Ghaemmaghami; |
813 | A Global to Local Guiding Network for Missing Data Imputation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although many recent methods have made remarkable advances, the local homogeneous regions, especially at boundaries, and the reasonableness of the imputed data are still the two most challenging issues. To address these issues, we propose a novel Global to Local Guiding Network (G2LGN) based on generative adversarial network for missing data imputation, which is composed of a Global-Impute-Net (GIN), a Local-Impute-Net (LIN) and an Impute Guider Model (IGM). |
W. Wang; Y. Chai; Y. Li; |
814 | LocUNet: Fast Urban Positioning Using Radio Maps and Deep Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present LocUNet: A deep learning method for localization, based merely on Received Signal Strength (RSS) from Base Stations (BSs), which does not require any increase in computation complexity at the user devices with respect to the device standard operations, unlike methods that rely on Time of Arrival (ToA) or Angle of Arrival information. |
Ç. Yapar; R. Levie; G. Kutyniok; G. Caire; |
815 | LiteHAR: Lightweight Human Activity Recognition from WIFI Signals with Random Convolution Kernels Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a lightweight human activity recognition (LiteHAR) approach which, unlike the state-of-the-art deep learning models, does not require extensive training of a large number of parameters. |
H. Salehinejad; S. Valaee; |
816 | CDX-NET: Cross-Domain Multi-Feature Fusion Modeling Via Deep Neural Networks for Multivariate Time Series Forecasting in AIOps Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our solution introduces a deep neural network named CDX-Net to describe and analyze aperiodic MTS from both temporal and spectral domains. |
J. Li; et al. |
817 | A Clustering-based ML Scheme for Capacity Approaching Soft Level Sensing in 3D TLC NAND Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the capacity-approaching maximum mutual-information method, this work presents the data-driven approach to collect all the optimal 2-bit soft-read level pairs over the 3D TLC NAND. |
L. -W. Liu; Y. -C. Liao; H. -C. Chang; |
818 | Dynamic Resource Optimization for Adaptive Federated Learning Empowered By Reconfigurable Intelligent Surfaces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The aim of this work is to propose a novel dynamic resource allocation strategy for adaptive Federated Learning (FL), in the context of beyond 5G networks endowed with Reconfigurable Intelligent Surfaces (RISs). |
C. Battiloro; M. Merluzzi; P. Di Lorenzo; S. Barbarossa; |
819 | Learning-Based Resource Allocation with Dynamic Data Rate Constraints Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the problem of resource allocation (RA) in wireless communication networks, where each user has a dynamic data rate constraint. |
P. Behmandpoor; P. Patrinos; M. Moonen; |
820 | Denoising-Oriented Deep Hierarchical Reinforcement Learning for Next-Basket Recommendation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a Hierarchical Reinforcement Learning framework for next Basket recommendation, named HRL4Ba, which learns the personalized inter-basket and intra-basket contexts of the user for dynamic denoising. |
Q. Du; L. Yu; H. Li; Y. Leng; N. Ou; |
821 | Competitive Multi-Agent Reinforcement Learning with Self-Supervised Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present MASRL: Competitive Multi-Agent Self-supervised representations for Reinforcement Learning in the multi-agent competitive environment. |
D. Su; J. D. Lee; J. M. Mulvey; H. V. Poor; |
822 | Model-Based Online Learning for Resource Sharing in Joint Radar-Communication Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a model-based online learning (MBOL) framework to enable a structured way to formulate efficient online learning algorithms for resource sharing in joint radar-communication (JRC) systems. |
P. Pulkkinen; V. Koivunen; |
823 | Qrelation: An Agent Relation-Based Approach for Multi-Agent Reinforcement Learning Value Function Factorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These approaches do not fully consider the relational information among agents, resulting in sub-optimal models for complex tasks. To remedy this issue, we propose QRelation which is a graph neural network approach for value function factorization. |
S. Shen; et al. |
824 | Denoising-Guided Deep Reinforcement Learning For Social Recommendation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, modeling all of the user's social relationships without considering the relevance of friends will introduce noise to the social context. To address this issue, in this work, we propose a Denoising-guided deep Reinforcement Learning framework for Social recommendation (DRL4So). |
Q. Du; L. Yu; H. Li; Y. Leng; N. Ou; J. Xiang; |
825 | An Efficient DP-SGD Mechanism for Large Scale NLU Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a more efficient DP-SGD for training using a GPU infrastructure and apply it to fine-tuning models based on LSTM and transformer architectures. |
C. Dupuy; R. Arava; R. Gupta; A. Rumshisky; |
826 | MAKD: Multiple Auxiliary Knowledge Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While we notice that the teacher model has defect in extracting features of another task samples. To improve knowledge distillation under such situation, we propose Multiple Auxiliary Subspaces (MAS). |
Z. Chen; X. Jin; Y. He; H. Xue; |
827 | Feature Imitating Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a novel approach to neural learning: the Feature-Imitating-Network (FIN). |
S. Saba-Sadiya; T. Alhanai; M. M. Ghassemi; |
828 | Over-Parameterized Network Solves Phase Retrieval Effectively Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose using an over-parameterized network to represent the unknown signal to solve the problem. |
J. Li; C. Wang; |
829 | Deep Spatio-Temporal Wind Power Forecasting Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we develop a deep learning approach based on encoder-decoder structure. |
J. Li; M. Armandpour; |
830 | Multiple Kernel K-Means Clustering with Simultaneous Spectral Rotation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: What's more, an efficient alternative algorithm is proposed to solve the joint optimization problem. |
J. Lu; Y. Lu; R. Wang; F. Nie; X. Li; |
831 | Multitask Gaussian Process With Hierarchical Latent Interactions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We validate that considering the interactions can promote knowledge transferring in MTGP and compare our approach with some state-of-the-art MTGPs on both synthetic and real-world datasets. |
K. Chen; T. van Laarhoven; E. Marchiori; F. Yin; S. Cui; |
832 | Discrete Multi-Kernel K-Means with Diverse and Optimal Kernel Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the Discrete Multi-kernel k-means with Diverse and Optimal Kernel Learning (DMK-DOK) model, which adaptively seeks for a better kernel by residing in the base kernel neighborhood and negotiates the kernel learning and clustering. |
Y. Lu; J. Lu; R. Wang; F. Nie; |
833 | Access Control for Privacy-Preserving Gaussian Process Regression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose access control for privacy-preserving Gaussian process regression (GPR), in which the encrypted data are generated through a random unitary transform (RUT). |
T. Nakachi; Y. Wang; |
834 | Scalable Ridge Leverage Score Sampling for The Nyström Method Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The Nyström method, known as an efficient technique for approximating Gram matrices, builds upon a small subset of the data called landmarks, whose choice impacts the quality of the approximated Gram matrix. Various sampling methods have been proposed in the literature to choose such a subset, among which some based on ridge leverage scores, which come with good theoretical and practical results. |
F. Cherfaoui; H. Kadri; L. Ralaivola; |
835 | On Submodular Set Cover Problems for Near-Optimal Online Kernel Basis Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we formalize that greedy-based approximations, under suitably chosen compression statistics, can admit near-optimal representations. |
H. Pradhan; A. Koppel; K. Rajawat; |
836 | Improving Feature Generalizability with Multitask Learning in Class Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The major challenge in CIL is catastrophic forgetting, i.e., preserving as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. |
D. Ma; C. I. Tang; C. Mascolo; |
837 | Adversarial Mask Transformer for Sequential Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Mask language model has been successfully developed to build a transformer for robust language understanding. |
H. Lio; S. -E. Li; J. -T. Chien; |
838 | Online Learning with Probabilistic Feedback Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in practice, the nominal feedback graph often entails uncertainties, which renders it impossible to reveal the actual relationship among experts. To cope with this challenge, the present work develops a novel online learning algorithm to deal with uncertainties while making use of the uncertain feedback graph. |
P. M. Ghari; Y. Shen; |
839 | Data Incubation - Synthesizing Missing Data for Handwriting Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we demonstrate how a generative model can be used to build a better recognizer through the control of content and style. |
J. -H. R. Chang; et al. |
840 | Tracking The Dimensions of Latent Spaces of Gaussian Process Latent Variable Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a novel sequential method that relies on the Bayesian approach to estimate the dimension of a latent space of a Gaussian process latent variable model. |
Y. Liu; P. M. Djuric; |
841 | Controlled Sensing and Anomaly Detection Via Soft Actor-Critic Reinforcement Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the anomaly detection problem in the presence of noisy observations and to tackle the tuning and efficient exploration challenges that arise in deep reinforcement learning algorithms, we in this paper propose a soft actor-critic deep reinforcement learning framework. |
C. Zhong; M. C. Gursoy; S. Velipasalar; |
842 | Win The Lottery Ticket Via Fourier Analysis: Frequencies Guided Network Pruning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the Magnitude-Based Pruning (MBP) scheme and analyze it from a novel perspective through Fourier analysis on the deep learning model to guide model designation. |
Y. Shang; B. Duan; Z. Zong; L. Nie; Y. Yan; |
843 | SparseBFA: Attacking Sparse Deep Neural Networks with The Worst-Case Bit Flips on Coordinates Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose SparseBFA, an algorithm that searches for a small number of bits among the coordinates of nonzero weights when the parameters of DNNs are stored using sparse matrix formats. |
K. Lee; A. P. Chandrakasan; |
844 | Adversarial Examples Detection Based on Error Level Analysis and Space Mapping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Meanwhile, researchers have found that the stability of adversarial example after space mapping is worse than that of the clean example. Therefore, we propose a two-branch architecture to detect adversarial examples based on the aforementioned strategies. |
S. Huang; S. Wang; J. Chen; G. Li; W. Wang; |
845 | Learning Monocular 3D Human Pose Estimation With Skeletal Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The quantitative result of cross-dataset experiment demonstrates that our resulting model achieves superior generalization accuracy on the publicly available dataset. |
Z. Chen; A. Sugimoto; S. -H. Lai; |
846 | Training Stable Graph Neural Networks Through Constrained Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we leverage the stability property of GNNs as a typing point in order to seek for representations that are stable within a distribution. |
J. Cerviño; L. Ruiz; A. Ribeiro; |
847 | Mismatched Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the combinatorial nature of the mismatch problem, existing methods are often designed for small datasets and simple linear models but are not scalable to large-scale datasets and complex models. |
X. Xian; M. Hong; J. Ding; |
848 | Supervised Training of Siamese Spiking Neural Networks with Earth Mover’s Distance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a supervised training framework for optimizing Earth Mover’s Distance (EMD) between spike trains with spiking neural networks (SNN). |
M. Pabian; D. Rzepka; M. Pawlak; |
849 | On The Effectiveness of Active Learning By Uncertainty Sampling in Classification of High-Dimensional Gaussian Mixture Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Towards closing the gap between practical use and theoretical understanding in active learning, we propose to characterize the exact behavior of uncertainty sampling for high-dimensional Gaussian mixture data, in a modern regime of big data where the numbers of samples and features are commensurately large. |
X. Mai; S. Avestimehr; A. Ortega; M. Soltanolkotabi; |
850 | Neural Collapse in Deep Homogeneous Classifiers and The Role of Weight Decay Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present results showing the role of weight decay in the emergence of Neural Collapse in deep homogeneous networks. |
A. Rangamani; A. Banburski-Fahey; |
851 | Synthesis of Adversarial Samples in Two-Stage Classifiers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the robustness of two Two-Stage Hierarchical Classifier models, the flat and top-down hierarchical classifiers, termed FHC and TDHC respectively, to targeted and confidence reduction attacks. |
I. R. Alkhouri; A. Velasquez; G. K. Atia; |
852 | Synergistic Network Learning and Label Correction for Noise-Robust Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice. To address this problem, we propose a robust label correction framework combining the ideas of small loss selection and noise correction, which learns network parameters and reassigns ground truth labels iteratively. |
C. Gong; K. Bin; E. J. Seibel; X. Wang; Y. Yin; Q. Song; |
853 | Social Welfare Maximization in Cross-Silo Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we model the interactions among organizations in cross-silo FL as a public goods game for the first time and theoretically prove that there exists a social dilemma where the maximum social welfare is not achieved in Nash equilibrium. |
J. Chen; Q. Hu; H. Jiang; |
854 | Privacy-Preserving Distributed Expectation Maximization for Gaussian Mixture Model Using Subspace Perturbation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we give an explicit information-theoretical analysis of a federated expectation maximization algorithm for Gaussian mixture model and prove that the intermediate updates can cause severe privacy leakage. |
Q. Li; J. S. Gundersen; K. Tjell; R. Wisniewski; M. G. Christensen; |
855 | A Communication Efficient Quasi-Newton Method for Large-Scale Distributed Multi-Agent Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a communication efficient quasi-Newton method for large-scale multi-agent convex composite optimization. |
Y. Li; P. G. Voulgaris; N. M. Freris; |
856 | A Byzantine-Resilient Dual Subgradient Method for Vertical Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we provide a problem formulation of vertical FL in the presence of Byzantine attacks, and propose a Byzantine-resilient dual subgradient method. |
K. Yuan; Z. Wu; Q. Ling; |
857 | Byzantine-Robust Aggregation with Gradient Difference Compression and Stochastic Variance Reduction for Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show that the vanilla combination of the distributed compressed stochastic gradient descent (SGD) with geometric median-based robust aggregation suffers from the compression noise under Byzantine attacks. In light of this observation, we propose to reduce the compression noise with gradient difference compression to improve the Byzantine-robustness. |
H. Zhu; Q. Ling; |
858 | Variance Reduction-Boosted Byzantine Robustness in Decentralized Stochastic Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To eliminate the negative effect of the stochastic noise, we introduce two variance reduction methods, stochastic average gradient algorithm (SAGA) and loopless stochastic variance-reduced gradient (LSVRG), to Byzantine-robust decentralized stochastic optimization. |
J. Peng; W. Li; Q. Ling; |
859 | Integer-Only Zero-Shot Quantization for Efficient Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Moreover, they require training and/or validation data during quantization, which may not be available due to security or privacy concerns. To address these limitations, we propose an integer-only, zero-shot quantization scheme for ASR models. |
S. Kim; et al. |
860 | nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker Text-to-speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address that, we propose a zero-shot multi-speaker TTS, named nnSpeech, that could synthesize a new speaker voice without fine-tuning and using only one adaptation utterance. |
B. Zhao; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
861 | Noise-Robust Speech Recognition With 10 Minutes Unparalleled In-Domain Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a generative adversarial network to simulate noisy spectrum from the clean spectrum (SimuGAN), where only 10 minutes of unparalleled in-domain noisy speech data is required as labels. |
C. Chen; N. Hou; Y. Hu; S. Shirol; E. S. Chng; |
862 | Enhancing Class Understanding Via Prompt-Tuning For Zero-Shot Text Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a prompt-based method, which enhances semantic understanding for each class and learns the matching between texts and classes for better ZSTC. |
Y. Dan; J. Zhou; Q. Chen; Q. Bai; L. He; |
863 | FilterAugment: An Acoustic Environmental Data Augmentation Method Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose FilterAugment, a data augmentation method for regularization of acoustic models on various acoustic environments. |
H. Nam; S. -H. Kim; Y. -H. Park; |
864 | The Representation Jensen-Rényi Divergence Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by kernels. |
J. K. Hoyos Osorio; O. Skean; A. J. Brockmeier; L. Gonzalo Sanchez Giraldo; |
865 | Multi-View Information Bottleneck Without Variational Approximation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we extend the information bottleneck principle to supervised multi-view learning scenario and use the recently proposed matrix-based Rényi's α-order entropy functional to optimize the resulting objective directly, without the necessity of variational approximation or adversarial training. |
Q. Zhang; S. Yu; J. Xin; B. Chen; |
866 | Time-Frequency and Geometric Analysis of Task-Dependent Learning in Raw Waveform Based Acoustic Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Despite its varied success, there have not been many attempts to understand how spectral/temporal feature integration from raw inputs helps recognize task-dependent information. Towards this aim, this work presents data-dependent and data-independent methods for understanding the modelling behavior of acoustic models. |
D. Gupta; V. Abrol; |
867 | Channel Redundancy and Overlap in Convolutional Neural Networks with Channel-Wise NNK Graphs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We find that redundancy between channels is significant and varies with the layer depth and the level of regularization during training. |
D. Bonet; A. Ortega; J. Ruiz-Hidalgo; S. Shekkizhar; |
868 | FedClean: A Defense Mechanism Against Parameter Poisoning Attacks in Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose FedClean, an FL mechanism that is robust to model poisoning attacks. |
A. Kumar; V. Khimani; D. Chatzopoulos; P. Hui; |
869 | A Method to Reveal Speaker Identity in Distributed ASR Training, and How to Counter It Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we design the first method for revealing the identity of the speaker of a training utterance with access only to a gradient. |
T. Dang; O. Thakkar; S. Ramaswamy; R. Mathews; P. Chin; F. Beaufays; |
870 | On The Convergence of ADAM-Type Algorithms for Solving Structured Single Node and Decentralized Min-Max Saddle Point Games Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Further, most algorithms are centralized in nature and cannot be adapted to a decentralized architecture in a straightforward manner. This study aims to address these issues by introducing general two-step adaptive algorithms for obtaining first-order Nash equilibrium solutions of min-max games in both single-node and decentralized architectures. |
B. Barazandeh; K. Curtis; C. Sarkar; R. Sriharsha; G. Michailidis; |
871 | Partial Variable Training for Efficient On-Device Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel method, called Partial Variable Training (PVT), that only trains a small subset of variables on edge devices to reduce memory usage and communication cost. |
T. -J. Yang; D. Guliani; F. Beaufays; G. Motta; |
872 | Gradient Staleness in Asynchronous Optimization Under Random Communication Delays Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study and analyze centralized asynchronous optimization. |
H. Al-Lawati; S. C. Draper; |
873 | Tempo: Improving Training Performance in Cross-Silo Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Different from its commonly studied scenario to centrally store clients' data in institutions, which implicitly neglects clients' data privacy, we study cross-silo federated learning in a preferable setting to keep private data on clients, and train the global model with a three-layer structure, where the institutions aggregate model updates from their clients for several rounds before sending their aggregated updates to the central server. In this context, we mathematically prove that the number of clients' local training epochs affects the global model performance and thus propose a new approach, Tempo, to adaptively tune the epoch number of each client through training. |
C. Ying; B. Li; B. Li; |
874 | DMANet: Deep Learning-Based Differential Microphone Arrays for Multi-Channel Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we develop a novel differential microphone arrays network (DMANet) for solving the multi-channel speech separation problem. |
X. Yang; J. Wei; |
875 | Amicable Examples for Informed Source Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, in this work, we improve the performance of a pre-trained separation model that does not use any side-information. |
N. Takahashi; Y. Mitsufuji; |
876 | Remix-Cycle-Consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study introduces the remix-cycle-consistency loss as a more appropriate objective function and uses it to fine-tune adversarially learned source separation models. |
K. Saijo; T. Ogawa; |
877 | An Information Maximization Based Blind Source Separation Approach for Dependent and Independent Sources Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a new information maximization (infomax) approach for the blind source separation problem. |
A. T. Erdogan; |
878 | Blind Separation of Linear-Quadratic Mixtures of Mutually Independent and Autocorrelated Sources Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we are interested in the blind separation of linear-quadratic mixtures of mutually independent sources when successive samples of each source are correlated. |
S. Hosseini; Y. Deville; |
879 | Large-Scale Independent Component Analysis By Speeding Up Lie Group Techniques Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our online approach estimates a whitening and a rotation matrix with stochastic gradient descent on centered or uncentered data. |
M. Hermann; G. Umlauf; M. O. Franz; |
880 | Predicting The Generalization Gap in Deep Models Using Anchoring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel strategy for directly predicting accuracy on unseen target data with the help of anchoring and pre-text encoding in predictive models. |
V. Narayanaswamy; R. Anirudh; I. Kim; Y. Mubarka; A. Spanias; J. J. Thiagarajan; |
881 | When Does Backdoor Attack Succeed in Image Reconstruction? A Study of Heuristics Vs. Bi-Level Solution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recent studies have demonstrated the lack of robustness of image reconstruction networks to test-time evasion attacks, posing security risks and potential for misdiagnoses. In this paper, we evaluate how vulnerable such networks are to training-time poisoning attacks for the first time. |
V. Taneja; P. -Y. Chen; Y. Yao; S. Liu; |
882 | Mixed Knowledge Relation Transformer for Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we continually explore the relationship between objects from both internal and external perspectives, and embed the vital image global information into the internal relationship module. |
T. Chen; Z. Li; J. Wei; T. Xian; |
883 | Balanced Stripe-Wise Pruning In The Filter Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Neural network pruning offers a promising prospect to compress and accelerate modern deep convolution networks. The stripe-wise pruning method with a finer granularity than … |
Z. Huo; C. Wang; W. Chen; Y. Li; J. Wang; J. Wu; |
884 | GAN-Based Joint Activity Detection and Channel Estimation for Grant-Free Random Access Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a novel model-free learning method based on generative adversarial network (GAN) to tackle the JADCE problem. |
S. Liang; Y. Zou; Y. Zhou; |
885 | Cascading Bandit Under Differential Privacy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Under DP, we propose a UCB-based algorithm which guarantees ? |
K. Wang; J. Dong; B. Wang; S. Li; |
886 | Iterative Re-weighted Least Squares Algorithms for Non-negative Sparse and Group-sparse Recovery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We extend this approach to handle not only sparse recovery but also group-sparse recovery. |
A. Majumdar; |
887 | Exact Partitioning of High-Order Planted Models with A Tensor Nuclear Norm Constraint Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study the problem of exact partitioning of the hypergraphs generated by high-order planted models. |
C. Ke; J. Honorio; |
888 | No More Than 6ft Apart: Robust K-Means Via Radius Upper Bounds Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Real world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest imbalanced clustering. We propose to remedy such a scenario by introducing a maximal radius constraint r on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than 2r apart in terms of l2 distance. |
A. Imtiaz Humayun; R. Balestriero; A. Kyrillidis; R. Baraniuk; |
889 | Deep Kernel Learning Networks with Multiple Learning Paths Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes deep kernel learning networks with multiple learning paths (DKL-MLP) for nonlinear function approximation. |
P. Xu; Y. Wang; X. Chen; Z. Tian; |
890 | Provable Sample Complexity Guarantees For Learning Of Continuous-Action Graphical Games With Nonparametric Utilities Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the problem of learning the exact structure of continuous-action games with non-parametric utility functions. |
A. Barik; J. Honorio; |
891 | Cross-Modal Knowledge Distillation For Vision-To-Sensor Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, restricted computational resources associated with wearable devices, i.e., smartwatch, failed to directly support such advanced methods. To tackle this issue, this study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework. |
J. Ni; R. Sarbajna; Y. Liu; A. H. H. Ngu; Y. Yan; |
892 | CLIPCAM: A Simple Baseline For Zero-Shot Text-Guided Object And Action Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in real-world scenarios, since there are infinite amounts of unlabeled data beyond the categories of publicly available datasets, it is not only time- and manpower-consuming to annotate all the data but also requires a lot of computational resources to train the detectors. To address these issues, we show a simple and reliable baseline that can be easily obtained and work directly for the zero-shot text-guided object and action localization tasks without introducing additional training costs by using Grad-CAM, the widely used class visual saliency map generator, with the help of the recently released Contrastive Language-Image Pre-Training (CLIP) model by OpenAI, which is trained contrastively using the dataset of 400 million image-sentence pairs with rich cross-modal information between text semantics and image appearances. |
H. -A. Hsia; et al. |
893 | Exploring Dual Stream Global Information For Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Dual Global Enhanced Transformer (DGET) to explicitly utilize both visual and textual global information. |
T. Xian; Z. Li; T. Chen; H. Ma; |
894 | Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote Sensing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a novel multi-objective loss function including: i) contrastive objectives that enable similarity preservation in intra- and inter-modal similarities; ii) an adversarial objective that is enforced across two modalities for cross-modal representation consistency; and iii) binarization objectives for generating hash codes. |
G. Mikriukov; M. Ravanbakhsh; B. Demir; |
895 | Robust Thermal Infrared Pedestrian Detection By Associating Visible Pedestrian Knowledge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel thermal infrared pedestrian detection framework which can associate and utilize the complementary pedestrian knowledge from visible images. |
S. Park; D. Hwi Choi; J. Uk Kim; Y. M. Ro; |
896 | A Generalized Hierarchical Nonnegative Tensor Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Here, we propose a new HNTF model which directly generalizes a HNMF model special case, and provide a supervised extension. |
J. Vendrow; J. Haddock; D. Needell; |
897 | Two Strategies Toward Lightweight Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the two issues, we propose two strategies, namely global-guided attention strategy (GGAS) and channel-wise scaling strategy (CWSS), which can significantly improve the performance of the state-of-the-arts with negligible overheads. |
Z. Du; J. Liu; J. Tang; G. Wu; |
898 | On Mini-Batch Training with Varying Length Time Series Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel method of normalizing the lengths of the time series in a dataset by exploiting the dynamic matching ability of Dynamic Time Warping (DTW). |
B. Kenji Iwana; |
899 | ACP: Adaptive Channel Pruning for Efficient Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an adaptive channel pruning module (ACPM) to automatically adjust the pruning rate with respect to each channel, which is more efficient to prune redundant channel parameters, as well as more robust to datasets and backbones. |
Y. Zhang; Y. Yuan; Q. Wang; |
900 | Bayesian Continual Imputation and Prediction For Irregularly Sampled Time Series Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Bayesian Continual Imputation and Prediction for Time-series Data (B-CIPIT), for learning from a sequence of time-series tasks. |
Y. Guo; J. W. Jun Poh; C. S. Y. Wong; S. Ramasamy; |
901 | Confidence-Aware Multi-Teacher Knowledge Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, existing studies mainly integrate knowledge from diverse sources by averaging over multiple teacher predictions or combining them using other label-free strategies, which may mislead student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns sample-wise reliability for each teacher prediction with the help of ground-truth labels, with those teacher predictions close to one-hot labels assigned large weights. |
H. Zhang; D. Chen; C. Wang; |
902 | Learnable Hypergraph Laplacian for Hypergraph Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the first learning-based method tailored for constructing adaptive hypergraph structure, termed HypERgrAph Laplacian aDaptor (HERALD), which serves as a generic plug-and-play module for improving the representational power of HGCNNs. |
J. Zhang; Y. Chen; X. Xiao; R. Lu; S. -T. Xia; |
903 | Graph Learning From Multivariate Dependent Time Series Via A Multi-Attribute Formulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an alternating direction method of multipliers (ADMM) solution to minimize a sparse-group lasso penalized negative pseudo log-likelihood objective function to estimate the precision matrix of the random vector associated with the entire multi-attribute graph. |
J. K. Tugnait; |
904 | Generalized Sliced Probability Metrics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a new family of sliced probability metrics, namely Generalized Sliced Probability Metrics (GSPMs), based on the idea of slicing high-dimensional distributions into a set of their one-dimensional marginals. |
S. Kolouri; K. Nadjahi; S. Shahrampour; U. Simsekli; |
905 | Recovery of Noisy Pooled Tests Via Learned Factor Graphs with Application to COVID-19 Testing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a strategy for pooled testing designed for noisy settings, which bypasses the need for a tractable acquisition model. |
E. F. Ben-Knaan; Y. C. Eldar; N. Shlezinger; |
906 | A Remedy For Distributional Shifts Through Expected Domain Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we employ multi-modal translation networks to tackle the correlation shifts that appear when data is sampled out-of-distribution. |
J. -C. Gagnon-Audet; S. Shahtalebi; F. Rudzicz; I. Rish; |
907 | Deterministic Transform Based Weight Matrices for Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to use deterministic transforms as weight matrices for several feedforward neural networks. |
P. G. Jurado; X. Liang; S. Chatterjee; |
908 | Adaptive Group Testing with Mismatched Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we consider noisy adaptive group testing design with specific test sensitivity and specificity that select the optimal group given previous test results based on pre-selected utility function. |
M. Fan; B. -J. Yoon; F. J. Alexander; E. R. Dougherty; X. Qian; |
909 | Mixed In Time And Modality: Curse Or Blessing? Cross-Instance Data Augmentation for Weakly Supervised Multimodal Temporal Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We verify it quantitatively on the audio-visual video parsing (AVVP) task, and propose a cross-instance data-augmentation framework, which can preserve the benefits of feature fusion while providing explicit feedbacks for feature cross-interference. |
Y. Zhu; C. Tian; Z. Jiang; A. Men; H. Wang; Q. Chen; |
910 | MTAF: Shopping Guide Micro-Videos Popularity Prediction Using Multimodal and Temporal Attention Fusion Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Multimodal and Temporal Attention Fusion (MTAF) framework to represent and combine multi-modal features. |
N. Ou; L. Yu; H. Li; Q. Du; J. Xiang; W. Gong; |
911 | Learning To Integrate Vision Data Into Road Network Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to integrate remote sensing vision data into road network data for improved embeddings with graph neural networks. |
O. Stromann; A. Razavi; M. Felsberg; |
912 | Hierarchical Signal Fusion Network for Pulsar Detection with Phase-Correlation and Signal Attentions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a hierarchical signal fusion network with phase-correlation and signal attentions are proposed. |
H. Wu; M. Chi; |
913 | Recognition Of Silently Spoken Word From EEG Signals Using Dense Attention Network (DAN) Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a method for recognizing silently spoken words from electroencephalogram (EEG) signals using a Dense Attention Network (DAN). |
S. Datta; A. Aondoakaa; J. J. Holmberg; E. Antonova; |
914 | Wav2CLIP: Learning Robust Audio Representations from CLIP Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). |
H. -H. Wu; P. Seetharaman; K. Kumar; J. P. Bello; |
915 | ASD-Transformer: Efficient Active Speaker Detection Using Self And Multimodal Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recent work has shown the success of transformers in multimodal settings, thus we propose a novel framework that leverages modern transformer and concatenation mechanisms to efficiently capture the interaction between audio and video modalities for ASD. |
G. Datta; T. Etchart; V. Yadav; V. Hedau; P. Natarajan; S. -F. Chang; |
916 | MMLatch: Bottom-Up Top-Down Fusion For Multimodal Sentiment Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training. |
G. Paraskevopoulos; E. Georgiou; A. Potamianos; |
917 | Multi-Channel Attentive Graph Convolutional Network with Sentiment Fusion for Multimodal Sentiment Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For cross-modality interactive learning, we exploit the self-attention mechanism combined with densely connected graph convolutional networks to learn inter-modality dynamics. |
L. Xiao; X. Wu; W. Wu; J. Yang; L. He; |
918 | Learning Music Sequence Representation From Text Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To excavate better MUsic SEquence Representation from labeled audio, we propose a novel text-supervision pre-training method, namely MUSER. |
T. Chen; Y. Xie; S. Zhang; S. Huang; H. Zhou; J. Li; |
919 | Enhancing Affective Representations Of Music-Induced EEG Through Multimodal Supervision And Latent Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper we extract efficient, personalized affective representations from EEG signals during music listening. |
K. Avramidis; C. Garoufis; A. Zlatintsi; P. Maragos; |
920 | Towards Learning Universal Audio Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learning systems on that benchmark. |
L. Wang; et al. |
921 | Differentiable Wavetable Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We achieve high-fidelity audio synthesis with as little as 10 to 20 wavetables and demonstrate how a data-driven dictionary of waveforms opens up unprecedented one-shot learning paradigms on short audio clips. |
S. Shan; L. Hantrakul; J. Chen; M. Avent; D. Trevelyan; |
922 | Neural Audio-To-Score Music Transcription For Unconstrained Polyphony Using Compact Output Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work aims at tackling two of them: we introduce a novel output representation which addresses shortcomings related to the sequence-based A2S recognition framework and we report a first approximation to dealing with unconstrained polyphony. |
V. Arroyo; J. J. Valero-Mas; J. Calvo-Zaragoza; A. Pertusa; |
923 | End-To-End Music Remastering System Using Self-Supervised And Adversarial Training Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Remastering follows the same technical process, in which the context lies in mastering a song for the times. As these tasks have high entry barriers, we aim to lower the barriers by proposing an end-to-end music remastering system that transforms the mastering style of input audio to that of the target. |
J. Koo; S. Paik; K. Lee; |
924 | AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposed a novel one-shot voice conversion framework based on vector quantization voice conversion (VQVC) and AutoVC, called AVQVC. |
H. Tang; X. Zhang; J. Wang; N. Cheng; J. Xiao; |
925 | Towards Speaker Age Estimation With Label Distribution Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods for speaker age estimation usually treat it as a multi-class classification or a regression problem. |
S. Si; J. Wang; J. Peng; J. Xiao; |
926 | Distributed Audio-Visual Parsing Based On Multimodal Transformer and Deep Joint Source Channel Coding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This makes it unable to effectively capture the relationship between audio-visual events, and is not suitable for implementation in the network transmission scenario. In this paper, we focus on these problems and propose a distributed audio-visual parsing network (DAVPNet) based on multimodal transformer and deep joint source channel coding (DJSCC). |
P. Wang; J. Li; M. Ma; X. Fan; |
927 | TalkingFlow: Talking Facial Landmark Generation with Multi-Scale Normalizing Flow Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a flow-based probabilistic network named TalkingFlow to generate natural talking facial landmark with head movements from speech data. |
S. Liang; Z. Zhou; R. Li; J. Zhang; H. Bao; |
928 | Incorporating Gaze Behavior Using Joint Embedding With Scene Context for Driver Takeover Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we aim to predict driver takeover timing in order for the system to prepare transition from automation to driver control. |
Y. Qiu; C. Busso; T. Misu; K. Akash; |
929 | Multi-View And Multi-Modal Event Detection Utilizing Transformer-Based Multi-Sensor Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: For sensors to cooperate effectively in such a situation, the system should be able to exchange information among sensors and combines information that is useful for identifying events in a complementary manner. For such a mechanism, we propose a Transformer-based multi-sensor fusion (MultiTrans) which combines multi-sensor data on the basis of the relationships between features of different viewpoints and modalities. |
M. Yasuda; Y. Ohishi; S. Saito; N. Harado; |
930 | Distributed Label Dequantized Gaussian Process Latent Variable Model for Multi-View Data Integration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel method for multi-view data analysis, distributed label dequantized Gaussian process latent variable model (DLDGP). |
K. Watanabe; K. Maeda; T. Ogawa; M. Haseyama; |
931 | Co-Attention-Guided Bilinear Model for Echo-Based Depth Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the problem of estimating depth maps of indoor scenes based on echoes. |
G. Irie; T. Shibata; A. Kimura; |
932 | Modeling The Detection Capability Of High-Speed Spiking Cameras Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issue, this paper proposes a modeling algorithm that studies the detection capability of spiking cameras. |
J. Zhao; et al. |
933 | MoDeRNN: Towards Fine-Grained Motion Details for Spatiotemporal Predictive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We integrate DCB with standard ConvLSTM and introduce Motion Details RNN (MoDeRNN) to capture fine-grained spatiotemporal features and improve the expression of latent states of RNNs to achieve significant quality. |
Z. Chai; Z. Xu; C. Yuan; |
934 | Graph-Based Point Cloud Denoising Using Shape-Aware Consistency For Free-Viewpoint Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel graph-based denoising method to correct the quantization error (step noise) arising in the process of generating the visual hull, a commonly used technique to synthesize free-viewpoint video. |
K. Nonaka; R. Watanabe; H. Kato; T. Kobayashi; E. Pavez; A. Ortega; |
935 | DCSN: Deformable Convolutional Semantic Segmentation Neural Network for Non-Rigid Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel semantic segmentation network for outdoor and unstructured scenarios for autonomous driving based on deformable convolution and geometric distortion pipelines. |
B. -S. Huang; C. -C. Hsu; W. -T. Liao; H. -Y. Kao; X. -Y. Wang; |
936 | Transformer-Based Domain Adaptation for Event Data Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the insufficiency issue of annotated event data, we propose to train the CTN via the source-free Unsupervised Domain Adaptation (UDA) algorithm leveraging large-scale labeled image data. |
J. Zhao; S. Zhang; T. Huang; |
937 | Multimodal Emotion Recognition with Surgical and Fabric Masks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we investigate how different types of masks affect automatic emotion classification in different channels of audio, visual, and multimodal. |
Z. Yang; K. Nayan; Z. Fan; H. Cao; |
938 | Human Emotion Recognition Using Multi-Modal Biological Signals Based On Time Lag-Considered Correlation Maximization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The contribution of this paper is the construction of a recognition method with considering the time lag for getting truly close to the realization of the occurrence mechanism of human emotions. |
Y. Moroto; K. Maeda; T. Ogawa; M. Haseyama; |
939 | Multi-Modal Emotion Recognition with Self-Guided Modality Calibration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Self-guided Modality Calibration Network (SMCN) to realize multi-modal alignment which can capture the global connections without interfering with unimodal learning. |
M. Hou; Z. Zhang; G. Lu; |
940 | Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition? Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, a successful model that fuses modalities requires components that can effectively aggregate task-relevant information from each modality. As cross-modal attention is seen as an effective mechanism for multi-modal fusion, in this paper we quantify the gain that such a mechanism brings compared to the corresponding self-attention mechanism. |
V. Rajan; A. Brutti; A. Cavallaro; |
941 | A Pre-Trained Audio-Visual Transformer for Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a pretrained audio-visual Transformer trained on more than 500k utterances from nearly 4000 celebrities from the VoxCeleb2 dataset for human behavior understanding. |
M. Tran; M. Soleymani; |
942 | MEmoBERT: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a multimodal pre-training model MEmoBERT for multimodal emotion recognition, which learns multimodal joint representations through self-supervised learning from a self-collected large-scale unlabeled video data that come in sheer volume. |
J. Zhao; R. Li; Q. Jin; X. Wang; H. Li; |
943 | Global-Local Feature Enhancement Network for Robust Object Detection Using MmWave Radar and Camera Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the existing early fusion methods are vulnerable to data noise, while the existing late fusion methods ignore the association of object information between feature maps in the early stage. To overcome these shortcomings, we propose a Global-Local Feature Enhancement Network (GLE-Net), a two-stage deep fusion detector, which first generates anchors from two sensors and uses an auxiliary module to locally enhance the single-branch missing proposals, and then fuses the global features from the multimodal sensors to improve final detection results. |
K. Deng; D. Zhao; Q. Han; Z. Zhang; S. Wang; H. Ma; |
944 | Learning Correlation for Online Multiple Object Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we follow the joint detection and tracking paradigm to learn correlation for online multiple object tracking. |
Y. Wang; C. Zhuang; H. Ye; Y. Yan; H. Wang; |
945 | Bounding Box Distribution Learning and Center Point Calibration for Robust Visual Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to learn bounding box distribution in training and calibrate the center point response in inference for robust online tracking. |
C. Zhuang; Y. Liang; Y. Yan; Y. Lu; H. Wang; |
946 | Multi-Focus Guided Semantic Aggregation for Video Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a method called Multi-Focus guided Semantic Aggregation (MFSA) for video object detection. |
H. Ye; G. Wang; Y. Lu; Y. Yan; H. Wang; |
947 | Enhancing Contrastive Learning with Temporal Cognizance for Audio-Visual Representation Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present our experimental results on action recognition and video summarization tasks. |
C. Lavania; S. Sundaram; S. Srinivasan; K. Kirchhoff; |
948 | Cross-Modal Knowledge Distillation in Multi-Modal Fake News Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Many existing methods simply integrate the textual and visual features as a shared representation but overlook their correlations, which may lead to sub-optimal results. To address this problem, we propose CMC, a two-stage fake news detection method with a novel knowledge distillation that captures Cross-Modal feature Correlations while training. |
Z. Wei; H. Pan; L. Qiao; X. Niu; P. Dong; D. Li; |
949 | Training Strategies for Automatic Song Writing: A Unified Framework Perspective Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a unified framework following the pre-training and fine-tuning paradigm to address all four ASW tasks with one model. |
T. Qian; J. Shi; S. Guo; P. Wu; Q. Jin; |
950 | Residual-Guided Personalized Speech Synthesis Based on Face Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus in this work, we innovatively extract personalized speech features from human faces to synthesize personalized speech using neural vocoder. |
J. Wang; Z. Wang; X. Hu; X. Li; Q. Fang; L. Liu; |
951 | Sketch Storytelling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address these issues, we replace the natural image in image caption dataset with the sketch with the corresponding objects to generate pseudo sketch, which can obtain pseudo paired sketch-caption and sketch-image data. Due to these pseudo sketches are not drawn in a standardized way, we present a selective attention module to reduce noise for pseudo sketches. |
Y. Zhou; |
952 | MAG+: An Extended Multimodal Adaptation Gate for Multimodal Sentiment Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an extended MAG, called MAG+, to reinforce multimodal fusion. |
X. Zhao; Y. Chen; W. Li; L. Gao; B. Tang; |
953 | Image-Text Alignment and Retrieval Using Light-Weight Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We proposed a Light-weight Transformer Alignment Network (LTAN), which adopts the current mainstream visual and textual feature extraction methods. |
W. Li; X. Fan; |
954 | A General Framework For Incomplete Cross-Modal Retrieval With Missing Labels And Missing Modalities Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a general framework for handling cross-modal retrieval tasks with both missing labels and missing modalities. |
M. Li; S. -L. Huang; L. Zhang; |
955 | Subgraph Representation Learning with Hard Negative Samples for Inductive Link Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, despite the importance of a high-quality negative sample in link prediction, there is currently no method for selecting hard negatives for inductive link prediction. To overcome this limitation, we propose a new sampling method for selecting hard negative samples given a positive triplet. |
H. Kwak; H. B. K. Jung; |
956 | Deep Hashing with Hash Center Update for Efficient Image Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an approach for learning binary hash codes for image retrieval. |
A. Jose; D. Filbert; C. Rohlfing; J. -R. Ohm; |
957 | Prototype-Based Inter-Camera Learning for Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a Prototype-based Inter-camera ReID (PIRID) method, which tackles the ICS setting through the lens of prototype learning. |
L. Wang; W. Zhang; D. Wu; P. Hong; B. Li; |
958 | DHWP: Learning High-Quality Short Hash Codes Via Weight Pruning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To obtain short hash codes with high quality for fast and accurate image retrieval, we propose a novel framework named Deep Hashing via Weight Pruning (DHWP). |
Z. Ma; et al. |
959 | Node Slicing Broad Learning System for Text Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a flat network called broad learning system (BLS) is employed to derive a novel learning method: node slicing broad learning system (NSBLS). |
F. Liu; X. Wu; C. Li; |
960 | Audio-Text Retrieval in Context Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate several audio features as well as sequence aggregation methods for better audio-text alignment. |
S. Lou; X. Xu; M. Wu; K. Yu; |
961 | Improved Meta Learning for Low Resource Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new meta learning based framework for low resource speech recognition that improves the previous model agnostic meta learning (MAML) approach. |
S. Singh; R. Wang; F. Hou; |
962 | Quantized Winograd Acceleration for CONV1D Equipped ASR Models on Mobile Devices Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a novel quantized Winograd optimization framework, combining quantization and fast convolution to achieve efficient inference acceleration for ASR models on mobile devices. |
Y. Yao; C. Wang; J. Huang; |
963 | Acoustic-to-Articulatory Inversion Based on Speech Decomposition and Auxiliary Feature Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, most current works only use audio speech as input, causing an inevitable performance bottleneck. |
J. Wang; J. Liu; L. Zhao; S. Wang; R. Yu; L. Liu; |
964 | An Audio-Saliency Masking Transformer for Audio Emotion Classification in Movies Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In specifics, auditory saliency emphasizes audio segments that need to be attended to cognitively appraise and experience emotion. In this work, inspired by this mechanism, we propose an end-to-end feature masking network for audio emotion recognition in movies. |
Y. -T. Wu; J. -L. Li; C. -C. Lee; |
965 | Generative Adversarial Network Including Referring Image Segmentation For Text-Guided Image Manipulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel generative adversarial network to improve the performance of image manipulation using natural language descriptions that contain desired attributes. |
Y. Watanabe; R. Togo; K. Maeda; T. Ogawa; M. Haseyama; |
966 | Text2Poster: Laying Out Stylized Texts on Retrieved Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel data-driven framework, called Text2Poster, to automatically generate visually-effective posters from textual information. |
C. Jin; H. Xu; R. Song; Z. Lu; |
967 | Deep Rank Cross-Modal Hashing with Semantic Consistent for Image-Text Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing cross-modal methods mainly exploit feature-level similarity between multi-modal data, the label-level similarity and relative ranking relationship between adjacent instances have been ignored. To address these problems, we propose a novel Deep Rank Cross-modal Hashing(DRCH) method that fully explores the intra-modal semantic similarity relationship. |
X. Liu; H. Zeng; Y. Shi; J. Zhu; K. -K. Ma; |
968 | VQA-BC: Robust Visual Question Answering Via Bidirectional Chaining Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we analyze VQA models from the view of forward/backward chaining in the inference engine, and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. |
M. Lao; Y. Guo; W. Chen; N. Pu; M. S. Lew; |
969 | Type-Aware Medical Visual Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, it is critical to exploit sufficient semantic features with the consideration of characteristic of medical images and language. In this paper, we propose a novel From Image type point To Sentence (FITS) method to tackle the above challenge. |
A. Zhang; W. Tao; Z. Li; H. Wang; W. Zhang; |
970 | From Bottom-Up To Top-Down: Characterization Of Training Process In Gaze Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This stroll passes adjacently to local minima and locations in the geometry of loss landscape associated with low loss. Overall, the network moves from one low loss area to a lower loss area. In this work, we explored those low loss areas and minima, and tried to understand them. |
R. M. Hecht; K. Liu; N. Garnett; A. Telpaz; O. Tsimhoni; |
971 | Meta Talk: Learning To Data-Efficiently Generate Audio-Driven Lip-Synchronized Talking Face With High Definition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel data-efficient audio-driven talking face generation method, which uses just a short target video to produce both lip-synchronized and high-definition face video driven by arbitrary audio in the wild. |
Y. Zhang; et al. |
972 | MAP: Multispectral Adversarial Patch to Attack Person Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To evaluate the robustness of multispectral person detectors in the physical world, we propose a novel Multispectral Adversarial Patch (MAP) generation framework. |
T. Kim; H. J. Lee; Y. M. Ro; |
973 | Genre-Conditioned Long-Term 3D Dance Generation Driven By Music Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on generating long-term 3D dance from music with a specific genre. |
Y. Huang; et al. |
974 | Learning Sound Localization Better from Semantically Similar Samples Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The objective of this work is to localize the sound sources in visual scenes. |
A. Senocak; H. Ryu; J. Kim; I. S. Kweon; |
975 | Bi-Directional Modality Fusion Network For Audio-Visual Event Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, an audio-visual adjustment mechanism exists in a complicated multi-modal perception system. Inspired by this observation, we propose a novel bi-directional modality fusion network (BMFN), which not only fuses audio and visual features, but also adjusts the fused features to increase their representativeness with the help of the original audio and visual contents. |
S. Liu; W. Quan; Y. Liu; D. -M. Yan; |
976 | Dynamic Multi-Scale Loss Balance for Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we carefully study the objective imbalance of multi-scale detector training. |
Y. Luo; X. Cao; J. Zhang; P. Cheng; T. Wang; Q. Feng; |
977 | Latent Space Slicing for Enhanced Entropy Modeling In Learning-Based Point Cloud Geometry Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an enhanced entropy model that takes into account both the hyperprior and previously encoded latent features to estimate the mean and scale of compressed features. |
N. Frank; D. Lazzarotto; T. Ebrahimi; |
978 | DAM-GAN: Image Inpainting Using Dynamic Attention Map Based on Fake Texture Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To reduce pixel inconsistency disorder resulting from fake textures, we introduce a GAN-based model using dynamic attention map (DAM-GAN). |
D. Cha; D. Kim; |
979 | Improving Reference-Based Image Colorization For Line Arts Via Feature Aggregation And Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: The tremendous semantic discrepancy between the line art drawings without texture and the reference pictures containing rich color challenges current image-to-image translation … |
S. Wu; Q. Wang; S. Xu; S. Zhang; |
980 | Few-Shot Gaze Estimation with Model Offset Predictors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to learn a person-specific offset predictor which outputs the difference between the person-agnostic model and the many-shot person-specific model with as few as one training sample. |
J. Ma; X. Zhang; Y. Wu; V. Hedau; S. -F. Chang; |
981 | Adversarial Examples for Image Cropping in Social Media Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an approach to produce the adversarial examples to image cropping systems. |
M. Yoshida; M. Okuda; |
982 | Robust Adaptive Beamforming Based on Power Method Processing and Spatial Spectrum Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we devise an efficient RAB technique for dealing with covariance matrix reconstruction issues. |
S. Mohammadzadeh; V. H. Nascimento; R. C. De Lamare; O. Kukrer; |
983 | An Adaptive Orientational Beamforming Technique for Narrowband Interference Rejection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate and extend the linearly constrained minimum variance (LCMV) algorithm for conventional wideband beamforming system to the recently proposed orientational beamforming (OBF) system. |
J. Han; B. P. Ng; M. H. Er; |
984 | Phase-Only Reconfigurable Sparse Array Beamforming Using Deep Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We develop a design approach based on supervised deep neural network (DNN) to learn and mimic a phase-only sparse MaxSINR beamformer. |
S. A. Hamza; M. G. Amin; B. K. Chalise; |
985 | Robust Adaptive Beamforming Maximizing The Worst-Case SINR Over Distributional Uncertainty Sets for Random INC Matrix And Signal Steering Vector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: The robust adaptive beamforming (RAB) problem is considered via the worst-case signal-to-interference-plus-noise ratio (SINR) maximization over distributional uncertainty sets for … |
Y. Huang; W. Yang; S. A. Vorobyov; |
986 | Improved Beamforming Encoding for Joint Radar and Communication Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an improved method allowing good control of the transmit beampattern power resulting in lower communication error level. |
T. Aittomäki; V. Koivunen; |
987 | Study of The Null Directions on The Performance of Differential Beamformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the performance of differential beamformers as a function of the null directions. |
X. Wang; I. Cohen; J. Benesty; J. Chen; |
988 | DOA M-Estimation Using Sparse Bayesian Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider three specific choices: the ML-loss for the circularly symmetric complex Gaussian distribution, the ML-loss for the complex multivariate t-distribution (MVT) with ν degrees of freedom, … |
C. F. Mecklenbräuker; P. Gerstoft; E. Ollila; |
989 | Learning-Aided Initialization for Variational Bayesian DOA Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a sparsity-promoting method for the detection and estimation of the directions of arrival (DOAs) of source signals. |
Y. Park; F. Meyer; P. Gerstoft; |
990 | Neural Network-Based Compression Framework for DOA Estimation Exploiting Distributed Array Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the complexity involved with the large array size, we propose a compression framework consisting of multiple parallel encoders and a classifier. |
S. R. Pavel; Y. D. Zhang; |
991 | T-SVD Based Broadband Non-Synchronous Measurements Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a tensor singular value decomposition based non-synchronous measurements method for broadband multiple sound source localization. |
L. Chen; W. Sun; L. Huang; G. Chen; |
992 | Low Complex Accurate Multi-Source RTF Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a method for robust estimation of the individual RTFs in a multi-source acoustic scenario. |
C. Li; J. Martinez; R. C. Hendriks; |
993 | Multiple Offsets Multilateration: A New Paradigm for Sensor Network Calibration with Unsynchronized Reference Nodes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we expand the previous state-of-the-art on positioning formulations by introducing Multiple Offsets Multilateration (MOM), a new mathematical framework to compute the receivers' positions with pseudoranges from unsynchronized reference transmitters at known positions. |
L. Ferranti; K. Åström; M. Oskarsson; J. Boutellier; J. Kannala; |
994 | Reference Microphone Selection and Low-Rank Approximation Based Multichannel Wiener Filter with Application to Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an experimental study on the low-rank approximation and reference microphone selection based MWF with application to noisy speech recognition. |
X. -Y. Chen; J. Zhang; L. -R. Dai; |
995 | Incoherent Synthesis of Sparse Broadband Arrays Based on A Parameter-Free Subspace Clustering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an incoherent design method of sparse broadband arrays that optimizes the number of sensors and their positions simultaneously. |
G. Gubnitsky; Y. Buchris; I. Cohen; |
996 | Initialization-Free Implicit-Focusing (IF2) for Wideband Direction-of-Arrival Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel method to focus or align array manifolds at different frequencies to a single reference frequency in wideband direction of arrival (DOA) estimation. |
J. Millhiser; P. Sarangi; P. Pal; |
997 | Recurrent Design of Probing Waveform for Sparse Bayesian Learning Based DOA Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a recurrent scheme of waveform design by sequentially leveraging on the previous-round SBL estimates. |
L. Wu; J. Dai; B. S. M. R.; R. Hu; B. Ottersten; |
998 | Unimodular Waveform Design with Low Correlation Levels: A Fast Algorithm Development to Support Large-Scale Code Lengths Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our major contributions lie in the transformation of the objective into a proper form via multiple shift matrices and the reformulation of problem in order to use alternating direction method of multipliers (ADMM) technique. |
Y. Li; C. Shi; R. Tao; |
999 | Airborne MIMO Radar Transmit-Receive Design Under Spectral Constraint in Signal-Dependent Clutter Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the non-convex joint design problem, we develop an iterative algorithm based on iterative feasible point pursuit successive convex approximation (FPP-SCA). |
Z. Li; J. Shi; D. Wu; S. Shi; Q. Zhou; |
1000 | Weak Target Detection in Massive MIMO Radar Via An Improved Reinforcement Learning Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an improved RL based method to enhance the detection probability of weak targets. |
W. Zhai; X. Wang; M. S. Greco; F. Gini; |
1001 | RIS-Aided Monostatic MIMO Radar with Co-Located Antennas Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper considers a monostatic multiple-input multiple-output (MIMO) radar aided by a reconfigurable intelligent surface (RIS). |
S. Buzzi; E. Grossi; M. Lops; L. Venturino; |
1002 | Convolutional Beamspace Using IIR Filters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a variant which uses IIR instead of FIR filters in the convolutional layer. |
P. -C. Chen; P. P. Vaidyanathan; |
1003 | Rational Arrays for DOA Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper considers rational arrays, where ri are rational numbers. |
P. Kulkarni; P. P. Vaidyanathan; |
1004 | Localizing More Sources Than Sensors in Presence of Coherent Sources Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an algorithm that is shown to be able to localize more sources than sensors in presence of correlated or coherent sources without the knowledge of the source coherence structure. |
X. Chen; Z. Yang; |
1005 | Two-Snapshot DOA Estimation Via Hankel-Structured Matrix Completion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the problem of estimating the direction of arrival (DOA) using a sparsely sampled uniform linear array (ULA). |
M. Bokaei; S. Razavikia; A. Amini; S. Rini; |
1006 | A Novel Angular Estimation Method in The Presence of Nonuniform Noise Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel algorithm for direction-of-arrival (DOA) estimation in nonuniform sensor noise is developed. |
M. Esfandiari; S. A. Vorobyov; |
1007 | Partially Relaxed Orthogonal Least Squares Weighted Subspace Fitting Direction-of-Arrival Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The Partial Relaxation framework has recently been introduced to address the Direction-of-Arrival (DOA) estimation problem [1]–[3]. |
D. Schenck; K. Lübbe; M. Trinh-Hoang; M. Pesavento; |
1008 | A New Coprime-Array-based Configuration with Augmented Degrees of Freedom and Reduced Mutual Coupling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a new type of coprime-array-based structure, named AtCADiS, is proposed to achieve increased degrees of freedom (DoFs) and reduced mutual coupling. |
N. Mohsen; A. Hawbani; M. Agrawal; S. Alsamhi; L. Zhao; |
1009 | Coarray Manifold Separation In The Spherical Harmonics Domain For Enhanced Source Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce the concept of difference coarray to spherical arrays and propose an algorithm which utilises the increased degrees-of-freedom (DOF) provided by the virtual coarray sensors to perform enhanced source localization. |
S. K. Yadav; N. V. George; |
1010 | Sparse Array Source Enumeration Via Coarray Subspace Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes sparse array source enumeration via coarray subspace optimization (SASE-CSO). |
C. -L. Liu; |
1011 | The Prototype Co-Prime Array with A Robust Difference Co-Array Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new sparse co-prime array design that achieves a higher number of degrees-of-freedom for direction-of-arrival (DOA) estimation. |
A. M. A. Shaalan; J. Du; |
1012 | DOA Estimation Via Coarray Tensor Completion with Missing Slices Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a coarray tensor completion-based direction-of-arrival (DOA) estimation method is proposed for coprime planar array. |
H. Zheng; C. Zhou; A. L. F. de Almeida; Y. Gu; Z. Shi; |
1013 | Half Inverted Nested Arrays with Large Hole-Free Fourth-Order Difference Co-Arrays Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes the half inverted nested array (HINA), which consists of a nested array and an inverted, scaled, and shifted nested array. |
Y. -P. Chen; C. -L. Liu; |
1014 | Spherical Convolutional Recurrent Neural Network for Real-Time Sound Source Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a spherical convolutional recurrent neural network that utilizes Deepsphere, a graph-based spherical convolutional neural network, employing the steered response power with phase transform (SRP-PHAT) power maps as input features for real-time robust sound source DOA estimation and tracking applications. |
T. Zhong; I. M. Velázquez; Y. Ren; H. M. P. Meana; Y. Haneda; |
1015 | Audio-Visual Tracking of Multiple Speakers Via A PMBM Filter Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an Audio-Visual Poisson Multi-Bernoulli Mixture Filter (AV-PMBM) that can not only predict the number of speakers but also give accurate estimation of their states. |
J. Zhao; et al. |
1016 | Floor Plan Reconstruction with High-Precision RF-Based Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we leverage advanced RF-based inertial tracking and present a high-accuracy and low-cost floor plan reconstruction system. |
G. Zhu; C. Wu; B. Wang; K. J. Ray Liu; |
1017 | Partial Arithmetic Consensus Based Distributed Intensity Particle Flow SMC-PHD Filter for Multi-Target Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we extend the IPF-SMC-PHD filter to the distributed setting, and develop a novel consensus method for fusing the estimates from individual sensors, based on Arithmetic Average (AA) fusion. |
P. Wu; J. Zhao; S. Goudarzi; W. Wang; |
1018 | Multi-Modal Recurrent Fusion for Indoor Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper considers indoor localization using multi-modal wireless signals including Wi-Fi, inertial measurement unit (IMU), and ultra-wideband (UWB). By formulating the localization as a multi-modal sequence regression problem, a multi-stream recurrent fusion method is proposed to combine the current hidden state of each modality in the context of recurrent neural networks while accounting for the modality uncertainty which is directly learned from its own immediate past states. |
J. Yu; P. Wang; T. Koike-Akino; P. V. Orlik; |
1019 | Improving Joint Sparse Hyperspectral Unmixing By Simultaneously Clustering Pixels According To Their Mixtures Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel hierarchical Bayesian model for the sparse regression problem, for use in semi-supervised hyperspectral unmixing, which assumes the signal recorded in each hyperspectral pixel is a linear combination of members of the spectral library contaminated by additive Gaussian noise. |
S. F. Seyyedsalehi; H. R. Rabiee; |
1020 | Global Evolution Neural Network for Segmentation of Remote Sensing Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Secondly, attention-based networks often only focus on weighting different features of a single sample but ignore the correlation of all samples in the training set, thus leading to the loss of global information. To address the above issues, we propose two simple yet effective global evolution strategies. |
X. Geng; et al. |
1021 | Spectral-Spatial Symmetrical Aggregation Cross-Linking Multi-Modal Data Fusion Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a spectral-spatial symmetrical aggregation cross-linking network (SACLNet) is developed for multi-modal data classification, which contains three modules as follows. |
J. Wang; J. Li; X. Tan; |
1022 | Relation Discovery in Nonlinearly Related Large-Scale Settings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We put forth a novel method that utilizes a radial basis function (RBF) to tackle curse-of-dimensionality in complex systems. |
A. Vosoughi; A. DSouza; A. Abidin; A. Wismüller; |
1023 | Acoustic Imaging Aboard The International Space Station (ISS): Challenges and Preliminary Results Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present our preliminary results with a first-of-a-kind acoustic imaging experiment performed aboard the ISS, highlighting the difference between simulations, laboratory measurements, and in-space experiments. |
L. Bondi; et al. |
1024 | Conjugate Augmented Spatial-Temporal Near-Field Sources Localization with Cross Array Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A new near-field source localization method is proposed for two-dimensional (2-D) direction-of-arrival (DOA) and range estimation based on a symmetrical cross array. |
Z. Jiang; H. Chen; W. Liu; Y. Tian; G. Wang; |
1025 | Parametric Models for DOA Trajectory Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a signal model that captures the linear variations in DOA within a block. |
R. Pandey; S. Nannuru; |
1026 | Joint Source Localization and Association Through Overcomplete Representation Under Multipath Propagation Environment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By focusing on the limitation of the prior information in practical applications, we propose a target localization and association method based on iterative optimization with semi-unitary constraint and eigen-decomposition techniques. |
Y. Liu; Z. -W. Tan; A. W. H. Khong; H. Liu; |
1027 | Semidefinite Relaxation Method for Moving Object Localization Using A Stationary Transmitter at Unknown Position Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper addresses the multistatic localization of a moving object in position and velocity using time delay (TD) and Doppler frequency shift (DFS) measurements, where the position of the transmitter is unknown and has not yet been synchronized with the receivers. |
R. Zheng; G. Wang; K. C. Ho; L. Huang; |
1028 | Underdetermined Two-Dimensional Localization for Wideband Sources Based on Distributed Sensor Array Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the underdetermined two dimensional (2-D) source localization problem for wideband sources based on a distributed sensor array network, where a sparse sub-array is placed on each observation platform and the source number is larger than the sensor number of each sub-array. |
H. Wu; Q. Shen; W. Liu; Y. Liang; |
1029 | Direct Localization: An Ising Model Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we use a compressed sensing framework in the direct localization technique to estimate the location of a user in an indoor multipath environment. |
S. Akbari; S. Valaee; |
1030 | Transient Detection with Unknown Statistics Via Source Coding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we consider the scenario where an appropriate statistical description of our observations is not available, neither before nor after the transient we are trying to detect. |
A. Finelli; P. Willett; Y. Bar-Shalom; S. Marano; |
1031 | Identification of Pulse Streams Of Unknown Shape From Time Encoding Machine Samples Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an algorithm for the resolution of delayed and overlapping pulses of a common unknown shape from multi-channel measurements. |
M. Kalra; Y. Bresler; K. Lee; |
1032 | Exact Sparse Super-Resolution Via Model Aggregation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel Bayesian approach based on the model aggregation idea that can generate an exact sparse estimate, and maintain the required structures of the support. |
H. Yu; H. Qiao; |
1033 | A CRLB Analysis of AoA Estimation Using Bluetooth 5 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the performance of AoA estimation using Bluetooth 5 is not thoroughly understood at present. In this paper, a Cramér-Rao lower bound (CRLB) model, taking into account Constant Tone Extension (CTE) signals first adopted by Bluetooth 5, is proposed to theoretically analyze its performance given different uniform antenna arrays, such as linear, rectangular and circular arrays, and on these grounds, the effects of the number of antennas, inter antenna distance, incident angle, and CTE parameters on AoA estimation are carefully investigated. |
W. Shi; B. Huang; K. Sun; |
1034 | Cramer-Rao Bound Analysis of Distributed DOA Estimation Exploiting Mixed-Precision Covariance Matrix Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we analyze the Cramer-Rao bound of the distributed direction-of-arrival (DOA) estimation problem where the covariance matrix is formulated in a mixed-precision manner. |
M. W. T. S. Chowdhury; Y. D. Zhang; |
1035 | Cramér-Rao Bound and Antenna Selection Optimization for Dual Radar-Communication Design Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose antenna selection as a means to ensure a full-rank beamforming matrix, and also select the communication channels so that high communication rate can be achieved. |
Z. Xu; F. Liu; A. Petropulu; |
1036 | Information Theoretic Limits For Standard and One-Bit Compressed Sensing with Graph-Structured Sparsity Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we analyze the information theoretic lower bound on the necessary number of samples needed for recovering a sparse signal under different compressed sensing settings. |
A. Barik; J. Honorio; |
1037 | The Data/Identity Tradeoff with Censored Sensors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we formalize the test statistic based on censored and quantized data. |
Z. Sutton; P. Willett; S. Marano; |
1038 | Double-RIS Versus Single-RIS Aided Systems: Tensor-Based MIMO Channel Estimation and Design Perspectives Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider a double-RIS (D-RIS)-aided MIMO system and propose an alternating least-squares-based channel estimation method by exploiting the Tucker2 tensor structure of the received signals. |
K. Ardah; S. Gherekhloo; A. L. F. de Almeida; M. Haardt; |
1039 | Efficient Two-Stage Beam Training and Channel Estimation for RIS-Aided mmWave Systems Via Fast Alternating Least Squares Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a two-stage beam training and a channel estimation based on fast alternating least squares (FALS) for reconfigurable intelligent surface (RIS)-aided millimeter-wave systems. |
H. Chung; S. Kim; |
1040 | Deep Joint Source-Channel Coding for Wireless Image Transmission with Adaptive Rate Control Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present a novel adaptive deep joint source-channel coding (JSCC) scheme for wireless image transmission. |
M. Yang; H. -S. Kim; |
1041 | Deep Sequential Beamformer Learning for Multipath Channels in mmWave Communication Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We address multipath beamformer design via deep learning, which is increasingly used for channel estimation and end-to-end communication. |
A. Sant; A. Abdi; J. Soriaga; |
1042 | Data-Driven Optimization for Zero-Delay Lossy Source Coding with Side Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a data-driven architecture for zero-delay lossy source coding with side information (i.e., Wyner-Ziv coding) for sources with memory. |
E. Domanovitz; D. Severo; A. Khisti; W. Yu; |
1043 | Distributed Image Transmission Using Deep Joint Source-Channel Coding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle this, we need to consider the common information across two stereo images as well as the differences between two transmission channels. In this case, we propose a deep neural networks solution that includes lightweight edge encoders and a powerful center decoder. |
S. Wang; K. Yang; J. Dai; K. Niu; |
1044 | Adaptive Wireless Power Allocation with Graph Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of power control in wireless networks, consisting of multiple transmitter-receiver pairs communicating with each other over a single shared wireless medium. |
N. NaderiAlizadeh; M. Eisen; A. Ribeiro; |
1045 | Restless Multi-Armed Bandits Under Exogenous Global Markov Process Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics, where an unknown exogenous global Markov process governs the rewards distribution of each arm. |
T. Gafni; M. Yemini; K. Cohen; |
1046 | Byzantine-Robust and Communication-Efficient Distributed Non-Convex Learning Over Non-IID Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a compressed robust stochastic model aggregation (CRSA) method, which applies the idea of robust stochastic model aggregation to achieve Byzantine-robustness over non-IID data, while compressing the transmitted messages so as to achieve communication efficiency. |
X. He; H. Zhu; Q. Ling; |
1047 | Communication-Efficient Online Federated Learning Framework for Nonlinear Regression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As a solution, this paper presents a partial-sharing-based online federated learning framework (PSO-Fed) that enables clients to update their local models using continuous streaming data and share only portions of those updated models with the server. |
V. C. Gogineni; S. Werner; Y. -F. Huang; A. Kuh; |
1048 | S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To reduce the sparse gradient communication overhead, we propose Sparse-Sketch Reducer (S2 Reducer), a novel sketch-based sparse gradient aggregation method with convergence guarantees. |
K. Ge; Y. Fu; Y. Zhang; Z. Lai; X. Deng; D. Li; |
1049 | A Data-Driven Quantization Design for Distributed Testing Against Independence with Communication Constraints Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper studies the problem of designing a quantizer (encoder) for the task of distributed detection of independence subject to one-side communication (limited bits) constraints. |
S. Espinosa; J. F. Silva; P. Piantanida; |
1050 | Power Allocation for Wireless Federated Learning Using Graph Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose a data-driven approach for power allocation in the context of federated learning (FL) over interference-limited wireless networks. |
B. Li; A. Swami; S. Segarra; |
1051 | Federated Multi-Armed Bandit Via Uncoordinated Exploration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper focuses on overcoming the difficulty of exploration in FMAB problems, and it proposes a novel federated upper confidence bound (UCB) algorithm that requires uncoordinated exploration (UE) decisions by the agents. |
Z. Yan; Q. Xiao; T. Chen; A. Tajer; |
1052 | Byzantine-Resilient Decentralized Collaborative Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the decentralized learning problem over communication networks, in which worker nodes collaboratively train a machine learning model by exchanging model parameters with neighbors, but a fraction of nodes are corrupted by a Byzantine attacker and could conduct malicious attacks. |
J. Xu; S. -L. Huang; |
1053 | Adaptive Identification of Underwater Acoustic Channel with A Mix of Static and Time-Varying Parameters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a debiased fLBF algorithm which exploits the fact that only a part of the system parameters are time-varying. |
M. Niedzwiecki; A. Gancza; L. Shen; Y. Zakharov; |
1054 | Iterative Channel Estimation and Data Detection Algorithm For OTFS Modulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we design an iterative channel estimation and data detection algorithm in delay-Doppler domain for orthogonal time frequency space (OTFS) system by taking advantage of the sparse nature of the channel in this domain. |
R. Ouchikh; A. Aïssa-El-Bey; T. Chonavel; M. Djeddou; |
1055 | An Asymptotically Optimal Approximation of The Conditional Mean Channel Estimator Based on Gaussian Mixture Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper investigates a channel estimator based on Gaussian mixture models (GMMs). |
M. Koller; B. Fesl; N. Turan; W. Utschick; |
1056 | Low Complexity Equalization for AFDM In Doubly Dispersive Channels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Affine Frequency Division Multiplexing (AFDM), which is based on discrete affine Fourier transform (DAFT), has recently been proposed for reliable communication in high-mobility … |
A. Bemani; N. Ksairi; M. Kountouris; |
1057 | CSI Clustering with Variational Autoencoding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to use a variational autoencoder to group unlabeled channel state information with respect to the model order in the variational autoencoder latent space in an unsupervised manner. |
M. Baur; M. Würth; M. Koller; V. -C. Andrei; W. Utschick; |
1058 | Massive Unsourced Random Access Based on Bilinear Vector Approximate Message Passing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a new algorithmic solution to the massive unsourced random access (mURA) problem. |
R. Ayachi; M. Akrout; V. Shyianov; F. Bellili; A. Mezghani; |
1059 | Optimal QoS-Aware Network Slicing for Service-Oriented Networks with Flexible Routing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the network slicing problem which attempts to map multiple customized virtual network requests (also called services) to a common shared network infrastructure and allocate network resources to meet diverse quality of service (QoS) requirements. |
W. -K. Chen; Y. -F. Liu; Y. -H. Dai; Z. -Q. Luo; |
1060 | Byzantine-Resilient Decentralized Resource Allocation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Compared with its centralized counterpart, a decentralized algorithm enjoys better scalability when the network is large-scale, but is more vulnerable when some of the agents are malicious and send wrong messages during the optimization process. We use the Byzantine attack model to describe these malicious actions, and propose a novel Byzantine-resilient decentralized resource allocation algorithm, abbreviated as BREDA. |
R. Wang; Y. Liu; Q. Ling; |
1061 | Stability Analysis of Unfolded WMMSE for Power Allocation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Therefore, it is essential that the output power allocation of these algorithms is stable with respect to input perturbations, to the extent that the variations in the output are bounded for bounded variations in the input. In this paper, we focus on UWMMSE, a modern algorithm leveraging graph neural networks, and illustrate its stability to additive input perturbations of bounded energy through both theoretical analysis and empirical validation. |
A. Chowdhury; F. Gama; S. Segarra; |
1062 | Monotonic Generalized Nash Games with Application to The Management of Energy-Aware Aloha Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we analyze a family of monotonic generalized games (not necessarily convex). |
W. Wang; A. Leshem; |
1063 | Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: For wireless networks with dense connectivity, we propose a distributed scheme for link sparsification with graph convolutional networks (GCNs), which can reduce the scheduling overhead while keeping most of the network capacity. |
Z. Zhao; A. Swami; S. Segarra; |
1064 | A Performance Analysis for Multi-RIS-Assisted Full Duplex Wireless Communication System Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, an analytical framework of a RIS-aided full duplex (FD) communication network consisting of a FD-access point (AP) that communicates with an uplink and a down-link users simultaneously is provided. |
F. Karim; B. Hazarika; S. K. Singh; K. Singh; |
1065 | Joint Beam Selection and Precoding Based on Differential Evolution for Millimeter-Wave Massive MIMO Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a differential evolution (DE)-based beam selection algorithm and an improved QR precoder to reduce power consumption and increase the performance of the systems. |
Y. Liu; Y. Hou; J. Wei; Y. Zhang; J. Zhang; T. Zhang; |
1066 | A Novel Negative L1 Penalty Approach for Multiuser One-Bit Massive MIMO Downlink with PSK Signaling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an efficient negative l1 penalty approach for finding a high-quality solution of the considered problem. |
Z. Wu; B. Jiang; Y. -F. Liu; Y. -H. Dai; |
1067 | A Set-Theoretic Approach to MIMO Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a set-theoretic framework for MIMO detection. |
J. Fink; R. L. G. Cavalcante; Z. Utkovski; S. Stanczak; |
1068 | Designing A QAM Signal Detector for Massive MIMO Systems Via PS-ADMM Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents an efficient quadrature amplitude modulation (QAM) signal detector for massive multiple-input multiple-output (MIMO) communication systems via the penalty-sharing alternating direction method of multipliers (PS-ADMM). |
Q. Zhang; X. Zhao; J. Wang; Y. Wang; |
1069 | Power-Efficient Hybrid MIMO Receiver with Task-Specific Beamforming Using Low-Resolution ADCs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a power-efficient hybrid MIMO receiver with dedicated beamforming to mitigate spatial interferers in congested environments, utilizing low-quantization rate ADCs, jointly optimizing the analog and digital processing using task-specific quantization techniques. |
T. Zirtiloglu; N. Shlezinger; Y. C. Eldar; R. Tugce Yazicigil; |
1070 | MIMO Detection By Variational Posterior Inference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we examine the application of variational inference (VI) to MIMO detection. |
J. Liu; M. Shao; W. -K. Ma; |
1071 | Controlling Smart Propagation Environments: Long-Term Versus Short-Term Phase Shift Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We analyze and compare RIS designs based on long-term and short-term channel statistics in terms of coverage probability and ergodic rate. |
T. V. Chien; et al. |
1072 | Deep Actor-Critic for Continuous 3D Motion Control in Mobile Relay Beamforming Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a model-free, continuous control actor-critic approach that can be easily applied to 2D and 3D motion with the same complexity. |
S. Evmorfos; A. P. Petropulu; |
1073 | Aerial Base Station Placement Leveraging Radio Tomographic Maps Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A convex optimization approach is presented to minimize the number of required ABSs while ensuring that the UAVs do not enter no-fly regions. |
D. Romero; P. Q. Viet; G. Leus; |
1074 | Atomic Norm Based Localization and Orientation Estimation for Millimeter-Wave MIMO OFDM Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Herein, an atomic norm based method for accurately estimating the location and orientation of a target from millimeter-wave multi-input-multi-output (MIMO) orthogonal frequency-division multiplexing (OFDM) signals is presented. |
J. Li; M. F. Da Costa; U. Mitra; |
1075 | Estimation Of Channels In Systems With Intelligent Reflecting Surfaces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider channel estimation for systems equipped with an intelligent reflecting surface (IRS). |
M. Joham; H. Gao; W. Utschick; |
1076 | Distributed Hybrid Beamforming for mmWave Cell-Free Massive MIMO Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a distributed hybrid beamforming concept for mmWave cell-free massive MIMO systems, which supports multiantenna or even hybrid array based User Equipments (UEs) and multi-stream transmissions. |
N. Song; T. Yang; |
1077 | Quantization-Aware Precoding For MU-MIMO With Limited-Capacity Fronthaul Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study an AAS used for precoded down-link transmission over a multi-user multiple-input multiple-output (MU-MIMO) channel. |
Y. Khorsandmanesh; E. Björnson; J. Jaldén; |
1078 | An Online Throughput Maximization Algorithm for Green Coordinated Multi-Point Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work incorporates the on-grid energy into a green coordinated multi-point (CoMP) system to handle the volatile arrival of green energy. |
Y. Dong; H. Zhang; J. Li; F. R. Yu; S. Guo; V. C. M. Leung; |
1079 | Efficiently and Globally Solving Joint Beamforming and Compression Problem in The Cooperative Cellular Network Via Lagrangian Duality Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The proposed algorithm judiciously exploits the special structure of the Karush-Kuhn-Tucker (KKT) conditions of the considered problem and finds the solution that satisfies the KKT conditions via two fixed-point iterations. |
X. Fan; Y. -F. Liu; L. Liu; |
1080 | Cell-Free Massive MIMO: Exploiting The WAX Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The results on WAX decomposition are only available for random channel matrices without specific structures, while in a cell-free massive MIMO scenario the channel can have sparse structures. In this work, we study the applicability of WAX decomposition to cell-free massive MIMO with its implications to the above-mentioned trade-off. |
J. V. Alegría; J. Huang; F. Rusek; |
1081 | Learning Structured Sparsity For Time-Frequency Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Aiming at eliminating CTs meanwhile preserving resolution as high as possible in various hard cases, we propose a new U-Net aided iterative shrinkage-thresholding algorithm (U-ISTA), where unfolded ISTA with structure-aware thresholds is exploited to reconstruct near-ideal TFD. |
L. Jiang; H. Zhang; L. Yu; |
1082 | Unlimited Sampling with Sparse Outliers: Experiments with Impulsive and Jump or Reset Noise Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such a noise model has not been considered in literature and can lead to the breakdown of the conventional recovery methods. To overcome this problem, we present a mathematically guaranteed algorithm that is based on spectral estimation. |
A. Bhandari; |
1083 | Learning Approach For Fast Approximate Matrix Factorizations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We derive a closed-form formulation for the gradient of the training problem, enabling us to use efficient gradient-based algorithms. |
H. Yu; Z. Qin; Z. Zhu; |
1084 | Parameter Estimation in Sparse Inverse Problems Using Bernoulli-Gaussian Prior Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Sparse coding is now one of the state-of-art approaches for solving inverse problems. In combination with (Fast) Iterative Shrinkage Thresholding Algorithm (ISTA), among other … |
P. Barbault; M. Kowalski; C. Soussen; |
1085 | Sparse Recovery of Acoustic Waves Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a general model for acoustic wave decomposition (AWD) on a rigid surface for a general microphone array configuration. |
M. Mansour; |
1086 | Nonlinear Signal Decomposition Based on Block Sparse Approximation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose here a nonlinear decomposition approach that properly separates the frequency content of signals. |
E. H. S. Diop; K. Skretting; |
1087 | Block-Activated Algorithms For Multicomponent Fully Nonsmooth Minimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By contrast, in the fully nonsmooth case, few block-activated methods are available and little effort has been devoted to assessing them. Our goal is to shed more light on the implementation, the features, and the behavior of these algorithms, compare their merits, and provide machine learning and image recovery experiments illustrating their performance. |
M. N. Bùi; P. L. Combettes; Z. C. Woodstock; |
1088 | Block-Coordinate Frank-Wolfe Algorithm And Convergence Analysis For Semi-Relaxed Optimal Transport Problem Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As a superior alternative, we propose a fast block-coordinate Frank-Wolfe (BCFW) algorithm for a convex semi-relaxed OT. |
T. Fukunaga; H. Kasai; |
1089 | An Implicit Gradient-Type Method for Linearly Constrained Bilevel Problems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we develop an implicit gradient-type (IG-AL) algorithm for bilevel optimization with strongly convex linear inequality constrained lower-level problems. |
I. Tsaknakis; P. Khanduri; M. Hong; |
1090 | Screen & Relax: Accelerating The Resolution Of Elastic-Net By Safe Identification of The Solution Support Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a procedure to accelerate the resolution of the well-known Elastic-Net problem. |
T. Guyard; C. Herzet; C. Elvira; |
1091 | Node-Screening Tests For The L0-Penalized Least-Squares Problem Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel screening methodology to safely discard irrelevant nodes within a generic branch-and-bound (BnB) algorithm solving the l0-penalized least-squares problem. |
T. Guyard; C. Herzet; C. Elvira; |
1092 | Proximal-Based Adaptive Simulated Annealing for Global Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a new SA approach that selects the cooling schedule on the fly. |
T. Guilmeau; E. Chouzenoux; V. Elvira; |
1093 | Graphon-Aided Joint Estimation of Multiple Graphs Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We consider the problem of estimating the topology of multiple networks from nodal observations, where these networks are assumed to be drawn from the same (unknown) random graph model. |
M. Navarro; S. Segarra; |
1094 | Exploring Deeper Graph Convolutions for Semi-Supervised Node Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work introduces a simple yet effective idea of feature gating over graph convolution layers to facilitate deeper graph neural networks and address oversmoothing. |
A. Tiwari; R. Das; S. Raman; |
1095 | Dynamic Portfolio Cuts: A Spectral Approach to Graph-Theoretic Diversification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Traditional methods for estimating asset-return covariance operate under the assumption of statistical time-invariance, and are thus unable to appropriately infer the underlying structure of the market graph. To this end, this work introduces a class of graph spectral estimators which cater for the nonstationarity inherent to asset price movements, as a basis to represent the time-varying interactions between assets through a dynamic spectral market graph. |
A. Arroyo; B. Scalzo; L. Stankovic; D. P. Mandic; |
1096 | Stability of Neural Networks on Manifolds to Relative Perturbations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hence, in this paper, we analyze the stability properties of convolutional neural networks on manifolds to understand the stability of GNNs on large graphs. |
Z. Wang; L. Ruiz; A. Ribeiro; |
1097 | Ada-STNet: A Dynamic AdaBoost Spatio-Temporal Network for Traffic Flow Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an AdaBoost Spatio-temporal Network (Ada-STNet). |
J. Sun; J. Li; C. Wu; Z. Tang; C. Wu; |
1098 | Label Propagation Across Graphs: Node Classification Using Graph Neural Tangent Kernels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Under the implicit assumption that the testing and training graphs come from similar distributions, our goal is to develop a labeling function that generalizes to unobserved connectivity structures. |
A. Bayer; A. Chowdhury; S. Segarra; |
1099 | Distributed Particle Filters for State Tracking on The Stiefel Manifold Using Tangent Space Statistics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a novel distributed diffusion algorithm for tracking the state of a dynamic system that evolves on the Stiefel manifold. |
C. J. Bordin; C. G. de Figueredo; M. G. S. Bruno; |
1100 | Human Decision Making with Bounded Rationality Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, instead of assuming that a human selects the optimal action with probability one, we employ a bounded rationality choice model where all the actions are candidates for selection, but better options are chosen with higher probabilities. |
B. Geng; Q. Li; P. K. Varshney; |
1101 | Unrolling Particles: Unsupervised Learning of Sampling Distributions Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Leveraging the framework of algorithm unrolling, we model the sampling distribution as a multivariate normal, and we use neural networks to learn both the mean and the covariance. |
F. Gama; N. Zilberstein; R. G. Baraniuk; S. Segarra; |
1102 | Scalable Data Association and Multi-Target Tracking Under A Poisson Mixture Measurement Process Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Results show that our proposed method can provide a robust solution in highly dynamic detection probability environments. |
Q. Li; J. Liang; S. Godsill; |
1103 | Online Learning for Latent Yule-Simon Processes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we focus on learning the properties of unobservable Yule-Simon processes which control the dynamics of sequential sensor measurements. |
A. A. Hensley; P. M. Djuric; |
1104 | Counting The Number of Different Scaling Exponents in Multivariate Scale-Free Dynamics: Clustering By Bootstrap in The Wavelet Domain Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Elaborating on earlier work aiming to test the hypothesis that all exponents are equal, we intend here to count the number of different scaling exponents from a single finite size multivariate time series. |
C. -G. Lucas; P. Abry; H. Wendt; G. Didier; |
1105 | On The Acquisition of Stationary Signals Using Uniform ADCs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we consider the acquisition of stationary signals using uniform analog-to-digital converters (ADCs), i.e., employing uniform sampling and scalar uniform quantization. |
P. Neuhaus; N. Shlezinger; M. Dörpinghaus; Y. C. Eldar; G. Fettweis; |
1106 | Data-Driven Algorithms for Gaussian Measurement Matrix Design in Compressive Sensing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we provide two data-driven algorithms for learning compressive sensing measurement matrices with Gaussian entries. |
Y. Sun; J. Scarlett; |
1107 | Scattering Statistics of Generalized Spatial Poisson Point Processes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a machine learning model for the analysis of randomly generated discrete signals, modeled as the points of an inhomogeneous, compound Poisson point process. |
M. Perlmutter; J. He; M. Hirn; |
1108 | Regularization Using Denoising: Exact and Robust Signal Recovery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For example, it is not known if PnP can in theory recover a signal from few noiseless measurements as in classical compressed sensing and if the recovery is robust. We explore these questions in this work and present some theoretical and experimental results. |
R. G. Gavaskar; K. N. Chaudhury; |
1109 | Graph-Structured Sparse Regularization Via Convex Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel convex method for the graph-structured sparse recovery. |
H. Kuroda; D. Kitahara; |
1110 | Decentralized Bilevel Optimization for Personalized Client Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a decentralized client adaptation strategy for personalized learning by taking local client data structures into account. |
S. Lu; X. Cui; M. S. Squillante; B. Kingsbury; L. Horesh; |
1111 | Extreme-Point Pursuit for Unit-Modulus Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In our recent study for massive MIMO precoding, we devised a penalty method for unit-modulus optimization. In this paper, we revisit this penalty method in several other unit-modulus applications in signal processing. |
M. Shao; Q. Dai; W. -K. Ma; |
1112 | Generalized Matching Pursuits for The Sparse Optimization of Separable Objectives Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we are concerned with the case of a general objective that is separable over observed data points, which encompasses most problems of practical interest: least-squares, logistic, and robust regression problems, and the class of generalized linear models. We propose efficient generalizations of Forward and Backward Stepwise Regression for this case, which take advantage of special structure in the Hessian matrix and are based on a locally quadratic approximation of the objective. |
S. Ament; C. Gomes; |
1113 | Delta Distancing: A Lifting Approach to Localizing Items from User Comparisons Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the problem of embedding a set of items using paired comparisons from a set of known users. |
A. D. McRae; et al. |
1114 | Dual Path Graph Convolutional Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To enjoy the benefits from both residual connections and dense connections and compensate for the drawbacks from each other, we propose Dual Path Graph Convolutional Networks (DPGCNs), which exploit a new topology of connection paths internally. |
Y. Li; Y. Hu; Y. Zhang; |
1115 | On The Stability of Low Pass Graph Filter with A Large Number of Edge Rewires Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Assuming the graph filter is low pass, we show that the stability of the filter depends on perturbation to the community structure. |
H. -S. Nguyen; Y. He; H. -T. Wai; |
1116 | Spatio-Temporal Graph Complementary Scattering Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To benefit from both sides, this work proposes a novel complementary mechanism to organically combine the spatio-temporal graph scattering transform and neural networks, resulting in the proposed spatio-temporal graph complementary scattering networks (ST-GCSN). |
Z. Cheng; S. Chen; Y. Zhang; |
1117 | Convolutional Filtering in Simplicial Complexes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes convolutional filtering for data whose structure can be modeled by a simplicial complex (SC). |
E. Isufi; M. Yang; |
1118 | Annihilation Filter Approach for Estimating Graph Dynamics from Diffusion Processes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an approach for estimating graph diffusion processes using annihilation filters from a finite set of observations of the diffusion process made at regular intervals. |
A. Venkitaraman; P. Frossard; |
1119 | Learning Gaussian Graphical Models with Differing Pairwise Sample Sizes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Incorporating a projection step in the estimation of the covariance matrix, we develop a convex method for learning the neighborhood of any node. |
L. Zheng; G. I. Allen; |
1120 | R-Local Unlabeled Sensing: Improved Algorithm and Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a proximal alternating minimization algorithm for the general unlabeled sensing problem that provably converges to a first order stationary point. |
A. A. Abbasi; A. Tasissa; S. Aeron; |
1121 | Federated Over-Air Robust Subspace Tracking from Missing Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we study RST-miss to the setting where the data is federated and when the over-air data communication modality is used for information exchange between the K peer nodes and the central server. |
P. Narayanamurthy; N. Vaswani; A. Ramamoorthy; |
1122 | On Continuous-Domain Inverse Problems with Sparse Superpositions of Decaying Sinusoids As Solutions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study a family of inverse problems in which a continuous-domain object is reconstructed from a finite number of noisy linear measurements. We study regularization methods for solving these problems in which the regularizers promote sparsity in the frequency domain. |
R. Parhi; R. D. Nowak; |
1123 | Multiplication-Avoiding Variant of Power Iteration with Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce multiplication-avoiding power iteration (MAPI). |
H. Pan; D. Badawi; R. Miao; E. Koyuncu; A. E. Cetin; |
1124 | Bona Fide Riesz Projections for Density Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on Riesz bases and propose a projection operator that, in contrast to previous works, guarantees the bona fide properties for the estimate, namely, non-negativity and total probability mass 1. |
P. D. Aguila Pla; M. Unser; |
1125 | Blind Modulo Analog-to-Digital Conversion of Vector Processes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As an alternative, the recently proposed modulo-ADC architecture can in principle require dramatically fewer bits in the conversion to obtain the target fidelity, but requires that spatiotemporal information be known and explicitly taken into account by the analog and digital processing in the converter, which is frequently impractical. Building on our recent work, we address this limitation and develop a blind version of the architecture that requires no such knowledge in the converter. |
A. Weiss; E. Huang; O. Ordentlich; G. W. Wornell; |
1126 | Joint Radar-Communications Processing from A Dual-Blind Deconvolution Perspective Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we exploit the sparsity of the channel to solve DBD by casting it as an atomic norm minimization problem. |
E. Vargas; K. V. Mishra; R. Jacome; B. M. Sadler; H. Arguello; |
1127 | Fast Multiscale Diffusion On Graphs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Considering the truncated Chebyshev polynomial approximation of the exponential, we derive a tightened bound on the approximation error, allowing thus for a better estimate of the polynomial degree that reaches a prescribed error. |
S. Marcotte; et al. |
1128 | Adaptive Variational Nonlinear Chirp Mode Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Variational nonlinear chirp mode decomposition (VNCMD) is a recently introduced method for nonlinear chirp signal decomposition that has aroused notable attention in various … |
H. Liang; X. Ding; A. Jakobsson; X. Tu; Y. Huang; |
1129 | Differentiate-and-Fire Time-Encoding of Finite-Rate-of-Innovation Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a new time-encoding machine, namely, differentiate-and-fire time-encoding machine (DIF-TEM) inspired by the functioning of the human visual system. |
A. J. Kamath; C. S. Seelamantula; |
1130 | Graph Learning Information Criterion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a parameter selection method for graph learning. |
K. Yamada; Y. Tanaka; |
1131 | Embedding Signals on Graphs with Unbalanced Diffusion Earth Mover's Distance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. |
A. Tong; et al. |
1132 | Message Passing-Based Cooperative Localization with Embedded Particle Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in large and dense cooperative localization networks, particle-based BP suffers from particle degeneracy. In this paper, we propose a new method that combines particle-based BP and particle flow (PF) and can avoid this detrimental effect. |
L. Wielandner; E. Leitinger; F. Meyer; B. Teague; K. Witrisal; |
1133 | A Framework for Private Communication with Secret Block Structure Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Harnessing a block-sparse prior to recover signals through underdetermined linear measurements has been extensively shown to allow exact recovery in conditions where classical compressed sensing would provably fail. We exploit this result to propose a novel private communication framework where the secrecy is achieved by transmitting instances of an unidentifiable compressed sensing problem over a public channel. |
M. F. Da Costa; U. Mitra; |
1134 | LMS and NLMS Algorithms for The Identification of Impulse Responses with Intrinsic Symmetric or Antisymmetric Properties Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In applications involving system identification problems, some characteristics of the impulse response of the system to be identified are usually exploited to design adaptive algorithms with improved performance. |
J. Benesty; C. Paleologu; S. Ciochina; E. V. Kuhn; K. J. Bakri; R. Seara; |
1135 | Decentralized Learning in The Presence of Low-Rank Noise Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Starting from the centralized solution, we propose an algorithm that performs the oblique projection of the overall set of observations onto the signal subspace in an iterative and distributed manner. |
R. Nassif; V. Bordignon; S. Vlaski; A. H. Sayed; |
1136 | Adaptive Diffusion with Compressed Communication Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the ACTC (Adapt-Compress-Then-Combine) diffusion strategy, which leverages differential randomized compression to infuse the classical ATC strategy with the ability to handle compressed data. |
M. Carpentiero; V. Matta; A. H. Sayed; |
1137 | Joint Centrality Estimation and Graph Identification from Mixture of Low Pass Graph Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a mixture model of low pass filtered graph signals. |
Y. He; H. -T. Wai; |
1138 | Fairness-Aware Selective Sampling on Attributed Graphs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, existing online learning and selective sampling algorithms are modified to be used with graphs that have nodal features. |
O. D. Kose; Y. Shen; |
1139 | A Simple Graph Neural Network Via Layer Sniffer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In light of this, we design a Layer Sniffer module that can combine the effects of the local node-level representation closeness extent and the global layer-level information attention. On this basis, we propose a simple Layer Sniffer Graph Neural Network (LSGNN) with a propagation scheme that can fuse neighborhood information of different receptive fields densely and adaptively. |
D. Zeng; L. Zhou; W. Liu; H. Qu; W. Chen; |
1140 | New Improved Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a refined criterion called EBICR, where the "R" stands for robust. |
P. B. Gohain; M. Jansson; |
1141 | On The Use of Geodesic Triangles Between Gaussian Distributions for Classification Problems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a new classification framework for both first and second order statistics, i.e. mean/location and covariance matrix. |
A. Collas; F. Bouchard; G. Ginolhac; A. Breloy; C. Ren; J. -P. Ovarlez;
1142 | A Non-Convex Proximal Approach for Centroid-Based Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel variational approach for supervised classification based on transform learning. |
M. -H. Kahanam; L. Le-Brusquet; S. Martin; J. -C. Pesquet; |
1143 | Extending The Use of MDL for High-Dimensional Problems: Variable Selection, Robust Fitting, and Additive Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper demonstrates that MDL can be extended naturally to the high-dimensional setting, where the number of predictors p is larger than the number of observations n. |
Z. Wei; R. K. W. Wong; T. C. M. Lee; |
1144 | Clustering Complex Subspaces in Large Dimensions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In order to guarantee that distances between subspaces of different dimensions are comparable, we proposed to normalise the corresponding decision statistics with respect to their asymptotic mean and variance, assuming that (i) the dimensions of both the observation and the involved subspaces are large but comparable in magnitude and (ii) both subspaces are generated by the same statistical law. |
R. Pereira; X. Mestre; D. Gregoratti; |
1145 | Robust Classification with Flexible Discriminant Analysis in Heterogeneous Data Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Linear and Quadratic Discriminant Analysis are well-known classical methods but can heavily suffer from non-Gaussian distributions and/or contaminated datasets, mainly because of the underlying Gaussian assumption that is not robust. To fill this gap, this paper presents a new robust discriminant analysis where each data point is drawn by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter. |
P. Houdouin; A. Wang; M. Jonckheere; F. Pascal; |
1146 | Residual Recovery Algorithm for Modulo Sampling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a robust algorithm to recover the signal from the modulo samples which operates at lower sampling rate compared to existing techniques. |
E. Azar; S. Mulleti; Y. C. Eldar; |
1147 | Operator Formulation for Linear Transformations and Signal Estimation in The Joint Spatial-Slepian Domain Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an operator formulation for linear transformations in the joint spatial-Slepian domain, which is enabled by the spatial-Slepian transform. |
A. Aslam; Z. Khalid; |
1148 | Sampling Set Selection for Graph Signals Under Arbitrary Signal Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a sampling set selection method for graph signals under arbitrary signal priors. |
J. Hara; Y. Tanaka; |
1149 | Determining Joint Periodicities in Multi-Time Data with Sampling Uncertainties Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a novel approach for determining a joint sparse spectrum from several non-uniformly sampled data sets, where each data set is assumed to have its own, and only partially known, sampling times. |
D. Svedberg; F. Elvander; A. Jakobsson; |
1150 | Unlimited Sampling with Local Averages Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By incorporating a modulo-hysteresis model, both in theory and in hardware, we present a guaranteed recovery algorithm for input reconstruction. |
D. Florescu; A. Bhandari; |
1151 | Modulo Event-Driven Sampling: System Identification and Hardware Experiments Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The goal of this work is to bridge the gap between theory and practice for a MEDS model. |
D. Florescu; A. Bhandari; |
1152 | Point-Mass Filter with Decomposition of Transient Density Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a novel functional decomposition of the transient density describing the system dynamics is proposed. |
P. Tichavský; O. Straka; J. Duník;
1153 | Cramer-Rao Bound for The Time-Varying Poisson Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we consider a periodic time-varying Poisson model and develop the asymptotic Cramer-Rao bound. |
X. Rong; V. Solo; |
1154 | Model Selection Via Misspecified Cramér-Rao Bound Minimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to use the misspecified Cramér-Rao bound (MCRB) as a criterion for model selection. |
N. E. Rosenthal; J. Tabrikian; |
1155 | Robust Parameter Estimation Based on The K-Divergence Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present a new divergence, called $\mathcal{K}$-divergence, that involves a weighted version of the hypothesized log-likelihood function. |
Y. Sorek; K. Todros; |
1156 | A Convex Formulation for The Robust Estimation of Multivariate Exponential Power Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As an alternative, this paper presents a novel convex formulation for robustly estimating MEP parameters in the presence of multiplicative perturbations. |
N. Ouzir; J. -C. Pesquet; F. Pascal; |
1157 | Conditionally Factorized Variational Bayes with Importance Sampling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a conditionally factorized variational family with an adjustable conditional structure and derive the corresponding coordinate ascent algorithm for optimization. |
R. Gan; S. Godsill; |
1158 | On The False Alarm Probability of The Normalized Matched Filter for Off-Grid Target Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Off-grid targets are known to induce a mismatch that dramatically impacts the detection probability of the popular Normalized Matched Filter. To overcome this problem, the unknown … |
P. Develter; J. Bosse; O. Rabaste; P. Forster; J. -P. Ovarlez;
1159 | A Two-Stream Information Fusion Approach to Abnormal Event Detection in Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to explicitly address different kinds of abnormal events by developing a two-stream fusion approach that integrates both geometry and image texture information. |
Y. Yang; Z. Fu; S. M. Naqvi; |
1160 | A Test for Conditional Correlation Between Random Vectors Based on Weighted U-Statistics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By avoiding determinants and inverses, the method presented displays promising robustness in small-sample regimes. |
M. Vilà; J. Riba;
1161 | Semi-Supervised Standardized Detection of Periodic Signals with Application to Exoplanet Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a numerical methodology for detecting periodicities in unknown colored noise and for evaluating the "significance levels" (p-values) of the test statistics. |
S. Sulis; D. Mary; L. Bigot; |
1162 | Joint Normality Test Via Two-Dimensional Projection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The case of dependent samples has also been addressed, but only for scalar random processes. For this reason, we have proposed a joint normality test for multivariate time-series, extending Mardia's Kurtosis test. |
S. ElBouch; O. J. J. Michel; P. Comon; |
1163 | Quickest Detection of Composite and Non-Stationary Changes with Application to Pandemic Monitoring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: The problem of quickest detection of a change in the distribution of a sequence of independent observations is considered. The prechange distribution is assumed to be known and … |
Y. Liang; V. V. Veeravalli; |
1164 | A Stimuli-Relevant Directed Dependency Index for Time Series Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we are interested in quantifying the effect that a given time series (e.g., an external stimulus) has upon the coupling strength between other time series. |
P. S. Baboukani; S. Theodoridis; J. Østergaard;
1165 | Joint Inference of Multiple Graphs with Hidden Variables from Stationary Graph Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: First, many contemporary setups involve multiple related networks, and second, it is often the case that only a subset of nodes is observed while the rest remain hidden. Motivated by these facts, we introduce a joint graph topology inference method that models the influence of the hidden variables. |
S. Rey; A. Buciulea; M. Navarro; S. Segarra; A. G. Marques; |
1166 | Sparse-Group Log-Sum Penalized Graphical Model Learning For Time Series Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the use of a sparse-group log-sum penalty (LSP) instead of a sparse-group lasso penalty. |
J. K. Tugnait; |
1167 | Wide-Sense Stationarity and Spectral Estimation for Generalized Graph Signal Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider a probabilistic model for graph signal processing (GSP) in a generalized framework where each vertex of a graph is associated with an element from a Hilbert space. |
X. Jian; W. P. Tay; |
1168 | Blind Extraction of Equitable Partitions from Graph Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we study a blind identification problem in which we aim to recover an equitable partition of a network without the knowledge of the network's edges but based solely on the observations of the outputs of an unknown graph filter. |
M. Scholkemper; M. T. Schaub; |
1169 | Learning Sparse Graphs with A Core-Periphery Structure Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on learning sparse graphs with a core-periphery structure. |
S. Gurugubelli; S. P. Chepuri; |
1170 | Optimal Combination Policies for Adaptive Social Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper investigates the effect of combination policies on the performance of adaptive social learning in non-stationary environments. |
P. Hu; V. Bordignon; S. Vlaski; A. H. Sayed;
1171 | Seismic Fault Identification Using Graph High-Frequency Components As Input to Graph Convolutional Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Using graph high-frequency components as inputs to a graph convolutional network, we propose a method for detecting faults in seismic data. |
P. Palo; A. Routray; |
1172 | Distributed Graph Learning With Smooth Data Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose here a novel distributed graph learning algorithm, which makes it possible to infer a graph from signal observations on the nodes under the assumption that the data is smooth on the target graph. |
I. C. M. Nobre; M. El Gheche; P. Frossard; |
1173 | AdverSparse: An Adversarial Attack Framework for Deep Spatial-Temporal Graph Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These models have demonstrated superior performance in various tasks. In this paper, we propose a sparse adversarial attack framework AdverSparse to illustrate that when only a few key connections are removed in such graphs, hidden spatial dependencies learned by such spatial-temporal models are significantly impacted, leading to various issues such as increasing prediction errors. |
J. Li; T. Zhang; S. Jin; M. Fardad; R. Zafarani; |
1174 | Multimodal Graph Signal Denoising Via Twofold Graph Smoothness Regularization with Deep Algorithm Unrolling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a denoising method of multimodal graph signals with twofold smoothness regularization. |
M. Nagahama; Y. Tanaka; |
1175 | Heterogeneous Graph Node Classification With Multi-Hops Relation Features Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Heterogeneous Attention (HAT) algorithm and use both node-based and path-based attention mechanisms to learn various types of nodes and edges on the KG. |
X. Xu; L. Lyu; H. Jin; W. Wang; S. Jia; |
1176 | Signal Recovery from Inconsistent Nonlinear Observations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address problems with inaccurate measurements, we propose solving a variational inequality relaxation which is guaranteed to possess solutions under mild conditions and which coincides with the original problem if it happens to be consistent. |
P. L. Combettes; Z. C. Woodstock; |
1177 | Perfect Reconstruction of Classes of Non-Bandlimited Signals from Projections with Unknown Angles Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the 2D tomography problem for a finite number of point sources, where the line integral projections are taken at unknown angles. |
R. Wang; R. Alexandru; P. L. Dragotti; |
1178 | Short-and-Sparse Deconvolution Via Rank-One Constrained Optimization (ROCO) Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we formulate SaSD as a non-convex optimization with a rank-one matrix constraint, hence referred to as Rank-One Constrained Optimization (ROCO). |
C. Cheng; W. Dai; |
1179 | Blind Equalization of Moving Average Channels Over Galois Fields Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We derive two different estimation approaches: one is based on sequential identification of factors of the channel's associated polynomial; the other is based on an attempted factorization of the empirical characteristic function of the channel's output signal. |
A. Yeredor; |
1180 | Sparse Subspace Tracking in High Dimensions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We studied the problem of sparse subspace tracking in the high-dimensional regime where the dimension is comparable to or much larger than the sample size. |
L. T. Thanh; K. Abed-Meraim; A. Hafiane; N. L. Trung; |
1181 | How Can A Cognitive Radar Mask Its Cognition? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our approach in this paper is based on revealed preference theory in microeconomics for identifying rationality. |
K. Pattanayak; V. Krishnamurthy; C. Berry; |
1182 | RTSNet: Deep Learning Aided Kalman Smoothing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work we propose RTSNet, a highly efficient model-based and data-driven smoothing algorithm. |
X. Ni; G. Revach; N. Shlezinger; R. J. G. van Sloun; Y. C. Eldar; |
1183 | Generalized Autocorrelation Analysis for Multi-Target Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Previous works proposed an autocorrelation analysis framework to estimate the signal directly from the measurement, without detecting signal occurrences. |
Y. Shalit; R. Weber; A. Abas; S. Kreymer; T. Bendory; |
1184 | Approximating The Likelihood Ratio in Linear-Gaussian State-Space Models for Change Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on the linear-Gaussian (LG) SSM, in which the LR-based methods require running a Kalman filter for every candidate change point. |
K. Tsampourakis; V. Elvira; |
1185 | Learning Expanding Graphs for Signal Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose a stochastic attachment model for incoming nodes parameterized by the attachment probabilities and edge weights. |
B. Das; E. Isufi; |
1186 | Hodgelets: Localized Spectral Representations of Flows On Simplicial Complexes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We develop wavelet representations for edge-flows on simplicial complexes, using ideas rooted in combinatorial Hodge theory and spectral graph wavelets. |
T. M. Roddenberry; F. Frantzen; M. T. Schaub; S. Segarra; |
1187 | Recovery of Graph Signals From Sign Measurements Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, the reconstruction of bandlimited graph signals based on sign measurements is discussed and a greedy sampling strategy is proposed. |
W. Liu; H. Feng; K. Wang; F. Ji; B. Hu; |
1188 | Edge Sampling of Graphs Based on Edge Smoothness Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we tackle the problem based on sampling theory on graphs. |
K. Yanagiya; K. Yamada; Y. Katsuhara; T. Takatani; Y. Tanaka; |
1189 | WLS Design of ARMA Graph Filters Using Iterative Second-Order Cone Programming Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a weighted least-square (WLS) method to design autoregressive moving average (ARMA) graph filters. |
D. Pakiyarajah; C. U. S. Edussooriya; |
1190 | Linear-Time Sampling on Signed Graphs Via Gershgorin Disc Perfect Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that for datasets with inherent strong anti-correlations, a suitable graph structure is instead a signed graph with both positive and negative edge weights, and in response, we propose a linear-time signed graph sampling method. |
C. Dinesh; S. Bagheri; G. Cheung; I. V. Bajic; |
1191 | Privacy-Preserving Federated Multi-Task Linear Regression: A One-Shot Linear Mixing Approach Inspired By Graph Regularization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by graph regularization, we propose a novel fusion framework that only requires a one-shot communication of local estimates. |
H. Lee; A. L. Bertozzi; J. Kovacevic; Y. Chi; |
1192 | ECO-FedSplit: Federated Learning with Error-Compensated Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents ECO-FedSplit, an algorithm that increases the communication efficiency of federated learning without sacrificing solution accuracy. |
S. Khirirat; S. Magnússon; M. Johansson;
1193 | A Time Encoding Approach to Training Spiking Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we provide an extra tool to help us understand and train SNNs by using theory from the field of time encoding. |
K. Adam; |
1194 | Transient Analysis of Clustered Multitask Diffusion RLS Algorithm Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel clustered multitask diffusion RLS (MT-DRLS) algorithm over networks to further improve the performance of its counterpart, the multitask diffusion LMS (MT-DLMS) algorithm. |
W. Gao; J. Chen; C. Richard; W. Shi; Q. Zhang; |
1195 | Improving Inference for Spatial Signals By Contextual False Discovery Rates Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel method to identify areas where the signal behaves interestingly, anomalously, or simply differently from what is expected. |
M. Gölz; A. M. Zoubir; V. Koivunen;
1196 | Estimation of The Admittance Matrix in Power Systems Under Laplacian and Physical Constraints Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, our goal is to estimate the network admittance matrix, i.e. to learn connectivity and edge weights in the graph representation, under physical and Laplacian constraints. |
M. Halihal; T. Routtenberg; |
1197 | Incipient Fault Severity Estimation Using Local Mahalanobis Distance Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Recently, the Local Mahalanobis Distance (LMD) technique was proposed for incipient fault detection, which was shown to be sensitive, robust and distribution assumption-free. … |
J. Yang; C. Delpha; |
1198 | Gridless DOA Estimation Under The Multi-Frequency Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we solve the continuous (gridless) line spectrum estimation problem by incorporating the multi-frequency model into an atomic norm minimization (ANM) framework. |
Y. Wu; M. B. Wakin; P. Gerstoft; |
1199 | Orthogonal Nonnegative Matrix Tri-Factorization for Community Detection in Multiplex Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a new multiplex community detection approach that can identify communities that are common across layers as well as those that are unique to each layer. |
M. Ortiz-Bouza; S. Aviyente; |
1200 | Studying Three Families of Divergences to Compare Wide-Sense Stationary Gaussian ARMA Processes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we aim at analyzing the differences between three families of divergences used to compare probability density functions of Gaussian random vectors storing k consecutive samples of wide-sense stationary ARMA processes. |
E. Grivel; |
1201 | Multivariate Multiscale Cosine Similarity Entropy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the analyses of structure on the basis of MSampEn and MMSE require relatively high scales, yet without prior-knowledge of the scale degree. To this end, we propose a new multivariate entropy method based on the recently introduced Cosine Similarity Entropy (CSE). |
H. Xiao; T. Chanwimalueang; D. P. Mandic; |
1202 | Zeroth-Order Randomized Subspace Newton Methods Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To accelerate the convergence rate, this paper proposes the zeroth order randomized subspace Newton (ZO-RSN) method, which estimates projections of the gradient and Hessian by random sketching and finite differences. |
E. Berglund; S. Khirirat; X. Wang; |
1203 | Fast and Stable Convergence of Online SGD for CV@R-Based Risk-Aware Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, due to its variational definition, CV@R is commonly believed to result in difficult optimization problems, even for smooth and strongly convex loss functions. In this work, we disprove this statement by establishing noisy (i.e., fixed-accuracy) linear convergence of stochastic gradient descent for sequential CV@R learning, for a large class of not necessarily strongly-convex (or even convex) loss functions satisfying a set-restricted Polyak-Lojasiewicz inequality. |
D. S. Kalogerias; |
1204 | Deep Initialization for Guaranteed Unimodular Quadratic Programming Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study a deep learning-based initialization approach for unimodular quadratic programs (UQPs), that are concerned with the maximization of a quadratic form over a set of complex unimodular vectors. |
A. V. Ramesh; M. Soltanalian; |
1205 | Continuous Speech Separation with Recurrent Selective Attention Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting. |
Y. Zhang; et al. |
1206 | SA-SDR: A Novel Loss Function for Separation of Meeting Style Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to switch from a mean over the SDRs of each individual output channel to a global SDR over all output channels at the same time, which we call source-aggregated SDR (SA-SDR). |
T. von Neumann; K. Kinoshita; C. Boeddeker; M. Delcroix; R. Haeb-Umbach; |
1207 | VarArray: Array-Geometry-Agnostic Continuous Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes VarArray, an array-geometry-agnostic speech separation neural network model. |
T. Yoshioka; et al. |
1208 | All-Neural Beamformer for Continuous Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input. |
Z. Zhang; et al. |
1209 | Mining Hard Samples Locally And Globally For Improved Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose two methods to alleviate data imbalance in the speech separation task, based on local and global hard sample mining. |
K. Wang; Y. Peng; H. Huang; Y. Hu; S. Li; |
1210 | Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all three stages of the system is proposed. |
G. Li; J. Yu; J. Deng; X. Liu; H. Meng; |
1211 | Best of Both Worlds: Multi-Task Audio-Visual Automatic Speech Recognition and Active Speaker Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recent work has shown that we can solve both problems simultaneously by employing an attention mechanism over the competing video tracks of the speakers' faces, at the cost of sacrificing some accuracy on active speaker detection. This work closes this gap in active speaker detection accuracy by presenting a single model that can be jointly trained with a multi-task loss. |
O. Braga; O. Siohan; |
1212 | End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a conformer-based multi-modal speech recognition system. |
J. Chen; M. Wang; X. -L. Zhang; Z. Huang; S. Rahardja; |
1213 | End-To-End Speech Recognition with Joint Dereverberation of Sub-Band Autoregressive Envelopes Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. |
R. Kumar; A. Purushothaman; A. Sreeram; S. Ganapathy; |
1214 | Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, instead of suppressing background noise with a conventional cascaded pipeline, we employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition. |
H. Wang; et al. |
1215 | Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Automatic speech recognition (ASR) of multi-channel multi-speaker overlapped speech remains one of the most challenging tasks to the speech community. In this paper, we look into this challenge by utilizing the location information of target speakers in the 3D space for the first time. |
Y. Shao; S. -X. Zhang; D. Yu; |
1216 | Improving Cross-Lingual Speech Synthesis with Triplet Training Scheme Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a triplet training scheme is proposed to enhance the cross-lingual pronunciation by allowing previously unseen content and speaker combinations to be seen during training. |
J. Ye; et al. |
1217 | Improving Phonetic Realizations in Its By Using Phoneme-Aligned Graphemes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To overcome this, we propose using a mix of the two inputs, namely providing phoneme-aligned graphemes to the model. In this paper, we show that this approach can help the model learn to disambiguate some of the more subtle phonemic variations (such as the realization of reduced vowels), and that this effect improves the fidelity to the accent of the original voice talent. |
M. Sharma; Y. Hong; E. Kaplan; S. Tazari; R. Clark; |
1218 | Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel end-to-end text-based speech editing method called context-aware mask prediction network (CampNet), which avoids the unnatural phenomenon caused by cut-copy-paste operation in the traditional method and can synthesize a new word not appearing in the transcript. |
T. Wang; J. Yi; L. Deng; R. Fu; J. Tao; Z. Wen; |
1219 | A Study on The Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study aims to understand better why and how model pre-training can positively contribute to TTS system performance. |
G. Zhang; et al. |
1220 | One TTS Alignment to Rule Them All Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS and demonstrate its applicability to wide variety of neural TTS models. |
R. Badlani; A. Lancucki; K. J. Shih; R. Valle; W. Ping; B. Catanzaro; |
1221 | Capitalization Normalization for Language Modeling with An Accurate and Efficient Hierarchical RNN Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a fast, accurate and compact two-level hierarchical word-and-character-based recurrent neural network model. |
H. Zhang; Y. -C. Cheng; S. Kumar; W. R. Huang; M. Chen; R. Mathews; |
1222 | Enhance RNNLMs with Hierarchical Multi-Task Learning for ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, how to best share information among related tasks in MTL remains to be addressed. In this current work, we propose a hierarchical multi-task learning (HMTL) approach to incorporate linguistic knowledge into recurrent neural network language models (RNNLM), instead of using linguistic features as word factors. |
M. Song; Y. Zhao; |
1223 | Neural-FST Class Language Model for End-to-End Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework. |
A. Bruguier; et al. |
1224 | LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: On the other hand, the sparsity of the supervised training data forces the model to have the ability to learn from limited data. To address these problems, we propose LatticeBART, a model that decodes the sequence from the lattice in an end-to-end fashion and can use the pre-trained language models' prior. |
L. Dai; L. Chen; Z. Zhou; K. Yu; |
1225 | RescoreBERT: Discriminative Speech Recognition Rescoring With BERT Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. |
L. Xu; et al. |
1226 | Hybrid Sub-word Segmentation for Handling Long Tail in Morphologically Rich Low Resource Languages Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present a hybrid sub-word segmentation algorithm to deal with OOVs. |
S. Manghat; S. Manghat; T. Schultz; |
1227 | Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose an effective self-supervised learning framework and a novel regularization strategy to facilitate self-supervised speaker representation learning. |
M. Sang; H. Li; F. Liu; A. O. Arnold; L. Wan; |
1228 | Self-Supervised Speaker Recognition Training Using Human-Machine Dialogues Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate how to pretrain speaker recognition models by leveraging dialogues between customers and smart-speaker devices. |
M. Cekic; R. Li; Z. Chen; Y. Yang; A. Stolcke; U. Madhow; |
1229 | Multi-Task Voice Activated Framework Using Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In our work, we propose a general purpose framework for adapting a pre-trained wav2vec 2.0 model for different voice activated tasks. |
S. Hussain; V. Nguyen; S. Zhang; E. Visser; |
1230 | Self-Supervised Speaker Recognition with Loss-Gated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we observe that a speaker recognition network tends to model the data with reliable labels faster than those with unreliable labels. |
R. Tao; K. Aik Lee; R. Kumar Das; V. Hautamäki; H. Li; |
1231 | Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we explore the limits of speech representations learned by different self-supervised objectives and datasets for automatic speaker verification (ASV), especially with a well-recognized SOTA ASV model, ECAPA-TDNN [1], as a downstream model. |
Z. Chen; et al. |
1232 | Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we aim to improve the existing SSL framework for speaker representation learning. |
S. Chen; et al. |
1233 | CARInA - A Corpus of Aligned German Read Speech Including Annotations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: CARInA is freely available, designed to grow and improve over time, and suitable for large-scale speech analyses or machine learning tasks as illustrated by two examples shown in this paper. |
H. Kath; S. Stone; S. Rapp; P. Birkholz; |
1234 | Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents initial Speech Recognition results on "Casual Conversations" - a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone. |
C. Liu; et al. |
1235 | M2Met: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper we provide a detailed introduction of the AliMeeting dataset, challenge rules, evaluation methods and baseline systems. |
F. Yu; et al. |
1236 | ADIMA: Abuse Detection In Multilingual Audio Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Exploration of this problem entirely in the audio domain has largely been limited by the lack of audio datasets. Building on these challenges, we propose ADIMA, a novel, linguistically diverse, ethically sourced, expert annotated and well-balanced multilingual abuse detection audio dataset comprising 11,775 audio samples in 10 Indic languages spanning 65 hours and spoken by 6,446 unique users. |
V. Gupta; R. Sharon; R. Sawhney; D. Mukherjee; |
1237 | Anno-MI: A Dataset of Expert-Annotated Counselling Dialogues Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we introduce AnnoMI, the first publicly and freely accessible dataset of professionally transcribed and expert-annotated therapy dialogues. |
Z. Wu; et al. |
1238 | WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total. |
B. Zhang; et al. |
1239 | Wavebender GAN: An Architecture for Phonetically Meaningful Speech Manipulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The new methods do not meet the controllability demands that practitioners in this area require e.g.: in listening tests with manipulated speech stimuli. Instead, control of different speech properties in such stimuli is achieved by using legacy signal-processing methods. |
G. Teodoro Döhler Beck; U. Wennberg; Z. Malisz; G. Eje Henter; |
1240 | FRE-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. |
S. -H. Lee; J. -H. Kim; K. -E. Lee; S. -W. Lee; |
1241 | R-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion By Controlled Noise Introducing and Contextual Information Incorporation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we aim to evaluate and enhance the robustness of G2P models. |
C. Zhao; J. Wang; X. Qu; H. Wang; J. Xiao; |
1242 | Neural Grapheme-To-Phoneme Conversion with Pre-Trained Grapheme Models Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Inspired by the success of the pre-trained language model BERT, this paper proposes a pre-trained grapheme model called grapheme BERT (GBERT), which is built by self-supervised training on a large, language-specific word list with only grapheme information. |
L. Dong; Z. -Q. Guo; C. -H. Tan; Y. -J. Hu; Y. Jiang; Z. -H. Ling; |
1243 | ISTFTNET: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We thus propose iSTFTNet, which replaces some output-side layers of the mel-spectrogram vocoder with the inverse short-time Fourier transform (iSTFT) after sufficiently reducing the frequency dimension using upsampling layers, reducing the computational cost from black-box modeling and avoiding redundant estimations of high-dimensional spectrograms. |
T. Kaneko; K. Tanaka; H. Kameoka; S. Seki; |
1244 | Acoustic Application of Phase Reconstruction Algorithms in Optics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: At the same time, GLA is known as the Gerchberg-Saxton algorithm in optics, and a lot of its variants have been proposed independently of those in acoustics. In this paper, we propose to apply phase reconstruction algorithms developed in the optics community to acoustic applications and evaluate them using acoustical metrics. |
T. Kobayashi; T. Tanaka; K. Yatabe; Y. Oikawa; |
1245 | CPT: Cross-Modal Prefix-Tuning for Speech-To-Text Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we improve the performance of speech translation in medium-/low-resource settings by a cross-modal prefix that bridges the gap between speech input and translation modules to reduce the information loss in the cascaded model. |
Y. Ma; T. H. Nguyen; B. Ma; |
1246 | Tackling Data Scarcity in Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. |
T. A. Dinh; D. Liu; J. Niehues; |
1247 | Improving End-To-End Speech Translation Model with BERT-Based Contextual Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an end-to-end speech translation system that utilizes contextual information. |
J. -U. Bang; M. -K. Lee; S. Yun; S. -H. Kim; |
1248 | Context-Adaptive Document-Level Neural Machine Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work introduces a data-adaptive method that enables the model to adopt the necessary and helpful context. |
L. Zhang; Z. Zhang; B. Chen; W. Luo; L. Si; |
1249 | Integrating Multiple ASR Systems Into NLP Backend with Attention Fusion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we reduce the impact of ASR errors on the NLP back-end by combining transcriptions from various ASR systems. |
T. Kano; A. Ogawa; M. Delcroix; S. Watanabe; |
1250 | ISOMETRIC MT: Neural Machine Translation for Automatic Dubbing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work introduces a self-learning approach that allows a transformer model to directly learn to generate outputs that closely match the source length, in short Isometric MT. In particular, our approach does not require to generate multiple hypotheses nor any auxiliary ranking function. |
S. M. Lakew; Y. Virkar; P. Mathur; M. Federico; |
1251 | Automatic Depression Detection: An Emotional Audio-Textual Corpus and A GRU/BiLSTM-Based Model Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we propose a novel depression detection approach utilizing speech characteristics and linguistic contents from participants' interviews. |
Y. Shen; H. Yang; L. Lin; |
1252 | Multimodal Depression Classification Using Articulatory Coordination Features and Hierarchical Attention Based Text Embeddings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We develop a multimodal depression classification system using articulatory coordination features extracted from vocal tract variables and text transcriptions obtained from an automatic speech recognition tool that yields improvements of area under the receiver operating characteristics curve compared to unimodal classifiers (7.5% and 13.7% for audio and text respectively). |
N. Seneviratne; C. Espy-Wilson; |
1253 | Thin Slices of Depression: Improving Depression Detection Performance Through Data Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In fact, data collection in the depression domain requires the respect of rigorous ethical constraints that, inevitably, limit the size of the corpora that can be collected. This article proposes to address the problem by using the thin slices theory, i.e., the possibility to detect the inner state of an individual (depression in this case) through very short samples of behavior. |
R. Alsarrani; A. Esposito; A. Vinciarelli; |
1254 | Climate and Weather: Inspecting Depression Detection Via Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Psychological research suggests that depressive mood is closely related with emotion expression and perception, which motivates the investigation of whether knowledge of emotion recognition can be transferred for depression detection. |
W. Wu; M. Wu; K. Yu; |
1255 | Fraug: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a data augmentation method is proposed for depression detection from speech signals. |
V. Ravi; J. Wang; J. Flint; A. Alwan; |
1256 | Privacy Sensitive Speech Analysis Using Federated Learning to Assess Depression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes to use Federated Learning (FL) to enable decentralized, privacy-preserving speech analysis to assess depression. |
S. Bn; S. Abdullah; |
1257 | A Time Domain Progressive Learning Approach with SNR Constriction for Single-Channel Speech Enhancement and Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, we propose a time domain progressive learning (TDPL) approach for speech enhancement and ASR. |
Z. Nian; J. Du; Y. Ting Yeung; R. Wang; |
1258 | A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate a two-step callsign boosting approach: (1) at the 1st step (ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the decoding FST (lattices), (2) at the 2nd step (NLP), callsigns extracted from the improved recognition outputs with Named Entity Recognition (NER) are correlated with the surveillance data to select the most suitable one. |
I. Nigmatulina; J. Zuluaga-Gomez; A. Prasad; S. Saeed Sarfjoo; P. Motlicek; |
1259 | Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a DNN-based switching method that directly estimates whether ASR will perform better on the enhanced or observed signals. |
H. Sato; T. Ochiai; M. Delcroix; K. Kinoshita; N. Kamo; T. Moriya; |
1260 | Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, the over-suppression phenomenon in the enhanced speech might degrade the performance of downstream automatic speech recognition (ASR) task due to the missing latent information. To alleviate such problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and original noisy feature. |
Y. Hu; N. Hou; C. Chen; E. Siong Chng; |
1261 | Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). |
C. Zorila; R. Doddipatla; |
1262 | Mitigating Closed-Model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. |
C. -H. H. Yang; et al. |
1263 | Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work presents Referee, a robust reference-free CSST approach for expressive TTS, which fully leverages low-quality data to learn speaking styles from text. |
S. Liu; S. Yang; D. Su; D. Yu; |
1264 | PVAE-TTS: Adaptive Text-to-Speech Via Progressive Style Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Nevertheless, existing adaptive TTS systems still show low adaptation quality for novel speakers, since it is hard to learn an extensive speaking style with limited data. To address this issue, we propose progressive variational autoencoder (PVAE) which generates data with adapting to style gradually. |
J. -H. Lee; S. -H. Lee; J. -H. Kim; S. -W. Lee; |
1265 | EMOQ-TTS: Emotion Intensity Quantization for Fine-Grained Controllable Emotional Text-to-Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose EmoQ-TTS, which synthesizes expressive emotional speech by conditioning phoneme-wise emotion information with fine-grained emotion intensity. |
C. -B. Im; S. -H. Lee; S. -B. Kim; S. -W. Lee; |
1266 | Joint and Adversarial Training with ASR for Expressive Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose to alleviate the entanglement problem by integrating Text-To-Speech (TTS) model and Automatic Speech Recognition (ASR) model with a share layer network for joint training, and using ASR adversarial training to eliminate the content information in the style information. |
K. Zhang; C. Gong; W. Lu; L. Wang; J. Wei; D. Liu; |
1267 | MSDTRON: A High-Capability Multi-Speaker Speech Synthesis System for Diverse Data Using Characteristic Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issue, this paper proposes a high-capability speech synthesis system, called Msdtron, in which 1) a representation of the harmonic structure of speech, called excitation spectrogram, is designed to directly guide the learning of harmonics in mel-spectrogram. |
Q. Wu; Q. Shen; J. Luan; Y. Wang; |
1268 | SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes SpeechSplit2.0, which constrains the information flow of the speech component to be disentangled on the autoencoder input using efficient signal processing methods instead of bottleneck tuning. |
C. Ho Chan; K. Qian; Y. Zhang; M. Hasegawa-Johnson; |
1269 | Document-Level Event Extraction Via Human-Like Reading Process Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The first challenge means that arguments of one event record could reside in different sentences in the document, while the second one reflects that one document may simultaneously contain multiple such event records. Motivated by humans' reading cognitive to extract information of interests, in this paper, we propose a method called HRE (Human Reading inspired Extractor for Document Events), where DEE is decomposed into these two iterative stages, rough reading and elaborate reading. |
S. Cui; X. Cong; B. Yu; T. Liu; Y. Wang; J. Shi; |
1270 | Generating Disentangled Arguments with Prompts: A Simple Event Extraction Framework That Works Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, for the first time we introduce the prompt-based learning strategy to the domain of Event Extraction, which empowers the automatic exploitation of label semantics on both input and output sides. |
J. Si; X. Peng; C. Li; H. Xu; J. Li; |
1271 | Multi-Role Event Argument Extraction As Machine Reading Comprehension with Argument Match Optimization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of them need multi-turns to extract the arguments of each role independently, which ignores the relationships among roles in the same event. To alleviate this problem, we propose a novel Multi-Role Argument Extraction method named MRAE which can exploit the relationship of event roles by extracting all arguments for an event simultaneously. |
J. Tao; et al. |
1272 | BNU: A Balance-Normalization-Uncertainty Model for Incremental Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although existing incremental event detection models achieve impressive performance, they face the data imbalance problem between old classes and new classes, and have the knowledge transfer problem which cannot adequately utilize the knowledge provided by the previous model and data. To this end, we propose a Balance-Normalization-Uncertainty (BNU) model to address above problems. |
J. Li; Y. Zhang; Y. Yang; Z. An; Y. Zheng; |
1273 | WLinker: Modeling Relational Triplet Extraction As Word Linking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In light of these limitations, we take an innovative perspective on RTE by modeling it as a word linking problem that learns to link from subject words to object words for each relation type. To this end, we propose a simple but effective multi-task learning model, WLinker, which can extract overlapping relational triplets in an end-to-end fashion. |
Y. Xu; C. Zhou; H. Huang; J. Yu; Y. Hu; |
1274 | A Knowledge/Data Enhanced Method for Joint Event and Temporal Relation Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate those issues, we propose a Knowledge/Data Enhanced method for Event and TempRel Extraction, which integrates the temporal commonsense knowledge, data augmentation and Focal Loss function into one single extraction system. |
X. Zhang; L. Zang; P. Cheng; Y. Wang; S. Hu; |
1275 | AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose a novel heterogeneous stacking graph attention layer that models artefacts spanning heterogeneous temporal and spectral intervals with a heterogeneous attention mechanism and a stack node. |
J. -w. Jung; et al. |
1276 | Estimating The Confidence of Speech Spoofing Countermeasure Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We investigated a few confidence estimators that can be easily plugged into a neural-network-based CM. |
X. Wang; J. Yamagishi; |
1277 | Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the two-path GMM-ResNet and GMM-SENet models for spoofing detection, whose input is the Gaussian probability features based on two GMMs trained on genuine and spoofed speech respectively. |
Z. Lei; H. Yan; C. Liu; M. Ma; Y. Yang; |
1278 | RawBoost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper introduces RawBoost, a data boosting and augmentation method for the design of more reliable spoofing detection solutions which operate directly upon raw waveform inputs. |
H. Tak; M. Kamble; J. Patino; M. Todisco; N. Evans; |
1279 | Explaining Deep Learning Models for Spoofing and Deepfake Detection with Shapley Additive Explanations Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: All results reported in the paper are reproducible using open-source software. |
W. Ge; J. Patino; M. Todisco; N. Evans; |
1280 | Multi-Task Learning Improves Synthetic Speech Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, by observing that deepening the network impairs the performance of the network in detecting unknown attacks, we propose that the synthetic speech detection problem is an out-of-distribution (OOD) generalization problem and we enhance the robustness of networks by using multi-task learning. |
Y. Mo; S. Wang; |
1281 | Massively Multilingual ASR: A Lifelong Learning Solution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study the impact of adding more languages and propose a lifelong learning approach to build high quality MMASR systems. |
B. Li; et al. |
1282 | Joint Unsupervised and Supervised Training for Multilingual ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an end-to-end (E2E) Joint Unsupervised and Supervised Training (JUST) method to combine the supervised RNN-T loss and the self-supervised contrastive and masked language modeling (MLM) losses. |
J. Bai; et al. |
1283 | Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the use of the NOS rescoring model on a first-pass multilingual model and show that similar to the first-pass model, the rescoring model can be made multilingual. |
N. Gaur; T. Chen; E. Variani; P. Haghani; B. Ramabhadran; P. J. Moreno; |
1284 | Joint Modeling of Code-Switched and Monolingual ASR Via Conditional Factorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. |
B. Yan; et al. |
1285 | Bilingual End-to-End ASR with Byte-Level Subwords Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). |
L. Deng; R. Hsiao; A. Ghoshal; |
1286 | A Configurable Multilingual Model Is All You Need to Recognize All Languages Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel configurable multilingual model (CMM) which is trained only once but can be configured as different models based on users' choices by extracting language-specific modules together with a universal module from the trained CMM. |
L. Zhou; J. Li; E. Sun; S. Liu; |
1287 | Domain-Invariant Feature Learning for Cross Corpus Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an efficient domain adversarial training method to cope with the non-affective information during feature extraction. |
Y. Gao; S. Okada; L. Wang; J. Liu; J. Dang; |
1288 | Multi-Stage Graph Representation Learning for Dialogue-Level Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel strategy that focuses on capturing dialogue-level contextual information. |
Y. Song; J. Liu; L. Wang; R. Yu; J. Dang; |
1289 | Speech Emotion Recognition with Global-Aware Fusion on Multi-Scale Feature Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a novel GLobal-Aware Multi-scale (GLAM) neural network to learn multi-scale feature representation with global-aware fusion module to attend emotional information. |
W. Zhu; X. Li; |
1290 | Representation Learning Through Cross-Modal Conditional Teacher-Student Training For Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we show that the primary difference between the top-performing representations is in predicting valence while the differences in predicting activation and dominance dimensions are less pronounced. |
S. Srinivasan; Z. Huang; K. Kirchhoff; |
1291 | Not All Features Are Equal: Selection of Robust Features for Speech Emotion Recognition in Noisy Environments Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to select a group of robust features according to their performance and robustness in noisy condition. |
S. -G. Leem; D. Fulford; J. -P. Onnela; D. Gard; C. Busso; |
1292 | Towards Transferable Speech Emotion Representation: On Loss Functions for Cross-Lingual Latent Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, generalizing over languages, corpora and recording conditions is still an open challenge. In this work we address this gap by exploring loss functions that aid in transferability, specifically to non-tonal languages. |
S. Das; N. Nadine Lønfeldt; A. Katrine Pagsberg; L. H. Clemmensen; |
1293 | Dementia Detection By Fusing Speech and Eye-Tracking Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a method of detecting dementia from the simultaneous speech and eye-tracking recordings of subjects in a picture description task. |
Z. Sheng; Z. Guo; X. Li; Y. Li; Z. Ling; |
1294 | Towards Interpretability of Speech Pause in Dementia Detection Using Adversarial Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we will study the positions and lengths of dementia-sensitive pauses using adversarial learning approaches. |
Y. Zhu; B. Tran; X. Liang; J. A. Batsis; R. M. Roth; |
1295 | Using Spectral Sequence-to-Sequence Autoencoders to Assess Mild Cognitive Impairment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we employ sequence-to-sequence deep autoencoders in order to extract compact, robust and efficient attributes from the spontaneous speech of 25 MCI subjects and 25 healthy controls. |
M. Vetráb; et al. |
1296 | Exploring Dementia Detection from Speech: Cross Corpus Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a qualitative and quantitative analysis of speech and language features derived from two different corpora with the aim to predict early signs of dementia. |
A. Ablimit; C. Botelho; A. Abad; T. Schultz; I. Trancoso; |
1297 | Experimental Investigation on STFT Phase Representations for Deep Learning-Based Dysarthric Speech Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The objective of this paper is to investigate the applicability of the unprocessed phase, MGD, and IF spectra for dysarthric speech detection. |
P. Janbakhshi; I. Kodrasi; |
1298 | Dysfluency Classification in Stuttered Speech Using Deep Learning for Real-Time Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we combine MFCC and phoneme probabilities to train a neural network for stuttering detection and classification of four dysfluency types. |
M. Jouaiti; K. Dautenhahn; |
1299 | Embedding and Beamforming: All-Neural Causal Beamformer for Multichannel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Standing upon the intersection of traditional beamformers and deep neural networks, we propose a causal neural beamformer paradigm called Embedding and Beamforming, and two core modules are devised accordingly, namely EM and BM. |
A. Li; W. Liu; C. Zheng; X. Li; |
1300 | Improving Dual-Microphone Speech Enhancement By Learning Cross-Channel Features with Multi-Head Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, a novel architecture for DMSE using a multi-head cross-attention based convolutional recurrent network (MHCA-CRN) is presented. |
X. Xu; R. Gu; Y. Zou; |
1301 | TPARN: Triple-Path Attentive Recurrent Network for Time-Domain Multichannel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. |
A. Pandey; B. Xu; A. Kumar; J. Donley; P. Calamia; D. Wang; |
1302 | Multichannel Speech Enhancement Without Beamforming Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a two-stage strategy for multi-channel speech enhancement that does not require a traditional beamformer for additional performance. |
A. Pandey; B. Xu; A. Kumar; J. Donley; P. Calamia; D. Wang; |
1303 | Learning Filterbanks for End-to-End Acoustic Beamforming Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We perform a detailed analysis using the recent Clarity Challenge data and show that by using learnt filterbanks it is possible to surpass oracle-mask based beamforming for short windows. |
S. Cornell; M. Pariente; F. Grondin; S. Squartini; |
1304 | Spatial-Temporal Graph Convolution Network for Multichannel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a spatial-temporal graph convolutional network composed of cascaded spatial-temporal (ST) modules with channel fusion. |
M. Hao; J. Yu; L. Zhang; |
1305 | Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. |
A. Ogawa; N. Tawara; M. Delcroix; S. Araki; |
1306 | Continual Learning Using Lattice-Free MMI for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we explore regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion. |
H. Hadian; A. Gorin; |
1307 | Non-Autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the causal mask is designed for the left-to-right decoding process of the non-parallel autoregressive (AR) transformer, which is inappropriate for the parallel NAR transformer since it ignores the right-to-left contexts. Some methods are proposed to utilize right-to-left contexts with an extra decoder, but these methods increase the model complexity. |
C. -F. Zhang; Y. Liu; T. -H. Zhang; S. -L. Chen; F. Chen; X. -C. Yin; |
1308 | Model-Based Approach for Measuring The Fairness in ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce mixed-effects Poisson regression to better measure and interpret any WER difference among subgroups of interest. |
Z. Liu; I. -E. Veliche; F. Peng; |
1309 | Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. |
Q. Li; Y. Zhang; D. Qiu; Y. He; L. Cao; P. C. Woodland; |
1310 | Parallel Composition of Weighted Finite-State Transducers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an algorithm for parallel composition and implement it on graphics processing units. |
S. Sengupta; V. Pratap; A. Hannun; |
1311 | DGC-Vector: A New Speaker Embedding for Zero-Shot Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. |
R. Xiao; H. Zhang; Y. Lin; |
1312 | S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based on the S3PRL toolkit. |
W. -C. Huang; S. -W. Yang; T. Hayashi; H. -Y. Lee; S. Watanabe; T. Toda; |
1313 | Training Robust Zero-Shot Voice Conversion Models with Self-Supervised Features Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, self-supervised learning of speech representation has been shown to produce useful linguistic units without using transcripts, which can be directly passed to a VC model. In this paper, we showed that high-quality audio samples can be achieved by using a length resampling decoder, which enables the VC model to work in conjunction with different linguistic feature extractors and vocoders without requiring them to operate on the same sequence length. |
T. Dang; D. Tran; P. Chin; K. Koishida; |
1314 | A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we focus on self-supervised representation learning for voice conversion. |
B. van Niekerk; M. -A. Carbonneau; J. Zaïdi; M. Baas; H. Seuté; H. Kamper; |
1315 | SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System for Both Human Beings and Machines Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel method for zero-shot voice conversion. |
H. Zhang; Z. Cai; X. Qin; M. Li; |
1316 | Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this study, we investigate zero-shot VC from a novel perspective of self-supervised disentangled speech representation learning. |
J. Lian; C. Zhang; D. Yu; |
1317 | Improving Dialogue Generation Via Proactively Querying Grounded Knowledge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel knowledge-based dialogue system which integrates the strength of a transformer-based generator and a knowledge retriever capable of proactively constructing queries for accurate information. |
X. Zhao; L. Wang; J. Dang; |
1318 | A Non-Hierarchical Attention Network with Modality Dropout for Textual Response Generation in Multimodal Dialogue Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issues mentioned above, we propose a non-hierarchical attention network with modality dropout, which abandons the HRED framework and utilizes attention modules to encode each utterance and model the context representation. |
R. Sun; B. Chen; Q. Zhou; Y. Li; Y. Cao; H. -T. Zheng; |
1319 | Joint Learning for Addressee Selection and Response Generation in Multi-Party Conversation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study a novel task of joint learning for addressee selection and response generation in multi-party conversations. |
Q. Song; S. Li; P. Wei; G. Luo; X. Zhang; Z. Qian; |
1320 | Retrieval Enhanced Segment Generation Neural Network for Task-Oriented Dialogue Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While end-to-end neural networks have achieved promising performances on this task, the existing models still struggle to avoid slot mistakes. To address this challenge, we propose a novel segmented generation approach in this paper. |
M. Chen; et al. |
1321 | A Multi Domain Knowledge Enhanced Matching Network for Response Selection in Retrieval-Based Dialogue Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Multi Domain Knowledge Enhanced Matching Network (MDKEMN) to build retrieval-based dialogue systems that could leverage both explicit knowledge graph and implicit domain knowledge for response selection. |
X. Chen; F. Chen; S. Xu; B. Xu; |
1322 | Retrieval Bias Aware Ensemble Model for Conditional Sentence Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: It leads to a retrieval bias between the condition and its retrieved result, and then text generation augmented by such results becomes unreliable. To fix this issue, we propose RBAEM, a Retrieval Bias Aware Ensemble Model. |
Y. Song; Z. Xie; J. Li; L. Liu; M. Zhang; Z. Tian; |
1323 | Effective and Inconspicuous Over-the-Air Adversarial Examples with Adaptive Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we demonstrate a novel audio-domain adversarial attack that modifies benign audio using an interpretable and differentiable parametric transformation – adaptive filtering. |
P. O'Reilly; P. Awasthi; A. Vijayaraghavan; B. Pardo; |
1324 | LRPD: Large Replay Parallel Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In order to foster the progress of neural network systems, we introduce a Large Replay Parallel Dataset (LRPD) aimed for a detection of replay attacks. |
I. Yakovlev; et al. |
1325 | Robust Self-Supervised Speaker Representation Learning Via Instance Mix Regularization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus while the conventional self-supervised speaker embedding systems can yield minimum within-utterance variability, the capability to generalize to out-of-set utterance is limited. In order to alleviate this problem, we propose a novel self-supervised learning framework for speaker verification which combines the angular prototypical loss and the instance mix (i-mix) regularization. |
W. H. Kang; J. Alam; A. Fathan; |
1326 | Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a GCN-based approach for semi-supervised learning. |
F. Tong; et al. |
1327 | Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problems in a large-scale production setting for online ASR model. |
D. Hwang; et al. |
1328 | Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. |
T. Munkhdalai; et al. |
1329 | Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We trained personalized models for 195 individuals with different types and severities of speech impairment with training sets ranging in size from <1 minute to 18-20 minutes of speech data. |
J. Tobin; K. Tomanek; |
1330 | Spell My Name: Keyword Boosted Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Recognition of uncommon words such as names and technical terminology is important to understanding conversations in context. However, the ability to recognise such words remains … |
N. Jung; G. Kim; J. S. Chung; |
1331 | Magic Dust for Cross-Lingual Adaptation of Monolingual Wav2vec-2.0 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages. |
S. Khurana; A. Laurent; J. Glass; |
1332 | Text Adaptive Detection for Customizable Keyword Spotting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel text adaptive detection framework to directly formulate KWS as a detection rather than a classification problem. |
Y. Xi; T. Tan; W. Zhang; B. Yang; K. Yu; |
1333 | Improving Adversarial Waveform Generation Based Singing Voice Conversion with Harmonic Signals Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes to feed harmonic signals to the SVC model in advance to enhance audio generation. |
H. Guo; Z. Zhou; F. Meng; K. Liu; |
1334 | K-Converter: An Unsupervised Singing Voice Conversion System Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Some end-to-end SVC systems use adversarial training, which causes instability during optimization. To address these issues, we present K-Converter, a simple system to disentangle the timbre, pitch, and content information without any manual supervision or adversarial training. |
Y. Zhang; P. Yang; J. Xiao; Y. Bai; H. Che; X. Wang; |
1335 | HiFi-SVC: Fast High Fidelity Cross-Domain Singing Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents HiFi-SVC, a small cross-domain singing voice conversion model for generating high-fidelity 22.05 kHz singing voices. |
Y. Zhou; X. Lu; |
1336 | Towards Identity Preserving Normal to Dysarthric Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. |
W. -C. Huang; B. M. Halpern; L. Phillip Violeta; O. Scharenborg; T. Toda; |
1337 | Speaker Identity Preservation in Dysarthric Speech Reconstruction By Adversarial Speaker Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address this research problem, we propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA). |
D. Wang; et al. |
1338 | Controllable Speech Representation Learning Via Voice Conversion and AIC Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a method for invertible and controllable speech representation learning based on disentanglement. |
Y. Wang; J. Su; A. Finkelstein; Z. Jin; |
1339 | VU-BERT: A Unified Framework for Visual Dialog Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing researches tend to employ the modality-specific modules to model the interactions, which might be troublesome to use. To fill in this gap, we propose a unified framework for image-text joint embedding, named VU-BERT, and apply patch projection to obtain vision embedding firstly in visual dialog tasks to simplify the model. |
T. Ye; S. Si; J. Wang; R. Wang; N. Cheng; J. Xiao; |
1340 | Integrating Pretrained Language Model for Dialogue Policy Evaluation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Distinguished from many efforts dedicated to optimizing the policy and recovering the reward alternatively which suffers from easily getting stuck in local optima and model collapse, we decompose the adversarial training into two steps: 1) we integrate a pre-trained language model as a discriminator to judge whether the current system action is good enough for the last user action (i.e., next action prediction); 2) the discriminator gives an extra local dense reward to guide the agent's exploration. |
H. Wang; H. Wang; Z. Wang; K. -F. Wong; |
1341 | Cache: Modeling Contribution-Aware Context Hierarchically for Long-Range Dialogue State Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel method to model Contribution-Aware Context HiErarchically (CACHE) with a hierarchical encoder and a slot-turn attention module. |
J. Qi; Y. Si; L. Wang; J. Dang; |
1342 | Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. |
Y. -C. Tam; J. Xu; J. Zou; Z. Wang; T. Liao; S. Yuan; |
1343 | An Embarrassingly Simple Model for Dialogue Relation Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a simple yet effective model named SimpleRE for the RE task. |
F. Xue; A. Sun; H. Zhang; J. Ni; E. -S. Chng; |
1344 | A Gaussian Mixture Model for Dialogue Generation with Dynamic Parameter Sharing Strategy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As we focus on better modeling multinomial data for dialog generation, we study an approach that combines the unsupervised clustering and generative model together with a GMM (Gaussian Mixture Model) based encoder-decoder framework. |
Q. Zhu; P. Wu; Z. Tan; J. Duan; F. Lu; J. Liu; |
1345 | Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To make better use of multiple enrollment utterances, we propose a novel attention back-end model that can be used for both text-independent (TI) and text-dependent (TD) speaker verification, and we use scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of enrollment utterances. |
C. Zeng; X. Wang; E. Cooper; X. Miao; J. Yamagishi; |
1346 | Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces an alternative effective yet simple one, i.e., simple attention module (SimAM), for speaker verification. |
X. Qin; N. Li; C. Weng; D. Su; M. Li; |
1347 | Local Information Modeling with Self-Attention for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the self-attention module, as the key component of transformer, can help the model make full use of global information but is insufficient to capture the local information. To tackle this limitation, in this paper, we strengthen the local information modeling from two different aspects: restricting the attention context to be local and introducing convolution operation into transformer. |
B. Han; Z. Chen; Y. Qian; |
1348 | Multi-View Self-Attention Based Transformer for Speaker Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants with or without the proposed attention mechanism for speaker recognition. |
R. Wang; et al. |
1349 | Multi-Query Multi-Head Attention Pooling and Inter-Topk Penalty for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To further enhance the inter-class discriminability, we propose a method that adds an extra inter-topK penalty on some confused speakers. |
M. Zhao; Y. Ma; Y. Ding; Y. Zheng; M. Liu; M. Xu; |
1350 | Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemic Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose temporal dynamic CNN (TDY-CNN) that considers temporal variation of phonemes by applying kernels optimally adapting to each time bin. |
S. -H. Kim; H. Nam; Y. -H. Park; |
1351 | Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training before being cross-domain adapted to the 102.7-hour UASpeech corpus and to produce articulatory features. |
S. Hu; et al. |
1352 | Conversational Speech Recognition By Learning Conversation-Level Characteristics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a conversational ASR model which explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. |
K. Wei; Y. Zhang; S. Sun; L. Xie; L. Ma; |
1353 | Exploring Machine Speech Chain For Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore the TTS→ASR pipeline in machine speech chain to perform domain adaptation for both E2E ASR and neural TTS models with only text data from the target domain. |
F. Yue; Y. Deng; L. He; T. Ko; Y. Zhang; |
1354 | A Likelihood Ratio Based Domain Adaptation Method for E2E Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we explore a contextual biasing approach using likelihood-ratio that leverages text data sources to adapt RNN-T model to new domains and entities. |
C. Choudhury; A. Gandhe; X. Ding; I. Bulyko; |
1355 | Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A question that naturally arises is whether the dissemination of personalized acoustic models can leak personal information. In this paper, we show that it is possible to retrieve the gender of the speaker, but also his identity, by just exploiting the weight matrix changes of a neural acoustic model locally adapted to this speaker. |
S. Mdhaffar; J. -F. Bonastre; M. Tommasi; N. Tomashenko; Y. Estève; |
1356 | Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a fast and accurate end-to-end (E2E) model, which executes automatic speech recognition (ASR) and downstream natural language processing (NLP) simultaneously. |
M. Omachi; Y. Fujita; S. Watanabe; T. Wang; |
1357 | Toward Degradation-Robust Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in real-world scenarios, it is difficult to collect clean utterances of a speaker, and they are usually degraded by noises or reverberations. It thus becomes highly desired to understand how these degradations affect voice conversion and build a degradation-robust model. |
C. -Y. Huang; K. -W. Chang; H. -Y. Lee; |
1358 | Text-Free Non-Parallel Many-To-Many Voice Conversion Using Normalising Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate normalising flows for VC in both text-conditioned and text-free scenarios. |
T. Merritt; et al. |
1359 | Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Although our framework consisting of a denoising module and a VC module well handles the background sounds, the VC module is sensitive to the distortion caused by the denoising module. To address this distortion issue, in this paper we propose the improved VC module to directly model the noisy speech waveform while controlling the background sounds. |
C. Xie; Y. -C. Wu; P. L. Tobing; W. -C. Huang; T. Toda; |
1360 | One-Shot Voice Conversion For Style Transfer Based On Speaker Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we build on the recognition-synthesis framework and propose a one-shot voice conversion approach for style transfer based on speaker adaptation. |
Z. Wang; et al. |
1361 | Cross-Speaker Style Transfer for Text-to-Speech Using Data Augmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to build a TTS system that is expressive, while retaining the target speaker's identity. |
M. Sam Ribeiro; J. Roth; G. Comini; G. Huybrechts; A. Gabrys; J. Lorenzo-Trueba; |
1362 | An Investigation of Streaming Non-Autoregressive Sequence-to-sequence Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce streamable architectures such as causal convolution and self-attention with causal masking for the FastSpeech2-based NAR-S2S-VC model. |
T. Hayashi; K. Kobayashi; T. Toda; |
1363 | A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: train an ordinal regression model for each phoneme with the corresponding training and inference costs. In this paper, we propose to train a Universal Ordinal Regression (UOR) model instead of multiple, separate models for different phonemes, and evaluate its performance accordingly. |
S. Mao; F. Soong; Y. Xia; J. Tien; |
1364 | A Transfer Learning Approach for Pronunciation Scoring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. |
M. Sancinetti; J. Vidal; C. Bonomi; L. Ferrer; |
1365 | Exploring Non-Autoregressive End-to-End Neural Modeling for English Mispronunciation Detection and Diagnosis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: End-to-end (E2E) neural modeling has emerged as one predominant school of thought to develop computer-assisted pronunciation training (CAPT) systems, showing competitive … |
H. -W. Wang; B. -C. Yan; H. -S. Chiu; Y. -C. Hsu; B. Chen; |
1366 | Phoneme Mispronunciation Detection By Jointly Learning To Align Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a method for phoneme mispronunciation detection by jointly learning to align. |
B. Lin; L. Wang; |
1367 | An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to utilize Acoustic, Phonetic and Linguistic (APL) embedding features jointly for building a more powerful MD&D system. |
W. Ye; et al. |
1368 | Masked Acoustic Unit for Mispronunciation Detection and Correction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: On the other hand, ASR-based CAPT methods only give the learner text-based feedback about the mispronunciation, but cannot teach the learner how to pronounce the sentence correctly. To solve these limitations, we propose to use the acoustic unit (AU) as the intermediary feature for both mispronunciation detection and correction. |
Z. Zhang; Y. Wang; J. Yang; |
1369 | Investigating Self-Supervised Learning for Speech Enhancement and Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we evaluate 13 SSL upstream methods on speech enhancement and separation downstream tasks. |
Z. Huang; S. Watanabe; S. -w. Yang; P. García; S. Khudanpur; |
1370 | TFPSNet: Time-Frequency Domain Path Scanning Network for Speech Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose time-frequency (T-F) domain path scanning network (TFPSNet) for speech separation task. |
L. Yang; W. Liu; W. Wang; |
1371 | Efficient Monaural Speech Separation with Multiscale Time-Delay Sampling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Multiscale Time-Delay Sampling method (MTDS) for the dual-path networks in MSS to learn sequence features from fine to coarse by multiscale time-delay sampling, which effectively integrates different scale local and global information for long sequences. |
S. Qian; L. Gao; H. Jia; Q. Mao; |
1372 | Toward MmWave-Based Sound Enhancement and Separation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: With the help of video modality, improvements have been shown for these tasks. In this work, we explore a multimodal approach using mmWave radio devices, as these devices can measure vocal folds vibration. |
M. Z. Ozturk; C. Wu; B. Wang; K. J. Ray Liu; |
1373 | DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a dual-path transformer-based full-band and sub-band fusion network (DPT-FSNet) for speech enhancement in the frequency domain. |
F. Dang; H. Chen; P. Zhang; |
1374 | Real-M: Towards Speech Separation on Real Mixtures Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to filling this gap in two ways. |
C. Subakan; M. Ravanelli; S. Cornell; F. Grondin; |
1375 | Spoken Language Recognition with Cluster-Based Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we analyze the incorporation of cluster-based modeling into the language recognition systems, in which a single utterance is represented as an embedding, deploying widely used i-vectors and x-vectors. |
S. Kacprzak; M. Rybicka; K. Kowalczyk; |
1376 | Phonotactic Language Recognition Using A Universal Phoneme Recognizer and A Transformer Architecture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we describe a phonotactic language recognition model that effectively manages long and short n-gram input sequences to learn contextual phonotactic-based vector embeddings. |
D. Romero; L. F. D'Haro; M. Estecha-Garitagoitia; C. Salamea; |
1377 | Improved Language Identification Through Cross-Lingual Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show that models pre-trained on many languages perform better and enable language identification systems that require very little labeled data to perform well. |
A. Tjandra; et al. |
1378 | Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate language adaptive training on XLSR models. |
Y. Lu; M. Huang; X. Qu; P. Wei; Z. Ma; |
1379 | Investigation of Robustness of HuBERT Features from Different Layers to Domain, Accent and Language Variations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the use of pre-trained HuBERT model to build downstream Automatic Speech Recognition (ASR) models using data that have differences in domain, accent and even language. |
P. Kumar; V. N. Sukhadia; S. Umesh; |
1380 | Combining Unsupervised and Text Augmented Semi-Supervised Learning For Low Resourced Autoregressive Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recent advances in unsupervised representation learning have demonstrated the impact of pretraining on large amounts of read speech. We adapt these techniques for domain adaptation in low-resource (both in terms of data and compute) conversational and broadcast domains. |
C. -F. Li; F. Keith; W. Hartmann; M. Snover; |
1381 | Key-Sparse Transformer for Multimodal Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a key-sparse Transformer is proposed for efficient emotion recognition by focusing more on emotion related information. |
W. Chen; X. Xing; X. Xu; J. Yang; J. Pang; |
1382 | Neural Architecture Search for Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to apply neural architecture search (NAS) techniques to automatically configure the SER models. |
X. Wu; S. Hu; Z. Wu; X. Liu; H. Meng; |
1383 | Multi-Lingual Multi-Task Speech Emotion Recognition Using Wav2vec 2.0 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a Multi-Lingual (MLi) and Multi-Task Learning (MTL) audio only SER system based on the multi-lingual pre-trained wav2vec 2.0 model. |
M. Sharma; |
1384 | LIGHT-SERNET: A Lightweight Fully Convolutional Neural Network for Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose an efficient and lightweight fully convolutional neural network for speech emotion recognition in systems with limited hardware resources. |
A. Aftab; A. Morsali; S. Ghaemmaghami; B. Champagne; |
1385 | Multimodal Transformer with Learnable Frontend and Self Attention for Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel approach for multi-modal emotion recognition from conversations using speech and text. |
S. Dutta; S. Ganapathy; |
1386 | Speech Emotion Recognition Using Self-Supervised Features Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we introduce a modular End-to-End (E2E) SER system based on an Upstream + Downstream architecture paradigm, which allows easy use/integration of a large variety of self-supervised features. |
E. Morais; R. Hoory; W. Zhu; I. Gat; M. Damasceno; H. Aronowitz; |
1387 | Using Acoustic Deep Neural Network Embeddings to Detect Multiple Sclerosis From Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study we show that there is no need to employ a special neural network architecture such as x-vectors to calculate effective features, but (even more) indicative features can be derived on the basis of a standard Deep Neural Network acoustic model. |
G. Gosztolya; L. Tóth; V. Svindt; J. Bóna; I. Hoffmann; |
1388 | Repetition Assessment for Speech and Language Disorders: A Study of The Logopenic Variant of Primary Progressive Aphasia Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel technique for quantifying the quality of repetition in speech recordings and demonstrate the utility of the technique by using it to distinguish between healthy speakers and lvPPA speakers. |
R. Haulcy; K. Placek; B. Tracey; A. Vogel; J. Glass; |
1389 | Speech Tasks Relevant to Sleepiness Determined With Deep Transfer Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we use the Voiceome dataset to extract speech from 1,828 participants to develop a deep transfer learning model using Hidden-Unit BERT (HuBERT) speech representations to detect sleepiness from individuals. |
B. Tran; Y. Zhu; X. Liang; J. W. Schwoebel; L. A. Warrenburg; |
1390 | Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an effective phase reconstruction strategy for neural speech enhancement that can operate in noisy environments. |
D. Kim; H. Han; H. -K. Shin; S. -W. Chung; H. -G. Kang; |
1391 | Continual Self-Training With Bootstrapped Remixing For Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose RemixIT, a simple and novel self-supervised training method for speech enhancement. |
E. Tzinis; Y. Adi; V. K. Ithapu; B. Xu; A. Kumar; |
1392 | Alleviating The Loss-Metric Mismatch in Supervised Single-Channel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the loss-metric mismatch problem of supervised single-channel speech enhancement system. |
Y. Yang; H. Zhang; X. Zhang; H. Zhang; |
1393 | A Priori SNR Estimation for Speech Enhancement Based on PESQ-Induced Reinforcement Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on introducing PESQ to improve Deep Xi, a recently proposed minimum mean square error (MMSE) based speech enhancement with a priori signal-to-noise ratio (SNR) estimated by a deep neural network. |
T. Lei; H. Ruan; K. Chen; J. Lu; |
1394 | A Training Framework for Stereo-Aware Speech Enhancement Using Deep Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel stereo-aware framework for speech enhancement, i.e., a training loss for deep learning-based speech enhancement to preserve the spatial image while enhancing the stereo mixture. |
B. Toloosham; K. Koishida; |
1395 | Joint Magnitude Estimation and Phase Recovery Using Cycle-In-Cycle GAN for Non-Parallel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In the second stage, we incorporate the pretrained CycleGAN with a complex-valued CycleGAN as a cycle-in-cycle structure to simultaneously recover phase information and refine the overall spectrum. |
G. Yu; A. Li; Y. Wang; Y. Guo; H. Wang; C. Zheng; |
1396 | Privacy Attacks for Automatic Speech Recognition Acoustic Models in A Federated Learning Framework Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset. |
N. Tomashenko; S. Mdhaffar; M. Tommasi; Y. Estève; J. -F. Bonastre; |
1397 | VADOI: Voice-Activity-Detection Overlapping Inference for End-To-End Long-Form Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The previously proposed (partial) overlapping inference methods have been shown to be effective for long-form decoding. |
J. Wang; X. Tong; J. Guo; D. He; R. Maas; |
1398 | TorchAudio: Building Blocks for Audio and Speech Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this document, we provide an overview of the design principles, functionalities, and benchmarks of TorchAudio. |
Y. -Y. Yang; et al. |
1399 | Unsupervised Model Adaptation for End-to-End ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a method to perform unsupervised model adaptation for E2E ASR using first-pass transcriptions of adaptation data produced by the baseline ASR model itself. |
G. Sivaraman; R. Casal; M. Garland; E. Khoury; |
1400 | Speech Recognition Using Biologically-Inspired Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we revisit the incorporation of biologically-plausible models into deep learning and enhance their capabilities, by taking inspiration from the brain's diverse neural and synaptic dynamics. |
T. Bohnstingl; A. Garg; S. Wozniak; G. Saon; E. Eleftheriou; A. Pantazi; |
1401 | ASSEM-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Recent works on voice conversion (VC) focus on preserving the rhythm and the intonation as well as the linguistic content. |
K. -W. Kim; S. -W. Park; J. Lee; M. -C. Joe; |
1402 | Minimizing Residuals for Native-Nonnative Voice Conversion in A Sparse, Anchor-Based Representation of Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a dictionary-learning algorithm for reducing the sparse coding residual of an exemplar-based method for native-to-nonnative voice conversion (VC). |
C. Liberatore; R. Gutierrez-Osuna; |
1403 | Improving Recognition-Synthesis Based Any-to-one Voice Conversion with Cyclic Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This inconsistency between conversion and training stages constrains the speaker similarity of converted speech. To address this issue, a cyclic training method is proposed in this paper. |
Y. -N. Chen; L. -J. Liu; Y. -J. Hu; Y. Jiang; Z. -H. Ling; |
1404 | NVC-Net: End-To-End Adversarial Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose NVC-Net, an end-to-end adversarial network, which performs VC directly on the raw audio waveform. |
B. Nguyen; F. Cardinaux; |
1405 | U-GAT-VC: Unsupervised Generative Attentional Networks for Non-Parallel Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Various methods are proposed to approach non-parallel VC using deep neural networks. |
S. Shi; J. Shao; Y. Hao; Y. Du; J. Fan; |
1406 | Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs, which complement each other's advantages. |
X. Zhao; et al. |
1407 | A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a Commonsense Knowledge Enhanced Network with a retrospective loss, namely CKE-Net, to hierarchically perform dialog modeling, external knowledge integration, and historical state retrospect. |
Y. Xie; C. Sun; Z. Ji; |
1408 | Hierarchical and Multi-View Dependency Modelling Network for Conversational Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new model, called hierarchical and multi-view dependency modelling network (HMVDM), for the task of emotion recognition in conversations (ERC). |
Y. -P. Ruan; S. -K. Zheng; T. Li; F. Wang; G. Pei; |
1409 | MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a novel Multimodal Dynamic Fusion Network (MM-DFN) to recognize emotions by fully understanding multimodal conversational context. |
D. Hu; X. Hou; L. Wei; L. Jiang; Y. Mo; |
1410 | Modeling Intention, Emotion and External World in Dialogue Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a RelAtion Interaction Network (RAIN), consisting of Intention Relation Module and Emotion Relation Module, to jointly model mutual relationships and explicitly integrate historical intention information. |
W. Peng; Y. Hu; L. Xing; Y. Xie; X. Zhang; Y. Sun; |
1411 | A Neural Prosody Encoder for End-to-End Dialogue Act Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an E2E neural architecture that takes into account the need for characterizing prosodic phenomena co-occurring at different levels inside an utterance. |
K. Wei; et al. |
1412 | Improving Contextual Coherence in Variational Personalized and Empathetic Dialogue Agents Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hence, to improve the contextual coherence, we propose a novel Uncertainty Aware CVAE (UA-CVAE) framework. |
J. Y. Lee; K. Aik Lee; W. S. Gan; |
1413 | Fusion and Orthogonal Projection for Improved Face-Voice Association Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we hypothesize that enriched feature representation coupled with an effective yet efficient supervision is necessary in realizing a discriminative joint embedding space for improved face-voice association. |
M. S. Saeed; M. H. Khan; S. Nawaz; M. H. Yousaf; A. Del Bue; |
1414 | OpenFEAT: Improving Speaker Identification By Open-Set Few-Shot Embedding Adaptation with Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: With our algorithm, Open-set Few-shot Embedding Adaptation with Transformer (openFEAT), we observe that the speaker identification equal error rate (IEER) on simulated households with 2 to 7 hard-to-discriminate speakers is reduced by 23% to 31% relative. |
K. C. Kishan; et al. |
1415 | Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we come up with an innovative asymmetric structure, which takes the large-scale ECAPA-TDNN model for enrollment and the small-scale ECAPA-TDNNLite model for verification. |
Q. Li; L. Yang; X. Wang; X. Qin; J. Wang; M. Li; |
1416 | Speaker Embedding Conversion for Backward and Cross-Channel Compatibility Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This makes the process of interchangeability between systems very cumbersome and costly. In this paper, we address this issue by proposing a highly efficient speaker embedding converter that transforms a speaker embedding extracted from system A into a speaker embedding that can be used by system B. |
T. Chen; E. Khoury; |
1417 | Improving Fairness in Speaker Verification Via Group-Adapted Fusion Network Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we investigate the fairness of speaker verification models on controlled datasets with imbalanced gender distributions, providing direct evidence that model performance suffers for underrepresented groups. |
H. Shen; et al. |
1418 | CS-REP: Making Speaker Verification Networks Embracing Re-Parameterization Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This study proposes cross-sequential re-parameterization (CS-Rep), a novel topology re-parameterization strategy for multi-type networks, to increase the inference speed and verification accuracy of models. |
R. Zhang; et al. |
1419 | DistilHuBERT: Speech Representation Learning By Layer-Wise Distillation of Hidden-Unit BERT Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Despite the success of these methods, they require large memory and high pre-training costs, making them inaccessible for researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework to distill hidden representations from a HuBERT model directly. |
H. -J. Chang; S. -w. Yang; H. -y. Lee; |
1420 | Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Since the network capacity is limited, we believe the speech recognition performance could be further improved if the model is dedicated to audio content information learning. To this end, we propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL), which forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers. |
C. Wang; et al. |
1421 | Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning. |
Y. Wang; J. Li; H. Wang; Y. Qian; C. Wang; Y. Wu; |
1422 | Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks, and increase scalability of the model to multiple tasks or languages. |
B. Thomas; S. Kessler; S. Karout; |
1423 | An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We explore HuBERT with larger numbers of clusters and iterations in order to obtain better speech representation. |
T. Maekaku; X. Chang; Y. Fujita; S. Watanabe; |
1424 | Optimize Wav2vec2's Architecture for Small Training Set Through Analyzing Its Pre-Trained Model's Attention Pattern Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We leverage two techniques, local attention mechanism and cross-block parameter sharing, with counter-intuitive configurations. |
L. Chen; M. Asgari; H. H. Dodge; |
1425 | Part-of-Speech Models Compression Methods for On-Device Grapheme-to-Phoneme Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The paper investigates methods of compressing part-of-speech models that are developed for an on-device grapheme-to-phoneme conversion module. |
M. Kubis; M. Méloux; P. Skórzewski; M. Lewandowski; G. Jho; H. Park; |
1426 | An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Inspired by Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network, both of which contribute to the superior performance of the proposed model on the text normalization task. |
W. Dai; et al. |
1427 | Chinese Spelling Text Generation of Mathematical Formulas Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigated text generation methods that translate mathematical formulas in LaTeX into Chinese spelling texts. |
S. Dong; S. Liu; S. Liu; B. Tang; |
1428 | Polyphone Disambiguation and Accent Prediction Using Pre-Trained Language Models in Japanese TTS Front-End Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a method for polyphone disambiguation (PD) and accent prediction (AP). |
R. Hida; M. Hamada; C. Kamada; E. Tsunoo; T. Sekiya; T. Kumakura; |
1429 | Data Augmentation for Long-Tailed and Imbalanced Polyphone Disambiguation in Mandarin Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we proposed a simple data-augmentation method based on the pre-trained mask language model BERT to mitigate the long-tailed and imbalanced distribution problem. |
Y. Zhang; H. Zhang; Y. Lin; |
1430 | Leveraging Bilinear Attention to Improve Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To trigger more adequate information interaction between the input intent or slot features, we propose a novel framework with Bilinear attention, which can build the second order feature interactions. |
D. Chen; Z. Huang; Y. Zou; |
1431 | Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence and ASR Hypothesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). |
Z. Wang; et al. |
1432 | Integration of Pre-Trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a simple and robust integration method for the E2E SLU network with a novel Interface, Continuous Token Interface (CTI). |
S. Seo; D. Kwak; B. Lee; |
1433 | Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, training an E2E system remains a challenge, largely due to the scarcity of paired audio-semantics data. In this paper, we consider an E2E system as a multi-modal model, with audio and text functioning as its two modalities, and use a cross-modal latent space (CMLS) architecture, where a shared latent space is learned between the "acoustic" and "text" embeddings. |
B. Agrawal; et al. |
1434 | Improving End-to-end Models for Set Prediction in Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To improve E2E SLU models when entity spoken order is unknown, we propose a novel data augmentation technique along with an implicit attention based alignment method to infer the spoken order. |
H. -K. J. Kuo; Z. Tüske; S. Thomas; B. Kingsbury; G. Saon; |
1435 | ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present ESPnet-SLU, which is designed for quick development of spoken language understanding in a single framework. |
S. Arora; et al. |
1436 | The Coral++ Algorithm for Unsupervised Domain Adaptation of Speaker Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate the degradation caused by domain mismatch, we propose a new feature-based unsupervised domain adaptation algorithm. |
R. Li; W. Zhang; D. Chen; |
1437 | Learning Domain-Invariant Transformation for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Automatic speaker verification (ASV) faces domain shift caused by the mismatch of intrinsic and extrinsic factors such as recording device and speaking style in real-world applications, which leads to unsatisfactory performance. To this end, we propose the meta generalized transformation via meta-learning to build a domain-invariant embedding space. |
H. Zhang; L. Wang; K. A. Lee; M. Liu; J. Dang; H. Chen; |
1438 | Domain Robust Deep Embedding Learning for Speaker Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a domain robust deep embedding learning method for speaker verification (SV) tasks. |
H. -R. Hu; Y. Song; Y. Liu; L. -R. Dai; I. McLoughlin; L. Liu; |
1439 | Tackling The Score Shift in Cross-Lingual Speaker Verification By Exploiting Language Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two techniques to increase cross-lingual speaker verification robustness. |
J. Thienpondt; B. Desplanques; K. Demuynck; |
1440 | Domain Adaptation for Speaker Recognition in Singing and Spoken Voice Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study the effect of speaking style and audio condition variability between the spoken and singing voice on speaker recognition performance. |
A. Chowdhury; A. Cozzo; A. Ross; |
1441 | CDMA: Cross-Domain Distance Metric Adaptation for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Cross-domain Distance Metric Adaptation (CDMA) approach to alleviate the domain shift in the distance metric space, where the source and target domains share the same classes, i.e., within- and between-speaker. |
J. Li; J. Han; H. Song; |
1442 | Word Order Does Not Matter for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study training of automatic speech recognition system in a weakly supervised setting where the order of words in transcript labels of the audio training data is not known. |
V. Pratap; Q. Xu; T. Likhomanenko; G. Synnaeve; R. Collobert; |
1443 | Contrastive Siamese Network for Semi-Supervised Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces contrastive siamese (c-siam) network, an architecture for leveraging unlabeled acoustic data in speech recognition. |
S. Khorram; J. Kim; A. Tripathi; H. Lu; Q. Zhang; H. Sak; |
1444 | Sequence Transduction with Graph-Based Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, e.g., for studying different transition rules, implementing different transducer losses, or restricting alignments. |
N. Moritz; T. Hori; S. Watanabe; J. L. Roux; |
1445 | SpeechMoE2: Mixture-of-Experts Model with Improved Routing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To further improve speech recognition performance against varying domains and accents, we propose a new router architecture which integrates additional global domain and accent embedding into router input to promote adaptability. |
Z. You; S. Feng; D. Su; D. Yu; |
1446 | Supervised Attention in Sequence-to-Sequence Models for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we treat the correspondence between attention weights and alignments as a learning problem by imposing a supervised attention loss. |
G. -P. Yang; H. Tang; |
1447 | End-to-End Speech Recognition from Federated Acoustic Models Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French and Italian sets of the CommonVoice dataset, a large heterogeneous dataset containing thousands of different speakers, acoustic environments and noises. |
Y. Gao; et al. |
1448 | HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose HiFiDenoise, a speech synthesis system with adversarial networks that can synthesize high-fidelity speech with low-quality and noisy speech data. |
L. Zhang; Y. Ren; L. Deng; Z. Zhao; |
1449 | VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates singing audio from lyrics and musical score. |
Y. Zhang; J. Cong; H. Xue; L. Xie; P. Zhu; M. Bi; |
1450 | A Melody-Unsupervision Model for Singing Voice Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issue, we propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment in training time but generates singing voice audio given a melody and lyrics input in inference time. |
S. Choi; J. Nam; |
1451 | Transformer-S2A: Robust and Efficient Speech-to-Animation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel robust and efficient Speech-to-Animation (S2A) approach for synchronized facial animation generation in human-computer interaction. |
L. Chen; Z. Wu; J. Ling; R. Li; X. Tan; S. Zhao; |
1452 | VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel multi-speaker VTS system based on cross-modal knowledge transfer from voice conversion (VC), where vector quantization with contrastive predictive coding (VQCPC) is used for the content encoder of VC to derive discrete phoneme-like acoustic units, which are transferred to a Lip-to-Index (Lip2Ind) network to infer the index sequence of acoustic units. |
D. Wang; S. Yang; D. Su; X. Liu; D. Yu; H. Meng; |
1453 | Fast Task-Specific Adaptation in Spoken Language Assessment with Meta-Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a fast adaptation framework with meta-learning for various task types in spoken language assessment under low-resource settings. |
B. Lin; L. Wang; |
1454 | Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. |
Y. Gong; Z. Chen; I. -H. Chu; P. Chang; J. Glass; |
1455 | A Model for Assessor Bias in Automatic Pronunciation Assessment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a model for pronunciation assessment as the combination of an assessor independent (A) and an assessor specific (B) component. |
J. A. Lopez Saenz; T. Hain; |
1456 | Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate the mixed sentences with a single model. |
Y. Zhu; L. Wu; S. Cheng; M. Wang; |
1457 | Punctuation Prediction for Streaming On-Device Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we discuss one-pass models for both ASR and punctuation prediction to replace the conventional two-pass post-processing pipeline. |
Z. Zhou; T. Tan; Y. Qian; |
1458 | ASR Error Correction with Dual-Channel Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose 1) an error correction model, which takes account of both contextual information and phonetic information by dual-channel; 2) a self-supervised learning method for the model. |
F. Zhang; M. Tu; S. Liu; J. Yan; |
1459 | L-SpEx: Localized Target Speaker Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose an end-to-end localized target speaker extraction on pure speech cues, that is called L-SpEx. |
M. Ge; C. Xu; L. Wang; E. S. Chng; J. Dang; H. Li; |
1460 | DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, from the time-frequency domain perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve the robustness of speech separation under complicated conditions. |
J. Han; Y. Long; L. Burget; J. Cernocký; |
1461 | Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, novel mixed precision DNN quantization methods are proposed by applying locally variable bit-widths to individual TCN components of a TF masking based multi-channel speech separation system. |
J. Xu; J. Yu; X. Liu; H. Meng; |
1462 | The Impact of Removing Head Movements on Audio-Visual Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning-based methods as they often degrade the performance of models that are trained on clean, frontal, and steady face images. To alleviate this problem, we propose to use robust face frontalization (RFF) in combination with an AVSE method based on a variational auto-encoder (VAE) model. |
Z. Kang; M. Sadeghi; R. Horaud; X. Alameda-Pineda; J. Donley; A. Kumar; |
1463 | VSEGAN: Visual Speech Enhancement Generative Adversarial Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel framework that involves visual information for speech enhancement, by incorporating a Generative Adversarial Network (GAN). |
X. Xu; et al. |
1464 | Endpoint Detection for Streaming End-to-End Multi-Talker ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we address the EP detection problem in the SURT framework by introducing an end-of-sentence token as an output unit, following the practice of single-talker end-to-end models. |
L. Lu; J. Li; Y. Gong; |
1465 | Continuous Streaming Multi-Talker ASR with Dual-Path Transducers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions. In this paper, we investigate it for multi-turn meetings containing multiple speakers using the Streaming Unmixing and Recognition Transducer (SURT) model, and show that naively extending the single-turn model to this harder setting incurs a performance penalty. |
D. Raj; L. Lu; Z. Chen; Y. Gaur; J. Li; |
1466 | Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an extension of GTC to model the posteriors of both labels and label transitions by a neural network, which can be applied to a wider range of tasks. |
X. Chang; N. Moritz; T. Hori; S. Watanabe; J. L. Roux; |
1467 | ADA-VAD: Unpaired Adversarial Domain Adaptation for Noise-Robust Voice Activity Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose adversarial domain adaptive VAD (ADA-VAD), which is a deep neural network (DNN) based VAD method highly robust to audio samples with various noise types and low SNRs. |
T. Kim; J. Chang; J. H. Ko; |
1468 | Multi-Channel End-To-End Neural Diarization with Distributed Microphones Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. |
S. Horiguchi; Y. Takashima; P. García; S. Watanabe; Y. Kawaguchi; |
1469 | Multi-Channel Speaker Diarization Using Spatial Features for Meetings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose two multi-channel diarization systems that have enhanced capability in detecting overlapped speech and identifying speakers by learning spatial features. |
N. Zheng; et al. |
1470 | Speaker Normalization for Self-Supervised Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. |
I. Gat; H. Aronowitz; W. Zhu; E. Morais; R. Hoory; |
1471 | Sentiment-Aware Automatic Speech Recognition Pre-Training for Enhanced Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). |
A. Ghriss; B. Yang; V. Rozgic; E. Shriberg; C. Wang; |
1472 | Confidence Estimation for Speech Emotion Recognition Based on The Relationship Between Emotion Categories and Primitives Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we propose (1) a novel confidence metric for SER based on the relationship between emotion primitives: arousal, valence, and dominance (AVD) and emotion categories (ECs), (2) EmoConfidNet – a DNN trained alongside the EC recognizer to predict the proposed confidence metric, and (3) a data filtering technique used to enhance the training of EmoConfidNet and the EC recognizer. |
Y. Li; C. Papayiannis; V. Rozgic; E. Shriberg; C. Wang; |
1473 | AuxFormer: Robust Approach to Audiovisual Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study proposes the AuxFormer framework, which addresses in a principled way the aforementioned challenges. |
L. Goncalves; C. Busso; |
1474 | Fusing ASR Outputs in Joint Training for Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to fuse Automatic Speech Recognition (ASR) outputs into the pipeline for joint training of SER. |
Y. Li; P. Bell; C. Lai; |
1475 | Speech Emotion Recognition with Co-Attention Based Multi-Level Acoustic Information Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose an end-to-end speech emotion recognition system using multi-level acoustic information with a newly designed co-attention module. |
H. Zou; Y. Si; C. Chen; D. Rajan; E. S. Chng; |
1476 | Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper investigates the effectiveness of multi-modal acoustic modelling for dysarthric speech recognition using acoustic features along with articulatory information. |
Z. Yue; E. Loweimi; Z. Cvetkovic; H. Christensen; J. Barker; |
1477 | Raw Source and Filter Modelling for Dysarthric Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we build acoustic models using the raw magnitude spectra of the source and filter components. |
Z. Yue; E. Loweimi; Z. Cvetkovic; |
1478 | Synthesizing Dysarthric Speech Using Multi-Speaker TTS For Dysarthric Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. |
M. Soleymanpour; M. T. Johnson; R. Soleymanpour; J. Berry; |
1479 | Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders Step 2: Contribution of The Emergence of Phonetic Traits Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In a recent work, we presented competitive performance achieved with a CNN-based model trained on normal speech for the French phone classification and how it correlates well with different perceptual measures when exposed to disordered speech. |
S. Abderrazek; C. Fredouille; A. Ghio; M. Lalain; C. Meunier; V. Woisard; |
1480 | Constant Q Cepstral Coefficients for Classification of Normal Vs. Pathological Infant Cry Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To that effect, in this paper, we propose a new approach of feature extraction based on Constant Q Transform (CQT) that is known to have variable spectro-temporal resolution w.r.t. Heisenberg's uncertainty principle in signal processing framework. |
H. A. Patil; A. T. Patil; A. Kachhi; |
1481 | Nonverbal Sound Detection for Disordered Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we describe the design of a dataset, model considerations for real-world deployment, and efforts towards model personalization. |
C. Lea; et al. |
1482 | Conditional Diffusion Probabilistic Model for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. |
Y. -J. Lu; Z. -Q. Wang; S. Watanabe; A. Richard; C. Yu; Y. Tsao; |
1483 | DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio Based On Deep Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Recent work proposed to use a complex filter instead of a point-wise multiplication with a mask. This allows to incorporate information from previous and future time steps exploiting local correlations within each frequency band. In this work, we propose DeepFilterNet, a two stage speech enhancement framework utilizing deep filtering. |
H. Schroter; A. N. Escalante-B; T. Rosenkranz; A. Maier; |
1484 | MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Although certain unsupervised learning frameworks have also been proposed to solve the pair constraint, they still require clean speech or noise for training. Therefore, in this paper, we propose MetricGAN-U, which stands for MetricGAN-unsupervised, to further release the constraint from conventional unsupervised learning. |
S. -W. Fu; C. Yu; K. -H. Hung; M. Ravanelli; Y. Tsao; |
1485 | Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. |
Y. Fu; et al. |
1486 | Attenuation Of Acoustic Early Reflections In Television Studios Using Pretrained Speech Synthesis Neural Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the problem of early acoustic reflections in television studios and control rooms, and propose a two-stage method that exploits the knowledge of a pretrained speech synthesis generator. |
T. Rosenbaum; I. Cohen; E. Winebrand; |
1487 | Non-Autoregressive ASR with Self-Conditioned Folded Encoders Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes CTC-based non-autoregressive ASR with self-conditioned folded encoders. |
T. Komatsu; |
1488 | Have Best of Both Worlds: Two-Pass Hybrid and E2E Cascading Framework for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a two-pass hybrid and E2E cascading (HEC) framework to combine the hybrid and E2E model in order to take advantage of both sides, with hybrid in the first pass and E2E in the second pass. |
G. Ye; V. Mazalov; J. Li; Y. Gong; |
1489 | Conformer-Based Hybrid ASR System For Switchboard Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe. |
M. Zeineldeen; et al. |
1490 | Improving Factored Hybrid HMM Acoustic Modeling Without State Tying Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we show that a factored hybrid hidden Markov model (FH-HMM) which is defined without any phonetic state-tying outperforms a state-of-the-art hybrid HMM. |
T. Raissi; E. Beck; R. Schlüter; H. Ney; |
1491 | Auditory-Based Data Augmentation for End-to-end Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, a well-verified auditory-based model, which can simulate various hearing abilities, is investigated for the purpose of data augmentation for end-to-end speech recognition. |
Z. Tu; J. Deadman; N. Ma; J. Barker; |
1492 | Deliberation of Streaming RNN-Transducer By Non-Autoregressive Decoding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to deliberate the hypothesis alignment of a streaming RNN-T model with the previously proposed Align-Refine non-autoregressive decoding method and its improved versions. |
W. Wang; K. Hu; T. N. Sainath; |
1493 | Neural HMMs Are All You Need (For High-Quality Attention-Free TTS) Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper describes how the old and new paradigms can be combined to obtain the advantages of both worlds, by replacing attention in neural TTS with an autoregressive left-right no-skip hidden Markov model defined by a neural network. |
S. Mehta; É. Székely; J. Beskow; G. E. Henter; |
1494 | Autoregressive Variational Autoencoder with A Hidden Semi-Markov Model-Based Structured Attention for Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an autoregressive speech synthesis model based on the variational autoencoder incorporating latent sequence representation for acoustic and linguistic features and the structure of a hidden semi-Markov model (HSMM). |
T. Fujimoto; K. Hashimoto; Y. Nankaku; K. Tokuda; |
1495 | PAMA-TTS: Progression-Aware Monotonic Attention for Stable SEQ2SEQ TTS with Accurate Phoneme Duration Control Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes PAMA-TTS to address the problem. |
Y. He; J. Luan; Y. Wang; |
1496 | Improving FastSpeech TTS with Efficient Self-Attention and Compact Feed-Forward Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As a non-autoregressive TTS, the latency and computation load in inference is shifted from vocoder to transformer where the efficiency is limited by the quadratic time and memory complexity in the self-attention mechanism, particularly for a long text sequence. To tackle these challenges, we propose two models, ProbSparseFS and LinearizedFS, which have efficient self-attention arrangements to improve the inference speed and memory complexity. |
Y. Xiao; X. Wang; L. He; F. K. Soong; |
1497 | VarianceFlow: High-Quality and Controllable Text-to-Speech Using Variance Information Via Normalizing Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel model called VarianceFlow combining the advantages of the two types. |
Y. Lee; J. Yang; K. Jung; |
1498 | Mixer-TTS: Non-Autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes Mixer-TTS, a non-autoregressive model for mel-spectrogram generation. |
O. Tatanov; S. Beliaev; B. Ginsburg; |
1499 | Knowledge Augmented BERT Mutual Network in Multi-Turn Spoken Dialogues Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to equip a BERT-based joint model with a knowledge attention module to mutually leverage dialogue contexts between two SLU tasks. |
T. -W. Wu; B. -H. Juang; |
1500 | TINYS2I: A Small-Footprint Utterance Classification Model with Contextual Support for On-Device SLU Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present Tiny Signal-to-interpretation (TinyS2I), an end-to-end on-device SLU approach which is focused on heavily resource constrained devices. |
A. Alexandridis; K. M. Sathyendra; G. P. Strimel; P. Kveton; J. Webb; A. Mouchtaris; |
1501 | Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a hierarchical conversation model that is capable of directly using dialog history in speech form, making it fully E2E. |
V. Sunder; S. Thomas; H. -K. J. Kuo; J. Ganhotra; B. Kingsbury; E. Fosler-Lussier; |
1502 | Improving Spoken Language Understanding By Enhancing Text Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel model to train a language model, namely CapuBERT, that is able to deal with spoken-form input from an ASR module. |
T. B. Nguyen; |
1503 | Multi-Task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the NLU model in the two-stage system is not streamable, as it must wait for the audio segments to complete processing, which ultimately impacts the latency of the SLU system. In this work, we propose a streamable multi-task semantic transducer model to address these considerations. |
X. Fu; et al. |
1504 | A BERT Based Joint Learning Model with Feature Gated Mechanism for Spoken Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a joint learning model based on BERT, which introduces a dual encoder structure and utilizes semantic information by performing feature gate mechanisms in predicting intents and slots. |
W. Zhang; L. Jiang; S. Zhang; S. Wang; J. Tan; |
1505 | MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the performance of such systems may degrade under short utterance scenarios. To address these issues, we propose a multi-scale frequency-channel attention (MFA), where we characterize speakers at different scales through a novel dual-path design which consists of a convolutional neural network and TDNN. |
T. Liu; R. K. Das; K. Aik Lee; H. Li; |
1506 | MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the convolution model often lacks the ability of long-term dependency modeling due to the limitation of receptive field, while the self-attention model is insufficient to model local information. To tackle this limitation, we propose a new multi-layer perceptrons based speaker verification network (MLP-SVNet) which can apply MLPs across temporal and frequency dimensions to capture the local and global information at the same time. |
B. Han; Z. Chen; B. Liu; Y. Qian; |
1507 | Real Additive Margin Softmax for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: A supposed behavior of AM-Softmax is that it can shrink within-class variation by putting emphasis on target logits, which in turn improves margin between target and non-target classes. In this paper, we conduct a careful analysis on the behavior of AM-Softmax loss, and show that this loss does not implement real max-margin training. |
L. Li; R. Nai; D. Wang; |
1508 | Statistical Pyramid Dense Time Delay Neural Network for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose statistical pyramid dense TDNN (SPD-TDNN) with the statistical pyramid pooling module which captures the context information. |
Z. -K. Wan; Q. -H. Ren; Y. -C. Qin; Q. -R. Mao; |
1509 | On The Importance of Different Frequency Bins for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the frequency reweighting layer (FRL) to automatically learn and balance the importance of different frequency bins. |
A. Deng; S. Wang; W. Kang; F. Deng; |
1510 | Self-Knowledge Distillation Via Feature Enhancement for Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a novel self-knowledge distillation method, namely Self-Knowledge Distillation via Feature Enhancement (SKDFE). |
B. Liu; H. Wang; Z. Chen; S. Wang; Y. Qian; |
1511 | Joint Ego-Noise Suppression and Keyword Spotting on Sweeping Robots Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel approach for joint ego-noise suppression and keyword detection. |
Y. Na; Z. Wang; L. Wang; Q. Fu; |
1512 | Progressive Continual Learning for Spoken Keyword Spotting Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment. To tackle such challenges, we propose a progressive continual learning strategy for small-footprint spoken keyword spotting (PCL-KWS). |
Y. Huang; N. Hou; N. F. Chen; |
1513 | Unified Speculation, Detection, and Verification Keyword Spotting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a deep learning model that separates the keyword spotting task into three phases in order to further optimize both accuracy and latency of the overall system. |
G. -S. Fu; T. Senechal; A. Challenner; T. Zhang; |
1514 | Learning Decoupling Features Through Orthogonality Regularization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by this, we believe it is important to explore a method that can effectively extract common features while decoupling task-specific features. Bearing this in mind, a two-branch deep network (KWS branch and SV branch) with the same network structure is developed and a novel decoupling feature learning method is proposed to push up the performance of KWS and SV simultaneously where speaker-invariant keyword representations and keyword-invariant speaker representations are expected respectively. |
L. Wang; R. Gu; W. Zhuang; P. Gao; Y. Wang; Y. Zou; |
1515 | Temporal Early Exiting for Streaming Speech Commands Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In particular, recurrent neural networks represent a common approach for streaming commands recognition systems. In this paper, we explore resource-efficient methods to short-circuit such systems in the time domain when the model is confident in its prediction. |
R. Tang; et al. |
1516 | A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate designing a compact audio-visual WWS system by utilizing the visual information to alleviate the degradation. |
H. Zhou; J. Du; C. -H. Huck Yang; S. Xiong; C. -H. Lee; |
1517 | ProsoSpeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Prosody modeling has several challenges: 1) the extracted pitch used in previous prosody modeling works have inevitable errors, which hurts the prosody modeling; 2) different attributes of prosody (e.g., pitch, duration and energy) are dependent on each other and produce the natural prosody together; and 3) due to high variability of prosody and the limited amount of high-quality data for TTS training, the distribution of prosody cannot be fully shaped. To tackle these issues, we propose ProsoSpeech, which enhances the prosody using quantized latent vectors pre-trained on large-scale unpaired and low-quality text and speech data. |
Y. Ren; et al. |
1518 | ProsodySpeech: Towards Advanced Prosody Model for Neural Text-to-Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes ProsodySpeech, a novel prosody model to enhance encoder-decoder neural Text-To-Speech (TTS), to generate high expressive and personalized speech even with very limited training data. |
Y. Yi; L. He; S. Pan; X. Wang; Y. Xiao; |
1519 | Hierarchical Prosody Modeling and Control in Non-Autoregressive Parallel Neural TTS Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we train a non-autoregressive parallel neural TTS front-end model hierarchically conditioned on both coarse and fine-grained acoustic speech features to learn a latent prosody space with intuitive and meaningful dimensions. |
T. Raitio; J. Li; S. Seshadri; |
1520 | Discourse-Level Prosody Modeling with A Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. |
N. -Q. Wu; Z. -C. Liu; Z. -H. Ling; |
1521 | Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel approach for unsupervised word-level prosody tagging with two stages, where we first group the words into different types with a decision tree according to their phonetic content and then cluster the prosodies using GMM within each type of words separately. |
Y. Guo; C. Du; K. Yu; |
1522 | A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to corresponding prosodic label sequence. |
X. Chen; et al. |
1523 | SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a multi-intent SLU framework, called SLIM, to jointly learn multi-intent detection and slot filling based on BERT. |
F. Cai; W. Zhou; F. Mi; B. Faltings; |
1524 | Joint Multiple Intent Detection and Slot Filling Via Self-Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Self-Distillation Joint NLU model (SDJN) for multi-intent NLU. |
L. Chen; P. Zhou; Y. Zou; |
1525 | A Graph Attention Interactive Refine Framework with Contextual Regularization for Jointing Intent Detection and Slot Filling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a two-stage Graph Attention Interactive Refine (GAIR) framework. |
Z. Zhu; P. Huang; H. Huang; S. Liu; L. Lao; |
1526 | Adjacency Pairs-Aware Hierarchical Attention Networks for Dialogue Intent Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose an Adjacency Pairs-Aware Hierarchical Attention Network (AP-HAN) for dialogue intent classification. |
J. Xu; P. Huang; Y. Peng; J. Ding; B. Huang; S. Huang; |
1527 | ADVIN: Automatically Discovering Novel Domains and Intents from User Text Utterances Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Real applications however encounter dynamic, rapidly evolving environments with newly emerging intents and domains, for which no labeled data or prior information is available. For such a setting, we propose a novel framework, ADVIN, to automatically discover novel domains and intents from large volumes of unlabeled text. |
N. Vedula; R. Gupta; A. Alok; M. Sridhar; S. Ananthakrishnan; |
1528 | A New Data Augmentation Method for Intent Classification Enhancement and Its Application on Spoken Conversation Datasets Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for automatic data selection and labeling. |
Z. Kons; et al. |
1529 | Robust Speaker Verification with Joint Self-Supervised and Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To overcome the difficulty of acquiring annotated data and contain the high performance in the context of speaker verification, we propose in this work a self-supervised joint learning (SS-JL) framework which complements the supervised main task with self-supervised auxiliary tasks in joint training. |
K. Wang; et al. |
1530 | Robust Speaker Verification Using Population-Based Data Augmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a population-based searching strategy for optimizing the augmentation parameters. |
W. Lin; M. -W. Mak; |
1531 | RawNeXt: Speaker Verification System For Variable-Duration Utterances With Deep Layer Aggregation And Extended Dynamic Scaling Policies Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Despite achieving satisfactory performance in speaker verification using deep neural networks, variable-duration utterances remain a challenge that threatens the robustness of systems. To deal with this issue, we propose a speaker verification system called RawNeXt that can handle input raw waveforms of arbitrary length by employing the following two components: (1) A deep layer aggregation strategy enhances speaker information by iteratively and hierarchically aggregating features of various time scales and spectral channels output from blocks. |
J. -H. Kim; H. -J. Shim; J. Heo; H. -J. Yu; |
1532 | Contrastive-mixup Learning for Improved Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric. |
X. Zhang; M. Jin; R. Cheng; R. Li; E. Han; A. Stolcke; |
1533 | A Study of The Robustness of Raw Waveform Based Speaker Embeddings Under Mismatched Conditions Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we conduct a cross-dataset study on parametric and non-parametric raw-waveform based speaker embeddings through speaker verification experiments. |
G. Zhu; F. Cwitkowitz; Z. Duan; |
1534 | Disentangled Speaker Embedding for Robust Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Entanglement of speaker features and redundant features may lead to poor performance when evaluating speaker verification systems on an unseen domain. To address this issue, we propose an InfoMax domain separation and adaptation network (InfoMax-DSAN) to disentangle the domain-specific features and domain-invariant speaker features based on domain adaptation techniques. |
L. Yi; M. -W. Mak; |
1535 | Performance-Efficiency Trade-Offs in Unsupervised Pre-Training for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Putting together all our observations, we introduce SEW-D (Squeezed and Efficient Wav2vec with Disentangled Attention), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. |
F. Wu; K. Kim; J. Pan; K. J. Han; K. Q. Weinberger; Y. Artzi; |
1536 | Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, there is further room for improving the seed model used to initialize the MPL training, as it is in general critical for a PL-based method to start training from high-quality pseudo-labels. To this end, we propose to enhance MPL by (1) introducing the Conformer architecture to boost the overall recognition accuracy and (2) exploiting iterative pseudo-labeling with a language model to improve the seed model before applying MPL. |
Y. Higuchi; N. Moritz; J. L. Roux; T. Hori; |
1537 | Tts4pretrain 2.0: Advancing The Use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce learning from supervised speech earlier on in the training process with consistency-based regularization between real and synthesized speech. |
Z. Chen; Y. Zhang; A. Rosenberg; B. Ramabhadran; P. Moreno; G. Wang; |
1538 | SYNT++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two novel techniques during training to mitigate the problems due to the distribution gap: (i) a rejection sampling algorithm and (ii) using separate batch normalization statistics for the real and the synthetic samples. |
T. -Y. Hu; M. Armandpour; A. Shrivastava; J. -H. R. Chang; H. Koppula; O. Tuzel; |
1539 | Pseudo-Labeling for Massively Multilingual Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. |
L. Lugosch; T. Likhomanenko; G. Synnaeve; R. Collobert; |
1540 | DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming DFSMN-SAN For Automatic Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a dual-path network for the far-field acoustic model, which uses voice processing (VP) signal and acoustic echo cancellation (AEC) signal as input. |
D. Ma; Y. Wang; L. He; M. Jin; D. Su; D. Yu; |
1541 | SERAB: A Multi-Lingual Benchmark for Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To facilitate the process, here, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER. |
N. Scheidwasser-Clow; M. Kegler; P. Beckmann; M. Cernak; |
1542 | Enhancing Privacy Through Domain Adaptive Noise Injection For Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a method to improve inference privacy for sensitive features by injecting noise into the input speech data, but without degrading the SER system performance. |
T. Feng; H. Hashemi; M. Annavaram; S. S. Narayanan; |
1543 | Selective Multi-Task Learning For Speech Emotion Recognition Using Corpora Of Different Styles Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate effective combination methods based on multi-task learning (MTL) considering the style attribute. |
H. Zhang; M. Mimura; T. Kawahara; K. Ishizuka; |
1544 | Frontend Attributes Disentanglement for Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we aim to perform frontend attributes disentanglement (AD) for SER task, using a pre-trained SR model. |
Y. -X. Xi; Y. Song; L. -R. Dai; I. McLoughlin; L. Liu; |
1545 | Exploiting Annotators' Typed Description of Emotion Perception to Maximize Utilization of Ratings for Speech Emotion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This practice provides valuable emotional information, which, however, is ignored in most emotion recognition studies. This paper utilizes easy-accessed natural language processing toolkits to mine the sentiment of these typed descriptions, enriching and maximizing the information obtained from the annotators. |
H. -C. Chou; W. -C. Lin; C. -C. Lee; C. Busso; |
1546 | Automated Audio Captioning Using Transfer Learning and Reconstruction Latent Space Similarity Regularization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we examine the use of Transfer Learning using Pretrained Audio Neural Networks (PANNs) [1], and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task [2]. |
A. Koh; X. Fuzhao; C. E. Siong; |
1547 | Fast-Slow Transformer for Visually Grounding Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We present Fast-Slow Transformer for Visually Grounding Speech, or FaST-VGS. |
P. Peng; D. Harwath; |
1548 | Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8). |
A. Shah; et al. |
1549 | AIMNet: Adaptive Image-Tag Merging Network For Automatic Medical Report Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is mainly due to severe visual and textual data biases. To address these problems, we propose an Adaptive Image-Tag Merging Network (AIMNet) that first predicts the tags of diseases from the input image, and then adaptively merges the visual information of the input image and disease information from the disease tags to learn the disease-oriented visual features that can better represent abnormal regions of the input image, and thus can be used to alleviate data bias problems. |
J. Shi; S. Wang; R. Wang; S. Ma; |
1550 | Adversarial Input Ablation for Audio-Visual Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an adversarial data augmentation strategy for speech spectrograms, within the context of training a model to semantically ground spoken audio captions to the images they describe. |
D. Xu; D. Harwath; |
1551 | Gated Multimodal Fusion with Contrastive Learning for Turn-Taking Prediction in Human-Robot Dialogue Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: More importantly, to tackle the data imbalance issue, we design a simple yet effective data augmentation method to construct negative instances without supervision and apply contrastive learning to obtain better feature representations. |
J. Yang; P. Wang; Y. Zhu; M. Feng; M. Chen; X. He; |
1552 | Joint Far- and Near-End Speech Intelligibility Enhancement Based on The Approximated Speech Intelligibility Index Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to use a simpler signal model and optimize speech intelligibility based on the Approximated Speech Intelligibility Index (ASII). |
A. J. Fuglsig; J. Østergaard; J. Jensen; L. S. Bertelsen; P. Mariager; Z. -H. Tan; |
1553 | Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in The Complex Domain Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to combine the strengths of AC and BC microphones by employing a convolutional recurrent network that performs complex spectral mapping. |
H. Wang; X. Zhang; D. Wang; |
1554 | A Two-Step Backward Compatible Fullband Speech Enhancement System Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Speech enhancement methods based on deep learning have surpassed traditional methods. While many of these new approaches are operating on the wideband (16kHz) sample rate, a new … |
X. Zhang; et al. |
1555 | S-DCCRN: Super Wide Band DCCRN with Learnable Complex Feature for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we extend our previous deep complex convolution recurrent neural network (DCCRN) substantially to a super wide band version, S-DCCRN, to perform speech denoising on speech of 32 kHz sampling rate. |
S. Lv; et al. |
1556 | Cognitive Coding Of Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an approach for cognitive coding of speech by unsupervised extraction of contextual representations in two hierarchical levels of abstraction. |
R. Lotfidereshgi; P. Gournay; |
1557 | Speech Enhancement for Low Bit Rate Speech Codec Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a neural extension to low bit rate speech codec (e.g., Codec2) that aims to improve the perceptual quality of synthesized speech. |
J. Lin; K. Kalgaonkar; Q. He; X. Lei; |
1558 | Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we propose a novel approach to introduce LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. |
J. Tian; et al. |
1559 | Being Greedy Does Not Hurt: Sampling Strategies for End-To-End Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, the Optimal Completion Distillation (OCD) training method was proposed which attempts to address some of those issues. In this paper, we analyze if the method is competitive over a strong MLE baseline and investigate its scalability towards large speech data beyond read speech, which to our knowledge is the first attempt known in literature. |
J. Heymann; E. Lakomkin; L. Rädel; |
1560 | Investigating Sequence-Level Normalisation For CTC-Like End-to-End ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new CTC-like method, for E2E ASR training, by modifying the topology of original CTC, so that the well-known abuse of the blank label in CTC can be resolved theoretically. |
Z. Zhao; P. Bell; |
1561 | Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, to promote the word-level representation learning in end-to-end ASR, we propose a hierarchical conditional model that is based on connectionist temporal classification (CTC). |
Y. Higuchi; K. Karube; T. Ogawa; T. Kobayashi; |
1562 | Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an embedding aligner and modality switch training to better align the speech and text latent spaces. |
W. Wang; et al. |
1563 | Minimum Word Error Training For Non-Autoregressive Transformer-Based Code-Switching ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose various approaches to boosting the performance of a CTC-mask-based non-autoregressive Transformer under code-switching ASR scenario. |
Y. Peng; J. Zhang; H. Xu; H. Huang; E. S. Chng; |
1564 | Connecting Targets Via Latent Topics And Contrastive Learning: A Unified Framework For Robust Zero-Shot and Few-Shot Stance Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a unified end-to-end framework with a discrete latent topic variable that implicitly establishes the connections between targets. |
R. Liu; Z. Lin; P. Fu; Y. Liu; W. Wang; |
1565 | Prior-BERT and Multi-Task Learning for Target-Aspect-Sentiment Joint Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Besides, the existing models do not take the heavily unbalanced distribution of labels into account and also do not give enough consideration to long-distance dependence of targets and aspect-sentiment pairs. To overcome these challenges, we propose a novel end-to-end model named Prior-BERT and Multi-Task Learning (PBERT-MTL), which can detect all triples more efficiently. |
C. Ke; Q. Xiong; C. Wu; Z. Liao; H. Yi; |
1566 | Cross-Target Stance Detection Via Refined Meta-Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the problem above, we propose a many-to-one CTSD model based on meta-learning. |
H. Ji; Z. Lin; P. Fu; W. Wang; |
1567 | A Robust Contrastive Alignment Method for Multi-Domain Text Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a robust contrastive alignment method to align text classification features of various domains in the same feature space by supervised contrastive learning. |
X. Li; et al. |
1568 | Incremental User Embedding Modeling for Personalized Text Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an incremental user embedding modeling approach, in which embeddings of user's recent interaction histories are dynamically integrated into the accumulated history vectors via a transformer encoder. |
R. Lian; C. -W. Huang; Y. Tang; Q. Gu; C. Ma; C. Guo; |
1569 | Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Recently, it has been shown that, in spite of the significant performance of deep neural networks in different fields, those are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. |
S. Sadrizadeh; L. Dolamic; P. Frossard; |
1570 | MANNER: Multi-View Attention Network For Noise Erasure Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this study, we propose Multi-view Attention Network for Noise ERasure (MANNER) consisting of a convolutional encoder-decoder with a multi-view attention block, applied to the time-domain signals. |
H. J. Park; B. H. Kang; W. Shin; J. S. Kim; S. W. Han; |
1571 | Dual-Branch Attention-In-Attention Transformer for Single-Channel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Motivated by that, we propose a dual-branch attention-in-attention transformer dubbed DB-AIAT to handle both coarse- and fine-grained regions of the spectrum in parallel. |
G. Yu; A. Li; C. Zheng; Y. Guo; Y. Wang; H. Wang; |
1572 | Time-Frequency Attention for Monaural Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a simple yet effective T-F attention (TFA) module, where a 2-D attention map is produced to provide differentiated weights to the spectral components of T-F representation. |
Q. Zhang; Q. Song; Z. Ni; A. Nicolson; H. Li; |
1573 | FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose an extended single-channel real-time speech enhancement framework called FullSubNet+ with following significant improvements. |
J. Chen; Z. Wang; D. Tuo; Z. Wu; S. Kang; H. Meng; |
1574 | Cross-Domain Speech Enhancement with A Neural Cascade Architecture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel cascade architecture to address the monaural speech enhancement problem. |
H. Wang; D. Wang; |
1575 | Speech Denoising in The Waveform Domain With Self-Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we present CleanUNet, a causal speech denoising model on the raw waveform. |
Z. Kong; W. Ping; A. Dantrey; B. Catanzaro; |
1576 | SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present the advantages of applying SRU++ in ASR tasks by comparing with Conformer across multiple ASR benchmarks and study how the benefits can be generalized to long-form speech inputs. |
J. Pan; T. Lei; K. Kim; K. J. Han; S. Watanabe; |
1577 | Lattention: Lattice-Attention in ASR Rescoring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we look into the effectiveness of lattice cues for rescoring n-best lists in second-pass. |
P. Pandey; S. D. Torres; A. O. Bayer; A. Gandhe; V. Leutnant; |
1578 | Learning Acoustic Frame Labeling for Phoneme Segmentation with Regularized Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a method for phoneme segmentation based on a regularized attention mechanism. |
B. Lin; L. Wang; |
1579 | Listen, Know and Spell: Knowledge-Infused Subword Modeling for Improving ASR Performance of OOV Named Entities Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose the Knowledge-Infused Subword Model (KISM), a novel technique for incorporating semantic context from KGs into the ASR pipeline for improving the performance of OOV named entities. |
N. Das; D. H. Chau; M. Sunkara; S. Bodapati; D. Bekal; K. Kirchhoff; |
1580 | Joint Speech Recognition and Audio Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The goal of AAC is to generate natural language descriptions of contents in audio samples. We propose several approaches for end-to-end joint modeling of ASR and AAC tasks and demonstrate their advantages over traditional approaches, which model these tasks independently. |
C. Narisetty; E. Tsunoo; X. Chang; Y. Kashiwagi; M. Hentschel; S. Watanabe; |
1581 | Speaker Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our method is easy to implement, and does not require transfer learning from speaker ID systems. We present objective and subjective metrics for evaluating performance on this task, and demonstrate that our proposed objective metrics correlate with human perception of speaker similarity. |
D. Stanton; et al. |
1582 | Voice Filter: Few-Shot Text-to-Speech Speaker Adaptation Using Voice Conversion As A Post-Processing Module Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel extremely low-resource TTS method called Voice Filter that uses as little as one minute of speech from a target speaker. |
A. Gabrys; et al. |
1583 | Fine-Grained Style Control In Transformer-Based Text-To-Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we present a novel architecture to realize fine-grained style control on the transformer-based text-to-speech synthesis (TransformerTTS). |
L. -W. Chen; A. Rudnicky; |
1584 | Using Multiple Reference Audios and Style Embedding Constraints for Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the fact that only the matched text and speech are used in the training process, using unmatched text and speech for inference would cause the model to synthesize speech with low content quality. In this study, we propose to mitigate these two problems by using multiple reference audios and style embedding constraints rather than using only the target audio. |
C. Gong; L. Wang; Z. Ling; J. Zhang; J. Dang; |
1585 | Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Such methods have limited ability in modeling the inter-speaker influence in conversations, and also neglect the speaking styles and the intra-speaker inertia inside each speaker. Inspired by DialogueGCN and its superiority in modeling such conversational influences than RNN based approaches, we propose a graph-based multi-modal context modeling method and adopt it to conversational TTS to enhance the speaking styles of synthesized speeches. |
J. Li; et al. |
1586 | Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a hierarchical framework to model speaking style from context. |
S. Lei; Y. Zhou; L. Chen; Z. Wu; S. Kang; H. Meng; |
1587 | SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. |
S. Shon; et al. |
1588 | Towards Reducing The Need for Speech Training Data to Build Spoken Language Understanding Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, large amounts of text data with suitable labels are usually available. In this paper, we propose a novel text representation and training methodology that allows E2E SLU systems to be effectively constructed using these text resources. |
S. Thomas; H. -K. J. Kuo; B. Kingsbury; G. Saon; |
1589 | Improving Cross-Modal Understanding in Visual Dialog Via Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we analyze the cross-modal understanding in visual dialog based on the vision-language pre-training model VD-BERT and propose a novel approach to improve the cross-modal understanding for visual dialog, named ICMU. |
F. Chen; X. Chen; S. Xu; B. Xu; |
1590 | News Recommendation Via Multi-Interest News Sequence Modelling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most existing methods overlook this important characteristic and thus fail to distinguish and model the potential multiple interests of a user, impeding accurate recommendation of the next piece of news. Therefore, this paper proposes a multi-interest news sequence (MINS) model for news recommendation. |
R. Wang; S. Wang; W. Lu; X. Peng; |
1591 | Multi-Level Contrastive Learning for Cross-Lingual Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a multi-level contrastive learning (ML-CTL) framework to further improve the cross-lingual ability of pre-trained models. |
B. Chen; W. Guo; B. Gu; Q. Liu; Y. Wang; |
1592 | Augmentation Strategy Optimization for Language Understanding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a new approach to language processing and understanding in which an adaptive data augmentation strategy for individual documents is proposed instead of using one universal policy for the whole dataset. |
C. -T. Chu; M. Rohmatillah; C. -H. Lee; J. -T. Chien; |
1593 | Multi-Feature Integration for Speaker Embedding Extraction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we intend to maximize the speaker information by reconstructing the extracted speaker information in one feature from the other features while at the same time minimizing the irrelevant information. |
S. Sankala; S. M. Rafi B; S. R. M. K; |
1594 | Learnable Nonlinear Compression for Robust Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we focus on nonlinear compression methods in spectral features for speaker verification based on deep neural network. |
X. Liu; M. Sahidullah; T. Kinnunen; |
1595 | Fine-Tuning Wav2Vec2 for Speaker Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: To adapt the framework to speaker recognition, we propose a single-utterance classification variant with cross-entropy or additive angular softmax loss, and an utterance-pair classification variant with BCE loss. |
N. Vaessen; D. A. Van Leeuwen; |
1596 | Graph Attentive Feature Aggregation for Text-Independent Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pair-wise relationships. |
H. -J. Shim; J. Heo; J. -H. Park; G. -H. Lee; H. -J. Yu; |
1597 | MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Motivated by unconsolidated data situation and the lack of a standard benchmark in the field, we complement our previous efforts and present a comprehensive corpus designed for training and evaluating text-independent multi-channel speaker verification systems. |
L. Mošner; O. Plchot; L. Burget; J. H. Cernocký; |
1598 | Multi-Channel Speaker Verification with Conv-Tasnet Based Beamformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to use ConvTasNet, a well-known source separation model, and we adapt it to perform speech enhancement by forcing it to separate speech and additive noise. |
L. Mošner; O. Plchot; L. Burget; J. H. Cernocký; |
1599 | LETR: A Lightweight and Efficient Transformer for Keyword Spotting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore a family of Transformer architectures for keyword spotting, optimizing the trade-off between accuracy and efficiency in a high-speed regime. |
K. Ding; M. Zong; J. Li; B. Li; |
1600 | Compressing Transformer-Based ASR Model By Task-Driven Loss and Attention-Based Multi-Level Feature Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Under the 1.1M parameters model, the experimental results on the Wall Street Journal dataset reveal that our approach achieves a 12.1% WER reduction compared with the baseline system. |
Y. Lv; et al. |
1601 | Spatial Processing Front-End for Distant ASR Exploiting Self-Attention Channel Combinator Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel multi-channel front-end based on channel shortening with the Weighted Prediction Error (WPE) method followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme, for tackling the distant ASR problem. |
D. Sharma; R. Gong; J. Fosburgh; S. Y. Kruchinin; P. A. Naylor; L. Milanovic; |
1602 | Efficient Sequence Training of Attention Models Using Approximative Recombination Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The error that is incurred by the approximation is analyzed and it is shown that using this technique the effective beam size can be increased by several orders of magnitude without significantly increasing the computational requirements. |
N. -P. Wynands; W. Michel; J. Rosendahl; R. Schlüter; H. Ney; |
1603 | NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Inspired by the capability of attention mechanism in capturing long term contextual information and learning alignments in ASR and TTS, we propose a neural network based end-to-end forced aligner called NeuFA, in which a novel bidirectional attention mechanism plays an essential role. |
J. Li; et al. |
1604 | Conformer-Based Speech Recognition with Linear Nyström Attention and Rotary Position Embedding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to replace self-attention with a linear complexity Nyström attention which is a low-rank approximation of the attention scores based on the Nyström method. |
L. Samarakoon; T. -Y. Leung; |
1605 | Multilingual Text-To-Speech Training Using Cross Language Voice Conversion And Self-Supervised Learning Of Speech Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: One way of mitigating this issue is by generating polyglot corpus through voice conversion. In this paper, we train such multilingual TTS system through a novel cross-lingual voice conversion model trained with speaker-invariant features extracted from a speech representation model which is pre-trained with 53 languages through self-supervised learning [1]. |
J. Wu; A. Polyak; Y. Taigman; J. Fong; P. Agrawal; Q. He; |
1606 | Towards Lifelong Learning of Multilingual Text-to-Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system, where each language was seen as an individual task and was learned sequentially and continually. |
M. Yang; S. Ding; T. Chen; T. Wang; Z. Wang; |
1607 | Zero-Shot Cross-Lingual Transfer Using Multi-Stream Encoder and Efficient Speaker Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel method for zero-shot cross-lingual TTS task by using multi-stream text encoder and efficient speaker representation. |
Y. Zheng; Z. Zhang; X. Li; W. Su; L. Lu; |
1608 | VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we formulate a novel task to synthesize speech in sync with a silent pre-recorded video, denoted as automatic voice over (AVO). |
J. Lu; B. Sisman; R. Liu; M. Zhang; H. Li; |
1609 | Duration Modeling of Neural TTS for Automatic Dubbing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose novel duration models for neural TTS that can be leveraged both to predict and control TTS duration. |
J. Effendi; Y. Virkar; R. Barra-Chicote; M. Federico; |
1610 | Learning to Predict Speech in Silent Videos Via Audiovisual Analogy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: And synthesizing the original speech waveform from lipreading makes it even a more challenging problem. Towards this end, we present a deep learning framework that can be trained end-to-end to learn the mapping between the auditory and visual signals. |
R. Yadav; A. Sardana; V. P. Namboodiri; R. M. Hegde; |
1611 | Self-Attention for Incomplete Utterance Rewriting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel method by directly extracting the coreference and omission relationship from the self-attention weight matrix of the transformer instead of word embeddings and edit the original text accordingly to generate the complete utterance. |
Y. Zhang; Z. Li; J. Wang; N. Cheng; J. Xiao; |
1612 | Multi-Turn Incomplete Utterance Restoration As Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the task of multi-turn incomplete utterance restoration to tackle the issue of frequent coreference and information omission in multi-turn dialogues. |
W. Jiang; S. Li; J. Li; Y. Yang; |
1613 | CLseg: Contrastive Learning of Story Ending Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: The goal of this paper is to adopt contrastive learning to generate endings more consistent with story context, while there are two main challenges in contrastive learning of SEG. |
Y. Xie; Y. Hu; L. Xing; Y. Li; W. Peng; P. Guo; |
1614 | Explicitly Modeling Importance and Coherence for Timeline Summarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a better approach for TLS by explicitly optimizing importance and coherence on top of coverage and diversity. |
Q. Mao; et al. |
1615 | TED Talk Teaser Generation with Pre-Trained Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we address the challenge of automatically generating teasers for TED talks. |
G. Vico; J. Niehues; |
1616 | End-to-End Speech Summarization Using Restricted Self-Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a single model optimized end-to-end for speech summarization. |
R. Sharma; S. Palaskar; A. W. Black; F. Metze; |
1617 | Turn-to-Diarize: Online Speaker Diarization Constrained By Transformer Transducer Speaker Turn Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we present a novel speaker diarization system for streaming on-device applications. |
W. Xia; et al. |
1618 | Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents Transcribe-to-Diarize, a new approach for neural speaker diarization that uses an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR). |
N. Kanda; et al. |
1619 | A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a novel framework for the SCD task, which utilizes a multitask learning architecture to leverage speaker information during the training stage, and adds the content information extracted from an unsupervised speech decomposition model to help detect the speaker change points. |
H. Su; et al. |
1620 | ASR-Aware End-to-End Neural Diarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. |
A. Khare; E. Han; Y. Yang; A. Stolcke; |
1621 | Reformulating Speaker Diarization As Community Detection With Emphasis On Topological Structure Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose to view clustering-based diarization as a community detection problem. |
S. Zheng; H. Suo; |
1622 | TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations. |
N. R. Koluguri; T. Park; B. Ginsburg; |
1623 | Transducer-Based Streaming Deliberation for Cascaded Encoders Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a transducer-based streaming deliberation model. |
K. Hu; T. N. Sainath; A. Narayanan; R. Pang; T. Strohman; |
1624 | Improving The Latency And Quality Of Cascaded Encoders Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore reducing computational latency of the 2-pass cascaded encoder model [1]. |
T. N. Sainath; et al. |
1625 | Improving The Fusion of Acoustic and Text Representations in RNN-T Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To estimate the output distributions over subword units, RNN-T uses a fully connected layer as the joint network to fuse the acoustic representations extracted using the acoustic encoder with the text representations obtained using the prediction network based on the previous subword units. In this paper, we propose to use gating, bilinear pooling, and a combination of them in the joint network to produce more expressive representations to feed into the output layer. |
C. Zhang; B. Li; Z. Lu; T. N. Sainath; S. -y. Chang; |
1626 | Adaptive Discounting of Implicit Language Models in RNN-Transducers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: One main reason for the degradation in performance on rare words is that the language model (LM) internal to RNN-Ts can become overconfident and lead to hallucinated predictions that are acoustically inconsistent with the underlying speech. To address this issue, we propose a lightweight adaptive LM discounting technique, ADAPTLMD, that can be used with any RNN-T architecture without requiring any external resources or additional parameters. |
V. Unni; S. Khare; A. Mittal; P. Jyothi; S. Sarawagi; S. Bharadwaj; |
1627 | Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel text representation and training framework for E2E ASR models. |
S. Thomas; B. Kingsbury; G. Saon; H. -K. J. Kuo; |
1628 | Factorized Neural Transducer for Efficient Language Model Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This drawback might prevent their potential applications in practice. In order to address this issue, we propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction, and adopting a standalone language model for the vocabulary prediction. |
X. Chen; Z. Meng; S. Parthasarathy; J. Li; |
1629 | Integrating Dependency Tree Into Self-Attention for Sentence Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address both issues, we propose Dependency-Transformer, which applies a relation-attention mechanism that works in concert with the self-attention mechanism. |
J. Ma; J. Li; Y. Liu; S. Zhou; X. Li; |
1630 | MetricBERT: Text Representation Learning Via Self-Supervised Triplet Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the "traditional" masked-language task. |
I. Malkiel; D. Ginzburg; O. Barkan; A. Caciularu; Y. Weill; N. Koenigstein; |
1631 | End-To-End Neural Coreference Resolution Revisited: A Simple Yet Effective Baseline Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning. |
T. M. Lai; T. Bui; D. S. Kim; |
1632 | Local Context Interaction-Aware Glyph-Vectors for Chinese Sequence Tagging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, it's difficult for shallow CNNs architecture to extract glyphs information from character data and implement the contextual interaction of different glyphs information effectively. In this paper, we address these issues by presenting LCIN: a Local Context Interaction-aware Network for glyph-vectors extraction. |
J. Lu; P. Zhang; |
1633 | Deep Learning for Prominence Detection In Children's Read Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present here, in contrast, a system that operates directly on segmented speech waveforms to learn features relevant to prominent word detection for children's oral fluency assessment. |
M. Vaidya; K. Sabu; P. Rao; |
1634 | Towards A Common Speech Analysis Engine Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose leveraging recent advances in self-supervised-based speech processing to create a common speech analysis engine. |
H. Aronowitz; I. Gat; E. Morais; W. Zhu; R. Hoory; |
1635 | Phone-to-Audio Alignment Without Text: A Semi-Supervised Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Here we introduce two Wav2Vec2-based models for both text-dependent and text-independent phone-to-audio alignment. |
J. Zhu; C. Zhang; D. Jurgens; |
1636 | Attachment Recognition in School-Age Children: A Multimodal Approach Based on Language and Paralanguage Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The approach proposed in this work recognizes whether a child is secure or insecure, the two major attachment conditions an individual can belong to. |
H. Alsofyani; A. Vinciarelli; |
1637 | Determining The Best Acoustic Features for Smoker Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the performance of four acoustic feature sets/representations extracted using three feature extraction/learning approaches: (i) hand-crafted feature sets including the extended Geneva Minimalistic Acoustic Parameter Set and the Computational Paralinguistics Challenge Set, (ii) the Bag-of-Audio-Words representations, (iii) the neural representations extracted from raw waveform signals by SincNet. |
Z. Ma; Y. Qiu; F. Hou; R. Wang; J. T. Wai Chu; C. Bullen; |
1638 | End-to-End Low Resource Keyword Spotting Through Character Recognition and Beam-Search Re-Scoring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes an end-to-end approach to perform keyword spotting with a pre-trained acoustic model that uses recurrent neural networks and connectionist temporal classification loss. |
E. T. Mekonnen; A. Brutti; D. Falavigna; |
1639 | Curriculum Optimization for Low-Resource Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a new difficulty measure called compression ratio that can be used as a scoring function for raw audio in various noise conditions. |
A. Kuznetsova; A. Kumar; J. D. Fox; F. M. Tyers; |
1640 | Exploring Effective Data Utilization for Low-Resource Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a series of training strategies to explore more effective data utilization for low-resource speech recognition. |
Z. Zhou; W. Wang; W. Zhang; Y. Qian; |
1641 | Omni-Sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR Via Supernet Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. |
H. Yang; et al. |
1642 | Analyzing The Robustness of Unsupervised Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Unsupervised speech recognition (unsupervised ASR) aims to learn the ASR system with non-parallel speech and text corpus only. Wav2vec-U [1] has shown promising results in … |
G. -T. Lin; C. -J. Hsu; D. -R. Liu; H. -Y. Lee; Y. Tsao; |
1643 | Interpreting Intermediate Convolutional Layers In Unsupervised Acoustic Word Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Understanding how deep convolutional neural networks classify data has been subject to extensive research. |
G. Beguš; A. Zhou; |
1644 | Context Modeling with Evidence Filter for Multiple Choice Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the challenge, we propose a simple yet effective approach termed evidence filtering to model the relationships between the encoded contexts with respect to different options collectively, and to potentially highlight the evidence sentences and filter out unrelated sentences. |
S. Yu; H. Zhang; W. Jing; J. Jiang; |
1645 | From Shallow to Deep: Compositional Reasoning Over Graphs for Visual Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: It is effortless for humans but challenging for machines. In this paper, we propose a Hierarchical Graph Neural Module Network (HGNMN) that reasons over multi-layer graphs with neural modules to address the above issues. |
Z. Zhu; |
1646 | A Question-Oriented Propagation Network for News Reading Comprehension Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such reading comprehension task usually requires document-level language understanding while state-of-the-art, pretrained question answering models can only encode sequences with a predefined length limit. In this paper, we propose a novel Question-Oriented Propagation Network (QOPN) model for such task. |
L. Wen; et al. |
1647 | Syntax-Based Graph Matching for Knowledge Base Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing selection methods are usually based on word-level matching, which cannot capture the structural information or solve the long-term dependency problem of entities. To solve this problem, we propose a syntax-based graph matching method, which explicitly models both question and logical form as graphs, and performs matching at both word-level and structure-level. |
L. Ma; et al. |
1648 | QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Furthermore, studies on multi-hop question answering (QA) suggest that Transformers can replace the graph structure for multi-hop reasoning. Therefore, in this work, we propose a novel framework, QA4QG, a QA-augmented BART-based framework for MQG. |
D. Su; P. Xu; P. Fung; |
1649 | Pair-Level Supervised Contrastive Learning for Natural Language Inference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Pair-level Supervised Contrastive Learning approach (PairSCL). |
S. Li; X. Hu; L. Lin; L. Wen; |
1650 | Acoustic Comparison of Physical Vocal Tract Models with Hard and Soft Walls Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This study explored how the frequencies and bandwidths of the acoustic resonances of physical tube models of the vocal tract differ when they have hard versus soft walls. |
P. Birkholz; P. Häsner; S. Kürbis; |
1651 | An Error Correction Scheme for Improved Air-Tissue Boundary in Real-Time MRI Video for Speech Production Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Careful analysis of predicted contours reveals errors in regions like the velum part of contour1 (ATB comprising of upper lip, hard palate, and velum) and tongue base section of contour2 (ATB covering jawline, lower lip, tongue base, and epiglottis), which are not captured in a global evaluation metric like DTW distance. In this work, we automatically detect such errors and propose a correction scheme for the same. |
A. Roy; V. Belagali; P. K. Ghosh; |
1652 | Repeat After Me: Self-Supervised Learning of Acoustic-to-Articulatory Mapping By Vocal Imitation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory commands from the acoustic speech input. |
M. -A. Georges; J. Diard; L. Girin; J. -L. Schwartz; T. Hueber; |
1653 | Multi-Speaker Pitch Tracking Via Embodied Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the motor theory of speech perception, a novel multi-speaker pitch tracking approach is proposed in this work, based on an embodied self-supervised learning method (EMSSL-Pitch). |
X. Li; Y. Sun; X. Wu; J. Chen; |
1654 | Improving The Classification of Phonetic Segments from Raw Ultrasound Using Self-Supervised Learning and Hard Example Mining Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the data may contain many hard examples for the classification task, due to contamination of speckle noise. In this paper, we aim to address these issues: firstly, self-supervised learning is adopted to utilize the unlabeled datasets and extract the features without any human annotations; secondly, hard example mining is applied to imitate the learning path of the clinical linguists. |
Y. Xiong; K. Xu; M. Jiang; L. Cheng; Y. Dou; J. Wang; |
1655 | The Impact of Cross Language on Acoustic-to-articulatory Inversion and Its Influence on Articulatory Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate the effect of unseen language on the AAI performance in both seen and unseen speaker conditions. |
A. Illa; A. Nair; P. K. Ghosh; |
1656 | Transformer-Based Streaming ASR with Cumulative Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR). |
M. Li; S. Zhang; C. Zorila; R. Doddipatla; |
1657 | Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to use non-causal convolution to process the center block and lookahead context separately. |
Y. Shi; et al. |
1658 | Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose improvements to our recently proposed hybrid RNN-T/Attention architecture that includes a shared encoder followed by recurrent neural network-transducer (RNN-T) and triggered attention-based decoders (TAD). |
T. Moriya; et al. |
1659 | Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A streaming style inference of encoder-decoder automatic speech recognition (ASR) systems is important for reducing latency, which is essential for interactive use cases. To this end, we propose a novel blockwise synchronous decoding algorithm with a hybrid approach that combines endpoint prediction and endpoint post-determination. |
E. Tsunoo; C. Narisetty; M. Hentschel; Y. Kashiwagi; S. Watanabe; |
1660 | Alignment-Learning Based Single-Step Decoding for Accurate and Fast Non-Autoregressive Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the single-step decoding process with length prediction will suffer from the decoding stability problem and limited improvement for inference speed. To address this, in this paper, we propose an alignment learning based NAT model, named AL-NAT. |
Y. Wang; R. Liu; F. Bao; H. Zhang; G. Gao; |
1661 | Usted: Improving ASR with A Unified Speech and Text Encoder-Decoder Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose training ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. |
B. Yusuf; A. Gandhe; A. Sokolov; |
1662 | Improving Emotional Speech Synthesis By Using SUS-Constrained VAE and Text Encoder Aggregation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an innovative constraint to help VAE extract emotion embedding with better cluster cohesion. |
F. Yang; J. Luan; Y. Wang; |
1663 | Distribution Augmentation for Low-Resource Expressive Text-To-Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel data augmentation technique for text-to-speech (TTS) that generates new (text, audio) training examples without requiring any additional data. |
M. Lajszczak; et al. |
1664 | Interactive Multi-Level Prosody Control for Expressive Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We thus present ConEx, a novel model for expressive speech synthesis, which can produce speech in a certain speaking style, while also allowing local adjustments to the prosody of the generated speech. |
T. Cornille; F. Wang; J. Bekker; |
1665 | Improve Few-Shot Voice Cloning Using Multi-Modal Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to use multi-modal learning to improve the few-shot voice cloning performance. |
H. Zhang; Y. Lin; |
1666 | Cloning One's Voice Using Very Limited Data in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and timbre are modeled separately using two modules, therefore, the independent control of timbre and the other characteristics of audio can be achieved while generating speech. |
D. Dai; et al. |
1667 | UNET-TTS: Improving Unseen Speaker and Style Transfer in One-Shot Voice Cloning Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we present a novel one-shot voice cloning algorithm called Unet-TTS that has good generalization ability for unseen speakers and styles. |
R. Li; D. Pu; M. Huang; B. Huang; |
1668 | Improving Biomedical Named Entity Recognition with A Unified Multi-Task MRC Framework Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To encode prior knowledge into the model, we present a unified multi-task Machine Reading Comprehension (MRC) framework for BioNER. |
Y. Tong; F. Zhuang; D. Wang; H. Ying; B. Wang; |
1669 | A Multi-Task Learning Framework for Chinese Medical Procedure Entity Normalization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on Chinese medical procedure entity normalization. |
X. Sui; K. Song; B. Zhou; Y. Zhang; X. Yuan; |
1670 | Wasserstein Cross-Lingual Alignment For Named Entity Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study cross-lingual alignment for NER, an approach for transferring knowledge from high-to low-resource languages, via the alignment of token embeddings between different languages. |
R. Wang; R. Henao; |
1671 | Learning Common Dependency Structure for Unsupervised Cross-Domain Ner Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such information is more accessible and can be shared between sentences from different domains. We propose a novel framework with dependency-aware GNN (DGNN) to learn these common structures from source domain and adapt them to target domain, alleviating the data scarcity issue and bridging the domain gap. |
L. Liu; X. Lin; P. Zhang; L. Zhang; B. Wang; |
1672 | AISHELL-NER: Named Entity Recognition from Chinese Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we introduce a new dataset AISHELL-NER for NER from Chinese speech. |
B. Chen; G. Xu; X. Wang; P. Xie; M. Zhang; F. Huang; |
1673 | Call-Sign Recognition and Understanding for Noisy Air-Traffic Transcripts Using Surveillance Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A low signal-to-noise ratio (SNR) in the speech leads to high word error rate (WER) transcripts. We propose a new call-sign recognition and understanding (CRU) system that addresses this issue. |
A. Blatt; M. Kocour; K. Veselý; I. Szöke; D. Klakow; |
1674 | Incorporating End-to-End Framework Into Target-Speaker Voice Activity Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an end-to-end target-speaker voice activity detection (E2E-TS-VAD) method for speaker diarization. |
W. Wang; M. Li; |
1675 | Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The objective of this work is effective speaker diarisation using multi-scale speaker embeddings. |
Y. Kwon; H. -S. Heo; J. -W. Jung; Y. J. Kim; B. -J. Lee; J. S. Chung; |
1676 | Towards End-to-end Speaker Diarization with Generalized Neural Speaker Clustering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we present a novel speaker diarization system, with a generalized neural speaker clustering module as the backbone. |
C. Zhang; J. Shi; C. Weng; M. Yu; D. Yu; |
1677 | Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new residual auxiliary EEND (RX-EEND) learning architecture for transformers to enforce the lower encoder blocks to learn more accurately. |
Y. Yu; D. Park; H. Kook Kim; |
1678 | Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the approaches proposed so far have not realized tight integration yet, because the clustering employed therein was not optimal in any sense for clustering the speaker embeddings estimated by the EEND module. To address this problem, this paper introduces a trainable clustering algorithm into the integration framework, by deep-unfolding a non-parametric Bayesian model called the infinite Gaussian mixture model (iGMM). |
K. Kinoshita; M. Delcroix; T. Iwata; |
1679 | Improving Separation-Based Speaker Diarization Via Iterative Model Refinement And Speaker Embedding Based Post-Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an iterative separation-based speaker diarization (ISSD) approach to cope with the realistic data conditions. |
S. -T. Niu; J. Du; L. Sun; C. -H. Lee; |
1680 | Knowledge Distillation from Language Model to Acoustic Model: A Hierarchical Multi-Task Learning Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The remarkable performance of the pre-trained language model (LM) using self-supervised learning has led to a major paradigm shift in the study of natural language processing. |
M. -H. Lee; J. -H. Chang; |
1681 | Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach to combine their ideas for end-to-end speech recognition model. |
S. Ling; C. Shen; M. Cai; Z. Ma; |
1682 | Multi-Turn RNN-T for Streaming Recognition of Multi-Party Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Through an in-depth analysis, we discuss potential pitfalls of the proposed system as well as promising future research directions. |
I. Sklyar; A. Piunova; X. Zheng; Y. Liu; |
1683 | On Language Model Integration for RNN Transducer Based Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework. |
W. Zhou; Z. Zheng; R. Schlüter; H. Ney; |
1684 | Caching Networks: Capitalizing on Common Speech for ASR Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. |
A. Alexandridis; et al. |
1685 | GPU-Accelerated Forward-Backward Algorithm with Application to Lattice-Free MMI Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to express the forward-backward algorithm in terms of operations between sparse matrices in a specific semiring. |
L. Ondel; L. -M. Lam-Yee-Mui; M. Kocour; C. F. Corro; L. Burget; |
1686 | ItôWave: Itô Stochastic Differential Equation Is All You Need for Wave Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a vocoder based on a pair of forward and reverse-time linear stochastic differential equations (SDE). |
S. Wu; Z. Shi; |
1687 | Multi-Sample Subband Wavernn Via Multivariate Gaussian Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a high-speed neural vocoder for CPU implementation. |
H. Kanagawa; Y. Ijima; |
1688 | Infergrad: Improving Diffusion Models for Vocoder By Considering Inference in Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates inference process into training, to reduce the inference iterations while maintaining high generation quality. |
Z. Chen; et al. |
1689 | Neural Speech Synthesis on A Shoestring: Improving The Efficiency of Lpcnet Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In previous work, we introduced LPCNet, which uses linear prediction to significantly reduce the complexity of neural synthesis. |
J. -M. Valin; U. Isik; P. Smaragdis; A. Krishnaswamy; |
1690 | Generalization Ability of MOS Prediction Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, using a variety of networks for MOS prediction including MOSNet and self-supervised speech models such as wav2vec2, we investigate their performance on data from different listening tests in both zero-shot and fine-tuned settings. |
E. Cooper; W. -C. Huang; T. Toda; J. Yamagishi; |
1691 | On The Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: All of our experiments are conducted on publicly available models, and findings in this work are backed by large-scale subjective tests and objective measures. |
C. -I. J. Lai; et al. |
1692 | Phonology Recognition in American Sign Language Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by recent developments in natural language processing, we propose a novel approach to sign language processing based on phonological properties validated by American Sign Language users. |
F. Tavella; A. Galata; A. Cangelosi; |
1693 | Spatio-Temporal Graph Convolutional Networks for Continuous Sign Language Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We address the challenging problem of continuous sign language recognition (CSLR) from RGB videos, proposing a novel deep-learning framework that employs spatio-temporal graph convolutional networks (ST-GCNs), which operate on multiple, appropriately fused feature streams, capturing the signer's pose, shape, appearance, and motion information. |
M. Parelli; K. Papadimitriou; G. Potamianos; G. Pavlakos; P. Maragos; |
1694 | Sensors to Sign Language: A Natural Approach to Equitable Communication Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While existing STSL systems demonstrate effectiveness with small vocabularies of fewer than 100 words, we aim to determine if STSL can scale to larger, more realistic lexicons. For this purpose, we introduce a new dataset, SignBank, which consists of exactly 6,000 signs, spans 558 distinct words from 15 different novice signers, and constitutes the largest such dataset. |
T. Fouts; A. Hindy; C. Tanner; |
1695 | Accurate and Resource-Efficient Lipreading with Efficientnetv2 and Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In particular, we make the following contributions: First, inspired by the recent success of the EfficientNet architecture in image classification and our earlier work on resource-efficient lipreading models (MobiLipNet), we introduce EfficientNets to the lipreading task. |
A. Koumparoulis; G. Potamianos; |
1696 | Training Strategies for Improved Lip-Reading Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundaries indicators. |
P. Ma; Y. Wang; S. Petridis; J. Shen; M. Pantic; |
1697 | Multistream Neural Architectures for Cued Speech Recognition Using A Pre-Trained Visual Feature Extractor and Constrained CTC Decoding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a simple and effective approach for automatic recognition of Cued Speech (CS), a visual communication tool that helps people with hearing impairment to understand spoken language with the help of hand gestures that can uniquely identify the uttered phonemes in complement to lip-reading. |
S. Sankar; D. Beautemps; T. Hueber; |
1698 | Modeling of Pre-Trained Neural Network Embeddings Learned From Raw Waveform for COVID-19 Infection Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In that direction, this paper is addressing the second DiCOVA challenge that deals with COVID-19 detection based on speech, cough and breathing. |
Z. Mostaani; R. Prasad; B. Vlasenko; M. Magimai-Doss; |
1699 | Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we proposed a method for recording device classification using the recorded speech signal. |
A. R. Naini; B. Singhal; P. K. Ghosh; |
1700 | Entrainment Analysis for Assessment of Autistic Speech Prosody Using Bottleneck Features of Deep Neural Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In the present study, we quantify entrainment characteristics of conversation with the aim of automatic assessment of the severity of autism spectrum disorder (ASD). |
K. Ochi; N. Ono; K. Owada; M. Kuroda; S. Sagayama; H. Yamasue; |
1701 | Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new Customer Satisfaction Estimation (CSE) method that utilizes unsupervised representation learning. |
A. Ando; et al. |
1702 | Automatic Assessment of The Degree of Clinical Depression from Speech Using X-Vectors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we employ x-vectors, a DNN-based feature extractor technique, to detect depression from a Hungarian corpus. |
J. V. Egas-López; G. Kiss; D. Sztahó; G. Gosztolya; |
1703 | Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of the previous work uses hand-crafted features or deep neural network-based feature extractors to obtain deep features and then feed them into a classifier or a regression, which ignores the temporal relation of these features. To address this issue, this paper proposes a global information embedding (GIE) to make use of the long-term global information of depression and re-weight the LSTM output sequence. |
Y. Li; M. Niu; Z. Zhao; J. Tao; |
1704 | Knowledge Transfer from Large-Scale Pretrained Language Models to End-To-End Speech Recognizers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Since end-to-end models are also known to be severely data hungry, this constraint is crucial especially because obtaining transcribed utterances is costly and can possibly be impractical or impossible. This paper proposes a method for alleviating this issue by transferring knowledge from a language model neural network that can be pretrained with text-only data. |
Y. Kubo; S. Karita; M. Bacchiani; |
1705 | Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the conditional independence assumption, CTC-based models are always weaker than attention-based encoder-decoder models and require the assistance of external language models (LMs). To solve this issue, we propose two knowledge transferring methods that leverage pre-trained LMs, such as BERT and GPT2, to improve CTC-based models. |
K. Deng; et al. |
1706 | Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fulfill the two demands, in this paper, we propose a NAR CTC/attention model utilizing both pre-trained acoustic and language models: wav2vec2.0 and BERT. |
K. Deng; Z. Yang; S. Watanabe; Y. Higuchi; G. Cheng; P. Zhang; |
1707 | Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-Trained Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simple knowledge distillation (KD) loss function for neural transducers that focuses on the one-best path in the output probability lattice under both streaming and non-streaming setups, which allows a small student model to approach the performance of the large pre-trained teacher model. |
X. Yang; Q. Li; P. C. Woodland; |
1708 | Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this work, we focus on mitigating confusion problems with fine-grained contextual knowledge selection (FineCoS). |
M. Han; et al. |
1709 | Contextual Adapters for Personalized Speech Recognition in Neural Transducers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose training neural contextual adapters for personalization in neural transducer based ASR models. |
K. M. Sathyendra; et al. |
1710 | Emotionflow: Capture The Dialogue Level Emotion Transitions Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, the spread impact of emotions in a conversation is rarely addressed in existing researches. To this end, we propose EmotionFlow for ERC with the consideration of the spread of participants’ emotions during a conversation. |
X. Song; L. Zang; R. Zhang; S. Hu; L. Huang; |
1711 | Multimodal Sentiment Analysis on Unaligned Sequences Via Holographic Embedding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a method based on holographic reduced representation, a compressed version of the outer product model, to facilitate higher-order fusion across multiple modalities. |
Y. Ma; B. Ma; |
1712 | Distribution Learning for Age Estimation from Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we hypothesize that the age follows a normal distribution centered around the real age with a particular confidence interval. |
A. Saraf; E. Khoury; |
1713 | Dispeech: A Synthetic Toy Dataset for Speech Disentangling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, we introduce diSpeech, a corpus of speech synthesized with the Klatt synthesizer. |
O. Zhang; N. Gengembre; O. L. Blouch; D. Lolive; |
1714 | End-to-End ASR-Enhanced Neural Network for Alzheimer's Disease Diagnosis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents an approach to Alzheimer's disease (AD) diagnosis from spontaneous speech using an end-to-end ASR-enhanced neural network. |
J. Gui; Y. Li; K. Chen; J. Siebert; Q. Chen; |
1715 | A Novel Sequential Monte Carlo Framework for Predicting Ambiguous Emotion States Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Sequential Monte Carlo framework that models the perceived emotion as time-varying distributions that allows for ambiguity to be incorporated. |
J. Wu; T. Dang; V. Sethu; E. Ambikairajah; |
1716 | Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a phone-informed post-processing network that refines Mel spectrograms without using the vocoder. |
S. Ueno; T. Kawahara; |
1717 | LPC Augment: An LPC-based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel linear prediction coding-based data augmentation method for children's low and zero resource dialect ASR. |
A. Johnson; R. Fan; R. Morris; A. Alwan; |
1718 | Towards Better Meta-Initialization with Task Augmentation for Kindergarten-Aged Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we validate the effectiveness of MI in children's ASR and attempt to alleviate the problem of learner overfitting. |
Y. Zhu; R. Fan; A. Alwan; |
1719 | Unsupervised Data Selection for Speech Recognition with Contrastive Loss Ratios Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an unsupervised data selection method by using a submodular function based on contrastive loss ratios of target and training data sets. |
C. Park; R. Ahmad; T. Hain; |
1720 | Importantaug: A Data Augmentation Agent for Speech Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions. |
V. A. Trinh; H. Salami Kavaki; M. I. Mandel; |
1721 | Injecting Text and Cross-Lingual Supervision in Few-Shot Learning from Self-Supervised Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We demonstrate how universal phoneset acoustic models can leverage cross-lingual supervision to improve transfer of pretrained self-supervised representations to new languages. |
M. Wiesner; D. Raj; S. Khudanpur; |
1722 | When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. |
C. -H. H. Yang; J. Qi; S. Y. -C. Chen; Y. Tsao; P. -Y. Chen; |
1723 | Matching Point Sets with Quantum Circuit Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a parameterised quantum circuit learning approach to point set matching problem. |
M. Noormandipour; H. Wang; |
1724 | The Dawn of Quantum Natural Language Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we discuss the initial attempts at boosting understanding human language based on deep-learning models with quantum computing. |
R. Di Sipio; J. -H. Huang; S. Y. -C. Chen; S. Mangini; M. Worring; |
1725 | Quantum Federated Learning with Quantum Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes the first fully quantum federated learning framework that can operate over purely quantum data. |
M. Chehimi; W. Saad; |
1726 | Quantum Long Short-Term Memory Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a hybrid quantum-classical model of LSTM, which we dub QLSTM. |
S. Y. -C. Chen; S. Yoo; Y. -L. L. Fang; |
1727 | Classical-To-Quantum Transfer Learning for Spoken Command Recognition Based on Quantum Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work investigates an extension of transfer learning applied in machine learning algorithms to the emerging hybrid end-to-end quantum neural network (QNN) for spoken command recognition (SCR). |
J. Qi; J. Tejedor; |
1728 | Waveform Optimization for Wireless Power Transfer with Power Amplifier and Energy Harvester Non-linearities Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a channel-adaptive waveform design strategy that optimizes the transmitter's input waveform considering both HPA and EH non-linearities. |
Y. Zhang; B. Clerckx; |
1729 | Economics of Semantic Communication System in Wireless Powered Internet of Things Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a semantic-based energy valuation and take an economic approach to solve the energy allocation problem as an incentive mechanism design. |
Z. Q. Liew; Y. Cheng; W. Y. B. Lim; D. Niyato; C. Miao; S. Sun; |
1730 | Optimal Resource Allocation and Beamforming for Two-User MISO WPCNs for A Non-Linear Circuit-Based EH Model (Invited Paper) Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast to existing works, we adopt a non-linear model of the harvested power based on a precise analysis of the employed EH circuit. |
N. Shanin; M. Garkisch; A. Hagelauer; R. Schober; L. Cottatellucci; |
1731 | Performance Optimization for Wireless Semantic Communications Over Energy Harvesting Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, the optimization of semantic communications over energy harvesting networks is studied. |
M. Chen; Y. Wang; H. V. Poor; |
1732 | Deep Learning Based Passive Beamforming for IRS-Assisted Monostatic Backscatter Systems Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a deep learning based framework that learns the desired IRS phase shifts without knowing the channels, to assist the communication of a passive backscatter tag. |
S. Idrees; X. Jia; S. Khan; S. Durrani; X. Zhou; |
1733 | On Federated Learning with Energy Harvesting Clients Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Catering to the proliferation of Internet of Things devices and distributed machine learning at the edge, we propose an energy harvesting federated learning (EHFL) framework in this paper. |
C. Shen; J. Yang; J. Xu; |
1734 | Structural Prior Models for 3-D Deep Vessel Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the observation that 3-D vessel structures project onto 2-D image slices with distinctive edges that can aid 3-D vessel segmentation, we propose a novel multi-task learning architecture comprising a shared encoder and two decoders that respectively predict vessel segmentation maps and edge profiles. |
X. Li; R. Bala; V. Monga; |
1735 | Expectation Consistent Plug-and-Play for MRI Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we design a PnP method using the expectation consistent (EC) approximation algorithm, a generalization of AMP, that offers predictable error statistics at each iteration, from which a deep-net denoiser can be effectively trained. |
S. K. Shastri; R. Ahmad; C. A. Metzler; P. Schniter; |
1736 | Inverse Imaging with Generative Priors Via Langevin Dynamics Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce the use of stochastic gradient Langevin dynamics (SGLD) for compressed sensing with a generative prior. |
T. V. Nguyen; G. Jagatap; C. Hegde; |
1737 | CNN-Aided Factor Graphs with Estimated Mutual Information Features for Seizure Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose convolutional neural network (CNN) aided factor graphs, assisted by mutual information features estimated by a neural network, for seizure detection. |
B. Salafian; E. F. Ben-Knaan; N. Shlezinger; S. d. Ribaupierre; N. Farsad; |
1738 | Unfolding Model-Based Beamforming for High Quality Ultrasound Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As a result, the iterative nature of solving this problem is computationally expensive, and the choice of regularization bounds the fidelity of the obtained solution. Therefore, in this work, we pose ADMIRE as a sparse coding problem and unfold the iterations of the iterative shrinkage and thresholding algorithm (ISTA), training it end to end to yield learned ISTA (LISTA). |
C. Khan; R. J. G. van Sloun; B. Byram; |
1739 | Optimization Guarantees for ISTA and ADMM Based Unfolded Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study optimization guarantees for two popular unfolded networks, i.e., unfolded networks derived from iterative soft thresholding algorithms (ISTA) and derived from Alternating Direction Method of Multipliers (ADMM). |
W. Pu; Y. C. Eldar; M. R. D. Rodrigues; |
1740 | Integration of Anomaly Machine Sound Detection Into Active Noise Control to Shape The Residual Sound Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: When the ANC system is deployed to reduce the noise level in a factory environment, workers may feel strange because they are used to detecting the anomaly machine sound by their auditory perception. In order to solve this problem, this paper proposes to integrate anomaly sound detection (ASD) into the ANC system in order for the residual sound to represent the same machine status as the original machine noise. |
C. Shi; M. Huang; H. Jiang; H. Li; |
1741 | Dual Active Noise Control with Common Sensors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a dual active noise control (ANC) that can reduce unwanted sound simultaneously in two spaces separated by the partition while using common sensors. |
R. Okajima; Y. Kajikawa; K. Oto; |
1742 | A Hybrid Approach to Combine Wireless and Earcup Microphones for ANC Headphones with Error Separation Module Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hence, we proposed a novel hybrid approach that employs wireless microphones to acquire high signal-to-noise-ratio reference signals from far-end noise sources, increasing coherence and thus improving noise reduction performance. |
X. Shen; D. Shi; W. -S. Gan; |
1743 | Spatial Active Noise Control with The Remote Microphone Technique: An Approach with A Moving Higher Order Microphone Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, the remote microphone technique has been introduced to spatial ANC systems without using error microphones inside the region of interest. |
H. Sun; J. Zhang; T. Abhayapala; P. Samarasinghe; |
1744 | Robust Pressure Matching with ATF Perturbation Constraints for Sound Field Control Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Sound field control systems deployed in room acoustic environments require knowing the acoustic channel impulse responses between the loudspeakers and matching microphones, which are challenging to estimate accurately due to perturbations caused by such factors as temperature changes and sensors' position mismatches. To deal with this issue, a robust pressure matching algorithm is developed in this work where a perturbation term of the acoustic transfer function (ATF) is modeled as a Gaussian process, based on which an uncertainty constraint is applied to limit the impact of perturbation on pressure matching. |
J. Zhang; L. Shi; M. G. Christensen; W. Zhang; L. Zhang; J. Chen; |
1745 | Optimization of A Fixed Virtual Sensing Feedback ANC Controller For In-Ear Headphones with Multiple Loudspeakers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we consider an in-ear headphone equipped with an inner microphone and multiple loudspeakers and we propose an optimization procedure with a convex objective function to derive a fixed multi-loudspeaker ANC controller aiming at minimizing the sound pressure at the ear drum. |
P. R. Benois; R. Roden; M. Blau; S. Doclo; |
1746 | On The Potential of Spatially-Spread Orthogonal Time Frequency Space Modulation for ISAC Transmissions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the potentials of spatially-spread orthogonal time frequency space (SS-OTFS) modulation for integrated sensing and communication (ISAC) transmissions. |
S. Li; W. Yuan; J. Yuan; G. Caire; |
1747 | Sensing-Assisted Beam Tracking in V2I Networks: Extended Target Case Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We thus consider the extended target case, in which the beamwidth should be adjusted in real-time to cover the entire vehicle. |
Z. Du; F. Liu; Z. Zhang; |
1748 | Transmit Beamforming with Fixed Covariance for Integrated MIMO Radar and Multiuser Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the design of a multiple-input multiple-output (MIMO) transmitter which simultaneously functions as a MIMO radar and a base station for downlink multiuser communications. |
X. Liu; T. Huang; Y. Liu; Y. C. Eldar; |
1749 | Safeguarding UAV Networks Through Integrated Sensing, Jamming, and Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an integrated sensing, jamming, and communications (ISJC) framework for securing unmanned aerial vehicle (UAV)-enabled wireless networks. |
Z. Wei; F. Liu; D. W. Kwan Ng; R. Schober; |
1750 | Evaluation of Orthogonal Chirp Division Multiplexing for Automotive Integrated Sensing and Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider a bistatic vehicular integrated sensing and communications (ISAC) system that employs the recently proposed orthogonal chirp division multiplexing (OCDM) multicarrier waveform. |
S. Bhattacharjee; K. V. Mishra; R. Annavajjala; C. R. Murthy; |
1751 | Integrated Sensing and Communications Via 5G NR Waveform: Performance Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Nowadays, a possible approach to designing a commercially attractive sensing solution is integrating sensing capability into widely deployed communication systems, e.g., the forthcoming fifth-generation (5G) new radio (NR), by slightly modifying the standard. |
Y. Cui; X. Jing; J. Mu; |
1752 | Federated Learning Challenges and Opportunities: An Outlook Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper provides an outlook on FL development as part of the ICASSP 2022 special session entitled Frontiers of Federated Learning: Applications, Challenges, and Opportunities. |
J. Ding; E. Tramel; A. K. Sahu; S. Wu; S. Avestimehr; T. Zhang; |
1753 | Enabling On-Device Training of Speech Recognition Models With Federated Dropout Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We provide empirical evidence of the effectiveness of federated dropout, and propose a novel approach to vary the dropout rate applied at each layer. |
D. Guliani; et al. |
1754 | Adaptive Node Participation for Straggler-Resilient Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this chapter, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure. |
A. Reisizadeh; I. Tziotis; H. Hassani; A. Mokhtari; R. Pedarsani; |
1755 | Learnings from Federated Learning in The Real World Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: There also exists heterogeneity of data across devices. In this study, we evaluate the impact of such idiosyncrasies on Natural Language Understanding (NLU) models trained using FL. |
C. Dupuy; T. G. Roosta; L. Long; C. Chung; R. Gupta; S. Avestimehr; |
1756 | A Dynamic Reweighting Strategy For Fair Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel optimization framework called DRFL that dynamically adjusts the weight assigned to each client, and we combine it with a biased client selection strategy, both of which encourage fairness in federated training. |
Z. Zhao; G. Joshi; |
1757 | Over-the-Air Personalized Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Current over-the-air aggregation frameworks, on the other hand, train a single model for all users, which can degrade performance in heterogeneous environments where the data distributions of the users can differ from one another. This work presents a personalized over-the-air federated learning framework towards addressing this challenge. |
H. U. Sami; B. Güler; |
1758 | DNN Based Multiframe Single-Channel Noise Reduction Filters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a deep neural network (DNN) based method to estimate the interframe correlation coefficients and the estimated coefficients are subsequently fed into multiframe filters to achieve noise reduction. |
N. Pan; J. Chen; J. Benesty; |
1759 | Learning-Based Personal Speech Enhancement for Teleconferencing By Exploiting Spatial-Spectral Features Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Various features are accounted for in this study's proposed system, including speaker embeddings derived from user enrollment and a novel long-short-term spatial coherence (LSTSC) feature pertaining to the target speaker activity. |
Y. Hsu; Y. Lee; M. R. Bai; |
1760 | Manifold Learning-Supported Estimation of Relative Transfer Functions For Spatial Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While state-of-the-art RTF estimators ignore prior knowledge about the acoustic enclosure, audio signal processing algorithms for teleconferencing equipment are often operating in the same or at least a similar acoustic enclosure, e.g., a car or an office, such that training data can be collected. In this contribution, we use such data to train Variational Autoencoders (VAEs) in an unsupervised manner and apply the trained VAEs to enhance imprecise RTF estimates. |
A. Brendel; J. Zeitler; W. Kellermann; |
1761 | Audio Signal Processing for Telepresence Based on Wearable Array in Noisy and Dynamic Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper compares model-based processing to learning-based processing in both noisy and dynamic scenarios, and presents a novel processing using data from a real wearable array, studied by simulation and a listening test. |
H. Beit-On; et al. |
1762 | A Multi-Task Learning Method for Weakly Supervised Sound Event Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully utilize prior knowledge of the time-frequency masks of each sound event, we propose a novel multi-task learning (MTL) method that takes SED as the main task and source separation as the auxiliary task. |
S. Liu; F. Yang; F. Kang; J. Yang; |
1763 | Low Resources Online Single-Microphone Speech Enhancement with Harmonic Emphasis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a deep neural network (DNN)-based single-microphone speech enhancement algorithm characterized by a short latency and low computational resources. |
N. Raviv; O. Schwartz; S. Gannot; |
1764 | Deep Learning for Location Based Beamforming with NLOS Channels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a method whose objective is to determine an appropriate precoder from the knowledge of the user's location only is proposed. |
L. Le Magoarou; T. Yassine; S. Paquelet; M. Crussière; |
1765 | Predicting Flat-Fading Channels Via Meta-Learned Closed-Form Linear Filters and Equilibrium Propagation Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: This paper proposes to leverage meta-learning in order to mitigate the requirements in terms of training data for channel fading prediction. |
S. Park; O. Simeone; |
1766 | Deep-Learning-Assisted Configuration of Reconfigurable Intelligent Surfaces in Dynamic Rich-Scattering Environments Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: These conditions present a tremendous challenge in identifying an RIS configuration that optimizes the achievable communication rate. In this paper, we make a first step toward tackling this challenge. |
K. Stylianopoulos; N. Shlezinger; P. Del Hougne; G. C. Alexandropoulos; |
1767 | Supervised Learning Based Sparse Channel Estimation For RIS Aided Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A reconfigurable intelligent surface (RIS) can be used to establish line-of-sight (LoS) communication when the direct path is compromised, which is a common occurrence in a millimeter wave (mmWave) network. In this paper, we focus on the uplink channel estimation of such a network. |
D. Dampahalage; K. B. Shashika Manosha; N. Rajatheva; M. Latva-Aho; |
1768 | Goal-Oriented Communication for Edge Learning Based On The Information Bottleneck Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a goal-oriented communication system, based on the combination of IB and stochastic optimization. |
F. Pezone; S. Barbarossa; P. Di Lorenzo; |
1769 | Hypergraphs with Edge-Dependent Vertex Weights: Spectral Clustering Based on The 1-Laplacian Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a flexible framework for defining the 1-Laplacian of a hypergraph that incorporates edge-dependent vertex weights. |
Y. Zhu; B. Li; S. Segarra; |
1770 | Causal Linear Topological Filters Over A 2-Simplex Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Topological filters via sheaves generalize the classical linear translation-invariant filter theory by attaching the filter computation locally to a simplicial topological space. |
G. Essl; |
1771 | Simplicial Convolutional Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simplicial convolutional neural network (SCNN) architecture to learn from data defined on simplices, e.g., nodes, edges, triangles, etc. |
M. Yang; E. Isufi; G. Leus; |
1772 | Signal Processing On Cell Complexes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we give an introduction to signal processing on (abstract) regular cell complexes, which provide a unifying framework encompassing graphs, simplicial complexes, cubical complexes and various meshes as special cases. |
T. M. Roddenberry; M. T. Schaub; M. Hajij; |
1773 | Robust Signal Processing Over Simplicial Complexes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The goal of this paper is to investigate the impact of perturbations of topological descriptors, such as graphs and simplicial complexes, on the robustness of filters acting on signals observed over such domains. |
S. Sardellitti; S. Barbarossa; |
1774 | Conformer-Based Self-Supervised Learning For Non-Speech Audio Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks. |
S. Srivastava; et al. |
1775 | Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We evaluate the method with two cross-modal tasks: audio-caption retrieval, and phrase-based sound event detection (SED). |
H. Xie; O. Räsänen; K. Drossos; T. Virtanen; |
1776 | Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Furthermore, directional interference events make it difficult to accurately extract spatial characteristics from target sound events. To address this problem, we propose an impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIR). |
Y. Koyama; et al. |
1777 | Polyphonic Audio Event Detection: Multi-Label or Multi-Class Multi-Task Classification Problem? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, to better handle polyphonic mixtures, we propose to frame the task as a multi-class classification problem by considering each possible label combination as one class. |
H. Phan; T. N. T. Nguyen; P. Koch; A. Mertins; |
1778 | Diverse Audio Captioning Via Adversarial Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As different people may describe an audio clip from different aspects using distinct words and grammars, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips. To address this problem, we propose an adversarial training framework for audio captioning based on a conditional generative adversarial network (C-GAN), which aims at improving the naturalness and diversity of generated captions. |
X. Mei; X. Liu; J. Sun; M. D. Plumbley; W. Wang; |
1779 | Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Owing to the perceptual uniqueness of each soundscape and the inherent subjectiveness of human perception, we propose a probabilistic perceptual attribute predictor (PPAP) that predicts parameters of random distributions as outputs instead of a single deterministic value. |
K. Ooi; K. N. Watcharasupat; B. Lam; Z. -T. Ong; W. -S. Gan; |
1780 | User Scheduling Using Graph Neural Networks for Reconfigurable Intelligent Surface Assisted Multiuser Downlink Communications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper addresses the joint user scheduling, RIS configuration, and BS beamforming problem in an RIS-assisted downlink network with limited pilot overhead. |
Z. Zhang; T. Jiang; W. Yu; |
1781 | Symbol-Level Online Channel Tracking for Deep Receivers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study how one can obtain data for retraining deep receivers without sending pilots or relying on specific protocol redundancies, by combining self-supervision with active learning concepts. |
R. A. Finish; Y. Cohen; T. Raviv; N. Shlezinger; |
1782 | Delay-Oriented Distributed Scheduling Using Graph Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: These myopic policies perform poorly in delay-oriented scheduling, in which the dependency between the current backlogs of the network and the schedule of the previous time slot needs to be considered. To address this issue, we propose a delay-oriented distributed scheduler based on graph convolutional networks (GCNs). |
Z. Zhao; G. Verma; A. Swami; S. Segarra; |
1783 | FlowDT: A Flow-Aware Digital Twin for Computer Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present FlowDT, a new DL-based solution designed to model computer networks at the fine-grained flow level. |
M. Ferriol-Galmés; X. Cheng; X. Shi; S. Xiao; P. Barlet-Ros; A. Cabellos-Aparicio; |
1784 | Stable and Transferable Wireless Resource Allocation Policies Via Manifold Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we prove the stability of MNN resource allocation policies under the absolute perturbations to the Laplace-Beltrami operator of the manifold, representing system noise and dynamics present in wireless systems. |
Z. Wang; L. Ruiz; M. Eisen; A. Ribeiro; |
1785 | Motif-Topology and Reward-Learning Improved Spiking Neural Network for Efficient Multi-Sensory Integration Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a Motif-topology and Reward-learning improved SNN (MR-SNN) for efficient multi-sensory integration. |
S. Jia; R. Zuo; T. Zhang; H. Liu; B. Xu; |
1786 | Event-Based Multimodal Spiking Neural Network with Attention Mechanism Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an end-to-end event-based multimodal spiking neural network. |
Q. Liu; D. Xing; L. Feng; H. Tang; G. Pan; |
1787 | Gradual Surrogate Gradient Learning in Deep Spiking Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, these methods are contingent on a well-designed initialization to effectively transmit the gradient information. To address these issues, we propose the Internal Spiking Neuron Model (ISNM), which uses the synaptic current instead of spike trains as the carrier of information. |
Y. Chen; S. Zhang; S. Ren; H. Qu; |
1788 | Axonal Delay As A Short-Term Memory for Feed Forward Deep Spiking Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we verify the effectiveness of integrating time delay into supervised learning and propose a module that modulates the axonal delay through short-term memory. |
P. Sun; L. Zhu; D. Botteldooren; |
1789 | Low Precision Local Learning for Hardware-Friendly Neuromorphic Visual Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we explore quantization-aware-training (QAT) for SNNs as well as fully quantized transfer-learning using the DECOLLE learning algorithm as the basis system, whose local-loss-based learning is bio-plausible, avoids complex back-propagation-through-time, and is potentially hardware-friendly. |
J. Acharya; L. R. Iyer; W. Jiang; |
1790 | A Hybrid Learning Framework for Deep Spiking Neural Networks with One-Spike Temporal Coding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Hereby, we present a hybrid learning framework for deep SNNs with one-spike temporal coding to make full utilization of the spike timing. |
J. Wang; J. Wu; M. Zhang; Q. Liu; H. Li; |
1791 | A-PixelHop: A Green, Robust and Explainable Fake-Image Detector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel method for detecting CNN-generated images, called Attentive PixelHop (or A-PixelHop), is proposed in this work. |
Y. Zhu; X. Wang; H. -S. Chen; R. Salloum; C. -C. J. Kuo; |
1792 | Explainable Fact-Checking Through Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Misleading or false information has been creating chaos in some places around the world. To mitigate this issue, many researchers have proposed automated fact-checking methods to fight the spread of fake news. |
J. Yang; D. Vega-Oliveros; T. Seibt; A. Rocha; |
1793 | Deep Video Inpainting Localization Using Spatial and Temporal Traces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The proposed method is evaluated with tampered videos created by two state-of-the-art deep video inpainting algorithms. |
S. Wei; H. Li; J. Huang; |
1794 | Deepfake Speech Detection Through Emotion Recognition: A Semantic Approach Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new audio spoofing detection system leveraging emotional features. |
E. Conti; et al. |
1795 | Text-Image De-Contextualization Detection Using Vision-Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the recent advances in vision-language models with powerful relationship learning between images and texts, we leverage the vision-language models to the media de-contextualization detection task. |
M. Huang; S. Jia; M. -C. Chang; S. Lyu; |
1796 | Custom Attribution Loss for Improving Generalization and Interpretability of Deepfake Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we approach deepfake detection by solving the related problem of attribution, where the goal is to distinguish each separate type of a deepfake attack. |
P. Korshunov; A. Jain; S. Marcel; |
1797 | Sparse Multi-Reference Alignment: Sample Complexity and Computational Hardness Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: Motivated by the problem of determining the atomic structure of macromolecules using single-particle cryo-electron microscopy (cryo-EM), we study the sample and computational complexities of the sparse multi-reference alignment (MRA) model: the problem of estimating a sparse signal from its noisy, circularly shifted copies. |
T. Bendory; O. Michelin; A. Singer; |
1798 | Grassmannian Dimensionality Reduction Using Triplet Margin Loss for UME Classification of 3D Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We elaborate on the design problem of the RTUME operator in the case where the point cloud sampled from the object is sparse, noisy, and non-uniformly sampled. By introducing metric learning and negative-mining techniques into the framework of Grassmannian dimensionality reduction for universal manifold embedding, we improve classification performance for these challenging sampling conditions. |
Y. Haitman; J. M. Francos; L. L. Scharf; |
1799 | A Note on Totally Symmetric Equi-Isoclinic Tight Fusion Frames Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel construction of EITFFs that are totally symmetric: any permutation of the subspaces can be realized by an orthogonal transformation of R^d. |
M. Fickus; J. W. Iverson; J. Jasper; D. G. Mixon; |
1800 | A Simple Formula for The Moments of Unitarily Invariant Matrix Distributions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a new formula for computing arbitrary moments of unitarily invariant matrix distributions. |
S. D. Howard; A. Pezeshki; |
1801 | Fusion of Modulation Spectral and Spectral Features with Symptom Metadata for Improved Speech-Based COVID-19 Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing speech-based coronavirus disease 2019 (COVID-19) detection systems provide poor interpretability and limited robustness to unseen data conditions. In this paper, we propose a system to overcome these limitations. |
Y. Zhu; T. H. Falk; |
1802 | An Overview of The FIRST ICASSP Special Session on Computer Audition for Healthcare Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Audio has been increasingly used as a novel digital phenotype that carries important information of the subject's health status. We can find tremendous efforts given to this young and promising field, i.e., computer audition for healthcare (CA4H), whereas the application scenarios have not been fully studied as compared to its counterpart in medical areas, computer vision. |
K. Qian; T. Schultz; B. W. Schuller; |
1803 | A Glance-and-Gaze Network for Respiratory Sound Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this contribution, we propose a novel glance-and-gaze network to address the aforementioned issue. |
S. Yu; Y. Ding; K. Qian; B. Hu; W. Li; B. W. Schuller; |
1804 | Internet Streaming Audio Based Speech Reception Threshold Measurement in Cochlear Implant Users Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents our work on evaluating the reliability of the remote assessment system. |
X. Chen; et al. |
1805 | A Domain Transfer Based Data Augmentation Method for Automated Respiratory Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unfortunately, even the current world's largest publicly available respiratory sound dataset, ICBHI, has only 6898 respiratory cycles with a total length of only 5.5 hours, which becomes a bottleneck for further improvement of DNN models. Therefore, we propose a data augmentation method for respiratory sounds classification, where the input transformation and migration are implemented. |
Z. Wang; Z. Wang; |
1806 | Physical Layer Anonymous Communications: An Anonymity Entropy Oriented Precoding Design (Invited Paper) Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the PHY anonymity design with focus on a typical uplink scenario where the receiver is equipped with more antennas than the sender. |
Z. Wei; C. Masouros; S. Sun; |
1807 | Federated Stochastic Gradient Descent Begets Self-Induced Momentum Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. |
H. H. Yang; Z. Liu; Y. Fu; T. Q. S. Quek; H. Vincent Poor; |
1808 | Adversarial Learning in Transformer Based Neural Network in Radio Signal Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, motivated by attractive classification performance of the transformer based neural networks, we analyse the vulnerability and robustness of the transformer against adversarial attacks in modulation classification scenarios. |
L. Zhang; S. Lambotharan; G. Zheng; |
1809 | OptM3Sec: Optimizing Multicast IRS-Aided Multiantenna DFRC Secrecy Channel With Multiple Eavesdroppers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike prior works, we consider a multicast multi-antenna DFRC system with multiple EDs. |
K. V. Mishra; A. Chattopadhyay; S. S. Acharjee; A. P. Petropulu; |
1810 | Privacy-Enhancing Appliance Filtering For Smart Meters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a privacy control strategy that selectively filters appliances' consumption from the smart meter measurements to hinder NILM disaggregation performance. |
R. R. Avula; T. J. Oechtering; |
1811 | Adversarial Linear Quadratic Regulator Under Falsified Actions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, a falsification attack on the agent actions in a scalar linear quadratic regulator (LQR) system is studied. |
C. Sun; Z. Li; C. Wang; |
1812 | Communication-Efficient Distributed MAX-VAR Generalized CCA Via Error Feedback-Assisted Quantization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In particular, distributed optimization for GCCA, which is well-motivated in applications like internet of things and parallel computing, may incur prohibitively high communication costs. To address this challenge, this work proposes a communication-efficient distributed GCCA algorithm under the popular MAX-VAR GCCA paradigm. |
S. Shrestha; X. Fu; |
1813 | Provable Second-Order Riemannian Gauss-Newton Method for Low-Rank Tensor Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. |
Y. Luo; Q. Ma; C. Zhang; A. R. Zhang; |
1814 | Bounded Simplex-Structured Matrix Factorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new low-rank matrix factorization model, dubbed bounded simplex-structured matrix factorization (BSSMF). |
O. V. Thanh; N. Gillis; F. Lecron; |
1815 | CPD Computation Via Recursive Eigenspace Decompositions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: The Canonical Polyadic Decomposition (CPD) is a fundamental tensor decomposition which has widespread use in signal processing due to its ability to extract component information. … |
E. Evert; M. Vandecappelle; L. De Lathauwer; |
1816 | Accelerating Ill-Conditioned Robust Low-Rank Tensor Regression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Leveraging the low-rank structure under the Tucker decomposition, we propose a provably efficient algorithm that directly estimates the tensor factors by solving a nonsmooth and nonconvex composite optimization problem that minimizes the least absolute deviation loss. |
T. Tong; C. Ma; Y. Chi; |
1817 | Ada-JSR: Sample Efficient Adaptive Joint Support Recovery From Extremely Compressed Measurement Vectors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper considers the problem of recovering the joint support (of size K) of a set of unknown sparse vectors in R^d, each of which can be sensed using a different measurement matrix. |
S. Shahsavari; P. Sarangi; M. C. Hücümenoglu; P. Pal; |
1818 | End-to-End Network Based on Transformer for Automatic Detection of COVID-19 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: There is no evidence that these features are optimal for COVID-19 detection. Therefore, we proposed an end-to-end network based on transformer for automatic detection of COVID-19. |
C. Cai; B. Liu; J. Tao; Z. Tian; J. Lu; K. Wang; |
1819 | Prototype Learning for Interpretable Respiratory Sound Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: However, in the high-stake medical domain where decisions can have significant consequences, it is desirable to develop interpretable models; thus, providing understandable reasons for physicians and patients. To address the issue, we propose a prototype learning framework, that jointly generates exemplar samples for explanation and integrates these samples into a layer of DNNs. |
Z. Ren; T. T. Nguyen; W. Nejdl; |
1820 | Convolutional Transformer With Adaptive Position Embedding For COVID-19 Detection From Cough Sounds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Considering that the cough of an infected person contains a large amount of information, we propose an algorithm for the automatic recognition of COVID-19 from cough signals. |
T. Yan; H. Meng; S. Liu; E. Parada-Cabaleiro; Z. Ren; B. W. Schuller; |
1821 | Detection of COPD Exacerbation from Speech: Comparison of Acoustic Features and Deep Learning Based Speech Breathing Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Medical professionals observe that the speech of COPD patients during stable periods differs from the speech during exacerbation. In this paper, we investigate this detection of COPD exacerbation from speech in three approaches: acoustic features identification using a statistical approach, low-level descriptive features with classification, and speech breathing models based on deep learning architectures to estimate the patients' breathing rate. |
V. S. Nallanthighal; A. Härmä; H. Strik; |
1822 | Automatic Respiratory Sound Classification Via Multi-Branch Temporal Convolutional Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to explore the effectiveness of a multi-branch Temporal Convolutional Network (TCN) architecture integrated with Squeeze-and-Excitation Network (SEnet), a system denoted herein as MBTCNSE, for respiratory sound classification. |
Z. Zhao; et al. |
1823 | ICASSP 2022 Acoustic Echo Cancellation Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We open source two large datasets to train AEC models under both single talk and double talk scenarios. |
R. Cutler; et al. |
1824 | A Deep Hierarchical Fusion Network for Fullband Acoustic Echo Cancellation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work proposes a deep hierarchical fusion (DHF) network with intra-network and inter-network fusion to further improve the wideband AEC performance. |
H. Zhao; et al. |
1825 | Explore Relative and Context Information with Transformer for Joint Acoustic Echo Cancellation and Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a joint acoustic echo cancellation (AEC) and speech enhancement method with adaptive filter and deep neural network (DNN) model. |
X. Sun; C. Cao; Q. Li; L. Wang; F. Xiang; |
1826 | Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a system consisting of deep learning and signal processing to simultaneously suppress echoes, noise, and reverberation. |
G. Zhang; L. Yu; C. Wang; J. Wei; |
1827 | Multi-Task Deep Residual Echo Suppression with Echo-Aware Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces the NWPU Team's entry to the ICASSP 2022 AEC Challenge. |
S. Zhang; et al. |
1828 | Multi-Scale Refinement Network Based Acoustic Echo Cancellation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an encoder-decoder network for acoustic echo cancellation with multi-scale refinement paths to exploit the information at different feature scales. |
F. Cui; L. Guo; W. Li; P. Gao; Y. Wang; |
1829 | Audio-Visual Object Classification for Human-Robot Collaboration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Acoustic and visual signals can be used to estimate the physical properties of such objects, which may vary substantially in shape, material and size, and also be occluded by the hands of the person. To facilitate comparisons and stimulate progress in solving this problem, we present the CORSMAL challenge and a dataset to assess the performance of the algorithms through a set of well-defined performance scores. |
A. Xompero; Y. L. Pang; T. Patten; A. Prabhakar; B. Calli; A. Cavallaro; |
1830 | Shared Transformer Encoder with Mask-Based 3D Model Estimation for Container Mass Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Transformer encoder that shares the same architecture and parameters for filling level and type estimation. |
T. Matsubara; et al. |
1831 | Improving Generalization of Deep Networks for Estimating Physical Properties of Containers and Fillings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present methods to estimate the physical properties of household containers and their fillings manipulated by humans. |
H. Wang; C. Zhu; Z. Ma; C. Oh; |
1832 | Container Localisation and Mass Estimation with An RGB-D Camera Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: The mass of an object constitutes key information for the robot to correctly regulate the force required to grasp the container. We propose a single RGB-D camera-based method to locate a manipulated container and estimate its empty mass, i.e., independently of the presence of the content. |
T. Apicella; G. Slavic; E. Ragusa; P. Gastaldo; L. Marcenaro; |
1833 | Summary on The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We briefly describe the released dataset, track setups, baselines and summarize the challenge results and major techniques used in the submissions. |
F. Yu; et al. |
1834 | The CUHK-Tencent Speaker Diarization System for The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks. |
N. Zheng; et al. |
1835 | The USTC-Ximalaya System for The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge. |
M. He; et al. |
1836 | Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for The M2MeT Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As the highly overlapped speech exists in the dataset, we employ an x-vector-based target-speaker voice activity detection (TS-VAD) to find the overlap between speakers. |
W. Wang; X. Qin; M. Li; |
1837 | The Volcspeech System for The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For Track 1, we propose several approaches to enable the clustering-based speaker diarization system to handle overlapped speech. |
C. Shen; et al. |
1838 | The RoyalFlush System of Speech Recognition for M2MeT Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. |
S. Ye; P. Wang; S. Chen; X. Hu; X. Xu; |
1839 | L3DAS22 Challenge: Learning 3D Audio Sources in A Real Office Environment Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In the end, we present and discuss the results submitted by all participants. |
E. Guizzo; et al. |
1840 | ICASSP 2022 L3DAS22 Challenge: Ensemble of ResNet-Conformers with Ambisonics Data Augmentation for Sound Event Localization and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: It remains a tough challenge to tackle the sound event localization and detection (SELD) problem, especially when sound scene complexity increases and overlapping acoustic sources appear. To improve the SELD performance, we propose an ensemble system, which consists of a ResNet and Conformer backbone network (SELD-RCnet) and its two variants, SED-RCnet and SSL-RCnet. |
Y. Mao; Y. Zeng; H. Liu; W. Zhu; Y. Zhou; |
1841 | A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, a track-wise ensemble event independent network with a novel data augmentation method is proposed. |
J. Hu; et al. |
1842 | Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to The L3DAS22 Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones. |
Y. -J. Lu; et al. |
1843 | Multi-Scale Temporal Frequency Convolutional Network with Axial Attention for Multi-Channel Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, a novel backbone for speech dense-prediction is proposed. |
G. Zhang; C. Wang; L. Yu; J. Wei; |
1844 | The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO Convolutional Recurrent Network for Multi Channel Speech Enhancement and Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We proposed a two-stage framework to address multi-channel speech denoising and dereverberation. |
J. Li; Y. Zhu; D. Luo; Y. Liu; G. Cui; Z. Li; |
1845 | ADD 2022: The First Audio Deep Synthesis Detection Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we describe the datasets, evaluation metrics, and protocols. |
J. Yi; et al. |
1846 | Time Domain Adversarial Voice Conversion for ADD 2022 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022). |
C. Wen; et al. |
1847 | Audio Deepfake Detection System with Neural Stitching for ADD 2022 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge. |
R. Yan; C. Wen; S. Zhou; T. Guo; W. Zou; X. Li; |
1848 | Fake Audio Detection Based On Unsupervised Pretraining Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work presents our systems for the ADD 2022 challenge. |
Z. Lv; S. Zhang; K. Tang; P. Hu; |
1849 | Partially Fake Audio Detection By Self-Attention-Based Fake Span Discovery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Thus, we propose a novel framework by introducing the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audios. |
H. Wu; et al. |
1850 | The Vicomtech Audio Deepfake Detection System Based on Wav2vec2 for The 2022 ADD Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes our submitted systems to the 2022 ADD challenge within tracks 1 and 2. |
J. M. Martín-Doñas; A. Álvarez; |
1851 | Audio-Visual Wake Word Spotting System for MISP Challenge 2021 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents the details of our system designed for the Task 1 of Multimodal Information Based Speech Processing (MISP) Challenge 2021. |
Y. Xu; et al. |
1852 | Channel-Wise AV-Fusion Attention for Multi-Channel Audio-Visual Speech Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present our work for automatic speech recognition (ASR) in the Multimodal Information Based Speech Processing (MISP) Challenge 2021. |
G. Xu; et al. |
1853 | The DKU Audio-Visual Wake Word Spotting System for The 2021 MISP Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a two-stage approach consisting of end-to-end neural networks for the audio-visual wake word spotting task. |
M. Cheng; H. Wang; Y. Wang; M. Li; |
1854 | The SJTU System For Multimodal Information Based Speech Processing Challenge 2021 Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes the SJTU system for ICASSP Multi-modal Information based Speech Processing Challenge (MISP) 2021. |
W. Wang; et al. |
1855 | The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we discuss the rationale of the Multimodal Information based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. |
H. Chen; et al. |
1856 | ICASSP 2022 Deep Noise Suppression Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: We provide access to DNSMOS P.835 and word accuracy (WAcc) APIs to challenge participants to help with iterative model improvements. In this challenge, we introduced the following changes: (i) Included mobile device scenarios in the blind test set; (ii) Included a personalized noise suppression track with baseline; (iii) Added WAcc as an objective metric; (iv) Included DNSMOS P.835; (v) Made the training datasets and test sets fullband (48 kHz). |
H. Dubey; et al. |
1857 | FB-MSTCN: A Full-Band Single-Channel Speech Enhancement Method Based on Multi-Scale Temporal Convolutional Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Because of the low energy of spectral information in the high-frequency part, it is more difficult to directly model and enhance the full-band spectrum using neural networks. To solve this problem, this paper proposes a two-stage real-time speech enhancement model with extraction-interpolation mechanism for a full-band signal. |
Z. Zhang; L. Zhang; X. Zhuang; Y. Qian; H. Li; M. Wang; |
1858 | FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Related Code Details Highlight: In this paper, we propose a convolutional recurrent encoder-decoder (CRED) structure to boost feature representation along the frequency axis. |
S. Zhao; B. Ma; K. N. Watcharasupat; W. -S. Gan; |
1859 | Harmonic Gated Compensation Network Plus for ICASSP 2022 DNS Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The harmonic structure of speech is resistant to noise, but the harmonics may still be partially masked by noise. Therefore, we previously proposed a harmonic gated compensation network (HGCN) to predict the full harmonic locations based on the unmasked harmonics and process the result of a coarse enhancement module to recover the masked harmonics. |
T. Wang; W. Zhu; Y. Gao; Y. Chen; J. Feng; S. Zhang; |
1860 | TEA-PSE: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2022 DNS Challenge Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper describes Tencent Ethereal Audio Lab - Northwestern Polytechnical University personalized speech enhancement (TEA-PSE) system submitted to track 2 of the ICASSP 2022 Deep Noise Suppression (DNS) challenge. |
Y. Ju; et al. |
1861 | Multi-Stage and Multi-Loss Training for Fullband Non-Personalized and Personalized Speech Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Abstract: Deep learning-based wideband (16kHz) speech enhancement approaches have surpassed traditional methods. This work further extends the existing wideband systems to enable full-band … |
L. Chen; et al. |
1862 | ICASSP-SPGC 2022: Root Cause Analysis for Wireless Network Fault Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a novel real-world dataset for wireless communication network fault diagnosis. |
T. Zhang; et al. |
1863 | Accurate Inference of Unseen Combinations of Multiple Root Causes with Classifier Ensemble Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: It, however, is challenging, due to diverse feature types, diverse lengths of time slices, simultaneous occurrences of multiple root causes, and lack of training samples. In this paper, we present our solutions for these problems in ICASSP-SPGC-2022 AIOps Challenge in Communication Networks. |
X. Zhang; L. Xiong; N. Sun; M. Wang; H. Tang; Y. Zhao; |
1864 | Causal Alignment Based Fault Root Causes Localization for Wireless Network Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by the stability of the causal mechanism across the domains, a Causal Alignment based Root Cause Localization (CARCL) framework, including the causal alignment and the multi-stage classifier, is proposed. |
Y. Liu; et al. |
1865 | NetRCA: An Effective Network Fault Cause Localization Algorithm Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, due to the complicated network architectures and wireless environments, as well as limited labeled data, accurately localizing the true root cause is challenging. In this paper, we propose a novel algorithm named NetRCA to deal with this problem. |
C. Zhang; et al. |