Paper Digest: ICASSP 2020 Highlights
The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is one of the top signal processing conferences in the world. In 2020, it is to be held virtually from May 4 to May 8.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICASSP 2020 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Addressing The Confounds Of Accompaniments In Singer Identification | T. Hsieh, K. Cheng, Z. Fan, Y. Yang and Y. Yang | In this paper, we attempt to address this issue. Specifically, we employ open-unmix, an open source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two means to train a singer identification model: by learning from the separated vocal only, or from an augmented set of data where we “shuffle-and-remix” the separated vocal tracks and instrumental tracks of different songs to artificially make the singers sing in different contexts. |
2 | Disentangled Multidimensional Metric Learning for Music Similarity | J. Lee, N. J. Bryan, J. Salamon, Z. Jin and J. Nam | To do so, we adapt a variant of deep metric learning called conditional similarity networks to the audio domain and extend it using track-based information to control the specificity of our model. |
3 | Learning the Helix Topology of Musical Pitch | V. Lostanlen, S. Sridhar, B. McFee, A. Farnsworth and J. P. Bello | This article addresses the problem of discovering this helical structure from unlabeled audio data. |
4 | Audio-Based Auto-Tagging With Contextual Tags for Music | K. M. Ibrahim, J. Royo-Letelier, E. V. Epure, G. Peeters and G. Richard | In this work, we study the relationship between user context and audio content in order to enable context-aware music recommendation agnostic to user data. Using this, we create and release a dataset of ~50k tracks labelled with 15 different contexts. |
5 | Accurate and Scalable Version Identification Using Musically-Motivated Embeddings | F. Yesiler, J. Serrà and E. Gómez | In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. |
6 | Similarity Learning For Cover Song Identification Using Cross-Similarity Matrices of Multi-Level Deep Sequences | C. Jiang, D. Yang and X. Chen | In this paper, we propose a new Siamese network architecture for music melody similarity metric learning. |
7 | Two-Step Sound Source Separation: Training On Learned Latent Targets | E. Tzinis, S. Venkataramani, Z. Wang, C. Subakan and P. Smaragdis | In this paper, we propose a two-step training procedure for source separation via a deep neural network. |
8 | A Multi-Phase Gammatone Filterbank for Speech Separation Via TasNet | D. Ditter and T. Gerkmann | In this work, we investigate if the learned encoder of the end-to-end convolutional time domain audio separation network (Conv-TasNet) is the key to its recent success, or if the encoder can just as well be replaced by a deterministic hand-crafted filterbank. |
9 | Improving Voice Separation by Incorporating End-To-End Speech Recognition | N. Takahashi, M. K. Singh, S. Basak, P. Sudarsanam, S. Ganapathy and Y. Mitsufuji | In this work, we propose to explicitly incorporate the phonetic and linguistic nature of speech by taking a transfer learning approach using an end-to-end automatic speech recognition (E2EASR) system. |
10 | Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation | Y. Luo, Z. Chen and T. Yoshioka | In this paper, we propose dual-path recurrent neural network (DPRNN), a simple yet effective method for organizing RNN layers in a deep structure to model extremely long sequences. |
11 | Controlling the Perceived Sound Quality for Dialogue Enhancement With Deep Learning | C. Uhle, M. Torcoli and J. Paulus | We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. |
12 | Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function | M. Togami, Y. Masuyama, T. Komatsu and Y. Nakagome | In this paper, we propose a multi-channel speech source separation method with a deep neural network (DNN) which is trained under the condition that no clean signal is available. |
13 | A Framework for the Robust Evaluation of Sound Event Detection | Ç. Bilen, G. Ferroni, F. Tuveri, J. Azcarreta and S. Krstulovic | This work defines a new framework for performance evaluation of polyphonic sound event detection (SED) systems, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates. |
14 | Weakly-Supervised Sound Event Detection with Self-Attention | K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda and K. Takeda | In this paper, we propose a novel sound event detection (SED) method that incorporates a self-attention mechanism of the Transformer for a weakly-supervised learning scenario. |
15 | A Sequence Matching Network for Polyphonic Sound Event Localization and Detection | T. N. Tho Nguyen, D. L. Jones and W. Gan | We propose a two-step approach that decouples the learning of the sound event detection and direction-of-arrival estimation systems. |
16 | Few-Shot Acoustic Event Detection Via Meta Learning | B. Shi, M. Sun, K. C. Puvvada, C. Kao, S. Matsoukas and C. Wang | We study few-shot acoustic event detection (AED) in this paper. |
17 | Few-Shot Sound Event Detection | Y. Wang, J. Salamon, N. J. Bryan and J. P. Bello | In this work, we (1) adapt state-of-the-art metric-based few-shot learning methods to automate the detection of similar-sounding events, requiring only one or few examples of the target event, (2) develop a method to automatically construct a partial set of labeled examples (negative samples) to reduce user labeling effort, and (3) develop an inference-time data augmentation method to increase detection accuracy. |
18 | Sound Event Detection in Synthetic Domestic Environments | R. Serizel, N. Turpault, A. Shah and J. Salamon | We present a comparative analysis of the performance of state-of-the-art sound event detection systems. |
19 | Learning to Separate Sounds from Weakly Labeled Scenes | F. Pishdadian, G. Wichern and J. Le Roux | We propose objective functions and network architectures that enable training a source separation system with weak labels. |
20 | Improving Universal Sound Separation Using Sound Classification | E. Tzinis, S. Wisdom, J. R. Hershey, A. Jansen and D. P. W. Ellis | In this paper, we utilize the semantic information learned by sound classifier networks trained on a vast amount of diverse sounds to improve universal sound separation. |
21 | Source Separation with Weakly Labelled Data: an Approach to Computational Auditory Scene Analysis | Q. Kong, Y. Wang, X. Song, Y. Cao, W. Wang and M. D. Plumbley | In this work, we propose a source separation framework trained with weakly labelled data. |
22 | Boosted Locality Sensitive Hashing: Discriminative Binary Codes for Source Separation | S. Kim, H. Yang and M. Kim | In this study, we propose an adaptive boosting approach to learning locality sensitive hash codes, which represent audio spectra efficiently. |
23 | A Frequency-Domain BSS Method Based on l1 Norm, Unitary Constraint, and Cayley Transform | S. Emura, H. Sawada, S. Araki and N. Harada | We propose a frequency-domain blind source separation method that uses (a) the l1 norm of orthonormal vectors of estimated source signals as a sparsity measure and (b) Cayley transform for optimizing the objective function under the unitary constraint in the Riemannian geometry approach. |
24 | End-To-End Non-Negative Autoencoders for Sound Source Separation | S. Venkataramani, E. Tzinis and P. Smaragdis | In this paper, we generalize NMF to develop end-to-end non-negative auto-encoders and demonstrate how they can be used for source separation. |
25 | Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision | A. Jansen et al. | With this motivation, we present a learning framework for sound representation and recognition that combines (i) a self-supervised objective based on a general notion of unimodal and cross-modal coincidence, (ii) a clustering objective that reflects our need to impose categorical structure on our experiences, and (iii) a cluster-based active learning procedure that solicits targeted weak supervision to consolidate categories into relevant semantic classes. |
26 | Acoustic Scene Classification for Mismatched Recording Devices Using Heated-Up Softmax and Spectrum Correction | T. Nguyen, F. Pernkopf and M. Kosmider | In this paper, we introduce two calibration methods to tackle these challenges. |
27 | Limitations of Weak Labels for Embedding and Tagging | N. Turpault, R. Serizel and E. Vincent | In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges. |
28 | MT-GCN For Multi-Label Audio Tagging With Noisy Labels | H. Shrivastava, Y. Yin, R. R. Shah and R. Zimmermann | We propose two ontology-based graph construction methods, and conduct extensive experiments on the FSDKaggle2019 dataset. |
29 | Acoustic Scene Classification Using Deep Residual Networks with Late Fusion of Separated High and Low Frequency Paths | M. D. McDonnell and W. Gao | We investigate the problem of acoustic scene classification, using a deep residual network applied to log-mel spectrograms complemented by log-mel deltas and delta-deltas. |
30 | End-To-End Auditory Object Recognition Via Inception Nucleus | M. Ebrahimpour, T. Shea, A. Danielescu, D. Noelle and C. Kello | In this paper, we propose a novel end-to-end deep neural network to map the raw waveform inputs to sound class labels. |
31 | Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition | M. Kentgens, A. Behler and P. Jax | This paper presents a novel 3DoF+ system that allows to navigate, i.e., change position, in scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. |
32 | Blaster: An Off-Grid Method for Blind and Regularized Acoustic Echoes Retrieval | D. D. Carlo, C. Elvira, A. Deleforge, N. Bertin and R. Gribonval | This work proposes a novel approach to blindly retrieve the off-grid timing of early acoustic echoes from a stereophonic recording of an unknown sound source such as speech. |
33 | Evaluation of Sensor Self-Noise In Binaural Rendering of Spherical Microphone Array Signals | H. Helmholz, J. Ahrens, D. L. Alon, S. V. Amengual Garí and R. Mehra | We use the Real-Time Spherical Array Renderer (ReTiSAR) to analyze and auralize the propagation of sensor self-noise through the processing pipeline. |
34 | Mutual-Information-Based Sensor Placement for Spatial Sound Field Recording | K. Ariga, T. Nishida, S. Koyama, N. Ueno and H. Saruwatari | A sensor (microphone) placement method based on mutual information for spatial sound field recording is proposed. |
35 | Fast Acoustic Scattering Using Convolutional Neural Networks | Z. Fan, V. Vineet, H. Gamper and N. Raghuvanshi | We propose training a convolutional neural network to map from a convex scatterer's cross-section to a 2D slice of the resulting spatial loudness distribution. |
36 | Frequency-Dependent Directional Feedback Delay Network | B. Alary and A. Politis | In this paper, we present a modified formulation of the Directional Feedback Delay Network method that allows both frequency- and direction-dependent reverberation. |
37 | Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention | Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama and D. Takeuchi | This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. |
38 | PEVD-Based Speech Enhancement in Reverberant Environments | V. W. Neo, C. Evers and P. A. Naylor | In this work, we focus on reverberant environments. It is shown that, by exploiting the lack of correlation between speech and the late reflections, further noise reduction can be achieved. |
39 | DNN-Based Speech Presence Probability Estimation for Multi-Frame Single-Microphone Speech Enhancement | M. Tammen, D. Fischer, B. T. Meyer and S. Doclo | In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate the SPP for each TF bin. |
40 | Nonlinear Spatial Filtering for Multichannel Speech Enhancement in Inhomogeneous Noise Fields | K. Tesch and T. Gerkmann | In this paper, we show that a joint spatial-spectral nonlinear filter is not only advantageous for noise distributions that are significantly more heavy-tailed than a Gaussian but also for distributions that model inhomogeneous noise fields while having rather low kurtosis. |
41 | Generalized Coherence-Based Signal Enhancement | H. W. Löllmann, A. Brendel and W. Kellermann | This contribution presents a novel approach for coherence-based signal enhancement. |
42 | Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement | S. Maiti and M. I. Mandel | Here we propose to use the high quality speech generation capability of neural vocoders for better quality speech enhancement. |
43 | Robust and Steerable Kronecker Product Differential Beamforming With Rectangular Microphone Arrays | G. Huang, J. Benesty, J. Chen and I. Cohen | In this paper, we consider rectangular shapes of planar microphone arrays. |
44 | Jointly Optimal Dereverberation and Beamforming | C. Boeddeker, T. Nakatani, K. Kinoshita and R. Haeb-Umbach | To this end, this paper presents a new derivation of the convolutional beamformer that allows us to factorize it into a WPE dereverberation filter, and a special type of a (non-convolutional) beamformer, referred to as a weighted MPDR (wMPDR) beamformer, without loss of optimality. |
45 | Exploiting Rays in Blind Localization of Distributed Sensor Arrays | S. Wozniak and K. Kowalczyk | In this paper, we focus on estimators for inferring the relative geometry of distributed arrays and sources, i.e. the setup geometry up to a scaling factor. |
46 | A Novel Method for Obtaining Diffuse Field Measurements for Microphone Calibration | N. Akbar, G. Dickins, M. R. P. Thomas, P. Samarasinghe and T. Abhayapala | We propose a straightforward and cost-effective method to perform diffuse soundfield measurements for calibrating the magnitude response of a microphone array. |
47 | Multi-Channel Speech Source Separation and Dereverberation With Sequential Integration of Determined and Underdetermined Models | M. Togami | In this paper, we propose a joint multi-channel speech source separation and dereverberation method in which multiple speech sources and late reverberation are separated in an unsupervised manner. |
48 | Fast and Stable Blind Source Separation with Rank-1 Updates | R. Scheibler and N. Ono | We propose a new algorithm for the blind source separation of acoustic sources. |
49 | Modeling Plate and Spring Reverberation Using A DSP-Informed Deep Neural Network | M. A. Martínez Ramírez, E. Benetos and J. D. Reiss | Based on digital reverberators that use sparse FIR filters, we propose a signal processing-informed deep learning architecture for the modeling of artificial reverberators. |
50 | Deep Autotuner: A Pitch Correcting Network for Singing Performances | S. Wager, G. Tzanetakis, C. Wang and M. Kim | We introduce a data-driven approach to automatic pitch correction of solo singing performances. |
51 | Perceptual Loss Function for Neural Modeling of Audio Systems | A. Wright and V. Välimäki | This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. |
52 | One-Shot Parametric Audio Production Style Transfer with Application to Frequency Equalization | S. I. Mimilakis, N. J. Bryan and P. Smaragdis | In this paper, we present a method that facilitates this process by inferring appropriate audio effect parameters in order to make an input recording sound similar to an unrelated reference recording. |
53 | Speech-To-Singing Conversion in an Encoder-Decoder Framework | J. Parekh, P. Rao and Y. Yang | In this paper our goal is to convert a set of spoken lines into sung ones. |
54 | TensorFlow Audio Models in Essentia | P. Alonso-Jiménez, D. Bogdanov, J. Pons and X. Serra | In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. |
55 | Anomalous Sound Detection Based on Interpolation Deep Neural Network | K. Suefusa, T. Nishida, H. Purohit, R. Tanabe, T. Endo and Y. Kawaguchi | To solve the issue, we propose an approach to anomalous detection in which the model utilizes multiple frames of a spectrogram whose center frame is removed as an input, and it predicts an interpolation of the removed frame as an output. |
56 | A-CRNN: A Domain Adaptation Model for Sound Event Detection | W. Wei, H. Zhu, E. Benetos and Y. Wang | This paper presents a domain adaptation model for sound event detection. We have collected and annotated a dataset in Singapore with two types of recording devices to complement existing datasets in the research community, especially with respect to domain adaptation. |
57 | SPIDERnet: Attention Network For One-Shot Anomaly Detection In Sounds | Y. Koizumi, M. Yasuda, S. Murata, S. Saito, H. Uematsu and N. Harada | We propose a similarity function for one-shot anomaly detection in sounds (ADS) called SPecific anomaly IDentifiER network (SPIDERnet). |
58 | Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks | Y. Li, M. Liu, K. Drossos and T. Virtanen | In this paper, we propose to use a dilated CRNN, namely a CRNN with a dilated convolutional kernel, as the classifier for the task of SED. |
59 | A Deep Neural Network-Driven Feature Learning Method for Polyphonic Acoustic Event Detection from Real-Life Recordings | M. Mulimani, A. B. Kademani and S. G. Koolagudi | In this paper, a Deep Neural Network (DNN)-driven feature learning method for polyphonic Acoustic Event Detection (AED) is proposed. |
60 | Weakly Labelled Audio Tagging Via Convolutional Networks with Spatial and Channel-Wise Attention | S. Hong, Y. Zou, W. Wang and M. Cao | In this paper, we propose a novel attention mechanism, namely, spatial and channel-wise attention (SCA). |
61 | A Study on the Transferability of Adversarial Attacks in Sound Event Classification | V. Subramanian, A. Pankajakshan, E. Benetos, N. Xu, S. McDonald and M. Sandler | Our work focuses on studying the transferability of adversarial attacks in sound event classification. |
62 | Propeller Noise Detection with Deep Learning | M. Thomas, F. Lionel and D. Laurent | In this paper, we propose a new model of underwater propeller noise as well as its optimal statistical detector for detecting the presence of propeller in acoustic signal. |
63 | Duration Robust Weakly Supervised Sound Event Detection | H. Dinkel and K. Yu | This work aims to investigate the performance impact of fixed-sized window median filter post-processing and advocate the use of double thresholding as a more robust and predictable post-processing method. |
64 | A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification | C. Kao, M. Sun, W. Wang and C. Wang | This paper focuses on investigating the dynamics of LSTM model on AEC tasks. |
65 | An Ontology-Aware Framework for Audio Event Classification | Y. Sun and S. Ghaffarzadegan | To capture such dependencies between the labels, we propose an ontology-aware neural network containing two components: feed-forward ontology layers and graph convolutional networks (GCN). |
66 | Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection | J. Yan, Y. Song, L. Dai and I. McLoughlin | In this paper, we propose a task-aware mean teacher method using a convolutional recurrent neural network (CRNN) with multi-branch structure to solve the SED and AT tasks differently. |
67 | Wawenets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality | A. A. Catellier and S. D. Voran | Building on prior work we have developed a no-reference (NR) waveform-based convolutional neural network (CNN) architecture that can accurately estimate speech quality or intelligibility of narrowband and wideband speech segments. |
68 | A Neural Network for Monaural Intrusive Speech Intelligibility Prediction | M. B. Pedersen, A. Heidemann Andersen, S. H. Jensen and J. Jensen | In the present work, we propose a neural network for monaural intrusive SIP. |
69 | Source Coding of Audio Signals with a Generative Model | R. Fejgin, J. Klejsa, L. Villemoes and C. Zhou | We consider source coding of audio signals with the help of a generative model. |
70 | Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks | G. Mittag and S. Möller | In this paper, we present a full-reference speech quality prediction model with a deep learning approach. |
71 | Enhanced Method of Audio Coding Using CNN-Based Spectral Recovery with Adaptive Structure | S. Shin, S. K. Beack, W. Lim and H. Park | This study proposes an enhanced method of audio coding based on spectral recovery with an adaptive structure that yields improved sound quality compared with the previous method. |
72 | Audio Codec Enhancement with Generative Adversarial Networks | A. Biswas and D. Jia | Specifically, with these two classes of signals as examples, we demonstrate a technique for restoring audio from coding noise based on generative adversarial networks (GAN). |
73 | Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization | K. Zhen, M. S. Lee, J. Sung, S. Beack and M. Kim | We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. |
74 | A Dual-Staged Context Aggregation Method towards Efficient End-to-End Speech Enhancement | K. Zhen, M. S. Lee and M. Kim | In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. |
75 | A Recurrent Variational Autoencoder for Speech Enhancement | S. Leglaive, X. Alameda-Pineda, L. Girin and R. Horaud | This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). |
76 | Speakerfilter: Deep Learning-Based Target Speaker Extraction Using Anchor Speech | S. He, H. Li and X. Zhang | To effectively utilize anchor speech, we propose a multi-level feature extraction and seamlessly integrate the features into a speech separation model. |
77 | Tackling Real Noisy Reverberant Meetings with All-Neural Source Separation, Counting, and Diarization System | K. Kinoshita, M. Delcroix, S. Araki and T. Nakatani | In this paper, we first consider practical issues required for improving the robustness of the all-neural approach, and then experimentally show that, even in real meeting scenarios, the all-neural approach can perform effective speech enhancement, and simultaneously outperform state-of-the-art systems. |
78 | Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform | T. Nakamura and H. Saruwatari | We propose a time-domain audio source separation method using down-sampling (DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT). |
79 | Auditory Model Based Subsetting of Head-Related Transfer Function Datasets | S. Spagnol | In this article a novel HRTF subset selection algorithm based on auditory-model vertical localization predictions and a greedy heuristic is outlined, designed to identify a representative HRTF subset from a catalogue including the three biggest public datasets currently available (373 HRTFs overall). |
80 | Impulse Response Data Augmentation and Deep Neural Networks for Blind Room Acoustic Parameter Estimation | N. J. Bryan | To address this, we propose an AIR augmentation method that can parametrically control the T60 and DRR, allowing us to expand a small dataset of real AIRs into a balanced dataset orders of magnitude larger. |
81 | Individual Distance-Dependent HRTFs Modeling Through A Few Anthropometric Measurements | M. Zhang, X. Wu and T. Qu | In this paper, a method for predicting individual distance-dependent HRTFs using a few anthropometric parameters is proposed. |
82 | Evaluation of Deep-Learning-Based Voice Activity Detectors and Room Impulse Response Models in Reverberant Environments | A. Ivry, I. Cohen and B. Berdugo | We consider five different models to generate RIRs, and five different VADs that are trained with the augmented training set. |
83 | A Minimal Personalization of Dynamic Binaural Synthesis with Mixed Structural Modeling and Scattering Delay Networks | M. Geronazzo, J. Y. Tissieres and S. Serafin | This paper provides a small set of essential parameters for a personalized and effective real-time auralization with headphones. |
84 | Sound Texture Synthesis Using RI Spectrograms | H. Caracalla and A. Roebel | This article introduces a new parametric synthesis method for sound textures based on existing works in visual and sound texture synthesis. |
85 | Time Domain Velocity Vector for Retracing the Multipath Propagation | J. Daniel and S. Kitic | We propose a conceptually and computationally simple form of sound velocity that offers a readable view of the interference between direct and indirect sound waves. |
86 | Acoustic Matching By Embedding Impulse Responses | J. Su, Z. Jin and A. Finkelstein | This paper introduces a deep learning solution for two parts of the acoustic matching problem. |
87 | Joint Estimation Of Acoustic Parameters From Single-Microphone Speech Observations | D. Looney and N. D. Gaubitch | To address the inherent interplay that exists between these parameters, which can hinder existing methods designed to estimate only a single parameter, we propose a data-driven solution to jointly estimate all three parameters using a convolutional neural network. |
88 | A Fast Reduced-Rank Sound Zone Control Algorithm Using The Conjugate Gradient Method | L. Shi, T. Lee, L. Zhang, J. K. Nielsen and M. Græsbøll Christensen | In this paper, we propose a fast reduced-rank sound zone control algorithm using the conjugate gradient (CG) method. |
89 | An Empirical Study on Acoustic Feedback Path Across Hearing Aid Users | M. Guo | In this empirical study, we measured feedback paths on different users wearing a RITE style hearing aid fitted with different domes and micro molds. |
90 | Low Complexity NLMS for Multiple Loudspeaker Acoustic Echo Canceller Using Relative Loudspeaker Transfer Functions | O. Schwartz, E. A. P. Habets and S. Gannot | In this paper, we present a normalized least mean square (NLMS) algorithm for a multi-loudspeaker case using relative loudspeaker transfer functions (RLTFs). |
91 | A Multichannel Kalman-Based Wiener Filter Approach for Speaker Interference Reduction in Meetings | P. Meyer, S. Elshamy and T. Fingscheidt | We extend an existing approach by integrating methods from acoustic echo cancellation to improve the estimation of the interferer (noise) components of the filter, which leads to an improved signal-to-interferer ratio by up to 2.1 dB absolute at constant speech component quality. |
92 | Primary Path Estimator Based on Individual Secondary Path for ANC Headphones | J. Fabry and P. Jax | In this contribution, we propose an estimator for the individual primary path based on a measurement of the individual secondary path. |
93 | Efficient Multichannel Nonlinear Acoustic Echo Cancellation Based on a Cooperative Strategy | M. M. Halimeh and W. Kellermann | While a common approach to address nonlinear distortions, emitted by multiple loudspeakers and observed by multiple microphones, is to use post-filtering techniques, this paper proposes a cooperative strategy to rather model and then cancel such distortions. |
94 | Active Control of Line Spectral Noise with Simultaneous Secondary Path Modeling Without Auxiliary Noise | M. Hu and J. Lu | In this paper, we further extend the analysis and theoretically prove that even for line spectral noise, it is feasible to realize online secondary path modeling using only the output of the control source while exerting active control simultaneously. |
95 | Robust Frequency-Domain Recursive Least M-Estimate Adaptive Filter For Acoustic System Identification | H. He, J. Chen, J. Benesty and Y. Yu | To identify acoustic systems in non-Gaussian and Gaussian noises, a robust frequency-domain recursive least M-estimate (FRLM) adaptive filtering algorithm is proposed. |
96 | Nearest Kronecker Product Decomposition Based Normalized Least Mean Square Algorithm | S. S. Bhattacharjee and N. V. George | In this paper, we derive the Least Mean Square (LMS) versions of adaptive algorithms which take advantage of NKP decomposition, namely NKP-LMS and NKP Normalized LMS (NKP-NLMS) algorithms. |
97 | Joint Beamforming and Reverberation Cancellation Using a Constrained Kalman Filter With Multichannel Linear Prediction | S. Hashemgeloogerdi and S. Braun | We propose a constrained Kalman filter based multichannel linear prediction method to jointly perform denoising and dereverberation efficiently using an online processing algorithm. |
98 | Multi-Microphone Complex Spectral Mapping for Speech Dereverberation | Z. Wang and D. Wang | This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry. |
99 | Predicting Word Error Rate for Reverberant Speech | H. Gamper, D. Emmanouilidou, S. Braun and I. J. Tashev | In this paper we propose predicting ASR performance in terms of the word error rate (WER) directly from acoustic parameters via a polynomial, sigmoidal, or neural network fit, as well as blindly from reverberant speech samples using a convolutional neural network (CNN). |
100 | Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help? | C. Gupta, E. Yilmaz and H. Li | In this work, we propose to learn music genre-specific characteristics to train polyphonic acoustic models. |
101 | Local Key Estimation In Classical Music Recordings: A Cross-Version Study on Schubert's Winterreise | H. Schreiber, C. Weiß and M. Müller | In this work, we approach local key estimation on a unique cross-version dataset comprising nine performances (versions) of Schubert's song cycle Winterreise, a challenging scenario of high musical ambiguity and subjectivity. |
102 | Improving Music Transcription by Pre-Stacking A U-Net | F. Pedersoli, G. Tzanetakis and K. M. Yi | We propose to pre-stack a U-Net as a way of improving the polyphonic music transcription performance of various baseline Convolutional Neural Networks (CNNs). |
103 | Learning to Rank Music Tracks Using Triplet Loss | L. Prétet, G. Richard and G. Peeters | In this work, we propose a method for direct recommendation based on the audio content without explicitly tagging the music tracks. |
104 | Transformer VAE: A Hierarchical Model for Structure-Aware and Interpretable Music Representation Learning | J. Jiang, G. G. Xia, D. B. Carlton, C. N. Anderson and R. H. Miyakawa | To achieve these two goals simultaneously, we designed the Transformer Variational AutoEncoder, a hierarchical model that unifies the efforts of two recent breakthroughs in deep music generation: 1) the Music Transformer and 2) Deep Music Analogy. |
105 | A Comparative Study of Western and Chinese Classical Music Based on Soundscape Models | J. Fan, Y. Yang, K. Dong and P. Pasquier | In this study, we examine whether we can analyze and compare Western and Chinese classical music based on soundscape models. |
106 | Audio-Based Detection of Explicit Content in Music | A. Vaglio, R. Hennequin, M. Moussallam, G. Richard and F. d'Alché-Buc | We present a novel automatic system for performing explicit content detection directly on the audio signal. |
107 | New Metrics for Evaluating the Accuracy of Fundamental Frequency Estimation Approaches in Musical Signals | J. Devaney | This paper evaluates the magnitude of accuracy differences between a simple mean-based frame-level accuracy measurement and four metrics that capture more perceptually-relevant aspects of the evolution of f0 traces over time (perceived pitch, vibrato rate, vibrato depth, and jitter) for two score-informed f0 estimation algorithms. |
108 | Data-Driven Harmonic Filters for Audio Representation Learning | M. Won, S. Chun, O. Nieto and X. Serra | We introduce a trainable front-end module for audio representation learning that exploits the inherent harmonic structure of audio signals. |
109 | Learning a Representation for Cover Song Identification Using Convolutional Neural Network | Z. Yu, X. Xu, X. Chen and D. Yang | In this paper, we propose a novel Convolutional Neural Network (CNN) towards cover song identification. |
110 | Towards Linking the Lakh and IMSLP Datasets | T. Tsai | We propose a method for scalable cross-modal retrieval that might be used to link the Lakh MIDI dataset with IMSLP sheet music data. |
111 | A Multi-Dilation and Multi-Resolution Fully Convolutional Network for Singing Melody Extraction | P. Gao, C. You and T. Chi | In this paper, we propose a neural network, which includes spectro-temporal multi-resolution decomposition of the log-spectrogram of the sound and a semantic segmentation model to respectively address the bottom-up and top-down processing of hearing, for singing melody extraction. |
112 | An Improved Solution to the Frequency-Invariant Beamforming with Concentric Circular Microphone Arrays | X. Zhao, G. Huang, J. Chen and J. Benesty | In this paper, we find that the spatial aliasing problem is caused by higher-order circular harmonics. |
113 | Binaural Audio Source Remixing with Microphone Array Listening Devices | R. M. Corey and A. C. Singer | This work considers a source-remixing filter that alters the relative level of each source independently. |
114 | Exploiting Periodicity Features for Joint Detection and DOA Estimation of Speech Sources Using Convolutional Neural Networks | R. Varzandeh, K. Adiloglu, S. Doclo and V. Hohmann | In this paper, a multi-input single-output convolutional neural network (CNN) is proposed which exploits a novel feature combination for joint DOA estimation and VAD in the context of binaural hearing aids. |
115 | Unsupervised Multiple Source Localization Using Relative Harmonic Coefficients | Y. Hu, P. N. Samarasinghe, T. D. Abhayapala and S. Gannot | This paper presents an unsupervised multi-source localization algorithm using a recently introduced feature called the relative harmonic coefficients. |
116 | Data-Driven Wind Speed Estimation Using Multiple Microphones | D. Mirabilii, K. K. Lakshminarayana, W. Mack and E. A. P. Habets | A deep neural network (DNN) based approach for estimating the speed of airflows using closely-spaced microphones is proposed. |
117 | A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking | C. Schymura et al. | Inspired by recent progress in the context of integrating uncertainty estimates into modern deep learning frameworks, this paper proposes a deep neural-network-based implementation of the Kalman filter with dynamic stream weights, whose parameters can be learned via standard backpropagation. |
118 | Maximum Likelihood Multi-Speaker Direction of Arrival Estimation Utilizing a Weighted Histogram | E. Hadad and S. Gannot | In this contribution, a novel maximum likelihood (ML) based direction of arrival (DOA) estimator for concurrent speakers in a noisy reverberant environment is presented. |
119 | Overdetermined Independent Vector Analysis | R. Ikeshita, T. Nakatani and S. Araki | We address the convolutive blind source separation problem for the (over-)determined case where (i) the number of nonstationary target-sources K is less than that of microphones M, and (ii) there are up to M - K stationary Gaussian noises that need not be extracted. |
120 | Spatially Guided Independent Vector Analysis | A. Brendel, T. Haubner and W. Kellermann | We present a Maximum A Posteriori (MAP) derivation of the Independent Vector Analysis (IVA) algorithm for blind source separation incorporating an additional spatial prior over the demixing matrices. |
121 | Fast Independent Vector Extraction by Iterative SINR Maximization | R. Scheibler and N. Ono | We propose fast independent vector extraction (FIVE), a new algorithm that blindly extracts a single non-Gaussian source from a Gaussian background. |
122 | Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-Based Prior Distribution of Joint-Diagonalization Process | K. Kamo et al. | In this paper, we address a convolutive blind source separation (BSS) problem and propose a new extended framework of FastMNMF by introducing prior information for joint diagonalization of the spatial covariance matrix model. |
123 | Beyond the DCASE 2017 Challenge on Rare Sound Event Detection: A Proposal for a More Realistic Training and Test Framework | J. Baumann, T. Lohrenz, A. Roy and T. Fingscheidt | This paper proposes a rare SED training and test framework that reflects an SED application in a more realistic way. |
124 | Metric Learning with Background Noise Class for Few-Shot Detection of Rare Sound Events | K. Shimada, Y. Koyama and A. Inoue | In this paper, we aim to achieve few-shot detection of rare sound events from query sequences that contain not only the target events but also other events and background noise. |
125 | Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels | K. Imoto, N. Tonami, Y. Koizumi, M. Yasuda, R. Yamanishi and Y. Yamashita | In this paper, we thus propose a new method for SED based on MTL of SED and ASC using the soft labels of acoustic scenes, which enable us to model the extent to which sound events and scenes are related. |
126 | Guided Learning for Weakly-Labeled Semi-Supervised Sound Event Detection | L. Lin, X. Wang, H. Liu and Y. Qian | We propose a simple but efficient method termed Guided Learning for weakly-labeled semi-supervised sound event detection (SED). |
127 | Staged Training Strategy and Multi-Activation for Audio Tagging with Noisy and Sparse Multi-Label Data | K. He, Y. Shen, W. Zhang and J. Liu | In this paper, we propose a staged training strategy to deal with the noisy label, and adopt a sigmoid-sparsemax multi-activation structure to deal with the sparse multi-label classification. |
128 | Learning With Out-of-Distribution Data for Audio Classification | T. Iqbal, Y. Cao, Q. Kong, M. D. Plumbley and W. Wang | In this paper, we investigate an instance of labelling error for classification tasks in which the dataset is corrupted with out-of-distribution (OOD) instances: data that does not belong to any of the target classes, but is labelled as such. |
129 | Multi-Branch Learning for Weakly-Labeled Sound Event Detection | Y. Huang, X. Wang, L. Lin, H. Liu and Y. Qian | Since there are only annotations for audio tagging available in weakly-supervised SED, we design multiple branches with different learning purposes instead of pursuing multiple tasks. |
130 | Scene-Dependent Acoustic Event Detection with Scene Conditioning and Fake-Scene-Conditioned Loss | T. Komatsu, K. Imoto and M. Togami | In this paper, we propose scene-dependent acoustic event detection (AED) with scene conditioning and fake-scene-conditioned loss. |
131 | Sound Event Localization Based on Sound Intensity Vector Refined by DNN-Based Denoising and Source Separation | M. Yasuda, Y. Koizumi, S. Saito, H. Uematsu and K. Imoto | We propose a direction-of-arrival (DOA) estimation method for Sound Event Localization and Detection (SELD). |
132 | High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification | X. Bai, J. Du, J. Pan, H. Zhou, Y. Tu and C. Lee | In this paper, we propose a novel strategy for acoustic scene classification, namely high-resolution attention network with acoustic segment model (HRAN-ASM). |
133 | Polyphonic Sound Event Detection Using Transposed Convolutional Recurrent Neural Network | C. C. Chatterjee, M. Mulimani and S. G. Koolagudi | In this paper we propose a Transposed Convolutional Recurrent Neural Network (TCRNN) architecture for polyphonic sound event recognition. |
134 | SeCoST: Sequential Co-Supervision for Large Scale Weakly Labeled Audio Event Detection | A. Kumar and V. K. Ithapu | In this work, we propose a new framework for designing learning models with weak supervision by bridging ideas from sequential learning and knowledge distillation. |
135 | Deep Speech Extraction with Time-Varying Spatial Filtering Guided By Desired Direction Attractor | Y. Nakagome, M. Togami, T. Ogawa and T. Kobayashi | In this investigation, a deep neural network (DNN) based speech extraction method is proposed to enhance a speech signal propagating from the desired direction. |
136 | Adaptive Blind Audio Source Extraction Supervised By Dominant Speaker Identification Using X-Vectors | J. Janský, J. Málek, J. Čmejla, T. Kounovský, Z. Koldovský and J. Žďánský | We propose a novel algorithm for adaptive blind audio source extraction. |
137 | Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's T Distribution | T. Kondo et al. | In this paper, we address a blind source separation (BSS) problem and propose a new extended framework of independent positive semidefinite tensor analysis (IPSDTA). |
138 | Determined Source Separation Using the Sparsity of Impulse Responses | Y. Takahashi, D. Kitahara, K. Matsuura and A. Hirabayashi | In this paper, we propose an over-determined sound source separation method considering the sparsity of impulse responses. |
139 | Improving Speaker Discrimination of Target Speech Extraction With Time-Domain SpeakerBeam | M. Delcroix et al. | In this paper, we investigate strategies for improving the speaker discrimination capability of SpeakerBeam. |
140 | WHAMR!: Noisy and Reverberant Single-Channel Speech Separation | M. Maciejewski, G. Wichern, E. McQuinn and J. Le Roux | To address this, we introduce WHAMR!, an augmented version of WHAM! with synthetic reverberated sources, and provide a thorough baseline analysis of current techniques as well as novel cascaded architectures on the newly introduced conditions. |
141 | Impact of a Shift-Invariant Harmonic Phase Model in Fully Parametric Harmonic Voice Representation and Time/Frequency Synthesis | A. Ferreira, J. Silva, F. Brito and D. Sinha | In this paper, we describe two fully parametric harmonic representation and signal reconstruction alternatives that rely on a shift-invariant harmonic phase model and that implement accurate frame-based synthesis in the frequency-domain, and accurate pitch pulse-based synthesis in the time-domain. |
142 | Hearing aid Research Data Set for Acoustic Environment Recognition | A. Hüwel, K. Adiloglu and J. Bach | In this work we propose a novel, binaural HA acoustic environment recognition data set (HEAR-DS) suitable for the environment recognition needs of HAs. |
143 | Audio Feature Extraction for Vehicle Engine Noise Classification | L. Becker, A. Nelus, J. Gauer, L. Rudolph and R. Martin | In this paper we propose a new scheme for vehicle engine noise classification as a more privacy-preserving alternative to classifying vehicles based on video recordings. |
144 | Time-Frequency Feature Decomposition Based on Sound Duration for Acoustic Scene Classification | Y. Wu and T. Lee | In this paper, we propose a feature decomposition method based on temporal median filtering, and use convolutional neural network to model long-duration background sounds and transient sounds separately. |
145 | VGGSound: A Large-Scale Audio-Visual Dataset | H. Chen, W. Xie, A. Vedaldi and A. Zisserman | Our goal is to collect a large-scale audio-visual dataset with low label noise from videos “in the wild” using computer vision techniques. |
146 | Transfer Learning from YouTube Soundtracks to Tag Arctic Ecoacoustic Recordings | E. B. Çoban, D. Pir, R. So and M. I. Mandel | This paper investigates this generalization in several ways and finds that the models themselves display limited performance; however, their intermediate representations can be used to train successful models on small sets of labeled data. |
147 | Data Augmentation Using Empirical Mode Decomposition on Neural Networks to Classify Impact Noise in Vehicle | G. Nam, S. Bu, N. Park, J. Seo, H. Jo and W. Jeong | In this paper, we performed data augmentation using Empirical Mode Decomposition (EMD) method that decomposes the original signal into a finite number of intrinsic mode functions (IMFs). |
148 | Clotho: an Audio Captioning Dataset | K. Drossos, S. Lipping and T. Virtanen | In this paper we present Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds duration and 24 905 captions of eight to 20 words length, and a baseline method to provide initial results. |
149 | Robust Fundamental Frequency Estimation in Coloured Noise | A. E. Jaramillo, A. Jakobsson, J. K. Nielsen and M. Græsbøll Christensen | To allow for this, we here propose two schemes that refine the noise statistics and parameter estimates in an iterative manner, one of them based on an approximate ML solution and the other one based on removing the periodic signal obtained from a linearly constrained minimum variance (LCMV) filter. |
150 | Efficient Bird Sound Detection on the Bela Embedded System | A. Solomes and D. Stowell | This paper proposes an automatic detection algorithm for the Bela embedded Linux device for wildlife monitoring. |
151 | Improving Automated Segmentation of Radio Shows with Audio Embeddings | O. Berlage, K. Lux and D. Graus | This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. |
152 | SECL-UMons Database for Sound Event Classification and Localization | M. Brousmiche, J. Rouat and S. Dupont | We introduce the SECL-UMons dataset for sound event classification and localization in the context of office environments. |
153 | Synthesizing Engaging Music Using Dynamic Models of Statistical Surprisal | S. Kothinti, B. Skerritt-Davis, A. Nair and M. Elhilali | In this work, we explore the statistical structure of a musical corpus and its effect on modulating the attention of listeners. |
154 | Harmonics Based Representation in Clarinet Tone Quality Evaluation | Y. Wang, X. Guan, Y. Du and N. Nan | In this paper we present a new method for identifying the clarinet reed quality by evaluating tone quality based on the harmonic structure and energy distribution. |
155 | Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments | E. Manilow, P. Seetharaman and B. Pardo | We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks. |
156 | The Role of Annotation Fusion Methods in the Study of Human-Reported Emotion Experience During Music Listening | T. Greer, K. Mundnich, M. Sachs and S. Narayanan | In this work, we investigate several ways to represent aggregate human annotations of the complex, subjective emotional experience of listening to music. |
157 | Content Based Singing Voice Extraction from a Musical Mixture | P. Chandna, M. Blaauw, J. Bonada and E. Gómez | We present a deep learning based methodology for extracting the singing voice signal from a musical mixture based on the underlying linguistic content. |
158 | Neural Percussive Synthesis Parameterised by High-Level Timbral Features | A. Ramires, P. Chandna, X. Favory, E. Gómez and X. Serra | We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds. |
159 | Non-Griffin–Lim Type Signal Recovery from Magnitude Spectrogram | R. Nakatsu, D. Kitahara and A. Hirabayashi | To make the best of such useful spectrogram transforms, we propose an algorithm which recovers the time-domain signal without the inverse spectrogram transforms. |
160 | VaPar Synth – A Variational Parametric Model for Audio Synthesis | K. Subramani, P. Rao and A. D'Hooge | We present VaPar Synth – a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation. |
161 | Bandwidth Extension of Musical Audio Signals With No Side Information Using Dilated Convolutional Neural Networks | M. Lagrange and F. Gontier | This paper studies the benefit of considering a dilated fully convolutional neural network to perform the bandwidth extension of musical audio signals with no side information on the magnitude spectra. |
162 | Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled DenseNets | M. Huber, G. Schindler, C. Schörkhuber, W. Roth, F. Pernkopf and H. Fröning | We extend the multi-scaled DenseNet in several aspects to facilitate real-time source separation scenarios. |
163 | State-Based Transcription of Components of Carnatic Music | V. S. Viraraghavan, A. Pal, H. Murthy and R. Aravind | In this paper, we propose a novel state-based representation of the pitch curve motivated by CM components called constant-pitch notes and stationary points. |
164 | Meta-Learning Extractors for Music Source Separation | D. Samuel, A. Ganeshan and J. Naradowsky | We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models. |
165 | Consistency-Aware Multi-Channel Speech Enhancement Using Deep Neural Networks | Y. Masuyama, M. Togami and T. Komatsu | This paper proposes a deep neural network (DNN)-based multichannel speech enhancement system in which a DNN is trained to maximize the quality of the enhanced time-domain signal. |
166 | Phase Reconstruction Based On Recurrent Phase Unwrapping With Deep Neural Networks | Y. Masuyama, K. Yatabe, Y. Koizumi, Y. Oikawa and N. Harada | However, the training of a DNN for phase reconstruction is not an easy task because phase is sensitive to the shift of a waveform. To overcome this problem, we propose a DNN-based two-stage phase reconstruction method. |
167 | Performance Study of a Convolutional Time-Domain Audio Separation Network for Real-Time Speech Denoising | S. Sonning, C. Schüldt, H. Erdogan and S. Wisdom | This paper investigates the performance of such a time-domain network (Conv-TasNet) for speech denoising in a real-time setting, comparing various parameter settings. |
168 | Channel-Attention Dense U-Net for Multichannel Speech Enhancement | B. Tolooshams, R. Giri, A. H. Song, U. Isik and A. Krishnaswamy | We propose Channel-Attention Dense U-Net, in which we apply the channel-attention unit recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming. |
169 | A Composite DNN Architecture for Speech Enhancement | Y. Yemini, S. E. Chazan, J. Goldberger and S. Gannot | In this work, we show that both separate cost functions are unsuitable for dealing with narrowband noise, and propose a new composite estimator in the log-spectrum domain. |
170 | Geometrically Constrained Independent Vector Analysis for Directional Speech Enhancement | L. Li and K. Koishida | We propose a dual-microphone speech enhancement system based on the proposed method and investigate its effectiveness with objective metrics. |
171 | Real-Time Speech Enhancement Using Equilibriated RNN | D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa and N. Harada | We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. |
172 | Subspace-Based Speech Correlation Vector Estimation for Single-Microphone Multi-Frame MVDR Filtering | D. Fischer and S. Doclo | In this paper, we propose a subspace-based estimator for the normalized speech correlation vector based on the Q largest eigenvalues and their corresponding eigenvectors of the prewhitened noisy speech correlation matrix. |
173 | Time-Frequency Loss for CNN Based Speech Super-Resolution | H. Wang and D. Wang | This paper proposes an autoencoder based fully convolutional neural network (CNN) that merges the information from both time and frequency domains. |
174 | Time-Domain Neural Network Approach for Speech Bandwidth Extension | X. Hao, C. Xu, N. Hou, L. Xie, E. S. Chng and H. Li | In this paper, we study the time-domain neural network approach for speech bandwidth extension. |
175 | Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement | Y. Xia, S. Braun, C. K. A. Reddy, H. Dubey, R. Cutler and I. Tashev | We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. |
176 | Snorer Diarisation Based On Deep Neural Network Embeddings | H. E. Romero, N. Ma and G. J. Brown | This paper proposes a novel acoustic analysis system for snorer diarisation, a concept extrapolated from speaker diarisation research, which allows screening for SDB of both the user and the bed partner using a single smartphone. |
177 | Playing Technique Recognition by Joint Time–Frequency Scattering | C. Wang, V. Lostanlen, E. Benetos and E. Chew | In this paper, we propose a recognition system based on the joint time–frequency scattering transform (jTFST) for pitch evolution-based playing techniques (PETs), a group of playing techniques with monotonic pitch changes over time. |
178 | Privacy Aware Acoustic Scene Synthesis Using Deep Spectral Feature Inversion | F. Gontier, M. Lagrange, C. Lavandier and J. Petiot | Part of the research is to find ways to inform citizens about their sound environment while ensuring their privacy. We study in this paper how this application can be cast into a feature inversion problem. |
179 | Robustness Assessment of Automatic Reinke's Edema Diagnosis Systems | M. Madruga, Y. Campos-Roca and C. J. Pérez | The goal of this paper is to build automatic detection systems for Reinke's edema based on a novel in-house dataset and, alternatively, on the Massachusetts Eye and Ear Infirmary Voice Disorders Database, and assess noise robustness in both cases. |
180 | Whosecough: In-the-Wild Cougher Verification Using Multitask Learning | M. Whitehill, J. Garrison and S. Patel | In this work, we overcome this problem by using multitask learning, where the second task is speaker verification. |
181 | Chirping up the Right Tree: Incorporating Biological Taxonomies into Deep Bioacoustic Classifiers | J. Cramer, V. Lostanlen, A. Farnsworth, J. Salamon and J. P. Bello | This paper introduces TaxoNet, a deep neural network for structured classification of signals from living organisms. |
182 | Beamforming Design for High-Resolution Low-Intensity Focused Ultrasound Neuromodulation | B. Fan, W. Goodman, R. Y. Cho, S. A. Sheth, R. R. Bouchard and B. Aazhang | In this study, we investigate the optimization of targeting through fine temporal and spatial power delivery control of a phased array of ultrasound elements. |
183 | An Attention Enhanced Multi-Task Model for Objective Speech Assessment in Real-World Environments | X. Dong and D. S. Williamson | In this work, we present a novel reference-less based framework called the attention enhanced multi-task speech assessment (AMSA) model, which provides reliable estimates of multiple objective quality and intelligibility measures in simulated and real-world environments. |
184 | Humbug Zooniverse: A Crowd-Sourced Acoustic Mosquito Dataset | I. Kiskin, A. D. Cobb, L. Wang and S. Roberts | We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. |
185 | Subjective Quality Estimation Using PESQ For Hands-Free Terminals | S. Kurihara, M. Fukui, S. Shimauchi and N. Harada | We propose third-party listening and conversational test procedures to assess whether PESQ can be used for predicting the subjective quality of an acoustic echo canceler. |
186 | Classification of Epileptic iEEG Signals by CNN and Data Augmentation | X. Zhao et al. | In this paper, we first introduce a one-dimensional convolutional neural network (1D-CNN) model for epileptic seizure focus detection which avoids the manual, time-consuming feature extraction. Moreover, to reduce the necessary number of training samples, we introduce an approach for data augmentation. |
187 | Fractional Fourier Transform Based QRS Complex Detection in ECG Signal | T. Yaqoob, S. Aziz, S. Ahmed, O. Amin and M. Alouini | By exploiting fractional-Fourier-transform (FrFT), a novel technique for the QRS complex detection is proposed. |
188 | Cross-Domain Joint Dictionary Learning for ECG Reconstruction from PPG | X. Tian, Q. Zhu, Y. Li and M. Wu | This paper proposes a cross-domain joint dictionary learning (XDJDL) framework to maximize the expressive power for the two cross-domain signals. |
189 | An LSTM Based Architecture to Relate Speech Stimulus to EEG | M. J. Monesi, B. Accou, J. Montoya-Martinez, T. Francart and H. V. Hamme | We present a novel Long Short-Term Memory (LSTM)-based architecture as a nonlinear model for the classification problem of whether a given pair of (EEG, speech envelope) correspond to each other or not. |
190 | Joint Semi-Supervised Feature Auto-Weighting and Classification Model for EEG-Based Cross-Subject Sleep Quality Evaluation | Y. Peng, Q. Li, W. Kong, J. Zhang, B. Lu and A. Cichocki | To suppress the cross-subject variances of EEG data, in this paper, we propose a joint feature auto-weighting and semi-supervised classification model, termed GRLSR, which is formulated by introducing an auto-weighting variable into the least square regression to adaptively and quantitatively measure the importance of each dimension of the feature. |
191 | Reversal No Longer Matters: Attention-Based Arrhythmia Detection with Lead-Reversal ECG Data | Z. Cao, J. Shi and J. Wu | In this paper, we propose an attention-based multi-scale neural network for arrhythmia detection with lead-reversal electrocardiogram data. |
192 | Augmenting Molecular Images with Vector Representations as a Featurization Technique for Drug Classification | D. d. Marchi and A. Budhiraja | This paper proposes the creation of molecular images “captioned” with binary vectors that encode information not contained in or easily understood from a molecular image alone. |
193 | Multi-Modal Self-Supervised Pre-Training for Joint Optic Disc and Cup Segmentation in Eye Fundus Images | Á. S. Hervella, L. Ramos, J. Rouco, J. Novo and M. Ortega | This paper presents a novel approach for the segmentation of the optic disc and cup in eye fundus images using deep learning. |
194 | Dense Mapping of Intracellular Diffusion and Drift from Single-Particle Tracking Data Analysis | A. Salomon, C. A. Valades-Cruz, L. Leconte and C. Kervrann | In this paper, we propose a new mapping method to robustly estimate dynamics in the entire cell from particle tracks. |
195 | A Deep Gradient Boosting Network for Optic Disc and Cup Segmentation | Q. Liu, B. Zou, Y. Zhao and Y. Liang | To build connections among prediction branches, this paper introduces gradient boosting framework to deep classification model and proposes a gradient boosting network called BoostNet. |
196 | Adaptive Elastic Loss Based on Progressive Inter-Class Association for Cervical Histology Image Segmentation | Z. Meng, Z. Zhao, F. Su and W. Wang | In this paper, an end-to-end deep segmentation network for complex cervical histology images is proposed, and a benchmark evaluation is contributed. |
197 | A Bidirectional Context Propagation Network for Urine Sediment Particle Detection in Microscopic Images | M. Yan, Q. Liu, Z. Yin, D. Wang and Y. Liang | This paper proposes a bidirectional context propagation network called BCPNet for urine sediment particle detection. |
198 | The Swax Benchmark: Attacking Biometric Systems with Wax Figures | R. H. Vareto, A. Marcia Saldanha and W. R. Schwartz | This work introduces a new database named Sense Wax Attack dataset (SWAX), comprised of real human and wax figure images and videos that endorse the problem of face spoofing detection. |
199 | Resting-State EEG-Based Biometrics with Signals Features Extracted by Multivariate Empirical Mode Decomposition | M. K. Ma, T. Lee, M. C. Fong and W. Shiyuan Wang | This study highlights the uniqueness of the proposed non-stationary and connectivity-based feature and demonstrates its success as a biometric. |
200 | Auto-FAS: Searching Lightweight Networks for Face Anti-Spoofing | Z. Yu et al. | In this paper, we propose a neural architecture search (NAS) based method called Auto-FAS, intending to discover well-suited lightweight networks for mobile-level face anti-spoofing. |
201 | Domain Adaptation for Generalization of Face Presentation Attack Detection in Mobile Settings with Minimal Information | A. Mohammadi, S. Bhattacharjee and S. Marcel | Here, we propose a novel one class domain adaptation method which uses domain guided pruning to adapt a pre-trained PAD network to the target dataset. |
202 | A Lightweight Multi-Label Segmentation Network for Mobile Iris Biometrics | C. Wang, Y. Wang, B. Xu, Y. He, Z. Dong and Z. Sun | This paper proposes a novel, lightweight deep convolutional neural network specifically designed for iris segmentation of noisy images acquired by mobile devices. |
203 | Modeling Behavioral Consistency in Large-Scale Wearable Recordings of Human Bio-Behavioral Signals | T. Feng and S. S. Narayanan | In this work, we introduce a novel data processing pipeline to model behavioral consistency in a large real-world wearable recording data-set collected in a hospital workplace setting from nurses and direct clinical providers for a period of ten weeks. |
204 | Modeling Behavior as Mutual Dependency between Physiological Signals and Indoor Location in Large-Scale Wearable Sensor Study | T. Feng, B. M. Booth and S. S. Narayanan | In this work, we describe our exploration in discovering the correlation between one's physiological responses and movement patterns within different indoor locations using data collected from nurses in a hospital workplace for a ten-week period. |
205 | Multichannel Signal Classification Using Vector Autoregression | A. Haboub, H. Baali and A. Bouzerdoum | To further explore this area, we propose a simple yet effective approach based on modeling MCS with a VAR process. |
206 | Efficient Algorithm to Implement Sliding Singular Spectrum Analysis with Application to Biomedical Signal Denoising | M. Saeed, C. C. Took and S. R. Alty | In this paper, we show it is possible to evaluate the rank-1 SVD update efficiently in $\mathcal{O}(n^2)$, thus dramatically increasing the speed of the sliding version of the SSA algorithm. |
207 | Strategic Attention Learning for Modality Translation | J. Martinez, A. Akbari, K. Sel and R. Jafari | In this work, we propose a two-stage deep learning framework that leverages a novel attention mechanism to translate Bio-Z signals to highly interpretable electrocardiogram (ECG) waveforms while also predicting translation uncertainty. |
208 | Sparse CSP Algorithm via Joint Spatio-Temporal Filtering | A. Jiang, J. Shang, W. Cheng, X. Liu, H. K. Kwan and Y. Zhu | To improve its performance, a novel and efficient spatio-temporal filtering strategy is proposed in this paper to extract discriminative features. |
209 | Human-Machine Collaboration for Medical Image Segmentation | M. Ravanbakhsh et al. | In this paper, we propose a method based on conditional Generative Adversarial Network (cGAN) to address segmentation in semi-supervised setup and in a human-in-the-loop fashion. |
210 | Mixup Multi-Attention Multi-Tasking Model for Early-Stage Leukemia Identification | P. Mathur, M. Piplani, R. Sawhney, A. Jindal and R. R. Shah | To this effect, we propose a novel architecture termed as Mixup Multi-Attention Multi-Task Learning Model (MMA-MTL), which introduces Pointwise Attention Convolution Layers and Local Spatial Attention blocks to capture global and local features simultaneously. |
211 | Cross-View Attention Network for Breast Cancer Screening from Multi-View Mammograms | X. Zhao, L. Yu and X. Wang | In this paper, we address the problem of breast cancer detection from multi-view mammograms. |
212 | UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation | H. Huang et al. | In this paper, we propose a novel UNet 3+, which takes advantage of full-scale skip connections and deep supervisions. |
213 | Unsupervised Content-Preserved Adaptation Network for Classification of Pulmonary Textures from Different CT Scanners | R. Xu, Z. Cong, X. Ye, S. Kido and N. Tomiyama | In this paper, we propose an unsupervised content-preserved adaptation network to address this problem. |
214 | Classify and Explain: An Interpretable Convolutional Neural Network For Lung Cancer Diagnosis | Y. Li, D. Gu, Z. Wen, F. Jiang and S. Liu | In this paper, we present a novel network structure for visually interpretable lung nodule diagnosis. |
215 | Robust Global Optimized Affine Registration Method for Microscopic Images of Biological Tissue | Y. Lv, X. Chen, C. Shu and H. Han | In this paper, a global optimized affine registration method is proposed, which can be used in volume reconstruction. |
216 | Empirical SURE-Guided Microscopy Super-Resolution Image Reconstruction from Confocal Multi-Array Detectors | S. Prigent, S. Dutertre and C. Kervrann | In this paper, we review the most commonly used reconstruction methods and propose a SURE approach to automatically estimate parameters and improve reconstruction. |
217 | Encoding Temporal Information For Automatic Depression Recognition From Facial Analysis | W. C. de Melo, E. Granger and M. B. Lopez | To address these issues, we propose a novel temporal pooling method to capture and encode the spatio-temporal dynamic of video clips into an image map. |
218 | Retinal Vessel Segmentation via a Semantics and Multi-Scale Aggregation Network | R. Xu, X. Ye, G. Jiang, T. Liu, L. Li and S. Tanaka | In this paper, we propose a semantics and multi-scale aggregation network to address these difficulties. |
219 | Adaptive Matched Filter using Non-Target Free Training Data | A. M. Rekavandi, A. Seghouane and R. J. Evans | To develop the test, an ad hoc approach, similar to the classical adaptive matched filter (AMF), is used where instead of the maximum likelihood (ML) estimator of the covariance, the minimum α-divergence based estimator is substituted in the likelihood ratio. |
220 | Feature Drift Resilient Tracking of The Carotid Artery Wall Using Unscented Kalman Filtering With Data Fusion | J. Dorazil, R. Repp, T. Kropfreiter, R. Prüller, K. Říha and F. Hlawatsch | Here, we propose a method for tracking CCA wall motion from a B-mode ultrasound video sequence. |
221 | Tracing Network Evolution Using The PARAFAC2 Model | M. Roald, S. Bhinge, C. Jia, V. Calhoun, T. Adali and E. Acar | In this paper, without assuming static networks in time and/or space, we arrange the temporal data as a higher-order tensor and use a tensor factorization model called PARAFAC2 to capture underlying patterns (spatial networks) in time-evolving data and their evolution. |
222 | A Model-Based Deep Network for MRI Reconstruction Using Approximate Message Passing Algorithm | X. Qiao, J. Du, L. Wang, Z. He and Y. Jia | We propose a novel model-based network to reconstruct the magnetic resonance (MR) image. |
223 | Online Positron Emission Tomography By Online Portfolio Selection | Y. Li | The number of measurement outcomes in positron emission tomography (PET) is typically large, rendering signal reconstruction computationally expensive. We propose an online algorithm to address this computational issue. |
224 | Space Filling Curves for MRI Sampling | S. Sharma, K. V. S. Hari and G. Leus | We propose 1-shot and 4-shot variable density SFCs by utilizing the space coverage provided by SFCs in different iterations. |
225 | K-Space Trajectory Design for Reduced MRI Scan Time | S. Sharma, K. V. S. Hari and G. Leus | In this paper, a constrained convex optimization based method to obtain feasible trajectories is proposed. |
226 | Rethinking Retinal Landmark Localization as Pose Estimation: Naïve Single Stacked Network for Optic Disk and Fovea Detection | S. R. Maiya and P. Mathur | In this regard, we present Naive Single Stacked Hourglass (NSSH) network which learns the spatial orientation and pixel intensity contrast between optic disk and fovea to accurately pinpoint their locations. |
227 | Hidden Markov Models for Sepsis Detection in Preterm Infants | A. Honoré et al. | To improve the neural network based HMM, we propose a discriminative training approach. |
228 | Blood Pressure Estimation From PPG Signals Using Convolutional Neural Networks And Siamese Network | O. Schlesinger, N. Vigderhouse, D. Eytan and Y. Moshe | This paper presents two techniques that enable continuous and noninvasive cuff-less BP estimation using photoplethysmography (PPG) signals with Convolutional Neural Networks (CNNs). |
229 | Speech Breathing Estimation Using Deep Learning Methods | V. S. Nallanthighal, A. Härmä and H. Strik | In this paper, we explore techniques for sensing breathing from speech using deep learning architectures including multi-task learning approaches. |
230 | A Fast Non-Contact Vital Signs Detection Method Based on Regional Hidden Markov Model in A 77GHz LFMCW Radar System | Z. Mei, Q. Wu, Z. Hu and J. Tao | In this paper, a 77GHz linear frequency modulated continuous-wave (LFMCW) radar system is investigated to mitigate multiple-targets interferences. |
231 | Robust Likelihood Ratio Test Using α-Divergence | A. M. Rekavandi, A. Seghouane and R. J. Evans | The problem of detecting a subspace signal in the presence of subspace interference and contaminated Gaussian noise with unknown variance is investigated. |
232 | Using X-Vectors to Automatically Detect Parkinson's Disease from Speech | L. Moro-Velazquez, J. Villalba and N. Dehak | Accordingly, in this study we analyze for the first time a new state-of-the-art speaker recognition technique, x-Vectors, in a different scenario: the automatic detection of PD from speech. |
233 | Learning A Common Granger Causality Network Using A Non-Convex Regularization | P. Manomaisaowapak and J. Songsiri | This paper proposes an estimation for learning a common Granger network of panel data. |
234 | Synthetic Data Generation Through Statistical Explosion: Improving Classification Accuracy of Coronary Artery Disease Using PPG | S. Bhattacharya, O. Mazumder, D. Roy, A. Sinha and A. Ghose | This paper presents a novel approach of generating synthetic Photoplethysmogram (PPG) data using statistical explosion. |
235 | High-Accuracy Classification of Attention Deficit Hyperactivity Disorder with L2,1-Norm Linear Discriminant Analysis | Y. Tang, X. Li, Y. Chen, Y. Zhong, A. Jiang and X. Liu | In detail, we introduce a binary hypothesis testing framework as the classification outline to cope with insufficient data of ADHD database. |
236 | A Neural Network-Based Spike Sorting Feature Map That Resolves Spike Overlap in the Feature Space | J. Wouters, F. Kloosterman and A. Bertrand | In this work, a novel approach is presented to resolve spike overlap directly in the feature space. |
237 | Gait Phase Segmentation Using Weighted Dynamic Time Warping and K-Nearest Neighbors Graph Embedding | T. Chen, T. Lin and Y. -. P. Hong | To reduce the complexity of the DTW-based kNN search, we propose a neural network-based graph embedding scheme that is able to map the IMU signals associated with each gait cycle into a distance-preserving low-dimensional representation while also producing a prediction on the k nearest neighbors of the test signal. |
238 | Automatic Classification of Volumes of Water Using Swallow Sounds from Cervical Auscultation | S. Subramani, M. V. Achuth Rao, D. Giridhar, P. S. Hegde and P. Kumar Ghosh | In this work, a rich set of acoustic features, the ComParE 2016 acoustic feature set (OS) is used to investigate whether several temporal, spectral, vocal and source features and their functionals provide cues for volume classification. |
239 | Conditional Domain Adversarial Transfer for Robust Cross-Site ADHD Classification Using Functional MRI | Y. Huang, W. Hsieh, H. Yang and C. Lee | Hence in this research, we utilize an approach of conditional adversarial domain adaptation network (CDAN) to learn a discriminative fMRI representation that is site-invariant for unsupervised transfer of ADHD classification. |
240 | EEG Connectivity-Informed Cooperative Adaptive Line Enhancer for Recognition of Brain State | S. Sanei, C. C. Took, D. Jarchi and A. Prochazka | In this paper, it is shown that for a proposed cooperative adaptive line enhancer, which can both detect and separate such periodic bursts, the combination weights are consistently different from each other. |
241 | Online Graph Topology Inference with Kernels For Brain Connectivity Estimation | M. Moscu, R. Borsoi and C. Richard | This paper focuses on estimating, in an online and adaptive manner, a network structure capturing the non-linear dependencies among streaming graph signals in the form of a possibly directed adjacency matrix. |
242 | Minimal Adversarial Perturbations in Mobile Health Applications: The Epileptic Brain Activity Case Study | A. Aminifar | In this article, we demonstrate the power of such adversarial attacks based on a real-world epileptic seizure detection problem. |
243 | Detecting Autism Spectrum Disorder Using Topological Data Analysis | S. Majumder, F. Apicella, F. Muratori and K. Das | We used a feature extraction technique based on Topological Data Analysis (TDA) to classify autistic subjects from typically developing ones. |
244 | Multi-View Bayesian Generative Model for Multi-Subject FMRI Data on Brain Decoding of Viewed Image Categories | Y. Akamatsu, R. Harakawa, T. Ogawa and M. Haseyama | In this paper, we propose a multi-view Bayesian generative model for multi-subject fMRI data to estimate viewed image categories from fMRI activity. |
245 | Time-Frequency Analysis of Unimodal Sensory Processing In Autism Spectrum Disorder | D. F. D'Croz-Baron, M. C. Baker and T. Karp | This work summarizes the results of a time-frequency analysis of sensory processing in young adults with Autism Spectrum Disorder via continuous wavelet transform. |
246 | Automatic Epileptic Seizure Onset-Offset Detection Based On CNN in Scalp EEG | P. Boonyakitanont, A. Lek-uthai and J. Songsiri | We establish a deep learning-based method to automatically detect the epileptic seizure onsets and offsets in multi-channel electroencephalography (EEG) signals. |
247 | Enhance Feature Representation of Electroencephalogram for Seizure Detection | D. Wang, Y. Fang, Y. Li and C. Chai | In order to solve these problems, we propose a novel EEG feature representation method for seizure detection based on the Log Mel-Filterbank energy feature. |
248 | Speech Synthesis Using EEG | G. Krishna, C. Tran, Y. Han, M. Carnahan and A. H. Tewfik | In this paper we demonstrate speech synthesis using different electroencephalography (EEG) feature sets recently introduced in [1]. |
249 | EEG Feature Selection Using Orthogonal Regression: Application to Emotion Recognition | X. Xu, F. Wei, Z. Zhu, J. Liu and X. Wu | To reduce the redundancy and choose informative EEG features, in this paper, we propose an EEG feature selection technique, termed as Feature Selection with Orthogonal Regression (FSOR). |
250 | ScalpNet: Detection of Spatiotemporal Abnormal Intervals in Epileptic EEG Using Convolutional Neural Networks | T. Sakai, T. Shoji, N. Yoshida, K. Fukumori, Y. Tanaka and T. Tanaka | We propose ScalpNet: A deep neural network to detect spatiotemporal abnormal intervals from EEGs of epilepsy patients. |
251 | A Semi-Supervised Approach For Identifying Abnormal Heart Sounds Using Variational Autoencoder | R. Banerjee and A. Ghose | In this paper, we propose a semi-supervised approach to solve the problem. |
252 | Detection Of S1 And S2 Locations In Phonocardiogram Signals Using Zero Frequency Filter | R. Prasad, G. Yilmaz, O. Chetelat and M. Magimai.-Doss | In this paper, we propose a purely time-domain processing based method that employs a heavily decaying low pass filter (referred to as zero frequency filter) to suppress extraneous factors and detect S1-S2 locations. |
253 | Mental Fatigue Prediction from Multi-Channel ECoG Signal | L. Yao, J. L. Baker, J. Ryou, N. D. Schiff, K. P. Purpura and M. Shoaran | In this study, we analyzed electrocorticography (ECoG) signals chronically recorded from two non-human primates (NHPs) as they performed a cognitively demanding task over extended periods of time. |
254 | The Effect of Data Augmentation on Classification of Atrial Fibrillation in Short Single-Lead ECG Signals Using Deep Neural Networks | F. N. Hatamian, N. Ravikumar, S. Vesal, F. P. Kemeth, M. Struck and A. Maier | In this study, we investigate the impact of various data augmentation algorithms, e.g., oversampling, Gaussian Mixture Models (GMMs) and Generative Adversarial Networks (GANs), on solving the class imbalance problem. |
255 | Atrial Fibrillation Risk Prediction from Electrocardiogram and Related Health Data with Deep Neural Network | Y. Chen, A. H. Twing, D. Badawi, J. Danavi, M. McCauley and A. E. Cetin | In this study, we develop a novel and effective method to predict the potential AF risk of patients using our ECG signal dataset collected in the University of Illinois Hospital and Health Sciences System. |
256 | Adaptive Region Aggregation Network: Unsupervised Domain Adaptation with Adversarial Training for ECG Delineation | M. Chen, G. Wang, H. Chen and Z. Ding | In this work, we propose an unsupervised domain adaptation method called Adaptive Region Aggregation Network (ARAN) based on adversarial training to tackle domain shift problem in ECG delineation. |
257 | Matching Pursuit Based Dynamic Phase-Amplitude Coupling Measure | T. T. K. Munia and S. Aviyente | In this paper, we introduce a data-driven approach to quantify dynamic PAC. |
258 | Multitaper Spectral Granger Causality with Application to SSVEP | R. Anderson and M. Sandsten | These limits can be solved by using nonparametric spectral estimates in the frequency-domain formulation of GC, also known as spectral GC. |
259 | Cross-Domain Adaptation for Biometric Identification Using Photoplethysmogram | E. Lee, A. Ho, Y. Wang, C. Huang and C. Lee | In this work, we propose the use of both unsupervised and semi-supervised adversarial learning techniques for cross-domain adaptation. |
260 | Exploring Bio-Behavioral Signal Trajectories of State Anxiety During Public Speaking | E. H. Nirjhar, A. Behzadan and T. Chaspari | We tackle this problem by introducing temporal parametric models to quantify bio-behavioral trajectories of PSA throughout a public speaking encounter. |
261 | Real-Time Hand Gesture Recognition Using Temporal Muscle Activation Maps of Multi-Channel sEMG Signals | A. D. Silva, M. V. Perera, K. Wickramasinghe, A. M. Naim, T. Dulantha Lalitharatne and S. L. Kappel | Based on these maps, we propose an algorithm that can recognize hand gestures in real time using a Convolutional Neural Network. |
262 | XceptionTime: Independent Time-Window XceptionTime Architecture for Hand Gesture Classification | E. Rahimian, S. Zabihi, S. F. Atashzar, A. Asif and A. Mohammadi | Capitalizing on the goal of addressing identified shortcomings of recent solutions developed for recognition tasks via sparse multichannel surface Electromyography (sEMG) signals, the paper proposes a novel deep learning model, referred to as the XceptionTime architecture. |
263 | Discovering Causalities from Cardiotocography Signals using Improved Convergent Cross Mapping with Gaussian Processes | G. Feng, J. G. Quirk and P. M. Djuric | In this paper, we propose a novel improved version of CCM using Gaussian processes for discovery of causality from noisy time series. |
264 | LAI-Net: Local-Ancestry Inference with Neural Networks | D. M. Montserrat, C. Bustamante and A. Ioannidis | In this paper we develop the first neural network based LAI method, named LAI-Net, providing competitive accuracy with state-of-the-art methods and robustness to missing or noisy data, while having a small number of layers. |
265 | Prediction of Individual Progression Rate in Parkinson's Disease Using Clinical Measures and Biomechanical Measures of Gait and Postural Stability | V. Raval, K. P. Nguyen, A. Gerald, R. B. Dewey and A. Montillo | The primary aim of this study was to develop a model using clinical measures and biomechanical measures of gait and postural stability to predict an individual's PD progression over two years. |
266 | Deep Matrix Completion on Graphs: Application in Drug Target Interaction Prediction | A. Mongia and A. Majumdar | This work proposes matrix completion via deep matrix factorization on graphs. |
267 | Identification of Essential Proteins Using A Novel Multi-Objective Optimization Method | C. Wu, H. Zhang, L. Zhang and H. Zheng | Hence, in this paper, we consider the identification of essential proteins as a multi-objective optimization problem and use a novel multi-objective optimization method to solve it. |
268 | Graph Convolutional Neural Networks to Classify Whole Slide Images | R. Konda, H. Wu and M. D. Wang | Here, we present a novel application of graph convolutional networks (GCNs) to analyze WSIs. |
269 | Deep James-Stein Neural Networks For Brain-Computer Interfaces | M. Angjelichinoski, M. Soltani, J. Choi, B. Pesaran and V. Tarokh | We propose a novel method that combines James-Stein regression for feature extraction, and deep neural network for decoding; we refer to the architecture as deep James-Stein neural network (DJSNN). |
270 | Formulating Divergence Framework for Multiclass Motor Imagery EEG Brain Computer Interface | S. Kumar, T. K. Reddy, V. Arora and L. Behera | In this work, a novel method is proposed based on Joint Approximate Diagonalization (JAD) to optimize stationarity for multiclass motor imagery Brain Computer Interface (BCI) in an information theoretic framework. |
271 | Subject Transfer Framework Based on Source Selection and Semi-Supervised Style Transfer Mapping for sEMG Pattern Recognition | S. Kanoga, T. Hoshino and H. Asoh | In this study, we investigate the efficiency of the proposed subject transfer framework, which applies a discriminability-based source selection approach and semi-supervised style transfer mapping algorithm, by constructing support vector machine classifiers. |
272 | Decoding Movement Imagination and Execution From EEG Signals Using BCI-Transfer Learning Method Based on Relation Network | D. Lee, J. Jeong, K. Shim and S. Lee | In this study, we focused on a robust MI decoding method with transfer learning for the ME and MI paradigm. |
273 | Classification of High-Dimensional Motor Imagery Tasks Based on An End-To-End Role Assigned Convolutional Neural Network | B. Lee, J. Jeong, K. Shim and S. Lee | In this study, we collected intuitive EEG data containing nine different types of single-arm movements from 9 subjects. We propose an end-to-end role assigned convolutional neural network (ERA-CNN) which considers discriminative features of each upper limb region by adopting the principle of a hierarchical CNN architecture. |
274 | Channel Selection over Riemannian Manifold with Non-Stationarity Consideration for Brain-Computer Interface Applications | K. Sadatnejad, A. Roc, L. Pillette, A. Appriou, T. Monseigne and F. Lotte | In this paper, we propose and compare multiple criteria for selecting ElectroEncephaloGraphic (EEG) channels over the Riemannian manifold, for EEG classification in Brain-Computer Interfaces (BCI). |
275 | A Segmentation Based Robust Deep Learning Framework for Multimodal Retinal Image Registration | Y. Wang et al. | In this paper, a deep learning framework for multimodal retinal image registration is proposed. |
276 | Dense Residual Network for Retinal Vessel Segmentation | C. Guo, M. Szemenyei, Y. Yi, Y. Xue, W. Zhou and Y. Li | In this work, we propose an efficient method to segment blood vessels in Scanning Laser Ophthalmoscopy (SLO) retinal images. |
277 | Lightweight V-Net for Liver Segmentation | T. Lei, W. Zhou, Y. Zhang, R. Wang, H. Meng and A. K. Nandi | To address these issues, we design a lightweight V-Net (LV-Net) for liver segmentation in this paper. |
278 | ACU-Net: A 3D Attention Context U-Net for Multiple Sclerosis Lesion Segmentation | C. Hu, G. Kang, B. Hou, Y. Ma, F. Labeau and Z. Su | To solve the problem, 3D attention context U-Net (ACU-Net) is proposed for MS lesion segmentation in this paper. |
279 | EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation | S. Gehlot, A. Gupta and R. Gupta | To address these challenges, we propose an Encoder-Decoder based Convolutional Neural Network (CNN) with Nested-Feature Concatenation (EDNFC-Net) for automatic nuclei segmentation. |
280 | An Unsupervised Retinal Vessel Extraction and Segmentation Method Based On a Tube Marked Point Process Model | T. Li, M. Comer and J. Zerubia | In this paper, we propose an unsupervised segmentation method based on our previous connected tube marked point process (MPP) model. |
281 | KALM: Key Area Localization Mechanism for Abnormality Detection in Musculoskeletal Radiographs | W. Huang, Z. Xiong, Q. Wang and X. Li | To achieve this goal, we propose a key area localization mechanism (KALM) for abnormality detection for the first time in this paper. |
282 | Combining cGAN and MIL for Hotspot Segmentation in Bone Scintigraphy | H. Xu, S. Geng, Y. Qiao, K. Xu and Y. Gu | In this paper, we propose a new framework to detect and extract hotspots in thoracic region by integrating the techniques of both conditional generative adversarial networks (cGAN) and multiple instance learning (MIL). |
283 | A Noninvasive Method to Detect Diabetes Mellitus and Lung Cancer Using the Stacked Sparse Autoencoder | Q. Zhang, J. Zhou and B. Zhang | The aim of this paper is to simultaneously distinguish patients with diabetes mellitus or lung cancer from healthy people by analyzing facial images through the stacked sparse autoencoder. |
284 | A Multi-Scaled Receptive Field Learning Approach for Medical Image Segmentation | P. Guo, X. Su, H. Zhang, M. Wang and F. Bao | To solve this problem, this paper integrates an atrous spatial pyramid pooling (ASPP) module in the contracting path of attention U-Net. |
285 | Automatic Data Augmentation Via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation | T. Qin, Z. Wang, K. He, Y. Shi, Y. Gao and D. Shen | In this paper, we developed a novel automatic learning-based data augmentation method for medical image segmentation which models the augmentation task as a trial-and-error procedure using deep reinforcement learning (DRL). |
286 | Cross-Stained Segmentation from Renal Biopsy Images Using Multi-Level Adversarial Learning | K. Mei, C. Zhu, L. Jiang, J. Liu and Y. Qiao | In this paper, we design a robust and flexible model for cross-stained segmentation. |
287 | Blind Multi-Spectral Image Pan-Sharpening | L. Yu, D. Liu, H. Mansour, P. T. Boufounos and Y. Ma | We address the problem of sharpening low spatial-resolution multi-spectral (MS) images with their associated misaligned high spatial-resolution panchromatic (PAN) image, based on priors on the spatial blur kernel and on the cross-channel relationship. |
288 | A Forward-Backward Algorithm for Reweighted Procedures: Application to Radio-Astronomical Imaging | A. Repetti and Y. Wiaux | In this work we present a reweighted forward-backward algorithm designed to handle non-convex composite functions. |
289 | CRA: A Generic Compression Ratio Adapter for End-To-End Data-Driven Image Compressive Sensing Reconstruction Frameworks | Z. Zhang, K. Xu and F. Ren | This paper presents a generic compression ratio adapter (CRA) framework that addresses the variable compression ratio (CR) problem for existing EDCSR frameworks with no modification to given reconstruction models nor enormous rounds of training needed. |
290 | Revealing Hidden Drawings in Leonardo's 'The Virgin of the Rocks' from Macro X-Ray Fluorescence Scanning Data through Element Line Localisation | S. Yan, J. Huang, N. Daly, C. Higgitt and P. L. Dragotti | In this paper, we propose a new method that can process macro XRF scanning data from paintings fully automatically. |
291 | 3D Unknown View Tomography Via Rotation Invariants | M. Zehni, S. Huang, I. Dokmanic and Z. Zhao | In this paper, we study the problem of reconstructing a 3D point source model from a set of 2D projections at unknown view angles. |
292 | Modelling Sea Clutter In SAR Images Using Laplace-Rician Distribution | O. Karakus, E. E. Kuruoglu and A. Achim | This paper presents a novel statistical model for the characterisation of synthetic aperture radar (SAR) images of the sea surface. |
293 | Volume Reconstruction for Light Field Microscopy | H. Verinaz-Jadan, P. Song, C. L. Howe, A. J. Foust and P. L. Dragotti | In this work, we make two contributions. First, we propose a simplification of the forward model based on a novel discretization approach that allows us to accelerate the computation without drastically increasing memory consumption. Second, we experimentally show that by including regularization priors and an appropriate initialization strategy, it is possible to remove the artifacts near the native object plane. |
294 | Deep Exposure Fusion with Deghosting via Homography Estimation and Attention Learning | S. Chen and Y. Chuang | This paper proposes a deep network for exposure fusion. |
295 | Single-Shot Real-Time Multiple-Path Time-of-Flight Depth Imaging for Multi-Aperture and Macro-Pixel Sensors | M. H. Conde, K. Kagawa, T. Kokado, S. Kawahito and O. Loffeld | In this work we consider two hardware alternatives able to acquire all necessary raw data in a single shot. |
296 | Fast Optical System Identification by Numerical Interferometry | S. Gupta, R. Gribonval, L. Daudet and I. Dokmanic | We propose a numerical interferometry method for identification of optical multiply-scattering systems when only intensity can be measured. |
297 | Fourier Phase Retrieval with Arbitrary Reference Signal | F. Arab and M. S. Asif | In this paper, we present a Fourier phase retrieval algorithm in the presence of a known (reference) signal at arbitrary location in the scene. |
298 | Preconditioned Ghost Imaging Via Sparsity Constraint | Z. Tong, J. Wang and S. Han | To deal with this issue, we propose an efficient recovery algorithm for GISC called the preconditioned multiple orthogonal least squares (PmOLS). |
299 | Multispectral Fusion of RGB and NIR Images Using Weighted Least Squares and Alternating Guidance | K. Zhou and C. Jung | In this paper, we propose multi-spectral fusion of RGB and NIR images using weighted least squares (WLS) and alternating guidance. |
300 | Color and Angular Reconstruction of Light Fields from Incomplete-Color Coded Projections | H. Nguyen and C. Guillemot | We present a simple variational approach for reconstructing color light fields (LFs) in the compressed sensing (CS) framework with very low sampling ratio, using both coded masks and color filter arrays (CFAs). |
301 | Cross Image Cubic Interpolator for Spatially Varying Exposures | Z. Li, J. Zheng, S. Xie and H. Shu | In this paper, we introduce a novel cross image cubic interpolator for the spatially varying exposures via the rolling shutter. |
302 | Discriminant and Sparsity Based Least Squares Regression with l1 Regularization for Feature Representation | S. Zhao, B. Zhang and S. Li | To solve these dilemmas, this paper presents a discriminant and sparsity based least squares regression with l1 regularization (DS_LSR). |
303 | Deep Meta-Relation Network for Visual Few-Shot Learning | F. Zhang, Q. Wang and X. Li | This paper proposes a novel metric-based deep learning method to solve the few-shot learning problem. |
304 | A Semi-Supervised Rank Tracking Algorithm For On-Line Unmixing Of Hyperspectral Images | L. Nus, S. Miron, B. Jaillais, S. Moussaoui and D. Brie | Based on the On-line Alternating Direction Method of Multipliers (ADMM), we propose a new hyperspectral unmixing approach that integrates prior information as well as joint sparsity regularization, allowing to select only the active components on each sample of the image. |
305 | Inverse Multiple Scattering with Phaseless Measurements | M. A. Lodhi, Y. Ma, H. Mansour, P. T. Boufounos and D. Liu | We study the problem of reconstructing an object from phaseless measurements in the context of inverse multiple scattering. |
306 | Multi-Polarization Information Fusion for Object Contour Display in Passive Millimeter-Wave and Terahertz Security Imaging | Y. Cheng, Z. Zhao, Y. Wang and Y. Niu | In this paper, a physical-based contour display method by fusing multi-polarization information is proposed. |
307 | Characterisation of a Snapshot Fourier Transform Imaging Spectrometer Based on an Array of Fabry-Perot Interferometers | D. Picone, A. Dolet, S. Gousset, D. Voisin, M. D. Mura and E. Le Coarer | In this paper, we present a strategy for estimating the thickness of the Fabry-Perot cavities, as this information is typically not precise or even available. |
308 | Shadow Removal of Text Document Images by Estimating Local and Global Background Colors | J. Wang and Y. Chuang | This paper proposes a simple yet effective method for removing shadows from text document images. |
309 | Exploring Energy Efficient Quantum-resistant Signal Processing Using Array Processors | H. Nejatollahi, S. Shahhosseini, R. Cammarota and N. Dutt | In this work, we explore the energy efficiency of polynomial multiplier using systolic architecture for the first time. |
310 | dMazeRunner: Optimizing Convolutions on Dataflow Accelerators | S. Dave, A. Shrivastava, Y. Kim, S. Avancha and K. Lee | We propose dMazeRunner framework, which allows users to optimize execution methods for accelerating convolution and matrix multiplication on a given architecture and to explore dataflow accelerator designs for efficiently executing CNN models. |
311 | Exploration Methodology for BTI-Induced Failures on RRAM-Based Edge AI Systems | A. Levisse, M. Rios, M. Peón-Quirós and D. Atienza | Based on this observation, in this work, we propose to explore how Edge-level applications running on a RRAM-based Edge device could fail because of Bias Temperature Instability (BTI). |
312 | Time-Predictable Software-Defined Architecture with SDF-Based Compiler Flow for 5G Baseband Processing | V. Venkataramani, B. Bodin, A. Kulkarni, T. Mitra and L. Peh | We introduce a software-defined array-based many-core architecture, called SPECTRUM, that couples lightweight predictable hardware components with a compiler flow that orchestrates the on-chip hardware resources. |
313 | Accelerating Linear Algebra Kernels on a Massively Parallel Reconfigurable Architecture | A. Soorishetty et al. | We present a customized implementation of a select set of linear algebra kernels, namely, triangular matrix solver, LU decomposition, QR decomposition and matrix inversion, on Transformer. |
314 | Energy Efficient Acceleration Of Floating Point Applications Onto CGRA | S. Das, R. Prasad, K. J. M. Martin and P. Coussy | In this paper, we propose a novel CGRA architecture and associated compilation flow supporting both integer and floating-point computations for energy efficient acceleration of DSP applications. |
315 | Fast and Accurate Embedded DCNN for RGB-D Based Sign Language Recognition | C. Wang, C. Chiu, C. Huang, Y. Ding and L. Wang | In this paper, a fast and accurate two-path CNN architecture is designed in a hardware-oriented manner. |
316 | D2NA: Day-To-Night Adaptation for Vision based Parking Management System | W. Zheng, V. Tran and C. Huang | In this paper, we propose a novel framework based on day-night domain adaptation, feature disentanglement, and style transfer to transfer the knowledge from day to night. |
317 | EnerGAN: A Generative Adversarial Network for Energy Disaggregation | M. Kaselimi, A. Voulodimos, E. Protopapadakis, N. Doulamis and A. Doulamis | An efficient, appliance-level approach for energy disaggregation, exploiting the benefits of Generative Adversarial Networks, is presented. |
318 | Enhancing the Labelling of Audio Samples for Automatic Instrument Classification Based on Neural Networks | G. Castel-Branco, G. Falcao and F. Perdigão | In the paper we disambiguate and report the automatic labelling of previously unlabelled samples. |
319 | Deep-Neural-Network Based Fall-Back Mechanism in Interference-Aware Receiver Design | S. Hu, W. Hu, D. Kapetanovic and N. Wang | In this paper, we consider designing a fall-back mechanism in an interference-aware receiver. |
320 | DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures | Y. Zhao, C. Li, Y. Wang, P. Xu, Y. Zhang and Y. Lin | To enable fast and effective DNN accelerator development, we propose DNN-Chip Predictor, an analytical performance predictor which can accurately predict DNN accelerators' energy, throughput, and latency prior to their actual implementation. |
321 | Low-Complexity Fixed-Point Convolutional Neural Networks For Automatic Target Recognition | H. Dbouk, H. Geng, C. M. Vineyard and N. R. Shanbhag | In order to bring the cost of implementing these networks down, we develop a set of compact network architectures and train them in fixed-point. |
322 | Accelerating Distributed Deep Learning By Adaptive Gradient Quantization | J. Guo et al. | In this work, we propose a novel adaptive quantization scheme (AdaQS) to explore the balance between model accuracy and quantization level. |
323 | Lupulus: A Flexible Hardware Accelerator for Neural Networks | A. T. Kristensen, R. Giterman, A. Balatsoukas-Stimming and A. Burg | In this work, we present a flexible hardware accelerator for neural networks, called Lupulus, supporting various methods for scheduling and mapping of operations onto the accelerator. |
324 | Depth Estimation From Single Image Through Multi-Path-Multi-Rate Diverse Feature Extractor | W. Lo, C. Chiu and J. Luo | This paper proposes a multi-path-multi-rate feature extractor, which can effectively extract multi-scale information to make accurate depth predictions. |
325 | Object Detection with Color and Depth Images with Multi-Reduced Region Proposal Network and Multi-Pooling | J. Lin, C.-T. Chiu and Y. Cheng | We base our network on the Faster R-CNN proposed by Shih et al., and we develop a fast and accurate object detection architecture. |
326 | Deblurring And Super-Resolution Using Deep Gated Fusion Attention Networks For Face Images | C. Yang and L. Chang | To solve the problem, we propose a deep gated fusion attention network (DGFAN) to generate a high resolution image without blurring artifacts. |
327 | Indoor Heading Direction Estimation Using RF Signals | Y. Fan, F. Zhang, C. Wu, B. Wang and K. J. Ray Liu | In this paper, we utilize the radio frequency (RF) signals, received from the commercial off-the-shelf (COTS) WiFi devices, to accurately estimate the heading direction in indoor environments. |
328 | An Improved Selective Active Noise Control Algorithm Based on Empirical Wavelet Transform | S. Wen, W. Gan and D. Shi | To confront this problem, empirical wavelet transform (EWT) is introduced to simplify the matching and selection of SANC in this paper. |
329 | Towards Real-Time, Multi-View Video Stereopsis | J. Ke, A. J. Watras, J. Kim, H. Liu, H. Jiang and Y. H. Hu | We present a real-time, multi-view video stereopsis (RTMVS) algorithm. |
330 | ERNet Family: Hardware-Oriented CNN Models For Computational Imaging Using Block-Based Inference | C. Huang | In this paper, we address these issues by considering the overheads and hardware constraints in advance when constructing CNNs. |
331 | A DSP Acceleration Framework For Software-Defined Radios On x86_64 | G. Georgis, A. Thanos, M. Filo and K. Nikitopoulos | This paper presents a DSP acceleration and assessment framework targeting SDR platforms on x86_64 architectures. |
332 | Fast Single-View 3D Object Reconstruction with Fine Details Through Dilated Downsample and Multi-Path Upsample Deep Neural Network | C. Hsu, C. Chiu and C. Kuan | To address this issue, we propose two methods: the dilated downsample block and the multi-path upsample block. |
333 | Processing Convolutional Neural Networks on Cache | J. Vieira, N. Roma, G. Falcao and P. Tomás | In this paper, we propose and assess a novel mechanism that operates at cache level, leveraging both data-proximity and parallel processing capabilities, enabled by dedicated fully-digital vector Functional Units (FUs). |
334 | Lightweight Hardware Implementation of VVC Transform Block for ASIC Decoder | I. Farhat, W. Hamidouche, A. Grill, D. Menard and O. Déforges | In this paper, an efficient hardware implementation of all DCT/DST transform types and sizes is proposed. |
335 | RGB-D Based Multi-Modal Deep Learning for Face Identification | T. Lin, C. Chiu and C. Tang | In the proposed architecture, we implement the networks in dual CNN paths for color and depth images separately. |
336 | A Real Time Implementation of a Bayer Domain Image Deblurring Core for Optical Blur Compensation | H. Lee et al. | In this letter, we present an implementation of deblurring hardware to mitigate blur incurred by optical aberrations in a real-time manner to increase resolution for mobile camera modules. |
337 | Self-Attentive Sentimental Sentence Embedding for Sentiment Analysis | S. Lin, W. Su, P. Chien, M. Tsai and C. Wang | We propose the use of a word-level sentiment bidirectional LSTM in tandem with the self-attention mechanism for sentence-level sentiment prediction. |
338 | Decidable Variable-Rate Dataflow for Heterogeneous Signal Processing Systems | Y. Ma, J. Wu, S. S. Bhattacharyya and J. Boutellier | The paper presents the VR-PRUNE model of computation and runtime, and illustrates its applicability to practical signal processing applications by two use cases: an adaptive convolutional neural network, and a predistortion filter for wireless communications. |
339 | Back-to-Back Butterfly Network, an Adaptive Permutation Network for New Communication Standards | H. Harb and C. Chavet | In this paper, we introduce an adaptive Back-to-Back Butterfly Network (B2BN) dedicated to next communication standards. |
340 | 1.5Gbit/s 4.9W Hyperspectral Image Encoders on a Low-Power Parallel Heterogeneous Processing Platform | O. Ferraz, V. Silva and G. Falcao | This work explores the utilization of low-power heterogeneous devices for parallelizing the compute-intensive hyper-spectral and multispectral image compression CCSDS-123 entropy encoders. |
341 | Design of A Convergence-Aware Based Expectation Propagation Algorithm for Uplink MIMO SCMA Systems | J. Lin and P. Tsai | To further reduce the complexity, a convergence-aware based EPA for uplink MIMO SCMA systems is proposed. Techniques including user termination, antenna termination, and codebook reduction are adopted. |
342 | Bipartite Belief Propagation Polar Decoding With Bit-Flipping | Z. Gong et al. | In this paper, we introduce the bit-flipping scheme into the LDPC-like BP (L-BP) decoding and propose two methods to identify the error-prone VNs. |
343 | Low-Complexity LSTM-Assisted Bit-Flipping Algorithm For Successive Cancellation List Polar Decoder | C. Chen, C. Teng and A. A. Wu | In this work, we leverage expert knowledge in communication systems and adopt deep learning (DL) techniques to obtain a better solution. |
344 | Adaptive Normalization for Forecasting Limit Order Book Data Using Convolutional Neural Networks | N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj and A. Iosifidis | In this paper we propose a data-driven adaptive normalization layer which is capable of learning the most appropriate normalization scheme that should be applied on the data. |
345 | Greedy Hybrid Rate Adaptation in Dynamic Wireless Communication Environment | Y. Zhao, K. Kang, H. Qian, X. Luo and Y. Jin | In this paper, we model the rate selection problem as a multi-armed bandit (MAB) problem and propose an online learning rate adaptation algorithm that learns the channel status from both RSSI and ACK/NACK signals. |
346 | A WiFi-Based Passive Fall Detection System | Y. Hu, F. Zhang, C. Wu, B. Wang and K. J. Ray Liu | In this paper, we propose DeFall, a novel WiFi-based environment-independent fall detection system by leveraging the features inherently associated with human falls: the patterns of speed and acceleration over time. |
347 | Programmable Dataflow Accelerators: A 5G OFDM Modulation/Demodulation Case Study | Y. Wu, P. Wang and J. McAllister | This paper describes an FFT accelerator for such a context. |
348 | Simplified Dynamic SC-Flip Polar Decoding | F. Ercan, T. Tonnellier, N. Doan and W. J. Gross | In this work, we propose a simple approximation that replaces the transcendental computations of DSCF decoding. |
349 | Real-Time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems | Y. Xie, C. Shi, Z. Li, J. Liu, Y. Chen and B. Yuan | In this paper, we propose the first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system. |
350 | An Odorant Encoding Machine for Sampling, Reconstruction and Robust Representation of Odorant Identity | A. A. Lazar, T. Liu and C. Yeh | We solve the sampling problem by developing the Odorant Encoding Machine (OEM), a biomimetic system based on the latest insights in the architectural organization of the fruit fly early olfactory system. |
351 | FIR Filter Design and Implementation for Phase-Based Processing | S. Huang, W. Chen and C. Huang | In this paper, we study computation- and memory-efficient finite impulse response filter implementation for CSP. |
352 | Fixed-Point Optimization of Transformer Neural Network | Y. Boo and W. Sung | In this study, we quantize the parameters and hidden signals of the Transformer for complexity reduction. |
353 | A FIFO Based Accelerator for Convolutional Neural Networks | V. Panchbhaiyye and T. Ogunfunmi | In this paper we present an architecture which takes a novel approach to compute convolution results using row-wise inputs as opposed to traditional tile-based processing. |
354 | Soft-Output Finite Alphabet Equalization for mmWave Massive MIMO | O. Castañeda, S. Jacobsson, G. Durisi, T. Goldstein and C. Studer | In this work, we improve upon finite-alphabet equalization by performing unbiased estimation and soft-output computation for coded systems. |
355 | Diversity and Sparsity: A New Perspective on Index Tracking | Y. Zheng, T. M. Hospedales and Y. Yang | We introduce the first index tracking method that explicitly optimises both diversity and sparsity in a single joint framework. |
356 | Sparse Beamspace Equalization for Massive MU-MIMO mmWave Systems | S. H. Mirfarshbafan and C. Studer | We propose equalization-based data detection algorithms for all-digital millimeter-wave (mmWave) massive multiuser multiple-input multiple-output (MU-MIMO) systems that exploit sparsity in the beamspace domain to reduce complexity. |
357 | Threshold-Adjusted ORB Strategies with Genetic Algorithm and Protective Closing Strategy on Taiwan Futures Market | J. Syu, M. Wu, C. Chen and J. Ho | In this paper, we adjust thresholds through historical data to enhance profitability, and design protective closing strategy to prevent unacceptable losses. |
358 | Can every analog system be simulated on a digital computer? | H. Boche and V. Pohl | This paper shows that there exist very simple linear time-invariant (LTI) systems which can not be simulated on a Turing machine. |
359 | Low-Complexity Compressed Alignment-Aided Compressive Analysis for Real-Time Electrocardiography Telemonitoring | Y. Pua, C. Chou and A. A. Wu | In this paper, we propose a new compressed alignment-aided compressive analysis (CA-CA) framework that enables simple alignment and low-complexity requirements. |
360 | Reduced-Complexity Singular Value Decomposition For Tucker Decomposition: Algorithm And Hardware | X. Hu, C. Deng and B. Yuan | In this paper, we propose a reduced-complexity SVD (Singular Value Decomposition) scheme, which serves as the key operation in Tucker decomposition. |
361 | An Early Termination Scheme for Successive Cancellation List Decoding of Polar Codes | H. Lee, Y. Pao, C. Chi, H. Lee and Y. Ueng | In order to minimize the decoding period and the response time for Polar Codes, an early termination (ET) scheme based on additional check points (ACPs) is proposed in this work. |
362 | Low-Latency Lightweight Streaming Speech Recognition with 8-Bit Quantized Simple Gated Convolutional Neural Networks | J. Park, X. Qian, Y. Jo and W. Sung | In this paper, we propose a low-latency on-device speech recognition system with a simple gated convolutional network (SGCN). |
363 | Shape From Bandwidth: Central Projection Case | G. Elhami, A. J. Scholefield and M. Vetterli | We model the problem as a central projection operation in 2D and propose a two step approach for finding the surface from the observed image. |
364 | Sequential Deep Unrolling With Flow Priors For Robust Video Deraining | X. Xue, Y. Ding, P. Mu, L. Ma, R. Liu and X. Fan | A sequential deep unrolling framework is substantially presented by solving this model based on optimization techniques. |
365 | A Fast and Accurate Super-Resolution Network Using Progressive Residual Learning | H. Liu, Z. Lu, W. Shi and J. Tu | In this work, a lightweight network using progressive residual learning for SISR (PRLSR) is proposed to address this issue. |
366 | REV-AE: A Learned Frame Set for Image Reconstruction | S. Li, Z. Zheng, W. Dai, J. Zou and H. Xiong | In this paper, we propose a reversible autoencoder (Rev-AE) with this extended non-linear lifting scheme to improve image reconstruction. |
367 | Decomposed CycleGAN for Single Image Deraining With Unpaired Data | K. Han and X. Xiang | Inspired by the use of unpaired data in translation tasks, in this paper we present a new method for rain removal using unpaired data. |
368 | SliceNet: Slice-Wise 3D Shapes Reconstruction from Single Image | Y. Wu, Z. Sun, Y. Song, Y. Sun and J. Shi | In this paper, we propose SliceNet, sequentially generating 2D slices of 3D shapes with shared 2D deconvolution parameters. |
369 | Sight to Sound: An End-to-End Approach for Visual Piano Transcription | A. S. Koepke, O. Wiles, Y. Moses and A. Zisserman | In this work, we address the problem of transcribing piano music from visual data alone. |
370 | Exocentric to Egocentric Image Generation Via Parallel Generative Adversarial Network | G. Liu, H. Tang, H. Latapie and Y. Yan | In this paper, we investigate exocentric (third-person) view to egocentric (first-person) view image generation. |
371 | Focus on Semantic Consistency for Cross-Domain Crowd Understanding | T. Han, J. Gao, Y. Yuan and Q. Wang | However, we found that a mass of estimation errors in the background areas impede the performance of the existing methods. In this paper, we propose a domain adaptation method to eliminate it. |
372 | Improved Real-Time Visual Tracking via Adversarial Learning | H. Zhong, X. Yan, Y. Jiang and S. Xia | In this paper, we attempt to combine the advantages from both methods and propose an improved real-time visual tracking algorithm via adversarial learning to get a more balanced result in accuracy and tracking speed. |
373 | Spatial-Temporal Feature Aggregation Network For Video Object Detection | Z. Chen, W. Li, C. Fei, B. Liu and N. Yu | In this paper, we propose a novel spatial-temporal feature aggregation network to deal with this issue. |
374 | Using Panoramic Videos for Multi-Person Localization and Tracking In A 3D Panoramic Coordinate | F. Yang, F. Li, Y. Wu, S. Sakti and S. Nakamura | In this work, we propose an effective and efficient approach at a low cost. |
375 | Position Constraint Loss For Fashion Landmark Estimation | H. Liu, M. Song, W. Shi and X. Li | To alleviate these issues, we propose Position Constraint Loss (PCLoss) to constrain error landmark locations by utilizing the position relationship of landmarks. |
376 | Receptive Field Pyramid Network for Object Detection | F. Wu, A. J. Ma, Y. Pan, Y. Gao and X. Yan | To overcome this limitation and carry out better object detection, we design a novel network named Receptive Field Pyramid Network (RFPN). |
377 | Robust Visual Tracking with Context-Based Active Occlusion Recognition | Y. Gu, Y. Qiao, K. Xu, H. Xu and X. Fang | In this paper, we propose a context-based active occlusion recognition framework that can be integrated with various tracking approaches. |
378 | Leveraging Ordinal Regression With Soft Labels For 3D Head Pose Estimation From Point Sets | S. Xiao, N. Sang, X. Wang and X. Ma | In contrast to existing approaches that take 2D depth image as input, we propose a novel deep regression architecture called Head PointNet, which consumes 3D point sets derived from a depth image describing the visible surface of a head. |
379 | Solving Missing-Annotation Object Detection with Background Recalibration Loss | H. Zhang, F. Chen, Z. Shen, Q. Hao, C. Zhu and M. Savvides | In this paper, we introduce a superior solution called Background Recalibration Loss (BRL) that can automatically re-calibrate the loss signals according to the pre-defined IoU threshold and input image. |
380 | Face Feature Recovery via Temporal Fusion for Person Search | C. Fan, C. Liu, K. Wang, J. Jhan, Y. F. Wang and J. Chen | To address the issue, we propose a unique framework of “Face Feature Recovery via Temporal Fusion” to synthesize virtual facial features by observing both temporal and contextual information. |
381 | EdgeFool: an Adversarial Image Enhancement Filter | A. S. Shamsabadi, C. Oh and A. Cavallaro | In this paper, we propose EdgeFool, an adversarial image enhancement filter that learns structure-aware adversarial perturbations. |
382 | Facial Feature Embedded CycleGAN For VIS-NIR Translation | H. Wang, H. Zhang, L. Yu, L. Wang and X. Yang | Inspired by the CycleGAN, this paper presents a method aiming to translate between VIS and NIR face images. |
383 | Deep Image Deblurring Using Local Correlation Block | W. Su, Y. Yuan and Q. Wang | In this paper, we propose a local correlation block (LCBlock), which can adjust the weights of features adaptively according to the blurry inputs. |
384 | Global Structure Graph Guided Fine-Grained Vehicle Recognition | C. Wang, H. Fu and H. Ma | In this paper, we propose an approach that introduces the structure graph into consideration to learn distinguishing representations for vehicle recognition. |
385 | Triplet Loss Feature Aggregation for Scalable Hash | W. Jia, L. Li, Z. Li, S. Zhao and S. Liu | In this paper, we propose a novel content-based video segmentation identification scheme that is invariant to the underlying codec and operational bit rates. |
386 | HDMFH: Hypergraph Based Discrete Matrix Factorization Hashing for Multimodal Retrieval | J. Gao, W. Zhang, Z. Chen and F. Zhong | To address these issues, in this paper, we propose a novel cross-modal hashing method, named Hypergraph Based Discrete Matrix Factorization Hashing (HDMFH), for multimodal retrieval. |
387 | Multi-Scale Deep Feature Fusion for Vehicle Re-Identification | Y. Cheng, C. Zhang, K. Gu, L. Qi, Z. Gan and W. Zhang | In this paper, we propose a novel Multi-Scale Deep Feature Fusion Network (MSDeep) to conduct both multi-scale and multi-level features for precise vehicle re-id. |
388 | Crowdsourcing-Based Ranking Aggregation for Person Re-Identification | Y. Yu, C. Liang, W. Ruan and L. Jiang | To this end, this paper proposes a crowdsourcing-based ranking aggregation to adaptively fuse multiple ranking lists for re-ID problem. |
389 | Deep Multi-Region Hashing | Q. Zhou, X. Nie, Y. Shi, X. Liu and Y. Yin | To address this issue, in this paper, we propose a novel Deep Multi-Region Hashing (DMRH) method to fully utilize the semantic details, which uses overlapping N × N regions of an image to learn N² hash codes for getting a final hash code. |
390 | Semantic Augmentation Hashing for Zero-Shot Image Retrieval | F. Zhong, Z. Chen, G. Min and F. Xia | In this paper, we propose a novel Semantic Augmentation Hashing (SAH) for zero-shot image retrieval. |
391 | End-To-End Generation of Talking Faces from Noisy Speech | S. E. Eskimez, R. K. Maddox, C. Xu and Z. Duan | In this work, we propose an end-to-end (no pre- or post-processing) system that can generate talking faces from arbitrarily long noisy speech. |
392 | Unsupervised Image-to-Image Translation Via Fair Representation of Gender Bias | S. Hwang and H. Byun | In this work, we propose a framework of unsupervised Image-to-Image translation that learns a fair representation by separating the latent space of our model into two purposes: 1) Target Attribute Editing, 2) Gender Preserving. |
393 | Video Frame Interpolation Via Exceptional Motion-Aware Synthesis | M. Park, S. Lee and Y. M. Ro | In this paper, we propose a novel video frame interpolation method via exceptional motion-aware synthesis, in which accurate optical flow could be estimated even with exceptional motion patterns. |
394 | Look Globally, Age Locally: Face Aging With an Attention Mechanism | H. Zhu, Z. Huang, H. Shan and J. Zhang | To address this deficiency, this paper introduces an Attention Conditional GANs (AcGANs) approach for face aging, which utilizes an attention mechanism to alter only the regions relevant to face aging. |
395 | Design-GAN: Cross-Category Fashion Translation Driven By Landmark Attention | Y. Lang, Y. He, J. Dong, F. Yang and H. Xue | In this paper, we propose a novel approach, called DesignGAN, that utilizes the landmark guided attention and a similarity constraint mechanism to achieve fashion cross-category translation. |
396 | Intensity-Image Reconstruction for Event Cameras Using Convolutional Neural Network | Y. Chen, W. Chen, X. Cao and Q. Hua | In this paper, “event frames” are recovered from event streams in an attenuation method and they are fed into the U-net network to generate intensity images. |
397 | Colour Compression of Plenoptic Point Clouds Using RAHT-KLT with Prior Colour Clustering and Specular/Diffuse Component Separation | M. Krivokuca and C. Guillemot | In the current paper, we demonstrate that the best-proposed RAHT extension, RAHT-KLT, can be improved by performing a prior subdivision of the plenoptic point cloud into clusters based on similar colour values, followed by a separation of each cluster into specular and diffuse components, and coding each component separately with RAHT-KLT. |
398 | Super-Resolution of 3D Color Point Clouds Via Fast Graph Total Variation | C. Dinesh, G. Cheung and I. V. Bajic | In this paper, we propose a fast super-resolution (SR) algorithm for color 3D point clouds. |
399 | Deep Monocular Video Depth Estimation Using Temporal Attention | H. Ren, M. El-khamy and J. Lee | In this paper, we propose an end-to-end monocular video depth estimation network based on temporal attention. |
400 | Robust Full-FoV Depth Estimation in Tele-Wide Camera System | K. Guo, S. Song, S. Chang, T. Kim, S. Han and I. Kim | In this paper, to address the above problems we propose a hierarchical hourglass network for robust full-FoV depth estimation in tele-wide camera system, which combines the robustness of traditional stereo-matching methods with the accuracy of DNN. |
401 | MANet: Multi-Scale Aggregated Network For Light Field Depth Estimation | Y. Li, L. Zhang, Q. Wang and G. Lafruit | We present a novel end-to-end network, MANet, for light field depth estimation. |
402 | EPI-Neighborhood Distribution Based Light Field Depth Estimation | J. Li and X. Jin | In this paper, a novel depth estimation algorithm tackling foreground occlusion is proposed based on the neighborhood distribution in the sheared epipolar images (EPIs). |
403 | Stochastic Multi-Scale Aggregation Network for Crowd Counting | M. Wang, H. Cai, J. Zhou and M. Gong | This paper presents a novel end-to-end stochastic multi-scale aggregation network (SMANet) which carefully addresses these issues. |
404 | MDR-SURV: A Multi-Scale Deep Learning-Based Radiomics for Survival Prediction in Pulmonary Malignancies | P. Afshar, A. Oikonomou, K. N. Plataniotis and A. Mohammadi | In this work, we propose a Multi-scale Deep learning-based Radiomics model, referred to as “MDR-SURV” that exploits the information from positron emission tomography/computed tomography (PET/CT) images, combined with other clinical factors, to predict the overall survival (OS). |
405 | Learning a Generic Adaptive Wavelet Shrinkage Function for Denoising | T. Alt and J. Weickert | To reduce this gap, we introduce a generic wavelet shrinkage function for denoising which is adaptive to both the wavelet scales as well as the noise standard deviation. |
406 | Multi-Scale Residual Network for Image Classification | X. Zhong, O. Gong, W. Huang, J. Yuan, B. Ma and R. W. Liu | In this paper, we propose the Multi-Scale Residual (MSR) module that integrates multi-scale feature maps of the underlying information to the last layer of Convolutional Neural Network. |
407 | Deep Multi-Scale Gabor Wavelet Network for Image Restoration | H. Dong, X. Zhang, Y. Guo and F. Wang | In this paper, we propose a Multiscale Gabor Wavelet Network (MsGWN) for image restoration. |
408 | Residual Attention Network for Wavelet Domain Super-Resolution | J. Liu, Y. Xie, H. Song, W. Yuan and L. Ma | In this paper, we propose a novel network with better textural details in wavelet domain, which is composed of a feature extract layer, residual channel attention groups (RCAG) and a residual up-sampling layer based on inverse discrete wavelet transform. |
409 | An Adaptive Linear Estimator Based Approach to Bi-Directional Motion Compensated Prediction | B. Li, J. Han and K. Rose | This paper proposes a novel bi-directional motion compensation mode that efficiently utilizes the motion information that is already available to the decoder, without recourse to extensive search. |
410 | Spherical Video Coding with Geometry and Region Adaptive Transform Domain Temporal Prediction | B. Vishwanath and K. Rose | To account for such variations, we propose geometry and region adaptive TDTP that is tailored to spherical videos. |
411 | Versatile Video Coding and Super-Resolution for Efficient Delivery of 8K Video with 4K Backward-Compatibility | C. Bonnineau, W. Hamidouche, J. Travers and O. Déforges | In this paper, we propose, through an objective study, to compare and evaluate the performance of different coding approaches allowing the delivery of an 8K video signal with 4K backward-compatibility on broadcast networks. |
412 | Alternative Half-Sample Interpolation Filters for Versatile Video Coding | A. Henkel et al. | In this paper, alternative half-luma-sample interpolation filters are proposed. |
413 | Just Noticeable Distortion Based Perceptually Lossless Intra Coding | X. Shen, X. Zhang, S. Wang, S. Kwong and G. Zhu | In this paper, a just noticeable distortion (JND) guided perceptually lossless coding framework is proposed for Versatile Video Coding (VVC) intra coding. |
414 | Efficient Deep Learning-Based Lossy Image Compression Via Asymmetric Autoencoder and Pruning | J. Kim, J. Choi, J. Chang and J. Lee | In this paper, we propose efficient lossy image compression methods based on asymmetric autoencoder and decoder pruning. |
415 | Deriving Compact Feature Representations Via Annealed Contraction | M. A. Shah and B. Raj | To this end we propose a technique that shrinks a layer by an iterative process in which neurons are removed from the layer and the network is fine-tuned. |
416 | Fast Clustering With Co-Clustering Via Discrete Non-Negative Matrix Factorization for Image Identification | F. Nie, S. Pei, R. Wang and X. Li | To address this problem, a novel clustering method called fast clustering with co-clustering via discrete non-negative matrix factorization, is proposed. |
417 | Compressive Adaptive Bilateral Filtering | P. Nair, R. G. Gavaskar and K. N. Chaudhury | We propose a fast algorithm for an adaptive variant of the classical bilateral filter, where the range kernel is allowed to vary from pixel to pixel. |
418 | Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images | B. Zhang, S. Jin, Y. Xia, Y. Huang and Z. Xiong | In this paper, attention mechanism enhanced kernel prediction networks (AME-KPNs) are proposed for burst image denoising, in which, nearly cost-free attention modules are adopted to first refine the feature maps and to further make a full use of the inter-frame and intra-frame redundancies within the whole image burst. |
419 | Image Restoration Via Data-Dependent Proximal Averaged Optimization | P. Mu, J. Chen, R. Liu, W. Zhong, X. Fan and Z. Luo | To partially address the above issues, we develop a Data-dependent Proximal Averaged (DPA) paradigm through optimizing objective and data-dependent feasibility constraint for the challenging Image Restoration (IR) tasks. |
420 | Multi-Way Multi-View Deep Autoencoder for Image Feature Learning with Multi-Level Graph Regularization | Z. Fang, S. Zhou, X. Li and H. Zhu | In this paper, we propose a multi-way deep autoencoder for multi-view feature learning to explore the deep consensus structure and reconcile the efficiency of encoding process meanwhile. |
421 | Exposure Interpolation Via Hybrid Learning | C. Zheng, Z. Li, Y. Yang and S. Wu | A natural question would be “Is there any space for conventional methods on these problems?” In this paper, exposure interpolation is taken as an example to answer this question and the answer is “Yes”. |
422 | Learning Spatio-Temporal Representations With Temporal Squeeze Pooling | G. Huang and A. G. Bors | In this paper, we propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which can extract the essential movement information from a long sequence of video frames and map it into a set of few images, named Squeezed Images. |
423 | Fine-Grained Giant Panda Identification | R. Ding, L. Wang, Q. Zhang, Z. Niu, N. Zheng and G. Hud | To address these challenges, we propose the Feature-Fusion Convolutional Neural Network with Patch Detector (FFCNN-PD) algorithm, which exploits the discriminative local patches and builds a hierarchical representation generated by fusing both global and local features. |
424 | Learning from Dances: Pose-Invariant Re-Identification for Multi-Person Tracking | H. Ho, M. Shim and D. Wee | With the goal of learning pose-invariant representations, we propose an end-to-end deep learning framework Sparse-Temporal ReID Network. |
425 | Learning Fractional Orthogonal Latent Consistent Features for Face Hallucination and Recognition | Y. Yuan, J. Li, Y. Li, J. Qiang and B. Li | In this paper, we propose a novel FH method via fractional orthogonal latent consistent features that we call fractional orthogonal partial least squares based FH (FOPLSFH). |
426 | Sparse Modeling on Distributed Encryption Data | Y. BANDOH, T. NAKACHI and H. KIYA | In this paper, we construct an analysis model for data encrypted with the random unitary transform by deriving a LASSO solution for encrypted data. |
427 | S-DOD-CNN: Doubly Injecting Spatially-Preserved Object Information for Event Recognition | H. Lee, S. Eum and H. Kwon | We present a novel event recognition approach called Spatially-preserved Doubly-injected Object Detection CNN (S-DOD-CNN), which incorporates the spatially preserved object detection information in both a direct and an indirect way. |
428 | Angular Discriminative Deep Feature Learning for Face Verification | B. Wu and H. Wu | In order to bridge the gap between training and testing, we require that the intra-class cosine similarity of the inner-product layer before softmax loss is larger than a margin in the training step, accompanied by the supervision signal of softmax loss. |
429 | 3D Deformation Signature for Dynamic Face Recognition | A. E. Rahman Shabayek, D. Aouada, K. Cherenkova, G. Gusev and B. Ottersten | This work proposes a novel 3D Deformation Signature (3DS) to represent a 3D deformation signal for 3D Dynamic Face Recognition. |
430 | ASR is All You Need: Cross-Modal Distillation for Lip Reading | T. Afouras, J. S. Chung and A. Zisserman | The goal of this work is to train strong models for visual speech recognition without requiring human annotated ground truth data. |
431 | Color Stabilization for Multi-Camera Light-Field Imaging | O. V. Thanh, T. Canham, J. Vazquez-Corral, R. G. Rodríguez and M. Bertalmío | In this work we adapt and extend to the light-field scenario a color stabilization method previously proposed for standard multi-camera shoots, and demonstrate experimentally that it provides an improvement over the state-of-the-art techniques for light-field imaging. |
432 | Learning Spatio-Temporal Convolutional Network for Real-Time Object Tracking | H. Chen, X. Xing and X. Xu | In this paper we focus on making use of the rich information in the latest consecutive frames to improve the feature representation of the initial template frame. |
433 | Learned Lossless Image Compression with A Hyperprior and Discretized Gaussian Mixture Likelihoods | Z. Cheng, H. Sun, M. Takeuchi and J. Katto | This paper generalizes the hyperprior from lossy to lossless compression, and adds an L2-norm term to the loss function to speed up the training procedure. |
434 | Variable Bitrate Image Compression with Quality Scaling Factors | T. Chen and Z. Ma | In this paper, we propose to embed a set of quality scaling factors (SFs) into learned image compression network, by which we can encode images across an entire bitrate range with a single model. |
435 | Binary Probability Model for Learning Based Image Compression | T. LADUNE, P. PHILIPPE, W. HAMIDOUCHE, L. ZHANG and O. DÉFORGES | In this paper, we propose to enhance learned image compression systems with a richer probability model for the latent variables. |
436 | Improved Probability Modelling for Exception Handling in Lossless Screen Content Coding | T. Strutz | This paper proposes an enhanced version of this exception handling coding. |
437 | Spatially Adaptive Intra Mode Pre-Selection for ERP 360 Video Coding | I. Storch, B. Zatt, L. Agostini, G. Correa, L. A. da Silva Cruz and D. Palomino | In this work, we propose a spatially adaptive HEVC intra mode pre-selection for equirectangular (ERP) 360 video coding. |
438 | Semi-Regular Geometric Kernel Encoding & Reconstruction for Video Compression | X. Jiang, C. Yang, G. Cheung and S. Takamura | In this paper, we extract a best-fitting “semi-regular” geometric structure from a target spatial region in a frame group, which is encoded separately as a unified signal predictor for these frames. |
439 | Leveraging Cuboids for Better Motion Modeling in High Efficiency Video Coding | A. Ahmmed, M. Murshed and M. Paul | Leveraging on the advantages of cuboids, in this paper, we propose to discover homogeneous motion regions and their associated motion based on cuboids. |
440 | Adversarial Video Compression Guided by Soft Edge Detection | S. Kim et al. | We propose a video compression framework using conditional Generative Adversarial Networks (GANs). |
441 | Compressing Flow Fields with Edge-Aware Homogeneous Diffusion Inpainting | F. Jost, P. Peter and J. Weickert | In spite of the fact that efficient compression methods for dense two-dimensional flow fields would be very useful for modern video codecs, hardly any research has been performed in this area so far. Our paper addresses this problem by proposing the first lossy diffusion-based codec for this purpose. |
442 | Adaptive Resolution Change Using Uncoded Areas and Dictionary Learning-Based Super-Resolution in Versatile Video Coding | J. Schneider, J. Sauer and C. Rohlfing | This contribution introduces an ARC concept using un-coded areas within a frame of a video sequence and a Dictionary Learning (DL)-based Super-Resolution (SR) scheme. |
443 | RDE-MOGA: Automatic Selection of Rate-Distortion-Energy Control Points for Video Encoders Using Multi-Objective Genetic Algorithm | ?. D. Machado, M. S. de Aguiar, M. Porto, G. Corrêa, D. Palomino and B. Zatt | In this work we propose RDE-MOGA, a multi-objective genetic algorithm capable of finding energetically efficient configurations for the HEVC encoder and replacing the current sensitivity analysis methodologies in the development of energy controllers. |
444 | A Connected Auto-Encoders Based Approach for Image Separation with Side Information: With Applications to Art Investigation | W. Pu, B. Sober, N. Daly, C. Higgitt, I. Daubechies and M. R. D. Rodrigues | In this paper, we propose a new architecture based on the use of “connected” auto-encoders in order to separate mixed X-ray images acquired from double-sided paintings, where in addition to the mixed X-ray image one can also exploit the two RGB images associated with the front and back of the painting. |
445 | Self-Supervised Adversarial Training | K. Chen et al. | With these views, we introduce self-supervised learning to defend against adversarial examples in this paper. |
446 | Gray-Scale Image Colorization Using Cycle-Consistent Generative Adversarial Networks with Residual Structure Enhancer | M. M. Johari and H. Behroozi | Since one can consider the gray-scale and colorful images as two separate domains, we propose a two-stage cycle-consistent network architecture to produce convincing images. |
447 | All You Need is a Second Look: Towards Tighter Arbitrary Shape Text Detection | M. Cao and Y. Zou | To address these problems, we innovatively propose a two-stage segmentation based arbitrary text detector named NASK (Need A Second looK). |
448 | Compare Learning: Bi-Attention Network for Few-Shot Learning | L. Ke, M. Pan, W. Wen and D. Li | In this paper, we propose a novel approach named Bi-attention network to compare the instances, which can measure the similarity between embeddings of instances precisely, globally and efficiently. |
449 | Arnet: Attention-Based Refinement Network for Few-Shot Semantic Segmentation | R. Li, H. Liu, Y. Zhu and Z. Bai | In this paper, we propose an Attention-based Refinement Network (ARNet) for few-shot semantic segmentation, which consists of three branches: the guidance branch, the segmentation branch and the refinement branch. |
450 | Lightdet: A Lightweight and Accurate Object Detection Network | Q. Tang, J. Li, Z. Shi and Y. Hu | In this paper, we present a lightweight object detector, named LightDet, to address this dilemma. |
451 | Self-Supervised Deep Learning for Fisheye Image Rectification | C. Chao, P. Hsu, H. Lee and Y. F. Wang | To rectify fisheye distortion from a single image, we advance self-supervised learning strategies and propose a unique deep learning model of Fisheye GAN (FE-GAN). |
452 | Sketchppnet: A Joint Pixel and Point Convolutional Neural Network For Low Resolution Sketch Image Recognition | X. Zhu, Y. Xiao, Y. Zheng, G. Tan and S. Zhou | To solve this problem, we propose a joint pixel and point convolutional neural network for LR sketch image recognition. |
453 | All In One Network for Driver Attention Monitoring | D. Yang et al. | To address this problem, we propose a multi-task learning CNN framework which simultaneously solves these tasks. |
454 | Unsupervised Domain Adaptation for Semantic Segmentation with Symmetric Adaptation Consistency | Z. Li, R. Togo, T. Ogawa and M. Haseyama | In this paper, we utilize adversarial learning and semi-supervised learning simultaneously to solve the task of unsupervised domain adaptation in semantic segmentation. |
455 | IQ-STAN: Image Quality Guided Spatio-Temporal Attention Network for License Plate Recognition | C. Zhang, Q. Wang and X. Li | In order to tackle this issue, a novel deep multi-task learning-based method is proposed in this paper by introducing contextual information in multiple license plate frames. |
456 | Weakly Supervised Semantic Segmentation For Remote Sensing Hyperspectral Imaging | E. Moliner, L. S. Romero and V. Vilaplana | This paper studies the problem of training a semantic segmentation neural network with weak annotations, in order to be applied in aerial vegetation images from Teide National Park. |
457 | Social Data Assisted Multi-Modal Video Analysis For Saliency Detection | J. Xia et al. | In this paper, we propose a novel learning-based multi-modal method for optimizing user-oriented video analysis. |
458 | View-Angle Invariant Object Monitoring Without Image Registration | X. Zhang, C. Huo and C. Pan | To address the above difficulties, a novel object-specific change detection approach is proposed for object monitoring in this paper. |
459 | Hierarchical Sequence Representation with Graph Network | D. Chen, X. Wu, J. Dong, Y. He, H. Xue and F. Mao | In this paper, we propose a novel video classification method based on a deep convolutional graph neural network (DCGN). |
460 | Multi Image Depth from Defocus Network with Boundary Cue for Dual Aperture Camera | G. Song, Y. Kim, K. Chun and K. M. Lee | In this paper, we estimate depth information using two defocused images from dual aperture camera. |
461 | Height and Weight Estimation from Unconstrained Images | C. Y. Altinigne, D. Thanou and R. Achanta | We present a deep learning scheme that relies on simultaneous prediction of human silhouettes and skeletal joints as strong regularizers that improve the prediction of attributes such as height and weight. |
462 | Sampling Strategies for GAN Synthetic Data | B. Bhattarai, S. Baek, R. Bodur and T. Kim | To this end, we propose to maximally utilise the parameters learned during training of the GAN itself. |
463 | Auglabel: Exploiting Word Representations to Augment Labels for Face Attribute Classification | B. Bhattarai, R. Bodur and T. Kim | In this paper, we present a simple, yet effective novel method to generate fixed dimensional labels with continuous values for images by exploiting the word2vec semantic representations of the existing categorical labels. |
464 | Multi-Task Center-Of-Pressure Metrics Estimation from Skeleton Using Graph Convolutional Network | C. Du, S. Graham, S. Jin, C. Depp and T. Nguyen | In this paper, we propose an end-to-end framework to estimate the COP path length and the COP positions from the 3D skeleton, utilizing the spatial-temporal features learned by graph convolutional networks. |
465 | Regression Before Classification for Temporal Action Detection | C. Jin, T. Zhang, W. Kong, T. Li and G. Li | In this paper, we propose to eliminate this inconsistency by making two modifications to the action classifier: 1) redirecting the classification loss to the refined proposal, and 2) rearranging the location regressor before the action classifier so that the feature of the refined proposal is fed to the classifier. |
466 | Multi-Task Learning in Autonomous Driving Scenarios Via Adaptive Feature Refinement Networks | M. Zhai, X. Xiang, N. Lv and A. E. Saddik | In this paper, we combine an adaptive feature refinement module and a unified framework for joint learning of optical flow, depth and camera pose estimation in an unsupervised manner. |
467 | A Real-Time Deep Network for Crowd Counting | X. Shi, X. Li, C. Wu, S. Kong, J. Yang and L. He | In this paper, we propose a compact convolutional neural network for crowd counting which learns a more efficient model with a small number of parameters. |
468 | Drift Detection and Correction Post-Tracking | T. Ghoniemy and M. A. Amer | This paper proposes a method that first detects, at each frame, if a tracker tends to drift by analyzing saliency features of the output BB of a tracker, and then applies automatic seeded object segmentation on the BB to correct the drift once detected. |
469 | Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding | Y. Liu, Y. Hsieh, M. Chen, C.-H. Yang, J. Tegner and Y.-J. Tsai | We propose a perturbation-based visual explanation method to inspect the models' performance visually. |
470 | Non-Uniform Video Time-Lapse Method Based on Motion Scenario and Stabilization Constraint | K. Guo et al. | More specifically, we introduce an advanced Markov chain (MC) model, in which smoothed camera trajectory, FOV loss constraint of VS, camera motion scenario, and sampling interval similarity between consecutive frames are encoded as potential functions. |
471 | Key Action and Joint CTC-Attention based Sign Language Recognition | H. Li, L. Gao, R. Han, L. Wan and W. Feng | In this paper, we propose to hierarchically search key actions by a pyramid BiLSTM. |
472 | Learning Geometric Features with Dual-stream CNN for 3D Action Recognition | T. Huynh-The, C. Hua, N. A. Tu and D. Kim | This paper introduces a deep network configured by two parallel streams of convolutional stacks for fully learning the deep intra-frame joint associations and inter-frame joint correlations, wherein the structure of each stream is learned from Inception-v3. |
473 | A Deep Learning Approach to Object Affordance Segmentation | S. Thermos, P. Daras and G. Potamianos | In this paper, we propose a novel approach that exploits the spatio-temporal nature of human-object interaction for affordance segmentation. |
474 | Multi-View Shape Estimation of Transparent Containers | A. Xompero, R. Sanchez-Matilla, A. Modas, P. Frossard and A. Cavallaro | In this paper, we propose a method for jointly localising container-like objects and estimating their dimensions using two wide-baseline, calibrated RGB cameras. |
475 | Rethinking Temporal-Related Sample for Human Action Recognition | J. Wang, S. Li, Z. Duan and Z. Yuan | In this paper, our motivation is to address this issue by utilizing temporal information more effectively. |
476 | FDDWNet: A Lightweight Convolutional Neural Network for Real-Time Semantic Segmentation | J. Liu, Q. Zhou, Y. Qiang, B. Kang, X. Wu and B. Zheng | This paper introduces a lightweight convolutional neural network, called FDDWNet, for real-time accurate semantic segmentation. |
477 | Complex Pairwise Activity Analysis Via Instance Level Evolution Reasoning | S. Paul, C. Torres, S. Chandrasekaran and A. K. Roy-Chowdhury | This work introduces a novel method that jointly exploits relational information between pairs of objects and temporal dynamics of each object. |
478 | Scene Text Recognition with Temporal Convolutional Encoder | X. Du, T. Ma, Y. Zheng, H. Ye, X. Wu and L. He | In this paper, we study text recognition framework by considering the long-term temporal dependencies in the encoder stage. |
479 | Enhanced Action Tubelet Detector for Spatio-Temporal Video Action Detection | Y. Wu, H. Wang, S. Wang and Q. Li | To this end, a single stream network named enhanced action tubelet (EAT) detector is proposed in this work based on RGB stream. |
480 | Secure Face Recognition in Edge and Cloud Networks: From the Ensemble Learning Perspective | Y. Wang and T. Nakachi | In this paper, we develop a secure face recognition framework to orchestrate sparse coding in edge and cloud networks. |
481 | Low Complexity Single Image Super-Resolution with Channel Splitting and Fusion Network | M. Zou, J. Tang and G. Wu | In this paper, we propose a low complexity solution based on channel splitting and fusion network (CSFN) to address this problem. |
482 | Learning Spectral-Spatial Prior Via 3DDNCNN for Hyperspectral Image Deconvolution | X. Wang, J. Chen, C. Richard and D. Brie | In this paper, we use the alternating direction method of multipliers (ADMM) to decompose the optimization problem into iterative subproblems where the prior only appears in a denoising subproblem. |
483 | Dynamically Modulated Deep Metric Learning for Visual Search | D. Manandhar, M. Bastan and K. Yap | This paper proposes dynamically modulated metric learning (DMML) for learning a tiered similarity space to perform visual search. |
484 | Deep Residual Network for MSFA Raw Image Denoising | Z. Pan, B. Li, H. Cheng and Y. Bao | To overcome these challenges, we propose a new deep residual network designed to account for the uniqueness of MSFA mosaic patterns. |
485 | MSPNET: Multi-Supervised Parallel Network for Crowd Counting | B. Wei, Y. Yuan and Q. Wang | In this paper, we propose a multi-supervised parallel network (MSPNet) to achieve high accuracy of crowd counting and generate high-quality density maps. |
486 | Video Question Generation via Semantic Rich Cross-Modal Self-Attention Networks Learning | Y. Wang, H. Su, C. Chang, Z. Liu and W. H. Hsu | We introduce a novel task, Video Question Generation (Video QG). |
487 | Blind Hyperspectral Unmixing using Dual Branch Deep Autoencoder with Orthogonal Sparse Prior | Z. Dou, K. Gao, X. Zhang, H. Wang and J. Wang | In this paper, we propose a dual branch autoencoder with a novel sparse prior to simultaneously extract endmembers and abundances from the raw HSI. |
488 | Classification of Depth and Surface Edges with Deep Features | Z. Li and X. Wu | In this paper we study the problem of automatic classification of the two types of edges. |
489 | Learning to Characterize Adversarial Subspaces | X. Mao, Y. Chen, Y. Li, Y. He and H. Xue | To solve this problem, we propose a novel adversarial detection method which identifies adversaries by adaptively learning reasonable metrics to characterize adversarial subspaces. |
490 | Video Deblurring Via 3D CNN and Fourier Accumulation Learning | F. Yang, L. Xiao and J. Yang | In this paper, we propose a simple yet effective Fourier accumulation embedded 3D convolutional encoder-decoder network for video deblurring. |
491 | Enhanced Non-Local Cascading Network with Attention Mechanism for Hyperspectral Image Denoising | H. Ma, G. Liu and Y. Yuan | In this paper, a novel HSIs denoising algorithm based on an enhanced non-local cascading network with attention mechanism (ENCAM) is proposed, which can extract the joint spatial-spectral feature more effectively. |
492 | Quantized Tensor Robust Principal Component Analysis | A. Aidini, G. Tsagkatakis and P. Tsakalides | In this paper, we introduce a tensor robust principal component analysis algorithm in order to recover a tensor with real-valued entries from a partly observed set of quantized and sparsely corrupted entries. |
493 | A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling | S. Long, Y. Guan, K. Bian and C. Yao | To tackle these issues, we propose a pair of coupling modules, termed as Character Anchoring Module (CAM) and Anchor Pooling Module (APM), to extract high-level semantics from two-dimensional space to form feature sequences. |
494 | Hybrid Active Contour Driven by Double-Weighted Signed Pressure Force for Image Segmentation | X. Fu, B. Fang, M. Zhou and J. Li | In this paper, we propose a novel hybrid active contour method driven by a double-weighted signed pressure force for image segmentation. |
495 | Neural Coding Strategies for Event-Based Vision Data | S. Harrigan, S. Coleman, D. Kerr, P. Yogarajah, Z. Fang and C. Wu | This paper introduces three different neural coding scheme formations for event-based vision data which are designed to emulate the neural behaviour exhibited by neurons under stimuli. |
496 | Camera Configuration Design in Cooperative Active Visual 3D Reconstruction: A Statistical Approach | Q. An and Y. Shen | In this paper, we propose a statistical framework for active visual 3D reconstruction. |
497 | Hand-3D-Studio: A New Multi-View System for 3D Hand Reconstruction | Z. Zhao, T. Wang, S. Xia and Y. Wang | This paper proposes a new system named Hand-3D-Studio to capture the 3D hand pose and shape information. |
498 | Learning Endmember Dynamics in Multitemporal Hyperspectral Data Using A State-Space Model Formulation | L. Drumetz, M. D. Mura, G. Tochon and R. Fablet | In this paper, we propose a new framework for multitemporal unmixing and endmember extraction based on a state-space model, and present a proof of concept on simulated data to show how this representation can be used to inform multitemporal unmixing with external prior knowledge, or on the contrary to learn the dynamics of the quantities involved from data using neural network architectures adapted to the identification of dynamical systems. |
499 | Learning Eating Environments Through Scene Clustering | S. K. Yarlagadda, S. Baireddy, D. Güera, C. J. Boushey, D. A. Kerr and F. Zhu | In this paper, we propose an image clustering method to automatically extract the eating environments from eating occasion images captured during a community dwelling dietary study. |
500 | A Hybrid Structural Sparse Error Model for Image Deblocking | Z. Zha, X. Yuan, J. Zhou, C. Zhu and B. Wen | In this paper, we propose a novel hybrid structural sparse error (HSSE) model for image deblocking. |
501 | Reflectance-Guided, Contrast-Accumulated Histogram Equalization | X. Wu, T. Kawanishi and K. Kashino | To address this problem, we propose a histogram equalization-based method that adapts to the data-dependent requirements of brightness enhancement and improves the visibility of details without losing the global contrast. |
502 | Bilateral Recurrent Network for Single Image Deraining | W. Shang, P. Zhu, D. Ren and H. Shi | To address this issue, in this paper we propose a bilateral recurrent network (BRN) to simultaneously exploit the rain streak layer and the background image layer. |
503 | Srzoo: An Integrated Repository For Super-Resolution Using Deep Learning | J. Choi, J. Kim and J. Lee | In this paper, we propose an integrated repository for the super-resolution tasks, named SRZoo, to provide state-of-the-art super-resolution models in a single place. |
504 | Sub-Dip: Optimization On A Subspace With Deep Image Prior Regularization And Application To Superresolution | A. Sagel, A. Roumy and C. Guillemot | This work proposes an alternative approach that relaxes this constraint and fully exploits all prior knowledge. |
505 | Parsing Map Guided Multi-Scale Attention Network For Face Hallucination | C. Wang, Z. Zhong, J. Jiang, D. Zhai and X. Liu | In this paper, we propose an effective two-step face hallucination method based on a deep neural network with multi-scale channel and spatial attention mechanism. |
506 | A Variational Bayesian Approach for Multichannel Through-Wall Radar Imaging with Low-Rank and Sparse Priors | V. H. Tang, A. Bouzerdoum and S. L. Phung | This paper considers the problem of multichannel through-wall radar (TWR) imaging from a probabilistic Bayesian perspective. |
507 | Semanticgan: Generative Adversarial Networks For Semantic Image To Photo-Realistic Image Translation | J. Liu, Y. Zou and D. Yang | To address those problems, we propose a SemanticGAN to synthesize high resolution image with fine details and realistic textures from the semantic label map. |
508 | Learning Blind Denoising Network for Noisy Image Deblurring | M. Chen, Y. Chang, S. Cao and L. Yan | In this work, we discover that the noise level and the denoiser are tightly coupled. |
509 | Pixel-Level Self-Paced Learning For Super-Resolution | W. Lin, J. Gao, Q. Wang and X. Li | To tackle this problem, this paper designs a training strategy named Pixel-level Self-Paced Learning (PSPL) to accelerate the convergence velocity of SISR models. |
510 | A Recursive Edge Detector For Color Filter Array Image | B. Magnier, A. Aberkane and N. Gorrity | In this paper, a new edge detection method is proposed for the computation of partial derivative images. |
511 | Image De-Raining Via RDL: When Reweighted Convolutional Sparse Coding Meets Deep Learning | J. He, L. Yu and W. Yang | Specifically, motivated by the success of reweighting algorithms, we propose solving the CSC model by learning weighted iterative soft thresholding algorithm (LwISTA) in a convolutional manner where the reweighted l1-norm is introduced. |
512 | CS-R-FCN: Cross-Supervised Learning for Large-Scale Object Detection | Y. Guo, Y. Li and S. Wang | In this paper, we present a novel cross-supervised learning pipeline for large-scale object detection, denoted as CS-R-FCN. |
513 | Dilated Convolutional Neural Networks for Panoramic Image Saliency Prediction | F. Dai, Y. Zhang, Y. Ma, H. Li and Q. Zhao | In this paper, we propose an encoder-decoder network for panoramic image saliency prediction. |
514 | Fine-Grained Action Recognition on a Novel Basketball Dataset | X. Gu, X. Xue and F. Wang | To tackle this issue, in this paper, we release a challenging dataset by annotating the fine-grained actions in basketball game videos. |
515 | Attention Guided Region Division for Crowd Counting | X. Pan, H. Mo, Z. Zhou and W. Wu | In this paper, we propose a two-branch network combining regression and detection. |
516 | Superpixel Segmentation Via Convolutional Neural Networks with Regularized Information Maximization | T. Suzuki | We propose an unsupervised superpixel segmentation method by optimizing a randomly-initialized convolutional neural network (CNN) in inference time. |
517 | Stacked Pooling for Boosting Scale Invariance of Crowd Counting | S. Huang, X. Li, Z. Cheng, Z. Zhang and A. Hauptmann | In this work, we take insight into the dense crowd counting problem by exploring the phenomenon of cross-scale visual similarity caused by perspective distortions. |
518 | GFNet: A Lightweight Group Frame Network for Efficient Human Action Recognition | H. Liu, L. Zhang, L. Guan and M. Liu | In this paper, we propose a lightweight neural network called Group Frame Network (GFNet) for human action recognition, which imposes intra-frame spatial information sparsity on spatial dimension in a simple yet effective way. |
519 | ROIMIX: Proposal-Fusion Among Multiple Images for Underwater Object Detection | W. Lin, J. Zhong, S. Liu, T. Li and G. Li | We propose an augmentation method called RoIMix, which characterizes interactions among images. |
520 | Tree of Shapes Cut for Material Segmentation Guided by a Design | J. Baderot, M. Desvignes, L. Condat and M. D. Mura | In this paper, we propose a method to segment different materials in a manufactured object. |
521 | Deep Flow Collaborative Network for Online Visual Tracking | P. Liu, X. Yan, Y. Jiang and S. Xia | In this paper, we propose an effective tracking algorithm to alleviate the time-consuming problem. |
522 | Salient Object Detection Based On Image Bit-Map | B. Cao, X. Meng, S. Zhu and B. Zeng | In this paper, we propose a novel salient object detection framework, which makes full use of the essential image compression. |
523 | A Novel Saliency-Driven Oil Tank Detection Method for Synthetic Aperture Radar Images | L. Zhang and C. Liu | In this paper, we propose a novel saliency-driven oil tank detection method (SDD) for SAR images. |
524 | Video Frame Interpolation Via Residue Refinement | H. Li, Y. Yuan and Q. Wang | In this paper, we propose a novel network structure that leverages residue refinement and adaptive weight to synthesize in-between frames. |
525 | Attention-Guided Deraining Network Via Stage-Wise Learning | K. Jiang et al. | To solve this problem, along with the stage-wise learning, we propose a novel attention-guided deraining network (ADN) for rain streak removal. |
526 | Attention-Mask Dense Merger (Attendense) Deep HDR for Ghost Removal | K. Metwaly and V. Monga | We propose a new deep HDR technique that does not need any explicit alignment of SDR images. |
527 | Y-Net: Multi-Scale Feature Aggregation Network With Wavelet Structure Similarity Loss Function For Single Image Dehazing | H. Yang, C. H. Yang and Y. James Tsai | In this paper, we propose a Y-net that is named for its structure. |
528 | Image Super-Resolution Using Residual Global Context Network | K. Liu, Z. Han, J. Chen, C. Liu, J. Chen and Z. Wang | To solve these problems, we introduce the Global Context block (GCB) and design a comparatively shallow network called Residual Global Context Networks (RGC-N). |
529 | Principle-Inspired Multi-Scale Aggregation Network for Extremely Low-Light Image Enhancement | J. Zhang, R. Liu, L. Ma, W. Zhong, X. Fan and Z. Luo | To address this issue, we develop a Principle-inspired Multi-scale Aggregation Network (PMA-Net) to simultaneously achieve the exposure enhancement and noises removal. |
530 | Non-Local Nested Residual Attention Network for Stereo Image Super-Resolution | W. Xie, J. Zhang, Z. Lu, M. Cao and Y. Zhao | To address this problem, in this paper, we propose a novel network named Non-local Nested Residual Attention Network (NNRANet). |
531 | Opendenoising: An Extensible Benchmark for Building Comparative Studies of Image Denoisers | F. Lemarchand, E. F. Montesuma, M. Pelcat and E. Nogues | This paper proposes a comparative study of existing denoisers, as well as an extensible open tool that makes it possible to reproduce and extend the study. |
532 | SDTCN: Similarity Driven Transmission Computing Network for Image Dehazing | L. Zhang, S. Wang and X. Wang | In this paper, we propose a novel light-weight similarity driven transmission computing network called SDTCN that is guided by the attributes of transmission similarity. |
533 | Joint Enhancement And Denoising of Low Light Images Via JND Transform | L. Yu, H. Su and C. Jung | In this paper, we propose joint enhancement and denoising of low light images via just-noticeable-difference (JND) transform. |
534 | Adversarial Text Image Super-Resolution using Sinkhorn Distance | C. Geng, L. Chen, X. Zhang and Z. Gao | In this paper, instead of using the Lp-norm as the supervision metric, we propose a novel one for better preserving semantic information in text images. |
535 | ADRN: Attention-Based Deep Residual Network for Hyperspectral Image Denoising | Y. Zhao, D. Zhai, J. Jiang and X. Liu | In this paper, we propose an attention-based deep residual network to directly learn a mapping from noisy HSI to the clean one. |
536 | Weakly Supervised Segmentation Guided Hand Pose Estimation During Interaction with Unknown Objects | C. Zhang, G. Wang, X. Chen, P. Xie and T. Yamasaki | To alleviate the influence of unknown objects, we propose a novel weakly supervised segmentation guided scheme to estimate hand poses. |
537 | Sparse Directed Graph Learning for Head Movement Prediction in 360 Video Streaming | X. Zhang, G. Cheung, P. Le Callet and J. Z. G. Tan | In this paper, we cast the head movement prediction task as a sparse directed graph learning problem, where three sources of relevant information (a 360 image saliency map, collected viewers' head movement traces, and a biological head rotation model) are aggregated into a unified Markov model. |
538 | Tracking to Improve Detection Quality in Lidar For Autonomous Driving | J. Tang, A. Yellepeddi, S. Demirtas and C. Barber | In this work, we leverage information from multiple consecutive frames to improve the detection capabilities of Lidar systems. |
539 | Cartoon-Texture Decomposition-Based Variational Pansharpening | Y. Chen, M. Zhang, S. Li, Z. Wang and X. Tian | In this paper, the similarities of MS and PAN images in cartoon-texture space are exploited. |
540 | An Alternative Signature Design Using L1 Principal Components for Spread-Spectrum Steganography | C. P. Bailey, S. Chamadia and D. A. Pados | This paper introduces novel spread spectrum (SS) and improved spread spectrum (ISS) multimedia data embedding techniques using L1 principal component signatures. |
541 | Privacy-Preserving Pattern Recognition Using Encrypted Sparse Representations in L0 Norm Minimization | T. Nakachi, Y. Wang and H. Kiya | In this paper, we propose a privacy-preserving pattern recognition method that uses encrypted sparse representations in L0 norm minimization. |
542 | Flexibly-Tunable Bitcube-Based Perceptual Encryption Within Jpeg Compression | S. Shimizu and T. Suzuki | We propose a perceptual encryption within JPEG compression (EWJ). |
543 | Gyroscope Aided Video Stabilization Using Nonlinear Regression on Special Orthogonal Group | X. Hu, D. Olesen and P. Knudsen | This paper presents a novel approach for gyroscope aided video stabilization. |
544 | BBAND Index: a No-Reference Banding Artifact Predictor | Z. Tu, J. Lin, Y. Wang, B. Adsumilli and A. C. Bovik | Here we study this artifact, and propose a new distortion-specific no-reference video quality model for predicting banding artifacts, called the Blind BANding Detector (BBAND index). |
545 | LQAID: Localized Quality Aware Image Denoising Using Deep Convolutional Neural Networks | S. V. Reddy Dendi, C. Dev, N. Kothari and S. S. Channappayya | In this paper we propose the Localized Quality Aware Image Denoising (LQAID) technique for image denoising using deep convolutional neural networks (CNNs). |
546 | Weakly Supervised Crowd-Wise Attention For Robust Crowd Counting | X. Kong, M. Zhao, H. Zhou and C. Zhang | In this paper, we propose a novel robust crowd counting method by introducing a weakly supervised crowd-wise attention network. |
547 | Xpsnr: A Low-Complexity Extension of The Perceptually Weighted Peak Signal-To-Noise Ratio For High-Resolution Video Quality Assessment | C. R. Helmrich, M. Siekmann, S. Becker, S. Bosse, D. Marpe and T. Wiegand | In this paper we show that, by way of low-complexity enhancements of our previous work on a perceptually weighted PSNR (WPSNR) metric, addressing shortcomings with video and ultra high-definition content, the prediction of human judgments of video coding quality by the WPSNR can be improved. |
548 | Non-Experts or Experts? Statistical Analyses of MOS using DSIS method | Y. Sugito and M. Bertalmío | In this study, we analyze the results of several subjective evaluation experiments using the DSIS method. |
549 | Full Reference Video Quality Measures Improvement Using Neural Networks | L. F. Tiotsop, A. Servetti and E. Masala | To address such a situation, we propose a machine learning based improvement for each of the VQMs considered in this work and a video quality metric fusion index (VQMFI) that jointly exploits all the VQMs considered in the study as well as spatiotemporal features to produce a better estimation of the subjective quality. |
550 | Learning Multi-Scale Attentive Features for Series Photo Selection | J. Huang, C. Cui, C. Zhang, Z. Shen, J. Yu and Y. Yin | In this paper, we develop a novel deep CNN architecture that aggregates multi-scale features from different network layers, in order to capture the subtle differences between series photos. |
551 | Spatio-Temporal and Geometry Constrained Network for Automobile Visual Odometry | H. Liu, P. Wei, W. Huang, G. Hua and F. Meng | To overcome these deficiencies, an end-to-end framework that leverages spatio-temporal relevance and geometrical knowledge is proposed. |
552 | A Comprehensive Framework for 2D-JND Extension to 360-DEG Images | S. Jaballah, A. Bhavsar and M. Larabi | In this paper, a novel framework to extend 2D-JND models to estimate thresholds for 360-degree images is proposed. |
553 | Composite Dynamic Texture Synthesis Using Hierarchical Linear Dynamical System | R. Singh, S. Yu and J. C. Principe | We demonstrate that a systematic inclusion of prior structural constraints on the states of a linear dynamical system significantly improves its ability to model complex multidimensional sequences. |
554 | Steganography and its Detection in JPEG Images Obtained with the “TRUNC” Quantizer | J. Butora and J. Fridrich | Many portable imaging devices use the operation of “trunc” (rounding towards zero) instead of rounding as the final quantizer for computing DCT coefficients during JPEG compression. We show that this has rather profound consequences for steganography and its detection. |
555 | JPEG Steganography with Side Information from the Processing Pipeline | Q. Giboulot, R. Cogranne and P. Bas | In this paper, we propose a method to better estimate the variances of DCT coefficients by taking into account the dependencies between pixels that come from the development pipeline. |
556 | Selection-Channel-Aware Reverse JPEG Compatibility for Highly Reliable Steganalysis of JPEG Images | R. Cogranne | This paper deeply studies the principle of the recent reverse JPEG compatibility attack [1]. |
557 | EMET: Embeddings from Multilingual-Encoder Transformer for Fake News Detection | S. Schwarz, A. Theóphilo and A. Rocha | This paper proposes an end-to-end framework called EMET to classify the reliability of small messages posted on social media platforms. |
558 | Depth Map Fingerprinting and Splicing Detection | F. Matern, C. Riess and M. Stamminger | In this work, we propose to use characteristic artifacts of depth reconstruction algorithms as trace for forensic analysis. |
559 | A Framework for Parameters Estimation of Image Operator Chain | X. Liao and Z. Huang | In this paper, we propose a new method to estimate the parameters of operations in different manipulation chains. |
560 | Privacy-Preserving Phishing Web Page Classification Via Fully Homomorphic Encryption | E. J. Chou, A. Gururajan, K. Laine, N. K. Goel, A. Bertiger and J. W. Stokes | This work introduces a fast and lightweight homomorphic-encryption pipeline that enables privacy-preserving machine learning for phishing web page recognition. |
561 | Privacy-Preserving Image Sharing Via Sparsifying Layers on Convolutional Groups | S. Ferdowsi, B. Razeghi, T. Holotyak, F. P. Calmon and S. Voloshynovskiy | We propose a practical framework to address the problem of privacy-aware image sharing in large-scale setups. |
562 | Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers | B. M. Lal Srivastava, N. Vauquier, M. Sahidullah, A. Bellet, M. Tommasi and E. Vincent | In this paper, we investigate anonymization methods based on voice conversion. |
563 | Low-Complexity and Reliable Transforms for Physical Unclonable Functions | O. Günlü and R. F. Schaefer | A new set of low-complexity and orthogonal transforms with no multiplication is proposed to obtain bit-error probability results significantly better than all methods previously proposed for key binding with PUFs. |
564 | Adversarial Detection of Counterfeited Printable Graphical Codes: Towards “Adversarial Games” In Physical World | O. Taran, S. Bonev, T. Holotyak and S. Voloshynovskiy | This paper addresses the problem of anti-counterfeiting of physical objects and investigates the possibility of detecting counterfeited printable graphical codes from a machine learning perspective. |
565 | Phylogenetic Minimum Spanning Tree Reconstruction Using Autoencoders | R. Castelletto, S. Milani and P. Bestagini | This paper proposes a matrix denoising solution that both mitigates dissimilarity noise and reconstructs the desired phylogenetic tree at the same time. |
566 | FCEM: A Novel Fast Correlation Extract Model For Real Time Steganalysis Of VoIP Stream Via Multi-Head Attention | H. Yang, Z. Yang, Y. Bao, S. Liu and Y. Huang | In this paper, we utilize attention mechanisms, which have recently attracted enormous interest due to their highly parallelizable computation and flexibility in modeling correlation in sequences, to tackle the steganalysis problem of Quantization Index Modulation (QIM) based steganography in compressed VoIP streams. |
567 | Approaching Optimal Embedding In Audio Steganography With GAN | J. Yang, H. Zheng, X. Kang and Y. Shi | In this work, we proposed a framework based on Generative Adversarial Network (GAN) to approach optimal embedding for audio steganography in the temporal domain. |
568 | Multi-Stage Residual Hiding for Image-Into-Audio Steganography | W. Cui, S. Liu, F. Jiang, Y. Liu and D. Zhao | In this paper, we present a cross-modal steganography method for hiding image content into audio carriers while preserving the perceptual fidelity of the cover audio. |
569 | Patch-Level Selection and Breadth-First Prediction Strategy for Reversible Data Hiding | H. Wu | It motivates us to introduce a novel patch-level selection and breadth-first prediction strategy for efficient reversible data hiding. |
570 | Digital Watermarking For Protecting Audio Classification Datasets | W. Kim and K. Lee | In this study, we investigate the possibility of protecting audio classification datasets used in deep learning by embedding a pattern in the magnitude of the time-frequency representation of a subset of the dataset. |
571 | Saliency-Based Image Contrast Enhancement with Reversible Data Hiding | S. Yang | In this paper, a new approach is proposed by introducing the visual saliency into the process of reversible data hiding. |
572 | Unseen Face Presentation Attack Detection with Hypersphere Loss | Z. Li, H. Li, K. Lam and A. C. Kot | In this paper, we formulate the face presentation attack detection task under an open-set setting and address it with our proposed deep anomaly detection based method. |
573 | Texception: A Character/Word-Level Deep Learning Model for Phishing URL Detection | F. Tajaddodianfar, J. W. Stokes and A. Gururajan | In this work, we propose a novel deep learning architecture, Texception, that takes a URL as input and predicts whether it belongs to a phishing attack. |
574 | Cell-Phone Classification: A Convolutional Neural Network Approach Exploiting Electromagnetic Emanations | B. B. Yilmaz, E. Mert Ugurlu, A. Zajic and M. Prvulovic | In this paper, we propose a methodology to identify both the brand of a cell-phone, and the status of its camera by exploiting electromagnetic (EM) emanations. |
575 | Quality-of-Service Prediction for Physical-layer Security via Secrecy Maps | M. A. Gutierrez-Estevez et al. | Building upon this concept, in this work we focus on system design-related aspects which consider physical-layer security as a service, together with thereby associated secrecy-related Quality-of-Service (QoS) requirements. |
576 | Secure Identification for Gaussian Channels | W. Labidi, C. Deppe and H. Boche | We focus on Gaussian channels for their known practical relevance. |
577 | Anti-Jamming Routing For Internet of Satellites: a Reinforcement Learning Approach | C. Han, A. Liu, L. Huo, H. Wang and X. Liang | This paper investigates anti-jamming routing scheme for heterogeneous IoS, with the aim of minimizing anti-jamming routing cost. |
578 | Electro-Magnetic Side-Channel Attack Through Learned Denoising and Classification | F. Lemarchand, C. Marlin, F. Montreuil, E. Nogues and M. Pelcat | This paper proposes an upgraded Electro Magnetic (EM) side-channel attack that automatically reconstructs the intercepted data. |
579 | Detection of Malicious VBScript Using Static and Dynamic Analysis with Recurrent Deep Learning | J. W. Stokes, R. Agrawal and G. McDonald | In this study, we explore a system that employs both static and dynamic analysis to detect malicious VBScripts. |
580 | Dynamic Attack Scoring Using Distributed Local Detectors | Z. Zohrevand and U. Glässer | This paper presents a novel study on detecting cyberattacks against distributed supervisory control systems. |
581 | Hijacking Tracker: A Powerful Adversarial Attack on Visual Tracking | X. Yan, X. Chen, Y. Jiang, S. Xia, Y. Zhao and F. Zheng | In this paper, we propose to add slight adversarial perturbations to the input image by an inconspicuous but powerful attack strategy, the hijacking algorithm. |
582 | AdvMS: A Multi-Source Multi-Cost Defense Against Adversarial Attacks | X. Wang, S. Wang, P. Chen, X. Lin and P. Chin | In this paper, we study principles of designing multi-source and multi-cost schemes where defense performance is boosted from multiple defending components. |
583 | Classifying Anomalies for Network Security | E. H. Do and V. N. Gadepally | This work outlines a machine learning technique that uses deep neural networks to detect and classify a variety of network attacks. |
584 | A Switching Transmission Game with Latency as the User's Communication Utility | A. Garnaev, A. Petropulu, W. Trappe and H. V. Poor | We consider the communication between a source (user) and a destination in the presence of a jammer, and study resource assignment in a non-cooperative game theory framework using communication latency as the user's utility. |
585 | An Efficient Methodology to De-Anonymize the 5G-New Radio Physical Downlink Control Channel | B. E. Gardner and J. D. Roth | This paper presents an efficient approach to negating current anonymization methods in a computationally efficient manner. |
586 | Joint Learning of Assignment and Representation for Biometric Group Membership | M. Gheisari, T. Furon and L. Amsaleg | This paper proposes a framework for group membership protocols preventing the curious but honest server from reconstructing the enrolled biometric signatures and inferring the identity of querying clients. |
587 | Private FL-GAN: Differential Privacy Synthetic Data Generation Based on Federated Learning | B. Xin, W. Yang, Y. Geng, S. Chen, S. Wang and L. Huang | To address this issue, we propose private FL-GAN, a differential privacy generative adversarial network model based on federated learning. |
588 | Augmentation Data Synthesis Via Gans: Boosting Latent Fingerprint Reconstruction | Y. Xu, Y. Wang, J. Liang and Y. Jiang | To address these challenges, we propose a novel generative adversarial network (GAN) based data augmentation scheme to improve such reconstruction. |
589 | Learning to Fool the Speaker Recognition | J. Li et al. | In this paper, we attempt to fool the state-of-the-art speaker recognition model and present speaker recognition attacker, a lightweight model to fool the deep speaker recognition model by adding imperceptible perturbations onto the raw speech waveform. |
590 | TS-FEN: Probing Feature Selection Strategy for Face Anti-Spoofing | D. Peng, J. Xiao, R. Zhu and G. Gao | In this paper, we propose a novel Two-Stream Feature Extraction Network (TS-FEN) based on depth and chrominance cues, guiding both sparsity and density of the feature distribution. |
591 | Improving Cross-Dataset Performance of Face Presentation Attack Detection Systems Using Face Recognition Datasets | A. Mohammadi, S. Bhattacharjee and S. Marcel | Here, we propose a novel PAD method that leverages the large variability present in FR datasets to induce invariance to factors that cause domain-shift. |
592 | SSTNet: Detecting Manipulated Faces Through Spatial, Steganalysis and Temporal Features | X. Wu, Z. Xie, Y. Gao and Y. Xiao | In this work, we propose a novel manipulation detection framework, named SSTNet, which detects tampered faces through Spatial, Steganalysis and Temporal features. |
593 | Multimodal Violence Detection in Videos | B. Peixoto, B. Lavi, P. Bestagini, Z. Dias and A. Rocha | In this study, we propose to analyze different concepts related to violence and how to better describe these concepts exploring visual and auditory cues in order to reach a robust method to detect violence. |
594 | Open Set Video Camera Model Verification | O. Mayer, B. Hosler and M. C. Stamm | In this work we propose a new, video-specific system for open set verification of camera models. We introduce a new open set video forensics problem called video camera model verification. |
595 | Multi-Patch Aggregation Models for Resampling Detection | M. Lamba and K. Mitra | To handle this issue, we propose two novel deep neural networks: Iterative Pooling Network (IPN), which does not assume any prior information about the original image size, and Branched Network (BN), which uses this prior knowledge to produce better results. |
596 | Improving the Chronological Sorting of Images through Occlusion: A Study on the Notre-Dame Cathedral Fire | R. Padilha, F. A. Andaló and A. Rocha | In this work, we train a data-driven method to chronologically sort images originated from a real event, the Notre-Dame Cathedral fire, which broke out on April 15th, 2019. |
597 | Effectiveness of Random Deep Feature Selection for Securing Image Manipulation Detectors Against Adversarial Examples | M. Barni, E. Nowroozi, B. Tondi and B. Zhang | We investigate if the random feature selection approach proposed in [1] to improve the robustness of forensic detectors to targeted attacks, can be extended to detectors based on deep learning features. |
598 | A Dense U-Net with Cross-Layer Intersection for Detection and Localization of Image Forgery | R. Zhang and J. Ni | In this paper, we apply cross-layer intersection mechanism to dense u-net for image forgery detection and localization. |
599 | Fast Start-Up Algorithm for Adaptive Noise Cancellers with Novel SNR Estimation and Stepsize Control | A. Sugiyama | This paper proposes a fast convergence algorithm for adaptive noise cancellers with novel SNR (signal-to-noise ratio) estimation and stepsize control. |
600 | Robust and Computationally-Efficient Anomaly Detection Using Powers-Of-Two Networks | U. Muneeb, E. Koyuncu, Y. Keshtkarjahromi, H. Seferoglu, M. F. Erden and A. Enis Cetin | We propose a technique to increase robustness and reduce computational complexity in a Convolutional Neural Network (CNN) based anomaly detector that utilizes the optical flow information of video data. |
601 | Semi-Supervised Optimal Transport Methods for Detecting Anomalies | A. Alaoui-Belghiti, S. Chevallier, E. Monacelli, G. Bao and E. Azabou | Building upon advances on optimal transport and anomaly detection, we propose a generalization of an unsupervised and automatic method for detection of significant deviation from reference signals. |
602 | Learning to Estimate Driver Drowsiness from Car Acceleration Sensors Using Weakly Labeled Data | T. Katsuki, K. Zhao and T. Yoshizumi | By assuming that some aspects of driver drowsiness increase over time due to tiredness, we formulate an algorithm that can learn from such weakly labeled data. |
603 | Damage-Sensitive and Domain-Invariant Feature Extraction for Vehicle-Vibration-Based Bridge Health Monitoring | J. Liu, B. Chen, S. Chen, M. Bergés, J. Bielak and H. Noh | We introduce a physics-guided signal processing approach to extract a damage-sensitive and domain-invariant (DS & DI) feature from acceleration response data of a vehicle traveling over a bridge to assess bridge health. |
604 | On Robust Variance Filtering and Change of Variance Detection | Q. Wen, Z. Ma and L. Sun | To deal with these challenges, we propose a robust CoV detection algorithm based on robust statistics and sparse regularizations. |
605 | Low-Rank Gradient Approximation for Memory-Efficient on-Device Training of Deep Neural Network | M. Gooneratne, K. C. Sim, P. Zadrazil, A. Kabel, F. Beaufays and G. Motta | In this paper, we propose approximating the gradient matrices of deep neural networks using a low-rank parameterization as an avenue to save training memory. |
606 | Improving Efficiency in Large-Scale Decentralized Distributed Training | W. Zhang et al. | In this paper, we investigate techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost. |
607 | Parallelizing Adam Optimizer with Blockwise Model-Update Filtering | K. Chen, H. Ding and Q. Huo | In this paper, we attempt to parallelize Adam with blockwise model-update filtering (BMUF) instead. |
608 | Joint Training of Deep Neural Networks for Multi-Channel Dereverberation and Speech Source Separation | M. Togami | In this paper, we propose a joint training of two deep neural networks (DNNs) for dereverberation and speech source separation. |
609 | Structural Sparsification for Far-Field Speaker Recognition with Intel® GNA | J. Zhang, J. Huang, M. Deisher, H. Li and Y. Chen | In this paper, we apply structural sparsification on time-delay neural networks (TDNN) to remove redundant structures and accelerate the execution. |
610 | Environment-Aware Reconfigurable Noise Suppression | J. Yang and J. Bingham | The paper proposes an efficient, robust, and reconfigurable technique to suppress various types of noises for any sampling rate. |
611 | Fully-Neural Approach to Heavy Vehicle Detection on Bridges Using a Single Strain Sensor | T. Kawakatsu, K. Aihara, A. Takasu and J. Adachi | In this paper, we propose a novel BWIM mechanism, which employs a deep convolutional neural network (CNN). |
612 | Multichannel Signal Processing for Road Surface Identification | G. Safont, A. Salazar, A. Rodriguez and L. Vergara | This work introduces a multi-sensor road surface identification system that considers features from four different kinds of sensors: microphones, accelerometers, speed signals, and handwheel signals. |
613 | A Monte Carlo Search-Based Triplet Sampling Method for Learning Disentangled Representation of Impulsive Noise on Steering Gear | S. Bu, N. Park, G. Nam, J. Seo and S. Cho | In this paper, we propose a method to overcome the above two major hurdles by modifying a sampling algorithm of triplet pairs based on the structural similarity index instead of naive Euclidean distance within a Monte Carlo based sampling strategy. |
614 | Stochastic Geometry Planning of Electric Vehicles Charging Stations | R. Atat, M. Ismail and E. Serpedin | In this paper, using stochastic geometry, we formulate the CSs planning on a stochastic geometry-based power grid model, that we previously showed to mimic real-world power grids. |
615 | Discriminant Generative Adversarial Networks with its Application to Equipment Health Classification | S. Zheng and C. Gupta | We propose discriminant GANs, where, for generated samples, we maximize between-class distance and minimize within-class distance, so that generated samples in different classes are more separable and different between-class distances are explicitly allowed. |
616 | Power Optimization Using Embedded Automatic Gain Control Algorithm with Photoplethysmography Signal Quality Classification | F. Foroozan, D. D. Xue, K. Fang and J. S. James Wu | This paper presents the design and implementation of an Automatic Gain Control (AGC) embedded algorithm for photoplethysmographic (PPG) sensors. |
617 | A General Difficulty Control Algorithm for Proof-of-Work Based Blockchains | S. Zhang and X. Ma | This paper proposes a general difficulty control algorithm and provides insights for difficulty adjustment rules for PoW based blockchains. |
618 | A New Application of Ultrasound Signal Processing for Archaeological Ceramic Classification | A. Salazar, G. Safont and L. Vergara | This work proposes a novel system that overcomes those limitations using non-destructive ultrasonic testing and advanced pattern recognition techniques. |
619 | Headless Horseman: Adversarial Attacks on Transfer Learning Models | A. Abdelkader et al. | We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these headless attacks. |
620 | Detecting Adversarial Attacks In Time-Series Data | M. G. Abdu-Aguye, W. Gomaa, Y. Makihara and Y. Yagi | We propose a method for detecting Fast Gradient Sign Method (FGSM) and Basic Iterative Method (BIM) adversarial attacks as adapted for time-series data. |
621 | Detection of Adversarial Attacks and Characterization of Adversarial Subspace | M. Esmaeilpour, P. Cardinal and A. L. Koerich | In this paper, we explore subspaces of adversarial examples in unitary vector domain, and we propose a novel detector for defending our models trained for environmental sound classification. |
622 | Adversarial Example Detection by Classification for Deep Speech Recognition | S. Samizade, Z. Tan, C. Shen and X. Guan | We, however, formulate the defense as a classification problem and present a strategy for systematically generating adversarial example datasets: one for white-box attacks and one for black-box attacks, containing both adversarial and normal examples. |
623 | Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement | C. Yang, J. Qi, P. Chen, X. Ma and C. Lee | In this work, we present a U-Net based attention model, UNetAt, to enhance adversarial speech signals. |
624 | Action-Manipulation Attacks on Stochastic Bandits | G. Liu and L. Lai | In this paper, we propose a new class of attack named action-manipulation attack, where an adversary can change the action signal selected by the user. |
625 | Primal-Dual Stochastic Subgradient Method For Log-Determinant Optimization | S. Wu, H. Yu and J. Dauwels | This paper proposes a highly efficient stochastic method that has time complexity $\mathcal{O}(N^2)$, whereas existing methods have complexity $\mathcal{O}(N^3)$. |
626 | Neural Network Training with Approximate Logarithmic Computations | A. Sanyal, P. A. Beerel and K. M. Chugg | This paper proposed an end-to-end training and inference scheme that eliminates multiplications by approximate operations in the log-domain which has the potential to significantly reduce implementation complexity. |
627 | Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient-based Optimization Methods | T. Lancewicki and S. Kopru | In this paper, we propose a generic approach that utilizes the statistics of an unbiased gradient estimator to automatically and simultaneously adjust two paramount hyperparameters: the learning rate and momentum. |
628 | A Study of Generalization of Stochastic Mirror Descent Algorithms on Overparameterized Nonlinear Models | N. Azizan, S. Lale and B. Hassibi | We study the convergence, the implicit regularization and the generalization of stochastic mirror descent (SMD) algorithms in overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. |
629 | On Distributed Stochastic Gradient Descent for Nonconvex Functions in the Presence of Byzantines | S. Bulusu, P. Khanduri, P. Sharma and P. K. Varshney | We propose a robust variant of distributed SGD which is resilient to the presence of Byzantines. |
630 | Preconditioning ADMM for Fast Decentralized Optimization | M. Ma and G. B. Giannakis | In this work, we consider the distributed optimization problem using networked computing machines. |
631 | Extrapolated Alternating Algorithms for Approximate Canonical Polyadic Decomposition | A. M. Shun Ang, J. E. Cohen, L. T. Khanh Hien and N. Gillis | In this work, we propose several algorithms based on extrapolation that improve over existing alternating methods for aCPD. |
632 | Scalable Kernel Learning Via the Discriminant Information | M. Al, Z. Hou and S. Kung | By generalizing this measure to cover a wider range of kernel maps and learning settings, we develop scalable methods to learn kernel features with high discriminant power. |
633 | Arsm Gradient Estimator for Supervised Learning to Rank | S. Z. Dadaneh, S. Boluki, M. Zhou and X. Qian | We propose a new model for supervised learning to rank. |
634 | Solving Non-Convex Non-Differentiable Min-Max Games Using Proximal Gradient Method | B. Barazandeh and M. Razaviyayn | In this work, we study a special form of non-smooth min-max games when the objective function is (strongly) convex with respect to one of the players' decision variables. |
635 | A Fast and Accurate Frequent Directions Algorithm for Low Rank Approximation via Block Krylov Iteration | Q. Yi, C. Wang, X. Liao and Y. Wang | Specifically, by utilizing the power of Block Krylov Iteration and count sketch techniques, we propose a fast and accurate FD algorithm dubbed as BKICS-FD. |
636 | Stochastic Admm For Byzantine-Robust Distributed Learning | F. Lin, Q. Ling, W. Li and Z. Xiong | In this paper, we aim at solving a distributed machine learning problem under Byzantine attacks. |
637 | Unified Signal Compression Using Generative Adversarial Networks | B. Liu, A. Cao and H. Kim | We propose a unified compression framework that uses generative adversarial networks (GAN) to compress image and speech signals. |
638 | WInD: Wasserstein Inception Distance For Evaluating Generative Adversarial Network Performance | P. Dimitrakopoulos, G. Sfikas and C. Nikou | In this paper, we present Wasserstein Inception Distance (WInD), a novel metric for evaluating performance of Generative Adversarial Networks (GANs). |
639 | Trace Norm Generative Adversarial Networks for Sensor Generation and Feature Extraction | S. Zheng and C. Gupta | In this work, we propose Trace Norm GANs to enforce trace norm minimization on generated samples. |
640 | Mahalanobis Distance Based Adversarial Network for Anomaly Detection | Y. Hou, Z. Chen, M. Wu, C. Foo, X. Li and R. M. Shubair | This paper proposes an efficient method, known as Mahalanobis Distance-based Adversarial Network (MDAN), for anomaly detection. |
641 | Commuting Conditional GANS for Multi-Modal Fusion | S. Roheda, H. Krim and B. S. Riggan | This paper presents a data driven approach to multi-modal fusion where a hidden latent sub-space between the different modalities is learned. |
642 | Sequence-To-Subsequence Learning With Conditional Gan For Power Disaggregation | Y. Pan, K. Liu, Z. Shen, X. Cai and Z. Jia | In this paper, we propose a sequence-to-subsequence learning method, which makes a trade-off between traditional sequence-to-sequence and sequence-to-point method, to balance the convergence difficulty in deep neural networks and the amount of computation in the inference period. |
643 | A Bin Encoding Training of a Spiking Neural Network Based Voice Activity Detection | G. Dellaferrera, F. Martinelli and M. Cernak | In this work, we develop a SNN-based Voice Activity Detection (VAD) system that belongs to the building blocks of any audio and speech processing system. |
644 | ECG Heartbeat Classification Based on Multi-Scale Wavelet Convolutional Neural Networks | L. E. Bouny, M. Khalil and A. Adib | This paper proposes a novel Deep Learning technique for ECG beats classification. |
645 | Self-Supervised Learning for ECG-Based Emotion Recognition | P. Sarkar and A. Etemad | We present an electrocardiogram (ECG) -based emotion recognition system using self-supervised learning. |
646 | Expression-Guided EEG Representation Learning for Emotion Recognition | S. Rayatdoost, D. Rudrauf and M. Soleymani | In this paper, we propose a deep representation learning approach for emotion recognition from electroencephalogram (EEG) signals guided by facial electromyogram (EMG) and electrooculogram (EOG) signals. |
647 | Attention Driven Fusion for Multi-Modal Emotion Recognition | D. Priyasad, T. Fernando, S. Denman, S. Sridharan and C. Fookes | In this paper, we present a deep learning-based approach to exploit and fuse text and acoustic data for emotion classification. |
648 | Learning the Spatio-Temporal Dynamics of Physical Processes from Partial Observations | I. Ayed, E. d. Bézenac, A. Pajot and P. Gallinari | We propose a data-driven framework, where the system's dynamics are modeled by an unknown time-varying differential equation and the evolution term for the state is estimated from the partially observed data only, using a deep convolutional neural network. |
649 | Optimal Laplacian Regularization for Sparse Spectral Community Detection | L. Dall'Amico, R. Couillet and N. Tremblay | In this paper we formally determine a proper regularization which is intimately related to alternative state-of-the-art spectral techniques for sparse graphs. |
650 | Anomaly Detection in Mixed Time-Series Using A Convolutional Sparse Representation With Application To Spacecraft Health Monitoring | B. Pilastre, G. Silva, L. Boussouf, S. d'Escrivan, P. Rodríguez and J. Tourneret | This paper introduces a convolutional sparse model for anomaly detection in mixed continuous and discrete data. |
651 | Variational Student: Learning Compact and Sparser Networks In Knowledge Distillation Framework | S. Hegde, R. Prasad, R. Hebbalaguppe and V. Kumar | To this end, we propose Variational Student where we reap the benefits of compressibility of the knowledge distillation framework, and sparsity inducing abilities of variational inference (VI) techniques. |
652 | Low Rank Activations for Tensor-Based Convolutional Sparse Coding | P. Humbert, J. Audiffren, L. Oudre and N. Vayatis | In this article, we propose to extend the classical Convolutional Sparse Coding model (CSC) to multivariate data by introducing a new tensor CSC model that enforces sparsity and low-rank constraint on the activations. |
653 | Energy Disaggregation Using Fractional Calculus | P. A. Schirmer and I. Mporas | In this article we introduce the use of fractional calculus in the Non-Intrusive Load Monitoring task. |
654 | Dyna-Bolt: Domain Adaptive Binary Factorization Of Current Waveforms For Energy Disaggregation | B. Chen, J. Liu, H. Lange and M. Bergés | To alleviate this problem, we formulated NILM in a domain adaptation context. Using Binary OnLine FactorizaTion (BOLT) as the baseline model, we first demonstrate that direct application of domain adversarial training without application-specific modifications is unsuccessful, and hypothesize that this may be due to differences in the data distributions. |
655 | Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis | T. Hu, A. Shrivastava, O. Tuzel and C. Dhir | We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required. |
656 | Improving Singing Voice Separation with the Wave-U-Net Using Minimum Hyperspherical Energy | J. Perez-Lapillo, O. Galkin and T. Weyde | In this work, we apply MHE regularization to the 1D filters of the Wave-U-Net. |
657 | Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders | Y. Luo, C. Hsu, K. Agres and D. Herremans | We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. |
658 | Online Tensor Completion and Free Submodule Tracking With The T-SVD | K. Gilman and L. Balzano | We propose a new online algorithm, called TOUCAN, for the tensor completion problem of imputing missing entries of a low tubal-rank tensor using the tensor-tensor product (t-product) and tensor singular value decomposition (t-SVD) algebraic framework. |
659 | Exploiting Commutativity Condition for CP Decomposition Via Approximate Simultaneous Diagonalization | R. Akema, M. Yamagishi and I. Yamada | In this paper, we propose a novel strategy which utilizes an inherent algebraic property of simultaneously diagonalizable matrix tuples, i.e., commutativity, for both (i) reducing approximate CP decomposition of a higher-order tensor to Approximate Simultaneous Diagonalization (ASD) and (ii) solving the ASD. |
660 | A Novel Rank Selection Scheme in Tensor Ring Decomposition Based on Reinforcement Learning for Deep Neural Networks | Z. Cheng, B. Li, Y. Fan and Y. Bao | To overcome this issue, we propose a novel rank selection scheme, which is inspired by reinforcement learning, to automatically select ranks in recently studied tensor ring decomposition in each convolutional layer. |
661 | Estimating Structural Missing Values Via Low-Tubal-Rank Tensor Completion | H. Wang, F. Zhang, J. Wang and Y. Wang | In this paper, we propose to incorporate a constraint item on the missing values into low-tubal-rank tensor completion to promote the structural hypothesis of the missing values such as sparsity. |
662 | Low-Tubal-Rank Tensor Recovery From One-Bit Measurements | J. Hou, F. Zhang, Y. Wang and J. Wang | This paper focuses on the recovery of low-tubal-rank tensors from binary measurements under the frame of tensor Singular Value Decomposition. |
663 | Continual Learning Through One-Class Classification Using VAE | F. Wiewel, A. Brendle and B. Yang | In this paper, we propose a new method for overcoming this phenomenon based on one-class classification. |
664 | Estimation of Post-Nonlinear Causal Models Using Autoencoding Structure | K. Uemura and S. Shimizu | In this paper, we proposed a new estimation method of PNL model using an autoencoding structure. |
665 | From Symbols to Signals: Symbolic Variational Autoencoders | C. Devaraj, A. Chowdhury, A. Jain, J. R. Kubricht, P. Tu and A. Santamaria-Pang | We introduce Symbolic Variational Autoencoders which generate images from symbols that represent semantic concepts. |
666 | Graph Auto-Encoder for Graph Signal Denoising | T. H. Do, D. Minh Nguyen and N. Deligiannis | To bridge this gap, we propose to use graph convolutional neural network with a Kron-reduction-based pooling operator for denoising on graphs. |
667 | A Priori Estimates of the Generalization Error for Autoencoders | Z. Don, W. E and C. Ma | In this paper, we build theoretical understanding about autoencoders. |
668 | GFCN: A New Graph Convolutional Network Based on Parallel Flows | F. Ji, J. Yang, Q. Zhang and W. P. Tay | In this paper, we study the problem from a different perspective, by introducing parallel flow decomposition of graphs. |
669 | Depthwise-STFT Based Separable Convolutional Neural Networks | S. Kumawat and S. Raman | In this paper, we propose a new convolutional layer called Depthwise-STFT Separable layer that can serve as an alternative to the standard depthwise separable convolutional layer. |
670 | Semi-Implicit Stochastic Recurrent Neural Networks | E. Hajiramezanali, A. Hasanzadeh, N. Duffield, K. Narayanan, M. Zhou and X. Qian | In this paper, we advocate learning implicit latent representations using semi-implicit variational inference to further increase model flexibility. |
671 | Feedback Recurrent Autoencoder | Y. Yang, G. Sautière, J. J. Ryu and T. S. Cohen | In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. |
672 | Indylstms: Independently Recurrent LSTMS | P. Gonnet and T. Deselaers | We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. |
673 | Neural Attentive Multiview Machines | O. Barkan, O. Katz and N. Koenigstein | To this end, we introduce NAM: a Neural Attentive Multiview machine that learns multiview item representations and similarity by employing a novel attention mechanism. |
674 | Attentive Modality Hopping Mechanism for Speech Emotion Recognition | S. Yoon, S. Dey, H. Lee and K. Jung | In this work, we explore the impact of visual modality in addition to speech and text for improving the accuracy of the emotion detection system. |
675 | Facial Emotion Recognition Using Light Field Images with Deep Attention-Based Bidirectional LSTM | A. Sepas-Moghaddam, A. Etemad, F. Pereira and P. L. Correia | This paper exploits the rich spatio-angular information available in light field images for facial emotion recognition. |
676 | A Regularized Attention Mechanism for Graph Attention Networks | U. S. Shanthamallu, J. J. Thiagarajan and A. Spanias | In this paper, we perform a detailed analysis of GAT models, and present interesting insights into their behavior. |
677 | Attentive Item2vec: Neural Attentive User Representations | O. Barkan, A. Caciularu, O. Katz and N. Koenigstein | In this work, we present Attentive Item2vec (AI2V) – a novel attentive version of Item2vec (I2V). |
678 | Audio Sound Determination Using Feature Space Attention Based Convolution Recurrent Neural Network | X. Xia, J. Pan and Y. Wang | To deal with this, we propose a feature space attention based convolution recurrent neural network approach utilizing the varying importance of each feature dimension to perform acoustic event detection. |
679 | Spatial Attentional Bilinear 3D Convolutional Network for Video-Based Autism Spectrum Disorder Detection | K. Sun, L. Li, L. Li, N. He and J. Zhu | In this paper, we propose spatial attentional bilinear pooling to enhance its spatial information extraction without significantly increasing the parameters. |
680 | Linear Thompson Sampling Under Unknown Linear Constraints | A. Moradipari, M. Alizadeh and C. Thrampoulidis | In this setting, we propose Safe-LTS, the first safe Thompson Sampling based algorithm, and we prove that it achieves no-regret learning. |
681 | Overlapped State Hidden Semi-Markov Model for Grouped Multiple Sequences | H. Narimatsu and H. Kasai | To tackle this challenge, we propose a new model designated as overlapped state hidden semi-Markov model (OS-HSMM). |
682 | Online Community Detection by Spectral Cusum | M. Zhang, L. Xie and Y. Xie | We present an online community change detection algorithm called spectral CUSUM to detect the emergence of a community using a subspace projection procedure based on a Gaussian model setting. |
683 | Enhanced Adversarial Strategically-Timed Attacks Against Deep Reinforcement Learning | C. H. Yang et al. | In this paper, we introduce timing-based adversarial strategies against a DRL-based navigation system by jamming in physical noise patterns on the selected time frames. |
684 | Preference-Aware Mask for Session-Based Recommendation with Bidirectional Transformer | Y. Zhang et al. | In this paper, we propose the preference-aware mask to capture user preferences over the items within the sessions, which adapts to the preference-irrelevant items within the sessions and provides explainable evidence for the recommendation. |
685 | Low Mutual and Average Coherence Dictionary Learning Using Convex Approximation | J. Parsa, M. Sadeghi, M. Babaie-Zadeh and C. Jutten | In this paper, we consider a dictionary learning problem regularized with the average coherence and constrained by an upper-bound on the mutual coherence of the dictionary. |
686 | Robust Online Matrix Completion with Gaussian Mixture Model | C. Liu, C. Chen, H. Shan and B. Wang | In this paper, we study the problem of online matrix completion (MC) aiming to achieve robustness to the variations in both low-rank subspace and noises. |
687 | Deep Neural Network Based Matrix Completion for Internet of Things Network Localization | S. Kim, L. T. Nguyen and B. Shim | In this paper, we propose a deep neural network based matrix completion approach for Internet of Things (IoT) localization. |
688 | Bringing in the Outliers: A Sparse Subspace Clustering Approach to Learn a Dictionary of Mouse Ultrasonic Vocalizations | J. Wang, K. Mundnich, A. T. Knoll, P. Levitt and S. Narayanan | We propose a new method to automatically create a dictionary of USVs based on a two-step spectral clustering approach, where we split the set of USVs into inlier and outlier data sets. |
689 | One-Bit Compressed Sensing Using Generative Models | G. Joseph, S. Kafle and P. K. Varshney | In this paper, we address the classical problem of one-bit compressed sensing. |
690 | Hybrid Deep-Semantic Matrix Factorization for Tag-Aware Personalized Recommendation | Z. Xu, D. Yuan, T. Lukasiewicz, C. Chen, Y. Miao and G. Xu | Therefore, inspired by the recent development of deep-semantic modeling, we propose a hybrid deep-semantic matrix factorization (HDMF) model to further improve the performance of tag-aware personalized recommendation by integrating the techniques of deep-semantic modeling, hybrid learning, and matrix factorization. |
691 | Supervised Encoding for Discrete Representation Learning | C. P. Le, Y. Zhou, J. Ding and V. Tarokh | In this paper, we propose a novel supervised learning model named Supervised-Encoding Quantizer (SEQ). |
692 | Learning Data Representation and Emotion Assessment from Physiological Data | M. S. Joaquim et al. | In this work, raw two-channel pre-frontal electroencephalography and photoplethysmography signals of 25 subjects were collected using EMOTAI's headband while watching commercials. |
693 | Feature Selection Under Orthogonal Regression with Redundancy Minimizing | X. Xu and X. Wu | To address the defect, in this paper, we propose a two-stage (filter-embedded) feature selection technique based on Maximum Relevance Minimum Redundancy and FSOR, termed as Orthogonal Regression with Minimum Redundancy (ORMR). |
694 | The Picasso Algorithm for Bayesian Localization Via Paired Comparisons in a Union of Subspaces Model | G. Canal et al. | We develop a framework for localizing an unknown point w using paired comparisons of the form “w is closer to point x_i than to x_j” when the points lie in a union of known subspaces. |
695 | Learning Semi-Supervised Anonymized Representations by Mutual Information | C. Feutry, P. Piantanida and P. Duhamel | This paper addresses the problem of removing from a set of data (here images) a given private information, while still allowing other utilities on the processed data. |
696 | Learning Local Structure of Representative Points for Point Cloud Classification and Semantic Segmentation | X. Li, Y. Pang, Y. Wu and Y. Li | To solve this problem, we propose a learning-based block, named Representative Points Block (RPB), to select the most representative points of an irregular point cloud according to the task. |
697 | Towards Blind Quality Assessment of Concert Audio Recordings Using Deep Neural Networks | N. Simou, Y. Mastorakis and N. Stefanakis | In this work, we apply different Deep Neural Network (DNN) architectures to a simple binary classification problem, that of deciding whether a musical recording is user-generated or of professional quality. |
698 | Multi-Label Sound Event Retrieval Using A Deep Learning-Based Siamese Structure With A Pairwise Presence Matrix | J. Fan, E. Nichols, D. Tompkins, A. E. Méndez Méndez, B. Elizalde and P. Pasquier | To address this latter problem, we propose different Deep Learning architectures with a Siamese structure and a Pairwise Presence Matrix. |
699 | Speech-Driven Facial Animation Using Polynomial Fusion of Features | T. Kefalas, K. Vougioukas, Y. Panagakis, S. Petridis, J. Kossaifi and M. Pantic | In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. |
700 | SED-MDD: Towards Sentence Dependent End-To-End Mispronunciation Detection and Diagnosis | Y. Feng, G. Fu, Q. Chen and K. Chen | In order to integrate these stages, we propose SED-MDD, an end-to-end model for sentence dependent mispronunciation detection and diagnosis (MD&D). |
701 | Generative Pre-Training for Speech with Autoregressive Predictive Coding | Y. Chung and J. Glass | In this paper we propose to use autoregressive predictive coding (APC), a recently proposed self-supervised objective, as a generative pre-training approach for learning meaningful, non-specific, and transferable speech representations. |
702 | Stargan for Emotional Speech Conversion: Validated by Data Augmentation of End-To-End Emotion Recognition | G. Rizos, A. Baird, M. Elliott and B. Schuller | In this paper, we propose an adversarial network implementation for speech emotion conversion as a data augmentation method, validated by a multi-class speech affect recognition task. |
703 | Multimodal Transformer Fusion for Continuous Emotion Recognition | J. Huang, J. Tao, B. Liu, Z. Lian and M. Niu | In this work, we utilize the Transformer model to fuse audio-visual modalities on the model level. |
704 | HKA: A Hierarchical Knowledge Attention Mechanism for Multi-Turn Dialogue System | J. Song, K. Zhang, X. Zhou and J. Wu | Motivated by this, we propose a novel hierarchical knowledge attention (HKA) mechanism for open-domain multi-turn dialogue system in this paper, which utilizes both word and utterance level attention jointly. |
705 | Submodular Rank Aggregation on Score-Based Permutations for Distributed Automatic Speech Recognition | J. Qi, C. H. Yang and J. Tejedor | This work studies the use of submodular functions to design a rank aggregation on score-based permutations, which can be used for distributed ASR systems in both supervised and unsupervised modes. |
706 | Bridging Mixture Density Networks with Meta-Learning for Automatic Speaker Identification | R. Li, J. Jiang, X. Wu, H. Mao, C. Hsieh and W. Wang | To alleviate the disadvantage caused by training data deficiency, we propose a Mixture Density Network-based Meta-Learning method (MDNML) for speaker identification. |
707 | Pitch Estimation Via Self-Supervision | B. Gfeller, C. Frank, D. Roblek, M. Sharifi, M. Tagliasacchi and M. Velimirovic | We present a method to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation. |
708 | Hierarchical Caching via Deep Reinforcement Learning | A. Sadeghi, G. Wang and G. B. Giannakis | To model the two-way interactive influence between caching decisions at the parent and leaf nodes, a reinforcement learning formulation is put forth in this work. |
709 | Learning Network Representation Through Reinforcement Learning | S. Shen et al. | In contrast, we propose reinforcement learning network representations (RLNet), which explores the idea of using reinforcement learning to learn to explore the network and to obtain network representations. |
710 | Attention-Based Curiosity-Driven Exploration in Deep Reinforcement Learning | P. Reizinger and M. Szemenyei | Combining them, we propose new methods, such as Attention-aided Advantage Actor-Critic, an extension of the Actor-Critic framework. |
711 | Stabilizing Multi-Agent Deep Reinforcement Learning by Implicitly Estimating Other Agents' Behaviors | Y. Jin, S. Wei, J. Yuan, X. Zhang and C. Wang | To cope with this challenge, we propose a novel approach where each agent uses an implicit estimate of others' actions to guide its own policy learning. |
712 | QOS-Aware Flow Control for Power-Efficient Data Center Networks with Deep Reinforcement Learning | P. Sun, Z. Guo, S. Liu, J. Lan and Y. Hu | In this paper, we propose SmartFCT, which employs Software-Defined Networking (SDN) coupled with the Deep Reinforcement Learning (DRL) to improve the power efficiency of DCNs and guarantee the FCT. |
713 | Improving the Scalability of Deep Reinforcement Learning-Based Routing with Control on Partial Nodes | P. Sun, J. Lan, Z. Guo, Y. Xu and Y. Hu | In this paper, we propose SINET, a scalable and intelligent network control framework for routing optimization. |
714 | Generalized Linear Bandits with Safety Constraints | S. Amani, M. Alizadeh and C. Thrampoulidis | This paper formulates a generalized linear stochastic multi-armed bandit problem with generalized linear safety constraints that depend on an unknown parameter vector. |
715 | From Video Game to Real Robot: The Transfer Between Action Spaces | J. Karttunen, A. Kanervisto, V. Kyrki and V. Hautamäki | In this work, we study how general video games can be directly used instead of fine-tuned simulations for the sim-to-real transfer. |
716 | Correlated Multi-Armed Bandits with A Latent Random Source | S. Gupta, G. Joshi and O. Yagan | We propose and analyze the performance of the C-UCB algorithm that leverages the correlations between arms to reduce the cumulative regret (i.e., to increase the total reward obtained after T rounds). |
717 | Adaptive Sequential Interpolator Using Active Learning for Efficient Emulation of Complex Systems | L. Martino, D. H. Svendsen, J. Vicent and G. Camps-Valls | This paper introduces an interpolation procedure which belongs to the family of active learning algorithms, in order to construct cheap surrogate models of such costly complex systems. |
718 | Continual Learning for Infinite Hierarchical Change-Point Detection | P. Moreno-Muñoz, D. Ramírez and A. Artés-Rodríguez | To circumvent this problem, we propose to use a hierarchical model, which yields observations that belong to a lower-dimensional manifold. |
719 | Cost Aware Adversarial Learning | S. De Silva, J. Kim and R. Raich | In this paper, we present an attack-cost-aware adversarial learning framework that takes into account the (potentially inhomogeneous) vulnerability characteristics of test data entries in designing an attack-resilient classifier. |
720 | On Divergence Approximations for Unsupervised Training of Deep Denoisers Based on Stein's Unbiased Risk Estimator | S. Soltanayev, R. Giryes, S. Y. Chun and Y. C. Eldar | In this work, we briefly study the computational efficiency of Monte-Carlo (MC) divergence approximation over recently available exact divergence computation using backpropagation. |
721 | Variable Metric Proximal Gradient Method with Diagonal Barzilai-Borwein Stepsize | Y. Park, S. Dhar, S. Boyd and M. Shah | This paper proposes an adaptive metric selection strategy called diagonal Barzilai-Borwein (DBB) stepsize for the popular Variable Metric Proximal Gradient (VM-PG) algorithm [1], [2]. |
722 | Revisit of Estimate Sequence for Accelerated Gradient Methods | B. Li, M. Coutiño and G. B. Giannakis | In this paper, we revisit the problem of minimizing a convex function f(x) with Lipschitz continuous gradient via accelerated gradient methods (AGM). |
723 | A Generalization of Principal Component Analysis | S. Battaglino and E. Koyuncu | We present a gradient ascent algorithm to solve the problem. |
724 | An Easy-to-Implement Framework of Fast Subspace Clustering For Big Data Sets | L. Meng, Y. Jiao and Y. Gu | To enable the fast implementation of subspace clustering on big datasets, this paper proposes a simple but effective subspace clustering framework called Fast Subspace Clustering (FSC), which adopts a “sampling, random projecting, clustering, and classifying” strategy. |
725 | Investigating Generalization in Neural Networks Under Optimally Evolved Training Perturbations | S. Chaudhury and T. Yamasaki | In this paper, we study the generalization properties of neural networks under input perturbations and show that minimal training data corruption by a few pixel modifications can cause drastic overfitting. |
726 | Heterogeneous Domain Generalization Via Domain Mixup | Y. Wang, H. Li and A. C. Kot | In this work, we focus on the problem of heterogeneous domain generalization which aims to improve the generalization capability across different tasks, which is, how to learn a DCNN model with multiple domain data such that the trained feature extractor can be generalized to supporting recognition of novel categories in a novel target domain. |
727 | Preservation of Anomalous Subgroups On Variational Autoencoder Transformed Data | S. C. Maina et al. | We present a Utility Guaranteed Deep Privacy (UGDP) system which casts existing anomalous pattern detection methods as a new utility measure for data synthesis. |
728 | Learn-By-Calibrating: Using Calibration As A Training Objective | J. J. Thiagarajan, B. Venkatesh and D. Rajan | We propose a novel algorithm that performs simultaneous interval estimation for different calibration levels and effectively leverages the intervals to refine the mean estimates. |
729 | ESRGAN+ : Further Improving Enhanced Super-Resolution Generative Adversarial Network | N. C. Rakotonirina and A. Rasoanaivo | In this fashion, the model is extended to further improve the perceptual quality of the images. |
730 | Attentive Cutmix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification | D. Walawalkar, Z. Shen, Z. Liu and M. Savvides | In this paper, we propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix [3]. |
731 | Efficient Image Super Resolution Via Channel Discriminative Deep Neural Network Pruning | Z. Hou and S. Kung | In order to identify and remove such uninformative channels, we propose a new pruning criterion, Discriminant Information, by characterizing the dependency of the output w.r.t. the hidden-layer feature-maps. |
732 | Multi-Resolution Overlapping Stripes Network for Person Re-Identification | A. E. Okay, M. AlGhamdi, R. Westendrop and M. Abdel-Mottaleb | This paper presents a part-based model with a multi-resolution network that uses different level of features. |
733 | Person Identification Using Deep Convolutional Neural Networks on Short-Term Signals from Wearable Sensors | G. Retsinas, P. P. Filntisis, N. Efthymiou, E. Theodosis, A. Zlatintsi and P. Maragos | In this work, we explore the discriminating ability of short-term signal patterns (e.g. few minutes long) with respect to the person identification task. |
734 | Local-Global Feature for Video-Based One-Shot Person Re-Identification | C. Zhao, Z. Zhang, J. Yan and Y. Yan | In this paper we propose a novel local-global progressive learning framework to overcome the limitations. |
735 | Global and Local Discriminative Patches Exploiting for Action Recognition | J. Wu, W. Luo, W. Liu and C. Zhang | In this work we propose a novel multi-stream features fusion framework based on discriminative patch exploiting. |
736 | Disentangling Controllable Object Through Video Prediction Improves Visual Reinforcement Learning | Y. Zhong, A. Schwing and J. Peng | Leveraging action-conditioned video prediction, we propose an end-to-end learning framework to disentangle the controllable object from the observation signal. |
737 | Dynamic Variational Autoencoders for Visual Process Modeling | A. Sagel and H. Shen | We propose a joint learning framework, combining a vector autoregressive model and a Variational Autoencoder. |
738 | A Novel Two-Pathway Encoder-Decoder Network for 3D Face Reconstruction | X. Li, Z. Weng, J. Liang, L. C. Youjun and X. Yuli Fu | To address this problem, Two-Pathway Encoder-Decoder Network (2PEDN) is proposed to regress the identity and expression components via global and local pathways. |
739 | Rate Assignment in 360-Degree Video Tiled Streaming Using Random Forest Regression | R. Skupin, K. Bitterschulte, Y. Sanchez, C. Hellge and T. Schierl | This paper addresses rate assignment in a distributed tile encoding system for such multi-resolution tiled streaming services based on the emerging Versatile Video Coding Standard. |
740 | Improving Convergent Cross Mapping for Causal Discovery with Gaussian Processes | G. Feng, K. Yu, Y. Wang, Y. Yuan and P. M. Djuric | In this paper, we propose a more reliable and principled approach, which is based on Gaussian processes (GPs) that improves the attractor reconstruction. |
741 | Label Reuse for Efficient Semi-Supervised Learning | T. Hsieh, J. Chen and C. Chen | In this paper, we propose a new learning strategy for semi-supervised deep learning algorithms, called label reuse, aiming to significantly reduce the expensive computational cost of pseudo label generation and the like for each unlabeled training instance since pseudo labels require to be repeatedly evaluated through the whole training process. |
742 | Decentralized Optimization with Non-Identical Sampling in Presence of Stragglers | T. Adikari and S. Draper | We propose to combine worker outputs weighted by the amount of work completed by each. |
743 | Content Vs Context: How About “Walking Hand-In-Hand” For Image Clustering? | S. Hu, Z. Hou, Z. Lou and Y. Ye | This paper proposes a novel content-context information bottleneck (C2IB) algorithm, which simultaneously explores and exploits the content and context information for discovering image clusters. |
744 | Fixed Smooth Convolutional Layer for Avoiding Checkerboard Artifacts in CNNS | Y. Kinoshita and H. Kiya | In this paper, we propose a fixed convolutional layer with an order of smoothness not only for avoiding checkerboard artifacts in convolutional neural networks (CNNs) but also for enhancing the performance of CNNs, where the smoothness of its filter kernel can be controlled by a parameter. |
745 | This Dataset Does Not Exist: Training Models from Generated Images | V. Besnier, H. Jain, A. Bursuc, M. Cord and P. Pérez | In this work we investigate this question and its related challenges. We identify ways to improve significantly the performance over naive training on randomly generated images with regular heuristics. |
746 | LEt-SNE: A Hybrid Approach to Data Embedding and Visualization Of Hyperspectral Imagery | M. Shukla, B. Banerjee and K. M. Buddhiraju | In this paper, we propose a novel approach, LEt-SNE, which combines graph based algorithms like t-SNE and Laplacian Eigenmaps into a model parameterized by a shallow feed forward network. |
747 | Adversarial Mixup Synthesis Training for Unsupervised Domain Adaptation | Y. Tang, Z. Lin, H. Wang and L. Xu | In this paper, we propose a new approach termed Adversarial Mixup Synthesis Training (AMST) to alleviate the issue. |
748 | Rate-Invariant Autoencoding of Time-Series | K. Koneripalli, S. Lohit, R. Anirudh and P. Turaga | In this paper, we extend prior work in disentangling latent spaces of autoencoding models, to design a novel architecture to learn rate-invariant latent codes in a completely unsupervised fashion. |
749 | Self-Paced Probabilistic Principal Component Analysis For Data With Outliers | B. Zhao, X. Xiao, W. Zhang, B. Zhang, G. Gan and S. Xia | To alleviate this problem, we propose a novel method called Self-Paced Probabilistic Principal Component Analysis (SP-PPCA) by introducing the Self-Paced Learning mechanism into PPCA. |
750 | Corrdrop: Correlation Based Dropout for Convolutional Neural Networks | Y. Zeng, T. Dai and S. Xia | To address these issues, we propose a novel structural dropout method, Correlation based Dropout (CorrDrop), to regularize CNNs by dropping feature units based on feature correlation, which reflects the discriminative information in feature maps. |
751 | Witchcraft: Efficient PGD Attacks with Random Step Size | P. Chiang et al. | We propose a variant of Projected Gradient Descent (PGD) that uses a random step size to improve performance without resorting to expensive random restarts. |
752 | The Fifthnet Chroma Extractor | K. O. Hanlon and M. B. Sandler | We propose the FifthNet, a neural network for chroma-based ACR that incorporates known spectral structures in its design through data manipulation. |
753 | Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means | Y. Ng, J. M. Pereira, D. Garagic and V. Tarokh | In this paper, we formulate marine buoy placement as a clustering problem, and propose dropout k-means and dropout k-median to improve placement robustness to buoy disruption.We simulated the passage of ships in the Gabonese waters near West Africa using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median. |
754 | On-The-Fly Feature Selection and Classification with Application to Civic Engagement Platforms | Y. W. Liyanage, D. Zois and C. Chelmis | Instead, we propose an optimal framework to perform joint feature selection and classification on-the-fly while relaxing the assumption on feature independence. |
755 | Global Traffic State Recovery via Local Observations with Generative Adversarial Networks | M. He, X. Luo, Z. Wang, F. Yang, H. Qian and C. Hua | In this paper, in order to avoid these communication overheads among spatially distributed intersections, we propose to recover the global traffic state at each intersection in a real-time fashion by only utilizing the traffic state observed at the local intersection. |
756 | Forecasting Sparse Traffic Congestion Patterns Using Message-Passing RNNs | S. R. Iyer, U. An and L. Subramanian | In this work, we leverage mobility traces of public transport vehicles tracked by the New York City MTA and formulate Message-Passing Recurrent Neural Nets (MPRNN) to produce long-term traffic forecasting on data that is sparse but wide in coverage. |
757 | Energy Disaggregation from Low Sampling Frequency Measurements Using Multi-Layer Zero Crossing Rate | P. A. Schirmer and I. Mporas | In this article we propose an extension of the zero crossing rate as a time domain measurement of frequency content and compare it to the previously proposed Karhunen Loeve Expansion, in terms of energy disaggregation performance and in terms of computational time. |
758 | Decoding 5G-NR Communications via Deep Learning | P. Henarejos and M. Ángel Vázquez | In this paper we propose to use Autoencoding Neural Networks (ANN) jointly with a Deep Neural Network (DNN) to construct Autoencoding Deep Neural Networks (ADNN) for demapping and decoding. |
759 | Body Movement Generation for Expressive Violin Performance Applying Neural Networks | J. Liu, H. Lin, Y. Huang, H. Kao and L. Su | In this paper, we take a divide-and-rule approach to tackle the multifaceted characteristics of musical movement, and propose a framework for generating violinists' body movements. |
760 | Sequential Vessel Trajectory Identification Using Truncated Viterbi Algorithm | Z. Dong, Y. Yang and Y. Xie | In this work, we propose a novel classification algorithm that is used to classify vessel data points into different trajectories. |
761 | A Prototypical Triplet Loss for Cover Detection | G. Doras and G. Peeters | We thus introduce here a new test set incorporating these constraints, and propose two contributions to improve our model's accuracy under these stricter conditions: we replace dominant melody with multi-pitch representation as input data, and describe a novel prototypical triplet loss designed to improve covers clustering. |
762 | Automotive Radar Signal Interference Mitigation Using RNN with Self Attention | J. Mun, S. Ha and J. Lee | In this paper, we propose a new method using deep learning. |
763 | A Large-Scale Deep Architecture for Personalized Grocery Basket Recommendations | A. Mantha et al. | In this paper, we introduce a production within-basket grocery recommendation system, RTT2Vec, which generates real-time personalized product recommendations to supplement the user's current grocery basket. |
764 | Blind Bounded Source Separation Using Neural Networks with Local Learning Rules | A. T. Erdogan and C. Pehlevan | To separate such bounded sources from their mixtures, we propose a new optimization problem, Bounded Similarity Matching (BSM). |
765 | Modeling Piece-Wise Stationary Time Series | D. Wu, S. Gundimeda, S. Mou and C. J. Quinn | We propose a new, data-driven technique to automatically identify change-points and learn piece-wise stationary models. |
766 | Multivariate Tropical Regression and Piecewise-Linear Surface Fitting | P. Maragos and E. Theodosis | In this paper we propose a novel approach for multivariate convex regression by using as approximation model a maximum of hyperplanes, which we represent as a multivariate max-plus tropical polynomial. |
767 | Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification | Z. Xiang, D. J. Miller and G. Kesidis | We propose a defense against imperceptible backdoor attacks based on perturbation optimization and novel, robust detection inference. |
768 | Classifying Partially Labeled Networked Data via Logistic Network Lasso | N. Tran, H. Ambos and A. Jung | We apply the network Lasso to classify partially labeled data points which are characterized by high-dimensional feature vectors. |
769 | Neural Time Warping for Multiple Sequence Alignment | K. Kawano, T. Kutsuna and S. Koide | In this paper, we propose neural time warping (NTW) that relaxes the original MSA to a continuous optimization, in which a neural network is used to model the alignment. |
770 | LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units | G. Li, L. Liu, X. Wang, X. Ma and X. Feng | In this paper, we propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques. |
771 | Media Classification with Bayesian Optimization and Vapnik-Chervonenkis (VC) Bounds | S. Bharitkar | In this paper, we extend the approach to optimizing the class condition distribution parameters, the neural network hyperparameters, and the size of the synthetic metadata training set using a combination of Bayesian optimization and by invoking the VC-dimensionality. |
772 | Batman: Bayesian Target Modelling For Active Inference | M. T. Koudahl and B. de Vries | The main contribution of this paper is the design of a coupled generative model structure that facilitates learning desired future observations for Active Inference agents and supports integration of Active Inference and classical methods in a joint framework. |
773 | Deep Learning Abilities to Classify Intricate Variations in Temporal Dynamics of Multivariate Time Series | P. Liotet, P. Abry, R. Leonarduzzi, M. Senneret, L. Jaffrès and G. Perrin | The aim of this work is to investigate the ability of deep learning (DL) architectures to learn temporal dynamics in multivariate time series. |
774 | Assimilation-Based Learning of Chaotic Dynamical Systems from Noisy and Partial Data | D. Nguyen, S. Ouala, L. Drumetz and R. Fablet | We propose a novel framework, which combines data assimilation schemes and neural network representation, namely Auto-Encoders and Ensemble Kalman Smoother, to learn the governing equations of dynamical systems. |
775 | Gated Multi-Layer Convolutional Feature Extraction Network for Robust Pedestrian Detection | T. Liu, J. Huang, T. Dai, G. Ren and T. Stathaki | In this paper, we propose a gated multi-layer convolutional feature extraction method which can adaptively generate discriminative features for candidate pedestrian regions. |
776 | Kernel Ridge Regression with Autocorrelation Prior: Optimal Model and Cross-Validation | A. Tanaka and H. Imai | The kernel regression problem with an autocorrelation prior is discussed in this paper. |
777 | Generalized Kernel-Based Dynamic Mode Decomposition | P. Héas, C. Herzet and B. Combès | In this work, we devise an algorithm based on low rank constraint optimization and kernel-based computation that generalizes a recent approach called “kernel-based dynamic mode decomposition”. |
778 | An Online Kernel Scalar Quantization Scheme for Signal Classification | J. Guo, R. G. Raj and D. J. Love | In this paper, we are interested in understanding the design and behavior of these relay-like classification nodes. |
779 | Self-Driven Graph Volterra Models for Higher-Order Link Prediction | M. Coutino, G. V. Karanikolas, G. Leus and G. B. Giannakis | Cross-fertilizing ideas from Volterra series and linear structural equation models, the present paper introduces self-driven graph Volterra models that can capture higher-order interactions among nodal observables available in networked data. |
780 | Graph Construction from Data by Non-Negative Kernel Regression | S. Shekkizhar and A. Ortega | We propose non-negative kernel regression (NNK), an improved approach for graph construction with interesting geometric and theoretical properties. |
781 | Structured Citation Trend Prediction Using Graph Neural Networks | D. Cummings and M. Nassar | We present a GNN-based architecture that predicts the top set of papers at the time of publication. |
782 | Revisiting Fast Spectral Clustering with Anchor Graph | C. Wang, F. Nie, R. Wang and X. Li | In this paper, we revisit the popular large-scale spectral clustering method based on the anchor graph which is equivalent to the spectral decomposition on a similar matrix obtained using a second-order transition probability. |
783 | A Graph Network Model for Distributed Learning with Limited Bandwidth Links and Privacy Constraints | J. Parras and S. Zazo | In this work, we develop an algorithm based on graph networks to train a deep learning model in a distributed manner. |
784 | Graph Regularized Tensor Train Decomposition | S. E. Sofuoglu and S. Aviyente | In this paper, we propose a graph regularized tensor train (GRTT) decomposition that learns a low-rank tensor train model that preserves the local relationships between tensor samples. |
785 | Weighted Krylov-Levenberg-Marquardt Method for Canonical Polyadic Tensor Decomposition | P. Tichavský, A. Phan and A. Cichocki | The proposed Krylov-Levenberg-Marquardt method enables second-order-based iterations even in large-scale decomposition problems, with or without weights. |
786 | Low-Complexity Levenberg-Marquardt Algorithm for Tensor Canonical Polyadic Decomposition | K. Huang and X. Fu | In this paper, we propose CPD-fLM++, a fast implementation of the Levenberg-Marquardt (LM) algorithm for the tensor canonical polyadic decomposition. |
787 | A Moment-Based Approach for Guaranteed Tensor Decomposition | A. Marmin, M. Castella and J. Pesquet | This paper presents a new scheme to perform the canonical polyadic decomposition (CPD) of a symmetric tensor. |
788 | Learning Diverse Sub-Policies via a Task-Agnostic Regularization on Action Distributions | L. Huo, Z. Wang, M. Xu and Y. Song | In this paper, we formulate the discovery of diverse sub-policies as a trajectory inference. |
789 | Federated Learning with Mutually Cooperating Devices: A Consensus Approach Towards Server-Less Model Optimization | S. Savazzi, M. Nicoli, V. Rampa and S. Kianoush | In this paper we propose a distributed FL approach that performs a decentralized fusion of local model parameters by leveraging mutual cooperation between the devices and local (in-network) data operations via consensus-based methods. |
790 | No-Regret Non-Convex Online Meta-Learning | Z. Zhuang, Y. Wang, K. Yu and S. Lu | In this paper, we generalize the original framework from convex to non-convex setting, and introduce the local regret as the alternative performance measure. |
791 | Asynchronous Decentralized Learning of a Neural Network | X. Liang, A. M. Javid, M. Skoglund and S. Chatterjee | In this work, we exploit an asynchronous computing framework, namely ARock, to learn a deep neural network called self-size estimating feedforward neural network (SSFN) in a decentralized scenario. |
792 | Learning Perception and Planning With Deep Active Inference | O. Çatal, T. Verbelen, J. Nauta, C. D. Boom and B. Dhoedt | In this paper we use recent advances in deep learning to learn the state space and approximate the necessary probability distributions to engage in active inference. |
793 | Projection Free Dynamic Online Learning | D. S. Kalhan, A. S. Bedi, A. Koppel, K. Rajawat, A. Gupta and A. Banerjee | To avoid this bottleneck, we propose a projection-free scheme based on Frank-Wolfe: where instead of online gradient steps, we use steps that are collinear with the gradient but guaranteed to be feasible. |
794 | Learning Partial Differential Equations From Data Using Neural Networks | A. Hasan, J. M. Pereira, R. Ravier, S. Farsiu and V. Tarokh | We introduce a regularization scheme that prevents the function approximation from overfitting the data and forces it to be a solution of the underlying PDE. |
795 | Active Learning with Unsupervised Ensembles of Classifiers | P. A. Traganitis, D. Berberidis and G. B. Giannakis | The present work introduces a simple scheme for active classification of data using unsupervised ensembles of classifiers. |
796 | NASIL: Neural Architecture Search with Imitation Learning | F. S. Fard, A. Rad and V. S. Tomar | Here we introduce the neural architecture search with imitation learning (NASIL) method, which starts the search by learning from structures hand-designed by experts. |
797 | Multi-View Clustering Via Mixed Embedding Approximation | D. Wu, F. Nie, R. Wang and X. Li | Formally, we aim to learn a uniform orthogonal embedding based on the orthogonal pre-embeddings of each view. |
798 | Signal Clustering With Class-Independent Segmentation | S. Gasperini, M. Paschali, C. Hopke, D. Wittmann and N. Navab | In this paper we propose a Deep Learning-based clustering method, which encodes concurrent signals into images, and, for the first time, tackles clustering with image segmentation. |
799 | Mango: A Python Library for Parallel Hyperparameter Tuning | S. S. Sandha, M. Aggarwal, I. Fedorov and M. Srivastava | To address these challenges, we present Mango, a Python library for parallel hyperparameter tuning. |
800 | Anytime Minibatch with Delayed Gradients: System Performance and Convergence Analysis | H. Al-Lawati and S. C. Draper | We present convergence analysis of Anytime Minibatch with Delayed Gradients (AMB-DG) algorithm. |
801 | On Exponentially Consistency of Linkage-Based Hierarchical Clustering Algorithm Using Kolmogorov-Smirnov Distance | T. Wang, Y. Liu and B. Chen | This paper focuses on performance analysis of linkage-based hierarchical agglomerative clustering algorithms for sequence clustering using the Kolmogorov-Smirnov distance. |
802 | A Neural Network Based on First Principles | P. M. Baggenstoss | In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. |
803 | AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks | M. E. Helou, F. Dumbgen and S. Süsstrunk | We propose a novel regularization method that progressively penalizes the magnitude of activations during training. |
804 | Label Propagation Adaptive Resonance Theory for Semi-Supervised Continuous Learning | T. Kim, I. Hwang, G. Kang, W. Choi, H. Kim and B. Zhang | In this paper, we propose Label Propagation Adaptive Resonance Theory (LPART) for semi-supervised continuous learning. |
805 | A Probabilistic Scheme for Representation Learning with Radial Transform Images | H. Salehinejad and S. Valaee | In this paper, a probabilistic framework to analyze radial transform is presented. |
806 | Perception-Distortion Trade-Off with Restricted Boltzmann Machines | C. Cannella, J. Ding, M. Soltani, Y. Zhou and V. Tarokh | In this work, we introduce a new procedure for applying Restricted Boltzmann Machines (RBMs) to missing data inference tasks, based on linearization of the effective energy function governing the distribution of observations. |
807 | An Efficient Alternative to Network Pruning Through Ensemble Learning | M. Pöllot, R. Zhang and A. Kaup | Unlike normal pruning methods, we investigate the possibility of replacing a full-sized convolutional neural network with an ensemble of its narrow versions. |
808 | A Novel Pruning Approach for Bagging Ensemble Regression Based on Sparse Representation | A. E. Khorashadi-Zadeh, M. Babaie-Zadeh and C. Jutten | This work aims to propose an approach for pruning a bagging ensemble regression (BER) model based on sparse representation, which we call sparse representation pruning (SRP). |
809 | K-Autoencoders Deep Clustering | Y. Opochinsky, S. E. Chazan, S. Gannot and J. Goldberger | In this study we propose a deep clustering algorithm that extends the k-means algorithm. |
810 | MoGA: Searching Beyond MobileNetV3 | X. Chu, B. Zhang and R. Xu | Bearing the target hardware in mind, we propose the first Mobile GPU-Aware (MoGA) neural architecture search in order to be precisely tailored for real-world applications. |
811 | Meta Metric Learning for Highly Imbalanced Aerial Scene Classification | J. Guan, J. Liu, J. Sun, P. Feng, T. Shuai and W. Wang | In this paper, we propose a random finetuning meta metric learning model (RF-MML) to address this problem. |
812 | Synthetic Crowd and Pedestrian Generator for Deep Learning Problems | A. Khadka, P. Remagnino and V. Argyriou | In this paper, therefore, a graphics simulator is presented which automatically generates multi-model datasets in real-time providing the corresponding ground truth and annotation. |
813 | TOSO: Student's-T Distribution Aided One-Stage Orientation Target Detection in Remote Sensing Images | P. Feng, Y. Lin, J. Guan, G. He, H. Shi and J. Chambers | In this paper, a robust Student's-T distribution aided One-Stage Orientation detector, namely TOSO, is proposed to address orientation target detection in remote sensing images. |
814 | Improving Deep Learning Classification of JPEG2000 Images Over Bandlimited Networks | L. D. Chamain and Z. Ding | Considering limited network bandwidth, we propose an end-to-end deep learning framework to achieve faster and more accurate classification by directly training a deep CNN image classifier using the CDF 9/7 Discrete Wavelet Transformed (DWT) coefficients from j2k-compressed images without image reconstruction. |
815 | Augmented Grad-CAM: Heat-Maps Super Resolution Through Augmentation | P. Morbidelli, D. Carrera, B. Rossi, P. Fragneto and G. Boracchi | We present Augmented Grad-CAM, a general framework to provide a high-resolution visual explanation of CNN outputs. |
816 | BBA-NET: A Bi-Branch Attention Network For Crowd Counting | Y. Hou et al. | To this end, we propose a Bi-Branch Attention Network (BBA-NET) for crowd counting, which has three innovation points. |
817 | Deep Metric Learning Based On Center-Ranked Loss for Gait Recognition | J. Su, Y. Zhao and X. Li | Therefore, in this paper, a novel loss named Center-ranked is proposed to integrate all positive and negative samples information. |
818 | Channel Attention Based Generative Network for Robust Visual Tracking | Y. Hu, H. Xuan, J. Yang and Y. Yan | In this paper, we propose a novel real-time Channel Attention based Generative Network (AGSNet) for Robust Visual Tracking. |
819 | Cross-VAE: Towards Disentangling Expression from Identity For Human Faces | H. Wu, J. Jia, L. Xie, G. Qi, Y. Shi and Q. Tian | In this paper, we propose to learn clearly disentangled and discriminative features that are invariant of identities for expression recognition. |
820 | Enhance Part-Based Model for Person Re-Identification with Fused Multi-Scale Features | X. Lin, Y. Yang and Z. Niu | To address the part-misalignment problem and learn a more discriminative embedding for person Re-ID, we propose a novel Part-based model with fused Multi-Scale features (PMS), which innovatively upscales the low-layer features by using UpShuffle Modules and smoothly integrates the high-layer features. |
821 | Text-To-Image Synthesis Method Evaluation Based On Visual Patterns | W. L. Sommer and A. Iosifidis | In this paper, we introduce an evaluation metric and a visual evaluation method allowing for the simultaneous estimation of the realism, variety and semantic accuracy of generated images. |
822 | Detection of Mild Dyspnea from Pairs of Speech Recordings | S. Boelders, V. S. Nallanthighal, V. Menkovski and A. Härmä | In this paper, we explore techniques of detecting mild dyspnea directly from conversational speech, for example, in a telehealth application. |
823 | A Hybrid Model for Bipolar Disorder Classification from Visual Information | N. Abaei and H. A. Osman | Our goal is to classify the condition of patients suffering from BD into the clinically significant states of remission, hypo-mania, and mania. |
824 | Automatic Event Detection of REM Sleep Without Atonia From Polysomnography Signals Using Deep Neural Networks | P. Wallis, D. Yaeger, A. Kain, X. Song and M. Lim | We develop a novel, efficient, and objective method using deep learning to detect RSWA events from polysomnography signals using a large cohort of 692 patients. |
825 | A Deep Learning Architecture for Epileptic Seizure Classification Based on Object and Action Recognition | T. Karácsony, A. M. Loesch-Biffar, C. Vollmar, S. Noachtar and J. P. S. Cunha | In this paper an end-to-end deep learning approach is proposed for binary classification of Frontal vs. Temporal Lobe Epilepsies based solely on seizure videos. |
826 | Transforming Seismocardiograms Into Electrocardiograms by Applying Convolutional Autoencoders | M. Haescher, F. Höpfner, W. Chodan, D. Kraft, M. Aehnelt and B. Urban | Heart.AI presents a fundamentally new approach, transforming motion-based seismocardiograms into electrocardiograms interpretable by cardiologists. |
827 | Improved Nearest Neighbor Density-Based Clustering Techniques with Application to Hyperspectral Images | C. Cariou, K. Chehdi and S. Le Moan | In this paper, we propose some improvements of recently published methods in this vein, namely GWENN (Graph WatershEd using Nearest Neighbors) as well as a KNN version of Density Peaks Clustering. |
828 | Object Surface Estimation from Radar Images | O. Bialer, D. Shapiro and A. Jonas | In this paper we develop a deep neural network (DNN) method for estimating the object surface from a 2D radar image (azimuth-range). |
829 | Counting Dense Objects in Remote Sensing Images | G. Gao, Q. Liu and Y. Wang | In this paper, we are interested in counting dense objects from remote sensing images. |
830 | HPRNN: A Hierarchical Sequence Prediction Model for Long-Term Weather Radar Echo Extrapolation | J. Jing, Q. Li, X. Peng, Q. Ma and S. Tang | In this paper, to meet the demand for long-term extrapolation in actual forecasting practice, we propose a hierarchical prediction recurrent neural network (HPRNN) for long-term radar echo extrapolation. |
831 | Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction | P. Castro, A. Armagan and T. Kim | In order to achieve this, we propose a learning framework in which a Graph Convolutional Neural Network reconstructs a Pose Conditioned 3D mesh of the object. |
832 | CPWC: Contextual Point Wise Convolution for Object Recognition | P. Mazumder, P. Singh and V. Namboodiri | We propose an alternative design for pointwise convolution, which uses spatial information from the input efficiently. |
833 | Electric Analog Circuit Design with Hypernetworks And A Differential Simulator | M. Rotman and L. Wolf | In this work, we make a significant step towards a fully automatic design method that is based on deep learning. |
834 | Multi-Task Learning Via SA-FPN and EJ-Head | F. Ni, Z. Luo, X. Cao, Z. Xu and Y. Yao | So we propose a unified head module named EJ-Head (Effective Joint Head) to combine two branches into one head, not only realizing the interaction between two tasks, but also enhancing the effectiveness of multi-task learning. |
835 | Differentiable Branching In Deep Networks for Fast Inference | S. Scardapane, D. Comminiello, M. Scarpiniti, E. Baccarelli and A. Uncini | In this paper, we consider the design of deep neural networks augmented with multiple auxiliary classifiers departing from the main (backbone) network. |
836 | Multi-Step Online Unsupervised Domain Adaptation | J. H. Moon, D. Das and C. S. G. Lee | In this paper, we address the Online Unsupervised Domain Adaptation (OUDA) problem, where the target data are unlabelled and arriving sequentially. |
837 | Self-Adaptive Feature Fool | X. Liu, Y. Bai, S. Xia and Y. Jiang | In this paper, we propose a novel method to craft UAPs in the absence of data, via adaptively perturbing mid-layer outputs of the CNN. |
838 | Multi-MotifGAN (MMGAN): Motif-Targeted Graph Generation And Prediction | A. Gamage, E. Chien, J. Peng and O. Milenkovic | We propose Multi-MotifGAN (MMGAN), a motif-targeted Generative Adversarial Network (GAN) that generalizes the benchmark NetGAN approach. |
839 | Federated Classification with Low Complexity Reproducing Kernel Hilbert Space Representations | M. Peifer and A. Ribeiro | In this work we propose a method for federated learning in which each agent learns a low complexity Reproducing kernel Hilbert space representation. |
840 | Maxpolynomial Division with Application To Neural Network Simplification | G. Smyrnis, P. Maragos and G. Retsinas | To that end, we introduce the process of Maxpolynomial Division, a geometric method which simulates division of polynomials in the max-plus semiring, while highlighting its key properties and noting its connection to neural networks. |
841 | Balanced Binary Neural Networks with Gated Residual | M. Shen, X. Liu, R. Gong and K. Han | In this paper, we attempt to maintain the information propagated in the forward process and propose Balanced Binary Neural Networks with Gated Residual (BBG for short). |
842 | A Geometric Approach for Unsupervised Similarity Learning | U. K. Dutta and C. Sekhar C | In this paper, we propose a novel, unsupervised metric learning approach, that learns a similarity metric without making use of class labels. |
843 | Gradient Delay Analysis in Asynchronous Distributed Optimization | H. Al-Lawati and S. C. Draper | In this work, we focus on the asynchronous implementation of gradient-based algorithms. |
844 | Sequential IoT Data Augmentation Using Generative Adversarial Networks | M. E. Tschuchnig, C. Ferner and S. Wegenkittl | This paper investigates the possibility of using GANs in order to augment sequential Internet of Things (IoT) data, with an example implementation that generates household energy consumption data with and without swimming pools. |
845 | Robust Rank Constrained Sparse Learning: A Graph-Based Method for Clustering | R. Liu, M. Chen, Q. Wang and X. Li | To solve this problem, a robust rank constrained sparse learning method is proposed in this paper. |
846 | Efficient Decoupled Neural Architecture Search by Structure And Operation Sampling | H. Lee, D. Kim and B. Han | We propose a novel neural architecture search algorithm via reinforcement learning by decoupling structure and operation search. |
847 | Weight Sharing and Deep Learning for Spectral Data | J. S. Larsen and L. Clemmensen | We propose a novel method to co-train deep convolutional neural networks for data sets of differing position specific data. |
848 | Complex Transformer: A Framework for Modeling Complex-Valued Sequence | M. Yang, M. Q. Ma, D. Li, Y. H. Tsai and R. Salakhutdinov | In this paper, we propose a Complex Transformer, which incorporates the transformer model as a backbone for sequence modeling; we also develop attention and encoder-decoder networks that operate on complex input. |
849 | High-Dimensional Neural Feature Using Rectified Linear Unit And Random Matrix Instance | A. M. Javid, A. Venkitaraman, M. Skoglund and S. Chatterjee | We design a ReLU-based multilayer neural network to generate a rich high-dimensional feature vector. |
850 | Projected Weight Regularization to Improve Neural Network Generalization | G. Zhang, K. Niwa and W. B. Kleijn | In this paper we propose a new technique, named projected weight regularization (PWR), to improve the generalization capacity of a DNN model. |
851 | Deep Clustering for Domain Adaptation | B. Gao, Y. Yang, H. Gouk and T. M. Hospedales | We address the heterogeneous domain adaptation task: adapting a classifier trained on data from one domain to operate on another domain that also has a different label space. |
852 | Deep Clustering with Concrete K-Means | B. Gao, Y. Yang, H. Gouk and T. M. Hospedales | We address the problem of simultaneously learning a k-means clustering and deep feature representation from unlabelled data, which is of interest due to the potential for deep k-means to outperform traditional two-step feature extraction and shallow clustering strategies. |
853 | Polarizing Front Ends for Robust CNNs | C. Bakiskan, S. Gopalakrishnan, M. Cekic, U. Madhow and R. Pedarsani | In this paper, we propose a bottom-up strategy for attenuating adversarial perturbations using a nonlinear front end which polarizes and quantizes the data. |
854 | Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers | S. K. Hanna, R. Bitar, P. Parag, V. Dasari and S. El Rouayheb | Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive k. |
855 | A Model of Double Descent for High-Dimensional Logistic Regression | Z. Deng, A. Kammoun and C. Thrampoulidis | We consider a model for logistic regression where only a subset of features of size p is used for training a linear classifier over n training samples. |
856 | Efficient Scene Text Detection with Textual Attention Tower | L. Zhang et al. | In this work, we propose an efficient and accurate approach to detect multi-oriented text in scene images. |
857 | A Hybrid Approach for Thermographic Imaging With Deep Learning | P. Kovács, B. Lehner, G. Thummerer, G. Mayr, P. Burgholzer and M. Huemer | We propose a hybrid method for reconstructing thermographic images by combining the recently developed virtual wave concept with deep neural networks. |
858 | Knowledge Enhanced Latent Relevance Mining for Question Answering | D. Wang, Y. Shen and H. Zheng | In this paper, we propose to integrate commonsense from the external knowledge graphs (KGs) into the answer selector through a knowledge-aware context-based attention mechanism. |
859 | Multi-Label Consistent Convolutional Transform Learning: Application to Non-Intrusive Load Monitoring | S. Singh, J. Maggu, A. Majumdar, E. Chouzenoux and G. Chierchia | In this work, we propose a supervised formulation for convolutional transform so as to address the multi-label classification problem. |
860 | Resilient Distributed Recovery of Large Fields | Y. Chen, S. Kar and J. M. F. Moura | We present a field recovery consensus+innovations type distributed algorithm that is resilient to measurement attacks, where an agent maintains and updates a local state based on its neighbors' states and its own measurement. |
861 | Training LSTM for Unsupervised Anomaly Detection Without A Priori Knowledge | Y. Cherdo, P. d. Kerret and R. Pawlak | In contrast, we propose a novel anomaly detector, coined as LSTM-Decomposed (LSTM-D), that does not require this normality knowledge. |
862 | Unsupervised Person Re-Identification Using Multi-Branch Feature Compensation Network and Link-Based Cluster Dissimilarity Metric | L. Pan, G. Qi, B. Guo and Y. Zhu | Therefore in this paper, a Multi-branch Feature Compensation Network (MFC-Net) is developed in which the significant parts of lower-layer features are learned and fused with high-layer feature as compensation. |
863 | Deep-SST-Eddies: A Deep Learning Framework to Detect Oceanic Eddies in Sea Surface Temperature Images | E. Moschos, O. Schwander, A. Stegner and P. Gallinari | We introduce a novel method that employs Deep Learning to detect eddy signatures on such input. |
864 | Interpretability-Guided Convolutional Neural Networks for Seismic Fault Segmentation | Z. Liu, C. Zhou, G. Hu and C. Song | To include domain knowledge to improve the interpretability of the CNN, we propose to jointly optimize the prediction accuracy and consistency between explanations of the neural network and domain knowledge. |
865 | Towards High-Performance Object Detection: Task-Specific Design Considering Classification and Localization Separation | J. U. Kim, S. T. Kim, E. S. Kim, S. Moon and Y. M. Ro | In this work, we simply modified layers of the existing object detection networks into three parts by considering such characteristics: lower-layer feature sharing part, layer separation part, and feature fusion part. |
866 | Anomaly Detection for Time Series Using VAE-LSTM Hybrid Model | S. Lin, R. Clark, R. Birke, S. Schönborn, N. Trigoni and S. Roberts | In this work, we propose a VAE-LSTM hybrid model as an unsupervised approach for anomaly detection in time series. |
867 | Hydranet: A Real-Time Waveform Separation Network | E. T. Kaspersen, T. Kounalakis and C. Erkut | In this paper, we propose a 1-D convolutional U-Net structure to separate waveform input. |
868 | Storing Digital Data Into DNA: A Comparative Study Of Quaternary Code Construction | M. Dimopoulou, M. Antonini, P. Barbry and R. Appuswamy | In this paper we present a comparative study of our work with the state of the art solutions, and show that our solution is competitive. |
869 | A New Multihypothesis Prediction Scheme for Compressed Video Sensing Reconstruction | S. Zheng, X. Zhang, J. Chen and Y. Kuo | To solve this problem, this paper proposes a new multihypothesis prediction scheme. |
870 | Bit Allocation for Multi-Task Collaborative Intelligence | S. R. Alvar and I. V. Bajic | In this paper, we propose the first bit allocation method for multi-stream, multi-task CI. |
871 | Motion Feedback Design for Video Frame Interpolation | M. Hu, L. Liao, J. Xiao, L. Gu and S. Satoh | This paper introduces a feedback-based approach to interpolate video frames involving small and fast-moving objects. |
872 | Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms | Y. Ohishi, A. Kimura, T. Kawanishi, K. Kashino, D. Harwath and J. Glass | We propose a trilingual semantic embedding model that associates visual objects in images with segments of speech signals corresponding to spoken words in an unsupervised manner. |
873 | Towards Pose-Invariant Lip-Reading | S. Cheng et al. | Specifically, we propose to use a 3D Morphable Model (3DMM) to augment LRW, an existing large-scale but mostly frontal dataset, by generating synthetic facial data in arbitrary poses. |
874 | A Siamese Content-Attentive Graph Convolutional Network for Personality Recognition Using Physiology | H. Yang and C. Lee | In this work, we propose a novel Siamese Content-Attentive Graph Convolutional Network (SCA-GCN) to learn a discriminative physiology representation jointly guided by the actual video content of the emotional stimuli. |
875 | Self-Supervised Learning for Audio-Visual Speaker Diarization | Y. Ding, Y. Xu, S. Zhang, Y. Cong and L. Wang | In this paper, we propose a self-supervised audio-video synchronization learning method to address the problem of speaker diarization without massive labeling effort. |
876 | What Makes the Sound?: A Dual-Modality Interacting Network for Audio-Visual Event Localization | J. Ramaswamy | In this paper, we address the problem of audio-visual event localization where the goal is to identify the presence of an event that is both audible and visible in a video, using fully or weakly supervised learning. |
877 | Attentional Fused Temporal Transformation Network for Video Action Recognition | K. Yang et al. | Focusing on discriminative spatiotemporal feature learning, we propose the Attentional Fused Temporal Transformation Network (AttnTTN) for action recognition on top of the popular Temporal Segment Network (TSN) framework. |
878 | Deep Product Quantization Module for Efficient Image Retrieval | M. Liu, Y. Dai, Y. Bai and L. Duan | We propose a simple but effective deep Product Quantization Module (PQM) to jointly learn discriminative codebook and precise hard assignment in an end-to-end manner. |
879 | The Open Brands Dataset: Unified Brand Detection and Recognition at Scale | X. Jin, W. Su, R. Zhang, Y. He and H. Xue | In order to tackle these problems, we first define the special issues of brand detection and recognition compared with generic object detection. Second, a novel brands benchmark called “Open Brands” is established. |
880 | Spectrogram Analysis Via Self-Attention for Realizing Cross-Model Visual-Audio Generation | H. Tan, G. Wu, P. Zhao and Y. Chen | In this paper, the Self-Attention mechanism is applied to cross-modal visual-audio generation for the first time. |
881 | DGAN: Disentangled Representation Learning for Anisotropic BRDF Reconstruction | Z. Hu, X. Wang and Q. Wang | In this paper, we present a novel deep architecture, Disentangled Generative Adversarial Network (DGAN), which performs anisotropic Bidirectional Reflectance Distribution Function (BRDF) reconstruction from single BRDF subspace with the maximum entropy. |
882 | APB2FACE: Audio-Guided Face Reenactment with Auxiliary Pose and Blink Signals | J. Zhang, L. Liu, Z. Xue and Y. Liu | To solve those problems, we propose a novel deep neural network named APB2Face, which consists of GeometryPredictor and FaceReenactor modules. |
883 | Motion Dynamics Improve Speaker-Independent Lipreading | M. Riva, M. Wand and J. Schmidhuber | We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. |
884 | Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering | L. Shi et al. | To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously. |
885 | Linear Model-Based Intra Prediction in VVC Test Model | R. Ghaznavi-Youvalari | This paper studies a new intra prediction method based on a linear model for improving the intra prediction performance of Versatile Video Coding (H.266/VVC) standard. |
886 | Intra Frame Rate Control for Versatile Video Coding with Quadratic Rate-Distortion Modelling | Y. Chen, S. Kwong, M. Zhou, S. Wang, G. Zhu and Y. Wang | This paper proposes a new quadratic R-D model for Versatile Video Coding. |
887 | Performance Comparison of Lossless Compression Strategies for Dynamic Vision Sensor Data | K. Iqbal, N. Khan and M. G. Martini | We compare the performance of a number of strategies, including the only strategy developed specifically for such data and other more generic data compression strategies, tailored here to the case of neuromorphic data. |
888 | Encoder-Recurrent Decoder Network for Single Image Dehazing | A. Dang, T. H. Vu and J. Wang | This paper develops a deep learning model, called Encoder-Recurrent Decoder Network (ERDN), which recovers the clear image from a degraded hazy image without using the atmospheric scattering model. |
889 | Denoising of Event-Based Sensors with Spatial-Temporal Correlation | J. Wu, C. Ma, X. Yu and G. Shi | In this paper, we introduce a novel event stream denoising method for such sensors. |
890 | A Visual-Pilot Deep Fusion for Target Speech Separation in Multitalker Noisy Environment | Y. Li, Z. Liu, Y. Na, Z. Wang, B. Tian and Q. Fu | In this paper, we deploy face tracking and propose the low-dimension hand-crafted visual features and the low-cost deep fusion architectures to separate the unseen but visible target sources in multi-talker noisy environment. |
891 | C3DVQA: Full-Reference Video Quality Assessment with 3D Convolutional Neural Network | M. Xu, J. Chen, H. Wang, S. Liu, G. Li and Z. Bai | In this paper, we present a novel architecture, namely C3DVQA, that uses Convolutional Neural Network with 3D kernels (C3D) for full-reference VQA task. |
892 | Exploring Entity-Level Spatial Relationships for Image-Text Matching | Y. Xia, L. Huang, W. Wang, X. Wei and J. Chen | To this end, we utilize the relative position of objects to capture entity-level spatial relationships for image-text matching. |
893 | A Deep Multimodal Approach for Map Image Classification | T. Sawada and M. Katsurai | Specifically, we present a novel strategy for preprocessing text data that are positioned inside the map images, which are extracted using OCR. |
894 | Selective Convolutional Network: An Efficient Object Detector with Ignoring Background | H. Ling, Y. Qin, L. Zhang, Y. Shi and P. Li | Therefore, we introduce an efficient object detector called Selective Convolutional Network (SCN), which selectively calculates only on the locations that contain meaningful and conducive information. |
895 | Back-And-Forth Prediction for Deep Tensor Compression | H. Choi, R. A. Cohen and I. V. Bajic | In this paper we present a prediction scheme called Back-and-Forth (BaF) prediction, developed for deep feature tensors, which allows us to dramatically reduce tensor size and improve its compressibility. |
896 | Effective Pipeline for Compressing Deep Object Detectors | Y. Yao, Z. Fang, B. Dong and S. Zhou | To alleviate the deployment of deep object detectors with large model capacity and complex computation, an effective model compression pipeline is designed in this paper. |
897 | Gated Mechanism for Attention Based Multi Modal Sentiment Analysis | A. Kumar and J. Vepa | In this paper, we address three aspects of multimodal sentiment analysis: (1) cross-modal interaction learning, i.e., how multiple modalities contribute to the sentiment; (2) learning long-term dependencies in multimodal interactions; and (3) fusion of unimodal and cross-modal cues. |
898 | Multitask Learning and Multistage Fusion for Dimensional Audiovisual Emotion Recognition | B. T. Atmaja and M. Akagi | This paper proposes two methods to predict emotional attributes from audio and visual data using a multitask learning and a fusion strategy. |
899 | Object Detection and 3D Estimation Via an FMCW Radar Using a Fully Convolutional Network | G. Zhang, H. Li and F. Wenger | To ensure successful training of a fully convolutional network (FCN), we propose a normalization method, which we find essential to apply to the radar signal before feeding it into the neural network. |
900 | Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection | J. Roth et al. | In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker) which has been publicly released to facilitate algorithm development and comparison. |
901 | Supervised Deep Hashing for Efficient Audio Event Retrieval | A. Jati and D. Emmanouilidou | This work investigates the potency of different hashing techniques for efficient audio event retrieval. |
902 | An LSTM-Based Dynamic Chord Progression Generation System for Interactive Music Performance | C. Garoufis, A. Zlatintsi and P. Maragos | In this paper, we describe an interactive generative music system, designed to handle polyphonic guitar music. |
903 | Ensemble Network For Ranking Images Based On Visual Appeal | S. Singh, V. Sanchez and T. Guha | We propose a computational framework for ranking images (group photos in particular) taken at the same event within a short time span. |
904 | Trapezoidal Segment Sequencing: A Novel Approach for Fusion of Human-Produced Continuous Annotations | B. M. Booth and S. S. Narayanan | This work proposes trapezoidal segment sequencing, a new method for fusing annotations into a single representation that, when used alongside a recently proposed signal warping pipeline for correcting annotation artifacts, produces accurate ground truths. |
905 | Sequence-to-Sequence Labanotation Generation Based on Motion Capture Data | M. Li, Z. Miao and C. Ma | In this paper, we propose a sequence-to-sequence approach that can generate Labanotation scores from unsegmented motion data sequences. |
906 | Pose Refinement: Bridging the Gap Between Unsupervised Learning and Geometric Methods for Visual Odometry | L. Zhang, G. Li and T. H. Li | We propose a hybrid VO system which combines an unsupervised monocular VO with a pose refinement backend, performing optimization in real time, leading to boosted performance. |
907 | Multimodal Active Speaker Detection and Virtual Cinematography for Video Conferencing | R. Cutler et al. | We describe a new automated ASD and VC that performs within 0.3 MOS of an expert cinematographer based on subjective ratings with a 1-5 scale. |
908 | A New Variational Method for Deep Supervised Semantic Image Hashing | F. Zhuang and P. Moulin | We present a supervised semantic hashing method which uses a variational autoencoder to represent each database image sample as a product Bernoulli distribution. |
909 | DOA Estimation in Systems with Nonlinearities for MMWAVE Communications | A. Sant and B. D. Rao | We address the DOA estimation problem for a general nonlinear distortion of the received signal. |
910 | Wideband Direction of Arrival Estimation with Sparse Linear Arrays | F. Wang, Z. Tian, J. Fang and G. Leus | This paper concerns wideband direction of arrival (DoA) estimation with sparse linear arrays (SLAs). |
911 | Fourth Order Cumulant Based Active Direction of Arrival Estimation Using Coprime Arrays | Z. Fu, P. Chargé and Y. Wang | In this paper, we investigate the fourth order cumulant for active DOA estimation. |
912 | On Regularization Parameter for L0-Sparse Covariance Fitting Based DOA Estimation | A. Delmer, A. Ferréol and P. Larzabal | In this paper, we provide a statistical method for estimating an admissible interval in which λ must be chosen. |
913 | Effective Approximate Maximum Likelihood Estimation of Angles of Arrival for Non-Coherent Sub-Arrays | T. Tirer and O. Bialer | In this paper, we propose a technique to estimate the sub-arrays phase offsets for a given AOAs hypothesis, which facilitates approximate maximum likelihood estimation of the AOAs from a single snapshot. |
914 | Two-dimensional DOA Estimation for Coprime Planar Array: A Coarray Tensor-based Solution | H. Zheng, C. Zhou, Y. Gu and Z. Shi | To address this problem, we propose a novel coarray tensor-based two-dimensional underdetermined DOA estimation method for coprime planar array in this paper, where both the multi-dimensional information of the received signals and the augmented coarray are effectively utilized. |
915 | Multi-constraint Spectral Co-design for Colocated MIMO Radar and MIMO Communications | S. H. Dokhanchi, M. R. Bhavani Shankar, K. V. Mishra and B. Ottersten | The paper addresses the challenge of designing a waveform in MIMO-radar MIMO-communications (MRMC) set-up in a broadcast environment to ensure certain performance of the two systems is guaranteed. |
916 | Tensor Decomposition-based Beamspace ESPRIT Algorithm for Multidimensional Harmonic Retrieval | F. Wen, H. C. So and H. Wymeersch | In this paper, we propose a beamspace tensor-ESPRIT for multidimensional HR. |
917 | Information Theoretic Approach for Waveform Design in Coexisting MIMO Radar and MIMO Communications | M. Alaee-Kerahroodi, S. M. R. Bhavani, K. V. Mishra and B. Ottersten | We investigate waveform design for coexistence between a multiple-input multiple-output (MIMO) radar and MIMO communications (MRMC), with a radar-centric criterion that leads to minimal interference in the communications system. |
918 | Transmit Beampattern Shaping via Waveform Design in Cognitive MIMO Radar | E. Raei, M. Alaee-Kerahroodi, B. S. M. R. and B. Ottersten | This paper is focused on designing a set of constant modulus waveforms for cognitive Multiple-Input Multiple-Output (MIMO) radar systems. |
919 | Multilinear Generalized Singular Value Decomposition (ML-GSVD) with Application to Coordinated Beamforming in Multi-user MIMO Systems | L. Khamidullina, A. L. F. de Almeida and M. Haardt | In this paper, we propose a new Multilinear Generalized Singular Value Decomposition (ML-GSVD) which allows a set of matrices with one common dimension to be jointly factorized. |
920 | Sparse Low-redundancy Linear Array with Uniform Sum Co-array | R. Rajamäki and V. Koivunen | In this paper, we introduce a parametric sparse linear array configuration called the Kløve array (KA). |
921 | Compressed Sensing Based Channel Estimation and Open-loop Training Design for Hybrid Analog-digital Massive MIMO Systems | K. Ardah, B. Sokal, A. L. F. de Almeida and M. Haardt | In this paper, we propose an openloop hybrid analog-digital beam-training framework, where a given sensing matrix is decomposed into analog and digital beamformers. |
922 | Dispersive Grid-free Orthogonal Matching Pursuit for Modal Estimation in Ocean Acoustics | T. Paviet-Salomon, C. Dorffer, J. Bonnel, B. Nicolas, T. Chonavel and A. Drémeau | In the continuation of previous works, we propose here a new grid-free algorithm allowing a super-resolution of the (f-k) diagram by benefiting from the sparse nature of the wavenumber spectrum and embedding the broadband behavior of the wavenumbers within the algorithm. |
923 | Greedy Sparse Array Design for Optimal Localization under Spatially Prioritized Source Distribution | Y. Gershon, Y. Buchris and I. Cohen | We introduce a new measure called power spread, which quantifies the localization error region. Then, we propose a greedy algorithm for a sparse array design, aiming to minimize the power spread for optimal error region in a given area of interest. |
924 | Compressive 2-d Off-grid DOA Estimation for Propeller Cavitation Localization | Y. Park and P. Gerstoft | This paper introduces compressive sensing (CS) based two-dimensional (2-D) off-grid direction-of-arrival (DOA) estimation approach which can output the azimuths and elevations of radiating sources for propeller tip vortex cavitation localization. |
925 | The Compressed Nested Array for Underdetermined DOA Estimation by Fourth-order Difference Coarrays | Y. Zhou, Y. Li, L. Wang, C. Wen and W. Nie | In this paper, a new sparse array structure, which further improves the degrees of freedom (DOFs) and enhances the DOA estimation performance, is proposed for fourth-order cumulant based direction of arrival (DOA) estimation. |
926 | Model Order Selection in DoA Scenarios via Cross-entropy Based Machine Learning Techniques | A. Barthelme, R. Wiesmayr and W. Utschick | In this paper, we present a machine learning approach for estimating the number of incident wavefronts in a direction of arrival scenario. |
927 | Unsupervised Change Detection for Multimodal Remote Sensing Images via Coupled Dictionary Learning and Sparse Coding | V. Ferraris, N. Dobigeon, Y. Cavalcanti, T. Oberlin and M. Chabert | This paper proposes a coupled dictionary learning strategy to detect changes between two images with different modalities and possibly different spatial and/or spectral resolutions. |
928 | Fast Direction-of-arrival Estimation of Multiple Targets Using Deep Learning and Sparse Arrays | G. K. Papageorgiou and M. Sellathurai | In this work, we focus on improving the Direction-of-Arrival (DoA) estimation of multiple targets/sources from a small number of snapshots. |
929 | Learning Based Reconfigurable Sub-nyquist Sampling Framework for Ultra-wideband Angular Sensing | H. Joshi, M. Alaee-Kerahroodi, A. A. Kumar, B. Shankar Mysore R and S. J. Darak | In this work, an intelligent and reconfigurable ultra-wideband angular sensing (UWAS) framework is proposed which is independent of the maximum number of active transmissions in a wideband spectrum unlike the existing UWAS methods. |
930 | Raw Waveform Based End-to-end Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources | H. Sundar, W. Wang, M. Sun and C. Wang | In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. |
931 | DNN-based Mask Estimation Integrating Spectral and Spatial Features for Robust Beamforming | C. Deng, H. Song, Y. Zhang, Y. Sha and X. Li | In this paper, we propose a novel spectral-spatial mask based beamforming method for two-channel noisy signals, where spectral amplitude and cross-channel spatial features are integrated to improve mask estimation. |
932 | Ray Separation and Source Depth Estimation Based on Sound Pressure Field Transformation | R. Wei, X. Ma and X. Li | In this paper, we address the issue of submerged sound source depth estimation in the deep ocean using the idea of multipath ray separation. |
933 | Regularized Beamformer for the Spherical Microphone Array to Cope with the White Noise Amplification | L. Wang and J. Zhu | This paper presents two regularization methods to optimize the performance of the high directivity beamformer for the spherical microphone array. |
934 | Interpolation and Range Extrapolation of Sound Source Directivity Based on a Spherical Wave Propagation Model | J. Ahrens and S. Bilbao | We revisit the previously proposed approach of only using the angle-dependent magnitude of the measured directivity together with a spherical-wave propagation model and demonstrate its potential by means of numerical simulations based on two case studies. |
935 | Polarization Parameters Estimation with Scalar Sensor Arrays | M. Dai, X. Ma, W. Liu and W. Sheng | The polarization sensitivity model of an SSA is first established and then as an example, a dimension-reduction method based on multiple signal classification (MUSIC) is employed to jointly estimate the direction-of-arrival and polarization parameters. |
936 | DNN-based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays | N. Furnon, R. Serizel, I. Illina and S. Essid | In this context, we propose to extend the distributed adaptive node-specific signal estimation approach to a neural network framework. |
937 | Asymptotically Optimal Blind Calibration of Acoustic Vector Sensor Uniform Linear Arrays | A. Weiss, B. Nadler and A. Yeredor | We study the blind calibration problem of uniform linear arrays of acoustic vector sensors for narrowband Gaussian signals, and propose an improved, asymptotically optimal blind calibration scheme. |
938 | Optimized Sensor Selection for Joint Radar-communication Systems | A. Ahmed, S. Zhang and Y. D. Zhang | In this paper, we address optimal sensor selection for adaptive beamforming-based JRC systems by exploiting a constrained re-weighted l1-norm minimization with low computational complexity. |
939 | Frequency Diverse Array Radar: A Closed-Form Solution to Design Weights for Desired Beampattern | M. Zubair, S. Ahmed and M. Alouini | In this work, a novel algorithm with low complexity is proposed to focus the transmitted power of an FDA radar in the desired region-of-interest for longer dwell time. |
940 | Robust CFAR Radar Detection Using a K-nearest Neighbors Rule | A. Coluccia, A. Fascista and G. Ricci | In particular, a novel interpretation of the well-known Kelly's and adaptive matched filter (AMF) detectors is provided in terms of decision region boundaries in a suitable feature space. Then, a new detector based on a feature vector that combines the two detection statistics is obtained by exploiting the k-nearest neighbors (KNN) approach. |
941 | Cramer-Rao Bound on DOA Estimation of Finite Bandwidth Signals Using a Moving Sensor | A. Arora, B. S. Mysore R and B. Ottersten | In this paper, we provide a framework for the direction of arrival (DOA) estimation using a single moving sensor and evaluate performance bounds on estimation. |
942 | Theoretical Analysis of Multi-Carrier Agile Phased Array Radar | T. Huang, N. Shlezinger, X. Xu, D. Ma, Y. Liu and Y. C. Eldar | In this paper, we theoretically analyze the range-Doppler recovery capabilities of CAESAR. |
943 | Convolutional Beamspace for Array Signal Processing | P. P. Vaidyanathan and P. Chen | A new type of beamspace for array processing is introduced called convolutional beamspace. |
944 | Efficient Estimation of Mixing Matrix Using a Two-sensor Array | Q. Yan, S. Sun, H. Zhang and G. Hua | Considering the computational cost in practice, this paper proposes an efficient mixing matrix estimation (MME) method using an easily configured two-sensor array. |
945 | Riemannian Geometry and Cramér-Rao Bound for Blind Separation of Gaussian Sources | F. Bouchard, A. Breloy, A. Renaux and G. Ginolhac | We consider the optimal performance of blind separation of Gaussian sources. |
946 | Beamformed Feature for Learning-based Dual-channel Speech Separation | H. Li, X. Zhang and G. Gao | This paper deals with the problem of separating target speech signal from reverberant and noisy environment with dual microphones, where the target speech comes from a predefined direction range. |
947 | Transmit Beamforming Design with Received-Interference Power Constraints: The Zero-Forcing Relaxation | E. Lagunas, A. Pérez-Neira, M. A. Lagunas and M. A. Vazquez | In this paper, we study in detail an alternative transmit beamforming design framework, where we allow some residual received-interference power instead of trying to null it out completely. |
948 | Foreground Signature Extraction for an Intimate Mixing Model in Hyperspectral Image Classification | J. Hollis, R. Raich, J. Kim, B. Fishbain and S. Kendler | We introduce a framework for foreground signature extraction based on a proposed patch model. |
949 | Pixel-Wise Linear/Nonlinear Nonnegative Matrix Factorization for Unmixing of Hyperspectral Data | F. Zhu, P. Honeine and J. Chen | In this work, we propose an unsupervised nonlinear unmixing method that overcomes these weaknesses. |
950 | Beam Elimination Based on Sequentially Estimated a Posteriori Probabilities of Winning | M. K. Marandi, W. Rave and G. Fettweis | In this paper, we refine the elimination process by introducing a new elimination mechanism based on estimated winning probability i.e. probability of being the strongest candidate for each beam at each time step. |
951 | Two-Element Biomimetic Antenna Array Design and Performance | R. J. Kozick, F. T. Dagefu and B. M. Sadler | The optimum coupling network that minimizes the Cramer-Rao bound (CRB) for angle of arrival (AOA) estimation is derived and a design method is presented to synthesize the network. |
952 | Distributed Equalization and Power Allocation For Multi-Carrier Bidirectional Filter-and-Forward Relay Networks | S. KianiHarchegani, S. ShahbazPanahi, M. Dong and G. Boudreau | We propose a novel semi-closed form sub-optimal solution to this non-convex joint optimization problem. |
953 | Upscaling Vector Approximate Message Passing | N. Skuratovs and M. Davies | In this paper we consider the problem of recovering a signal x of size N from noisy and compressed measurements y = Ax + w of size M, where the measurement matrix A is right-orthogonally invariant (ROI). |
954 | Atomic Norm Based Localization of Far-Field and Near-Field Signals with Generalized Symmetric Arrays | X. Wu, W. Zhu and J. Yan | In this paper, we propose a localization method for mixed FF and NF sources based on the generalized symmetric linear arrays, which include ULAs, Cantor array, Fractal array and many other SLAs. |
955 | A Novel Moving Sparse Array Geometry with Increased Degrees of Freedom | S. Li and X. Zhang | In this paper, we propose a novel moving sparse array geometry named dilated arrays (DAs) by extending the dilation of nested arrays to other linear array structures. |
956 | Clutter Identification Based on Sparse Recovery and L1-Type Probabilistic Distance Measures | Y. Zhu, Y. Xiang, S. Sen, E. Dagois, A. Nehorai and M. Akcakaya | We form a dictionary of various clutter distributions and identify the distribution of the new clutter data through matching pursuit using probabilistic similarity and distance measures under sparsity constraints. |
957 | Adaptive Subspace Detectors for off-grid Mismatched Targets | J. Bosse, O. Rabaste and J. Ovarlez | We propose here to use adaptive subspace detectors to solve this issue. A suitable subspace (that coincides with the Discrete Prolate Spheroidal Sequences basis when the signal model is that of sinusoids in noise) is proposed that offers robust performance. |
958 | A Method for Millimeter-Wave Imaging of Concealed Objects Via De-Aliasing | W. Wang and K. Yang | In this paper, we propose a new imaging method by de-aliasing. |
959 | A Fast Sparse Covariance-Based Fitting Method for DOA Estimation via Non-Negative Least Squares | C. Zheng, H. Chen and A. Wang | A fast sparse covariance-based fitting algorithm with the non-negative least squares (NNLS) form is proposed for the direction of arrival (DOA) estimation. |
960 | Extended Cyclic Coordinate Descent for Robust Row-Sparse Signal Reconstruction in the Presence of Outliers | H. Huang, H. C. So and A. M. Zoubir | We propose an extended CCD algorithm to solve the problem for complex-valued measurements, which requires careful characterization and derivation. |
961 | Low-Rank Toeplitz Matrix Estimation Via Random Ultra-Sparse Rulers | H. Lawrence, J. Li, C. Musco and C. Musco | In this work, we introduce random ultra-sparse rulers and propose an improved approach based on these objects. |
962 | Urtis: A Small 3D Imaging Sonar Sensor for Robotic Applications | T. Verellen, R. Kerstens, D. Laurijssen and J. Steckel | In this paper we will discuss the newest 3D in-air sonar sensor developed by CoSys-Lab: the micro Real Time Imaging Sonar (µRTIS). |
963 | A Partial Relaxation DOA Estimator Based on Orthogonal Matching Pursuit | M. Trinh-Hoang, W. Ma and M. Pesavento | By adopting the principle of the partial relaxation approach, in this paper, a modification of the Orthogonal Matching Pursuit algorithm is proposed and applied to the Direction-of-Arrival estimation problem. |
964 | Variable Projection for Multiple Frequency Estimation | Y. E. Garcia, P. Kovács and M. Huemer | In this paper, such a formulation is derived, and a variable projection (VP) optimization is proposed for solving the SNLLS problem and estimating the frequency parameters. |
965 | FusionNDVI: A Novel Fusion Method for NDVI in Remote Sensing | M. Zhang, Z. Zhao, Y. Chen, Z. Wang and X. Tian | The fusion problem is formulated to minimize a least square fitting error term and a nonlocal gradient sparsity regularization term. |
966 | Robust MUSIC Estimation Under Array Response Uncertainty | A. Bazzi | We introduce a MUSIC estimator, hereby referred to as Robust-MUSIC, that is capable of estimating AoAs of multiple signals, when the antenna array's response is subject to uncertainties, due to the aforementioned reasons. |
967 | L1-Norm Higher-Order Orthogonal Iterations for Robust Tensor Analysis | D. G. Chachlakis, A. Prater-Bennette and P. P. Markopoulos | To counteract the impact of outliers in tensor data analysis, we propose L1-Tucker: a reformulation of standard Tucker decomposition, resulting by simple substitution of the outlier-responsive L2-norm by the sturdier L1-norm. |
968 | Sensor Selection for Model-Free Source Localization: where Less is More | E. Tohidi, J. Chen and D. Gesbert | In this paper, we demonstrate that instead of employing all the sensors that result in a possibly unbalanced sensing pattern, it is better to reduce the number of sensors such that the subset of selected sensors symmetrically distributes around the source, which in principle would need to know the source location in advance. |
969 | Anomaly Detection with Training Data in Hyperspectral Imagery | J. Liu, Y. Feng, W. Liu, D. Orlando and H. Li | In this paper, we investigate the anomaly detection problem for multi-pixel targets in hyperspectral imagery when training data are available. |
970 | Least-Squares DOA Estimation with an Informed Phase Unwrapping and Full Bandwidth Robustness | A. Bohlender, A. Spriet, W. Tirry and N. Madhu | In this contribution, we focus on both of the described shortcomings. |
971 | Joint Blind Calibration and Time-Delay Estimation for Multiband Ranging | T. Kazaz, M. Coutino, G. J. M. Janssen and A. van der Veen | In this paper, we focus on the problem of blind joint calibration of multiband transceivers and time-delay (TD) estimation of multipath channels. |
972 | Upgrade Methods for Stratified Sensor Network Self-Calibration | M. Larsson, G. Flood, M. Oskarsson and K. Åström | In this paper, new efficient algorithms for solving for the upgrade parameters using minimal data are presented. |
973 | Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT | F. Grondin, H. Tang and J. Glass | This paper proposes a straightforward 2-D method to spatially calibrate the visual field of a camera with the auditory field of an array microphone by generating and overlaying an acoustic image over an optical image. |
974 | Weighted Null Vector Initialization and its Application to Phase Retrieval | K. Liu, L. Li and L. Wan | This paper introduces a simple method, called weighted null vector initialization (WNI), which can be used to compute accurate initialization vectors for solving non-convex phase retrieval. |
975 | Low-Complexity Accurate mmWave Positioning for Single-Antenna Users Based on Angle-of-Departure and Adaptive Beamforming | A. Fascista, A. Coluccia, H. Wymeersch and G. Seco-Granados | Based on this analysis, a low-complexity two-step algorithm with improved localization performance is proposed, which first performs a (coarse) angle of departure estimation and then precodes the down-link signal to introduce beamforming towards the user direction. |
976 | Accurate Localization of AUV in Motion by Explicit Solution Using Time Delays | T. Jia, K. C. Ho, H. Wang and X. Shen | A new time delay model that accounts for the motion is proposed for moving AUV localization. |
977 | DRSS-Based Localisation Using Weighted Instrumental Variables and Selective Power Measurement | J. Li, K. Doğançay, N. H. Nguyen and Y. Wei Law | This paper addresses the bias problem when the noise is large by using the method of selective power measurement to maintain correlation between the instrumental variable and data matrix. |
978 | A Simple and Efficient Iterative Method for TOA Localization | Y. Zou and H. Liu | This paper develops a simple and efficient method for source localization using signal time-of-arrival (TOA) measurements. |
979 | Distributed Tracking and Circumnavigation Using Bearing Measurements | A. Parayil and J. George | This paper is concerned with the problem of bearings based multi-agent circumnavigation of a maneuvering target. |
980 | Joint Multitarget Tracking and Dynamic Network Localization in the Underwater Domain | R. Mendrzik et al. | In contrast to commonly-used approaches which split the sensor localization and target tracking into two different sub-problems, we propose a holistic approach for joint localization and tracking. |
981 | Robust TDOA Indoor Tracking Using Constrained Measurement Filtering and Grid-Based Filtering | R. Huang, J. Tao, L. Yang, Y. Xue and Q. Wu | This paper presents an enhanced two-step approach to achieve robust TDOA indoor tracking. |
982 | Extended Object Tracking Using Hierarchical Truncation Measurement Model with Automotive Radar | Y. Xia et al. | Motivated by real-world automotive radar measurements that are distributed around object (e.g., vehicles) edges with a certain volume, a novel hierarchical truncated Gaussian measurement model is proposed to resemble the underlying spatial distribution of radar measurements. |
983 | DOA Tracking Via Signal-Subspace Projector Update | J. Zhuang, T. Tan, D. Chen and J. Kang | We develop a novel direction-of-arrival (DOA) tracking method in which we directly operate the signal-subspace projector instead of tracking the subspace eigenbasis. |
984 | Parameter Estimation of In-City Frontal Rainfall Propagation | M. Hadar, J. Ostrometzky and H. Messer | In this study, we demonstrate that standard signal-level measurements being collected by the network can be used to estimate the movement of an ongoing storm. |
985 | ML and EM Estimation of Sampling Intervals of Sensor Devices | R. Nishimura and Y. Suzuki | A precise method of estimating update interval of the register is discussed herein in conjunction with modeling of the problem. |
986 | Asymptotic Stochastic Analysis of Partially Relaxed DML | D. Schenck, X. Mestre and M. Pesavento | In this paper, we investigate the outlier production mechanism of the Partially Relaxed Deterministic Maximum Likelihood (PR-DML) Direction-of-Arrival estimator using tools from Random Matrix Theory. |
987 | Theoretical Performance Bound of Uplink Channel Estimation Accuracy in Massive MIMO | A. Osinsky, A. Ivanov and D. Yarotsky | In this paper, we present a new performance bound for uplink channel estimation (CE) accuracy in the Massive Multiple Input Multiple Output (MIMO) system. |
988 | Signal-Aware Broadband DOA Estimation Using Attention Mechanisms | W. Mack, U. Bharadwaj, S. Chakrabarty and E. A. P. Habets | To obtain a flexible signal-aware DOA estimator, we propose to use binary mask attention with a DNN for multi-source DOA estimation trained with artificial noise. |
989 | Static Visual Spatial Priors for DoA Estimation | P. Swietojanski and O. Miksik | In this paper, we propose what is, to our knowledge, the first multi-modal method for direction of arrival (DoA) estimation of sound, which uses a static visual spatial prior that provides auxiliary information about the environment to suppress some of the false DoA detections. |
990 | Mirrored Arrays for Direction-of-Arrival Estimation | D. Zhu, G. Li and X. Zhang | A mirrored array configuration, which consists of an ordinary linear array and a reflector, is proposed. |
991 | Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks | L. Comanducci, M. Cobos, F. Antonacci and A. Sarti | Inspired by deep-learning-based image denoising solutions, we propose in this paper the use of convolutional neural networks (CNNs) to learn the time-delay patterns contained in FS-GCCs extracted in adverse acoustic conditions. |
992 | Group-Utility Metric for Efficient Sensor Selection and Removal in LCMV Beamformers | A. M. Narayanan and A. Bertrand | In this paper we derive a generalized expression to efficiently calculate the utility of such groups as a whole, called the group-utility. |
993 | Accurate Semidefinite Relaxation Method for 3-D Rigid Body Localization Using AOA | G. Wang and K. C. Ho | This paper addresses the rigid body localization problem using angle-of-arrival measurements. We formulate the problem as a constrained weighted least squares (CWLS) minimization problem with the rotation matrix and position vector as variables, which is a challenging non-convex problem. |
994 | Cramér-Rao Bounds for Flaw Localization in Subsampled Multistatic Multichannel Ultrasound NDT Data | E. Pérez, J. Kirchhof, S. Semper, F. Krieg and F. Römer | In this paper, we analyze a compressed sensing scenario in which location parameters of point-like scatterers are estimated from subsampled FMC data. |
995 | Age of Information with Finite Horizon and Partial Updates | D. Ramirez, E. Erkip and H. V. Poor | Age of Information with Finite Horizon and Partial Updates |
996 | Robust Online Mirror Saddle-Point Method for Constrained Resource Allocation | E. Tampubolon and H. Boche | For this reason, we introduce a new performance measure, whose particular instance is the cumulative positive part of the constraint violations. |
997 | Real-Time Task Offloading for Large-Scale Mobile Edge Computing | Y. Xu, P. Cheng, Z. Chen, M. Ding, Y. Li and B. Vucetic | In this paper, we propose a novel index based real-time task offloading policy for an asynchronous large-scale MEC system. |
998 | Simple Caching Schemes for Non-homogeneous MISO Cache-Aided Communication via Convexity | I. Bergel and S. Mohajer | We present a novel scheme for cache-aided communication over multiple-input and single output (MISO) cellular networks. |
999 | Dynamic Resource Optimization and Altitude Selection in UAV-Based Multi-Access Edge Computing | F. Costanzo, P. D. Lorenzo and S. Barbarossa | The aim of this work is to develop a dynamic optimization strategy to allocate communication and computation resources in a Multi-access Edge Computing (MEC) scenario, where Unmanned Aerial Vehicles (UAVs) act as flying base station platforms endowed with computation capabilities to provide edge cloud services on demand. |
1000 | Joint Resource Allocation and Routing for Service Function Chaining with In-Subnetwork Processing | N. Reyhanian, H. Farmanbar, S. Mohajer and Z. Luo | In this paper, we study the joint problem of VNF placement on servers and traffic engineering for a network spanning multiple subnetworks. |
1001 | Online Channel Estimation for Hybrid Beamforming Architectures | J. Fink, R. L. G. Cavalcante and S. Stanczak | In this paper, we use variants of the adaptive projected subgradient method to devise online estimation algorithms that exploit temporal correlations of channel samples. |
1002 | An Optimal Channel Estimation Scheme for Intelligent Reflecting Surfaces Based on a Minimum Variance Unbiased Estimator | T. L. Jensen and E. De Carvalho | In a wireless system with Intelligent Reflective Surfaces (IRS) containing many passive elements, we consider the problem of channel estimation. |
1003 | Low-Rank mmWave MIMO Channel Estimation in One-Bit Receivers | N. J. Myers, K. N. Tran and R. W. Heath | In this paper, we develop channel estimation algorithms that exploit the low-rank property of mmWave channels. |
1004 | Channel Charting: an Euclidean Distance Matrix Completion Perspective | P. Agostini, Z. Utkovski and S. Stanczak | In this paper, we introduce a correlation matrix distance (CMD) based dissimilarity measure for CC that allows us to group CSI measurements according to their co-linearity. |
1005 | MMSE-Based Channel Estimation for Hybrid Beamforming Massive MIMO with Correlated Channels | J. Mirzaei, F. Sohrabi, R. Adve and S. ShahbazPanahi | In this paper, we study the channel estimation problem in microwave correlated massive multiple-input-multiple-output systems with reduced number of radio-frequency chains. |
1006 | Complex Trainable ISTA for Linear and Nonlinear Inverse Problems | S. Takabe, T. Wadayama and Y. C. Eldar | In this paper, we propose a trainable iterative signal recovery algorithm named complex-field TISTA (C-TISTA) which treats complex-field nonlinear inverse problems. |
1007 | Conditional Mutual Information Neural Estimator | S. Molavipour, G. Bassi and M. Skoglund | Motivated by the fact that, in many communication schemes, the achievable transmission rate is determined by a conditional mutual information term, this paper focuses on neural-based estimators for this information-theoretic quantity. |
1008 | Q-Learning Based Predictive Relay Selection for Optimal Relay Beamforming | A. Dimas, K. Diamantaras and A. P. Petropulu | This paper considers a scenario of relay beamforming, in which relays collaboratively retransmit the source signal so that they maximize the average signal-to-interference+noise ratio (SINR) at the destination. |
1009 | Peer To Peer Offloading With Delayed Feedback: An Adversary Bandit Approach | M. Yang, H. Zhu, H. Wang, Y. Koucheryavy, K. Samouylov and H. Qian | In this paper, we consider a peer computation offloading problem for a fog network with unknown dynamics. |
1010 | Transferable Policies for Large Scale Wireless Networks with Graph Neural Networks | M. Eisen and A. Ribeiro | To learn a transferable policy that can generalize to varying and growing networks, we propose the use of so-called random edge graph neural networks (REGNNs). |
1011 | A Zeroth-Order Learning Algorithm for Ergodic Optimization of Wireless Systems with no Models and no Gradients | D. S. Kalogerias, M. Eisen, G. J. Pappas and A. Ribeiro | Under a modular stochastic functional optimization framework, we propose a new zeroth-order stochastic primal-dual algorithm for completely data-driven, model-free and gradient-free learning of optimal resource allocation policies for ergodic network optimization. |
1012 | Joint Sparse Recovery Using Deep Unfolding With Application to Massive Random Access | A. P. Sabulal and S. Bhashyam | We propose a learning-based joint sparse recovery method for the multiple measurement vector (MMV) problem using deep unfolding. |
1013 | Learning-Based Content Caching and User Clustering: A Deep Deterministic Policy Gradient Approach | K. Chan and F. Chien | We propose a new learning structure, termed multiDDPG (MDDPG), that demonstrates better EE performance while providing a comparable CHP to the caching scheme with known content popularity. |
1014 | Learning-Aided Content Placement in Caching-Enabled Fog Computing Systems Using Thompson Sampling | J. Zhu, X. Huang and Z. Shao | In this paper, we focus on the problem of online content placement with unknown content popularity in caching-enabled fog computing systems, i.e., how to decide and update cached content on resource-limited edge fog nodes to maximize cache hit rate and minimize switching costs of content update. |
1015 | Joint Coding and Modulation in the Ultra-Short Blocklength Regime for Bernoulli-Gaussian Impulsive Noise Channels Using Autoencoders | K. Vedula, R. Paffenroth and D. R. Brown | This paper develops a joint coding and modulation scheme for end-to-end communication system design using an autoencoder architecture in the ultra-short blocklength regime. |
1016 | Deep Joint Source-Channel Coding for Wireless Image Retrieval | M. Jankowski, D. Gündüz and K. Mikolajczyk | Motivated by surveillance applications with wireless cameras or drones, we consider the problem of image retrieval over a wireless channel. |
1017 | Meta-Learning to Communicate: Fast End-to-End Training for Fading Channels | S. Park, O. Simeone and J. Kang | In this paper, we propose to obviate the limitations of joint training via meta-learning: Rather than training a common model for all channels, meta-learning finds a common initialization vector that enables fast training on any channel. |
1018 | Complexity Reduction Methods for Index Modulation Based Dual-Function Radar Communication Systems | T. Huang, N. Shlezinger, X. Xu, Y. Liu and Y. C. Eldar | In this work we propose schemes for reducing the decoding complexity associated with IM-based DFRC systems. |
1019 | Equalization of OFDM Waveforms with Insufficient Cyclic Prefix | D. Gregoratti and X. Mestre | In this paper, a simple equalization strategy for OFDM waveforms is proposed that specifically targets the case where the cyclic prefix is insufficient to span the whole channel duration. |
1020 | Faster-Than-Nyquist Signaling Via Spatiotemporal Symbol-Level Precoding for Multi-User MISO Redundant Transmissions | W. A. Martins, D. Spano, S. Chatzinotas and B. Ottersten | We propose a framework for redundant block-based symbol-level precoders enabling the trade-off between constructive and destructive multi-user and interblock interference (IBI) effects at the single-antenna user terminals. |
1021 | Optimized Single Carrier Transceiver for Future Sub-TeraHertz Applications | S. Bicaïs, J. Doré, G. Gougeon and Y. Corre | In this paper, we address the design of a single carrier transceiver resilient to phase noise. |
1022 | Power Spectrum Optimization for Capacity of the Extended Spectrum Hybrid Fiber Coax Network | R. Strobel and T. Hewavithana | This work presents a new method of spectrum optimization for the coax channel up to 3GHz, taking transmitter distortion into account in the optimization. |
1023 | A Low-Latency Successive Cancellation Hybrid Decoder for Convolutional Polar Codes | Y. Wang et al. | With the observation that SCL and SCF decoding are similar at giving more chances to inspect possible codewords simultaneously or sequentially, a novel hybrid decoder is proposed in this paper, which essentially combines the ideas of SCF and SCL decoders. |
1024 | Near Capacity RCQD Constellations for PAPR Reduction of OFDM Systems | T. Arbi, I. Nasr and B. Geller | We investigate an optimized blind SeLected Mapping (SLM) algorithm to reduce the Peak-to-Average Power Ratio (PAPR) for Orthogonal Frequency Division Multiplexing (OFDM) systems with Signal Space Diversity (SSD). |
1025 | Fully Pipelined Iteration Unrolled Decoders: The Road to Tb/s Turbo Decoding | S. Weithoffer, R. Klaimi, C. A. Nour, N. Wehn and C. Douillard | In this paper, we present recent findings on the implementation of ultra-high throughput Turbo decoders. |
1026 | Zero-Crossing Precoding with Maximum Distance to the Decision Threshold for Channels with 1-Bit Quantization and Oversampling | D. M. V. Melo, L. T. N. Landau and R. C. de Lamare | In this study, we propose a novel waveform for bandlimited channels with 1-bit quantization and oversampling at the receivers. |
1027 | Achieving Fully-Digital Performance by Hybrid Analog/Digital Beamforming in Wide-Band Massive-MIMO Systems | A. Morsali and B. Champagne | In this paper, we study the realization of any given fully-digital precoder (FDP) by hybrid analog/digital precoding (HADP) in wide-band mmWave systems. |
1028 | Energy-Efficient Bit Allocation for Resolution-Adaptive ADC in Multiuser Large-Scale MIMO Systems: Global Optimality | K. Nguyen, Q. Vu, L. Tran and M. Juntti | We consider uplink multiuser wireless communications systems, where the base station (BS) receiver is equipped with a large-scale antenna array and resolution adaptive analog-to-digital converters (ADCs). |
1029 | Generalized Spatial Modulation for Wireless Terabits Systems Under Sub-THz Channel With RF Impairments | M. Saad, F. Bader, A. C. Al Ghouwayel, H. Hijazi, N. Bouhel and J. Palicot | The obtained results reveal that the QPSK-GSM system is the best combination compared to GSM systems with any other M-ary modulation scheme (e.g. PSK, DPSK, QAM, PAM). |
1030 | Efficient Techniques For In-Band System Information Broadcast in Multi-Cell Massive MIMO | J. Jayachandran, K. Biswas, S. K. Mohammed and E. G. Larsson | In this paper we consider joint beamforming of data to scheduled terminals (STs) and broadcast of system information (SI) to idle terminals (ITs) on the same time-frequency resource in multi-cell multi-user massive MIMO systems. |
1031 | Optimal Design of Energy-Efficient Cell-Free Massive MIMO: Joint Power Allocation and Load Balancing | T. V. Chien, E. Björnson and E. G. Larsson | To find a flexible and energy-efficient implementation, we minimize the total power consumption at the APs in the downlink, considering both the hardware and transmit powers, where APs can be turned off. |
1032 | Large-Scale Fading Precoding for Maximizing the Product of SINRs | Ö. T. Demir and E. Björnson | This paper considers the large-scale fading precoding design for mitigating the pilot contamination in the downlink of multi-cell massive MIMO (multiple-input multiple-output) systems. |
1033 | Proximal Distance Algorithm for Nonconvex QCQP with Beamforming Applications | Q. Li, Y. Liu, M. Shao and W. Ma | In light of this, this work aims at developing an efficient approach to the QCQP without the above mentioned drawbacks. |
1034 | Cloud-Driven Multi-Way Multiple-Antenna Relay Systems: Best-User-Link Selection and Joint MMSE Detection | F. L. Duarte and R. C. de Lamare | In this work, we present a cloud-driven uplink framework for multiway multiple-antenna relay systems which facilitates joint linear Minimum Mean Square Error (MMSE) symbol detection in the cloud and where users are selected to simultaneously transmit to each other aided by relays. |
1035 | A Complexity Efficient DMT-Optimal Tree Pruning Based Sphere Decoding | M. Neinavaie, M. Derakhtian and S. A. Vorobyov | We present a diversity multiplexing tradeoff (DMT) optimal tree pruning sphere decoding algorithm which visits merely a single branch of the search tree of the sphere decoding (SD) algorithm, while maintaining the DMT optimality at high signal to noise ratio (SNR) regime. |
1036 | A Model-Free Approach to Distributed Transmit Beamforming | J. George, C. T. Yilmaz, A. Parayil and A. Chakrabortty | This paper presents a model-free solution to distributed transmit beamforming using mobile agents. |
1037 | Intelligent Reflecting Surface for Massive Device Connectivity: Joint Activity Detection and Channel Estimation | S. Xia and Y. Shi | In this paper, we formulate the IRS-related activity detection and channel estimation problem as sparse matrix factorization, matrix completion and multiple measurement vector problems, and we propose a three-stage framework based on approximate message passing. |
1038 | Channel Covariance Estimation in Multiuser Massive MIMO Systems with an Approach Based on Infinite Dimensional Hilbert Spaces | R. L. G. Cavalcante and S. Stanczak | We propose a novel algorithm to estimate the channel covariance matrix of a desired user in multiuser massive MIMO systems. |
1039 | Eliminating Out-Of-Cell Interference in Cellular Massive MIMO with a Single Additional Transceiver | U. Erez and A. Leshem | The present work considers a line-of-sight channel model. It is shown that given a sufficiently large antenna array, it suffices that the number of transmit/receive chains exceeds the number of desired users by one in order to reduce the interference to any desired level by judiciously selecting the antenna elements. |
1040 | Favorable Propagation and Linear Multiuser Detection for Distributed Antenna Systems | R. Gholami, L. Cottatellucci and D. Slock | In this paper, we consider a wireless network with transmit and receive antennas distributed according to homogeneous point processes. |
1041 | Distributed Non-Orthogonal Pilot Design for Multi-Cell Massive MIMO Systems | Y. Wu, S. Med and Y. Gu | In this work, a distributed non-orthogonal pilot design approach is proposed to tackle the pilot contamination problem in multi-cell massive multiple input multiple output (MIMO) systems. |
1042 | Distributed Detection of Sparse Signals with 1-Bit Data in Two-Level Two-Degree Tree-Structured Sensor Networks | C. Li, G. Li and P. K. Varshney | In this paper, we present a new detector for the detection of sparse stochastic signals using 1-bit data in two-level two-degree tree-structured sensor networks (2L-2D TSNs). |
1043 | Objective Bayesian Detection Under Spatially Correlated Gaussian Observations for Multi-Antenna Cognitive Radio Network | M. H. Al-Ali and K. C. Ho | This paper develops an objective Bayesian detector for asserting the presence of primary user (PU) signal buried in additive noise/interference using a sequence of complex vector samples from a multi-antenna spectrum sensing system. |
1044 | A Gated Hypernet Decoder for Polar Codes | E. Nachmani and L. Wolf | In this work, we demonstrate how hypernetworks can be applied to decode polar codes by employing a new formalization of the polar belief propagation decoding scheme. |
1045 | Weighted Gradient Coding with Leverage Score Sampling | N. Charalambides, M. Pilanci and A. O. Hero | We present a novel weighted leverage score approach, that achieves improved performance for distributed gradient coding by utilizing an importance sampling. |
1046 | Low-Complexity 5G SLAM with CKF-PHD Filter | H. Kim, K. Granström, S. Kim and H. Wymeersch | This paper proposes a new implementation method for the 5G SLAM using message passing (MP) and the cubature Kalman filter (CKF). |
1047 | The Effect of Power Allocation on Visible Light Communication Using Commercial Phosphor-Converted LED Lamp for Indirect Illumination | A. A. Dowhuszko, M. Ilter, P. Pinho and J. Hämäläinen | In this paper, we estimate the data rate that is feasible with VLC when indirect illumination is used. |
1048 | Robust Transmission Over Channels with Channel Uncertainty: an Algorithmic Perspective | H. Boche, R. F. Schaefer and H. Vincent Poor | This paper approaches this question from a fundamental, algorithmic point of view to study whether or not such optimal schemes can be found algorithmically in principle (without putting any constraints on the computational complexity of such algorithms). |
1049 | Deep Joint Source-Channel Coding of Images with Feedback | D. B. Kurka and D. Gündüz | We consider wireless transmission of images in the presence of channel output feedback, by introducing an autoencoder-based deep joint source-channel coding (JSCC) scheme. |
1050 | A Learning Approach to Cooperative Communication System Design | Y. Lu, P. Cheng, Z. Chen, W. H. Mow and Y. Li | We present in this paper a Neural Network (NN)-based autoencoder (AE) approach to optimize its design. |
1051 | A Stacked-Autoencoder Based End-to-End Learning Framework for Decode-and-Forward Relay Networks | A. Gupta and M. Sellathurai | In this work, we study an end-to-end deep learning (DL)-based constellation design for decode-and-forward (DF) relay networks. |
1052 | A New Sampling Scheme for Distributed Blind Spectrum Sensing Using Energy Detectors | T. Wang, F. Chien and C. Hsieh | In this paper, we study the problem of blind spectrum sensing by exploring signal sampling at each cognitive radio (CR) in a distributed cognitive radio network. |
1053 | On Throughput of Millimeter Wave MIMO Systems with Low Resolution ADCs | A. Khalili, S. Shahsavari, F. Shirani, E. Erkip and Y. C. Eldar | In this paper, a receiver with low resolution ADCs based on adaptive thresholds is considered in downlink mmWave communications in which the channel state information is not known a-priori and acquired through channel estimation. |
1054 | Reliable and Secure Transmission for Future Networks | Y. Hua | This paper introduces a novel physical layer encryption method called randomized reciprocal channel modulation (RRCM) for reliable and secure transmission of information against eavesdropper (Eve) with any number of antennas and any noise level. |
1055 | On Polar Coding For Finite Blocklength Secret Key Generation Over Wireless Channels | H. Hentilä, Y. Y. Shkel, V. Koivunen and H. V. Poor | We consider the problem of secret key generation from correlated Gaussian random variables in the finite blocklength regime. |
1056 | An Enhanced Decoding Algorithm for Coded Compressed Sensing | V. K. Amalladinne, J. Chamberland and K. R. Narayanan | This article introduces an enhanced decoding algorithm for coded compressed sensing where fragment recovery and the stitching process are executed in tandem, passing information between them. |
1057 | Optimal Window Design for W-OFDM | K. Hussain and R. López-Valcarce | We present an optimal window design for windowed-OFDM minimizing OBR within a given frequency region. |
1058 | Computability of the Peak Value of Bandlimited Signals | H. Boche and U. J. Mönich | In this paper we study the peak value problem, i.e., the task of computing the peak value of a bandlimited signal from its samples. |
1059 | Joint Scheduling and Beamforming for Delay Sensitive Traffic with Priorities and Deadlines | I. Hadar and A. Leshem | In this paper we propose a simple scheduling algorithm which takes priorities and deadlines into account and allocates users dynamically to resource blocks and spatial beams according to the Tomlinson-Harashima precoder. |
1060 | Robust Hybrid Precoding For Interference Exploitation in Massive MIMO Systems | G. Hegde, C. Masouros and M. Pesavento | In this paper, we consider a multiuser massive MIMO system with hybrid analog-digital precoding architecture. |
1061 | State-Space Gaussian Process for Drift Estimation in Stochastic Differential Equations | Z. Zhao, F. Tronarp, R. Hostettler and S. Särkkä | We propose to formulate this as a non-parametric Gaussian process regression problem and use an Ito-Taylor expansion for approximating the SDE. |
1062 | Computing Hilbert Transform and Spectral Factorization for Signal Spaces of Smooth Functions | H. Boche and V. Pohl | This paper characterizes for both operations precisely those signal spaces of differentiable functions for which such an effective control of the approximation error is possible. |
1063 | M-Estimators of Scatter with Eigenvalue Shrinkage | E. Ollila, D. P. Palomar and F. Pascal | In this paper, a more general approach is considered in which the SCM is replaced by an M-estimator of scatter matrix and a fully automatic data adaptive method to compute the optimal shrinkage parameter with minimum mean squared error is proposed. |
1064 | A Multitaper Reassigned Spectrogram for Increased Time-Frequency Localization Precision | M. Sandsten, I. Reinhold and R. Anderson | In this paper we propose a multitaper reassignment (mtRS) method for estimation of time- and frequency locations of Gaussian envelope transients. |
1065 | Stochastic ML Estimation for Hyperspectral Unmixing Under Endmember Variability and Nonlinear Models | Y. Li, R. Wu and W. Ma | In this study we introduce a probabilistic approach for HU under two different models, namely, the normal composition model for modeling spatial endmember variability and the generalized bilinear model for modeling multi-path nonlinear effects. |
1066 | Robust Phase Retrieval with Outliers | X. Jiang, H. C. So and X. Liu | An outlier-resistant phase retrieval algorithm based on the alternating direction method of multipliers (ADMM) is devised in this paper. |
1067 | Node-Asynchronous Spectral Clustering On Directed Graphs | O. Teke and P. P. Vaidyanathan | Assuming that the graph operator has eigenvalue 1 and the input signal satisfies a certain condition (which ensures the existence of fixed points), this study presents the necessary and sufficient condition for the mean-squared convergence of the graph signal. |
1068 | Estimating Centrality Blindly From Low-Pass Filtered Graph Signals | Y. He and H. Wai | To remedy this, we propose a robust blind centrality estimation method which substantially improves the centrality estimation performance. |
1069 | Blind Inference of Centrality Rankings from Graph Signals | T. M. Roddenberry and S. Segarra | We study the blind centrality ranking problem, where our goal is to infer the eigenvector centrality ranking of nodes solely from nodal observations, i.e., without information about the topology of the network. |
1070 | A Low-Dimensionality Method for Data-Driven Graph Learning | L. Stankovic, M. Dakovic, D. Mandic, M. Brajovic, B. Scalzo-Dees and A. G. Constantinides | In this paper, we propose a numerically efficient method for estimating the normalized Laplacian through estimation of its eigenvalues and by promoting its sparsity. |
1071 | Metric Representations of Networks: A Uniqueness Result | S. Segarra, T. M. Roddenberry, F. Mémoli and A. Ribeiro | In this paper, we consider the problem of projecting networks onto metric spaces. |
1072 | On The Stability of Polynomial Spectral Graph Filters | H. Kenlay, D. Thanou and X. Dong | In this work, we first prove that polynomial graph filters are stable with respect to the change in the normalised graph Laplacian matrix. We then show empirically that properties of a structural perturbation, specifically the relative locality of the edges removed in a binary graph, affect the change in the normalised graph Laplacian. |
1073 | On Cramér-Rao Lower Bounds with Random Equality Constraints | C. Prévost, E. Chaumette, K. Usevich, D. Brie and P. Comon | In this communication, we introduce a new constrained Cramér-Rao-like bound for observations where the probability density function (p.d.f.) parameterized by unknown deterministic parameters results from the marginalization of a joint p.d.f. depending on random variables as well. |
1074 | On Harmonic Approximations of Inharmonic Signals | F. Elvander, J. Ding and A. Jakobsson | In this work, we present the misspecified Gaussian Cramér-Rao lower bound for the parameters of a harmonic signal, or pitch, when signal measurements are collected from an almost, but not quite, harmonic model. |
1075 | A General Test for the Linear Structure of Covariance Matrices of Gaussian Populations | Y. Xiao, D. Ramírez and P. J. Schreier | We study the problem under the Gaussian assumption and derive the generalized likelihood ratio test (GLRT). |
1076 | Sequential Joint Detection and Estimation with an Application to Joint Symbol Decoding and Noise Power Estimation | D. Reinhard, M. Fauß and A. M. Zoubir | The underlying constrained problem is first converted to an unconstrained problem and then reduced to an optimal stopping problem, whose solution is characterized by a non-linear Bellman equation. |
1077 | A Linear Time Partitioning Algorithm for Frequency Weighted Impurity Functions | T. Nguyen and T. Nguyen | In this paper, we propose a heuristic, efficient (linear time) algorithm for finding the minimum impurity for a broader class of impurity functions which includes popular impurities such as Gini index and entropy. |
1078 | Finite Sample Deviation and Variance Bounds for First Order Autoregressive Processes | R. A. González and C. R. Rojas | In this paper, we study finite-sample properties of the least squares estimator in first order autoregressive processes. |
1079 | Balancing Rates and Variance via Adaptive Batch-Sizes in First-Order Stochastic Optimization | Z. Gao, A. Koppel and A. Ribeiro | In this work, we seek to balance the fact that attenuating step-sizes is required for exact asymptotic convergence with the fact that larger constant step-sizes learn faster in finite time up to an error. |
1080 | A Greedy Sparse Approximation Algorithm Based On L1-Norm Selection Rules | R. B. Mhenni, S. Bourguignon and J. Idier | We propose a new greedy sparse approximation algorithm, called SLS for Single L1 Selection, that addresses a least squares optimization problem under a cardinality constraint. |
1081 | Exact Sparse Nonnegative Least Squares | N. Nadisic, A. Vandaele, N. Gillis and J. E. Cohen | We propose a novel approach to solve exactly the sparse nonnegative least squares problem, under hard l0 sparsity constraints. |
1082 | Epigraphical Reformulation for Non-Proximable Mixed Norms | S. Kyochi, S. Ono and I. Selesnick | This paper proposes an epigraphical reformulation (ER) technique for non-proximable mixed norm regularization. |
1083 | Forward-Backward Splitting for Optimal Transport Based Problems | G. Ortiz-Jiménez, M. El Gheche, E. Simou, H. P. Maretic and P. Frossard | To overcome this limitation, we devise a general forward-backward splitting algorithm based on Bregman distances for solving a wide range of optimization problems involving a differentiable function with Lipschitz-continuous gradient and a doubly stochastic constraint. |
1084 | SSGD: Sparsity-Promoting Stochastic Gradient Descent Algorithm for Unbiased DNN Pruning | C. Lee, I. Fedorov, B. D. Rao and H. Garudadri | In this paper, we explore the application of iterative reweighting methods popular in SSR to learning efficient DNNs. |
1085 | Low-Rank Tensor Ring Model for Completing Missing Visual Data | M. S. Asif and A. Prater-Bennette | In this paper, we present an algorithm for recovering low-rank tensors from massively under-sampled or missing data. |
1086 | Sequential Semi-Orthogonal Multi-Level NMF with Negative Residual Reduction for Network Embedding | R. Hashimoto and H. Kasai | To alleviate this shortcoming, this paper presents a proposal of a sequential semi-orthogonal NMF with negative residual reduction for the network embedding (SSO-NRR-NMF). |
1087 | A Fast Proximal Point Algorithm for Generalized Graph Laplacian Learning | Z. Deng and A. M. So | In this paper, we focus on the problem of learning the generalized graph Laplacian (GGL) and propose an efficient algorithm to solve it. |
1088 | Reconstruction of FRI Signals Using Deep Neural Network Approaches | V. C. H. Leung, J. Huang and P. L. Dragotti | In this paper, we aim to alleviate the subspace swap problem and investigate alternative approaches including directly estimating FRI parameters using deep neural networks and utilising deep neural networks as denoisers to reduce the noise in the samples. |
1089 | Adaptive Prediction of Financial Time-Series for Decision-Making Using A Tensorial Aggregation Approach | B. S. C. Campello, L. T. Duarte and J. M. T. Romano | More precisely, we introduce a novel approach in which the data taken into account in the decision process is modeled as a tensor. |
1090 | Data Selection Kernel Conjugate Gradient Algorithm | P. S. R. Diniz, J. O. Ferreira, M. O. K. Mendonça and T. N. Ferreira | In this paper, we propose the data selection kernel conjugate gradient (DS-KCG) algorithm, which is capable of classifying whether the currently available data brings sufficient innovation to update the filter coefficients. |
1091 | Normalized Least-Mean-Square Algorithms with Minimax Concave Penalty | H. Kaneko and M. Yukawa | We propose a novel problem formulation for sparsity-aware adaptive filtering based on the nonconvex minimax concave (MC) penalty, aiming to obtain a sparse solution with small estimation bias. |
1092 | Steepening Squared Error Function Facilitates Online Adaptation of Gaussian Scales | M. Takizawa and M. Yukawa | In this paper, we propose steepening the cost function by adding a squared distance function from the instantaneously-optimal scale. |
1093 | Indoor Altitude Estimation of Unmanned Aerial Vehicles Using a Bank of Kalman Filters | L. Yang, H. Wang, Y. El-Laham, J. I. L. Fonte, D. T. Pérez and M. F. Bugallo | In this work, we propose a novel strategy for tackling the altitude estimation problem that utilizes multiple model adaptive estimation (MMAE), where the candidate models correspond to four scenarios: no obstacles above and below the UAV; obstacles above the UAV; obstacles below the UAV; and obstacles above and below the UAV. |
1094 | Underwater Tracking Based on the Sum-Product Algorithm Enhanced by a Neural Network Detections Classifier | G. Soldi et al. | This paper describes a multiobject tracking framework based on the sum-product algorithm that exploits the information provided by a convolutional neural network that classifies the LFAS detections. |
1095 | Feature Affine Projection Algorithms | H. Yazdanpanah | In this paper, we propose the Feature Affine Projection (F-AP) algorithm to reveal hidden sparsity in unknown systems. |
1096 | Approximate Bayesian Computation with the Sliced-Wasserstein Distance | K. Nadjahi, V. D. Bortoli, A. Durmus, R. Badeau and U. Simsekli | We propose a new ABC technique, called Sliced-Wasserstein ABC and based on the Sliced-Wasserstein distance, which has better computational and statistical properties. |
1097 | Enhanced Mixture Population Monte Carlo Via Stochastic Optimization and Markov Chain Monte Carlo Sampling | Y. El-Laham, P. M. Djuric and M. F. Bugallo | In this work, we introduce a novel M-PMC algorithm that optimizes the parameters of a mixture proposal distribution, where parameter updates are resolved via stochastic optimization instead of EM. |
1098 | Better Safe Than Sorry: Risk-Aware Nonlinear Bayesian Estimation | D. S. Kalogerias, L. F. O. Chamon, G. J. Pappas and A. Ribeiro | To address this issue, we introduce a new risk-aware MMSE formulation which trades between mean performance and risk by explicitly constraining the expected predictive variance of the involved squared error. |
1099 | Particle Filtering on the Complex Stiefel Manifold with Application to Subspace Tracking | C. J. Bordin and M. G. S. Bruno | In this paper, we extend previous particle filtering methods whose states were constrained to the (real) Stiefel manifold to the complex case. |
1100 | Bayesian Multiple Change-Point Detection with Limited Communication | T. Halme, E. Nitzan, H. V. Poor and V. Koivunen | In this paper, we consider Bayesian multiple changepoint detection using a sensor network in which a fusion center can receive a data stream from each sensor. |
1101 | What did your adversary believe? Optimal Filtering and Smoothing in Counter-Adversarial Autonomous Systems | R. Mattila, I. Lourenço, V. Krishnamurthy, C. R. Rojas and B. Wahlberg | We consider fixed-interval smoothing problems for counter-adversarial autonomous systems. |
1102 | Robust Parameter Estimation of Contaminated Damped Exponentials | Y. Xie, D. Liu, H. Mansour and P. T. Boufounos | In this paper we aim to estimate parameters of damped exponentials from contaminated signal, i.e., a mixture of damped exponentials, random Gaussian noise, and spike interference. |
1103 | Computation of “Best” Interpolants in the Lp Sense | P. Bohra and M. Unser | For this continuous-domain problem, we propose an exact discretization scheme that restricts the search space to quadratic splines with knots on a uniform grid. |
1104 | Fast Block-Sparse Estimation for Vector Networks | Z. Yue, P. Sundaram and V. Solo | Here we address key computational bottlenecks and develop a new algorithm which is much faster and has massively reduced requirements on matrix conditioning. |
1105 | Relative Cost Based Model Selection for Sparse High-Dimensional Linear Regression Models | P. B. Gohain and M. Jansson | In this paper, we propose a novel model selection method named multi-beta-test (MBT) for the sparse high-dimensional linear regression model. |
1106 | Cumulant Slice Reconstruction from Compressive Measurements and Its Application to Line Spectrum Estimation | Y. Wang and Z. Tian | To overcome these challenges, this paper develops a novel compressive cumulant slice sensing (CCSS) method that aims to efficiently reconstruct the 1D diagonal slice of higher-order cumulants under the compressive sensing framework. |
1107 | Adaptation and Learning in Multi-Task Decision Systems | S. Marano and A. H. Sayed | Elaborating on previous works on single-task networks engaged in decision problems, here we consider the multi-task version in the challenging scenario where the state of nature may change arbitrarily. |
1108 | Graph Metric Learning via Gershgorin Disc Alignment | C. Yang, G. Cheung and W. Hu | We propose a general projection-free metric learning framework, where the minimization objective $\min_{\mathbf{M} \in \mathcal{S}} Q(\mathbf{M})$ is a convex differentiable function of the metric matrix $\mathbf{M}$, and $\mathbf{M}$ resides in the set $\mathcal{S}$ of generalized graph Laplacian matrices for connected graphs with positive edge weights and node degrees. |
1109 | Learning Graph Influence from Social Interactions | V. Matta, V. Bordignon, A. Santos and A. H. Sayed | Specifically, from observations of the stream of beliefs at certain agents, we would like to examine whether it is possible to learn the strength of the connections (influences) from sending components in the network to these receiving agents. |
1110 | Social Learning with Partial Information Sharing | V. Bordignon, V. Matta and A. H. Sayed | This work studies the learning abilities of agents sharing partial beliefs over social networks. |
1111 | Non-parametric Community Change-points Detection in Streaming Graph Signals | A. Ferrari and C. Richard | The aim of this paper is to address this challenge when anomalies activate unknown groups of nodes in a network. |
1112 | Spatial Gating Strategies for Graph Recurrent Neural Networks | L. Ruiz, F. Gama and A. Ribeiro | To address this, we propose two spatial gating strategies for GRNNs leveraging the node and edge structure of the graph. |
1113 | Learning connectivity and higher-order interactions in radial distribution grids | Q. Yang, M. Coutino, G. Wang, G. B. Giannakis and G. Leus | Establishing a connection between the celebrated exact distribution flow equations and the so-called self-driven graph Volterra model, this paper puts forth a nonlinear topology identification algorithm, that is able to reveal both the edge connections as well as their higher-order interactions. |
1114 | Semi-Supervised Learning of Processes Over Multi-Relational Graphs | Q. Lu, V. N. Ioannidis and G. B. Giannakis | Towards this end, a structured dynamical model is introduced to capture the spatio-temporal nature of dynamic graph processes, and incorporate contributions from multiple relations of the graph in a probabilistic fashion. |
1115 | Recursive Prediction of Graph Signals With Incoming Nodes | A. Venkitaraman, S. Chatterjee and B. Wahlberg | Keeping this premise in mind, we propose a method to recursively obtain the optimal prediction or regression coefficients for the recently proposed Linear Regression over Graphs (LRG), as the graph expands with incoming nodes. |
1116 | Learning Signed Graphs from Data | G. Matz and T. Dittrich | In this paper, we propose a conceptually simple and flexible approach to signed graph learning via signed smoothness metrics. |
1117 | Forecasting Multi-Dimensional Processes Over Graphs | A. Natali, E. Isufi and G. Leus | To tackle this issue, we devise a new framework and propose new methodologies based on the graph vector autoregressive model. |
1118 | Distributed Quantization for Sparse Time Sequences | A. Cohen, N. Shlezinger, S. Salamatian, Y. C. Eldar and M. Médard | In this work we propose a distributed quantization scheme for representing a set of sparse time sequences acquired using conventional scalar ADCs. |
1119 | A Time-Based Sampling Framework for Finite-Rate-of-Innovation Signals | S. Rudresh, A. J. Kamath and C. Sekhar Seelamantula | In this paper, we consider the problem of sampling and perfect reconstruction of periodic finite-rate-of-innovation (FRI) signals using crossing-time-encoding machine (C-TEM) and integrate-and-fire TEM (IF-TEM). |
1120 | Effective Approximation of Bandlimited Signals and Their Samples | H. Boche and U. J. Mönich | In this paper we analyze if and how this transition affects the computability of the signal. |
1121 | Receiver Design and AGC optimization with Self Interference Induced Saturation | C. K. Sheemar and D. Slock | This leads to missing samples which we propose to reconstruct under the assumptions that the receive signal of interest is a low pass bandlimited signal with known spectrum (mask), oversampling and (perfect) digital SIC after the ADC. |
1122 | D-SLAM: Diffusion Source Localization and Trajectory Mapping | R. Alexandru, T. Blu and P. L. Dragotti | Within this framework, we propose a method for localizing sources of unknown amplitudes, and known activation times. |
1123 | Triggerless Random Interleaved Sampling | M. W. Rupniewski | We show that periodic signals are determined by the probabilistic distribution of their sample sequences, provided that the inter-sequence offsets are uniformly distributed (in a probabilistic sense). |
1124 | The Fractional Quaternion Fourier Number Transform | L. C. da Silva, J. R. de Oliveira Neto and J. B. Lima | In this paper, we define a fractional version of the quaternion Fourier number transform (QFNT). |
1125 | Short and Squeezed: Accelerating the Computation of Antisparse Representations with Safe Squeezing | C. Elvira and C. Herzet | In this paper, we propose a new methodology, coined “safe squeezing”, accelerating the computation of antisparse representations. |
1126 | Decentralized expected consistent signal recovery for quantization Measurements | C. Wang, C. Wen, S. Tsai and S. Jin | In this study, we develop a novel decentralized architecture by leveraging the core framework of GEC-SR called “deGEC-SR.” |
1127 | Lie Group State Estimation via Optimal Transport | Z. Wang and V. Solo | Here we overcome these problems by managing the geometry with the Cayley transform and particle depletion with optimal transport. |
1128 | Smoothing Graph Signals via Random Spanning Forests | Y. Y. Pilavci, P. Amblard, S. Barthelmé and N. Tremblay | Another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra is uncovered: based on random spanning forests, novel Monte-Carlo estimators for graph signal smoothing are proposed. |
1129 | Diagonalizable Shift and Filters for Directed Graphs Based on the Jordan-Chevalley Decomposition | P. Misiakos, C. Wendler and M. Püschel | In this paper, we propose to replace a given adjacency shift A by a diagonalizable shift AD obtained via the Jordan-Chevalley decomposition. |
1130 | Gaussian Processes Over Graphs | A. Venkitaraman, S. Chatterjee and P. Handel | In this work, we consider the development of a stochastic or Bayesian variant of KRG. |
1131 | Blind Source Separation of Graph Signals | J. Miettinen, S. A. Vorobyov and E. Ollila | We propose BSS of graph signals which uses the prior information presented by the signal graph together with nonGaussianity. |
1132 | Graphical Evolutionary Game Theoretic Analysis of Super Users in Information Diffusion | Y. Li, Y. Li, H. Hu, H. V. Zhao and Y. Chen | To address the existence of “super users” in social networks who have higher social status and potentially larger influence, we propose a graphical evolutionary game theoretic framework to investigate the impact of such super users and their strategy update rules on information propagation. |
1133 | Gradient-Based Algorithm with Spatial Regularization for Optimal Sensor Placement | F. Ghayem, B. Rivet, C. Jutten and R. C. Farias | In this paper, we are interested in optimal sensor placement for signal extraction. |
1134 | The Graphon Fourier Transform | L. Ruiz, L. F. O. Chamon and A. Ribeiro | We define graphon signals and introduce the Graphon Fourier Transform (WFT), to which the Graph Fourier Transform (GFT) is shown to converge. |
1135 | Learning Product Graphs from Multidomain Signals | S. K. Kadambari and S. Prabhakar Chepuri | In this paper, we focus on learning the underlying product graph structure from multidomain training data. |
1136 | Graph Vertex Sampling with Arbitrary Graph Signal Hilbert Spaces | B. Girault, A. Ortega and S. S. Narayanan | In this context, we propose to extend sampling set selection based on spectral proxies to arbitrary Hilbert spaces of graph signals. |
1137 | Estimation of Information in Parallel Gaussian Channels via Model Order Selection | C. A. López, F. de Cabrera and J. Riba | We study the problem of estimating the overall mutual information in M independent parallel discrete-time memory-less Gaussian channels from N independent data sample pairs per channel (inputs and outputs). |
1138 | Generalized Graph Spectral Sampling with Stochastic Priors | J. Hara, Y. Tanaka and Y. C. Eldar | In this paper, we assume the graph signals are modeled by graph wide sense stationarity (GWSS), which is an extension of WSS for standard time domain signals. |
1139 | AnomalyDAE: Dual Autoencoder for Anomaly Detection on Attributed Networks | H. Fan, F. Zhang and Z. Li | In this paper, we propose a deep joint representation learning framework for anomaly detection through a dual autoencoder (AnomalyDAE), which captures the complex interactions between network structure and node attribute for high-quality embeddings. |
1140 | On The Degrees Of Freedom in Total Variation Minimization | F. Xue and T. Blu | Considering the total-variation (TV) regularization, we present a theoretical study of the DOF in Stein's unbiased risk estimate (SURE), under a very mild assumption. |
1141 | Atomic Norm Denoising In Blind Two-Dimensional Super-Resolution | M. A. Suliman and W. Dai | In this work, we develop a new framework for denoising in blind two-dimensional (2D) super-resolution that recovers a set of 2D continuous parameters as well as unknown waveforms from noisy samples. |
1142 | Dynamic Channel Pruning For Correlation Filter Based Object Tracking | G. Y. Gopal and M. A. Amer | To mitigate this problem, we propose a method for dynamic channel pruning through online (i.e., at every frame) learning of channel weights. |
1143 | Positive Semidefinite Matrix Factorization: A Link to Phase Retrieval And A Block Gradient Algorithm | D. Lahat and C. Févotte | In this paper, we show, for the first time, a link between PSDMF and the problem of matrix recovery from phaseless measurements, which includes phase retrieval. |
1144 | Realizability of Planar Point Embeddings from Angle Measurements | F. Dümbgen, M. El Helou and A. Scholefield | This paper is concerned with the theory of localization from inner-angle measurements. |
1145 | Sparse Recovery with Non-Linear Fourier Features | A. Özçelikkale | Motivated by this success, this article focuses on a sparse non-linear Fourier feature (NFF) model. |
1146 | Efficient Super-Resolution Two-Dimensional Harmonic Retrieval Via Enhanced Low-Rank Structured Covariance Reconstruction | Y. Wang, Y. Zhang, Z. Tian, G. Leus and G. Zhang | This paper develops an enhanced low-rank structured covariance reconstruction (LRSCR) method based on the decoupled atomic norm minimization (D-ANM), for super-resolution two-dimensional (2D) harmonic retrieval with multiple measurement vectors. |
1147 | Effect of Undersampling on Non-Negative Blind Deconvolution with Autoregressive Filters | P. Sarangi, M. C. Hücümenoglu and P. Pal | Our objective is to understand if it is possible to recover both the signal and the kernel from downsampled measurements of their convolution. |
1148 | Manifold Gradient Descent Solves Multi-Channel Sparse Blind Deconvolution Provably and Efficiently | L. Shi and Y. Chi | We propose a novel approach based on nonconvex optimization over the sphere manifold by minimizing a smooth surrogate of the sparsity-promoting loss function. |
1149 | Sparse Branch and Bound for Exact Optimization of L0-Norm Penalized Least Squares | R. B. Mhenni, S. Bourguignon, M. Mongeau, J. Ninin and H. Carfantan | We propose a global optimization approach to solve l0-norm penalized least-squares problems, using a dedicated branch-and-bound methodology. |
1150 | A Proximal Dual Consensus Method for Linearly Coupled Multi-Agent Non-Convex Optimization | J. Zhang, S. Ge, T. Chang and Z. Luo | In this paper, we propose such a method, called the proximal dual consensus (PDC) method, that combines a proximal technique and the dual consensus method. |
1151 | A Penalty Alternating Direction Method of Multipliers for Decentralized Composite Optimization | J. Zhang, A. M. So and Q. Ling | This paper proposes a penalty alternating direction method of multipliers (ADMM) to minimize the summation of convex composite functions over a decentralized network. |
1152 | Wirtinger Flow Algorithms for Phase Retrieval from Binary Measurements | V. Kishore and C. S. Seelamantula | We consider the problem of Binary Phase Retrieval, wherein we attempt to recover signals from their quadratic measurements, which are further encoded as +1 or -1 depending on whether they exceed a threshold or not. |
1153 | Decentralized Min-Max Optimization: Formulations, Algorithms and Applications in Network Poisoning Attack | I. Tsaknakis, M. Hong and S. Liu | This paper discusses formulations and algorithms which allow a number of agents to collectively solve problems involving both (non-convex) minimization and (concave) maximization operations. |
1154 | An Efficient Augmented Lagrangian-Based Method for Linear Equality-Constrained Lasso | Z. Deng, M. Yue and A. M. So | In this paper, we demonstrate how the recently developed semismooth Newton-based augmented Lagrangian framework can be extended to solve a linear equality-constrained Lasso model. |
1155 | Control of Linear Dynamical Systems Using Sparse Inputs | C. Sriram, G. Joseph and C. R. Murthy | In this work, we consider control of linear dynamical systems using sparse inputs. |
1156 | Decentralized Stochastic Non-Convex Optimization over Weakly Connected Time-Varying Digraphs | S. Lu and C. W. Wu | In this paper, we consider decentralized stochastic non-convex optimization over a class of weakly connected digraphs. |
1157 | PACO and PACO-DCT: Patch Consensus and Its Application To Inpainting | I. R. Paulino and I. Hounie | We propose a novel framework for this type of problem based on the idea that estimated patches should coincide at the overlaps (consensus), and develop an algorithm for solving the general problem. |
1158 | Image Recovery from Rotational And Translational Invariants | N. F. Marshall, T. Lan, T. Bendory and A. Singer | We introduce a framework for recovering an image from its rotationally and translationally invariant features based on autocorrelation analysis. |
1159 | Optimal Window Design for Joint Spatial-Spectral Domain Filtering of Signals on the Sphere | A. Aslam and Z. Khalid | We present the optimal design of an azimuthally symmetric window signal for carrying out joint spatial-spectral domain filtering of a spherical (source) signal contaminated by a realization of an anisotropic noise process. |
1160 | Filtering Out Time-Frequency Areas Using Gabor Multipliers | A. M. Krémé, V. Emiya, C. Chaux and B. Torrésani | We address the problem of filtering out localized time-frequency components in signals. |
1161 | β-NMF and Sparsity Promoting Regularizations for Complex Mixture Unmixing. Application to 2D HSQC NMR. | A. Cherni, S. Anthoine and C. Chaux | In this work, we propose a new variational formulation for blind source separation (BSS) based on a β-divergence data fidelity term combined with sparsity promoting regularization functions. |
1162 | FIR Filtering of Discontinuous Signals: A Random-Stratified Sampling Approach | H. Y. Darawsheh and A. Tarczynski | This paper presents a novel approach, based on random stratified sampling (StSa) technique, to estimate the output of a finite impulse response (FIR) filter when the input signal is either a piecewise-continuous function having first-derivative discontinuities (FDDs), or a piecewise-discontinuous function, i.e. having zero-derivative discontinuities (ZDDs). |
1163 | Message Transmission Through Underspread Time-Varying Linear Channels | A. Kaplan, D. Lee and V. Pohl | The main contribution of this paper is that the suggested message transmission scheme over time-variant communication channels enables data transmission in scenarios where previously no communication was possible. |
1164 | The Discrete Stockwell Transforms for Infinite-Length Signals and Their Real-Time Implementations | Y. Yan and H. Zhu | In this paper, new formulations of the discrete ST for infinite-length signals are proposed. |
1165 | Low-Rank Approximation of Matrices Via A Rank-Revealing Factorization with Randomization | M. F. Kaloorazi and J. Chen | To treat such a case efficiently, in this paper we present an algorithm called randomized pivoted TSOD (RP-TSOD) that constructs a highly accurate approximation to the TSOD decomposition. |
1166 | Time-Scale Synthesis for Locally Stationary Signals | A. Meynard and B. Torrésani | In a maximum a posteriori approach, we propose an estimator for the model parameters, namely the time-varying scale translation and an underlying power spectrum. |
1167 | Maximally Energy-Concentrated Differential Window for Phase-Aware Signal Processing Using Instantaneous Frequency | T. Kusano, K. Yatabe and Y. Oikawa | In this paper, we propose window functions suitable for computing the instantaneous frequency which are designed based on minimizing the sidelobe energy of the frequency response of the differential window. |
1168 | On the Use of Rényi Entropy for Optimal Window Size Computation in the Short-Time Fourier Transform | S. Meignen, M. Colominas and D. Pham | In this paper, we investigate the determination of an optimal window length associated with the computation of the short time Fourier transform of multicomponent signals. |
1169 | Data-Driven Model Set Design for Model Averaged Particle Filter | B. Liu | This paper fills in this gap by proposing a generic data-driven method for BMAPF model set design. |
1170 | GraphEM: EM Algorithm for Blind Kalman Filtering Under Graphical Sparsity Constraints | É. Chouzenoux and V. Elvira | In this work, we propose a novel expectation-maximization algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model. |
1171 | On Design of Optimal Smart Meter Privacy Control Strategy Against Adversarial MAP Detection | R. R. Avula and T. J. Oechtering | We study the optimal control problem of the maximum a posteriori (MAP) state sequence detection of an adversary using smart meter data. |
1172 | Approximate Inference by Kullback-Leibler Tensor Belief Propagation | P. W. A. Wijnings, S. Stuijk, B. de Vries and H. Corporaal | We focus on reducing the size of discrete models with large treewidth by storing intermediate factors in compressed form, thereby decoupling the variables through conditioning on introduced weights. This work proposes pruning of these weights using Kullback-Leibler divergence. |
1173 | A Particle Gibbs Sampling Approach to Topology Inference in Gene Regulatory Networks | M. Iloska, Y. El-Laham and M. F. Bugallo | In this paper, we propose a novel Bayesian approach for estimating a gene network's topology using particle Gibbs sampling. |
1174 | Particle Filter with Rejection Control and Unbiased Estimator of the Marginal Likelihood | J. Kudlicka, L. M. Murray, T. B. Schön and F. Lindsten | In the paper we present a particle filter with rejection control that enables unbiased estimation of the marginal likelihood. |
1175 | Particle Group Metropolis Methods for Tracking the Leaf Area Index | L. Martino, V. Elvira and G. Camps-Valls | In this work, we introduce a Markov Chain Monte Carlo (MCMC) technique driven by a particle filter. |
1176 | Unsupervised Variational Bayesian Kalman Filtering For Large-Dimensional Gaussian Systems | B. Ait-El-Fquih, T. Rodet and I. Hoteit | For this problem, we propose two efficient algorithms based on the variational Bayesian (VB) approach. |
1177 | Levenberg-Marquardt and Line-Search Extended Kalman Smoothers | S. Särkkä and L. Svensson | The aim of this article is to present Levenberg-Marquardt and line-search extensions of the classical iterated extended Kalman smoother (IEKS) which has previously been shown to be equivalent to the Gauss-Newton method. |
1178 | Laplace State Space Filter with Exact Inference and Moment Matching | J. Neri, P. Depalle and R. Badeau | We present a Bayesian filter for state space models with Laplace-distributed observation noise that is robust to heavy-tailed and outlier-ridden univariate time-series data. |
1179 | Probabilistic Filter and Smoother for Variational Inference of Bayesian Linear Dynamical Systems | J. Neri, R. Badeau and P. Depalle | Here, we propose a solution using matrix inversion lemmas to derive what may be considered as the Bayesian counterparts to the Kalman filter and smoother, which are particular forms of the forward-backward algorithm that have known properties of numerical stability and efficiency that lead to a cost that grows linearly with time. |
1180 | Optimum Kernel Particle Filter for Asymmetric Laplace Noise | U. Andersson and S. Godsill | In this paper we present on-line Bayesian filtering methods for time series models corrupted by asymmetric Laplace noise. |
1181 | Convex Optimisation-Based Privacy-Preserving Distributed Average Consensus in Wireless Sensor Networks | Q. Li, R. Heusdens and M. G. Christensen | In this paper, we propose a novel convex optimization-based solution to the problem of privacy-preserving distributed average consensus. |
1182 | Proximal Multitask Learning Over Distributed Networks with Jointly Sparse Structure | D. Jin, J. Chen, C. Richard and J. Chen | By introducing an ℓ∞,1-norm penalty at each node, and using a proximal gradient method to minimize the regularized cost, we devise a proximal multitask diffusion LMS algorithm which promotes the joint-sparsity to enhance the estimation performance. |
1183 | Optimal Sampling Rate and Bandwidth of Bandlimited Signals - An Algorithmic Perspective | H. Boche and U. J. Mönich | In this paper we study if it is possible to algorithmically determine the actual bandwidth of a bandlimited signal. |
1184 | Resilient to Byzantine Attacks Finite-Sum Optimization Over Networks | Z. Wu, Q. Ling, T. Chen and G. B. Giannakis | This contribution deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. |
1185 | Exploiting Sparsity for Robust Sensor Network Localization in Mixed LOS/NLOS Environments | D. Jin, F. Yin, M. Fauß, M. Muma and A. M. Zoubir | We address the problem of robust network localization in realistic mixed LOS/NLOS environments. |
1186 | A Low-Complexity MAP Detector for Distributed Networks | A. E. Feitosa, V. H. Nascimento and C. G. Lopes | This work describes a generalization of our previous maximum likelihood (ML) detector to a maximum a posteriori (MAP) detector in distributed networks using the diffusion LMS algorithm. |
1187 | Quickest Change Detection In Anonymous Heterogeneous Sensor Networks | Z. Sun, S. Zou and Q. Li | In this paper, a simple optimality proof is derived for the Mixture Likelihood Ratio Test (MLRT), which was constructed and proved to be optimal for the non-sequential anonymous setting in [1]. |
1188 | Optimal Power Flow Using Graph Neural Networks | D. Owerko, F. Gama and A. Ribeiro | In this paper, we propose using graph neural networks (which are localized, scalable parametrizations of network data) trained under the imitation learning framework to approximate a given optimal solution. |
1189 | Byzantine-Robust Decentralized Stochastic Optimization | J. Peng and Q. Ling | In this paper, we consider the Byzantine-robust stochastic optimization problem defined over a decentralized network, where the agents collaboratively minimize the summation of expectations of stochastic local cost functions, but some of the agents are unreliable. |
1190 | Federated Truth Inference over Distributed Crowdsourcing Platforms | M. Yang, G. Liu and Y. -. P. Hong | A federated truth inference (FTI) algorithm is proposed based on a distributed implementation of the block expectation-maximization (EM) algorithm. |
1191 | Clock Synchronization Over Networks Using Sawtooth Models | P. d. Aguila Pla, L. Pellaco, S. Dwivedi, P. Händel and J. Jaldén | In this paper, we study the use of time-to-digital converters in wireless sensors, which provides clock synchronization and ranging at negligible communication overhead through a sawtooth signal model for round trip times between two nodes. |
1192 | An Analytical Solution to Jacobsen Estimator for Windowed Signals | T. Murakami and W. Wang | In this paper, we extend the well-known Jacobsen estimator to windowed signals. |
1193 | Regularized Partial Phase Synchrony Index Applied to Dynamical Functional Connectivity Estimation | G. Frusque, J. Jung, P. Borgnat and P. Gonçalves | We study the inference of conditional independence graph from the partial Phase Locking Value (PLV) index of multivariate time series. |
1194 | The Matched Reassigned Cross-Spectrogram for Phase Estimation | M. Sandsten, R. Anderson, I. Reinhold and J. Brynolfsson | In this paper, the matched reassigned spectrogram is expanded into a novel matched phase reassignment (MPR) method based on the reassigned cross-spectrogram. |
1195 | Line Spectral Estimation with Palindromic Kernels | D. Verbeke and I. Markovsky | We develop an equivalent formulation as a structured low-rank approximation problem and present a necessary condition for the model to be undamped. |
1196 | Latent Fused Lasso | Y. Feng and I. Selesnick | In this paper, we propose a convex variational norm for better modeling sparse piecewise constant signals. |
1197 | Adversarial Attacks on Deep Unfolded Networks for Sparse Coding | Y. Wang, K. Wu and C. Zhang | Our paper is the first work to study the adversarial performance on unfolded DNNs for sparse coding. |
1198 | Riemannian Framework for Robust Covariance Matrix Estimation in Spiked Models | F. Bouchard, A. Breloy, G. Ginolhac and F. Pascal | This paper aims at providing an original Riemannian geometry to derive robust covariance matrix estimators in spiked models (i.e. when the covariance matrix has a low-rank plus identity structure). |
1199 | Robust Matrix Completion via ℓp-Greedy Pursuits | X. Jiang, A. M. Zoubir and X. Liu | A novel ℓp-greedy pursuit (GP) algorithm for robust matrix completion, i.e., recovering a low-rank matrix from only a subset of its noisy and outlier-contaminated entries, is devised. |
1200 | Separable Optimization for Joint Blind Deconvolution and Demixing | D. Weitzner and R. Giryes | In this work, we present a separable approach to blind deconvolution and demixing via convex optimization. |
1201 | Misspecified Cramer-Rao Bound For Delay Estimation with a Mismatched Waveform: A Case Study | F. Roemer | In this paper we investigate the problem of time of arrival estimation which occurs in many real-world applications, such as indoor localization or non-destructive testing via ultrasound or radar. |
1202 | Privacy-Aware Quickest Change Detection | T. S. Lau and W. Peng Tay | Our goal is to sanitize the signal to satisfy information privacy requirements while being able to detect a change quickly. |
1203 | Source Enumeration via Toeplitz Matrix Completion | V. Garg, P. Giménez-Febrer, A. Pagès-Zamora and I. Santamaria | This paper addresses the problem of source enumeration by an array of sensors in the presence of noise whose spatial covariance structure is a diagonal matrix with possibly different variances, referred to as non-iid noise hereafter, when the sources are uncorrelated. |
1204 | Sequential Methods for Detecting a Change in the Distribution of an Episodic Process | T. Banerjee, E. Adib, A. Taha and E. John | The algorithms can be computed recursively using finite memory and are shown to be asymptotically optimal for well-defined Bayesian or minimax stochastic optimization formulations. |
1205 | Distribution of the Product of a Complex Gaussian Matrix and Vector and Its Sum with a Complex Gaussian Vector | W. Shi, Y. Li and Q. He | In this paper, we derive the distribution of the product of a complex Gaussian matrix and a complex Gaussian vector. |
1206 | Principal Angle Detector for Subspace Signal with Structured Unknown Interference | X. Xu, Y. Jiao and Y. Gu | In this paper, we propose a new detector, called Principal Angle Detector (PAD), based on principal angles between subspaces. |
1207 | A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder | C. Feng, J. Wang, T. Li, J. Peng and J. Xiao | In this paper, we propose a novel speaker clustering method based on Mutual Information (MI) and a non-linear model with discrete variables, inspired by the Tied Variational Autoencoder (TVAE), to enhance robustness against noise. |
1208 | Track-Before-Detect for Sub-Nyquist Radar | S. Na, T. Huang, Y. Liu and X. Wang | To overcome this issue, we propose a weighted sparse recovery based track-before-detect (TBD) method for weak target detection by accumulating multi-frame information. |
1209 | Optimal Transport Based Change Point Detection and Time Series Segment Clustering | K. C. Cheng, S. Aeron, M. C. Hughes, E. Hussey and E. L. Miller | Building upon recent theoretical advances characterizing the limiting distribution-free behavior of the Wasserstein two-sample test (Ramdas et al. 2015), we propose a novel algorithm for unsupervised, distribution-free CPD which is amenable to both offline and online settings. |
1210 | Multi-View Wasserstein Discriminant Analysis with Entropic Regularized Wasserstein Distance | H. Kasai | To evaluate this discrepancy, this paper presents a proposal of a multi-view Wasserstein discriminant analysis, designated as MvWDA, which exploits a recently developed optimal transport theory. |
1211 | Large-Scale Time Series Clustering with k-ARs | Z. Yue and V. Solo | This paper proposes a new algorithm, k-ARs, which is a limiting version of the existing EM algorithm. |
1212 | Deterministic Feature Decoupling by Surfing Invariance Manifolds | E. Martínez-Enríquez and J. Portilla | We introduce a formalism that justifies and extends a heuristic method for algebraically decoupling deterministic features that recently proved useful for improving feature-based classification. |
1213 | Unsupervised Auto-Encoding Multiple-Object Tracker for Constraint-Consistent Combinatorial Problem | Y. Kawachi and T. Suzuki | In this work, we propose an unsupervised neural MOT model for accurate semi-automatic association labeling and we tackle the challenging one-to-one constrained combinatorial association problem by applying relaxation techniques. |
1214 | A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency | T. N. Sainath et al. | In this paper, we develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer that surpasses a conventional model in both quality and latency. |
1215 | Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR | H. Inaguma, Y. Gaur, L. Lu, J. Li and Y. Gong | To alleviate this issue and reduce latency, we propose several strategies during training by leveraging external hard alignments extracted from the hybrid model. |
1216 | Towards Fast and Accurate Streaming End-To-End ASR | B. Li et al. | We propose to reduce E2E model's latency by extending the RNN-T endpointer (RNN-T EP) model [2] with additional early and late penalties. |
1217 | Streaming Automatic Speech Recognition with the Transformer Model | N. Moritz, T. Hori and J. Le Roux | In this work, we propose a transformer based end-to-end ASR system for streaming ASR, where an output must be generated shortly after each spoken word. |
1218 | CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition | L. Dong and B. Xu | In this paper, we propose a novel soft and monotonic alignment mechanism used for sequence transduction. |
1219 | Transformer-Based Online CTC/Attention End-To-End Speech Recognition Architecture | H. Miao, G. Cheng, C. Gao, P. Zhang and Y. Yan | In this paper, we propose the Transformer-based online CTC/attention E2E ASR architecture, which contains the chunk self-attention encoder (chunk-SAE) and the monotonic truncated attention (MTA) based self-attention decoder (SAD). |
1220 | Detecting Multiple Speech Disfluencies Using a Deep Residual Network with Bidirectional Long Short-Term Memory | T. Kourkounakis, A. Hajavi and A. Etemad | As opposed to most existing works that identify stutters with language models, our work proposes a model that relies solely on acoustic features, allowing for identification of several variations of stutter disfluencies without the need for speech recognition. |
1221 | Exploring Appropriate Acoustic and Language Modelling Choices for Continuous Dysarthric Speech Recognition | Z. Yue, F. Xiong, H. Christensen and J. Barker | In this paper, we investigate the impact of LM design using the widely used TORGO database. |
1222 | Synthetic Speech References for Automatic Pathological Speech Intelligibility Assessment | P. Janbakhshi, I. Kodrasi and H. Bourlard | To be able to use P-ESTOI in such scenarios, in this paper we propose to use synthetic speech generated by state-of-the-art high-quality text-to-speech systems to create an intelligible reference model. |
1223 | Two-Step Acoustic Model Adaptation for Dysarthric Speech Recognition | R. Takashima, T. Takiguchi and Y. Ariki | This paper introduces a model adaptation approach for a speaker-dependent dysarthric speech recognition system. |
1224 | Dysarthric Speech Recognition with Lattice-Free MMI | E. Hermann and M. Magimai.-Doss | This paper focuses on the use of state-of-the-art sequence-discriminative training, in particular lattice-free maximum mutual information (LF-MMI), for improving dysarthric speech recognition. |
1225 | Improved Speaker Independent Dysarthria Intelligibility Classification Using DeepSpeech Posteriors | A. Tripathi, S. Bhosale and S. K. Kopparapu | Motivated by this observation, we propose a speaker independent intelligibility assessment system which relies on a novel set of features obtained by processing the output of DeepSpeech, an end to end Speech-to-Text engine. |
1226 | Joint Phoneme-Grapheme Model for End-To-End Speech Recognition | Y. Kubo and M. Bacchiani | This paper proposes methods to improve a commonly used end-to-end speech recognition model, Listen-Attend-Spell (LAS). |
1227 | QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions | S. Kriman et al. | We propose a new end-to-end neural acoustic model for automatic speech recognition. |
1228 | End-To-End Multi-Talker Overlapping Speech Recognition | A. Tripathi, H. Lu and H. Sak | In this paper we present an end-to-end speech recognition system that can recognize single-channel speech where multiple talkers can speak at the same time (overlapping speech) by using a neural network model based on Recurrent Neural Network Transducer (RNN-T) architecture. |
1229 | End-To-End Multi-Speaker Speech Recognition With Transformer | X. Chang, W. Zhang, Y. Qian, J. Le Roux and S. Watanabe | In this work, we explore the use of Transformer models for these tasks by focusing on two aspects. |
1230 | Hybrid Autoregressive Transducer (HAT) | E. Variani, D. Rybach, C. Allauzen and M. Riley | This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. |
1231 | Lightweight and Efficient End-To-End Speech Recognition Using Low-Rank Transformer | G. I. Winata, S. Cahyawijaya, Z. Lin, Z. Liu and P. Fung | We propose the low-rank transformer (LRT), a memory-efficient and fast neural architecture that significantly reduces the parameters and boosts the speed of training and inference for end-to-end speech recognition. |
1232 | Spoken Language Acquisition Based on Reinforcement Learning and Word Unit Segmentation | S. Gao, W. Hou, T. Tanaka and T. Shinozaki | This paper proposes a new framework for simulating spoken-language acquisition by combining reinforcement learning and unsupervised learning methods. |
1233 | How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers | K. Irie, A. Gerstenberger, R. Schlüter and H. Ney | We propose simple architectural modifications in the standard Transformer with the goal to reduce its total state size (defined as the number of self-attention layers times the sum of the key and value dimensions, times position) without loss of performance. |
1234 | Learning Recurrent Neural Network Language Models With Context-Sensitive Label Smoothing for Automatic Speech Recognition | M. Song, Y. Zhao, S. Wang and M. Han | We propose an approach of context-sensitive candidate label smoothing that has two advantages. |
1235 | Semi-Supervised Learning for Text Classification by Layer Partitioning | A. H. Li and A. Sethy | To adapt these methods to text input, we propose to decompose a neural network M into two components F and U so that M = U ∘ F. |
1236 | Integrating Discrete and Neural Features Via Mixed-Feature Trans-Dimensional Random Field Language Models | S. Gao, Z. Ou, W. Yang and H. Xu | This paper develops a mixed-feature TRF LM and demonstrates its advantage in integrating discrete and neural features. |
1237 | Gated Attentive Convolutional Network Dialogue State Tracker | S. Liu, S. Liu and W. Xu | In this paper, we propose the Gated Attentive Convolutional network Dialogue State Tracker (GAC) which overcomes these challenges by utilizing the gated attentive convolutional encoder and introducing historical information. |
1238 | Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech | V. Aggarwal, M. Cotescu, N. Prateek, J. Lorenzo-Trueba and R. Barra-Chicote | We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second. |
1239 | Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings | E. Cooper et al. | We investigate multi-speaker modeling for end-to-end text-to-speech synthesis and study the effects of different types of state-of-the-art neural speaker embeddings on speaker similarity for unseen speakers. |
1240 | Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens | R. Valle, J. Li, R. Prenger and B. Catanzaro | Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data. |
1241 | Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis | E. Battenberg et al. | We suggest simple modifications to GMM-based attention that allow it to align quickly and consistently during training, and introduce a new location-relative attention mechanism to the additive energy-based family, called Dynamic Convolution Attention (DCA). |
1242 | Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram | R. Yamamoto, E. Song and J. Kim | We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. |
1243 | Gaussian LPCNet for Multisample Speech Synthesis | V. Popov, M. Kudinov and T. Sadekova | In this work, we present a modification of LPCNet that is 1.5x faster, has half as many non-zero parameters and synthesizes speech of the same quality. |
1244 | A Computationally Light Algorithm for Bayesian Speech Enhancement with SNR Marginalization | S. Thaleiser and G. Enzner | Specifically, the extrema of the posterior distribution, which can easily be obtained via differentiation, are combined according to their widths, heights and abscissa. |
1245 | Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks | A. E. Bulut and K. Koishida | To do this, we propose a simple but effective U-Net convolutional neural network (CNN) based architecture with skip-connections with a focus on real-time applications which require low-latency processing. |
1246 | A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers | S. Wang, W. Li, S. M. Siniscalchi and C. Lee | We propose an environment adaptation approach that improves deep speech enhancement models via minimizing the Kullback-Leibler divergence between posterior probabilities produced by a multi-condition senone classifier (teacher) fed with noisy speech features and a clean-condition senone classifier (student) fed with enhanced speech features to transfer an existing deep neural network (DNN) speech enhancer to specific noisy environments without using noisy/clean paired target waveforms needed in conventional DNN-based spectral regression. |
1247 | Monaural Speech Enhancement Using Intra-Spectral Recurrent Layers in the Magnitude and Phase Responses | K. M. Nayem and D. S. Williamson | In this paper, we propose a deep learning architecture that leverages both temporal and spectral dependencies within the magnitude and phase responses. |
1248 | A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for DNN-Based Speech Enhancement | S. Niu, J. Du, L. Chai and C. Lee | In this paper, we extend the maximum likelihood approach proposed in our previous work [1] to the multi-objective learning for DNN-based speech enhancement (ML-MOL-DNN) to achieve the automatic adjustment of the dynamic range of prediction error values on different targets. |
1249 | PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement | P. Li et al. | This paper presents a new approach to solve the phase mismatch problem by training traditional DNN adversarially with a time-domain discriminator. |
1250 | HumanGAN: Generative Adversarial Network With Human-Based Discriminator And Its Evaluation In Speech Perception Modeling | K. Fujii, Y. Saito, S. Takamichi, Y. Baba and H. Saruwatari | We propose the HumanGAN, a generative adversarial network (GAN) incorporating human perception as a discriminator. |
1251 | The Processing of Mandarin Chinese Tonal Alternations in Contexts: An Eye-Tracking Study | J. Tu and Y. Chien | This study investigated the perception of Mandarin tonal alternations in disyllabic words. |
1252 | On The Impact of Language Familiarity in Talker Change Detection | N. Sharma, V. Krishnamohan, S. Ganapathy, A. Gangopadhayay and L. Fink | In this paper, we propose an experimental paradigm to provide insights on the impact of language familiarity on talker change detection. |
1253 | Effects of Spectral Tilt on Listeners' Preferences And Intelligibility | O. Simantiraki, M. Cooke and Y. Pantazis | This paper describes a real-time speech modification technique which allows evaluation of the impact of individual speech properties on listeners' preferences. |
1254 | Effect of Frication Duration and Formant Transitions on the Perception of Fricatives in VCV Utterances | K. S. Nataraj, P. C. Pandey and H. Dasgupta | A study is conducted to investigate the relative importance of the transitions and the frication duration on the perception of the unvoiced fricatives /f, s, sh/. |
1255 | Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis | G. Sun, Y. Zhang, R. J. Weiss, Y. Cao, H. Zen and Y. Wu | This paper proposes a hierarchical, fine-grained and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model. |
1256 | Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation | Y. Zhao, X. Wang, L. Juvela and J. Yamagishi | This work compares three neural synthesizers used for musical instrument sounds generation under three scenarios: training from scratch on music data, zero-shot learning from the speech domain, and fine-tuning-based adaptation from the speech to the music domain. |
1257 | Teacher-Student Training For Robust Tacotron-Based TTS | R. Liu, B. Sisman, J. Li, F. Bao, G. Gao and H. Li | To overcome this, we propose a teacher-student training scheme for Tacotron-based TTS by introducing a distillation loss function in addition to the feature loss function. |
1258 | Many-To-Many Voice Conversion Using Conditional Cycle-Consistent Adversarial Networks | S. Lee, B. Ko, K. Lee, I. Yoo and D. Yook | In this paper, we extend the CycleGAN by conditioning the network on speakers. |
1259 | F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder | K. Qian, Z. Jin, M. Hasegawa-Johnson and G. J. Mysore | In the paper, we modified and improved autoencoder-based voice conversion to disentangle content, F0, and speaker identity at the same time. |
1260 | End-To-End Accent Conversion Without Using Native Utterances | S. Liu et al. | In this paper, we present an end-to-end framework, which is able to conduct AC from non-native-accented utterances without using any native-accented utterances during online conversion. |
1261 | CoGANs For Unsupervised Visual Speech Adaptation To New Speakers | A. Fernandez-Lopez, A. Karaali, N. Harte and F. M. Sukno | Specifically, we are the first to explore the visual adaptation of an SI-AVSR system to an unknown and unlabelled speaker. |
1262 | Visually Guided Self Supervised Learning of Speech Representations | A. Shukla, K. Vougioukas, P. Ma, S. Petridis and M. Pantic | We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech. |
1263 | Looking Enhances Listening: Recovering Missing Speech Using Images | T. Srinivasan, R. Sanabria and F. Metze | In this paper, we present a set of experiments where we show the utility of the visual modality under noisy conditions. |
1264 | Towards Multilingual Sign Language Recognition | S. Tornay, M. Razavi and M. Magimai.-Doss | In this paper, we develop a multilingual sign language approach, where hand movement modeling is also done with target sign language independent data by derivation of hand movement subunits. |
1265 | Automatic Identification of Speakers From Head Gestures in a Narration | S. K. Vadiraj, A. Rao M V and P. K. Ghosh | In this work, we focus on quantifying speaker identity information encoded in the head gestures of speakers, while they narrate a story. |
1266 | Lipreading Using Temporal Convolutional Networks | B. Martinez, P. Ma, S. Petridis and M. Pantic | In this work, we address the limitations of this model and we propose changes which further improve its performance. |
1267 | On Modeling ASR Word Confidence | W. Jeon, M. Jordan and M. Krishnamoorthy | We present a new method for computing ASR word confidences that effectively mitigates the effect of ASR errors for diverse downstream applications, improves the word error rate of the 1-best result, and allows better comparison of scores across different models. |
1268 | Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks | A. Kastanos, A. Ragni and M. J. F. Gales | This paper examines this limited resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. |
1269 | OOV Recovery with Efficient 2nd Pass Decoding and Open-vocabulary Word-level RNNLM Rescoring for Hybrid ASR | X. Zhang, D. Povey and S. Khudanpur | In this paper, we investigate out-of-vocabulary (OOV) word recovery in hybrid automatic speech recognition (ASR) systems, with emphasis on dynamic vocabulary expansion for both Weighted Finite State Transducer (WFST)-based decoding and word-level RNNLM rescoring. |
1270 | End to End Speech Recognition Error Prediction with Sequence to Sequence Learning | P. Serai, A. Stiff and E. Fosler-Lussier | We present a novel end to end model to simulate ASR errors. |
1271 | ASR Error Correction and Domain Adaptation Using Machine Translation | A. Mani, S. Palaskar, N. V. Meripo, S. Konam and F. Metze | We propose a simple technique to perform domain adaptation for ASR error correction via machine translation. |
1272 | Joint Contextual Modeling for ASR Correction and Language Understanding | Y. Weng et al. | In this paper, we propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with LU to improve the performance of both tasks simultaneously. |
1273 | Deep CASA for Talker-independent Monaural Speech Separation | Y. Liu, M. Delfarah and D. Wang | In this study, we address both speech and nonspeech interference, i.e., monaural speaker separation in noise, in a talker-independent fashion. |
1274 | Demystifying TasNet: A Dissecting Approach | J. Heitkaemper, D. Jakobeit, C. Boeddeker, L. Drude and R. Haeb-Umbach | In this paper we dissect the gains of the time-domain audio separation network (TasNet) approach by gradually replacing components of an utterance-level permutation invariant training (u-PIT) based separation system in the frequency domain until the TasNet system is reached, thus blending components of frequency domain approaches with those of time domain approaches. |
1275 | Filterbank Design for End-to-end Speech Separation | M. Pariente, S. Cornell, A. Deleforge and E. Vincent | In this work, we extend real-valued learned and parameterized filterbanks into complex-valued analytic filterbanks and define a set of corresponding representations and masking strategies. |
1276 | Interrupted and Cascaded Permutation Invariant Training for Speech Separation | G. Yang, S. Wu, Y. Mao, H. Lee and L. Lee | In this paper, for a given model architecture, we investigate various flexible label assignment strategies for training the model, rather than directly using PIT. |
1277 | Mixup-breakdown: A Consistency Training Method for Improving Generalization of Speech Separation Models | M. W. Y. Lam, J. Wang, D. Su and D. Yu | To address this problem, we propose an easy-to-implement yet effective consistency based semi-supervised learning (SSL) approach, namely Mixup-Breakdown training (MBT). |
1278 | An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation | H. Wang, Y. Song, Z. Li, I. McLoughlin and L. Dai | In this paper, we propose a speaker-aware approach based on the source-aware context network structure, in which the speaker information is explicitly modeled by an auxiliary speaker identification branch. |
1279 | Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-domain Beamformer | T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani and S. Araki | Motivated by this question, this paper proposes a novel speech separation scheme, i.e., Beam-TasNet, which combines TasNet with the frequency-domain beamformer, i.e., a minimum variance distortionless response (MVDR) beamformer, through spatial covariance computation to achieve better ASR performance. |
1280 | On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments | J. Zhang, C. Zorila, R. Doddipatla and J. Barker | This paper introduces a new method for multi-channel time domain speech separation in reverberant environments. |
1281 | End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation | Y. Luo, Z. Chen, N. Mesgarani and T. Yoshioka | In this paper, we propose transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation. |
1282 | DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation | T. Nakatani et al. | In this article, we investigate an integrated mask-based convolutional beamforming method for performing simultaneous denoising, dereverberation, and source separation. |
1283 | Real-Time Binaural Speech Separation with Preserved Spatial Cues | C. Han, Y. Luo and N. Mesgarani | Here, we propose a speech separation algorithm that preserves the interaural cues of separated sound sources and can be implemented with low latency and high fidelity, therefore enabling a real-time modification of the acoustic scene. |
1284 | SLOGD: Speaker Location Guided Deflation Approach to Speech Separation | S. Sivasankaran, E. Vincent and D. Fohr | In this work we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach wherein we estimate the sources iteratively. |
1285 | Multilingual Acoustic Word Embedding Models for Processing Zero-resource Languages | H. Kamper, Y. Matusevych and S. Goldwater | Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. |
1286 | Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders | A. T. Liu, S. Yang, P. Chi, P. Hsu and H. Lee | We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech. |
1287 | Recurrent Neural Audiovisual Word Embeddings for Synchronized Speech and Real-Time MRI | Ö. D. Köse and M. Saraçlar | In this paper, the use of word embeddings for the segments found in audio and real-time magnetic resonance imaging (rtMRI) videos is addressed. |
1288 | Deep Contextualized Acoustic Representations for Semi-Supervised Speech Recognition | S. Ling, Y. Liu, J. Salazar and K. Kirchhoff | We propose a novel approach to semi-supervised automatic speech recognition (ASR). |
1289 | What Does a Network Layer Hear? Analyzing Hidden Representations of End-to-End ASR Through Speech Synthesis | C. Li, P. Yuan and H. Lee | In this paper, we present our ASR probing model, which synthesizes speech from hidden representations of end-to-end ASR to examine the information maintained after each layer calculation. |
1290 | Learning a Subword Inventory Jointly with End-to-End Automatic Speech Recognition | J. Drexler and J. Glass | In this paper, we follow the LSD method for using subword units but introduce an updated loss function that allows the ASR model to explicitly perform unit discovery, as well. |
1291 | Multiple Points Input For Convolutional Neural Networks in Replay Attack Detection | S. Yoon and H. Yu | We proposed the multiple points input method to increase the amount of information that can be considered at one time. |
1292 | Information Maximized Variational Domain Adversarial Learning for Speaker Verification | Y. Tu, M. Mak and J. Chien | This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) to reduce domain mismatch by incorporating an InfoVAE into domain adversarial training (DAT). |
1293 | Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings | Y. Yang, S. Wang, X. Gong, Y. Qian and K. Yu | In this paper, we propose a novel text adaptation framework to address the text mismatch issue. |
1294 | VoiceAI Systems to NIST SRE19 Evaluation: Robust Speaker Recognition on Conversational Telephone Speech | R. Li, D. Chen and W. Zhang | In this study, we present the VoiceAI (VAI) submissions to NIST SRE 2019 challenge on the task of speaker recognition using conversational telephone speech. |
1295 | Multi-Resolution Multi-Head Attention in Deep Speaker Embedding | Z. Wang, K. Yao, X. Li and S. Fang | This paper proposes simple but effective pooling methods to compute attentive weights for better temporal aggregation over the variable-length input speech, enabling the end-to-end neural network to have improved performance for discriminating among speakers. |
1296 | Within-Sample Variability-Invariant Loss for Robust Speaker Recognition Under Noisy Environments | D. Cai, W. Cai and M. Li | In this paper, we train the speaker embedding network to learn the “clean” embedding of the noisy utterance. |
1297 | Speech Emotion Recognition with Dual-Sequence LSTM Architecture | J. Wang, M. Xue, R. Culhane, E. Diao, J. Ding and V. Tarokh | In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. |
1298 | A Dialogical Emotion Decoder for Speech Emotion Recognition in Spoken Dialog | S. Yeh, Y. Lin and C. Lee | In this paper, we proposed a novel inference algorithm, a dialogical emotion decoding (DED) algorithm, that treats a dialog as a sequence and consecutively decodes the emotion states of each utterance over time with a given recognition engine. |
1299 | Fusion Approaches for Emotion Recognition from Speech Using Acoustic and Text-Based Features | L. Pepino, P. Riera, L. Ferrer and A. Gravano | In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. |
1300 | Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals | E. Guizzo, T. Weyde and J. B. Leveson | To address this and potentially other tasks, we introduce the multi-time-scale (MTS) method to create flexibility towards temporal variations when analyzing time-frequency representations of audio data. |
1301 | Ordinal Learning for Emotion Recognition in Customer Service Calls | W. Han, T. Jiang, Y. Li, B. Schuller and H. Ruan | To employ the ordinal information between emotional ranks, we propose to model the ordinal SER tasks under a COnsistent RAnk Logits (CORAL) based deep learning framework. |
1302 | HGFM: A Hierarchical Grained and Feature Model for Acoustic Emotion Recognition | Y. Xu, H. Xu and J. Zou | To solve the problem of poor classification performance of multiple complex emotions in acoustic modalities, we propose a hierarchical grained and feature model (HGFM). |
1303 | Speaker Diarization Using Latent Space Clustering in Generative Adversarial Network | M. Pal et al. | In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) back-projection with the help of an encoder network. |
1304 | Multimodal Speaker Diarization of Real-World Meetings Using D-Vectors With Spatial Features | W. Kang, B. C. Roy and W. Chow | In this paper, we present a novel approach to multimodal speaker diarization that combines d-vectors with spatial information derived from performing beamforming given a multi-channel microphone array. |
1305 | Speaker Diarization with Region Proposal Network | Z. Huang et al. | In this paper, we propose a novel speaker diarization method: Region Proposal Network based Speaker Diarization (RPNSD). |
1306 | Optimizing Bayesian HMM Based X-Vector Clustering for the Second DIHARD Speech Diarization Challenge | M. Diez, L. Burget, F. Landini, S. Wang and H. Cernocký | In this paper, we focus on the two x-vector clustering methods employed, namely Agglomerative Hierarchical Clustering followed by a clustering based on Bayesian Hidden Markov Model (BHMM). |
1307 | A Memory Augmented Architecture for Continuous Speaker Identification in Meetings | N. Flemotomos and D. Dimitriadis | We introduce and analyze a novel approach to the problem of speaker identification in multi-party recorded meetings. |
1308 | BUT System for the Second DIHARD Speech Diarization Challenge | F. Landini et al. | This paper describes the winning systems developed by the BUT team for the four tracks of the Second DIHARD Speech Diarization Challenge. |
1309 | Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNs | J. Fritsch, S. P. Dubagunta and M. Magimai.-Doss | This paper investigates an end-to-end approach, where given raw waveform as input, a convolutional neural network (CNN) estimates at its output the degree of sleepiness. |
1310 | Automatic Prediction of Suicidal Risk in Military Couples Using Multimodal Interaction Cues from Couples Conversations | S. N. Chakravarthula et al. | In this work, we investigate whether acoustic, lexical, behavior and turn-taking cues from military couples’ conversations can provide meaningful markers of suicidal risk. |
1311 | Comparison of User Models Based on GMM-UBM and I-Vectors for Speech, Handwriting, and Gait Assessment of Parkinson’s Disease Patients | J. C. Vasquez-Correa, T. Bocklet, J. R. Orozco-Arroyave and E. Nöth | This study introduces the use of GMM-UBM and i-vectors to evaluate the neurological state of Parkinson’s patients using information from speech, handwriting, and gait. |
1312 | Exploiting Vocal Tract Coordination Using Dilated CNNs For Depression Detection In Naturalistic Environments | Z. Huang, J. Epps and D. Joachim | Motivated by the success of these methods, this paper proposes a novel way to extract full vocal tract coordination (FVTC) features by use of convolutional neural networks (CNNs), overcoming earlier shortcomings. |
1313 | Deep Learning Based Prediction of Hypernasality for Clinical Applications | V. C. Mathad, K. Chapman, J. Liss, N. Scherer and V. Berisha | In this paper, we propose a new approach that uses the speech samples from healthy controls to model the acoustic characteristics of nasalized speech. |
1314 | Language Independent Gender Identification from Raw Waveform Using Multi-Scale Convolutional Neural Networks | K. D N, A. D, S. S. Reddy, A. Acharya, P. A. Garapati and T. B J | In this work, we propose a raw waveform based multiscale convolution neural network approach for language-independent gender identification. |
1315 | Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV | H. Wu, S. Liu, H. Meng and H. Lee | In this work, we introduce a passive defense method, spatial smoothing, and a proactive defense method, adversarial training, to mitigate the vulnerability of ASV spoofing countermeasure models against adversarial examples. |
1316 | Text-Independent Speaker Verification with Adversarial Learning on Short Utterances | K. Liu and H. Zhou | To address the problem, in this paper, we propose an adversarially learned embedding mapping model that directly maps a short embedding to an enhanced embedding with increased discriminability. |
1317 | Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training | Z. Chen, S. Wang, Y. Qian and K. Yu | Inspired by the successful joint multi-task and adversarial training with phonetic information for phonetic-invariant speaker embedding learning, in this paper, a similar methodology is developed to suppress the channel variability. |
1318 | Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems | X. Li, J. Zhong, X. Wu, J. Yu, X. Liu and H. Meng | This work investigates the vulnerability of Gaussian Mixture Model (GMM) i-vector based speaker verification systems to adversarial attacks, and the transferability of adversarial samples crafted from GMM i-vector based systems to x-vector based systems. |
1319 | Orthogonal Training for Text-Independent Speaker Verification | Y. Zhu and B. Mak | In this paper we propose orthogonal training schemes to improve the effectiveness of cosine similarity measurements in text-independent speaker verification (SV) tasks. |
1320 | Assessing the Scope of Generalized Countermeasures for Anti-Spoofing | R. K. Das, J. Yang and H. Li | In this work, we consider widely popular constant-Q cepstral coefficient features along with two other promising front-ends that capture long-term signal characteristics to assess their scope as generalized countermeasures. |
1321 | Improving Speaker-Attribute Estimation by Voting Based on Speaker Cluster Information | N. Tawara, H. Kamiyama, S. Kobashikawa and A. Ogawa | This paper proposes a general post-processing method for improving speaker-attribute estimation. |
1322 | An Ensemble Based Approach for Generalized Detection of Spoofing Attacks to Automatic Speaker Recognizers | J. Monteiro, J. Alam and T. H. Falk | In this work, we introduce an end-to-end ensemble based approach such that two models, previously shown to perform well on each considered attack strategy, are trained jointly, while a third model learns how to mix their outputs yielding a single score. |
1323 | A Discriminative Condition-Aware Backend for Speaker Verification | L. Ferrer and M. McLaren | We present a scoring approach for speaker verification that mimics the standard PLDA-based backend process used in most current speaker verification systems. |
1324 | Adversarial Multi-Task Learning for Speaker Normalization in Replay Detection | G. Suthokumar, V. Sethu, K. Sriskandaraja and E. Ambikairajah | In this paper, we propose a novel speaker normalisation technique that employs adversarial multi-task learning to compensate for this speaker variability. |
1325 | Robust Speaker Recognition Using Unsupervised Adversarial Invariance | R. Peri, M. Pal, A. Jati, K. Somandepalli and S. Narayanan | In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. |
1326 | A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition | Q. Wang, K. Okabe, K. A. Lee and T. Koshinaka | This paper proposes a generalized framework for domain adaptation of Probabilistic Linear Discriminant Analysis (PLDA) in speaker recognition. |
1327 | CP-GAN: Context Pyramid Generative Adversarial Network for Speech Enhancement | G. Liu, K. Gong, X. Liang and Z. Chen | In this work, we make the first attempt to explore the global and local speech features for coarse-to-fine speech enhancement and introduce a Context Pyramid Generative Adversarial Network (CPGAN), which contains a densely-connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically. |
1328 | Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in The Time Domain | A. Pandey and D. Wang | In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. |
1329 | PAN: Phoneme-Aware Network for Monaural Speech Enhancement | Z. Du, M. Lei, J. Han and S. Zhang | Inspired by the progress, we propose a phoneme-aware network (PAN) to utilize the noisy PPGs for speech enhancement. |
1330 | Efficient Trainable Front-Ends for Neural Speech Enhancement | J. Casebeer, U. Isik, S. Venkataramani and A. Krishnaswamy | We present an efficient, trainable front-end based on the butterfly mechanism to compute the Fast Fourier Transform, and show its accuracy and efficiency benefits for low-compute neural speech enhancement models. |
1331 | Invertible DNN-Based Nonlinear Time-Frequency Transform for Speech Enhancement | D. Takeuchi, K. Yatabe, Y. Koizumi, Y. Oikawa and N. Harada | We propose an end-to-end speech enhancement method with trainable time-frequency (T-F) transform based on invertible deep neural network (DNN). |
1332 | T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement | J. Kim, M. El-Khamy and J. Lee | In this paper, we propose a Transformer with Gaussian-weighted self-attention (T-GSA), whose attention weights are attenuated according to the distance between target and context symbols. |
1333 | Redundant Convolutional Network With Attention Mechanism For Monaural Speech Enhancement | T. Lan, Y. Lyu, G. Hui, R. Mokhosi, S. Li and Q. Liu | In this study, we introduce an attention mechanism into the convolutional encoder-decoder model. |
1334 | Residual Recurrent Neural Network for Speech Enhancement | J. Abdulbaqi, Y. Gu, S. Chen and I. Marsic | We introduce an end-to-end fully recurrent neural network for single-channel speech enhancement. |
1335 | 2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network | Y. Tu, J. Du and C. Lee | In this study, we design a new Fully CNN (FCNN)-based regression model, which can directly achieve the 2-dimensional (2D) noisy log-power spectra (LPS) input to 2-dimensional (2D) time-frequency mask output mapping, denoted as 2D-RFCNN. |
1336 | Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement | R. E. Zezario, T. Hussain, X. Lu, H. Wang and Y. Tsao | In this study, we proposed an unsupervised learning approach for speech enhancement: a denoising autoencoder with linear regression decoder (DAELD) model. |
1337 | Fully Convolutional Recurrent Networks for Speech Enhancement | M. Strake, B. Defraene, K. Fluyt, W. Tirry and T. Fingscheidt | As a first novelty, we propose to replace the fully-connected LSTM by a convolutional LSTM (ConvLSTM) and call the resulting network a fully convolutional recurrent network (FCRN). |
1338 | Phonetic Feedback for Speech Enhancement with and Without Parallel Speech Data | P. Plantinga, D. Bagchi and E. Fosler-Lussier | We use the technique of mimic loss to provide phonetic feedback to an off-the-shelf enhancement system, and find gains in objective intelligibility scores on CHiME-4 data. |
1339 | Scalable Multilingual Frontend for TTS | A. Conkie and A. Finch | This paper describes progress towards making a Neural Text-to-Speech (TTS) Frontend that works for many languages and can be easily extended to new languages. |
1340 | A Unified Sequence-to-Sequence Front-End Model for Mandarin Text-to-Speech Synthesis | J. Pan et al. | In this paper, we proposed a unified sequence-to-sequence front-end model for Mandarin TTS that converts raw texts to linguistic features directly. |
1341 | A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin | J. Zhang et al. | In this paper, we propose a hybrid text normalization system using multi-head self-attention. |
1342 | Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior | G. Sun et al. | This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples. |
1343 | Improving Prosody with Linguistic and BERT Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS | Y. Xiao, L. He, H. Ming and F. K. Soong | In this study, with a multi-speaker TTS to accommodate the insufficient training data of a target speaker, we investigate linguistic features and BERT-derived information to improve the prosody of our Mandarin Chinese TTS. |
1344 | Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis | R. Fu, J. Tao, Z. Wen, J. Yi and T. Wang | In this paper, we present two novel methods to handle the above problems by focusing on the attention. |
1345 | AlignTTS: Efficient Feed-Forward Text-to-Speech System Without Explicit Alignment | Z. Zeng, J. Wang, N. Cheng, T. Xia and J. Xiao | Targeting both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel. |
1346 | GraphTTS: Graph-to-Sequence Modelling in Neural Text-to-Speech | A. Sun, J. Wang, N. Cheng, H. Peng, Z. Zeng and J. Xiao | This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms. |
1347 | Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment | Y. Yasuda, X. Wang and J. Yamagishi | This research investigates various combinations of sampling methods and probability distributions for alignment transition modeling in a hard-alignment-based sequence-to-sequence TTS method called SSNT-TTS. |
1348 | Transformer-Based Text-to-Speech with Weighted Forced Attention | T. Okamoto, T. Toda, Y. Shiga and H. Kawai | Therefore, a Transformer-based acoustic model with weighted forced attention obtained from phoneme durations is proposed to improve synthesis accuracy and stability, where both encoder-decoder attention and forced attention are used with a weighting factor. |
1349 | Improving End-to-End Speech Synthesis with Local Recurrent Neural Network Enhanced Transformer | Y. Zheng, X. Li, F. Xie and L. Lu | In this paper, we introduce local recurrent neural network (Local-RNN) into Transformer to make full use of the advantages of both RNN and Transformer while mitigating their drawbacks. |
1350 | GCI Detection from Raw Speech Using a Fully-Convolutional Network | L. Ardaillon and A. Roebel | Following this trend, we propose a simple approach that performs a mapping from the speech waveform to a target signal from which the GCIs are obtained by peak-picking. |
1351 | Frame-Based Overlapping Speech Detection Using Convolutional Neural Networks | M. Yousefi and J. H. L. Hansen | In this study, we investigate the detection of overlapping speech on segments as short as 25 ms using Convolutional Neural Networks. |
1352 | Learning Domain Invariant Representations for Child-Adult Classification from Speech | R. Lahiri, M. Kumar, S. Bishop and S. Narayanan | In this work, we address two major sources of variability (age of the child and data source collection location) using domain adversarial learning, which does not require labeled target domain data. |
1353 | Single Frequency Filter Bank Based Long-Term Average Spectra for Hypernasality Detection and Assessment in Cleft Lip and Palate Speech | M. H. Javid, K. Gurugubelli and A. K. Vuppala | This work proposes single frequency filter bank based long-term average spectral (SFFB-LTAS) features for hypernasality detection and assessment. |
1354 | Autoregressive Parameter Estimation with DNN-Based Pre-Processing | Z. Cui, C. Bao, J. K. Nielsen and M. Græsbøll Christensen | In this paper, a method for estimating the autoregressive parameters from a signal segment is proposed. |
1355 | Enhancement of Coded Speech Using a Mask-Based Post-Filter | S. Korse, K. Gupta and G. Fuchs | In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. |
1356 | Robust Low Rate Speech Coding Based on Cloned Networks and WaveNet | F. S. C. Lim, W. Bastiaan Kleijn, M. Chinen and J. Skoglund | We present a new speech-coding scheme that is based on features that are robust to the distortions occurring in speech-coder input signals. |
1357 | Mixture Factorized Auto-Encoder for Unsupervised Hierarchical Deep Factorization of Speech Signal | Z. Peng, S. Feng and T. Lee | This paper proposes the mixture factorized auto-encoder (mFAE) for unsupervised deep factorization. |
1358 | A Novel Approach for Intelligibility Assessment in Dysarthric Subjects | A. Tripathi, S. Bhosale and S. K. Kopparapu | In the paper, we propose a usable novel method to assess intelligibility of dysarthric speakers. |
1359 | Voice Based Classification of Patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM Using Transfer Learning | J. Mallela et al. | In this paper, we consider 2-class and 3-class classification problems for classifying patients with Amyotrophic Lateral Sclerosis (ALS), Parkinson’s Disease (PD), and Healthy Controls (HC) using a CNN-LSTM network. |
1360 | Analysis of Acoustic Features for Speech Sound Based Classification of Asthmatic and Healthy Subjects | S. Yadav, M. Keerthana, D. Gope, U. Maheswari K. and P. Kumar Ghosh | Analysis of Acoustic Features for Speech Sound Based Classification of Asthmatic and Healthy Subjects |
1361 | Frequency and Temporal Convolutional Attention for Text-Independent Speaker Recognition | S. Yadav and A. Rai | In this paper, we propose methods of convolutional attention for independently modelling temporal and frequency information in a convolutional neural network (CNN) based front-end. |
1362 | Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances | N. Tawara, A. Ogawa, T. Iwata, M. Delcroix and T. Ogawa | This paper investigates a phoneme-invariant speaker embedding approach for speaker recognition on extremely short utterances. |
1363 | Prototypical Networks for Small Footprint Text-Independent Speaker Verification | T. Ko, Y. Chen and Q. Li | In this paper, we investigate the use of prototypical networks in a small footprint text-independent speaker verification task. |
1364 | TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification | C. Chen and J. Han | In this paper, a task-driven multilevel framework (TDMF) is proposed for end-to-end speaker verification. |
1365 | An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales | B. Gu, W. Guo, L. Dai and J. Du | This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. |
1366 | Partial AUC Optimization Based Deep Speaker Embeddings with Class-Center Learning for Text-Independent Speaker Verification | Z. Bai, X. Zhang and J. Chen | In this paper, we propose a verification loss function, named the maximization of partial area under the Receiver-operating-characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. |
1367 | Knowledge Distillation and Random Erasing Data Augmentation for Text-Dependent Speaker Verification | V. Mingote, A. Miguel, D. Ribas, A. Ortega and E. Lleida | This paper explores the Knowledge Distillation (KD) approach and a data augmentation technique to improve the generalization ability and robustness of text-dependent speaker verification (SV) systems. |
1368 | Disentangled Speech Embeddings Using Cross-Modal Self-Supervision | A. Nagrani, J. S. Chung, S. Albanie and A. Zisserman | The objective of this paper is to learn representations of speaker identity without access to manually annotated data. |
1369 | Improving Deep CNN Networks with Long Temporal Context for Text-Independent Speaker Verification | Y. Zhao, T. Zhou, Z. Chen and J. Wu | In this paper, we explore two approaches for modeling long temporal contexts to improve the performance of the ResNet networks. |
1370 | Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization | W. Lin, M. Mak, N. Li, D. Su and D. Yu | In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). |
1371 | Multi-Task Learning for Speaker Verification and Voice Trigger Detection | S. Sigtia, E. Marchi, S. Kajarekar, D. Naik and J. Bridle | In this study, we investigate training a single network to perform both tasks jointly. |
1372 | Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification | Q. Hong, C. Wu, H. Wang and C. Huang | This paper aims to improve speaker embedding representation based on x-vector for extracting more detailed information for speaker verification. |
1373 | SNDCNN: Self-Normalizing Deep CNNs with Scaled Exponential Linear Units for Speech Recognition | Z. Huang, T. Ng, L. Liu, H. Mason, X. Zhuang and D. Liu | Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear unit (SELU) in ResNet-50. |
1374 | Robust Multi-Channel Speech Recognition Using Frequency Aligned Network | T. Park, K. Kumatani, M. Wu and S. Sundaram | In this paper, we further develop this idea and use frequency aligned network for robust multi-channel automatic speech recognition (ASR). |
1375 | Fully Learnable Front-End for Multi-Channel Acoustic Modeling Using Semi-Supervised Learning | S. Wager, A. Khare, M. Wu, K. Kumatani and S. Sundaram | In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). |
1376 | G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR | D. Le, T. Koehler, C. Fliegen and M. L. Seltzer | In this work, we present a novel method to train a statistical grapheme-to-grapheme (G2G) model on text-to-speech data that can rewrite an arbitrary character sequence into more phonetically consistent forms. |
1377 | Transformer-Based Acoustic Modeling for Hybrid Speech Recognition | Y. Wang et al. | We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. |
1378 | SpecAugment on Large Scale Datasets | D. S. Park et al. | In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018). |
1379 | Fast Training of Deep Neural Networks for Speech Recognition | G. Cong, B. Kingsbury, C. Yang and T. Liu | We present an approach for training a bidirectional LSTM acoustic model on the 2000-hour Switchboard corpus. |
1380 | Unsupervised Pre-Training of Bidirectional Speech Encoders via Masked Reconstruction | W. Wang, Q. Tang and K. Livescu | We propose an approach for pre-training speech representations via a masked reconstruction loss. |
1381 | Distilling Attention Weights for CTC-Based ASR Systems | T. Moriya, H. Sato, T. Tanaka, T. Ashihara, R. Masumura and Y. Shinohara | We present a novel training approach for connectionist temporal classification (CTC)-based automatic speech recognition (ASR) systems. |
1382 | DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks | A. Tjandra et al. | Here, we propose to feed the input features at multiple depths in the acoustic model. |
1383 | Frame-Level MMI as A Sequence Discriminative Training Criterion for LVCSR | W. Michel, R. Schlüter and H. Ney | In this work we present frame-level maximum mutual information (MMI) as a novel sequence discriminative training criterion for hybrid HMM-DNN acoustic models. |
1384 | Cross Lingual Transfer Learning for Zero-Resource Domain Adaptation | A. Abad, P. Bell, A. Carmantini and S. Renals | We propose a method for zero-resource domain adaptation of DNN acoustic models, for use in low-resource situations where the only in-language training data available may be poorly matched to the intended target domain. |
1385 | Improving Robustness of Deep Learning Based Monaural Speech Enhancement Against Processing Artifacts | K. Tan and D. Wang | In this study, we find that enhancing a speech signal twice can dramatically degrade the enhancement performance. |
1386 | CAD-AEC: Context-Aware Deep Acoustic Echo Cancellation | A. Fazel, M. El-Khamy and J. Lee | This paper proposes a context-aware deep AEC (CAD-AEC) by introducing two main components. |
1387 | Artificial Bandwidth Extension Using Conditional Variational Auto-encoders and Adversarial Learning | P. Bachhav, M. Todisco and N. Evans | Generative models such as conditional variational auto-encoders (CVAEs) are capable of modelling complex data distributions via latent representation learning. This paper reports their application to ABE. CVAEs, a form of directed graphical model, are exploited to model the probability distribution of highband features conditioned on narrowband features. |
1388 | Using Automatic Speech Recognition and Speech Synthesis to Improve the Intelligibility of Cochlear Implant Users in Reverberant Listening Environments | K. Chu, L. Collins and B. Mainsah | Thus, the current study investigated the performance of the previously proposed algorithm in multiple unseen environments. |
1389 | Speech Intelligibility Enhancement by Equalization for In-Car Applications | E. Gentet, B. David, S. Denjean, G. Richard and V. Roussarie | In this paper, we propose a speech intelligibility enhancement method for typical in-car applications in noisy environments. |
1390 | Maximum Likelihood Estimation of the Interference-Plus-Noise Cross Power Spectral Density Matrix for Own Voice Retrieval | P. Hoang, Z. Tan, T. Lunner, J. M. de Haan and J. Jensen | In this paper, a novel maximum likelihood (ML) estimator of the interference-plus-noise CPSD matrix is proposed. |
1391 | A Constrained Maximum Likelihood Estimator of Speech and Noise Spectra with Application to Multi-Microphone Noise Reduction | A. Zahedi, M. S. Pedersen, J. Østergaard, L. Bramsløw, T. U. Christiansen and J. Jensen | In this paper, we suggest a new estimation technique that tackles this issue by enforcing a power constraint on the estimation problem. |
1392 | CLCNet: Deep Learning-Based Noise Reduction for Hearing Aids Using Complex Linear Coding | H. Schröter, T. Rosenkranz, A. N. Escalante-B, M. Aubreville and A. Maier | To improve monaural speech enhancement in noisy environments, we propose CLCNet, a framework based on complex valued linear coding. |
1393 | A Time-Frequency Network with Channel Attention and Non-Local Modules for Artificial Bandwidth Extension | Y. Dong et al. | In this paper, we introduce a Time-Frequency Network (TFNet) with channel attention (CA) and non-local (NL) modules for ABE. |
1394 | Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise | X. Hao et al. | For better speech enhancement in the above scenarios, this paper proposes a two-stage approach that consists of binary masking and spectrogram inpainting. |
1395 | 3-D Acoustic Modeling for Far-Field Multi-Channel Speech Recognition | A. Purushothaman, A. Sreeram and S. Ganapathy | In this paper, we propose to model the multi-channel signal directly using a convolutional neural network (CNN) based architecture which performs the joint acoustic modeling on the three dimensions of time, frequency and channel. |
1396 | Improving Reverberant Speech Training Using Diffuse Acoustic Simulation | Z. Tang, L. Chen, B. Wu, D. Yu and D. Manocha | We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks. |
1397 | Low-Frequency Compensated Synthetic Impulse Responses For Improved Far-Field Speech Recognition | Z. Tang, H. Meng and D. Manocha | We propose a method for generating low-frequency compensated synthetic impulse responses that improve the performance of far-field speech recognition systems trained on artificially augmented datasets. |
1398 | AIPNet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition | Y. Chen, Z. Yang, C. Yeh, M. Jain and M. L. Seltzer | In this paper, our goal is to build a unified end-to-end speech recognition system that generalizes well across accents. |
1399 | Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset | J. Yu et al. | Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. |
1400 | Multi-Task Self-Supervised Learning for Robust Speech Recognition | M. Ravanelli et al. | This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. |
1401 | End-to-End Multi-Person Audio/Visual Automatic Speech Recognition | O. Braga, T. Makino, O. Siohan and H. Liao | We propose a fully differentiable A/V ASR model that is able to handle multiple face tracks in a video. |
1402 | End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection | T. Yoshimura, T. Hayashi, K. Takeda and S. Watanabe | This paper integrates a voice activity detection (VAD) function with end-to-end automatic speech recognition toward an online speech interface and transcribing very long audio recordings. |
1403 | End-to-End Training of Time Domain Audio Separation and Recognition | T. von Neumann et al. | We here demonstrate how to combine a separation module based on a Convolutional Time domain Audio Separation Network (Conv-TasNet) with an E2E speech recognizer and how to train such a model jointly by distributing it over multiple GPUs or by approximating truncated back-propagation for the convolutional front-end. |
1404 | Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network | K. Kinoshita, T. Ochiai, M. Delcroix and T. Nakatani | In this paper, we show that a single-channel time-domain denoising approach can significantly improve ASR performance, providing more than 30 % relative word error reduction over a strong ASR back-end on the real evaluation data of the single-channel track of the CHiME-4 dataset. |
1405 | A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition | R. Li, G. Sell, X. Wang, S. Watanabe and H. Hermansky | In this work, we propose a practical two-stage training scheme. |
1406 | Multi-Scale Octave Convolutions for Robust Speech Recognition | J. Rownicka, P. Bell and S. Renals | We propose a multi-scale octave convolution layer to learn robust speech representations efficiently. |
1407 | Learning Noise Invariant Features Through Transfer Learning For Robust End-to-End Speech Recognition | S. Zhang, C. Do, R. Doddipatla and S. Renals | To address this, we propose transfer learning from a clean dataset (WSJ) to a noisy dataset (CHiME4) for connectionist temporal classification models. |
1408 | Improving Speech Recognition Using Consistent Predictions on Synthesized Speech | G. Wang et al. | In this work, we demonstrate that promoting consistent predictions in response to real and synthesized speech enables significantly improved speech recognition performance. |
1409 | Attention-Based ASR with Lightweight and Dynamic Convolutions | Y. Fujita, A. S. Subramanian, M. Omachi and S. Watanabe | In this paper, we propose to apply lightweight and dynamic convolution to E2E ASR as an alternative architecture to the self-attention to make the computational order linear. |
1410 | An Attention-Based Joint Acoustic and Text on-Device End-To-End Model | T. N. Sainath, R. Pang, R. J. Weiss, Y. He, C. Chiu and T. Strohman | In this work, we introduce a joint acoustic and text decoder (JATD) into the LAS decoder, which makes it possible to incorporate a much larger text corpus into training. |
1411 | Structured Sparse Attention for end-to-end Automatic Speech Recognition | J. Xue, T. Zheng and J. Han | In this paper, we present two sparse attention mechanisms for ASR tasks with long utterances, which try to improve the attention mechanism by introducing the sparse transformation. |
1412 | Rnn-Transducer with Stateless Prediction Network | M. Ghodsi, X. Liu, J. Apfel, R. Cabrera and E. Weinstein | We focus on the prediction network of the RNNT, since it is believed to be analogous to the Language Model (LM) in the classic ASR systems. |
1413 | Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition | R. Masumura, M. Ihori, A. Takashima, T. Moriya, A. Ando and Y. Shinohara | This paper presents a novel semi-supervised end-to-end automatic speech recognition (ASR) method that employs consistency training with the use of unlabeled data. |
1414 | Independent Language Modeling Architecture for End-To-End ASR | V. T. Pham et al. | To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output. |
1415 | Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings | A. Rouhe, T. Kaseva and M. Kurimo | We apply speaker-aware training to attention-based end-to-end speech recognition. We show that it can improve over a purely end-to-end baseline. |
1416 | Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems | N. Rossenbach, A. Zeyer, R. Schlüter and H. Ney | We extend state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS system trained only on the ASR corpora itself. |
1417 | Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model | O. Hrinchuk, M. Popova and B. Ginsburg | In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition. |
1418 | Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition | H. Hu, R. Zhao, J. Li, L. Lu and Y. Gong | In this work, we conversely leverage external alignments to seed the RNN-T model. |
1419 | Self-Training for End-to-End Speech Recognition | J. Kahn, A. Lee and A. Hannun | We revisit self-training in the context of end-to-end speech recognition. |
1420 | Toward Better Speaker Embeddings: Automated Collection of Speech Samples From Unknown Distinct Speakers | M. Pham, Z. Li and J. Whitehill | With the goal of training better embedding models, we devise an automatic pipeline for large-scale collection of speech samples from unique speakers that is significantly more automated than previous approaches. |
1421 | Channel Adversarial Training for Speaker Verification and Diarization | C. Luu, P. Bell and S. Renals | We propose a training strategy which aims to produce features that are invariant at the granularity of the recording or channel, a finer grained objective than dataset- or environment-invariance. |
1422 | Progressive Multi-Target Network Based Speech Enhancement with Snr-Preselection for Robust Speaker Diarization | L. Sun, J. Du, X. Zhang, T. Gao, X. Fang and C. Lee | In this paper, we design a novel front-end processing system for speaker diarization under realistic conditions with challenging background noises. |
1423 | Improved Large-Margin Softmax Loss for Speaker Diarisation | Y. Fathullah, C. Zhang and P. C. Woodland | Therefore, this paper introduces a general approach to the large-margin softmax loss without any approximations to improve the quality of speaker embeddings for diarisation. |
1424 | Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks | J. Wang, X. Xiao, J. Wu, R. Ramamurthy, F. Rudzicz and M. Brudno | In this work we present the first use of graph neural networks (GNNs) for the speaker diarization problem, utilizing a GNN to refine speaker embeddings locally using the structural information between speech segments inside each session. |
1425 | Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection | L. Bullock, H. Bredin and L. P. Garcia-Perera | We address the problem of effectively handling overlapping speech in a diarization system. |
1426 | On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study | R. K. Das and H. Li | On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study |
1427 | Pyannote.Audio: Neural Building Blocks for Speaker Diarization | H. Bredin et al. | We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. |
1428 | Speaker Embeddings Incorporating Acoustic Conditions for Diarization | Y. Higuchi, M. Suzuki and G. Kurata | We present our work on training speaker embeddings, especially effective for speaker diarization. |
1429 | Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data | E. Fini and A. Brutti | In this paper we propose qualitative modifications to the model that significantly improve the learning efficiency and the overall diarization performance. |
1430 | Investigation of Specaugment for Deep Speaker Embedding Learning | S. Wang, J. Rohdin, O. Plchot, L. Burget, K. Yu and J. Černocký | In this paper, we investigate the usage of SpecAugment for speaker verification tasks. |
1431 | Speaker-Invariant Affective Representation Learning via Adversarial Training | H. Li, M. Tu, J. Huang, S. Narayanan and P. Georgiou | In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect of speaker variability in the speech signals. |
1432 | Speech Sentiment Analysis via Pre-Trained Features from End-to-End ASR Models | Z. Lu, L. Cao, Y. Zhang, C. Chiu and J. Fan | In this paper, we propose to use pre-trained features from end-to-end ASR models to solve speech sentiment analysis as a down-stream task. |
1433 | Gender Differences on the Perception and Production of Utterances with Willingness and Reluctance in Chinese | W. Li, J. W. Wong and J. Tu | This study intends to explore the effects of gender differences on the perception and production of emotional intonation with willingness and reluctance. |
1434 | Hierarchical Attention Transfer Networks for Depression Assessment from Speech | Z. Zhao, Z. Bao, Z. Zhang, N. Cummins, H. Wang and B. Schuller | In this paper, we propose a novel cross-task approach which transfers attention mechanisms from speech recognition to aid depression severity measurement. |
1435 | Detecting Emotion Primitives from Speech and Their Use in Discerning Categorical Emotions | V. Kowtha et al. | This work investigated a long short-term memory (LSTM) network and a time convolution – LSTM (TC-LSTM) to detect primitive emotion attributes such as valence, arousal, and dominance, from speech. |
1436 | X-Vectors Meet Emotions: A Study On Dependencies Between Emotion and Speaker Recognition | R. Pappagari, T. Wang, J. Villalba, N. Chen and N. Dehak | In this work, we explore the dependencies between speaker recognition and emotion recognition. |
1437 | Speech Emotion Recognition with Local-Global Aware Deep Representation Learning | J. Liu, Z. Liu, L. Wang, L. Guo and J. Dang | In this paper, we propose a local-global aware deep representation learning system that mainly includes two modules. |
1438 | Multi-Head Attention for Speech Emotion Recognition with Auxiliary Learning of Gender Recognition | A. Nediyanchath, P. Paramasivam and P. Yenigalla | The paper presents a Multi-Head Attention deep learning network for Speech Emotion Recognition (SER) using Log mel-Filter Bank Energies (LFBE) spectral features as the input. |
1439 | Generating and Protecting Against Adversarial Attacks for Deep Speech-Based Emotion Recognition Models | Z. Ren, A. Baird, J. Han, Z. Zhang and B. Schuller | With this in mind, in this study, we aim to train deep models to defend against non-targeted white-box adversarial attacks. |
1440 | Deep Encoded Linguistic and Acoustic Cues for Attention Based End to End Speech Emotion Recognition | S. Bhosale, R. Chakraborty and S. K. Kopparapu | An End-to-End model with convolutional layers and multi-head self attention mechanism is proposed for Speech Emotion Recognition (SER) task. |
1441 | Multi-Conditioning and Data Augmentation Using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions | U. Tiwari, M. Soni, R. Chakraborty, A. Panda and S. K. Kopparapu | In this paper, to address the robustness aspect of the SER in additive noise scenarios, we propose multi-conditioning and data augmentation using an utterance level parametric Generative noise model. |
1442 | A Self-Attentive Emotion Recognition Network | H. Partaourides, K. Papadamou, N. Kourtellis, I. Leontiades and S. Chatzis | In this work, we introduce a novel attention mechanism capable of inferring the immensity of the effect of each past utterance on the current speaker emotional state. |
1443 | Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction | P. L. Tobing, Y. Wu, T. Hayashi, K. Kobayashi and T. Toda | In this work, we tackle this issue by proposing a simple implementation scheme of segment output modeling, that can be easily extended into other neural vocoders, where the Laplacian distribution parameters of multiple samples are estimated simultaneously. |
1444 | Flow-TTS: A Non-Autoregressive Network for Text to Speech Based on Flow | C. Miao, S. Liang, M. Chen, J. Ma, S. Wang and J. Xiao | In this work, we propose Flow-TTS, a non-autoregressive end-to-end neural TTS model based on generative flow. |
1445 | WaveFFJORD: FFJORD-Based Vocoder for Statistical Parametric Speech Synthesis | N. Wu and Z. Ling | Inspired by WaveGlow, in this paper, we propose WaveFFJORD, a neural vocoder that can synthesize speech waveforms from acoustic features, by combining FFJORD and WaveNet. |
1446 | Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network | M. Hwang, E. Song, R. Yamamoto, F. Soong and H. Kang | In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). |
1447 | Disentangling Timbre and Singing Style with Multi-Singer Singing Synthesis System | J. Lee, H. Choi, J. Koo and K. Lee | In this study, we define the identity of the singer with two independent concepts – timbre and singing style – and propose a multi-singer singing synthesis system that can model them separately. |
1448 | Sequence-to-Sequence Singing Synthesis Using the Feed-Forward Transformer | M. Blaauw and J. Bonada | We propose a sequence-to-sequence singing synthesizer, which avoids the need for training data with pre-aligned phonetic and acoustic features. |
1449 | Korean Singing Voice Synthesis Based on Auto-Regressive Boundary Equilibrium Gan | S. Choi, W. Kim, S. Park, S. Yong and J. Nam | In this paper, we propose a Korean singing voice synthesis system that addresses the issues using an auto-regressive algorithm that generates spectrogram with the boundary equilibrium GAN objective. |
1450 | Fast and High-Quality Singing Voice Synthesis System Based on Convolutional Neural Networks | K. Nakamura, S. Takaki, K. Hashimoto, K. Oura, Y. Nankaku and K. Tokuda | The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). |
1451 | Hybrid Neural-Parametric F0 Model for Singing Synthesis | J. Bonada and M. Blaauw | We propose a novel hybrid neural-parametric fundamental frequency generation model for singing voice synthesis. |
1452 | Utterance-Level Sequential Modeling for Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit | T. Koriyama and H. Saruwatari | This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. |
1453 | Emotional Speech Synthesis with Rich and Granularized Control | S. Um, S. Oh, K. Byun, I. Jang, C. Ahn and H. Kang | This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. |
1454 | Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning | A. H. Liu, T. Tu, H. Lee and L. Lee | In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances. |
1455 | An Empirical Study of Conv-Tasnet | B. Kadioglu, M. Horgan, X. Liu, J. Pons, D. Darcy and V. Kumar | In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. |
1456 | Mask-Dependent Phase Estimation for Monaural Speaker Separation | Z. Ni and M. I. Mandel | In this paper, we propose a simple yet effective phase estimation network that predicts the phase of the clean speech based on a T-F mask predicted by a chimera++ network. |
1457 | Joint Phoneme Alignment and Text-Informed Speech Separation on Highly Corrupted Speech | K. Schulze-Forster, C. S. J. Doire, G. Richard and R. Badeau | We propose to perform text-informed speech-music separation and phoneme alignment jointly using recurrent neural networks and the attention mechanism. |
1458 | Single-Channel Speech Separation Integrating Pitch Information Based on a Multi Task Learning Framework | X. Li, R. Liu, T. Song, X. Wu and J. Chen | In this paper, we aimed to combine speech separation and pitch tracking together to let them benefit from each other. |
1459 | Continuous Speech Separation: Dataset and Analysis | Z. Chen et al. | In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree. |
1460 | The Sound of My Voice: Speaker Representation Loss for Target Voice Separation | S. Mun, S. Choe, J. Huh and J. S. Chung | In this paper, we propose a new loss function using speaker content representation for audio source separation, and we call it speaker representation loss. |
1461 | Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction | X. Ji et al. | In this paper, we propose a novel training framework which jointly learns the speaker-conditioned target speaker extraction model and its associated speaker embedding model. |
1462 | Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives | A. S. Subramanian et al. | This paper proposes a method to jointly optimize a location guided target speech extraction module along with a speech recognition module only with ASR error minimization criteria. |
1463 | A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions | X. Wang, J. Du, A. Cristia, L. Sun and C. Lee | In this paper, we design a novel joint framework of speech enhancement and speech separation for child speech extraction in realistic conditions, targeting the problem of extracting child speech from daily conversations in BabyTrain mega corpus. |
1464 | An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-Talker Single Channel Audio-Visual ASR | L. Pasa, G. Morrone and L. Badino | In this paper, we analyzed how audio-visual speech enhancement can help to perform the ASR task in a cocktail party scenario. |
1465 | Deep Audio-Visual Speech Separation with Attention Mechanism | C. Li and Y. Qian | In this paper, we explore a better strategy to utilize visual representations with the attention mechanism. |
1466 | Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning | R. Gu et al. | In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. |
1467 | Detection and Analysis of T/D Deletion in Librispeech | J. Yuan, H. Lin and Y. Liu | In this study we developed a new method for automatic identification of t/d deletion. |
1468 | Prediction of Voicing and the F0 Contour from Electromagnetic Articulography Data for Articulation-to-Speech Synthesis | S. Stone, P. Schmidt and P. Birkholz | This paper examines to what extent the f0 contour of an utterance can be predicted from such supraglottal articulation data. |
1469 | A Comparative Study of Estimating Articulatory Movements from Phoneme Sequences and Acoustic Features | A. Singh, A. Illa and P. K. Ghosh | In this work, we estimate articulatory movements from three different input representations: R1) acoustic signal, R2) phoneme sequence, R3) phoneme sequence with timing information. |
1470 | Automatic Vocal Tract Landmark Tracking in Rtmri Using Fully Convolutional Networks and Kalman Filter | S. Asadiabadi and E. Erzin | In this work, we present an algorithm for robust detection of keypoints on the vocal tract in rtMRI sequences using fully convolutional networks (FCN) via a heatmap regression approach. |
1471 | Speech-Based Parameter Estimation of an Asymmetric Vocal Fold Oscillation Model and its Application in Discriminating Vocal Fold Pathologies | W. Zhao and R. Singh | In this paper, we present a novel and alternate way to determine vocal fold model parameters from the speech signal. |
1472 | End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection | Y. Lin, L. Wang, J. Dang, S. Li and C. Ding | In this study, we focus on detecting articulatory attribute errors for dysarthric patients with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS). |
1473 | Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context | S. A. Hebbar, R. Sharma, K. Somandepalli, A. Toutios and S. Narayanan | In this work, we propose to use information from temporal dynamics together with the spatial structure to detect the articulatory boundaries in rtMRI videos. |
1474 | Retrieving Vocal-Tract Resonance and anti-Resonance From High-Pitched Vowels Using a Rahmonic Subtraction Technique | Z. Zhang, K. Honda and J. Wei | In this study, a new cepstral method is developed combined with a refined rahmonic subtraction technique (RS-CEPS) to extract spectral envelopes excited by glottal noise sources. |
1475 | Epoch Estimation from a Speech Signal Using Gammatone Wavelets in a Scattering Network | P. Kulkarni, J. Sadasivan, A. Adiga and C. S. Seelamantula | In this paper, we propose a new technique that employs a Gammatone wavelet filterbank and compute a scattering sequence whose local maxima define the candidate epochs in the speech signal. |
1476 | Study of Closed Phase Resonance Bandwidths for Oral and Nasal Tracts Using Zero Time Windowing | H. D. Abbas, R. Prasad, B. T. Nellore and S. V. Gangashetty | Study of Closed Phase Resonance Bandwidths for Oral and Nasal Tracts Using Zero Time Windowing |
1477 | Algorithmic Exploration of American English Dialects | A. Aksënova, A. Bruguier, A. Ritchart-Scott and U. Mendlovic | In this paper, we use a novel algorithmic approach to explore dialectal variation in American English speech. |
1478 | Comparison of Glottal Closure Instants Detection Algorithms for Emotional Speech | S. R. Kadiri, P. Alku and B. Yegnanarayana | In this paper, we compare six GCI detection algorithms using emotional speech and known evaluation metrics. |
1479 | Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR | L. Sari, N. Moritz, T. Hori and J. L. Roux | We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR). |
1480 | L-Vector: Neural Label Embedding for Domain Adaptation | Z. Meng et al. | We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains. |
1481 | Acoustic Model Adaptation for Presentation Transcription and Intelligent Meeting Assistant Systems | Y. Huang and Y. Gong | We present our solution for unsupervised rapid speaker adaptation in a state-of-the-art presentation and intelligent meeting transcription system. |
1482 | Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation | Y. Huang, L. He, W. Wei, W. Gale, J. Li and Y. Gong | We propose to use the personalized speech synthesis and the neural language generator to synthesize content relevant personalized speech for rapid speaker adaptation. |
1483 | Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition | F. Ding, W. Guo, L. Dai and J. Du | In this paper, we propose a novel adaptive technique that uses an attention-based gated scaling (AGS) scheme to improve deep feature learning for connectionist temporal classification (CTC) acoustic modeling. |
1484 | Adaptive Knowledge Distillation Based on Entropy | K. Kwon, H. Na, H. Lee and N. S. Kim | In this paper, we propose an entropy based KD training, which utilizes the teacher model labels with lower entropy at a larger rate among the various teacher models. |
1485 | Unsupervised Pretraining Transfers Well Across Languages | M. Rivière, A. Joulin, P. Mazaré and E. Dupoux | In this work, we investigate whether unsupervised pretraining transfers well across languages. |
1486 | Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition | B. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlicek and J. Billa | In this work, we explore a data scheduling strategy for semi-supervised learning (SSL) for acoustic modeling in automatic speech recognition. |
1487 | Source Domain Data Selection for Improved Transfer Learning Targeting Dysarthric Speech Recognition | F. Xiong, J. Barker, Z. Yue and H. Christensen | This paper presents an improved transfer learning framework applied to robust personalised speech recognition models for speakers with dysarthria. |
1488 | Study of Formant Modification for Children ASR | H. Kumar Kathania, S. Reddy Kadiri, P. Alku and M. Kurimo | In this paper, we propose a formant modification method to mitigate differences between adults' and children's speech and to improve the performance of ASR for children. |
1489 | Pseudo Likelihood Correction Technique for Low Resource Accented ASR | A. Rajpal, A. R. MV, C. Yarra, R. Aggarwal and P. K. Ghosh | In order to improve the performance of a native English ASR on non-native English data, we, in this work, propose a DNN-based pseudo-likelihood correction (PLC) technique, in which a non-native pseudo-likelihood vector is mapped to match its native counterpart. |
1490 | Libri-Adapt: a New Speech Dataset for Unsupervised Domain Adaptation | A. Mathur, F. Kawsar, N. Berthouze and N. D. Lane | This paper introduces a new dataset, Libri-Adapt, to support unsupervised domain adaptation research on speech recognition models. |
1491 | Mining Effective Negative Training Samples for Keyword Spotting | J. Hou, Y. Shi, M. Ostendorf, M. Hwang and L. Xie | To address the problem, we propose an innovative algorithm, Regional Hard-Example (RHE) mining, to find effective negative training samples, in order to control the ratio of negative vs. positive data. |
1492 | Multi-Task Learning for Voice Trigger Detection | S. Sigtia, P. Clark, R. Haynes, H. Richards and J. Bridle | In this study, we address two major challenges. The first is that the detectors are deployed in complex acoustic environments with external noise and loud playback by the device itself. Secondly, collecting training examples for a specific keyword or trigger phrase is challenging resulting in a scarcity of trigger phrase specific training data. |
1493 | Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions | S. Mittermaier, L. Kürzinger, B. Waschneck and G. Rigoll | Compared to previous publications, we took additional steps to reduce power and memory consumption without reducing classification accuracy. |
1494 | Lattice-Based Improvements for Voice Triggering Using Graph Neural Networks | P. Dighe et al. | In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using graph neural networks (GNN). |
1495 | Integration of Multi-Look Beamformers for Multi-Channel Keyword Spotting | X. Ji, M. Yu, J. Chen, J. Zheng, D. Su and D. Yu | In this paper, we propose integrating multiple beamformed signals and a microphone signal as input to an end-to-end KWS model and leveraging the attention mechanism to dynamically tune the model's attention to the reliable input sources. |
1496 | Fast Lattice-Free Keyword Filtering for Accelerated Spoken Term Detection | J. Wintrode and J. Wilkes | We present a novel set of keyword detection techniques to accelerate spoken term detection for known queries with minimal loss in accuracy. |
1497 | Training Keyword Spotters with Limited and Synthesized Speech Data | J. Lin, K. Kilgour, D. Roblek and M. Sharifi | In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term detection models of around 400k parameters. |
1498 | Towards Data-Efficient Modeling for Wake Word Spotting | Y. Gao, Y. Mishchenko, A. Shah, S. Matsoukas and S. Vitaladevuni | In this paper we present data-efficient solutions to address the challenges in WW modeling, such as domain-mismatch, noisy conditions, limited annotation, etc. |
1499 | Adaptation of RNN Transducer with Text-To-Speech Technology for Keyword Spotting | E. Sharma et al. | With a goal to improve the KWS performance for these keywords without having to collect additional natural speech data, we explore Text-To-Speech (TTS) technology to synthetically generate training data for such keywords. |
1500 | Crnn-Ctc Based Mandarin Keywords Spotting | H. Yan, Q. He and W. Xie | In this paper, we propose an end-to-end Mandarin KWS system using Convolutional Recurrent Neural Network with the Connectionist Temporal Classification (CTC) loss function (CRNN-CTC). |
1501 | Unsupervised Neural Mask Estimator for Generalized Eigen-Value Beamforming Based Asr | R. Kumar, A. Sreeram, A. Purushothaman and S. Ganapathy | In this paper, we attempt to move away from the requirements of having supervised clean recordings for training the mask estimator. |
1502 | Spatial Attention for Far-Field Speech Recognition with Deep Beamforming Neural Networks | W. He, L. Lu, B. Zhang, J. Mahadeokar, K. Kalgaonkar and C. Fuegen | In this paper, we introduce spatial attention for refining the information in multi-direction neural beamformer for far-field automatic speech recognition. |
1503 | Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network | J. Qi, H. Hu, Y. Wang, C. H. Yang, S. Marco Siniscalchi and C. Lee | We propose a tensor-to-vector regression approach to multi-channel speech enhancement in order to address the issue of input size explosion and hidden-layer size expansion. |
1504 | Truth-to-Estimate Ratio Mask: A Post-Processing Method for Speech Enhancement Direct at Low Signal-to-Noise Ratios | B. Chen, H. Wang, Y. Wei and R. H. Y. So | This study proposes a bi-directional recurrent neural network (Bi-RNN) post-processing method for speech enhancement (SE) at low signal-to-noise ratios (SNR). |
1505 | Geometry Constrained Progressive Learning for Lstm-Based Speech Enhancement | X. Tang, J. Du, L. Chai, Y. Wang, Q. Wang and C. Lee | In this paper, we incorporate two kinds of geometric constraints among these targets into the objective function to help LSTM achieve better training. |
1506 | Using Separate Losses for Speech and Noise in Mask-Based Speech Enhancement | Z. Xu, S. Elshamy and T. Fingscheidt | In this paper, we propose a novel components loss (CL) for the training of neural networks for speech enhancement. |
1507 | Stable Training of Dnn for Speech Enhancement Based on Perceptually-Motivated Black-Box Cost Function | M. Kawanaka, Y. Koizumi, R. Miyazaki and K. Yatabe | To overcome this problem, we propose to use stabilization techniques borrowed from reinforcement learning. |
1508 | A Robust Audio-Visual Speech Enhancement Model | W. Wang, C. Xing, D. Wang, X. Chen and F. Sun | In this paper, we present a safe AVSE approach that can make the visual stream contribute to audio speech enhancement (ASE) safely in conditions of various SNRs by late fusion. |
1509 | Robust Unsupervised Audio-Visual Speech Enhancement Using a Mixture of Variational Autoencoders | M. Sadeghi and X. Alameda-Pineda | In this paper, we propose a robust unsupervised audio-visual speech enhancement method based on a per-frame VAE mixture model. |
1510 | AV(SE)2: Audio-Visual Squeeze-Excite Speech Enhancement | M. L. Iuzzolino and K. Koishida | We propose a new mechanism for audio-visual (AV) fusion that leverages a cross-modal squeeze-excitation (SE) block for speech enhancement: AV(SE)2. |
1511 | Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation | H. Shi, L. Wang, M. Ge, S. Li and J. Dang | In this study, we design the minimum difference masks (MDMs) to classify the time-frequency (T-F) bins in spectrograms according to the nearest distances from labels. |
1512 | A Return to Dereverberation in the Frequency Domain Using a Joint Learning Approach | Y. Li and D. S. Williamson | In this paper, we investigate whether dereverberation can be effectively performed in the frequency domain by estimating the complex frequency response of a room impulse response. |
1513 | In-Domain and Out-of-Domain Data Augmentation to Improve Children’s Speaker Verification System in Limited Data Scenario | S. Shahnawazuddin, W. Ahmad, N. Adiga and A. Kumar | In this paper, we present our efforts towards developing a robust automatic speaker verification (ASV) system for children when the domain-specific data is limited. |
1514 | Jhu-HLTCOE System for the Voxsrc Speaker Recognition Challenge | D. Garcia-Romero, A. McCree, D. Snyder and G. Sell | This paper describes our submission to this challenge where we have explored x-vector extractor topologies, classification head alternatives, data augmentation, and angular margin penalty. |
1515 | Detection of Speech Events and Speaker Characteristics through Photo-Plethysmographic Signal Neural Processing | G. Cámbara, J. Luque and M. Farrús | In this work, we explore several end-to-end convolutional neural network architectures for detection of human’s characteristics such as gender or person identity. |
1516 | XMU-TS Systems for NIST SRE19 CTS Challenge | H. Lu, J. Zhou, M. Zhao, W. Lei, Q. Hong and L. Li | In this paper, we present our submitted XMU-TS system for NIST SRE19 CTS Challenge. |
1517 | I-Vector Transformation Using K-Nearest Neighbors for Speaker Verification | U. Khan, M. India and J. Hernando | In this work, we propose a post processing of i-vectors using a Deep Neural Network (DNN) to transform i-vectors into a new speaker vector representation. |
1518 | H-Vectors: Utterance-Level Speaker Embedding Using a Hierarchical Attention Model | Y. Shi, Q. Huang and T. Hain | In this paper, a hierarchical attention network is proposed to generate utterance-level embeddings (H-vectors) for speaker identification and verification. |
1519 | Feature Enhancement with Deep Feature Losses for Speaker Verification | S. Kataria, P. S. Nidadavolu, J. Villalba, N. Chen, P. García-Perera and N. Dehak | We propose to use Deep Feature Loss which optimizes the enhancement network in the hidden activation space of a pre-trained auxiliary speaker embedding network. |
1520 | Combining Deep Embeddings of Acoustic and Articulatory Features for Speaker Identification | Q. Hong, C. Wu, H. Wang and C. Huang | In this study, deep embedding of acoustic and articulatory features are combined for speaker identification. |
1521 | Bayesian Estimation of Plda with Noisy Training Labels, with Applications to Speaker Verification | B. J. Borgström and P. Torres-Carrasquillo | This paper proposes a method for Bayesian estimation of probabilistic linear discriminant analysis (PLDA) when training labels are noisy. |
1522 | Unsupervised Feature Enhancement for Speaker Verification | P. S. Nidadavolu, S. Kataria, J. Villalba, P. García-Perera and N. Dehak | We developed an unsupervised feature enhancement approach in log-filter bank space with the end goal of improving speaker verification performance. |
1523 | CN-Celeb: A Challenging Chinese Speaker Recognition Dataset | Y. Fan et al. | In this paper, we present CN-Celeb, a large-scale speaker recognition dataset collected “in the wild”. |
1524 | HI-MIA: A Far-Field Text-Dependent Speaker Verification Database and the Baselines | X. Qin, H. Bu and M. Li | This paper presents a far-field text-dependent speaker verification database named HI-MIA. |
1525 | End-to-End Code-Switching TTS with Cross-Lingual Language Model | X. Zhou, X. Tian, G. Lee, R. K. Das and H. Li | In this paper, we propose to incorporate crosslingual word embedding into an end-to-end TTS system, to improve the voice rendering. |
1526 | Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora | Y. Cao et al. | In this paper, we present a bilingual phonetic posteriorgram (PPG) based CS speech synthesizer using only monolingual corpora. |
1527 | Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data | S. Maiti, E. Marchi and A. Conkie | We present progress towards bilingual Text-to-Speech which is able to transform a monolingual voice to speak a second language while preserving speaker voice quality. |
1528 | Speaker Adaptation of a Multilingual Acoustic Model for Cross-Language Synthesis | I. Himawan, S. Aryal, I. Ouyang, S. Kang, P. Lanchantin and S. King | In the current work, our objective is to synthesize speech in different languages using the target speaker’s voice, regardless of the language of their data. |
1529 | Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models | K. Inoue, S. Hara, M. Abe, T. Hayashi, R. Yamamoto and S. Watanabe | To make use of such accessible data, the proposed method leverages the recent great success of state-of-the-art end-to-end automatic speech recognition (ASR) systems and obtains corresponding transcriptions from pretrained ASR models. |
1530 | BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | H. B. Moss, V. Aggarwal, N. Prateek, J. González and R. Barra-Chicote | We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. |
1531 | Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis | T. Fujimoto, S. Takaki, K. Hashimoto, K. Oura, Y. Nankaku and K. Tokuda | This paper proposes a general framework of semi-supervised learning based on hierarchical generative models and adapts it to a Japanese end-to-end text-to-speech (TTS) system. |
1532 | Breathing and Speech Planning in Spontaneous Speech Synthesis | É. Székely, G. E. Henter, J. Beskow and J. Gustafson | To address this, we first propose training stochastic TTS on a corpus of overlapping breath-group bigrams, to take context into account. Next, we introduce an unsupervised automatic annotation of likely-disfluent breath events, through a product-of-experts model that combines the output of two breath-event predictors, each using complementary information and operating in opposite directions. |
1533 | Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit | T. Hayashi et al. | This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS, which is an extension of the open-source speech processing toolkit ESPnet. |
1534 | Extracting Unit Embeddings Using Sequence-To-Sequence Acoustic Models for Unit Selection Speech Synthesis | X. Zhou, Z. Ling and L. Dai | This paper presents a method of using the intermediate representations between linguistic and acoustic features in a Tacotron model to derive the cost functions for unit selection speech synthesis. |
1535 | Audio-Assisted Image Inpainting for Talking Faces | A. Koumparoulis, G. Potamianos, S. Thomas and E. da Silva Morais | The goal of our work is to complete missing areas of images of talking faces, exploiting information from both the visual and audio modalities. |
1536 | Libri-Light: A Benchmark for ASR with Limited or No Supervision | J. Kahn et al. | We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. |
1537 | A Comprehensive Study of Residual CNNS for Acoustic Modeling in ASR | V. Bozheniuk, A. Zeyer, R. Schlüter and H. Ney | Apart from recognition performance, we focus on the training and evaluation speed and provide a time-efficient setup for CNNs. |
1538 | Layer-Normalized LSTM for Hybrid-Hmm and End-To-End ASR | M. Zeineldeen, A. Zeyer, R. Schlüter and H. Ney | We explore various LN long short-term memory (LSTM) recurrent neural networks (RNN) variants by applying LN to different parts of the internal recurrency of LSTMs. |
1539 | Small Energy Masking for Improved Neural Network Training for End-To-End Speech Recognition | C. Kim, K. Kim and S. Reddy Indurthi | In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. |
1540 | Improving Sequence-To-Sequence Speech Recognition Training with On-The-Fly Data Augmentation | T. Nguyen, S. Stüker, J. Niehues and A. Waibel | In this paper we examine the influence of three data augmentation methods on the performance of two S2S model architectures. |
1541 | Effectiveness of Self-Supervised Pre-Training for ASR | A. Baevski and A. Mohamed | We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization. |
1542 | High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model | J. Li et al. | In this paper, we detail our recent efforts to improve conventional hybrid LSTM acoustic models for high-accuracy and low-latency automatic speech recognition. |
1543 | Dfsmn-San with Persistent Memory Model for Automatic Speech Recognition | Z. You, D. Su, J. Chen, C. Weng and D. Yu | In this paper, we try to investigate whether even more information beyond the whole utterance level can be exploited and beneficial. |
1544 | Dynamic Temporal Residual Learning for Speech Recognition | J. Xie, R. Yan, S. Xiao, L. Peng, M. T. Johnson and W. Zhang | This paper proposes a novel dynamic temporal residual learning mechanism for LSTM networks to better explore temporal dependencies in sequential data. |
1545 | E2E-SINCNET: Toward Fully End-To-End Speech Recognition | T. Parcollet, M. Morchid and G. Linarès | In this paper, we propose the E2E-SincNet, a novel fully E2E ASR model that goes from the raw waveform to the text transcripts by merging two recent and powerful paradigms: SincNet and the joint CTC-attention training scheme. |
1546 | Speaker Augmentation for Low Resource Speech Recognition | C. Du and K. Yu | In this paper, we propose a novel speaker augmentation approach which can synthesize data with sufficient speaker and text diversity. |
1547 | CGCNN: Complex Gabor Convolutional Neural Network on Raw Speech | P. Noé, T. Parcollet and M. Morchid | We propose to combine the complex Gabor filter with complex-valued deep neural networks to replace usual CNN weights kernels, to fully take advantage of its optimal time-frequency resolution and of the complex domain. |
1548 | One-Shot Voice Conversion Using Star-Gan | R. Wang, Y. Ding, L. Li and C. Fan | Our efforts are made on one-shot voice conversion where the target speaker is unseen in training dataset or both source and target speakers are unseen in the training dataset. |
1549 | One-Shot Voice Conversion by Vector Quantization | D. Wu and H. Lee | In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach without any supervision on speaker label. |
1550 | Neutral to Lombard Speech Conversion with Deep Learning | E. Gentet, B. David, S. Denjean, G. Richard and V. Roussarie | In this paper, we propose several approaches for neutral to Lombard speech conversion. |
1551 | End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction | D. Wang et al. | Inspired by the success of sequence-to-sequence (seq2seq) based text-to-speech (TTS) synthesis and knowledge distillation (KD) techniques, this paper proposes a novel end-to-end voice conversion (VC) method to tackle the reconstruction task. |
1552 | Pitchnet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network | C. Deng, C. Yu, H. Lu, C. Weng and D. Yu | In this paper, we propose to advance the existing unsupervised singing voice conversion method proposed in [1] to achieve more accurate pitch translation and flexible pitch manipulation. |
1553 | An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data | F. Xie et al. | An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data |
1554 | Voice Conversion with Transformer Network | R. Liu, X. Chen and X. Wen | This paper describes an end-to-end voice conversion system, which involves three main ideas: transformer, context preservation mechanisms, and model adaptation. |
1555 | Mspec-Net : Multi-Domain Speech Conversion Network | H. Malaviya, J. Shah, M. Patel, J. Munshi and H. A. Patil | In this paper, we present a multi-domain speech conversion technique by proposing a Multi-domain Speech Conversion Network (MSpeC-Net) architecture for solving the less-explored area of Non-Audible Murmur-to-SPeeCH (NAM2-SPCH) conversion. |
1556 | Multi-Speaker and Multi-Domain Emotional Voice Conversion Using Factorized Hierarchical Variational Autoencoder | M. Elgaar, J. Park and S. W. Lee | This study exploits the FHVAE pipeline to produce disentangled representations of emotion, making it possible to greatly facilitate emotional voice conversion. We propose three versions of algorithms for improving the quality of disentangled representation and audio synthesis. |
1557 | Emotional Voice Conversion Using Multitask Learning with Text-To-Speech | T. Kim, S. Cho, S. Choi, S. Park and S. Lee | In this study, a voice converter that utilizes multitask learning with text-to-speech (TTS) is presented. |
1558 | Effective Wavenet Adaptation for Voice Conversion with Limited Data | H. Du, X. Tian, L. Xie and H. Li | In this paper, we propose a WaveNet adaptation method that effectively reduces the need of adaptation data. |
1559 | Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials | T. Saeki, Y. Saito, S. Takamichi and H. Saruwatari | In this paper, we propose computationally efficient and high-quality methods for statistical voice conversion (VC) with direct waveform modification based on spectral differentials. |
1560 | Improving Proper Noun Recognition in End-To-End Asr by Customization of the Mwer Loss Criterion | C. Peyser, T. N. Sainath and G. Pundak | In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. |
1561 | Neural Lattice Search for Speech Recognition | R. Ma, H. Li, Q. Liu, L. Chen and K. Yu | In this paper, we address these problems with an end-to-end model for accurately extracting the best hypothesis from the word lattice. |
1562 | Deliberation Model Based Two-Pass End-To-End Speech Recognition | K. Hu, T. N. Sainath, R. Pang and R. Prabhavalkar | In this work, we propose to attend to both acoustics and first-pass hypotheses using a deliberation network. |
1563 | Alignment-Length Synchronous Decoding for RNN Transducer | G. Saon, Z. Tüske and K. Audhkhasi | We present a beam decoding strategy for recurrent neural network transducers which has the characteristic that all competing hypotheses within the beam have the same alignment length (number of output symbols plus BLANK symbols). |
1564 | Incorporating Written Domain Numeric Grammars into End-To-End Contextual Speech Recognition Systems for Improved Recognition of Numeric Sequences | B. Haynor and P. S. Aleksic | We propose a modular and scalable solution for improved recognition of numeric sequences. |
1565 | LSTM-Based One-Pass Decoder for Low-Latency Streaming | J. Jorge et al. | In this paper we present a novel streaming decoder that includes a bidirectional LSTM acoustic model as well as a unidirectional LSTM language model to perform the decoding efficiently while keeping the performance comparable to that of an off-line setup. |
1566 | Multistate Encoding with End-To-End Speech RNN Transducer Network | Z. Wu, B. Li, Y. Zhang, P. S. Aleksic and T. N. Sainath | In this paper, we propose a technique for incorporating contextual signals, such as intelligent assistant device state or dialog state, directly into RNN-T models. |
1567 | Neural Oracle Search on N-BEST Hypotheses | E. Variani, T. Chen, J. Apfel, B. Ramabhadran, S. Lee and P. Moreno | In this paper, we propose a neural search algorithm to select the most likely hypothesis using a sequence of acoustic representations and multiple hypotheses as input. |
1568 | Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss | Q. Zhang et al. | In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. |
1569 | Full-Sum Decoding for Hybrid Hmm Based Speech Recognition Using LSTM Language Model | W. Zhou, R. Schlüter and H. Ney | We explore the potential gain from more accurate probabilities in terms of decision making and apply the full-sum decoding with a modified prefix-tree search framework. |
1570 | The Rwth Asr System for Ted-Lium Release 2: Improving Hybrid Hmm With Specaugment | W. Zhou, W. Michel, K. Irie, M. Kitza, R. Schlüter and H. Ney | We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. |
1571 | Meta Learning for End-To-End Low-Resource Speech Recognition | J. Hsu, Y. Chen and H. Lee | In this paper, we propose to apply a meta-learning approach to low-resource automatic speech recognition (ASR). |
1572 | Cross-Speaker Silent-Speech Command Word Recognition Using Electro-Optical Stomatography | S. Stone and P. Birkholz | In this work, we present the results of a study using a novel measurement technology called Electro-Optical Stomatography to capture speech movements and use the acquired data to recognize a number of command words. |
1573 | Exploring A Zero-Order Direct Hmm Based on Latent Attention for Automatic Speech Recognition | P. Bahar, N. Makarov, A. Zeyer, R. Schlüter and H. Ney | In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. |
1574 | Improving Device Directedness Classification of Utterances With Semantic Lexical Features | K. Gillespie, I. C. Konstantakopoulos, X. Guo, V. T. Vasudevan and A. Sethy | We propose a directedness classifier that combines semantic lexical features with a lightweight acoustic feature and show it is effective in classifying directedness. |
1575 | Training ASR Models By Generation of Contextual Information | K. Singh et al. | In this paper, we conduct a large-scale study evaluating the effectiveness of weakly-supervised learning for speech recognition by using loosely related contextual information as a surrogate for ground-truth labels. |
1576 | Speech Recognition Model Compression | M. Sakthi, A. Tewfik and R. Pawate | In this paper, we propose Bin & Quant (B&Q), a compression technique using which we were able to reduce the Deep Speech 2 speech recognition model size by 7 times for a negligible loss in accuracy. |
1577 | Gpu-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition | H. Braun, J. Luitjens, R. Leary, T. Kaldewey and D. Povey | We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). |
1578 | Sequence-to-Sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding | A. H. Liu, T. Sung, S. Chuang, H. Lee and L. Lee | In this paper, we investigate the benefit that off-the-shelf word embedding can bring to the sequence-to-sequence (seq-to-seq) automatic speech recognition (ASR). |
1579 | Synchronous Transformers for end-to-end Speech Recognition | Z. Tian, J. Yi, Y. Bai, J. Tao, S. Zhang and Z. Wen | In this paper, we propose a model named synchronous transformer to address this problem, which can predict the output sequence chunk by chunk. |
1580 | Investigation of Methods to Improve the Recognition Performance of Tamil-English Code-Switched Data in Transformer Framework | M. S. Mary N J, V. M. Shetty and S. Umesh | In this paper, we investigate methods to deal with such very low resource scenarios. |
1581 | Bangla Voice Command Recognition in end-to-end System Using Topic Modeling based Contextual Rescoring | N. Sadeq, S. Ahmed, S. S. Shubha, M. N. Islam and M. A. Adnan | In this work, we perform contextual rescoring using multi-label topic modeling to improve the performance of an End-to-End Bangla voice command recognition system. |
1582 | Learning to Detect Keyword Parts and Whole by Smoothed Max Pooling | H. Park, P. Violette and N. Subrahmanya | We propose smoothed max pooling loss and its application to keyword spotting systems. |
1583 | End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning | S. Indurthi et al. | In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from source tasks=ASR+MT to target task=ST where the ST task severely lacks data. |
1584 | Analyzing ASR Pretraining for Low-Resource Speech-to-Text Translation | M. C. Stoian, S. Bansal and S. Goldwater | Here, we experiment with pretraining on datasets of varying sizes, including languages related and unrelated to the AST source language. |
1585 | Instance-based Model Adaptation for Direct Speech Translation | M. A. Di Gangi, V. Nguyen, M. Negri and M. Turchi | We tackle this limitation with a method to improve data exploitation and boost the system’s performance at inference time. |
1586 | Re-Translation Strategies for Long Form, Simultaneous, Spoken Language Translation | N. Arivazhagan, C. Cherry, I. Te, W. Macherey, P. Baljekar and G. Foster | We investigate the problem of simultaneous machine translation of long-form speech content. |
1587 | Skinaugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation | A. D. McCarthy, L. Puzon and J. Pino | We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. |
1588 | End-to-End Speech Translation with Self-Contained Vocabulary Manipulation | M. Tu, F. Zhang and W. Liu | But vocabulary manipulation is hard to apply to the end-to-end speech-text translation, because neither source text nor speech-to-target mapping is available. We introduce a method that avoids this dependence. |
1589 | An Empirical Study of Transformer-Based Neural Language Model Adaptation | K. Li et al. | We explore two adaptation approaches of deep Transformer based neural language models (LMs) for automatic speech recognition. |
1590 | Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers | J. Xu, X. Chen, S. Hu, J. Yu, X. Liu and H. Meng | By formulating quantized RNNLMs training as an optimization problem, this paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM). |
1591 | Audio-Attention Discriminative Language Model for ASR Rescoring | A. Gandhe and A. Rastrow | In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to rescore the output of a first-pass ASR system. |
1592 | Training Code-Switching Language Model with Monolingual Data | S. Chuang, T. Sung and H. Lee | In this paper, we propose an approach to train code-switching language models with monolingual data only. |
1593 | Domain Robust, Fast, and Compact Neural Language Models | A. Gerstenberger, K. Irie, P. Golik, E. Beck and H. Ney | We propose training methods for building neural language models for such a task, which are not only domain robust, but reasonable in model size and fast for evaluation. |
1594 | A Random Gossip BMUF Process for Neural Language Modeling | Y. Huang et al. | In this paper, we present a decentralized BMUF process, in which the model is split into different components, each of which is updated by communicating to some randomly chosen neighbor nodes with the same component, followed by a BMUF-like process. |
1595 | Pseudo Labeling and Negative Feedback Learning for Large-Scale Multi-Label Domain Classification | J. Kim and Y. Kim | In this paper, given one ground-truth domain for each training utterance, we regard domains consistently predicted with the highest confidences as additional pseudo labels for the training. |
1596 | Pre-Training for Query Rewriting in a Spoken Language Understanding System | Z. Chen, X. Fan and Y. Ling | In this work, we first propose a neural-retrieval based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. |
1597 | End-to-End Architectures for ASR-Free Spoken Language Understanding | E. Palogiannidi, I. Gkinis, G. Mastrapas, P. Mizera and T. Stafylakis | In this paper, we explore a set of recurrent architectures for intent classification, tailored to the recently introduced Fluent Speech Commands (FSC) dataset, where intents are formed as combinations of three slots (action, object, and location). |
1598 | End-To-End Spoken Language Understanding Without Matched Language Speech Model Pretraining Data | R. Price | This paper proposes two strategies to improve the performance of E2E SLU models in scenarios where transcribed data for pretraining in the target language is unavailable: multilingual pretraining with mismatched languages and data augmentation using SpecAugment[1]. |
1599 | Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems | Y. Huang et al. | In this paper, we attempt to leverage NLU text resources. |
1600 | Generating Empathetic Responses by Looking Ahead the User’s Sentiment | J. Shin, P. Xu, A. Madotto and P. Fung | Hence, in this paper, we propose Sentiment Look-ahead, which is a novel perspective for empathy that models the future user emotional state. |
1601 | A Hierarchical Model for Dialog Act Recognition Considering Acoustic and Lexical Context Information | Y. Si, L. Wang, J. Dang, M. Wu and A. Li | To solve the problem, we propose a hierarchical model for DAR considering context information of both lexical and acoustic prosody. |
1602 | Large-Scale Unsupervised Pre-Training for End-to-End Spoken Language Understanding | P. Wang, L. Wei, Y. Cao, J. Xie and Z. Nie | In this paper, we explore unsupervised pre-training for End-to-end SLU models by learning representations from large-scale raw audios. |
1603 | Improving Spoken Question Answering Using Contextualized Word Representation | D. Su and P. Fung | In this paper, we propose using contextualized word representations to mitigate the effects of ASR errors and pretraining on existing textual QA datasets to mitigate the data scarcity issue. |
1604 | Learning Asr-Robust Contextualized Embeddings for Spoken Language Understanding | C. Huang and Y. Chen | We propose a novel confusion-aware fine-tuning method to mitigate the impact of ASR errors on pre-trained LMs. |
1605 | A Hierarchical Tracker for Multi-Domain Dialogue State Tracking | J. Li, S. Zhu and K. Yu | In this paper, we propose a hierarchical dialogue state tracker which consists of three sequential modules: domain classification, slot detection and value extraction. |
1606 | A BI-Model Approach for Handling Unknown Slot Values in Dialogue State Tracking | Y. Wang, Y. Shen and H. Jin | In this paper, we present an end-to-end bi-model structure for dialogue state tracking, which can handle the scenarios when the spoken language understanding model with a predefined slot candidate list is absent. |
1607 | Improving Sample-Efficiency in Reinforcement Learning for Dialogue Systems by Using Trainable-Action-Mask | Y. Wu, B. Tseng and C. E. Rasmussen | In this paper, we propose trainable-action-mask (TAM) which learns from data automatically without handcrafting complicated rules. |
1608 | Fg2seq: Effectively Encoding Knowledge for End-To-End Task-Oriented Dialog | Z. He, Y. He, Q. Wu and J. Chen | In this paper, we propose a Flow-to-Graph seq2seq model (FG2Seq) which can effectively encode knowledge by considering inherent structural information of the knowledge graph and latent semantic information from dialog history. |
1609 | A Simple But Effective Bert Model for Dialog State Tracking on Resource-Limited Systems | T. M. Lai, Q. Hung Tran, T. Bui and D. Kihara | In this work, we propose a simple but effective DST model based on BERT. |
1610 | Fast Domain Adaptation for Goal-Oriented Dialogue Using a Hybrid Generative-Retrieval Transformer | I. Shalyminov, A. Sordoni, A. Atkinson and H. Schulz | In this paper, we present a hybrid generative-retrieval model that can be trained using transfer learning. |
1611 | Predicting Performance Outcome with a Conversational Graph Convolutional Network for Small Group Interactions | Y. Lin and C. Lee | By introducing the use of the graph structure in modeling the natural inter-member conversational ties during such an interaction, we aim to advance the state-of-art computational approach in predicting group performance scores. |
1612 | Design Considerations for Hypothesis Rejection Modules in Spoken Language Understanding Systems | A. Alok, R. Gupta and S. Ananthakrishnan | In this work, we present two designs for SLU hypothesis rejection modules: (i) scheme R1 that performs rejection on domain specific SLU hypothesis and, (ii) scheme R2 that performs rejection on hypothesis generated from the overall SLU system. |
1613 | Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings | D. Makhervaks, W. Hinthorn, D. Dimitriadis and A. Stolcke | In this paper we investigate to what extent various acoustic, linguistic and pragmatic aspects of the meetings, both in isolation and jointly, can help detect hot spots. |
1614 | Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | H. Tachibana and Y. Katayama | In order to build a large scale accent dictionary that contains those words, the authors developed an accent estimation technique that predicts the accent of a word from its limited information, namely the surface (e.g. kanji) and the yomi (simplified phonetic information). |
1615 | OH, JEEZ! or UH-HUH? A Listener-Aware Backchannel Predictor on ASR Transcriptions | D. Ortega, C. Li and N. T. Vu | This paper presents our latest investigation on modeling backchannel in conversations. |
1616 | Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection | Q. Chen, M. Chen, B. Li and W. Wang | In this paper, we propose a Controllable Time-delay Transformer (CT-Transformer) model that jointly completes the punctuation prediction and disfluency detection tasks in real time. |
1617 | Identifying Truthful Language in Child Interviews | V. Ardulov, Z. Durante, S. Williams, T. Lyon and S. Narayanan | The work presented uses various machine learning algorithms to identify differences in the speech of children when they are lying or being truthful, particularly when they have been asked by a confederate to deceive an interviewer. |
1618 | A multi-view approach for Mandarin non-native mispronunciation verification | Z. Wana, J. H. L. Hansen and Y. Xie | In this study, a multi-view approach is proposed to incorporate discriminative feature representations which requires less annotation for non-native mispronunciation verification of Mandarin. |
1619 | Diacritic-Level Pronunciation Analysis Using Phonological Features | A. Kain, A. Roten and R. Gale | We propose a system that combines a multi-target architecture with weighted finite-state transducers to first segment and then analyze an utterance in terms of its phonological features. |
1620 | Phoneme Boundary Detection Using Learnable Segmental Features | F. Kreuk, Y. Sheena, J. Keshet and Y. Adi | In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. |
1621 | Meta-Learning for Robust Child-Adult Classification from Speech | N. R. Koluguri, M. Kumar, S. H. Kim, C. Lord and S. Narayanan | In this paper, we address a specific sub-problem of speaker diarization, namely child-adult speaker classification in such dyadic conversations with specified roles. |
1622 | Accounting for Microprosody in Modeling Intonation | P. Birkholz and X. Zhang | Here, we derived models for two forms of microvariations: the drop in f0 during voiced obstruents, and the increased f0 at the onset of vowels following voiceless obstruents. |
1623 | How confident are you? Exploring the role of fillers in the automatic prediction of a speaker's confidence | T. Dinkar, I. Vasilescu, C. Pelachaud and C. Clavel | We introduce a new and challenging task, that is the prediction of FOAK, which we think has widespread applicability, given the increasing popularity of automatic processing of educational and job interviews, reviews and speeches. |
1624 | Training Spoken Language Understanding Systems with Non-Parallel Speech and Text | L. Sari, S. Thomas and M. Hasegawa-Johnson | In this study, we investigate the use of non-parallel speech and text to improve the performance of dialog act recognition as an example SLU task. |
1625 | What is best for spoken language understanding: small but task-dependant embeddings or huge but out-of-domain embeddings? | S. Ghannay, A. Neuraz and S. Rosset | The goal of this study is two-fold: firstly, it focuses on semantic evaluation of common word embeddings approaches for SLU task; secondly, it investigates the use of two different data sets to train the embeddings: small and task-dependent corpus or huge and out-of-domain corpus. |
1626 | Fast Intent Classification for Spoken Language Understanding Systems | A. Tyagi et al. | To address the latency and computational complexity issues, we explore a BranchyNet scheme on an intent classification scheme within SLU systems. |
1627 | Converting Written Language to Spoken Language with Neural Machine Translation for Language Modeling | S. Ando, M. Suzuki, N. Itoh, G. Kurata and N. Minematsu | We address this problem by generating texts in spoken language from those in written language by using a neural machine translation (NMT) model. |
1628 | Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings | R. Ma, L. Jin, Q. Liu, L. Chen and K. Yu | In this work, we address this problem by assigning multiple fine-grained sense embeddings to each word in the embedding layers. |
1629 | Addressing Challenges in Building Web-Scale Content Classification Systems | A. S. Timmaraju, A. Liu and P. Tripathi | We present learnings from our efforts in building a content classification system for multiple document types at Facebook using Multi-modal Transformers. |
1630 | A Neural Document Language Modeling Framework for Spoken Document Retrieval | L. Yen, Z. Wu and K. Chen | To enhance SDR performance, the paper proposes a neural retrieval framework that combines the merits of the language modeling (LM) mechanism in SDR with the abstractive information learned by language representation models. |
1631 | Spoken Document Retrieval Leveraging Bert-Based Modeling and Query Reformulation | S. Fan-Jiang, T. Lo and B. Chen | In this paper, we present a novel study of SDR leveraging the Bidirectional Encoder Representations from Transformers (BERT) model for query and document representations (embeddings), as well as for relevance scoring. |
1632 | Multitask Learning for Darpa Lorelei's Situation Frame Extraction Task | K. Singla and S. Narayanan | This paper describes a novel approach of multitask learning for an end-to-end optimization technique for document classification. |
1633 | Auxiliary Capsules for Natural Language Understanding | I. Staliunaite and I. Iacobacci | In this work we extend the newly introduced application of Capsule Networks for NLU to a multi-task learning environment, using relevant auxiliary tasks. |
1634 | Discrete Wasserstein Autoencoders for Document Retrieval | Y. Zhang and H. Zhu | In this paper, we present an end-to-end Wasserstein Autoencoder (WAE) for text hashing to avoid non-differentiable operators in the reparameterization trick, where the latent variables can be imposed to any discrete priors we can sample by using adversarial learning. |
1635 | Cross-Lingual Topic Prediction For Speech Using Translations | S. Bansal, H. Kamper, A. Lopez and S. Goldwater | Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic? We consider this question in the setting where a small amount of speech in the low-resource language is paired with text translations in a high-resource language. |
1636 | Embedded Large-Scale Handwritten Chinese Character Recognition | Y. Chherawala, H. J. G. A. Dolfing, R. S. Dixon and J. R. Bellegarda | This paper describes how the Apple deep learning recognition system can accurately handle up to 30,000 Chinese characters while running in real-time across a range of mobile devices. |
1637 | Leveraging Gans to Improve Continuous Path Keyboard Input Models | A. Mehra, J. R. Bellegarda, O. Bapat, P. Lal and X. Wang | In this work, we address this challenge by using GANs to augment our training corpus with user-realistic synthetic data. |
1638 | Unsupervised Key Hand Shape Discovery of Sign Language Videos with Correspondence Sparse Autoencoders | R. D. Siyli, B. Gundogdu, M. Saraclar and L. Akarun | In this paper, we assign labels of an isolated Sign Language (SL) dataset using end-to-end neural network architectures that have proven successful in unsupervised discovery of sub-word acoustic units in speech processing. |
1639 | Keyword Search for Sign Language | N. C. Tamer and M. Saraçlar | In this paper, we propose a keyword search (KWS) system for a sign language. |
1640 | Large-Context Pointer-Generator Networks for Spoken-to-Written Style Conversion | M. Ihori, A. Takashima and R. Masumura | This paper introduces a spoken-to-written style conversion method that is suitable for handling a series of text such as discourses and conversations. |
1641 | From Unsupervised Machine Translation to Adversarial Text Generation | A. Rashid, A. Do-Omri, M. A. Haidar, Q. Liu and M. Rezagholizadeh | We present a self-attention based bilingual adversarial text generator (B-GAN) which can learn to generate text from the encoder representation of an unsupervised neural machine translation system. |
1642 | Self-Attention and Retrieval Enhanced Neural Networks for Essay Generation | W. Wang, H. Zheng and Z. Lin | In this paper, we focus on essay generation, which aims at generating an essay (a paragraph) according to a set of topic words. |
1643 | Bert is Not All You Need for Commonsense Inference | S. Park, J. Son, S. Hwang and K. Park | Our contribution is observing the limitations of BERT in commonsense inference, then leveraging complementary resources containing missing information. |
1644 | Semi-Supervised Sentence Classification Based on User Polarity in the Social Scenarios | B. Ma, H. Sun, J. Wang, Q. Qi and J. Liao | Thus, in this paper, a concept called user polarity is proposed to quantify the tendency of sentences published by a user which are categorized into the same class. |
1645 | Upgrading CRFS to JRFS and its Benefits to Sequence Modeling and Labeling | Y. Song, Z. Ou, Z. Liu and S. Yang | In this paper, we propose to upgrade conditional random fields (CRFs) and obtain a joint generative model of observation and label sequences, called joint random fields (JRFs). |
1646 | Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization | H. Xu, Y. Wang, K. Han, B. Ma, J. Chen and X. Li | In this paper, we propose to use a graph to connect the parsing trees from the sentences in a document and utilize the stacked graph convolutional networks (GCNs) to learn the syntactic representation for a document. |
1647 | Learning to Generate Diverse Questions from Keywords | Y. Pan, B. Hu, Q. Chen, Y. Xiang and X. Wang | In this paper, we focus on a more complex question generation task, i.e., generating a series of questions for each set of keywords (one-to-many). To evaluate the effectiveness of the proposed model, we collect a dataset which contains 62835 questions with respect to 12567 sets of keywords. |
1648 | Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates | J. Iranzo-Sánchez et al. | In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. |
1649 | Multilingual Grapheme-To-Phoneme Conversion with Byte Representation | M. Yu et al. | In this work, we propose a multilingual G2P model with byte-level input representation to accommodate different grapheme systems, along with an attention-based Transformer architecture. |
1650 | Language-Agnostic Multilingual Modeling | A. Datta, B. Ramabhadran, J. Emond, A. Kannan and B. Roark | In this paper, we propose a new approach to building a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer. |
1651 | ADI17: A Fine-Grained Arabic Dialect Identification Dataset | S. Shon, A. Ali, Y. Samih, H. Mubarak and J. Glass | In this paper, we describe a method to collect dialectal speech from YouTube videos to create a large-scale Dialect Identification (DID) dataset. |
1652 | Universal Phone Recognition with a Multilingual Allophone System | X. Li et al. | In this work, we propose a joint model of both language-independent phone and language-dependent phoneme distributions. |
1653 | Coupled Training of Sequence-to-Sequence Models for Accented Speech Recognition | V. Unni, N. Joshi and P. Jyothi | We propose coupled training for encoder-decoder ASR models that acts on pairs of utterances corresponding to the same text spoken by speakers with different accents. |
1654 | Addressing Accent Mismatch In Mandarin-English Code-Switching Speech Recognition | Z. Tan, X. Fan, H. Zhu and E. Lin | In this paper, we experiment on Mandarin-English code-switching audio spoken by native Chinese speakers and evaluate three techniques to improve accuracy – data adaptation, individual senone modeling and lexicon enrichment. |
1655 | Detecting Mismatch Between Text Script and Voice-Over Using Utterance Verification Based on Phoneme Recognition Ranking | Y. Jeong and H. Cho | The purpose of this study is to detect the mismatch between text script and voice-over. |
1656 | DNN-Based Speech Recognition for Globalphone Languages | M. Y. Tachbelie, A. Abulimiti, S. T. Abate and T. Schultz | This paper describes new reference benchmark results based on hybrid Hidden Markov Model and Deep Neural Networks (HMM-DNN) for the GlobalPhone (GP) multilingual text and speech database. |
1657 | Deep Neural Networks Based Automatic Speech Recognition for Four Ethiopian Languages | S. T. Abate, M. Yifiru Tachbelie and T. Schultz | In this work, we present speech recognition systems for four Ethiopian languages: Amharic, Tigrigna, Oromo and Wolaytta. |
1658 | Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages | V. M. Shetty, M. Sagaya Mary N J and S. Umesh | In this work, we explore the application of Transformers on low resource Indian languages in a multilingual framework. |
1659 | Improving Language Identification for Multilingual Speakers | A. Titus, J. Silovsky, N. Chen, R. Hsiao, M. Young and A. Ghoshal | We address this by using coarser-grained targets for the acoustic LID model and integrating its outputs with interaction context signals in a context-aware model to tailor the system to each user. |
1660 | Information Flow Optimization in Inference Networks | A. Deshmukh, J. Liu, V. V. Veeravalli and G. Verma | The problem of maximizing the information flow through a sensor network tasked with an inference objective at the fusion center is considered. |
1661 | Exploiting Two-Dimensional Symmetry and Unimodality for Model-Free Source Localization in Harsh Environment | J. Chen | This paper studies a model-free source localization problem where sensors only measure received signal strengths from a wireless signal source. |
1662 | Uncertainty Quantification for Remaining Useful Lifetime Prediction with Multi-Channel Sensory Data | Y. Deng, H. Wu and C. Jiang | In this paper, the ratio of mean to variance was considered to measure the uncertainty propagation rate (UPR) of RUL prediction over time. |
1663 | ViMo: Vital Sign Monitoring Using Commodity Millimeter Wave Radio | F. Wang, F. Zhang, C. Wu, B. Wang and K. J. Ray Liu | In this paper, we propose ViMo, a calibration-free remote Vital sign Monitoring system that can simultaneously monitor multiple users by leveraging the channel impulse response (CIR) of 60GHz WiFi. |
1664 | Time Reversal Based Robust Gesture Recognition Using Wifi | S. D. Regani, B. Wang, M. Wu and K. J. Ray Liu | In this work, we propose WiGRep, a time reversal based gesture recognition approach using Wi-Fi, which can recognize different gestures by counting the number of repeating gesture segments. |
1665 | Noncoherent Maximum-Likelihood Detection for Ambient Backscattering Communications Over Ambient OFDM Signals | D. Darsena | This paper deals with the problem of noncoherent maximum-likelihood (ML) signal detection for backscatter communications over ambient OFDM. |
1666 | Bandit Sampling for Faster Activity and Data Detection in Massive Random Access | J. Dong, J. Zhang and Y. Shi | This paper considers the grant-free random access scheme in IoT networks with a massive number of devices. |
1667 | Consensus-Based Distributed Clustering for IoT | H. Chen, H. Yu, S. Zhao and Q. Shi | Facing the distributed network challenges including data volume, communication latency, and information security, we here propose a distributed clustering algorithm where each IoT device may have data from multiple clusters. |
1668 | Energy-Efficient 3D UAV Trajectory Design for Data Collection in Wireless Sensor Networks | D. B. Licea, E. Nurellari and M. Ghogho | We consider the issue of designing closed 3D UAV trajectories that allow for an energy efficient collection of data with a UAV-aided wireless sensor network. |
1669 | Application Informed Motion Signal Processing for Finger Motion Tracking Using Wearable Sensors | Y. Liu, F. Jiang and M. Gowda | This paper presents a system called FinGTrAC that shows the feasibility of fine-grained finger gesture tracking using a low-intrusive wearable sensor platform (a smart-ring worn on the index finger and a smart-watch worn on the wrist). |
1670 | Instant Adaptive Learning: An Adaptive Filter Based Fast Learning Model Construction for Sensor Signal Time Series Classification on Edge Devices | A. Pal, A. Ukil, T. Deb, I. Sahu and A. Majumdar | In this paper, we propose Instant Adaptive Learning that characterizes the intrinsic signal processing properties of time series sensor signals using linear adaptive filtering and derivative spectrum to efficiently construct a low-cost learning model followed by standard classification algorithms. |
1671 | Image Fusion using Joint Sparse Representations and Coupled Dictionary Learning | F. G. Veshki, N. Ouzir and S. A. Vorobyov | In this work, to identify the sharp image patches, we propose an improved discriminative coupled dictionary learning approach using joint sparse representations in blurred and focused dictionaries. |
1672 | Clustering of Nonnegative Data and an Application to Matrix Completion | C. Strohmeier and D. Needell | In this article, we propose a simple algorithm to cluster nonnegative data lying in disjoint subspaces. |
1673 | Sampling of Surfaces and Learning Functions in High Dimensions | Q. Zou and M. Jacob | To capture the non-linear structure of the data, we model the data as points living on a smooth surface. We model the surface as the zero level-set of a bandlimited function. |
1674 | Stock Movement Prediction That Integrates Heterogeneous Data Sources Using Dilated Causal Convolution Networks with Attention | D. Daiya, M. Wu and C. Lin | The purpose of this research is to develop a high performing model for stock movement prediction utilizing financial indicators and news data. |
1675 | Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging | Q. Huang, A. Jansen, L. Zhang, D. P. W. Ellis, R. A. Saurous and J. Anderson | We explore content-based representation learning strategies tailored for large-scale, uncurated music collections that afford only weak supervision through unstructured natural language metadata and colisten statistics. |
1676 | Supervised Canonical Correlation Analysis of Data on Symmetric Positive Definite Manifolds by Riemannian Dimensionality Reduction | F. Fallah and B. Yang | In this paper, we propose a framework for a supervised CCA of manifold-based data. |
1677 | The Empirical Duality Gap of Constrained Statistical Learning | L. F. O. Chamon, S. Paternain, M. Calvo-Fullana and A. Ribeiro | In this work, we propose to directly tackle the constrained statistical problem overcoming its infinite dimensionality, unknown distributions, and constraints by leveraging finite dimensional parameterizations, sample averages, and duality theory. |
1678 | Context and Uncertainty Modeling for Online Speaker Change Detection | H. Aronowitz and W. Zhu | In this work we focus on online speaker change detection as a standalone task which is required for online closed captioning of broadcast television. |
1679 | Modeling Uncertainty in Predicting Emotional Attributes from Spontaneous Speech | K. Sridhar and C. Busso | An intriguing approach to predict uncertainty is Monte Carlo (MC) dropout, which obtains predictions from multiple feed-forward passes through a deep neural network (DNN) by using dropout regularization in both training and inference. This study evaluates this approach with regression models to predict emotional attribute scores for valence, arousal and dominance. |
1680 | Accuracy-Robustness Trade-Off for Positively Weighted Neural Networks | A. Neacsu, J. Pesquet and C. Burileanu | We propose a stochastic projected gradient descent algorithm which allows us to adjust this constant in the training process. |
1681 | Towards a New Understanding of the Training of Neural Networks with Mislabeled Training Data | H. Gish, J. Silovsky, M. Sung, M. Siu, W. Hartmann and Z. Jiang | We investigate the problem of machine learning with mislabeled training data. |
1682 | On Network Science and Mutual Information for Explaining Deep Neural Networks | B. Davis, U. Bhatt, K. Bhardwaj, R. Marculescu and J. M. F. Moura | In this paper, we present a new approach to interpret deep learning models. |
1683 | Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting | H. Ito, S. Koyama, N. Ueno and H. Saruwatari | A spatial active noise control (ANC) method taking prior information on the approximate direction of primary noise sources into consideration is proposed. |
1684 | Active Noise Control Over Multiple Regions: Performance Analysis | J. Zhang, H. Sun, P. N. Samarasinghe and T. D. Abhayapala | In this paper, we perform an initial study on the more complex problem of simultaneous noise control over multiple target regions using a single ANC system. |
1685 | Array-Geometry-Aware Spatial Active Noise Control Based on Direction-of-Arrival Weighting | Y. Maeno, Y. Takida, N. Murata and Y. Mitsufuji | In this paper, we propose a direction of arrival (DOA) weighting algorithm for the adaptive filter update, which prioritizes residual error control with respect to the array geometry. |
1686 | Multichannel Active Noise Control with Spatial Derivative Constraints to Enlarge the Quiet Zone | D. Shi, B. Lam, S. Wen and W. Gan | This paper proposes a time-domain adaptive algorithm to implement the spatial derivative constraint, which avoids the complex analytic acoustic calculations. |
1687 | An Acoustic Modelling Based Remote Error Sensing Approach for Quiet Zone Generation in a Noisy Environment | Q. Zhu, X. Qiu and I. Burnett | This paper presents an improved approach to increase the effective frequency range based on the acoustic modelling. |
1688 | Distributed Wave-Domain Active Noise Control Based on the Diffusion Strategy | Y. Dong, J. Chen and W. Zhang | In order to address this issue, this work presents a distributed wave-domain ANC scheme by resorting to distributed optimization techniques. |
1689 | A Theoretical Basis for Practitioners Heuristic 1/N and Long-Only Quintile Portfolio | R. Zhou and D. P. Palomar | In this paper, we formulate a mathematically meaningful robust maximum return portfolio design and show that it reduces to the two heuristic portfolios under different levels of estimation error in the mean returns. |
1690 | A Recursive Bayesian Solution for the Excess Over Threshold Distribution with Stochastic Parameters | D. E. Johnston and P. M. Djuric | In this paper, we propose a new approach for analyzing extreme values that are witnessed in financial markets. |
1691 | Gaussian Process Imputation of Multiple Financial Series | T. d. Wolff, A. Cuevas and F. Tobar | We focus on learning the relationships among financial time series by modelling them through a multi-output Gaussian process (MOGP) with expressive covariance functions. |
1692 | Robust Covariance Matrix Estimation and Portfolio Allocation: The Case of Non-Homogeneous Assets | E. Jay, T. Soler, J.-P. Ovarlez, P. D. Peretti and C. Chorro | This paper presents how the most recent improvements made on covariance matrix estimation and model order selection can be applied to the portfolio optimization problem. |
1693 | Portfolio Cuts: A Graph-Theoretic Framework to Diversification | B. S. Dees, L. Stankovic, A. G. Constantinides and D. P. Mandic | To this end, we investigate ways for domain knowledge to be conveniently incorporated into the analysis, by means of graphs. |
1694 | CORRGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks | G. Marti | We propose a novel approach for sampling realistic financial correlation matrices. |
1695 | Graph Neural Net Using Analytical Graph Filters and Topology Optimization for Image Denoising | W. Su, G. Cheung, R. Wildes and C. Lin | Inspired by an analytically derived CNN by Hadji et al., in this paper we construct a new layered graph neural net (GNN) using GraphBio as our graph filter. |
1696 | Defending Graph Convolutional Networks Against Adversarial Attacks | V. N. Ioannidis and G. B. Giannakis | This paper introduces graph neural network architectures that are robust to perturbed networked data. |
1697 | Constrained Spectral Clustering for Dynamic Community Detection | A. Karaaslanli and S. Aviyente | In this paper, we propose a new stochastic block model (SBM) for modeling the evolution of community membership. |
1698 | Towards an Efficient and General Framework of Robust Training for Graph Neural Networks | K. Xu et al. | To overcome these limitations, we propose a general framework which leverages the greedy search algorithms and zeroth-order methods to obtain robust GNNs in a generic and an efficient manner. |
1699 | Deep Geometric Knowledge Distillation with Graphs | C. Lassance, M. Bontonou, G. B. Hacene, V. Gripon, J. Tang and A. Ortega | In this work, we focus instead on relative knowledge distillation (RKD), which considers the geometry of the respective latent spaces, allowing for dimension-agnostic transfer of knowledge. |
1700 | On The Choice of Graph Neural Network Architectures | C. Vignac, G. Ortiz-Jiménez and P. Frossard | In this paper, we show empirically that in settings with fewer features and more training data, more complex graph networks significantly outperform simple models, and propose a few insights towards the proper choice of graph network architectures. |
1701 | Multitask Learning with Capsule Networks for Speech-to-Intent Applications | J. Poncelet and H. Van hamme | In this paper, we show how capsules can incorporate multitask learning, which often can improve the performance of a model when the task is difficult. |
1702 | Using Speech Synthesis to Train End-To-End Spoken Language Understanding Models | L. Lugosch, B. H. Meyer, D. Nowrouzezahrai and M. Ravanelli | We propose a strategy to overcome this requirement in which speech synthesis is used to generate a large synthetic training dataset from several artificial speakers. |
1703 | Improved End-To-End Spoken Utterance Classification with a Self-Attention Acoustic Classifier | R. Price, M. Mehrabani and S. Bangalore | In this paper, we propose a new architecture for end-to-end spoken utterance classification (SUC) and also explore the impact of leveraging lexical information in conjunction with acoustic information obtained from the end-to-end model for SUC. |
1704 | Dialogue History Integration into End-to-End Signal-to-Concept Spoken Language Understanding Systems | N. Tomashenko, C. Raymond, A. Caubrière, R. D. Mori and Y. Estève | We proposed to integrate dialogue history into an end-to-end signal-to-concept SLU system. |
1705 | Error Analysis Applied to End-to-End Spoken Language Understanding | A. Caubrière et al. | This paper presents a qualitative study of errors produced by an end-to-end spoken language understanding (SLU) system (speech signal to concepts) that reaches state of the art performance. |
1706 | A Data Efficient End-to-End Spoken Language Understanding Architecture | M. Dinarelli, N. Kapoor, B. Jabaian and L. Besacier | In this paper we introduce a data efficient system which is trained end-to-end, with no additional, pre-trained external module. |
1707 | Federated Neuromorphic Learning of Spiking Neural Networks for Low-Power Edge Intelligence | N. Skatchkovsky, H. Jang and O. Simeone | In this paper, we propose to mitigate this problem via cooperative training through Federated Learning (FL). |
1708 | Temporal Coding in Spiking Neural Networks with Alpha Synaptic Function | I. M. Comsa, T. Fischbacher, K. Potempa, A. Gesmundo, L. Versari and J. Alakuijala | We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes and performs classification using the first output neuron to spike. |
1709 | Event-Driven Signal Processing with Neuromorphic Computing Systems | P. Blouw and C. Eliasmith | In this paper, we provide an overview of tools and methods for building applications that run on neuromorphic computing devices. |
1710 | Challenges and Perspectives in Neuromorphic-based Visual IoT Systems and Networks | M. Martini, N. Khan, Y. Bi, Y. Andreopoulos, H. Saki and M. Shikh-Bahaei | Hence, we explore here the feasibility of advanced Machine to Machine (M2M) communications systems that directly capture, compress and transmit spike-based visual information to cloud computing services in order to produce content classification or retrieval results with extremely low power and low latency. |
1711 | Spiking Neural Networks Trained With Backpropagation for Low Power Neuromorphic Implementation of Voice Activity Detection | F. Martinelli, G. Dellaferrera, P. Mainar and M. Cernak | We describe a training procedure that achieves low spiking activity and apply pruning algorithms to remove up to 85% of the network connections with no performance loss. |
1712 | Training Deep Spiking Neural Networks for Energy-Efficient Neuromorphic Computing | G. Srinivasan, C. Lee, A. Sengupta, P. Panda, S. S. Sarwar and K. Roy | In this paper, we propose different SNN training methodologies, varying in degrees of biofidelity, and evaluate their efficacy on complex image recognition datasets. |
1713 | Deep Learning for Robust Power Control for Wireless Networks | W. Cui, K. Shen and W. Yu | This paper aims to show that a deep learning approach for network utility maximization can produce more robust solutions than the traditional model-based approach. |
1714 | Feedback Turbo Autoencoder | Y. Jiang, H. Kim, H. Asnani, S. Oh, S. Kannan and P. Viswanath | In this work, we propose Feedback Turbo Autoencoder (FTAE), which harmoniously combines interleaver and iterative decoding with CNN architectures, and demonstrate the blocklength gain and improved performance in the block feedback setting. |
1715 | Exploiting Channel Locality for Adaptive Massive MIMO Signal Detection | M. Khani, M. Alizadeh, J. Hoydis and P. Fleming | We propose MMNet, a deep learning MIMO detection scheme that significantly outperforms existing approaches on realistic channels with the same or lower computational complexity. |
1716 | Deep Learning-Based Beam Alignment in Mmwave Vehicular Networks | N. J. Myers, Y. Wang, N. González-Prelcic and R. W. Heath | In this paper, we propose an end-to-end deep learning technique to design a structured CS matrix that is well suited to the underlying channel distribution, leveraging both sparsity and the particular spatial structure that appears in vehicular channels. |
1717 | Joint Source-Channel Coding and Bayesian Message Passing Detection for Grant-Free Radio Access in IoT | J. Dommel, Z. Utkovski, S. Stanczak and O. Simeone | In contrast, this paper considers a potentially more efficient solution based on Joint Source-Channel (JSC) coding via a non-orthogonal generalization of Type-Based Multiple Access (TBMA). |
1718 | CNN-Based Analog CSI Feedback in FDD MIMO-OFDM Systems | M. B. Mashhadi, Q. Yang and D. Gündüz | Instead, we propose here a convolutional neural network (CNN)-based analog feedback scheme, called AnalogDeepCMC, which directly maps the downlink CSI to uplink channel input. |
1719 | Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI | T. Sanchez et al. | In this paper, we tackle the problem of designing a sampling mask for an arbitrary reconstruction method and a limited acquisition budget. |
1720 | Linear Speedup in Saddle-Point Escape for Decentralized Non-Convex Optimization | S. Vlaski and A. H. Sayed | In this work, we examine in detail the dependence of second-order convergence guarantees on the spectral properties of the combination policy for non-convex multi agent optimization. |
1721 | On Distributed Stochastic Gradient Algorithms for Global Optimization | B. Swenson, A. Sridhar and H. V. Poor | The paper considers the problem of network-based computation of global minima in smooth nonconvex optimization problems. |
1722 | Distributed Tensor Completion Over Networks | C. Battiloro and P. D. Lorenzo | The aim of this paper is to propose a novel distributed strategy for tensor completion, where (partial) data are collected over a network of agents with sparse, but connected, topology. |
1723 | Lookahead Converges to Stationary Points of Smooth Non-convex Functions | J. Wang, V. Tantia, N. Ballas and M. Rabbat | We prove that, with appropriate choice of step-sizes, Lookahead converges to a stationary point of smooth non-convex functions. |
1724 | Communication Constrained Learning with Uncertain Models | J. Z. Hare, C. A. Uribe, L. M. Kaplan and A. Jadbabaie | We consider the problem of distributed inference of a group of agents in a social network, where the agents construct, share, and update beliefs in a non-Bayesian framework to identify the underlying true state of the world. |
1725 | A Sparse Linear Array Approach in Automotive Radars Using Matrix Completion | S. Sun and A. P. Petropulu | In this paper, different from previous efforts, we use matrix completion to complete the corresponding virtual uniform linear array (ULA) before estimating the target angle. |
1726 | A Low-Resolution ADC Proof-of-Concept Development for a Fully-Digital Millimeter-wave Joint Communication-Radar | P. Kumari, A. Mezghani and R. W. Heath | To address these concerns, we present a low-complexity proof-of-concept (PoC) development for a wideband MIMO JCR that uses a mmWave communications waveform and low-resolution ADCs, while maintaining a separate radio-frequency chain per antenna. |
1727 | Performance Bounds for Displaced Sensor Automotive Radar Imaging | G. Wang and K. V. Mishra | In this paper, we derive performance bounds on the estimation error of target parameters processed by displaced sensors that correspond to several independent radars mounted at different locations on the same vehicle. |
1728 | Spatial and Temporal Smoothing for Covariance Estimation in Super-Resolution Angle Estimation in Automotive Radars | A. E. Ertan and M. Ali | In this paper, we analyze the effect of covariance matrix averaging in the spatial and temporal domains on the angular resolution that can be obtained with MUSIC. |
1729 | Slow-Time MIMO-FMCW Automotive Radar Detection with Imperfect Waveform Separation | P. Wang, P. Boufounos, H. Mansour and P. V. Orlik | We develop an explicit signal model that accounts for waveform separation residuals and propose a Kronecker subspace-based object detector in the framework of generalized likelihood ratio test (GLRT). |
1730 | On Binary Sequence Set Design with Applications to Automotive Radar | R. Lin and J. Li | We consider herein the case of two vehicles equipped with multi-input multi-output (MIMO) automotive radars driving next to each other. |
1731 | Optimal Transport Structure of CycleGAN for Unsupervised Learning for Inverse Problems | B. Sim, G. Oh and J. C. Ye | In this article, we explain the link between these two frameworks through mathematical formulas and experimental results. |
1732 | Light-Field Reconstruction and Depth Estimation from Focal Stack Images Using Convolutional Neural Networks | Z. Huang, J. A. Fessler, T. B. Norris and I. Y. Chun | This paper proposes a non-iterative LF reconstruction and depth estimation method based on three sequential convolutional neural networks (CNNs). |
1733 | Joint Learning of Cartesian Undersampling and Reconstruction for Accelerated MRI | T. Weiss, S. Vedula, O. Senouf, O. Michailovich, M. Zibulevsky and A. Bronstein | To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. |
1734 | Building Firmly Nonexpansive Convolutional Neural Networks | M. Terris, A. Repetti, J. Pesquet and Y. Wiaux | In this paper, we develop an optimization algorithm that can be incorporated in the training of a network to ensure the nonexpansiveness of its convolutional layers. |
1735 | ConFirmNet: Convolutional FirmNet and Application to Image Denoising and Inpainting | P. K. Pokala, P. Kumar Uttam and C. S. Seelamantula | We propose a convolutional iterative firm-thresholding algorithm (CIFTA) building on our previously proposed IFTA, and its deep-unfolded version, namely, convolutional-FirmNet (ConFirmNet). |
1736 | Learning Differentiable Sparse and Low Rank Networks for Audio-Visual Object Localization | J. Pu, Y. Panagakis and M. Pantic | In this paper, we propose a novel method to design specific deep neural networks for sparse and low-rank models, where the network can learn a data-adaptive model from training data. |
1737 | Spherical Large Intelligent Surfaces | S. Hu and F. Rusek | In this paper, we extend LISs to be three-dimensional (3D) and deployed as spherical surfaces. |
1738 | Pathloss Prediction using Deep Learning with Applications to Cellular Optimization and Efficient D2D Link Scheduling | R. Levie, Ç. Yapar, G. Kutyniok and G. Caire | In this paper we propose a highly efficient and very accurate method for estimating the propagation pathloss from a point x to all points y on the 2D plane. |
1739 | Beamforming in Intelligent Environments based on Ultra-Massive MIMO Platforms in Millimeter Wave and Terahertz Bands | S. Nie and I. F. Akyildiz | In this paper, a joint beamforming scheme is developed based on fractional programming optimization to maximize the spectral efficiency under practical consideration of energy constraints of the UM MIMO communication platform. |
1740 | A Single-RF Architecture for Multiuser Massive MIMO Via Reflecting Surfaces | A. Bereyhi, V. Jamali, R. R. Müller, A. M. Tulino, G. Fischer and R. Schober | In this work, we propose a new single-RF MIMO architecture which enjoys high scalability and energy-efficiency. |
1741 | Hybrid Precoding for Secure Transmission in Reflect-Array-Assisted Massive MIMO Systems | S. Asaad, R. F. Schaefer and H. Vincent Poor | Inspired by this architecture, we design a secure multiuser hybrid analog-digital precoding scheme. |
1742 | Wideband Channel Tracking for Millimeter Wave Massive MIMO Systems with Hybrid Beamforming Reception | G. C. Alexandropoulos, E. Vlachos and J. Thompson | We present an efficient algorithm for this estimation problem that is based on the alternating direction method of multipliers. |
1743 | Improving Auditory Attention Decoding Performance of Linear and Non-Linear Methods using State-Space Model | A. Aroudi, T. de Taillez and S. Doclo | In this paper, we investigate a state-space model using correlation coefficients obtained with a small correlation window to improve the decoding performance of the linear and the non-linear AAD methods. |
1744 | Towards Decoding Selective Attention from Single-Trial EEG Data in Cochlear Implant users based on Deep Neural Networks | W. Nogueira and H. Dolhopiatenko | The goal of this work was to investigate the use of non-linear models based on deep neural networks (DNNs) to improve the selective attention decoding accuracy in CI users. |
1745 | Harmonic/Percussive Sound Separation and Spectral Complexity Reduction of Music Signals for Cochlear Implant Listeners | B. Lentz, A. Nagathil, J. Gauer and R. Martin | In this work, a music pre-processing scheme is described which combines these approaches and is applicable to a wider variety of music genres. |
1746 | Bio-Mimetic Attentional Feedback in Music Source Separation | A. Bellur and M. Elhilali | In this work, we study these competing theories within a deep-network framework for the task of music source separation. |
1747 | Talker-Independent Speaker Separation in Reverberant Conditions | M. Delfarah, Y. Liu and D. Wang | To effectively deal with speaker separation and speech dereverberation, we propose a two-stage strategy where reverberant utterances are first separated and then dereverberated. |
1748 | Evaluation of Joint Auditory Attention Decoding and Adaptive Binaural Beamforming Approach for Hearing Devices with Attention Switching | W. Pu, P. Zan, J. Xiao, T. Zhang and Z. Luo | In this study, we present the evaluation results of this joint formulation on a new EEG dataset collected on subjects with dynamic switch. |
1749 | Robust Pricing Mechanism for Resource Sustainability Under Privacy Constraint in Competitive Online Learning Multi-Agent Systems | E. Tampubolon and H. Boche | Based on the non-cooperative game as the model for agents' interaction and the noisy online mirror ascent as the model for the rationality of the agents, we propose a novel pricing mechanism that gives the agents incentives for sustainable use of the resources. |
1750 | Neural Network Wiretap Code Design for Multi-Mode Fiber Optical Channels | K. Besser, A. Lonnstrom and E. A. Jorswieck | In this work, we develop an autoencoder for the multi-mode fiber wiretap channel taking into account the error performance at the legitimate receiver and the information leakage at potential eavesdroppers. |
1751 | Age-Based Scheduling Policy for Federated Learning in Mobile Edge Networks | H. H. Yang, A. Arafa, T. Q. S. Quek and H. Vincent Poor | In this paper, based on a metric termed the age of update (AoU), we propose a scheduling policy by jointly accounting for the staleness of the received parameters and the instantaneous channel qualities to improve the running efficiency of FL. |
1752 | Adversarial Networks for Secure Wireless Communications | T. Marchioro, N. Laurenti and D. Gündüz | We propose a data-driven secure wireless communication scheme, in which the goal is to transmit a signal to a legitimate receiver with minimal distortion, while keeping some information about the signal private from an eavesdropping adversary. |
1753 | Latency-Minimized Design of Secure Transmissions in UAV-Aided Communications | X. Wu, Q. Li, Y. Lu, H. V. Poor, V. C. M. Leung and P. C. Ching | In this paper, we propose a latency-minimized transmission scheme for satisfying legitimate users' (LUs') content requests securely against Eve. |
1754 | Detect Insider Attacks Using CNN in Decentralized Optimization | G. Li, S. X. Wu, S. Zhang and Q. Li | This paper studies the security issue of a gossip-based distributed projected gradient (DPG) algorithm, when it is applied for solving a decentralized multi-agent optimization. |
1755 | Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data | K. Elkhalil, A. Kammoun, X. Zhang, M. Alouini and T. Y. Al-Naffouri | This paper carries out a large dimensional analysis of a variation of kernel ridge regression that we call centered kernel ridge regression (CKRR), also known in the literature as kernel ridge regression with offset. |
1756 | A Whiteness Test Based on the Spectral Measure of Large Non-Hermitian Random Matrices | A. Bose and W. Hachem | In the context of multivariate time series, a whiteness test against an MA(1) correlation model is proposed. |
1757 | On the Limit Distribution of the Canonical Correlation Coefficients Between the Past and the Future of a High-Dimensional White Noise | D. Tieplova, P. Loubaton and L. Pastur | On the Limit Distribution of the Canonical Correlation Coefficients Between the Past and the Future of a High-Dimensional White Noise |
1758 | Positive Solutions for Large Random Linear Systems | P. Bizeul, M. Clenet and J. Najim | We investigate the componentwise positivity of the solution xn depending on the scaling factors, as the dimensions of the system grow to infinity. |
1759 | On The Frequency Domain Detection of High Dimensional Time Series | A. Rosuel, P. Vallet, P. Loubaton and X. Mestre | In this paper, we address the problem of detection, in the frequency domain, of an M-dimensional time series modeled as the output of an M × K MIMO filter driven by a K-dimensional Gaussian white noise, and disturbed by an additive M-dimensional Gaussian colored noise. |
1760 | Large Dimensional Asymptotics of Multi-Task Learning | M. Tiomoko, C. Louart and R. Couillet | Based on a random matrix approach, this article proposes an asymptotic analysis of a support vector machine-inspired multitask learning scheme. |
1761 | Robust Hybrid Beamforming for Satellite-Terrestrial Integrated Networks | Z. Lin, M. Lin, B. Champagne, W. Zhu and N. Al-Dhahir | In this paper, we propose a novel robust downlink beamforming (BF) design for satellite-terrestrial integrated networks. |
1762 | In-Network Caching for Hybrid Satellite-Terrestrial Networks Using Deep Reinforcement Learning | N. Garg, M. Sellathurai and T. Ratnarajah | In this paper, we consider in-network caching where an unavailable content at one BS can be fetched from the nearest BS in the network, before requesting from the content server. |
1763 | Multigraph Spectral Clustering for Joint Content Delivery and Scheduling in Beam-Free Satellite Communications | M. A. Vazquez and A. I. Perez-Neira | This paper tackles the problem of user scheduling in satellite content delivery networks with precoding. |
1764 | Constant-Envelope Precoding for Satellite Systems | C. G. Tsinos, A. Arora and B. Ottersten | In this paper, Constant-Envelope Precoding techniques are presented for satellite-based communication systems. |
1765 | Resource Management in the Multibeam NOMA-based Satellite Downlink | T. Ramírez and C. Mosquera | Resource Management in the Multibeam NOMA-based Satellite Downlink |
1766 | Genetic Algorithm Optimized Support Vector Machine in NOMA-based Satellite Networks with Imperfect CSI | X. Yan, K. An, C. Wang, W. Zhu, Y. Li and Z. Feng | Inspired by the advantages of machine learning (ML) algorithms, we propose an improved support vector machine (SVM) scheme to reduce the inappropriate user pairing risks and enhance the performance of NOMA based satellite networks with imperfect CSI. |
1767 | Overcoming High Nanopore Basecaller Error Rates for DNA Storage via Basecaller-Decoder Integration and Convolutional Codes | S. Chandak et al. | We propose a novel approach which overcomes the high error rates in basecalled sequences by integrating a Viterbi error correction decoder with the basecaller, enabling the decoder to exploit the soft information available in the deep learning based basecaller pipeline. |
1768 | Efficient Constrained Encoders Correcting a Single Nucleotide Edit in DNA Storage | K. Cai, X. He, H. M. Kiah and T. Thanh Nguyen | In this paper, we investigate codes that correct a single nucleotide edit and provide linear-time algorithms that encode binary messages into these codes of length n. |
1769 | Image Processing in DNA | C. Pan, S. M. Hossein Tabatabaei Yazdi, S. Kasra Tabatabaei, A. G. Hernandez, C. Schroeder and O. Milenkovic | Here we propose the first method for storing quantized images in DNA that uses signal processing and machine learning techniques to deal with error and cost issues without resorting to the use of redundant oligos or rewriting. |
1770 | Concentration-Based Polynomial Calculations on Nicked DNA | T. Chen and M. Riedel | In this paper, we introduce a novel scheme for computing polynomial functions on a substrate of nicked DNA. |
1771 | Capacity of the Erasure Shuffling Channel | S. Shin, R. Heckel and I. Shomorony | Motivated by DNA-based data storage, we study the erasure shuffling channel. |
1772 | Achieving the Capacity of the DNA Storage Channel | A. Lenz, P. H. Siegel, A. Wachter-Zeh and E. Yaakobi | In this paper we analyze storage systems based on these macromolecules from an information theoretic perspective. |
1773 | Federated Learning with Quantization Constraints | N. Shlezinger, M. Chen, Y. C. Eldar, H. V. Poor and S. Cui | In this work, we tackle this challenge using tools from quantization theory. |
1774 | Cooperative Learning via Federated Distillation over Fading Channels | J. Ahn, O. Simeone and J. Kang | While prior work studied implementations of FL over wireless fading channels, here we propose wireless protocols for FD and for an enhanced version thereof that leverages an offline communication phase to communicate “mixed-up” covariate vectors. |
1775 | On the Byzantine Robustness of Clustered Federated Learning | F. Sattler, K. Müller, T. Wiegand and W. Samek | In this work we investigate the application of CFL to Byzantine settings, where a subset of clients behaves unpredictably or tries to disturb the joint training effort in a directed or undirected way. |
1776 | Hierarchical Federated Learning Across Heterogeneous Cellular Networks | M. S. H. Abad, E. Ozfatura, D. Gündüz and O. Ercetin | We consider federated edge learning (FEEL), where mobile users (MUs) collaboratively learn a global model by sharing local updates on the model parameters rather than their datasets, with the help of a mobile base station (MBS). |
1777 | Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD | J. Wang, H. Liang and G. Joshi | In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) to overlap communication and computation so as to speed up the distributed training procedure. |
1778 | Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning | A. Elgabli, J. Park, A. S. Bedi, M. Bennis and V. Aggarwal | In this paper, we propose a communication-efficient decentralized machine learning (ML) algorithm, coined quantized group ADMM (Q-GADMM). |
1779 | Deep Soft Interference Cancellation for MIMO Detection | N. Shlezinger, R. Fu and Y. C. Eldar | In this work we propose a multiuser MIMO receiver which learns to jointly detect in a data-driven fashion, without assuming a specific channel model or requiring CSI. |
1780 | An Empirical Bayes Approach to Partially Labeled and Shuffled Data Sets | A. Dytso and H. V. Poor | Specifically, we propose to train model-based empirical Bayes separately on the set of features and the set of labels and combine/mix the two models based on the proportion of unlabeled pairs. |
1781 | Reinforced Depth-Aware Deep Learning for Single Image Dehazing | T. Guo and V. Monga | In this work, we proposed a Depth-aware Dehazing using Reinforcement Learning system, denoted as DDRL. |
1782 | Learning Plug-And-Play Proximal Quasi-Newton Denoisers | A. H. Al-Shabili, H. Mansour and P. T. Boufounos | We propose a strategy for training a Gaussian denoiser inspired by an unfolded proximal quasi-Newton algorithm, where the noise level of the input signal to the denoiser is estimated in each iteration and at every entry in the signal. |
1783 | Joint Optimization of Sampling Patterns and Deep Priors for Improved Parallel MRI | H. K. Aggarwal and M. Jacob | In this work, we investigate the impact of sampling patterns on the quality of the image recovered using the MoDL algorithm. |
1784 | Learning Sampling and Model-Based Signal Recovery for Compressed Sensing MRI | I. A. M. Huijben, B. S. Veeling and R. J. G. van Sloun | Realizing that an optimal sampling pattern may depend on the downstream task (e.g. image reconstruction, segmentation, or classification), we here propose joint learning of both task-adaptive k-space sampling and a subsequent model-based proximal-gradient recovery network. |
1785 | Inferring Dynamic Group Leadership Using Sequential Bayesian Methods | Q. Li, S. J. Godsill, J. Liang and B. I. Ahmad | This paper presents an online approach for inferring dominant entities in tracked groups from observations. |
1786 | Scalable Detection and Tracking of Extended Objects | F. Meyer and J. L. Williams | This paper presents a factor graph formulation and sum-product algorithm (SPA) for scalable detection and tracking of extended objects that generate multiple measurements. |
1787 | Adversarial Anomaly Detection for Marked Spatio-Temporal Streaming Data | S. Zhu, H. S. Yuchi and Y. Xie | How to efficiently detect anomalies in these dynamic systems using these streaming event data? This work proposes a novel anomaly detection framework for such event data combining the Long Short-Term Memory (LSTM) and marked spatio-temporal point processes. |
1788 | Quickest Detection of Growing Dynamic Anomalies in Networks | G. Rovatsos, V. V. Veeravalli, D. Towsley and A. Swami | The problem of quickest growing dynamic anomaly detection in sensor networks is studied. |
1789 | Image Segmentation Based Privacy-Preserving Human Action Recognition for Anomaly Detection | J. Yan, F. Angelini and S. M. Naqvi | In this paper, we prove that human action recognition accuracy mostly depends on contextual data, rather than on privacy-related data. |
1790 | Prediction of Vessel Trajectories From AIS Data Via Sequence-To-Sequence Recurrent Neural Networks | N. Forti, L. M. Millefiori, P. Braca and P. Willett | In this paper, we address the problem of predicting vessel trajectories based on Automatic Identification System (AIS) data. |
1791 | Near-Optimal Interference Exploitation 1-Bit Massive MIMO Precoding Via Partial Branch-and-Bound | A. Li, F. Liu, C. Masouros, Y. Li and B. Vucetic | In this paper, we focus on 1-bit precoding for large-scale antenna systems in the downlink based on the concept of constructive interference (CI). |
1792 | Secure Symbol-Level MISO Precoding | Q. Xu, P. Ren and A. L. Swindlehurst | In this paper, we modify the technique to exploit the full DI region, but we show that even with this improvement, the general approach is vulnerable to an intelligent eavesdropper who can perform maximum likelihood detection. |
1793 | Robust Symbol-Level Precoding Via Autoencoder-Based Deep Learning | F. Sohrabi, H. V. Cheng and W. Yu | This paper proposes an autoencoder-based symbol-level precoding (SLP) scheme for a massive multiple-input multiple-output (MIMO) system operating in a limited-scattering environment. |
1794 | Constant Envelope Massive MIMO-OFDM Precoding: an Improved Formulation and Solution | S. Domouchtsidis, C. Tsinos, S. Chatzinotas and B. Ottersten | Herein, the problem of CE MIMO-OFDM precoding for transmission over frequency selective channels is tackled. |
1795 | Passive Intelligent Surface Assisted MIMO Powered Sustainable IoT | D. Mishra and E. G. Larsson | In this paper, we focus on maximizing the sum received power among the energy harvesting IoT users by jointly optimizing the active precoder for multi-antenna power beacon and the passive constant-envelope precoding based phase shifters (PS) design for PIS. |
1796 | Multiuser Massive MIMO Downlink Precoding Using Second-Order Spatial Sigma-Delta Modulation | M. Shao, W. Ma and L. Swindlehurst | In this work we study the potential of spatial ΣΔ modulation in the two-bit DAC case and under second-order modulation. |
1797 | Allocation of Computing Tasks In Distributed MEC Servers Co-Powered By Renewable Sources And The Power Grid | D. Cecchinato, M. Berno, F. Esposito and M. Rossi | We consider a Multiaccess Edge Computing (MEC) network where distributed servers have energy harvesting (e.g., solar) and storage (e.g., batteries) capabilities. |
1798 | Multi-Agent Deep Reinforcement Learning For Distributed Handover Management In Dense MmWave Networks | M. Sana, A. De Domenico, E. C. Strinati and A. Clemente | To tackle this problem, our paper proposes a deep multi-agent reinforcement learning framework for distributed handover management called RHando (Reinforced Handover). |
1799 | Interpretable Machine Learning In Sustainable Edge Computing: A Case Study of Short-Term Photovoltaic Power Output Prediction | X. Chang, W. Li, J. Ma, T. Yang and A. Y. Zomaya | In this paper, we propose a unified clustering-based prediction framework with two tree-based algorithms to provide short-term prediction of PV power output. |
1800 | Load Management with Predictions of Solar Energy Production for Cloud Data Centers | M. Floridia, D. Laganà, C. Mastroianni, M. Meo and D. Renga | In this paper, we consider the case of Data Centers (DCs) composed of a few sites located in different geographical positions and powered with solar energy. |
1801 | Spectrum Allocation in Wireless Networks for Crowd Labelling | X. Li, G. Zhu, K. Shen, Y. Gong and K. Huang | To tackle this challenge, a novel framework of wireless crowd labelling is proposed that downloads data to many imperfect mobile annotators for repetition labelling by exploiting multicasting in wireless networks. |
1802 | Modeling the Environment in Deep Reinforcement Learning: The Case of Energy Harvesting Base Stations | N. Piovesan, M. Miozzo and P. Dini | In this paper, we focus on the design of energy self-sustainable mobile networks by enabling intelligent energy management that allows the base stations to mostly operate off-grid by using renewable energy. |
1803 | A Differential Approach for Rain Field Tomographic Reconstruction Using Microwave Signals from LEO Satellites | X. Shen, D. D. Huang, C. Vincent, W. Wang and R. Togneri | A differential approach is proposed for tomographic rain field reconstruction using the estimated signal-to-noise ratio of microwave signals from low earth orbit satellites at the ground receivers, with the unknown baseline values eliminated before using least squares to reconstruct the attenuation field. |
1804 | Uncertainties in Short Commercial Microwave Links Fading Due to Rain | H. V. Habi and H. Messer | In this paper we empirically show that while the power-law relation provides a good approximation for relating attenuation and rain rate in terrestrial microwave links of length 1-20 km, for links shorter than 1 km, widely used in 5G technologies, it shows significant errors. |
1805 | On the Opportunistic use of Commercial Ku and Ka Band Satcom Networks for Rain Rate Estimation: Potentials and Critical Issues | F. Giannetti, M. Moretti, R. Reggiannini, A. Vaccaro, S. Scarfone and A. Ortolani | In this paper we study the opportunistic use of microwave satellite signals in the Ku and Ka bands to detect the presence of rain events and to estimate their intensity. |
1806 | Performance Analysis for Path Attenuation Estimation of Microwave Signals Due to Rainfall and Beyond | B. Song, D. D. Huang, X. Shen and R. Togneri | In this work, through the mean square error (MSE) analysis of an ideal RSL estimator, it is found that the estimation error can be lower than 0.01 dB for high signal-to-noise ratio (SNR), thereby making it feasible to measure other meteorological variables such as water vapor and clouds. |
1807 | Deep Rainrate Estimation from Highly Attenuated Downlink Signals of Ground-Based Communications Satellite Terminals | K. V. Mishra, B. S. M. R. and B. Ottersten | We address this problem by designing a deep convolutional neural network (CNN) that learns the relationship between the signal attenuation and rainfall rate observed by weather radars and rain gauges at a given location. |
1808 | Statistical Signal Processing Approach for Rain Estimation Based on Measurements from Network Management Systems | J. Ostrometzky and H. Messer | In this paper we apply statistical signal processing methodologies on a real-world application of using Commercial Microwave Links (CMLs) as opportunistic sensors for rain monitoring. |
1809 | Dynamic Oversampling in 1-Bit Quantized Asynchronous Large-Scale Multiple-Antenna Systems for Sustainable IoT Networks | Z. Shao, L. T. N. Landau and R. C. de Lamare | In this paper, we propose a dynamic oversampling technique for asynchronous large-scale multiple-antenna systems with 1-bit analog-to-digital converters at the base station that is suitable for sustainable internet of things and cellular networks. |
1810 | Dynamic Resource Allocation for Wireless Edge Machine Learning with Latency And Accuracy Guarantees | M. Merluzzi, P. D. Lorenzo and S. Barbarossa | In this paper, we address the problem of dynamic allocation of communication and computation resources for Edge Machine Learning (EML) exploiting Multi-Access Edge Computing (MEC). |
1811 | Federating Solar, Storage and Communications in the Electric Grid and Internet of Things | R. Ramakrishna, N. Karakoc, K. Hreinsson and A. Scaglione | A futuristic infrastructure model is envisioned with distributed modules that can produce solar energy, have a storage system and provide services of lighting, electric-vehicle charging and communications. |
1812 | Non-Gaussian BLE-Based Indoor Localization Via Gaussian Sum Filtering Coupled with Wasserstein Distance | P. Malekzadeh, S. Mehryar, P. Spachos, K. N. Plataniotis and A. Mohammadi | In contrast to existing solutions, where RSSIs are assumed to have normal statistical properties, in this paper, a Gaussian Sum Filter (GSF) approach is designed to more realistically model the non-Gaussian nature of RSSIs. |
1813 | Optimal Joint Channel Estimation and Data Detection by L1-norm PCA for Streetscape IoT | G. Sklivanitis, K. Tountas, N. Tsagkarakis, D. A. Pados and S. N. Batalama | We prove, for the first time in the literature of communication theory and machine learning, the equivalence of joint maximum-likelihood (ML) optimal channel estimation and data detection (JOCEDD) to the problem of finding the L1-norm principal components of a real-valued data matrix. |
1814 | On Measuring Doppler Shifts between Tags in a Backscattering Tag-to-Tag Network with Applications in Tracking | A. Ahmad et al. | In this paper, we present a technique whereby passive tags can track each other in a backscattering tag-to-tag network (BTTN). |
1815 | Efficient Belief Propagation for Graph Matching | E. Onaran and S. Villar | In this short note we derive a novel belief propagation algorithm for graph matching and we numerically evaluate it in the context of matching random graphs. |
1816 | Supervised Graph Representation Learning for Modeling the Relationship between Structural and Functional Brain Connectivity | Y. Li, R. Shafipour, G. Mateos and Z. Zhang | In this paper, we propose a supervised graph representation learning method to model the relationship between brain functional connectivity (FC) and structural connectivity (SC) through a graph encoder-decoder system. |
1817 | Stability of Graph Neural Networks to Relative Perturbations | F. Gama, A. Ribeiro and J. Bruna | In this paper, we are set to study the effect that a change in the underlying graph topology that supports the signal has on the output of a GNN. |
1818 | Active Semi-Supervised Learning for Diffusions on Graphs | B. Das, E. Isufi and G. Leus | In this paper, we put forth a new paradigm for one-shot active semi-supervised learning for graph diffusions. |
1819 | Stochastic Graph Neural Networks | Z. Gao, E. Isufi and A. Ribeiro | Since stochasticity brings in a new paradigm, we develop a novel learning process for the SGNN and introduce the stochastic gradient descent (SGD) algorithm to estimate the parameters. |
1820 | Generative Adversarial Networks for Graph Data Imputation from Signed Observations | A. Madapu, S. Segarra, S. P. Chepuri and A. G. Marques | Our goal is to estimate the true underlying graph signals from our observations. |
1821 | Joint Frequency Domain Channel Estimation and Equalization Based on Expectation Propagation for Single Carrier Transmissions | S. Sahin, A. M. Cipriano, C. Poulliat and M. Boucheret | In this paper, a novel category of expectation propagation (EP) based frequency domain (FD) semi-blind receivers are proposed for single-carrier block transmissions. |
1822 | BP-VB-EP Based Static and Dynamic Sparse Bayesian Learning with Kronecker Structured Dictionaries | C. K. Thomas and D. Slock | In this paper, we propose techniques which allow handling the extension of sparse Bayesian learning (SBL) to time-varying states. |
1823 | Robustness of Sparse Bayesian Learning in Correlated Environments | R. R. Pote and B. D. Rao | In this paper we analyse the performance of Sparse Bayesian Learning (SBL) in an environment with correlated sources. |
1824 | A Simple Derivation of AMP and its State Evolution via First-Order Cancellation | P. Schniter | In this work, we provide a heuristic derivation of AMP and its state evolution, based on the idea of “first-order cancellation,” that provides insights missing from the LBP derivation while being much shorter than the rigorous SE proof. |
1825 | VAMP with Vector-Valued Diagonalization | R. F. H. Fischer, C. Sippel and N. Goertz | In this contribution, improved versions for the update equation (“Onsager correction”) are derived from basic estimation principles. |
1826 | Distributed Verification of Belief Precisions Convergence in Gaussian Belief Propagation | B. Li, N. Wu and Y. Wu | This paper derives a simple convergence condition for Gaussian BP precisions, which can be verified in a distributed way. |
1827 | ADMM-Based One-Bit Quantized Signal Detection for Massive MIMO Systems With Hardware Impairments | Ö. T. Demir and E. Björnson | To solve the non-convex quadratically-constrained quadratic programming (QCQP) problem, we propose an ADMM-based algorithm with closed-form update equations. |
1828 | Learning Task-Based Analog-to-Digital Conversion for MIMO Receivers | N. Shlezinger, R. J. G. van Sloun, I. A. M. Huijben, G. Tsintsadze and Y. C. Eldar | In this work we design task-oriented analog-to-digital converters (ADCs) which operate in a data-driven manner, namely they learn how to map an analog signal into a sampled digital representation such that the system task can be efficiently carried out. |
1829 | One-Bit Normalized Scatter Matrix Estimation For Complex Elliptically Symmetric Distributions | C. Liu and P. P. Vaidyanathan | This paper shows that the arcsine law remains applicable to CES distributions. |
1830 | One-Bit DoA Estimation via Sparse Linear Arrays | S. Sedighi, B. Shankar, M. Soltanalian and B. Ottersten | In this paper, the problem of DoA estimation from one-bit measurements received by an SLA is considered and a novel framework for solving this problem is proposed. |
1831 | One-Bit Sampling in Fractional Fourier Domain | A. Bhandari, O. Graf, F. Krahmer and A. I. Zayed | In this paper, we discuss a different approach to sampling theory in the FrFT domain. |
1832 | Target Parameter Estimation via One-Bit PMCW Radar | H. Zhu, X. Shang and J. Li | We formulate the target parameter estimation problem as a sparse signal recovery problem and use the alternating direction method of multipliers (ADMM) to solve it efficiently. |
1833 | Mobility-Aware Beam Steering in Metasurface-Based Programmable Wireless Environments | C. Liaskos, S. Nie, A. Tsioliaridou, A. Pitsillides, S. Ioannidis and I. F. Akyildiz | In this work we study the effects of user device mobility on the efficiency of PWEs. |
1834 | Dynamic Metasurface Antennas for Bit-Constrained MIMO-OFDM Receivers | H. Wang et al. | In this work we study the application of DMAs for MIMO-OFDM receivers operating with bit-constrained analog-to-digital converters (ADCs). |
1835 | Using Intelligent Reflecting Surfaces for Rank Improvement in MIMO Communications | Ö. Özdogan, E. Björnson and E. G. Larsson | In this paper, we study another use case that might have a larger impact on the channel capacity: rank improvement. |
1836 | Optimizing Backscattering Coefficient Design for Minimizing BER at Monostatic MIMO reader | D. Mishra and J. Yuan | We present a novel monostatic backscatter communication (BSC) protocol for multiple-input-multiple-output (MIMO) reader to detect signals from a single-antenna tag. |
1837 | Real-Time Implementation Aspects of Large Intelligent Surfaces | H. Tataria, F. Tufvesson and O. Edfors | Using the fully active LIS as the basis of our exposition, we first present a rigorous discussion on the relative merits and disadvantages of possible implementation architectures from radio-frequency circuit and real-time processing viewpoints. We then show that a distributed architecture based on a common module interfacing a smaller number of antennas can be scalable. |
1838 | A Hardware Architecture For Reconfigurable Intelligent Surfaces with Minimal Active Elements for Explicit Channel Estimation | G. C. Alexandropoulos and E. Vlachos | In this paper, we present a novel RIS architecture comprising any number of passive reflecting elements, a simple controller for their adjustable configuration, and a single Radio Frequency (RF) chain for baseband measurements. |
1839 | Conditional Density Driven Grid Design in Point-Mass Filter | J. Duník, O. Straka and J. Matoušek | In the paper, a novel conditional density driven grid (CDDG) design is proposed. |
1840 | Enhanced Safety of Autonomous Driving by Incorporating Terrestrial Signals of Opportunity | M. Maaref, J. Khalife and Z. M. Kassas | This framework aims to incorporate terrestrial signals of opportunity (SOPs) alongside GPS signals to provide tight horizontal protection level (HPL) bounds to enhance the safety of autonomous driving. |
1841 | Opportunistic use of GNSS Signals to Characterize the Environment by Means of Machine Learning Based Processing | F. Dovis, R. Imam, W. Qin, C. Savas and H. Visser | Given this framework, the present paper describes the use of GNSS as signals of opportunity for the characterization of the operative environment by processing the GNSS observables through Machine Learning (ML) algorithms that can be used as efficient feature extractors. |
1842 | An Optimal Symmetric Threshold Strategy for Remote Estimation Over The Collision Channel | X. Zhang, M. M. Vasconcelos, W. Cui and U. Mitra | An Optimal Symmetric Threshold Strategy for Remote Estimation Over The Collision Channel |
1843 | Automotive Collision Risk Estimation Under Cooperative Sensing | D. LaChapelle, T. Humphreys, L. Narula, P. Iannucci and E. Moradi-Pari | This paper offers a technique for estimating collision risk for automated ground vehicles engaged in cooperative sensing. |
1844 | Exploitation of 3D City Maps for Hybrid 5G RTT and GNSS Positioning Simulations | J. A. del Peral-Rosado et al. | To circumvent this limitation, the proposed simulation method is based on using three-dimensional (3D) city maps to coherently determine the line-of-sight (LoS) conditions of the available satellite and cellular links. |
1845 | A Dataset for Measuring Reading Levels In India At Scale | D. Agarwal, J. Gupchup and N. Baghel | To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children in the age group of 6-14. |
1846 | Noise-Robust Key-Phrase Detectors for Automated Classroom Feedback | B. Zylich and J. Whitehill | With the goal of giving teachers automated feedback about their classrooms, we investigate how to train automatic speech detectors of key phrases such as good job, thank you, please, and you're welcome. |
1847 | Experiments in Creating Online Course Content for Signal Processing Education | C. -. Jansson, R. Thottappillil, S. Hillman, S. M?ller, K. V. S. Hari and R. Sundaresan | This paper presents the details and experiences of creating course content and presents guidelines for prospective content creators. |
1848 | Teaching Signals and Systems – A First Course in Signal Processing | N. P. Rakhashi, A. A. Bhurane and V. M. Gadre | Giving due consideration to this matter, this paper reflects on the experiences in teaching this course. |
1849 | Cochlear Signal Processing: A Platform for Learning the Fundamentals of Digital Signal Processing | E. Ambikairajah and V. Sethu | In this paper we propose that cochlear signal processing is an excellent platform that brings together almost all the fundamental DSP concepts taught in an introductory course and also lends itself as a suitable candidate for project-based learning. |
1850 | Multimodal Learning for Classroom Activity Detection | H. Li et al. | In this paper, we address the above challenges by using a novel attention based neural framework. |
1851 | Automatic Fluency Evaluation of Spontaneous Speech Using Disfluency-Based Features | H. Deng, Y. Lin, T. Utsuro, A. Kobayashi, H. Nishizaki and J. Hoshino | This paper describes an automatic fluency evaluation of spontaneous speech. |
1852 | Intelligent Student Behavior Analysis System for Real Classrooms | R. Zheng, F. Jiang and R. Shen | In this paper, we design an intelligent student behavior analysis system for recorded classrooms, which automatically detects hand-raising, standing, and sleeping behaviors of students. |
1853 | Coded Illumination and Multiplexing for Lensless Imaging | Y. Zheng, R. Zhang and M. S. Asif | In this paper, we propose a new method to address the problem of ill-conditioning by combining coded illumination patterns with the mask-based lensless imaging. |
1854 | Sparse Convolutional Beamforming for Wireless Ultrasound | A. Mamistvalov and Y. C. Eldar | In this paper we combine recently proposed methods for reducing sampling rate in each channel, and spatial reduction in the number of channels using sparse arrays, in order to implement beamforming at a low data rate without impacting image quality. |
1855 | Divergence-Based Adaptive Extreme Video Completion | M. El Helou, R. Zhou, F. Schmutz, F. Guibert and S. Süsstrunk | We propose an extension of a state-of-the-art extreme image completion algorithm to extreme video completion. |
1856 | Encoding and Decoding Mixed Bandlimited Signals Using Spiking Integrate-and-Fire Neurons | K. Adam, A. Scholefield and M. Vetterli | In this paper, we investigate the encoding and decoding potential of such a layer. |
1857 | On the Effect of Reflectance on Phasor Field Non-Line-of-Sight Imaging | I. Guillén, X. Liu, A. Velten, D. Gutierrez and A. Jarabo | In this work we complement recent theoretical analysis of phasor field-based reconstruction, by empirically analyzing the effect of reflectance of the hidden scenes on reconstruction. |
1858 | Signal Sensing and Reconstruction Paradigms for a Novel Multi-Source Static Computed Tomography System | A. Kowtal et al. | This paper discusses new sensing and reconstruction paradigms enabled by this new CT architecture. |
1859 | Sampling Classes of Non-Bandlimited Signals Using Integrate-and-Fire Devices: Average Case Analysis | R. Alexandru, N. T. Thao, D. Rzepka and P. L. Dragotti | We investigate the use of integrate-and-fire systems to efficiently sample classes of non-bandlimited signals such as bursts of spikes. |
1860 | Towards an Intelligent Microscope: Adaptively Learned Illumination for Optimal Sample Classification | A. Chaware, C. L. Cooke, K. Kim and R. Horstmeyer | We present a reinforcement learning system that adaptively explores optimal patterns to illuminate specimens for immediate classification. |
1861 | High Dynamic Range Imaging Using Deep Image Priors | G. Jagatap and C. Hegde | In this paper, we introduce a new approach for HDR image reconstruction using neural priors that require no training data. |
1862 | Kernel Computations from Large-Scale Random Features Obtained by Optical Processing Units | R. Ohana et al. | In this paper, we show that this operation results in a dot-product kernel that has connections to the polynomial kernel, and we extend this computation to arbitrary powers of the feature map. |
1863 | Multi-Depth Computational Periscopy with an Ordinary Camera | C. Saunders, R. Bose, J. Murray-Bruce and V. K. Goyal | We demonstrate non-line-of-sight imaging of multi-depth scenes using only a single photograph from an ordinary digital camera. |
1864 | Super-Resolution with Noisy Measurements: Reconciling Upper and Lower Bounds | H. Qiao, S. Shahsavari and P. Pal | The main contribution of this paper is to derive the Cramér-Rao Bound for the noisy super-resolution problem and understand its scaling as a function of the super-resolution factor. |