Signal Processing and Machine Learning: Connections and Applications
Kirty Vedula - DSP Online Conference 2022 - Duration: 51:28
This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.
What this presentation is about and why it matters
This talk explores the interplay between classical signal processing and modern machine learning through three concrete domains: speech/audio, wireless communications, and target tracking. It shows how established DSP tools (sampling, transforms, filters, estimators) provide the mathematical backbone for many ML methods, and how data-driven models (deep nets, autoencoders, end-to-end learning) can augment or replace blocks in traditional signal chains. For practicing engineers this matters because many real systems today sit on the boundary: low-level signal models still guide system design, while learned components offer adaptivity and performance advantages when models are imperfect or environments change. Understanding both viewpoints helps you pick the right tool for a problem, design hybrid solutions, and reason about trade-offs such as data needs, interpretability, and computational cost.
Who will benefit the most from this presentation
- Engineers and students in communications, audio, radar, or sensing who already know basic DSP and want to learn where ML fits.
- Machine learning practitioners who want a focused, applied view of DSP building blocks (STFT, MFCCs, Kalman/IMM) used in real systems.
- Researchers interested in hybrid approaches that combine model-based estimators with learned components (e.g., ML+Kalman, autoencoder-based comms).
- Anyone evaluating whether to use hand-engineered features or end-to-end learned models for a signal task.
What you need to know
To get the most from the talk, have a working knowledge of the following ideas. Short pointers are included so you can refresh the essentials before watching.
- Sampling and the Nyquist theorem — signals are discretized by sampling. If a continuous-time signal has no energy above a maximum frequency \(B\), sampling at a rate \(f_s > 2B\) avoids aliasing. The Nyquist frequency \(f_N = f_s/2\) is the highest frequency a given sampling rate can represent unambiguously. (Sketch below.)
- Discrete Fourier Transform (DFT/FFT) — decomposition of a finite-length sequence into frequency bins: \(X[k]=\sum_{n=0}^{N-1} x[n]e^{-j2\pi kn/N}\). FFT algorithms compute this efficiently (in \(O(N\log N)\) rather than \(O(N^2)\)) and are used to obtain spectra for frames. (Sketch below.)
- Short-Time Fourier Transform (STFT) and spectrograms — use a sliding window to obtain time–frequency energy. The STFT is essential when signals are nonstationary (speech, transient interference). (Sketch below.)
- Filtering and time/frequency duality — filters modify spectra; low-pass filters remove high-frequency noise. Viewing signals in both time and frequency gives complementary insights. (Sketch below.)
- Feature extraction vs end-to-end learning — classical pipelines extract features (MFCCs, spectral centroid), then apply classifiers; deep learning can learn representations directly from raw waveforms or spectrograms.
- MFCC pipeline — STFT magnitude → mel filterbank → log energies → DCT, producing compact cepstral coefficients used widely in speech tasks. (Sketch below.)
- Statistical estimation basics — MMSE, MAP, and maximum likelihood (ML) are criteria for designing estimators and detectors; the Kalman filter is an MMSE-optimal recursive estimator for linear systems with Gaussian noise. (Sketch below.)
- Interacting Multiple Model (IMM) — a practical approach for tracking targets that switch between motion models (e.g., straight-line flight vs a turn). IMM runs a bank of filters in parallel and mixes their estimates. (Sketch below.)
- Autoencoders and end-to-end communication — treating transmitter + channel + receiver as one differentiable pipeline trained with backpropagation lets you learn novel constellations and joint coding/decoding maps. (Sketch below.)
- Optimization and learning — gradient-descent variants train deep models; understand the bias–variance trade-off, overfitting vs underfitting, and the need for data or domain randomization in simulation-based training.
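The short Python sketches below are not from the talk; they are minimal illustrations of several items above, with all parameter choices (rates, filter orders, network sizes) assumed for demonstration. First, aliasing: sampled at 8 kHz, a 9 kHz tone is indistinguishable from a 1 kHz tone.

```python
import numpy as np

fs = 8_000                   # sampling rate (Hz); Nyquist frequency is fs/2 = 4 kHz
t = np.arange(1024) / fs

x_hi = np.cos(2 * np.pi * 9_000 * t)   # 9 kHz tone, well above Nyquist
x_lo = np.cos(2 * np.pi * 1_000 * t)   # 1 kHz tone, the alias of 9 kHz at fs = 8 kHz

# The sampled sequences are numerically identical: 9 kHz has "folded" to 1 kHz.
print(np.allclose(x_hi, x_lo))         # True
```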
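A direct implementation of the DFT sum above, checked against NumPy's FFT, which computes the identical transform in \(O(N\log N)\):

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)               # column of bin indices
    return np.exp(-2j * np.pi * k * n / N) @ x

x = np.random.randn(256)
print(np.allclose(dft(x), np.fft.fft(x)))   # True: same transform, faster algorithm
```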
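A spectrogram via the STFT, sketched with SciPy on a synthetic chirp as a stand-in for any nonstationary signal (window length is an arbitrary choice):

```python
import numpy as np
from scipy import signal

fs = 8_000
t = np.arange(0, 2.0, 1 / fs)
x = signal.chirp(t, f0=100, t1=2.0, f1=3_000)    # frequency sweeps 100 Hz -> 3 kHz

# Slide a 256-sample window across the signal and take a DFT per frame.
f, frames, Zxx = signal.stft(x, fs=fs, nperseg=256)
spectrogram = np.abs(Zxx) ** 2                   # time-frequency energy map

print(spectrogram.shape)                         # (frequency bins, time frames)
```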
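Low-pass filtering to remove high-frequency noise, sketched with a Butterworth design from SciPy (order and cutoff assumed for illustration):

```python
import numpy as np
from scipy import signal

fs = 8_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(len(t))  # 50 Hz tone + noise

# 4th-order Butterworth low-pass, 200 Hz cutoff; filtfilt applies it forward
# and backward so the filtered signal has zero phase distortion.
b, a = signal.butter(4, 200, btype="low", fs=fs)
y = signal.filtfilt(b, a, x)
```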
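The MFCC pipeline end to end, as a minimal sketch (filterbank size, frame length, and coefficient count are common defaults assumed here, not taken from the talk):

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import stft

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(x, fs, n_mels=26, n_coeffs=13, nperseg=512):
    """Minimal MFCC: |STFT|^2 -> mel filterbank -> log -> DCT."""
    f, _, Zxx = stft(x, fs=fs, nperseg=nperseg)
    power = np.abs(Zxx) ** 2                      # (freq bins, frames)

    # Triangular filters with centers equally spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(0, hz_to_mel(fs / 2), n_mels + 2))
    fbank = np.zeros((n_mels, len(f)))
    for i in range(n_mels):
        lo, c, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        fbank[i] = np.clip(np.minimum((f - lo) / (c - lo), (hi - f) / (hi - c)), 0, None)

    log_mel = np.log(fbank @ power + 1e-10)       # log mel-band energies per frame
    return dct(log_mel, type=2, axis=0, norm="ortho")[:n_coeffs]

fs = 16_000
x = np.random.randn(fs)            # 1 s of noise as a stand-in for speech
print(mfcc(x, fs).shape)           # (13, number of frames)
```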
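A minimal Kalman filter for a constant-velocity target with noisy position measurements (model matrices and noise levels assumed for illustration):

```python
import numpy as np

# Constant-velocity model: state [position, velocity], noisy position measurements.
dt = 0.1
F = np.array([[1, dt], [0, 1]])      # state transition
H = np.array([[1.0, 0.0]])           # we observe position only
Q = 0.01 * np.eye(2)                 # process noise covariance
R = np.array([[0.5]])                # measurement noise covariance

def kalman_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + (K @ (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
for k in range(100):
    true_pos = 1.0 * k * dt                          # target moving at 1 m/s
    z = np.array([true_pos]) + rng.normal(0, 0.7)    # noisy measurement
    x, P = kalman_step(x, P, z)
print(x)   # estimate approaches [position ~ 10, velocity ~ 1]
```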
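One cycle of the IMM mixing logic, sketched with scalar states and placeholder likelihoods; the per-model Kalman filters themselves are omitted, so this shows only the bookkeeping IMM adds on top of a filter bank:

```python
import numpy as np

# Two motion models (e.g., constant velocity vs coordinated turn).
Pi = np.array([[0.95, 0.05],    # Pi[i, j] = Pr(model j at k+1 | model i at k)
               [0.05, 0.95]])
mu = np.array([0.6, 0.4])       # current model probabilities
x  = np.array([1.0, 1.4])       # per-model state estimates (scalar for brevity)
P  = np.array([0.2, 0.3])       # per-model variances

# 1) Mixing: each filter is re-initialized from a probability-weighted blend.
c  = Pi.T @ mu                              # predicted model probabilities
w  = Pi * mu[:, None] / c[None, :]          # w[i, j] = mixing weight of i into j
x0 = w.T @ x                                # mixed initial state for each filter
P0 = np.array([(w[:, j] * (P + (x - x0[j]) ** 2)).sum() for j in range(2)])

# 2) Each Kalman filter runs one predict/update cycle from (x0[j], P0[j]) and
#    reports the likelihood of the new measurement under its model.
like = np.array([0.9, 0.2])                 # placeholder likelihoods

# 3) Model probabilities are updated and the estimates fused for output.
mu = c * like / (c @ like)
x_fused = mu @ x0                           # (would use the updated states)
print(mu, x_fused)
```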
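An end-to-end communications autoencoder in the spirit described above, sketched in PyTorch. The architecture, block size, and SNR convention are assumptions for illustration, not the speaker's design: the encoder maps each of 16 messages to one complex symbol (two reals), an AWGN layer plays the channel, and the decoder classifies the received point.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M, n_ch = 16, 2                 # 16 messages (4 bits) over 2 real channel uses
snr_db = 7.0
sigma = 10 ** (-snr_db / 20)    # noise std per real dim, assuming unit signal power

class CommAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, n_ch))
        self.dec = nn.Sequential(nn.Linear(n_ch, 32), nn.ReLU(), nn.Linear(32, M))

    def forward(self, one_hot):
        x = self.enc(one_hot)
        x = x / x.pow(2).mean().sqrt()          # average power normalization
        y = x + sigma * torch.randn_like(x)     # AWGN channel, differentiable
        return self.dec(y)                      # logits over the M messages

model = CommAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    msgs = torch.randint(0, M, (256,))
    logits = model(F.one_hot(msgs, M).float())
    loss = F.cross_entropy(logits, msgs)        # end-to-end symbol-error objective
    opt.zero_grad(); loss.backward(); opt.step()
# The learned constellation: model.enc applied to each one-hot message.
```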
Glossary
- DFT — Discrete Fourier Transform: maps a sequence to frequency bins; implemented efficiently with FFT.
- STFT — Short-Time Fourier Transform: time-localized spectra via windowing and sliding frames.
- Spectrogram — squared magnitude of the STFT; a time–frequency energy map used as input features in audio ML.
- MFCC — Mel-Frequency Cepstral Coefficients: compact features that approximate human auditory perception for speech tasks.
- Wavelet — multiscale transform providing variable time–frequency resolution useful for transient and multiscale analysis.
- Kalman filter — recursive linear estimator for tracking and state estimation under Gaussian noise.
- IMM — Interacting Multiple Model: a bank of filters for systems that switch between discrete motion models.
- Autoencoder — an encoder–decoder neural network used for compression, feature learning, or end-to-end system modeling.
- End-to-end learning — optimizing an entire signal chain jointly (e.g., transmitter+receiver) by treating it as a differentiable network.
- Aliasing — spectral overlap caused by under-sampling; leads to distortion and information loss.
Final notes
This presentation offers a pragmatic bridge between two complementary toolsets: the principled, model-based ideas of DSP and the flexible, data-driven approaches of modern ML. The speaker keeps examples concrete (MFCCs, autoencoder communications, IMM tracking) which helps translate abstract concepts into real engineering choices. If you work with signals in any applied domain, watching the talk will give you practical intuition about when to lean on analytical models and when to reach for learned solutions — and how to combine both for better results.
