Speech Dereverberation using Coherent to Diffuse Power Ratio Estimators (CDR)
David Castro Pinol - DSP Online Conference 2022 - Duration: 38:01
This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.
What this presentation is about and why it matters
This talk introduces coherent-to-diffuse power ratio (CDR) estimators and their use for speech dereverberation with microphone arrays. Dereverberation is the process of reducing the late reflections and room-induced smearing that make speech sound distant or muddy, degrading intelligibility for human listeners and hurting the performance of automatic systems (speech recognition, speaker identification, hearing aids). The speaker reviews both the theoretical signal model (how direct sound and reverberation combine at multiple microphones) and practical processing steps, then demonstrates MATLAB implementations and audio examples.
For engineers and researchers working with audio capture or speech systems, CDR-based dereverberation is attractive because it: (1) directly exploits spatial coherence between microphones, (2) can work without machine learning or training data, and (3) can be implemented cheaply in time–frequency domains (STFT) to improve intelligibility and ASR performance.
Who will benefit the most from this presentation
- DSP engineers and audio practitioners building front-ends for speech recognition, hearing devices, teleconferencing, or smart speakers.
- Students learning spatial audio processing, array signal processing, or room acoustics who want a concrete connection between theory and implementation.
- Researchers comparing model-based dereverberation methods with learning-based approaches.
- Anyone looking for implementable MATLAB code and listening examples to test ideas quickly.
What you need to know
To get the most from the talk, it's helpful to be comfortable with the following concepts and simple equations. The presenter keeps math compact but precise, so a basic DSP/math background will make the material easier to follow.
Core signal model
Microphone recordings are modeled in the short-time Fourier transform (STFT) domain as a sum of a desired direct-path component and an undesired component that contains late reverberation plus background noise. A compact notation often used is:
Recorded signal = direct (coherent) + late reverberation + additive noise.
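In STFT notation this is commonly written per time frame $\ell$, frequency bin $k$, and microphone $m$ (the symbols here are one common convention, not necessarily the presenter's exact notation):

$X_m(\ell, k) = S_m(\ell, k) + R_m(\ell, k) + V_m(\ell, k)$

where $S_m$ is the direct (coherent) component, $R_m$ the late reverberation, and $V_m$ additive background noise.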
Power spectral densities and coherence
Auto and cross power spectral densities are the basic statistics. The complex spatial coherence between two microphone signals is defined as
$\Gamma_{xy} = \dfrac{\Phi_{xy}}{\sqrt{\Phi_{xx}\,\Phi_{yy}}}$
where $\Phi_{xy}$ is the cross-PSD and $\Phi_{xx},\Phi_{yy}$ are auto-PSDs. For a single plane wave arriving with a TDOA (time difference of arrival), the direct-signal coherence has unit magnitude: $|\Gamma_s| = 1$ (it is a pure phase term determined by the TDOA).
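In practice the PSDs are estimated by recursive (exponential) averaging across STFT frames. A minimal NumPy sketch of one such update step (the talk's demos are in MATLAB; the function name, forgetting factor, and regularization constant here are illustrative assumptions):

```python
import numpy as np

def update_coherence(X1, X2, phi_11, phi_12, phi_22, alpha=0.9):
    """Recursively update auto-/cross-PSD estimates with one STFT frame
    and return the complex spatial coherence per frequency bin.

    X1, X2 : complex STFT vectors (one frame, all bins) of mics 1 and 2
    alpha  : forgetting factor of the recursive (exponential) average
    """
    phi_11 = alpha * phi_11 + (1 - alpha) * np.abs(X1) ** 2
    phi_22 = alpha * phi_22 + (1 - alpha) * np.abs(X2) ** 2
    phi_12 = alpha * phi_12 + (1 - alpha) * X1 * np.conj(X2)
    # Coherence: cross-PSD normalized by the auto-PSDs (small constant
    # in the denominator avoids division by zero in empty bins)
    gamma = phi_12 / (np.sqrt(phi_11 * phi_22) + 1e-12)
    return gamma, phi_11, phi_12, phi_22
```

Feeding the same frame to both inputs yields $|\Gamma| \approx 1$ in every bin, matching the unit-magnitude coherence of a purely coherent signal.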
Coherent-to-diffuse power ratio (CDR)
CDR is the ratio of coherent (direct) power to diffuse (late-reverberant + background) power. In simple notation:
$CDR = \dfrac{\Phi_{s}}{\Phi_{n}}$
The mixed-field coherence (what you estimate from data) lies on the line segment connecting the signal coherence (a point on the unit circle) and the noise coherence (a real value given by the diffuse-field model). CDR estimators invert that relationship to estimate how much coherent energy is present in each time–frequency bin.
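One common way to write this mixed-coherence model, with $\Gamma_s$ the unit-magnitude signal coherence and $\Gamma_n$ the real diffuse-field coherence, is

$\Gamma_x = \dfrac{CDR \cdot \Gamma_s + \Gamma_n}{CDR + 1}$

As $CDR \to \infty$ the mixed coherence tends to $\Gamma_s$; as $CDR \to 0$ it tends to $\Gamma_n$. Solving this equation for the CDR, given a model for $\Gamma_n$ and either the TDOA or only the constraint $|\Gamma_s| = 1$, yields the estimator variants discussed in the talk.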
Estimator types and pipeline
- Model-based two-microphone estimators: require a noise-field coherence model (spherical or cylindrical isotropic) and sometimes the signal TDOA. Variants address bias and non-physical results.
- DOA-independent estimator: uses the constraint $|\Gamma_s|=1$ to eliminate the need for an explicit DOA estimate.
- Non-model (multi-channel) estimator: uses covariance structure and effective rank (via SVD) to estimate diffuseness without explicit spatial models, at higher computational cost.
- Typical pipeline: STFT -> coherence estimation (recursive averaging) -> CDR estimation -> map CDR to post-filter gain -> apply gain and reconstruct.
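The pipeline above can be sketched in Python/NumPy as follows (the talk's code is MATLAB). This is a minimal two-microphone illustration, not the presenter's implementation: the DOA-independent estimator is the positive root obtained by solving the mixed-coherence relation under $|\Gamma_s|=1$ with a spherically isotropic noise model, and the gain mapping (overestimation factor `mu`, floor `g_min`) and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def cdr_dereverb(x1, x2, fs, d_mic, nfft=512, alpha=0.9, mu=1.3, g_min=0.1):
    """Two-microphone CDR-based dereverberation sketch.

    x1, x2 : time-domain microphone signals
    fs     : sample rate (Hz);  d_mic : microphone spacing (m)
    """
    c = 343.0  # speed of sound, m/s
    f, _, X1 = stft(x1, fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs, nperseg=nfft)

    # Spherically isotropic diffuse-field coherence model (real-valued)
    gamma_n = np.sinc(2 * f * d_mic / c)[:, None]  # np.sinc(x) = sin(pi x)/(pi x)

    def smooth(P):
        # Recursive (exponential) averaging along the frame axis
        out = np.empty_like(P)
        acc = np.zeros(P.shape[0], dtype=P.dtype)
        for l in range(P.shape[1]):
            acc = alpha * acc + (1 - alpha) * P[:, l]
            out[:, l] = acc
        return out

    phi11 = smooth(np.abs(X1) ** 2)
    phi22 = smooth(np.abs(X2) ** 2)
    phi12 = smooth(X1 * np.conj(X2))
    gamma_x = phi12 / np.sqrt(phi11 * phi22 + 1e-12)  # estimated mixed coherence

    # DOA-independent CDR estimate: solve the mixed-coherence relation
    # under |Gamma_s| = 1 for the CDR (positive root of a quadratic)
    mag2 = np.minimum(np.abs(gamma_x) ** 2, 1 - 1e-6)  # keep |Gamma_x| < 1
    re = np.real(gamma_x)
    disc = (gamma_n**2 * re**2 - gamma_n**2 * mag2
            + gamma_n**2 - 2 * gamma_n * re + mag2)
    cdr = (gamma_n * re - mag2 - np.sqrt(np.maximum(disc, 0))) / (mag2 - 1)
    cdr = np.maximum(cdr, 0)  # clip non-physical negative estimates

    # Map CDR to a magnitude-subtraction post-filter gain with a floor
    gain = np.maximum(g_min, 1 - np.sqrt(mu / (cdr + 1)))

    # Apply the gain to the channel average and return to the time domain
    _, y = istft(gain * 0.5 * (X1 + X2), fs, nperseg=nfft)
    return y
```

As a quick sanity check, feeding the same signal to both channels (fully coherent input) should leave it nearly unattenuated, while strongly incoherent bins are pushed toward the gain floor.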
Glossary
- Coherent-to-Diffuse Power Ratio (CDR): Ratio of direct (coherent) energy to diffuse (late reverberant + noise) energy in a time–frequency bin.
- Spatial coherence: Complex correlation between two microphone signals across frequency, normalized by their auto-PSDs.
- STFT (Short-Time Fourier Transform): Time–frequency transform used to process audio in frames and frequency bins.
- Auto Power Spectral Density (auto-PSD): Time-averaged spectral power of a single signal, denoted $\Phi_{xx}$.
- Cross Power Spectral Density (cross-PSD): Time-averaged cross-spectrum of two signals, denoted $\Phi_{xy}$, used to compute coherence.
- TDOA (Time Difference of Arrival): Relative delay between microphones for a direct sound, determines the phase of direct coherence.
- Diffuse sound field: Acoustic model where reflections arrive from all directions; its coherence vs frequency is a known real function (spherical or cylindrical models).
- Late reverberation: Long-tail reflections that smear energy in time and harm intelligibility and ASR performance.
- Post-filter (magnitude-subtraction gain): Gain derived from the CDR used to attenuate reverberant-dominated bins and preserve direct-dominated bins.
- Diffuseness: Alternative metric in [0, 1] that maps the relative diffuse energy to an interpretable scale; related to CDR by a monotonic transform.
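The monotonic transform mentioned in the last glossary entry is simply

$D = \dfrac{\Phi_n}{\Phi_s + \Phi_n} = \dfrac{1}{1 + CDR}$

so $D \to 0$ for a purely coherent field and $D \to 1$ for a purely diffuse one.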
Final thoughts
This presentation strikes a useful balance between theory and practice: it clearly explains the signal model and coherence-based reasoning, then shows concrete MATLAB code and audio examples so you can hear and test the results. If you care about compact, model-based dereverberation that does not require training data, this talk is a practical and well-executed introduction. Expect clear derivations, discussion of estimator trade-offs (bias, DOA dependence, complexity), and runnable code you can adapt to your own microphone setups.
The code to test speech dereverberation using CDR estimators is available at: https://github.com/andreas12345/cdr-dereverb