
Audio Features

Overview

Mind Measure extracts 10 audio features from conversation recordings to assess vocal patterns associated with emotional state. All audio feature extraction is performed client-side using the Web Audio API.

Feature List

| ID  | Feature          | Range      | Description                                 |
|-----|------------------|------------|---------------------------------------------|
| A1  | meanPitch        | 50-400 Hz  | Average fundamental frequency (F0)          |
| A2  | pitchVariability | 0-200 Hz   | Standard deviation of pitch                 |
| A3  | speakingRate     | 50-250 wpm | Words-per-minute estimate                   |
| A4  | pauseFrequency   | 0-50       | Number of pauses per minute                 |
| A5  | pauseDuration    | 0-5 s      | Average pause length                        |
| A6  | voiceEnergy      | 0-1        | Normalised RMS energy                       |
| A7  | jitter           | 0-1        | Pitch period variability (voice stability)  |
| A8  | shimmer          | 0-1        | Amplitude variability (voice quality)       |
| A9  | harmonicRatio    | 0-1        | Harmonic-to-noise ratio                     |
| A10 | quality          | 0-1        | Overall signal quality metric               |
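Collected together, the ten features could be modelled as a plain interface. This is a hypothetical shape for illustration; the actual `AudioFeatures` type used by the extraction code below is not shown in this document and may differ (for example, with optional fields when extraction fails):

```typescript
// Hypothetical shape for the extracted feature set; fields mirror the table above.
interface AudioFeatures {
  meanPitch: number;        // A1, Hz
  pitchVariability: number; // A2, Hz
  speakingRate: number;     // A3, wpm
  pauseFrequency: number;   // A4, pauses per minute
  pauseDuration: number;    // A5, seconds
  voiceEnergy: number;      // A6, 0-1
  jitter: number;           // A7, 0-1
  shimmer: number;          // A8, 0-1
  harmonicRatio: number;    // A9, 0-1
  quality: number;          // A10, 0-1
}
```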

Extraction Methods

A1: Mean Pitch (F0)

Uses autocorrelation to detect the fundamental frequency:

async extractMeanPitch(channelData: Float32Array, sampleRate: number): Promise<number> {
  const frameSize = Math.floor(sampleRate * 0.03); // 30ms frames
  const hopSize = Math.floor(frameSize / 2);
  const pitchValues: number[] = [];
  
  for (let i = 0; i < channelData.length - frameSize; i += hopSize) {
    const frame = channelData.slice(i, i + frameSize);
    const pitch = this.autocorrelationPitch(frame, sampleRate);
    if (pitch > 50 && pitch < 400) {
      pitchValues.push(pitch);
    }
  }
  
  return pitchValues.length > 0 
    ? pitchValues.reduce((a, b) => a + b) / pitchValues.length 
    : 150; // Default fallback
}
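The `autocorrelationPitch` helper referenced above is not shown. A plausible sketch (an assumption, not the actual implementation) picks the lag with the highest unnormalised autocorrelation within the 50-400 Hz search band and converts it back to a frequency:

```typescript
// Sketch of an autocorrelationPitch helper (hypothetical; the real one is not shown).
// Searches lags corresponding to 50-400 Hz and returns the best-matching frequency.
function autocorrelationPitch(frame: Float32Array, sampleRate: number): number {
  const minLag = Math.floor(sampleRate / 400); // highest pitch => shortest period
  const maxLag = Math.floor(sampleRate / 50);  // lowest pitch => longest period
  let bestLag = 0;
  let bestCorr = 0;
  for (let lag = minLag; lag <= maxLag && lag < frame.length; lag++) {
    let corr = 0;
    for (let i = 0; i < frame.length - lag; i++) {
      corr += frame[i] * frame[i + lag];
    }
    if (corr > bestCorr) {
      bestCorr = corr;
      bestLag = lag;
    }
  }
  return bestLag > 0 ? sampleRate / bestLag : 0;
}
```

On a pure 220 Hz tone the detected lag corresponds to one pitch period, so the returned frequency lands within a few Hz of 220.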

A2: Pitch Variability

Standard deviation of detected pitch values:

extractPitchVariability(pitchValues: number[]): number {
  if (pitchValues.length === 0) return 0; // no voiced frames detected
  const mean = pitchValues.reduce((a, b) => a + b) / pitchValues.length;
  const variance = pitchValues.reduce((sum, p) => sum + Math.pow(p - mean, 2), 0) / pitchValues.length;
  return Math.sqrt(variance);
}

A3: Speaking Rate

Estimated from energy envelope transitions:

estimateSpeakingRate(channelData: Float32Array, sampleRate: number, duration: number): number {
  // Count syllable-like energy peaks
  const syllables = this.countSyllables(channelData, sampleRate);
  const minutes = duration / 60;
  const wordsEstimate = syllables / 1.5; // Average syllables per word
  return wordsEstimate / minutes;
}
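The `countSyllables` helper is not shown either. A plausible sketch (an assumption: counting rising edges of the 20 ms RMS energy envelope above a fixed threshold, treating each sustained energy burst as one syllable) might look like:

```typescript
// Sketch of a countSyllables helper (hypothetical; the real one is not shown).
// Counts transitions of frame RMS energy from below to above a threshold.
function countSyllables(channelData: Float32Array, sampleRate: number): number {
  const frameSize = Math.floor(sampleRate * 0.02); // 20ms frames
  const threshold = 0.05; // assumed energy floor for voiced speech
  let count = 0;
  let above = false;
  for (let i = 0; i < channelData.length; i += frameSize) {
    const end = Math.min(i + frameSize, channelData.length);
    let sum = 0;
    for (let j = i; j < end; j++) sum += channelData[j] * channelData[j];
    const rms = Math.sqrt(sum / (end - i));
    if (rms > threshold && !above) {
      count++; // rising edge: a new energy burst begins
      above = true;
    } else if (rms <= threshold) {
      above = false;
    }
  }
  return count;
}
```

Five separated tone bursts in a one-second buffer would be counted as five syllable-like peaks.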

A4-A5: Pause Analysis

Pauses detected as energy drops below threshold:

analyzePauses(channelData: Float32Array, sampleRate: number): { frequency: number; duration: number } {
  const frameSize = Math.floor(sampleRate * 0.02); // 20ms frames
  const energyThreshold = 0.01;
  const minPauseLength = 0.2; // ignore gaps shorter than 200ms
  const pauses: number[] = [];
  let currentPauseLength = 0;
  
  for (let i = 0; i < channelData.length; i += frameSize) {
    const energy = this.calculateRMS(channelData.slice(i, i + frameSize));
    if (energy < energyThreshold) {
      currentPauseLength += frameSize / sampleRate;
    } else {
      if (currentPauseLength > minPauseLength) {
        pauses.push(currentPauseLength);
      }
      currentPauseLength = 0; // reset even after sub-threshold gaps
    }
  }
  if (currentPauseLength > minPauseLength) {
    pauses.push(currentPauseLength); // capture a trailing pause
  }
  
  return {
    frequency: pauses.length / (channelData.length / sampleRate / 60), // pauses per minute
    duration: pauses.length > 0 ? pauses.reduce((a, b) => a + b) / pauses.length : 0
  };
}
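The `calculateRMS` helper used above is also not shown; as a sketch (an assumption), it is the standard root-mean-square over a frame:

```typescript
// Sketch of a calculateRMS helper (hypothetical; the real one is not shown).
function calculateRMS(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}
```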

A6: Voice Energy

Normalised RMS energy:

extractVoiceEnergy(channelData: Float32Array): number {
  const rms = Math.sqrt(
    channelData.reduce((sum, sample) => sum + sample * sample, 0) / channelData.length
  );
  return Math.min(rms * 10, 1); // Normalise to 0-1
}

A7: Jitter

Cycle-to-cycle pitch period variation (simplified):

extractJitter(pitchValues: number[]): number {
  // Simplified: reuse the frame-level pitch track from A1 and its
  // variability (A2) as a proxy. True jitter requires cycle-by-cycle
  // period analysis.
  const pitchVar = this.extractPitchVariability(pitchValues);
  return Math.min(pitchVar / 100, 1);
}

A8: Shimmer

Amplitude variation between consecutive cycles:

extractShimmer(channelData: Float32Array): number {
  const frameSize = 1024;
  const amplitudes: number[] = [];
  
  for (let i = 0; i < channelData.length - frameSize; i += frameSize) {
    const frame = channelData.slice(i, i + frameSize);
    amplitudes.push(Math.max(...frame.map(Math.abs)));
  }
  
  if (amplitudes.length < 2) return 0; // too little signal to compare frames
  
  let shimmerSum = 0;
  for (let i = 1; i < amplitudes.length; i++) {
    shimmerSum += Math.abs(amplitudes[i] - amplitudes[i - 1]);
  }
  
  const meanAmplitude = amplitudes.reduce((a, b) => a + b) / amplitudes.length;
  return meanAmplitude > 0 ? shimmerSum / (amplitudes.length - 1) / meanAmplitude : 0;
}

A9: Harmonic Ratio

Harmonic-to-noise ratio estimation:

extractHarmonicRatio(channelData: Float32Array, sampleRate: number): number {
  // FFT-based harmonic analysis
  const fftSize = 2048;
  const spectrum = this.computeFFT(channelData.slice(0, fftSize));
  
  // Find harmonic peaks
  const harmonicEnergy = this.sumHarmonicPeaks(spectrum, sampleRate);
  const totalEnergy = spectrum.reduce((a, b) => a + b * b, 0);
  
  return totalEnergy > 0 ? harmonicEnergy / totalEnergy : 0; // guard silent input
}
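The `computeFFT` and `sumHarmonicPeaks` helpers are not shown. As a rough sketch of the same idea (an assumption: a naive DFT stands in for the FFT, the strongest bin is treated as F0, and energy within one bin of each harmonic multiple counts as harmonic energy):

```typescript
// Hypothetical stand-ins for computeFFT / sumHarmonicPeaks; illustration only.
function magnitudeSpectrum(frame: Float32Array): number[] {
  const N = frame.length;
  const mags: number[] = [];
  for (let k = 0; k < N / 2; k++) { // naive DFT; a real system would use an FFT
    let re = 0, im = 0;
    for (let n = 0; n < N; n++) {
      const phase = (2 * Math.PI * k * n) / N;
      re += frame[n] * Math.cos(phase);
      im -= frame[n] * Math.sin(phase);
    }
    mags.push(Math.sqrt(re * re + im * im));
  }
  return mags;
}

function harmonicRatio(frame: Float32Array): number {
  const mags = magnitudeSpectrum(frame);
  const total = mags.reduce((a, b) => a + b * b, 0);
  if (total === 0) return 0; // silent input
  // Treat the strongest bin above DC as F0.
  let f0Bin = 1;
  for (let k = 2; k < mags.length; k++) {
    if (mags[k] > mags[f0Bin]) f0Bin = k;
  }
  // Sum energy within +/-1 bin of each harmonic multiple of F0.
  let harmonic = 0;
  for (let h = f0Bin; h < mags.length; h += f0Bin) {
    for (let k = Math.max(0, h - 1); k <= Math.min(mags.length - 1, h + 1); k++) {
      harmonic += mags[k] * mags[k];
    }
  }
  return Math.min(1, harmonic / total);
}
```

A pure tone centred on a bin concentrates almost all its energy at the fundamental, so its harmonic ratio comes out near 1; broadband noise would score much lower.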

A10: Quality

Composite quality metric:

calculateQuality(features: AudioFeatures): number {
  let quality = 0.6; // Base quality
  
  // Penalise extreme values
  if (features.meanPitch < 80 || features.meanPitch > 300) quality -= 0.2;
  if (features.voiceEnergy < 0.001) quality -= 0.3;
  if (features.shimmer > 0.5) quality -= 0.1;
  
  return Math.max(0, Math.min(1, quality));
}

Scoring

Audio features are converted to a 0-100 score:

calculateAudioScore(features: AudioFeatures): number {
  const scores: number[] = [];
  
  // Pitch: optimal around 150-180 Hz
  if (features.meanPitch != null) {
    const pitchScore = 100 - Math.abs(features.meanPitch - 165) / 2;
    scores.push(Math.max(0, Math.min(100, pitchScore)));
  }
  
  // Speaking rate: optimal around 120-150 wpm
  if (features.speakingRate != null) {
    const rateScore = 100 - Math.abs(features.speakingRate - 135) / 1.5;
    scores.push(Math.max(0, Math.min(100, rateScore)));
  }
  
  // Jitter: lower is better (more stable voice)
  if (features.jitter != null) {
    scores.push((1 - features.jitter) * 100);
  }
  
  // Shimmer: lower is better (more consistent amplitude)
  if (features.shimmer != null) {
    scores.push((1 - features.shimmer) * 100);
  }
  
  // Average all available scores
  return scores.length > 0 
    ? scores.reduce((a, b) => a + b) / scores.length 
    : 50;
}
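As a worked example, the same arithmetic can be run standalone (a sketch, not the class method; `audioScore` and its inline parameter type are hypothetical names):

```typescript
// Standalone sketch of the scoring arithmetic above, for illustration.
const clamp = (x: number, lo: number, hi: number) => Math.max(lo, Math.min(hi, x));

function audioScore(f: { meanPitch: number; speakingRate: number; jitter: number; shimmer: number }): number {
  const scores = [
    clamp(100 - Math.abs(f.meanPitch - 165) / 2, 0, 100),     // pitch distance from 165 Hz
    clamp(100 - Math.abs(f.speakingRate - 135) / 1.5, 0, 100), // rate distance from 135 wpm
    (1 - f.jitter) * 100,                                      // lower jitter scores higher
    (1 - f.shimmer) * 100,                                     // lower shimmer scores higher
  ];
  return scores.reduce((a, b) => a + b) / scores.length;
}

// e.g. a 165 Hz voice at 135 wpm with jitter 0.2 and shimmer 0.1:
// (100 + 100 + 80 + 90) / 4 = 92.5
```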

Research Basis

Audio features are based on research into vocal biomarkers of mental health:

  • Pitch variability: Reduced in depression (monotone speech)
  • Speaking rate: Slowed in depression, increased in anxiety
  • Pause patterns: Increased pauses associated with cognitive load
  • Jitter/Shimmer: Elevated in stress and anxiety
  • Voice energy: Reduced in depression

Limitations: Current extraction is simplified. Production systems may benefit from:

  • Praat integration for clinical-grade prosodic analysis
  • Machine learning models trained on clinical populations
  • Longitudinal normalisation against individual baselines

Last Updated: December 2025