#3817 Librosa Feature Extraction

#3817 Provide a python application using librosa to read an MP3 to produce arrays of 8 distinct musical properties

Using the librosa library in Python, you can load an MP3 file and extract various musical properties (features). librosa decodes audio through soundfile and falls back to audioread when soundfile cannot read the format, so MP3 support needs either a libsndfile build that includes MP3 decoding (version 1.1.0 or later) or, for the audioread fallback, FFmpeg or GStreamer installed on your system.

Here is a Python application that reads an MP3 file and extracts 8 distinct musical properties as NumPy arrays:

🎶 Python Librosa Feature Extraction

import librosa
import librosa.feature
import numpy as np

def extract_musical_features(file_path):
    """
    Reads an MP3 file and extracts 8 distinct musical properties using librosa.

    Args:
        file_path (str): The path to the MP3 file.

    Returns:
        dict: A dictionary where keys are the feature names and values are 
              the resulting NumPy arrays or scalar values.
    """
    try:
        # 1. Load the audio file (y: audio time series, sr: sampling rate)
        # librosa.load reads MP3 via soundfile, falling back to audioread (which requires ffmpeg/gstreamer)
        y, sr = librosa.load(file_path, sr=22050, mono=True)
        
        # Initialize dictionary to hold features
        features = {}

        # --- Extract 8 Distinct Features ---
        
        # 2. Mel-Frequency Cepstral Coefficients (MFCCs) - Represents spectral shape
        # Typically used for recognition/classification. We'll take the array.
        features['mfcc'] = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        
        # 3. Zero-Crossing Rate (ZCR) - Measures the rate of sign changes in the signal
        # Good for characterizing percussive vs. harmonic content.
        # Take row 0 so the result is 1-D, like the other single-valued frame-wise features.
        features['zero_crossing_rate'] = librosa.feature.zero_crossing_rate(y)[0]

        # 4. Spectral Centroid - Indicates the 'center of mass' of the spectrum
        # Correlates with the perceived 'brightness' of the sound.
        features['spectral_centroid'] = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

        # 5. Spectral Bandwidth - Measures the spread of the spectrum around its centroid.
        features['spectral_bandwidth'] = librosa.feature.spectral_bandwidth(y=y, sr=sr)[0]

        # 6. Spectral Roll-off - The frequency below which a specified percentage of the 
        # total spectral energy lies (e.g., 85%).
        features['spectral_rolloff'] = librosa.feature.spectral_rolloff(y=y, sr=sr)[0]
        
        # 7. Chromagram (Chroma Short-Time Fourier Transform) - Measures the 
        # intensity of the 12 pitch classes (C, C#, D, etc.). Useful for harmony/tonality.
        features['chroma_stft'] = librosa.feature.chroma_stft(y=y, sr=sr)
        
        # 8. Root Mean Square (RMS) Energy - Measures the magnitude of the signal, 
        # often related to loudness.
        features['rms_energy'] = librosa.feature.rms(y=y)[0]
        
        # 9. Estimated Tempo (BPM) - A single global estimate, not a frame-wise array like the others,
        # but still a distinct musical property. Recent librosa versions return it as a
        # one-element array, so convert it to a plain float.
        tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
        features['tempo'] = float(np.atleast_1d(tempo)[0])
        
        return features

    except Exception as e:
        print(f"An error occurred: {e}")
        print("Make sure you have all dependencies installed (e.g., 'pip install librosa' and either FFmpeg or GStreamer for MP3 support).")
        return None

# --- Example Usage ---
# NOTE: Replace 'your_audio_file.mp3' with the actual path to your MP3 file.
mp3_file = 'your_audio_file.mp3' 
extracted_data = extract_musical_features(mp3_file)

if extracted_data:
    print("\n✅ Extracted Musical Features:")
    for feature_name, array in extracted_data.items():
        if isinstance(array, np.ndarray):
            print(f"- **{feature_name}**: NumPy Array, Shape: {array.shape}, Data Type: {array.dtype}")
        else:
            print(f"- **{feature_name}**: Scalar Value, Value: {array}")

The video "Audio processing in Python with Feature Extraction for machine learning" provides a tutorial on using librosa for audio processing and feature extraction, including some of the features mentioned above.

Would you like me to elaborate on what any of these specific musical properties represent in more detail?

In addition to the 8 features, what other data can be extracted from an mp3?

That's a great question! Besides the 8 low-level acoustic features already extracted, you can derive several other types of valuable data from an MP3 using librosa and related Python packages. These additional data points often represent high-level musical structure or time-frequency analysis.

Here are some other key types of data you can extract from an MP3:

📊 Core Signal Data

This is the raw or primary transformed signal data, which all other features are typically derived from.

Waveform ( $y$ and $sr$ ):
- The fundamental data extracted by librosa.load().
- Data: A 1-dimensional NumPy array ( $y$ ) of amplitude values over time, and the sampling rate ( $sr$ ) in Hertz.
- Use: This is the time-domain representation of the audio.
Short-Time Fourier Transform (STFT):
- Data: A complex-valued 2D array representing the spectrogram (time vs. frequency). The magnitude of this array, $\left|S\right|$ , is the standard spectrogram.
- Use: Shows how the frequency content of the signal changes over time. It's the basis for most spectral features.
Mel Spectrogram:
- Data: A 2D array similar to the STFT spectrogram but with frequencies warped onto the Mel scale, which is perceptually uniform.
- Use: Closer to how the human ear perceives pitch, it's often more effective than a standard spectrogram for machine learning tasks.

🎵 Structural and Temporal Data

These features analyze the music's structure over time, giving insight into rhythm and tonality.

Feature	Librosa Function	Description
Beat Frames/Times	`librosa.beat.beat_track()`	The indices (frames) or timestamps (seconds) where musical beats occur. Essential for synchronization.
Onset Times	`librosa.onset.onset_detect()`	The precise times when notes or percussive events begin. Crucial for rhythm analysis.
Harmonic/Percussive Separation	`librosa.effects.hpss()`	Separates the audio into two new time-series arrays: one containing the harmonic (tonal) components and one containing the percussive (rhythmic) components.
Tonal Centroid Features (Tonnetz)	`librosa.feature.tonnetz()`	A 6-dimensional representation of musical pitch that is invariant to octaves. Used for advanced harmonic analysis.

⚙️ Feature Manipulation Data

These functions don't extract new information but provide derivatives of existing arrays, which are very useful for analysis.

Delta Features:
- Data: A new array representing the local estimate of the derivative (change) of an existing feature (like MFCCs).
- Use: Captures the velocity or rate of change of a feature over time, which can be critical in music and speech recognition. You can also calculate the second derivative (delta-delta).
Feature Statistics:
- Data: Single scalar values (mean, standard deviation, median, min, max, etc.) calculated across the entire array for any time-series feature (e.g., the average Spectral Centroid for the whole song).
- Use: Reduces large time-series arrays into a small, fixed-size feature vector, often required as input for machine learning models.

Would you like to see a code snippet demonstrating how to perform Harmonic/Percussive Separation or extract Beat Times?

List the full set of librosa.features

The librosa.feature module contains a comprehensive suite of functions for extracting low-level musical and acoustic features from audio signals. These features are generally categorized as Spectral, Rhythm, Manipulation, and Inversion.

Here is the full set of functions available within librosa.feature (based on librosa version 0.11.0 documentation):

✨ Complete `librosa.feature` Function Set

Spectral Features

These functions analyze the frequency content of the audio signal.

Feature Function	Description
`librosa.feature.chroma_stft`	Compute a standard Chromagram (pitch classes C, C#, D... over time).
`librosa.feature.chroma_cqt`	Compute a Chromagram using the Constant-Q Transform (CQT) for better resolution in lower frequencies.
`librosa.feature.chroma_cens`	Compute the Chroma Energy Normalized Statistics (CENS), which smooths local deviations and is robust to dynamics and timbre; commonly used for audio matching and retrieval.
`librosa.feature.chroma_vqt`	Compute a Chromagram using the Variable-Q Transform (VQT).
`librosa.feature.melspectrogram`	Compute a Mel-scaled spectrogram.
`librosa.feature.mfcc`	Compute Mel-Frequency Cepstral Coefficients (MFCCs).
`librosa.feature.rms`	Compute the Root Mean Square (RMS) energy for each frame (often used for loudness).
`librosa.feature.spectral_centroid`	Compute the spectral centroid (perceived brightness).
`librosa.feature.spectral_bandwidth`	Compute the spectral bandwidth (spread of energy around the centroid).
`librosa.feature.spectral_contrast`	Compute spectral contrast (difference between peaks and valleys in the spectrum).
`librosa.feature.spectral_flatness`	Compute spectral flatness (measure of how "noise-like" the spectrum is).
`librosa.feature.spectral_rolloff`	Compute the spectral roll-off frequency (frequency below which a certain percentage of energy lies).
`librosa.feature.poly_features`	Get coefficients of fitting an nth-order polynomial to the columns of a spectrogram.
`librosa.feature.tonnetz`	Compute the Tonal Centroid Features (Tonnetz), a 6-dimensional representation of harmony.
`librosa.feature.zero_crossing_rate`	Compute the Zero-Crossing Rate (ZCR) (rate of sign changes in the signal).

Rhythm Features

These functions are used for analyzing the rhythmic properties of the audio.

Feature Function	Description
`librosa.feature.tempo`	Estimate the tempo (Beats Per Minute, BPM).
`librosa.feature.tempogram`	Compute the tempogram: local autocorrelation of the onset strength envelope.
`librosa.feature.fourier_tempogram`	Compute the Fourier tempogram: the short-time Fourier transform of the onset strength envelope.
`librosa.feature.tempogram_ratio`	Compute the tempogram ratio features (also known as spectral rhythm patterns).

Feature Manipulation

These functions are utilities for processing features after they've been extracted.

Feature Function	Description
`librosa.feature.delta`	Compute delta features: the local estimate of the first, second, etc., derivative of the input data.
`librosa.feature.stack_memory`	Create a short-term history embedding by vertically concatenating delayed copies of the data matrix (useful for sequential modeling).

Feature Inversion

These functions are used to reverse the feature extraction process to approximate the original audio.

Feature Function	Description
`librosa.feature.inverse.mel_to_stft`	Approximate STFT magnitude from a Mel power spectrogram.
`librosa.feature.inverse.mel_to_audio`	Invert a Mel power spectrogram back to audio using the Griffin-Lim algorithm.
`librosa.feature.inverse.mfcc_to_mel`	Invert MFCCs to approximate a Mel power spectrogram.
`librosa.feature.inverse.mfcc_to_audio`	Convert MFCCs to a time-domain audio signal.

A video tutorial covering several of these feature extraction methods is "Audio processing in Python with Feature Extraction for machine learning".

From <https://gemini.google.com/app/> Google Gemini 2.5 Pro
