Switch Shazam from listening through the microphone to being fed through a buffer #6

Open
opened 2026-04-10 15:02:54 -07:00 by dallasgroot · 0 comments
Owner

AI Instructions: ShazamKit HLS Buffer Integration

Use the following instructions and technical reference to implement a microphone-free ShazamKit solution for HLS streams.


1. System Prompt for Your AI Assistant

Role: Senior iOS Audio Engineer
Objective: Refactor the app's ShazamKit integration to ingest audio directly from an AVPlayer HLS stream buffer instead of the microphone. Make this work with non-HLS stations too.

Task Requirements:

  1. Implement an MTAudioProcessingTap to intercept the audio from an AVPlayerItem.
  2. In the tap's process callback, retrieve the raw AudioBufferList.
  3. Convert the decoded audio delivered by the tap (the stream itself is likely AAC at a high sample rate) into a mono AVAudioPCMBuffer at a rate ShazamKit accepts (16 kHz or 44.1 kHz).
  4. Feed the resulting buffer into SHSession.matchStreamingBuffer(_:at:).
  5. Thread Safety: Ensure the matching logic is dispatched to a background serial queue to prevent blocking the real-time audio render thread.
  6. Privacy: Remove all code related to AVAudioInputNode and the NSMicrophoneUsageDescription requirement.

2. Technical Architecture

The best approach for AVPlayer is the MTAudioProcessingTap. It acts as a digital "splitter" in the audio pipeline.

| Component | Responsibility |
| :--- | :--- |
| AVPlayerItem | Holds the HLS asset and the AVAudioMix. |
| MTAudioProcessingTap | A low-level C callback that provides access to the raw PCM samples. |
| AVAudioConverter | Resamples the stream audio (e.g., 48 kHz stereo) to a Shazam-friendly format (16 kHz mono). |
| SHSession | Analyzes the signature generated from the converted buffer. |

3. Implementation Examples

A. The Processing Tap Callback

This function must be usable as a C function pointer, so declare it as a global function (or a static method or non-capturing closure). It cannot capture self.

import AVFoundation
import MediaToolbox

func tapProcessCallback(
    tap: MTAudioProcessingTap,
    numberFrames: CMItemCount,
    flags: MTAudioProcessingTapFlags,
    bufferListInOut: UnsafeMutablePointer<AudioBufferList>,
    numberFramesOut: UnsafeMutablePointer<CMItemCount>,
    flagsOut: UnsafeMutablePointer<MTAudioProcessingTapFlags>
) {
    // 1. Pull the source audio into bufferListInOut so playback continues unchanged
    let status = MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut, flagsOut, nil, numberFramesOut)
    guard status == noErr else { return }

    // 2. Recover the ShazamManager instance stored in the tap's storage
    let manager = Unmanaged<ShazamManager>.fromOpaque(MTAudioProcessingTapGetStorage(tap)).takeUnretainedValue()

    // 3. Forward the frames actually delivered, which may be fewer than requested
    manager.analyze(bufferList: bufferListInOut, frameCount: numberFramesOut.pointee)
}

B. Installing the Tap on HLS
Attach the tap via an AVMutableAudioMix once the player item is ready.

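func attachTap(to playerItem: AVPlayerItem) {
    // For HLS, playerItem.asset.tracks is typically empty, so read the item's own tracks,
    // which are populated once status == .readyToPlay. Audio taps on HLS items have not
    // been reliable on every OS version; test on your deployment targets.
    guard let audioTrack = playerItem.tracks
        .compactMap({ $0.assetTrack })
        .first(where: { $0.mediaType == .audio }) else { return }

    var callbacks = MTAudioProcessingTapCallbacks(
        version: kMTAudioProcessingTapCallbacksVersion_0,
        // Unretained: this object must outlive the tap, or the callbacks will touch freed memory.
        clientInfo: UnsafeMutableRawPointer(Unmanaged.passUnretained(self).toOpaque()),
        init: { (tap, clientInfo, tapStorageOut) in
            tapStorageOut.pointee = clientInfo
        },
        finalize: nil,
        prepare: { (tap, _, processingFormat) in
            // Remember the PCM format the tap delivers; analyze(...) converts from it.
            // sourceFormat is assumed to be an AVAudioFormat? property on ShazamManager.
            let manager = Unmanaged<ShazamManager>.fromOpaque(MTAudioProcessingTapGetStorage(tap)).takeUnretainedValue()
            manager.sourceFormat = AVAudioFormat(streamDescription: processingFormat)
        },
        unprepare: nil,
        process: tapProcessCallback
    )

    var tap: Unmanaged<MTAudioProcessingTap>?
    let status = MTAudioProcessingTapCreate(kCFAllocatorDefault, &callbacks, kMTAudioProcessingTapCreationFlag_PostEffects, &tap)
    guard status == noErr, let tap = tap else { return }

    let inputParameters = AVMutableAudioMixInputParameters(track: audioTrack)
    inputParameters.audioTapProcessor = tap.takeRetainedValue()

    let audioMix = AVMutableAudioMix()
    audioMix.inputParameters = [inputParameters]

    playerItem.audioMix = audioMix
}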
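
Because the tap should only be attached once the player item is ready, the readiness check can be sketched with key-value observation. This is a minimal sketch, not the app's actual code: `observeReadiness(of:onReady:)` is a hypothetical helper name, and the caller is assumed to keep the returned observation alive.

```swift
import AVFoundation

/// Hypothetical helper: calls `onReady` when the item reports `.readyToPlay`.
/// The observation stops when the returned token is invalidated or deallocated.
func observeReadiness(of playerItem: AVPlayerItem,
                      onReady: @escaping (AVPlayerItem) -> Void) -> NSKeyValueObservation {
    // `.initial` covers the case where the item is already ready when we start observing.
    return playerItem.observe(\.status, options: [.initial, .new]) { item, _ in
        guard item.status == .readyToPlay else { return }
        onReady(item)
    }
}
```

A caller might store the token in a property and attach the tap from the closure, for example `statusObservation = observeReadiness(of: playerItem) { [weak self] item in self?.attachTap(to: item) }`, where `statusObservation` is an assumed property name.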

C. Conversion and Matching
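ShazamKit requires linear PCM. The tap already delivers decoded PCM, but in the stream's native sample rate and channel layout (often 48 kHz stereo), so convert it to the mono target format before matching.

func analyze(bufferList: UnsafeMutablePointer<AudioBufferList>, frameCount: CMItemCount) {
    // The tap's buffers are only valid during the callback, so copy the samples before going async.
    // sourceFormat is assumed to be an AVAudioFormat? property holding the tap's processing format.
    // Allocating here is a simplification; a preallocated ring buffer avoids allocation on the render thread.
    guard frameCount > 0, let sourceFormat = self.sourceFormat,
          let input = AVAudioPCMBuffer(pcmFormat: sourceFormat, frameCapacity: AVAudioFrameCount(frameCount)) else { return }
    input.frameLength = AVAudioFrameCount(frameCount)
    let destination = UnsafeMutableAudioBufferListPointer(input.mutableAudioBufferList)
    for (src, dst) in zip(UnsafeMutableAudioBufferListPointer(bufferList), destination) {
        memcpy(dst.mData, src.mData, Int(min(src.mDataByteSize, dst.mDataByteSize)))
    }

    analysisQueue.async {
        // Target: 44.1 kHz mono PCM. Production code should create the converter once and reuse it.
        guard let targetFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1),
              let converter = AVAudioConverter(from: sourceFormat, to: targetFormat) else { return }
        let capacity = AVAudioFrameCount(Double(input.frameLength) * targetFormat.sampleRate / sourceFormat.sampleRate) + 1
        guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else { return }

        // Hand the copied buffer to the converter exactly once
        var delivered = false
        let status = converter.convert(to: pcmBuffer, error: nil) { _, inputStatus in
            if delivered { inputStatus.pointee = .noDataNow; return nil }
            delivered = true
            inputStatus.pointee = .haveData
            return input
        }
        guard status != .error, pcmBuffer.frameLength > 0 else { return }

        // Perform match
        self.shazamSession.matchStreamingBuffer(pcmBuffer, at: nil)
    }
}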
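
Results from matchStreamingBuffer(_:at:) arrive through the session's delegate rather than as a return value. A minimal sketch of that side, assuming a hypothetical `MatchReporter` class and an illustrative `onTitle` callback (neither is part of ShazamKit or of this app):

```swift
import Foundation
import ShazamKit

/// Hypothetical delegate object; `onTitle` is an illustrative hook for the UI layer.
final class MatchReporter: NSObject, SHSessionDelegate {
    var onTitle: ((String) -> Void)?

    // Called when the streamed audio matches a catalog item.
    func session(_ session: SHSession, didFind match: SHMatch) {
        guard let item = match.mediaItems.first else { return }
        let title = item.title ?? "Unknown title"
        let artist = item.artist ?? "Unknown artist"
        // Delegate callbacks are not guaranteed to arrive on the main thread.
        DispatchQueue.main.async { self.onTitle?("\(artist) - \(title)") }
    }

    // Called when a signature produced no match; `error` may be nil in that case.
    func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
        if let error = error {
            print("Shazam match failed: \(error.localizedDescription)")
        }
    }
}
```

The session holds its delegate weakly, so the owner would keep a strong reference to the reporter and assign it with `shazamSession.delegate = reporter` before feeding buffers.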

4. Key Requirements & Constraints
• Real-Time Thread Safety: The process callback runs on a high-priority audio thread. Never perform UI updates, networking, or blocking file I/O inside it.
• Sample Rate: ShazamKit accepts PCM at 16 kHz, 32 kHz, 44.1 kHz, and 48 kHz. A lower rate such as 16 kHz means less data to convert and match.
• Mono vs Stereo: Downmix to mono before sending audio to ShazamKit to reduce CPU overhead.
• Ready State: Wait for AVPlayerItem.status == .readyToPlay before setting up the tap; before that, the item's audio tracks are not available.
• Buffer Size: Expect to feed ShazamKit roughly 5-10 seconds of audio before a reliable match, though it can match on less if the signal is clear.
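
The mono and buffer-size points above can be illustrated with plain arrays. This is only a sketch of the arithmetic, assuming non-interleaved Float samples; `downmixToMono` and `seconds(forFrames:sampleRate:)` are hypothetical helper names, and in the real pipeline AVAudioConverter can perform the downmix.

```swift
/// Averages per-channel sample arrays into one mono array.
/// Assumes non-interleaved channels; truncates to the shortest channel.
func downmixToMono(_ channels: [[Float]]) -> [Float] {
    guard let frameCount = channels.map(\.count).min(), frameCount > 0 else { return [] }
    let scale = 1 / Float(channels.count)
    var mono = [Float](repeating: 0, count: frameCount)
    for channel in channels {
        for frame in 0..<frameCount {
            mono[frame] += channel[frame] * scale
        }
    }
    return mono
}

/// Seconds of audio represented by `frames` at `sampleRate`.
func seconds(forFrames frames: Int, sampleRate: Double) -> Double {
    return Double(frames) / sampleRate
}
```

For example, 80,000 mono frames at 16 kHz correspond to 5 seconds of audio, the lower end of the range suggested above.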
dallasgroot started working 2026-04-10 15:03:24 -07:00
dallasgroot stopped working 2026-04-10 15:03:29 -07:00
5 seconds
Reference: dallasgroot/NavidromeApp#6