#lang scribble/manual

@(require (for-label racket/base
                     ;racket/contract
                     racket/path
                     ffi/unsafe
                     let-assert
                     define-return
                     "../ffmpeg-definitions.rkt"
                     "../private/cstruct-helper.rkt"))

@title[#:tag "ffmpeg-definitions"]{FFmpeg Decoder Definitions}
@author[@author+email["Hans Dijkema" "hans@dijkewijk.nl"]]

@defmodule[ffmpeg-definitions]

This module provides the direct FFmpeg-backed decoder layer used by the audio
pipeline. It is deliberately small and stateful. A caller creates one decoder
instance, opens one file on it, queries the selected audio stream, repeatedly
asks for the next PCM block, and closes the instance again.

The module does not expose FFmpeg metadata. It only exposes the information
needed for playback: stream count, sample rate, channel count, duration,
bitrate, decoded PCM data, and sample positions. The output format is fixed:
interleaved signed 32-bit PCM, four bytes per sample, using FFmpeg's
@tt{AV_SAMPLE_FMT_S32} sample format.

The FFmpeg libraries are loaded when the module is required. The module checks
that the runtime FFmpeg major versions are in the supported range configured by
the implementation. This binding targets the FFmpeg library major versions
used by FFmpeg 6, 7, and 8: @tt{libavutil} 58 to 60, @tt{libavcodec} 60 to 62,
@tt{libavformat} 60 to 62, and @tt{libswresample} 4 to 6. Unsupported runtime
versions fail early, before a decoder instance is used.

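The supported ranges listed above can be written down as a small lookup table.
The following is a minimal sketch, not the module's actual check: the names
@tt{supported-majors} and @tt{supported-major?} are hypothetical, and only the
numeric ranges are taken from the text.

@codeblock|{
#lang racket/base

;; Hypothetical table; only the major-version ranges come from the text.
(define supported-majors
  (hash 'avutil     '(58 . 60)
        'avcodec    '(60 . 62)
        'avformat   '(60 . 62)
        'swresample '(4 . 6)))

;; Returns #t when `major` lies inside the documented range for `lib`,
;; and #f for an out-of-range version or an unknown library symbol.
(define (supported-major? lib major)
  (define range (hash-ref supported-majors lib #f))
  (and range
       (<= (car range) major (cdr range))))
}|

Since @racket[ffmpeg-version] returns the major version as the first list
element, a caller could pass @racket[(car (ffmpeg-version 'avcodec))] as the
@tt{major} argument.
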
On Windows, the private library loader may download the bundled sound-library
set into Racket's add-on directory before the FFI libraries are opened. On
Unix-like systems, the FFmpeg libraries are expected to be installed by the
operating system or platform package manager and to be reachable by Racket's
FFI library search path.

@section{Layering}

This module is the low-level Racket FFI layer. It is normally wrapped by
@filepath{ffmpeg-ffi.rkt} and then by @filepath{ffmpeg-decoder.rkt}@elem{.}
The first wrapper adapts this module to the command protocol used by the audio
decoder frontend. The second wrapper exposes the callback-oriented decoder
interface used by the rest of the playback pipeline.

The distinction matters for buffer lifetime. At this level,
@racket[fmpg-buffer] returns the current buffer owned by the decoder instance.
The adapter in @filepath{ffmpeg-ffi.rkt} copies that buffer before passing it to
@filepath{ffmpeg-decoder.rkt}@elem{.} Code that uses this module directly must
copy the buffer itself when the bytes must survive the next decoder operation.

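A direct caller could make that copy along the following lines. This is a
sketch under stated assumptions: @tt{copy-valid-bytes} is a hypothetical helper,
not part of this module, and it is not claimed to match what the adapter does
internally.

@codeblock|{
#lang racket/base

;; Hypothetical helper, not part of the module.
;; `buf` is the result of fmpg-buffer (a byte string or #f);
;; `size` is the result of fmpg-buffer-size.
(define (copy-valid-bytes buf size)
  (and buf
       ;; subbytes allocates a new byte string, so the copy stays valid
       ;; whatever the decoder does with its own buffer afterwards.
       (subbytes buf 0 (min size (bytes-length buf)))))
}|

A call would look like
@racket[(copy-valid-bytes (fmpg-buffer dec) (fmpg-buffer-size dec))] and should
happen before the next decode, seek, or close on the same instance.
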
@section{Implementation strategy}

This module talks directly to the FFmpeg shared libraries through Racket's FFI.
There is no C shim that hides FFmpeg's structs or normalizes their layout. The
price of that choice is that the Racket side must know enough of the relevant C
struct layouts to read the fields used by the decoder. The benefit is that the
binding remains a Racket module with direct access to the platform FFmpeg
libraries.

@subsection{Versioned C struct layouts}

The module defines only partial FFmpeg structs. A partial definition includes
the fields that are actually read by this decoder and enough preceding fields to
compute their offsets. Fields that are not needed are represented only by their
C type, or by a repetition count such as @racket[(6 _int)]@elem{.} Tail fields
after the last required member are not described.

The helper module @filepath{private/cstruct-helper.rkt} provides
@racket[make-offsets] and @racket[def-cstruct]@elem{.} The
@racket[make-offsets] form computes offsets for a sequence of C field types,
while @racket[def-cstruct] expands to a @racket[define-cstruct] form whose
public fields are placed at those explicit offsets. This keeps the actual
accessors small while still accounting for skipped fields in the C layout.

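The idea can be illustrated with the standard @racketmodname[ffi/unsafe]
forms alone. The sketch below does not use the private helper forms, whose
exact syntax is not shown here. It uses a made-up C layout rather than a real
FFmpeg struct, and the names @tt{field-types}, @tt{offsets}, and @tt{_demo} are
illustrative only.

@codeblock|{
#lang racket/base
(require ffi/unsafe)

;; A made-up C layout, NOT a real FFmpeg struct:
;;   struct demo { int a; int64_t b; void *c; int wanted; /* ... */ };
(define field-types (list _int _int64 _pointer _int))

;; compute-offsets returns one byte offset per field, following the
;; platform's alignment rules.
(define offsets (compute-offsets field-types))

;; Only the fourth field gets an accessor. The skipped fields contribute
;; nothing except their size and alignment to the computed offset.
(define-cstruct _demo
  ([wanted _int #:offset (list-ref offsets 3)]))
}|

With this definition, @tt{demo-wanted} reads the field at its computed offset
without any accessor being generated for the three preceding fields.
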
The right layout is selected when the module is required, after the runtime
FFmpeg major versions have been read from the libraries. For the supported
range, @racket[_AVCodecParameters] uses one layout for
@tt{libavcodec} major version 60 and another for major versions 61 and 62.
Likewise, @racket[_AVFrame] uses one layout for @tt{libavutil} major version
58 and another for major versions 59 and 60. The other partial structs used by
this module are defined with a single layout across the supported versions.

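The selection rule described above amounts to a dispatch on the major version.
The following sketch only mirrors that rule. The function names are
hypothetical, and the returned symbols stand in for the real partial struct
definitions.

@codeblock|{
#lang racket/base

;; Hypothetical selector for the _AVFrame layout, keyed on the
;; libavutil major version.
(define (avframe-layout avutil-major)
  (cond
    [(= avutil-major 58) 'avframe/58]
    [(<= 59 avutil-major 60) 'avframe/59-60]
    [else (error 'avframe-layout
                 "unsupported libavutil major version: ~a"
                 avutil-major)]))

;; Hypothetical selector for the _AVCodecParameters layout, keyed on
;; the libavcodec major version.
(define (avcodec-parameters-layout avcodec-major)
  (cond
    [(= avcodec-major 60) 'avcodec-parameters/60]
    [(<= 61 avcodec-major 62) 'avcodec-parameters/61-62]
    [else (error 'avcodec-parameters-layout
                 "unsupported libavcodec major version: ~a"
                 avcodec-major)]))
}|
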
This is why the version check is performed before normal decoder use. The
accessors are correct only for the FFmpeg major-version ranges for which the
partial layouts were written. If a future FFmpeg major release changes a
layout before one of the fields used here, the version range should be extended
only after the affected partial definitions have been checked.

@subsection{Sequential failure handling}

Most FFmpeg calls report ordinary failure through C-style return values or null
pointers. The implementation treats those results as normal control flow, not
as exceptional Racket failures. The @racket[let/assert] form is used for this
pattern. It behaves like a sequential binding form: each binding can be checked
immediately, and a failed check returns the specified failure value for the
whole form.

That style is used for setup paths such as opening a file, selecting stream
information, allocating the codec context, and initializing the resampler. It
keeps the success path linear while still giving each FFmpeg return value or
pointer a local check. Predicates such as @tt{a-!nullptr?}@elem{,}
@tt{a-nullptr?}@elem{,} @tt{a-true?}@elem{,} and @tt{a->=?} express the usual
FFmpeg checks directly next to the binding that produced the value.

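The control flow can be approximated in plain Racket with an escape
continuation. The sketch below deliberately does not use @racket[let/assert]
itself, because its exact syntax is not shown in this document. The procedure
@tt{open-sketch} and its two arguments are hypothetical stand-ins for a pair of
FFmpeg setup calls.

@codeblock|{
#lang racket/base

;; Plain-Racket sketch of "bind, check, bail out"; NOT the let/assert form.
;; `alloc-context` stands in for a call that returns a pointer or #f.
;; `open-input` stands in for a call that returns a C-style status code,
;; where a negative value means failure.
(define (open-sketch alloc-context open-input)
  (let/ec return
    (define ctx (alloc-context))
    (unless ctx (return 0))        ; null-pointer style check
    (define rc (open-input ctx))
    (unless (>= rc 0) (return 0))  ; return-code style check
    1))                            ; every step succeeded
}|

Each check sits directly after the binding it guards, and the first failure
decides the result of the whole sequence.
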
For loops where decoding must stop immediately from a nested position, the
module uses @racket[define/return] from @racketmodname[define-return]@elem{.}
This gives functions such as @racket[fmpg-decode-next!] and the internal
resampler drain routine an explicit early-return continuation without using
exceptions for normal FFmpeg outcomes. The two helpers are implementation
dependencies; they are not re-exported by this module.

@section{Decoder instances}

A decoder instance is an opaque value returned by @racket[fmpg-init]@elem{.}
Its structure type and predicate are not exported. Pass the value back to the
functions in this module and do not inspect it directly. The contracts below
therefore use @racket[any/c] for the instance argument. Operationally, that
argument must be a value returned by @racket[fmpg-init]@elem{.}

The instance owns native FFmpeg resources: a format context, a codec context,
an audio frame, a resampler, and the Racket byte string used for the current
PCM block. Finalizers are installed as a last line of defence, but callers
should still call @racket[fmpg-close!] explicitly when playback stops or when
the file is no longer needed. Explicit close keeps the lifetime of native
resources predictable.

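One way to make the explicit close hard to forget is a small wrapper built on
@racket[dynamic-wind]@elem{.} This is a sketch, not something the module
provides: @tt{call-with-decoder} is a hypothetical name, and the three
procedure arguments are assumed to behave like @racket[fmpg-init]@elem{,}
@racket[fmpg-open-file!]@elem{,} and @racket[fmpg-close!] as documented below.

@codeblock|{
#lang racket/base

;; Hypothetical wrapper. `init`, `open!`, and `close!` are expected to
;; behave like fmpg-init, fmpg-open-file!, and fmpg-close!.
;; Returns the result of `proc`, or #f when init or open fails.
(define (call-with-decoder init open! close! filename proc)
  (define dec (init))
  (and dec
       (dynamic-wind
        void
        (lambda ()
          (and (= (open! dec filename) 1)
               (proc dec)))
        ;; The post thunk runs on normal return and on escapes such as
        ;; exceptions, so the instance is closed either way.
        (lambda () (close! dec)))))
}|

The wrapper also closes the instance when opening fails, which relies on the
documented behaviour that closing an instance that is not open is harmless.
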
@defproc[(fmpg-init) any/c]{
Creates a new decoder instance. The result is an opaque instance value, or
@racket[#f] if the instance could not be created.

Creating the instance does not open a file. Use @racket[fmpg-open-file!]
before querying stream information or decoding audio.
}

@defproc[(fmpg-open-file! [instance any/c]
                          [filename (or/c path? string?)])
         (integer-in 0 1)]{
Opens @racket[filename] on @racket[instance]@elem{,} reads the stream
information, selects the best audio stream, initializes the codec context, and
initializes the resampler.

The function returns @racket[1] on success and @racket[0] on failure. On
failure, partially initialized native state is closed again.

An instance can only have one file open. Close it with @racket[fmpg-close!]
before opening another file on the same instance. A non-string, non-path
filename is treated as an open failure and returns @racket[0]@elem{.}
}

@defproc[(fmpg-close! [instance any/c]) void?]{
Closes @racket[instance] if it is open and releases the native FFmpeg resources
owned by the instance. The stored audio information is reset. Calling this
function with @racket[#f] or with an already closed instance is harmless.
}

@defproc[(fmpg-is-open [instance any/c]) (integer-in 0 1)]{
Returns @racket[1] when @racket[instance] is ready for decoding and
@racket[0] otherwise. An instance is ready only after a file has been opened,
a usable audio stream has been selected, and the decoder and resampler have
been initialized.
}

@section{Audio stream information}

The decoder selects one audio stream for playback using FFmpeg's best-stream
selection. The stream count reports how many audio streams were found in the
container, but decoding is performed only for the selected stream.

The term @italic{sample} in this module means a sample frame: one time step in
the audio stream, across all channels. For stereo 32-bit output, one sample
frame therefore occupies @racket[(* 2 4)] bytes in the returned PCM buffer.

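The same arithmetic generalizes to any channel count. The helpers below are a
small sketch with hypothetical names. They assume only the fixed four bytes per
sample stated above and a positive channel count.

@codeblock|{
#lang racket/base

;; Four bytes per sample, as fixed by the module's S32 output format.
(define bytes-per-sample 4)

;; Hypothetical helper: byte size of `frames` sample frames.
(define (frames->bytes frames channels)
  (* frames channels bytes-per-sample))

;; Hypothetical helper: whole sample frames contained in `size` bytes.
;; `channels` must be positive, otherwise the division fails.
(define (bytes->frames size channels)
  (quotient size (* channels bytes-per-sample)))
}|

For stereo output, @racket[(frames->bytes 1 2)] gives the eight bytes mentioned
above.
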
@defproc[(fmpg-audio-stream-count [instance any/c])
         exact-nonnegative-integer?]{
Returns the number of audio streams in the open container. If the instance is
not open, the result is @racket[0]@elem{.}
}

@defproc[(fmpg-audio-sample-rate [instance any/c])
         exact-nonnegative-integer?]{
Returns the selected audio stream's sample rate. If the instance is not ready,
the result is @racket[0]@elem{.}
}

@defproc[(fmpg-audio-channels [instance any/c])
         exact-nonnegative-integer?]{
Returns the selected audio stream's channel count. If the instance is not
ready, the result is @racket[0]@elem{.}
}

@defproc[(fmpg-audio-bits-per-sample [instance any/c])
         exact-positive-integer?]{
Returns the fixed output sample width in bits. The current output format is
32-bit signed PCM, so this function returns @racket[32]@elem{.} The value is
independent of the input file's original sample format and does not depend on
the instance state.
}

@defproc[(fmpg-audio-bytes-per-sample [instance any/c])
         exact-positive-integer?]{
Returns the fixed output sample width in bytes. The current output format is
32-bit signed PCM, so this function returns @racket[4]@elem{.} The value is
independent of the input file's original sample format and does not depend on
the instance state.
}

@defproc[(fmpg-duration-ms [instance any/c]) exact-integer?]{
Returns the duration of the selected audio stream in milliseconds. If the
stream duration is not available, the container duration is used as a fallback.
If no duration can be determined, or when the instance is not ready, the result
is @racket[-1]@elem{.}
}

@defproc[(fmpg-duration-samples [instance any/c]) exact-integer?]{
Returns the duration of the selected audio stream in sample frames. If the
stream duration is not available, the container duration is used as a fallback.
If no duration can be determined, or when the instance is not ready, the result
is @racket[-1]@elem{.}
}

@defproc[(fmpg-file-bitrate [instance any/c]) exact-integer?]{
Returns the container bitrate in bits per second. If the bitrate is
unavailable or if the instance is not open, the result is @racket[-1]@elem{.}
}

@section{Decoding}

Decoding is block oriented. Each call to @racket[fmpg-decode-next!] clears the
previous PCM block and attempts to produce the next decoded block for the
selected audio stream. When the call returns @racket[1]@elem{,} the block can
be read with @racket[fmpg-buffer] and described with the buffer query
functions.

@defproc[(fmpg-decode-next! [instance any/c]) (integer-in 0 1)]{
Decodes until a block of PCM output is available or no more output can be
produced. The function returns @racket[1] when @racket[fmpg-buffer] contains a
non-empty PCM block. It returns @racket[0] when the instance is not ready, when
end of stream has been reached, or when FFmpeg reports an unrecoverable decode
error.

The function does not distinguish end of stream from a decode failure. The
intended playback loop treats @racket[0] as no further PCM block available for
this decoder instance.

Internally, decoding receives all currently available frames, reads packets for
the selected audio stream, sends those packets to the codec, converts decoded
frames through @tt{libswresample}@elem{,} and drains the resampler at end of
stream. Non-selected packets are skipped.
}

@defproc[(fmpg-seek-ms! [instance any/c]
                        [target-pos-ms exact-nonnegative-integer?])
         (integer-in 0 1)]{
Seeks the selected audio stream to @racket[target-pos-ms] milliseconds and
resets the decoder and resampler state. The function returns @racket[1] on
success and @racket[0] on failure.

Seeking uses FFmpeg's backward seek flag. After the seek, decoded audio before
the requested target sample is discarded so the next buffer starts at, or as
close as FFmpeg can provide to, the requested position.
}

@section{Decoded buffers}

The PCM buffer belongs to the decoder instance. It is replaced by the next
call to @racket[fmpg-decode-next!]@elem{,} @racket[fmpg-seek-ms!]@elem{,} or
@racket[fmpg-close!]@elem{.} Treat the returned byte string as read-only.
Copy it if it must outlive the next decoder operation or if another component
may mutate it.

@defproc[(fmpg-buffer [instance any/c]) (or/c bytes? #f)]{
Returns the current decoded PCM block as a byte string, or @racket[#f] when no
PCM block is available.

The byte string contains interleaved signed 32-bit samples. Its logical frame
count is available as the difference between @racket[fmpg-buffer-end-sample]
and @racket[fmpg-buffer-start-sample]@elem{.} Its byte size is also available
through @racket[fmpg-buffer-size]@elem{.}
}

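Reading an individual sample out of such an interleaved block could look like
the sketch below. The name @tt{pcm-sample-ref} is hypothetical. The code assumes
that the samples are stored in the machine's native byte order, which is how
FFmpeg defines @tt{AV_SAMPLE_FMT_S32}@elem{,} and that the requested frame lies
within the valid bytes of the block.

@codeblock|{
#lang racket/base

;; Hypothetical helper: returns the sample of `channel` in sample frame
;; `frame` of an interleaved signed 32-bit block.
;; Assumes native byte order and in-range indices.
(define (pcm-sample-ref pcm channels frame channel)
  ;; Frames are stored one after another; inside a frame the channels
  ;; follow each other, four bytes each.
  (define start (* 4 (+ (* frame channels) channel)))
  (integer-bytes->integer pcm #t (system-big-endian?) start (+ start 4)))
}|

The @tt{frame} index here is relative to the start of the block, not to the
stream position reported by @racket[fmpg-buffer-start-sample]@elem{.}
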
@defproc[(fmpg-buffer-size [instance any/c]) exact-nonnegative-integer?]{
Returns the number of valid bytes in the current PCM buffer. If no decoder
state is available, or if the size would not fit in the internal integer range,
the function returns @racket[0]@elem{.}
}

@defproc[(fmpg-buffer-start-sample [instance any/c])
         exact-nonnegative-integer?]{
Returns the first sample frame represented by the current PCM buffer. If no
decoder state is available, the result is @racket[0]@elem{.}
}

@defproc[(fmpg-buffer-end-sample [instance any/c])
         exact-nonnegative-integer?]{
Returns the half-open end position of the current PCM buffer: the first sample
frame after the current buffer. The number of sample frames in the buffer is
the end position minus @racket[fmpg-buffer-start-sample]@elem{.} If no decoder
state is available, the result is @racket[0]@elem{.}
}

@defproc[(fmpg-sample-position [instance any/c])
         exact-nonnegative-integer?]{
Returns the decoder's next sample-frame position after the current output.
During normal decoding it is the same as @racket[fmpg-buffer-end-sample] for
the current buffer. After a seek, it is reset to the target position before
new audio is decoded.
}

@section{FFmpeg version information}

@defproc[(ffmpeg-version [lib (or/c 'avutil 'avcodec 'avformat
                                    'swr 'swresample)])
         (list/c exact-nonnegative-integer?
                 exact-nonnegative-integer?
                 exact-nonnegative-integer?)]{
Returns the runtime version of one FFmpeg library as a three-element list
containing the major, minor, and micro version numbers. The symbols
@racket['swr] and @racket['swresample] both refer to @tt{libswresample}@elem{.}

The function raises an exception for an unknown library symbol.
}

@section{Use through the decoder frontend}

The direct API above is normally wrapped by @filepath{ffmpeg-ffi.rkt} and by
@filepath{ffmpeg-decoder.rkt}@elem{.} The frontend function
@tt{ffmpeg-open} returns a handle, or @racket[#f] when the file does not exist.
Its stream-info callback receives a mutable hash with at least these playback
keys:

@racketblock[
(list 'sample-rate
      'channels
      'bits-per-sample
      'bytes-per-sample
      'total-samples
      'duration)]

The audio callback receives the same hash extended for the current buffer with
these keys:

@racketblock[
(list 'sample
      'current-time)]

The hash is followed by a copied byte string and its valid byte count. The
copy is made by @filepath{ffmpeg-ffi.rkt}@elem{,} not by the low-level buffer
function itself.

The frontend's seek function accepts a percentage of the stream and translates
that percentage to a sample position. The adapter then translates the sample
position to milliseconds and calls @racket[fmpg-seek-ms!]@elem{.} This is why
the low-level module exposes millisecond seeking while the frontend exposes
percentage seeking.

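The two translation steps can be sketched as plain integer arithmetic. The
function names below are hypothetical, the percentage is assumed to be an exact
integer between 0 and 100, and the real frontend and adapter may round
differently.

@codeblock|{
#lang racket/base

;; Hypothetical frontend step: percentage of the stream to a sample frame.
(define (percent->sample percent total-samples)
  (quotient (* percent total-samples) 100))

;; Hypothetical adapter step: sample frame to milliseconds.
;; `sample-rate` must be positive.
(define (sample->ms sample sample-rate)
  (quotient (* sample 1000) sample-rate))
}|

For a ten-second stream at 44100 Hz, 50 percent maps to sample frame 220500,
which in turn maps to 5000 milliseconds.
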
@section{Example}

The following example opens a file, decodes all PCM blocks, and reports their
byte sizes and sample ranges. A real playback loop would pass each buffer to
the audio output layer before requesting the next block.

@racketblock[
(define dec (fmpg-init))

(when (and dec (= (fmpg-open-file! dec "track.ogg") 1))
  (printf "~a Hz, ~a channels, ~a ms\n"
          (fmpg-audio-sample-rate dec)
          (fmpg-audio-channels dec)
          (fmpg-duration-ms dec))

  (let loop ()
    (when (= (fmpg-decode-next! dec) 1)
      (define pcm (fmpg-buffer dec))
      (define size (fmpg-buffer-size dec))
      (define start (fmpg-buffer-start-sample dec))
      (define end (fmpg-buffer-end-sample dec))
      (printf "decoded ~a bytes, samples [~a, ~a)\n"
              size start end)
      ;; Pass pcm to the audio output layer here, or copy it if needed.
      (loop)))

  (fmpg-close! dec))
]

After a successful seek, decoding continues in the same way. The following code
moves to 30 seconds and then requests the next decoded buffer.

@racketblock[
(when (= (fmpg-seek-ms! dec 30000) 1)
  (when (= (fmpg-decode-next! dec) 1)
    (define pcm (fmpg-buffer dec))
    (define start (fmpg-buffer-start-sample dec))
    (printf "first buffer after seek starts at sample ~a\n" start)))
]