documentation

2026-05-11 20:03:28 +02:00
parent 7956d15072
commit 4966936381
4 changed files with 666 additions and 1 deletions
@@ -0,0 +1,398 @@
#lang scribble/manual
@(require (for-label racket/base
                     ;racket/contract
                     racket/path
                     ffi/unsafe
                     let-assert
                     define-return
                     "../ffmpeg-definitions.rkt"
                     "../private/cstruct-helper.rkt"))
@title[#:tag "ffmpeg-definitions"]{FFmpeg Decoder Definitions}
@author[@author+email["Hans Dijkema" "hans@dijkewijk.nl"]]
@defmodule[ffmpeg-definitions]
This module provides the direct FFmpeg-backed decoder layer used by the audio
pipeline. It is deliberately small and stateful. A caller creates one decoder
instance, opens one file on it, queries the selected audio stream, repeatedly
asks for the next PCM block, and closes the instance again.

The module does not expose FFmpeg metadata. It only exposes the information
needed for playback: stream count, sample rate, channel count, duration,
bitrate, decoded PCM data, and sample positions. The output format is fixed:
interleaved signed 32-bit PCM, four bytes per channel value, using FFmpeg's
@tt{AV_SAMPLE_FMT_S32} sample format.

The FFmpeg libraries are loaded when the module is required. The module checks
that the runtime FFmpeg major versions are in the supported range configured by
the implementation. This binding targets the FFmpeg library major versions
used by FFmpeg 6, 7, and 8: @tt{libavutil} 58 to 60, @tt{libavcodec} 60 to 62,
@tt{libavformat} 60 to 62, and @tt{libswresample} 4 to 6. Unsupported runtime
versions fail early, before a decoder instance is used.

On Windows, the private library loader may download the bundled sound-library
set into Racket's add-on directory before the FFI libraries are opened. On
Unix-like systems, the FFmpeg libraries are expected to be installed by the
operating system or platform package manager and to be reachable by Racket's
FFI library search path.
@section{Layering}
This module is the low-level Racket FFI layer. It is normally wrapped by
@filepath{ffmpeg-ffi.rkt} and then by @filepath{ffmpeg-decoder.rkt}@elem{.}
The first wrapper adapts this module to the command protocol used by the audio
decoder frontend. The second wrapper exposes the callback-oriented decoder
interface used by the rest of the playback pipeline.

The distinction matters for buffer lifetime. At this level,
@racket[fmpg-buffer] returns the current buffer owned by the decoder instance.
The adapter in @filepath{ffmpeg-ffi.rkt} copies that buffer before passing it to
@filepath{ffmpeg-decoder.rkt}@elem{.} Code that uses this module directly must
copy the buffer itself when the bytes must survive the next decoder operation.
@section{Implementation strategy}
This module talks directly to the FFmpeg shared libraries through Racket's FFI.
There is no C shim that hides FFmpeg's structs or normalizes their layout. The
price of that choice is that the Racket side must know enough of the relevant C
struct layouts to read the fields used by the decoder. The benefit is that the
binding remains a Racket module with direct access to the platform FFmpeg
libraries.
@subsection{Versioned C struct layouts}
The module defines only partial FFmpeg structs. A partial definition includes
the fields that are actually read by this decoder and enough preceding fields to
compute their offsets. Fields that are not needed are represented only by their
C type, or by a repetition count such as @racket[(6 _int)]@elem{.} Tail fields
after the last required member are not described.

The helper module @filepath{private/cstruct-helper.rkt} provides
@racket[make-offsets] and @racket[def-cstruct]@elem{.} The
@racket[make-offsets] form computes offsets for a sequence of C field types,
while @racket[def-cstruct] expands to a @racket[define-cstruct] form whose
public fields are placed at those explicit offsets. This keeps the actual
accessors small while still accounting for skipped fields in the C layout.

The right layout is selected when the module is required, after the runtime
FFmpeg major versions have been read from the libraries. For the supported
range, @racket[_AVCodecParameters] uses one layout for
@tt{libavcodec} major version 60 and another for major versions 61 and 62.
Likewise, @racket[_AVFrame] uses one layout for @tt{libavutil} major version
58 and another for major versions 59 and 60. The other partial structs used by
this module are defined with a single layout across the supported versions.

This is why the version check is performed before normal decoder use. The
accessors are correct only for the FFmpeg major-version ranges for which the
partial layouts were written. If a future FFmpeg major release changes a
layout before one of the fields used here, the version range should be extended
only after the affected partial definitions have been checked.
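
The explicit-offset idea can be sketched with plain @racketmodname[ffi/unsafe]
forms. The sketch does not use the syntax of @racket[make-offsets] or
@racket[def-cstruct]@elem{,} which is not shown in this document. The struct
@tt{_demo}@elem{,} its field names, and its C layout are hypothetical: an
@tt{int}@elem{,} six @tt{int} fields that are never read, and an @tt{int64_t}
that is read.

@racketblock[
(require ffi/unsafe racket/list)

(code:comment @#,elem{Hypothetical C layout: int tag; int skipped[6]; int64_t wanted;})
(define demo-offsets
  (compute-offsets (append (list _int) (make-list 6 _int) (list _int64))))

(code:comment @#,elem{Only the two fields that are read become accessors.})
(define-cstruct _demo ([tag _int #:offset (first demo-offsets)]
                       [wanted _int64 #:offset (last demo-offsets)]))
]

The last offset covers the seven preceding @tt{int} fields plus any alignment
padding the platform requires. A partial layout has to do this bookkeeping for
every skipped field.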
@subsection{Sequential failure handling}
Most FFmpeg calls report ordinary failure through C-style return values or null
pointers. The implementation treats those results as normal control flow, not
as exceptional Racket failures. The @racket[let/assert] form is used for this
pattern. It behaves like a sequential binding form: each binding can be checked
immediately, and a failed check returns the specified failure value for the
whole form.

That style is used for setup paths such as opening a file, selecting stream
information, allocating the codec context, and initializing the resampler. It
keeps the success path linear while still giving each FFmpeg return value or
pointer a local check. Predicates such as @tt{a-!nullptr?}@elem{,}
@tt{a-nullptr?}@elem{,} @tt{a-true?}@elem{,} and @tt{a->=?} express the usual
FFmpeg checks directly next to the binding that produced the value.

For loops where decoding must stop immediately from a nested position, the
module uses @racket[define/return] from @racketmodname[define-return]@elem{.}
This gives functions such as @racket[fmpg-decode-next!] and the internal
resampler drain routine an explicit early-return continuation without using
exceptions for normal FFmpeg outcomes. Both @racket[let/assert] and
@racket[define/return] are implementation dependencies; they are not
re-exported by this module.
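
The control-flow shape of both helpers can be approximated in plain Racket with
an escape continuation. The sketch below is an analogue only and does not show
the syntax of @racket[let/assert] or @racket[define/return]@elem{.} The names
@tt{open-steps}@elem{,} @tt{open-input}@elem{,} and @tt{find-audio-stream} are
hypothetical; the two procedure arguments stand in for FFmpeg calls that return
a context or @racket[#f]@elem{,} and a stream index that is negative on failure.

@racketblock[
(define (open-steps open-input find-audio-stream)
  (let/ec return
    (define ctx (open-input))
    (code:comment @#,elem{A missing context ends the whole sequence with the failure value.})
    (unless ctx (return 0))
    (define index (find-audio-stream ctx))
    (code:comment @#,elem{A negative index is an ordinary failure, not an exception.})
    (when (< index 0) (return 0))
    1))
]

With these stand-ins, @racket[(open-steps (lambda () #f) (lambda (ctx) 0))]
evaluates to @racket[0]@elem{,} while a non-false context and a non-negative
index evaluate to @racket[1]@elem{.}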
@section{Decoder instances}
A decoder instance is an opaque value returned by @racket[fmpg-init]@elem{.}
Its structure type and predicate are not exported. Pass the value back to the
functions in this module and do not inspect it directly. The contracts below
therefore use @racket[any/c] for the instance argument. Operationally, that
argument must be a value returned by @racket[fmpg-init]@elem{.}

The instance owns native FFmpeg resources: a format context, a codec context,
an audio frame, a resampler, and the Racket byte string used for the current
PCM block. Finalizers are installed as a last line of defence, but callers
should still call @racket[fmpg-close!] explicitly when playback stops or when
the file is no longer needed. Explicit close keeps the lifetime of native
resources predictable.
@defproc[(fmpg-init) any/c]{
Creates a new decoder instance. The result is an opaque instance value, or
@racket[#f] if the instance could not be created.

Creating the instance does not open a file. Use @racket[fmpg-open-file!]
before querying stream information or decoding audio.
}
@defproc[(fmpg-open-file! [instance any/c]
                          [filename (or/c path? string?)])
         (integer-in 0 1)]{
Opens @racket[filename] on @racket[instance]@elem{,} reads the stream
information, selects the best audio stream, initializes the codec context, and
initializes the resampler.

The function returns @racket[1] on success and @racket[0] on failure. On
failure, partially initialized native state is closed again.

An instance can have only one file open at a time. Close it with
@racket[fmpg-close!] before opening another file on the same instance. A
filename that is neither a string nor a path is treated as an open failure and
returns @racket[0]@elem{.}
}
@defproc[(fmpg-close! [instance any/c]) void?]{
Closes @racket[instance] if it is open and releases the native FFmpeg resources
owned by the instance. The stored audio information is reset. Calling this
function with @racket[#f] or with an already closed instance is harmless.
}
@defproc[(fmpg-is-open [instance any/c]) (integer-in 0 1)]{
Returns @racket[1] when @racket[instance] is ready for decoding and
@racket[0] otherwise. An instance is ready only after a file has been opened,
a usable audio stream has been selected, and the decoder and resampler have
been initialized.
}
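
Because an open instance holds native resources, it can help to tie the close
call to the dynamic extent of the work. The following sketch assumes that the
bindings of this module are in scope. The helper @tt{call-with-decoder} is
hypothetical and is not provided by the module.

@racketblock[
(define (call-with-decoder filename proc)
  (define dec (fmpg-init))
  (and dec
       (dynamic-wind
        void
        (lambda ()
          (code:comment @#,elem{Run proc only when the open succeeded and the instance is ready.})
          (and (= (fmpg-open-file! dec filename) 1)
               (= (fmpg-is-open dec) 1)
               (proc dec)))
        (code:comment @#,elem{Closing is documented as harmless for an instance that is not open.})
        (lambda () (fmpg-close! dec)))))
]

A call such as @racket[(call-with-decoder "track.ogg" fmpg-duration-ms)] would
then return the duration, or @racket[#f] when the instance could not be created
or the file could not be opened.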
@section{Audio stream information}
The decoder selects one audio stream for playback using FFmpeg's best-stream
selection. The stream count reports how many audio streams were found in the
container, but decoding is performed only for the selected stream.

The term @italic{sample} in this module means a sample frame: one time step in
the audio stream, across all channels. Sample positions, durations, and buffer
ranges are counted in sample frames. The sample width functions are the
exception: they report the width of one channel's value. For stereo 32-bit
output, one sample frame therefore occupies @racket[(* 2 4)] bytes in the
returned PCM buffer.
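
The arithmetic can be written down as two small helpers. Both names are
hypothetical. Their arguments correspond to the values reported by
@racket[fmpg-audio-channels]@elem{,} @racket[fmpg-audio-bytes-per-sample]@elem{,}
and @racket[fmpg-buffer-size]@elem{,} and the sketch assumes a ready instance,
so the channel count is positive.

@racketblock[
(define (frame-size channels bytes-per-sample)
  (* channels bytes-per-sample))

(code:comment @#,elem{Number of whole sample frames in a block of byte-count bytes.})
(define (block-frames byte-count channels bytes-per-sample)
  (quotient byte-count (frame-size channels bytes-per-sample)))
]

For stereo output, @racket[(frame-size 2 4)] is @racket[8]@elem{,} and a
4096-byte block holds @racket[(block-frames 4096 2 4)]@elem{,} that is 512,
sample frames.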
@defproc[(fmpg-audio-stream-count [instance any/c])
exact-nonnegative-integer?]{
Returns the number of audio streams in the open container. If the instance is
not open, the result is @racket[0]@elem{.}
}
@defproc[(fmpg-audio-sample-rate [instance any/c])
exact-nonnegative-integer?]{
Returns the selected audio stream's sample rate. If the instance is not ready,
the result is @racket[0]@elem{.}
}
@defproc[(fmpg-audio-channels [instance any/c])
exact-nonnegative-integer?]{
Returns the selected audio stream's channel count. If the instance is not
ready, the result is @racket[0]@elem{.}
}
@defproc[(fmpg-audio-bits-per-sample [instance any/c])
exact-positive-integer?]{
Returns the fixed output sample width in bits. The current output format is
32-bit signed PCM, so this function returns @racket[32]@elem{.} The value is
independent of the input file's original sample format and does not depend on
the instance state.
}
@defproc[(fmpg-audio-bytes-per-sample [instance any/c])
exact-positive-integer?]{
Returns the fixed output sample width in bytes. The current output format is
32-bit signed PCM, so this function returns @racket[4]@elem{.} The value is
independent of the input file's original sample format and does not depend on
the instance state.
}
@defproc[(fmpg-duration-ms [instance any/c]) exact-integer?]{
Returns the duration of the selected audio stream in milliseconds. If the
stream duration is not available, the container duration is used as a fallback.
If no duration can be determined, or when the instance is not ready, the result
is @racket[-1]@elem{.}
}
@defproc[(fmpg-duration-samples [instance any/c]) exact-integer?]{
Returns the duration of the selected audio stream in sample frames. If the
stream duration is not available, the container duration is used as a fallback.
If no duration can be determined, or when the instance is not ready, the result
is @racket[-1]@elem{.}
}
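
The two duration values are related through the sample rate. The helper below
is a hypothetical sketch of that relation. It keeps the @racket[-1] convention
for an unknown duration and truncates to whole milliseconds, which may differ
from the rounding used inside the module.

@racketblock[
(define (frames->ms frames sample-rate)
  (if (or (< frames 0) (<= sample-rate 0))
      -1
      (quotient (* frames 1000) sample-rate)))
]

For example, @racket[(frames->ms 1323000 44100)] is @racket[30000]@elem{,} and
@racket[(frames->ms -1 44100)] stays @racket[-1]@elem{.}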
@defproc[(fmpg-file-bitrate [instance any/c]) exact-integer?]{
Returns the container bitrate in bits per second. If the bitrate is
unavailable or if the instance is not open, the result is @racket[-1]@elem{.}
}
@section{Decoding}
Decoding is block oriented. Each call to @racket[fmpg-decode-next!] clears the
previous PCM block and attempts to produce the next decoded block for the
selected audio stream. When the call returns @racket[1]@elem{,} the block can
be read with @racket[fmpg-buffer] and described with the buffer query
functions.
@defproc[(fmpg-decode-next! [instance any/c]) (integer-in 0 1)]{
Decodes until a block of PCM output is available or no more output can be
produced. The function returns @racket[1] when @racket[fmpg-buffer] contains a
non-empty PCM block. It returns @racket[0] when the instance is not ready, when
end of stream has been reached, or when FFmpeg reports an unrecoverable decode
error.

The function does not distinguish end of stream from a decode failure. The
intended playback loop treats @racket[0] as no further PCM block available for
this decoder instance.

Internally, each call receives the frames that are currently available from the
codec, reads packets from the container, sends the packets of the selected
audio stream to the codec, converts decoded frames through
@tt{libswresample}@elem{,} and drains the resampler at end of stream. Packets
that belong to other streams are skipped.
}
@defproc[(fmpg-seek-ms! [instance any/c]
                        [target-pos-ms exact-nonnegative-integer?])
         (integer-in 0 1)]{
Seeks the selected audio stream to @racket[target-pos-ms] milliseconds and
resets the decoder and resampler state. The function returns @racket[1] on
success and @racket[0] on failure.

Seeking uses FFmpeg's backward seek flag (@tt{AVSEEK_FLAG_BACKWARD}), so FFmpeg
may land before the requested position. After the seek, decoded audio before
the requested target sample is discarded so the next buffer starts at, or as
close as FFmpeg can provide to, the requested position.
}
@section{Decoded buffers}
The PCM buffer belongs to the decoder instance. It is replaced by the next
call to @racket[fmpg-decode-next!]@elem{,} @racket[fmpg-seek-ms!]@elem{,} or
@racket[fmpg-close!]@elem{.} Treat the returned byte string as read-only.
Copy it if it must outlive the next decoder operation or if another component
may mutate it.
@defproc[(fmpg-buffer [instance any/c]) (or/c bytes? #f)]{
Returns the current decoded PCM block as a byte string, or @racket[#f] when no
PCM block is available.

The byte string contains interleaved signed 32-bit samples. Its logical frame
count is available as the difference between @racket[fmpg-buffer-end-sample]
and @racket[fmpg-buffer-start-sample]@elem{.} Its byte size is also available
through @racket[fmpg-buffer-size]@elem{.}
}
@defproc[(fmpg-buffer-size [instance any/c]) exact-nonnegative-integer?]{
Returns the number of valid bytes in the current PCM buffer. If no decoder
state is available, or if the size would not fit in the internal integer range,
the function returns @racket[0]@elem{.}
}
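
A caller that needs the bytes beyond the next decoder operation can take a
private copy of the valid part of the buffer. The helper
@tt{copy-current-block} below is a hypothetical sketch and assumes that the
bindings of this module are in scope.

@racketblock[
(define (copy-current-block dec)
  (define pcm (fmpg-buffer dec))
  (code:comment @#,elem{subbytes allocates a fresh byte string, so the copy survives the next decode.})
  (and pcm
       (subbytes pcm 0 (min (fmpg-buffer-size dec) (bytes-length pcm)))))
]

The result is @racket[#f] when no PCM block is available. The @racket[min]
guard only protects the sketch against a reported size that exceeds the length
of the byte string.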
@defproc[(fmpg-buffer-start-sample [instance any/c])
exact-nonnegative-integer?]{
Returns the first sample frame represented by the current PCM buffer. If no
decoder state is available, the result is @racket[0]@elem{.}
}
@defproc[(fmpg-buffer-end-sample [instance any/c])
exact-nonnegative-integer?]{
Returns the half-open end position of the current PCM buffer: the first sample
frame after the current buffer. The number of sample frames in the buffer is
the end position minus @racket[fmpg-buffer-start-sample]@elem{.} If no decoder
state is available, the result is @racket[0]@elem{.}
}
@defproc[(fmpg-sample-position [instance any/c])
exact-nonnegative-integer?]{
Returns the decoder's next sample-frame position after the current output.
During normal decoding it is the same as @racket[fmpg-buffer-end-sample] for
the current buffer. After a seek, it is reset to the target position before
new audio is decoded.
}
@section{FFmpeg version information}
@defproc[(ffmpeg-version [lib (or/c 'avutil 'avcodec 'avformat
'swr 'swresample)])
(list/c exact-nonnegative-integer?
exact-nonnegative-integer?
exact-nonnegative-integer?)]{
Returns the runtime version of one FFmpeg library as a three-element list
containing the major, minor, and micro version numbers. The symbols
@racket['swr] and @racket['swresample] both refer to @tt{libswresample}@elem{.}
The function raises an exception for an unknown library symbol.
}
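
As a usage sketch, the runtime versions of all four libraries can be printed
and one of them compared with the supported range stated at the top of this
document. The sketch assumes that the bindings of this module are in scope; the
name @tt{avcodec-major} is introduced only for illustration.

@racketblock[
(for ([lib (in-list '(avutil avcodec avformat swresample))])
  (code:comment @#,elem{Each version is a list of major, minor, and micro numbers.})
  (apply printf "lib~a ~a.~a.~a\n" lib (ffmpeg-version lib)))

(define avcodec-major (car (ffmpeg-version 'avcodec)))
(printf "libavcodec major in documented range: ~a\n"
        (<= 60 avcodec-major 62))
]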
@section{Use through the decoder frontend}
The direct API above is normally wrapped by @filepath{ffmpeg-ffi.rkt} and by
@filepath{ffmpeg-decoder.rkt}@elem{.} The frontend function
@tt{ffmpeg-open} returns a handle, or @racket[#f] when the file does not exist.
Its stream-info callback receives a mutable hash with at least these playback
keys:

@racketblock[
(list 'sample-rate
      'channels
      'bits-per-sample
      'bytes-per-sample
      'total-samples
      'duration)]

The audio callback receives the same hash extended for the current buffer with
these keys:

@racketblock[
(list 'sample
      'current-time)]

The hash is followed by a copied byte string and its valid byte count. The
copy is made by @filepath{ffmpeg-ffi.rkt}@elem{,} not by the low-level buffer
function itself.

The frontend's seek function accepts a percentage of the stream and translates
that percentage to a sample position. The adapter then translates the sample
position to milliseconds and calls @racket[fmpg-seek-ms!]@elem{.} This is why
the low-level module exposes millisecond seeking while the frontend exposes
percentage seeking.
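
The two translation steps can be sketched as one pure function. The name
@tt{percent->ms} is hypothetical. The sketch assumes an exact integer
percentage between 0 and 100 and truncating division; the rounding used by the
frontend and the adapter is not shown in this document.

@racketblock[
(define (percent->ms percent total-samples sample-rate)
  (code:comment @#,elem{Frontend step: percentage of the stream to a sample position.})
  (define target-sample (quotient (* percent total-samples) 100))
  (code:comment @#,elem{Adapter step: sample position to milliseconds.})
  (quotient (* target-sample 1000) sample-rate))
]

For a 60-second stream at 44100 Hz,
@racket[(percent->ms 50 2646000 44100)] is @racket[30000]@elem{,} which is the
kind of value that would be passed to @racket[fmpg-seek-ms!]@elem{.}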
@section{Example}
The following example opens a file, decodes all PCM blocks, and reports their
byte sizes and sample ranges. A real playback loop would pass each buffer to
the audio output layer before requesting the next block.

@racketblock[
(define dec (fmpg-init))
(when (and dec (= (fmpg-open-file! dec "track.ogg") 1))
  (printf "~a Hz, ~a channels, ~a ms\n"
          (fmpg-audio-sample-rate dec)
          (fmpg-audio-channels dec)
          (fmpg-duration-ms dec))
  (let loop ()
    (when (= (fmpg-decode-next! dec) 1)
      (define pcm (fmpg-buffer dec))
      (define size (fmpg-buffer-size dec))
      (define start (fmpg-buffer-start-sample dec))
      (define end (fmpg-buffer-end-sample dec))
      (printf "decoded ~a bytes, samples [~a, ~a)\n"
              size start end)
      (code:comment @#,elem{Pass pcm to the audio output layer here, or copy it if needed.})
      (loop)))
  (fmpg-close! dec))
]

Decoding after a seek works the same way once the seek has succeeded. The
following code moves to 30 seconds and then requests the next decoded buffer.

@racketblock[
(when (= (fmpg-seek-ms! dec 30000) 1)
  (when (= (fmpg-decode-next! dec) 1)
    (define pcm (fmpg-buffer dec))
    (define start (fmpg-buffer-start-sample dec))
    (printf "first buffer after seek starts at sample ~a\n" start)))
]