#lang scribble/manual
@(require (for-label racket/base
                     ;racket/contract
                     racket/path
                     ffi/unsafe
                     let-assert
                     early-return
                     "../ffmpeg-definitions.rkt"
                     "../private/cstruct-helper.rkt"))
@title[#:tag "ffmpeg-definitions"]{FFmpeg Decoder Definitions}
@author[@author+email["Hans Dijkema" "hans@dijkewijk.nl"]]
@defmodule[racket-audio/ffmpeg-definitions]
This module provides the direct FFmpeg-backed decoder layer used by the audio
pipeline. It is deliberately small and stateful. A caller creates one decoder
instance, opens one file on it, queries the selected audio stream, repeatedly
asks for the next PCM block, and closes the instance again.

The module does not expose FFmpeg metadata. It only exposes the information
needed for playback: stream count, sample rate, channel count, duration,
bitrate, decoded PCM data, and sample positions. The output format is fixed:
interleaved signed 32-bit PCM, four bytes per channel sample, using FFmpeg's
@tt{AV_SAMPLE_FMT_S32} sample format.
The FFmpeg libraries are loaded when the module is required. The module checks
that the runtime FFmpeg major versions are in the supported range configured by
the implementation. This binding targets the FFmpeg library major versions
used by FFmpeg 6, 7, and 8: @tt{libavutil} 58 to 60, @tt{libavcodec} 60 to 62,
@tt{libavformat} 60 to 62, and @tt{libswresample} 4 to 6. Unsupported runtime
versions fail early, before a decoder instance is used.
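
The supported ranges listed above can be written down as a small lookup table.
The following is only a sketch of such a check, not the module's actual code:
the names @tt{supported-majors} and @tt{supported-major?} are hypothetical, and
only the numeric ranges are taken from the text above.

@racketblock[
;; Hypothetical table: library symbol, lowest and highest supported major.
(define supported-majors
  '((avutil 58 60)
    (avcodec 60 62)
    (avformat 60 62)
    (swresample 4 6)))

;; Returns #t when major lies inside the range listed for lib, and #f
;; otherwise, including for a library symbol that is not in the table.
(define (supported-major? lib major)
  (define range (assq lib supported-majors))
  (and range
       (<= (cadr range) major (caddr range))))
]

With the real module, the major number would presumably come from a call such
as @racket[(car (ffmpeg-version 'avcodec))].
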
On Windows, the private library loader may download the bundled sound-library
set into Racket's add-on directory before the FFI libraries are opened. On
Unix-like systems, the FFmpeg libraries are expected to be installed by the
operating system or platform package manager and to be reachable by Racket's
FFI library search path.
@section{Layering}
This module is the low-level Racket FFI layer. It is normally wrapped by
@filepath{ffmpeg-ffi.rkt} and then by @filepath{ffmpeg-decoder.rkt}. The first
wrapper adapts this module to the command protocol used by the audio decoder
frontend. The second wrapper exposes the callback-oriented decoder interface
used by the rest of the playback pipeline.
The distinction matters for buffer lifetime. At this level,
@racket[fmpg-buffer] returns the current buffer owned by the decoder instance.
The adapter in @filepath{ffmpeg-ffi.rkt} copies that buffer before passing it to
@filepath{ffmpeg-decoder.rkt}. Code that uses this module directly must copy
the buffer itself when the bytes must survive the next decoder operation.
@section{FFmpeg version information}
@defproc[(ffmpeg-version [lib (or/c 'avutil 'avcodec 'avformat
'swr 'swresample)])
(list/c exact-nonnegative-integer?
exact-nonnegative-integer?
exact-nonnegative-integer?)]{
Returns the runtime version of one FFmpeg library as a three-element list
containing the major, minor, and micro version numbers. The symbols
@racket['swr] and @racket['swresample] both refer to @tt{libswresample}.
The version is read from FFmpeg's packed integer value. For example, a runtime
value corresponding to @tt{62.28.100} is returned as @racket['(62 28 100)].
The function raises an exception for an unknown library symbol.
}
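
FFmpeg's @tt{AV_VERSION_INT} macro packs a version as the major number shifted
left by 16 bits, the minor number shifted left by 8 bits, and the micro number
in the low 8 bits. The sketch below shows how such a value can be unpacked.
The helper name @tt{unpack-version} is hypothetical, and the module's own
implementation may differ in detail.

@racketblock[
;; Splits a packed FFmpeg version integer into (list major minor micro).
(define (unpack-version packed)
  (list (arithmetic-shift packed -16)
        (bitwise-and (arithmetic-shift packed -8) #xFF)
        (bitwise-and packed #xFF)))

;; 4070500 is 62 * 65536 + 28 * 256 + 100.
(unpack-version 4070500)
]

The last expression evaluates to @racket['(62 28 100)], matching the example
given for @racket[ffmpeg-version].
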
The runtime versions determine which partial FFmpeg struct layouts are safe to
use. If a future FFmpeg major release changes a struct layout at or before the
last field read by this module, the offsets used here no longer match. The
supported range should therefore be extended only after the affected partial
definitions have been checked.
@section[#:tag "ffmpeg-definitions-implementation-strategy"]{Implementation strategy}
This module talks directly to the FFmpeg shared libraries through Racket's FFI.
There is no C shim that hides FFmpeg's structs or normalizes their layout. The
price of that choice is that the Racket side must know enough of the relevant C
struct layouts to read the fields used by the decoder. The benefit is that the
binding remains a Racket module with direct access to the platform FFmpeg
libraries.
@subsection{C structs and offsets}
Small and stable structures, such as @tt{AVRational} and
@tt{AVChannelLayout}, are described with @racket[define-cstruct]. A
@racket[define-cstruct] form describes the C fields to Racket's FFI. Racket
then calculates the correct field offsets for the current platform ABI and
creates the corresponding pointer type, constructor, accessors and mutators.
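
As an illustration of that mechanism, the following sketch declares
@tt{AVRational}, which FFmpeg defines as a C struct with two @tt{int} fields,
@tt{num} and @tt{den}. It is a minimal stand-alone example; the definition
inside this module may be written differently, and the names @tt{tb} and
@tt{tb-as-number} are hypothetical.

@racketblock[
(require ffi/unsafe)

;; Two int fields, in the same order as in FFmpeg's C declaration.
(define-cstruct _AVRational
  ([num _int]
   [den _int]))

;; A time base of 1/48000 seconds, built on the Racket side.
(define tb (make-AVRational 1 48000))

;; The generated accessors read each field at the offset computed by the FFI.
(define tb-as-number
  (/ (AVRational-num tb) (AVRational-den tb)))
]

No FFmpeg library is needed to run this, because the struct value is allocated
by Racket itself.
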
The larger FFmpeg structures are handled by @racket[def-cstruct] from
@filepath{private/cstruct-helper.rkt}. Structures such as
@tt{AVCodecParameters}, @tt{AVStream}, @tt{AVFormatContext}, @tt{AVFrame} and
@tt{AVPacket} are large and may differ between FFmpeg major versions. The
decoder only needs a few fields from each one, but those fields must still be
read from their exact native offsets.
The helper solves this by describing the complete field sequence up to the last
field the backend needs. Unnamed entries are used only to advance the offset.
Named entries become generated accessors. Repeated entries such as
@racket[(6 _int)] keep the definition compact while still allowing Racket's FFI
to compute alignment, padding and pointer size correctly. Tail fields after
the last required member are not described.
The right layout is selected when the module is required, after the runtime
FFmpeg major versions have been read from the libraries. For the supported
range, @tt{_AVCodecParameters} uses one layout for @tt{libavcodec} major
version 60 and another for major versions 61 and 62. Likewise,
@tt{_AVFrame} uses one layout for @tt{libavutil} major version 58 and
another for major versions 59 and 60. The other partial structs used by this
module are defined with a single layout across the supported versions.
@subsection{Defensive control flow}
Most FFmpeg calls report ordinary failure through C-style return values or null
pointers. The implementation treats those results as normal control flow. The
@racket[let/assert] form is used for setup paths where each native result must
be checked before the next native call is made. It behaves like a sequential
binding form: each binding can be checked immediately, and a failed check
returns the specified failure value for the whole form.
That style is used for opening a file, selecting stream information, allocating
the codec context, and initializing the resampler. Predicates such as
@tt{a-!nullptr?}, @tt{a-nullptr?}, @tt{a-true?}, and @tt{a->=?} express the
usual FFmpeg checks directly next to the binding that produced the value.
The decode and seek paths also use @racket[early-return] where processing must
stop immediately from a nested position. This keeps the normal FFmpeg outcomes
away from exception-based control flow while still making cleanup actions local
to the point where a failure can occur.
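
The idea of checking each result before the next step can be sketched in plain
@racketmodname[racket/base]. This is not @racket[let/assert] syntax; it uses an
escape continuation instead, and @tt{open-container},
@tt{find-audio-stream} and @tt{open-sketch} are hypothetical stand-ins for the
native calls.

@racketblock[
;; Hypothetical stand-ins: a failed "native call" returns #f or -1.
(define (open-container name)
  (and (string? name) (list 'ctx name)))
(define (find-audio-stream ctx)
  (if (equal? (cadr ctx) "empty.bin") -1 0))

;; Returns 1 when every step passes its check and 0 as soon as one fails.
(define (open-sketch name)
  (let/ec return
    ;; Pass the value through, or leave the whole form with 0.
    (define (check ok? v)
      (if (ok? v) v (return 0)))
    (define ctx (check values (open-container name)))
    (define idx (check (lambda (i) (>= i 0)) (find-audio-stream ctx)))
    1))
]

A failed check skips all later steps, so no native call ever receives a null
pointer or an error code from an earlier one.
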
@section{Decoder instances}
A decoder instance is an opaque value returned by @racket[fmpg-init]. Its
structure type and predicate are not exported. Pass the value back to the
functions in this module and do not inspect it directly. The contracts below
therefore use @racket[any/c] for the instance argument. Operationally, that
argument must be a value returned by @racket[fmpg-init].
The instance owns native FFmpeg resources: a format context, a codec context,
an audio frame, a resampler, and the Racket byte string used for the current
PCM block. Finalizers are installed as a last line of defence, but callers
should still call @racket[fmpg-close!] explicitly when playback stops or when
the file is no longer needed. Explicit close keeps the lifetime of native
resources predictable.
@defproc[(fmpg-init) any/c]{
Creates a new decoder instance. The result is an opaque instance value, or
@racket[#f] if the instance could not be created.
Creating the instance does not open a file. Use @racket[fmpg-open-file!]
before querying stream information or decoding audio.
}
@defproc[(fmpg-open-file! [instance any/c]
[filename (or/c path? string?)])
(integer-in 0 1)]{
Opens @racket[filename] on @racket[instance], reads the stream information,
selects the best audio stream, initializes the codec context, and initializes
the resampler.

The function returns @racket[1] on success and @racket[0] on failure. On
failure, partially initialized native state is closed again. A filename that
is neither a string nor a path is treated as an open failure, so the function
returns @racket[0].

An instance can have only one file open at a time. Close it with
@racket[fmpg-close!] before opening another file on the same instance.
}
@defproc[(fmpg-close! [instance any/c]) void?]{
Closes @racket[instance] if it is open and releases the native FFmpeg resources
owned by the instance. The codec context, frame and resampler are freed before
the format context is closed. This order avoids keeping decoder pointers that
refer to streams from an already closed container.
The stored audio information is reset. Calling this function with @racket[#f]
or with an already closed instance is harmless.
}
@defproc[(fmpg-is-open [instance any/c]) (integer-in 0 1)]{
Returns @racket[1] when @racket[instance] is ready for decoding and @racket[0]
otherwise. An instance is ready only after a file has been opened, a usable
audio stream has been selected, and the decoder and resampler have been
initialized.
}
@section{Audio stream information}
The decoder selects one audio stream for playback using FFmpeg's best-stream
selection. The stream count reports how many audio streams were found in the
container, but decoding is performed only for the selected stream.
The term @italic{sample} in this module means a sample frame: one time step in
the audio stream, across all channels. For stereo 32-bit output, one sample
frame therefore occupies @racket[(* 2 4)] bytes in the returned PCM buffer.
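
The relation between sample frames and bytes can be sketched as follows. The
helper names are hypothetical; the constant 4 is the documented output width
of one channel sample.

@racketblock[
;; Width in bytes of one channel sample in the fixed S32 output format.
(define bytes-per-channel-sample 4)

;; Number of bytes occupied by the given number of sample frames.
(define (frames->bytes frames channels)
  (* frames channels bytes-per-channel-sample))

;; Number of whole sample frames contained in byte-count bytes.
(define (bytes->frames byte-count channels)
  (quotient byte-count (* channels bytes-per-channel-sample)))
]

For stereo output, @racket[(frames->bytes 1 2)] is 8 and
@racket[(bytes->frames 8192 2)] is 1024.
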
@defproc[(fmpg-audio-stream-count [instance any/c])
exact-nonnegative-integer?]{
Returns the number of audio streams in the open container. If the instance is
not open, the result is @racket[0]. This count is informational; actual stream
selection is performed during @racket[fmpg-open-file!].
}
@deftogether[
(@defproc[(fmpg-audio-sample-rate [instance any/c])
exact-nonnegative-integer?]
@defproc[(fmpg-audio-channels [instance any/c])
exact-nonnegative-integer?])]{
Return the sample rate and channel count of the selected audio stream. If the
instance is not ready, both functions return @racket[0].
}
@deftogether[
(@defproc[(fmpg-audio-bits-per-sample [instance any/c])
exact-positive-integer?]
@defproc[(fmpg-audio-bytes-per-sample [instance any/c])
exact-positive-integer?])]{
Return the fixed output sample width in bits and bytes. The width describes a
single channel sample, not a whole sample frame. The current output format is
32-bit signed PCM, so @racket[fmpg-audio-bits-per-sample] returns
@racket[32] and @racket[fmpg-audio-bytes-per-sample] returns @racket[4]. The
values are independent of the input file's original sample format and do not
depend on the instance state.
}
@deftogether[
(@defproc[(fmpg-duration-ms [instance any/c]) exact-integer?]
@defproc[(fmpg-duration-samples [instance any/c]) exact-integer?])]{
Return the duration of the selected audio stream in milliseconds and in sample
frames. If the stream duration is not available, the container duration is
used as a fallback. If no duration can be determined, or when the instance is
not ready, the result is @racket[-1].
}
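
The two duration values are related through the sample rate. The sketch below
shows that relation for whole milliseconds. The helper name @tt{frames->ms} is
hypothetical, and the module itself may derive both values from FFmpeg's time
base and round differently.

@racketblock[
;; Converts a sample-frame count to whole milliseconds. A negative count
;; (the "unknown" marker) or a non-positive sample rate yields -1.
(define (frames->ms frames sample-rate)
  (if (and (>= frames 0) (> sample-rate 0))
      (quotient (* frames 1000) sample-rate)
      -1))
]

For example, 480000 sample frames at 48000 Hz correspond to 10000 ms.
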
@defproc[(fmpg-file-bitrate [instance any/c]) exact-integer?]{
Returns the container bitrate in bits per second. If the bitrate is unavailable
or if the instance is not open, the result is @racket[-1]. A bitrate that
FFmpeg reports as zero or negative is treated as unavailable.
}
@section{Output format}
The decoder output format is intentionally fixed:
@itemlist[
#:style 'compact
@item{sample format: signed 32-bit PCM, @tt{AV_SAMPLE_FMT_S32}}
@item{layout: interleaved}
@item{sample rate: the selected stream's sample rate}
@item{channels: the selected stream's channel count}
]
This keeps the playback layer simple. The FFmpeg input format may be planar,
floating point, compressed, or otherwise different; @tt{libswresample} converts
the decoded frames to the fixed output format before the bytes are exposed to
Racket.
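
Given this fixed format, an individual sample can be read from a PCM byte
string with @racket[integer-bytes->integer]. The sketch below assumes
native-endian samples, which is how FFmpeg stores its sample formats in
memory. The helper name @tt{sample-ref} is hypothetical.

@racketblock[
;; Reads one signed 32-bit sample from interleaved PCM.
;; frame and channel are zero-based indexes.
(define (sample-ref pcm channels frame channel)
  (define offset (* 4 (+ (* frame channels) channel)))
  (integer-bytes->integer pcm #t (system-big-endian?)
                          offset (+ offset 4)))
]

In a stereo buffer, @racket[(sample-ref pcm 2 0 1)] reads the right channel of
the first sample frame.
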
@section{Decoding}
Decoding is block oriented. Each call to @racket[fmpg-decode-next!] clears the
previous PCM block and attempts to produce the next decoded block for the
selected audio stream. When the call returns @racket[1], the block can be read
with @racket[fmpg-buffer] and described with the buffer query functions.
@defproc[(fmpg-decode-next! [instance any/c]) exact-integer?]{
Decodes until a block of PCM output is available, end of stream is reached, or
an error occurs. The return values are:
@itemlist[
#:style 'compact
@item{@racket[1]: a new PCM buffer is available through @racket[fmpg-buffer].}
@item{@racket[0]: decoding is complete and no more PCM is available.}
@item{A negative value: decoding failed or the instance was not ready.}
]
Internally, the decoder first tries to receive frames that FFmpeg may already
have buffered. If no frame is ready, it reads packets until it finds a packet
for the selected audio stream. Packets from other streams are skipped and
immediately unreferenced. Sent packets are unreferenced after
@tt{avcodec_send_packet}, because the codec has then taken what it needs.
At end of input, the function drains both the codec and the resampler. This is
necessary because FFmpeg and @tt{libswresample} may still hold delayed samples
even after the demuxer has no more packets.
}
@section{Decoded buffers}
The PCM buffer belongs to the decoder instance. It is replaced by the next
call to @racket[fmpg-decode-next!], @racket[fmpg-seek-ms!], or
@racket[fmpg-close!]. Treat the returned byte string as read-only. Copy it if
it must outlive the next decoder operation or if another component may mutate
it.
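
A copy can be taken with @racket[subbytes], which always allocates a fresh
byte string. The sketch below copies only the valid bytes, on the assumption
that the byte string may be longer than the reported size. The helper name
@tt{copy-pcm} is hypothetical.

@racketblock[
;; Returns a fresh byte string with the first size bytes of pcm,
;; or #f when there is no buffer.
(define (copy-pcm pcm size)
  (and pcm (subbytes pcm 0 size)))

;; Later changes to the original do not affect the copy.
(define original (bytes 1 2 3 4 0 0))
(define kept (copy-pcm original 4))
(bytes-set! original 0 99)
]

With the real module, the arguments would be @racket[(fmpg-buffer dec)] and
@racket[(fmpg-buffer-size dec)] for an open instance @tt{dec}.
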
@defproc[(fmpg-buffer [instance any/c]) (or/c bytes? #f)]{
Returns the current decoded PCM block as a byte string, or @racket[#f] when no
PCM block is available.
The byte string contains interleaved signed 32-bit samples. Its logical frame
count is available as the difference between @racket[fmpg-buffer-end-sample]
and @racket[fmpg-buffer-start-sample]. Its byte size is also available through
@racket[fmpg-buffer-size].
}
@defproc[(fmpg-buffer-size [instance any/c]) exact-nonnegative-integer?]{
Returns the number of valid bytes in the current PCM buffer. If no decoder
state is available, or if the size would not fit in the internal integer range,
the function returns @racket[0].
}
@deftogether[
(@defproc[(fmpg-buffer-start-sample [instance any/c])
exact-nonnegative-integer?]
@defproc[(fmpg-buffer-end-sample [instance any/c])
exact-nonnegative-integer?]
@defproc[(fmpg-sample-position [instance any/c])
exact-nonnegative-integer?])]{
Return sample-frame positions for the current decoder state.
@racket[fmpg-buffer-start-sample] returns the first sample frame represented by
the current PCM buffer. @racket[fmpg-buffer-end-sample] returns the exclusive
end of that half-open range: the first sample frame after the current buffer.
@racket[fmpg-sample-position] returns the next sample position the decoder
expects to produce.
These values count sample frames, not individual channel samples. For stereo
audio, one sample frame contains one sample for the left channel and one sample
for the right channel.
}
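
Because the ranges are half-open, consecutive buffers can be checked for gaps
by comparing one buffer's end with the next buffer's start. The sketch below
assumes uninterrupted sequential decoding with no seek in between. The helper
name @tt{contiguous?} is hypothetical.

@racketblock[
;; ranges is a list of (start . end) pairs in decode order.
;; Returns #t when each range begins where the previous one ended.
(define (contiguous? ranges)
  (or (null? ranges)
      (null? (cdr ranges))
      (and (= (cdr (car ranges)) (car (cadr ranges)))
           (contiguous? (cdr ranges)))))
]

Each pair would be built from @racket[fmpg-buffer-start-sample] and
@racket[fmpg-buffer-end-sample] after a successful decode call.
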
@section[#:tag "ffmpeg-definitions-seeking"]{Seeking}
@defproc[(fmpg-seek-ms! [instance any/c]
[target-pos-ms exact-nonnegative-integer?])
(integer-in 0 1)]{
Seeks the selected audio stream to @racket[target-pos-ms] milliseconds and
resets the decoder and resampler state. The function returns @racket[1] on
success and @racket[0] on failure. Seeking succeeds only when the instance is
ready for decoding and the target position is non-negative.

Seeking uses FFmpeg's backward seek flag, @tt{AVSEEK_FLAG_BACKWARD}. FFmpeg may therefore seek to a packet
position before the requested target. The decoder stores a discard target in
sample frames. During the following decode calls, frames before the target are
dropped, and frames that overlap the target are trimmed so the exposed PCM
buffer starts at, or as close as FFmpeg can provide to, the requested position.
After a successful seek, the codec buffers are flushed, the resampler is closed
and reinitialized, EOF state is cleared, and sample bookkeeping is reset to the
target position.
}
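
The discard step described for @racket[fmpg-seek-ms!] comes down to a little
arithmetic on sample-frame positions. The following sketch illustrates that
arithmetic only; it is not the module's implementation, and the helper names
@tt{ms->frame} and @tt{frames-to-skip} are hypothetical.

@racketblock[
;; Converts a millisecond target to a sample-frame target.
(define (ms->frame ms sample-rate)
  (quotient (* ms sample-rate) 1000))

;; For a decoded block covering sample frames start up to, but not
;; including, start + count, returns how many leading frames lie before
;; target. A result of count means the whole block is dropped, and 0 means
;; the block is kept unchanged.
(define (frames-to-skip start count target)
  (max 0 (min count (- target start))))
]

For a 30 second target at 44100 Hz the discard target is sample frame 1323000.
A block of 1024 frames starting at 1322000 then loses its first 1000 frames.
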
@section{Resource ownership}
The decoder instance owns the native FFmpeg objects it allocates. The codec
pointer returned by FFmpeg is not owned by the instance, but the codec context,
frame, resampler and format context are. They are released by
@racket[fmpg-close!]. Finalizers are registered as a safety net, but callers
should close decoder instances explicitly.
Temporary native buffers used during resampling are allocated only for the
duration of a conversion step and are always freed before control returns to the
caller. The public PCM buffer is a Racket byte string, so it can safely be
passed to the Racket-side playback backend.
@section{Use through the decoder frontend}
The direct API above is normally wrapped by @filepath{ffmpeg-ffi.rkt} and by
@filepath{ffmpeg-decoder.rkt}. The frontend function @tt{ffmpeg-open} returns
a handle, or @racket[#f] when the file does not exist. Its stream-info callback
receives a mutable hash with at least these playback keys:
@racketblock[
(list 'sample-rate
      'channels
      'bits-per-sample
      'bytes-per-sample
      'total-samples
      'duration)]
The audio callback receives the same hash extended for the current buffer with
these keys:
@racketblock[
(list 'sample
      'current-time)]
The hash is followed by a copied byte string and its valid byte count. The
copy is made by @filepath{ffmpeg-ffi.rkt}, not by the low-level buffer function
itself.
The frontend's seek function accepts a percentage of the stream and translates
that percentage to a sample position. The adapter then translates the sample
position to milliseconds and calls @racket[fmpg-seek-ms!]. This is why the
low-level module exposes millisecond seeking while the frontend exposes
percentage seeking.
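
The two translation steps can be sketched as follows. The sketch assumes that
the percentage is a number from 0 to 100, which the text above does not state,
and all three helper names are hypothetical.

@racketblock[
;; Assumption: pct runs from 0 to 100 and may be a fraction such as 12.5.
(define (percent->frame pct total-frames)
  (inexact->exact (floor (/ (* pct total-frames) 100))))

;; Converts a sample-frame position to whole milliseconds.
(define (frame-position->ms frame sample-rate)
  (quotient (* frame 1000) sample-rate))

;; Combines both steps: percentage to sample frame, then to milliseconds.
(define (percent->ms pct total-frames sample-rate)
  (frame-position->ms (percent->frame pct total-frames) sample-rate))
]

For a stream of 480000 sample frames at 48000 Hz, a 50 percent seek maps to
sample frame 240000 and then to 5000 ms.
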
@section{Examples}
The following example opens a file, decodes all PCM blocks, and reports their
byte ranges and sample ranges. A real playback loop would pass each buffer to
the audio output layer before requesting the next block.
@racketblock[
(define dec (fmpg-init))
(when (and dec (= (fmpg-open-file! dec "track.ogg") 1))
  (printf "~a Hz, ~a channels, ~a ms\n"
          (fmpg-audio-sample-rate dec)
          (fmpg-audio-channels dec)
          (fmpg-duration-ms dec))
  (let loop ()
    (case (fmpg-decode-next! dec)
      [(1)
       (define pcm (fmpg-buffer dec))
       (define size (fmpg-buffer-size dec))
       (define start (fmpg-buffer-start-sample dec))
       (define end (fmpg-buffer-end-sample dec))
       (printf "decoded ~a bytes, samples [~a, ~a)\n"
               size start end)
       ;; Pass pcm to the audio output layer here, or copy it if needed.
       (loop)]
      [(0)
       (printf "done\n")]
      [else
       ;; Report the failure without raising, so fmpg-close! below still runs.
       (eprintf "decode error\n")]))
  (fmpg-close! dec))
]
Decoding after a successful seek works the same way. The following code
assumes that @racket[dec] is still open. It moves to 30 seconds and then
requests the next decoded buffer.
@racketblock[
(when (= (fmpg-seek-ms! dec 30000) 1)
  (when (= (fmpg-decode-next! dec) 1)
    (define pcm (fmpg-buffer dec))
    (define start (fmpg-buffer-start-sample dec))
    (printf "first buffer after seek starts at sample ~a\n" start)))
]