#lang scribble/manual

@(require (for-label racket/base
                     ;racket/contract
                     racket/path
                     ffi/unsafe
                     let-assert
                     early-return
                     "../ffmpeg-definitions.rkt"
                     "../private/cstruct-helper.rkt"))

@title[#:tag "ffmpeg-definitions"]{FFmpeg Decoder Definitions}
@author[@author+email["Hans Dijkema" "hans@dijkewijk.nl"]]

@defmodule[racket-audio/ffmpeg-definitions]

This module provides the direct FFmpeg-backed decoder layer used by the audio
pipeline. It is deliberately small and stateful. A caller creates one decoder
instance, opens one file on it, queries the selected audio stream, repeatedly
asks for the next PCM block, and closes the instance again.

The module does not expose FFmpeg metadata. It only exposes the information
needed for playback: stream count, sample rate, channel count, duration,
bitrate, decoded PCM data, and sample positions. The output format is fixed:
interleaved signed 32-bit PCM, four bytes per channel sample, using FFmpeg's
@tt{AV_SAMPLE_FMT_S32} sample format.

The FFmpeg libraries are loaded when the module is required. The module checks
that the runtime FFmpeg major versions are in the supported range configured by
the implementation. This binding targets the FFmpeg library major versions
used by FFmpeg 6, 7, and 8: @tt{libavutil} 58 to 60, @tt{libavcodec} 60 to 62,
@tt{libavformat} 60 to 62, and @tt{libswresample} 4 to 6. Unsupported runtime
versions fail early, before a decoder instance is used.

On Windows, the private library loader may download the bundled sound-library
set into Racket's add-on directory before the FFI libraries are opened. On
Unix-like systems, the FFmpeg libraries are expected to be installed by the
operating system or platform package manager and to be reachable by Racket's
FFI library search path.

@section{Layering}

This module is the low-level Racket FFI layer. It is normally wrapped by
@filepath{ffmpeg-ffi.rkt} and then by @filepath{ffmpeg-decoder.rkt}. The first
wrapper adapts this module to the command protocol used by the audio decoder
frontend. The second wrapper exposes the callback-oriented decoder interface
used by the rest of the playback pipeline.

The distinction matters for buffer lifetime. At this level,
@racket[fmpg-buffer] returns the current buffer owned by the decoder instance.
The adapter in @filepath{ffmpeg-ffi.rkt} copies that buffer before passing it to
@filepath{ffmpeg-decoder.rkt}. Code that uses this module directly must copy
the buffer itself when the bytes must survive the next decoder operation.
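
A minimal sketch of such a copy is shown here. The helper name
@tt{copy-valid-bytes} is hypothetical and not part of this module, and the
sketch assumes that the valid bytes start at offset 0 of the byte string.

@racketblock[
;; copy-valid-bytes is a hypothetical helper, not part of this module.
;; It returns a fresh byte string holding the first size bytes of pcm,
;; or #f when there is no buffer. subbytes always allocates a new
;; byte string, so the result is independent of the decoder's buffer.
(define (copy-valid-bytes pcm size)
  (and pcm
       (subbytes pcm 0 (min size (bytes-length pcm)))))

;; Assumed usage after fmpg-decode-next! has returned 1 on instance dec:
;; (copy-valid-bytes (fmpg-buffer dec) (fmpg-buffer-size dec))
]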

@section{FFmpeg version information}

@defproc[(ffmpeg-version [lib (or/c 'avutil 'avcodec 'avformat
                                    'swr 'swresample)])
         (list/c exact-nonnegative-integer?
                 exact-nonnegative-integer?
                 exact-nonnegative-integer?)]{
Returns the runtime version of one FFmpeg library as a three-element list
containing the major, minor, and micro version numbers. The symbols
@racket['swr] and @racket['swresample] both refer to @tt{libswresample}.

The version is read from FFmpeg's packed integer value. For example, a runtime
value corresponding to @tt{62.28.100} is returned as @racket['(62 28 100)].
The function raises an exception for an unknown library symbol.
}
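
The unpacking step can be sketched in plain Racket. FFmpeg packs a library
version as the major number shifted left by 16 bits, the minor number shifted
left by 8 bits, and the micro number in the low byte. The helper name
@tt{unpack-version} is hypothetical; the module's own code may be written
differently.

@racketblock[
;; unpack-version is a hypothetical helper that mirrors the unpacking
;; described above: packed = major<<16 | minor<<8 | micro.
(define (unpack-version packed)
  (list (arithmetic-shift packed -16)
        (bitwise-and (arithmetic-shift packed -8) #xFF)
        (bitwise-and packed #xFF)))

(unpack-version 4070500) ; => '(62 28 100)
]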

The runtime versions determine which partial FFmpeg struct layouts are safe to
use. If a future FFmpeg major release changes a struct layout at or before any
field read by this module, the supported range should be extended only after
the affected partial definitions have been checked.
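
A range check of this kind could look like the following sketch. The table
repeats the supported major versions listed in the introduction; the names
@tt{supported-majors} and @tt{major-supported?} are hypothetical and do not
describe how the module stores its configuration.

@racketblock[
;; Hypothetical table: library symbol, lowest and highest supported major.
(define supported-majors
  '((avutil 58 60)
    (avcodec 60 62)
    (avformat 60 62)
    (swresample 4 6)))

;; Returns #t when major lies in the supported range for lib,
;; and #f for an unsupported major or an unknown library symbol.
(define (major-supported? lib major)
  (define range (assq lib supported-majors))
  (and range
       (<= (cadr range) major (caddr range))))

;; Assumed usage with this module:
;; (major-supported? 'avcodec (car (ffmpeg-version 'avcodec)))
]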

@section{Implementation strategy}

This module talks directly to the FFmpeg shared libraries through Racket's FFI.
There is no C shim that hides FFmpeg's structs or normalizes their layout. The
price of that choice is that the Racket side must know enough of the relevant C
struct layouts to read the fields used by the decoder. The benefit is that the
binding remains a Racket module with direct access to the platform FFmpeg
libraries.

@subsection{C structs and offsets}

Small and stable structures, such as @tt{AVRational} and
@tt{AVChannelLayout}, are described with @racket[define-cstruct]. A
@racket[define-cstruct] form describes the C fields to Racket's FFI. Racket
then calculates the correct field offsets for the current platform ABI and
creates the corresponding pointer type, constructor, accessors and mutators.
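
As an illustration, @tt{AVRational} is a pair of C @tt{int} values, a numerator
and a denominator. The definition below is a sketch of what such a description
looks like; the module's own definition may differ in naming and detail.

@racketblock[
(require ffi/unsafe)

;; Illustration only: AVRational is { int num; int den; } in FFmpeg.
(define-cstruct _AVRational ([num _int] [den _int]))

;; define-cstruct generated make-AVRational, AVRational-num and
;; AVRational-den, along with the _AVRational-pointer type.
(define time-base (make-AVRational 1 44100))
(/ (AVRational-num time-base) (AVRational-den time-base)) ; => 1/44100
]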

The larger FFmpeg structures are handled by @racket[def-cstruct] from
@filepath{private/cstruct-helper.rkt}. Structures such as
@tt{AVCodecParameters}, @tt{AVStream}, @tt{AVFormatContext}, @tt{AVFrame} and
@tt{AVPacket} are large and may differ between FFmpeg major versions. The
decoder only needs a few fields from each one, but those fields must still be
read from their exact native offsets.

The helper solves this by describing the complete field sequence up to the last
field the backend needs. Unnamed entries are used only to advance the offset.
Named entries become generated accessors. Repeated entries such as
@racket[(6 _int)] keep the definition compact while still allowing Racket's FFI
to compute alignment, padding and pointer size correctly. Tail fields after
the last required member are not described.
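
The prefix idea can be sketched with plain @racket[define-cstruct], without
the helper. The C struct below is hypothetical and is not an FFmpeg type, and
the sketch does not use @racket[def-cstruct] syntax. Skipped fields still get
names here because @racket[define-cstruct] requires them.

@racketblock[
(require ffi/unsafe)

;; Hypothetical C struct, not a real FFmpeg type:
;;   struct Demo { int kind; int reserved[6]; void *opaque;
;;                 int64_t wanted; /* more fields follow */ };
;; Only the prefix up to the last needed field is described.
(define-cstruct _DemoPrefix
  ([kind _int]
   [reserved (_array _int 6)] ; skipped, only advances the offset
   [opaque _pointer]          ; skipped, keeps alignment and pointer size right
   [wanted _int64]))          ; the one field the caller reads

;; Reads wanted through a raw pointer to a full native struct.
(define (demo-wanted ptr)
  (DemoPrefix-wanted (cast ptr _pointer _DemoPrefix-pointer)))
]

Because the FFI computes padding from the declared field types, the accessor
lands on the native offset of @tt{wanted} as long as every earlier field is
described with the right type.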

The right layout is selected when the module is required, after the runtime
FFmpeg major versions have been read from the libraries. For the supported
range, @tt{_AVCodecParameters} uses one layout for @tt{libavcodec} major
version 60 and another for major versions 61 and 62. Likewise,
@tt{_AVFrame} uses one layout for @tt{libavutil} major version 58 and
another for major versions 59 and 60. The other partial structs used by this
module are defined with a single layout across the supported versions.

@subsection{Defensive control flow}

Most FFmpeg calls report ordinary failure through C-style return values or null
pointers. The implementation treats those results as normal control flow. The
@racket[let/assert] form is used for setup paths where each native result must
be checked before the next native call is made. It behaves like a sequential
binding form: each binding can be checked immediately, and a failed check
returns the specified failure value for the whole form.

That style is used for opening a file, selecting stream information, allocating
the codec context, and initializing the resampler. Predicates such as
@tt{a-!nullptr?}, @tt{a-nullptr?}, @tt{a-true?}, and @tt{a->=?} express the
usual FFmpeg checks directly next to the binding that produced the value.

The decode and seek paths also use @racket[early-return] where processing must
stop immediately from a nested position. This keeps the normal FFmpeg outcomes
away from exception-based control flow while still making cleanup actions local
to the point where a failure can occur.
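
The general pattern can be sketched in plain Racket with an escape
continuation. This is not the @racket[let/assert] or @racket[early-return]
syntax, only an illustration of check-then-continue control flow. The function
@tt{run-setup} and its two thunk arguments are hypothetical stand-ins for
native calls.

@racketblock[
;; run-setup is a hypothetical stand-in for a setup path.
;; open-input returns a context or #f; find-stream returns a stream
;; index, negative on failure. The first failed check makes the whole
;; function return 0; when every check passes it returns 1.
(define (run-setup open-input find-stream)
  (let/ec return
    (define (checked v ok?)
      (if (ok? v) v (return 0)))
    (define ctx (checked (open-input) values))
    (define idx (checked (find-stream ctx) (lambda (i) (>= i 0))))
    1))

(run-setup (lambda () 'ctx) (lambda (ctx) 0))  ; => 1
(run-setup (lambda () #f) (lambda (ctx) 0))    ; => 0, find-stream is never called
(run-setup (lambda () 'ctx) (lambda (ctx) -1)) ; => 0
]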

@section{Decoder instances}

A decoder instance is an opaque value returned by @racket[fmpg-init]. Its
structure type and predicate are not exported. Pass the value back to the
functions in this module and do not inspect it directly. The contracts below
therefore use @racket[any/c] for the instance argument. Operationally, that
argument must be a value returned by @racket[fmpg-init].

The instance owns native FFmpeg resources: a format context, a codec context,
an audio frame, a resampler, and the Racket byte string used for the current
PCM block. Finalizers are installed as a last line of defence, but callers
should still call @racket[fmpg-close!] explicitly when playback stops or when
the file is no longer needed. Explicit close keeps the lifetime of native
resources predictable.

@defproc[(fmpg-init) any/c]{
Creates a new decoder instance. The result is an opaque instance value, or
@racket[#f] if the instance could not be created.

Creating the instance does not open a file. Use @racket[fmpg-open-file!]
before querying stream information or decoding audio.
}

@defproc[(fmpg-open-file! [instance any/c]
                          [filename (or/c path? string?)])
         (integer-in 0 1)]{
Opens @racket[filename] on @racket[instance], reads the stream information,
selects the best audio stream, initializes the codec context, and initializes
the resampler.

The function returns @racket[1] on success and @racket[0] on failure. On
failure, partially initialized native state is closed again. A non-string,
non-path filename is treated as an open failure and returns @racket[0].

An instance can only have one file open. Close it with @racket[fmpg-close!]
before opening another file on the same instance.
}

@defproc[(fmpg-close! [instance any/c]) void?]{
Closes @racket[instance] if it is open and releases the native FFmpeg resources
owned by the instance. The codec context, frame and resampler are freed before
the format context is closed. This order avoids keeping decoder pointers that
refer to streams from an already closed container.

The stored audio information is reset. Calling this function with @racket[#f]
or with an already closed instance is harmless.
}
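
One way to make the explicit close hard to forget is a small wrapper built on
@racket[dynamic-wind]. The function @tt{call-with-open-decoder} is a
hypothetical convenience and not part of this module. The sketch relies on the
documented behaviour that closing an instance that never opened is harmless.

@racketblock[
(require racket-audio/ffmpeg-definitions)

;; call-with-open-decoder is a hypothetical wrapper. It returns #f when
;; the instance cannot be created or the file cannot be opened; otherwise
;; it returns the result of (proc dec). The instance is closed on every
;; exit from the body, including exits caused by an exception.
(define (call-with-open-decoder filename proc)
  (define dec (fmpg-init))
  (and dec
       (dynamic-wind
        void
        (lambda ()
          (and (= (fmpg-open-file! dec filename) 1)
               (proc dec)))
        (lambda ()
          (fmpg-close! dec)))))

;; Assumed usage:
;; (call-with-open-decoder "track.ogg" fmpg-duration-ms)
]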

@defproc[(fmpg-is-open [instance any/c]) (integer-in 0 1)]{
Returns @racket[1] when @racket[instance] is ready for decoding and @racket[0]
otherwise. An instance is ready only after a file has been opened, a usable
audio stream has been selected, and the decoder and resampler have been
initialized.
}

@section{Audio stream information}

The decoder selects one audio stream for playback using FFmpeg's best-stream
selection. The stream count reports how many audio streams were found in the
container, but decoding is performed only for the selected stream.

The term @italic{sample} in this module means a sample frame: one time step in
the audio stream, across all channels. For stereo 32-bit output, one sample
frame therefore occupies @racket[(* 2 4)] bytes in the returned PCM buffer.
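
The same arithmetic, written out for an arbitrary channel count. The helper
names are hypothetical; the default of 4 is the fixed output width in bytes
per channel sample.

@racketblock[
;; Hypothetical helpers for sample-frame arithmetic.
(define (bytes-per-frame channels [bytes-per-sample 4])
  (* channels bytes-per-sample))

;; Number of whole sample frames contained in byte-count bytes.
(define (frames-in-block byte-count channels)
  (quotient byte-count (bytes-per-frame channels)))

(bytes-per-frame 2)      ; => 8
(frames-in-block 8192 2) ; => 1024
]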

@defproc[(fmpg-audio-stream-count [instance any/c])
         exact-nonnegative-integer?]{
Returns the number of audio streams in the open container. If the instance is
not open, the result is @racket[0]. This count is informational; actual stream
selection is performed during @racket[fmpg-open-file!].
}

@deftogether[
 (@defproc[(fmpg-audio-sample-rate [instance any/c])
           exact-nonnegative-integer?]
  @defproc[(fmpg-audio-channels [instance any/c])
           exact-nonnegative-integer?])]{
Return the sample rate and channel count of the selected audio stream. If the
instance is not ready, both functions return @racket[0].
}

@deftogether[
 (@defproc[(fmpg-audio-bits-per-sample [instance any/c])
           exact-positive-integer?]
  @defproc[(fmpg-audio-bytes-per-sample [instance any/c])
           exact-positive-integer?])]{
Return the fixed output sample width in bits and bytes. The current output
format is 32-bit signed PCM, so @racket[fmpg-audio-bits-per-sample] returns
@racket[32] and @racket[fmpg-audio-bytes-per-sample] returns @racket[4]. The
values are independent of the input file's original sample format and do not
depend on the instance state.
}

@deftogether[
 (@defproc[(fmpg-duration-ms [instance any/c]) exact-integer?]
  @defproc[(fmpg-duration-samples [instance any/c]) exact-integer?])]{
Return the duration of the selected audio stream in milliseconds and in sample
frames. If the stream duration is not available, the container duration is
used as a fallback. If no duration can be determined, or when the instance is
not ready, the result is @racket[-1].
}

@defproc[(fmpg-file-bitrate [instance any/c]) exact-integer?]{
Returns the container bitrate in bits per second. If the bitrate is unavailable
or if the instance is not open, the result is @racket[-1]. Only positive
bitrate values reported by FFmpeg are considered reliable and passed through.
}

@section{Output format}

The decoder output format is intentionally fixed:

@itemlist[
 #:style 'compact
 @item{sample format: signed 32-bit PCM, @tt{AV_SAMPLE_FMT_S32}}
 @item{layout: interleaved}
 @item{sample rate: the selected stream's sample rate}
 @item{channels: the selected stream's channel count}
]

This keeps the playback layer simple. The FFmpeg input format may be planar,
floating point, compressed, or otherwise different; @tt{libswresample} converts
the decoded frames to the fixed output format before the bytes are exposed to
Racket.
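
Given that layout, a single channel sample can be read from a PCM byte string
as sketched below. The accessor @tt{sample-ref} is hypothetical. FFmpeg stores
sample data in native byte order, which is why the sketch consults
@racket[system-big-endian?].

@racketblock[
;; sample-ref is a hypothetical accessor for interleaved signed 32-bit
;; PCM. frame and channel are zero-based; channels is the channel count
;; of the stream. Each channel sample occupies four bytes.
(define (sample-ref pcm channels frame channel)
  (define offset (* 4 (+ (* frame channels) channel)))
  (integer-bytes->integer pcm #t (system-big-endian?) offset (+ offset 4)))

;; Assumed usage: right channel of the first frame of a stereo block.
;; (sample-ref (fmpg-buffer dec) 2 0 1)
]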

@section{Decoding}

Decoding is block oriented. Each call to @racket[fmpg-decode-next!] clears the
previous PCM block and attempts to produce the next decoded block for the
selected audio stream. When the call returns @racket[1], the block can be read
with @racket[fmpg-buffer] and described with the buffer query functions.

@defproc[(fmpg-decode-next! [instance any/c]) exact-integer?]{
Decodes until a block of PCM output is available, end of stream is reached, or
an error occurs. The return values are:

@itemlist[
 #:style 'compact
 @item{@racket[1]: a new PCM buffer is available through @racket[fmpg-buffer].}
 @item{@racket[0]: decoding is complete and no more PCM is available.}
 @item{A negative value: decoding failed or the instance was not ready.}
]

Internally, the decoder first tries to receive frames that FFmpeg may already
have buffered. If no frame is ready, it reads packets until it finds a packet
for the selected audio stream. Packets from other streams are skipped and
immediately unreferenced. Sent packets are unreferenced after
@tt{avcodec_send_packet}, because the codec has then taken what it needs.

At end of input, the function drains both the codec and the resampler. This is
necessary because FFmpeg and @tt{libswresample} may still hold delayed samples
even after the demuxer has no more packets.
}
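
The three-way return protocol maps naturally onto a small driver loop. In this
sketch the decode step and the block accessor are passed in as thunks, so the
loop does not depend on a decoder instance. The name @tt{collect-blocks} is
hypothetical, and collecting every block in a list is only sensible for short
files.

@racketblock[
;; collect-blocks is a hypothetical driver. decode-next! returns 1, 0 or
;; a negative status as described above; current-block returns the block
;; to keep. The result is the list of kept blocks in decode order.
(define (collect-blocks decode-next! current-block)
  (let loop ([blocks '()])
    (define status (decode-next!))
    (cond
      [(= status 1) (loop (cons (current-block) blocks))]
      [(= status 0) (reverse blocks)]
      [else (error 'collect-blocks "decode failed with status ~a" status)])))

;; Assumed usage with an open instance dec; bytes-copy detaches each
;; block from the decoder-owned buffer:
;; (collect-blocks (lambda () (fmpg-decode-next! dec))
;;                 (lambda () (bytes-copy (fmpg-buffer dec))))
]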

@section{Decoded buffers}

The PCM buffer belongs to the decoder instance. It is replaced by the next
call to @racket[fmpg-decode-next!], @racket[fmpg-seek-ms!], or
@racket[fmpg-close!]. Treat the returned byte string as read-only. Copy it if
it must outlive the next decoder operation or if another component may mutate
it.

@defproc[(fmpg-buffer [instance any/c]) (or/c bytes? #f)]{
Returns the current decoded PCM block as a byte string, or @racket[#f] when no
PCM block is available.

The byte string contains interleaved signed 32-bit samples. Its logical frame
count is available as the difference between @racket[fmpg-buffer-end-sample]
and @racket[fmpg-buffer-start-sample]. Its byte size is also available through
@racket[fmpg-buffer-size].
}

@defproc[(fmpg-buffer-size [instance any/c]) exact-nonnegative-integer?]{
Returns the number of valid bytes in the current PCM buffer. If no decoder
state is available, or if the size would not fit in the internal integer range,
the function returns @racket[0].
}

@deftogether[
 (@defproc[(fmpg-buffer-start-sample [instance any/c])
           exact-nonnegative-integer?]
  @defproc[(fmpg-buffer-end-sample [instance any/c])
           exact-nonnegative-integer?]
  @defproc[(fmpg-sample-position [instance any/c])
           exact-nonnegative-integer?])]{
Return sample-frame positions for the current decoder state.

@racket[fmpg-buffer-start-sample] returns the first sample frame represented by
the current PCM buffer. @racket[fmpg-buffer-end-sample] returns the half-open
end position: the first sample frame after the current buffer.
@racket[fmpg-sample-position] returns the next sample position the decoder
expects to produce.

These values count sample frames, not individual channel samples. For stereo
audio, one sample frame contains one sample for the left channel and one sample
for the right channel.
}

@section{Seeking}

@defproc[(fmpg-seek-ms! [instance any/c]
                        [target-pos-ms exact-nonnegative-integer?])
         (integer-in 0 1)]{
Seeks the selected audio stream to @racket[target-pos-ms] milliseconds and
resets the decoder and resampler state. The function returns @racket[1] on
success and @racket[0] on failure. Seeking is allowed only when the instance
is ready for decoding and the target position is non-negative.

Seeking uses FFmpeg's backward seek flag. FFmpeg may therefore seek to a packet
position before the requested target. The decoder stores a discard target in
sample frames. During the following decode calls, frames before the target are
dropped, and frames that overlap the target are trimmed so the exposed PCM
buffer starts at, or as close as FFmpeg can provide to, the requested position.

After a successful seek, the codec buffers are flushed, the resampler is closed
and reinitialized, EOF state is cleared, and sample bookkeeping is reset to the
target position.
}
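
The drop-and-trim rule reduces to one clamp per decoded block. The sketch
below is an illustration of that arithmetic only; @tt{frames-to-drop} is a
hypothetical name and the module's internal bookkeeping may be organized
differently. All three arguments are counted in sample frames.

@racketblock[
;; frames-to-drop is a hypothetical helper. block-start is the position
;; of the first frame in a decoded block, frame-count its length, and
;; target the discard target. The result is the number of leading frames
;; to discard: 0 keeps the block whole, frame-count drops it entirely.
(define (frames-to-drop block-start frame-count target)
  (max 0 (min frame-count (- target block-start))))

(frames-to-drop 1000 512 1200) ; => 200, the block overlaps the target
(frames-to-drop 1000 512 2000) ; => 512, the block lies before the target
(frames-to-drop 1000 512 900)  ; => 0, the block starts after the target
]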

@section{Resource ownership}

The decoder instance owns the native FFmpeg objects it allocates. The codec
pointer returned by FFmpeg is not owned by the instance, but the codec context,
frame, resampler and format context are. They are released by
@racket[fmpg-close!]. Finalizers are registered as a safety net, but callers
should close decoder instances explicitly.

Temporary native buffers used during resampling are allocated only for the
duration of a conversion step and are always freed before control returns to the
caller. The public PCM buffer is a Racket byte string, so it can safely be
passed to the Racket-side playback backend.

@section{Use through the decoder frontend}

The direct API above is normally wrapped by @filepath{ffmpeg-ffi.rkt} and by
@filepath{ffmpeg-decoder.rkt}. The frontend function @tt{ffmpeg-open} returns
a handle, or @racket[#f] when the file does not exist. Its stream-info callback
receives a mutable hash with at least these playback keys:

@racketblock[
(list 'sample-rate
      'channels
      'bits-per-sample
      'bytes-per-sample
      'total-samples
      'duration)]

The audio callback receives the same hash extended for the current buffer with
these keys:

@racketblock[
(list 'sample
      'current-time)]

The hash is followed by a copied byte string and its valid byte count. The
copy is made by @filepath{ffmpeg-ffi.rkt}, not by the low-level buffer function
itself.

The frontend's seek function accepts a percentage of the stream and translates
that percentage to a sample position. The adapter then translates the sample
position to milliseconds and calls @racket[fmpg-seek-ms!]. This is why the
low-level module exposes millisecond seeking while the frontend exposes
percentage seeking.
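
The two translation steps can be sketched as follows. Both helper names are
hypothetical, and the rounding shown here (rounding down) is an assumption;
the frontend and the adapter may round differently.

@racketblock[
;; Hypothetical helper: percentage of the stream to a sample-frame position.
(define (percentage->sample percentage total-samples)
  (inexact->exact (floor (* total-samples (/ percentage 100)))))

;; Hypothetical helper: sample-frame position to whole milliseconds.
(define (sample->ms sample sample-rate)
  (quotient (* sample 1000) sample-rate))

;; 50 percent of a 60 second stream at 44100 Hz:
(percentage->sample 50 2646000)                    ; => 1323000
(sample->ms (percentage->sample 50 2646000) 44100) ; => 30000
]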

@section{Examples}

The following example opens a file, decodes all PCM blocks, and reports their
byte ranges and sample ranges. A real playback loop would pass each buffer to
the audio output layer before requesting the next block.

@racketblock[
(define dec (fmpg-init))

(when (and dec (= (fmpg-open-file! dec "track.ogg") 1))
  (printf "~a Hz, ~a channels, ~a ms\n"
          (fmpg-audio-sample-rate dec)
          (fmpg-audio-channels dec)
          (fmpg-duration-ms dec))

  (let loop ()
    (case (fmpg-decode-next! dec)
      [(1)
       (define pcm (fmpg-buffer dec))
       (define size (fmpg-buffer-size dec))
       (define start (fmpg-buffer-start-sample dec))
       (define end (fmpg-buffer-end-sample dec))
       (printf "decoded ~a bytes, samples [~a, ~a)\n"
               size start end)
       ;; Pass pcm to the audio output layer here, or copy it if needed.
       (loop)]
      [(0)
       (printf "done\n")]
      [else
       (error "decode error")]))

  (fmpg-close! dec))
]

Decoding after a successful seek works the same way as before it. The
following code seeks to 30 seconds and then requests the next decoded buffer.

@racketblock[
(when (= (fmpg-seek-ms! dec 30000) 1)
  (when (= (fmpg-decode-next! dec) 1)
    (define pcm (fmpg-buffer dec))
    (define start (fmpg-buffer-start-sample dec))
    (printf "first buffer after seek starts at sample ~a\n" start)))
]