diff --git a/scrbl/ffmpeg-definitions.scrbl b/scrbl/ffmpeg-definitions.scrbl index e69de29..4d82ce9 100644 --- a/scrbl/ffmpeg-definitions.scrbl +++ b/scrbl/ffmpeg-definitions.scrbl @@ -0,0 +1,441 @@ +#lang scribble/manual + +@(require (for-label racket/base + ;racket/contract + racket/path + ffi/unsafe + let-assert + early-return + "../ffmpeg-definitions.rkt" + "../private/cstruct-helper.rkt")) + +@title[#:tag "ffmpeg-definitions"]{FFmpeg Decoder Definitions} +@author[@author+email["Hans Dijkema" "hans@dijkewijk.nl"]] + +@defmodule[racket-audio/ffmpeg-definitions] + +This module provides the direct FFmpeg-backed decoder layer used by the audio +pipeline. It is deliberately small and stateful. A caller creates one decoder +instance, opens one file on it, queries the selected audio stream, repeatedly +asks for the next PCM block, and closes the instance again. + +The module does not expose FFmpeg metadata. It only exposes the information +needed for playback: stream count, sample rate, channel count, duration, +bitrate, decoded PCM data, and sample positions. The output format is fixed: +interleaved signed 32-bit PCM, four bytes per sample, using FFmpeg's +@tt{AV_SAMPLE_FMT_S32} sample format. + +The FFmpeg libraries are loaded when the module is required. The module checks +that the runtime FFmpeg major versions are in the supported range configured by +the implementation. This binding targets the FFmpeg library major versions +used by FFmpeg 6, 7, and 8: @tt{libavutil} 58 to 60, @tt{libavcodec} 60 to 62, +@tt{libavformat} 60 to 62, and @tt{libswresample} 4 to 6. Unsupported runtime +versions fail early, before a decoder instance is used. + +On Windows, the private library loader may download the bundled sound-library +set into Racket's add-on directory before the FFI libraries are opened. 
On +Unix-like systems, the FFmpeg libraries are expected to be installed by the +operating system or platform package manager and to be reachable by Racket's +FFI library search path. + +@section{Layering} + +This module is the low-level Racket FFI layer. It is normally wrapped by +@filepath{ffmpeg-ffi.rkt} and then by @filepath{ffmpeg-decoder.rkt}. The first +wrapper adapts this module to the command protocol used by the audio decoder +frontend. The second wrapper exposes the callback-oriented decoder interface +used by the rest of the playback pipeline. + +The distinction matters for buffer lifetime. At this level, +@racket[fmpg-buffer] returns the current buffer owned by the decoder instance. +The adapter in @filepath{ffmpeg-ffi.rkt} copies that buffer before passing it to +@filepath{ffmpeg-decoder.rkt}. Code that uses this module directly must copy +the buffer itself when the bytes must survive the next decoder operation. + +@section{FFmpeg version information} + +@defproc[(ffmpeg-version [lib (or/c 'avutil 'avcodec 'avformat + 'swr 'swresample)]) + (list/c exact-nonnegative-integer? + exact-nonnegative-integer? + exact-nonnegative-integer?)]{ +Returns the runtime version of one FFmpeg library as a three-element list +containing the major, minor, and micro version numbers. The symbols +@racket['swr] and @racket['swresample] both refer to @tt{libswresample}. + +The version is read from FFmpeg's packed integer value. For example, a runtime +value corresponding to @tt{62.28.100} is returned as @racket['(62 28 100)]. +The function raises an exception for an unknown library symbol. +} + +The runtime versions determine which partial FFmpeg struct layouts are safe to +use. If a future FFmpeg major release changes a layout before one of the +fields read by this module, the supported range should be extended only after +the affected partial definitions have been checked. 
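+
+The packed integer that @racket[ffmpeg-version] decodes can be illustrated in
+plain Racket. The following is a minimal sketch, not the module's own code:
+@tt{pack-version} and @tt{unpack-version} are hypothetical helper names, and
+the bit layout assumes FFmpeg's usual @tt{AV_VERSION_INT} packing, with the
+major number shifted left by 16 bits, the minor number shifted left by 8 bits,
+and the micro number in the low byte.
+
+@racketblock[
+;; Hypothetical helpers; assumes (major << 16) | (minor << 8) | micro.
+(define (pack-version major minor micro)
+  (bitwise-ior (arithmetic-shift major 16)
+               (arithmetic-shift minor 8)
+               micro))
+
+;; Split a packed value back into (list major minor micro).
+(define (unpack-version packed)
+  (list (arithmetic-shift packed -16)
+        (bitwise-and (arithmetic-shift packed -8) 255)
+        (bitwise-and packed 255)))
+
+(unpack-version (pack-version 62 28 100))
+]
+
+Under that assumption the last expression produces @racket['(62 28 100)], the
+same shape of result that @racket[ffmpeg-version] returns.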
+ +@section{Implementation strategy} + +This module talks directly to the FFmpeg shared libraries through Racket's FFI. +There is no C shim that hides FFmpeg's structs or normalizes their layout. The +price of that choice is that the Racket side must know enough of the relevant C +struct layouts to read the fields used by the decoder. The benefit is that the +binding remains a Racket module with direct access to the platform FFmpeg +libraries. + +@subsection{C structs and offsets} + +Small and stable structures, such as @tt{AVRational} and +@tt{AVChannelLayout}, are described with @racket[define-cstruct]. A +@racket[define-cstruct] form describes the C fields to Racket's FFI. Racket +then calculates the correct field offsets for the current platform ABI and +creates the corresponding pointer type, constructor, accessors and mutators. + +The larger FFmpeg structures are handled by @racket[def-cstruct] from +@filepath{private/cstruct-helper.rkt}. Structures such as +@tt{AVCodecParameters}, @tt{AVStream}, @tt{AVFormatContext}, @tt{AVFrame} and +@tt{AVPacket} are large and may differ between FFmpeg major versions. The +decoder only needs a few fields from each one, but those fields must still be +read from their exact native offsets. + +The helper solves this by describing the complete field sequence up to the last +field the backend needs. Unnamed entries are used only to advance the offset. +Named entries become generated accessors. Repeated entries such as +@racket[(6 _int)] keep the definition compact while still allowing Racket's FFI +to compute alignment, padding and pointer size correctly. Tail fields after +the last required member are not described. + +The right layout is selected when the module is required, after the runtime +FFmpeg major versions have been read from the libraries. For the supported +range, @tt{_AVCodecParameters} uses one layout for @tt{libavcodec} major +version 60 and another for major versions 61 and 62. 
Likewise, +@tt{_AVFrame} uses one layout for @tt{libavutil} major version 58 and +another for major versions 59 and 60. The other partial structs used by this +module are defined with a single layout across the supported versions. + +@subsection{Defensive control flow} + +Most FFmpeg calls report ordinary failure through C-style return values or null +pointers. The implementation treats those results as normal control flow. The +@racket[let/assert] form is used for setup paths where each native result must +be checked before the next native call is made. It behaves like a sequential +binding form: each binding can be checked immediately, and a failed check +returns the specified failure value for the whole form. + +That style is used for opening a file, selecting stream information, allocating +the codec context, and initializing the resampler. Predicates such as +@tt{a-!nullptr?}, @tt{a-nullptr?}, @tt{a-true?}, and @tt{a->=?} express the +usual FFmpeg checks directly next to the binding that produced the value. + +The decode and seek paths also use @racket[early-return] where processing must +stop immediately from a nested position. This keeps the normal FFmpeg outcomes +away from exception-based control flow while still making cleanup actions local +to the point where a failure can occur. + +@section{Decoder instances} + +A decoder instance is an opaque value returned by @racket[fmpg-init]. Its +structure type and predicate are not exported. Pass the value back to the +functions in this module and do not inspect it directly. The contracts below +therefore use @racket[any/c] for the instance argument. Operationally, that +argument must be a value returned by @racket[fmpg-init]. + +The instance owns native FFmpeg resources: a format context, a codec context, +an audio frame, a resampler, and the Racket byte string used for the current +PCM block. Finalizers are installed as a last line of defence, but callers +should still call @racket[fmpg-close!] 
explicitly when playback stops or when +the file is no longer needed. Explicit close keeps the lifetime of native +resources predictable. + +@defproc[(fmpg-init) any/c]{ +Creates a new decoder instance. The result is an opaque instance value, or +@racket[#f] if the instance could not be created. + +Creating the instance does not open a file. Use @racket[fmpg-open-file!] +before querying stream information or decoding audio. +} + +@defproc[(fmpg-open-file! [instance any/c] + [filename (or/c path? string?)]) + (integer-in 0 1)]{ +Opens @racket[filename] on @racket[instance], reads the stream information, +selects the best audio stream, initializes the codec context, and initializes +the resampler. + +The function returns @racket[1] on success and @racket[0] on failure. On +failure, partially initialized native state is closed again. A non-string, +non-path filename is treated as an open failure and returns @racket[0]. + +An instance can only have one file open. Close it with @racket[fmpg-close!] +before opening another file on the same instance. +} + +@defproc[(fmpg-close! [instance any/c]) void?]{ +Closes @racket[instance] if it is open and releases the native FFmpeg resources +owned by the instance. The codec context, frame and resampler are freed before +the format context is closed. This order avoids keeping decoder pointers that +refer to streams from an already closed container. + +The stored audio information is reset. Calling this function with @racket[#f] +or with an already closed instance is harmless. +} + +@defproc[(fmpg-is-open [instance any/c]) (integer-in 0 1)]{ +Returns @racket[1] when @racket[instance] is ready for decoding and @racket[0] +otherwise. An instance is ready only after a file has been opened, a usable +audio stream has been selected, and the decoder and resampler have been +initialized. +} + +@section{Audio stream information} + +The decoder selects one audio stream for playback using FFmpeg's best-stream +selection. 
The stream count reports how many audio streams were found in the +container, but decoding is performed only for the selected stream. + +The term @italic{sample} in this module means a sample frame: one time step in +the audio stream, across all channels. For stereo 32-bit output, one sample +frame therefore occupies @racket[(* 2 4)] bytes in the returned PCM buffer. + +@defproc[(fmpg-audio-stream-count [instance any/c]) + exact-nonnegative-integer?]{ +Returns the number of audio streams in the open container. If the instance is +not open, the result is @racket[0]. This count is informational; actual stream +selection is performed during @racket[fmpg-open-file!]. +} + +@deftogether[ +(@defproc[(fmpg-audio-sample-rate [instance any/c]) + exact-nonnegative-integer?] + @defproc[(fmpg-audio-channels [instance any/c]) + exact-nonnegative-integer?])]{ +Return the sample rate and channel count of the selected audio stream. If the +instance is not ready, both functions return @racket[0]. +} + +@deftogether[ +(@defproc[(fmpg-audio-bits-per-sample [instance any/c]) + exact-positive-integer?] + @defproc[(fmpg-audio-bytes-per-sample [instance any/c]) + exact-positive-integer?])]{ +Return the fixed output sample width in bits and bytes. The current output +format is 32-bit signed PCM, so @racket[fmpg-audio-bits-per-sample] returns +@racket[32] and @racket[fmpg-audio-bytes-per-sample] returns @racket[4]. The +values are independent of the input file's original sample format and do not +depend on the instance state. +} + +@deftogether[ +(@defproc[(fmpg-duration-ms [instance any/c]) exact-integer?] + @defproc[(fmpg-duration-samples [instance any/c]) exact-integer?])]{ +Return the duration of the selected audio stream in milliseconds and in sample +frames. If the stream duration is not available, the container duration is +used as a fallback. If no duration can be determined, or when the instance is +not ready, the result is @racket[-1]. 
+} + +@defproc[(fmpg-file-bitrate [instance any/c]) exact-integer?]{ +Returns the container bitrate in bits per second. If the bitrate is unavailable +or if the instance is not open, the result is @racket[-1]. Only positive +bitrates reported by FFmpeg are considered reliable and passed through. +} + +@section{Output format} + +The decoder output format is intentionally fixed: + +@itemlist[ + #:style 'compact + @item{sample format: signed 32-bit PCM, @tt{AV_SAMPLE_FMT_S32}} + @item{layout: interleaved} + @item{sample rate: the selected stream's sample rate} + @item{channels: the selected stream's channel count} +] + +This keeps the playback layer simple. The FFmpeg input format may be planar, +floating point, compressed, or otherwise different; @tt{libswresample} converts +the decoded frames to the fixed output format before the bytes are exposed to +Racket. + +@section{Decoding} + +Decoding is block oriented. Each call to @racket[fmpg-decode-next!] clears the +previous PCM block and attempts to produce the next decoded block for the +selected audio stream. When the call returns @racket[1], the block can be read +with @racket[fmpg-buffer] and described with the buffer query functions. + +@defproc[(fmpg-decode-next! [instance any/c]) exact-integer?]{ +Decodes until a block of PCM output is available, end of stream is reached, or +an error occurs. The return values are: + +@itemlist[ + #:style 'compact + @item{@racket[1]: a new PCM buffer is available through @racket[fmpg-buffer].} + @item{@racket[0]: decoding is complete and no more PCM is available.} + @item{A negative value: decoding failed or the instance was not ready.} +] + +Internally, the decoder first tries to receive frames that FFmpeg may already +have buffered. If no frame is ready, it reads packets until it finds a packet +for the selected audio stream. Packets from other streams are skipped and +immediately unreferenced. 
Sent packets are unreferenced after +@tt{avcodec_send_packet}, because the codec has then taken what it needs. + +At end of input, the function drains both the codec and the resampler. This is +necessary because FFmpeg and @tt{libswresample} may still hold delayed samples +even after the demuxer has no more packets. +} + +@section{Decoded buffers} + +The PCM buffer belongs to the decoder instance. It is replaced by the next +call to @racket[fmpg-decode-next!], @racket[fmpg-seek-ms!], or +@racket[fmpg-close!]. Treat the returned byte string as read-only. Copy it if +it must outlive the next decoder operation or if another component may mutate +it. + +@defproc[(fmpg-buffer [instance any/c]) (or/c bytes? #f)]{ +Returns the current decoded PCM block as a byte string, or @racket[#f] when no +PCM block is available. + +The byte string contains interleaved signed 32-bit samples. Its logical frame +count is available as the difference between @racket[fmpg-buffer-end-sample] +and @racket[fmpg-buffer-start-sample]. Its byte size is also available through +@racket[fmpg-buffer-size]. +} + +@defproc[(fmpg-buffer-size [instance any/c]) exact-nonnegative-integer?]{ +Returns the number of valid bytes in the current PCM buffer. If no decoder +state is available, or if the size would not fit in the internal integer range, +the function returns @racket[0]. +} + +@deftogether[ +(@defproc[(fmpg-buffer-start-sample [instance any/c]) + exact-nonnegative-integer?] + @defproc[(fmpg-buffer-end-sample [instance any/c]) + exact-nonnegative-integer?] + @defproc[(fmpg-sample-position [instance any/c]) + exact-nonnegative-integer?])]{ +Return sample-frame positions for the current decoder state. + +@racket[fmpg-buffer-start-sample] returns the first sample frame represented by +the current PCM buffer. @racket[fmpg-buffer-end-sample] returns the half-open +end position: the first sample frame after the current buffer. 
+@racket[fmpg-sample-position] returns the next sample position the decoder +expects to produce. + +These values count sample frames, not individual channel samples. For stereo +audio, one sample frame contains one sample for the left channel and one sample +for the right channel. +} + +@section{Seeking} + +@defproc[(fmpg-seek-ms! [instance any/c] + [target-pos-ms exact-nonnegative-integer?]) + (integer-in 0 1)]{ +Seeks the selected audio stream to @racket[target-pos-ms] milliseconds and +resets the decoder and resampler state. The function returns @racket[1] on +success and @racket[0] on failure. Seeking is allowed only when the instance +is already ready for decoding and the target position is non-negative. + +Seeking uses FFmpeg's backward seek flag. FFmpeg may therefore seek to a packet +position before the requested target. The decoder stores a discard target in +sample frames. During the following decode calls, frames before the target are +dropped, and frames that overlap the target are trimmed so the exposed PCM +buffer starts at, or as close as FFmpeg can provide to, the requested position. + +After a successful seek, the codec buffers are flushed, the resampler is closed +and reinitialized, EOF state is cleared, and sample bookkeeping is reset to the +target position. +} + +@section{Resource ownership} + +The decoder instance owns the native FFmpeg objects it allocates. The codec +pointer returned by FFmpeg is not owned by the instance, but the codec context, +frame, resampler and format context are. They are released by +@racket[fmpg-close!]. Finalizers are registered as a safety net, but callers +should close decoder instances explicitly. + +Temporary native buffers used during resampling are allocated only for the +duration of a conversion step and are always freed before control returns to the +caller. The public PCM buffer is a Racket byte string, so it can safely be +passed to the Racket-side playback backend. 
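+
+One way to make the explicit close hard to forget is to pair it with
+@racket[dynamic-wind]. The following is a sketch only:
+@tt{call-with-decoder} is a hypothetical helper that this module does not
+provide, and it assumes that calling @racket[fmpg-close!] on an instance that
+was never opened is as harmless as calling it on an already closed one.
+
+@racketblock[
+;; Hypothetical helper, not part of this module.
+(define (call-with-decoder filename proc)
+  (define dec (fmpg-init))
+  (and dec
+       (dynamic-wind
+        void
+        (lambda ()
+          ;; Run proc only when the file opened successfully.
+          (and (= (fmpg-open-file! dec filename) 1)
+               (proc dec)))
+        ;; Runs on normal return and when proc raises an exception.
+        (lambda () (fmpg-close! dec)))))
+
+(call-with-decoder "track.ogg"
+                   (lambda (dec) (fmpg-duration-ms dec)))
+]
+
+The helper returns @racket[#f] when the instance cannot be created or the file
+cannot be opened, and otherwise the result of @tt{proc}.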
+ +@section{Use through the decoder frontend} + +The direct API above is normally wrapped by @filepath{ffmpeg-ffi.rkt} and by +@filepath{ffmpeg-decoder.rkt}. The frontend function @tt{ffmpeg-open} returns +a handle, or @racket[#f] when the file does not exist. Its stream-info callback +receives a mutable hash with at least these playback keys: + +@racketblock[ +(list 'sample-rate + 'channels + 'bits-per-sample + 'bytes-per-sample + 'total-samples + 'duration)] + +The audio callback receives the same hash extended for the current buffer with +these keys: + +@racketblock[ +(list 'sample + 'current-time)] + +The hash is followed by a copied byte string and its valid byte count. The +copy is made by @filepath{ffmpeg-ffi.rkt}, not by the low-level buffer function +itself. + +The frontend's seek function accepts a percentage of the stream and translates +that percentage to a sample position. The adapter then translates the sample +position to milliseconds and calls @racket[fmpg-seek-ms!]. This is why the +low-level module exposes millisecond seeking while the frontend exposes +percentage seeking. + +@section{Examples} + +The following example opens a file, decodes all PCM blocks, and reports their +byte sizes and sample ranges. A real playback loop would pass each buffer to +the audio output layer before requesting the next block. + +@racketblock[ +(define dec (fmpg-init)) + +(when (and dec (= (fmpg-open-file! dec "track.ogg") 1)) + (printf "~a Hz, ~a channels, ~a ms\n" + (fmpg-audio-sample-rate dec) + (fmpg-audio-channels dec) + (fmpg-duration-ms dec)) + + (let loop () + (case (fmpg-decode-next! dec) + [(1) + (define pcm (fmpg-buffer dec)) + (define size (fmpg-buffer-size dec)) + (define start (fmpg-buffer-start-sample dec)) + (define end (fmpg-buffer-end-sample dec)) + (printf "decoded ~a bytes, samples [~a, ~a)\n" + size start end) + ;; Pass pcm to the audio output layer here, or copy it if needed. 
+ (loop)] + [(0) + (printf "done\n")] + [else + ;; Report the failure and fall through so the decoder is still closed. + (eprintf "decode error\n")])) + + (fmpg-close! dec)) +] + +Decoding after a successful seek works the same way. The following code +assumes that @racket[dec] is still open; it moves to 30 seconds and then +requests the next decoded buffer. + +@racketblock[ +(when (= (fmpg-seek-ms! dec 30000) 1) + (when (= (fmpg-decode-next! dec) 1) + (define pcm (fmpg-buffer dec)) + (define start (fmpg-buffer-start-sample dec)) + (printf "first buffer after seek starts at sample ~a\n" start))) +]
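+
+The bytes of a decoded block can be read back as integers with
+@racket[integer-bytes->integer]. The sketch below uses a synthetic byte string
+so that it runs without FFmpeg. @tt{pcm-frame} and @tt{fake-pcm} are
+hypothetical names, and the code assumes that the samples are stored in the
+machine's native byte order, which is checked with
+@racket[system-big-endian?].
+
+@racketblock[
+;; Hypothetical helper: the samples of one frame, one integer per channel.
+;; Assumes interleaved signed 32-bit samples in native byte order.
+(define (pcm-frame pcm frame-index channels)
+  (define bytes-per-sample 4)
+  (define frame-bytes (* channels bytes-per-sample))
+  (for/list ([ch (in-range channels)])
+    (define start (+ (* frame-index frame-bytes)
+                     (* ch bytes-per-sample)))
+    (integer-bytes->integer pcm #t (system-big-endian?)
+                            start (+ start bytes-per-sample))))
+
+;; One synthetic stereo frame: left = 1000, right = -1000.
+(define fake-pcm
+  (bytes-append
+   (integer->integer-bytes 1000 4 #t (system-big-endian?))
+   (integer->integer-bytes -1000 4 #t (system-big-endian?))))
+
+(pcm-frame fake-pcm 0 2)
+]
+
+The last expression produces @racket['(1000 -1000)]. With a real block, pass
+the result of @racket[fmpg-buffer] and @racket[fmpg-audio-channels], and keep
+the frame index below the frame count given by the difference between
+@racket[fmpg-buffer-end-sample] and @racket[fmpg-buffer-start-sample].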