racket-audio/scrbl/ffmpeg-c-backend.scrbl

#lang scribble/manual

@title{FFmpeg Audio Backend}
@author{@author+email["Hans Dijkema" "hans@dijkewijk.nl"]}

@section{Overview}

The FFmpeg audio backend is a small C++ wrapper with a plain C ABI. It hides
the FFmpeg data structures from the caller and exposes a simple
audio-only decoder interface.

The caller does not handle FFmpeg streams, packets, frames, codec
contexts or resampler objects. A file is opened, the best audio stream is
selected, and decoding is performed by repeatedly calling
@tt{fmpg_decode_next}.

The output format is fixed: signed 32-bit integer PCM, interleaved, in
native endian format.

A sample frame means one sample moment across all channels. For stereo
S32, one sample frame contains two @tt{int32_t} values and therefore
takes 8 bytes.

@section{Opaque Instance}

@verbatim|{
typedef struct fmpg_instance fmpg_instance;
}|

The decoder instance is opaque. The caller only receives and passes
around a pointer to this type. All FFmpeg state is stored internally.

@section{Lifecycle}

@verbatim|{
fmpg_instance *fmpg_init(void);
}|

Creates a new decoder instance.

Before allocating the instance, the backend checks whether the FFmpeg major
versions used at compile time match the FFmpeg major versions available
at runtime. If they do not match, @tt{NULL} is returned.

Returns a pointer to a new @tt{fmpg_instance}, or @tt{NULL} on failure.

@verbatim|{
void fmpg_free(fmpg_instance *instance);
}|

Frees the decoder instance. If the instance still has an open input, it
is closed as part of destruction.

@verbatim|{
int fmpg_open_file(fmpg_instance *instance, const char *filename);
}|

Opens a media file, selects the best audio stream, initializes the
decoder and initializes the resampler.

After a successful call, stream information, duration and metadata can be
read using the getter functions.

Returns @tt{1} on success and @tt{0} on failure. The call fails if the
instance is @tt{NULL}, if a file is already open, if @tt{filename} is
@tt{NULL}, if no usable audio stream is found, or if FFmpeg cannot open
or initialize the file.

@verbatim|{
void fmpg_close(fmpg_instance *instance);
}|

Closes the current file and releases all FFmpeg state owned by the
instance. The instance itself remains valid and may be reused.

@verbatim|{
int fmpg_is_open(fmpg_instance *instance);
}|

Returns @tt{1} if the instance is open and ready to decode. Otherwise
returns @tt{0}.

@section{Audio Information}

@verbatim|{
int fmpg_audio_stream_count(fmpg_instance *instance);
int fmpg_audio_sample_rate(fmpg_instance *instance);
int fmpg_audio_channels(fmpg_instance *instance);
int fmpg_audio_bits_per_sample(fmpg_instance *instance);
int fmpg_audio_bytes_per_sample(fmpg_instance *instance);
int64_t fmpg_duration_ms(fmpg_instance *instance);
int64_t fmpg_duration_samples(fmpg_instance *instance);
}|

These functions return information about the selected audio stream.

@itemlist[
  #:style 'compact
  @item{@tt{fmpg_audio_stream_count} returns the number of audio streams found in the opened file, or @tt{0}.}
  @item{@tt{fmpg_audio_sample_rate} returns the selected stream's sample rate, or @tt{0}.}
  @item{@tt{fmpg_audio_channels} returns the selected stream's channel count, or @tt{0}.}
  @item{@tt{fmpg_audio_bits_per_sample} always returns @tt{32}.}
  @item{@tt{fmpg_audio_bytes_per_sample} always returns @tt{4}.}
  @item{@tt{fmpg_duration_ms} returns the duration in milliseconds, or @tt{-1}.}
  @item{@tt{fmpg_duration_samples} returns the duration in output sample frames, or @tt{-1}.}
]

@section{Metadata}

@verbatim|{
const char *fmpg_file_title(fmpg_instance *instance);
const char *fmpg_file_author(fmpg_instance *instance);
const char *fmpg_file_album(fmpg_instance *instance);
const char *fmpg_file_genre(fmpg_instance *instance);
const char *fmpg_file_comment(fmpg_instance *instance);
const char *fmpg_file_copyright(fmpg_instance *instance);
int fmpg_file_year(fmpg_instance *instance);
int fmpg_file_track(fmpg_instance *instance);
int64_t fmpg_file_bitrate(fmpg_instance *instance);
}|

The metadata getters return values read from the container metadata. A
missing string value is returned as an empty string. A missing numeric
value is returned as @tt{-1}. @tt{fmpg_file_author} returns the
@tt{artist} metadata field.

@section{Decoding}

@verbatim|{
int fmpg_decode_next(fmpg_instance *instance);
}|

Decodes the next block of audio.

Internally, the backend reads packets from the selected audio stream, feeds
them to the FFmpeg decoder, receives all available decoded frames,
converts them to signed 32-bit interleaved PCM, and concatenates the
result in the instance output buffer.

Packets from non-selected streams are skipped internally.

Returns @tt{1} if decoded PCM data is available through
@tt{fmpg_buffer} and @tt{fmpg_buffer_size}. Returns @tt{0} at EOF or on
error.

@verbatim|{
int fmpg_seek_ms(fmpg_instance *instance, int64_t target_pos_ms);
}|

Seeks to an absolute position in milliseconds.

FFmpeg may seek to a packet before the requested timestamp. After
seeking, this backend discards decoded pre-roll samples until the requested
output sample position is reached, when timestamps are available.

Returns @tt{1} on success and @tt{0} on failure.

@section{Output Buffer and Sample Positions}

@verbatim|{
const uint8_t *fmpg_buffer(fmpg_instance *instance);
int fmpg_buffer_size(fmpg_instance *instance);
int64_t fmpg_buffer_samples(fmpg_instance *instance);
int64_t fmpg_buffer_start_sample(fmpg_instance *instance);
int64_t fmpg_buffer_end_sample(fmpg_instance *instance);
int64_t fmpg_sample_position(fmpg_instance *instance);
double fmpg_timecode(fmpg_instance *instance);
}|

@tt{fmpg_buffer} returns a pointer to the current decoded PCM buffer, or
@tt{NULL} if there is no current buffer. The pointer remains valid only
until the next API call that decodes, seeks, closes or frees the
instance.

@tt{fmpg_buffer_size} returns the size of the current buffer in bytes.
@tt{fmpg_buffer_samples} returns the number of sample frames in the
current buffer. @tt{fmpg_buffer_start_sample} returns the absolute
sample-frame index of the first sample frame in the buffer, and
@tt{fmpg_buffer_end_sample} returns the absolute sample-frame index just
after the current buffer.

@tt{fmpg_sample_position} returns the current absolute sample position in
the music stream. After a successful @tt{fmpg_decode_next}, this is the
same value as @tt{fmpg_buffer_end_sample}.

@tt{fmpg_timecode} returns the approximate start time of the current
decoded block in seconds.

@section{FFmpeg Version Checks}

@verbatim|{
const char *fmpg_ffmpeg_version(void);
const char *fmpg_int_version2string(unsigned version);
int fmpg_compatible_ffmpeg(void);
}|

@tt{fmpg_ffmpeg_version} returns a string describing the FFmpeg versions
used when the backend was compiled. The string includes avformat, avcodec,
swresample and avutil.

@tt{fmpg_int_version2string} converts an FFmpeg integer version value to
a string of the form @tt{major.minor.micro}.

@tt{fmpg_compatible_ffmpeg} checks whether the FFmpeg major versions used
at compile time match the FFmpeg major versions available at runtime.
It returns @tt{1} when the versions are compatible and @tt{0} otherwise.

@section{Decoder Model}

The backend uses the modern FFmpeg send/receive decoding model. Packets are
sent with @tt{avcodec_send_packet}, decoded frames are received with
@tt{avcodec_receive_frame}, and conversion to the fixed output format is
done with libswresample.

The public API intentionally avoids exposing these details. From the
caller perspective, decoding is a sequence of calls to
@tt{fmpg_decode_next} followed by reading the current output buffer and
its sample-position metadata.