#lang scribble/manual
@(require (for-label racket/base
                     racket/contract
                     racket/path
                     "../audio-encoder.rkt"))
@title{Audio Encoding}
@author[@author+email["Hans Dijkema" "hans@dijkewijk.nl"]]
@defmodule[racket-audio/audio-encoder]
The @racketmodname[racket-audio/audio-encoder] module provides the high-level
file-to-file encoding pipeline. It reuses the existing decoder environment to
read the input file and sends the decoded PCM stream to a selected encoder
backend. The built-in backends are Opus, implemented with @tt{libopusenc}, and
FLAC, implemented with @tt{libFLAC}.

This module is intended as the public encoding API. The concrete backend
modules are thin FFI wrappers; applications normally call @racket[audio-encode]
instead of using those modules directly.
@section{Pipeline}
Encoding is organised as a streaming pipeline:
@racketblock[
input file
  (code:comment "decoded by audio-decoder.rkt")
  -> PCM buffers
  (code:comment "optional conversion for FLAC")
  -> encoder backend
  -> output file]
The encoder is selected from @racket[#:encoder] or, when that argument is not
provided, from the output filename extension. The initial built-in encoders are
@racket['opus] for @filepath{.opus} and @filepath{.oga} files, and
@racket['flac] for @filepath{.flac} files.
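
The selection rule can be sketched as a small lookup. This is an illustration
under stated assumptions, not the module's implementation: the names
@tt{extension->encoder} and @tt{infer-encoder} are hypothetical, and
case-insensitive matching of the extension is an assumption.

@codeblock|{
#lang racket/base
(require racket/path)

;; Hypothetical table mirroring the built-in mapping described above.
(define extension->encoder
  (hash "opus" 'opus
        "oga"  'opus
        "flac" 'flac))

;; Returns the encoder symbol for output-file, or #f when the extension
;; is missing or unknown.
(define (infer-encoder output-file)
  ;; path-get-extension returns bytes such as #".opus", or #f.
  (define ext (path-get-extension output-file))
  (and ext
       (hash-ref extension->encoder
                 ;; ASSUMPTION: matching ignores case.
                 (string-downcase
                  (bytes->string/utf-8 (subbytes ext 1) #\?))
                 #f)))
}|

A caller could fall back to an explicit @racket[#:encoder] argument whenever
such a lookup yields @racket[#f].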
The PCM stream is not collected in memory. Each decoded buffer is forwarded to
the selected backend. FLAC encoding may insert a PCM conversion step when the
settings request a different sample rate, channel count, or bit depth. Opus
encoding feeds floating-point PCM to @tt{libopusenc}; sample-rate conversion for
Opus is left to @tt{libopusenc}.
@section{Encoding a file}
@defproc[(audio-encode [input-file path-string?]
                       [output-file path-string?]
                       [settings hash?]
                       [#:encoder encoder (or/c symbol? #f) #f]
                       [#:copy-tags? copy-tags? boolean? #t]
                       [#:progress-callback progress-callback
                                            (or/c procedure? #f) #f])
         hash?]{
Encodes @racket[input-file] to @racket[output-file] and returns a result hash.
The @racket[settings] hash is interpreted by the selected backend.

When @racket[encoder] is @racket[#f], the backend is inferred from the output
file extension. Pass @racket['opus] or @racket['flac] to force a backend.

When @racket[copy-tags?] is true, common textual tags and an embedded picture
are copied from the source file to the destination file. Opus comments and
cover art are written before encoding starts through @tt{libopusenc}. FLAC
metadata is copied after the encoded file has been written, using the
read-write API from @racketmodname[racket-audio/taglib].

When @racket[progress-callback] is a procedure, it is called with a progress
hash during encoding. Progress is based on the number of input frames read from
the decoder, not on the number of frames written by the encoder. This matters
for resampling, because output frame counts can differ from input frame counts.}
@racketblock[
(audio-encode "input.flac"
              "output.opus"
              (hash 'bitrate 224000
                    'vbr? #t
                    'complexity 10)
              #:encoder 'opus)

(audio-encode "input-96k.flac"
              "output-48k.flac"
              (hash 'sample-rate 48000
                    'bits-per-sample 24
                    'compression-level 8)
              #:encoder 'flac)]
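
To make the remark about progress and resampling concrete, the following sketch
estimates how many frames a resampled stream contains. The helper
@tt{expected-output-frames} is hypothetical and only does the rate arithmetic;
a real resampler may add or drop a few frames for filter delay, so the result
is an approximation.

@codeblock|{
#lang racket/base

;; Estimates the frame count after resampling in-frames frames
;; from in-rate Hz to out-rate Hz (exact rational arithmetic).
(define (expected-output-frames in-frames in-rate out-rate)
  (round (* in-frames (/ out-rate in-rate))))

;; Ten seconds of 96 kHz input resampled to 48 kHz.
(expected-output-frames 960000 96000 48000) ; => 480000
}|

Ten seconds of 96 kHz input is 960000 input frames but only about 480000
output frames, which is why progress is reported against the frames read
rather than the frames written.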
@section{Result hash}
The result hash contains the following keys:
@itemlist[#:style 'compact
@item{@racket['encoder], the selected backend symbol;}
@item{@racket['input] and @racket['output], the source and destination paths;}
@item{@racket['input-format], the final decoded input format hash seen by the
pipeline;}
@item{@racket['output-format], the resolved backend output format hash;}
@item{@racket['frames-read], the number of input frames consumed;}
@item{@racket['frames-written], the number of frames accepted by the backend;}
@item{@racket['tag-copy], a hash describing how metadata was handled.}]
The @racket['tag-copy] hash contains a @racket['method] key. For Opus the
method is @racket['libopusenc-comments], because metadata must be supplied to
@tt{libopusenc} before the encoder writes the OpusTags packet. For FLAC the
method is @racket['taglib-post-copy], because the encoded file is tagged after
encoding.
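
As an illustration of reading the result hash, the sketch below formats a
one-line summary. The helper @tt{summarize-result} is hypothetical and relies
only on the keys listed above; the @tt{sample} hash uses made-up values and
merely stands in for a value returned by @racket[audio-encode].

@codeblock|{
#lang racket/base

;; Formats a one-line summary from a result hash with the keys listed above.
(define (summarize-result r)
  (define tag-copy (hash-ref r 'tag-copy (hash)))
  (format "~a: ~a frames read, ~a written, tags via ~a"
          (hash-ref r 'encoder)
          (hash-ref r 'frames-read)
          (hash-ref r 'frames-written)
          (hash-ref tag-copy 'method #f)))

;; Sample hash with made-up values, standing in for an audio-encode result.
(define sample
  (hash 'encoder 'opus
        'frames-read 441000
        'frames-written 480000
        'tag-copy (hash 'method 'libopusenc-comments)))

(summarize-result sample)
}|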
@section{Progress callback}
The progress callback receives a hash with at least these keys:
@itemlist[#:style 'compact
@item{@racket['phase], such as @racket['format], @racket['audio],
@racket['finished-encoding], or @racket['finished];}
@item{@racket['frames-read] and @racket['frames-written];}
@item{@racket['total-frames], when the decoder reported a known input length;}
@item{@racket['progress], a number between @racket[0.0] and @racket[1.0] when
@racket['total-frames] is known, otherwise @racket[#f];}
@item{@racket['input-format] and, after the backend has opened,
@racket['output-format].}]
A simple command-line style progress callback can print a percentage on one
line:
@racketblock[
(define (show-progress h)
  (let ([p (hash-ref h 'progress #f)])
    (when (number? p)
      (printf "\rprogress: ~a%" (inexact->exact (round (* 100 p))))
      (flush-output))))]
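
When the input length is unknown, @racket['progress] is @racket[#f], so a
callback may want a fallback. The following sketch builds the text for either
case; @tt{progress-line} is a hypothetical helper that uses only the keys
listed above, and the wording of the output is an arbitrary choice.

@codeblock|{
#lang racket/base

;; Builds a status string from a progress hash.
;; Falls back to the raw frame count when no percentage is available.
(define (progress-line h)
  (define p (hash-ref h 'progress #f))
  (define phase (hash-ref h 'phase 'unknown))
  (if (real? p)
      (format "~a: ~a%" phase (inexact->exact (round (* 100 p))))
      (format "~a: ~a frames read" phase (hash-ref h 'frames-read 0))))

;; A callback suitable for #:progress-callback could print the line.
(define (show-progress-line h)
  (printf "\r~a" (progress-line h))
  (flush-output))
}|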
@section{Opus settings}
The Opus backend uses @tt{libopusenc}. The input PCM is converted to interleaved
floating-point samples in the range @racket[-1.0] to @racket[1.0] and written
with @tt{ope_encoder_write_float}. The source sample rate is passed to
@tt{libopusenc}; @tt{libopusenc} performs the required internal resampling for
Opus output.
The following settings are recognised:
@itemlist[#:style 'compact
@item{@racket['bitrate], bitrate in bits per second. The default is
@racket[160000].}
@item{@racket['vbr?], whether variable bitrate is enabled. The default is
@racket[#t].}
@item{@racket['constrained-vbr?], whether constrained VBR is enabled. The
default is @racket[#f].}
@item{@racket['complexity], encoder complexity. The default is @racket[10].}
@item{@racket['comment-padding], Opus comment padding in bytes. The default
is @racket[512].}
@item{@racket['signal], optionally @racket['auto], @racket['voice], or
@racket['music].}
@item{@racket['lsb-depth], optionally passed to the encoder as the source
least significant bit depth.}
@item{@racket['comments], an optional hash of Opus comment strings. When
@racket[#:copy-tags?] is true, @racket[audio-encode] fills this from the
source tags.}
@item{@racket['picture], an optional picture value from @racketmodname[racket-audio/taglib].
When @racket[#:copy-tags?] is true, @racket[audio-encode] fills this
from the source tags.}]
The first backend version supports mono and stereo input.
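
To show what an effective settings hash looks like, the sketch below merges a
partial hash over the documented defaults. The names @tt{opus-defaults} and
@tt{with-opus-defaults} are hypothetical; the backend resolves its own
defaults, so this is only an illustration of the values listed above.

@codeblock|{
#lang racket/base

;; Defaults copied from the list above.
(define opus-defaults
  (hash 'bitrate 160000
        'vbr? #t
        'constrained-vbr? #f
        'complexity 10
        'comment-padding 512))

;; Returns settings with every missing default filled in.
;; Expects an immutable hash, such as one built with `hash`.
(define (with-opus-defaults settings)
  (for/fold ([acc settings])
            ([(k v) (in-hash opus-defaults)])
    (if (hash-has-key? acc k) acc (hash-set acc k v))))

(with-opus-defaults (hash 'bitrate 96000 'signal 'voice))
}|

Keys supplied by the caller, such as @racket['bitrate] and @racket['signal]
here, are kept unchanged; only absent keys receive a default.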
@section{FLAC settings}
The FLAC backend uses the @tt{libFLAC} stream encoder. It writes interleaved
integer PCM samples through the FLAC encoder API. When the requested output
format differs from the decoded input format, @racketmodname[racket-audio/private/pcm-converter]
uses the existing FFmpeg @tt{swresample} layer from
@racketmodname[racket-audio/ffmpeg-definitions] to perform PCM normalisation.
The following settings are recognised:
@itemlist[#:style 'compact
@item{@racket['compression-level], FLAC compression level. The default is
@racket[5].}
@item{@racket['verify?], whether the FLAC encoder verifies encoded output. The
default is @racket[#f].}
@item{@racket['blocksize], explicit FLAC block size. The default is
@racket[0], meaning the library default.}
@item{@racket['sample-rate] or @racket['target-sample-rate], target sample rate
in Hz. Use @racket['source] or omit the key to keep the source rate.}
@item{@racket['channels] or @racket['target-channels], target channel count.
Use @racket['source] or omit the key to keep the source channel count.}
@item{@racket['bits-per-sample] or @racket['target-bits-per-sample], target
bit depth. Use @racket['source] or omit the key to keep the source bit
depth.}]
For example, a 24-bit 96 kHz FLAC file can be transcoded to 24-bit 48 kHz FLAC
with:
@racketblock[
(audio-encode "input-96k.flac"
              "output-48k.flac"
              (hash 'sample-rate 48000
                    'bits-per-sample 24
                    'compression-level 8)
              #:encoder 'flac)]
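
The handling of the paired keys and of @racket['source] can be sketched as
follows. The helper @tt{resolve-target} is hypothetical, and giving the short
key precedence over the @tt{target-} spelling when both are present is an
assumption, not documented behaviour.

@codeblock|{
#lang racket/base

;; Looks up key, then alias; 'source or a missing key keeps source-value.
;; ASSUMPTION: the short key wins when both spellings are present.
(define (resolve-target settings key alias source-value)
  (define v (hash-ref settings key
                      (lambda () (hash-ref settings alias 'source))))
  (if (eq? v 'source) source-value v))

(define example-settings
  (hash 'sample-rate 48000 'target-channels 'source))

(resolve-target example-settings 'sample-rate 'target-sample-rate 96000)      ; => 48000
(resolve-target example-settings 'channels 'target-channels 2)                ; => 2
(resolve-target example-settings 'bits-per-sample 'target-bits-per-sample 24) ; => 24
}|

Only the sample rate differs from the source in this example, so a conversion
step would be needed for the rate alone.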
@section{Encoder registration}
@defproc[(audio-supported-encoder-extensions) (listof string?)]{
Returns the extensions supported by the currently registered encoders. The
initial list includes @racket["flac"], @racket["opus"], and @racket["oga"].}
@defproc[(make-audio-encoder [exts (listof string?)]
                             [open procedure?]
                             [write procedure?]
                             [finish procedure?]
                             [settings procedure?])
         audio-encoder?]{
Creates an encoder descriptor. The descriptor is used by
@racket[audio-register-encoder!] to register a backend.
The @racket[open] procedure receives the output file, settings hash, and input
format hash. The @racket[write] procedure receives the backend handle, buffer
format hash, byte buffer, and byte length, and returns the number of frames
accepted by the backend. The @racket[finish] procedure finalises and releases
the backend handle. The @racket[settings] procedure resolves backend defaults
against the input format and returns the output format hash.}
@defproc[(audio-encoder? [v any/c]) boolean?]{
Returns @racket[#t] when @racket[v] is an encoder descriptor.}
@defproc[(audio-register-encoder! [type symbol?]
                                  [encoder audio-encoder?])
         void?]{
Registers @racket[encoder] under @racket[type]. The encoder's extensions are
used for extension-based selection in @racket[audio-encode].}
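
As a sketch of the four procedures a backend supplies, the following defines a
hypothetical backend that dumps raw PCM bytes to a file. Everything named
@tt{raw-...} is invented for illustration. Two points are assumptions: that the
format hash carries @racket['channels] and @racket['bits-per-sample] keys, and
that the @racket[settings] procedure receives the settings hash followed by the
input format hash.

@codeblock|{
#lang racket/base

;; Handle for the hypothetical raw-PCM backend: an output port plus a
;; running frame counter.
(struct raw-handle (port [frames #:mutable]))

;; open: output file, settings hash, input format hash -> handle
(define (raw-open output-file settings input-format)
  (raw-handle (open-output-file output-file #:exists 'truncate) 0))

;; write: handle, buffer format hash, byte buffer, byte length
;;        -> number of frames accepted
;; ASSUMPTION: fmt has 'channels and 'bits-per-sample keys.
(define (raw-write h fmt buf len)
  (define frame-bytes (* (hash-ref fmt 'channels)
                         (quotient (hash-ref fmt 'bits-per-sample) 8)))
  (write-bytes buf (raw-handle-port h) 0 len)
  (define n (quotient len frame-bytes))
  (set-raw-handle-frames! h (+ n (raw-handle-frames h)))
  n)

;; finish: finalise and release the handle
(define (raw-finish h)
  (close-output-port (raw-handle-port h)))

;; settings: resolve defaults against the input format -> output format
;; ASSUMPTION: argument order is settings, then input format.
(define (raw-settings settings input-format)
  input-format)
}|

With the module loaded, such a backend could then be registered under a new
type and extension:

@racketblock[
(audio-register-encoder!
 'raw
 (make-audio-encoder '("raw") raw-open raw-write raw-finish raw-settings))]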