2679 Commits

Author SHA1 Message Date
HomerJau
a27dca557e [Matroska] Allow Chapters Without a ChapterUID
Fix: Don't silently drop ChapterAtom elements that omit MkChapterUID

parseChapterAtom() returned chapterUid = 0 when the source file omitted
the MkChapterUID element, and the existing call sites in
MkChapters::parse() (ebmlmkchapters.cpp lines 106 and 127) use a C++17
init-if `chapter.uid()` predicate that silently dropped such chapters.
The caller saw an empty ChapterEditionList for files that mkvinfo,
MediaInfo and FFmpeg all handle gracefully — produced by some audiobook
generators and older muxers that omit the per-chapter UID. The
companion orphan-EditionEntry fix in PR #1311 / commit e07b956f doesn't
cover this case because the ChapterAtoms there ARE wrapped in an
EditionEntry, just without a ChapterUID inside each atom.

This change:
- Synthesises a process-unique ChapterUID inside parseChapterAtom() when
  the source file lacks MkChapterUID. The synthetic value sets the high
  bit (1ULL << 63) and increments via a static std::atomic counter; real
  ChapterUIDs are random 64-bit values from muxers, so collision with a
  generated one is practically impossible while keeping the distinction
  local to TagLib.
- The existing chapter.uid() filters at the call sites then always
  evaluate truthy and the chapter is exposed through the public
  ChapterEditionList API as if it were spec-compliant.

No existing behavior is changed — files that already conform to the spec
(ChapterAtoms with a ChapterUID element) parse identically; only
previously-dropped chapters are now surfaced.

Reported and verified against real-world chaptered Matroska audiobook
files where mkvinfo / MediaInfo see all chapters but TagLib 2.3 returned
an empty ChapterEditionList.
2026-05-19 21:32:40 +02:00
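
The synthetic ChapterUID scheme described in a27dca557e can be sketched roughly as follows. This is a minimal illustration and not the TagLib source: the names `kSyntheticUidBit` and `nextSyntheticChapterUid` are hypothetical, and starting the counter at 1 is an assumption. Only the high-bit-plus-atomic-counter idea comes from the commit message.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical constant: marker bit set on every generated UID.
constexpr std::uint64_t kSyntheticUidBit = 1ULL << 63;

// Returns a non-zero UID that is unique within the running process.
// The counter start value (1) is an assumption for this sketch.
inline std::uint64_t nextSyntheticChapterUid()
{
  static std::atomic<std::uint64_t> counter{1};
  const std::uint64_t n = counter.fetch_add(1, std::memory_order_relaxed);
  return kSyntheticUidBit | n;
}
```

Because the result is never zero, a `chapter.uid()` truthiness check of the kind the message mentions would accept it.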
Ryan Francesconi
8fcda2daa2 [FLAC] Pack hasiXML/hasBEXT with scanned to save padding
Per review feedback on #1364: moving the two new bool flags to the end
of FilePrivate, next to the existing `scanned` bool, lets the compiler
coalesce the three bytes into the same trailing padding slot.  Saves one
machine word per FLAC::File instance versus placing the flags mid-struct
between bextData and the List<MetadataBlock*>.

Pure layout change, no behaviour difference.  Test suite still green.
2026-05-17 16:38:07 +02:00
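
The padding effect described in 8fcda2daa2 can be reproduced with two stand-in structs. The member names and pointer types below are placeholders, not the real FLAC::File::FilePrivate layout, and the exact sizes depend on the ABI.

```cpp
#include <cstddef>

// Flags placed mid-struct: each group of bools is followed by padding
// so that the next pointer is aligned.
struct FlagsMidStruct {
  void *bextData;
  bool hasiXML;
  bool hasBEXT;
  void *blocks;
  bool scanned;
};

// Flags grouped at the end: all three bools share one trailing padding slot.
struct FlagsAtEnd {
  void *bextData;
  void *blocks;
  bool scanned;
  bool hasiXML;
  bool hasBEXT;
};

// Holds on common 32-bit and 64-bit ABIs; not guaranteed by the standard.
static_assert(sizeof(FlagsAtEnd) <= sizeof(FlagsMidStruct),
              "grouping the bools should not enlarge the struct");
```

On a typical 64-bit target the first struct occupies 32 bytes and the second 24, which matches the "one machine word" saving mentioned in the message.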
Ryan Francesconi
c32f7c7f86 [FLAC] Track iXML/BEXT block presence with explicit flags
hasiXMLData() / hasBEXTData() were implemented as !data.isEmpty()
checks, which conflated in-memory payload with on-disk block presence.
That caused two wrong answers:

* setiXMLData("foo") on a file with no iXML block made hasiXMLData()
  return true immediately, before save().
* A FLAC file carrying an iXML APPLICATION block with empty payload
  round-tripped fine, but hasiXMLData() reported false.

Switch to the same model RIFF::WAV::File already uses: explicit
hasiXML / hasBEXT bool flags on FilePrivate, set during scan() when
the APPLICATION block is recognised, updated during save() after the
block is (re)written or omitted, and returned verbatim by the
accessors. New regression test pins down the before/after-save and
empty-block cases.

Refs: https://github.com/taglib/taglib/issues/1362
2026-05-17 16:38:07 +02:00
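
The difference between "payload in memory" and "block present on disk" in c32f7c7f86 can be modelled with a small stand-in class. `IxmlState` and its method names are hypothetical, and the rule that an empty payload causes the block to be omitted on save is an assumption based on the "empty-clears-block" test mentioned in 1e7bdae284.

```cpp
#include <string>

// Hypothetical stand-in for the iXML part of a file object.
class IxmlState {
public:
  // Called while scanning: the block exists on disk, even if it is empty.
  void onBlockScanned(const std::string &payload)
  {
    data_ = payload;
    onDisk_ = true;
  }

  // Changes only the in-memory payload.
  void setData(const std::string &payload) { data_ = payload; }

  // Assumed save rule: a non-empty payload is written, an empty one is omitted.
  void save() { onDisk_ = !data_.empty(); }

  // Reports block presence, not payload emptiness.
  bool hasData() const { return onDisk_; }

private:
  std::string data_;
  bool onDisk_ = false;
};
```

With this model, `setData("foo")` does not flip `hasData()` until `save()` runs, and a scanned empty block still reports as present.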
Stephen Booth
e23d97c580 Add algorithm include for std::min and max 2026-05-17 08:18:58 +02:00
Stephen Booth
b8b91fd072 Add algorithm include for std::find_if 2026-05-17 08:18:58 +02:00
Stephen Booth
e83e02da2e Correct documentation comment for timeEnd 2026-05-17 08:16:07 +02:00
Stephen Booth
0b5296e20e Correct assignment operator qualification 2026-05-17 08:13:48 +02:00
Stephen Booth
83fdf27cd7 Correct assignment operator qualification 2026-05-17 08:13:48 +02:00
Stephen Booth
7010d112ba Correct destructor qualification 2026-05-17 08:12:12 +02:00
Urs Fleisch
1b94b93762 Version 2.3 v2.3 2026-05-10 15:25:51 +02:00
HomerJau
8511827fa1 Skip Matroska Cues with AudioProperties::Fast and read-only mode
When the file is opened in read-only mode, it will not be written and
the Cues do not have to be updated. Skipping the Cues will make the
reading of large Matroska files over network filesystems (SMB/NFS)
faster.
2026-05-09 09:25:14 +02:00
Urs Fleisch
b02ff63916 Fix -Wconversion size_t to unsigned int warning 2026-05-09 08:28:54 +02:00
HomerJau
f1e8dac084 [Matroska] Follow chained SeekHead entries when parsing segment metadata
Some muxers — notably MakeMKV, and mkvmerge in certain configurations — write a small primary seekHead at the start of the segment that contains a single entry referencing a secondary seekHead near the end of the file. The secondary seekHead carries the actual entries for info, tracks, tags, chapters, and attachments.
2026-05-08 05:17:23 +02:00
Luc Schrijvers
d1460b6fbf Build fix for Haiku's fcntl.h which can't be found in sys/fcntl.h 2026-05-07 18:32:19 +02:00
Urs Fleisch
43190d30ed Prepare 2.3 release 2026-05-04 13:02:17 +02:00
Urs Fleisch
4c43f1c577 Matroska: Provide different WriteStyle to trade-off size/speed
A new Matroska::File::save(WriteStyle style) overload is provided to
control how tags, attachments and chapters are written to the file.

- Compact: Write tags, attachments and chapters as compact as possible.
  This is the default mode.
- DoNotShrink: Do not shrink elements; add void padding when content
  gets smaller. Allow inserts when content gets larger.
- AvoidInsert: Like DoNotShrink but also avoid inserts for non-last
  elements: replace a growing non-last element with a void of the old
  size and append the new element at the end of the segment.
  For very large files and/or slow (network) filesystems, using this
  mode will reduce write time significantly.

Co-authored-by: Copilot <copilot@github.com>
2026-05-04 12:55:17 +02:00
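
One way to read the three write styles in 4c43f1c577 is as a decision per element. The sketch below is not TagLib code: the `Action` enum and `planWrite` function are invented for illustration, the local `WriteStyle` enum only mirrors the names in the message, and the minimum size of an EBML void element is ignored.

```cpp
#include <cstddef>

// Local stand-in that mirrors the style names from the commit message.
enum class WriteStyle { Compact, DoNotShrink, AvoidInsert };

// Hypothetical outcomes for a single element being rewritten.
enum class Action {
  RewriteExact,   // write at the new size, shrinking or inserting as needed
  PadWithVoid,    // keep the old size, fill the remainder with a void
  InsertInPlace,  // grow in place, shifting the data that follows
  VoidAndAppend   // void the old element, append the new one at the end
};

inline Action planWrite(WriteStyle style, std::size_t oldSize,
                        std::size_t newSize, bool isLastElement)
{
  if(style == WriteStyle::Compact || newSize == oldSize)
    return Action::RewriteExact;
  if(newSize < oldSize)
    return Action::PadWithVoid;
  if(style == WriteStyle::AvoidInsert && !isLastElement)
    return Action::VoidAndAppend;
  return Action::InsertInPlace;
}
```

Under this reading, only AvoidInsert avoids shifting the bytes that follow a growing non-last element, which is where the write-time saving on slow filesystems would come from.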
Ryan Francesconi
59ed19d12f [WAV] Decode iXML as UTF-8
The iXML chunk in BWF/WAV files is specified as UTF-8 (per the EBU
Tech 3285 supplement and the iXML spec). The reader was constructing
the String without an encoding hint, which falls back to Latin-1 and
mangles any non-ASCII bytes (e.g. Unicode in <NOTE>, <PROJECT>, or
<TRACK_LIST> entries written by Sound Devices, Zaxcom, etc.).
2026-05-01 06:32:09 +02:00
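
The mangling described in 59ed19d12f follows from how the two encodings map bytes to code points. The decoders below are a self-contained illustration with hypothetical names; they are not TagLib's String implementation, and the UTF-8 decoder does no validation of continuation bytes or overlong forms.

```cpp
#include <cstddef>
#include <string>

// Latin-1: every byte is one code point.
inline std::u32string decodeLatin1(const std::string &bytes)
{
  std::u32string out;
  for(unsigned char c : bytes)
    out.push_back(c);
  return out;
}

// Minimal UTF-8 decoder (1 to 4 byte sequences, no validation).
inline std::u32string decodeUtf8(const std::string &bytes)
{
  std::u32string out;
  std::size_t i = 0;
  while(i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    char32_t cp = 0;
    std::size_t extra = 0;
    if(lead < 0x80)              { cp = lead;        extra = 0; }
    else if((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
    else if((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
    else if((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
    else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
    if(i + extra >= bytes.size()) { out.push_back(0xFFFD); break; }  // truncated
    for(std::size_t k = 1; k <= extra; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
    out.push_back(cp);
    i += extra + 1;
  }
  return out;
}
```

The two bytes 0xC3 0xA9 decode to the single code point U+00E9 as UTF-8, but to two unrelated characters (U+00C3, U+00A9) as Latin-1.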
Ryan Francesconi
1e7bdae284 [FLAC] Add iXML and BEXT support via APPLICATION blocks
Adds 6 public methods on FLAC::File mirroring RIFF::WAV::File's existing
iXML/BEXT API: iXMLData/setiXMLData/hasiXMLData and the BEXT equivalents.

Reads APPLICATION blocks (RFC 9639 § 8.4) carrying either the IANA-
registered "riff" foreign-metadata wrapper or the direct "iXML" / "bext"
application IDs used by some third-party tools (e.g. Sequoia). Writes
the spec-blessed "riff"-wrapped form. Unrecognized application IDs and
"riff"-wrapped chunks other than iXML/bext (e.g. "fmt ", "JUNK") flow
through unmodified, so existing files round-trip without churn.

Test coverage: read direct + riff-wrapped for both iXML and BEXT,
write+reread round-trip, empty-clears-block, and an unknown-application-
block preservation guard.
2026-05-01 06:31:50 +02:00
HomerJau
e07b956fda [Matroska] Allow Orphaned Chapter Reading (when Chapter has no EditionID)
Fix: Handle orphan ChapterAtom elements not wrapped in EditionEntry

The Matroska specification requires every ChapterAtom to be inside an
EditionEntry. However, some muxers (older FFmpeg versions, some streaming
tools) produce files with ChapterAtom elements directly under Chapters,
without an EditionEntry wrapper.
MKVToolNix and FFmpeg both handle this case gracefully by treating orphan
atoms as belonging to an implicit default edition. Previously, TagLib
silently ignored these chapters, returning an empty ChapterEditionList.

This change:
- Collects orphan ChapterAtom elements encountered directly under Chapters
- Wraps them in an implicit default edition (UID = 0, isDefault = true,
  isOrdered = false) so they are exposed through the existing
  chapterEditionList() API
- Extracts the atom-parsing logic into a private parseChapterAtom() helper
  to avoid code duplication between the two call sites

No existing behavior is changed - files that already conform to the spec
(chapters inside an EditionEntry) parse identically.
2026-04-26 08:24:19 +02:00
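
The implicit default edition from e07b956fda can be pictured with plain structs. `Chapter`, `Edition` and `withImplicitEdition` are hypothetical stand-ins for this sketch, not the Matroska classes in TagLib; only the field values (UID = 0, isDefault = true, isOrdered = false) come from the message.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Chapter {
  std::uint64_t uid;
  std::string title;
};

struct Edition {
  std::uint64_t uid;
  bool isDefault;
  bool isOrdered;
  std::vector<Chapter> chapters;
};

// Appends one implicit edition holding the orphan atoms, if there are any.
inline std::vector<Edition> withImplicitEdition(std::vector<Edition> editions,
                                                std::vector<Chapter> orphans)
{
  if(!orphans.empty())
    editions.push_back(Edition{0, true, false, std::move(orphans)});
  return editions;
}
```

Files without orphan atoms pass through unchanged, which corresponds to the "no existing behavior is changed" statement.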
Urs Fleisch
5e1cb4081d Limit MP4 atom sibling count at top level (#1344) 2026-04-26 07:16:53 +02:00
Urs Fleisch
e7e4f0958c Merge pull requests #1325 #1343 from ryanfrancesconi MP4 chapterlist
ryanfrancesconi/feature/mp4-chapterlist
ryanfrancesconi/fix/qt-chapter-orphaned-mdat
2026-04-26 07:15:40 +02:00
Urs Fleisch
497c040f04 Set MP4 chapters only if modified
An equality operator is added for the chapters. The chapters are
only written to the file if they were really modified, so just
reading the chapters without modifying them will not affect
the save operation.
2026-04-25 11:46:51 +02:00
Ryan Francesconi
05c2c8671e MP4: Add test coverage for chapter unicode, empty titles, and format independence
Six new tests exercise corners of the chapter implementation that the
orphaned-mdat fix did not reach:

testQTChapterListUnicodeTitles / testChapterListUnicodeTitles --
Round-trip Japanese, German (umlaut), and Russian titles through the
QT text-sample serialisation and the Nero length-prefixed UTF-8 path
respectively.  These are separate paths in the code and benefit from
separate coverage.

testQTChapterListEmptyTitleStripped --
A multi-chapter list whose first entry is empty at t=0 matches the QT
dummy-marker pattern; read() must drop it.  Test documents the rule so
a regression is immediately detectable.

testQTChapterListSingleEmptyTitleNotStripped --
The stripping rule only applies when size > 1.  A single empty-title
chapter at t=0 is valid and must be preserved.

testNeroAndQTChaptersAreIndependent --
Both formats can coexist; removing one leaves the other intact.
Validates the lazy saveChaptersIfModified contract in mp4file.cpp.

testNeroChaptersAloneWhenNoQT --
Writing one format must not create atoms for the other.

All 47 MP4 tests pass.
2026-04-23 12:19:27 -07:00
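
The stripping rule that the two empty-title tests in 05c2c8671e pin down can be expressed in a few lines. The `Chapter` struct and `stripDummyMarker` function are hypothetical names for this sketch, and the time unit is irrelevant here because only t=0 is tested.

```cpp
#include <string>
#include <vector>

struct Chapter {
  std::string title;
  long long startTime;
};

// Drops a leading empty-title chapter at t=0, but only when other chapters follow.
inline std::vector<Chapter> stripDummyMarker(std::vector<Chapter> chapters)
{
  if(chapters.size() > 1 &&
     chapters.front().title.empty() &&
     chapters.front().startTime == 0)
    chapters.erase(chapters.begin());
  return chapters;
}
```

A single empty-title chapter at t=0 is left alone, which is the case covered by testQTChapterListSingleEmptyTitleNotStripped.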
Ryan Francesconi
85b6a9eb93 MP4: Guard against deleting shared mdat on QT chapter remove
The previous fix for orphaned chapter mdats assumed the chapter text
mdat was dedicated and derived its location from stco[0] - 8.  In
audiobooks that co-locate chapter text at the start of the primary
audio mdat (stco[0] == audioMdat.offset + 8), that arithmetic lands
on the audio mdat header, the "mdat" signature check passes, and the
full audio payload gets removed -- shrinking a 484 MB audiobook to
5.4 MB.

Fix: resolve the chapter mdat by finding the top-level mdat whose
data range contains stco[0], then re-parse after the trak/tref
removals and confirm no other track's stco/co64 points into that
mdat before deleting it.  Shared mdats are left intact; the dead
chapter text bytes remain as harmless padding.

Add a regression test that writes a chapter track, patches its
stco[0] to point into the primary audio mdat (simulating the
audiobook layout), removes the chapter track, and verifies the
audio mdat is byte-identical afterwards.
2026-04-23 12:14:00 -07:00
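
The containment check described in 85b6a9eb93 might look something like the sketch below. The `Atom` struct and both helper names are hypothetical, the header is assumed to be the plain 8-byte form, and the real code works on parsed MP4 atoms rather than bare offset pairs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Top-level mdat atom: offset of its header and total length including header.
struct Atom {
  std::uint64_t offset;
  std::uint64_t length;
};

constexpr std::uint64_t kAtomHeaderSize = 8;  // assumed: no 64-bit extended size

// True when pos falls inside the atom's data range (after the header).
inline bool dataContains(const Atom &mdat, std::uint64_t pos)
{
  return pos >= mdat.offset + kAtomHeaderSize && pos < mdat.offset + mdat.length;
}

// Index of the mdat whose data range contains pos, or -1 if none does.
inline int findMdatContaining(const std::vector<Atom> &mdats, std::uint64_t pos)
{
  for(std::size_t i = 0; i < mdats.size(); ++i)
    if(dataContains(mdats[i], pos))
      return static_cast<int>(i);
  return -1;
}

// True when any other track's chunk offset points into the same mdat.
inline bool isSharedMdat(const Atom &mdat,
                         const std::vector<std::uint64_t> &otherTrackOffsets)
{
  for(std::uint64_t off : otherTrackOffsets)
    if(dataContains(mdat, off))
      return true;
  return false;
}
```

In the audiobook layout from the message, the chapter's stco[0] resolves to the audio mdat, `isSharedMdat` returns true because the audio track also points into it, and the atom is kept.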
Ryan Francesconi
5c70f0071f MP4: Add regression test for orphaned mdat on QT chapter remove
Adds testQTChapterListNoOrphanedMdat which performs three add/remove
cycles and asserts that the top-level mdat count is identical before and
after.  Without the fix, each cycle leaves an orphaned mdat at EOF, so
three cycles produce originalCount + 3 atoms.

Uses TagLib's own MP4::Atoms parser as the primary check, with
AtomicParsley as an optional cross-validation when installed.
2026-04-23 11:03:23 -07:00
Ryan Francesconi
ae171ee237 MP4: Remove orphaned mdat when removing QT chapter track
write() appends a new mdat at EOF to hold chapter text samples but the
removal code (both remove() and the replace-existing path in write())
only deleted the chapter trak and tref atoms from inside moov.  Each
add/remove cycle left the previous chapter mdat behind, causing orphaned
mdat atoms to accumulate.

Fix: extract a removeQTChapterTrack() helper that performs all three
removals atomically.  Before deleting the chapter trak, the helper reads
the first stco chunk offset (which points 8 bytes past the chapter mdat
header) to locate the mdat.  After removing the trak and tref (both
inside moov, which precedes the mdat at EOF), it adjusts the mdat offset
by -(chapterLen + trefLen) and removes the atom, leaving no orphaned data.
2026-04-23 11:03:23 -07:00
Urs Fleisch
78c7208bc9 Integrate MP4 chapters into MP4::File 2026-04-23 11:03:23 -07:00
Urs Fleisch
0df52e3993 Apply stco/co64 bounds fix from PR #1333 to MP4 chapter code
The updateChunkOffsets() function in mp4qtchapterlist.cpp and
mp4chapterlist.cpp is duplicated code from mp4tag.cpp and needs
the patch from mp4tag.cpp too.
2026-04-23 11:03:23 -07:00
Ryan Francesconi
ba2441b378 corrected nanosecond unit change -> milliseconds
taglib/mp4/mp4chapterlist.h
• startTime doc comment: 100-nanosecond units → milliseconds

taglib/mp4/mp4chapterlist.cpp
• renderChplData: fromLongLong(ch.startTime) → fromLongLong(ch.startTime * 10000LL)
• parseChplData: ch.startTime = startTime → ch.startTime = startTime100ns / 10000LL

taglib/mp4/mp4qtchapterlist.cpp
• read: currentTime * 10000000.0 / timescale → currentTime * 1000.0 / timescale
• buildStts lambda: time100ns * timescale / 10000000.0 → timeMs * timescale / 1000.0

tests/test_mp4.cpp
• All startTime assignments and assertions divided by 10,000 (e.g. 300000000LL → 30000LL)
2026-04-23 11:03:23 -07:00
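
The unit change in ba2441b378 comes down to a factor of 10,000 between milliseconds and 100-nanosecond units, plus a timescale conversion on the QuickTime side. The helper names below are hypothetical and only restate the arithmetic listed in the message.

```cpp
// 1 ms = 1,000,000 ns = 10,000 units of 100 ns.
constexpr long long kHundredNsPerMs = 10000LL;

// Public API value (milliseconds) to the on-disk chpl value (100 ns units).
constexpr long long msToHundredNs(long long ms)
{
  return ms * kHundredNsPerMs;
}

// On-disk chpl value (100 ns units) back to milliseconds, truncating.
constexpr long long hundredNsToMs(long long t)
{
  return t / kHundredNsPerMs;
}

// QuickTime track time (ticks at a given timescale) to milliseconds.
constexpr double ticksToMs(long long ticks, unsigned int timescale)
{
  return ticks * 1000.0 / timescale;
}
```

For example, the 30-second value in the tests changes from 300000000 (100 ns units) to 30000 (milliseconds).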
Ryan Francesconi
c5ea13bb34 overloads for read, write, remove
Changes made

mp4chapterlist.h
• Added (MP4::File*) overloads for read, write, remove
• Replaced broken class File; forward declaration with #include "mp4file.h" (fixed a subtle C++ name-resolution linker bug where Atoms(File*) resolved to MP4::File* instead of TagLib::File*)

mp4chapterlist.cpp
• Refactored: path-based overloads are now thin wrappers that delegate to file-based overloads
• File-based overloads construct Atoms locally — no Atoms* in the public API
• Removed chplHeaderSize = 9 constant; replaced the minimum-size guard in parseChplData with a correct 5-byte check (the old constant was version-1 specific and would reject valid version-0 atoms)

mp4qtchapterlist.h
• Added (MP4::File*) overloads for read, write, remove
• Removed Atoms* parameters entirely from the public API

mp4qtchapterlist.cpp
• Same refactor: path-based overloads delegate; file-based overloads construct Atoms locally
• Added empty-chapter guard: write(MP4::File*, {}) delegates to remove(file) instead of writing a 0-sample chapter track

tests/test_mp4.cpp
• Added testChapterListFileAPI and testQTChapterListFileAPI — exercise the full write/read/remove cycle via the file-based API
• Updated test bodies to use the simplified (MP4::File*) API (no MP4::Atoms construction in test code)
2026-04-23 11:03:23 -07:00
Ryan Francesconi
4a73d73b20 MP4: Add QuickTime-style chapter track support
QuickTime-style chapter tracks are the native chapter format for
Apple's ecosystem. They use a disabled text track (hdlr type "text")
referenced by a chap track-reference in the audio track's tref box.
This format is recognized by QuickTime, iTunes/Music, Final Cut Pro,
Logic Pro, DaVinci Resolve, VLC, and most other MP4/M4A players. It
is also the format that AVFoundation reads natively via
AVAssetChapterMetadataGroup.

The implementation produces output that matches ffmpeg's chapter track
structure byte-for-byte: per-sample stts entries (required by
AVFoundation), encd atoms for UTF-8 text encoding, edts/elst edit
lists, gmhd with gmin+text media information, and disabled tkhd flags
(track_in_movie only).

Key behaviors:
- write() inserts tref + chapter trak as a single contiguous block,
  then appends text samples in an mdat atom at EOF
- Handles non-zero first chapter times by prepending a dummy chapter
  at time 0 (stripped on read)
- Overwrite support: removes existing chapter track before writing
- Preserves existing metadata tags and audio data integrity
- Uses timescale=1000 (milliseconds) for chapter track timing

7 new tests covering write/read round-trip, remove, overwrite, tag
preservation, empty file read, timestamp precision, and non-zero
first chapter handling.
2026-04-23 11:03:23 -07:00
Ryan Francesconi
9c56f191e5 MP4: Add Nero-style chapter marker support
Implement read/write/remove of Nero-style chapter markers (chpl atom)
in MP4 files. The chpl atom lives at moov/udta/chpl, storing up to 255
chapter entries with 100-nanosecond timestamps and UTF-8 titles.

Includes CppUnit tests covering round-trip read/write, remove, tag
preservation, and reading from files with no chapters.
2026-04-23 11:03:23 -07:00
Urs Fleisch
77f6b9add5 Drop zero size ID3v2 frames but accept tag (#437) 2026-04-20 15:11:33 +02:00
Urs Fleisch
a64e7543f8 Fix DSD/DSF signed integer issues (#1332) 2026-04-20 15:08:37 +02:00
Felipe
d466b72eea docs: Some improvements to the documentation (#1337)
Make MP4 AtomDataType descriptions visible in the generated documentation.
Convert the ID3v2 text frame listing into a table.
Convert the Shorten `fileType()` documentation into a table.
Fix some typos.
Add link to specification in `EventType` for consistency with other headers.
2026-04-13 20:05:53 +02:00
Urs Fleisch
c3a0e1d0a2 Matroska: Use seek head for faster element lookup (#1321)
Limit scan for Matroska seek head to 512 KB in ReadStyle::Fast

---------

Co-authored-by: tolriq <git@leetzone.org>
2026-04-13 19:58:52 +02:00
Ryan Francesconi
13751f5a6b Fix/shorten rice golomb k bounds (#1335)
* Shorten: Reject out-of-range k in getRiceGolombCode

k values outside [0, 31] cause undefined behavior: a left shift by 32
on int32_t (UB in C++) when bitsAvailable reaches 32 after a buffer
refill. Guard against this at the top of getRiceGolombCode and return
false (invalid file) for any k outside the valid range.
2026-04-09 14:03:36 -06:00
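
The guard described in 13751f5a6b rejects k before any shift is attempted. The function below is a reduced illustration with a hypothetical name; it computes only a low-bits mask and is not the actual getRiceGolombCode body.

```cpp
#include <cstdint>

// Builds a mask of the k lowest bits. Returns false for k outside [0, 31],
// so a shift by 32 or more (undefined on a 32-bit type) is never executed.
inline bool lowBitsMask(int k, std::uint32_t &mask)
{
  if(k < 0 || k > 31)
    return false;
  mask = (std::uint32_t{1} << k) - 1;
  return true;
}
```

Checking the range at the top keeps every later shift inside the width of the type, so a crafted file is reported as invalid rather than triggering undefined behavior.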
Urs Fleisch
4da5ac2de4 Fix writing too many offsets when updating MP4 stco/co64 atoms (#1332)
This will fix a DoS with a crafted MP4 file causing too many offsets
to be written when updating the stco or co64 tables in MP4 files.

Credits for the discovery of this bug go to Yuen Ying Ng (Ruth)
(Cyber Security Researcher at PwC Hong Kong).
2026-04-08 20:53:59 +02:00
Urs Fleisch
193091fe2e Fix unbounded recursion in EBML/Matroska MasterElement and MP4 atoms (#1326)
Credits for fix and reporting go to https://github.com/ericliu-12.
2026-04-08 20:52:58 +02:00
Ryan Francesconi
5d63187a8b MP4: Fix data race in ItemFactory lazy map initialization (#1331)
Concurrent calls to propertyKeyForName() and handlerTypeForName() (e.g.
via batchMap during import) could race on the isEmpty() guard used for
first-call lazy initialization.

Replace isEmpty() guards with std::call_once / std::once_flag so that
each map is initialized exactly once in a thread-safe manner. Using
call_once (rather than eager construction in the base class constructor)
preserves virtual dispatch, allowing ItemFactory subclasses to override
nameHandlerMap() and namePropertyMap() correctly.

Both property maps are initialized together in a single once_flag since
nameForPropertyKey is derived from namePropertyMap.
2026-04-04 17:52:54 +02:00
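
The pattern described in 5d63187a8b, lazy initialization through std::call_once that still honours virtual overrides, can be sketched as below. The class and method names (`Factory`, `CustomFactory`, `handlerFor`, `buildMap`) are hypothetical and much simpler than MP4::ItemFactory.

```cpp
#include <map>
#include <mutex>
#include <string>

class Factory {
public:
  virtual ~Factory() = default;

  // Thread-safe lazy lookup: the map is built exactly once, on first use.
  std::string handlerFor(const std::string &name) const
  {
    std::call_once(flag_, [this] { map_ = buildMap(); });
    const auto it = map_.find(name);
    return it == map_.end() ? std::string() : it->second;
  }

protected:
  // Virtual so subclasses can extend the map; it is called after
  // construction has finished, so the override is the one that runs.
  virtual std::map<std::string, std::string> buildMap() const
  {
    return {{"title", "text"}};
  }

private:
  mutable std::once_flag flag_;
  mutable std::map<std::string, std::string> map_;
};

class CustomFactory : public Factory {
protected:
  std::map<std::string, std::string> buildMap() const override
  {
    auto m = Factory::buildMap();
    m["cover"] = "image";
    return m;
  }
};
```

Building the map in the base-class constructor instead would call `Factory::buildMap()` even for a `CustomFactory`, which is the virtual-dispatch problem the message refers to. Some toolchains need thread support enabled at link time (for example `-pthread`) for std::call_once to work.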
Ryan Francesconi
f32b503f56 Fix bitrate calculation unit errors in ADTS and MP4 ESDS parsers (#1330)
mpegheader.cpp: ADTS bitrate divided by 1024 (binary kilo) instead of
1000 (decimal kilo), causing ~2.4% underreporting for all AAC streams.

mp4properties.cpp: ESDS averageBitrate double-rounded via both +500 and
+0.5 before int cast, causing standard bitrates (128000, 192000, etc.)
to read 1 kbps too high.
2026-04-04 16:34:37 +02:00
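
The double-rounding effect in f32b503f56 is easy to reproduce numerically. The two functions below are a reconstruction from the wording of the message (both +500 and +0.5 applied before the int cast); the exact original expression in mp4properties.cpp is an assumption.

```cpp
// Assumed shape of the old calculation: rounds twice.
constexpr int doubleRoundedKbps(long bitsPerSecond)
{
  return static_cast<int>((bitsPerSecond + 500) / 1000.0 + 0.5);
}

// Rounds to the nearest kbps exactly once, using integer division.
constexpr int roundedKbps(long bitsPerSecond)
{
  return static_cast<int>((bitsPerSecond + 500) / 1000);
}
```

For 128000 bit/s the first form computes 128.5 + 0.5 = 129.0 and reports 129 kbps, while the second reports 128. The ADTS issue is a separate unit error: dividing by 1024 instead of 1000 scales every result by 1000/1024, roughly 2.3% too low.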
Ryan Francesconi
d6a2134cf3 Clamp oversized RIFF chunk to available bytes instead of rejecting it (#1329)
Some encoders write a valid data chunk but with a slightly too-large
declared chunkSize, or place the data chunk beyond the declared RIFF
boundary. The previous behaviour called break, abandoning all remaining
chunks and making the file appear empty to taglib.

Lenient parsers (ffmpeg, QuickTime) handle this case by clamping the
chunk size to the bytes that actually remain in the file. Adopt the
same strategy: when chunkSize would exceed the file length, clamp it
and continue parsing rather than stopping early.
2026-04-04 12:47:49 +02:00
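
The clamping strategy adopted in d6a2134cf3 amounts to limiting the declared size to what the file can actually hold. `clampChunkSize` is a hypothetical helper written for this sketch, not the function in TagLib's RIFF parser.

```cpp
#include <algorithm>
#include <cstdint>

// Returns the usable size of a chunk whose data starts at dataOffset.
// A declared size that runs past the end of the file is clamped to the
// bytes that remain; a chunk starting at or past EOF has no data.
inline std::uint64_t clampChunkSize(std::uint64_t dataOffset,
                                    std::uint64_t declaredSize,
                                    std::uint64_t fileLength)
{
  if(dataOffset >= fileLength)
    return 0;
  return std::min(declaredSize, fileLength - dataOffset);
}
```

Well-formed chunks are unaffected, since their declared size never exceeds the remaining bytes.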
Ryan Francesconi
abadbb6768 Add BEXT and iXML chunk support to WAV files (#1323)
Read, write, and remove Broadcast Audio Extension (BEXT, EBU Tech 3285)
and iXML metadata chunks in WAV files. BEXT is widely used in broadcast
and professional audio for originator, description, time reference, and
loudness metadata. iXML is used by field recorders and DAWs for scene,
take, and track metadata.
2026-04-04 12:14:34 +02:00
Daniel
49510e7d5a Move MPEG check to end of content-based detection (#1319)
MPEG::File::isSupported() scans for frame sync bytes that can appear
in other files, causing them to be misidentified as MP3.

This also includes a test with such a file.
2026-04-04 08:01:41 +02:00
Daniel
7f2f2ddcaf Add tests for FileRef content-based detection via ByteVectorStream (#1318)
This covers all 18 formats supported by the content-based detection.
2026-04-04 07:43:56 +02:00
Urs Fleisch
0368c0239a Pin submodule utfcpp to tag v4.0.9 (#1315)
git submodule init
git submodule update --remote
(cd 3rdparty/utfcpp && git checkout v4.0.9)
git add 3rdparty/utfcpp
git commit -m 'Pin submodule utfcpp to tag v4.0.9'
2026-03-31 20:04:11 +02:00
Felipe
9411bb161f Opus: Read output gain (#1320) 2026-03-31 11:14:44 -05:00
Urs Fleisch
78298769de Version 2.2.1 v2.2.1 2026-03-07 06:41:13 +01:00
Urs Fleisch
c43d2b3fc1 Avoid duplicates in StringList Matroska::Tag::complexPropertyKeys()
When using for example

examples/tagwriter -C GENRE \
"name=GENRE,targetTypeValue=50,value=Soft Rock;name=GENRE,targetTypeValue=50,value=Classic Rock" \
path/to/file.mka

the GENRE key was included twice and tagreader displayed the two genre
tags twice.
2026-02-28 07:51:04 +01:00
Urs Fleisch
3db0d48f4b Support edition, chapter and attachment UIDs in Matroska simple tags (#1311) 2026-02-24 06:53:23 +01:00