Backport of 4dcf0b41c6b01f45e141, https://github.com/taglib/taglib/pull/77
Tested with files larger than 2 GB, which were created using:
sox -n -r 44100 -C 320 large.mp3 synth 58916 sine 440 channels 2
sox -n -r 44100 -C 0 large.flac synth 25459 sine 440 channels 2
sox -n -r 44100 -C 10 large.ogg synth 229806 sine 440 channels 2
sox -n -r 44100 large.wav synth 6692 sine 440 channels 2
ffmpeg -f lavfi -i "sine=frequency=440:duration=244676" -y large.m4a
The only file which was readable with the tagreader example
before this commit was large.ogg. The problem is that long is
only 32 bits wide on Windows (even for 64-bit compilation
targets, which use the LLP64 data model), so file offsets stored
in a long are too small for large files. Such offsets now use
offset_t (defined to be long long on Windows and off_t on UNIX),
and some unsigned long values are now size_t, which has the
correct size even on Windows.
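The type selection described above can be sketched as follows. This is
a minimal illustration, not the actual TagLib header: the namespace
sketch and the helper fitsIn32BitLong() are hypothetical and only show
why a 32-bit long cannot hold an offset beyond 2 GB.

```cpp
#include <cstdint>
#include <limits>
#ifndef _WIN32
#include <sys/types.h>
#endif

namespace sketch {

#ifdef _WIN32
  // long is only 32 bits wide on Windows, so use long long for offsets.
  typedef long long offset_t;
#else
  // off_t is the native file offset type on UNIX.
  typedef off_t offset_t;
#endif

  // Hypothetical helper: true if the offset can be stored in a signed
  // 32-bit integer, which is what long is on Windows.
  inline bool fitsIn32BitLong(long long offset)
  {
    return offset >= std::numeric_limits<std::int32_t>::min() &&
           offset <= std::numeric_limits<std::int32_t>::max();
  }

}
```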
Some ID3v2.4.0 frames such as text information frames support multiple strings
separated by the termination code of the character encoding. If the encoding
is $01 UTF-16 with BOM, all strings shall have the same byte order. In the
multi strings written by TagLib, all string elements of such a multi string
have a BOM. However, I have often seen tags where a BOM exists only at the
beginning, i.e. at the start of the first string. In such a case, TagLib
returned only a list with the first string and a second empty string. This
commit detects such cases and parses the strings without a BOM according to
the BOM of the first string.
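The parsing rule can be sketched without TagLib as shown below. The
function splitUtf16WithSharedBom() is a hypothetical helper, not the
code of this commit: it splits the raw field data at two-byte
terminators and lets a string without its own BOM inherit the byte
order of the most recent BOM. Defaulting to little endian when no BOM
is present at all is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-16 data into strings separated by
// 0x0000, reusing the last seen BOM for strings that have none.
std::vector<std::u16string>
splitUtf16WithSharedBom(const std::vector<unsigned char> &data)
{
  std::vector<std::u16string> result;
  std::u16string current;
  bool littleEndian = true;  // assumed default if no BOM is found
  bool atStart = true;       // true at the first code unit of a string

  for(std::size_t i = 0; i + 1 < data.size(); i += 2) {
    const unsigned char b0 = data[i];
    const unsigned char b1 = data[i + 1];
    if(atStart) {
      atStart = false;
      if(b0 == 0xFF && b1 == 0xFE) { littleEndian = true; continue; }
      if(b0 == 0xFE && b1 == 0xFF) { littleEndian = false; continue; }
    }
    const char16_t unit = static_cast<char16_t>(
      littleEndian ? (b0 | (b1 << 8)) : ((b0 << 8) | b1));
    if(unit == 0) {
      // Terminator: finish the current string and start the next one.
      result.push_back(current);
      current.clear();
      atStart = true;
    }
    else {
      current.push_back(unit);
    }
  }
  // Keep the last string unless it is only the remainder after a
  // trailing terminator.
  if(!current.empty() || result.empty())
    result.push_back(current);
  return result;
}
```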
There are m4a files with regular (non-full) meta atoms. When such
a meta atom is not correctly parsed, the subsequent atoms are not
recognized and offsets will not be adjusted when atoms are added,
which will corrupt the MP4 file.
This change will look behind the meta atom to check if the next
atom follows directly, i.e. without the four bytes with version
and flags as they exist in full atoms. In such a case, these
four bytes will not be skipped.
Witnesses of this strange format specification are
https://leo-van-stee.github.io/
https://github.com/axiomatic-systems/Bento4/blob/v1.6.0-639/Source/C%2B%2B/Core/Ap4ContainerAtom.cpp#L60
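The look-behind check can be sketched as follows. The function
metaHeaderLength() is a hypothetical helper working on the raw bytes of
a meta atom, and the list of expected child names ("hdlr", "ilst") is
an assumption made for this illustration, not necessarily the list used
by the commit.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: number of bytes to skip before the first child
// of a "meta" atom. Layout of a full atom:
//   size(4) "meta"(4) version+flags(4) childSize(4) childName(4) ...
// Layout of a regular (non-full) atom:
//   size(4) "meta"(4) childSize(4) childName(4) ...
std::size_t metaHeaderLength(const std::vector<unsigned char> &atom)
{
  // Assumed set of child names which may follow directly.
  static const char *const childNames[] = { "hdlr", "ilst" };

  if(atom.size() >= 16) {
    for(const char *name : childNames) {
      // A known name at offset 12 means the child starts at offset 8,
      // so there are no version and flags bytes to skip.
      if(std::memcmp(&atom[12], name, 4) == 0)
        return 8;
    }
  }
  // Otherwise treat it as a full atom and skip version and flags.
  return 12;
}
```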
This changes the modifications from the last commit in order to
achieve the following behavior: MP4::File::save() works in the
same way as before, i.e. it will never shrink the file and will
make space from removed items available as padding in the form of
a "free" atom. To completely remove the "meta" atom from the file,
a new method strip() is introduced, which can be used in the same
way as its MPEG::File::strip() counterpart.
Currently, MP4 tags can only grow. If items are removed, they are
just replaced by padding in the form of "free" atoms. This change
will remove the whole "meta" atom when an MP4 tag without items
is saved. This makes it possible to bring the file back to
its pristine state without metadata.
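To illustrate the padding mentioned above: a "free" atom is an ordinary
atom whose payload is ignored by readers. The helper makeFreeAtom() is
hypothetical and only shows the byte layout; filling the payload with
zero bytes is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build a "free" atom which occupies exactly
// `length` bytes. The atom header takes 8 bytes (32-bit big-endian
// size followed by the 4-character name), so shorter lengths cannot
// be represented and yield an empty result.
std::vector<unsigned char> makeFreeAtom(std::uint32_t length)
{
  if(length < 8)
    return {};

  std::vector<unsigned char> atom(length, 0);
  atom[0] = static_cast<unsigned char>((length >> 24) & 0xFF);
  atom[1] = static_cast<unsigned char>((length >> 16) & 0xFF);
  atom[2] = static_cast<unsigned char>((length >> 8) & 0xFF);
  atom[3] = static_cast<unsigned char>(length & 0xFF);
  atom[4] = 'f';
  atom[5] = 'r';
  atom[6] = 'e';
  atom[7] = 'e';
  return atom;
}
```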
The support for MusicBrainz properties is enhanced with "ARTISTS", "ASIN",
"RELEASECOUNTRY", "RELEASESTATUS", "RELEASETYPE", "MUSICBRAINZ_RELEASETRACKID",
"ORIGINALDATE" on APE, ASF, MP4, ID3v2, and Xiph tags.
As described in id3v2.3.0.txt (4.2.1, TCON), multiple genres are
only possible as references to ID3v1 genres with an optional
refinement as a text. When downgrading multiple genres from
ID3v2.4.0, they are now converted to numbers when possible and
the first genre text without ID3v1 reference is added as a
refinement. The keywords RX and CR are supported too.
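The conversion can be sketched as follows. The function
downgradeGenres() and its lookup table parameter are hypothetical and
not TagLib API; the caller supplies the mapping from genre names to
ID3v1 numbers, and the sketch assumes the ID3v2.4.0 genres are given as
names, not as numeric strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: build an ID3v2.3.0 TCON value from a list of
// ID3v2.4.0 genres. Genres known to ID3v1 become "(number)" references,
// RX and CR become "(RX)" and "(CR)", and the first remaining text is
// appended as the refinement. Further unknown texts are dropped.
std::string downgradeGenres(const std::vector<std::string> &genres,
                            const std::map<std::string, int> &id3v1Index)
{
  std::string references;
  std::string refinement;

  for(const std::string &genre : genres) {
    if(genre == "RX" || genre == "CR") {
      references += "(" + genre + ")";
      continue;
    }
    const auto it = id3v1Index.find(genre);
    if(it != id3v1Index.end())
      references += "(" + std::to_string(it->second) + ")";
    else if(refinement.empty())
      refinement = genre;
  }
  return references + refinement;
}
```

With a table containing "Rock" (17) and "Pop" (13), the genres "Rock",
"Pop", "Custom" would be written as "(17)(13)Custom".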
This incorporates [6ca536b5] (mp4 properties: handle the case when
mp4 file header has zero bitrate) from PR #899 with a more accurate
bitrate calculation and a unit test.
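A fallback of this kind can be sketched as below. The function
estimateBitrate() is a hypothetical helper and not necessarily the
calculation used in this commit; it assumes that the size of the audio
data in bytes and the duration in milliseconds are already known.

```cpp
// Hypothetical helper: estimate the bitrate in kbit/s from the size of
// the audio data and the duration. Bits per millisecond are the same
// as kilobits per second, so no further scaling is needed.
int estimateBitrate(long long audioBytes, long long lengthMs)
{
  if(lengthMs <= 0)
    return 0;  // no duration, no meaningful bitrate
  return static_cast<int>((audioBytes * 8) / lengthMs);
}
```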
This will allow editing the tags of WAV files which have data
appended at the end. The corresponding unit test checks that the
original contents are still available after editing the metadata
of such files.
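One way to recognize such files is to compare the size stored in the
RIFF header with the actual file size. The helper bytesAfterRiff()
below is a hypothetical sketch of that check on an in-memory buffer and
makes no claim about how this commit detects the appended data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: number of bytes which follow the RIFF chunk,
// or -1 if the buffer does not start with a RIFF header. The 32-bit
// little-endian size at offset 4 counts the bytes after the first
// 8 header bytes.
long long bytesAfterRiff(const std::vector<unsigned char> &file)
{
  if(file.size() < 12 || std::memcmp(file.data(), "RIFF", 4) != 0)
    return -1;

  const std::uint64_t riffSize =
    static_cast<std::uint64_t>(file[4]) |
    (static_cast<std::uint64_t>(file[5]) << 8) |
    (static_cast<std::uint64_t>(file[6]) << 16) |
    (static_cast<std::uint64_t>(file[7]) << 24);
  const std::uint64_t riffEnd = riffSize + 8;

  if(file.size() <= riffEnd)
    return 0;
  return static_cast<long long>(file.size() - riffEnd);
}
```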