Correctly parse ID3v2.4.0 multiple strings with single BOM (#1055)

Some ID3v2.4.0 frames, such as text information frames, support multiple strings separated by the termination code of the character encoding. If the encoding is $01 (UTF-16 with BOM), all strings shall have the same byte order. In the multi-strings written by TagLib, every string element has its own BOM. However, I have often seen tags where a BOM exists only at the start of the first string. In such a case, TagLib returns only a list with the first string and a second, empty string. This commit detects such cases and parses the strings without a BOM according to the BOM of the first string.
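The BOM inheritance described above can be illustrated outside of TagLib with a minimal, standalone sketch. It does not use TagLib's ByteVector or String classes; the names ByteOrder, detectBom and splitUtf16WithSharedBom are hypothetical and exist only for this illustration. It assumes the input is the raw field data after the encoding byte, that terminators are 2-byte-aligned 00 00 pairs, and (an arbitrary choice made here, not taken from the commit) that big-endian is used when no BOM is found at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch, not part of TagLib.
enum class ByteOrder { Unknown, Big, Little };

// Returns the byte order announced by a BOM at the start of p[0..n), if any.
static ByteOrder detectBom(const std::uint8_t *p, std::size_t n)
{
  if(n >= 2) {
    if(p[0] == 0xFE && p[1] == 0xFF) return ByteOrder::Big;
    if(p[0] == 0xFF && p[1] == 0xFE) return ByteOrder::Little;
  }
  return ByteOrder::Unknown;
}

// Splits UTF-16 field data on aligned 00 00 terminators. Elements without
// their own BOM are decoded with the byte order of the first element's BOM.
std::vector<std::u16string> splitUtf16WithSharedBom(const std::vector<std::uint8_t> &data)
{
  std::vector<std::u16string> result;
  ByteOrder firstOrder = ByteOrder::Unknown;
  bool isFirst = true;
  std::size_t start = 0;

  while(start < data.size()) {
    // Find the next aligned terminator, or stop at the end of the data.
    std::size_t end = start;
    while(end + 1 < data.size() && !(data[end] == 0 && data[end + 1] == 0))
      end += 2;

    const std::uint8_t *p = data.data() + start;
    std::size_t n = end - start;

    ByteOrder order = detectBom(p, n);
    if(order != ByteOrder::Unknown) {
      p += 2;               // skip the element's own BOM
      n -= 2;
    }
    if(isFirst)
      firstOrder = order;   // remember the first element's BOM, as the commit does
    else if(order == ByteOrder::Unknown)
      order = firstOrder;   // no BOM here: inherit from the first element
    isFirst = false;

    // Assumption of this sketch: fall back to big-endian if no BOM is known.
    const bool little = (order == ByteOrder::Little);
    std::u16string s;
    for(std::size_t i = 0; i + 1 < n; i += 2) {
      const unsigned hi = little ? p[i + 1] : p[i];
      const unsigned lo = little ? p[i] : p[i + 1];
      s.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    if(!s.empty())          // empty elements are skipped, as in parseFields()
      result.push_back(s);

    start = end + 2;        // continue after the terminator
  }
  return result;
}
```

For example, the bytes FF FE 61 00 00 00 62 00 (a little-endian BOM, "a", a terminator, then "b" without a BOM) should yield two strings, with the second decoded as little-endian because of the first string's BOM.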
2026-02-12 11:12:58 -05:00 · 2022-07-25 20:37:15 +02:00
parent 50b89ad19a
commit 4e7f844ea6
2 changed files with 57 additions and 5 deletions
--- a/taglib/mpeg/id3v2/frames/textidentificationframe.cpp
+++ b/taglib/mpeg/id3v2/frames/textidentificationframe.cpp
@@ -218,12 +218,32 @@ void TextIdentificationFrame::parseFields(const ByteVector &data)
  // append those split values to the list and make sure that the new string's
  // type is the same specified for this frame

+  unsigned short firstBom = 0;
  for(ByteVectorList::ConstIterator it = l.begin(); it != l.end(); it++) {
    if(!(*it).isEmpty()) {
-      if(d->textEncoding == String::Latin1)
+      if(d->textEncoding == String::Latin1) {
        d->fieldList.append(Tag::latin1StringHandler()->parse(*it));
-      else
-        d->fieldList.append(String(*it, d->textEncoding));
+      }
+      else {
+        String::Type textEncoding = d->textEncoding;
+        if(textEncoding == String::UTF16) {
+          if(it == l.begin()) {
+            firstBom = it->mid(0, 2).toUShort();
+          }
+          else {
+            unsigned short subsequentBom = it->mid(0, 2).toUShort();
+            if(subsequentBom != 0xfeff && subsequentBom != 0xfffe) {
+              if(firstBom == 0xfeff) {
+                textEncoding = String::UTF16BE;
+              }
+              else if(firstBom == 0xfffe) {
+                textEncoding = String::UTF16LE;
+              }
+            }
+          }
+        }
+        d->fieldList.append(String(*it, textEncoding));
+      }
    }
  }
 }