
delphi - Odd characters in Unicode strings

Reposted · Author: 行者123 · Updated: 2023-12-03 15:48:49

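I'm running into a problem while implementing MP3 ID3v2 support. I have most of it working apart from this issue, which may be entirely unrelated. In any case, I use the code below to retrieve the data of header tags that contain text.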

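What I'm running into (I think?) is Unicode characters in some of the strings. I try to convert them in the code below, and that works, but I get $3F as a character in front of the string and $3F$3F after it. Is there anything I can do in the code below to deal with these, or do I have to parse them out myself? In case it helps, the files were encoded by iTunes.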

function Id3v2_string(currp: pointer; datasize: integer): string;
{ handles string processing for ID3v2 data }
const
  IS_TEXT_UNICODE_UNICODE_MASK = $0F;
var
  outstr: string;
  uscan: integer;
begin
  outstr := '';
  SetLength(outstr, datasize);
  uscan := IS_TEXT_UNICODE_UNICODE_MASK;
  if IsTextUnicode(currp, datasize, @uscan) then
    outstr := WideCharToString(currp)
  else
    move(currp^, outstr[1], datasize);
  Result := outstr;
end;

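Note that I'm really not interested in a media library for this, since all I want to do is edit the ID3 tags, not play the files. Apart from a few small issues like this one, the implementation is done.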

Best answer

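Depending on the ID3v2 version and the frame type, a text string may or may not be preceded by a byte that tells you the string's actual encoding. Do not use IsTextUnicode() to guess the encoding (especially since it can report false results).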

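In ID3v2 up to v2.3, the text is either ISO-8859-1 or UCS-2, and UCS-2 strings always start with a BOM, so you know the byte order. Note that text frames in v2.2 and v2.3 still begin with an encoding byte ($00 for ISO-8859-1, $01 for UCS-2); skip it and pass only the string data that follows it. For example: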

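// prior to Delphi 2009 - String is Ansi
// uses Windows, SysUtils
function Id3v2_string(currp: Pointer; datasize: Integer): String;
var
  W: WideString;
  I: Integer;
  Ch: WideChar;
begin
  Result := '';
  // Reading the first 2 bytes as a Word on little-endian x86:
  // bytes FF FE read as $FEFF (LE BOM), bytes FE FF read as $FFFE (BE BOM)
  if (datasize >= SizeOf(Word)) and ((PWord(currp)^ = $FEFF) or (PWord(currp)^ = $FFFE)) then begin
    // UCS-2 with BOM: copy the characters after the BOM straight into the
    // WideString (WideCharLenToString would round-trip through Ansi and lose data)
    SetString(W, PWideChar(PAnsiChar(currp) + SizeOf(Word)), (datasize - SizeOf(Word)) div SizeOf(WideChar));
    if PWord(currp)^ = $FFFE then begin
      // BE, convert to LE by swapping the bytes of each character
      for I := 1 to Length(W) do begin
        Ch := W[I];
        W[I] := WideChar(((Word(Ch) and $FF) shl 8) or (Word(Ch) shr 8));
      end;
    end;
  end else begin
    // ISO-8859-1 (code page 28591)
    I := MultiByteToWideChar(28591, 0, PAnsiChar(currp), datasize, nil, 0);
    if I > 0 then begin
      SetLength(W, I);
      MultiByteToWideChar(28591, 0, PAnsiChar(currp), datasize, PWideChar(W), I);
    end;
  end;
  // TrimRight strips trailing null terminators/padding; assigning the result to
  // an Ansi String converts it to the system code page, so non-ANSI characters are lost
  Result := TrimRight(W);
end;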


// Delphi 2009+ - String is Unicode
// uses SysUtils
function Id3v2_string(currp: Pointer; datasize: Integer): String;
var
  Enc: TEncoding;

  // copies Size bytes at P into a TBytes and decodes them with Enc
  function Convert(P: Pointer; Size: Integer): String;
  var
    Buf: TBytes;
  begin
    SetLength(Buf, Size);
    if Size > 0 then Move(P^, Buf[0], Size);
    Result := Enc.GetString(Buf);
  end;

begin
  Result := '';
  if (datasize >= SizeOf(Word)) and ((PWord(currp)^ = $FEFF) or (PWord(currp)^ = $FFFE)) then begin
    // UCS-2 with BOM
    if PWord(currp)^ = $FFFE then begin
      // BE
      Enc := TEncoding.BigEndianUnicode;
    end else begin
      // LE
      Enc := TEncoding.Unicode;
    end;
    // skip the 2-byte BOM (PWord arithmetic would need {$POINTERMATH ON});
    // the shared TEncoding instances must not be freed
    Result := Convert(PAnsiChar(currp) + SizeOf(Word), datasize - SizeOf(Word));
  end else begin
    // ISO-8859-1; GetEncoding returns a new instance that must be freed
    Enc := TEncoding.GetEncoding(28591);
    try
      Result := Convert(currp, datasize);
    finally
      Enc.Free;
    end;
  end;
  // strip trailing null terminators/padding
  Result := TrimRight(Result);
end;

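ID3v2.4 replaced UCS-2 with UTF-16 and added support for UTF-8 and for UTF-16BE without a BOM, so the frame's encoding byte can be $00 (ISO-8859-1), $01 (UTF-16 with BOM), $02 (UTF-16BE without BOM) or $03 (UTF-8). For example: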

// prior to Delphi 2009 - String is Ansi
// uses Windows, SysUtils
function Id3v2_string(currp: Pointer; datasize: Integer; Encoding: Byte): String;
var
  W: WideString;
  I: Integer;
  Ch: WideChar;
begin
  Result := '';

  case Encoding of
    $00: begin
      // ISO-8859-1
      I := MultiByteToWideChar(28591, 0, PAnsiChar(currp), datasize, nil, 0);
      if I > 0 then begin
        SetLength(W, I);
        MultiByteToWideChar(28591, 0, PAnsiChar(currp), datasize, PWideChar(W), I);
      end;
    end;
    $01: begin
      // UTF-16 with BOM (guard against data too short to hold the BOM)
      if datasize >= SizeOf(Word) then begin
        SetString(W, PWideChar(PAnsiChar(currp) + SizeOf(Word)), (datasize - SizeOf(Word)) div SizeOf(WideChar));
        if PWord(currp)^ = $FFFE then begin
          // BE, convert to LE
          for I := 1 to Length(W) do begin
            Ch := W[I];
            W[I] := WideChar(((Word(Ch) and $FF) shl 8) or (Word(Ch) shr 8));
          end;
        end;
      end;
    end;
    $02: begin
      // UTF-16BE without BOM, convert to LE
      SetString(W, PWideChar(currp), datasize div SizeOf(WideChar));
      for I := 1 to Length(W) do begin
        Ch := W[I];
        W[I] := WideChar(((Word(Ch) and $FF) shl 8) or (Word(Ch) shr 8));
      end;
    end;
    $03: begin
      // UTF-8 (code page 65001)
      I := MultiByteToWideChar(65001, 0, PAnsiChar(currp), datasize, nil, 0);
      if I > 0 then begin
        SetLength(W, I);
        MultiByteToWideChar(65001, 0, PAnsiChar(currp), datasize, PWideChar(W), I);
      end;
    end;
  end;
  // strips trailing nulls; the Ansi Result loses non-ANSI characters
  Result := TrimRight(W);
end;


// Delphi 2009+ - String is Unicode
// uses SysUtils
function Id3v2_string(currp: Pointer; datasize: Integer; Encoding: Byte): String;
var
  Enc: TEncoding;

  // copies Size bytes at P into a TBytes and decodes them with Enc
  function Convert(P: Pointer; Size: Integer): String;
  var
    Buf: TBytes;
  begin
    SetLength(Buf, Size);
    if Size > 0 then Move(P^, Buf[0], Size);
    Result := Enc.GetString(Buf);
  end;

begin
  Result := '';

  case Encoding of
    $00: begin
      // ISO-8859-1; GetEncoding returns a new instance that must be freed
      Enc := TEncoding.GetEncoding(28591);
      try
        Result := Convert(currp, datasize);
      finally
        Enc.Free;
      end;
    end;
    $01: begin
      // UTF-16 with BOM (guard against data too short to hold the BOM)
      if datasize >= SizeOf(Word) then begin
        if PWord(currp)^ = $FFFE then begin
          // BE
          Enc := TEncoding.BigEndianUnicode;
        end else begin
          // LE
          Enc := TEncoding.Unicode;
        end;
        // skip the 2-byte BOM
        Result := Convert(PAnsiChar(currp) + SizeOf(Word), datasize - SizeOf(Word));
      end;
    end;
    $02: begin
      // UTF-16BE without BOM
      Enc := TEncoding.BigEndianUnicode;
      Result := Convert(currp, datasize);
    end;
    $03: begin
      // UTF-8
      Enc := TEncoding.UTF8;
      Result := Convert(currp, datasize);
    end;
  end;
  // strip trailing null terminators/padding
  Result := TrimRight(Result);
end;
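The v2.4 functions above take the encoding byte as a parameter and leave it to the caller to pull that byte out of the frame. As a minimal sketch, assuming a Unicode Delphi version (2009 or later) on Windows and assuming `Data` holds a text frame's content after its 10-byte frame header, a hypothetical `DecodeTextFrame` helper (not part of the answer) could read the encoding byte itself and decode the rest with `TEncoding`. Values $02 and $03 should only appear in v2.4 files; v2.3 files use just $00 and $01.

```pascal
program DecodeTextFrameDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not from the answer): Data is the content of an
// ID3v2.3/v2.4 text frame, where Data[0] is the encoding byte.
function DecodeTextFrame(const Data: TBytes): string;
var
  Enc: TEncoding;
  OwnsEnc: Boolean;
  Start: Integer;
begin
  Result := '';
  if Length(Data) = 0 then
    Exit;
  Enc := nil;
  OwnsEnc := False;
  Start := 1; // the string starts after the encoding byte
  case Data[0] of
    $00: // ISO-8859-1; GetEncoding returns a new instance we must free
      begin
        Enc := TEncoding.GetEncoding(28591);
        OwnsEnc := True;
      end;
    $01: // UCS-2 (v2.3) / UTF-16 (v2.4) with BOM: FF FE = LE, FE FF = BE
      begin
        if (Length(Data) >= 3) and (Data[1] = $FE) and (Data[2] = $FF) then
          Enc := TEncoding.BigEndianUnicode
        else
          Enc := TEncoding.Unicode;
        Start := 3; // also skip the 2-byte BOM
      end;
    $02: Enc := TEncoding.BigEndianUnicode; // v2.4 only: UTF-16BE, no BOM
    $03: Enc := TEncoding.UTF8;             // v2.4 only: UTF-8
  else
    Exit; // unknown encoding byte: leave Result empty
  end;
  try
    if Length(Data) > Start then
      Result := Enc.GetString(Data, Start, Length(Data) - Start);
    // strip trailing null terminators/padding
    Result := TrimRight(Result);
  finally
    if OwnsEnc then
      Enc.Free;
  end;
end;

begin
  // "Hi" as UTF-16LE with BOM, followed by a null terminator
  Writeln(DecodeTextFrame(TBytes.Create($01, $FF, $FE, $48, $00, $69, $00, $00, $00)));
  // "Hi" as UTF-8
  Writeln(DecodeTextFrame(TBytes.Create($03, $48, $69)));
end.
```

Checking the BOM byte by byte, rather than reading a `Word` as the answer's code does, keeps the test independent of the CPU's byte order, and `TrimRight` removes any trailing null terminator before the string is returned.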

Regarding "delphi - Odd characters in Unicode strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9540666/
