
delphi - TWebBrowser HTML source code - How to detect stream encoding?

Reposted. Author: 行者123. Updated: 2023-12-03 15:33:03

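Based on this question: How can I get HTML source code from TWebBrowser

If I run this code against an HTML page that uses a Unicode code page, the result is gibberish, because TStringStream is not Unicode-aware in D7. The page might be UTF8 encoded or use some other (Ansi) code page.

How can I detect whether a TStream/IPersistStreamInit holds Unicode/UTF8/Ansi data?

For this function, how do I always return the correct result as a WideString?

function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;

If I replace the TStringStream with a TMemoryStream and save the TMemoryStream to a file, everything is fine, whether the content is Unicode/UTF8/Ansi. But I always want to return the stream as a WideString: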

function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
var
  // LStream: TStringStream;
  LStream: TMemoryStream;
  Stream : IStream;
  LPersistStreamInit : IPersistStreamInit;
begin
  if not Assigned(WebBrowser.Document) then exit;
  // LStream := TStringStream.Create('');
  LStream := TMemoryStream.Create;
  try
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream,soReference);
    LPersistStreamInit.Save(Stream,true);
    // result := LStream.DataString;
    LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
    Result := ??? // WideString
  finally
    LStream.Free();
  end;
end;

Edit: I found this article - How to load and save documents in TWebBrowser in a Delphi-like way

It does exactly what I need, but it only works with the Unicode Delphi compilers (D2009+). See its Conclusion section:

There is obviously a lot more we could do. A couple of things immediately spring to mind. We retro-fit some of the Unicode functionality and support for non-ANSI encodings to the pre-Unicode compiler code. The present code when compiled with anything earlier than Delphi 2009 will not save document content to strings correctly if the document character set is not ANSI.

The magic is obviously in the TEncoding class (TEncoding.GetBufferEncoding). But D7 has no TEncoding. Any ideas?
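For context, as far as I know TEncoding.GetBufferEncoding mostly works by looking for a byte order mark (BOM) at the start of the buffer. The same check can be approximated in D7 by hand. The following is only a sketch: TBomKind and DetectBom are hypothetical names that do not come from the article, and a missing BOM tells you nothing about whether the data is UTF-8 or some Ansi code page.

```pascal
uses
  Classes;

type
  // Hypothetical helper type: which BOM, if any, was found.
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE);

// Inspects the first bytes of AStream for a byte order mark.
// The stream position is restored before returning.
function DetectBom(AStream: TStream): TBomKind;
var
  Buf: array[0..2] of Byte;
  SavedPos: Int64;
  Count: Integer;
begin
  Result := bkNone;
  SavedPos := AStream.Position;
  try
    AStream.Position := 0;
    Count := AStream.Read(Buf, SizeOf(Buf));
    if (Count >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF) then
      Result := bkUtf8        // EF BB BF
    else if (Count >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
      Result := bkUtf16LE     // FF FE
    else if (Count >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
      Result := bkUtf16BE;    // FE FF
  finally
    AStream.Position := SavedPos;
  end;
end;
```

When DetectBom returns bkNone, the encoding has to come from somewhere else, for example from the charset that the document itself reports.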

Best answer

I used GpTextStream to handle the conversion (it should work with all Delphi versions):

// Maps an HTML charset name to a Windows code page; returns 0 if unknown.
// The comparisons are case-sensitive, so Charset is expected in lowercase.
function GetCodePageFromHTMLCharSet(Charset: WideString): Word;
const
  WIN_CHARSET = 'windows-';
  ISO_CHARSET = 'iso-';
var
  S: string;
begin
  Result := 0;
  if Charset = 'unicode' then
    Result := CP_UNICODE
  else if Charset = 'utf-8' then
    Result := CP_UTF8
  else if Pos(WIN_CHARSET, Charset) <> 0 then
  begin
    // e.g. windows-1252 => 1252
    S := Copy(Charset, Length(WIN_CHARSET) + 1, Maxint);
    Result := StrToIntDef(S, 0);
  end
  else if Pos(ISO_CHARSET, Charset) <> 0 then // ISO-8859 (e.g. iso-8859-1 => 28591)
  begin
    S := Copy(Charset, Length(ISO_CHARSET) + 1, Maxint);
    S := Copy(S, Pos('-', S) + 1, 2);
    if S = '15' then // ISO-8859-15 (Latin 9)
      Result := 28605
    else
      // Only yields a valid code page for single-digit parts (iso-8859-1 .. iso-8859-9)
      Result := StrToIntDef('2859' + S, 0);
  end;
end;

function GetWebBrowserHTML(WebBrowser: TWebBrowser): WideString;
var
  LStream: TMemoryStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
  TextStream: TGpTextStream;
  Charset: WideString;
  Buf: WideString;
  CodePage: Word;
  N: Integer;
begin
  Result := '';
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TMemoryStream.Create;
  try
    // Save the raw document bytes into the memory stream
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    if Failed(LPersistStreamInit.Save(Stream, True)) then Exit;
    // Ask the document for its charset and map it to a code page
    Charset := (WebBrowser.Document as IHTMLDocument2).charset;
    CodePage := GetCodePageFromHTMLCharSet(Charset);
    N := LStream.Size;
    if N = 0 then Exit; // nothing saved; Buf[1] below would be out of range
    SetLength(Buf, N);
    TextStream := TGpTextStream.Create(LStream, tsaccRead, [], CodePage);
    try
      // Read returns a byte count; convert it to a WideChar count
      N := TextStream.Read(Buf[1], N * SizeOf(WideChar)) div SizeOf(WideChar);
      SetLength(Buf, N);
      Result := Buf;
    finally
      TextStream.Free;
    end;
  finally
    LStream.Free();
  end;
end;
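
If pulling in GpTextStream is not an option, the conversion step itself can probably be done with the Windows API function MultiByteToWideChar, which is available in D7. This is a sketch of an alternative, not what the answer above does: StreamToWideString is a hypothetical name, and it only handles multi-byte/Ansi code pages such as CP_UTF8 or 1252. UTF-16 content (the 'unicode' charset, code page 1200) would need to be copied directly instead, and a leading BOM, if the saved stream has one, ends up in the result as U+FEFF.

```pascal
uses
  Windows, Classes;

// Hypothetical helper: converts the whole content of AStream from
// ACodePage to a WideString. Returns '' if the stream is empty or the
// conversion fails. Not suitable for UTF-16 input (code page 1200).
function StreamToWideString(AStream: TMemoryStream; ACodePage: UINT): WideString;
var
  SrcLen, DstLen: Integer;
begin
  Result := '';
  SrcLen := Integer(AStream.Size);
  if SrcLen = 0 then Exit;
  // First call: ask how many WideChars the converted text needs.
  DstLen := MultiByteToWideChar(ACodePage, 0, PAnsiChar(AStream.Memory),
    SrcLen, nil, 0);
  if DstLen <= 0 then Exit;
  SetLength(Result, DstLen);
  // Second call: convert into the result buffer.
  MultiByteToWideChar(ACodePage, 0, PAnsiChar(AStream.Memory),
    SrcLen, PWideChar(Result), DstLen);
end;
```

In GetWebBrowserHTML this would stand in for the TGpTextStream part, for example Result := StreamToWideString(LStream, CodePage); when CodePage is neither 0 nor CP_UNICODE.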

Regarding "delphi - TWebBrowser HTML source code - How to detect stream encoding?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14268220/
