
javascript - Using CefSharp.Offscreen to retrieve a web page that requires Javascript to render

Reposted. Author: 数据小太阳. Updated: 2023-10-29 06:10:36

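My task is simple, but it takes someone who knows CefSharp well to solve it.

I have a url that I want to retrieve the HTML from. The problem is that this particular url does not actually serve the page in response to a GET. Instead, it pushes a bunch of Javascript to the browser, which the browser then executes to produce the page that is actually rendered. This means the usual approach involving HttpWebRequest and HttpWebResponse will not work.

I have looked at a number of different "headless" options, and for several reasons I think the one that best fits my needs is CefSharp.Offscreen. However, I have no idea how this thing works. I can see that there are a few events to subscribe to and some configuration options, but I do not need anything like an embedded browser.

What I really need is a way to do something like this (pseudocode):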

string html = CefSharp.Get(url);

I have no problem subscribing to events, if that is what it takes to wait for the Javascript to execute and produce the rendered page.

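Best answer

I know I am doing some archaeology by reviving a two-year-old post, but a detailed answer may be useful to others.

Yes, CefSharp.OffScreen is suited to this task.

Below is a class that handles all the browser events.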

using System;
using System.IO;
using System.Threading;
using CefSharp;
using CefSharp.OffScreen;

namespace [whatever]
{
    public class Browser
    {
        /// <summary>
        /// The browser page
        /// </summary>
        public ChromiumWebBrowser Page { get; private set; }

        /// <summary>
        /// The request context
        /// </summary>
        public RequestContext RequestContext { get; private set; }

        // Chromium does not manage timeouts, so we implement one
        private readonly ManualResetEvent manualResetEvent = new ManualResetEvent(false);

        public Browser()
        {
            var settings = new CefSettings()
            {
                // By default CefSharp uses an in-memory cache; specify a cache folder to persist data
                CachePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), "CefSharp\\Cache"),
            };

            // Shut CEF down automatically when the application exits
            CefSharpSettings.ShutdownOnExit = true;

            // Perform a dependency check to make sure all relevant resources are in the output directory
            Cef.Initialize(settings, performDependencyCheck: true, browserProcessHandler: null);

            RequestContext = new RequestContext();
            Page = new ChromiumWebBrowser("", null, RequestContext);
            PageInitialize();
        }

        /// <summary>
        /// Open the given url and block until it has loaded or the timeout expires
        /// </summary>
        /// <param name="url">the url</param>
        public void OpenUrl(string url)
        {
            try
            {
                Page.LoadingStateChanged += PageLoadingStateChanged;
                if (Page.IsBrowserInitialized)
                {
                    // Clear any signal left over from a previous load before starting this one
                    manualResetEvent.Reset();
                    Page.Load(url);

                    // Wait up to 60 seconds for the load to finish
                    bool isSignalled = manualResetEvent.WaitOne(TimeSpan.FromSeconds(60));

                    // The request may still be pending, so force a stop once the timeout has passed
                    if (!isSignalled)
                    {
                        Page.Stop();
                    }
                }
            }
            catch (ObjectDisposedException)
            {
                // Happens on manualResetEvent.Reset() when a cancellation token has disposed the context
            }
            finally
            {
                Page.LoadingStateChanged -= PageLoadingStateChanged;
            }
        }

        /// <summary>
        /// Signal the waiting thread once loading has finished
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void PageLoadingStateChanged(object sender, LoadingStateChangedEventArgs e)
        {
            // This event is raised twice: once when loading starts
            // and a second time when it has finished
            if (!e.IsLoading)
            {
                manualResetEvent.Set();
            }
        }

        /// <summary>
        /// Wait until the browser has been initialized
        /// </summary>
        private void PageInitialize()
        {
            SpinWait.SpinUntil(() => Page.IsBrowserInitialized);
        }
    }
}

Now, in my application, I only need to do the following:

private Browser _browser;

public MainWindow()
{
    InitializeComponent();
    _browser = new Browser();
}

private async void GetGoogleSource()
{
    // OpenUrl blocks until the page has loaded or the 60-second timeout expires
    _browser.OpenUrl("http://icanhazip.com/");
    string source = await _browser.Page.GetSourceAsync();
}

This is the string I get:

"<html><head></head><body><pre style=\"word-wrap: break-word; white-space: pre-wrap;\">NotGonnaGiveYouMyIP:)\n</pre></body></html>"
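Because OpenUrl blocks the calling thread for up to 60 seconds, calling it from a UI thread may freeze the window. A non-blocking variant might be sketched as follows. This is only a sketch under assumptions, not part of the original answer: the class name BrowserLoadExtensions and the method name GetRenderedHtmlAsync are hypothetical, and the code assumes a CefSharp version in which Load, Stop and GetSourceAsync are available on ChromiumWebBrowser, as they are used in the answer above.

```csharp
using System;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

// Hypothetical helper, not part of CefSharp itself.
public static class BrowserLoadExtensions
{
    // Loads the url, waits (without blocking a thread) until loading has
    // finished or the timeout has passed, then returns the page source.
    public static async Task<string> GetRenderedHtmlAsync(
        this ChromiumWebBrowser page, string url, TimeSpan timeout)
    {
        var loaded = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Complete the task the first time the browser reports that it is no longer loading.
        EventHandler<LoadingStateChangedEventArgs> handler = (sender, e) =>
        {
            if (!e.IsLoading)
            {
                loaded.TrySetResult(true);
            }
        };

        page.LoadingStateChanged += handler;
        try
        {
            page.Load(url);

            // Race the load against a timer instead of blocking on a wait handle.
            Task finished = await Task.WhenAny(loaded.Task, Task.Delay(timeout));
            if (finished != loaded.Task)
            {
                // Timed out: stop the pending load and return whatever has been rendered so far.
                page.Stop();
            }

            return await page.GetSourceAsync();
        }
        finally
        {
            page.LoadingStateChanged -= handler;
        }
    }
}
```

With such a helper, the call site would reduce to something like string source = await _browser.Page.GetRenderedHtmlAsync("http://icanhazip.com/", TimeSpan.FromSeconds(60));. Note that IsLoading turning false marks the end of loading, not necessarily the end of script-driven rendering, so a page that builds its DOM late may need an additional wait before the source is read.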

Regarding "javascript - Using CefSharp.Offscreen to retrieve a web page that requires Javascript to render", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35471261/
