authentication - How to do NTLM authentication in Apache Nutch?

Reposted. Author: 行者123. Updated: 2023-12-02 03:42:00

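The Apache Nutch web crawler has built-in support for NTLM. I am trying to use version 1.7 to crawl a website (Windows SharePoint) with NTLM authentication. I set up Nutch according to https://wiki.apache.org/nutch/HttpAuthenticationSchemes, which in particular means that I have configured the following credentials: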

<credentials username="rickert" password="mypassword">
<authscope host="server-to-be-crawled.com" port="80" realm="CORP" scheme="NTLM"/>
</credentials>

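With these credentials in place, the log file shows Nutch requesting the seed URL and going through the "normal" NTLM cycle: it gets a 401 error on the first GET, extracts the NTLM challenge, and sends the NTLM authentication in the next GET (over a keep-alive connection). However, this second GET does not succeed either.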
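To see which step of this cycle a given header belongs to, one can decode the token that follows `NTLM`: every NTLMSSP message starts with the 8-byte signature `NTLMSSP\0` followed by a little-endian 32-bit message type (1 = negotiate, 2 = challenge, 3 = authenticate). The following is a minimal JDK-only sketch for inspecting header values copied from the logs; `NtlmHeaderInspector` is a hypothetical name, not part of Nutch or HttpClient.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical debugging helper: which NTLM message does a header value carry? */
final class NtlmHeaderInspector {

  // Every NTLMSSP message starts with this 8-byte signature.
  private static final byte[] SIGNATURE = "NTLMSSP\0".getBytes(StandardCharsets.US_ASCII);

  /**
   * Returns 1, 2 or 3 for a header value such as "NTLM TlRMTVNTUAAC...",
   * 0 for a bare "NTLM" (no token), or -1 if it is not a well-formed NTLM token.
   */
  static int messageType(String headerValue) {
    String trimmed = headerValue.trim();
    int space = trimmed.indexOf(' ');
    String scheme = space == -1 ? trimmed : trimmed.substring(0, space);
    if (!scheme.equalsIgnoreCase("NTLM")) {
      return -1;                      // a different scheme, e.g. Negotiate
    }
    if (space == -1) {
      return 0;                       // server only announces NTLM, no token yet
    }
    byte[] raw;
    try {
      raw = Base64.getDecoder().decode(trimmed.substring(space + 1).trim());
    } catch (IllegalArgumentException e) {
      return -1;                      // token is not valid Base64
    }
    if (raw.length < 12 || !Arrays.equals(Arrays.copyOf(raw, 8), SIGNATURE)) {
      return -1;
    }
    // Bytes 8..11 hold the message type as a little-endian 32-bit integer.
    return (raw[8] & 0xff) | (raw[9] & 0xff) << 8 | (raw[10] & 0xff) << 16 | (raw[11] & 0xff) << 24;
  }
}
```

A bare `NTLM` challenge (result 0) arriving after a Type 3 message has been sent usually means that the server rejected the Type 3 response.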

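Suspecting a fundamental problem with my credentials or my particular setup (I run Nutch in a Debian guest in VirtualBox on a Windows host), I tried wget and curl, and to my surprise both were able to retrieve documents from inside the Debian guest using my credentials. Interestingly, both command-line tools need only a user name and password to work. The full NTLM specification, on the other hand, also requires a host and a domain. According to the specification, the host is the host the request originates from, which I would interpret as the host running the HTTP agent, and the domain is the Windows domain the user name is associated with. My assumption is that both tools simply leave these details blank.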
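NTLM messages carry the domain and the workstation as separate fields, so a client given only a user name and password has to decide what to put there. A common convention (curl, for instance, accepts `DOMAIN\user` in `-u` together with `--ntlm`) is to prefix the domain to the user name; without a prefix the domain is left empty, which matches the assumption above. A minimal sketch of that split; `NtlmLogin` is a hypothetical class, not part of Nutch or HttpClient.

```java
/** Hypothetical helper: splits "DOMAIN\\user" into the NTLM domain and user name. */
final class NtlmLogin {
  final String domain;
  final String user;

  private NtlmLogin(String domain, String user) {
    this.domain = domain;
    this.user = user;
  }

  static NtlmLogin parse(String login) {
    int sep = login.indexOf('\\');
    if (sep == -1) {
      // No domain prefix: send an empty domain, as a user-name-only client would.
      return new NtlmLogin("", login);
    }
    return new NtlmLogin(login.substring(0, sep), login.substring(sep + 1));
  }
}
```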

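This is where the Nutch configuration comes in: the host is supposedly supplied as http.agent.host in the configuration file. The domain should be configured as the realm of the credentials, but the documentation says this is more a convention than a real requirement. In any case, the result is the same whether or not I set the realm. Looking at the log file again, I can see messages indicating that authentication was resolved for <any_realm>@server-to-be-crawled.com, regardless of which realm I use.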
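To check what actually ends up in the domain and workstation fields, and therefore whether the realm and http.agent.host values are mapped as the documentation suggests, one could decode the Type 3 message the crawler sends. The following is a minimal JDK-only sketch. It assumes the standard NTLMSSP Type 3 layout: security buffers for domain, user and workstation at offsets 28, 36 and 44, and negotiate flags at offset 60. `Type3Fields` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: decodes the identity fields of an NTLM Type 3 message. */
final class Type3Fields {
  final String domain;
  final String user;
  final String workstation;

  private Type3Fields(String domain, String user, String workstation) {
    this.domain = domain;
    this.user = user;
    this.workstation = workstation;
  }

  // NTLM integers are little-endian.
  private static int u16(byte[] b, int off) {
    return (b[off] & 0xff) | (b[off + 1] & 0xff) << 8;
  }

  private static int u32(byte[] b, int off) {
    return u16(b, off) | u16(b, off + 2) << 16;
  }

  // A security buffer descriptor: 2-byte length, 2-byte max length, 4-byte offset.
  private static byte[] securityBuffer(byte[] b, int descriptorOffset) {
    int len = u16(b, descriptorOffset);
    int start = u32(b, descriptorOffset + 4);
    if (start < 0 || start > b.length - len) {
      throw new IllegalArgumentException("security buffer points outside the message");
    }
    return Arrays.copyOfRange(b, start, start + len);
  }

  static Type3Fields parse(String base64Token) {
    byte[] b = Base64.getDecoder().decode(base64Token);
    byte[] signature = "NTLMSSP\0".getBytes(StandardCharsets.US_ASCII);
    if (b.length < 52 || !Arrays.equals(Arrays.copyOf(b, 8), signature) || u32(b, 8) != 3) {
      throw new IllegalArgumentException("not an NTLM Type 3 message");
    }
    // Flag 0x1 (NEGOTIATE_UNICODE) selects UTF-16LE; OEM strings are approximated as ISO-8859-1.
    int flags = b.length >= 64 ? u32(b, 60) : 0;
    Charset cs = (flags & 0x1) != 0 ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1;
    return new Type3Fields(
        new String(securityBuffer(b, 28), cs),   // DomainName
        new String(securityBuffer(b, 36), cs),   // UserName
        new String(securityBuffer(b, 44), cs));  // Workstation
  }
}
```

Pass it the token that follows `NTLM ` in the `Authorization` header of the failing request (for example, copied from a debug log of the request headers). An empty or unexpected domain would point at the configuration mapping.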

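My gut feeling is that something goes wrong when the Nutch configuration values are mapped to the NTLM parameters required by the httpclient Java classes that perform the GET. I'm stuck. Can anyone give me some hints on how to debug this further? Does anyone have a concrete configuration that works with a SharePoint server? Thanks!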

Best answer

This is an old thread, but it seems to be a common problem, and I finally found a solution.

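In my case, the problem was that the content source I was trying to crawl was hosted on a fairly new IIS server. Inspecting the headers showed that it used NTLM, but after reading that Apache Commons HttpClient v3.x supports only NTLMv1 and not NTLMv2, I started looking for a way to add NTLMv2 support to Nutch v1.15 without upgrading to the newer HttpComponents version of HttpClient.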

The clue was in the documentation for the newer HC version of HttpClient. Using this approach with JCIFS, I managed to modify the Nutch protocol-httpclient Http class so that it authenticates with my new JCIFS-based NTLM scheme. The steps to do this:

  1. Create a new JCIFS-based NTLMScheme
  2. In Http.configureClient, register the new scheme
  3. Add JCIFS to the Nutch protocol-httpclient plugin classpath
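For step 3, a quick way to confirm that the JCIFS classes are visible is to try loading one through the plugin's own class loader, as the scheme's constructor does. Nutch loads each plugin with its own class loader, so checking only the system class loader could be misleading. A minimal sketch; `ClasspathCheck` is a hypothetical helper name.

```java
/** Hypothetical helper: can this loader load the class (without initialising it)? */
final class ClasspathCheck {
  static boolean isLoadable(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // Either the class is missing, or its jar is present but unusable (e.g. a missing dependency).
      return false;
    }
  }
}
```

Inside the plugin, `ClasspathCheck.isLoadable("jcifs.ntlmssp.Type3Message", getClass().getClassLoader())` would be expected to return true once the JCIFS jar is on the plugin classpath.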

Once this was done, I was able to crawl NTLMv2-protected sites.

By adding a lot of extra logging, I could see the details of the authentication handshake, which showed that it was indeed using NTLMv2.
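The handshake can also distinguish NTLMv1 from NTLMv2 without extra logging. In the Type 3 message, an NTLMv1 (or NTLM2 session) NT response is exactly 24 bytes. An NTLMv2 response is a 16-byte HMAC followed by a variable-length blob, so it is longer. A minimal JDK-only sketch of that length check; `NtlmVersionSniffer` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: guesses the response version from the NT response length in a Type 3 message. */
final class NtlmVersionSniffer {
  static String classify(String base64Type3) {
    byte[] b = Base64.getDecoder().decode(base64Type3);
    byte[] signature = "NTLMSSP\0".getBytes(StandardCharsets.US_ASCII);
    if (b.length < 28 || !Arrays.equals(Arrays.copyOf(b, 8), signature)
        || b[8] != 3 || b[9] != 0 || b[10] != 0 || b[11] != 0) {
      throw new IllegalArgumentException("not an NTLM Type 3 message");
    }
    // The NtChallengeResponse security buffer descriptor starts at offset 20;
    // its first two bytes hold the response length (little-endian).
    int ntLength = (b[20] & 0xff) | (b[21] & 0xff) << 8;
    if (ntLength == 24) {
      return "NTLMv1/NTLM2-session";
    } else if (ntLength > 24) {
      return "NTLMv2";
    }
    return "unknown";
  }
}
```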

The change to Http.configureClient looks like this:

  /** Configures the HTTP client */
  private void configureClient() {
    // Replace HttpClient 3.x's built-in NTLM scheme (NTLMv1 only) with the JCIFS-based one.
    LOG.info("Setting new NTLM scheme: " + JcifsNtlmScheme.class.getName());
    AuthPolicy.registerAuthScheme(AuthPolicy.NTLM, JcifsNtlmScheme.class);
    ...
  }

The new NTLM scheme implementation looks something like this (it still needs some tidying up):


import java.io.IOException;

import org.apache.commons.httpclient.Credentials;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.NTCredentials;
import org.apache.commons.httpclient.auth.AuthChallengeParser;
import org.apache.commons.httpclient.auth.AuthScheme;
import org.apache.commons.httpclient.auth.AuthenticationException;
import org.apache.commons.httpclient.auth.InvalidCredentialsException;
import org.apache.commons.httpclient.auth.MalformedChallengeException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * NTLM authentication scheme for Commons HttpClient 3.x that delegates
 * message generation to JCIFS, which supports NTLMv2.
 */
public class JcifsNtlmScheme implements AuthScheme {

  public static final Logger LOG = LoggerFactory.getLogger(JcifsNtlmScheme.class);

  /** NTLM challenge string. */
  private String ntlmchallenge = null;

  private static final int UNINITIATED = 0;
  private static final int INITIATED = 1;
  private static final int TYPE1_MSG_GENERATED = 2;
  private static final int TYPE2_MSG_RECEIVED = 3;
  private static final int TYPE3_MSG_GENERATED = 4;
  private static final int FAILED = Integer.MAX_VALUE;

  /** Authentication process state */
  private int state;

  public JcifsNtlmScheme() throws AuthenticationException {
    // Check if JCIFS is present. If not present, do not proceed.
    try {
      Class.forName("jcifs.ntlmssp.NtlmMessage", false, this.getClass().getClassLoader());
      LOG.trace("jcifs.ntlmssp.NtlmMessage is present");
    } catch (ClassNotFoundException e) {
      throw new AuthenticationException("Unable to proceed as JCIFS library is not found.");
    }
  }

  public String authenticate(Credentials credentials, HttpMethod method) throws AuthenticationException {
    LOG.trace("authenticate called. State: " + this.state);
    if (this.state == UNINITIATED) {
      throw new IllegalStateException("NTLM authentication process has not been initiated");
    }

    NTCredentials ntcredentials = null;
    try {
      ntcredentials = (NTCredentials) credentials;
    } catch (ClassCastException e) {
      throw new InvalidCredentialsException(
          "Credentials cannot be used for NTLM authentication: " + credentials.getClass().getName());
    }

    NTLM ntlm = new NTLM();
    String charset = method.getParams().getCredentialCharset();
    LOG.trace("Setting credential charset to: " + charset);
    ntlm.setCredentialCharset(charset);

    String response = null;
    if (this.state == INITIATED || this.state == FAILED) {
      LOG.trace("Generating TYPE1 message");
      response = ntlm.generateType1Msg(ntcredentials.getHost(), ntcredentials.getDomain());
      this.state = TYPE1_MSG_GENERATED;
    } else {
      LOG.trace("Generating TYPE3 message");
      response = ntlm.generateType3Msg(ntcredentials.getUserName(), ntcredentials.getPassword(),
          ntcredentials.getHost(), ntcredentials.getDomain(), this.ntlmchallenge);
      this.state = TYPE3_MSG_GENERATED;
    }

    String result = "NTLM " + response;
    return result;
  }

  public String authenticate(Credentials credentials, String method, String uri) throws AuthenticationException {
    throw new RuntimeException("Not implemented as it is deprecated anyway in Httpclient 3.x");
  }

  public String getID() {
    throw new RuntimeException("Not implemented as it is deprecated anyway in Httpclient 3.x");
  }

  /**
   * Returns the authentication parameter with the given name, if available.
   *
   * <p>
   * There are no valid parameters for NTLM authentication so this method always
   * returns null.
   * </p>
   *
   * @param name The name of the parameter to be returned
   *
   * @return the parameter with the given name
   */
  public String getParameter(String name) {
    if (name == null) {
      throw new IllegalArgumentException("Parameter name may not be null");
    }
    return null;
  }

  /**
   * The concept of an authentication realm is not supported by the NTLM
   * authentication scheme. Always returns <code>null</code>.
   *
   * @return <code>null</code>
   */
  public String getRealm() {
    return null;
  }

  /**
   * Returns textual designation of the NTLM authentication scheme.
   *
   * @return <code>ntlm</code>
   */
  public String getSchemeName() {
    return "ntlm";
  }

  /**
   * Tests if the NTLM authentication process has been completed.
   *
   * @return true if NTLM authentication has been processed,
   *         false otherwise.
   *
   * @since 3.0
   */
  public boolean isComplete() {
    boolean result = this.state == TYPE3_MSG_GENERATED || this.state == FAILED;
    LOG.trace("isComplete? " + result);

    return result;
  }

  /**
   * Returns true. NTLM authentication scheme is connection based.
   *
   * @return true.
   *
   * @since 3.0
   */
  public boolean isConnectionBased() {
    return true;
  }

  /**
   * Processes the NTLM challenge.
   *
   * @param challenge the challenge string
   *
   * @throws MalformedChallengeException is thrown if the authentication challenge
   *         is malformed
   *
   * @since 3.0
   */
  public void processChallenge(final String challenge) throws MalformedChallengeException {
    String s = AuthChallengeParser.extractScheme(challenge);
    LOG.trace("processChallenge called. challenge: " + challenge + " scheme: " + s);

    if (!s.equalsIgnoreCase(getSchemeName())) {
      LOG.trace("Invalid scheme name in challenge. Should be: " + getSchemeName());
      throw new MalformedChallengeException("Invalid NTLM challenge: " + challenge);
    }
    int i = challenge.indexOf(' ');
    if (i != -1) {
      LOG.trace("processChallenge: TYPE2 message received");
      s = challenge.substring(i, challenge.length());
      this.ntlmchallenge = s.trim();
      this.state = TYPE2_MSG_RECEIVED;
    } else {
      this.ntlmchallenge = "";
      if (this.state == UNINITIATED) {
        this.state = INITIATED;
        LOG.trace("State was UNINITIATED, switching to INITIATED");
      } else {
        LOG.trace("State is FAILED");
        this.state = FAILED;
      }
    }
  }

  private class NTLM {
    /** Character encoding */
    public static final String DEFAULT_CHARSET = "ASCII";

    /**
     * The charset that HttpClient 3.x's built-in NTLM implementation used to
     * encode the user name and password. Apparently it is not needed when
     * passing the user name and password from NTCredentials to JCIFS.
     */
    private String credentialCharset = DEFAULT_CHARSET;

    void setCredentialCharset(String credentialCharset) {
      this.credentialCharset = credentialCharset;
    }

    private String generateType1Msg(String host, String domain) {
      jcifs.ntlmssp.Type1Message t1m = new jcifs.ntlmssp.Type1Message(
          jcifs.ntlmssp.Type1Message.getDefaultFlags(), domain, host);
      String result = jcifs.util.Base64.encode(t1m.toByteArray());
      LOG.trace("generateType1Msg: " + result);

      return result;
    }

    private String generateType3Msg(String username, String password, String host, String domain,
        String challenge) {
      jcifs.ntlmssp.Type2Message t2m;
      try {
        t2m = new jcifs.ntlmssp.Type2Message(jcifs.util.Base64.decode(challenge));
      } catch (IOException e) {
        throw new RuntimeException("Invalid Type2 message", e);
      }

      jcifs.ntlmssp.Type3Message t3m = new jcifs.ntlmssp.Type3Message(t2m, password, domain, username, host, 0);
      String result = jcifs.util.Base64.encode(t3m.toByteArray());
      LOG.trace("generateType3Msg username: [" + username + "] host: [" + host + "] domain: [" + domain
          + "] response: [" + result + "]");
      return result;
    }
  }
}
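To make the call order explicit, here is a library-free model of the state transitions in JcifsNtlmScheme. The real Type 1 and Type 3 tokens are replaced by placeholder strings, and the scheme-name check is omitted. It is a simplified sketch for reasoning about the handshake, not a replacement for the class; `NtlmStateModel` is a hypothetical name.

```java
/** Hypothetical, library-free model of JcifsNtlmScheme's state machine. */
final class NtlmStateModel {
  enum State { UNINITIATED, INITIATED, TYPE1_MSG_GENERATED, TYPE2_MSG_RECEIVED, TYPE3_MSG_GENERATED, FAILED }

  State state = State.UNINITIATED;
  String challenge = null;

  // Mirrors processChallenge(): a bare "NTLM" starts the handshake (or, later, marks it failed);
  // "NTLM <token>" carries the server's Type 2 message.
  void processChallenge(String header) {
    int i = header.indexOf(' ');
    if (i != -1) {
      challenge = header.substring(i).trim();
      state = State.TYPE2_MSG_RECEIVED;
    } else if (state == State.UNINITIATED) {
      state = State.INITIATED;
    } else {
      state = State.FAILED;
    }
  }

  // Mirrors authenticate(): a Type 1 placeholder first, then a Type 3 placeholder.
  String authenticate() {
    if (state == State.UNINITIATED) {
      throw new IllegalStateException("NTLM authentication process has not been initiated");
    }
    if (state == State.INITIATED || state == State.FAILED) {
      state = State.TYPE1_MSG_GENERATED;
      return "NTLM <type1>";
    }
    state = State.TYPE3_MSG_GENERATED;
    return "NTLM <type3 answering " + challenge + ">";
  }

  boolean isComplete() {
    return state == State.TYPE3_MSG_GENERATED || state == State.FAILED;
  }
}
```

Roughly, HttpClient first passes the bare `NTLM` from the initial 401 to processChallenge and then calls authenticate (Type 1). The next 401 carries the Type 2 token, and authenticate then produces Type 3. Another bare `NTLM` after that moves the scheme to FAILED, which corresponds to the "second GET does not succeed either" symptom in the question.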

Regarding "authentication - How to do NTLM authentication in Apache Nutch?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19529619/
