
c# - Submitting a C# MapReduce job to Windows Azure HDInsight - Response status code does not indicate success: 500 (Server Error)


I am trying to submit a MapReduce job to an HDInsight cluster. In my job I did not write a reduce part, because I have nothing to reduce. All I want to do is parse each file name and append its values to every line in that file, so that each line carries all the data I need.
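To make that concrete before the full code: each file name is split on underscores into six fields, which are then prepended as a CSV prefix to every line of that file. The file name and values in this small fragment are purely hypothetical (the real naming scheme is not shown in the question) and it assumes a plain console context with using System;:

    // Hypothetical file name; the actual format is an assumption for illustration only.
    string name = "81.2.3.4_00-1A-2B-3C-4D-5E_42_20141209_IIS_86400";
    string[] parts = name.Split('_');                 // public IP, MAC, boot id, upload time, log type, uptime
    string rowHeader = string.Join(",", parts) + ","; // "81.2.3.4,00-1A-2B-3C-4D-5E,42,20141209,IIS,86400,"
    Console.WriteLine(rowHeader + "<original log line>"); // what the mapper emits for each input line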

My code is:

using Microsoft.Hadoop.MapReduce;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace GetMetaDataFromFileName
{
    class Program
    {
        static void Main(string[] args)
        {
            var hadoop = connectAzure();

            //Temporary workaround for environment variables
            Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
            Environment.SetEnvironmentVariable("Java_HOME", @"c:\hadoop\jvm");

            var result = hadoop.MapReduceJob.ExecuteJob<MetaDataGetterJob>();
        }

        static IHadoop connectAzure()
        {
            //TODO: Update credentials and other information
            return Hadoop.Connect(
                new Uri("https://sampleclustername.azurehdinsight.net//"),
                "admin",
                "Hadoop",
                "password",
                "blobstoragename.blob.core.windows.net", //Storage account where the log files live
                "AccessKeySample", //Storage account access key
                "logs", //Container name
                true
            );
        }

        //Hadoop Mapper
        public class MetaDataGetter : MapperBase
        {
            public override void Map(string inputLine, MapperContext context)
            {
                try
                {
                    //Get the metadata from the name of the file
                    string[] _fileMetaData = context.InputFilename.Split('_');

                    string _PublicIP = _fileMetaData[0].Trim();
                    string _PhysicalAdapterMAC = _fileMetaData[1].Trim();
                    string _BootID = _fileMetaData[2].Trim();
                    string _ServerUploadTime = _fileMetaData[3].Trim();
                    string _LogType = _fileMetaData[4].Trim();
                    string _MachineUpTime = _fileMetaData[5].Trim();

                    //Generate the CSV portion
                    string _RowHeader = string.Format("{0},{1},{2},{3},{4},{5},", _PublicIP, _PhysicalAdapterMAC, _BootID, _ServerUploadTime, _LogType, _MachineUpTime);

                    //TODO: Append _RowHeader to every row in the file.
                    context.EmitLine(_RowHeader + inputLine);
                }
                catch (ArgumentException ex)
                {
                    return;
                }
            }
        }

        //Hadoop Job Definition
        public class MetaDataGetterJob : HadoopJob<MetaDataGetter>
        {
            public override HadoopJobConfiguration Configure(ExecutorContext context)
            {
                //Initiate the job config
                HadoopJobConfiguration config = new HadoopJobConfiguration();
config.InputPath = "asv://<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="167a7971655665777b667a7338747a7974387579647338617f787279616538787362" rel="noreferrer noopener nofollow">[email protected]</a>/Input";
config.OutputFolder = "asv://<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="b0dcdfd7c3f0c3d1ddc0dcd59ed2dcdfd29ed3dfc2d59ec7d9ded4dfc7c39eded5c4" rel="noreferrer noopener nofollow">[email protected]</a>/Output";
                config.DeleteOutputFolder = true;
                return config;
            }
        }
    }
}

What do you think are the usual causes of a 500 (Server Error)? Could I be supplying the wrong credentials? Actually, I don't really understand the difference between the Username and HadoopUser parameters of the Hadoop.Connect method.
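For reference, here is a hedged reading of the Connect call in the code above, assuming the Azure overload of Hadoop.Connect from the old Microsoft .NET SDK for Hadoop takes its arguments in this order; the parameter names in the comments are my annotation of the call as written, not taken from the SDK documentation:

    var hadoop = Hadoop.Connect(
        new Uri("https://sampleclustername.azurehdinsight.net//"), // cluster URI
        "admin",                                 // user name: the HTTP login for the cluster gateway (assumption)
        "Hadoop",                                // hadoop user: the account the job runs under on the cluster (assumption)
        "password",                              // cluster login password
        "blobstoragename.blob.core.windows.net", // storage account holding the input data
        "AccessKeySample",                       // storage account access key
        "logs",                                  // default blob container
        true);                                   // create the container if it does not exist (assumption)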

Thanks,

Best Answer

I ran into much the same problem in the past (I could not submit a Hive job to the cluster and got a BadGateway response). I contacted the support team, and in my case the cause was a memory leak on the head node, which means the problem was not on the client side and appears to have been an inherited Hadoop issue.

I solved it by redeploying the cluster. Have you tried submitting other (simpler) jobs? If so, I would suggest contacting the Azure support team, or redeploying the cluster if that is not too painful for you.
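As a rough illustration of that "simpler job" idea, here is a minimal sketch built only from the types the question already uses (MapperBase, HadoopJob, HadoopJobConfiguration); the pass-through mapper and the container/path names are hypothetical, and the point is merely to see whether the cluster accepts any submission at all:

    using System;
    using Microsoft.Hadoop.MapReduce;

    // Mapper that re-emits every input line unchanged.
    public class PassThroughMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            context.EmitLine(inputLine);
        }
    }

    // Map-only smoke-test job pointed at a tiny, hypothetical input folder.
    public class PassThroughJob : HadoopJob<PassThroughMapper>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            HadoopJobConfiguration config = new HadoopJobConfiguration();
            config.InputPath = "asv://logs@sample.blob.core.windows.net/SmokeTestInput";    // hypothetical path
            config.OutputFolder = "asv://logs@sample.blob.core.windows.net/SmokeTestOutput"; // hypothetical path
            config.DeleteOutputFolder = true;
            return config;
        }
    }

    // Submitted with the same connection as the real job:
    //   var result = hadoop.MapReduceJob.ExecuteJob<PassThroughJob>();
    // If this also fails with a 500, the problem is likely on the cluster/gateway side rather than in the job code.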

Regarding c# - Submitting a C# MapReduce job to Windows Azure HDInsight - Response status code does not indicate success: 500 (Server Error), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27390295/
