
c# - Some help needed with a code translation (Python to C#)

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:10:52

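Good evening, everyone,

This question is a little embarrassing for me, because I know I should be able to work out the answer on my own. However, my knowledge of Python is very limited, so I need help from someone more experienced than I am...

The following code comes from Norvig's "Natural Language Corpus Data", a chapter in a recently published book. It is about converting the string "likethisone" into "[like, this, one]" (that is, splitting it into words correctly)...

I have ported all of the code to C# (in fact, I rewrote the program myself) except for the function segment, whose syntax I am having a lot of trouble even understanding. Could someone help me translate it into a more readable C# form?

Thank you very much.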

################ Word Segmentation (p. 223)

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:])
            for i in range(min(len(text), L))]

def Pwords(words):
    "The Naive Bayes probability of a sequence of words."
    return product(Pw(w) for w in words)

#### Support functions (p. 224)

def product(nums):
    "Return the product of a sequence of numbers."
    return reduce(operator.mul, nums, 1)

class Pdist(dict):
    "A probability distribution estimated from counts in datafile."
    def __init__(self, data=[], N=None, missingfn=None):
        for key,count in data:
            self[key] = self.get(key, 0) + int(count)
        self.N = float(N or sum(self.itervalues()))
        self.missingfn = missingfn or (lambda k, N: 1./N)
    def __call__(self, key):
        if key in self: return self[key]/self.N
        else: return self.missingfn(key, self.N)

def datafile(name, sep='\t'):
    "Read key,value pairs from file."
    for line in file(name):
        yield line.split(sep)

def avoid_long_words(key, N):
    "Estimate the probability of an unknown word."
    return 10./(N * 10**len(key))

N = 1024908267229 ## Number of tokens

Pw = Pdist(datafile('count_1w.txt'), N, avoid_long_words)

Best answer

Let's tackle the first function first:

def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

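It takes a string and returns the most likely list of words it could be split into, so its signature will be static IEnumerable<string> segment(string text). Obviously, if text is an empty string, its result should be an empty list. Otherwise, it uses a recursive generator expression to build the candidate word lists (each one is a first word followed by the best segmentation of the remainder) and returns the candidate with the highest probability according to Pwords.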

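// Requires: using System.Collections.Generic; using System.Linq;
static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    var candidates = from pair in splits(text)
                     select new[] {pair.Item1}.Concat(segment(pair.Item2));
    // Python's max(candidates, key=Pwords): sort descending so the most
    // probable candidate comes first (ascending OrderBy would pick the least probable)
    return candidates.OrderByDescending(Pwords).First();
}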
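As a reference point for the port, here is a small Python 3 sketch of what max(candidates, key=...) does and how it relates to sorting. The score function and the candidate lists are hypothetical stand-ins for Pwords and the real candidates, chosen only so the example runs on its own:

```python
# Hypothetical candidates for the input "likethisone".
candidates = [["like", "this", "one"], ["li", "kethisone"], ["likethisone"]]

# Hypothetical scoring function standing in for Pwords:
# count how many words belong to a tiny known vocabulary.
known = {"like", "this", "one"}

def score(words):
    return sum(1 for w in words if w in known)

# max(..., key=score) returns the element whose score is largest.
best = max(candidates, key=score)

# Sorting by descending score and taking the first element picks the same item.
same = sorted(candidates, key=score, reverse=True)[0]

print(best)  # ['like', 'this', 'one']
```

In LINQ terms this corresponds to ordering by the key in descending order and taking the first element.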

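Of course, now we have to translate the splits function. Its job is to return every possible (first, rem) pair, where first is a non-empty prefix of the text of at most L characters and rem is whatever follows it. It is fairly simple to translate: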

static IEnumerable<Tuple<string, string>> splits(string text, int L = 20)
{
    return from i in Enumerable.Range(1, Math.Min(text.Length, L))
           select Tuple.Create(text.Substring(0, i), text.Substring(i));
}
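To check a port of splits, it may help to see what the original yields on a short input. A minimal Python 3 sketch follows; the function body is the one from the question, and the sample strings are only illustrations:

```python
def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:])
            for i in range(min(len(text), L))]

# Every non-empty prefix is paired with the rest of the string.
print(splits("abc"))          # [('a', 'bc'), ('ab', 'c'), ('abc', '')]

# L caps the length of the first element, not the number of characters kept.
print(splits("abcdef", L=2))  # [('a', 'bcdef'), ('ab', 'cdef')]
```

A C# version should produce the same pairs in the same order, including the final pair whose remainder is the empty string.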

Next is Pwords, which simply calls product on the results of calling Pw on each word in its input list:

static double Pwords(IEnumerable<string> words)
{
    return product(from w in words select Pw(w));
}
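A small worked example may make the scoring concrete. This Python 3 sketch uses made-up counts (the names counts and N here are toy stand-ins, not the contents of count_1w.txt) and borrows the shape of avoid_long_words for unknown words:

```python
import math

# Hypothetical toy counts; the real Pw is built from count_1w.txt.
counts = {"like": 4, "this": 2, "one": 2}
N = 8.0

def Pw(word):
    "Relative frequency for known words; a length-based penalty otherwise."
    if word in counts:
        return counts[word] / N
    return 10.0 / (N * 10 ** len(word))  # same shape as avoid_long_words

def Pwords(words):
    "The Naive Bayes probability of a sequence of words."
    return math.prod(Pw(w) for w in words)

good = Pwords(["like", "this", "one"])  # 0.5 * 0.25 * 0.25 = 0.03125
bad = Pwords(["likethis", "one"])       # unknown 8-letter word, heavily penalized
print(good > bad)  # True
```

Because the word probabilities are simply multiplied, a single unlikely word is enough to sink an entire candidate segmentation.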

product is very simple:

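static double product(IEnumerable<double> nums)
{
    // The seed of 1.0 mirrors reduce(operator.mul, nums, 1) and also
    // covers an empty sequence, where the seedless overload would throw
    return nums.Aggregate(1.0, (a, b) => a * b);
}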
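One detail worth preserving in any port is the initial value of 1 that the original passes to reduce. The Python 3 sketch below shows the difference it makes on an empty input; LINQ behaves analogously, since the Aggregate overload without a seed throws InvalidOperationException on an empty sequence while the overload with a seed returns the seed:

```python
from functools import reduce
import operator

def product(nums):
    "Return the product of a sequence of numbers (1 for an empty sequence)."
    return reduce(operator.mul, nums, 1)

print(product([0.5, 0.5]))  # 0.25
print(product([]))          # 1

# Without the initial value, an empty input is an error rather than 1.
try:
    reduce(operator.mul, [])
    seedless_ok = True
except TypeError:
    seedless_ok = False
print(seedless_ok)  # False
```

In this program segment never passes an empty word list to Pwords, so the difference does not show up in practice, but keeping the seed makes the port match the original function exactly.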

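Addendum:

Looking at the full source code, it is apparent that Norvig wants the results of the segment function to be memoized for speed. Here is a version that provides this speedup: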

static Dictionary<string, IEnumerable<string>> segmentTable =
    new Dictionary<string, IEnumerable<string>>();

static IEnumerable<string> segment(string text)
{
    if (text == "") return new string[0]; // C# idiom for empty list of strings
    if (!segmentTable.ContainsKey(text))
    {
        var candidates = from pair in splits(text)
                         select new[] {pair.Item1}.Concat(segment(pair.Item2));
        // Cache the most probable candidate (descending order matches Python's max),
        // materialized with ToList so the lazy query is not re-evaluated on each use
        segmentTable[text] = candidates.OrderByDescending(Pwords).First().ToList();
    }
    return segmentTable[text];
}
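For comparing a finished port against the original behavior, here is a self-contained Python 3 sketch of the whole pipeline on toy data. The memo decorator is only a guess at what the book's @memo does (the excerpt does not show it), and COUNTS is a hypothetical three-word vocabulary standing in for count_1w.txt:

```python
from functools import reduce
import operator

def memo(f):
    "Cache f's results by argument (an assumption about the book's @memo)."
    table = {}
    def fmemo(*args):
        if args not in table:
            table[args] = f(*args)
        return table[args]
    fmemo.table = table  # exposed only so the cache can be inspected
    return fmemo

# Hypothetical toy counts; the real program loads count_1w.txt.
COUNTS = {"like": 50, "this": 30, "one": 20}
N = float(sum(COUNTS.values()))

def Pw(word):
    "Relative frequency for known words; a length-based penalty otherwise."
    if word in COUNTS:
        return COUNTS[word] / N
    return 10.0 / (N * 10 ** len(word))  # same shape as avoid_long_words

def Pwords(words):
    return reduce(operator.mul, (Pw(w) for w in words), 1)

def splits(text, L=20):
    return [(text[:i+1], text[i+1:]) for i in range(min(len(text), L))]

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text:
        return []
    candidates = ([first] + segment(rem) for first, rem in splits(text))
    return max(candidates, key=Pwords)

print(segment("likethisone"))  # ['like', 'this', 'one']
print(len(segment.table))      # 12: one entry per suffix, including ""
```

The cache holds one entry per suffix of the input, which is why memoization turns an exponential number of recursive calls into a number of computations that grows linearly with the length of the text.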

Regarding "c# - Some help needed with a code translation (Python to C#)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/3946265/
