
r - K-Skip-N-Gram : generalization of for-loops in R

Reposted. Author: 行者123. Updated: 2023-12-01 03:54:48

I have an R function that generates k-skip-n-grams.
My complete function can be found on GitHub.

My code does generate the required k-skip-n-grams correctly:

> kSkipNgram("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", n=2, skip=1)
[1] "Lorem dolor" "Lorem ipsum" "ipsum sit"
[4] "ipsum dolor" "dolor amet" "dolor sit"
[7] "sit consectetur" "sit amet" "amet adipiscing"
[10] "amet consectetur" "consectetur elit" "consectetur adipiscing"
[13] "adipiscing elit"

But I would like to generalize/simplify the following switch statement with its nested for loops:

# x - should be text, a sentence
# n - n-gram
# skip - number of skips
###################################
    switch(as.character(n),
      "0" = {
        ngram <- c(ngram, paste(x[i]))
      },
      "1" = {
        for (j in skip:1) {
          if (i + j <= length(x)) {
            ngram <- c(ngram, paste(x[i], x[i + j]))
          }
        }
      },
      "2" = {
        for (j in skip:1) {
          for (k in skip:1) {
            if (i + j <= length(x) && i + j + k <= length(x)) {
              ngram <- c(ngram, paste(x[i], x[i + j], x[i + j + k]))
            }
          }
        }
      },
      "3" = {
        for (j in skip:1) {
          for (k in skip:1) {
            for (l in skip:1) {
              if (i + j <= length(x) && i + j + k <= length(x) && i + j + k + l <= length(x)) {
                ngram <- c(ngram, paste(x[i], x[i + j], x[i + j + k], x[i + j + k + l]))
              }
            }
          }
        }
      },
      "4" = {
        for (j in skip:1) {
          for (k in skip:1) {
            for (l in skip:1) {
              for (m in skip:1) {
                if (i + j <= length(x) && i + j + k <= length(x) && i + j + k + l <= length(x) && i + j + k + l + m <= length(x)) {
                  ngram <- c(ngram, paste(x[i], x[i + j], x[i + j + k], x[i + j + k + l], x[i + j + k + l + m]))
                }
              }
            }
          }
        }
      }
    )
  }  # closes an enclosing block that is not shown in this excerpt
}    # closes an enclosing block that is not shown in this excerpt
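
For reference, each case of the switch above walks every combination of gaps (j, k, l, ...), with each gap running from skip down to 1, so the whole statement can be read as a Cartesian product over n gap values. Below is a minimal sketch of that reading, written in Python rather than R. The name `per_gap_skipgrams` is hypothetical, the input is assumed to be a list of tokens with punctuation already removed, and `skip` is assumed to be at least 1. It is not the asker's function, only an illustration of how the nesting depth can become a parameter.

```python
from itertools import accumulate, product


def per_gap_skipgrams(tokens, n, skip):
    """Mirror the nested loops: n gaps, each between 1 and skip (inclusive).

    Here n is the value the switch dispatches on, so each result
    contains n + 1 tokens. Assumes skip >= 1.
    """
    grams = []
    for i in range(len(tokens)):
        # One tuple of gaps per combination the nested loops would visit,
        # in the same descending order (skip, skip - 1, ..., 1).
        for gaps in product(range(skip, 0, -1), repeat=n):
            # Running sums of the gaps give the offsets j, j+k, j+k+l, ...
            offsets = list(accumulate(gaps))
            # Gaps are positive, so checking the last offset covers them all.
            if not offsets or i + offsets[-1] < len(tokens):
                picked = [tokens[i]] + [tokens[i + off] for off in offsets]
                grams.append(" ".join(picked))
    return grams


if __name__ == "__main__":
    words = "Lorem ipsum dolor sit amet consectetur adipiscing elit".split()
    for gram in per_gap_skipgrams(words, 1, 2):
        print(gram)
```

Called with n=1 and skip=2 on the example sentence, this sketch yields the same 13 pairs in the same order as the output shown earlier. That suggests the enclosing R function shifts its arguments before reaching the switch, but this is an inference, since that part of the function is not shown.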

Best answer

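I used a recursive solution for general k-skip-n-grams. I have included it in Python; I have no experience with R, but hopefully you can translate it. I used the definition from this paper:
http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf

If you plan to use this on long sentences, it should probably be optimized with some dynamic programming, because it currently does a lot of redundant computation (the same sub-grams are computed repeatedly). I also have not tested it thoroughly, so there may be edge cases.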

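def kskipngrams(sentence, k, n):
    """Assumes the sentence is already tokenized into a list."""
    if n == 0 or len(sentence) == 0:
        return None
    grams = []
    # Every position that leaves room for n tokens can start a gram.
    for i in range(len(sentence) - n + 1):
        grams.extend(initial_kskipngrams(sentence[i:], k, n))
    return grams


def initial_kskipngrams(sentence, k, n):
    """Return the k-skip-n-grams that begin with sentence[0]."""
    if n == 1:
        return [[sentence[0]]]
    grams = []
    # j is the number of tokens skipped right after sentence[0];
    # the remaining skip budget k - j is passed down the recursion.
    for j in range(min(k + 1, len(sentence) - 1)):
        kmjskipnm1grams = initial_kskipngrams(sentence[j + 1:], k - j, n - 1)
        if kmjskipnm1grams is not None:
            for gram in kmjskipnm1grams:
                grams.append([sentence[0]] + gram)
    return grams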
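
The dynamic-programming idea mentioned above could look roughly like the following. This is a sketch only, not a tested replacement. It caches the result for each (start index, remaining skip budget, gram size) triple with `functools.lru_cache`, so sub-grams reached through different gap sequences are built once. The names `kskipngrams_memo` and `starting_at` are hypothetical, and returning an empty list instead of `None` for empty input is a choice made here, not something the original code does.

```python
from functools import lru_cache


def kskipngrams_memo(sentence, k, n):
    """k-skip-n-grams with a total skip budget of k, caching sub-results."""
    tokens = tuple(sentence)  # fixed sequence, so indices can be cache keys

    @lru_cache(maxsize=None)
    def starting_at(start, budget, size):
        # All size-grams that begin at tokens[start] and skip at most
        # `budget` tokens in total, returned as a tuple of tuples.
        if size == 1:
            return ((tokens[start],),)
        grams = []
        remaining = len(tokens) - start - 1  # tokens after tokens[start]
        for j in range(min(budget + 1, remaining)):
            for tail in starting_at(start + j + 1, budget - j, size - 1):
                grams.append((tokens[start],) + tail)
        return tuple(grams)

    if n <= 0 or not tokens:
        return []  # assumption: an empty list is friendlier than None
    result = []
    for i in range(len(tokens) - n + 1):
        result.extend(list(gram) for gram in starting_at(i, k, n))
    return result


if __name__ == "__main__":
    words = "Insurgents killed in ongoing fighting".split()
    for gram in kskipngrams_memo(words, 2, 3):
        print(" ".join(gram))
```

Caching only removes repeated work. The number of grams still grows quickly with k and n, so the output itself can remain the dominant cost on long sentences.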

Regarding "r - K-Skip-N-Gram : generalization of for-loops in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18259128/
