
r - Count word frequency in a list of words

Reposted. Author: 行者123. Updated: 2023-12-04 14:57:34

I have this large corpus of text in a data frame:

res (data frame)

text.1

1 <NA>
2 beren stuart vanuatu monday october venkatesh ramesh sandeep talanki nagaraj subject approve qlikview gpa access process form gpa access email requestor line manager access granted raj add user qlikview workgroup gpa access form requestors lim tek kon vanuatu address lini high port vila efate title relationship manager emerging corporates employee id lan id limtk bsbcc authorising manager beren stuart vanuatu read gpa dashboard business technical reason na
text.2
1 <NA>
2 kumar santhosh behalf relationshipbankingfinancesupport friday october venkatesh cc global business reporting subject fw approve qlikview gpa access santhosh faunt daniel png wednesday october relationshipbankingfinancesupport cc amet sova subject fw approve qlikview gpa access unable approve excel due macro issues process amet sova monday october faunt daniel png subject approve qlikview gpa access review attached form click line manager approval approve
text.3
1 <NA>
2 thomson owen tonga thursday october venkatesh ramesh sandeep talanki nagaraj subject approve qlikview gpa access process form gpa access email requestor line manager access granted raj add user qlikview workgroup gpa access form requestors hia viliami address head office fakafanua centre maufanga vuna road nukualofa tongatapu tonga nukualofa tongatapu title nfc amu manager employee id lan id hiav bsbcc authorising manager thomson owen tonga read gpa dashboard business technical reason
text.4
1 <NA>
2 kumar rajesh fiji tuesday october venkatesh ramesh sandeep talanki nagaraj subject approve qlikview gpa access process form gpa access email requestor line manager access granted raj add user qlikview workgroup gpa access form requestors fong vincent address level anz house victoria parade suva suva viti levu title national manager commercial banking fiji employee id lan id fongv bsbcc authorising manager kumar rajesh fiji read gpa dashboard business technical reason user
text.5
1 <NA>
2 dennis david timor thursday october buchanan geoffrey solomon islands subject approve qlikview gpa access review attached form click line manager approval approve
text.6
1 <NA>
2 matthey christopher wednesday october pm parrott louise subject document file documentzip
text.7
1
2 tan jasmine thursday october pm global business reporting cc tan yong hoong rai dinkar subject rtc view report sep sensitivity confidential team don’ access sharepoint link arrange access jasmine ayyamperumal rajendran ramesh kumar behalf global business reporting tuesday october pm kumar gaurav hong kong tan jasmine seah linda shroff manish behan thibault hong kong clay iv william cc tan yong hoong rai dinkar tan matthew rb finance sim sui poh subramanian raghuveer murugeshaiah sunil subject rtc view report sep sensitivity confidential october dear attached sharepoint report rtc portfolio client list august report discussed individual reviews rtc financials full client financials pivot table excel file metrics clients note report based rtc client list dinkar queries client list review list december reporting excel file worksheets rtc summary default income measure product details pivot table product measures rtc data detail client level data grouping rtc rtc methodology explained queries email global business reporting issues accessing reports sharepoint sharepoint link ø gaurav kumar ø jasmine tan ø linda seah ø manish shroff ø thibault behan ø william clay global business reporting team
text.8
1 <NA>
2 deo ravinesh friday october venkatesh global business reporting cc monteleone elif kabyanga isaac pinto rufus kiribati kumar santhosh subject approve qlikview gpa access team assist rufus ceo kiribati gpa access ravi
text.9
1 <NA>
2 epoa regina thursday october relationshipbankingfinancesupport subject gpa analysis filled form reports assist cheers regina
text.10
1 <NA>
2 original message tseng rickson thursday october pm global business reporting cc kumar santhosh wong toto subject fw gpa importance high santhosh venkatesh quickly grant global iib gpa access mary macleod cheers rickson original message wong toto thursday october pm tseng rickson kumar santhosh subject gpa installed qlik marys desktop access account ready toto original message tseng rickson wednesday october kumar santhosh cc wong toto subject gpa santhosh email gbr mailbox requesting marys iib cfo access gpa helping setup cheers rickson original message kumar santhosh wednesday october tseng rickson cc wong toto subject gpa rickson continue email global business report mailbox venkatesh cover work find replacement sandeep software package windows package apcqliktechintabqvpluginsetupr santhosh original message tseng rickson tuesday october pm kumar santhosh cc wong toto subject gpa santhosh sandeep left bank dont whats software package win gpa plugin dont grant access mary cheers rickson original message wong toto tuesday october pm tseng rickson subject fw gpa rickson advise software package upgrade marys desktop win week add package ready toto original message yip vivian tuesday october pm wong toto subject fw gpa toto gpa installed mary macleods desktop computerbefore friday october rickson computer lan id window version order installation advise vivian yip executive assistant mr gilles planté deputy ceo iib anz exchange square connaught place central hong kong phone original message broker ali tuesday october pm yip vivian tseng rickson li shirley cc macleod mary scott nicola subject gpa vivian gpa installed marys laptop installed rickson spend minutes mary week mary hk ali
text.11
1 <NA>
2 ang vanessa tuesday october global business reporting subject discontinued commercial fum performance report monday october team advise reach moving forward information required vanessa ayyamperumal rajendran ramesh kumar behalf global business reporting tuesday october au yeung ivy bhuta chintan chang frank chok christopher chuang jacky china dyer andy goyal aseem gupta vivek jiang charles li shirley lim jasmine ec loh jonathan mcleod donnelle miller greg singapore praswitrianto rama roumier frederic hong kong leoni kelly hong kong runciman gary hong kong shankar vijay soh serene tong nelson tran dang cecilia tseng rickson yeh anita yeung jonathan hong kong tse ying tin yew lolita ang benedict hong kong lea danay lin gloria tong mike chuang jacky china chen carrie china poon yen chi anita qian jack chow frankie jiang helen china oum morokot dith sochal kheng sopheakchenda wong theodore foo chang horng bhattacharya arnab truong kent hong kong chan vincent cy hong kong skien craig hong kong lau vincent yeung jonathan hong kong sum selina chok christopher yau emily lee irene hong kong chung margaret lam betty turel kaiwan chan david hong kong chak katherine cheng wilson hong kong chiu polly dhupar karan chow ruskin hong kong wong sunny minam saud fiji damayanti meirina eka bahashwan rifai venkatesh shailesh sucianto lucy kartadinata paul tye alan ng wee lee diana ang sarup adesh lim jasmine ec yeoh hin ler adrain ang vanessa vu pham linh phuong tran thi sinh vietnam bui thanh van nadarajah lavanya vietnam lee john chu sally chou peter huang sophia tw tb lin lydia chang richard hsu ken huang michelle chow winnie tw tb cc mathad vijayakumar kumar santhosh subramanian raghuveer mohan durga subject discontinued commercial fum performance report monday october monday october commercial fum performance report forward due change business structure back friday oct global business reporting anz support services india manyata embassy business park bangalore email global business reporting

From this data frame I extracted the words I need:
library(stringr)  # provides str_extract_all()
pattern <- "(access|report|data)"  # equivalent to "([a][c][c][e][s][s]|[r][e][p][o][r][t]|[d][a][t][a])"

O <- lapply(res, function(x) str_extract_all(x, pattern))
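If stringr is not available, base R can perform the same extraction. The sketch below is only an illustration: `texts` is hypothetical sample data, and `regmatches()` with `gregexpr()` is a base-R substitute for `str_extract_all()`, not what the question used. Like `str_extract_all()`, it returns one character vector per input string (empty when nothing matches). It also shows that an unanchored pattern matches inside longer words, so "reports", "database" and "accessing" yield "report", "data" and "access".

```r
# Hypothetical sample strings standing in for one column of res
texts <- c("grant gpa access review the attached report",
           "reports and database accessing issues",
           "no matching words here")

# Same alternation as the question's pattern
pattern <- "access|report|data"

# gregexpr() locates every match; regmatches() returns the matched substrings,
# one character vector per input string (character(0) when nothing matches)
extracted <- regmatches(texts, gregexpr(pattern, texts))

# Word boundaries (PCRE via perl = TRUE) stop "reports", "database" and
# "accessing" from being counted as "report", "data" and "access"
whole_words <- regmatches(texts,
                          gregexpr("\\b(access|report|data)\\b", texts, perl = TRUE))

extracted
whole_words
```

Whether partial matches such as "reporting" should count as "report" depends on the analysis; the whole-word variant is there only to show the difference.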

Result (edited):
   $text
$text[[1]]
[1] "access" "access" "access" "access"

$text[[2]]
[1] "report" "access" "access" "access"

$text[[3]]
[1] "access" "access" "access" "access"

$text[[4]]
[1] "access" "access" "access" "access"

$text[[5]]
[1] "report" "access" "access" "access" "access" "access" "access" "access"

$text[[6]]
[1] "report" "access" "access" "report" "access" "access" "access" "access" "access" "access"

$text[[7]]
[1] "report" "report" "access" "access" "report" "report" "report" "report" "report" "report" "data" "data" "report"
[14] "access" "report" "report"

$text[[8]]
[1] "report" "access" "access"

$text[[9]]
[1] "report" "access" "access" "access" "report"

$text[[10]]
[1] "report" "access" "access" "access" "report" "access"

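Now I want to count the occurrences of each word.
I tried str_count for this, but it didn't help. I found many word-count questions on Stack Overflow, but none in R that deal with a list.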
dd<-lapply(O,function(x) c<-str_count(x))
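`str_count()` counts pattern matches inside each string, so applying it to a vector of words that were already extracted does not tally how often each word repeats. Base R's `table()` performs that tally directly. A minimal sketch, with `word_lists` as hypothetical data shaped like one component of `O`:

```r
# Hypothetical data: a list of word vectors, like one component of O
word_lists <- list(
  c("access", "access", "access", "access"),
  c("report", "access", "access", "access")
)

# table() counts each distinct value; lapply() does this for every vector
counts <- lapply(word_lists, table)

counts[[2]]  # access: 3, report: 1
```

Words absent from a vector are simply missing from its table, so the tables can have different lengths.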

Alternatively, can I count the frequency of each word within each list?
I tried termFrequency, but it is not supported in my R version (3.1.0).
 O <- structure(list(text= list(c("access", "access","access", "access"),
c("report","access", "access", "access"),
c("access","access", "access", "access"),
c("access","access", "access", "access"),
c("access"),
c(character(0)),
c("report", "report", "access", "access","report", "report", "report", "report", "report", "report",
"data", "data", "report", "access", "report", "report"),
c("report", "access","access"),
c("report"), c("report", "access", "access", "access", "report","access"))))

I followed this Stack Overflow question and tried frq1 <- findFreqTerms(O), but it does not work.
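`findFreqTerms()` comes from the tm package and expects a term-document (or document-term) matrix rather than a plain list of character vectors; it also returns the terms whose frequency falls within a range, not the counts themselves. For a list shaped like `O$text`, base R can build the whole text-by-word count table at once. This is a sketch on hypothetical sample data (`O_text` and its `text.N` names are illustrative): `stack()` turns the named list into a long data frame with one row per word, and `table()` cross-tabulates it.

```r
# Hypothetical data in the same shape as O$text: one character vector per text
O_text <- list(c("access", "access"),
               c("report", "access"),
               character(0),
               c("data", "report", "report"))
names(O_text) <- paste0("text.", seq_along(O_text))

# stack() gives one row per word: 'values' (the word) and 'ind' (its text);
# an empty text contributes no rows but remains a level of 'ind'
long <- stack(O_text)

# Fixed levels guarantee a column for every word, even ones that never occur
freq <- table(text = long$ind,
              word = factor(long$values, levels = c("access", "data", "report")))
freq
```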

Best answer

OK, let me know how this works for you.

Using this data:

O <- structure(list(text.1 = list(character(0), c("access", "access", 
"access", "access")), text.2 = list(character(0), c("report",
"access", "access", "access")), text.3 = list(character(0), c("access",
"access", "access", "access")), text.4 = list(character(0), c("access",
"access", "access", "access")), text.5 = list(character(0), "access"),
text.6 = list(character(0), character(0)), text.7 = list(
character(0), c("report", "report", "access", "access",
"report", "report", "report", "report", "report", "report",
"data", "data", "report", "access", "report", "report"
)), text.8 = list(character(0), c("report", "access",
"access")), text.9 = list(character(0), "report"), text.10 = list(
NULL, c("report", "access", "access", "access", "report",
"access"))), .Names = c("text.1", "text.2", "text.3",
"text.4", "text.5", "text.6", "text.7", "text.8", "text.9", "text.10"
))

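Since the words always seem to be in the second element of each text.x list, we put those words into newlist. More importantly, we convert them to factors with a fixed set of levels, so that the counts can later be recombined into a single data frame.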
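The effect of the fixed levels can be seen on a small hypothetical vector: a plain `table()` only has entries for words that occur, whereas a factor with all three levels yields a full set of counts, with zeros for missing words. Every text therefore produces a count vector of the same length and order.

```r
lv <- c("access", "data", "report")
words <- c("access", "access", "report")  # hypothetical sample

plain <- table(words)                       # entries only for observed words
fixed <- table(factor(words, levels = lv))  # one entry per level, zeros included

length(plain)  # 2
length(fixed)  # 3
fixed
```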
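# collect the word vector (second element) of every text.x as a factor
newlist <- list()

for (item in O) {
  # fixed levels give every table() below the same three entries
  newlist[[length(newlist) + 1]] <- factor(item[[2]], levels = c("access", "data", "report"))
}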
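The loop grows `newlist` one element at a time. `lapply()` builds the same list in a single call and also keeps the `text.x` names. A sketch, with `O_sample` as a small hypothetical stand-in for `O`:

```r
lv <- c("access", "data", "report")

# Hypothetical stand-in for O: each text.x holds list(<first element>, <words>)
O_sample <- list(text.1 = list(character(0), c("access", "access")),
                 text.2 = list(NULL, c("report", "access")),
                 text.3 = list(character(0), character(0)))

# Same transformation as the for loop: second element of each item as a factor
newlist <- lapply(O_sample, function(item) factor(item[[2]], levels = lv))

names(newlist)  # "text.1" "text.2" "text.3"
```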

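# table() of each factor; data.frame() turns each table into a Var1/Freq column pair
dd <- data.frame(lapply(newlist, table))
# keep only the Freq columns (every second column) and transpose so texts are rows
dd <- t(as.matrix(dd[, seq(2, ncol(dd), by = 2)]))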

rownames(dd) <- paste0("Text.",1:10)
colnames(dd) <- c("access", "data", "report")

dd
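Because every table has the same three entries, `sapply()` can also stack them into a matrix directly, without selecting the `Freq` columns by position. This variant is an alternative sketch, not part of the original answer, and `newlist` below is hypothetical sample data:

```r
lv <- c("access", "data", "report")

# Hypothetical newlist: one factor of words per text, all with the same levels
newlist <- list(factor(c("access", "access", "access", "access"), levels = lv),
                factor(c("report", "access", "access", "access"), levels = lv),
                factor(character(0), levels = lv))

# sapply() returns a words-by-texts matrix; t() puts texts in rows
dd <- t(sapply(newlist, table))
rownames(dd) <- paste0("Text.", seq_along(newlist))
dd
```

The row count follows `length(newlist)`, so the same code works for any number of texts.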

#         access data report
# Text.1       4    0      0
# Text.2       3    0      1
# Text.3       4    0      0
# Text.4       4    0      0
# Text.5       1    0      0
# Text.6       0    0      0
# Text.7       3    2     11
# Text.8       2    0      1
# Text.9       0    0      1
# Text.10      4    0      2
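If overall frequencies across all texts are also needed, `colSums()` on the count matrix provides them, and `rowSums()` gives the number of matched words per text. Here `dd` is re-entered by hand from the printed result so the snippet runs on its own:

```r
# dd re-entered from the result above (texts in rows, words in columns)
dd <- matrix(c(4, 0, 0,   3, 0, 1,   4, 0, 0,   4, 0, 0,   1, 0, 0,
               0, 0, 0,   3, 2, 11,  2, 0, 1,   0, 0, 1,   4, 0, 2),
             ncol = 3, byrow = TRUE,
             dimnames = list(paste0("Text.", 1:10), c("access", "data", "report")))

colSums(dd)  # access 25, data 2, report 16
rowSums(dd)  # matched words per text
```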

Regarding r - Count word frequency in a list of words, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29530584/
