
r - Help with R and grouping/aggregate/*apply/data.table

Reposted. Author: 行者123. Updated: 2023-12-04 21:41:15

I'm very new to R and am having trouble running functions to get the answers I need. I have sample data, PCSTest:

http://pastebin.com/z9Ti3nHB

It looks like this:

Date        Site            Word
--------------------------------------
9/1/2012    slashdot        javascript
9/1/2012    stackexchange   R
9/1/2012    reddit          R
9/1/2012    slashdot        javascript
9/1/2012    stackexchange   javascript
9/5/2012    reddit          R
9/8/2012    slashdot        javascript
9/8/2012    stackexchange   R
9/8/2012    reddit          R
9/8/2012    slashdot        javascript
9/18/2012   stackexchange   R
9/18/2012   reddit          R
9/18/2012   slashdot        javascript
9/18/2012   stackexchange   R
9/27/2012   reddit          R
9/27/2012   slashdot        R

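My goal is to look for trends over time in how often different words appear on each site. I can count them:
library(plyr)
# Read the sample data from a local CSV file
PCSTest <- read.csv(file="c:/PCS/PCS Data - Test.csv", header=TRUE)
# Parse the m/d/Y strings into Date objects
PCSTest$Date <- as.Date(PCSTest$Date, "%m/%d/%Y")
# Convert to POSIXct date-times; each Date becomes midnight UTC
PCSTest$Date <- as.POSIXct(PCSTest$Date)
# Count the rows for each Date/Site/Word combination (adds a freq column)
countTest <- count(PCSTest, c("Date", "Site", "Word"))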
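The counting step can also be done in base R without plyr. The sketch below is self-contained, so it types in only the 16 sample rows shown above (not the full pastebin file), keeps Date as a plain Date, and counts rows by summing a helper column of ones; the helper column `n` is an illustrative name, not part of the original data.

```r
# The 16 sample rows shown above, typed in as text (the full pastebin file is not used)
PCSTest <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date       Site           Word
9/1/2012   slashdot       javascript
9/1/2012   stackexchange  R
9/1/2012   reddit         R
9/1/2012   slashdot       javascript
9/1/2012   stackexchange  javascript
9/5/2012   reddit         R
9/8/2012   slashdot       javascript
9/8/2012   stackexchange  R
9/8/2012   reddit         R
9/8/2012   slashdot       javascript
9/18/2012  stackexchange  R
9/18/2012  reddit         R
9/18/2012  slashdot       javascript
9/18/2012  stackexchange  R
9/27/2012  reddit         R
9/27/2012  slashdot       R
")

# Parse the m/d/Y strings into Date objects (no time of day, no time zone)
PCSTest$Date <- as.Date(PCSTest$Date, "%m/%d/%Y")

# Count rows per Date/Site/Word: add a column of ones and sum it within each group
PCSTest$n <- 1
countTest <- aggregate(n ~ Date + Site + Word, data = PCSTest, FUN = sum)
names(countTest)[names(countTest) == "n"] <- "freq"
countTest
```

Because these counts come from the 16 sample rows only, they will not match the output from the full data set.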

This gives:
                  Date          Site       Word freq
1  2012-08-31 20:00:00        reddit          R    4
2  2012-08-31 20:00:00      slashdot javascript    7
3  2012-08-31 20:00:00 stackexchange javascript    1
4  2012-08-31 20:00:00 stackexchange          R    2
5  2012-09-01 20:00:00        reddit javascript    2
6  2012-09-01 20:00:00      slashdot          R    3
7  2012-09-04 20:00:00        reddit          R    1
8  2012-09-07 20:00:00        reddit          R    1
9  2012-09-07 20:00:00      slashdot javascript    2
10 2012-09-07 20:00:00 stackexchange          R    1
11 2012-09-09 20:00:00 stackexchange javascript    4
12 2012-09-10 20:00:00      slashdot          R    4
13 2012-09-14 20:00:00        reddit javascript    4
14 2012-09-17 20:00:00        reddit          R    4
15 2012-09-17 20:00:00      slashdot javascript    1
16 2012-09-17 20:00:00 stackexchange          R    2
17 2012-09-19 20:00:00        reddit javascript    2
18 2012-09-23 20:00:00 stackexchange javascript    2
19 2012-09-24 20:00:00        reddit javascript    3
20 2012-09-24 20:00:00 stackexchange javascript    1
21 2012-09-24 20:00:00 stackexchange          R    4
22 2012-09-25 20:00:00        reddit javascript    5
23 2012-09-25 20:00:00      slashdot javascript    3
24 2012-09-25 20:00:00 stackexchange          R    7
25 2012-09-26 20:00:00        reddit          R    1
26 2012-09-26 20:00:00      slashdot          R    5
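The 20:00 times on the previous day come from the POSIXct conversion: as.POSIXct() on a Date gives midnight UTC, and the listing above is consistent with that instant being printed in a zone that was UTC-4 in September 2012, such as US Eastern daylight time (the asker's time zone is an assumption). A quick check of the same instant rendered in two zones:

```r
d <- as.Date("2012-09-01")
p <- as.POSIXct(d)  # the instant 2012-09-01 00:00 UTC

# Same instant, two time zones
format(p, "%Y-%m-%d %H:%M", tz = "UTC")               # "2012-09-01 00:00"
format(p, "%Y-%m-%d %H:%M", tz = "America/New_York")  # "2012-08-31 20:00"
```

Keeping the column as a Date, or formatting with tz = "UTC", avoids the apparent one-day shift.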

Or plot them all:
library(ggplot2)
ggplot(data=countTest, aes(x=Date, y=freq, group=interaction(Site, Word),
                           colour=interaction(Site, Word), shape=Site)) +
  geom_line() + geom_point()

[Figure: my plot of frequency per day for words per site]

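Now I need to do some calculations on the data, so I tried aggregate:
agg <- aggregate(freq ~ Site + Word, data = countTest, FUN = function(freq) cbind(mean(freq), max(freq)))
agg[order(-agg$freq[, 1]), ]  # agg$freq is a two-column matrix (mean, max); sort by the mean, descending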

This gives:
           Site       Word freq.1 freq.2
2      slashdot javascript   3.25   7.00
5      slashdot          R   4.00   5.00
1        reddit javascript   3.20   5.00
4        reddit          R   2.20   4.00
6 stackexchange          R   3.20   7.00
3 stackexchange javascript   2.00   4.00

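In that last result, what I want is a column with the average frequency per day, something like sum(freq)/20 days computed from the data, or maybe even a moving average.
I'd also like another column with the slope from a linear regression. How would I compute these in the aggregate function?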
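aggregate() with a formula passes only the freq values to FUN, so it cannot see Date and cannot fit a regression. One base-R alternative is split() plus lapply(), sketched below under stated assumptions: it rebuilds a small countTest from the 16 sample rows above, keeps Date as a Date (so the slope is in counts per day), and takes "per day" to mean the inclusive calendar span of the data (27 days for 9/1 to 9/27). The names raw, groups, n_days and summary_tbl are illustrative, not from the question.

```r
# Rebuild daily counts from the 16 sample rows shown in the question
raw <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date       Site           Word
9/1/2012   slashdot       javascript
9/1/2012   stackexchange  R
9/1/2012   reddit         R
9/1/2012   slashdot       javascript
9/1/2012   stackexchange  javascript
9/5/2012   reddit         R
9/8/2012   slashdot       javascript
9/8/2012   stackexchange  R
9/8/2012   reddit         R
9/8/2012   slashdot       javascript
9/18/2012  stackexchange  R
9/18/2012  reddit         R
9/18/2012  slashdot       javascript
9/18/2012  stackexchange  R
9/27/2012  reddit         R
9/27/2012  slashdot       R
")
raw$Date <- as.Date(raw$Date, "%m/%d/%Y")
raw$freq <- 1
countTest <- aggregate(freq ~ Date + Site + Word, data = raw, FUN = sum)

# Assumed denominator: calendar days covered by the data, counting both ends
n_days <- as.numeric(diff(range(countTest$Date))) + 1

# One summary row per Site/Word group
groups <- split(countTest, list(countTest$Site, countTest$Word), drop = TRUE)
summary_tbl <- do.call(rbind, lapply(groups, function(g) {
  # Days since 1970-01-01 as the predictor, so the slope is in counts per day;
  # a group seen on only one date gets an NA slope
  fit <- lm(freq ~ as.numeric(Date), data = g)
  data.frame(Site = g$Site[1], Word = g$Word[1],
             ave_per_day = sum(g$freq) / n_days,
             slope = unname(coef(fit)[2]))
}))
rownames(summary_tbl) <- NULL
summary_tbl[order(-summary_tbl$ave_per_day), ]
```

Swapping n_days for the number of distinct dates, or for a fixed 20, only changes the one line that defines it.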

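Alternatively, how can I do this better or faster? I know about the *apply functions and data.table, but I don't know how to use them. Any help would be greatly appreciated!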
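For the data.table route, a minimal sketch on the same 16 sample rows: `.N` gives the group size for the count and `by =` does the grouping, and because the j expression can refer to several columns at once, the regression can sit next to the other summaries. The inclusive-day-span denominator and the names DT, counts, n_days and summary_dt are assumptions for illustration; the data.table package must be installed.

```r
library(data.table)

# The 16 sample rows shown in the question
DT <- as.data.table(read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date       Site           Word
9/1/2012   slashdot       javascript
9/1/2012   stackexchange  R
9/1/2012   reddit         R
9/1/2012   slashdot       javascript
9/1/2012   stackexchange  javascript
9/5/2012   reddit         R
9/8/2012   slashdot       javascript
9/8/2012   stackexchange  R
9/8/2012   reddit         R
9/8/2012   slashdot       javascript
9/18/2012  stackexchange  R
9/18/2012  reddit         R
9/18/2012  slashdot       javascript
9/18/2012  stackexchange  R
9/27/2012  reddit         R
9/27/2012  slashdot       R
"))
DT[, Date := as.Date(Date, "%m/%d/%Y")]

# Count rows per Date/Site/Word; .N is the number of rows in each group
counts <- DT[, .(freq = .N), by = .(Date, Site, Word)]

# Assumed denominator: calendar days covered by the data, counting both ends
n_days <- as.numeric(diff(range(counts$Date))) + 1

# Several summaries per Site/Word in one pass; lm() sees both freq and Date
summary_dt <- counts[, .(
  mean_freq   = mean(freq),
  max_freq    = max(freq),
  ave_per_day = sum(freq) / n_days,
  slope       = unname(coef(lm(freq ~ as.numeric(Date)))[2])  # counts per day
), by = .(Site, Word)]

summary_dt[order(-ave_per_day)]
```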

Best answer

I'm not sure exactly what you want to do, but dplyr (or plyr) will help.
Here are some examples. If you state clearly what you want, you'll get more help.

d <- read.csv("~/Downloads/r_data.txt")  # the question's sample data, saved locally
d$Date <- as.POSIXct(as.Date(d$Date, "%m/%d/%Y"))

library(dplyr)
# number of rows for each Date/Site/Word combination
d.cnt <- d %>% group_by(Date, Site, Word) %>% summarise(cnt = n())

# average per day: pick one of the two denominators
date.range <- d$Date %>% range %>% diff %>% as.numeric # span from first to last date: 26 days, or
date.range <- d$Date %>% unique %>% length # number of distinct dates: 13 days
d.ave <- d.cnt %>% group_by(Site, Word) %>% summarize(ave_per_day = sum(cnt)/date.range)
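For the moving average the question also mentions, a minimal base-R sketch for one Site/Word series (reddit/R, taken from the 16 sample rows): days without rows are treated as zero counts and the window is a trailing 7 days, both assumptions to adjust. stats::filter() is called with its namespace because dplyr also exports a filter() that masks it.

```r
# Daily counts for reddit/R from the 16 sample rows (one row on each of these dates)
obs <- data.frame(Date = as.Date(c("2012-09-01", "2012-09-05", "2012-09-08",
                                   "2012-09-18", "2012-09-27")),
                  cnt  = c(1, 1, 1, 1, 1))

# Assumption: a day with no rows means a count of 0, so fill in every calendar day
all_days <- data.frame(Date = seq(min(obs$Date), max(obs$Date), by = "day"))
daily <- merge(all_days, obs, by = "Date", all.x = TRUE)
daily$cnt[is.na(daily$cnt)] <- 0

# Trailing 7-day moving average: mean of the current day and the 6 days before it
# (the first 6 days have no full window and come out as NA)
k <- 7
daily$ma7 <- as.numeric(stats::filter(daily$cnt, rep(1 / k, k), sides = 1))
head(daily, 10)
```

To do this for every group, the same steps could be wrapped in a function and applied to each Site/Word subset, for example with split() and lapply().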

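# slope: fit a separate linear regression of cnt on Date for each Site/Word group
# (Date is POSIXct here, so the slope is in counts per second; multiply by 86400 for counts per day)
d.reg <- d.cnt %>% group_by(Site, Word) %>%
  do({fit = lm(cnt ~ Date, data = .); data.frame(int = coef(fit)[1], slope = coef(fit)[2])})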

# plot the slope value
library(ggplot2)
ggplot(d.reg, aes(Site, slope, fill = Word)) + geom_bar(stat = "identity", position = "dodge")

Regarding "r - Help with R and grouping/aggregate/*apply/data.table", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25733241/
