
r - Is it necessary to add years with a count of zero? (Crime analysis in R)

Reposted. Author: 行者123. Last updated: 2023-12-01 00:48:28

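I am analyzing crime in an area of Baltimore (5 years of data). I am creating line charts for specific crime types in specific neighborhoods of that area. However, not every crime type is reported every day in every neighborhood, so the data contain no days with a count of zero; only days on which a crime was reported appear. Visually, this means the line chart never touches the x-axis at zero. Does it also negatively affect the trend line created by stat_smooth, which I use to identify increases or decreases in a crime type?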

Reproducible code to generate the line chart:

# Read crime data from the GitHub repo into an R data frame
df = read.csv("https://raw.githubusercontent.com/brianthomasbaker/Baltimore-Crime-Analysis/master/Baltimore_SE_Reported_Crime_2010_to_2014.csv", stringsAsFactors=FALSE, sep=",")

#Format CrimeDate column
df$CrimeDate = as.Date(df$CrimeDate, "%m/%d/%Y")

# Create a new data frame of only Larceny From Auto crimes per day in Canton (2010-2014)
library(dplyr)
df_cantonlarcauto = df %>%
  filter(Neighborhood == "Canton", Description == "LARCENY FROM AUTO") %>%
  group_by(CrimeDate) %>%
  summarize(crimes = n())

# Create the line chart using ggplot
library(ggplot2)
ggplot(df_cantonlarcauto, aes(x = CrimeDate, y = crimes, group = 1)) +
  geom_line() +
  scale_size_area() +
  stat_smooth(method = "gam") +
  xlab("Year") +
  ylab("Number of Crimes") +
  ylim(0, 13) +
  theme(plot.title = element_text(family = "Trebuchet MS", color = "#666666", face = "bold", size = 32, hjust = 0)) +
  theme(axis.title = element_text(family = "Trebuchet MS", color = "#666666", face = "bold", size = 22)) +
  ggtitle("Larceny From Auto\nCanton (2010-2014)")

head(df_cantonlarcauto)

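You can see in the head of the data frame that January 2nd and 3rd are missing. Should those missing days be added to the data with a count of zero? If so, how can that be done in R? Or does omitting those days not negatively affect an attempt to analyze crime data over time?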
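
One way to list exactly which days are absent, rather than spotting them in head(), is to compare the observed dates against a full daily sequence. The following is only a minimal sketch on made-up toy data (the `toy` data frame and its values are assumptions for illustration, not rows from the Baltimore file):

```r
# Toy daily counts (made-up values); 2 and 3 January have no row
toy <- data.frame(
  CrimeDate = as.Date(c("2010-01-01", "2010-01-04", "2010-01-05")),
  crimes    = c(2L, 1L, 3L)
)

# Every calendar day between the first and last observed date
all_days <- seq(min(toy$CrimeDate), max(toy$CrimeDate), by = "day")

# Days that never appear in the data
missing_days <- all_days[!all_days %in% toy$CrimeDate]
print(missing_days)
```

With the real data, the same comparison could be run on df_cantonlarcauto$CrimeDate in place of toy$CrimeDate.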

Best answer

You can add the missing dates:

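library(dplyr)
# Build one row per calendar day, then attach the observed counts;
# days without a report get NA in crimes.
# tibble() replaces the deprecated data_frame()
df_cantonlarcauto_missing = tibble(CrimeDate = seq(min(df_cantonlarcauto$CrimeDate), max(df_cantonlarcauto$CrimeDate), by = "day")) %>%
  left_join(df_cantonlarcauto, by = "CrimeDate")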

If you plot with this data frame (ggplot(df_cantonlarcauto_missing, aes(x = CrimeDate, y = crimes, group=1)) + ... ), you should already see a better-looking plot.
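
To make the effect of that join concrete, here is a small sketch on made-up toy data. It uses base R's merge() with all.x = TRUE as a dependency-free stand-in for left_join(); the `toy` data frame and its values are assumptions for illustration only:

```r
# Toy daily counts (made-up values); 2 and 3 January have no row
toy <- data.frame(
  CrimeDate = as.Date(c("2010-01-01", "2010-01-04", "2010-01-05")),
  crimes    = c(2, 1, 3)
)

# One row per calendar day in the observed range
all_days <- data.frame(
  CrimeDate = seq(min(toy$CrimeDate), max(toy$CrimeDate), by = "day")
)

# Keep every day from all_days; days without a match get NA in crimes
toy_full <- merge(all_days, toy, by = "CrimeDate", all.x = TRUE)
print(toy_full)

# Number of days that had no reported crime
n_missing <- sum(is.na(toy_full$crimes))
```

The filled-in days carry NA rather than 0 at this point. geom_line() breaks the line at NA values instead of connecting across the gap, which is one reason the plot can look different after the join.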

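I don't know this data, but looking at it now, I would personally suggest forcing the missing dates to 0 and then applying some kind of aggregation (such as a weekly rolling mean), because the values are very low and are often missing/0:
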
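library(zoo)  # provides rollmean()
df_cantonlarcauto_missing = tibble(CrimeDate = seq(min(df_cantonlarcauto$CrimeDate), max(df_cantonlarcauto$CrimeDate), by = "day")) %>%
  left_join(df_cantonlarcauto, by = "CrimeDate") %>%
  # Days without a report count as zero crimes
  mutate(crimes = ifelse(is.na(crimes), 0, crimes)) %>%
  # Trailing 7-day mean; the first 6 days have no full window and are NA
  mutate(crimes = rollmean(crimes, 7, fill = NA, align = "right"))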

ggplot(df_cantonlarcauto_missing, aes(x = CrimeDate, y = crimes, group = 1)) +
  geom_line() +
  scale_size_area() +
  stat_smooth(method = "gam") +
  xlab("Year") +
  ylab("Number of Crimes") +
  # ylim(0,13) +
  theme(plot.title = element_text(family = "Trebuchet MS", color = "#666666", face = "bold", size = 32, hjust = 0)) +
  theme(axis.title = element_text(family = "Trebuchet MS", color = "#666666", face = "bold", size = 22)) +
  ggtitle("Larceny From Auto\nCanton (2010-2014)")

[Figure: Plot with Rolling Mean]
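
To illustrate why the zero-filling step matters for any average or smoother, the sketch below compares the mean over reported days only with the mean over all days, and then computes a trailing 7-day mean. It uses made-up toy data, and it swaps in base R's stats::filter() for zoo::rollmean() so that it runs without extra packages; the names `observed`, `filled`, and `roll7` are assumptions for illustration:

```r
# Toy series: 14 consecutive days, crimes reported on only 4 of them (made-up values)
days <- seq(as.Date("2010-01-01"), by = "day", length.out = 14)
observed <- data.frame(
  CrimeDate = days[c(1, 4, 5, 12)],
  crimes    = c(2, 1, 3, 1)
)

# Add the unreported days and force their counts to 0
filled <- merge(data.frame(CrimeDate = days), observed, by = "CrimeDate", all.x = TRUE)
filled$crimes[is.na(filled$crimes)] <- 0

# Averaging only the reported days overstates the daily rate
mean_observed_only <- mean(observed$crimes)  # 7 crimes over 4 rows
mean_with_zeros    <- mean(filled$crimes)    # 7 crimes over 14 days

# Trailing 7-day mean: each value averages the current day and the 6 before it;
# the first 6 days have no full window and come out as NA
filled$roll7 <- as.numeric(stats::filter(filled$crimes, rep(1 / 7, 7), sides = 1))

print(c(observed_only = mean_observed_only, with_zeros = mean_with_zeros))
print(filled)
```

In this toy case the mean over reported days is 1.75 while the mean over all days is 0.5, so a smoother fitted only to the reported days would sit well above the true daily rate.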

Regarding "r - Is it necessary to add years with a count of zero? (Crime analysis in R)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32766765/
