r - Connecting points across selected NAs with geom_line()

Reposted. Author: 行者123. Updated: 2023-12-04 15:21:56
My question is closely related to "Connecting across missing values with geom_line", but it is a follow-up rather than a duplicate.

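I have data with missing values (NA). The data has been melted into long format with the reshape2 package, and I am using ggplot2 to plot both geom_point() and geom_line(). The example data contains only one group; my real data contains several. I would like to draw a geom_line() that connects data points that are not separated by more than 4 years of missing data. In other words, if there are 3 adjacent NA rows, apply na.rm to the data.frame, whereas if there are at least 4 adjacent rows with NA, do not apply na.rm to the data.frame.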

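Edit: Note that I am reproducing a figure from a book in which the points are connected even where data is missing. Ideally, the segments that bridge missing data would use a different linetype or colour, with a note in the legend explaining it.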

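Below is a very tedious and ugly hack that does not scale to larger data sets. I would appreciate a simpler approach, and I am especially keen to find an easy way to count the runs of consecutive NAs in the data.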

### ggplot draws geom_line with NAs

# Data (real-world example, so not exactly MWE)
df <-
structure(list(Year = c(1910, 1911, 1912, 1913, 1914, 1915, 1916,
1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927,
1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938,
1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949,
1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
2005, 2006, 2007, 2008, 2009, 2010), variable = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("France", "Germany", "Sweden", "Japan"
), class = c("ordered", "factor")), value = c(0.1724, 0.1748,
0.1752, 0.1777, 0.1778, 0.1953, 0.2132, 0.2242, 0.222, 0.1947,
NA, NA, NA, NA, NA, 0.113, 0.113, 0.115, 0.112, 0.111, NA, NA,
0.114, 0.109, 0.113, 0.12, 0.137, 0.15, 0.163, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.116, NA, NA, NA, NA, NA, NA, 0.11,
NA, NA, NA, 0.122, NA, NA, NA, 0.122, NA, NA, 0.112, NA, NA,
0.113, NA, NA, 0.101, NA, NA, 0.102, NA, NA, 0.1043, NA, NA,
0.0906, NA, NA, 0.0964, NA, NA, 0.1052, NA, NA, 0.1043, NA, NA,
0.1005, NA, NA, 0.1088, NA, NA, 0.101139312657167, 0.0950290025146689,
0.0901042749371333, 0.09, 0.107249622799665, 0.108891198658843,
0.115913495389774, 0.110684772282761, 0.113299133836267, 0.111991953059514
)), .Names = c("Year", "variable", "value"), row.names = 102:202, class = "data.frame")

Default plot:
library("ggplot2")
ggplot(data = df, aes(x = Year, y = value, group = variable, colour = variable, shape = variable)) +
geom_point(size = 3) + geom_line()

Plot with all NAs removed (see "Connecting across missing values with geom_line"):
ggplot(data = df, aes(x = Year, y = value, group = variable, colour = variable, shape = variable)) + 
geom_point(size = 3) + geom_line(data = df[!is.na(df$value), ])

Desired plot:
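df2 <- df
# Put a sentinel value on one year inside each long gap so that row survives NA removal
df2$value[df2$Year %in% c(1922, 1948)] <- -999999
# Drop the remaining NA rows; the line then bridges the short gaps
df2 <- df2[!is.na(df2$value), ]
# The sentinel lies outside the y limits, so it is treated as missing and breaks the line
ggplot(data = df2, aes(x = Year, y = value, group = variable, colour = variable, shape = variable)) + geom_point(size = 3) +
geom_line() + scale_y_continuous(limits = c(.08, .23))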

Best answer

This produces your "desired plot", except as noted in the comments.

x <- rle(!is.na(df$value))
x$values[which(x$lengths>3 & !x$values)] <- TRUE
indx <- inverse.rle(x)
library(ggplot2)
ggplot(df[indx,],aes(x=Year,y=value,color=variable))+
geom_point(size=3)+
geom_line()



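Basically, we encode NA as FALSE and everything else as TRUE, then run-length encode the result to identify the runs of TRUE and FALSE. Any run of FALSE with length > 3 should be kept, so we convert it to TRUE (as if those values were not NA). We then use the inverse rle to recover an index vector that is TRUE wherever the row should be kept. Finally, we apply this index to df before passing it to ggplot.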
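To see what the index vector looks like, here is a minimal sketch on a toy vector (the vector `value` and the names `na_runs`, `runs`, and `keep` are made up for illustration and are not part of the original data). It also shows one way to count the runs of consecutive NAs, which the question asks about.

```r
# Toy vector: a short gap (2 NAs) and a long gap (4 NAs)
value <- c(1, NA, NA, 2, NA, NA, NA, NA, 3)

# Lengths of the runs of consecutive NAs
na_runs <- rle(is.na(value))
na_run_lengths <- na_runs$lengths[na_runs$values]
na_run_lengths  # 2 4

# TRUE where a value is present, FALSE where it is NA
runs <- rle(!is.na(value))

# Flip NA runs longer than 3 to TRUE so that those rows are kept
runs$values[runs$lengths > 3 & !runs$values] <- TRUE
keep <- inverse.rle(runs)
keep  # TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE

# The short gap is dropped, so geom_line() bridges it;
# the long gap stays as NA, so geom_line() breaks there
value[keep]  # 1 2 NA NA NA NA 3
```

With several groups in long format, the same steps would presumably need to be applied within each group, since a run of NAs could otherwise span the boundary between two groups.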

Regarding "r - Connecting points across selected NAs with geom_line()", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27676179/
