Missing data using plot() in R: should I use na.omit(), !is.na(), approx()? If so then how?
Question
The problem
I have three variables recorded over time. The first (black) is recorded at every time period, the second (blue) every other time period, the third (red) at every time period except one. I try to plot these in R:
test <- data.frame(time=c(1:5), black=c(3, 3, 3, 3, 3), blue=c(1, NA, 3, NA, 5), red=c(5, 4, NA, 2, 1))
plot(test$time, test$black, type="l", col="black")
lines(test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")
The result is a plot in which 'black' is the only continuous line, 'blue' is completely absent, and 'red' is absent between time 2 and time 4. I would like all three lines to be continuous.
Attempted solutions
From https://stackoverflow.com/questions/15533212/how-to-connect-dots-where-there-are-missing-values
plot(na.omit(test), test$time, test$black, type="l", col="black")
Returns "Error in match.fun(panel) : 'test$black' is not a function, character or symbol".
na.omit(test)
plot(test$time, test$black, type="l", col="black")
lines(test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")
The plot is the same as in my original problem, and actually omits every time period in which one of the variables is missing data, so actual data (in this example, for black) is omitted alongside every period in which there is missing data for any of the other variables.
plot(type="l", test$time, test$black, col="black")
lines(which(!is.na(test$blue)), na.omit(test$blue), test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")
Returns "Error in plot.xy(xy.coords(x, y), type = type, ...) : invalid plot type". Even amending the first line to plot(test$time, test$black, col="black") does not resolve this error.
From https://stackoverflow.com/questions/15533212/how-to-connect-dots-where-there-are-missing-values
plot(approx(test, xout=seq_along(test))$y, type="l", test$time, test$black, col="black")
Returns "Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ".
From https://stackoverflow.com/questions/42590545/r-plotting-a-line-with-missing-na-values
There it is commented that na.omit() or na.approx() "seem to work only if I would plot 'A' separately in a stand-alone plot, they do not seem to work in conjunction with 'Time' and 'B' and 'C' all in the same plot" and that this is a "super weird bug". They suggest:
plot(test$time[!is.na(test$black)],test$black[!is.na(test$black)],type="l")
lines(test$time,test$blue, type="l",col="blue")
lines(test$time, test$red, type="l", col="red")
The plot is the same as in my original problem. If I change the coding for 'blue' to (test$time, test$blue, type="p", col="blue") then I get a single point at time point 3, but not the line that I would expect.
Also from https://stackoverflow.com/questions/42590545/r-plotting-a-line-with-missing-na-values
xlim <- range(test$time)
ylim <- range(subset[-1], na.rm = TRUE)
Quickly returns "Error in subset[-1] : object of type 'closure' is not subsettable".
ok <- ! is.na(test$black)
plot(black ~ time, time, time = ok, type = "l", xlim = xlim, ylim = ylim)
Quickly returns "Error in FUN(X[[i]], ...) : invalid 'envir' argument of type 'closure'". I also cannot see how 'blue' or 'red' data would enter into this plot, even if it didn't return an error.
So is there any way to use plot() for plotting multiple variables over time when one of them has missing data?
Answer 1
Score: 2
If you want to connect the dots across missing data, you can interpolate the gaps with na.approx() from the zoo package:
install.packages('zoo')
library(zoo)
# Create the data frame
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3), blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))
# Interpolate missing values
test$blue <- na.approx(test$blue)
test$red <- na.approx(test$red)
# Plot the data
plot(test$time, test$black, type = "l", col = "black", ylim = range(na.omit(test[-1])))
lines(test$time, test$blue, col = "blue")
lines(test$time, test$red, col = "red")
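If installing zoo is not an option, a base-R sketch along the following lines may also work. It assumes you only need straight segments between the observed points, and it drops the NA entries from both x and y of each series with !is.na() rather than filling them in. The names ok_blue and ok_red are hypothetical helpers, not part of the original answer.

```r
# Same example data as in the question
test <- data.frame(time = 1:5,
                   black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5),
                   red = c(5, 4, NA, 2, 1))

# Logical masks marking the observed (non-missing) values of each series
ok_blue <- !is.na(test$blue)
ok_red <- !is.na(test$red)

# Draw the complete series first; set ylim so all three series fit
plot(test$time, test$black, type = "l", col = "black",
     ylim = range(test[-1], na.rm = TRUE))

# Subset x and y with the same mask so their lengths still match
lines(test$time[ok_blue], test$blue[ok_blue], col = "blue")
lines(test$time[ok_red], test$red[ok_red], col = "red")
```

Because lines() breaks a line wherever it meets an NA, removing the NA pairs per series lets it join the remaining points directly, and the data frame itself is left unchanged.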
Answer 2
Score: 1
Use approx to interpolate each y-axis vector. To plot the interpolated results, start by opening an empty plot, then use a mapply loop to pass each column and line color to the interpolating and plotting code.
test <- data.frame(time = 1:5,
                   black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5),
                   red = c(5, 4, NA, 2, 1))
# The column names double as the line colors
clrs <- names(test)[-1]
xlim <- range(test$time)
ylim <- range(test[-1], na.rm = TRUE)
# Open an empty plot with axes wide enough for all series
plot(NA, NA, type = "n", xlim = xlim, ylim = ylim)
# Interpolate each column over time and draw it in its own color
mapply(\(y, col) {
  dat <- approx(x = test$time, y)
  lines(y ~ x, data = dat, col = col)
}, test[-1], clrs)
#> $black
#> NULL
#>
#> $blue
#> NULL
#>
#> $red
#> NULL
Created on 2023-06-15 with reprex v2.0.2
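If the interpolated values themselves are needed rather than only the drawn lines, something like the following sketch could be used. It assumes the same test data frame and passes xout = test$time, so that approx() returns values at the original time points instead of its default 50 evenly spaced points. The name filled is a hypothetical addition, not part of the answer above.

```r
test <- data.frame(time = 1:5,
                   black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5),
                   red = c(5, 4, NA, 2, 1))

# Copy the data so the original NA pattern stays available
filled <- test

# Linearly interpolate every value column at the original time points;
# approx() ignores the NA pairs when building the interpolation
filled[-1] <- lapply(test[-1], function(y) {
  approx(x = test$time, y = y, xout = test$time)$y
})

filled
```

With this example data the gaps in blue and red sit between observed points, so every NA gets a value. A series with missing values at the very start or end would still have NA there, because approx() does not extrapolate by default (rule = 1).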