Linear regression model incorrectly calculated in R
Question
<details>
<summary>English:</summary>
I have 3 different datasets that I have been plotting this way:
[![Aucune description](https://i.stack.imgur.com/HCP4r.png)](https://i.stack.imgur.com/HCP4r.png)
Each dataset was imported from a file to a data frame (respectively called `vues`, `likes` and `commentaires`), and contains the date and the corresponding data (either views, likes or comments) for each date.
Now, I'd like to plot both linear models onto my graph (`likes ~ views` and `comments ~ views`).
Starting with the red one, I entered the following code:

```r
abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=c(likes,vues)),col="red")
```

And this is what RStudio plots:
[![Aucune description](https://i.stack.imgur.com/Nnapc.png)](https://i.stack.imgur.com/Nnapc.png)
Now I don't understand if the problem comes from the dataset or somewhere else, but if I remove the `data` parameter, or just choose one of the two datasets, it still does the exact same thing, i.e. the following:

```r
abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=likes),col="red")
abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=vues),col="red")
abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30),col="red")
```

Here is my data:
> vues$X2022.12.30
[1] 15900 8245 4531 546800 7149 10600 7774 45600 157100
[10] 348300 15000 7363 24000 6073 6469 5848 13100 185600
[19] 18700 7622 483800 6373 12000 7839 17100 10800 9846
[28] 5671 10100 8330 9031 183000 17600 5153 117700 39600
[37] 10300 27900 11200 29500 387800 15000 8968 465800 72500
[46] 9501 5816 9761 5814 16200 269700 8905 16300 14700
[55] 149600 7547 422600 40700 71100 18900 942000 12100 13400
[64] 551900 16500 12000 8648 131900 10700 18400 183700 13500
[73] 21500 1203000 14300 14700 108400 5233 388800 368400 1411000
[82] 286400 17900 261500 1049000 13500 11200 74300 1312000 6044
[91] 22200 9467 5975 143200 4552 502700 3971 9755 32000
[100] 46800 8844 31600 3671 60700 8249 20100 14500 3475
[109] 5745 2420 193700 2305 13500 90200 5746 5520 29200
[118] 7803 2502 4559 2120 3233 242100 5616 1371 1109
[127] 2123 2097 4019 1444 1515 2350 34600 2642 148000
[136] 2139 541400 13700 52600 421700 9876 3671 33600 6388
[145] 12300 3014 50200 2033 45900 5878 2221 1479
> likes$X2022.12.30
[1] 1572 935 229 39000 471 944 472 2149 15400 42000 1346
[12] 517 1977 488 569 462 1940 17200 2121 588 84800 587
[23] 987 618 1229 862 947 278 1048 628 795 19200 1529
[34] 319 9050 3119 868 2840 780 1912 40100 1130 759 47800
[45] 4197 815 470 786 502 1068 33200 698 1145 1442 11200
[56] 534 41600 3740 5119 2376 91700 904 983 20800 812 869
[67] 571 6653 807 1356 7332 1005 1597 104700 1171 982 14300
[78] 367 14900 29800 103500 11900 1073 22700 67700 872 894 3673
[89] 116800 251 2229 593 392 20400 267 29200 449 569 1933
[100] 2260 1031 3035 311 6370 1014 812 956 241 641 116
[111] 6543 113 503 5505 450 410 2067 494 76 350 155
[122] 122 11400 350 51 42 109 96 200 62 53 98
[133] 1207 153 15500 101 56900 718 4498 23600 619 248 1803
[144] 437 983 234 4188 147 2623 591 176 138
And here is the code I used for plotting the graph if that is relevant:

```r
# Set up the device and paint the plot-region background
plot.new()
par(mar=c(4,4,4,4))
par(new=TRUE)
par(bg="#FFECDE")
rect(par("usr")[1], par("usr")[3],
     par("usr")[2], par("usr")[4],
     col = c("#E1DEFF"))
par(new=TRUE)

# Likes against views, with the left-hand axis in red
plot(vues$X2022.12.30,likes$X2022.12.30,col="red",axes=FALSE,xlab="",ylab="",
     main="Nombre de j'aime et de commentaires en fonction du nombre de vues",
     pch=-0x2022,bg="red")
axis(2,ylim=c(0,120000),col="red", col.axis="red",at=seq(0, 120000, by=20000))
mtext("Nombre de j'aime",side=2,line=2.5,col="red")
box()

# Comments against views, overlaid with its own right-hand axis in blue
par(new=TRUE)
plot(vues$X2022.12.30,commentaires$X2022.12.30,col="blue",axes=FALSE,xlab="",
     ylab="",ylim=c(0,1500),pch=-0x2022,bg="blue")
axis(4,col="blue",col.axis="blue",at=seq(0, 1500, by=250))
mtext("Nombre de commentaires",side=4,line=2.5,col="blue")

# Shared x-axis and legend
axis(1,xlim=c(0,1500000),ylim=c(0,145000),col="black",col.axis="black",
     at=seq(0, 1400000, by=100000))
mtext("Nombre de vues",side=1,line=2.5,col="black")
legend(x="topleft",legend=c("J'aime","Commentaires"),
       text.col=c("black","black"),pch=c(-0x2022,-0x2022),col=c("red","blue"),
       bg=c("#C9FFF1"),inset=0.02)
```

</details>
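# Answer 1
**Score**: 2
Using `par(new=TRUE)` and overplotting the comments data changes the y-axis scale. The coefficients returned by `lm()` are still in the units of the likes scale, but a later `abline()` call draws them in the new (comments) coordinate system, so the line ends up in the wrong place.
The simple solution is to call `abline()` to add the regression line before you add the comments data.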
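
To see the scale change directly, one can inspect `par("usr")`, which holds the user coordinates of the current plot region. The following is a minimal sketch on made-up toy data (`x`, `y1`, `y2` are hypothetical, not the asker's data), drawn to a temporary PDF so that it also runs non-interactively:

```r
# Hypothetical toy data: y1 and y2 live on very different scales.
x  <- c(1, 2, 3, 4, 5)
y1 <- c(100, 210, 290, 405, 500)
y2 <- c(1.0, 0.5, 2.0, 1.5, 2.5)

# Draw to a throwaway PDF so the sketch does not need a screen device.
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)

plot(x, y1)
usr_first <- par("usr")    # c(xmin, xmax, ymin, ymax) for the first plot

par(new = TRUE)
plot(x, y2, axes = FALSE, xlab = "", ylab = "")
usr_second <- par("usr")   # the y limits now follow y2, not y1

invisible(dev.off())
unlink(tmp)

print(usr_first[3:4])      # y range while y1 is plotted
print(usr_second[3:4])     # y range after the overlay
```

Anything added with `abline()`, `lines()` or `points()` after the second `plot()` is positioned using the second set of limits.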
Example:
```r
dd <- data.frame(vues=
c(15900,8245,4531,546800,7149,10600,7774,45600,157100,
348300,15000,7363,24000,6073,6469,5848,13100,185600,
18700,7622,483800,6373,12000,7839,17100,10800,9846,
5671,10100,8330,9031,183000,17600,5153,117700,39600,
10300,27900,11200,29500,387800,15000,8968,465800,72500,
9501,5816,9761,5814,16200,269700,8905,16300,14700,
149600,7547,422600,40700,71100,18900,942000,12100,13400,
551900,16500,12000,8648,131900,10700,18400,183700,13500,
21500,1203000,14300,14700,108400,5233,388800,368400,1411000,
286400,17900,261500,1049000,13500,11200,74300,1312000,6044,
22200,9467,5975,143200,4552,502700,3971,9755,32000,
46800,8844,31600,3671,60700,8249,20100,14500,3475,
5745,2420,193700,2305,13500,90200,5746,5520,29200,
7803,2502,4559,2120,3233,242100,5616,1371,1109,
2123,2097,4019,1444,1515,2350,34600,2642,148000,
2139,541400,13700,52600,421700,9876,3671,33600,6388,
12300,3014,50200,2033,45900,5878,2221,1479),
likes = c(1572,935,229,39000,471,944,472,2149,15400,42000,1346,
517,1977,488,569,462,1940,17200,2121,588,84800,587,
987,618,1229,862,947,278,1048,628,795,19200,1529,
319,9050,3119,868,2840,780,1912,40100,1130,759,47800,
4197,815,470,786,502,1068,33200,698,1145,1442,11200,
534,41600,3740,5119,2376,91700,904,983,20800,812,869,
571,6653,807,1356,7332,1005,1597,104700,1171,982,14300,
367,14900,29800,103500,11900,1073,22700,67700,872,894,3673,
116800,251,2229,593,392,20400,267,29200,449,569,1933,
2260,1031,3035,311,6370,1014,812,956,241,641,116,
6543,113,503,5505,450,410,2067,494,76,350,155,
122,11400,350,51,42,109,96,200,62,53,98,
1207,153,15500,101,56900,718,4498,23600,619,248,1803,
437,983,234,4188,147,2623,591,176,138))
set.seed(101)
## stand-in for the comments data, on a 0-1500 scale
dd$other <- runif(nrow(dd), min = 0, max = 1500)

plot(likes ~ vues, data = dd)
## correct: drawn while the likes scale is still in effect
abline(lm(likes ~ vues, data = dd))

par(new = TRUE)
plot(other ~ vues, data = dd, axes = FALSE, col = 2)
## misplaced: same model, but drawn in the new plot's y-scale
abline(lm(likes ~ vues, data = dd), col = 4, lwd = 2)
```
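
Since the question asks for both regression lines, the same idea can be applied twice: add each line while its own scale is active. This is only a sketch under assumptions. The data frame `dd` below is simulated, and the column name `comments` and the slopes used to generate the data are invented for illustration.

```r
# Hypothetical stand-in data: column names mirror the question, values are simulated.
set.seed(1)
dd <- data.frame(vues = round(runif(50, 1000, 1e6)))
dd$likes    <- 0.08  * dd$vues + rnorm(50, sd = 5000)
dd$comments <- 0.001 * dd$vues + rnorm(50, sd = 100)

fit_likes    <- lm(likes ~ vues, data = dd)
fit_comments <- lm(comments ~ vues, data = dd)

par(mar = c(4, 4, 4, 4))

# First layer: likes on the left axis; add its line before the scale changes.
plot(likes ~ vues, data = dd, col = "red", xlab = "Views", ylab = "Likes")
abline(fit_likes, col = "red")

# Second layer: comments on the right axis; add its line while this scale is active.
par(new = TRUE)
plot(comments ~ vues, data = dd, col = "blue", axes = FALSE, xlab = "", ylab = "")
axis(4, col = "blue", col.axis = "blue")
abline(fit_comments, col = "blue")
```

Fitting the models up front and passing the fitted objects to `abline()` keeps the drawing order easy to rearrange without refitting.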
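
On the side observation in the question that changing or dropping the `data` argument makes no difference: when the formula already spells out `likes$X2022.12.30` and `vues$X2022.12.30`, each column is extracted explicitly from its data frame, so `data` has nothing left to look up. A small sketch with hypothetical miniature data frames (the values are made up; `both` is an illustrative name):

```r
# Hypothetical miniature versions of the two data frames from the question.
vues  <- data.frame(X2022.12.30 = c(100, 200, 300, 400, 500))
likes <- data.frame(X2022.12.30 = c(12, 19, 33, 38, 52))

# `$` pulls each column out explicitly, so `data` is effectively unused here.
fit_no_data   <- lm(likes$X2022.12.30 ~ vues$X2022.12.30)
fit_with_data <- lm(likes$X2022.12.30 ~ vues$X2022.12.30, data = likes)

# One way to let `data` do the lookup: put both columns in a single data frame.
both <- data.frame(vues = vues$X2022.12.30, likes = likes$X2022.12.30)
fit_combined <- lm(likes ~ vues, data = both)

# Compare the fitted coefficients of the three calls.
print(all.equal(unname(coef(fit_no_data)), unname(coef(fit_with_data))))
print(all.equal(unname(coef(fit_no_data)), unname(coef(fit_combined))))
```

The model itself is therefore the same in every variant; only where the line is drawn depends on the plotting order.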