Remove values on the basis of another column

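Question

I have two columns in a data frame: one holds the total score and the other holds the expected score. I want to find the values in the expected score column where the expected score is greater than the total score.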

df <- data.frame(total_score=c(4.5,12.2,4.6,9.2,12.2,36.4,4.5,12.2,4.6,9.2,12.2,36.4),
                 expected_score=c(4.5,12.1,NA,10,12.2,NA,5,12.5,NA,9.2,16,NA),
                 Region1=c("All region",NA,NA,"All region","All region",NA,"All region",NA,NA,"All region","All region",NA),
                 Region2=c("EAST","EAST","EAST","EAST","EAST",NA,"EAST","EAST","EAST","EAST","EAST",NA),
                 Region3=c("West",NA,"West","West","West","West","West",NA,"West","West","West","West"))

Answer 1

Score: 1

Try this first option, which uses dplyr:

library(dplyr)

df %>%
    mutate(expected_score = ifelse(expected_score > total_score, 
                                   NA, expected_score))
   total_score expected_score    Region1 Region2 Region3
1          4.5            4.5 All region    EAST    West
2         12.2           12.1       <NA>    EAST    <NA>
3          4.6             NA       <NA>    EAST    West
4          9.2             NA All region    EAST    West
5         12.2           12.2 All region    EAST    West
6         36.4             NA       <NA>    <NA>    West
7          4.5             NA All region    EAST    West
8         12.2             NA       <NA>    EAST    <NA>
9          4.6             NA       <NA>    EAST    West
10         9.2            9.2 All region    EAST    West
11        12.2             NA All region    EAST    West
12        36.4             NA       <NA>    <NA>    West

Using data.table you can do:

library(data.table)
setDT(df)

df[expected_score > total_score, expected_score  := NA]

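On a large data.frame, the latter option using data.table should be much faster, because := updates the column by reference instead of copying the data.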
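If neither package is available, the same replacement can be sketched in base R. This is only an illustration: the data frame below reuses the first five rows of the question's data, and the variable name too_high is made up for the example.

```r
# Base R sketch of the same replacement; no packages needed.
# The data are the first five rows of the question's example.
df <- data.frame(total_score    = c(4.5, 12.2, 4.6, 9.2, 12.2),
                 expected_score = c(4.5, 12.1, NA, 10, 12.2))

# which() keeps only the positions where the comparison is TRUE,
# so rows whose expected_score is already NA are left untouched.
too_high <- which(df$expected_score > df$total_score)

# Blank out the expected scores that exceed the total score.
df$expected_score[too_high] <- NA

print(df)
```

Only row 4 (expected 10 against a total of 9.2) is changed here; the other values are kept as they were.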

Answer 2

Score: 1

One-line solution:

df[which(df$expected_score > df$total_score), ]

# which() drops the NA comparisons, so this returns only the rows where expected_score is greater than total_score
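A small sketch of why which() matters when the comparison can produce NA. The three-row data frame and the name cond are made up for illustration and are not part of the original answer.

```r
# Hypothetical three-row example with one missing expected_score.
df <- data.frame(total_score    = c(4.5, 9.2, 4.6),
                 expected_score = c(5, 9.2, NA))

# Comparing against NA yields NA, so cond is TRUE, FALSE, NA.
cond <- df$expected_score > df$total_score

# Indexing with the raw logical vector turns the NA into an all-NA row.
rows_plain <- df[cond, ]

# which() returns only the positions that are TRUE, dropping the NA.
rows_which <- df[which(cond), ]

print(nrow(rows_plain))
print(nrow(rows_which))
```

Wrapping the condition in which() is one way to keep those spurious NA rows out of the result; dplyr's filter() also drops rows where the condition is NA.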

huangapple
  • Published on May 22, 2023 at 19:12:14
  • Please keep this link when reposting: https://go.coder-hub.com/76305567.html