Is there a way to pull outliers into a separate df?

Question

Hi, I have a data frame with 150 columns and 200 rows. I want to go through each column and pull out any data points that are more than 3 SD from that column's mean. Here is an excerpt:

       G-198804  G-198712  G-228253  G-198899
X1027 15.100481 15.949672 13.783062 17.106806
X1104 14.905931 15.766908 13.885380 17.134476
X5010 15.268376 16.457303 13.447923 17.345957
X5023 15.513746 16.457871 13.848918 17.634144
X5425 15.093679 16.085498 13.253646 17.066823
X7CUH 15.471564 16.417165 13.764880 17.365255
X8VHB 15.222530 16.440389 13.146401 17.158754
VWU2  14.999256 16.121702 13.261694 17.193140
CUKX  14.795677 16.076999 13.325234 17.145046

I used the code below to replace the outliers with NA, but then realized I need the outliers in a separate data frame. Is there a way to modify it to pull out just the row and column names of the outlier cells?

library(dplyr)

# Replace every value more than 3 SD from its column mean with NA
newtpose <- tpose_genexp %>%
  mutate_at(.vars = vars(contains("G")),
            .funs = ~ ifelse(abs(. - mean(.)) > 3 * sd(.), NA, .))

Ideally, the new data frame would look like this:

Sample Gene
X1027  G-198712
X7CUH  G-228253
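One thing to keep in mind when trying a 3-SD rule on the 9-row excerpt above: in a column of n values, no value can lie more than (n - 1) / sqrt(n) sample SDs from the column mean (Samuelson's inequality). For n = 9 that bound is 8/3 ≈ 2.67, so the rule cannot flag anything in the excerpt, and the expected output presumably refers to the full 200-row data. At least 11 rows are needed before a 3-SD outlier is possible at all. A quick check, re-entering the excerpt by hand (the name excerpt is just for illustration):

```r
# The 9-row excerpt from the question, with sample IDs as row names
excerpt <- data.frame(
  "G-198804" = c(15.100481, 14.905931, 15.268376, 15.513746, 15.093679,
                 15.471564, 15.222530, 14.999256, 14.795677),
  "G-198712" = c(15.949672, 15.766908, 16.457303, 16.457871, 16.085498,
                 16.417165, 16.440389, 16.121702, 16.076999),
  "G-228253" = c(13.783062, 13.885380, 13.447923, 13.848918, 13.253646,
                 13.764880, 13.146401, 13.261694, 13.325234),
  "G-198899" = c(17.106806, 17.134476, 17.345957, 17.634144, 17.066823,
                 17.365255, 17.158754, 17.193140, 17.145046),
  row.names = c("X1027", "X1104", "X5010", "X5023", "X5425",
                "X7CUH", "X8VHB", "VWU2", "CUKX"),
  check.names = FALSE
)

# Column-wise z-scores: (x - column mean) / column SD
z <- scale(as.matrix(excerpt))
n <- nrow(excerpt)

max(abs(z))        # largest distance from any column mean, in SDs
(n - 1) / sqrt(n)  # Samuelson bound on that distance; 8/3 for n = 9
```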

Answer 1

Score: 0

You could define a new data frame, say df_out, in which all those values that are not outliers are set to NA:

library(dplyr)

# Keep only the outliers (more than 3 SD from the column mean); everything else becomes NA
df_out <- df %>%
  mutate(across(starts_with("G"),
                ~ ifelse(abs(. - mean(.)) > 3 * sd(.), ., NA)))
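A note on the outlier test itself: "more than 3 SD from the mean" is the two-sided condition abs(x - mean(x)) > 3 * sd(x). The form abs(x) > mean(x) + 3 * sd(x) takes the absolute value of the data rather than of the deviation, so for positive data such as these expression values it only catches points far above the mean. A small illustration with a made-up vector x:

```r
# Hypothetical column: nineteen typical values and one unusually low one
x <- c(rep(10, 19), 0)

# Absolute value of the data, compared with mean + 3 SD
high_only <- abs(x) > mean(x) + 3 * sd(x)

# Absolute distance from the mean, compared with 3 SD
two_sided <- abs(x - mean(x)) > 3 * sd(x)

which(high_only)  # integer(0): the low value is missed
which(two_sided)  # 20: the low value is flagged
```

Here the mean is 9.5 and the SD is sqrt(5) ≈ 2.24, so the value 0 sits about 4.2 SDs below the mean, yet abs(0) is nowhere near mean + 3 SD.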

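If you want the outliers in long form, with one row per outlier giving its sample, gene and value, move the row names (the sample IDs) into a column first and then add pivot_longer. The ifelse() keeps the outliers and turns everything else into NA, so dropping the NA cells after pivoting leaves only the outliers:

library(dplyr)
library(tidyr)
library(tibble)

# Sample IDs are row names; make them a column so they survive the pivot
df %>%
  rownames_to_column("Sample") %>%
  mutate(across(starts_with("G"), ~ ifelse(abs(. - mean(.)) > 3 * sd(.), ., NA))) %>%
  pivot_longer(-Sample, names_to = "Gene", values_drop_na = TRUE)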
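For comparison, here is a base R sketch of the same idea that needs no extra packages. It assumes every column of the data frame is a numeric gene column and that the sample IDs are the row names; find_outliers and the toy data frame expr are hypothetical names used only for illustration. scale() turns each column into z-scores, and which(..., arr.ind = TRUE) returns the row and column position of every cell whose absolute z-score exceeds k:

```r
# Hypothetical helper: list every cell more than k SDs from its column mean.
# Assumes all columns are numeric and the sample IDs are the row names.
find_outliers <- function(df, k = 3) {
  m <- as.matrix(df)
  # Column-wise z-scores: (x - column mean) / column SD (n - 1 denominator)
  z <- scale(m)
  # Row/column positions of the flagged cells, in column-major order
  idx <- which(abs(z) > k, arr.ind = TRUE)
  data.frame(
    Sample = rownames(m)[idx[, "row"]],
    Gene   = colnames(m)[idx[, "col"]],
    Value  = m[idx],
    row.names = NULL
  )
}

# Hypothetical toy data: 20 samples, one planted outlier per gene
expr <- data.frame(
  "G-1" = c(rep(10, 19), 0),
  "G-2" = c(50, rep(5, 19)),
  row.names = paste0("S", 1:20),
  check.names = FALSE
)

out <- find_outliers(expr)
out
```

With the toy data this lists S20 / G-1 (value 0) and S1 / G-2 (value 50). On the question's data, find_outliers(tpose_genexp) would return the Sample/Gene pairs directly, provided all of its columns are gene columns.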

huangapple
  • Posted on March 7, 2023 at 12:29:54
  • If you repost this article, please keep this link: https://go.coder-hub.com/75658051.html