Question

Hi, I have a data frame with 150 columns and 200 rows. I want to go through each column and pull out any data points that are more than 3 standard deviations from that column's mean.

	G-198804	G-198712	G-228253	G-198899
X1027	15.100481	15.949672	13.783062	17.106806
X1104	14.905931	15.766908	13.885380	17.134476
X5010	15.268376	16.457303	13.447923	17.345957
X5023	15.513746	16.457871	13.848918	17.634144
X5425	15.093679	16.085498	13.253646	17.066823
X7CUH	15.471564	16.417165	13.764880	17.365255
X8VHB	15.222530	16.440389	13.146401	17.158754
VWU2	14.999256	16.121702	13.261694	17.193140
CUKX	14.795677	16.076999	13.325234	17.145046

I used the code below to replace the outliers with NA, but then realized I need the outliers in a separate data frame. Is there any way to modify this so that it just pulls the row and column names of the cells that are outliers?

newtpose = tpose_genexp %>%
  mutate_at(.vars = vars(contains("G")),
            .funs = ~ifelse(abs(.) > mean(.) + 3 * sd(.), NA, .))

My new data frame would hopefully look like this:

Sample	Gene
X1027	G-198712
X7CUH	G-228253

Answer 1

Score: 0

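You could define a new data frame, say df_out, in which every value that is not an outlier is set to NA. Measuring the distance from the column mean with abs(. - mean(.)) catches outliers in either direction:

library(dplyr)

df_out <- df %>%
  mutate(across(starts_with("G"),
                # keep values more than 3 sd from the column mean, blank the rest
                ~ ifelse(abs(. - mean(.)) > 3 * sd(.), ., NA)))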
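To check that the rule flags what you expect, it may help to run it on a small toy table first. The sketch below is not from the question: the 20 rows, the planted values 25 and 3, and the names toy, toy_out and flagged are all made up for illustration. It uses 20 rows because a single value can sit at most (n - 1) / sqrt(n) sample standard deviations from the mean of n values, so nothing can exceed 3 sd with fewer than 11 rows.

```r
library(dplyr)

# Toy data (hypothetical, not the asker's): 20 samples x 3 genes.
base <- (seq_len(20) %% 5) * 0.1
toy <- data.frame(
  "G-198712" = 15 + base,
  "G-228253" = 13 + base,
  "G-198899" = 17 + base,
  check.names = FALSE   # keep the hyphens in the column names
)
toy[3, "G-198712"] <- 25   # planted high outlier
toy[7, "G-228253"] <- 3    # planted low outlier

# Same rule as above: keep values more than 3 sd from the column mean.
toy_out <- toy %>%
  mutate(across(starts_with("G"),
                ~ ifelse(abs(.x - mean(.x)) > 3 * sd(.x), .x, NA)))

# Number of values flagged in each gene column.
flagged <- colSums(!is.na(toy_out))
print(flagged)
```

Each planted value should be the only one flagged in its column, and the third gene should have none.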

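If you want the outliers in a two-column data frame, keep only the outlying values (everything else becomes NA), move the row names into a Sample column, and reshape with pivot_longer, dropping the NA cells:

library(dplyr)
library(tidyr)

df %>%
  tibble::rownames_to_column("Sample") %>%   # assumes sample IDs are row names
  mutate(across(starts_with("G"),
                ~ ifelse(abs(. - mean(.)) > 3 * sd(.), ., NA))) %>%
  pivot_longer(starts_with("G"), names_to = "Gene", values_drop_na = TRUE) %>%
  select(Sample, Gene)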
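If you would rather skip the reshaping step, the same Sample/Gene table can probably be built in base R with scale() and which(..., arr.ind = TRUE). This is only a sketch under assumptions: it assumes the sample IDs are row names and every column is numeric, and the toy data and the sample names S1 to S20 are made up for illustration.

```r
# Toy data (hypothetical): 20 samples as row names, 3 gene columns.
base <- (seq_len(20) %% 5) * 0.1
df <- data.frame(
  "G-198712" = 15 + base,
  "G-228253" = 13 + base,
  "G-198899" = 17 + base,
  check.names = FALSE,
  row.names = paste0("S", seq_len(20))
)
df["S3", "G-198712"] <- 25   # planted high outlier
df["S7", "G-228253"] <- 3    # planted low outlier

m <- as.matrix(df)

# scale() centers each column on its mean and divides by its sd,
# so abs(scale(m)) is the distance from the column mean in sd units.
z <- abs(scale(m))

# Row and column positions of every cell more than 3 sd from its column mean.
idx <- which(z > 3, arr.ind = TRUE)

outliers <- data.frame(
  Sample = rownames(m)[idx[, "row"]],
  Gene   = colnames(m)[idx[, "col"]]
)
print(outliers)
```

On your own data you would replace the toy df with the real one, for example tpose_genexp, and keep everything from as.matrix() onward.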
