2023-02-18 00:48:14

Trouble with NA's in large dataframe

Question

I'm having trouble standardizing my data.
First, I create the data frame from my data with my desired row names, and I remove the first column because it is not needed.

EXPGli <- read.delim("C:/Users/i5/Dropbox/Guilherme Vergara/Doutorado/Data/Datasets/MergedEXP3.txt", row.names=2)
EXPGli <- EXPGli[,-1]
EXPGli <- as.data.frame(EXPGli)

Then I need to convert all the columns to Z-scores (each column = gene expression values; each row = sample). The idea is to convert each gene's expression value into its Z-score for each cell.

Z_score <- function(x) {(x-mean(x))/ sd(x)}
apply(EXPGli, 2, Z_score)

The output ends with [ reached 'max' / getOption("max.print") -- omitted 1143 rows ], and now every cell in my data frame is NA.
There are indeed several NAs in the dataset, including some entire rows and even some entire columns.

I tried several approaches to remove the NAs:

EXPGli &lt;- na.omit(EXPGli)
EXPGli %>% drop_na()
print(EXPGli[rowSums(is.na(EXPGli)) == 0, ])
na.exclude(EXPGli)

Yet apparently none of them works. Additionally, is.na(EXPGli) returns FALSE for every field.
I would like to understand what I am doing wrong. It seems the issue might be that the NAs are not being recognized by R as NA, but I couldn't find a solution. Any input is much appreciated, thanks in advance!

Answer 1

Score: 1

You may want to set the argument `na.rm = TRUE` in your calls to `mean(x)` and `sd(x)` inside the `Z_score` function; otherwise these calls return NA for any vector that contains NAs.

Z_score <- function(x) {(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)}
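
To make the difference concrete, here is a minimal sketch on a toy data frame. The data frame `expr`, its gene and sample names, and the helper names `z_naive` and `z_na_rm` are made up for illustration and are not taken from the question's dataset. It assumes every column is numeric, as an expression matrix normally would be.

```r
# Toy expression table: rows = samples, columns = genes (hypothetical values).
expr <- data.frame(
  geneA = c(1, 2, 3, NA),                               # one missing value
  geneB = c(10, 20, 30, 40),                            # complete column
  geneC = c(NA_real_, NA_real_, NA_real_, NA_real_),    # entirely missing
  row.names = c("s1", "s2", "s3", "s4")
)

# Version from the question: a single NA makes mean() and sd() return NA,
# so every value in that column becomes NA.
z_naive <- function(x) (x - mean(x)) / sd(x)

# Version from the answer: NAs are skipped when computing the mean and sd.
z_na_rm <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

naive <- apply(expr, 2, z_naive)
fixed <- apply(expr, 2, z_na_rm)

# Count NAs per column before and after the fix.
print(colSums(is.na(naive)))
print(colSums(is.na(fixed)))

# apply() returns a new matrix and leaves expr untouched,
# so the result has to be assigned to be kept.
expr_z <- as.data.frame(fixed)
```

With `na.rm = TRUE`, a value that was missing in the input is still NA in the output, and a column with no observed values stays entirely NA, since there is nothing to compute a mean or standard deviation from. Such columns may need to be dropped separately if they are not wanted.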
