Trouble with NAs in a large data frame

Question

I'm having trouble standardizing my data.
First, I create the data frame object from my data, with my desired row names (and I remove the 1st column, as it is not needed).

EXPGli <- read.delim("C:/Users/i5/Dropbox/Guilherme Vergara/Doutorado/Data/Datasets/MergedEXP3.txt", row.names=2)
EXPGli <- EXPGli[,-1]
EXPGli <- as.data.frame(EXPGli)

Then, I am supposed to convert all the columns to Z-scores (each column = the expression values of one gene; each row = a sample). The idea is to convert every gene expression value to its Z-score for each cell.

Z_score <- function(x) {(x-mean(x))/ sd(x)}
apply(EXPGli, 2, Z_score)

This prints [ reached 'max' / getOption("max.print") -- omitted 1143 rows ],
and now every cell of my data frame is NA.
Indeed, there are several NAs in the dataset, including some full rows and even some full columns.

I tried several approaches to remove the NAs:

EXPGli <- na.omit(EXPGli)
EXPGli %>% drop_na()
print(EXPGli[rowSums(is.na(EXPGli)) == 0, ])
na.exclude(EXPGli)

Yet apparently none of them works. Additionally, is.na(EXPGli)
returns FALSE for all fields.
I would like to understand what I am doing wrong here. It seems the issue might be that the NAs are not being recognized by R as NA, but I couldn't find a solution. Any input is very much appreciated, thanks in advance!
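
One way to narrow this down is sketched below on a small made-up data frame (`toy` and its values are hypothetical stand-ins for EXPGli, not the real data). It counts the NAs per column and per row, checks the column classes, and shows that `na.omit()` leaves nothing behind when a whole column is NA, since every row then contains at least one NA. That may also be why `is.na()` reports FALSE everywhere after `EXPGli <- na.omit(EXPGli)`: there may simply be no rows left to test.

```r
# Toy stand-in for EXPGli (made-up values): rows = samples, columns = genes
toy <- data.frame(
  geneA = c(1, 2, NA, 4),
  geneB = c(NA_real_, NA_real_, NA_real_, NA_real_),  # gene with no measurements at all
  geneC = c(5, 7, NA, 9),
  row.names = c("s1", "s2", "s3", "s4")
)

# How many NAs are there per column and per row?
na_per_col <- colSums(is.na(toy))
na_per_row <- rowSums(is.na(toy))

# Column classes: "character" would suggest missing values were read in as text
col_classes <- sapply(toy, class)

# na.omit() drops every row with at least one NA; the all-NA column kills all rows
nrow(na.omit(toy))  # 0

# Dropping only the all-NA columns and all-NA rows keeps partially observed data
keep_cols <- colSums(!is.na(toy)) > 0
keep_rows <- rowSums(!is.na(toy)) > 0
trimmed <- toy[keep_rows, keep_cols, drop = FALSE]
```

Running `dim(EXPGli)` right after each removal attempt would show whether the real data frame has ended up empty in the same way.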

Answer 1

Score: 1

You may want to set the argument na.rm = TRUE in your calls to mean(x) and sd(x) inside the Z_score function. Otherwise these calls return NA for any vector that contains an NA, and that NA then propagates to every value in the column.

Z_score <- function(x) {(x-mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)}
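
As a quick check of this fix, here is a minimal sketch on a made-up data frame (`toy`, `z_manual` and `z_scale` are hypothetical names, not part of the original post). It applies the NA-aware function column by column and compares the result with base R's `scale()`, which also skips NAs when computing each column's mean and standard deviation.

```r
# NA-aware Z-score: the mean and sd are computed from the non-missing values only
Z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Toy stand-in for the expression data (made-up values)
toy <- data.frame(geneA = c(1, 2, NA, 4), geneC = c(5, 7, NA, 9))

# Column-wise Z-scores; cells that were NA stay NA, the rest are standardized
z_manual <- apply(toy, 2, Z_score)

# Base R alternative: centers and scales each column, omitting NAs
z_scale <- scale(toy)

# The two results agree once the extra attributes added by scale() are ignored
all.equal(z_manual, z_scale, check.attributes = FALSE)
```

Note that `apply()` returns a matrix rather than modifying the data frame, so the result has to be assigned to keep it. A column that is entirely NA stays missing, and a column with zero variance gives NaN because of the division by zero.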

huangapple
  • Posted on February 18, 2023, 00:48:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75487074.html