Trouble with NA's in large dataframe
Question
I'm having trouble standardizing my data.
First, I create the data frame from my data with my desired row names, and I remove the first column because it is not needed.
EXPGli <- read.delim("C:/Users/i5/Dropbox/Guilherme Vergara/Doutorado/Data/Datasets/MergedEXP3.txt", row.names = 2)
EXPGli <- EXPGli[,-1]
EXPGli <- as.data.frame(EXPGli)
Then I need to convert all the columns to Z-scores (each column = gene expression values; each row = sample). The idea is to replace each gene expression value with its Z-score for each cell.
Z_score <- function(x) {(x-mean(x))/ sd(x)}
apply(EXPGli, 2, Z_score)
This prints output that ends with [ reached 'max' / getOption("max.print") -- omitted 1143 rows ]
and every cell of the result is NA.
There are indeed several NAs in the dataset, including some entire rows and even some entire columns.
I tried several approaches to remove the NAs:
EXPGli <- na.omit(EXPGli)
EXPGli %>% drop_na()
print(EXPGli[rowSums(is.na(EXPGli)) == 0, ])
na.exclude(EXPGli)
Yet apparently none of them works. Additionally, calling is.na(EXPGli)
returns FALSE for all fields.
I would like to understand what I am doing wrong here. It seems the issue might be that the NAs are not being recognized as NA by R, but I couldn't find a solution. Any input is much appreciated, thanks in advance!
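A note on the removal attempts listed above: na.omit() drops every row that contains at least one NA, so if even one column is entirely NA, no rows survive. Because the result is assigned back to EXPGli, that may also be why is.na(EXPGli) reports no TRUE values afterwards. The other calls (drop_na(), na.exclude(), print()) return a new object and do not modify EXPGli unless the result is assigned. The following is a minimal sketch on a made-up data frame (toy, its column names and its values are hypothetical, not taken from the real dataset) that drops only the columns and rows that are entirely missing:

```r
# Toy stand-in for EXPGli (hypothetical values): rows = samples, columns = genes
toy <- data.frame(
  geneA = c(1.0, 2.0, NA, 4.0),
  geneB = c(NA_real_, NA_real_, NA_real_, NA_real_),  # a column that is entirely NA
  geneC = c(5.0, NA, NA, 8.0),
  row.names = c("s1", "s2", "s3", "s4")
)

# na.omit() drops every row with at least one NA; the all-NA column
# means that every row is dropped
n_after_na_omit <- nrow(na.omit(toy))

# Keep only the columns that have at least one non-missing value ...
keep_cols <- colSums(!is.na(toy)) > 0
toy_clean <- toy[, keep_cols, drop = FALSE]

# ... and then only the rows that have at least one non-missing value
keep_rows <- rowSums(!is.na(toy_clean)) > 0
toy_clean <- toy_clean[keep_rows, , drop = FALSE]

dim(toy_clean)
```

Partially missing rows and columns are kept here on purpose, so the remaining NAs still have to be handled when computing the mean and standard deviation.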
Answer 1
Score: 1
You may want to set the argument na.rm = TRUE in your calls to mean(x) and sd(x) inside the Z_score function; otherwise these calls return NA for any vector with NAs in it.
Z_score <- function(x) {(x-mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)}
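To illustrate the difference, here is a small sketch on made-up data (the toy data frame and its values are hypothetical). It compares the original function with the na.rm version. It also shows base R's scale(), which omits missing values when centering and scaling and so should give the same numbers. Note that apply() returns a new matrix, so the result has to be assigned for it to be kept.

```r
# Toy stand-in for the expression table (hypothetical values)
toy <- data.frame(
  geneA = c(1, 2, NA, 3),
  geneB = c(10, 20, 30, 40)
)

# Original version: mean() and sd() return NA as soon as x contains an NA
Z_naive <- function(x) (x - mean(x)) / sd(x)

# Version from this answer: missing values are ignored for mean and sd
Z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

z_naive <- apply(toy, 2, Z_naive)  # geneA becomes all NA because of its single NA
z_fixed <- apply(toy, 2, Z_score)  # only the originally missing entry stays NA

# Base R alternative: scale() centers and scales each column, omitting NAs
z_scale <- scale(toy)

# Assign the result to keep it; the input data frame is not modified in place
z_df <- as.data.frame(z_fixed)
print(z_df)
```

A column that is entirely NA, or one with zero variance, will still come out as NA or NaN with either approach, so such columns may need to be removed first.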