How to sort the rows with identical column values in R

Question

My data looks like this:

    dput(Data)
    structure(c(NA, "FGFR3", "FAT1", "ARID1A", "CREBBP", "HRAS",
    "SF3B1", "RHOB", "FBXW7", "KRAS", "TP53", "PIK3CA", "RHOA", "ASXL2",
    "HLA-A", "APC", "ATM", "ARID2", "PTEN", "CDH1", "RBM10", "ERBB4",
    "ERCC2", "BAP1", "KMT2D", "ERBB2", "SMC1A", "RB1", "BCLAF1",
    NA, NA, NA, NA, NA, NA, NA, "TP53", "RHOA", "FGFR3", "SF3B1",
    "PTEN", "RB1", "FAT1", "KDM6A", "ARID1A", "PIK3CA", "CDKN1A",
    "ERBB4", "RBM10", "ASXL2", "HRAS", "BAP1", "KMT2A", "ERBB3",
    "RHOB", "KRAS", "APC", "KMT2C", "BCLAF1", "KMT2D", "CDKN2A",
    "PSIP1", "FBXW7", "HLA-A", "ERBB2", "ATM", "RXRA", "CREBBP",
    "EP300", "ARID2", "KDM6A", "CDKN1A", "KMT2A", "ERBB3", "KMT2C",
    "CDKN2A", "PSIP1", "RXRA", "EP300", NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA), dim = c(35L, 3L), dimnames = list(NULL, c("F_d",
    "M_d", "unique")))

I want to sort this data so that a value shared by two or three columns appears in the same row in each of those columns.

For example, my output should look like this:

    F_d   M_d     unique
    TP53  TP53    NA
    NA    CDKN1A  CDKN1A

Answer 1

Score: 1

You can solve this by first building a column that holds every value occurring anywhere in the table:

    library(dplyr)

    # The question's object (Data) is a character matrix; convert it to a data frame
    df <- as.data.frame(Data)

    # Every distinct non-NA value found anywhere in the table
    list_tot <- data.frame(x = unlist(df), row.names = NULL) %>%
      distinct() %>%
      filter(!is.na(x))

    # One left_join per column: the value is kept where the column contains it, NA otherwise
    interm1 <- left_join(list_tot, df %>% select(F_d) %>% mutate(x = F_d), by = "x")
    interm2 <- left_join(interm1, df %>% select(M_d) %>% mutate(x = M_d), by = "x")
    df2 <- left_join(interm2, df %>% select(unique) %>% mutate(x = unique), by = "x") %>%
      select(-x)

To explain the code: list_tot holds the unique values taken from every column of your table. Each left_join then checks whether those values occur in one particular column, so you need one left_join per column. Where a column does not contain a value, the join leaves NA in that row.
df2 should look like what you need!
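
The same idea can also be sketched in base R without joins. This is only an illustration under assumptions: toy is a small hypothetical table (not the question's data), and the names toy, vals and aligned are made up for the example. It uses %in% instead of left_join, so it is a swapped-in technique rather than the code above.

```r
# Hypothetical toy table with the same column names as in the question
toy <- data.frame(
  F_d    = c("TP53", "FGFR3", NA),
  M_d    = c("CDKN1A", "TP53", NA),
  unique = c("CDKN1A", NA, NA),
  stringsAsFactors = FALSE
)

# Every distinct non-NA value anywhere in the table, in order of first appearance
vals <- unique(unlist(toy, use.names = FALSE))
vals <- vals[!is.na(vals)]

# For each column, keep a value where the column contains it and put NA elsewhere,
# so that each row of the result is dedicated to exactly one value
aligned <- as.data.frame(
  lapply(toy, function(col) ifelse(vals %in% col, vals, NA_character_)),
  stringsAsFactors = FALSE
)

print(aligned)
```

In this toy case the TP53 row holds TP53 in F_d and M_d and NA in unique, while the CDKN1A row holds NA, CDKN1A, CDKN1A, which matches the layout asked for in the question. Like the join version, this assumes a value appears at most once per column.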

Hope this will help you

huangapple
  • Published on March 8, 2023, 19:23:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75672378.html