英文:
Reorganize a dataframe in by column values R
问题
基于第一列的最后两个字符 'AH',我想要将Res1、Pos1和Chain1设置为与 'A' 相关的部分,因此第一行应该是:
File_pdb Res1 Pos1 Chain1 Res2 Pos2 Chain2
6wps_AH CYS 361 A TRP 105 H
这应该是一个通用的操作,因为有些行可能已经是正确的,而且在同一个数据框中可能有不同的行,有些是正确的,有些不是。谢谢!
以下是数据:
data.table::as.data.table(structure(list(File_pdb = c("6wps_AH", "6wps_AH", "6wps_AH",
"6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH",
"6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH",
"6wps_AH"), Res1 = c("TRP", "GLU", "PHE", "SER", "ILE", "TYR",
"GLY", "PHE", "LEU", "TRP", "PHE", "PRO", "ILE", "TRP", "ALA",
"THR"), Pos1 = c("105", "108", "106", "109", "111", "100", "103",
"106", "110", "105", "106", "28", "111", "105", "104", "30"),
Chain1 = c("H", "H", "H", "H", "H", "H", "H", "H", "H", "H",
"H", "H", "H", "H", "H", "H"), Res2 = c("CYS", "LYS", "LYS",
"THR", "THR", "ASN", "GLU", "GLU", "GLU", "PRO", "PRO", "LEU",
"ASN", "GLU", "GLU", "ASN"), Pos2 = c("361", "356", "356",
"345", "345", "343", "340", "340", "340", "337", "337", "335",
"343", "340", "340", "334"), Chain2 = c("A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"
)), row.names = c(NA, -16L), class = c("data.table", "data.frame"
)))
希望这可以帮助你。如果有其他问题,请告诉我。
英文:
I have this df
File_pdb Res1 Pos1 Chain1 Res2 Pos2 Chain2
6wps_AH TRP 105 H CYS 361 A
6wps_AH GLU 108 H LYS 356 A
6wps_AH PHE 106 H LYS 356 A
6wps_AH SER 109 H THR 345 A
6wps_AH ILE 111 H THR 345 A
6wps_AH TYR 100 H ASN 343 A
Based on the last two character on the first column 'AH', I-'d like to have as Res1, Pos1 and Chain1 what is referring to 'A' so CYS 361 A. So, the first line would be
File_pdb Res1 Pos1 Chain1 Res2 Pos2 Chain2
6wps_AH CYS 361 A TRP 105 H
It should be something general since it can happen that some rows are already right and also, we can have different rows in the same df, some right and others not. Thanks in advance!
here the data:
data.table::as.data.table(structure(list(File_pdb = c("6wps_AH", "6wps_AH", "6wps_AH",
"6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH",
"6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH", "6wps_AH",
"6wps_AH"), Res1 = c("TRP", "GLU", "PHE", "SER", "ILE", "TYR",
"GLY", "PHE", "LEU", "TRP", "PHE", "PRO", "ILE", "TRP", "ALA",
"THR"), Pos1 = c("105", "108", "106", "109", "111", "100", "103",
"106", "110", "105", "106", "28", "111", "105", "104", "30"),
Chain1 = c("H", "H", "H", "H", "H", "H", "H", "H", "H", "H",
"H", "H", "H", "H", "H", "H"), Res2 = c("CYS", "LYS", "LYS",
"THR", "THR", "ASN", "GLU", "GLU", "GLU", "PRO", "PRO", "LEU",
"ASN", "GLU", "GLU", "ASN"), Pos2 = c("361", "356", "356",
"345", "345", "343", "340", "340", "340", "337", "337", "335",
"343", "340", "340", "334"), Chain2 = c("A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"
)), row.names = c(NA, -16L), class = c("data.table", "data.frame"
)))
答案1
得分: 2
这是一个可行的方法(使用data.table
),可以完成您的需求。最后的列顺序未保留,但可以使用setcolorder
或类似方法手动处理。
我在这里使用magrittr::%>%
进行管道操作;这并非必需,可以使用data.table
的管道操作([...][...][...]
)或使用临时对象来替代。
library(data.table)
quux %>%
.[, i := .I] %>%
melt(id.vars = c("File_pdb", "i"), measure.vars = patterns(Res="^Res", Pos="^Pos", Chain="^Chain")) %>%
.[, last2 := strsplit(substring(File_pdb, nchar(File_pdb) - 1, nchar(File_pdb)), "")] %>%
.[, num := mapply(function(nm, l2) match(nm, l2), Chain, last2) ] %>%
.[, c("variable", "last2") := NULL ] %>%
dcast(File_pdb + i ~ num, value.var = c("Res", "Pos", "Chain"), sep = "") %>%
.[, i := NULL] %>%
.[]
# File_pdb Res1 Res2 Pos1 Pos2 Chain1 Chain2
# <char> <char> <char> <char> <char> <char> <char>
# 1: 6wps_AH CYS TRP 361 105 A H
# 2: 6wps_AH LYS GLU 356 108 A H
# 3: 6wps_AH LYS PHE 356 106 A H
# 4: 6wps_AH THR SER 345 109 A H
# 5: 6wps_AH THR ILE 345 111 A H
# 6: 6wps_AH ASN TYR 343 100 A H
# 7: 6wps_AH GLU GLY 340 103 A H
# 8: 6wps_AH GLU PHE 340 106 A H
# 9: 6wps_AH GLU LEU 340 110 A H
# 10: 6wps_AH PRO TRP 337 105 A H
# 11: 6wps_AH PRO PHE 337 106 A H
# 12: 6wps_AH LEU PRO 335 28 A H
# 13: 6wps_AH ASN ILE 343 111 A H
# 14: 6wps_AH GLU TRP 340 105 A H
# 15: 6wps_AH GLU ALA 340 104 A H
# 16: 6wps_AH ASN THR 334 30 A H
顺便说一句:您可能更喜欢保持数据的长格式(使用melt
),而不是将其重新塑造为宽格式。许多分析过程更喜欢长格式(包括ggplot2
用于绘图),尽管如果您的工具集已经适应宽格式,这可能会带来一些挑战。
英文:
Here's a working approach (using data.table
) that does what you need. The final column order is not preserved, but that can be handled manually using setcolorder
or similar.
I'm using magrittr::%>%
for piping here; this is not required and can be removed in favor of data.table
-piping ([...][...][...]
) for much of it or the use of temporary objects.
library(data.table)
quux %>%
.[, i := .I] %>%
melt(id.vars = c("File_pdb", "i"), measure.vars = patterns(Res="^Res", Pos="^Pos", Chain="^Chain")) %>%
.[, last2 := strsplit(substring(File_pdb, nchar(File_pdb) - 1, nchar(File_pdb)), "")] %>%
.[, num := mapply(function(nm, l2) match(nm, l2), Chain, last2) ] %>%
.[, c("variable", "last2") := NULL ] %>%
dcast(File_pdb + i ~ num, value.var = c("Res", "Pos", "Chain"), sep = "") %>%
.[, i := NULL] %>%
.[]
# File_pdb Res1 Res2 Pos1 Pos2 Chain1 Chain2
# <char> <char> <char> <char> <char> <char> <char>
# 1: 6wps_AH CYS TRP 361 105 A H
# 2: 6wps_AH LYS GLU 356 108 A H
# 3: 6wps_AH LYS PHE 356 106 A H
# 4: 6wps_AH THR SER 345 109 A H
# 5: 6wps_AH THR ILE 345 111 A H
# 6: 6wps_AH ASN TYR 343 100 A H
# 7: 6wps_AH GLU GLY 340 103 A H
# 8: 6wps_AH GLU PHE 340 106 A H
# 9: 6wps_AH GLU LEU 340 110 A H
# 10: 6wps_AH PRO TRP 337 105 A H
# 11: 6wps_AH PRO PHE 337 106 A H
# 12: 6wps_AH LEU PRO 335 28 A H
# 13: 6wps_AH ASN ILE 343 111 A H
# 14: 6wps_AH GLU TRP 340 105 A H
# 15: 6wps_AH GLU ALA 340 104 A H
# 16: 6wps_AH ASN THR 334 30 A H
Side note: you might prefer to keep your data in the long (melt
ed) format instead of reshaping back to the wide format. Many analytic processes prefer long (including ggplot2
for plotting), though it does come with some challenges if your toolsets are already set for wide format.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论