Display the distribution of two groups on the same plot, using two data frames.

Question

<details>
<summary>English:</summary>

I have a data frame, `scores`, with 2000 rows and 25 columns. The columns are features and the rows are samples. This is the data I will use to plot the distributions.

In another data frame, `metadata`, I have clinical information about each sample in the `scores` data frame, such as gender, age, type of disease, treatment, and most importantly outcome of treatment. This data frame serves as the labels: it gives the label for each sample.

The two data frames contain exactly the same samples.

Three columns each describe a different kind of response for each sample, and those columns are binary (yes or no).

My goal is to make a distribution plot comparing the samples in the yes group with those in the no group, for each of those 3 columns.

Here is an example. Say this is `scores`:

                  Feature_1    Feature_2    Feature_3
    Patient_1     0.56         0.11         0.03
    Ptient_2      0.605        0.34         0.49
    P_3           0.1          0.76         0.42
    12312AX       0.9          0.382        0.12
    P_10          0.89         0.30         0.119
    12312BX       0.232        0.118        0.80
    12312CX       0.679        0.31         0.789


And this is `metadata`:

                  Gender      Age         Outcome1    Outcome2    Outcome3
    Patient_1     M           54          1           0           0
    Ptient_2      M           28          0           0           1
    P_3           F           32          1           1           0
    12312AX       F           87          0           0           1
    P_10          F           43          0           0           1
    12312BX       M           90          1           1           0
    12312CX       F           65          1           0           0

Now, for example, I want to plot `Feature_1` for the samples with `Outcome1 = 1` against the samples with `Outcome1 = 0`, and put both on the same plot to see the difference. The plot would look like this:

[![enter image description here][1]][1]


  [1]: https://i.stack.imgur.com/lpflL.png

It doesn't matter if it's not filled with color.

This is some subset of the data. Starting with `scores`:

    structure(list(`Feature_1` = c(0.58126387599574, 0.554773857342486, 
    0.73811669435931, 0.5993561705421, 0.549993884896126, 0.560952809292699, 
    0.514920708901865, 0.668611976328753, 0.579311040856707, 0.627079649056927, 
    0.549778821698995, 0.563433551362653, 0.566883741540508, 0.586839499814986, 
    0.527874599585146, 0.533974585406425, 0.583020804822263, 0.607821542253184, 
    0.570922624085177, 0.531065608748296), `Feature_2` = c(0.671868971517913, 
    0.657649690364772, 0.681277871841209, 0.633247301225077, 0.658829966989863, 
    0.649553434195565, 0.654719152272398, 0.678510931368968, 0.67606269281911, 
    0.657861486037168, 0.656157657102225, 0.654684442044789, 0.660668253143108, 
    0.680000904001928, 0.676215636114716, 0.68015840395165, 0.656533748483226, 
    0.654344382579621, 0.626207872177309, 0.640129803823085), `Feature10` = c(0.607691853076, 
    0.507746766229958, 0.642056075026442, 0.647793952813017, 0.571844979370279, 
    0.592183904204232, 0.473827520445559, 0.618900091543045, 0.60656936545554, 
    0.60603612041945, 0.510241627095173, 0.564418205496303, 0.561084611266194, 
    0.558495659089567, 0.503235910349171, 0.492768739941572, 0.551283907128425, 
    0.664425637003928, 0.541804175576185, 0.537845283573044)), row.names = c("Pt1", 
    "Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2", 
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3", "Pt30", "Pt31", 
    "Pt34", "Pt36", "Pt37"), class = "data.frame")

And the `metadata`:

    structure(list(Response = c("No", "No", "Yes", 
    "No", "Yes", "No", "No", "Yes", 
    "No", "Yes", "No", "No", "Yes", 
    "No", "Yes", "Yes", "No", "Yes", 
    "No", "No"), Gender = c("F", "M", 
    "F", "M", "M", "F", "M", 
    "M", "F", "M", "M", "M", 
    "M", "F", "F", "M", "F", 
    "F", "M", "F"), Response2 = c(1, 0, 0, 
    1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0)), row.names = c("Pt1", 
    "Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2", 
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3", "Pt30", "Pt31", 
    "Pt34", "Pt36", "Pt37"), class = "data.frame")



</details>


# Answer 1
**Score**: 0

You can do this with the ggplot2 package. First merge the two data frames by row names, then plot the merged data with ggplot:

```R
library(ggplot2)

# Merge by row names (by = 0); the sample IDs end up in a `Row.names` column
df <- merge(scores, metadata, by = 0, all = TRUE)

# Overlaid density curves of Feature_1, one per Response group
ggplot(data = df, aes(x = Feature_1, fill = Response)) +
  geom_density(alpha = .3)
```

If your categorical variable is numeric (0 or 1 instead of "Yes" or "No"), you can convert it to a factor:

```R
ggplot(data = df, aes(x = Feature_1, fill = factor(Response2))) +
  geom_density(alpha = .3)
```
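The question asks for the same comparison across three outcome columns. One way this might be done is to stack the outcome columns into long format after the merge and let `facet_wrap()` draw one panel per outcome. The sketch below is only an illustration: the data are made up, the column names `Outcome1` to `Outcome3` are taken from the question's small example, and `long`, `outcome_cols`, and `group` are hypothetical names chosen here.

```r
library(ggplot2)

# Made-up data in the same shape as the question's `scores` and `metadata`
# (illustrative values only, not the real data)
set.seed(1)
ids <- paste0("Pt", 1:20)
scores <- data.frame(Feature_1 = runif(20), Feature_2 = runif(20),
                     row.names = ids)
metadata <- data.frame(
  Outcome1 = rep(c(0, 1), times = 10),
  Outcome2 = rep(c(0, 1), each = 10),
  Outcome3 = rep(c(1, 0), times = 10),
  row.names = ids
)

# Check that both data frames describe the same samples before merging
stopifnot(setequal(rownames(scores), rownames(metadata)))

# Merge by row names, as in the answer above
df <- merge(scores, metadata, by = 0)

# Stack the outcome columns: one row per sample per outcome column
outcome_cols <- c("Outcome1", "Outcome2", "Outcome3")
long <- do.call(rbind, lapply(outcome_cols, function(col) {
  data.frame(
    sample    = as.character(df$Row.names),
    Feature_1 = df$Feature_1,
    outcome   = col,
    group     = factor(df[[col]])
  )
}))

# One panel per outcome column, two overlaid densities per panel
p <- ggplot(long, aes(x = Feature_1, fill = group)) +
  geom_density(alpha = 0.3) +
  facet_wrap(~ outcome)
```

Calling `print(p)` draws the figure. To look at a different feature, swap `Feature_1` for another column of `scores` in both the `data.frame()` call and the `aes()` mapping.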

huangapple
  • Published on February 23, 2023, 19:43:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75544363.html