Display the distribution of two groups on the same plot, using two data frames.





I have a data frame, `scores`, with 2000 rows and 25 columns. The columns are features and the rows are samples. This is the data I want to use to plot the distributions.

In another data frame, `metadata`, I have clinical information about each sample in `scores`, such as gender, age, type of disease, treatment, and, most importantly, outcome of treatment. This data frame serves as the labels: it gives the label for each sample.

The two data frames contain exactly the same samples.

Three of the `metadata` columns each describe a different kind of response for each sample, and those columns are binary (yes or no).

My goal is to make a distribution plot comparing the samples in the yes group with those in the no group, for each of those 3 columns.

Here is an example. Say this is `scores`:

                 Feature_1    Feature_2    Feature_3
    Patient_1    0.56         0.11         0.03
    Ptient_2     0.605        0.34         0.49
    P_3          0.1          0.76         0.42
    12312AX      0.9          0.382        0.12
    P_10         0.89         0.30         0.119
    12312BX      0.232        0.118        0.80
    12312CX      0.679        0.31         0.789

And this is `metadata`:

                      Gender        Age        Outcome1       Outcome2        Outcome3
    Patient_1           M           54            1              0                0
    Ptient_2            M           28            0              0                1
    P_3                 F           32            1              1                0
    12312AX             F           87            0              0                1
    P_10                F           43            0              0                1
    12312BX             M           90            1              1                0             
    12312CX             F           65            1              0                0

Now, for example, I want to plot `Feature_1` for the samples with `Outcome1 = 1` vs. the samples with `Outcome1 = 0`, and put both on the same plot to see the difference. The plot would look like this:

[![enter image description here][1]][1]

  [1]: https://i.stack.imgur.com/lpflL.png

It doesn't matter if it's not filled with color.

This is some subset of the data. Starting with `scores`:

    structure(list(`Feature_1` = c(0.58126387599574, 0.554773857342486, 
    0.73811669435931, 0.5993561705421, 0.549993884896126, 0.560952809292699, 
    0.514920708901865, 0.668611976328753, 0.579311040856707, 0.627079649056927, 
    0.549778821698995, 0.563433551362653, 0.566883741540508, 0.586839499814986, 
    0.527874599585146, 0.533974585406425, 0.583020804822263, 0.607821542253184, 
    0.570922624085177, 0.531065608748296), `Feature_2` = c(0.671868971517913, 
    0.657649690364772, 0.681277871841209, 0.633247301225077, 0.658829966989863, 
    0.649553434195565, 0.654719152272398, 0.678510931368968, 0.67606269281911, 
    0.657861486037168, 0.656157657102225, 0.654684442044789, 0.660668253143108, 
    0.680000904001928, 0.676215636114716, 0.68015840395165, 0.656533748483226, 
    0.654344382579621, 0.626207872177309, 0.640129803823085), `Feature10` = c(0.607691853076, 
    0.507746766229958, 0.642056075026442, 0.647793952813017, 0.571844979370279, 
    0.592183904204232, 0.473827520445559, 0.618900091543045, 0.60656936545554, 
    0.60603612041945, 0.510241627095173, 0.564418205496303, 0.561084611266194, 
    0.558495659089567, 0.503235910349171, 0.492768739941572, 0.551283907128425, 
    0.664425637003928, 0.541804175576185, 0.537845283573044)), row.names = c("Pt1",
    "Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2",
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3", "Pt30", "Pt31",
    "Pt34", "Pt36", "Pt37"), class = "data.frame")

And the `metadata`:

    structure(list(Response = c("No", "No", "Yes",
    "No", "Yes", "No", "No", "Yes",
    "No", "Yes", "No", "No", "Yes",
    "No", "Yes", "Yes", "No", "Yes",
    "No", "No"), Gender = c("F", "M",
    "F", "M", "M", "F", "M",
    "M", "F", "M", "M", "M",
    "M", "F", "F", "M", "F",
    "F", "M", "F"), Response2 = c(1, 0, 0,
    1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0)), row.names = c("Pt1",
    "Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2",
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3", "Pt30", "Pt31",
    "Pt34", "Pt36", "Pt37"), class = "data.frame")


# Answer 1
**Score**: 0




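You can do this with the ggplot2 package. First merge the two data frames by row names, then plot the result with `ggplot`, using the response column as the fill:

    library(ggplot2)

    # Merge by row names (by = 0); the row names end up in a `Row.names` column
    df <- merge(scores, metadata, by = 0, all = TRUE)
    # Plot one density curve per Response group
    ggplot(data = df, aes(x = Feature_1, fill = Response)) + geom_density(alpha = .3)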
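To see what the merge step produces before plotting, here is a minimal base R sketch. It assumes the question's data frames are named `scores` and `metadata`, and uses only the first four samples of the posted subset. It shows that `by = 0` joins on row names and returns the key as a `Row.names` column, which you can move back into the row names if you prefer.

```r
# Toy subset of the question's data (first four samples only).
scores <- data.frame(
  Feature_1 = c(0.58126387599574, 0.554773857342486,
                0.73811669435931, 0.5993561705421),
  row.names = c("Pt1", "Pt10", "Pt101", "Pt103")
)
metadata <- data.frame(
  Response  = c("No", "No", "Yes", "No"),
  row.names = c("Pt1", "Pt10", "Pt101", "Pt103")
)

# by = 0 joins on row names; the key comes back as a column named "Row.names".
df <- merge(scores, metadata, by = 0, all = TRUE)
print(names(df))

# Optional: restore the sample IDs as row names and drop the helper column.
rownames(df) <- df$Row.names
df$Row.names <- NULL

# With all = TRUE, samples present in only one data frame are kept and
# padded with NA, so this check confirms every sample has a score and a label.
stopifnot(!anyNA(df))
```

If the check fails, the two data frames do not share exactly the same row names, and the unmatched samples would otherwise show up as an extra `NA` group in the plot.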


If your categorical variable is numeric (0 or 1 instead of "Yes" or "No"), convert it to a factor:

    ggplot(data = df, aes(x = Feature_1, fill = factor(Response2))) + geom_density(alpha = .3)
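Since the question asks for the same plot for each of several response columns, one possible way to avoid repeating the call is to wrap it in a small function. This is only a sketch: `plot_by_outcome` is a hypothetical helper name, the data frame holds the first eight samples of the posted subset as if already merged, and the column is looked up by name through the `.data` pronoun inside `aes()`.

```r
library(ggplot2)

# Merged toy data: first eight samples from the question's subset.
df <- data.frame(
  Feature_1 = c(0.58126387599574, 0.554773857342486, 0.73811669435931,
                0.5993561705421, 0.549993884896126, 0.560952809292699,
                0.514920708901865, 0.668611976328753),
  Response  = c("No", "No", "Yes", "No", "Yes", "No", "No", "Yes"),
  Response2 = c(1, 0, 0, 1, 1, 1, 1, 0)
)

# Hypothetical helper: density of `feature`, one curve per level of `outcome`.
# factor() makes numeric 0/1 columns behave like discrete groups.
plot_by_outcome <- function(data, feature, outcome) {
  ggplot(data, aes(x = .data[[feature]], fill = factor(.data[[outcome]]))) +
    geom_density(alpha = 0.3) +
    labs(x = feature, fill = outcome)
}

# Build one plot per response column and keep them in a named list.
outcome_cols <- c("Response", "Response2")
plots <- lapply(outcome_cols, function(col) plot_by_outcome(df, "Feature_1", col))
names(plots) <- outcome_cols

# print(plots$Response)   # draw the plot for one response column
```

With the full data you would list all three outcome columns in `outcome_cols`; the same helper also works for any other feature column.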

  • Published on February 23, 2023 at 19:43:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75544363.html



