February 23, 2023, 19:43:48

Display the distribution of two groups on the same plot, using two data frames

Question

<details>
<summary>English:</summary>
I have a data frame, `scores`, with 2000 rows and 25 columns. The columns are features and the rows are samples. This is the data I want to use to plot the distributions.

In another data frame, `metadata`, I have clinical information about each sample in `scores`, such as gender, age, type of disease, treatment, and most importantly the outcome of treatment. This data frame serves as the labels: it gives the label for each sample.

The two data frames contain exactly the same samples.

Three columns in `metadata` each describe a different kind of response for each sample. Those columns are binary (yes or no).

My goal is to make a distribution plot comparing the samples in the yes group with those in the no group, for each of those 3 columns.

Here is an example. Say this is `scores`:

                  Feature_1    Feature_2    Feature_3
    Patient_1     0.56         0.11         0.03
    Ptient_2      0.605        0.34         0.49
    P_3           0.1          0.76         0.42
    12312AX       0.9          0.382        0.12
    P_10          0.89         0.30         0.119
    12312BX       0.232        0.118        0.80
    12312CX       0.679        0.31         0.789

And this is `metadata`:

                  Gender    Age    Outcome1    Outcome2    Outcome3
    Patient_1     M         54     1           0           0
    Ptient_2      M         28     0           0           1
    P_3           F         32     1           1           0
    12312AX       F         87     0           0           1
    P_10          F         43     0           0           1
    12312BX       M         90     1           1           0
    12312CX       F         65     1           0           0

Now, for example, I want to plot `Feature_1` for the samples with `Outcome1 = 1` vs. the samples with `Outcome1 = 0`, and put them on the same plot to see the difference. A plot that would look like this:

[![enter image description here][1]][1]

  [1]: https://i.stack.imgur.com/lpflL.png

It doesn't matter if it's not filled with color.

This is a subset of the data. Starting with `scores`:

    structure(list(`Feature_1` = c(0.58126387599574, 0.554773857342486, 
    0.73811669435931, 0.5993561705421, 0.549993884896126, 0.560952809292699, 
    0.514920708901865, 0.668611976328753, 0.579311040856707, 0.627079649056927, 
    0.549778821698995, 0.563433551362653, 0.566883741540508, 0.586839499814986, 
    0.527874599585146, 0.533974585406425, 0.583020804822263, 0.607821542253184, 
    0.570922624085177, 0.531065608748296), `Feature_2` = c(0.671868971517913, 
    0.657649690364772, 0.681277871841209, 0.633247301225077, 0.658829966989863, 
    0.649553434195565, 0.654719152272398, 0.678510931368968, 0.67606269281911, 
    0.657861486037168, 0.656157657102225, 0.654684442044789, 0.660668253143108, 
    0.680000904001928, 0.676215636114716, 0.68015840395165, 0.656533748483226, 
    0.654344382579621, 0.626207872177309, 0.640129803823085), `Feature10` = c(0.607691853076, 
    0.507746766229958, 0.642056075026442, 0.647793952813017, 0.571844979370279, 
    0.592183904204232, 0.473827520445559, 0.618900091543045, 0.60656936545554, 
    0.60603612041945, 0.510241627095173, 0.564418205496303, 0.561084611266194, 
    0.558495659089567, 0.503235910349171, 0.492768739941572, 0.551283907128425, 
    0.664425637003928, 0.541804175576185, 0.537845283573044)), row.names = c("Pt1", 
    "Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2", 
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3", "Pt30", "Pt31", 
    "Pt34", "Pt36", "Pt37"), class = "data.frame")

And the `metadata`:

    structure(list(Response = c("No", "No", "Yes", 
    "No", "Yes", "No", "No", "Yes", 
    "No", "Yes", "No", "No", "Yes", 
    "No", "Yes", "Yes", "No", "Yes", 
    "No", "No"), Gender = c("F", "M", 
    "F", "M", "M", "F", "M", 
    "M", "F", "M", "M", "M", 
    "M", "F", "F", "M", "F", 
    "F", "M", "F"), Response2 = c(1, 0, 0, 
    1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0)), row.names = c("Pt1", 
    "Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt2", 
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt3", "Pt30", "Pt31", 
    "Pt34", "Pt36", "Pt37"), class = "data.frame")
</details>
# Answer 1
**Score**: 0
You can do this with the ggplot2 package. First merge the two data frames by row names, then plot the result with ggplot:

```R
library(ggplot2)

# Merge by row names (by = 0); the row names end up in a "Row.names" column
df <- merge(scores, metadata, by = 0, all = TRUE)

# Overlay one density curve per response group
ggplot(data = df, aes(x = Feature_1, fill = Response)) + geom_density(alpha = .3)
```

If your categorical variable is numeric (0 or 1 instead of "Yes" or "No"), convert it to a factor:

```R
ggplot(data = df, aes(x = Feature_1, fill = factor(Response2))) + geom_density(alpha = .3)
```
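
The question asks for the same comparison in each of the three outcome columns. One possible way to get that in a single figure is to stack the outcome columns into long format and facet by outcome. This is only a sketch under assumptions: the data below are randomly generated toy values, and the column names `Outcome1` to `Outcome3` are taken from the question's example table (the posted `dput` sample uses `Response` and `Response2` instead), so adjust `outcome_cols` to match the real column names.

```R
library(ggplot2)

# Toy data for illustration only (assumed shapes and names, not the asker's data)
set.seed(1)
ids <- paste0("Pt", 1:40)
scores <- data.frame(Feature_1 = runif(40), Feature_2 = runif(40), row.names = ids)
metadata <- data.frame(
  Outcome1 = sample(rep(0:1, each = 20)),
  Outcome2 = sample(rep(0:1, each = 20)),
  Outcome3 = sample(rep(0:1, each = 20)),
  row.names = ids
)

# Both data frames should describe the same samples before merging
stopifnot(setequal(rownames(scores), rownames(metadata)))

# Merge on row names; merge() stores them in a column called "Row.names"
df <- merge(scores, metadata, by = 0)

# Stack the outcome columns: one row per sample and outcome
outcome_cols <- c("Outcome1", "Outcome2", "Outcome3")
long <- do.call(rbind, lapply(outcome_cols, function(col) {
  data.frame(
    sample    = as.character(df$Row.names),
    Feature_1 = df$Feature_1,
    outcome   = col,
    response  = factor(df[[col]], levels = c(0, 1), labels = c("No", "Yes"))
  )
}))

# One panel per outcome column, two overlaid densities per panel
p <- ggplot(long, aes(x = Feature_1, fill = response)) +
  geom_density(alpha = 0.3) +
  facet_wrap(~ outcome)
print(p)
```

To look at a different feature, swap `Feature_1` for another column of `scores` in both the `data.frame()` call and `aes()`.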
