March 23, 2023, 09:54:47

Violin plot with categorization using two different columns of data for "one violin"

Question

I'm trying to visualize the distributions of the data stored in a dataframe.
I have 1000 rows, each with the following columns:

sample_id | chi_2_n_est | chi_2_n_theo 
---------------------------------------
1         | 1.01        | 1.001 
1         | 1.03        | 1.012
... 
2         | 1.11        | 1.04
3         | 1.21        | 1.03
...

I want to display violin plots for the data stored in the columns chi_2_n_est and chi_2_n_theo, but split, so that the two distributions can be compared for each sample_id in the dataframe.

Something similar to:

Where blue will be the distribution for chi_2_n_est, and orange for chi_2_n_theo for each sample_id...

Answer 1

Score: 0

I don't have your data, so I created a random sample that will hopefully mimic yours. I also misspoke earlier: this is the opposite of a pivot. Your data is already pivoted, so it needs to be melted.

import pandas as pd
import numpy as np
import seaborn as sns
# create dummy data
data = {
    'product_id': np.random.choice(2, 22, replace=True) + 1,
    'chi_2_ne': np.random.uniform(0.1, 1.9, 22),
    'chi_2_theo': np.random.uniform(0.1, 1.9, 22)
}
# load into a dataframe
df = pd.DataFrame.from_dict(data)
# use melt to blend columns into rows (opposite of pivot, actually)
pdf = df.melt(id_vars=['product_id'], value_vars=['chi_2_ne', 'chi_2_theo'], var_name='measure', value_name='value')
# use seaborn to create a violin plot where split=True
sns.violinplot(data=pdf, x="product_id", y="value", hue="measure", split=True)

This creates a split violin plot: one violin per product_id, with one half for each measure.

Hopefully this is what you are looking for, and similar enough to your raw data that it's useful. See the documentation for pd.melt and sns.violinplot if you need more detail.
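To make the reshaping step concrete, here is a minimal sketch that applies the same melt call to the column names from the question (sample_id, chi_2_n_est, chi_2_n_theo) instead of the dummy names used above. The four rows are the ones visible in the question's table; the names wide_df and long_df are just illustrative.

```python
import pandas as pd

# The four rows visible in the question's table (the real frame has 1000 rows).
wide_df = pd.DataFrame({
    "sample_id": [1, 1, 2, 3],
    "chi_2_n_est": [1.01, 1.03, 1.11, 1.21],
    "chi_2_n_theo": [1.001, 1.012, 1.04, 1.03],
})

# Melt the two measurement columns into a single "value" column.
# The new "measure" column records which original column each value came from.
long_df = wide_df.melt(
    id_vars=["sample_id"],
    value_vars=["chi_2_n_est", "chi_2_n_theo"],
    var_name="measure",
    value_name="value",
)

print(long_df.shape)             # (8, 3): each input row yields two output rows
print(list(long_df.columns))     # ['sample_id', 'measure', 'value']
```

The resulting long_df can then be passed to the same call as in the answer, with x="sample_id", y="value", hue="measure" and split=True. Because hue has exactly two levels, each violin should show chi_2_n_est on one side and chi_2_n_theo on the other.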
