June 15, 2023 00:12:28

How to annotate scatter points plotted with the prince library

Question

I am using the prince library to perform Correspondence Analysis:

from prince import CA

My contingency table dummy_contingency looks like this:

{'v1': {'0': 4.479591836734694,
  '1': 75.08163265306122,
  '2': 1.1020408163265305,
  '3': 5.285714285714286,
  '4': 14.244897959183673,
  '5': 0.0,
  '6': 94.06122448979592,
  '7': 0.5102040816326531,
  '8': 87.62244897959184,
  '9': 16.102040816326532},
 'v2': {'0': 6.142857142857143,
  '1': 24.653061224489797,
  '2': 0.3979591836734694,
  '3': 2.63265306122449,
  '4': 18.714285714285715,
  '5': 0.0,
  '6': 60.92857142857143,
  '7': 1.030612244897959,
  '8': 71.73469387755102,
  '9': 14.76530612244898},
 'v3': {'0': 3.642857142857143,
  '1': 21.551020408163264,
  '2': 0.8061224489795918,
  '3': 2.979591836734694,
  '4': 14.5,
  '5': 0.030612244897959183,
  '6': 39.60204081632653,
  '7': 0.7551020408163265,
  '8': 71.89795918367346,
  '9': 11.571428571428571},
 'v4': {'0': 6.1020408163265305,
  '1': 25.632653061224488,
  '2': 0.6938775510204082,
  '3': 3.9285714285714284,
  '4': 21.581632653061224,
  '5': 0.22448979591836735,
  '6': 10.704081632653061,
  '7': 0.8469387755102041,
  '8': 71.21428571428571,
  '9': 12.489795918367347}}

Chi Square Test reveals dependence:

Chi-square statistic: 69.6630377155341
p-value: 1.2528156966101567e-05

Now I fit the data:

import pandas as pd

dummy_contingency = pd.DataFrame(dummy_contingency)
ca_dummy = CA(n_components=2)  # Number of components for correspondence analysis
ca_dummy.fit(dummy_contingency)

And the plot:

fig = ca_dummy.plot(
    X=dummy_contingency)
fig

How do I add labels to this plot? The examples posted by others (https://stackoverflow.com/questions/48521740/using-mca-package-in-python) use the function plot_coordinates(), which has an option for adding labels. But it looks like this function is no longer available in the prince package, and I need to use the plot() function, which does not have an option for labels. I would appreciate any help with this.

Edit:
Example of an output with labels:

The text next to each point in the plot, such as "strawberries", "banana", "yogurt", etc., is the kind of label I am looking for. In this case the labels would be the index values 0,1,2,3,4,5,6,7,8,9 for the blue points and the column names "v1", "v2", "v3", "v4" for the orange points.

Answer 1

Score: 3

The approach for adding annotations to the scatter plot comes from "How to do annotations with Altair"; however, that answer does not cover the steps needed to plot the points from ca.
To annotate the correspondence-analysis plot, .column_coordinates and .row_coordinates must be extracted from the fitted ca model. These are the points shown on the plot, not the raw values in df.

import pandas as pd
import prince
import altair as alt
# convert the dictionary of data to a dataframe
df = pd.DataFrame(dummy_contingency)
# create the model
ca = prince.CA()
# fit the model
ca = ca.fit(df)
# extract the column coordinate dataframe, and change the column names
cc = ca.column_coordinates(df).reset_index()
cc.columns = ['name', 'x', 'y']
# extract the row coordinates dataframe, and change the column names
rc = ca.row_coordinates(df).reset_index()
rc.columns = ['name', 'x', 'y']
# combine the dataframes
crc_df = pd.concat([cc, rc], ignore_index=True)
# plot and annotate
points = ca.plot(df)
annot = alt.Chart(crc_df).mark_text(
    align='left',
    baseline='middle',
    fontSize=20,
    dx=7
).encode(
    x='x',
    y='y',
    text='name'
)
points + annot

Note that the plot already has floating annotations, even without adding annot.

The annotations can also be added without combining cc and rc into a single dataframe.

points = ca.plot(df)
annot1 = alt.Chart(cc).mark_text(
    align='left',
    baseline='middle',
    fontSize=20,
    dx=7
).encode(
    x='x',
    y='y',
    text='name'
)
annot2 = alt.Chart(rc).mark_text(
    align='left',
    baseline='middle',
    fontSize=20,
    dx=7
).encode(
    x='x',
    y='y',
    text='name'
)
points + annot1 + annot2
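
Renaming the columns by position, as done above, only works when the coordinate dataframes have exactly two components. The following is a minimal sketch of a more defensive way to build the label dataframe. It assumes the coordinate frames have one row per point and one column per component. The helper name tidy_coordinates, the extra kind column, and the made-up coordinate values are hypothetical and not part of prince or of the answer above.

```python
import pandas as pd


def tidy_coordinates(coords: pd.DataFrame, kind: str) -> pd.DataFrame:
    """Build a name/x/y/kind frame from a coordinates frame.

    Assumption: `coords` has one row per point and one column per
    component, like the frames extracted from the fitted model above.
    Only the first two components are kept.
    """
    if coords.shape[1] < 2:
        raise ValueError("need at least two components to label a 2-D plot")
    return pd.DataFrame(
        {
            "name": [str(label) for label in coords.index],
            "x": coords.iloc[:, 0].to_numpy(),
            "y": coords.iloc[:, 1].to_numpy(),
            "kind": kind,  # "row" or "column", handy for styling labels
        }
    )


# Made-up coordinates standing in for the model output (hypothetical values).
fake_rows = pd.DataFrame([[0.1, -0.2, 0.05], [0.3, 0.4, -0.01]], index=["0", "1"])
fake_cols = pd.DataFrame([[-0.5, 0.2, 0.0]], index=["v1"])

labels = pd.concat(
    [tidy_coordinates(fake_rows, "row"), tidy_coordinates(fake_cols, "column")],
    ignore_index=True,
)
print(labels)
```

The resulting labels frame has the same name, x and y columns that the mark_text charts above read from, so it could be passed to alt.Chart in place of crc_df.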
