如何为使用prince库绘制的散点图点添加注释

huangapple go评论68阅读模式
英文:

How to annotate scatter points plotted with the prince library

问题

我正在使用prince库来执行对应分析(Correspondence Analysis)。

from prince import CA

我的列联表(contingency table)dummy_contingency看起来像这样:

{
 'v1': {'0': 4.479591836734694, '1': 75.08163265306122, '2': 1.1020408163265305, '3': 5.285714285714286, '4': 14.244897959183673, '5': 0.0, '6': 94.06122448979592, '7': 0.5102040816326531, '8': 87.62244897959184, '9': 16.102040816326532},
 'v2': {'0': 6.142857142857143, '1': 24.653061224489797, '2': 0.3979591836734694, '3': 2.63265306122449, '4': 18.714285714285715, '5': 0.0, '6': 60.92857142857143, '7': 1.030612244897959, '8': 71.73469387755102, '9': 14.76530612244898},
 'v3': {'0': 3.642857142857143, '1': 21.551020408163264, '2': 0.8061224489795918, '3': 2.979591836734694, '4': 14.5, '5': 0.030612244897959183, '6': 39.60204081632653, '7': 0.7551020408163265, '8': 71.89795918367346, '9': 11.571428571428571},
 'v4': {'0': 6.1020408163265305, '1': 25.632653061224488, '2': 0.6938775510204082, '3': 3.9285714285714284, '4': 21.581632653061224, '5': 0.22448979591836735, '6': 10.704081632653061, '7': 0.8469387755102041, '8': 71.21428571428571, '9': 12.489795918367347}
}

卡方检验表明存在相关性:

卡方统计量:69.6630377155341
p值:1.2528156966101567e-05

现在,我拟合了数据:

dummy_contingency = pd.DataFrame(dummy_contingency)

ca_dummy = CA(n_components=2)  # 对应分析的成分数量
ca_dummy.fit(dummy_contingency)

以及绘图:

fig = ca_dummy.plot(
    X=dummy_contingency)
fig

如何为此图添加标签?其他人发布的示例(https://stackoverflow.com/questions/48521740/using-mca-package-in-python)使用了plot_coordinates()函数,该函数具有放置标签的选项。但似乎prince包不再提供此函数,需要使用没有放置标签选项的plot()函数。感谢任何帮助。

编辑:
带标签的输出示例:

如何为使用prince库绘制的散点图点添加注释

图中每个点的文本,如“草莓”、“香蕉”、“酸奶”等,都是我正在寻找的标签,这些标签将是蓝色点的索引值0,1,2,3,4,5,6,7,8,9以及橙色点的列名“v1”, “v2”, “v3”, “v4”。

英文:

I am using the library prince in order to perform Correspondence Analysis

from prince import CA

My contingency table dummy_contingency looks like this:

{'v1': {'0': 4.479591836734694,
  '1': 75.08163265306122,
  '2': 1.1020408163265305,
  '3': 5.285714285714286,
  '4': 14.244897959183673,
  '5': 0.0,
  '6': 94.06122448979592,
  '7': 0.5102040816326531,
  '8': 87.62244897959184,
  '9': 16.102040816326532},
 'v2': {'0': 6.142857142857143,
  '1': 24.653061224489797,
  '2': 0.3979591836734694,
  '3': 2.63265306122449,
  '4': 18.714285714285715,
  '5': 0.0,
  '6': 60.92857142857143,
  '7': 1.030612244897959,
  '8': 71.73469387755102,
  '9': 14.76530612244898},
 'v3': {'0': 3.642857142857143,
  '1': 21.551020408163264,
  '2': 0.8061224489795918,
  '3': 2.979591836734694,
  '4': 14.5,
  '5': 0.030612244897959183,
  '6': 39.60204081632653,
  '7': 0.7551020408163265,
  '8': 71.89795918367346,
  '9': 11.571428571428571},
 'v4': {'0': 6.1020408163265305,
  '1': 25.632653061224488,
  '2': 0.6938775510204082,
  '3': 3.9285714285714284,
  '4': 21.581632653061224,
  '5': 0.22448979591836735,
  '6': 10.704081632653061,
  '7': 0.8469387755102041,
  '8': 71.21428571428571,
  '9': 12.489795918367347}}

Chi Square Test reveals dependence:

Chi-square statistic: 69.6630377155341
p-value: 1.2528156966101567e-05

Now I fit the data:

dummy_contingency = pd.DataFrame(dummy_contingency)

ca_dummy = CA(n_components=2)  # Number of components for correspondence analysis
ca_dummy.fit(dummy_contingency)

And the plot:

fig = ca_dummy.plot(
    X=dummy_contingency)
fig

如何为使用prince库绘制的散点图点添加注释

How do I get the labelling done for this plot? The examples posted by others (https://stackoverflow.com/questions/48521740/using-mca-package-in-python) uses the function plot_coordinates() which has the option of putting the labels as well. But it looks like this function is no longer available with prince package and need to use the plot() function which does not have the option to put labels. Appreciate any help on this.

Edit:
Example of an output with labels:
如何为使用prince库绘制的散点图点添加注释

The text for each of the points in the plot like "strawberries", "banana", "yogurt", etc. are the labels that I am looking for, which in this will be the index values 0,1,2,3,4,5,6,7,8,9 for the blue points and the column names "v1", "v2", "v3", "v4" for the orange points.

答案1

得分: 3

  • 将注释添加到散点图中的方法来自于如何使用Altair进行注释,但这不包括从 ca 绘制点所需的步骤。
  • 为了注释 correspondence-analysis 绘图,必须从 ca 模型中提取 .column_coordinates.row_coordinates。这些是绘图上的点,而不是来自于 df 的点。
import pandas as pd
import prince
import altair as alt

# 将数据字典转换为数据框
df = pd.DataFrame(dummy_contingency)

# 创建模型
ca = prince.CA()

# 拟合模型
ca = ca.fit(df)

# 提取列坐标数据框,并更改列名
cc = ca.column_coordinates(df).reset_index()
cc.columns = ['name', 'x', 'y']

# 提取行坐标数据框,并更改列名
rc = ca.row_coordinates(df).reset_index()
rc.columns = ['name', 'x', 'y']

# 合并数据框
crc_df = pd.concat([cc, rc], ignore_index=True)

# 绘制和注释
points = ca.plot(df)

annot = alt.Chart(crc_df).mark_text(
    align='left',
    baseline='middle',
    fontSize=20,
    dx=7
).encode(
    x='x',
    y='y',
    text='name'
)

points + annot
  • 请注意,绘图已经具有浮动注释,无需添加 annot

如何为使用prince库绘制的散点图点添加注释


  • 注释也可以在不将 ccrc 合并为单个数据框的情况下添加。
points = ca.plot(df)

annot1 = alt.Chart(cc).mark_text(
    align='left',
    baseline='middle',
    fontSize=20,
    dx=7
).encode(
    x='x',
    y='y',
    text='name'
)

annot2 = alt.Chart(rc).mark_text(
    align='left',
    baseline='middle',
    fontSize=20,
    dx=7
).encode(
    x='x',
    y='y',
    text='name'
)

points + annot1 + annot2
英文:
  • Adding the annotations to the scatter plot comes from How to do annotations with Altair, however, this doesn't include the necessary steps to plot the points from ca.
  • In order to annotate the correspondence-analysis plot, .column_coordinates and .row_coordinates must be extracted from the ca model. These are the points on the plot, not those from df.
import pandas as pd
import prince
import altair as alt

# convert the dictionary of data to a dataframe
df = pd.DataFrame(dummy_contingency)

# create the model
ca = prince.CA()

# fit the model
ca = ca.fit(df)

# extract the column coordinate dataframe, and change the column names
cc = ca.column_coordinates(df).reset_index()
cc.columns = ['name', 'x', 'y']

# extract the row coordinates dataframe, and change the column names
rc = ca.row_coordinates(df).reset_index()
rc.columns = ['name', 'x', 'y']

# combine the dataframes
crc_df = pd.concat([cc, rc], ignore_index=True)

# plot and annotate
points = ca.plot(df)

annot = alt.Chart(crc_df).mark_text(
    align='left',
    baseline='middle',
    fontSize = 20,
    dx = 7
).encode(
    x='x',
    y='y',
    text='name'
)

points + annot
  • Note the plot already has floating annotations, without adding annot.

如何为使用prince库绘制的散点图点添加注释


  • The annotations can also be added without combining cc and rc into a single dataframe.
points = ca.plot(df)

annot1 = alt.Chart(cc).mark_text(
    align='left',
    baseline='middle',
    fontSize = 20,
    dx = 7
).encode(
    x='x',
    y='y',
    text='name'
)

annot2 = alt.Chart(rc).mark_text(
    align='left',
    baseline='middle',
    fontSize = 20,
    dx = 7
).encode(
    x='x',
    y='y',
    text='name'
)

points + annot1 + annot2

huangapple
  • 本文由 发表于 2023年6月15日 00:12:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76475602.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定