Question

I have the following dataset:

import pandas as pd

data = {
    '1': ['A', 'B', 'C', 'NAN', 'A', 'C', 'NAN', 'C', 'B', 'A'],
    '2': ['B', 'NAN', 'A', 'B', 'C', 'A', 'B', 'NAN', 'A', 'C'],
    '3': ['NAN', 'A', 'B', 'C', 'NAN', 'B', 'A', 'B', 'C', 'A'],
    '4': ['C', 'B', 'NAN', 'A', 'B', 'NAN', 'C', 'A', 'NAN', 'B']
}
df = pd.DataFrame(data)

I want to plot a simple Sankey diagram of this data structure, but I don't even know where to start...

Answer 1

Score: 1

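It is tricky to get the data into the correct shape. There may be a more efficient way than what I have come up with, but hopefully this gets the job done.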
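
To make "the correct shape" concrete: a plotly Sankey link is described by three parallel lists, `source`, `target` and `value`, where the first two hold integer positions into the node label list and the third holds the flow size. The sketch below builds those lists with only the standard library on a tiny, made-up two-column example (not the question's data, with `None` standing in for a missing value); the names `pairs` and `index` are illustrative.

```python
from collections import Counter

# A tiny, made-up two-column example (not the question's data); None marks a missing value
col1 = ['A', 'B', 'A', None]
col2 = ['B', 'B', 'B', 'A']

# Suffix each category with its column name so that 'A' in column 1 and 'A' in column 2
# become different nodes, skip rows where either side is missing, and count each pair
pairs = Counter(
    (s + '1', t + '2')
    for s, t in zip(col1, col2)
    if s is not None and t is not None
)

# One label per node, plus a lookup from label to its position in the label list
labels = sorted({name for pair in pairs for name in pair})
index = {name: i for i, name in enumerate(labels)}

# Three parallel lists: link k goes from labels[source[k]] to labels[target[k]] with width value[k]
source = [index[s] for s, _ in pairs]
target = [index[t] for _, t in pairs]
value = list(pairs.values())

print(labels)                 # ['A1', 'B1', 'B2']
print(source, target, value)  # [0, 1] [2, 2] [2, 1]
```

The full answer does the same thing for every pair of adjacent columns.
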
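import plotly.graph_objects as go
import pandas as pd
import numpy as np

data = {
    '1': ['A', 'B', 'C', 'NAN', 'A', 'C', 'NAN', 'C', 'B', 'A'],
    '2': ['B', 'NAN', 'A', 'B', 'C', 'A', 'B', 'NAN', 'A', 'C'],
    '3': ['NAN', 'A', 'B', 'C', 'NAN', 'B', 'A', 'B', 'C', 'A'],
    '4': ['C', 'B', 'NAN', 'A', 'B', 'NAN', 'C', 'A', 'NAN', 'B']
}
df = pd.DataFrame(data)
# Turn the 'NAN' placeholder strings into real missing values so they can be dropped
df = df.replace('NAN', np.nan)

# Get a list of labels by adding the column name to the cell values (e.g. 'A' in column '1' -> 'A1')
# and keeping the distinct combinations
label = sorted(df.apply(lambda x: x + x.name).melt().dropna()['value'].unique())

# Iterate over two adjacent columns at a time to map out the relationships
output = []
for i in range(1, df.shape[1]):
    a, b = str(i), str(i + 1)
    # Count each (a, b) pair; value_counts() skips rows with a missing value
    counts = df[[a, b]].value_counts().reset_index(name='count')
    # Suffix the values with their column name, matching the labels above
    counts[a] = counts[a] + a
    counts[b] = counts[b] + b
    output.extend(counts.values)

# Convert the relationships to indices into the labels list
mapped = []
for x in output:
    mapped.append((label.index(x[0]), label.index(x[1]), x[2]))

# Split the values into their corresponding buckets
source, target, value = np.array(mapped).T

# Build your chart
fig = go.Figure(data=[go.Sankey(
    node=dict(pad=15, thickness=20, label=label),
    link=dict(source=source, target=target, value=value),
)])
fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()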

Output (screenshot of the resulting Sankey diagram): https://i.stack.imgur.com/LBYzN.png
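
On the "more efficient way" the answer mentions: one possible variant, sketched here as an assumption rather than the answerer's method, builds the label-to-index lookup once as a dict (instead of calling `list.index()` per link) and stacks the per-column-pair counts with `pd.concat`. It uses only standard pandas calls (`DataFrame.value_counts`, `Series.reset_index(name=...)`, `Series.map`); the names `parts`, `links` and `index` are illustrative. Unlike the answer, it takes the labels only from nodes that appear in at least one link; for this dataset both give the same 12 nodes.

```python
import numpy as np
import pandas as pd

data = {
    '1': ['A', 'B', 'C', 'NAN', 'A', 'C', 'NAN', 'C', 'B', 'A'],
    '2': ['B', 'NAN', 'A', 'B', 'C', 'A', 'B', 'NAN', 'A', 'C'],
    '3': ['NAN', 'A', 'B', 'C', 'NAN', 'B', 'A', 'B', 'C', 'A'],
    '4': ['C', 'B', 'NAN', 'A', 'B', 'NAN', 'C', 'A', 'NAN', 'B']
}
df = pd.DataFrame(data).replace('NAN', np.nan)

# Build one (source, target, value) table per pair of adjacent columns, then stack them
cols = list(df.columns)
parts = []
for a, b in zip(cols, cols[1:]):
    # Suffix values with their column name ('A' in column '1' -> 'A1'); NaN stays NaN
    pair = pd.DataFrame({'source': df[a] + a, 'target': df[b] + b}).dropna()
    # Count each distinct (source, target) combination
    parts.append(pair.value_counts().reset_index(name='value'))
links = pd.concat(parts, ignore_index=True)

# Label -> index lookup built once, so each mapping is a dict lookup
labels = sorted(set(links['source']) | set(links['target']))
index = {name: i for i, name in enumerate(labels)}

source = links['source'].map(index).tolist()
target = links['target'].map(index).tolist()
value = links['value'].tolist()

print(len(labels), len(value), sum(value))  # 12 13 17
```

The resulting lists can be passed to `go.Sankey(node=dict(label=labels), link=dict(source=source, target=target, value=value))` in the same way as in the answer.
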

Python: Sankey diagram with complex data
