Python: Sankey plot chart with complex data
Question
I have the following dataset:
data = {
'1': ['A', 'B', 'C', 'NAN', 'A', 'C', 'NAN', 'C', 'B', 'A'],
'2': ['B', 'NAN', 'A', 'B', 'C', 'A', 'B', 'NAN', 'A', 'C'],
'3': ['NAN', 'A', 'B', 'C', 'NAN', 'B', 'A', 'B', 'C', 'A'],
'4': ['C', 'B', 'NAN', 'A', 'B', 'NAN', 'C', 'A', 'NAN', 'B']
}
df = pd.DataFrame(data)
and I want to draw a simple Sankey diagram of this data structure. I don't even know where to start...
Answer 1
Score: 1
<details>
<summary>English:</summary>
It is tricky to get the data in the correct shape. Perhaps there is a more efficient way than what I have come up with, but hopefully this gets the job done.
import plotly.graph_objects as go
import pandas as pd
import numpy as np
data = {
'1': ['A', 'B', 'C', 'NAN', 'A', 'C', 'NAN', 'C', 'B', 'A'],
'2': ['B', 'NAN', 'A', 'B', 'C', 'A', 'B', 'NAN', 'A', 'C'],
'3': ['NAN', 'A', 'B', 'C', 'NAN', 'B', 'A', 'B', 'C', 'A'],
'4': ['C', 'B', 'NAN', 'A', 'B', 'NAN', 'C', 'A', 'NAN', 'B']
}
df = pd.DataFrame(data)
df = df.replace('NAN', np.nan)
# Get a list of labels by adding the column name to the cell values and
# getting the distinct combinations
label = sorted(df.apply(lambda x: x+x.name).melt().dropna()['value'].unique())
# Iterate over two columns at a time to map out the relationships
output = []
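for i in range(1, df.shape[1]):
    left, right = str(i), str(i + 1)
    counts = df[[left, right]].value_counts().reset_index()
    # Append the column name to the two label columns only; the counts stay numeric
    counts[left] += left
    counts[right] += right
    output.extend(counts.values)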
# Convert the relationships to the index of the labels list
mapped = []
for x in output:
    mapped.append((label.index(x[0]), label.index(x[1]), x[2]))
# Split the values into their corresponding buckets
source, target, value = np.array(mapped).T
# Build your chart
fig = go.Figure(data=[go.Sankey(
    node=dict(label=label),
    link=dict(source=source, target=target, value=value),
)])
fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()
Output
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/LBYzN.png
</details>