How to set broken bar order after grouping the dataframe

# Question

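The code example below draws a broken horizontal bar chart (broken_barh) showing when each person joined and left a music band over a period of time: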

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

result = pd.DataFrame([['Bill', 1972, 1974],
                       ['Bill', 1976, 1978],
                       ['Bill', 1967, 1971],
                       ['Danny', 1969, 1975],
                       ['Danny', 1976, 1977],
                       ['James', 1971, 1972],
                       ['Marshall', 1967, 1975]],
                      columns=['Person', 'Year_start', 'Year_left'])

fig, ax = plt.subplots()

names = sorted(result['Person'].unique())

colormap = plt.get_cmap('plasma')
slicedColorMap = colormap(np.linspace(0, 1, len(names)))

height = 0.5
for y, (name, g) in enumerate(result.groupby('Person')):
    ax.broken_barh(list(zip(g['Year_start'],
                            g['Year_left'] - g['Year_start'])),
                   (y - height / 2, height),
                   facecolors=slicedColorMap[y]
                   )

ax.set_ylim(0 - height, len(names) - 1 + height)
ax.set_xlim(result['Year_start'].min() - 1, result['Year_left'].max() + 1)
ax.set_yticks(range(len(names)), names)

ax.grid(True)
plt.show()
```

The output is this chart:

[![enter image description here](https://i.stack.imgur.com/zDabL.png)](https://i.stack.imgur.com/zDabL.png)

I need to order the bars (along with the persons on the y axis) by 'Year_start' and 'Year_left', both in ascending order.

I know how to aggregate and sort the values in a dataframe after the data is grouped, and that I should reset the index afterwards:

```python
sorted_result = result.groupby('Person').agg({'Year_start': min, 'Year_left': max})
sorted_result = sorted_result.sort_values(['Year_start', 'Year_left'], ascending=[True, True]).reset_index()
print(sorted_result)
```

But I am having a hard time embedding this sorting into the existing "for in" loop that draws the ax.broken_barh bars (also because, as I understand it, it is not possible to combine "sort_values" with "groupby" and "agg" in a single iteration).
Is this sorting possible in this script at all, or should I completely reconsider the script structure?
Many thanks!

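# Answer 1
**Score**: 1

You are almost there :-) You already have the names sorted by earliest start year and then by latest end year. You just need to convert the Person column to an ordered categorical that uses that order, and then add sort_values('Person') before grouping when you draw the bars with broken_barh. The updated code is below, with comments added to make it easier to follow. Hope this is what you are looking for...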

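Please note: passing the labels as a second argument to set_yticks() is only supported in Matplotlib 3.5 and later, so I have split the call into set_yticks() and set_yticklabels(), which also works in older versions. I also changed names to sorted_result.Person.to_list() so that the labels are correctly aligned with the bars.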

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

result = pd.DataFrame([['Bill', 1972, 1974],
                       ['Bill', 1976, 1978],
                       ['Bill', 1967, 1971],
                       ['Danny', 1969, 1975],
                       ['Danny', 1976, 1977],
                       ['James', 1971, 1972],
                       ['Marshall', 1967, 1975]],
                      columns=['Person', 'Year_start', 'Year_left'])

fig, ax = plt.subplots()

names = sorted(result['Person'].unique())

colormap = plt.get_cmap('plasma')
slicedColorMap = colormap(np.linspace(0, 1, len(names)))

height = 0.5

## NEWLY ADDED CODE ##
## This is your code: build sorted_result
sorted_result = result.groupby('Person').agg({'Year_start': min, 'Year_left': max})
sorted_result = sorted_result.sort_values(['Year_start', 'Year_left'], ascending=[True, True]).reset_index()

## Convert Person to an ordered categorical so that sorting follows the order you need
## Note that sorted_result.Person.to_list() is exactly that order
result['Person'] = pd.Categorical(
    result['Person'], 
    categories=sorted_result.Person.to_list(), 
    ordered=True
)

## Here, sort_values('Person') is added before grouping
for y, (name, g) in enumerate(result.sort_values('Person').groupby('Person')):
    ax.broken_barh(list(zip(g['Year_start'],
                            g['Year_left']-g['Year_start'])),
                   (y-height/2, height),
                   facecolors=slicedColorMap[y]
                   )

ax.set_ylim(0-height, len(names)-1+height)
ax.set_xlim(result['Year_start'].min()-1, result['Year_left'].max()+1)
ax.set_yticks(range(len(sorted_result.Person.to_list())))  ## Replaced names with the sorted order
ax.set_yticklabels(sorted_result.Person.to_list())  ## Replaced names with the sorted order

ax.grid(True)
plt.show()
```

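To see why the ordered categorical controls the bar order, here is a minimal sketch that assumes only pandas is installed. The `band` dataframe and the `group_order` helper are hypothetical names made up for illustration; they are not part of the answer above. It suggests that `groupby` on an ordered categorical column yields the groups in category order rather than alphabetical order.

```python
import pandas as pd


def group_order(df, key, categories):
    """Return group names in the order groupby yields them
    once `key` is an ordered categorical (hypothetical helper)."""
    out = df.copy()
    out[key] = pd.Categorical(out[key], categories=categories, ordered=True)
    # observed=True only yields categories that actually occur in the data
    return [name for name, _ in out.groupby(key, observed=True)]


# Toy data: one row per person, listed alphabetically
band = pd.DataFrame({'Person': ['Bill', 'Danny', 'James', 'Marshall'],
                     'Year_start': [1967, 1969, 1971, 1967]})

# The desired order is passed in as the category order
print(group_order(band, 'Person', ['Marshall', 'Bill', 'Danny', 'James']))
```

If this holds for your pandas version, the extra sort_values('Person') in the loop above is harmless but probably not strictly required, since the category order already drives the grouping.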
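# Answer 2
**Score**: 1

IIUC, all you need to do is sort the dataframe the way you want beforehand and then pass sort=False to groupby(). The rest of the code can remain the same: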

```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

result = pd.DataFrame([['Bill', 1972, 1974],
                       ['Bill', 1976, 1978],
                       ['Bill', 1967, 1971],
                       ['Danny', 1969, 1975],
                       ['Danny', 1976, 1977],
                       ['James', 1971, 1972],
                       ['Marshall', 1967, 1975]],
                      columns=['Person', 'Year_start', 'Year_left'])

# Rank each person by earliest start year, then by latest end year
order = (result.groupby('Person')
               .agg({'Year_start': 'min', 'Year_left': 'max'})
               .sort_values(['Year_start', 'Year_left'], ascending=[True, True])
               .index)
sorter = {person: rank for rank, person in enumerate(order)}

# Attach the rank to every row and sort the dataframe by it
result['sorter'] = result['Person'].map(sorter)
result = result.sort_values('sorter', ascending=True)

fig, ax = plt.subplots()

colormap = plt.get_cmap('plasma')
slicedColorMap = colormap(np.linspace(0, 1, result['Person'].nunique()))

height = 0.5
names = []
# sort=False keeps groupby from re-sorting the groups alphabetically
for y, (name, g) in enumerate(result.groupby('Person', sort=False)):
    print(name)
    ax.broken_barh(list(zip(g['Year_start'],
                            g['Year_left'] - g['Year_start'])),
                   (y - height / 2, height),
                   facecolors=slicedColorMap[y]
                   )
    names.append(name)
```

The rest of the code remains the same. This outputs:

[![enter image description here](https://i.stack.imgur.com/6epWQ.png)](https://i.stack.imgur.com/6epWQ.png)

I'm also making a small improvement: instead of defining names statically up front, the list is built as the loop runs, so each name always matches its bar. That's also why I'm using result['Person'].nunique() rather than len(names).

EDIT: Code edited based on discussion with OP. Because the ordering is quite specific and not easily covered by sort_values() alone, I suggest computing it in a separate dataframe and then merging it back into the original dataframe (here via map()) to sort it.

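As a quick check of the sort=False behaviour this answer relies on, here is a minimal sketch that assumes only pandas is installed. The `first_seen_order` helper and the toy `band` dataframe are made up for illustration. It compares the default group order with the order obtained when sort=False is passed.

```python
import pandas as pd


def first_seen_order(df, key):
    """Return group names in order of first appearance in df
    (hypothetical helper wrapping groupby with sort=False)."""
    return [name for name, _ in df.groupby(key, sort=False)]


# Toy data, already arranged in the order the bars should appear
band = pd.DataFrame({'Person': ['Marshall', 'Bill', 'Danny', 'Bill', 'James'],
                     'Year_start': [1967, 1967, 1969, 1972, 1971]})

default_order = [name for name, _ in band.groupby('Person')]  # keys sorted alphabetically
kept_order = first_seen_order(band, 'Person')                 # order of first appearance

print(default_order)
print(kept_order)
```

The design point is that the dataframe itself carries the ordering, so the plotting loop does not need to know how that order was computed.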
huangapple
  • Posted on June 5, 2023 at 21:17:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76406816.html