June 8, 2023 03:49:39

How can I use transform (or other) instead of merge and temporary columns for special indexing/ranking?

Question

I have the following Python code, which produces the correct output, but I sense it could be more Pythonic by removing the temporary columns and the merge, perhaps with transform or some other clever approach. The data is presorted by category, then subcategory. Each group will have a sub_cat value of 'c'. The idea is to output a rank column where the row with sub_cat value 'c' is always zero and the remaining rows count by integers up/down from it, resetting at each group. The cat_index and c_idx columns (c_idx meaning the index of 'c' within the group) will be discarded.

Here is my working code:

import pandas as pd
import io
data = '''
category,sub_cat
A,a
A,c
A,d
B,c
B,d
D,a
D,b
D,c
G,b
G,c
G,e
G,f
G,h
'''
df = pd.read_csv(io.StringIO(data), sep=',')
df['cat_index'] = df.groupby('category').cumcount()
c_idx = df.groupby('category').apply(lambda x: x.loc[x['sub_cat'] == 'c', 'cat_index'].values[0]).to_frame('c_idx')
df = df.merge(c_idx, how='left', on='category')
df['rank'] = df['cat_index'] - df['c_idx']

And here is the correct output:

Again, 'rank' is the desired output from inputs 'category', and 'sub_cat'. Columns 'cat_index' and 'c_idx' are temporary and will be dropped.

Any ideas? Let me know. Thanks.

Answer 1

Score: 1

Try:

import numpy as np

# Position within each group minus the position of 'c' in that group
df['rank'] = df.groupby('category')['sub_cat'].transform(lambda x: np.arange(len(x)) - ((x == 'c').idxmax() - x.index[0]))
print(df)

Prints:

   category sub_cat  rank
0         A       a    -1
1         A       c     0
2         A       d     1
3         B       c     0
4         B       d     1
5         D       a    -2
6         D       b    -1
7         D       c     0
8         G       b    -1
9         G       c     0
10        G       e     1
11        G       f     2
12        G       h     3
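
One thing to keep in mind with the lambda above: idxmax() returns an index label, so subtracting x.index[0] gives the position of 'c' only when the frame has consecutive integer labels, as the default index does here. A possible variant that avoids both the lambda and any dependence on the index is sketched below. The helper name rank_from_anchor and its anchor parameter are made up for this illustration and appear in neither the question nor the answer. It also assumes, as the question states, that every group contains a 'c' row.

```python
import io

import pandas as pd

# Same sample data as in the question.
data = """category,sub_cat
A,a
A,c
A,d
B,c
B,d
D,a
D,b
D,c
G,b
G,c
G,e
G,f
G,h
"""
df = pd.read_csv(io.StringIO(data))


def rank_from_anchor(frame, anchor="c"):
    """Hypothetical helper: signed offset of each row from its group's anchor row."""
    # Position of every row inside its category group: 0, 1, 2, ...
    pos = frame.groupby("category").cumcount()
    # Keep the position only on anchor rows (NaN elsewhere), then broadcast
    # the first non-null value to every row of the same group.
    anchor_pos = (
        pos.where(frame["sub_cat"].eq(anchor))
        .groupby(frame["category"])
        .transform("first")
    )
    # Assumption: every group contains the anchor. A group without it
    # leaves NaN behind and the integer cast below raises.
    return (pos - anchor_pos).astype(int)


df["rank"] = rank_from_anchor(df)
print(df)
```

This builds no temporary columns on the frame and needs no merge. The intermediate Series pos and anchor_pos exist only inside the helper.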
