How can I use transform (or something else) instead of a merge and temporary columns for special indexing/ranking?
Question
I have the following Python code, which produces the correct output, but I sense it could be more Pythonic if I removed the temporary columns and the merge, perhaps using transform or some other clever approach. The data is presorted by category, then by subcategory. Each group will have a sub_cat value of 'c'. The idea is to output a rank column in which the sub_cat value 'c' is always zero and the other rows count away from it by integers (negative before 'c', positive after), resetting at each group. The cat_index and c_idx columns (c_idx being the index of 'c' within the group) will be discarded.
Here is my working code:
import pandas as pd
import io
data = '''
category,sub_cat
A,a
A,c
A,d
B,c
B,d
D,a
D,b
D,c
G,b
G,c
G,e
G,f
G,h
'''
df = pd.read_csv(io.StringIO(data), sep=',')
df['cat_index'] = df.groupby('category').cumcount()
c_idx = df.groupby('category').apply(lambda x: x.loc[x['sub_cat'] == 'c', 'cat_index'].values[0]).to_frame('c_idx')
df = df.merge(c_idx, how='left', on='category')
df['rank'] = df['cat_index'] - df['c_idx']
And here is the correct output:
Again, 'rank' is the desired output, computed from the inputs 'category' and 'sub_cat'. Columns 'cat_index' and 'c_idx' are temporary and will be dropped.
Any ideas? Let me know. Thanks.
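A minimal sketch of a lambda-free variant, assuming (as the question states) exactly one 'c' per category. It rebuilds the same data with the DataFrame constructor instead of read_csv; pos and c_pos are illustrative local names, so no temporary columns are added to df and no merge is needed. Series.where keeps the within-group position only on the 'c' row, and groupby(...).transform("max") broadcasts that value to every row of the group.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df = pd.DataFrame(
    {"category": list("AAABBDDDGGGGG"), "sub_cat": list("acdcdabcbcefh")}
)

# Position of each row within its category; a local Series, not a column
pos = df.groupby("category").cumcount()

# Keep the position only on the 'c' row (NaN elsewhere), then broadcast it to
# every row of the group; transform("max") ignores the NaNs
c_pos = pos.where(df["sub_cat"].eq("c")).groupby(df["category"]).transform("max")

# The subtraction is float because of the NaNs above; cast back to int
df["rank"] = (pos - c_pos).astype(int)
print(df)
```

Because where introduces NaN, the intermediate result is float, hence the final astype(int). If a category had no 'c', its c_pos would stay NaN and astype(int) would raise, so this relies on the one-'c'-per-group guarantee.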
Answer 1
Score: 1
Try:
import numpy as np  # needed for np.arange
df['rank'] = df.groupby('category')['sub_cat'].transform(lambda x: np.arange(len(x)) - ((x == 'c').idxmax() - x.index[0]))
print(df)
Prints:
   category sub_cat  rank
0         A       a    -1
1         A       c     0
2         A       d     1
3         B       c     0
4         B       d     1
5         D       a    -2
6         D       b    -1
7         D       c     0
8         G       b    -1
9         G       c     0
10        G       e     1
11        G       f     2
12        G       h     3
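The answer turns the label returned by idxmax into a position by subtracting x.index[0]; that works here because the default RangeIndex makes each group's labels consecutive integers. Below is a sketch of a variant that does not depend on the index, under the same one-'c'-per-group assumption: np.argmax on the boolean array returns the position of the first 'c' directly. The string index and the helper name rank_from_c are hypothetical, chosen only to show that the labels no longer matter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"category": list("AAABBDDDGGGGG"), "sub_cat": list("acdcdabcbcefh")},
    index=list("abcdefghijklm"),  # hypothetical non-integer index
)


def rank_from_c(s: pd.Series) -> np.ndarray:
    # np.argmax on the boolean array gives the *position* of the first 'c',
    # independent of the index labels
    c_pos = np.argmax(s.to_numpy() == "c")
    return np.arange(len(s)) - c_pos


# transform wraps each returned array back onto the group's own index
df["rank"] = df.groupby("category")["sub_cat"].transform(rank_from_c)
print(df)
```

With this index, the original lambda would not work as written, since x.index[0] and the idxmax result are strings and cannot be subtracted.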