英文:
python pandas dataframe, add column and tag an adjusted and inserted row
问题
以下是代码的翻译部分:
我有以下的数据框
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake)
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING)
print(df)
如果某个人的工时累积达到38小时,将对达到38小时的单元格进行调整,插入一行重复的数据行,并将工时余额添加到以下行。以下代码执行此操作,可以看到原始数据和调整后数据之间的差异。
s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)
对于被调整的行和被插入的行,我需要通过插入另一列并添加字符(例如 'x')来标记它们以突出显示已调整和已插入的行。
<details>
<summary>英文:</summary>
I have the following data-frame
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)
If the running balance of an individuals hours reach 38 an adjustment to the cell that reached the 38th hour is made, a duplicate row is inserted and the balance of hours is added to the following row. The following code performs this and the difference in output of original data to adjusted data can be seen.
s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)
For the row that got adjusted and the row that got inserted I need to tag them via inserting another column and adding a character such as an 'x' to highlight the adjusted and inserted row
</details>
# 答案1
**得分**: 1
当你复制索引时,可以使用`out.index.duplicated`作为布尔掩码:
```python
# or out['mod'] = np.where(out.index.duplicated(keep=False), 'x', '-')
out.loc[out.index.duplicated(keep=False), 'mod'] = 'x'
print(out)
# 输出
number name hours loc mod
0 651 Alex 8.25 Nar NaN
1 651 Alex 7.50 SCC NaN
2 651 Alex 7.50 RSL NaN
3 4267 Ankit 7.50 UNIT-C NaN
4 4267 Ankit 14.00 UNIT-C NaN
5 4267 Ankit 12.00 UNIT-C NaN
6 4267 Ankit 4.50 UNIT-C x # 索引6
6 4267 Ankit 10.50 UNIT-C x # 重复
7 4267 Ankit 11.00 UNIT-C NaN
8 4267 Ankit 6.50 UNIT-C NaN
9 4267 Ankit 14.00 UNIT-C NaN
10 8806 Abhishek 15.00 UNI NaN
11 8806 Abhishek 15.00 UNI NaN
12 8806 Abhishek 8.00 UNI x # 索引12
12 8806 Abhishek 5.50 UNI x # 重复
13 6841 Blake 8.00 UNKING NaN
14 6841 Blake 8.00 UNKING NaN
15 6841 Blake 8.00 UNKING NaN
16 6841 Blake 8.00 UNKING NaN
请注意,这是代码的翻译,不包括代码部分。
英文:
As you duplicate index, you can use out.index.duplicated
as boolean mask:
# or out['mod'] = np.where(out.index.duplicated(keep=False), 'x', '-')
out.loc[out.index.duplicated(keep=False), 'mod'] = 'x'
print(out)
# Output
number name hours loc mod
0 651 Alex 8.25 Nar NaN
1 651 Alex 7.50 SCC NaN
2 651 Alex 7.50 RSL NaN
3 4267 Ankit 7.50 UNIT-C NaN
4 4267 Ankit 14.00 UNIT-C NaN
5 4267 Ankit 12.00 UNIT-C NaN
6 4267 Ankit 4.50 UNIT-C x # index 6
6 4267 Ankit 10.50 UNIT-C x # twice
7 4267 Ankit 11.00 UNIT-C NaN
8 4267 Ankit 6.50 UNIT-C NaN
9 4267 Ankit 14.00 UNIT-C NaN
10 8806 Abhishek 15.00 UNI NaN
11 8806 Abhishek 15.00 UNI NaN
12 8806 Abhishek 8.00 UNI x # index 12
12 8806 Abhishek 5.50 UNI x # twice
13 6841 Blake 8.00 UNKING NaN
14 6841 Blake 8.00 UNKING NaN
15 6841 Blake 8.00 UNKING NaN
16 6841 Blake 8.00 UNKING NaN
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论