英文:
If cell number in certain dataframe column is equal to or greater, insert duplicate row and add balance
问题
我有以下数据框:
import pandas as pd
df = pd.DataFrame()
df['number'] = (651, 651, 651, 4267, 4267, 4267, 4267, 4267, 4267, 4267, 8806, 8806, 8806, 6841, 6841, 6841, 6841)
df['name'] = ('Alex', 'Alex', 'Alex', 'Ankit', 'Ankit', 'Ankit', 'Ankit', 'Ankit', 'Ankit', 'Ankit', 'Abhishek', 'Abhishek', 'Abhishek', 'Blake', 'Blake', 'Blake', 'Blake')
df['hours'] = (8.25, 7.5, 7.5, 7.5, 14, 12, 15, 11, 6.5, 14, 15, 15, 13.5, 8, 8, 8, 8)
df['loc'] = ('Nar', 'SCC', 'RSL', 'UNIT-C', 'UNIT-C', 'UNIT-C', 'UNIT-C', 'UNIT-C', 'UNIT-C', 'UNIT-C', 'UNI', 'UNI', 'UNI', 'UNKING', 'UNKING', 'UNKING', 'UNKING')
print(df)
对于每一行的小时数在>=10且<=12之间,我需要:
- 将当前行的小时总数更改为10
- 在上面插入一个重复的行,并将剩余的小时数添加到这行
对于每一行的小时数>12:
- 将当前行的小时总数更改为10
- 在上面插入一个重复的行,并将2小时添加到这行
- 在上面再次插入一个重复的行,并将剩余的小时数添加到这行
新数据框的结果应该如下所示:
英文:
I have the following data-frame
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)
For each row that has hours between >=10 and <=12, I need to:
- change the total of the hours of the current row to 10
- insert a duplicate row to the above and add the balance of hours to this row
For each row that has hours >12
- change the total of the hours to 10
- insert below a duplicate row to the above and 2 hours to this row
- insert another duplicate row to the above and add the balance of hours to this row
The result of the new dataframe should look like the following:
答案1
得分: 4
以下是翻译好的代码部分:
# 根据小时值对数据框进行子集化
s = df[df['hours'] < 10]
s1 = df[df['hours'] > 12]
s2 = df[df['hours'].between(10, 12)]
# 为每个子集分配重复行并连接
pd.concat([
s,
s1.assign(hours=10),
s1.assign(hours=2),
s1.assign(hours=s1['hours'] - 12),
s2.assign(hours=10),
s2.assign(hours=s2['hours'] - 10)]
).sort_index(kind='stable', ignore_index=True)
请注意,这是代码的翻译部分。
英文:
Code
# subset the dataframe based on hour value
s = df[df['hours'] < 10]
s1 = df[df['hours'] > 12]
s2 = df[df['hours'].between(10, 12)]
# Assign duplicate rows per subset and concat
pd.concat([
s,
s1.assign(hours=10),
s1.assign(hours=2),
s1.assign(hours=s1['hours'] - 12),
s2.assign(hours=10),
s2.assign(hours=s2['hours'] - 10)]
).sort_index(kind='stable', ignore_index=True)
Result
number name hours loc
0 651 Alex 8.25 Nar
1 651 Alex 7.50 SCC
2 651 Alex 7.50 RSL
3 4267 Ankit 7.50 UNIT-C
4 4267 Ankit 10.00 UNIT-C
5 4267 Ankit 2.00 UNIT-C
6 4267 Ankit 2.00 UNIT-C
7 4267 Ankit 10.00 UNIT-C
8 4267 Ankit 2.00 UNIT-C
9 4267 Ankit 10.00 UNIT-C
10 4267 Ankit 2.00 UNIT-C
11 4267 Ankit 3.00 UNIT-C
12 4267 Ankit 10.00 UNIT-C
13 4267 Ankit 1.00 UNIT-C
14 4267 Ankit 6.50 UNIT-C
15 4267 Ankit 10.00 UNIT-C
16 4267 Ankit 2.00 UNIT-C
17 4267 Ankit 2.00 UNIT-C
18 8806 Abhishek 10.00 UNI
19 8806 Abhishek 2.00 UNI
20 8806 Abhishek 3.00 UNI
21 8806 Abhishek 10.00 UNI
22 8806 Abhishek 2.00 UNI
23 8806 Abhishek 3.00 UNI
24 8806 Abhishek 10.00 UNI
25 8806 Abhishek 2.00 UNI
26 8806 Abhishek 1.50 UNI
27 6841 Blake 8.00 UNKING
28 6841 Blake 8.00 UNKING
29 6841 Blake 8.00 UNKING
30 6841 Blake 8.00 UNKING
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论