英文:
subtracting the 'INT' column from the 'LMP' column, but only from the indexed row per unique 'ID'
问题
'sub': [float('NaN'), -411.0, 2.0, -732.0, -338.0, 0.0, 458.0, 801.0, 1118.0, 1461.0, 1848.0, 2259.0]
英文:
I like to create a new column named 'sub' in the DataFrame and calculate its values by subtracting the 'INT' column from the 'LMP' column, but only subtracting from the latest row per unique 'ID' where the 'FM' column is set to'time0', I calculated FM as the follows but I dont know how to implement the sub column.
data = {
'ID': [0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
'VIS': [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
'STA': [float('NaN'), 4.0, 7.0, 7.0, 7.0, 7.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
'LMP': [float('NaN'), -35.0, 411.0, 773.0, 1143.0, 1506.0, float('NaN'), float('NaN'), float('NaN'), float('NaN'), float('NaN'), float('NaN')],
'INT': [0.0, 0.0, 413.0, 777.0, 1171.0, 1509.0, 1967.0, 2310.0, 2627.0, 2970.0, 3357.0, 3768.0],
'FM': [-1, -1, "time0", -1, -1, "time0", -1, -1, -1, -1, -1,-1]
}
sorted_data = pd.DataFrame(data)
sorted_data['FM'] = np.nan
for id in sorted_data['ID'].unique():
filter_condition = (sorted_data['ID'] == id) & (~sorted_data['LMP'].isnull())
if filter_condition.any():
last_row_index = sorted_data.loc[filter_condition].index[-1]
sorted_data.loc[last_row_index, 'FM'] = 'time0'
sorted_data['FM'] = sorted_data['FM'].fillna(-1)
Expected output should be calculated as this :
'sub': [float('NaN'), 0-411.0,413-411, 777-1509.0 , 1171.0-1509.0 ,1509-1509, 1967.0-1509, 2310.0-1509,2627.0- 1509, 2970.0-1509, 3357.0-1509,3768.0-1509]
答案1
得分: 1
以下是代码的翻译部分:
data = {
"ID": [0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
"VIS": [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
"STA": [float("NaN"), 4.0, 7.0, 7.0, 7.0, 7.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
"LMP": [
float("NaN"),
-35.0,
411.0,
773.0,
1143.0,
1506.0,
float("NaN"),
float("NaN"),
float("NaN"),
float("NaN"),
float("NaN"),
float("NaN"),
],
"INT": [
0.0,
0.0,
413.0,
777.0,
1171.0,
1509.0,
1967.0,
2310.0,
2627.0,
2970.0,
3357.0,
3768.0,
],
"FM": [-1, -1, "time0", -1, -1, "time0", -1, -1, -1, -1, -1, -1],
}
sorted_data = pd.DataFrame(data)
lmp_at_time0 = (
sorted_data.groupby(["ID"])
.apply(lambda grp: grp[grp["FM"] == "time0"]["LMP"])
.reset_index()
.drop(columns=["level_1"])
)
lmp_at_time0.columns = ["ID", "LMP_at_time0"]
sorted_data = sorted_data.merge(lmp_at_time0, on="ID", how="left")
sorted_data["INT"] - sorted_data["LMP_at_time0"]
英文:
Here is sample code that subtracts the INT column from the LMP column at time0
data = {
"ID": [0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
"VIS": [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
"STA": [float("NaN"), 4.0, 7.0, 7.0, 7.0, 7.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
"LMP": [
float("NaN"),
-35.0,
411.0,
773.0,
1143.0,
1506.0,
float("NaN"),
float("NaN"),
float("NaN"),
float("NaN"),
float("NaN"),
float("NaN"),
],
"INT": [
0.0,
0.0,
413.0,
777.0,
1171.0,
1509.0,
1967.0,
2310.0,
2627.0,
2970.0,
3357.0,
3768.0,
],
"FM": [-1, -1, "time0", -1, -1, "time0", -1, -1, -1, -1, -1, -1],
}
sorted_data = pd.DataFrame(data)
lmp_at_time0 = (
sorted_data.groupby(["ID"])
.apply(lambda grp: grp[grp["FM"] == "time0"]["LMP"])
.reset_index()
.drop(columns=["level_1"])
)
lmp_at_time0.columns = ["ID", "LMP_at_time0"]
sorted_data = sorted_data.merge(lmp_at_time0, on="ID", how="left")
sorted_data["INT"] - sorted_data["LMP_at_time0"]
答案2
得分: 1
以下是您要翻译的内容:
IIUC,您可以使用以下代码来获取每个ID的最后一个"time0"值:
# 获取每个ID的最后一个time0值
mapper = (sorted_data
.loc[sorted_data['FM'].eq('time0')]
.drop_duplicates(subset='ID', keep='last')
.set_index('ID')['LMP']
)
# 映射并相减
sorted_data['sub'] = sorted_data['INT'].sub(sorted_data['ID'].map(mapper))
输出结果:
ID VIS STA LMP INT FM sub
0 0 0.0 NaN NaN 0.0 -1 NaN
1 1 0.0 4.0 -35.0 0.0 -1 -411.0
2 1 1.0 7.0 411.0 413.0 time0 2.0
3 2 2.0 7.0 773.0 777.0 -1 -729.0
4 2 3.0 7.0 1143.0 1171.0 -1 -335.0
5 2 4.0 7.0 1506.0 1509.0 time0 3.0
6 2 5.0 2.0 NaN 1967.0 -1 461.0
7 2 6.0 2.0 NaN 2310.0 -1 804.0
8 2 7.0 2.0 NaN 2627.0 -1 1121.0
9 2 8.0 2.0 NaN 2970.0 -1 1464.0
10 2 9.0 2.0 NaN 3357.0 -1 1851.0
11 2 10.0 2.0 NaN 3768.0 -1 2262.0
希望这对您有所帮助。如果您有任何其他问题,请随时提出。
英文:
IIUC, you can use:
# get the last time0 value per ID
mapper = (sorted_data
.loc[sorted_data['FM'].eq('time0')]
.drop_duplicates(subset='ID', keep='last')
.set_index('ID')['LMP']
)
# map and subtract
sorted_data['sub'] = sorted_data['INT'].sub(sorted_data['ID'].map(mapper))
Output:
ID VIS STA LMP INT FM sub
0 0 0.0 NaN NaN 0.0 -1 NaN
1 1 0.0 4.0 -35.0 0.0 -1 -411.0
2 1 1.0 7.0 411.0 413.0 time0 2.0
3 2 2.0 7.0 773.0 777.0 -1 -729.0
4 2 3.0 7.0 1143.0 1171.0 -1 -335.0
5 2 4.0 7.0 1506.0 1509.0 time0 3.0
6 2 5.0 2.0 NaN 1967.0 -1 461.0
7 2 6.0 2.0 NaN 2310.0 -1 804.0
8 2 7.0 2.0 NaN 2627.0 -1 1121.0
9 2 8.0 2.0 NaN 2970.0 -1 1464.0
10 2 9.0 2.0 NaN 3357.0 -1 1851.0
11 2 10.0 2.0 NaN 3768.0 -1 2262.0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论