Error while trying to shift values in dataframe and get the difference
Question
I wrote a user-defined function to perform a specific task on a dataframe. It checks a condition on one column and uses the values of a second column to compute the result in a third column.
Here is the function I wrote:

    import numpy as np

    def strk_inter(x):
        if x['SYMBOL'] == x['SYMBOL'].shift(1):
            a = x['STRIKE_PR'].shift(1) - x['STRIKE_PR']
        else:
            a = np.nan
        return a

    optt_df['STRIKE_INTERVAL'] = optt_df.apply(strk_inter, axis=1)

It raises the following error:
> AttributeError: 'str' object has no attribute 'shift'
Dataset:
Expectation:
Answer 1
Score: 1
With axis=1, pandas apply passes each row to your function one at a time, so x['SYMBOL'] is the value of SYMBOL in that particular row (a plain string), not the whole column. Only a column (a Series) can be shifted, hence the error message.
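To see this directly, here is a minimal sketch that reports what the function receives. The two-row frame `demo` and the helper `describe` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical two-row frame, used only to inspect what apply hands over
demo = pd.DataFrame({"SYMBOL": ["A", "B"], "STRIKE_PR": [1000, 950]})

def describe(row):
    # row is a single row; row["SYMBOL"] is one cell of that row
    return f"{type(row).__name__}/{type(row['SYMBOL']).__name__}"

seen = demo.apply(describe, axis=1)
print(seen.tolist())
```

Each entry shows the type of the row object followed by the type of the SYMBOL cell. The cell is a Python str, which has no shift method.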
With the dataframe you provided:

    import pandas as pd

    df = pd.DataFrame({"SYMBOL": ["A", "A", "B", "B"], "STRIKE_PR": [1000, 1100, 950, 960]})
    print(df)
    # Output
      SYMBOL  STRIKE_PR
    0      A       1000
    1      A       1100
    2      B        950
    3      B        960

Here is one way to get the expected result:

    def strk_inter(df_):
        # Visit only the rows whose SYMBOL matches the SYMBOL of the next row
        for i in df_[df_["SYMBOL"] == df_.shift(-1)["SYMBOL"]].index:
            # Next strike minus current strike; i + 1 assumes a default integer index
            df_.at[i, "STRIKE_INTERVAL"] = (
                df_.at[i + 1, "STRIKE_PR"] - df_.at[i, "STRIKE_PR"]
            )
        return df_

    print(strk_inter(df))
    # Output
      SYMBOL  STRIKE_PR  STRIKE_INTERVAL
    0      A       1000            100.0
    1      A       1100              NaN
    2      B        950             10.0
    3      B        960              NaN

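As an alternative, the same result can probably be obtained without a Python loop by shifting whole columns, which is what the original function was trying to do. This is only a sketch on the sample data above; the variable names `same_symbol_next` and `diff_to_next` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"SYMBOL": ["A", "A", "B", "B"], "STRIKE_PR": [1000, 1100, 950, 960]})

# True where the next row has the same SYMBOL (the last row compares against NaN, so False)
same_symbol_next = df["SYMBOL"].eq(df["SYMBOL"].shift(-1))

# Next strike minus current strike, computed for every row at once
diff_to_next = df["STRIKE_PR"].shift(-1) - df["STRIKE_PR"]

# Keep the difference only where the symbols match; other rows become NaN
df["STRIKE_INTERVAL"] = diff_to_next.where(same_symbol_next)
print(df)
```

Because shift works by position rather than by label, this version does not depend on the index being 0, 1, 2, and so on.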