How to apply a def function with 2 parameters to pandas DataFrame columns
Question
Could someone please tell me how to apply a function with 2 parameters to a DataFrame? I have tried many solutions without success. Here is my code:
import pandas as pd

df = pd.DataFrame({'tran_amt_lcy': [40, 500, 60], 'tran_amt_usd': [30, 40, 50], 'client_id': ['2001', '2033', '2045']})
df.dtypes

def test_func(col1, col2):
    if col1>30 & col2<500:
        tran_status = 'approved'
    else:
        tran_status = 'declined'
    return tran_status

df['tran_stat'] = df.apply(lambda x: test_func(df['tran_amt_usd'], df['tran_amt_lcy']), axis=1)
The following error message still appears: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I don't know why it still fails. Can anyone suggest a way to make this work?
Thanks a lot.
Answer 1
Score: 1
For a binary condition, you can use numpy.where:
import numpy as np

# Boolean mask
m = (df['tran_amt_usd'] > 30) & (df['tran_amt_lcy'] < 500)
df['tran_stat'] = np.where(m, 'approved', 'declined')
print(df)

# Output
   tran_amt_lcy  tran_amt_usd client_id tran_stat
0            40            30      2001  declined
1           500            40      2033  declined
2            60            50      2045  approved
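Many posts explain this error. Comparing a Series to a scalar works elementwise and returns a boolean Series, but an if statement needs a single True or False, and pandas refuses to reduce a whole Series to one truth value. Because the lambda passes the full columns (df[...]) rather than the values of the current row (x[...]), test_func ends up testing whole columns inside the if, roughly:
([30, 40, 50] > 30) & ([40, 500, 60] < 500)
Note also that & binds more tightly than > and <, so each comparison needs its own parentheses.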
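To make the cause of the ValueError concrete, here is a minimal sketch. The names usd, mask, a and b are illustrative and not part of the original code. It shows that the elementwise comparison itself works, that the failure comes from asking for one truth value of a Series, and that the missing parentheses change the meaning of the condition even for plain numbers.

```python
import pandas as pd

usd = pd.Series([30, 40, 50])

# Elementwise comparison is fine and yields a boolean Series.
mask = usd > 30

# Asking for a single truth value of that Series is what raises.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Precedence: `&` binds tighter than `>` and `<`, so without parentheses
# `a > 30 & b < 500` is parsed as the chained comparison `a > (30 & b) < 500`.
a, b = 30, 40
unparenthesized = a > 30 & b < 500      # evaluates 30 > 8 < 500
parenthesized = (a > 30) & (b < 500)    # evaluates False & True
```

With the first row's values (30 and 40), the unparenthesized form comes out true while the intended condition is false, so the parentheses matter independently of the Series issue.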
Update
> Do this with a def function
def test_func(col1, col2):
    m = (col1 > 30) & (col2 < 500)
    return np.where(m, 'approved', 'declined')

# You don't need apply here
df['tran_stat'] = test_func(df['tran_amt_usd'], df['tran_amt_lcy'])
</details>