How to apply a def function with two parameters to pandas DataFrame columns

Question

Could someone please tell me how to apply a function with two parameters to a DataFrame? I have tried many solutions without success. Here is my code:

import pandas as pd

df=pd.DataFrame({'tran_amt_lcy':[40,500,60],'tran_amt_usd':[30,40,50],'client_id':['2001','2033','2045']})

df.dtypes

def test_func(col1,col2):
    if col1>30 & col2<500:
        tran_status='approved'
    else:
        tran_status='declined'
    return tran_status

df['tran_stat']=df.apply(lambda x:test_func(df['tran_amt_usd'],df['tran_amt_lcy']),axis=1)

I still get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I don't know why it still fails. Can anyone suggest a way to make this work?

Thanks a lot.

Answer 1

Score: 1

For a binary condition, you can use numpy.where:

import numpy as np

# Boolean mask
m = (df['tran_amt_usd'] > 30) & (df['tran_amt_lcy'] < 500)
df['tran_stat'] = np.where(m, 'approved', 'declined')
print(df)

# Output
   tran_amt_lcy  tran_amt_usd client_id tran_stat
0            40            30      2001  declined
1           500            40      2033  declined
2            60            50      2045  approved
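If more than two outcomes are ever needed, numpy.select is the usual generalization of numpy.where. The sketch below is only an illustration: the second rule and the 'review' label are hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'tran_amt_lcy': [40, 500, 60],
    'tran_amt_usd': [30, 40, 50],
})

# Conditions are evaluated in order; the first one that matches wins.
conditions = [
    (df['tran_amt_usd'] > 30) & (df['tran_amt_lcy'] < 500),
    df['tran_amt_usd'] > 30,  # hypothetical rule: USD limit met, LCY too high
]
labels = ['approved', 'review']  # 'review' is a made-up label for illustration

# Rows that match no condition get the default value.
df['tran_stat'] = np.select(conditions, labels, default='declined')
print(df)
```

Like np.where, this works on whole columns at once, so no apply is involved.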

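Many posts explain this error. Comparing a Series with a scalar is fine and returns a boolean Series, but an if statement needs a single True or False, and pandas refuses to guess one from several values. Your lambda passes the whole columns (df['tran_amt_usd'] and df['tran_amt_lcy']) instead of the values of the current row, so test_func ends up trying to evaluate:

([30, 40, 50] > 30) & ([40, 500, 60] < 500)

Note also that & binds more tightly than > and <, so each comparison needs its own parentheses.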
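A minimal sketch of where the exception comes from, using a cut-down copy of the question's data (the variable names raised, message, any_approved and all_approved are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'tran_amt_lcy': [40, 500, 60],
    'tran_amt_usd': [30, 40, 50],
})

# Comparing a Series with a scalar works element-wise and yields a boolean Series.
mask = (df['tran_amt_usd'] > 30) & (df['tran_amt_lcy'] < 500)

# An `if` needs one True/False; a multi-element Series cannot provide it.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Reducing the mask to a single value removes the ambiguity.
any_approved = bool(mask.any())
all_approved = bool(mask.all())
```

The any() and all() reductions answer a different question (is any or every row approved?), which is why they are not a fix for labelling each row.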

Update

> Do this with a def function

def test_func(col1, col2):
    m = (col1 > 30) & (col2 < 500)
    return np.where(m, 'approved', 'declined')

# You don't need apply here
df['tran_stat'] = test_func(df['tran_amt_usd'], df['tran_amt_lcy'])


huangapple
  • Posted on February 7, 2023, 01:04:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75364399.html