Applying function based on condition on pandas dataframe series

# Question

I am new to pandas.

My dataframe:

**df**

    A       B
    first   True
    second  False
    third   False
    fourth  True
    fifth   False

**Desired output**

    A       B      C
    first   True   en
    second  False
    third   False
    fourth  True   en
    fifth   False

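I am trying to apply a function to column `C` only when the `B` column is `True`.

**What I use**

```python
# This `if` receives a whole Series, which is what triggers the error below.
if (df['B'] == True):
    df['C'] = df['A'].apply(
        lambda x: TextBlob(x).detect_language())
```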

But I get an error:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

**What I've tried**

    df['B'].bool()
    df['B'] is True
    df['B'] == 'True'

But the error persists, and I am not sure how to form a statement saying 'only where column B is True'.

Thank you for your suggestions.
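
For context, the `ValueError` quoted above is what pandas raises whenever a whole Series is used where a single boolean is expected, as in an `if` statement. A minimal sketch of that behaviour, assuming a frame rebuilt by hand from the table in the question (the names `mask`, `raised` and `selected` are illustrative only):

```python
import pandas as pd

# Frame rebuilt by hand from the question's table (an assumption).
df = pd.DataFrame({
    "A": ["first", "second", "third", "fourth", "fifth"],
    "B": [True, False, False, True, False],
})

# The comparison is element-wise, so it yields one boolean per row,
# not a single True/False.
mask = df["B"] == True
print(mask.tolist())

# `if` needs exactly one boolean, so handing it a whole Series raises.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# A boolean Series is meant to be used for row selection instead.
selected = df.loc[mask, "A"]
print(selected.tolist())
```

Seen this way, "only where column B is True" is a row selection rather than a branch, so the mask belongs inside `df.loc[...]` and not in an `if`.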

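# Answer 1

**Score**: 3

If you want missing values in the rows that do not match, filter the rows before calling `apply`, so that only the rows where `B` is `True` are processed:

```python
from textblob import TextBlob

# Only rows where B is True reach apply; the assignment aligns on the
# index, so the remaining rows are filled with NaN.
df['C'] = df.loc[df['B'], 'A'].apply(lambda x: TextBlob(x).detect_language())
print(df)
#         A      B    C
# 0   first   True   en
# 1  second  False  NaN
# 2   third  False  NaN
# 3  fourth   True   en
# 4   fifth  False  NaN
```

Or, if you need empty strings for the non-matching rows, use `np.where`. Note that `apply` then processes every row of `A`, not only the rows where `B` is `True`:

```python
import numpy as np

# apply runs on all rows; np.where keeps the result only where B is True.
df['C'] = np.where(df['B'], df['A'].apply(lambda x: TextBlob(x).detect_language()), '')
print(df)
#         A      B   C
# 0   first   True  en
# 1  second  False
# 2   third  False
# 3  fourth   True  en
# 4   fifth  False
```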
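
The two variants above differ in how many rows the function actually visits. A rough sketch of that difference, assuming a hypothetical stand-in `fake_detect` in place of `TextBlob(x).detect_language()` (which relies on an external translation service and has since been deprecated in TextBlob), and a frame rebuilt by hand from the question. The column names `C_filtered` and `C_where` are made up for the comparison:

```python
import numpy as np
import pandas as pd

# Frame rebuilt by hand from the question's table (an assumption).
df = pd.DataFrame({
    "A": ["first", "second", "third", "fourth", "fifth"],
    "B": [True, False, False, True, False],
})

calls = []  # records every value the stand-in function is called with


def fake_detect(text):
    """Hypothetical stand-in for TextBlob(text).detect_language()."""
    calls.append(text)
    return "en"


# Variant 1: filter first, then apply. Only rows where B is True are
# processed; index alignment leaves NaN in the other rows.
df["C_filtered"] = df.loc[df["B"], "A"].apply(fake_detect)
seen_filtered = list(calls)

# Variant 2: apply to every row, then keep the result only where B is True.
calls.clear()
df["C_where"] = np.where(df["B"], df["A"].apply(fake_detect), "")
seen_where = list(calls)

print(df)
print(len(seen_filtered), "calls when filtering first")
print(len(seen_where), "calls with np.where")
```

When the applied function is slow or calls a remote service, filtering first is usually the cheaper choice, and the `NaN` values can be replaced afterwards with `fillna('')` if empty strings are needed.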

huangapple
  • Posted on January 6, 2020, 18:33:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610513.html