August 4, 2023, 01:06:25

Counting the number of times 3 different strings appear over numerous columns and putting this count in a new column

Question

I'm trying to create a new column that tallies the number of times someone was paid for a job, regardless of whether they received all of the money or only some of it. For each row, if a job column says "yes", "partial", or "paid", I want that occurrence counted in the new column.

My actual data has 15 different job columns that I want to "sum" across.

The data currently looks like this:

Name	Job1	Job2
tom	Yes	No
nick	Partial	Yes
juli	No	No

And I'd like it to look like this afterwards:

Name	Job1	Job2	Received_money
tom	Yes	No	1
nick	Partial	Yes	2
juli	No	No	0

Current code

df['Received_money'] = df[['Job1', 'Job2']].apply(lambda row: len(row[row == 'Yes']), axis=1)

This is my current code, and it partially does what I want: it counts the number of times "Yes" appears in the listed columns. But:

1. I can't figure out how to expand this to include "== 'partial'" and "== 'paid'", so that each occurrence adds one point (so to speak) to the count.
2. Is there a way to specify all 15 of my column names without typing out [['Job1', 'Job2', 'Job3', 'Job4', 'Job5'....'Job15' ]]?

(Example data)

import pandas as pd

# initialize list of lists
data = [['tom', "Yes", "No"], ['nick', "Partial", "Yes"], ['juli', "No", "No"]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Job1', 'Job2'])

Thank you!

Answer 1

Score: 1

Your approach seems okay. Just add the other filters like this:

df['Received_money'] = df[['Job1', 'Job2']].apply(lambda row: len(row[row == 'Yes']) + len(row[row == 'Partial']), axis=1)
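The line above counts "Yes" and "Partial" but not "Paid", which the question also asks for. One possible way to cover all three without chaining more len(...) terms is to test each row against a list of accepted values. This is only a sketch: the name paid_values and the fourth row ("anna") are hypothetical additions used to exercise the "Paid" case, and the other rows come from the question's example data.

```python
import pandas as pd

# Rows for tom, nick and juli come from the question; "anna" is a
# hypothetical extra row added only to exercise the "Paid" value.
data = [
    ["tom", "Yes", "No"],
    ["nick", "Partial", "Yes"],
    ["juli", "No", "No"],
    ["anna", "Paid", "No"],
]
df = pd.DataFrame(data, columns=["Name", "Job1", "Job2"])

# Hypothetical name: every value that should count as "received money".
paid_values = ["Yes", "Partial", "Paid"]

# Still row-wise apply, as in the answer, but one membership test per row
# replaces one len(...) term per accepted value.
df["Received_money"] = df[["Job1", "Job2"]].apply(
    lambda row: int(row.isin(paid_values).sum()), axis=1
)
print(df)
```

Adding another accepted value then only means extending the list.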

Answer 2

Score: 1

I added 2 more Job columns just for testing. This should satisfy your requirement.

import pandas as pd

# Sample data with two extra job columns; nick's row is shorter than the others
data = [['tom', "Yes", "No", "Partial", "Paid"], ['nick', "Partial", "Yes"], ['juli', "No", "No", "Partial", "Paid"]]
df = pd.DataFrame(data, columns=['Name', 'Job1', 'Job2', 'Job3', 'Job4'])
job_cols = ['Job1', 'Job2', 'Job3', 'Job4']
paid_values = ['Yes', 'Paid', 'Partial']
# Count the entries in each row that appear in paid_values
df['Received_money'] = df[job_cols].apply(lambda row: len([r for r in row if r in paid_values]), axis=1)
print(df)
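For the second part of the question (avoiding typing out 15 column names), the job_cols list could be generated rather than written by hand. The sketch below assumes the columns really are named 'Job1' through 'Job15', as the question suggests. The two-row frame is made up purely for illustration.

```python
import pandas as pd

# Build 'Job1' ... 'Job15' instead of typing each name by hand.
job_cols = [f"Job{i}" for i in range(1, 16)]

# Hypothetical two-row frame: every job defaults to "No",
# then a few cells are overridden so there is something to count.
columns = {col: ["No", "No"] for col in job_cols}
columns["Job1"] = ["No", "Yes"]
columns["Job3"] = ["Paid", "No"]
columns["Job15"] = ["No", "Partial"]
df = pd.DataFrame({"Name": ["tom", "nick"], **columns})

paid_values = ["Yes", "Paid", "Partial"]

# Same counting logic as in the answer, applied to the generated column list.
df["Received_money"] = df[job_cols].apply(
    lambda row: len([r for r in row if r in paid_values]), axis=1
)
print(df[["Name", "Received_money"]])
```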

Answer 3

Score: 1

Don't use apply; you can easily vectorize this:

df['Received_money'] = df.filter(like='Job').isin(['Yes', 'Partial']).sum(axis=1)

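Or, if the names of the Job columns don't contain a literal "Job" (filter(like='Job') matches that substring anywhere in the column name), list them explicitly: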

cols = ['Job1', 'Job2']
df['Received_money'] = df[cols].isin(['Yes', 'Partial']).sum(axis=1)

Output:

   Name     Job1 Job2  Received_money
0   tom      Yes   No               1
1  nick  Partial  Yes               2
2  juli       No   No               0
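The snippets in this answer list only 'Yes' and 'Partial', and isin compares strings case-sensitively. The question also mentions "paid" and writes the values in lowercase, so a variant that lowercases the cells before matching may be closer to what is needed. This is a sketch under assumptions: the "anna" row is hypothetical, and whether the real data mixes upper and lower case is not known from the question.

```python
import pandas as pd

# First three rows are the question's data; "anna" is a hypothetical row
# with a lowercase "paid" to show the case-insensitive match.
data = [
    ["tom", "Yes", "No"],
    ["nick", "Partial", "Yes"],
    ["juli", "No", "No"],
    ["anna", "paid", "No"],
]
df = pd.DataFrame(data, columns=["Name", "Job1", "Job2"])

# Select the job columns by substring, then lowercase every cell.
jobs = df.filter(like="Job")
normalized = jobs.apply(lambda col: col.str.lower())

# isin yields a boolean frame; summing across axis=1 counts the True
# cells in each row.
df["Received_money"] = normalized.isin(["yes", "partial", "paid"]).sum(axis=1)
print(df)
```

If the data is known to use consistent capitalization, the lowercasing step can be dropped and 'Paid' simply added to the list passed to isin.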
