Python(pandas)- 计算列中的出现次数

huangapple go评论65阅读模式
英文:

Python (pandas) - Count Occurrences in Column

问题

我有一个数据框,想要创建一个新列来计算每行中的Participants数量。是否有一种方法可以做到这一点?

数据: invoice_df

Order Id,Date,Meal Id,Company Id,Date of Meal,Participants,Meal Price,Type of Meal
839FKFW2LLX4LMBB,27-05-2016,INBUX904GIHI8YBD,LJKS5NK6788CYMUU,2016-05-31 07:00:00+02:00,['David Bishop'],469,Breakfast
97OX39BGVMHODLJM,27-09-2018,J0MMOOPP709DIDIE,LJKS5NK6788CYMUU,2018-10-01 20:00:00+02:00,['David Bishop'],22,Dinner
041ORQM5OIHTIU6L,24-08-2014,E4UJLQNCI16UX5CS,LJKS5NK6788CYMUU,2014-08-23 14:00:00+02:00,['Karen Stansell'],314,Lunch
YT796QI18WNGZ7ZJ,12-04-2014,C9SDFHF7553BE247,LJKS5NK6788CYMUU,2014-04-07 21:00:00+02:00,['Addie Patino'],438,Dinner
6YLROQT27B6HRF4E,28-07-2015,48EQXS6IHYNZDDZ5,LJKS5NK6788CYMUU,2015-07-27 14:00:00+02:00,['Addie Patino' 'Susan Guerrero'],690,Lunch
AT0R4DFYYAFOC88Q,21-07-2014,W48JPR1UYWJ18NC6,LJKS5NK6788CYMUU,2014-07-17 20:00:00+02:00,['David Bishop' 'Susan Guerrero' 'Karen Stansell'],181,Dinner
2DDN2LHS7G85GKPQ,29-04-2014,1MKLAKBOE3SP7YUL,LJKS5NK6788CYMUU,2014-04-30 21:00:00+02:00,['Susan Guerrero' 'David Bishop'],14,Dinner
FM608JK1N01BPUQN,08-05-2014,E8WJZ1FOSKZD2MJN,36MFTZOYMTAJP1RK,2014-05-07 09:00:00+02:00,['Amanda Knowles' 'Cheryl Feaster' 'Ginger Hoagland' 'Michael White'],320,Breakfast
CK331XXNIBQT81QL,23-05-2015,CTZSFFKQTY7SBZ4J,36MFTZOYMTAJP1RK,2015-05-18 13:00:00+02:00,['Cheryl Feaster' 'Amanda Knowles' 'Ginger Hoagland'],697,Lunch
FESGKOQN2OZZWXY3,10-01-2016,US0NQYNNHS1SQJ4S,36MFTZOYMTAJP1RK,2016-01-14 22:00:00+01:00,['Glenn Gould' 'Amanda Knowles' 'Ginger Hoagland' 'Michael White'],451,Dinner
YITOTLOF0MWZ0VYX,03-10-2016,RGYX8772307H78ON,36MFTZOYMTAJP1RK,2016-10-01 22:00:00+02:00,['Ginger Hoagland' 'Amanda Knowles' 'Michael White'],263,Dinner
8RIGCF74GUEQHQEE,23-07-2018,5XK0KTFTD6OAP9ZP,36MFTZOYMTAJP1RK,2018-07-27 08:00:00+02:00,['Amanda Knowles'],210,Breakfast
TH60C9D8TPYS7DGG,15-12-2016,KDSMP2VJ22HNEPYF,36MFTZOYMTAJP1RK,2016-12-13 08:00:00+01:00,['Cheryl Feaster' 'Bret Adams' 'Ginger Hoagland'],755,Breakfast
W1Y086SRAVUZU1AL,17-09-2017,8IUOYVS031QPROUG,36MFTZOYMTAJP1RK,2017-09-14 13:00:00+02:00,['Bret Adams'],469,Lunch
WKB58Q8BHLOFQAB5,31-08-2016,E2K2TQUMENXSI9RP,36MFTZOYMTAJP1RK,2016-09-03 14:00:00+02:00,['Michael White' 'Ginger Hoagland' 'Bret Adams'],502,Lunch
N8DOG58MW238BHA9,25-12-2018,KFR2TAYXZSVCHAA2,36MFTZOYMTAJP1RK,2018-12-20 12:00:00+01:00,['Ginger Hoagland' 'Cheryl Feaster' 'Glenn Gould' 'Bret Adams'],829,Lunch
DPDV9UGF0SUCYTGW,25-05-2017,6YV61SH7W9ECUZP0,36MFTZOYMTAJP1RK,2017-05-24 22:00:00+02:00,['Michael White'],708,Dinner
KNF3E3QTOQ22J269,20-06-2018,737T2U7604ABDFDF,36MFTZOYMTAJP1RK,

<details>
<summary>英文:</summary>

I have a data frame and want to create a new column to count the number of `Participants` there are in each row. Is there a way to do this?

**Data:** invoice_df

Order Id,Date,Meal Id,Company Id,Date of Meal,Participants,Meal Price,Type of Meal
839FKFW2LLX4LMBB,27-05-2016,INBUX904GIHI8YBD,LJKS5NK6788CYMUU,2016-05-31 07:00:00+02:00,['David Bishop'],469,Breakfast
97OX39BGVMHODLJM,27-09-2018,J0MMOOPP709DIDIE,LJKS5NK6788CYMUU,2018-10-01 20:00:00+02:00,['David Bishop'],22,Dinner
041ORQM5OIHTIU6L,24-08-2014,E4UJLQNCI16UX5CS,LJKS5NK6788CYMUU,2014-08-23 14:00:00+02:00,['Karen Stansell'],314,Lunch
YT796QI18WNGZ7ZJ,12-04-2014,C9SDFHF7553BE247,LJKS5NK6788CYMUU,2014-04-07 21:00:00+02:00,['Addie Patino'],438,Dinner
6YLROQT27B6HRF4E,28-07-2015,48EQXS6IHYNZDDZ5,LJKS5NK6788CYMUU,2015-07-27 14:00:00+02:00,['Addie Patino' 'Susan Guerrero'],690,Lunch
AT0R4DFYYAFOC88Q,21-07-2014,W48JPR1UYWJ18NC6,LJKS5NK6788CYMUU,2014-07-17 20:00:00+02:00,['David Bishop' 'Susan Guerrero' 'Karen Stansell'],181,Dinner
2DDN2LHS7G85GKPQ,29-04-2014,1MKLAKBOE3SP7YUL,LJKS5NK6788CYMUU,2014-04-30 21:00:00+02:00,['Susan Guerrero' 'David Bishop'],14,Dinner
FM608JK1N01BPUQN,08-05-2014,E8WJZ1FOSKZD2MJN,36MFTZOYMTAJP1RK,2014-05-07 09:00:00+02:00,['Amanda Knowles' 'Cheryl Feaster' 'Ginger Hoagland' 'Michael White'],320,Breakfast
CK331XXNIBQT81QL,23-05-2015,CTZSFFKQTY7SBZ4J,36MFTZOYMTAJP1RK,2015-05-18 13:00:00+02:00,['Cheryl Feaster' 'Amanda Knowles' 'Ginger Hoagland'],697,Lunch
FESGKOQN2OZZWXY3,10-01-2016,US0NQYNNHS1SQJ4S,36MFTZOYMTAJP1RK,2016-01-14 22:00:00+01:00,['Glenn Gould' 'Amanda Knowles' 'Ginger Hoagland' 'Michael White'],451,Dinner
YITOTLOF0MWZ0VYX,03-10-2016,RGYX8772307H78ON,36MFTZOYMTAJP1RK,2016-10-01 22:00:00+02:00,['Ginger Hoagland' 'Amanda Knowles' 'Michael White'],263,Dinner
8RIGCF74GUEQHQEE,23-07-2018,5XK0KTFTD6OAP9ZP,36MFTZOYMTAJP1RK,2018-07-27 08:00:00+02:00,['Amanda Knowles'],210,Breakfast
TH60C9D8TPYS7DGG,15-12-2016,KDSMP2VJ22HNEPYF,36MFTZOYMTAJP1RK,2016-12-13 08:00:00+01:00,['Cheryl Feaster' 'Bret Adams' 'Ginger Hoagland'],755,Breakfast
W1Y086SRAVUZU1AL,17-09-2017,8IUOYVS031QPROUG,36MFTZOYMTAJP1RK,2017-09-14 13:00:00+02:00,['Bret Adams'],469,Lunch
WKB58Q8BHLOFQAB5,31-08-2016,E2K2TQUMENXSI9RP,36MFTZOYMTAJP1RK,2016-09-03 14:00:00+02:00,['Michael White' 'Ginger Hoagland' 'Bret Adams'],502,Lunch
N8DOG58MW238BHA9,25-12-2018,KFR2TAYXZSVCHAA2,36MFTZOYMTAJP1RK,2018-12-20 12:00:00+01:00,['Ginger Hoagland' 'Cheryl Feaster' 'Glenn Gould' 'Bret Adams'],829,Lunch
DPDV9UGF0SUCYTGW,25-05-2017,6YV61SH7W9ECUZP0,36MFTZOYMTAJP1RK,2017-05-24 22:00:00+02:00,['Michael White'],708,Dinner
KNF3E3QTOQ22J269,20-06-2018,737T2U7604ABDFDF,36MFTZOYMTAJP1RK,2018-06-15 07:00:00+02:00,['Glenn Gould' 'Cheryl Feaster' 'Ginger Hoagland' 'Amanda Knowles'],475,Breakfast
LEED1HY47M8BR5VL,22-10-2017,I22P10IQQD06MO45,36MFTZOYMTAJP1RK,2017-10-22 14:00:00+02:00,['Glenn Gould'],27,Lunch
LSJPNJQLDTIRNWAL,27-01-2017,247IIVNN6CXGWINB,36MFTZOYMTAJP1RK,2017-01-23 13:00:00+01:00,['Amanda Knowles' 'Bret Adams'],672,Lunch
6UX5RMHJ1GK1F9YQ,24-08-2014,LL4AOPXDM8V5KP5S,H3JRC7XX7WJAD4ZO,2014-08-27 12:00:00+02:00,['Anthony Emerson' 'Irvin Gentry' 'Melba Inlow'],552,Lunch
5SYB15QEFWD1E4Q4,09-07-2017,KZI0VRU30GLSDYHA,H3JRC7XX7WJAD4ZO,2017-07-13 08:00:00+02:00,"['Anthony Emerson' 'Emma Steitz' 'Melba Inlow' 'Irvin Gentry'
'Kelly Killebrew']",191,Breakfast
W5S8VZ61WJONS4EE,25-03-2017,XPSPBQF1YLIG26N1,H3JRC7XX7WJAD4ZO,2017-03-25 07:00:00+01:00,['Irvin Gentry' 'Kelly Killebrew'],471,Breakfast
795SVIJKO8KS3ZEL,05-01-2015,HHTLB8M9U0TGC7Z4,H3JRC7XX7WJAD4ZO,2015-01-06 22:00:00+01:00,['Emma Steitz'],588,Dinner
8070KEFYSSPWPCD0,05-08-2014,VZ2OL0LREO8V9RKF,H3JRC7XX7WJAD4ZO,2014-08-09 12:00:00+02:00,['Lewis Eyre'],98,Lunch
RUQOHROBGBOSNUO4,10-06-2016,R3LFUK1WFDODC1YF,H3JRC7XX7WJAD4ZO,2016-06-09 08:00:00+02:00,['Anthony Emerson' 'Kelly Killebrew' 'Lewis Eyre'],516,Breakfast
6P91QRADC2O9WOVT,25-09-2016,L2F2HEGB6Q141080,H3JRC7XX7WJAD4ZO,2016-09-26 07:00:00+02:00,"['Kelly Killebrew' 'Lewis Eyre' 'Irvin Gentry' 'Emma Steitz'
'Anthony Emerson']",664,Breakfast


</details>


# 答案1
**得分**: 1

是的,你可以这样做。如果“Participants”列包含Python列表,你可以计算每行列表的长度:

```python
df['num_participants'] = df['Participants'].apply(lambda x: len(x))

然而,如果你直接读取CSV文件,该列将被视为文本,而不是列表,我们将得到字符的数量,而不是列表的长度。
因为你的数据没有用逗号分隔列表中的值,你可以计算单引号的数量并除以2:

df['num_participants'] = df['Participants'].apply(lambda x: x.count("&#39;")/2)
英文:

Yes you can do that. If the Participants column would contain python lists you could calculate the length of that list for each row:

df[&#39;num_participants&#39;] = df[&#39;Participants&#39;].apply(lambda x: len(x))

However, if you read the csv directly the column will be text and we will get the number of characters instead of the list length.
Because your data does not have comma's separating the values in the list, you could count apostrophes and divide by 2:

df[&#39;num_participants&#39;] = df[&#39;Participants&#39;].apply(lambda x: x.count(&quot;&#39;&quot;)/2)

答案2

得分: 0

它们似乎只是list,所以您应该可以通过将len应用于它们来获得所需的结果,考虑以下简单示例:

import pandas as pd

import pandas as pd
df = pd.DataFrame({'col1': [['A'],['A','B'],['A','B','C']]})
df['col1cnt'] = df.col1.apply(len)
print(df)

输出结果为:

        col1  col1cnt
0        [A]        1
1     [A, B]        2
2  [A, B, C]        3
英文:

They, seems to be just list so you should be able to get desired result by applying len to them, consider following simple example
import pandas as pd

import pandas as pd
df = pd.DataFrame({&#39;col1&#39;: [[&#39;A&#39;],[&#39;A&#39;,&#39;B&#39;],[&#39;A&#39;,&#39;B&#39;,&#39;C&#39;]]})
df[&#39;col1cnt&#39;] = df.col1.apply(len)
print(df)

gives output

        col1  col1cnt
0        [A]        1
1     [A, B]        2
2  [A, B, C]        3

答案3

得分: 0

你可以创建一个新列来计算每行参与者的数量,方法如下:

invoice_df['participant_count'] = invoice_df['Participants'].apply(lambda x: len(x))

apply函数会将一个自定义函数应用到Participants列中的每个元素上。这个函数会返回列表的长度,并将其作为新列的值。

英文:

You can create a new column to count the number of participants in each row as follows:

invoice_df[&#39;participant_count&#39;] = invoice_df[&#39;Participants&#39;].apply(lambda x: len(x))

The apply function applies a custom function to each element in the Participants column. The function returns the length of the list and puts it as the value of the new column.

huangapple
  • 本文由 发表于 2023年2月8日 23:18:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/75387889.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定