June 30, 2023 04:17:31

How to calculate the total count of characters in a Python set

Question

I initially had one question, but a second one came up while I was writing this.
I have a dataset as follows:

I used the following code to group the data shown above (apologies for not including the id column):

dff = df_new.groupby('id').agg({'brand':'first','title':'first','generic_keywords':'first',
                                  'revised_keywords': lambda x: set(x)})

My first question: how can I get 'revised_keywords' without a space after each comma?
The expected output would look something like this (showing just the first row):

{highland cow christmas ornament,cow tree ornament,highland cow christmas decoration,highland cow ornament,cow ornament tree}

Now for my second and main question: once the rows are updated as above (without spaces after the commas), how can I calculate the total character count for each row?
len() only gives me the number of strings in each set.

So the output looks something like this:

5
16
19

But the expected output would look something like this:

123
261
347

(calculated in Excel, leaving out the space after each comma but counting the spaces between words and the commas themselves)

Can I please get some help with this?

TIA

Answer 1

Score: 1

The first issue, the space after each comma, can be resolved with a small change to your lambda function: use ','.join(set(x)). A set is an object, so printing it shows its default representation, which separates the elements with a comma followed by a space.
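
As a rough illustration of the difference, the sketch below compares the default set representation with the joined string. The keywords are taken from the first row in the question; the variable names are made up for this example, and sorted() is an addition used only to make the order reproducible, since sets are unordered.

```python
# Keywords from the first row of the question, held in a set as the
# groupby/agg lambda in the question would produce them.
keywords = {
    "highland cow christmas ornament",
    "cow tree ornament",
    "highland cow christmas decoration",
    "highland cow ornament",
    "cow ornament tree",
}

# str() of a set gives braces, quoted elements, and ", " between them.
as_repr = str(keywords)

# ",".join() builds a plain string with nothing but a comma between elements.
# sorted() is not required; it only fixes the element order for this demo.
joined = ",".join(sorted(keywords))

print(as_repr)
print(joined)
```

In the aggregation from the question, the same idea would mean replacing lambda x: set(x) with lambda x: ','.join(set(x)), so the column holds one string per row instead of a set.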

For the count issue, you can use

.apply(lambda x: sum(len(word) for words in x for word in words.split(',')))

This counts the characters in the keywords themselves, not including the comma separators.
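
Since the expected figures in the question include the commas, a variant that measures the joined string may be closer to what was asked. The following is a minimal sketch under that assumption, using the first row's keywords; total_chars and chars_without_commas are hypothetical helper names, not part of pandas.

```python
def total_chars(keywords):
    """Length of the comma-joined string, comma separators included."""
    return len(",".join(keywords))


def chars_without_commas(keywords):
    """Sum of the individual keyword lengths, separators excluded."""
    return sum(len(keyword) for keyword in keywords)


# First row from the question.
row = {
    "highland cow christmas ornament",
    "cow tree ornament",
    "highland cow christmas decoration",
    "highland cow ornament",
    "cow ornament tree",
}

print(total_chars(row))           # 123, matching the expected value in the question
print(chars_without_commas(row))  # 119, four fewer because the 4 commas are left out
```

If the column already holds the joined strings, dff['revised_keywords'].str.len() should give the same per-row count as total_chars.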

