June 1, 2023 05:55:59

Add a comma-separated count from a list of strings

Question


I have a column named diff_2 in my df, which is of this form:

diff_2
error at /Users, in API GET /projects/{projectId} the response property 'id' became optional for the status '200' [response-property-became-optional].

But what I want to achieve is a count for both types of changes, comma-separated.

I am not sure how this can be done, any suggestions or ideas would be really helpful.
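The answer below groups on a `content` column holding the change type, but it does not show how that column is built. A minimal sketch, assuming every `diff_2` message ends with the change type in square brackets (as in the example row above), uses `Series.str.extract` to pull it out. The second sample row is hypothetical.

```python
import pandas as pd

# Sample rows shaped like the diff_2 example; the second message is hypothetical.
df = pd.DataFrame({
    "diff_2": [
        "error at /Users, in API GET /projects/{projectId} the response property 'id' "
        "became optional for the status '200' [response-property-became-optional].",
        "hypothetical message text [api-path-removed-without-deprecation].",
    ]
})

# Capture the text inside the last [...] pair (no '[' may follow it);
# expand=False returns a Series because there is a single capture group.
df["content"] = df["diff_2"].str.extract(r"\[([^\[\]]+)\][^\[]*$", expand=False)
print(df["content"].tolist())
```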

Answer 1

Score: 1


Here is an updated answer which includes combining the count results and adding them back into the original data frame.

I have divided the solution into three stages for clarity:

1. Count the different error messages in the content column.
2. Merge the error messages and counts into comma-separated strings.
3. Add the results back into the original DataFrame as two columns with constant values.
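For comparison, the three stages listed above can be compressed into a few lines with `Series.value_counts`. This is a swapped-in technique rather than the groupby approach used in this answer, and it is only a sketch assuming `df` already has a `content` column; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data: two change types with different frequencies.
df = pd.DataFrame({
    "content": ["response-property-became-optional",
                "api-path-removed-without-deprecation",
                "api-path-removed-without-deprecation"],
})

# Stage 1: count each change type; sort_index() mirrors groupby's sorted key order.
counts = df["content"].value_counts().sort_index()

# Stage 2: join the labels and the counts into comma-separated strings.
merged_content = ",".join(counts.index)
merged_count = ",".join(str(n) for n in counts)

# Stage 3: assigning a scalar broadcasts the string to every row.
df["Merged_content"] = merged_content
df["Merged_count"] = merged_count
print(df)
```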

Stage 1: Do the counting.

# Count the number of rows for each distinct content item.
cnt = df.groupby('content').count()
# My example only has the `diff_2` and `content` columns.
# If your dataframe has additional columns, they should be stripped like this:
cnt = cnt.loc[:, ['diff_2']].copy()
# Label the count column appropriately.
cnt.columns = ['count']
# Move the `content` column from the index to a column.
cnt.reset_index(inplace=True)
# Convert the columns to the string data type so that they can be combined.
cnt = cnt.astype(str)
print('Stage 1')
print(cnt)

The counting results look like this:

	content	count
0	api-path-removed-without-deprecation	14
1	response-property-became-optional	2
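`groupby(...).count()` counts the non-null values in every remaining column, which is why the extra columns have to be stripped. A sketch of the same Stage 1 table using `groupby(...).size()`, which counts rows per group directly and ignores other columns; the sample frame is hypothetical (a 3-to-1 split rather than the 14/2 shown above).

```python
import pandas as pd

# Hypothetical frame: three rows of one change type, one of another,
# plus an unrelated column containing a missing value.
df = pd.DataFrame({
    "diff_2": ["msg a", "msg b", "msg c", "msg d"],
    "content": [
        "api-path-removed-without-deprecation",
        "api-path-removed-without-deprecation",
        "api-path-removed-without-deprecation",
        "response-property-became-optional",
    ],
    "other": [1, None, 3, 4],
})

# size() counts rows per group regardless of NaNs, so no column stripping is needed.
cnt = df.groupby("content").size().reset_index(name="count")
# Convert to strings so the columns can be joined in Stage 2.
cnt = cnt.astype(str)
print(cnt)
```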

Stage 2: Merge the results into comma-delimited strings.

# A simple function that takes a Series of strings and joins them with a comma separator.
def combine(srs):
    return srs.str.cat(sep=',')

# Apply this function to each column of the count DataFrame.
combined = cnt.apply(combine)
print()
print('Stage 2')
print(combined)

The Stage 2 result is a Series in which each value is the comma-joined contents of one column of cnt.
The index of the Series holds the column names of cnt.

The results look like this:

Index	Value
content	api-path-removed-without-deprecation,response-...
count	14,2
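`DataFrame.apply` passes each column to the function as a Series by default (`axis=0`), which is why the Stage 2 result is indexed by column name. A self-contained sketch rebuilding the Stage 1 table from the values shown above and applying the same `combine` step; `",".join` is shown as an equivalent for all-string columns.

```python
import pandas as pd

# The Stage 1 table from above, already converted to strings.
cnt = pd.DataFrame({
    "content": ["api-path-removed-without-deprecation",
                "response-property-became-optional"],
    "count": ["14", "2"],
})

def combine(srs: pd.Series) -> str:
    """Join all values of one column into a single comma-separated string."""
    return srs.str.cat(sep=",")

# apply() works column by column (axis=0), so the result's index is the column names.
combined = cnt.apply(combine)

# For all-string columns, ",".join produces the same Series.
assert combined.equals(cnt.apply(",".join))
print(combined)
```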

The Stage 2 result does not depend on the row, so it is the same for every row in the original df.
Each entry in combined can therefore be added to the original df as a column holding that constant value:

Stage 3: Add the results back into the original DataFrame.

for label in combined.index:
    # Add a prefix to the column name to avoid duplicate names.
    col_name = 'Merged_' + label
    # Assign the scalar; pandas broadcasts it to every row of df.
    df[col_name] = combined.at[label]
print()
print('The final stage:\t Merging the results with the original df')
# df.info() prints its own summary and returns None, so it is not wrapped in print().
df.info()

The result is two new columns in df:

	Merged_content	Merged_count
0	api-path-removed-without-deprecation,response-...	14,2
1	api-path-removed-without-deprecation,response-...	14,2
2	api-path-removed-without-deprecation,response-...	14,2
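Assigning a scalar to a new column broadcasts it to every row, which is what produces the repeated values above. A sketch of the same Stage 3 step using `DataFrame.assign`, which returns a new frame instead of modifying `df` in place; the three-row `df` is hypothetical, and `combined` holds the Stage 2 values shown above.

```python
import pandas as pd

# Hypothetical three-row df; only the number of rows matters here.
df = pd.DataFrame({"diff_2": ["msg a", "msg b", "msg c"]})

# The Stage 2 result from above.
combined = pd.Series({
    "content": "api-path-removed-without-deprecation,response-property-became-optional",
    "count": "14,2",
})

# Prefix each label and broadcast each scalar to every row; df itself is unchanged.
merged = df.assign(**{"Merged_" + label: value for label, value in combined.items()})
print(merged[["Merged_content", "Merged_count"]])
```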
