Add comma-separated counts from a list of strings
Question
I have a column named `diff_2` in my df, which is of this form:
diff_2
error at /Users, in API GET /projects/{projectId} the response property 'id' became optional for the status '200' [response-property-became-optional].
But what I want to achieve is a count for both types of changes, comma-separated.
I am not sure how this can be done; any suggestions or ideas would be really helpful.
Answer 1
Score: 1
Here is an updated answer, which includes combining the count results and adding them back into the original data frame.
I have divided the solution into three stages for clarity:
- Count the different error messages in `content`.
- Merge the error messages and counts into comma-separated strings.
- Add the results back into the original dataframe as 2 columns with constant values.
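The stages below assume that df already has a `content` column holding the bracketed tag from each `diff_2` message. This answer does not show how that column was built, so the following is only a minimal sketch of one way it might be derived. The second sample message and the regular expression are assumptions made for illustration, not part of the original post.

```python
import pandas as pd

# Sample data: the first message is the one from the question,
# the second is invented for illustration.
df = pd.DataFrame({
    "diff_2": [
        "error at /Users, in API GET /projects/{projectId} the response property 'id' "
        "became optional for the status '200' [response-property-became-optional].",
        "error at /Users, in API GET /projects the api path was removed "
        "[api-path-removed-without-deprecation].",
    ]
})

# Capture the text inside the last pair of square brackets in each message.
df["content"] = df["diff_2"].str.extract(r"\[([^\[\]]+)\][^\[\]]*$", expand=False)

print(df["content"].tolist())
```

Rows without a bracketed tag would get NaN in `content`, and `groupby` drops NaN keys by default, so such rows would simply not be counted.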
Stage 1: Do the counting.
# Count the number of different content items.
cnt = df.groupby('content').count()
# My example only has the `diff_2` and `content` columns.
# If your dataframe has additional columns they should be stripped like this:
cnt = cnt.loc[:,['diff_2']].copy()
# Label the count column appropriately.
cnt.columns = ['count']
# Move the `content` column from the index to a column.
cnt.reset_index(inplace=True)
# Convert the columns to the string data type so that they can be combined.
cnt = cnt.astype(str)
print('Stage 1')
print(cnt)
The counting results look like this:
| | content | count |
|---|---|---|
| 0 | api-path-removed-without-deprecation | 14 |
| 1 | response-property-became-optional | 2 |
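As a possible shorter alternative to the groupby steps, the same two-column table can be sketched with `Series.value_counts`. The sample data here is hypothetical and only mimics the counts shown above. Note one difference: `value_counts` orders rows by descending count, whereas `groupby` orders them by the `content` key, so the row order can differ on other data.

```python
import pandas as pd

# Hypothetical data that reproduces the counts shown above (14 and 2).
df = pd.DataFrame({
    "content": ["api-path-removed-without-deprecation"] * 14
               + ["response-property-became-optional"] * 2
})

cnt = (
    df["content"]
    .value_counts()              # count occurrences of each distinct tag
    .rename_axis("content")      # name the index so it becomes a proper column
    .reset_index(name="count")   # move the tags out of the index
    .astype(str)                 # strings, so they can be joined later
)

print(cnt)
```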
Stage 2: Merge the results as comma delimited strings.
# Make a simple function that takes a series of strings and combines them with a comma separator.
def combine(srs): return srs.str.cat(sep=',')
# Apply this function to both columns of the count dataframe.
combined = cnt.apply(combine)
print()
print('Stage 2')
print(combined)
- The second stage result is a Series, where each row is a string compilation of one column in cnt.
- The index of the series holds the names of the columns in cnt.
The results look like this:
Index | Value |
---|---|
content | api-path-removed-without-deprecation,response-... |
count | 14,2 |
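To make the merge step concrete, here is a small sketch on a hand-built copy of the stage 1 table. It compares `Series.str.cat` with the plain Python `",".join`, which should give the same result when every value is already a string. The `cnt` frame below is typed in by hand rather than computed.

```python
import pandas as pd

# Hand-built copy of the stage 1 result (all values are strings).
cnt = pd.DataFrame({
    "content": [
        "api-path-removed-without-deprecation",
        "response-property-became-optional",
    ],
    "count": ["14", "2"],
})

# Column-wise concatenation with pandas' string accessor.
combined_cat = cnt.apply(lambda srs: srs.str.cat(sep=","))

# The same idea with the built-in str.join applied to each column.
combined_join = cnt.apply(",".join)

print(combined_cat["count"])
print(combined_cat.equals(combined_join))
```

One design note: `str.cat` skips missing values by default, while `",".join` raises a `TypeError` if a column contains a non-string such as NaN, which is one reason to convert with `astype(str)` first.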
- The stage 2 results should be the same for every row in the original df.
- Each row in `combined` can be added to the original df as a column with a constant value:
Stage 3: Add the results back into the original DataFrame.
for label in combined.index:
    # Add a prefix to the column name to avoid duplicate names.
    col_name = 'Merged_' + label
    # Set the value as a column in df with a constant value for each row.
    df[col_name] = combined.at[label]
print()
print('The final stage:\t Merging the results with the original df')
# df.info() prints its summary itself and returns None, so it is not wrapped in print().
df.info()
The result is two new columns in df:
| | Merged_content | Merged_count |
|---|---|---|
| 0 | api-path-removed-without-deprecation,response-... | 14,2 |
| 1 | api-path-removed-without-deprecation,response-... | 14,2 |
| 2 | api-path-removed-without-deprecation,response-... | 14,2 |
...
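If a loop feels heavy for stage 3, one possible variation is `DataFrame.assign`, which broadcasts a scalar to every row. This is only a sketch: the three-row df and the `combined` Series are typed in by hand to stand in for the real stage 2 output.

```python
import pandas as pd

# Hypothetical stand-ins for the original df and the stage 2 result.
df = pd.DataFrame({
    "content": [
        "api-path-removed-without-deprecation",
        "api-path-removed-without-deprecation",
        "response-property-became-optional",
    ]
})
combined = pd.Series({
    "content": "api-path-removed-without-deprecation,response-property-became-optional",
    "count": "14,2",
})

# Prefix the labels, turn the Series into keyword arguments, and let
# assign() broadcast each string to every row of df.
df = df.assign(**combined.add_prefix("Merged_").to_dict())

print(df[["Merged_content", "Merged_count"]])
```

Unlike the loop, `assign` returns a new DataFrame instead of modifying df in place, which is why the result is bound back to `df`.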