Add comma separated count from list of string

Question

I have a column named diff_2 in my df, which is of this form:

diff_2

error at /Users, in API GET /projects/{projectId} the response property 'id' became optional for the status '200' [response-property-became-optional]. 

But what I want to achieve is a count for both types of changes, as comma-separated values.

I am not sure how this can be done, any suggestions or ideas would be really helpful.

Answer 1

Score: 1

Here is an updated answer which includes combining the count results and adding them back into the original data frame.

I have divided the solution into three stages for clarity:

  1. Count how often each distinct error message appears in the `content` column.
  2. Merge the error messages and the counts into comma-separated strings.
  3. Add the results back into the original DataFrame as two columns with constant values.
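The stages below group on a `content` column that holds each row's change type, and the question's sample does not show that column. If the change type is the bracketed tag in each diff_2 message (for example `[response-property-became-optional]`), which is an assumption, you could derive such a column with `Series.str.extract`. Here is a minimal sketch. The second message is hypothetical.

```python
import pandas as pd

# Sample rows shaped like the question's diff_2 column; the second message is hypothetical.
df = pd.DataFrame({
    "diff_2": [
        "error at /Users, in API GET /projects/{projectId} the response property "
        "'id' became optional for the status '200' [response-property-became-optional].",
        "hypothetical message for another change type [api-path-removed-without-deprecation].",
    ]
})

# Assumption: the change type is the first (and only) [...] tag in each message.
# The capture group keeps the text between the brackets; rows without a tag get NaN.
df["content"] = df["diff_2"].str.extract(r"\[([^\]]+)\]", expand=False)
print(df["content"].tolist())
```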

Stage 1: Do the counting.

# Count the number of different content items.
cnt = df.groupby('content').count()

# My example only has the `diff_2` and `content` columns.
# If your DataFrame has additional columns, they should be stripped like this:
cnt = cnt.loc[:,['diff_2']].copy()

# Label the count column appropriately.
cnt.columns = ['count']

# Move the `content` column from the index to a column.
cnt.reset_index(inplace=True)

# Convert the columns to the string data type so that they can be combined.
cnt = cnt.astype(str)
print('Stage 1')
print(cnt)

The counting results look like this:

                                content count
0  api-path-removed-without-deprecation    14
1     response-property-became-optional     2
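`Series.value_counts` gives a shorter way to build the same Stage 1 table. The sketch below uses a small made-up frame. `value_counts` orders results by frequency and `groupby` orders them by key, so `sort_index()` restores the alphabetical order. The two approaches can also count differently. `value_counts` counts rows, while `groupby(...).count()` counts non-null `diff_2` values, so they diverge if `diff_2` has missing values.

```python
import pandas as pd

# Hypothetical data: three rows of one change type and one row of another.
df = pd.DataFrame({
    "diff_2": ["msg a", "msg b", "msg c", "msg d"],
    "content": [
        "api-path-removed-without-deprecation",
        "api-path-removed-without-deprecation",
        "response-property-became-optional",
        "api-path-removed-without-deprecation",
    ],
})

# Count each distinct change type, then order alphabetically like groupby does.
counts = df["content"].value_counts().sort_index()

# Shape it like `cnt` from Stage 1: a `content` column and a string `count` column.
cnt = counts.rename_axis("content").reset_index(name="count").astype(str)
print(cnt)
```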

Stage 2: Merge the results into comma-delimited strings.

# Make a simple function that takes a Series of strings and joins them with a comma separator.
def combine(srs):
    return srs.str.cat(sep=',')

# Apply this function to both columns of the count dataframe.
combined = cnt.apply(combine)
print()
print('Stage 2')
print(combined)
  • The Stage 2 result is a Series in which each row is the comma-joined string of one column in cnt.
  • The index of the Series holds the column names of cnt.

The results look like this:

Index    Value
content  api-path-removed-without-deprecation,response-...
count    14,2
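The following small sketch shows what `apply` does in Stage 2. It uses a hand-built `cnt` that holds the counts reported above. By default, `DataFrame.apply` calls the function once per column. Each call returns a single string, so the result is a Series indexed by the column names. The sketch also shows an equivalent dict comprehension with `str.join`, which may read more plainly.

```python
import pandas as pd

# Hand-built `cnt` in the shape Stage 1 produces (all values already strings).
cnt = pd.DataFrame({
    "content": ["api-path-removed-without-deprecation", "response-property-became-optional"],
    "count": ["14", "2"],
})

# apply() runs the lambda on each column (axis=0); each call returns one string.
combined = cnt.apply(lambda col: col.str.cat(sep=","))

# The same result built with a dict comprehension and str.join over each column.
same = pd.Series({name: ",".join(col) for name, col in cnt.items()})

print(combined)
```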
  • The Stage 2 result does not depend on any individual row, so it is the same for every row in the original df.
  • Each entry in combined can therefore be added to the original df as a column with a constant value:

Stage 3: Add the results back into the original DataFrame.

for label in combined.index:
    # Add a prefix to the column name to avoid duplicate names
    col_name = 'Merged_' + label
    # Set the value as a column in df with constant values for each row.
    df[col_name] = combined.at[label]
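print()
print('The final stage:\t Merging the results with the original df')
# DataFrame.info() prints its summary itself and returns None, so it is not wrapped in print().
df.info()
print(df[['Merged_content', 'Merged_count']].head())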

The result is two new columns in df:

                                      Merged_content Merged_count
0  api-path-removed-without-deprecation,response-...         14,2
1  api-path-removed-without-deprecation,response-...         14,2
2  api-path-removed-without-deprecation,response-...         14,2
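You could also condense the three stages into one function. This is a sketch and differs from the answer's code in three ways:

- The function name `add_merged_counts` and its `key` parameter are hypothetical.
- `groupby(...).size()` replaces `groupby(...).count()`, so it counts rows rather than non-null `diff_2` values.
- `DataFrame.assign` replaces the loop, so the function returns a new frame instead of modifying `df` in place.

```python
import pandas as pd


def add_merged_counts(df: pd.DataFrame, key: str = "content") -> pd.DataFrame:
    """Return a copy of df with Merged_<key> and Merged_count columns (hypothetical helper)."""
    # Stage 1: number of rows per distinct value of `key`, sorted by that value, as strings.
    cnt = df.groupby(key).size().reset_index(name="count").astype(str)
    # Stage 2: join each column of cnt into one comma-separated string.
    combined = cnt.apply(lambda col: col.str.cat(sep=","))
    # Stage 3: each scalar string is broadcast to every row of the new column.
    return df.assign(**{f"Merged_{label}": value for label, value in combined.items()})


# Usage with made-up data: two rows of one change type, one of another.
df = pd.DataFrame({
    "diff_2": ["m1", "m2", "m3"],
    "content": [
        "response-property-became-optional",
        "api-path-removed-without-deprecation",
        "api-path-removed-without-deprecation",
    ],
})
out = add_merged_counts(df)
print(out[["Merged_content", "Merged_count"]])
```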

...

huangapple
  • Published on June 1, 2023 05:55:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377540.html