DataFrame: count unique list values from a column and add the count as a row

Question

I have the following dataframe:

import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]]
}
df = pd.DataFrame(data)

Output:

        s1     s2
0   [1, 2]  [4, 5]
1      NaN  [6, 7]
2   [2, 3]  [3, 2]

I need the number of unique elements in each of the columns s1 and s2, ignoring None/null values, with each count added as a row like this:

Expected output:

   step  count
0     1      3  # unique values 1, 2, 3 (NaN ignored)
1     2      6  # unique values 2, 3, 4, 5, 6, 7

What I did was a bit dirty:

s1_unique = df['s1'].explode().unique()
s2_unique = df['s2'].explode().unique()
new_df = pd.DataFrame()
new_df['step'] = [1, 2]
new_df['count'] = [len(s1_unique), len(s2_unique)]
new_df['name'] = 'Others'

Is there a "neat" dataframe way to handle this?


Answer 1

Score: 1


This can be done by applying a set-based count to each column, after flattening the lists with NumPy concatenation:

import numpy as np
import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]]
}

df = pd.DataFrame(data)

# For each column: flatten the lists, then count the distinct values
pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(np.concatenate(x.values))), axis=0)})

#     step  count
# s1     1      4
# s2     2      6

Edit:

To leave None values out of the count:

pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(np.concatenate(x.values)).difference({None})), axis=0)})

or

pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(value for value in np.concatenate(x.values) if value is not None)), axis=0)})
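For reference, here is a self-contained sketch of the None-excluding variant, with the lambda pulled out into a named function. The helper name `count_unique` is an illustrative assumption and not part of the answer; the data is the sample from the question.

```python
import numpy as np
import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
}
df = pd.DataFrame(data)


def count_unique(column):
    """Hypothetical helper: count distinct non-None values in a column of lists."""
    # Flatten the per-row lists into one array
    flat = np.concatenate(column.values)
    # Build a set of the values and drop None before counting
    return len(set(flat) - {None})


result = pd.DataFrame({
    'step': range(1, 1 + df.shape[1]),
    'count': df.apply(count_unique, axis=0),
})
print(result)
```

As in the answer, the result keeps the column names (s1, s2) as its index, because `df.apply` returns a Series indexed by column.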

Answer 2

Score: 1


Another solution:

out = pd.DataFrame(
    [
        {"step": step, "count": len(df[c].explode().unique())}
        for step, c in enumerate(df, 1)
    ]
)
print(out)

Prints:

   step  count
0     1      4
1     2      6

Or:

out = pd.DataFrame(
    [
        {"step": c, "count": len(df[c].explode().unique())}
        for c in df
    ]
)
print(out)

Prints:

  step  count
0   s1      4
1   s2      6
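Both variants above count None as a value, which is why s1 comes out as 4. If missing values should be ignored, as the question's edit asks, one possible adjustment is to replace `len(...unique())` with `nunique()`, which skips missing values by default. This is a sketch of that substitution on the question's sample data, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})

out = pd.DataFrame(
    [
        # nunique() drops None/NaN by default (dropna=True)
        {"step": step, "count": df[c].explode().nunique()}
        for step, c in enumerate(df, 1)
    ]
)
print(out)
```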

Answer 3

Score: 1


In your case:

out = df.sum().map(set).map(len)
print(out)

s1    4
s2    6
dtype: int64
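The one-liner relies on the fact that adding Python lists concatenates them, so summing a column of lists appears to yield one flat list per column. A small sketch of that mechanism follows, first in plain Python and then with pandas. The variable names are illustrative, and the pandas behaviour is assumed from the output the answer reports:

```python
import functools
import operator

import pandas as pd

# Plain Python: "+" on lists concatenates, so reducing with add flattens one level
rows = [[1, 2], [None], [2, 3]]
flat = functools.reduce(operator.add, rows)
print(flat)            # [1, 2, None, 2, 3]
print(len(set(flat)))  # 4, because None is counted as a value

# pandas: the same idea applied column by column
df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})
counts = df.sum().map(set).map(len)
print(counts)
```

Note that None is included in the count here, so s1 gives 4 rather than the 3 the question asks for.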

Answer 4

Score: 1


This should work if you would like to ignore None:

df.sum().map(lambda x: len({i for i in x if i is not None}))

or

df.stack().explode().groupby(level=1).nunique()

Output:

s1    3
s2    6
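To turn this Series into the step/count layout from the question, one option is sketched below. Numbering the steps by column order is an assumption based on the question's expected output, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})

# Unique non-null values per column, as in the answer above
counts = df.stack().explode().groupby(level=1).nunique()

# groupby sorts its keys, so restore the original column order first
counts = counts.reindex(df.columns)

out = pd.DataFrame({
    "step": range(1, len(counts) + 1),
    "count": counts.to_numpy(),
})
print(out)
```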

huangapple
  • Published on June 30, 2023 at 00:33:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76582986.html