June 30, 2023 00:33:13

dataframe count unique list values from a column and add the sum as a row

Question

I have the following dataframe:

import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]]
}
df = pd.DataFrame(data)

output:

        s1     s2
0   [1, 2]  [4, 5]
1      NaN  [6, 7]
2   [2, 3]  [3, 2]

I need the number of unique elements in each of the columns s1 and s2, with each count added as a row, like this:

expected output:
  step  count
0    1      3  # Ignoring NaN
1    2      6

What I did was a bit dirty:

s1_unique = df['s1'].explode().unique()
s2_unique = df['s2'].explode().unique()
new_df = pd.DataFrame()
new_df['step'] = [1, 2]
new_df['count'] = [len(s1_unique), len(s2_unique)]
new_df['name'] = 'Others'

Is there a "neat" dataframe way to handle this?

Answer 1

Score: 1

This can be done by applying a set-based count to each column, flattening the lists with numpy concatenation:
```python
import numpy as np
import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]]
}

df = pd.DataFrame(data)

# Flatten each column's lists, then count the distinct values (None included).
pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(np.concatenate(x.values))), axis=0)})

#     step  count
# s1     1      4
# s2     2      6
```

Edit:

To leave None values out of the count:

pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(np.concatenate(x.values)).difference({None})), axis=0)})

Or:

pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(value for value in np.concatenate(x.values) if value is not None)), axis=0)})

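The same flatten-then-deduplicate idea can also be sketched without numpy. This is only an illustration, not the answerer's code: `unique_counts` is a hypothetical helper name, it uses `itertools.chain.from_iterable` in place of `np.concatenate`, and it assumes every cell holds a list.

```python
from itertools import chain

import pandas as pd


def unique_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count distinct non-null list elements per column (hypothetical helper)."""
    rows = []
    for step, col in enumerate(df.columns, start=1):
        # Flatten the column's lists into a single stream of values.
        flat = chain.from_iterable(df[col])
        # pd.notna filters out both None and NaN.
        distinct = {value for value in flat if pd.notna(value)}
        rows.append({"step": step, "count": len(distinct)})
    return pd.DataFrame(rows)


df = pd.DataFrame({
    "s1": [[1, 2], [None], [2, 3]],
    "s2": [[4, 5], [6, 7], [3, 2]],
})
result = unique_counts(df)
print(result)
```

Building the rows as a list of dicts gives a plain 0, 1 index, which matches the expected output in the question. A cell that is a bare NaN rather than a list would break the flattening step, so such cells would need to be handled first.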

Answer 2

Score: 1

Another solution:

out = pd.DataFrame(
    [
        {"step": step, "count": len(df[c].explode().unique())}
        for step, c in enumerate(df, 1)
    ]
)
print(out)

Prints:

   step  count
0     1      4
1     2      6

Or:

out = pd.DataFrame(
    [
        {"step": c, "count": len(df[c].explode().unique())}
        for c in df
    ]
)
print(out)

Prints:

  step  count
0   s1      4
1   s2      6

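Both variants above count None as a value, which is why s1 comes out as 4. If null values should be ignored, one possible adjustment is to swap `len(...unique())` for `Series.nunique()`, which drops missing values by default. A minimal sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "s1": [[1, 2], [None], [2, 3]],
    "s2": [[4, 5], [6, 7], [3, 2]],
})

out = pd.DataFrame(
    [
        # nunique() skips None/NaN because dropna defaults to True.
        {"step": step, "count": df[c].explode().nunique()}
        for step, c in enumerate(df, 1)
    ]
)
print(out)
```

Passing `dropna=False` to `nunique()` should bring back the behaviour of the original answer.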

Answer 3

Score: 1

So in your case:

out = df.sum().map(set).map(len)
Out[97]: 
s1    4
s2    6
dtype: int64

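The one-liner above appears to rely on the fact that summing an object column of lists joins them end to end, much like `[1, 2] + [None]` in plain Python. A minimal sketch of the intermediate steps for one column, assuming every cell holds a list (the names `combined` and `distinct` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "s1": [[1, 2], [None], [2, 3]],
    "s2": [[4, 5], [6, 7], [3, 2]],
})

# Summing a Series of lists concatenates them with `+`.
combined = df["s1"].sum()
print(combined)  # [1, 2, None, 2, 3]

# A set drops the repeated 2 but keeps None as its own element.
distinct = set(combined)
print(len(distinct))  # 4
```

Because None survives as a set element, this approach counts it; it has to be filtered out to get 3 for s1.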

Answer 4

Score: 1

This should work if you would like to ignore None:

df.sum().map(lambda x: len({i for i in x if i is not None}))

Or:

df.stack().explode().groupby(level=1).nunique()

Output:

s1    3
s2    6

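Either expression returns a Series indexed by column name, while the question asks for a frame with `step` and `count` columns. One way to bridge the gap might look like the sketch below; the `name` column with the value 'Others' is taken from the asker's own attempt, and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "s1": [[1, 2], [None], [2, 3]],
    "s2": [[4, 5], [6, 7], [3, 2]],
})

# Per-column unique counts, ignoring None (second expression from the answer).
counts = df.stack().explode().groupby(level=1).nunique()

# Reshape into the step/count layout from the question.
out = pd.DataFrame({
    "step": range(1, len(counts) + 1),
    "count": counts.to_numpy(),
})
out["name"] = "Others"
print(out)
```

Using `counts.to_numpy()` discards the s1/s2 labels so that the result gets a default integer index.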
