DataFrame: count unique list values per column and add the counts as rows
Question
I have the following DataFrame:
import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]]
}
df = pd.DataFrame(data)
Output of print(df):
s1 s2
0 [1, 2] [4, 5]
1 [None] [6, 7]
2 [2, 3] [3, 2]
I need the number of unique elements in each of the columns s1 and s2, ignoring None/null values, and I want these counts added as rows, like:
Expected output:
step count
0 1 3 # {1, 2, 3}; None is ignored
1 2 6 # {2, 3, 4, 5, 6, 7}
What I did was a bit dirty:
s1_unique = df['s1'].explode().unique()
s2_unique = df['s2'].explode().unique()
new_df = pd.DataFrame()
new_df['step'] = [1, 2]
new_df['count'] = [len(s1_unique), len(s2_unique)]  # unique() keeps None, so this still gives 4 for s1
new_df['name'] = 'Others'
Is there a "neat" dataframe way to handle this?
Answer 1
Score: 1
This can be done by applying a set-based count to each column, flattening the lists with NumPy concatenation:
import numpy as np
import pandas as pd

data = {
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]]
}
df = pd.DataFrame(data)
pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(np.concatenate(x.values))), axis=0)})
#     step  count
# s1     1      4
# s2     2      6
Edit: not counting None values:
pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(np.concatenate(x.values)).difference({None})), axis=0)})
or
pd.DataFrame({'step': range(1, 1 + df.shape[1]),
              'count': df.apply(lambda x: len(set(value for value in np.concatenate(x.values) if value is not None)), axis=0)})
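The question's expected output has a 0-based row index, while the frames above are indexed by the column names s1 and s2. Below is a minimal sketch that combines the None-excluding set count with that layout. Converting the counts with `to_numpy()` and passing `col.tolist()` to `np.concatenate` are my own choices rather than part of the answer, and `counts` and `out` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})

# Per column: flatten the lists, build a set, drop None, count what is left.
counts = df.apply(lambda col: len(set(np.concatenate(col.tolist())) - {None}))

# to_numpy() discards the s1/s2 labels, so the new frame gets a 0-based index.
out = pd.DataFrame({'step': range(1, len(counts) + 1), 'count': counts.to_numpy()})
print(out)
```

If the column names are more useful than a step number, keeping the labelled Series as the answer does is a reasonable alternative.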
Answer 2
Score: 1
Another solution:
out = pd.DataFrame(
[
{"step": step, "count": len(df[c].explode().unique())}
for step, c in enumerate(df, 1)
]
)
print(out)
Prints:
step count
0 1 4
1 2 6
Or:
out = pd.DataFrame(
[
{"step": c, "count": len(df[c].explode().unique())}
for c in df
]
)
print(out)
Prints:
step count
0 s1 4
1 s2 6
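Both variants count None as a distinct value, which is why s1 comes out as 4 rather than the 3 the edited question asks for. The sketch below uses the same comprehension, with `len(...unique())` swapped for `Series.nunique()`. It relies on the default `dropna=True`, which leaves missing values such as None out of the count:

```python
import pandas as pd

df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})

# explode() yields one row per list element, including the None from [None];
# nunique() skips missing values by default, so None is not counted.
out = pd.DataFrame(
    [
        {"step": step, "count": df[c].explode().nunique()}
        for step, c in enumerate(df, 1)
    ]
)
print(out)
```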
Answer 3
Score: 1
In your case:
out = df.sum().map(set).map(len)
Out[97]:
s1 4
s2 6
dtype: int64
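This works because, on object columns holding lists, `sum()` combines the cells with `+`, i.e. list concatenation. `set` then removes duplicates and `len` counts what remains. None survives into the set, so s1 gives 4. A small sketch of the intermediate step (`flat` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})

# Column-wise sum of list cells: [1, 2] + [None] + [2, 3], and so on.
flat = df.sum()
print(flat['s1'])  # [1, 2, None, 2, 3]
print(flat['s2'])  # [4, 5, 6, 7, 3, 2]

# set() keeps None as a value, hence 4 distinct entries for s1.
print(flat.map(set).map(len).tolist())  # [4, 6]
```

One trade-off to keep in mind is that each `+` builds a new list, so the cost grows roughly quadratically with the number of rows. For long columns the `explode()`-based answers are probably the safer choice.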
Answer 4
Score: 1
This should work if you would like to ignore None:
df.sum().map(lambda x: len({i for i in x if i is not None}))
or
df.stack().explode().groupby(level=1).nunique()
Output:
s1 3
s2 6
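The `stack()`/`explode()` variant may be easier to follow with the intermediate Series in view. `stack()` yields one entry per cell indexed by (row, column), and `explode()` yields one entry per list element while repeating that index. `groupby(level=1)` therefore groups by original column. Below is a minimal sketch printing those steps (`long` and `counts` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    's1': [[1, 2], [None], [2, 3]],
    's2': [[4, 5], [6, 7], [3, 2]],
})

# stack(): one entry per cell, indexed by (row label, column name).
# explode(): one entry per list element; each element keeps its cell's index.
long = df.stack().explode()
print(long)

# Level 1 of the MultiIndex is the original column name, so this yields one
# distinct-value count per column; nunique() skips None by default.
counts = long.groupby(level=1).nunique()
print(counts)
```

`groupby` sorts group keys by default, which happens to match the column order here. If the columns are not in alphabetical order, `groupby(level=1, sort=False)` keeps them in order of first appearance.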