return the location of non null values
Question
I have a dataframe that looks like this:
```
     0   1    2   3    4    5
0  NaN NaN  7.0 NaN  NaN  NaN
1  NaN NaN  9.0 NaN  NaN  NaN
2  5.0 NaN  3.0 NaN  9.0  NaN
3  NaN NaN  NaN NaN  NaN  NaN
4  NaN NaN  NaN NaN  NaN  1.0
```
I am trying to return the locations of the non-null values.
For example, 7.0 is at row 0, column 2, so its location is "0-2".
expected = ["0-2", "1-2", "2-0", "2-2", "2-4", "4-5"]
Dataframe:

```python
import numpy as np
import pandas as pd

mylist = [[np.nan, np.nan, 7, np.nan, np.nan, np.nan],
          [np.nan, np.nan, 9, np.nan, np.nan, np.nan],
          [5, np.nan, 3, np.nan, 9, np.nan],
          [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
          [np.nan, np.nan, np.nan, np.nan, np.nan, 1]]
df = pd.DataFrame(mylist)
```
Update:
I am getting duplicate pairs in the list. For example, 34-35 is the same as 35-34:
```python
out = ['34-35', '35-34',
       '41-42', '42-41',
       '46-47', '47-46',
       '59-63', '63-59',
       '75-76', '76-75',
       '87-88', '88-87']
```
I need to remove the duplicates and get the unique values like:
expected = ['34-35', '41-42', '46-47', '59-63', '75-76', '87-88']
Answer 1
Score: 4
Use a list comprehension with DataFrame.stack:
```python
out = [f'{i}-{c}' for i, c in df.stack().index]
print(out)
# ['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']
```
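Why this works: DataFrame.stack drops NaN values (its classic default), so the index of the stacked Series lists exactly the (row, column) pairs of the non-null cells. A quick look at the sample frame, assuming a pandas version where stack still drops NaN by default (newer releases add a future_stack mode that keeps them):

```python
# The stacked Series has a MultiIndex of (row label, column label) pairs,
# one entry per non-null cell, because NaN cells are dropped.
print(df.stack())
# 0  2    7.0
# 1  2    9.0
# 2  0    5.0
#    2    3.0
#    4    9.0
# 4  5    1.0
# dtype: float64
```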
Or use numpy.where to get the indices. This solution works if the columns and index are the default RangeIndex:
```python
ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(ro, co)]
print(out)
# ['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']
```
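For clarity (not part of the original answer): np.where on the boolean mask returns two positional arrays, one of row positions and one of column positions, and zip pairs them element-wise. On the sample frame:

```python
# np.where gives integer positions into the underlying array,
# independent of the DataFrame's index and column labels.
ro, co = np.where(df.notna())
print(ro)  # [0 1 2 2 2 4]
print(co)  # [2 2 0 2 4 5]
```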
If the index and columns are not the default RangeIndex, use the positions to look up the labels in df.index and df.columns:
```python
df = df.rename(index=lambda x: f'i{x}', columns=lambda x: f'c{x}')
print(df)
#      c0   c1   c2   c3   c4   c5
# i0  NaN  NaN  7.0  NaN  NaN  NaN
# i1  NaN  NaN  9.0  NaN  NaN  NaN
# i2  5.0  NaN  3.0  NaN  9.0  NaN
# i3  NaN  NaN  NaN  NaN  NaN  NaN
# i4  NaN  NaN  NaN  NaN  NaN  1.0

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(df.index[ro], df.columns[co])]
print(out)
# ['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']
```
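As a side note (a sketch, not from the original answer), the DataFrame.stack approach also handles labelled axes without any extra indexing, because the stacked MultiIndex already carries the labels:

```python
# With the renamed df, the stacked MultiIndex holds the string labels,
# so the same comprehension produces labelled locations directly.
out = [f'{i}-{c}' for i, c in df.stack().index]
print(out)
# ['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']
```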
EDIT: If you need to remove duplicates that differ only in order, sort each pair before joining:
```python
out = pd.unique(['-'.join(map(str, sorted(x))) for x in df.stack().index]).tolist()
print(out)
# ['0-2', '1-2', '2-2', '2-4', '4-5']
```
Or with numpy.where:

```python
ro, co = np.where(df.notna())
out = pd.unique(['-'.join(map(str, sorted(x)))
                 for x in zip(df.index[ro], df.columns[co])]).tolist()
print(out)
# ['0-2', '1-2', '2-2', '2-4', '4-5']
```
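A possible variation (a sketch only, using the original integer-labelled frame as in the EDIT above): if first-appearance order does not matter, a set comprehension over the sorted pairs also removes the symmetric duplicates:

```python
# Normalise each (row, col) pair by sorting it, deduplicate via a set,
# then sort the strings (lexicographically) for a stable output order.
out = sorted({'-'.join(map(str, sorted(x))) for x in df.stack().index})
print(out)
# ['0-2', '1-2', '2-2', '2-4', '4-5']
```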