返回非空值的位置。

huangapple go评论54阅读模式
英文:

return the location of non null values

问题

Here are the translated code parts:

我有一个看起来像这样的数据框
0	1	2	3	4	5

0 NaN NaN 7.0 NaN NaN NaN
1 NaN NaN 9.0 NaN NaN NaN
2 5.0 NaN 3.0 NaN 9.0 NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN 1.0

我试图返回非空值的位置。
例如,7.0 在第一行和第二列 (0-2)。

expected = ["0-2", "1-2", "2-0", "2-2", "2-4", "4-5"]


数据框:
```python
mylist=[[np.nan, np.nan, 7, np.nan, np.nan, np.nan],[np.nan, np.nan, 9, np.nan, np.nan, np.nan],[5, np.nan, 3, np.nan, 9, np.nan],[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],[np.nan, np.nan,np.nan, np.nan,np.nan, 1]]
df = pd.DataFrame(mylist)

更新:

我在列表中得到了重复的行。例如,34-35 和 35-34 是相同的。

out = ['34-35', '35-34',
 '41-42', '42-41',
 '46-47', '47-46',
 '59-63', '63-59',
 '75-76', '76-75',
 '87-88', '88-87']

我需要去除重复项并获得唯一的值,如下:

expected = ['34-35', '41-42', '46-47', '59-63', '75-76', '87-88']

注意:代码部分已被翻译,不包括问题的回答。

英文:

I have a dataframe that looks like this:

	0	1	2	3	4	5
0	NaN	NaN	7.0	NaN	NaN	NaN
1	NaN	NaN	9.0	NaN	NaN	NaN
2	5.0	NaN	3.0	NaN	9.0	NaN
3	NaN	NaN	NaN	NaN	NaN	NaN
4	NaN	NaN	NaN	NaN	NaN	1.0

I am trying to return the location of non null values.
For e.g. 7.0 is in the first row and second column (0-2)

expected = ["0-2", "1-2", "2-0", "2-2", "2-4", "4-5"]

Dataframe:

mylist=[[np.nan, np.nan, 7, np.nan, np.nan, np.nan],[np.nan, np.nan, 9, np.nan, np.nan, np.nan],[5, np.nan, 3, np.nan, 9, np.nan],[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],[np.nan, np.nan,np.nan, np.nan,np.nan, 1]]

df = pd.DataFrame(mylist)

Update:

I am getting duplicate rows in the list. For e.g. 34-35 is same as 35-34

out = ['34-35', '35-34',
 '41-42', '42-41',
 '46-47', '47-46',
 '59-63', '63-59',
 '75-76', '76-75',
 '87-88', '88-87']

I need to remove the duplicates and get the unique values like:

expected = ['34-35', '41-42', '46-47', '59-63', '75-76', '87-88']

答案1

得分: 4

使用DataFrame.stack和列表推导来完成:

out = [f'{i}-{c}' for i, c in df.stack().index]
print(out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

或者使用numpy.where来获取索引:

如果列和索引默认为RangeIndex

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(ro, co)]
print(out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

如果不是默认的RangeIndex,可以使用索引:

df = df.rename(index=lambda x: f'i{x}', columns=lambda x: f'c{x}')
print(df)
     c0  c1   c2  c3   c4   c5
i0  NaN NaN  7.0 NaN  NaN  NaN
i1  NaN NaN  9.0 NaN  NaN  NaN
i2  5.0 NaN  3.0 NaN  9.0  NaN
i3  NaN NaN  NaN NaN  NaN  NaN
i4  NaN NaN  NaN NaN  NaN  1.0

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(df.index[ro], df.columns[co])]
print(out)
['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']

编辑:如果需要去除排序后的重复项:

使用pd.unique来去除排序后的重复项:

out = pd.unique(['-'.join(map(str, sorted(x))) for x in df.stack().index]).tolist()
print(out)
['0-2', '1-2', '2-2', '2-4', '4-5']

或者在numpy.where的情况下:

ro, co = np.where(df.notna())
out = pd.unique(['-'.join(map(str, sorted(x)) for x in zip(df.index[ro], df.columns[co]))]).tolist()
print(out)
['0-2', '1-2', '2-2', '2-4', '4-5']
英文:

Use list comprehension with DataFrame.stack:

out = [f'{i}-{c}' for i, c in df.stack().index]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

Or numpy.where for indices;

Solution if columns and index are default RangeIndex:

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(ro, co)]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

If not, use indexing:

df = df.rename(index = lambda x: f'i{x}', columns = lambda x: f'c{x}')
print (df)
     c0  c1   c2  c3   c4   c5
i0  NaN NaN  7.0 NaN  NaN  NaN
i1  NaN NaN  9.0 NaN  NaN  NaN
i2  5.0 NaN  3.0 NaN  9.0  NaN
i3  NaN NaN  NaN NaN  NaN  NaN
i4  NaN NaN  NaN NaN  NaN  1.0

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(df.index[ro], df.columns[co])]
print (out)
['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']

EDIT: If need remove sorted duplicates:

out = pd.unique(['-'.join(map(str, sorted(x))) for x in df.stack().index]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']

ro, co = np.where(df.notna())
out = pd.unique(['-'.join(map(str, sorted(x))) 
                 for x in zip(df.index[ro], df.columns[co])]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']

huangapple
  • 本文由 发表于 2023年3月1日 14:09:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/75600113.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定