2023-07-31 18:54:57

How can I rearrange a DataFrame by node index in the PyTorch Geometric manner?

Question

I'd like to rearrange my DataFrame (data) so that node indices map back to the original names, in the PyTorch Geometric manner, for extracting node embeddings.

import pandas as pd
data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)
nodes = pd.concat([df['Source'], df['Target']]).unique()
node_indices = {node: i for i, node in enumerate(nodes)}
df['Source'] = df['Source'].map(node_indices)
df['Target'] = df['Target'].map(node_indices)

This is my expected output:

Expected outputs:

I'd appreciate any thoughts or suggestions.

Answer 1

Score: 0

I'm assuming the goal is to get an intermediate encoding of some parameters and to make a visual representation of the relationships between source and target values (which all seem to come from the same pool, i.e. the nodes in this case).

import pandas as pd

data = {
    'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
    'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']
}
df = pd.DataFrame(data)
# get all unique values of the data and assign each one a code
# (set iteration order is arbitrary, so the codes may differ between runs)
nodes = {name: code for code, name in enumerate({*df.values.flat})}
# replace values with their codes
df_coded = df.replace(nodes)
# join the original and the encoded data in one DataFrame;
# use multilevel headers to keep the name and code columns apart
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])
# rearrange columns as (source|name, source|code, target|code, target|name)
df_repr = df_repr.iloc[:, [0, 2, 3, 1]]
# display the transposed representation with the original index hidden
print(df_repr.T.to_string(header=False))

With this we obtain the following output:

Update

As for the line df_repr = df_repr.iloc[:, [0,2,3,1]]: it reorders the columns by their positions. Here [0,2,3,1] means "put the second column (position 1) at the very end". We can do the same with .loc by passing the column names in the desired order. It's just going to be somewhat longer in this case:

df_repr = df_repr.loc[:, [('Name', 'Source'),
                          ('Code', 'Source'),
                          ('Code', 'Target'),
                          ('Name', 'Target')]]

But if readability is a priority, then of course it's better to use loc instead of iloc.
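
As a quick check that the positional and the label-based reordering give the same column order, here is a minimal sketch on a two-row frame. The `codes` mapping is made up for illustration and is not the encoding produced by the code above.

```python
import pandas as pd

# Tiny stand-in for the original data (two edges only).
df = pd.DataFrame({'Source': ['Rainfall', 'SP2'], 'Target': ['SP1', 'Evp']})

# Hypothetical codes, chosen by hand for this example.
codes = {'Rainfall': 0, 'SP2': 1, 'SP1': 2, 'Evp': 3}
df_coded = df.apply(lambda col: col.map(codes))

# Columns after concat: (Name, Source), (Name, Target), (Code, Source), (Code, Target)
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])

# Reorder by position ...
by_position = df_repr.iloc[:, [0, 2, 3, 1]]

# ... and by label, spelling out the same order.
wanted = [('Name', 'Source'), ('Code', 'Source'),
          ('Code', 'Target'), ('Name', 'Target')]
by_label = df_repr.loc[:, wanted]

print(list(by_position.columns) == list(by_label.columns))
```

The label-based version is longer, but it keeps working if the columns of df_repr are ever produced in a different order, which the positional version does not.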

P.S. Because we are only manipulating columns here, the following also works:

reordered_columns = [
    ('Name', 'Source'),
    ('Code', 'Source'),
    ('Code', 'Target'),
    ('Name', 'Target')
]
df_repr = df_repr[reordered_columns]
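
Going back to the question's goal of getting from node indices back to names, one possible sketch is below. It reuses the first-appearance encoding from the question, so the codes are the same on every run. The names `index_to_name` and `names_for` are hypothetical helpers, the two-row `edge_index` list only mirrors the [2, num_edges] layout that PyTorch Geometric uses (the conversion to a tensor is left out), and it assumes that row i of whatever embedding matrix the model returns belongs to node i.

```python
import pandas as pd

data = {
    'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow',
               'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
    'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD',
               'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss'],
}
df = pd.DataFrame(data)

# Encode nodes in order of first appearance (sources first, then targets),
# exactly as in the question.
nodes = pd.concat([df['Source'], df['Target']]).unique()
node_indices = {node: i for i, node in enumerate(nodes)}

# Two rows: source indices and target indices, one column per edge.
edge_index = [
    df['Source'].map(node_indices).tolist(),
    df['Target'].map(node_indices).tolist(),
]

# Reverse lookup: node index -> original name (hypothetical helper names).
index_to_name = dict(enumerate(nodes))


def names_for(indices):
    """Translate a sequence of node indices back to node names."""
    return [index_to_name[i] for i in indices]


# Example: label the first three edges with their original names.
for src, tgt in list(zip(*edge_index))[:3]:
    print(index_to_name[src], '->', index_to_name[tgt])
```

Under the assumption stated above, the same `index_to_name` lookup can be used to label the rows of an embedding matrix, for instance as the index of a DataFrame built from it.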
