How can I rearrange a DataFrame by node index in the PyTorch Geometric manner?

Question

I'd like to rearrange my DataFrame (`data`) so that each node index sits next to its original name, in the PyTorch Geometric manner, so that I can extract node embeddings.

```python
import pandas as pd

data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)
# unique node names: all sources first, then any targets not seen yet
nodes = pd.concat([df['Source'], df['Target']]).unique()
# name -> integer index, as PyG expects nodes numbered 0..N-1
node_indices = {node: i for i, node in enumerate(nodes)}
# replace names with their indices
df['Source'] = df['Source'].map(node_indices)
df['Target'] = df['Target'].map(node_indices)
```

This is my expected output:

[image: expected output]

I'd appreciate any thoughts or suggestions.
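For extracting embeddings, the direction that matters is index to name. Assuming the model returns a `(num_nodes, dim)` array whose row i belongs to node i (the usual layout for node-level output in PyG), the rows can be labeled with the original names. Below is a minimal sketch using only pandas and NumPy. `pd.factorize` produces the same first-appearance order as the question's `.unique()` call. `embeddings` is random stand-in data, not real model output, and `node_table` and `named_embeddings` are hypothetical names.

```python
import numpy as np
import pandas as pd

data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)

# Factorize all Source values followed by all Target values; uniques come back
# in order of first appearance, i.e. the same order as the question's .unique()
codes, names = pd.factorize(pd.concat([df['Source'], df['Target']], ignore_index=True))

# First len(df) codes are the sources, the rest are the targets:
# a 2 x num_edges integer array (the shape PyG uses for edge_index)
n_edges = len(df)
edge_index = np.vstack([codes[:n_edges], codes[n_edges:]])

# Lookup table from node index back to the original name
node_table = pd.DataFrame({'Index': np.arange(len(names)), 'Node': names})

# Hypothetical stand-in for a GNN's node embeddings: row i is node i
embeddings = np.random.default_rng(0).normal(size=(len(names), 4))
named_embeddings = pd.DataFrame(embeddings, index=pd.Index(names, name='Node'))

print(node_table.to_string(index=False))
print(named_embeddings.loc['Rainfall'])
```

To pass `edge_index` to PyG, it still needs to be converted to a long tensor, e.g. with `torch.tensor(edge_index, dtype=torch.long)`.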

Answer 1

Score: 0

I'm assuming the goal is to get an intermediate encoding of some parameters and to build a visual representation of the relationships between source and target values (which all seem to come from the same pool, i.e. the nodes in this case).

```python
import pandas as pd

data = {
    'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
    'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']
}
df = pd.DataFrame(data)
# get all unique values of data, in order of first appearance
# (dict.fromkeys keeps order; iterating a set of strings can change between runs)
nodes = {name: code for code, name in enumerate(dict.fromkeys(df.values.flat))}
# replace values with their codes, column by column
df_coded = df.apply(lambda col: col.map(nodes))
# connect original and encoded data in one DataFrame;
# multilevel headers make it easy to separate name columns from code columns
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])
# rearrange columns as (source|name, source|code, target|code, target|name)
df_repr = df_repr.iloc[:, [0,2,3,1]]
# display the transposed representation, hiding the original row indexes
print(df_repr.T.to_string(header=False))
```

With this we obtain the following output:

[image: resulting table]
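The multilevel header that `pd.concat(..., keys=...)` builds can be inspected directly. This makes the column positions used for reordering easy to see. Below is a minimal sketch on a two-row slice of the data, with a hand-written (hypothetical) code mapping in place of the computed one.

```python
import pandas as pd

# Two-row slice of the question's data with a hypothetical hand-written mapping
df = pd.DataFrame({'Source': ['Rainfall', 'SP2'], 'Target': ['SP1', 'Evp']})
codes = {'Rainfall': 0, 'SP1': 1, 'SP2': 2, 'Evp': 3}
df_coded = df.apply(lambda col: col.map(codes))

# keys= becomes the outer column level; the original column names form the inner level
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])

for position, column in enumerate(df_repr.columns):
    print(position, column)
# 0 ('Name', 'Source')
# 1 ('Name', 'Target')
# 2 ('Code', 'Source')
# 3 ('Code', 'Target')
```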

Update

Regarding the line `df_repr = df_repr.iloc[:, [0,2,3,1]]`: it reorders the columns by position. Here `[0,2,3,1]` means "move the second column (position 1) to the very end". The same can be done with `.loc` by passing the column names in the desired order; in this case it is just somewhat longer:

```python
df_repr = df_repr.loc[:, [('Name', 'Source'),
                          ('Code', 'Source'),
                          ('Code', 'Target'),
                          ('Name', 'Target')]]
```

But if readability is a priority, `.loc` is of course the better choice over `.iloc`.
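If you want to write the order with readable labels but still need positions (for example, to keep an `.iloc`-based line), you can look up the positions from the labels with `MultiIndex.get_loc`. The sketch below writes out the column layout by hand, assuming the same layout that `pd.concat(..., keys=['Name', 'Code'])` produces.

```python
import pandas as pd

# Same column layout as pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])
columns = pd.MultiIndex.from_tuples(
    [('Name', 'Source'), ('Name', 'Target'), ('Code', 'Source'), ('Code', 'Target')]
)
wanted = [('Name', 'Source'), ('Code', 'Source'), ('Code', 'Target'), ('Name', 'Target')]

# For a unique MultiIndex and a full-length tuple, get_loc returns an integer position
positions = [columns.get_loc(label) for label in wanted]
print(positions)  # [0, 2, 3, 1]
```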

P.S. Because we are only selecting columns here, plain bracket indexing also works:

```python
reordered_columns = [
    ('Name', 'Source'),
    ('Code', 'Source'),
    ('Code', 'Target'),
    ('Name', 'Target')
]
df_repr = df_repr[reordered_columns]
```
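The three selections in this answer (positional `.iloc`, label-based `.loc`, and plain brackets) should produce the same frame. The quick check below uses a two-row slice of the data with hypothetical hand-written codes. `pd.testing.assert_frame_equal` raises if the frames differ in values, dtypes, or labels.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['Rainfall', 'SP2'], 'Target': ['SP1', 'Evp']})
df_coded = pd.DataFrame({'Source': [0, 2], 'Target': [1, 3]})  # hypothetical codes
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])

reordered_columns = [('Name', 'Source'), ('Code', 'Source'), ('Code', 'Target'), ('Name', 'Target')]

by_position = df_repr.iloc[:, [0, 2, 3, 1]]      # by column position
by_label = df_repr.loc[:, reordered_columns]     # by (outer, inner) label
by_brackets = df_repr[reordered_columns]         # plain column selection

# All three selections give the same frame
pd.testing.assert_frame_equal(by_position, by_label)
pd.testing.assert_frame_equal(by_label, by_brackets)
```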

huangapple
  • Posted on July 31, 2023 18:54:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76802927.html