How can I rearrange a DataFrame by node index in the PyTorch Geometric manner?
Question
I'd like to rearrange my DataFrame (data) so that each node index maps back to its original name, in the PyTorch Geometric manner, so that I can extract node embeddings.
import pandas as pd

# edge list: each row is an edge from Source to Target
data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)
# unique node names, in order of first appearance (sources first, then targets)
nodes = pd.concat([df['Source'], df['Target']]).unique()
# name -> integer index
node_indices = {node: i for i, node in enumerate(nodes)}
# overwrite the names in place with their integer indices
df['Source'] = df['Source'].map(node_indices)
df['Target'] = df['Target'].map(node_indices)
This is my expected output:
I'd appreciate any thoughts or suggestions.
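A minimal sketch of the PyTorch Geometric side, assuming only pandas and NumPy are installed: it keeps the question's node_indices, builds the inverse mapping index_to_node, and stacks the coded columns into a 2 × num_edges array shaped like PyG's edge_index (PyG itself would want torch.tensor(edge_index, dtype=torch.long)). The embeddings matrix is random stand-in data for whatever a trained model returns, and index_to_node, df_idx, edge_index and emb_df are illustrative names, not part of the original post.

```python
import numpy as np
import pandas as pd

data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)

# Same ordering as the question: unique names in order of first appearance
nodes = pd.concat([df['Source'], df['Target']]).unique()
node_indices = {node: i for i, node in enumerate(nodes)}
# Inverse mapping (hypothetical helper): integer index -> original name
index_to_node = dict(enumerate(nodes))

# Integer-coded copy of the edge list; the original df keeps its names
df_idx = df.apply(lambda col: col.map(node_indices))

# PyG-style layout: row 0 holds source indices, row 1 holds target indices
edge_index = df_idx[['Source', 'Target']].to_numpy().T

# Random stand-in for learned embeddings: row i belongs to node index i
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(nodes), 4))

# Label each embedding row with its original node name
emb_df = pd.DataFrame(
    embeddings,
    index=pd.Index([index_to_node[i] for i in range(len(nodes))], name='node'),
)
print(emb_df.loc['Rainfall'])
```

This relies on the model's output rows following the same node indexing as edge_index; under that assumption, relabeling the rows through the inverse mapping is enough to look embeddings up by name.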
Answer 1
Score: 0
I'm assuming the goal is to get an intermediate encoding of some parameters and to make a visual representation of the relationships between source and target values (which all seem to come from the same pool, i.e. nodes in this case).
import pandas as pd

data = {
    'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
    'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']
}
df = pd.DataFrame(data)
# get all unique values of data; pd.unique keeps first-appearance order,
# so the codes are the same on every run (iterating over a set is not guaranteed to be)
nodes = {name: code for code, name in enumerate(pd.unique(df.values.ravel()))}
# replace values with their codes (map each column through the dict)
df_coded = df.apply(lambda col: col.map(nodes))
# combine original and encoded data in one DataFrame
# use multilevel headers to make it easy to separate columns by names and codes
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])
# rearrange columns as (source|name, source|code, target|code, target|name)
df_repr = df_repr.iloc[:, [0, 2, 3, 1]]
# display the transposed representation with the original indexes hidden
print(df_repr.T.to_string(header=False))
With this we obtain the following output:
Update
As for the line df_repr = df_repr.iloc[:, [0,2,3,1]], it reorders the columns by their positions. Here [0,2,3,1] means "put the second column (position 1) at the very end". We can do the same with .loc by passing the column names in the desired order (with the multilevel headers, each name is a (level 0, level 1) tuple). It's just somewhat longer in this case:
df_repr = df_repr.loc[:, [('Name','Source'),
('Code','Source'),
('Code','Target'),
('Name','Target')]]
But if readability is a priority, then of course it's better to use loc instead of iloc.
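To check that the positional and label-based versions really agree, a small sketch that rebuilds df_repr as in the answer (assuming the order-preserving pd.unique encoding and column-wise map) and compares the two reorderings; by_position and by_label are illustrative names only.

```python
import pandas as pd

data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)
nodes = {name: code for code, name in enumerate(pd.unique(df.values.ravel()))}
df_coded = df.apply(lambda col: col.map(nodes))

# Columns before reordering: (Name,Source), (Name,Target), (Code,Source), (Code,Target)
df_repr = pd.concat([df, df_coded], axis=1, keys=['Name', 'Code'])

# Reorder by position and by (level 0, level 1) labels
by_position = df_repr.iloc[:, [0, 2, 3, 1]]
by_label = df_repr.loc[:, [('Name', 'Source'),
                           ('Code', 'Source'),
                           ('Code', 'Target'),
                           ('Name', 'Target')]]

# Raises AssertionError if values, dtypes or column order differ
pd.testing.assert_frame_equal(by_position, by_label)
print(list(by_position.columns))
```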
P.S. Because we are only selecting columns here, plain bracket indexing with the same list also works:
reordered_columns = [
('Name','Source'),
('Code','Source'),
('Code','Target'),
('Name','Target')
]
df_repr = df_repr[reordered_columns]
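Since the question ultimately asks how to get from an index back to the original name, here is a minimal sketch of inverting the answer's nodes mapping and decoding df_coded. It assumes the order-preserving pd.unique encoding so the codes are stable between runs; codes_to_names and df_decoded are hypothetical names introduced for illustration.

```python
import pandas as pd

data = {'Source': ['Rainfall', 'SP2', 'SP2', 'Inflow', 'Rainfall', 'Inflow', 'Inflow', 'Inflow', 'SWT', 'SP1', 'SP1', 'SWD'],
        'Target': ['SP1', 'Evp', 'Outflow', 'SP2', 'SWD', 'SWD', 'SP2', 'SP1', 'SP1', 'SP2', 'Evp', 'Loss']}
df = pd.DataFrame(data)

# Name -> code, in order of first appearance (row by row)
nodes = {name: code for code, name in enumerate(pd.unique(df.values.ravel()))}
df_coded = df.apply(lambda col: col.map(nodes))

# Invert the mapping: code -> name
codes_to_names = {code: name for name, code in nodes.items()}

# Decode every column back to the original names
df_decoded = df_coded.apply(lambda col: col.map(codes_to_names))

# Element-wise comparison; True when the round trip loses nothing
print((df_decoded == df).all().all())
```

The inversion is only safe because each name gets exactly one code, so the dict is one-to-one and no two codes collide when swapped.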