ValueError due to duplicate axis when replacing values in a pandas DataFrame
Question
I have one dataset, df, including nodes (N and T) and indicators assigned to nodes (IND_N and IND_T):
         N        T  IND_N  IND_T
0     John     Mark      1      0
1     Mike     John      2      1
2  Stephan    Simon      1      0
3    Laura  Stephan      1      1
4     Matt    Simon      3      0
5    Simon     Joey      0      2
I split the dataset into two, one (df1) with nodes that keep the indicators from df, the other one (df2) with indicators replaced by a dummy value.
df1 (keeps indicators from df)
         N      T  IND_N  IND_T
0     John   Mark      1      0
1  Stephan  Simon      1      0
2    Simon   Joey      0      2
df2 (please note that, after splitting, I assigned a dummy value -1 to all the indicators in df2)
       N        T  IND_N  IND_T
0  Laura  Stephan     -1     -1
1   Matt    Simon     -1     -1
2   Mike     John     -1     -1
Some nodes in df2 may also appear in df1, so a node could end up in both datasets with different indicators (e.g., Simon in the example above). To avoid this, I want every node that appears in both df2 and df1 to keep its original indicator (i.e., the one from df1), and then recombine the two datasets to get the final output:
df_out
         N        T  IND_N  IND_T
0     John     Mark      1      0
1  Stephan    Simon      1      0
2    Simon     Joey      0      2
3    Laura  Stephan     -1      1
4     Matt    Simon     -1      0
5     Mike     John     -1      1
Following the solution proposed here, I got the following error:
ValueError: cannot reindex from a duplicate axis
I tried to fix it as follows:
temp = df_unlabel[values]
temp.update(df_label[values].set_index(col, inplace=True))
After checking the values in the final table (df_out), I found that no dummy values are assigned: they have all been replaced by the original indicators again.
I'd appreciate your help to fix this error in order to get the final output.
Happy to provide more info if needed.
Answer 1
Score: 1
You can use a mapping dict:
# Create a mapping dict (node name -> indicator taken from df1) with a default value
dmap = pd.concat([df1.set_index('N')['IND_N'], df1.set_index('T')['IND_T']]).to_dict()
dmap.update({'.*': -1})
df2[['IND_N', 'IND_T']] = df2[['N', 'T']].replace(dmap, regex=True).values
out = pd.concat([df1, df2], axis=0, ignore_index=True)
Output:
>>> out
         N        T  IND_N  IND_T
0     John     Mark      1      0
1  Stephan    Simon      1      0
2    Simon     Joey      0      2
3    Laura  Stephan     -1      1
4     Matt    Simon     -1      0
5     Mike     John     -1      1
>>> dmap
{'John': 1, 'Stephan': 1, 'Simon': 0, 'Mark': 0, 'Joey': 2, '.*': -1}
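As a possible alternative (a sketch under assumptions, not part of the original answer), the same lookup can be done with exact matching through Series.map and fillna instead of a regex replace. The frames below are rebuilt from the tables in the question, and the name lookup is purely illustrative. Converting the concatenated Series to a plain dict keeps only the last value for a name that occurs more than once, which is presumably why this approach does not hit the duplicate-axis problem.

```python
import pandas as pd

# Sample data rebuilt from the question's tables
df1 = pd.DataFrame({
    "N": ["John", "Stephan", "Simon"],
    "T": ["Mark", "Simon", "Joey"],
    "IND_N": [1, 1, 0],
    "IND_T": [0, 0, 2],
})
df2 = pd.DataFrame({
    "N": ["Laura", "Matt", "Mike"],
    "T": ["Stephan", "Simon", "John"],
    "IND_N": [-1, -1, -1],
    "IND_T": [-1, -1, -1],
})

# Node name -> indicator, taken from df1 only.
# A dict keeps the last value when a name appears more than once (e.g. Simon).
lookup = pd.concat([
    df1.set_index("N")["IND_N"],
    df1.set_index("T")["IND_T"],
]).to_dict()

# Exact-match lookup; names that are not in df1 fall back to the dummy value -1
for node_col, ind_col in [("N", "IND_N"), ("T", "IND_T")]:
    df2[ind_col] = df2[node_col].map(lookup).fillna(-1).astype(int)

out = pd.concat([df1, df2], ignore_index=True)
print(out)
```

Because map compares whole values, a name such as "Johnny" would not pick up the indicator stored for "John", whereas a regex pattern can match substrings.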