ValueError due to duplicate axis when replacing values in a pandas DataFrame

Question

I have one dataset, df, including nodes (N and T) and indicators assigned to nodes (IND_N and IND_T):

         N        T  IND_N  IND_T
0     John     Mark      1      0
1     Mike     John      2      1
2  Stephan    Simon      1      0
3    Laura  Stephan      1      1
4     Matt    Simon      3      0
5    Simon     Joey      0      2

I split the dataset into two, one (df1) with nodes that keep the indicators from df, the other one (df2) with indicators replaced by a dummy value.

df1 (keeps indicators from df)

         N      T  IND_N  IND_T
0     John   Mark      1      0
1  Stephan  Simon      1      0
2    Simon   Joey      0      2

df2 (please note that, after splitting, I assigned a dummy value -1 to all the indicators in df2)

       N        T  IND_N  IND_T
0  Laura  Stephan     -1     -1
1   Matt    Simon     -1     -1
2   Mike     John     -1     -1

Some nodes in df2 can also appear in df1 (e.g., Simon in the example above), and I want to avoid the same node ending up with different indicators in the two datasets. For every node that is in both df2 and df1, I therefore want to replace its dummy indicator with the original one from df1, and then recombine the two datasets to get the final output:

df_out

         N        T  IND_N  IND_T
0     John     Mark      1      0
1  Stephan    Simon      1      0
2    Simon     Joey      0      2
3    Laura  Stephan     -1      1
4     Matt    Simon     -1      0
5     Mike     John     -1      1

Following the solution proposed here, I got the following error:

ValueError: cannot reindex from a duplicate axis

I tried to fix it as follows:

temp = df_unlabel[values]
temp.update(df_label[values].set_index(col, inplace=True))

After checking the values in the final table (df_out), I found that no dummy values are left: they have all been replaced by the original indicators again.

I'd appreciate your help to fix this error in order to get the final output.
Happy to provide more info if needed.

Answer 1

Score: 1

You can use a mapping dict:

# Create a mapping dict with a default value
dmap = pd.concat([df1.set_index('N')['IND_N'], df.set_index('T')['IND_T']]).to_dict()
dmap.update({'.*': -1})

df2[['IND_N', 'IND_T']] = df2[['N', 'T']].replace(dmap, regex=True).values
out = pd.concat([df1, df2], axis=0, ignore_index=True)

Output:

>>> out
         N        T  IND_N  IND_T
0     John     Mark      1      0
1  Stephan    Simon      1      0
2    Simon     Joey      0      2
3    Laura  Stephan     -1      1
4     Matt    Simon     -1      0
5     Mike     John     -1      1

>>> dmap
{'John': 1, 'Stephan': 1, 'Simon': 0, 'Mark': 0, 'Joey': 2, '.*': -1}
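
A possible variant, in case regex matching is a concern: with regex=True the keys are treated as patterns, so a key such as 'John' could also match a longer name that contains it. The sketch below does an exact lookup with Series.map instead and falls back to -1 for unknown nodes. It rebuilds df1 and df2 from the tables in the question; the names lookup, node_col and ind_col are only illustrative and do not come from the original answer. The lookup index has to be unique before mapping, which is the same kind of duplicate-label problem the error in the question points at.

```python
import pandas as pd

# Sample frames copied from the tables in the question.
df1 = pd.DataFrame({
    "N": ["John", "Stephan", "Simon"],
    "T": ["Mark", "Simon", "Joey"],
    "IND_N": [1, 1, 0],
    "IND_T": [0, 0, 2],
})
df2 = pd.DataFrame({
    "N": ["Laura", "Matt", "Mike"],
    "T": ["Stephan", "Simon", "John"],
    "IND_N": [-1, -1, -1],
    "IND_T": [-1, -1, -1],
})

# One node -> indicator lookup built from both node columns of df1.
lookup = pd.concat([
    df1.set_index("N")["IND_N"],
    df1.set_index("T")["IND_T"],
])

# A node can occur more than once (Simon is in both N and T of df1).
# Keep the last occurrence, as a dict built with to_dict() would,
# so that the lookup index is unique before it is used with map().
lookup = lookup[~lookup.index.duplicated(keep="last")]

# Exact-match lookup: nodes missing from df1 become NaN, then fall back to -1.
for node_col, ind_col in [("N", "IND_N"), ("T", "IND_T")]:
    df2[ind_col] = df2[node_col].map(lookup).fillna(-1).astype(int)

out = pd.concat([df1, df2], ignore_index=True)
print(out)
```

On the sample data this should give the same df_out as requested in the question. If a node could carry different indicators in different rows of df1, the keep= choice decides which one wins, so it is worth checking that case on the real data.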

Posted by huangapple on 2023-03-31 03:16:46
Link to this article: https://go.coder-hub.com/75892147.html