March 31, 2023, 03:16:46

ValueError due to duplicate axis when replacing values in a pandas DataFrame

Question

I have one dataset, df, including nodes (N and T) and indicators assigned to nodes (IND_N and IND_T):

         N        T  IND_N  IND_T
0     John     Mark      1      0
1     Mike     John      2      1
2  Stephan    Simon      1      0
3    Laura  Stephan      1      1
4     Matt    Simon      3      0
5    Simon     Joey      0      2

I split the dataset in two: df1, whose nodes keep the indicators from df, and df2, whose indicators are replaced by a dummy value.

df1 (keeps indicators from df)

         N      T  IND_N  IND_T
0     John   Mark      1      0
1  Stephan  Simon      1      0
2    Simon   Joey      0      2

df2 (please note that, after splitting, I assigned a dummy value -1 to all the indicators in df2)

       N        T  IND_N  IND_T
0  Laura  Stephan     -1     -1
1   Matt    Simon     -1     -1
2   Mike     John     -1     -1

Some nodes in df2 can also appear in df1. To avoid a node being in both datasets with different indicators (e.g., Simon in the example above), I want to replace the indicators of nodes that appear in both df2 and df1 with their original indicator (i.e., the one from df1), and then recombine the two datasets to get the final output:

df_out

         N        T  IND_N  IND_T
0     John     Mark      1      0
1  Stephan    Simon      1      0
2    Simon     Joey      0      2
3    Laura  Stephan     -1      1
4     Matt    Simon     -1      0
5     Mike     John     -1      1

Following the solution proposed here, I got the following error:

ValueError: cannot reindex from a duplicate axis

I tried to fix it as follows:

temp = df_unlabel[values]
temp.update(df_label[values].set_index(col, inplace=True))

After checking the values in the final table (df_out), I found that no dummy values are assigned (they are replaced again by the original ones).

I'd appreciate your help to fix this error in order to get the final output.
Happy to provide more info if needed.
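
A note on the attempted fix above: `set_index(..., inplace=True)` modifies the frame in place and returns `None`, so `temp.update(...)` receives nothing useful to update from. The sketch below uses a small hypothetical frame (`sample`, not the real data) to show that return value, and also how to check whether a column would produce duplicate index labels, which is one likely trigger for the reindexing error.

```python
import pandas as pd

# Hypothetical sample, not the real data: 'Simon' appears twice in column T.
sample = pd.DataFrame({
    "T": ["Mark", "Simon", "Simon"],
    "IND_T": [0, 0, 0],
})

# With inplace=True the call returns None (the copy itself is modified).
in_place_result = sample.copy().set_index("T", inplace=True)
print(in_place_result)

# Without inplace, a new frame indexed by T is returned.
indexed = sample.set_index("T")

# Duplicate labels in the index are what pandas refuses to reindex on.
has_duplicate_labels = not indexed.index.is_unique
print(has_duplicate_labels)
```

If the check reports duplicate labels, dropping or aggregating the duplicates before `set_index` is one way to avoid the error.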

Answer 1

Score: 1

You can use a mapping dict:

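# Create a mapping dict (node name -> indicator) with a default value
dmap = pd.concat([df1.set_index('N')['IND_N'], df.set_index('T')['IND_T']]).to_dict()
# Catch-all regex: any name not matched by a key above gets the dummy value -1
dmap.update({'.*': -1})
# Replace the names in N and T by their indicators and store them in df2
df2[['IND_N', 'IND_T']] = df2[['N', 'T']].replace(dmap, regex=True).values
# Stack df1 and df2 and renumber the rows
out = pd.concat([df1, df2], axis=0, ignore_index=True)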

Output:

>>> out
         N        T  IND_N  IND_T
0     John     Mark      1      0
1  Stephan    Simon      1      0
2    Simon     Joey      0      2
3    Laura  Stephan     -1      1
4     Matt    Simon     -1      0
5     Mike     John     -1      1
>>> dmap
{'John': 1, 'Stephan': 1, 'Simon': 0, 'Mark': 0, 'Joey': 2, '.*': -1}
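
For a version that runs on its own, here is a minimal sketch of the same idea using `Series.map` with `fillna` instead of a regex replace. It rebuilds df1 and df2 from the question's tables and, as an assumption, takes the lookup from df1 only (the question says the indicator should be "that one from df1"). The names `lookup`, `node_col` and `ind_col` are illustrative and not part of the answer above.

```python
import pandas as pd

# df1 and df2 as shown in the question.
df1 = pd.DataFrame({
    "N": ["John", "Stephan", "Simon"],
    "T": ["Mark", "Simon", "Joey"],
    "IND_N": [1, 1, 0],
    "IND_T": [0, 0, 2],
})
df2 = pd.DataFrame({
    "N": ["Laura", "Matt", "Mike"],
    "T": ["Stephan", "Simon", "John"],
    "IND_N": [-1, -1, -1],
    "IND_T": [-1, -1, -1],
})

# Node name -> indicator, built from df1 (assumption: df1 is the source of truth).
# Entries from T are merged last, so they win if a name appears in both columns.
lookup = {
    **dict(zip(df1["N"], df1["IND_N"])),
    **dict(zip(df1["T"], df1["IND_T"])),
}

# Look up each node of df2; names unknown to df1 fall back to the dummy value -1.
for node_col, ind_col in [("N", "IND_N"), ("T", "IND_T")]:
    df2[ind_col] = df2[node_col].map(lookup).fillna(-1).astype(int)

out = pd.concat([df1, df2], ignore_index=True)
print(out)
```

Because `map` matches names exactly rather than as regular expressions, this variant should also behave predictably if a node name contains characters such as `.` or `*`.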
