June 15, 2023, 18:05:45

Advanced sorting with pandas

Question

I have input data like the sample below. I have 5 xlsx files, each containing a table with 5 columns. I want to combine these tables, which I did with pd.concat. The problem arises when I try to do some advanced sorting on the combined table. I have tried to show what I want below.

Input data:

  col1 col2
0  X    X
1  Y    Z
2  Z    A
4  A
5  B

I would like it to be sorted like this:

Output data:

  col1 col2
0  X    X
1  Y    
2  Z    Z
4  A    A
5  B

Answer 1

Score: 1

Are you sure you need sorting? This rather looks like a mask (where + isin):

df['col2'] = df['col1'].where(df['col1'].isin(df['col2']), '')

Output:

  col1 col2
0    X    X
1    Y     
2    Z    Z
4    A    A
5    B
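
For anyone who wants to try the mask on the question's sample data, here is a minimal, self-contained sketch. The non-contiguous index (0, 1, 2, 4, 5) and the use of empty strings for the blank col2 cells are assumptions read off the printed table, not something the question states.

```python
import pandas as pd

# Sample data reconstructed from the question. The index labels and the
# empty strings standing in for blank cells are assumptions.
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],
)

# Keep the col1 value where it also occurs somewhere in col2;
# everywhere else, fall back to an empty string.
df["col2"] = df["col1"].where(df["col1"].isin(df["col2"]), "")

print(df)
```

The right-hand side is evaluated in full before col2 is overwritten, so the membership test runs against the original col2 values. The original index labels are preserved.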

Or maybe a left-merge:

df[['col1']].merge(df[['col2']], left_on='col1', right_on='col2', how='left')

Output:

  col1 col2
0    X    X
1    Y  NaN
2    Z    Z
3    A    A
4    B  NaN
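
A runnable sketch of the left-merge, under the same assumptions about the sample data (empty strings for blank cells, index 0, 1, 2, 4, 5). The drop_duplicates and fillna steps are additions for illustration: the first guards against repeated col2 values multiplying rows, the second turns the NaN markers into the blanks shown in the question.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed, see note above).
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],
)

# Left-merge col1 against the distinct col2 values. Rows of col1 without
# a partner in col2 get NaN in the col2 column.
merged = df[["col1"]].merge(
    df[["col2"]].drop_duplicates(),
    left_on="col1",
    right_on="col2",
    how="left",
)

# Optional: show unmatched rows as empty strings instead of NaN.
merged["col2"] = merged["col2"].fillna("")

print(merged)
```

Unlike the mask, merge returns a fresh 0..n-1 index, which is why the output above is numbered 0 to 4 rather than 0, 1, 2, 4, 5.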

Answer 2

Score: 0

Another possible solution:

df.set_index('col1').join(df[['col2']].set_index('col2')).reset_index()

Output:

  col1 col2
0    A  NaN
1    B  NaN
2    X    X
3    Y    Z
4    Z    A
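
Note that in the output above col2 still holds its original values (Y is paired with Z, Z with A), so it differs from the result requested in the question: after set_index('col2') the right-hand frame has no columns left to contribute. A possible join-based variant is sketched below. It is an illustration rather than the answerer's code, and the sample data (empty strings for blank cells, index 0, 1, 2, 4, 5) and the name lookup are assumptions.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed, see note above).
df = pd.DataFrame(
    {"col1": ["X", "Y", "Z", "A", "B"], "col2": ["X", "Z", "A", "", ""]},
    index=[0, 1, 2, 4, 5],
)

# Build a lookup frame indexed by the distinct col2 values while keeping
# col2 as a regular column (drop=False), so the join has data to bring in.
lookup = (
    df[["col2"]]
    .drop_duplicates()
    .set_index("col2", drop=False)
    .rename_axis(None)
)

# Match each col1 value against the lookup index; unmatched rows get NaN.
result = df[["col1"]].join(lookup, on="col1")

print(result)
```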

Answer 3

Score: 0

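You can achieve this by looping over the rows of your DataFrame and checking whether each value in col1 occurs in col2. If it does, set col2 to the col1 value for that row; if not, set col2 to an empty string. Because the loop overwrites col2 as it goes, take a snapshot of the original col2 values first and test membership against that snapshot. Iterate over df.index rather than range(len(df)), since the index labels here (0, 1, 2, 4, 5) are not contiguous.

import pandas as pd

# assuming df is your DataFrame
# snapshot the original col2 values, because col2 is overwritten in the loop
original_col2 = set(df['col2'])
# iterate over the actual index labels (they need not be 0..n-1)
for i in df.index:
    # if the value in col1 occurs in the original col2
    if df.loc[i, 'col1'] in original_col2:
        # set col2 value to col1 value
        df.loc[i, 'col2'] = df.loc[i, 'col1']
    else:
        # set col2 value to an empty string
        df.loc[i, 'col2'] = ''
print(df)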
