2020-01-06 22:57:39

How to compare columns between two Excel files in Python?

Question

I have two Excel files.

Excel 1:

A,B,C

1,2,3

Excel 2:

A,C,B

1,3,2

How can I reorder the columns of Excel 2 based on the column order of Excel 1, so that A,C,B becomes A,B,C?

I use the following code to check the column order:

import pandas as pd

# xls and xls2 are the paths of the two workbooks.
comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if not all(comparison_Columns):
    print('Wrong column order!')

Answer 1

Score: 1

import pandas as pd

df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)
# Reorder the columns of Excel 2 to follow Excel 1 when the orders differ.
if not all(df1.columns == df2.columns):
    df2 = df2[df1.columns]
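
The element-wise comparison above assumes both files have the same number of columns. A slightly more defensive variant is sketched below, assuming column names are unique within each file. `reorder_like` is a hypothetical helper name, not a pandas function, and the two small DataFrames stand in for the workbooks read with `pd.read_excel`.

```python
import pandas as pd


def reorder_like(reference: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Return `other` with its columns in the same order as `reference`.

    Hypothetical helper for illustration; assumes unique column names.
    """
    # Index.equals is True only when the labels and their order both match.
    if reference.columns.equals(other.columns):
        return other
    # Reordering only makes sense when both frames carry the same labels.
    if set(reference.columns) != set(other.columns):
        raise ValueError("column names differ, cannot reorder")
    return other[reference.columns]


# In-memory stand-ins for the two workbooks in the question.
excel1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
excel2 = pd.DataFrame([[1, 3, 2]], columns=["A", "C", "B"])

fixed = reorder_like(excel1, excel2)
print(list(fixed.columns))
```

Raising an error when the label sets differ is one possible design choice; whether to fail, drop, or fill missing columns depends on the data.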

Answer 2

Score: 0

It doesn't really matter whether the data comes from Excel or another format. If you know that both DataFrames have the same columns, differing only in order, you can simply index one with the columns of the other:

import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])
print(df1[df0.columns])
   A  B  C
0  1  2  3
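
The selection above relies on every column of `df0` also existing in the other frame. When that is not guaranteed, `DataFrame.reindex` is one alternative: it follows the reference order and fills absent columns with NaN. A minimal sketch, where `df_partial` is a hypothetical frame that lacks column `B`:

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
# Hypothetical frame that lacks column "B" entirely.
df_partial = pd.DataFrame([[1, 3]], columns=["A", "C"])

# reindex follows df0's column order and fills missing columns with NaN.
aligned = df_partial.reindex(columns=df0.columns)
print(list(aligned.columns))
print(aligned["B"].isna().all())
```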

Answer 3

Score: 0

This snippet checks whether two DataFrames have the same columns, regardless of their order:

    def areColumnSame(df1, df2, checkTypes = True):
        if checkTypes:
            # Compare {column name: dtype}; dict equality ignores key order.
            type1 = dict(df1.dtypes)
            type2 = dict(df2.dtypes)
            return type1 == type2

        else:
            # Compare the sorted column names only.
            col1 = list(df1.columns)
            col2 = list(df2.columns)
            col1.sort()
            col2.sort()
            return col1 == col2

To show how the code works, consider three Excel files:

    | A | B | C |
    |---|---|---|
    | 1 | 2 | 3 |
    | 4 | 5 | 6 |

    | A | C | B |
    |---|---|---|
    | 1 | 3 | 2 |
    | 4 | 6 | 5 |

    | A | B | C | A.1 | B.1 | C.1 |
    |---|---|---|-----|-----|-----|
    | 1 | 2 | 3 | 1   | 2   | 3   |
    | 4 | 5 | 6 | 4   | 5   | 6   |

For the first file, `dict(df.dtypes)` is:

    {'A': dtype('int64'),
     'B': dtype('int64'),
     'C': dtype('int64')}

Similarly, for the other two files:

    {'A': dtype('int64'),
     'C': dtype('int64'),
     'B': dtype('int64')}

and

    {'A': dtype('int64'),
     'B': dtype('int64'),
     'C': dtype('int64'),
     'A.1': dtype('int64'),
     'B.1': dtype('int64'),
     'C.1': dtype('int64')}

Comparing these dictionaries gives the result and checks the data types at the same time. Because dictionary equality ignores key order, the comparison between the first two files returns True, whereas any comparison with the third file returns False.

You can also disable type checking with `checkTypes = False`, in which case the function only checks whether `[A, B, C]` has the same names as `[A, C, B]`, without comparing their types.
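
As a usage sketch, the function can be exercised on in-memory DataFrames that mirror the three files. The function body is repeated in condensed form so the example runs on its own, and `file4` is a hypothetical extra frame added to show where the two modes disagree.

```python
import pandas as pd


def areColumnSame(df1, df2, checkTypes=True):
    # Same logic as the answer's function, condensed.
    if checkTypes:
        return dict(df1.dtypes) == dict(df2.dtypes)
    return sorted(df1.columns) == sorted(df2.columns)


file1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
file2 = pd.DataFrame([[1, 3, 2], [4, 6, 5]], columns=["A", "C", "B"])
file3 = pd.DataFrame(
    [[1, 2, 3, 1, 2, 3], [4, 5, 6, 4, 5, 6]],
    columns=["A", "B", "C", "A.1", "B.1", "C.1"],
)
# Hypothetical fourth frame: same names as file1, but "C" holds floats.
file4 = pd.DataFrame({"A": [1], "B": [2], "C": [3.5]})

print(areColumnSame(file1, file2))                    # same names and dtypes
print(areColumnSame(file1, file3))                    # extra columns
print(areColumnSame(file1, file4))                    # dtype of "C" differs
print(areColumnSame(file1, file4, checkTypes=False))  # names only
```

Note that the function only reports whether the columns match; reordering still requires a selection such as `df2[df1.columns]`.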
