如何在Python中比较两个Excel文件中的列?

huangapple go评论67阅读模式
英文:

how to compare column between two excel in python?

问题

我有两个Excel表格

Excel 1:

A,B,C

1,2,3

Excel 2:

A,C,B

1,3,2

如何根据Excel 1的列来重新排列Excel 2?

使A,C,B变为A,B,C

我使用以下代码来检查列的顺序:

comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
    pass
else:
    print('列顺序错误!!!!')
英文:

I have two excel

Excel 1 :

A,B,C

1,2,3

Excel 2 :

A,C,B

1,3,2

How can i re position the excel 2 base on excel 1 column ?

so that A,C,B and become A,B,C

I use the following code to check column orders:

comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
    pass
else:
    print('Wrong column order !!!!! ')

答案1

得分: 1

df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)

if all(df1.columns == df2.columns):
    pass
else:
    df1 = df1[df2.columns]
英文:
df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)

if all(df1.columns == df2.columns):
    pass
else:
    df1 = df1[df2.columns]

答案2

得分: 0

不管数据来自Excel还是其他格式,只要您知道它们的列顺序相同,您可以这样做:

import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])

print(df1[df0.columns])

   A  B  C
0  1  2  3
英文:

It doesn't really matter if the data comes from excel or other format. If you know that both have the same columns up to order you could just

import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])

print(df1[df0.columns])

   A  B  C
0  1  2  3

答案3

得分: 0

以下是翻译好的代码部分:

这段代码片段将正常工作


    def areColumnSame(df1, df2, checkTypes = True):
        if checkTypes:
            type1 = dict(df1.dtypes)
            type2 = dict(df2.dtypes)
            return type1 == type2
        
        else:
            col1 = list(df1.columns)
            col2 = list(df2.columns)
            col1.sort()
            col2.sort()
            return col1 == col2


为了展示上面的代码如何工作让我们探讨一些示例

考虑三个 Excel 文件
   

    | A | B | C |
    |---|---|---|
    | 1 | 2 | 3 |
    | 4 | 5 | 6 |
    
    | A | C | B |
    |---|---|---|
    | 1 | 3 | 2 |
    | 4 | 6 | 5 |
    
    | A | B | C | A.1 | B.1 | C.1 |
    |---|---|---|-----|-----|-----|
    | 1 | 2 | 3 | 1   | 2   | 3   |
    | 4 | 5 | 6 | 4   | 5   | 6   |

现在对于第一个文件,`dict(df.dtypes)` 如下所示

    {'A': dtype('int64'),
     'B': dtype('int64'),
     'C': dtype('int64')}

类似地对于其他两个文件

    {'A': dtype('int64'),
     'C': dtype('int64'),
     'B': dtype('int64')}


    {'A': dtype('int64'),
     'B': dtype('int64'),
     'C': dtype('int64'),
     'A.1': dtype('int64'),
     'B.1': dtype('int64'),
     'C.1': dtype('int64')}


我们只需要比较这些字典来获得结果同时它还检查数据的类型

因此前两个文件之间的比较将为真而与第三个文件的比较将返回假

但您始终可以禁用类型检查在这种情况下我们只会检查`[A, B, C]`是否与`[A, C, B]`相同而不会比较它们的类型

<details>
<summary>英文:</summary>

This code snippet will work fine:


    def areColumnSame(df1, df2, checkTypes = True):
        if checkTypes:
            type1 = dict(df1.dtypes)
            type2 = dict(df2.dtypes)
            return type1 == type2
        
        else:
            col1 = list(df1.columns)
            col2 = list(df2.columns)
            col1.sort()
            col2.sort()
            return col1 == col2


To show how the above code works let us explore examples:

Consider three excel files:
   

    | A | B | C |
    |---|---|---|
    | 1 | 2 | 3 |
    | 4 | 5 | 6 |
    
    | A | C | B |
    |---|---|---|
    | 1 | 3 | 2 |
    | 4 | 6 | 5 |
    
    | A | B | C | A.1 | B.1 | C.1 |
    |---|---|---|-----|-----|-----|
    | 1 | 2 | 3 | 1   | 2   | 3   |
    | 4 | 5 | 6 | 4   | 5   | 6   |

Now for the first file the `dict(df.dtypes)` is shown below:

    {&#39;A&#39;: dtype(&#39;int64&#39;),
     &#39;B&#39;: dtype(&#39;int64&#39;),
     &#39;C&#39;: dtype(&#39;int64&#39;)}

Similarly for other two files:

    {&#39;A&#39;: dtype(&#39;int64&#39;),
     &#39;C&#39;: dtype(&#39;int64&#39;),
     &#39;B&#39;: dtype(&#39;int64&#39;)}

and 

    {&#39;A&#39;: dtype(&#39;int64&#39;),
     &#39;B&#39;: dtype(&#39;int64&#39;),
     &#39;C&#39;: dtype(&#39;int64&#39;),
     &#39;A.1&#39;: dtype(&#39;int64&#39;),
     &#39;B.1&#39;: dtype(&#39;int64&#39;),
     &#39;C.1&#39;: dtype(&#39;int64&#39;)}


 We just need to compare these dictionaries to get the result. At the same time, it also checks for the type of data.

Hence for the comparison between the first two files will be true whereas the comparison with third will return false.

But you can always disable the type-checking in which case we will just check whether `[A, B, C]` is the same as `[A, C, B]` without comparing their types.


</details>



huangapple
  • 本文由 发表于 2020年1月6日 22:57:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/59614275.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定