How to compare columns between two Excel files in Python?


Question


I have two Excel files:

Excel 1:

A,B,C
1,2,3

Excel 2:

A,C,B
1,3,2

How can I reorder the columns of Excel 2 to follow the column order of Excel 1, so that A,C,B becomes A,B,C?

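I use the following code to check the column order:

```python
import pandas as pd

# xls, xls2: the two workbooks (e.g. file paths)
# Element-wise comparison; raises ValueError if the column counts differ
comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
    pass
else:
    print('Wrong column order !!!!! ')
```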
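Before reordering anything, it can help to see exactly how the two headers differ. The following is a minimal sketch, assuming both files have already been loaded into DataFrames; `describe_column_difference` is a hypothetical helper name, and the small in-memory frames stand in for the two Excel files:

```python
import pandas as pd


def describe_column_difference(df_ref, df_other):
    """Summarise how df_other's columns differ from df_ref's (hypothetical helper)."""
    ref_cols = list(df_ref.columns)
    other_cols = list(df_other.columns)
    return {
        # Columns in the reference frame that the other frame lacks
        "missing": [c for c in ref_cols if c not in other_cols],
        # Columns in the other frame that the reference frame lacks
        "extra": [c for c in other_cols if c not in ref_cols],
        # True only when both headers list the same names in the same order
        "same_order": ref_cols == other_cols,
    }


# In-memory stand-ins for Excel 1 and Excel 2 from the question
excel1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
excel2 = pd.DataFrame([[1, 3, 2]], columns=["A", "C", "B"])

print(describe_column_difference(excel1, excel2))
# {'missing': [], 'extra': [], 'same_order': False}
```

Empty `missing` and `extra` lists with `same_order` set to `False` mean the files hold the same columns and only need reordering.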

Answer 1

Score: 1

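```python
import pandas as pd

df1 = pd.read_excel(xls)   # reference file (Excel 1)
df2 = pd.read_excel(xls2)  # file to reorder (Excel 2)

# Index.equals returns False (instead of raising) when the column counts differ
if not df1.columns.equals(df2.columns):
    # Reorder Excel 2's columns to follow Excel 1's order
    df2 = df2[df1.columns]
```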
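Reordering by plain column selection raises a `KeyError` if the frame being reordered lacks one of the reference columns, and it silently drops any extra columns. A minimal sketch that validates the column sets first and only then reorders; `align_columns` is a hypothetical helper name, and in-memory DataFrames stand in for the Excel files:

```python
import pandas as pd


def align_columns(df_ref, df_other):
    """Return df_other with its columns in df_ref's order (hypothetical helper).

    Raises ValueError if the two frames do not have the same set of column names.
    """
    if set(df_ref.columns) != set(df_other.columns):
        raise ValueError(
            f"Column sets differ: {sorted(map(str, df_ref.columns))} "
            f"vs {sorted(map(str, df_other.columns))}"
        )
    if df_other.columns.equals(df_ref.columns):
        return df_other  # already in the right order, nothing to do
    # Selecting with df_ref.columns returns a new frame in that column order
    return df_other[df_ref.columns]


excel1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
excel2 = pd.DataFrame([[1, 3, 2]], columns=["A", "C", "B"])

aligned = align_columns(excel1, excel2)
print(list(aligned.columns))     # ['A', 'B', 'C']
print(aligned.iloc[0].tolist())  # [1, 2, 3]
```

Failing loudly on a mismatch is a design choice here; if missing columns should be tolerated instead, `reindex(columns=...)` is the usual alternative.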

Answer 2

Score: 0


It doesn't really matter whether the data comes from Excel or another format. If you know that both have the same columns, differing only in order, you can just select one DataFrame's columns in the other's order:

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
df1 = pd.DataFrame([[1, 3, 2]], columns=["A", "C", "B"])

# Indexing with df0.columns returns df1's columns in df0's order
print(df1[df0.columns])
```

Output:

```
   A  B  C
0  1  2  3
```
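When the two files might not share exactly the same columns, `DataFrame.reindex(columns=...)` is an alternative to plain selection: it keeps the requested order, fills a missing column with NaN instead of raising a `KeyError`, and drops columns that were not requested. A short sketch contrasting the two, where `df_partial` is a made-up frame lacking column B:

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
# Hypothetical frame that is missing column "B"
df_partial = pd.DataFrame([[1, 3]], columns=["A", "C"])

# Plain selection requires every requested column to exist
try:
    df_partial[df0.columns]
    indexing_failed = False
except KeyError:
    indexing_failed = True

# reindex keeps df0's order and fills the absent column with NaN
reindexed = df_partial.reindex(columns=df0.columns)

print(indexing_failed)              # True
print(list(reindexed.columns))      # ['A', 'B', 'C']
print(reindexed["B"].isna().all())  # True
```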

Answer 3

Score: 0

This code snippet checks whether two DataFrames have the same columns, regardless of their order:

```python
def areColumnSame(df1, df2, checkTypes=True):
    if checkTypes:
        # Compare {column name: dtype} mappings; dict equality ignores key order
        type1 = dict(df1.dtypes)
        type2 = dict(df2.dtypes)
        return type1 == type2
    else:
        # Compare only the column names, sorted so that order does not matter
        col1 = sorted(df1.columns)
        col2 = sorted(df2.columns)
        return col1 == col2
```

To show how the above code works, let us explore some examples. Consider three Excel files:

File 1:

| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |

File 2:

| A | C | B |
|---|---|---|
| 1 | 3 | 2 |
| 4 | 6 | 5 |

File 3:

| A | B | C | A.1 | B.1 | C.1 |
|---|---|---|-----|-----|-----|
| 1 | 2 | 3 | 1   | 2   | 3   |
| 4 | 5 | 6 | 4   | 5   | 6   |

For the first file, `dict(df.dtypes)` is:

```
{'A': dtype('int64'),
 'B': dtype('int64'),
 'C': dtype('int64')}
```

Similarly, for the other two files:

```
{'A': dtype('int64'),
 'C': dtype('int64'),
 'B': dtype('int64')}
```

and

```
{'A': dtype('int64'),
 'B': dtype('int64'),
 'C': dtype('int64'),
 'A.1': dtype('int64'),
 'B.1': dtype('int64'),
 'C.1': dtype('int64')}
```

We just need to compare these dictionaries to get the result, and because the values are dtypes, the data types are checked at the same time. Hence the comparison between the first two files returns `True`, whereas the comparison with the third returns `False`.

You can also disable type checking with `checkTypes=False`, in which case the function only checks whether `[A, B, C]` contains the same names as `[A, C, B]`, without comparing their types.
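To check the function end to end, here is a self-contained sketch that rebuilds the three example tables as in-memory DataFrames (rather than reading Excel files) and adds a fourth case, a hypothetical copy of File 2 with column B cast to float, showing that the typed check also reports dtype differences:

```python
import pandas as pd


def areColumnSame(df1, df2, checkTypes=True):
    """Same logic as the answer above: order-insensitive column comparison."""
    if checkTypes:
        # dict equality ignores key order, so reordered columns still match
        return dict(df1.dtypes) == dict(df2.dtypes)
    return sorted(df1.columns) == sorted(df2.columns)


file1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
file2 = pd.DataFrame([[1, 3, 2], [4, 6, 5]], columns=["A", "C", "B"])
file3 = pd.DataFrame(
    [[1, 2, 3, 1, 2, 3], [4, 5, 6, 4, 5, 6]],
    columns=["A", "B", "C", "A.1", "B.1", "C.1"],
)

print(areColumnSame(file1, file2))                    # True
print(areColumnSame(file1, file3))                    # False
print(areColumnSame(file1, file2, checkTypes=False))  # True

# Hypothetical variant: same names, but column B stored as float
file2_float = file2.astype({"B": "float64"})
print(areColumnSame(file1, file2_float))                    # False
print(areColumnSame(file1, file2_float, checkTypes=False))  # True
```

Note that the function only reports whether the columns match; reordering still requires selecting the columns in the reference order.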

huangapple
  • Posted on January 6, 2020, 22:57:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59614275.html