英文:
how to compare column between two excel in python?
问题
我有两个Excel表格
Excel 1:
A,B,C
1,2,3
Excel 2:
A,C,B
1,3,2
如何根据Excel 1的列来重新排列Excel 2?
使A,C,B变为A,B,C
我使用以下代码来检查列的顺序:
comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
pass
else:
print('列顺序错误!!!!')
英文:
I have two excel
Excel 1 :
A,B,C
1,2,3
Excel 2 :
A,C,B
1,3,2
How can i re position the excel 2 base on excel 1 column ?
so that A,C,B and become A,B,C
I use the following code to check column orders:
comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
pass
else:
print('Wrong column order !!!!! ')
答案1
得分: 1
df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)
if all(df1.columns == df2.columns):
pass
else:
df1 = df1[df2.columns]
英文:
df1 = pd.read_excel(xls)
df2 = pd.read_excel(xls2)
if all(df1.columns == df2.columns):
pass
else:
df1 = df1[df2.columns]
答案2
得分: 0
不管数据来自Excel还是其他格式,只要您知道它们的列顺序相同,您可以这样做:
import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])
print(df1[df0.columns])
A B C
0 1 2 3
英文:
It doesn't really matter if the data comes from excel or other format. If you know that both have the same columns up to order you could just
import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])
print(df1[df0.columns])
A B C
0 1 2 3
答案3
得分: 0
以下是翻译好的代码部分:
这段代码片段将正常工作:
def areColumnSame(df1, df2, checkTypes = True):
if checkTypes:
type1 = dict(df1.dtypes)
type2 = dict(df2.dtypes)
return type1 == type2
else:
col1 = list(df1.columns)
col2 = list(df2.columns)
col1.sort()
col2.sort()
return col1 == col2
为了展示上面的代码如何工作,让我们探讨一些示例:
考虑三个 Excel 文件:
| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| A | C | B |
|---|---|---|
| 1 | 3 | 2 |
| 4 | 6 | 5 |
| A | B | C | A.1 | B.1 | C.1 |
|---|---|---|-----|-----|-----|
| 1 | 2 | 3 | 1 | 2 | 3 |
| 4 | 5 | 6 | 4 | 5 | 6 |
现在,对于第一个文件,`dict(df.dtypes)` 如下所示:
{'A': dtype('int64'),
'B': dtype('int64'),
'C': dtype('int64')}
类似地,对于其他两个文件:
{'A': dtype('int64'),
'C': dtype('int64'),
'B': dtype('int64')}
和
{'A': dtype('int64'),
'B': dtype('int64'),
'C': dtype('int64'),
'A.1': dtype('int64'),
'B.1': dtype('int64'),
'C.1': dtype('int64')}
我们只需要比较这些字典来获得结果。同时,它还检查数据的类型。
因此,前两个文件之间的比较将为真,而与第三个文件的比较将返回假。
但您始终可以禁用类型检查,在这种情况下,我们只会检查`[A, B, C]`是否与`[A, C, B]`相同,而不会比较它们的类型。
<details>
<summary>英文:</summary>
This code snippet will work fine:
def areColumnSame(df1, df2, checkTypes = True):
if checkTypes:
type1 = dict(df1.dtypes)
type2 = dict(df2.dtypes)
return type1 == type2
else:
col1 = list(df1.columns)
col2 = list(df2.columns)
col1.sort()
col2.sort()
return col1 == col2
To show how the above code works let us explore examples:
Consider three excel files:
| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| A | C | B |
|---|---|---|
| 1 | 3 | 2 |
| 4 | 6 | 5 |
| A | B | C | A.1 | B.1 | C.1 |
|---|---|---|-----|-----|-----|
| 1 | 2 | 3 | 1 | 2 | 3 |
| 4 | 5 | 6 | 4 | 5 | 6 |
Now for the first file the `dict(df.dtypes)` is shown below:
{'A': dtype('int64'),
'B': dtype('int64'),
'C': dtype('int64')}
Similarly for other two files:
{'A': dtype('int64'),
'C': dtype('int64'),
'B': dtype('int64')}
and
{'A': dtype('int64'),
'B': dtype('int64'),
'C': dtype('int64'),
'A.1': dtype('int64'),
'B.1': dtype('int64'),
'C.1': dtype('int64')}
We just need to compare these dictionaries to get the result. At the same time, it also checks for the type of data.
Hence for the comparison between the first two files will be true whereas the comparison with third will return false.
But you can always disable the type-checking in which case we will just check whether `[A, B, C]` is the same as `[A, C, B]` without comparing their types.
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论