List and Compare data types between two dataframes

Question

I have two data frames whose column counts and headers are supposed to match, but the column data types may differ.

I want to list the columns by data type and then compare the two data frames, producing a list of the column headers whose data types do not match.

For example, I have two data frames, df1 and df2. df1 looks like the table below, where:

  • Geometry has the geometry data type
  • A is an integer
  • B is a string

Geometry  A  B
123456    1  x
78.900    2  b

In df2, A is a string, whereas it should be an integer (as in df1).

I have tried getting the count of columns per data type, which works using df1.dtypes.value_counts().

I have also tried groupby to list all the column names per data type, using

    g = df1.columns.to_series().groupby(df1.dtypes).groups

But to use groupby I have to delete the geometry column from the data frame, because it raises a TypeError. I managed to get the list after creating a new data frame with the geometry column dropped.

I now want to compare the column data types of the two data frames and list the columns that do not match. I also tried df1.equals(df2), which returned False.
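One possible way around the TypeError described above is to group on the dtype names as strings instead of on the dtype objects themselves. The sketch below uses plain pandas with made-up data. It assumes, without having been checked against geopandas, that the geometry dtype converts to a string the same way the built-in dtypes do.

```python
import pandas as pd

# Hypothetical stand-in for df1; a real GeoDataFrame would also carry a geometry column.
df1 = pd.DataFrame({"A": [1, 2], "B": ["x", "b"], "C": [0.5, 1.5]})

# Convert each dtype to its string name so the group keys are plain strings.
dtype_names = df1.dtypes.astype(str)

# Map each dtype name to the list of columns that have it.
columns_by_dtype = {
    name: cols.tolist()
    for name, cols in df1.columns.to_series().groupby(dtype_names)
}
print(columns_by_dtype)
```

Because the keys are ordinary strings, the resulting dictionary can be compared directly with the one built from a second data frame.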

Answer 1

Score: 0

Assuming the two data frames have the same number of columns and same column names, this will give you the list of column names for which the dtypes are different:

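# Sort both dtype Series by column name, then compare them element-wise
dt = (df1.dtypes.sort_index() == df2.dtypes.sort_index())

# Keep the column names where the dtypes differ
dt.loc[~dt].index.to_list()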
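For a self-contained illustration, the same comparison can be wrapped in a small helper that also shows which dtype each data frame has for the mismatched columns. The example data and the helper name dtype_mismatches are invented for this sketch. It compares dtype names as strings, an assumption made here to keep the comparison simple.

```python
import pandas as pd


def dtype_mismatches(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return one row per shared column whose dtype differs between the two frames."""
    # Put both dtype listings side by side, aligned on column name.
    side_by_side = pd.concat(
        [left.dtypes.astype(str), right.dtypes.astype(str)],
        axis=1,
        keys=["left", "right"],
    )
    # Drop columns present in only one frame, then keep those that differ.
    side_by_side = side_by_side.dropna()
    return side_by_side[side_by_side["left"] != side_by_side["right"]]


# Made-up example data: column A is an integer in df1 but a string in df2.
df1 = pd.DataFrame({"A": [1, 2], "B": ["x", "b"]})
df2 = pd.DataFrame({"A": ["1", "2"], "B": ["x", "b"]})

mismatches = dtype_mismatches(df1, df2)
print(mismatches.index.to_list())  # ['A']
```

Returning a data frame rather than a bare list keeps both dtype names visible next to each column, which helps when deciding which side needs converting.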

huangapple
  • Posted on March 15, 2023 at 20:36:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75744793.html