Checking if elements in my dataframe columns have the same type

Question

I'm using Python and working with a DataFrame df. To check whether, in every column, each row holds a value of the same type as the first row, I wrote the following lines:

    a=0
    first_object = df.loc[df.index[0]]
    for column in df:
        for i in range(0,len(df)):
            if type(df[column][i]) != type(first_object[column]):
                a+=1
    print(a)

The error I got is:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    ~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       3360             try:
    -> 3361                 return self._engine.get_loc(casted_key)
       3362             except KeyError as err:

    ~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    ~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

    KeyError: 155

    The above exception was the direct cause of the following exception:

    KeyError                                  Traceback (most recent call last)
    /var/folders/xb/74q_24bx0rxgqc6gtd6ksn7c0000gn/T/ipykernel_25626/3160699232.py in <module>
          3 for column in df:
          4     for i in range(0,len(df)):
    ----> 5         if type(df[column][i]) != type(first_object[column]):
          6             a+=1

    ~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/series.py in __getitem__(self, key)
        940
        941         elif key_is_scalar:
    --> 942             return self._get_value(key)
        943
        944         if is_hashable(key):

    ~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
       1049
       1050         # Similar to Index.get_value, but we do not fall back to positional
    -> 1051         loc = self.index.get_loc(label)
       1052         return self.index._get_values_for_loc(self, loc, label)
       1053

    ~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       3361                 return self._engine.get_loc(casted_key)
       3362             except KeyError as err:
    -> 3363                 raise KeyError(key) from err
       3364
       3365         if is_scalar(key) and isna(key) and not self.hasnans:

    KeyError: 155

I am confused because both type(df[column][i]) and type(first_object[column]) work when run on their own. I tried the comparison with matching and non-matching types, and it returned True and False as expected, so I don't understand why my code fails.

Answer 1

Score: 1

If I understand correctly, you want to count the number of columns whose values all share a single type.

You can use:

    df.applymap(type).nunique().eq(1).sum()
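As a rough illustration of what this one-liner computes, here is a small sketch on made-up data (the columns a, b and c are assumptions for the example, not taken from the question). It calls Series.map on each column instead of applymap, because pandas 2.1 and later deprecate applymap in favor of DataFrame.map; the per-column approach should behave the same on either side of that change.

```python
import pandas as pd

# Hypothetical data: "a" and "c" each hold one type, "b" mixes int, str and float.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, "x", 2.5], "c": ["u", "v", "w"]})

# Element-wise type lookup, done column by column.
types = df.apply(lambda col: col.map(type))

# Distinct types per column, then the number of columns with exactly one type.
n_uniform = int(types.nunique().eq(1).sum())
print(n_uniform)
```

Dropping the final .sum() leaves a boolean Series that shows which columns are uniform, which may be more useful than the bare count when hunting for the offending column.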

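Fixing your code (though I wouldn't use a loop for this in real life): df[column][i] looks i up as an index label rather than as a position, and KeyError: 155 means your index has no label 155. Iterating over df.index and using df.loc avoids this: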

    a=0
    first_object = df.iloc[0]
    for column in df:
        for i in df.index:
            if type(df.loc[i, column]) != type(first_object[column]):
                a+=1
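To make the difference between label and position concrete, here is a minimal sketch on a hypothetical frame whose index skips some labels, as can happen after filtering or dropping rows (the data and the name mismatches are assumptions for illustration). It reproduces the KeyError, then runs a label-based loop like the one above. One deliberate deviation: it takes the reference type from the first value of each column rather than from a whole row, since pulling a row out as a single Series can upcast values when column dtypes differ.

```python
import pandas as pd

# Hypothetical frame: four rows, but the index labels are 0, 1, 2, 7.
df = pd.DataFrame({"x": [1, "a", 3.0, 4]}, index=[0, 1, 2, 7])

# df["x"][3] is a label lookup; there is no label 3, so it raises KeyError.
try:
    df["x"][3]
    raised = False
except KeyError:
    raised = True

# Position-based and label-based access to the last row return the same value.
by_position = df["x"].iloc[3]
by_label = df.loc[7, "x"]

# Count values whose type differs from the first value of their column.
mismatches = 0
for column in df:
    first_type = type(df[column].iloc[0])
    for label in df.index:
        if type(df.loc[label, column]) != first_type:
            mismatches += 1

print(raised, by_position, by_label, mismatches)
```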

The vectorized equivalent (counting the values whose type differs from that of the first row) would be:

    df2 = df.applymap(type)
    out = df2.ne(df2.iloc[0]).sum().sum()
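A short worked example of this computation, again on made-up data (the column names are assumptions, and Series.map per column stands in for applymap so the sketch does not depend on the pandas version):

```python
import pandas as pd

# Hypothetical data: only column "b" contains values of differing types.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, "x", 2.5], "c": ["u", "v", "w"]})

# Frame of types, one entry per cell.
df2 = df.apply(lambda col: col.map(type))

# True wherever a cell's type differs from the type in the first row of its column.
differs = df2.ne(df2.iloc[0])

# Mismatches per column, and the grand total.
per_column = differs.sum()
out = int(per_column.sum())
print(out)
```

Keeping the intermediate per_column Series around shows where the mismatches are, not only how many there are.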

huangapple
  • Posted on March 20, 2023, 22:49:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75791793.html