Checking if elements in my dataframe columns have the same type
Question
I'm using Python and working with a DataFrame df. To check whether every row has the same type within each column, I wrote the following lines:
a = 0
first_object = df.loc[df.index[0]]
for column in df:
    for i in range(0, len(df)):
        if type(df[column][i]) != type(first_object[column]):
            a += 1
print(a)
The error I got is:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 155
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/var/folders/xb/74q_24bx0rxgqc6gtd6ksn7c0000gn/T/ipykernel_25626/3160699232.py in <module>
3 for column in df:
4 for i in range(0,len(df)):
----> 5 if type(df[column][i]) != type(first_object[column]):
6 a+=1
~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/series.py in __getitem__(self, key)
940
941 elif key_is_scalar:
--> 942 return self._get_value(key)
943
944 if is_hashable(key):
~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
1049
1050 # Similar to Index.get_value, but we do not fall back to positional
-> 1051 loc = self.index.get_loc(label)
1052 return self.index._get_values_for_loc(self, loc, label)
1053
~/opt/anaconda3/envs/adsml/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 155
I am confused, because type(df[column][i]) and type(first_object[column]) both work on their own. I tried them with matching and non-matching types, and the comparison returned True and False as expected. So I don't understand why my code is not working.
Answer 1
Score: 1
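If I understand correctly, you want to count the columns whose values all share a single type.
You can use:
df.applymap(type).nunique().eq(1).sum()
Fixing your code: df[column][i] looks up i as an index label, not as a position, so it raises KeyError: 155 when the index has no label 155 (for example, after rows were dropped). Iterate over df.index and use df.loc instead.
(I wouldn't use a loop in real life!)
a = 0
first_object = df.iloc[0]
for column in df:
    for i in df.index:
        if type(df.loc[i, column]) != type(first_object[column]):
            a += 1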
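To make the label-versus-position point concrete, here is a minimal sketch on a small made-up DataFrame (the data and the gap in the index are assumptions for illustration, not the asker's actual data). It reproduces the KeyError and then counts mismatches by position with .iloc, which is an alternative to iterating over df.index:

```python
import pandas as pd

# Hypothetical data: the index skips label 1, as it would after dropping a row.
df = pd.DataFrame(
    {"x": [1, "a", 3.0], "y": ["p", "q", "r"]},
    index=[0, 2, 3],
)

# df["x"][1] treats 1 as an index label; no such label exists, so it raises.
try:
    df["x"][1]
    raised = False
except KeyError:
    raised = True

# Positional access with .iloc does not depend on the index labels.
first_row = df.iloc[0]
mismatches = 0
for column in df:
    for pos in range(len(df)):
        if type(df[column].iloc[pos]) != type(first_row[column]):
            mismatches += 1

print(raised, mismatches)
```

Both columns are deliberately object dtype here, so the values stay plain Python objects; with numeric columns the scalar types pandas returns may differ depending on how the value is accessed.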
The vectorized equivalent (counting the values whose type differs from that of the first row) would be:
df2 = df.applymap(type)
out = df2.ne(df2.iloc[0]).sum().sum()
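As a rough, self-contained check of both vectorized expressions, the sketch below runs them on a small made-up DataFrame (the data is an assumption for illustration). It builds the table of types with a column-wise Series.map instead of applymap, since applymap is, as far as I know, deprecated in recent pandas releases in favour of DataFrame.map; the result should be the same table of types.

```python
import pandas as pd

# Hypothetical data: column "x" mixes int, str and float; column "y" is all str.
df = pd.DataFrame(
    {"x": [1, "a", 3.0], "y": ["p", "q", "r"]},
    index=[0, 2, 3],
)

# Replace every value by its Python type, one column at a time.
types = df.apply(lambda col: col.map(type))

# Number of columns that contain exactly one distinct type.
uniform_columns = int(types.nunique().eq(1).sum())

# Number of cells whose type differs from the type in the first row.
mismatches = int(types.ne(types.iloc[0]).sum().sum())

print(uniform_columns, mismatches)
```

Note that the two numbers answer different questions: the first counts columns, the second counts individual cells.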