Comparing two lists in pandas gives TypeError: 'float' object is not iterable
Question
I want to compare two columns in a DataFrame that contain lists of strings. Part of my input data looks like this:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P1-25 P1-93 P1-88 P1-6 P1-89 P1-26 P1-12 P1-92 P1-22 P1-90 P1-28 P1-95 one_one zero_zero one_one_back zero_zero_next
0 NC_064017.1 153210 . T C . . . GT 0/0 0/0 1/1 0/0 1/1 1/1 1/1 0/0 0/0 0/1 0/0 0/0 [P1-88, P1-89, P1-26, P1-12] [P1-25, P1-93, P1-6, P1-92, P1-22, P1-28, P1-95] NaN [P1-25, P1-12, P1-22]
1 NC_064017.1 965007 . A G . . . GT 0/0 1/1 . 0/1 1/1 . 0/0 1/1 0/0 0/1 . 0/1 [P1-93, P1-89, P1-92] [P1-25, P1-12, P1-22] [P1-88, P1-89, P1-26, P1-12] [P1-25, P1-88, P1-12, P1-22, P1-28]
2 NC_064017.1 965038 . C T . . . GT 0/0 1/1 0/0 0/1 1/1 . 0/0 1/1 0/0 0/1 0/0 0/1 [P1-93, P1-89, P1-92] [P1-25, P1-88, P1-12, P1-22, P1-28] [P1-93, P1-89, P1-92] [P1-93, P1-26, P1-92]
3 NC_064017.1 1084455 . A G . . . GT 1/1 0/0 . 1/1 1/1 0/0 1/1 0/0 0/1 1/1 0/1 1/1 [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-93, P1-26, P1-92] [P1-93, P1-89, P1-92] [P1-25, P1-6, P1-28]
4 NC_064017.1 1117756 . A C . . . GT 0/0 0/1 1/1 0/0 . 1/1 1/1 1/1 1/1 1/1 0/0 1/1 [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95] [P1-25, P1-6, P1-28] [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-22, P1-90, P1-28]
5 NC_064017.1 1250643 . T C . . . GT 0/1 0/1 0/1 1/1 0/1 1/1 0/1 0/1 0/0 0/0 0/0 1/1 [P1-6, P1-26, P1-95] [P1-22, P1-90, P1-28] [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95] [P1-22, P1-90, P1-28]
6 NC_064017.1 1250740 . T A . . . GT 0/1 1/1 0/1 1/1 0/1 1/1 0/1 0/1 0/0 0/0 0/0 0/1 [P1-93, P1-6, P1-26] [P1-22, P1-90, P1-28] [P1-6, P1-26, P1-95] [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95]
7 NC_064017.1 1372722 . A C . . . GT 1/1 0/0 1/1 0/0 0/0 1/1 0/0 1/1 1/1 0/0 1/1 0/0 [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28] [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-93, P1-6, P1-26] [P1-93, P1-26, P1-28]
8 NC_064017.1 1502890 . G T . . . GT . 0/0 1/1 0/1 0/1 0/0 1/1 1/1 1/1 0/1 0/0 1/1 [P1-88, P1-12, P1-92, P1-22, P1-95] [P1-93, P1-26, P1-28] [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28] [P1-89, P1-26, P1-95]
I've tried different methods using iteration but all raise the TypeError. Here is the first part of my code:
df['one_one'] = df.apply(lambda row: set(row['one_one']), axis=1)
df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
df['check'] = df.apply(lambda row: row['one_one'] in row['one_one_back'], axis=1)
and the error:
Traceback (most recent call last):
  File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 47, in <module>
    df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/frame.py", line 8740, in apply
    return op.apply()
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 47, in <lambda>
    df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
TypeError: 'float' object is not iterable
The second part of the code, which uses itertuples, doesn't work either:
for x in df.itertuples():
    if x.one_one in x.one_one_back:
        df['start_end'] = 'start'
    else:
        df['start_end'] = 0
and raises the error:
Traceback (most recent call last):
  File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 55, in <module>
    if x.one_one in x.one_one_back:
TypeError: argument of type 'float' is not iterable
I've checked the data types:
print(df.loc[0, 'one_one'], type(df.loc[0, 'one_one']))
print(df.loc[1, 'one_one_back'], type(df.loc[1, 'one_one_back']))
for x in df.loc[1, 'one_one_back']:
    print(x, type(x))
and the values I checked are definitely lists of strings:
['P1-88', 'P1-89', 'P1-26', 'P1-12'] <class 'list'>
['P1-88', 'P1-89', 'P1-26', 'P1-12'] <class 'list'>
P1-88 <class 'str'>
P1-89 <class 'str'>
P1-26 <class 'str'>
P1-12 <class 'str'>
So where is the float, and why do I see the TypeError? Please help, because I'm totally confused...
Answer 1
Score: 1
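The float is the NaN in row 0 of one_one_back: NaN is a float, so both set(...) and the in test fail on that row (your type check looked at row 1, which does hold a list). You can use a list comprehension to loop over the pairs, and boolean indexing to consider only the rows with non-NaN values in both columns: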
# which rows have non-NA values in both columns?
m = df[['one_one', 'one_one_back']].notna().all(axis=1)
# for those, check if the first list is a subset of the second one
df.loc[m, 'check'] = [set(a) <= set(b) for a, b in
                      zip(df.loc[m, 'one_one'], df.loc[m, 'one_one_back'])]
Output:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P1-25 ... P1-92 P1-22 P1-90 P1-28 P1-95 one_one \
0 NC_064017.1 153210 . T C . . . GT 0/0 ... 0/0 0/0 0/1 0/0 0/0 [P1-88, P1-89, P1-26, P1-12]
1 NC_064017.1 965007 . A G . . . GT 0/0 ... 1/1 0/0 0/1 . 0/1 [P1-93, P1-89, P1-92]
2 NC_064017.1 965038 . C T . . . GT 0/0 ... 1/1 0/0 0/1 0/0 0/1 [P1-93, P1-89, P1-92]
3 NC_064017.1 1084455 . A G . . . GT 1/1 ... 0/0 0/1 1/1 0/1 1/1 [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95]
4 NC_064017.1 1117756 . A C . . . GT 0/0 ... 1/1 1/1 1/1 0/0 1/1 [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95]
5 NC_064017.1 1250643 . T C . . . GT 0/1 ... 0/1 0/0 0/0 0/0 1/1 [P1-6, P1-26, P1-95]
6 NC_064017.1 1250740 . T A . . . GT 0/1 ... 0/1 0/0 0/0 0/0 0/1 [P1-93, P1-6, P1-26]
7 NC_064017.1 1372722 . A C . . . GT 1/1 ... 1/1 1/1 0/0 1/1 0/0 [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28]
8 NC_064017.1 1502890 . G T . . . GT . ... 1/1 1/1 0/1 0/0 1/1 [P1-88, P1-12, P1-92, P1-22, P1-95]
zero_zero one_one_back zero_zero_next check
0 [P1-25, P1-93, P1-6, P1-92, P1-22, P1-28, P1-95] NaN [P1-25, P1-12, P1-22] NaN
1 [P1-25, P1-12, P1-22] [P1-88, P1-89, P1-26, P1-12] [P1-25, P1-88, P1-12, P1-22, P1-28] False
2 [P1-25, P1-88, P1-12, P1-22, P1-28] [P1-93, P1-89, P1-92] [P1-93, P1-26, P1-92] True
3 [P1-93, P1-26, P1-92] [P1-93, P1-89, P1-92] [P1-25, P1-6, P1-28] False
4 [P1-25, P1-6, P1-28] [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-22, P1-90, P1-28] False
5 [P1-22, P1-90, P1-28] [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95] [P1-22, P1-90, P1-28] False
6 [P1-22, P1-90, P1-28] [P1-6, P1-26, P1-95] [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95] False
7 [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-93, P1-6, P1-26] [P1-93, P1-26, P1-28] False
8 [P1-93, P1-26, P1-28] [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28] [P1-89, P1-26, P1-95] False
[9 rows x 26 columns]
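To see where the float comes from, here is a minimal sketch on a hypothetical three-row miniature of the two list columns (the sample values are borrowed from the question, but the small frame and the helper name is_subset are assumptions made for illustration). It locates the rows that do not hold lists and then runs the same subset test with an explicit type guard instead of a mask:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: row 0 has NaN in one_one_back.
df = pd.DataFrame({
    "one_one": [
        ["P1-88", "P1-89", "P1-26", "P1-12"],
        ["P1-93", "P1-89", "P1-92"],
        ["P1-93", "P1-89", "P1-92"],
    ],
    "one_one_back": [
        np.nan,
        ["P1-88", "P1-89", "P1-26", "P1-12"],
        ["P1-93", "P1-89", "P1-92"],
    ],
})

# NaN is a float, which is why set(NaN) raises
# "TypeError: 'float' object is not iterable".
print(type(df.loc[0, "one_one_back"]))

# Find the rows where at least one of the two cells is not a list.
is_list = (df["one_one"].map(lambda v: isinstance(v, list))
           & df["one_one_back"].map(lambda v: isinstance(v, list)))
bad_rows = df.index[~is_list].tolist()
print(bad_rows)


def is_subset(a, b):
    """Subset test for two lists; NaN when either value is not a list."""
    if isinstance(a, list) and isinstance(b, list):
        return set(a) <= set(b)
    return np.nan


# Same comparison as above, guarding each pair instead of masking the frame.
df["check"] = [is_subset(a, b) for a, b in zip(df["one_one"], df["one_one_back"])]
print(df["check"].tolist())
```

The guard gives the same outcome as the boolean mask: rows with a missing list get NaN in check rather than raising. Which variant reads better is largely a matter of taste; the mask keeps the comparison itself free of type checks.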