Comparing two lists in pandas gives TypeError: 'float' object is not iterable

Question

I want to compare two columns of a DataFrame that contain lists of strings. Part of my input data looks like this:

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P1-25 P1-93 P1-88 P1-6 P1-89 P1-26 P1-12 P1-92 P1-22 P1-90 P1-28 P1-95 one_one zero_zero one_one_back zero_zero_next
    0 NC_064017.1 153210 . T C . . . GT 0/0 0/0 1/1 0/0 1/1 1/1 1/1 0/0 0/0 0/1 0/0 0/0 [P1-88, P1-89, P1-26, P1-12] [P1-25, P1-93, P1-6, P1-92, P1-22, P1-28, P1-95] NaN [P1-25, P1-12, P1-22]
    1 NC_064017.1 965007 . A G . . . GT 0/0 1/1 . 0/1 1/1 . 0/0 1/1 0/0 0/1 . 0/1 [P1-93, P1-89, P1-92] [P1-25, P1-12, P1-22] [P1-88, P1-89, P1-26, P1-12] [P1-25, P1-88, P1-12, P1-22, P1-28]
    2 NC_064017.1 965038 . C T . . . GT 0/0 1/1 0/0 0/1 1/1 . 0/0 1/1 0/0 0/1 0/0 0/1 [P1-93, P1-89, P1-92] [P1-25, P1-88, P1-12, P1-22, P1-28] [P1-93, P1-89, P1-92] [P1-93, P1-26, P1-92]
    3 NC_064017.1 1084455 . A G . . . GT 1/1 0/0 . 1/1 1/1 0/0 1/1 0/0 0/1 1/1 0/1 1/1 [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-93, P1-26, P1-92] [P1-93, P1-89, P1-92] [P1-25, P1-6, P1-28]
    4 NC_064017.1 1117756 . A C . . . GT 0/0 0/1 1/1 0/0 . 1/1 1/1 1/1 1/1 1/1 0/0 1/1 [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95] [P1-25, P1-6, P1-28] [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-22, P1-90, P1-28]
    5 NC_064017.1 1250643 . T C . . . GT 0/1 0/1 0/1 1/1 0/1 1/1 0/1 0/1 0/0 0/0 0/0 1/1 [P1-6, P1-26, P1-95] [P1-22, P1-90, P1-28] [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95] [P1-22, P1-90, P1-28]
    6 NC_064017.1 1250740 . T A . . . GT 0/1 1/1 0/1 1/1 0/1 1/1 0/1 0/1 0/0 0/0 0/0 0/1 [P1-93, P1-6, P1-26] [P1-22, P1-90, P1-28] [P1-6, P1-26, P1-95] [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95]
    7 NC_064017.1 1372722 . A C . . . GT 1/1 0/0 1/1 0/0 0/0 1/1 0/0 1/1 1/1 0/0 1/1 0/0 [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28] [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-93, P1-6, P1-26] [P1-93, P1-26, P1-28]
    8 NC_064017.1 1502890 . G T . . . GT . 0/0 1/1 0/1 0/1 0/0 1/1 1/1 1/1 0/1 0/0 1/1 [P1-88, P1-12, P1-92, P1-22, P1-95] [P1-93, P1-26, P1-28] [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28] [P1-89, P1-26, P1-95]

I've tried different methods using iteration, but they all raise a TypeError. Here is the first part of my code:

    df['one_one'] = df.apply(lambda row: set(row['one_one']), axis=1)
    df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
    df['check'] = df.apply(lambda row: row['one_one'] in row['one_one_back'], axis=1)

and the error:

    Traceback (most recent call last):
      File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 47, in <module>
        df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
      File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/frame.py", line 8740, in apply
        return op.apply()
      File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
        return self.apply_standard()
      File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 812, in apply_standard
        results, res_index = self.apply_series_generator()
      File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
        results[i] = self.f(v)
      File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 47, in <lambda>
        df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
    TypeError: 'float' object is not iterable

The second part of the code, which uses itertuples(), doesn't work either:

    for x in df.itertuples():
        if x.one_one in x.one_one_back:
            df['start_end'] = 'start'
        else:
            df['start_end'] = 0

and raises the error:

    Traceback (most recent call last):
      File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 55, in <module>
        if x.one_one in x.one_one_back:
    TypeError: argument of type 'float' is not iterable

I've checked the data types:

    print(df.loc[0, 'one_one'], type(df.loc[0, 'one_one']))
    print(df.loc[1, 'one_one_back'], type(df.loc[1, 'one_one_back']))
    for x in df.loc[1, 'one_one_back']:
        print(x, type(x))

and the values are definitely lists of strings:

    ['P1-88', 'P1-89', 'P1-26', 'P1-12'] <class 'list'>
    ['P1-88', 'P1-89', 'P1-26', 'P1-12'] <class 'list'>
    P1-88 <class 'str'>
    P1-89 <class 'str'>
    P1-26 <class 'str'>
    P1-12 <class 'str'>

So where is the float, and why do I get the TypeError? Please help, because I'm totally confused...
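
One way to locate the float is to look at every cell of the column rather than at individual rows: the sample data shows NaN in row 0 of one_one_back, and NaN is a float. A minimal sketch, assuming a made-up three-row frame in place of the real file (the toy data and the names kinds and bad_rows are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: row 0 holds NaN instead of a list.
df = pd.DataFrame({
    "one_one": [["P1-88", "P1-89"], ["P1-93", "P1-89"], ["P1-93", "P1-92"]],
    "one_one_back": [np.nan, ["P1-88", "P1-89"], ["P1-93", "P1-89", "P1-92"]],
})

# Name of the Python type stored in every cell of the column.
kinds = df["one_one_back"].map(lambda v: type(v).__name__)
print(kinds.value_counts())

# Index labels of the rows whose cell is not a list.
bad_rows = df.index[kinds != "list"].tolist()
print(bad_rows)
```

On the toy frame this reports a single float cell, at index 0. Calling set(...) or using in on such a cell is what raises the TypeError, even though the rows checked by hand hold lists.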

Answer 1

Score: 1

You can use a list comprehension to loop over your pairs, and boolean indexing to only consider the rows with non-NaN values:

    # which rows have non-NA values in both columns?
    m = df[['one_one', 'one_one_back']].notna().all(axis=1)
    # for those, check if the first list is a subset of the second one
    df.loc[m, 'check'] = [set(a) <= set(b) for a, b in
                          zip(df.loc[m, 'one_one'], df.loc[m, 'one_one_back'])]

Output:

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P1-25 ... P1-92 P1-22 P1-90 P1-28 P1-95 one_one \
    0 NC_064017.1 153210 . T C . . . GT 0/0 ... 0/0 0/0 0/1 0/0 0/0 [P1-88, P1-89, P1-26, P1-12]
    1 NC_064017.1 965007 . A G . . . GT 0/0 ... 1/1 0/0 0/1 . 0/1 [P1-93, P1-89, P1-92]
    2 NC_064017.1 965038 . C T . . . GT 0/0 ... 1/1 0/0 0/1 0/0 0/1 [P1-93, P1-89, P1-92]
    3 NC_064017.1 1084455 . A G . . . GT 1/1 ... 0/0 0/1 1/1 0/1 1/1 [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95]
    4 NC_064017.1 1117756 . A C . . . GT 0/0 ... 1/1 1/1 1/1 0/0 1/1 [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95]
    5 NC_064017.1 1250643 . T C . . . GT 0/1 ... 0/1 0/0 0/0 0/0 1/1 [P1-6, P1-26, P1-95]
    6 NC_064017.1 1250740 . T A . . . GT 0/1 ... 0/1 0/0 0/0 0/0 0/1 [P1-93, P1-6, P1-26]
    7 NC_064017.1 1372722 . A C . . . GT 1/1 ... 1/1 1/1 0/0 1/1 0/0 [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28]
    8 NC_064017.1 1502890 . G T . . . GT . ... 1/1 1/1 0/1 0/0 1/1 [P1-88, P1-12, P1-92, P1-22, P1-95]

    zero_zero one_one_back zero_zero_next check
    0 [P1-25, P1-93, P1-6, P1-92, P1-22, P1-28, P1-95] NaN [P1-25, P1-12, P1-22] NaN
    1 [P1-25, P1-12, P1-22] [P1-88, P1-89, P1-26, P1-12] [P1-25, P1-88, P1-12, P1-22, P1-28] False
    2 [P1-25, P1-88, P1-12, P1-22, P1-28] [P1-93, P1-89, P1-92] [P1-93, P1-26, P1-92] True
    3 [P1-93, P1-26, P1-92] [P1-93, P1-89, P1-92] [P1-25, P1-6, P1-28] False
    4 [P1-25, P1-6, P1-28] [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-22, P1-90, P1-28] False
    5 [P1-22, P1-90, P1-28] [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95] [P1-22, P1-90, P1-28] False
    6 [P1-22, P1-90, P1-28] [P1-6, P1-26, P1-95] [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95] False
    7 [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95] [P1-93, P1-6, P1-26] [P1-93, P1-26, P1-28] False
    8 [P1-93, P1-26, P1-28] [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28] [P1-89, P1-26, P1-95] False

    [9 rows x 26 columns]
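
Two further details of the question's code are worth noting. The in operator tests whether the left operand is an element of the right one, not whether it is a subset, which is why a set comparison with <= is used instead. And df['start_end'] = 'start' inside the itertuples loop assigns the whole column on every iteration, so every row ends up with the value chosen for the last row. Below is a hedged sketch of a per-row version that produces 'start' or 0 as in the question. It assumes a subset test is what is wanted and that rows with a missing list should get 0; the helper name start_flag and the toy data are illustrative, not taken from the original script.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: row 0 holds NaN instead of a list.
df = pd.DataFrame({
    "one_one": [["P1-88", "P1-89"], ["P1-93", "P1-89"], ["P1-93", "P1-92"]],
    "one_one_back": [np.nan, ["P1-88", "P1-89"], ["P1-93", "P1-89", "P1-92"]],
})


def start_flag(a, b):
    """Return 'start' if list a is a subset of list b, otherwise 0."""
    # NaN (a float) marks a missing list; assumption: treat it as no match.
    if not isinstance(a, list) or not isinstance(b, list):
        return 0
    return "start" if set(a) <= set(b) else 0


# Build the column one value per row instead of overwriting it in a loop.
df["start_end"] = [
    start_flag(a, b) for a, b in zip(df["one_one"], df["one_one_back"])
]
print(df["start_end"].tolist())
```

Returning np.nan instead of 0 for the missing rows would keep them distinguishable from genuine non-matches, as the check column above does.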

huangapple
  • Published on June 1, 2023 at 17:32:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380495.html