Comparing two lists in pandas gives TypeError: 'float' object is not iterable

Question

I want to compare two columns of a DataFrame that contain lists of strings. Part of my input data looks like this:

          CHROM      POS ID REF ALT QUAL FILTER INFO FORMAT P1-25 P1-93 P1-88 P1-6 P1-89 P1-26 P1-12 P1-92 P1-22 P1-90 P1-28 P1-95                                                  one_one                                         zero_zero                                       one_one_back                             zero_zero_next
0   NC_064017.1   153210  .   T   C    .      .    .     GT   0/0   0/0   1/1  0/0   1/1   1/1   1/1   0/0   0/0   0/1   0/0   0/0                             [P1-88, P1-89, P1-26, P1-12]  [P1-25, P1-93, P1-6, P1-92, P1-22, P1-28, P1-95]                                                NaN                      [P1-25, P1-12, P1-22]
1   NC_064017.1   965007  .   A   G    .      .    .     GT   0/0   1/1     .  0/1   1/1     .   0/0   1/1   0/0   0/1     .   0/1                                    [P1-93, P1-89, P1-92]                             [P1-25, P1-12, P1-22]                       [P1-88, P1-89, P1-26, P1-12]        [P1-25, P1-88, P1-12, P1-22, P1-28]
2   NC_064017.1   965038  .   C   T    .      .    .     GT   0/0   1/1   0/0  0/1   1/1     .   0/0   1/1   0/0   0/1   0/0   0/1                                    [P1-93, P1-89, P1-92]               [P1-25, P1-88, P1-12, P1-22, P1-28]                              [P1-93, P1-89, P1-92]                      [P1-93, P1-26, P1-92]
3   NC_064017.1  1084455  .   A   G    .      .    .     GT   1/1   0/0     .  1/1   1/1   0/0   1/1   0/0   0/1   1/1   0/1   1/1                [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95]                             [P1-93, P1-26, P1-92]                              [P1-93, P1-89, P1-92]                       [P1-25, P1-6, P1-28]
4   NC_064017.1  1117756  .   A   C    .      .    .     GT   0/0   0/1   1/1  0/0     .   1/1   1/1   1/1   1/1   1/1   0/0   1/1        [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95]                              [P1-25, P1-6, P1-28]          [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95]                      [P1-22, P1-90, P1-28]
5   NC_064017.1  1250643  .   T   C    .      .    .     GT   0/1   0/1   0/1  1/1   0/1   1/1   0/1   0/1   0/0   0/0   0/0   1/1                                     [P1-6, P1-26, P1-95]                             [P1-22, P1-90, P1-28]  [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95]                      [P1-22, P1-90, P1-28]
6   NC_064017.1  1250740  .   T   A    .      .    .     GT   0/1   1/1   0/1  1/1   0/1   1/1   0/1   0/1   0/0   0/0   0/0   0/1                                     [P1-93, P1-6, P1-26]                             [P1-22, P1-90, P1-28]                               [P1-6, P1-26, P1-95]  [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95]
7   NC_064017.1  1372722  .   A   C    .      .    .     GT   1/1   0/0   1/1  0/0   0/0   1/1   0/0   1/1   1/1   0/0   1/1   0/0               [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28]         [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95]                               [P1-93, P1-6, P1-26]                      [P1-93, P1-26, P1-28]
8   NC_064017.1  1502890  .   G   T    .      .    .     GT     .   0/0   1/1  0/1   0/1   0/0   1/1   1/1   1/1   0/1   0/0   1/1                      [P1-88, P1-12, P1-92, P1-22, P1-95]                             [P1-93, P1-26, P1-28]         [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28]                      [P1-89, P1-26, P1-95]

I've tried different methods using iteration but all raise the TypeError. Here is the first part of my code:

df['one_one'] = df.apply(lambda row: set(row['one_one']), axis=1)
df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
df['check'] = df.apply(lambda row: row['one_one'] in row['one_one_back'], axis=1)

and the error:

Traceback (most recent call last):
  File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 47, in <module>
    df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/frame.py", line 8740, in apply
    return op.apply()
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/emoranska/miniconda3/envs/burak/lib/python3.7/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 47, in <lambda>
    df['one_one_back'] = df.apply(lambda row: set(row['one_one_back']), axis=1)
TypeError: 'float' object is not iterable

The second part of the code, which uses itertuples, doesn't work either:

for x in df.itertuples():
    if x.one_one in x.one_one_back:
        df['start_end'] = 'start'
    else:
        df['start_end'] = 0

and raises the error:

Traceback (most recent call last):
  File "/home/emoranska/Pulpit/burak/homozyg_regions/homozyg_bins.py", line 55, in <module>
    if x.one_one in x.one_one_back:
TypeError: argument of type 'float' is not iterable

I've checked the data types:

print(df.loc[0, 'one_one'], type(df.loc[0, 'one_one']))
print(df.loc[1, 'one_one_back'], type(df.loc[1, 'one_one_back']))

for x in df.loc[1, 'one_one_back']:
    print(x, type(x))

and the values I checked are definitely lists of strings:

['P1-88', 'P1-89', 'P1-26', 'P1-12'] <class 'list'>
['P1-88', 'P1-89', 'P1-26', 'P1-12'] <class 'list'>
P1-88 <class 'str'>
P1-89 <class 'str'>
P1-26 <class 'str'>
P1-12 <class 'str'>

So where is the float, and why do I get the TypeError? Please help, because I'm totally confused...
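
One way to locate the float is to count the Python types stored in each column. The checks above only look at rows 0 and 1 of specific columns, while row 0 of one_one_back in the sample data is NaN, and NaN is itself a float. A minimal sketch, assuming a hypothetical three-row frame that only mimics the two list columns (the values are shortened for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame above; row 0 of one_one_back is missing.
df = pd.DataFrame({
    "one_one": [["P1-88", "P1-89"], ["P1-93", "P1-89", "P1-92"], ["P1-93"]],
    "one_one_back": [np.nan, ["P1-88", "P1-89"], ["P1-93", "P1-89", "P1-92"]],
})

# Count the Python type of every cell, column by column.
type_counts = {col: df[col].apply(type).value_counts().to_dict()
               for col in df.columns}
print(type_counts)

# The missing value is a plain float, which is why set(...) cannot iterate it.
missing_type = type(df.loc[0, "one_one_back"])
print(missing_type)
```

Any column that reports a float next to list contains at least one missing cell, and those are the rows where set(...) or the in operator fails.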

Answer 1

Score: 1

You can use a list comprehension to loop over your pairs, and boolean indexing to only consider the rows with non-NaN values:

# which rows have non-NA values in both columns?
m = df[['one_one', 'one_one_back']].notna().all(axis=1)

# for those, check if the first list is a subset of the second one
df.loc[m, 'check'] = [set(a) <= set(b) for a, b in
                      zip(df.loc[m, 'one_one'], df.loc[m, 'one_one_back'])]

Output:

         CHROM      POS ID REF ALT QUAL FILTER INFO FORMAT P1-25  ... P1-92 P1-22 P1-90 P1-28 P1-95                                            one_one  \
0  NC_064017.1   153210  .   T   C    .      .    .     GT   0/0  ...   0/0   0/0   0/1   0/0   0/0                       [P1-88, P1-89, P1-26, P1-12]   
1  NC_064017.1   965007  .   A   G    .      .    .     GT   0/0  ...   1/1   0/0   0/1     .   0/1                              [P1-93, P1-89, P1-92]   
2  NC_064017.1   965038  .   C   T    .      .    .     GT   0/0  ...   1/1   0/0   0/1   0/0   0/1                              [P1-93, P1-89, P1-92]   
3  NC_064017.1  1084455  .   A   G    .      .    .     GT   1/1  ...   0/0   0/1   1/1   0/1   1/1          [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95]   
4  NC_064017.1  1117756  .   A   C    .      .    .     GT   0/0  ...   1/1   1/1   1/1   0/0   1/1  [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95]   
5  NC_064017.1  1250643  .   T   C    .      .    .     GT   0/1  ...   0/1   0/0   0/0   0/0   1/1                               [P1-6, P1-26, P1-95]   
6  NC_064017.1  1250740  .   T   A    .      .    .     GT   0/1  ...   0/1   0/0   0/0   0/0   0/1                               [P1-93, P1-6, P1-26]   
7  NC_064017.1  1372722  .   A   C    .      .    .     GT   1/1  ...   1/1   1/1   0/0   1/1   0/0         [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28]   
8  NC_064017.1  1502890  .   G   T    .      .    .     GT     .  ...   1/1   1/1   0/1   0/0   1/1                [P1-88, P1-12, P1-92, P1-22, P1-95]   

                                          zero_zero                                       one_one_back                             zero_zero_next  check
0  [P1-25, P1-93, P1-6, P1-92, P1-22, P1-28, P1-95]                                                NaN                      [P1-25, P1-12, P1-22]    NaN  
1                             [P1-25, P1-12, P1-22]                       [P1-88, P1-89, P1-26, P1-12]        [P1-25, P1-88, P1-12, P1-22, P1-28]  False  
2               [P1-25, P1-88, P1-12, P1-22, P1-28]                              [P1-93, P1-89, P1-92]                      [P1-93, P1-26, P1-92]   True  
3                             [P1-93, P1-26, P1-92]                              [P1-93, P1-89, P1-92]                       [P1-25, P1-6, P1-28]  False  
4                              [P1-25, P1-6, P1-28]          [P1-25, P1-6, P1-89, P1-12, P1-90, P1-95]                      [P1-22, P1-90, P1-28]  False  
5                             [P1-22, P1-90, P1-28]  [P1-88, P1-26, P1-12, P1-92, P1-22, P1-90, P1-95]                      [P1-22, P1-90, P1-28]  False  
6                             [P1-22, P1-90, P1-28]                               [P1-6, P1-26, P1-95]  [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95]  False  
7         [P1-93, P1-6, P1-89, P1-12, P1-90, P1-95]                               [P1-93, P1-6, P1-26]                      [P1-93, P1-26, P1-28]  False  
8                             [P1-93, P1-26, P1-28]         [P1-25, P1-88, P1-26, P1-92, P1-22, P1-28]                      [P1-89, P1-26, P1-95]  False  
[9 rows x 26 columns]
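
An alternative to the boolean mask is to put the missing-value guard inside a small helper and apply it to every pair. This is only a sketch under assumptions: is_subset is a hypothetical helper name, and the three-row frame is made-up data shaped like the columns above. Note that it tests for a subset with <=, whereas the in operator used in the question tests whether one object is an element of the other.

```python
import numpy as np
import pandas as pd

def is_subset(a, b):
    """Hypothetical helper: subset test for two list cells, NaN if either is missing."""
    if isinstance(a, list) and isinstance(b, list):
        return set(a) <= set(b)
    return np.nan

# Made-up data shaped like the one_one / one_one_back columns.
df = pd.DataFrame({
    "one_one": [["P1-88", "P1-89"], ["P1-93", "P1-89", "P1-92"], ["P1-93", "P1-89"]],
    "one_one_back": [np.nan, ["P1-93", "P1-89", "P1-92"], ["P1-88", "P1-89"]],
})

# Pair the two columns row by row; rows with a missing cell yield NaN.
df["check"] = [is_subset(a, b) for a, b in zip(df["one_one"], df["one_one_back"])]
result = df["check"].tolist()
print(result)
```

Rows where either cell is missing come out as NaN instead of raising, which matches the NaN shown in the check column of the output above.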

huangapple
  • Posted on June 1, 2023, 17:32:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380495.html