January 9, 2023, 19:12:06

Concatenating multiple columns that include NaN in a DataFrame

Question

I want to concatenate/join many columns, some of which contain NaN values, into one new column.
How can I skip the NaN values in the joined result?
The code below shows what I tried, using both .agg and .apply.

import pandas as pd
import numpy as np
df = pd.DataFrame({'foo': ['a', np.nan, 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})
subcat_names = ["foo", "new"]

df["result"] = df[subcat_names].astype(str).agg(','.join, axis=1)

df = df.fillna("")

df["result_2"] = df[subcat_names].apply(lambda x: '{},{}'.format(x[0], x[1]), axis=1)

print(df)
    
  foo  bar     new      result result_2
0   a    1   apple     a,apple  a,apple
1        2  banana  nan,banana  ,banana
2   c    3    pear      c,pear   c,pear

In result, the "nan," prefix is unwanted.
In result_2, the leading "," is unwanted.

Thanks.

Answer 1

Score: 1

I think the second option is almost correct; you just need a slightly more involved function in place of the lambda. The following is a sketch:

def process(row):
    filtered = list()

    for item in row:
        # pd.isna handles strings as well as floats; np.isnan raises a TypeError on strings.
        if pd.isna(item):
            continue

        # str() lets non-string values (e.g. integers) be joined too.
        filtered.append(str(item))

    return ",".join(filtered)

# Apply this to the original frame, before df.fillna("") replaces NaN with empty strings.
df["result_2"] = df[subcat_names].apply(process, axis=1)

You could most likely also rely on the pandas notna function to collect the valid values from the current row.
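
As a minimal sketch of that idea, assuming the frame still contains NaN (that is, df.fillna("") has not been run yet), Series.dropna can remove the missing values from each row before joining. The helper name join_valid is hypothetical and chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo": ["a", np.nan, "c"],
    "bar": [1, 2, 3],
    "new": ["apple", "banana", "pear"],
})
subcat_names = ["foo", "new"]

def join_valid(row):
    # dropna() removes NaN/None entries from the row;
    # astype(str) guards against non-string values before joining.
    return ",".join(row.dropna().astype(str))

df["result"] = df[subcat_names].apply(join_valid, axis=1)
print(df["result"].tolist())  # ['a,apple', 'banana', 'c,pear']
```

This is equivalent in effect to filtering with notna; dropna just expresses the same filter as a method call on the row.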

Answer 2

Score: 1

You can try pd.notnull()

subcat_names = ["foo", "new"]
df["result"] = df[subcat_names].apply(lambda x: ",".join(x[pd.notnull(x)]), axis=1)
print(df)

Output:

   foo  bar     new   result
0    a    1   apple  a,apple
1         2  banana   banana
2    c    3    pear   c,pear
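
One caveat worth noting: in the question, df.fillna("") runs before the second attempt, so the missing value has already become an empty string, which pd.notnull treats as valid. A minimal sketch that skips both NaN and empty strings, and that also converts non-string values such as the integers in bar, might look like this. The helper name join_nonempty and its sep parameter are hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo": ["a", np.nan, "c"],
    "bar": [1, 2, 3],
    "new": ["apple", "banana", "pear"],
})
df = df.fillna("")  # as in the question: NaN is now an empty string

def join_nonempty(row, sep=","):
    # Keep values that are neither missing nor empty strings,
    # converting each to str so numeric columns can be joined too.
    parts = [str(v) for v in row if pd.notna(v) and str(v) != ""]
    return sep.join(parts)

two_cols = df[["foo", "new"]].apply(join_nonempty, axis=1)
three_cols = df[["foo", "bar", "new"]].apply(join_nonempty, axis=1)
print(two_cols.tolist())    # ['a,apple', 'banana', 'c,pear']
print(three_cols.tolist())  # ['a,1,apple', '2,banana', 'c,3,pear']
```

Filtering on the string form keeps the helper usable whether or not fillna has already been applied.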
