Concatenating multiple columns that include NaN in a DataFrame
Question
I want to concatenate (join) several columns, some of which contain NaN values, into one new column.
How can I skip the NaN values in the joined result?
The code below shows what I tried, using both .agg and .apply.
import pandas as pd
import numpy as np

df = pd.DataFrame({'foo': ['a', np.nan, 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})
subcat_names = ["foo", "new"]
df["result"] = df[subcat_names].astype(str).agg(','.join, axis=1)
df = df.fillna("")
df["result_2"] = df[subcat_names].apply(lambda x: '{},{}'.format(x[0], x[1]), axis=1)
print(df)
   foo  bar     new      result  result_2
0    a    1   apple     a,apple   a,apple
1         2  banana  nan,banana   ,banana
2    c    3    pear      c,pear    c,pear
In result, the leading "nan," is unwanted.
In result_2, the leading "," is unwanted.
Thanks.
Answer 1
Score: 1
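I think the second option is almost correct; you just need a slightly more involved function in place of the lambda. The sketch below skips missing values before joining. It has to run while the missing values are still NaN, that is, before df.fillna("") is called, because an empty string does not count as missing:
def process(row):
    # Collect only the non-missing values of this row.
    filtered = []
    for item in row:
        # pd.isna works for strings as well; np.isnan raises TypeError on them.
        if pd.isna(item):
            continue
        filtered.append(str(item))
    return ",".join(filtered)

df["result_2"] = df[subcat_names].apply(process, axis=1)
You could also rely on the pandas notna function to select the valid values of the current row.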
Answer 2
Score: 1
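You can filter each row with pd.notnull() before joining, as long as the missing values are still NaN (that is, before any fillna("")):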
subcat_names = ["foo", "new"]
df["result"] = df[subcat_names].apply(lambda x: ",".join(x[pd.notnull(x)]), axis=1)
print(df)
Output:
   foo  bar     new   result
0    a    1   apple  a,apple
1         2  banana   banana
2    c    3    pear   c,pear
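If the frame has already gone through fillna(""), as in the question's second attempt, pd.notnull no longer filters anything, because an empty string is not a null value. The sketch below is one way to handle that case by skipping empty strings as well; the helper name join_non_empty is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo": ["a", np.nan, "c"],
    "bar": [1, 2, 3],
    "new": ["apple", "banana", "pear"],
})
subcat_names = ["foo", "new"]

# Same step as in the question: NaN becomes "" before the join.
df = df.fillna("")


def join_non_empty(row, sep=","):
    """Join values that are neither missing nor empty (hypothetical helper)."""
    return sep.join(str(v) for v in row if pd.notna(v) and v != "")


df["result_2"] = df[subcat_names].apply(join_non_empty, axis=1)
print(df["result_2"].tolist())
```

This removes the stray leading "," from result_2 without changing the order of the remaining values.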