Create column based on row data when column doesn't exist or column is NaN in pandas
Question
I have a dataframe from OSM data. In my area it contains everything except the `colour` column, but in other areas that column may exist. I want to create the column with calculated colors when it is missing, and also replace any NaN values with a color code when the column exists but a row has no color value yet.
TL;DR: How do I create a column if it is missing, and otherwise map only its NaN values?
I already tried just doing:
```python
import random

def setColor(_):
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(), r(), r())

lines.loc[lines['colour'].isnull(), 'colour'] = lines["colour"].map(setColor)
```
However, this fails with a `KeyError` if the `colour` column doesn't exist initially.
I could run `lines["colour"] = np.nan` first, but while that works when the column is missing, it doesn't work when the column already partially exists, because it would overwrite the existing colours. So I wonder if there is a better way.
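A minimal sketch of one way to reconcile both cases, assuming `lines` is an ordinary pandas DataFrame: run the initialisation only when the column is absent, so a partially filled column keeps its values, then fill just the missing rows with the question's `setColor`. The helper name `ensure_colour` and the sample frames are hypothetical. The new column is created with `object` dtype so that string colours can be stored in it without pandas having to upcast an all-NaN float column.

```python
import random

import numpy as np
import pandas as pd


def setColor(_):
    # Random hex colour such as '#1FA09E'; the argument is ignored, as in the question.
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(), r(), r())


def ensure_colour(lines):
    """Hypothetical helper: add 'colour' if absent, then fill only the missing rows."""
    if "colour" not in lines.columns:
        # All-missing column with object dtype, so string colours fit without upcasting.
        lines["colour"] = pd.Series(np.nan, index=lines.index, dtype=object)
    mask = lines["colour"].isna()
    # map() is applied only to the masked rows; .loc aligns the result on the index.
    lines.loc[mask, "colour"] = lines.loc[mask, "colour"].map(setColor)
    return lines


# Hypothetical sample frames: one without the column, one with a gap.
missing = pd.DataFrame({"name": ["a", "b"]})
partial = pd.DataFrame({"name": ["a", "b"], "colour": ["#D30000", np.nan]})
print(ensure_colour(missing))
print(ensure_colour(partial))
```

Because the membership check runs before anything is written, the same call should be safe for frames with no `colour` column, a partial one, or a complete one.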
Answer 1
Score: 0
It's not fully clear what you want, but maybe this is close.
Given `df1` and `df2`:
```python
import pandas as pd
import numpy as np
import random

df1 = pd.DataFrame({'Col_01': ['x', 'y', 'z']})
df2 = pd.DataFrame({'Col_01': ['x', 'y', 'z'], 'colour': ['#D30000', '#C21807', '']})
print("df1:\n", df1)
print("df2:\n", df2)
```
Console output:
```
df1:
   Col_01
0      x
1      y
2      z
df2:
   Col_01   colour
0      x  #D30000
1      y  #C21807
2      z
```
With a slight change to your function (removing the argument) and looping through both dataframes:
```python
def setColor():  # change: remove the "_" here
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(), r(), r())

for df in [df1, df2]:
    if "colour" not in df:
        df["colour"] = df.apply(lambda x: setColor(), axis=1)
    else:
        df["colour"] = np.where(df["colour"] == '', setColor(), df["colour"])

print("df1:\n", df1)
print("df2:\n", df2)
```
Console output:
```
df1:
   Col_01   colour
0      x  #C0ACB3
1      y  #1FA09E
2      z  #4A35FF
df2:
   Col_01   colour
0      x  #D30000
1      y  #C21807
2      z  #D97652
```
It's probably self-explanatory, but the loop first checks whether the `colour` column exists; if not, it adds it and creates a hex code for each row. If the column does exist, it uses `np.where()` to put a hex code into blank rows and keeps the existing hex code where there is one.
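Two details of the `np.where()` line matter for the question as asked: `setColor()` is evaluated once before `np.where()` runs, so every blank row receives the same colour, and `df["colour"] == ''` is `False` for NaN, so NaN cells are left unchanged. A minimal variant, sketched under the assumption that both NaN and empty strings should count as "no colour" (the helper name `fill_colours` and the extra NaN in `df2` are illustrative only), calls `setColor()` once per missing row:

```python
import random

import numpy as np
import pandas as pd


def setColor():
    # Random hex colour string such as '#4A35FF' (the answer's helper).
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(), r(), r())


def fill_colours(df):
    """Hypothetical helper: give every row without a colour its own random colour."""
    if "colour" not in df.columns:
        # Column missing: generate one colour per row.
        df["colour"] = [setColor() for _ in range(len(df))]
    else:
        # Object dtype so strings fit even if the column was all-NaN (float64).
        df["colour"] = df["colour"].astype(object)
        # Treat both NaN and empty strings as "no colour yet".
        missing = df["colour"].isna() | (df["colour"] == "")
        # One setColor() call per missing row, so rows get independent colours.
        df.loc[missing, "colour"] = [setColor() for _ in range(int(missing.sum()))]
    return df


df1 = pd.DataFrame({"Col_01": ["x", "y", "z"]})
df2 = pd.DataFrame({"Col_01": ["x", "y", "z"], "colour": ["#D30000", "", np.nan]})
for df in [df1, df2]:
    fill_colours(df)
print(df1)
print(df2)
```

The list comprehension has exactly as many entries as there are `True` values in the mask, which is what `.loc` expects when a list is assigned to a boolean selection.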