Create column based on row data when column doesn't exist or column is NaN in pandas

Question

I have a dataframe built from OSM data. For my area it contains everything except the colour column, but in other areas that column may exist. I want to create the column with calculated colours if it is missing, and, when the column exists but a row has no colour value yet, replace the NaN values with a colour code.

TL;DR: How do I create a column if needed, and otherwise replace only its NaN values?

I already tried just doing:

    import random

    def setColor(_):
        r = lambda: random.randint(0,255)
        return '#%02X%02X%02X' % (r(),r(),r())

    lines.loc[lines['colour'].isnull(),'colour'] = lines["colour"].map(setColor)

However, this fails if the colour column doesn't exist initially.

I could run lines["colour"] = np.nan first, but while that works when the column is missing, it doesn't work when the column already partially exists, since it overwrites the existing values. So I wonder if there is a better way.

Answer 1

Score: 0

It's not fully clear what you want, but maybe this is close.

Given df1 and df2:

    import pandas as pd
    import numpy as np
    import random

    df1 = pd.DataFrame({'Col_01': ['x', 'y', 'z']})
    df2 = pd.DataFrame({'Col_01': ['x', 'y', 'z'], 'colour': ['#D30000', '#C21807', '']})

    print("df1:\n", df1)
    print("df2:\n", df2)

Console output:

    df1:
       Col_01
    0      x
    1      y
    2      z
    df2:
       Col_01   colour
    0      x  #D30000
    1      y  #C21807
    2      z

With a slight change to your function (removing the argument) and a loop over all dataframes:

    def setColor():  # change: remove the "_" here
        r = lambda: random.randint(0, 255)
        return '#%02X%02X%02X' % (r(), r(), r())

    for df in [df1, df2]:
        if "colour" not in df:
            df["colour"] = df.apply(lambda x: setColor(), axis=1)
        else:
            df["colour"] = np.where(df["colour"] == '', setColor(), df["colour"])

    print("df1:\n", df1)
    print("df2:\n", df2)

Console output:

    df1:
       Col_01   colour
    0      x  #C0ACB3
    1      y  #1FA09E
    2      z  #4A35FF
    df2:
       Col_01   colour
    0      x  #D30000
    1      y  #C21807
    2      z  #D97652

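It's probably self-explanatory, but the loop first checks whether the colour column exists; if not, it adds the column and generates a hex code for each row. If the column does exist, it uses np.where() to put a hex code into blank rows and keeps the existing hex code everywhere else. Note that setColor() is evaluated once before np.where() runs, so all blank rows of one dataframe receive the same colour, and that the comparison with '' matches empty strings only, not NaN.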
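Since the question asks about NaN rather than empty strings, here is a minimal sketch of a variant that treats both as "no colour yet" and draws a separate colour for every filled row. The helper names random_hex_colour and ensure_colour are made up for this example and are not part of pandas; the sample frames mirror df1 and df2 above, with one NaN added to df2.

```python
import random

import numpy as np
import pandas as pd


def random_hex_colour():
    """Return a random colour as an uppercase hex string such as '#1FA09E'."""
    return '#%02X%02X%02X' % tuple(random.randint(0, 255) for _ in range(3))


def ensure_colour(df, column="colour"):
    """Create `column` if it is missing, then fill NaN/empty cells in place.

    Hypothetical helper for illustration only.
    """
    if column not in df.columns:
        # Start with an all-missing object column so strings can be stored later.
        df[column] = pd.Series([None] * len(df), index=df.index, dtype=object)

    # Treat both NaN/None and empty strings as "no colour yet".
    missing = df[column].isna() | (df[column] == '')

    if missing.any():
        # One freshly drawn colour per missing row, so rows do not share a value.
        df.loc[missing, column] = [
            random_hex_colour() for _ in range(int(missing.sum()))
        ]
    return df


# Sample data: df1 has no colour column, df2 has one with gaps.
df1 = pd.DataFrame({'Col_01': ['x', 'y', 'z']})
df2 = pd.DataFrame({'Col_01': ['x', 'y', 'z'],
                    'colour': ['#D30000', np.nan, '']})

for df in (df1, df2):
    ensure_colour(df)

print("df1:\n", df1)
print("df2:\n", df2)
```

Existing values such as '#D30000' are left untouched because only the rows selected by the boolean mask are assigned. The generated colours are random, so the printed values differ from run to run.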

huangapple
  • Posted on January 8, 2023, 22:40:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75048637.html