January 8, 2023, 22:40:00

Create column based on row data when column doesn't exist or column is NaN in pandas

Question

I have a dataframe built from OSM data. In my area it contains everything except the colour column, but in other areas that column may exist. I want to create the column with calculated colours if it is missing, and, when the column exists, replace any NaN values with a colour code for rows that have no colour value yet.

TL;DR: How do I create a column if needed, and otherwise fill only its NaN values?

I already tried:

import random
def setColor(_):
    r = lambda: random.randint(0,255)
    return '#%02X%02X%02X' % (r(),r(),r())
lines.loc[lines['colour'].isnull(),'colour'] = lines["colour"].map(setColor)

However, this fails if colour doesn't exist initially.

I could run lines["colour"] = np.nan first, but while that works when the column is missing, it doesn't work when the column already partially exists, because it would overwrite the existing values. So I wonder if there is a better way.

Answer 1

Score: 0

It's not fully clear what you want, but maybe this is close.

Given df1 and df2:

import pandas as pd
import numpy as np
import random
df1 = pd.DataFrame({'Col_01': ['x', 'y', 'z']})
df2 = pd.DataFrame({'Col_01': ['x', 'y', 'z'], 'colour': ['#D30000', '#C21807', '']})
print("df1:\n", df1)
print("df2:\n", df2)

Console output:

df1:
   Col_01
0      x
1      y
2      z
df2:
   Col_01   colour
0      x  #D30000
1      y  #C21807
2      z

With a slight change to your function (removing the argument) and looping through all dataframes:

def setColor():  # change: remove the "_" here
    r = lambda: random.randint(0, 255)
    return '#%02X%02X%02X' % (r(), r(), r())
for df in [df1, df2]:
    if "colour" not in df:
        # Column is missing: create it with one random hex code per row
        df["colour"] = df.apply(lambda x: setColor(), axis=1)
    else:
        # Column exists: fill blank or NaN rows, drawing a fresh hex code per row
        blank = df["colour"].isna() | (df["colour"] == '')
        df["colour"] = np.where(blank, [setColor() for _ in range(len(df))], df["colour"])
print("df1:\n", df1)
print("df2:\n", df2)

Console output (the generated hex codes are random, so yours will differ):

df1:
   Col_01   colour
0      x  #C0ACB3
1      y  #1FA09E
2      z  #4A35FF
df2:
   Col_01   colour
0      x  #D30000
1      y  #C21807
2      z  #D97652

The loop first checks whether the colour column exists. If it does not, it adds the column and generates a hex code for each row. If the column already exists, it uses np.where() to generate a hex code for blank rows and keeps the existing hex code everywhere else.
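
Since the question asks about NaN values rather than empty strings, the same idea can also be sketched as a single helper that covers both the missing-column case and the partially filled case. This is only a sketch under a few assumptions: the names random_hex_colour and fill_missing_colours are made up for illustration, both NaN/None and '' are treated as "no colour yet", and the helper returns a copy instead of modifying the dataframe in place.

```python
import random

import numpy as np
import pandas as pd


def random_hex_colour() -> str:
    """Return a random colour as an uppercase '#RRGGBB' string."""
    return "#%02X%02X%02X" % tuple(random.randint(0, 255) for _ in range(3))


def fill_missing_colours(df: pd.DataFrame, column: str = "colour") -> pd.DataFrame:
    """Return a copy of df where `column` exists and has no NaN or '' entries.

    Hypothetical helper: the name and the "empty string counts as missing"
    rule are assumptions, not part of the original question or answer.
    """
    out = df.copy()
    if column not in out.columns:
        # Create the column as all-missing so one code path handles both cases.
        out[column] = pd.Series(np.nan, index=out.index, dtype=object)
    # Use object dtype so strings can be stored even if the column was all-NaN floats.
    current = out[column].astype(object)
    # Rows that have no colour yet: NaN/None or an empty string.
    missing = current.isna() | (current == "")
    # One independent candidate colour per row; only the missing rows use theirs.
    candidates = pd.Series(
        [random_hex_colour() for _ in range(len(out))],
        index=out.index,
        dtype=object,
    )
    # Keep existing values where present, otherwise take the candidate colour.
    out[column] = current.where(~missing, candidates)
    return out
```

With the dataframe from the question this would be called as lines = fill_missing_colours(lines). Creating the column only when it is absent avoids the problem with running lines["colour"] = np.nan unconditionally, which would wipe out colours that are already there.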
