March 4, 2023 01:49:05

Python Fill in column values based on ID

Question

I want to fill in the missing values in 'Col' based on the ID number.
I have tried groupby.

df=pd.DataFrame({
    'ID':[1,2,1,2,1,2],
    'Col':['One','NaN','NaN','Two','NaN','NaN']
})

This is the expected output:

df=pd.DataFrame({
    'ID':[1,2,1,2,1,2],
    'Col':['One','Two','One','Two','One','Two']
})

I know this is an easy example, but I would appreciate any help you could give me. I also have a dataframe with 1 million rows, so anything time-efficient would be appreciated.

What I have tried:

x=df_total[df_total['id'].astype(str)=='2']
buck_map = dict(x[~x['buckets'].isnull()][['id','buckets']].values)
x['buckets']=x['id'].map(buck_map)

Answer 1

Score: 1

It is not clear what you really want: whether this is just a translation and substitution, or whether groupby is needed. Assuming the column holds strings and you want only a substitution, you need a way of translating each ID to its value, such as 1 to 'One' (a dictionary is ideal), and then apply it to each row. You can use:

import pandas as pd
df = pd.DataFrame({
    'ID': [1, 2, 1, 2, 1, 2],
    'Col': ['One', 'NaN', 'NaN', 'Two', 'NaN', 'NaN']
})

def func(row):
    # Lookup table from ID to the value that should fill 'Col'.
    d = {0: 'zero', 1: 'One', 2: 'Two'}
    # Substitute only where 'Col' holds the placeholder string 'NaN'.
    if row['Col'] == 'NaN':
        val = d[row['ID']]
    else:
        val = row['Col']
    return val

df['Col'] = df.apply(func, axis=1)
print(df)

which gives:

   ID  Col
0   1  One
1   2  Two
2   1  One
3   2  Two
4   1  One
5   2  Two
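
Since the question mentions a million rows, a row-wise apply may be slower than necessary. The following is a minimal vectorized sketch of the same substitution, assuming the same hard-coded dictionary and the same 'NaN' placeholder strings as above; the name is_missing is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 1, 2, 1, 2],
    "Col": ["One", "NaN", "NaN", "Two", "NaN", "NaN"],
})

# Same hard-coded lookup table as in func above.
d = {0: "zero", 1: "One", 2: "Two"}

# True where 'Col' holds the placeholder string 'NaN'.
is_missing = df["Col"] == "NaN"

# Keep existing values; where the placeholder is found, use the value mapped from ID.
df["Col"] = df["Col"].where(~is_missing, df["ID"].map(d))
print(df)
```

Series.where keeps the entries where the condition is true and takes the replacement elsewhere, so the whole column is filled in one pass instead of one Python call per row.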

Answer 2

Score: 1

Your question is ambiguous, as there are several ways to produce the desired output based on your example.

Assuming that you are looking for the "majority value" per ID, and that the NaNs are actual float('NaN') values to be dropped rather than the string 'NaN', the following would be quite efficient:

def majority(s):
    return s.mode()[0]

newdf = df.assign(Col=df.groupby('ID')['Col'].transform(majority))

>>> newdf
   ID  Col
0   1  One
1   2  Two
2   1  One
3   2  Two
4   1  One
5   2  Two

Note: to make sure the 'NaN' entries are real NaN values and not strings, do this first:

df = df.assign(Col=df['Col'].replace({'NaN': float('nan')}))
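
Putting the two steps together in the right order, a self-contained sketch might look like the following. It assumes the question's sample data with 'NaN' stored as strings. The guard for an ID whose values are all missing is an addition that is not part of the answer: in that case mode() returns an empty Series and indexing it with [0] would fail.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 1, 2, 1, 2],
    "Col": ["One", "NaN", "NaN", "Two", "NaN", "NaN"],
})

# Step 1: turn the placeholder strings into real missing values.
df = df.assign(Col=df["Col"].replace({"NaN": np.nan}))

def majority(s):
    # mode() drops NaN by default; it is empty when the whole group is missing.
    m = s.mode()
    return m.iloc[0] if not m.empty else np.nan

# Step 2: broadcast the most frequent non-missing value of each ID to its rows.
newdf = df.assign(Col=df.groupby("ID")["Col"].transform(majority))
print(newdf)
```

Note that this overwrites every row of a group with the group's most frequent value, not only the missing ones, which matters if an ID can legitimately carry more than one value.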

Answer 3

Score: 1

You can create a dictionary mapping ID values to fill values:

fill_dict = df.groupby('ID')['Col'].last().to_dict()

then replace NaN values with fill values using the dictionary:

df['Col'] = df['Col'].fillna(df['ID'].map(fill_dict))
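
As a runnable sketch of the two steps above, assuming the missing entries are real NaN values rather than the string 'NaN' (otherwise fillna has nothing to fill). The names filled and filled_alt are introduced only for illustration, and the transform variant is an alternative to the dictionary, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 1, 2, 1, 2],
    "Col": ["One", np.nan, np.nan, "Two", np.nan, np.nan],
})

# GroupBy.last() skips missing values, so this is the last non-null 'Col' per ID.
fill_dict = df.groupby("ID")["Col"].last().to_dict()

# Fill only the missing entries, looking up each row's ID in the dictionary.
filled = df["Col"].fillna(df["ID"].map(fill_dict))

# Variant without the intermediate dictionary: broadcast the per-ID value directly.
filled_alt = df["Col"].fillna(df.groupby("ID")["Col"].transform("last"))

df["Col"] = filled
print(df)
```

Both variants are vectorized, so they should scale to the million-row frame mentioned in the question; existing non-missing values are left untouched.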
