Python Fill in column values based on ID
Question
I want to fill in the NA values in 'Col' based on the ID number.
I have tried groupby. This is the input:
df=pd.DataFrame({
'ID':[1,2,1,2,1,2],
'Col':['One','NaN','NaN','Two','NaN','NaN']
})
This is the expected output:
df=pd.DataFrame({
'ID':[1,2,1,2,1,2],
'Col':['One','Two','One','Two','One','Two']
})
I know this is an easy example, but I would appreciate any help you could give me. Also, I have a DataFrame with 1 million rows, so anything time-efficient would be appreciated.
What I have tried:
x=df_total[df_total['id'].astype(str)=='2']
buck_map = dict(x[~x['buckets'].isnull()][['id','buckets']].values)
x['buckets']=x['id'].map(buck_map)
Answer 1
Score: 1
It is not clear what you really want: whether this is just a translation and substitution, or whether groupby is needed. Assuming the column holds strings (including the literal string 'NaN') and that you just want a substitution, you need a way of translating values such as 1 to 'One' (a dictionary is ideal) and then applying it to each row. You can use:
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 1, 2, 1, 2],
    'Col': ['One', 'NaN', 'NaN', 'Two', 'NaN', 'NaN']
})

def func(row):
    # Lookup table translating an ID to its label
    d = {0: 'zero', 1: 'One', 2: 'Two'}
    # Substitute only where the value is the string 'NaN'
    if row['Col'] == 'NaN':
        val = d[row['ID']]
    else:
        val = row['Col']
    return val

df['Col'] = df.apply(func, axis=1)
print(df)
which gives:
   ID  Col
0   1  One
1   2  Two
2   1  One
3   2  Two
4   1  One
5   2  Two
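Since the question mentions a million rows, note that row-wise apply tends to be slow at that scale. Below is a minimal vectorized sketch of the same substitution, assuming as above that the missing entries are the literal string 'NaN'. The names id_to_name and is_missing are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 1, 2, 1, 2],
    'Col': ['One', 'NaN', 'NaN', 'Two', 'NaN', 'NaN'],
})

# Assumed lookup table (same mapping as in func above)
id_to_name = {0: 'zero', 1: 'One', 2: 'Two'}

# Boolean mask of rows holding the placeholder string 'NaN'
is_missing = df['Col'].eq('NaN')

# Replace masked rows with the label looked up from ID; keep the rest
df['Col'] = df['Col'].mask(is_missing, df['ID'].map(id_to_name))
print(df)
```

This works on whole columns at once rather than calling a Python function per row, which is usually the main cost of apply with axis=1.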
Answer 2
Score: 1
Your question is ambiguous, as there are several ways to produce the desired output based on your example.
Assuming that you are looking for the "majority value" per ID, and also that the NaNs are actual float('NaN') and to be dropped, and not just the string 'NaN', then the following would be quite efficient:
def majority(s):
    return s.mode()[0]

newdf = df.assign(Col=df.groupby('ID')['Col'].transform(majority))
>>> newdf
   ID  Col
0   1  One
1   2  Two
2   1  One
3   2  Two
4   1  One
5   2  Two
Note: to make sure the 'NaN' values are actual NaN and not strings, do this first:
df = df.assign(Col=df['Col'].replace({'NaN': float('Nan')}))
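Putting the two steps together, here is a hedged end-to-end sketch on the example data. It converts the 'NaN' strings first and then takes the per-ID mode. The guard for a group with no non-missing values is an added assumption about what you would want in that case (leave it as NaN), not something the answer above specifies:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 1, 2, 1, 2],
    'Col': ['One', 'NaN', 'NaN', 'Two', 'NaN', 'NaN'],
})

# Step 1: turn the placeholder strings into real missing values
df = df.assign(Col=df['Col'].replace({'NaN': np.nan}))

def majority(s):
    # mode() ignores NaN by default; it is empty if the group is all NaN
    modes = s.mode()
    return modes.iloc[0] if not modes.empty else np.nan

# Step 2: broadcast each ID's most frequent value to all of its rows
newdf = df.assign(Col=df.groupby('ID')['Col'].transform(majority))
print(newdf)
```

Note that this overwrites every value in a group with the majority value, not only the missing ones, which matters if an ID can legitimately carry more than one label.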
Answer 3
Score: 1
You can create a dictionary mapping ID values to fill values:
fill_dict = df.groupby('ID')['Col'].last().to_dict()
Then replace the NaN values with the fill values using the dictionary:
df['Col'] = df['Col'].fillna(df['ID'].map(fill_dict))
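As a runnable illustration of these two lines, the sketch below assumes the missing entries are real NaN values (not the string 'NaN') and uses the example data from the question. It relies on groupby(...).last() returning the last non-null value in each group:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 1, 2, 1, 2],
    'Col': ['One', np.nan, np.nan, 'Two', np.nan, np.nan],
})

# Last non-null 'Col' value per ID, as a plain dict
fill_dict = df.groupby('ID')['Col'].last().to_dict()

# Fill only the missing entries, looking up each row's ID in the dict
df['Col'] = df['Col'].fillna(df['ID'].map(fill_dict))
print(df)
```

Unlike the mode-based approach, this leaves existing non-missing values untouched and fills only the gaps.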