How to handle this in a DataFrame with Python programming
Question
Acc numb | CityName |
---|---|
123456 | |
123456 | Delhi |
123456 | |
123456 | |
123456 | |
910234 | |
910234 | |
910234 | |
910324 | |
910324 | Bangalore |
910324 | |
910324 | |
360825 | |
360825 | |
360825 | |
360825 | Mumbai |
360825 | |
123456 | |
123456 | |
123456 | |
123456 | |
123456 | |
Desired output:

Acc numb | CityName |
---|---|
123456 | Delhi |
123456 | Delhi |
123456 | Delhi |
123456 | Delhi |
123456 | Delhi |
910234 | Bangalore |
910234 | Bangalore |
910234 | Bangalore |
910324 | Bangalore |
910324 | Bangalore |
910324 | Bangalore |
910324 | Bangalore |
360825 | Mumbai |
360825 | Mumbai |
360825 | Mumbai |
360825 | Mumbai |
360825 | Mumbai |
123456 | |
123456 | |
123456 | |
123456 | |
123456 | |
Answer 1
Score: 0
If you want to fill `CityName` based on `Acc numb`, you can try the following:
import pandas as pd
import numpy as np
# Treat empty strings as missing values
df = df.replace('', np.nan)
# Within each account number, fill forward and then backward
df['CityName'] = df.groupby('Acc numb', group_keys=False).apply(lambda group: group['CityName'].ffill().bfill())
Explanation:
This code groups the rows by `Acc numb` and fills the missing values in `CityName` using `ffill()` and `bfill()`, so that each row takes the `CityName` found in its group. If an `Acc numb` has no `CityName` at all, its rows stay as `NaN`.
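To see what grouping by account number does in practice, here is a minimal self-contained sketch. The small frame is a hypothetical, shortened version of the question's data, and it uses `transform` rather than `apply`, which is a swapped-in but equivalent way to get a result aligned to the original index. Note that every row sharing an `Acc numb` lands in the same group, so a later block of `123456` rows also picks up `Delhi`.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the question's data; blanks are empty strings.
df = pd.DataFrame({
    "Acc numb": [123456, 123456, 123456, 910234, 910234,
                 910324, 910324, 123456, 123456],
    "CityName": ["", "Delhi", "", "", "", "", "Bangalore", "", ""],
})

# Treat empty strings as missing so ffill/bfill can act on them.
df["CityName"] = df["CityName"].replace("", np.nan)

# transform returns a Series aligned to the original index;
# each account number is filled forward, then backward.
df["CityName"] = (
    df.groupby("Acc numb")["CityName"]
      .transform(lambda s: s.ffill().bfill())
)

print(df)
```

In this sketch the `910234` rows stay missing because that account never has a city, while the trailing `123456` rows are filled with `Delhi`.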
Answer 2
Score: 0
Create groups of consecutive rows by comparing each value with the shifted value and taking the cumulative sum with `Series.cumsum`, then get the first non-missing value per group with `GroupBy.transform` and `GroupBy.first`.
Note: `910234` is different from `910324`, so it is not filled with `Bangalore`.
# Give each run of consecutive equal account numbers its own group id
g = df['Acc numb'].ne(df['Acc numb'].shift()).cumsum()
# Broadcast the first non-missing CityName of each run to all of its rows
df['CityName'] = df.groupby(g)['CityName'].transform('first')
print(df)
Acc numb CityName
0 123456 Delhi
1 123456 Delhi
2 123456 Delhi
3 123456 Delhi
4 123456 Delhi
5 910234 None
6 910234 None
7 910234 None
8 910324 Bangalore
9 910324 Bangalore
10 910324 Bangalore
11 910324 Bangalore
12 360825 Mumbai
13 360825 Mumbai
14 360825 Mumbai
15 360825 Mumbai
16 360825 Mumbai
17 123456 None
18 123456 None
19 123456 None
20 123456 None
21 123456 None
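The run-based grouping may be easier to follow with the intermediate group ids made visible. The sketch below uses a hypothetical, shortened frame in which the blank cells are already missing values (`None`). If they were empty strings, they would need converting to `NaN` first. The name `run_id` is an illustrative stand-in for `g`.

```python
import pandas as pd

# Hypothetical, shortened version of the question's data; blanks are None.
df = pd.DataFrame({
    "Acc numb": [123456, 123456, 123456, 910234, 910234,
                 910324, 910324, 123456, 123456],
    "CityName": [None, "Delhi", None, None, None,
                 None, "Bangalore", None, None],
})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those True/False flags numbers the runs.
run_id = df["Acc numb"].ne(df["Acc numb"].shift()).cumsum()
print(run_id.tolist())

# 'first' takes the first non-missing CityName in each run
# and transform broadcasts it back to every row of that run.
df["CityName"] = df.groupby(run_id)["CityName"].transform("first")
print(df)
```

Because the second block of `123456` rows forms its own run with no city in it, it stays missing here, unlike with a plain group-by on `Acc numb`.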