How to handle this in a DataFrame with Python programming

Question

Input:

Acc numb CityName
123456
123456 Delhi
123456
123456
123456
910234
910234
910234
910324
910324 Bangalore
910324
910324
360825
360825
360825
360825 Mumbai
360825
123456
123456
123456
123456
123456

Expected output:

Acc numb CityName
123456 Delhi
123456 Delhi
123456 Delhi
123456 Delhi
123456 Delhi
910234 Bangalore
910234 Bangalore
910234 Bangalore
910324 Bangalore
910324 Bangalore
910324 Bangalore
910324 Bangalore
360825 Mumbai
360825 Mumbai
360825 Mumbai
360825 Mumbai
360825 Mumbai
123456
123456
123456
123456
123456

Answer 1

Score: 0

If you want to fill CityName based on Acc numb, you can try the following:

import pandas as pd
import numpy as np

# Treat empty strings as missing values
df = df.replace('', np.nan)

# Within each Acc numb group, fill forward and then backward
df['CityName'] = df.groupby('Acc numb', group_keys=False).apply(lambda group: group['CityName'].ffill().bfill())

Explanation:

This code groups the rows by Acc numb and fills the missing values in CityName with ffill() and bfill(), so that each row takes the CityName found in its group. If an Acc numb has no CityName at all, its rows stay NaN.
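
To make the grouping behaviour concrete, here is a minimal self-contained sketch. It assumes the blank cells in the question are empty strings and rebuilds the sample data by hand. It also swaps GroupBy.apply for GroupBy.transform, which is a substitution and not the answer's exact call; transform returns a Series aligned with the original index, so the assignment stays row-for-row.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: blank cells are empty strings).
acc = [123456] * 5 + [910234] * 3 + [910324] * 4 + [360825] * 5 + [123456] * 5
city = [""] * len(acc)
city[1], city[9], city[15] = "Delhi", "Bangalore", "Mumbai"
df = pd.DataFrame({"Acc numb": acc, "CityName": city})

# Turn empty strings into NaN so ffill/bfill treat them as missing.
df["CityName"] = df["CityName"].replace("", np.nan)

# Group by the account number value and fill forward, then backward, inside each group.
df["CityName"] = df.groupby("Acc numb")["CityName"].transform(
    lambda s: s.ffill().bfill()
)
print(df)
```

Because the groups are formed by value and not by position, the second block of 123456 at the end of the frame is also filled with Delhi, which differs from the expected output in the question. The 910234 rows stay NaN because no row with that number carries a city.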

Answer 2

Score: 0

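Create consecutive groups by comparing each value with the shifted previous value and taking the cumulative sum with Series.cumsum, then get the first non-missing value of each group with GroupBy.transform and GroupBy.first:

Note: 910234 is a different value from 910324, so it is not filled with Bangalore.

# A new group id starts wherever Acc numb differs from the previous row
g = df['Acc numb'].ne(df['Acc numb'].shift()).cumsum()
# Broadcast the first non-missing CityName of each consecutive group to all of its rows
df['CityName'] = df.groupby(g)['CityName'].transform('first')
print(df)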
    Acc numb   CityName
0     123456      Delhi
1     123456      Delhi
2     123456      Delhi
3     123456      Delhi
4     123456      Delhi
5     910234       None
6     910234       None
7     910234       None
8     910324  Bangalore
9     910324  Bangalore
10    910324  Bangalore
11    910324  Bangalore
12    360825     Mumbai
13    360825     Mumbai
14    360825     Mumbai
15    360825     Mumbai
16    360825     Mumbai
17    123456       None
18    123456       None
19    123456       None
20    123456       None
21    123456       None
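
The consecutive-group trick is easier to follow when the intermediate group ids are printed. The sketch below is self-contained and rests on two assumptions: the blank cells are empty strings, and the name run_id is only an illustrative label for what the answer calls g. Since 'first' skips NaN and None but not empty strings, the blanks are converted to NaN before grouping.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: blank cells are empty strings).
acc = [123456] * 5 + [910234] * 3 + [910324] * 4 + [360825] * 5 + [123456] * 5
city = [""] * len(acc)
city[1], city[9], city[15] = "Delhi", "Bangalore", "Mumbai"
df = pd.DataFrame({"Acc numb": acc, "CityName": city})

# 'first' ignores NaN/None but would happily return "", so normalise blanks first.
df["CityName"] = df["CityName"].replace("", np.nan)

# True wherever the value differs from the row above; the running total of
# those flags gives every consecutive run its own id.
run_id = df["Acc numb"].ne(df["Acc numb"].shift()).cumsum()
print(run_id.tolist())

# Broadcast the first non-missing city of each run to all rows of that run.
df["CityName"] = df.groupby(run_id)["CityName"].transform("first")
print(df)
```

The two blocks of 123456 receive different ids, which is why only the first block is filled with Delhi and the trailing block stays missing.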

huangapple
  • Posted on March 1, 2023, 12:18:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75599532.html