June 22, 2023 05:52:31

Any method of Replacing NaN values with already existing values

Question

I am working on a dataset in a Jupyter notebook. My dataset has two columns, 'State' and 'City', as you can see in the image.

Now, the State column has some missing values, so I want to fill the NaN entries based on the data that already exists in the dataset.

For example, in Image 2 the city is Chennai and the state is missing, so I want to fill in Tamil Nadu in the State column based on Image 1.

Example 2: I want to fill in Andhra Pradesh as the state wherever the city is Visakhapatnam, and so on for all the NaN data.

Answer 1

Score: 1

As you have now commented, "I have one Pandas DF where some state values are missing but available for the same city at other rows in the DF", you can group by city and pick up the state from the other rows for the same city. You can use the code below. Within each city group, ffill and bfill replace any NaN values that come after or before a known state value.

import pandas as pd
data = {
    'city': ['Delhi', 'Mumbai', 'Bangalore', 'Hyderabad', 'Chennai', 'Delhi', 'Chennai'],
    'state': ['Delhi', 'Maharashtra', 'Karnataka', 'Telangana', None, None, 'Tamil Nadu'],
}
df = pd.DataFrame(data)
# Within each city group, copy the known state into the missing rows
df['state'] = df.groupby(['city'])['state'].transform(lambda x: x.ffill().bfill())
print(df)

gives:

        city        state
0      Delhi        Delhi
1     Mumbai  Maharashtra
2  Bangalore    Karnataka
3  Hyderabad    Telangana
4    Chennai   Tamil Nadu
5      Delhi        Delhi
6    Chennai   Tamil Nadu
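
A variation on the same idea, as a rough sketch rather than part of the answer above: build a city-to-state lookup from the rows where the state is known, then fill only the gaps. The name `lookup` and the extra Pune row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Chennai", "Pune"],
    "state": ["Delhi", "Maharashtra", None, None, "Tamil Nadu", None],
})

# Keep one known (city, state) pair per city and index it by city.
lookup = (
    df.dropna(subset=["state"])
      .drop_duplicates("city")
      .set_index("city")["state"]
)

# Fill only the missing states; rows that already have a state are untouched.
df["state"] = df["state"].fillna(df["city"].map(lookup))
print(df)
```

Pune has no known state anywhere in this frame, so its row stays missing.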

Answer 2

Score: 0

I believe you need a dictionary that maps each city to its state; then it will be a piece of cake.
For India, this link has a JSON file of all cities and states in India.
Here is some sample code:

import numpy as np
import pandas as pd

data = {
    'city': ['Delhi', 'Mumbai', 'Bangalore', 'Hyderabad', 'Chennai'],
    'state': ['Delhi', 'Maharashtra', 'Karnataka', 'Telangana', 'Tamil Nadu'],
}
city_state_dict = {
    'Delhi': 'Delhi',
    'Mumbai': 'Maharashtra',
    'Bangalore': 'Karnataka',
    'Hyderabad': 'Telangana',
    'Chennai': 'Tamil Nadu',
    'Kolkata': 'West Bengal',
    'Pune': 'Maharashtra',
}
# These rows simulate NaN values in the state column
added_row = pd.Series(['Kolkata', np.nan], index=['city', 'state']).to_frame().T
added_row_2 = pd.Series(['Kolkata', 'West Bengal'], index=['city', 'state']).to_frame().T
added_row_3 = pd.Series(['Pune', np.nan], index=['city', 'state']).to_frame().T
final = (
    pd.concat(
        [
            pd.DataFrame(data),
            added_row,
            added_row_2,
            added_row_3,
        ],
        axis=0,
        ignore_index=True,
    )
    # Look up every city's state in the dictionary
    .assign(state=lambda df: df.city.map(city_state_dict))
)
final.head(10)
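
One thing to keep in mind with the dictionary approach: mapping the whole column replaces every state, so a city missing from the dictionary would end up as NaN even if its state was already filled in. A minimal sketch that only fills the gaps instead is below; the Jaipur row and the shortened dictionary are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup; Jaipur is deliberately missing from it.
city_state_dict = {
    "Chennai": "Tamil Nadu",
    "Kolkata": "West Bengal",
    "Pune": "Maharashtra",
}

df = pd.DataFrame({
    "city": ["Chennai", "Kolkata", "Pune", "Jaipur"],
    "state": [np.nan, "West Bengal", np.nan, "Rajasthan"],
})

# Map every city through the dictionary, but use the result only where state is missing.
df["state"] = df["state"].fillna(df["city"].map(city_state_dict))
print(df)
```

Here Jaipur keeps its existing state, while Chennai and Pune are filled from the dictionary.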

If you want a quick fix that doesn't cover all cases, you can use forward-fill and back-fill, for instance:

final = (
    pd.concat(
        [
            pd.DataFrame(data),
            added_row,
            added_row_2,
            added_row_3,
        ],
        axis=0,
        ignore_index=True,
    )
    # Fill each city's missing states from its other rows instead of using the dictionary
    .assign(state=lambda df_: df_.groupby('city', group_keys=False).state.apply(lambda x: x.bfill().ffill()))
)
final.head(10)

With this approach, a city that has no corresponding state anywhere in your data frame still ends up as NaN after bfill() and ffill().

Why does this happen? Because we grouped by city, and Pune has only one row in its group, so there is no earlier or later state value to impute from.
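
If it helps, here is a small sketch for listing the cities that cannot be filled this way, i.e. cities with no known state in any row. The sample frame and the names `known_per_row` and `unfillable` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Kolkata", "Kolkata", "Pune", "Chennai", "Chennai"],
    "state": [None, "West Bengal", None, "Tamil Nadu", None],
})

# count() ignores missing values, so 0 means the city has no known state at all.
known_per_row = df.groupby("city")["state"].transform("count")
unfillable = sorted(df.loc[known_per_row.eq(0), "city"].unique())
print(unfillable)
```

Those cities would need an external source, such as the dictionary above.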

I hope this helps!

Answer 3

Score: 0

Example

Please provide code rather than images, and include the desired output.

import numpy as np
import pandas as pd
data1 = {'city': ['Delhi', 'Calcutta', 'Delhi', 'Chennai', 'Bokaro', 'Visakhapatnam', 'Chennai', 'Wanparti', 'Delhi', 'Mumbai'],
         'state': ['Delhi', 'West Bengal', 'Delhi', 'TamilNadu', 'Jharkhand', 'AndhraPradesh', 'TamilNadu', 'AndhraPradesh', 'Delhi', 'Maharashtra']}
df1 = pd.DataFrame(data1)
data2 = {'city': ['Pune', 'Visakhapatnam', 'Indore', 'Visakhapatnam', 'Chennai', 'Chennai', 'Indore', 'Chennai', 'Delhi', 'Pune']}
df2 = pd.DataFrame(data2)
df2['state'] = np.nan

df1

    city	        state
0	Delhi	        Delhi
1	Calcutta	    West Bengal
2	Delhi	        Delhi
3	Chennai	        TamilNadu
4	Bokaro	        Jharkhand
5	Visakhapatnam	AndhraPradesh
6	Chennai	        TamilNadu
7	Wanparti	    AndhraPradesh
8	Delhi	        Delhi
9	Mumbai	        Maharashtra

df2

    city	        state
0	Pune	        NaN
1	Visakhapatnam	NaN
2	Indore	        NaN
3	Visakhapatnam	NaN
4	Chennai	        NaN
5	Chennai	        NaN
6	Indore	        NaN
7	Chennai	        NaN
8	Delhi	        NaN
9	Pune	        NaN

Code

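# Build a city -> state dictionary from the (city, state) rows of df1
m = dict(df1.values)
# Fill the missing states in df2 by looking up each city in that dictionary
df2.assign(state=df2['state'].fillna(df2['city'].map(m)))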

output:

    city	        state
0	Pune	        NaN
1	Visakhapatnam	AndhraPradesh
2	Indore	        NaN
3	Visakhapatnam	AndhraPradesh
4	Chennai	        TamilNadu
5	Chennai	        TamilNadu
6	Indore	        NaN
7	Chennai	        TamilNadu
8	Delhi	        Delhi
9	Pune	        NaN
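
One caveat, sketched below under made-up data: if the reference frame lists the same city under more than one state, `dict(df1.values)` keeps whichever pair comes last. A quick check with `nunique` can flag such cities before you build the dictionary. The conflicting Hyderabad rows are invented purely to show the effect.

```python
import pandas as pd

# Hypothetical reference frame with a conflicting entry for Hyderabad.
df1 = pd.DataFrame({
    "city": ["Delhi", "Chennai", "Hyderabad", "Hyderabad"],
    "state": ["Delhi", "TamilNadu", "AndhraPradesh", "Telangana"],
})

# Count distinct states per city; more than one means the mapping is ambiguous.
states_per_city = df1.groupby("city")["state"].nunique()
ambiguous = states_per_city[states_per_city > 1].index.tolist()
print(ambiguous)

# The dictionary silently keeps the last pair seen for each city.
m = dict(df1.values)
print(m["Hyderabad"])
```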
