How to handle NaN values in a rental price prediction project

Question

I am working on a rental price prediction project using data I scraped from Facebook Marketplace. When extracting the property areas, I get many NaN values.

The listings come from a small city, so it is unlikely that I will be able to find more data. How can I effectively handle the NaN values in my data? Are there any machine learning algorithms or external sources of information that can be used to impute missing values in this situation?

Any suggestions or advice would be greatly appreciated. Thank you in advance!

I have considered using the mean or median based on property type, number of bedrooms, and bathrooms, but I am not sure if this is the best approach.

Answer 1

Score: 0

There are many ways to handle missing values in your data. As you mentioned, the general approach is to fill them with the mean or median. I recommend grouping the rows first and then filling with each group's mean or median.

df['a'] = df['a'].fillna(df.groupby('b')['a'].transform('mean'))

I reckon you can use the zip code or something similar to group them.
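
To make the grouping idea concrete, here is a minimal sketch on made-up data. The column names zipcode and area and the toy values are assumptions for illustration, not taken from the question. It also adds a fallback to the overall median, since a small dataset may contain groups where every area is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy listings; column names and values are illustrative only.
listings = pd.DataFrame({
    "zipcode": ["1000", "1000", "1000", "2000", "2000", "3000"],
    "area": [50.0, np.nan, 70.0, 100.0, np.nan, np.nan],
})

# Median area per zipcode, broadcast back onto the original row index.
group_median = listings.groupby("zipcode")["area"].transform("median")

# Fill from the group first; fall back to the overall median
# for groups where every area is missing (zipcode 3000 here).
listings["area_filled"] = (
    listings["area"].fillna(group_median).fillna(listings["area"].median())
)

print(listings)
```

The median is used here rather than the mean because it is less sensitive to a few unusually large properties, but either works with the same pattern.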

Another thing you can do, before filling the empty values, is to create another column that indicates whether the value was missing. This may help your model treat those values differently and avoid overfitting on them.
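
A rough sketch of the indicator column, again on made-up data (the names area and area_missing are hypothetical). The key point is the order: the flag has to be created before the fill, because afterwards the missing rows can no longer be identified.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name is illustrative only.
listings = pd.DataFrame({"area": [50.0, np.nan, 70.0, np.nan]})

# Record which rows were missing BEFORE imputing.
listings["area_missing"] = listings["area"].isna().astype(int)

# Then fill the gaps (overall median here; a grouped fill works the same way).
listings["area"] = listings["area"].fillna(listings["area"].median())

print(listings)
```

Both area and area_missing can then be passed to the model as features.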

For further info link

huangapple
  • Published on February 14, 2023, 07:23:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75442094.html