Binning error for a dataframe column – KeyError: "None of [Float64Index([61.5, 59.8, 56.8…. dtype='float64', length=53940)] are in the [columns]"

Question

I wrote a function for binning the numerical values of a dataframe column, i.e., dividing the column values into the specified number of categories.

    def binning_fun(df, col_name, num_of_bins):
        lt=[]
        for i in range(0,num_of_bins):
            lt.append(i)
        df[col_name]=pd.cut(df[col_name],bins=i+1, labels=lt)
        return df
    df="C:/Users/shootings.csv"
    binning_fun(df, df['depth'], 4)

This gives the following error:

    KeyError: "None of [Float64Index([61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4,
                  ...
                  60.5, 59.8, 60.5, 61.2, 62.7, 60.8, 63.1, 62.8, 61.0, 62.2],
                 dtype='float64', length=53940)] are in the [columns]"

These values do exist in the 'depth' column. Why are they reported as missing?

My dataset:

       carat      cut clarity  depth  table
    0   0.23    Ideal     SI2   61.5   55.0
    1   0.21  Premium     SI1   59.8   61.0
    2   0.23     Good     VS1   56.9   65.0
    3   0.29  Premium     VS2   62.4   58.0
    4   0.31     Good     SI2   63.3   58.0
    5   0.24     Good    VVS2   90.7   62.8

Expected output:

    depth
    1
    0
    0
    1
    1
    2

Answer 1

Score: 1

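The KeyError comes from the call rather than the data: binning_fun was given the column itself (df['depth']) instead of its name, so df[col_name] looked for columns labelled 61.5, 59.8, and so on. Pass the name 'depth' instead, and load the file with pd.read_csv so that df is a DataFrame rather than a path string. With that fixed, you can use cut for equal-width bins: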

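    import pandas as pd

    def binning_fun(df, col_name, num_of_bins):
        # Split the column into num_of_bins equal-width intervals, labelled 0..num_of_bins-1
        df[col_name] = pd.cut(df[col_name], bins=num_of_bins, labels=range(num_of_bins))
        return df

    df = pd.read_csv("C:/Users/shootings.csv")
    binning_fun(df, 'depth', 4)  # pass the column name, not the column itself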

Output:

       carat      cut clarity depth  table
    0   0.23    Ideal     SI2     0  55.00
    1   0.21  Premium     SI1     0  61.00
    2   0.23     Good     VS1     0  65.00
    3   0.29  Premium     VS2     0  58.00
    4   0.31     Good     SI2     0  58.00
    5   0.24     Good    VVS2     3  62.80
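
To see why five of the six sample rows land in bin 0, it can help to look at the edges that cut computes. The following is a minimal sketch, assuming only the six depth values shown in the question; the names depth, codes and edges are illustrative and not from the original post.

```python
import pandas as pd

# The six sample values of the 'depth' column from the question.
depth = pd.Series([61.5, 59.8, 56.9, 62.4, 63.3, 90.7], name="depth")

# retbins=True also returns the bin edges that pd.cut computed.
codes, edges = pd.cut(depth, bins=4, labels=range(4), retbins=True)

print(edges)           # 5 edges delimiting 4 equal-width intervals
print(codes.tolist())  # [0, 0, 0, 0, 0, 3]
```

Because 90.7 lies far from the other values, the equal-width intervals are stretched, and every other row falls into the first one.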

Or use qcut for quantile-based buckets, each holding roughly the same number of rows:

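    import pandas as pd

    def binning_fun(df, col_name, num_of_bins):
        # Split the column at its quantiles so each bucket gets a similar number of rows
        df[col_name] = pd.qcut(df[col_name], q=num_of_bins, labels=range(num_of_bins))
        return df

    df = pd.read_csv("C:/Users/shootings.csv")
    binning_fun(df, 'depth', 4)  # pass the column name, not the column itself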

Output:

       carat      cut clarity depth  table
    0   0.23    Ideal     SI2     1  55.00
    1   0.21  Premium     SI1     0  61.00
    2   0.23     Good     VS1     0  65.00
    3   0.29  Premium     VS2     2  58.00
    4   0.31     Good     SI2     3  58.00
    5   0.24     Good    VVS2     3  62.80
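
The quantile-based split can be inspected the same way. This is a small sketch under the same assumption (only the six sample depth values; variable names are illustrative). It prints the quantile edges and how many rows each bucket received.

```python
import pandas as pd

# The six sample values of the 'depth' column from the question.
depth = pd.Series([61.5, 59.8, 56.9, 62.4, 63.3, 90.7], name="depth")

# retbins=True also returns the quantile edges that pd.qcut computed.
codes, edges = pd.qcut(depth, q=4, labels=range(4), retbins=True)

print(edges)                              # edges at the 0%, 25%, 50%, 75%, 100% quantiles
print(codes.tolist())                     # [1, 0, 0, 2, 3, 3]
print(codes.value_counts().sort_index())  # rows per bucket
```

With only six rows and four buckets the counts cannot be exactly equal, but the outlier 90.7 no longer pushes the other rows into a single bucket.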

I hope this helps.
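
As a postscript on the original KeyError: indexing a DataFrame with a Series of floats makes pandas treat each value as a column label, which is what happened when df['depth'] was passed as col_name. The sketch below reproduces this on a small hypothetical frame built from three of the sample rows; the exact wording of the message differs between pandas versions.

```python
import pandas as pd

# Hypothetical three-row frame using values from the question's sample data.
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23],
    "depth": [61.5, 59.8, 56.9],
})

# Indexing with a Series of floats makes pandas look for columns
# named 61.5, 59.8 and 56.9, none of which exist.
try:
    df[df["depth"]]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Indexing with the column name returns the column as expected.
print(df["depth"].tolist())  # [61.5, 59.8, 56.9]
```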

huangapple
  • Posted on February 8, 2023 at 14:56:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382294.html