Binning error for a dataframe column - KeyError: "None of [Float64Index([61.5, 59.8, 56.8.... dtype='float64', length=53940)] are in the [columns]"
Question
I wrote a function for binning the numerical values of a dataframe column, i.e., dividing the column values into the specified number of categories.
def binning_fun(df, col_name, num_of_bins):
    lt=[]
    for i in range(0,num_of_bins):
        lt.append(i)
    df[col_name]=pd.cut(df[col_name],bins=i+1, labels=lt)
    return df

df="C:/Users/shootings.csv"
binning_fun(df, df['depth'], 4)
This gives the following error:
"None of [Float64Index([61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4,
              ...
              60.5, 59.8, 60.5, 61.2, 62.7, 60.8, 63.1, 62.8, 61.0, 62.2],
             dtype='float64', length=53940)] are in the [columns]"
These values do exist in the 'depth' column. Why does pandas report that they are not found?
My dataset:
   carat      cut clarity  depth  table
0   0.23    Ideal     SI2   61.5   55.0
1   0.21  Premium     SI1   59.8   61.0
2   0.23     Good     VS1   56.9   65.0
3   0.29  Premium     VS2   62.4   58.0
4   0.31     Good     SI2   63.3   58.0
5   0.24     Good    VVS2   90.7   62.8
Expected output:
depth
1
0
0
1
1
2
Answer 1
Score: 1
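Pass the column name ('depth') to the function rather than the column itself (df['depth']), and load the CSV with pd.read_csv instead of assigning the path string to df. With that in place, you can use cut for equal-width bins: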
import pandas as pd

def binning_fun(df, col_name, num_of_bins):
    # col_name is the column label (a string), not the column's values
    df[col_name] = pd.cut(df[col_name], bins=num_of_bins, labels=range(num_of_bins))
    return df

df = pd.read_csv("C:/Users/shootings.csv")
binning_fun(df, 'depth', 4)
Output:
   carat      cut clarity depth  table
0   0.23    Ideal     SI2     0  55.00
1   0.21  Premium     SI1     0  61.00
2   0.23     Good     VS1     0  65.00
3   0.29  Premium     VS2     0  58.00
4   0.31     Good     SI2     0  58.00
5   0.24     Good    VVS2     3  62.80
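As for why the original call fails: the sketch below is one way to reproduce it, assuming the six sample rows from the question stand in for the full CSV (the small DataFrame and the names `raised` and `binned` are made up for illustration). Indexing a DataFrame with a Series of floats makes pandas look for columns labelled 61.5, 59.8, and so on, which is what the KeyError is reporting.

```python
import pandas as pd

# Illustrative stand-in for the CSV: only the 'depth' values shown in the question.
df = pd.DataFrame({"depth": [61.5, 59.8, 56.9, 62.4, 63.3, 90.7]})

# Using the column's values as the key: pandas treats each value as a
# column label to select, finds none of them, and raises KeyError.
try:
    df[df["depth"]]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Using the column label as the key selects the column as intended.
binned = pd.cut(df["depth"], bins=4, labels=list(range(4)))
print(binned.tolist())
```

Inside the question's function, `df[col_name]` is evaluated with `col_name` set to `df['depth']`, so the same lookup happens there.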
Or use qcut for quantile-based buckets that each hold roughly the same number of rows:
import pandas as pd

def binning_fun(df, col_name, num_of_bins):
    # qcut places the bin edges at quantiles of the column
    df[col_name] = pd.qcut(df[col_name], q=num_of_bins, labels=range(num_of_bins))
    return df

df = pd.read_csv("C:/Users/shootings.csv")
binning_fun(df, 'depth', 4)
Output:
   carat      cut clarity depth  table
0   0.23    Ideal     SI2     1  55.00
1   0.21  Premium     SI1     0  61.00
2   0.23     Good     VS1     0  65.00
3   0.29  Premium     VS2     2  58.00
4   0.31     Good     SI2     3  58.00
5   0.24     Good    VVS2     3  62.80
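To see where the two methods differ, it can help to look at the bin edges they choose. This is a minimal sketch using only the six 'depth' values from the question (the variable names are made up for illustration); `retbins=True` asks `pd.cut` and `pd.qcut` to return the edges alongside the labels.

```python
import pandas as pd

# The six 'depth' values from the question's sample rows.
depth = pd.Series([61.5, 59.8, 56.9, 62.4, 63.3, 90.7], name="depth")
labels = list(range(4))

# Equal-width bins: the min..max range is split into 4 intervals of equal length.
equal_width, width_edges = pd.cut(depth, bins=4, labels=labels, retbins=True)

# Quantile bins: edges sit at the 0%, 25%, 50%, 75% and 100% quantiles.
equal_count, quantile_edges = pd.qcut(depth, q=4, labels=labels, retbins=True)

print("cut edges: ", width_edges)
print("qcut edges:", quantile_edges)
print("cut labels: ", equal_width.tolist())
print("qcut labels:", equal_count.tolist())
```

With the outlier 90.7 in the data, the equal-width edges are stretched so that five of the six values fall in bin 0, while the quantile edges spread the values across all four bins.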
I hope this helps.