February 8, 2023, 14:56:02

Binning error for a dataframe column - KeyError: "None of [Float64Index([61.5, 59.8, 56.8.... dtype='float64', length=53940)] are in the [columns]"

Question

I wrote a function for binning the numerical values of a dataframe column, i.e., dividing the column values into the specified number of categories.

def binning_fun(df, col_name, num_of_bins):
    lt=[]
    for i in range(0,num_of_bins):
        lt.append(i)
        df[col_name]=pd.cut(df[col_name],bins=i+1, labels=lt)
    return df
df="C:/Users/shootings.csv"
binning_fun(df, df['depth'], 4)

This gives the following error:

"None of [Float64Index([61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4,
              ...
              60.5, 59.8, 60.5, 61.2, 62.7, 60.8, 63.1, 62.8, 61.0, 62.2],
             dtype='float64', length=53940)] are in the [columns]"

These values do exist in the column 'depth'. Why are they reported as missing?

My dataset:

    carat   cut      clarity  depth  table
0   0.23    Ideal    SI2      61.5   55.0
1   0.21    Premium  SI1      59.8   61.0
2   0.23    Good     VS1      56.9   65.0
3   0.29    Premium  VS2      62.4   58.0
4   0.31    Good     SI2      63.3   58.0
5   0.24    Good     VVS2     90.7   62.8

Expected output:

depth
1
0
0
1
1
2

Answer 1

Score: 1

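The KeyError comes from passing the column itself, df['depth'], as col_name: df[col_name] then treats the column's values as column labels and finds none of them. Pass the column name 'depth' instead, and load the CSV into a DataFrame first. You can then use pd.cut for equal-width bins: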

import pandas as pd

def binning_fun(df, col_name, num_of_bins):
    # col_name is the column label (a string), not the column itself
    df[col_name] = pd.cut(df[col_name], bins=num_of_bins, labels=range(num_of_bins))
    return df

df = pd.read_csv("C:/Users/shootings.csv")
binning_fun(df, 'depth', 4)

Output:

    carat   cut      clarity  depth  table
0   0.23    Ideal    SI2      0      55.00
1   0.21    Premium  SI1      0      61.00
2   0.23    Good     VS1      0      65.00
3   0.29    Premium  VS2      0      58.00
4   0.31    Good     SI2      0      58.00
5   0.24    Good     VVS2     3      62.80
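
As a rough illustration of how pd.cut arrives at those labels, here is a minimal sketch. It assumes the six depth values from the question stand in for the full CSV (the Series below is typed in by hand, not read from shootings.csv). Passing retbins=True asks pd.cut to also return the bin edges it computed.

```python
import pandas as pd

# Six sample 'depth' values copied from the question (an assumption:
# the real file has many more rows, so its bin edges would differ).
depth = pd.Series([61.5, 59.8, 56.9, 62.4, 63.3, 90.7], name="depth")

# retbins=True returns the computed bin edges alongside the labels.
codes, edges = pd.cut(depth, bins=4, labels=range(4), retbins=True)

print(codes.tolist())  # [0, 0, 0, 0, 0, 3]
print(edges)           # 5 edges delimiting the 4 intervals
```

The five smaller values share bin 0 because the 90.7 outlier stretches the range, so each bin ends up roughly 8.45 wide.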

Or use pd.qcut for quantile-based buckets, which hold roughly equal numbers of rows:

import pandas as pd

def binning_fun(df, col_name, num_of_bins):
    # q is the number of quantiles; each bucket gets about the same number of rows
    df[col_name] = pd.qcut(df[col_name], q=num_of_bins, labels=range(num_of_bins))
    return df

df = pd.read_csv("C:/Users/shootings.csv")
binning_fun(df, 'depth', 4)

Output:

    carat   cut      clarity  depth  table
0   0.23    Ideal    SI2      1      55.00
1   0.21    Premium  SI1      0      61.00
2   0.23    Good     VS1      0      65.00
3   0.29    Premium  VS2      2      58.00
4   0.31    Good     SI2      3      58.00
5   0.24    Good     VVS2     3      62.80
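
To make the difference between the two functions concrete, the sketch below counts rows per bin for both. It again assumes the six sample depth values from the question rather than the full file, so the counts are only illustrative.

```python
import pandas as pd

# Sample values from the question (assumed stand-in for the full column).
depth = pd.Series([61.5, 59.8, 56.9, 62.4, 63.3, 90.7], name="depth")

equal_width = pd.cut(depth, bins=4, labels=range(4))   # same interval width
quantile = pd.qcut(depth, q=4, labels=range(4))        # similar row counts

# Rows per bin, listed in bin order 0..3 (empty bins show up as 0).
cut_counts = equal_width.value_counts().sort_index().tolist()
qcut_counts = quantile.value_counts().sort_index().tolist()

print(cut_counts)
print(qcut_counts)
```

With an outlier like 90.7, pd.cut leaves the middle bins empty, while pd.qcut spreads the rows across all four buckets. Which one is right depends on whether you care about the value ranges or about balanced group sizes.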

I hope this helps.
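
In case it is useful, here is a small sketch of where the original KeyError likely comes from. It assumes a toy DataFrame built from the sample rows, not the actual CSV. Indexing a DataFrame with a Series of floats makes pandas look those floats up as column labels, which fails; indexing with the string 'depth' selects the column.

```python
import pandas as pd

# Toy frame with the sample 'depth' values (an assumption for illustration).
df = pd.DataFrame({"depth": [61.5, 59.8, 56.9, 62.4, 63.3, 90.7]})

caught = None
try:
    # Same mistake as binning_fun(df, df['depth'], 4): the Series' values
    # are treated as a list of column labels, and none of them exist.
    df[df["depth"]]
except KeyError as err:
    caught = err
    print("KeyError:", err)

# Using the column name works as intended.
binned = pd.cut(df["depth"], bins=4, labels=range(4))
print(binned.tolist())
```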
