2023-02-23 21:58:44

TypeError: unhashable type: 'numpy.ndarray' occurs while grouping and aggregating data

Question

I processed data retrieved from YouTube channel statistics, and while aggregating it I got an error message saying that numpy.ndarray is an unhashable type. I used the np.where function to create the columns 'body_focus' and 'type_of_workout'. I am not posting that part of the script because I do not think it caused the TypeError.
My code:

workout_df = videos_df[['Year','body_focus','type_of_workout','viewCount','commentCount','likeCount']]
workout_df

workout_df.groupby(by = ['Year','body_focus'])['viewCount','commentCount','likeCount'].sum()\
                  .sort('Year', ascending = True)

Then I got this error message:

----------------------------------------------------------------
TypeError                      Traceback (most recent call last)
Cell In[166], line 1
----> 1 videos_df.groupby(by = ['Year','body_focus'])['viewCount','commentCount','likeCount'].sum()\
      2          .sort('Year', ascending = True)

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\groupby.py:2434, in GroupBy.sum(self, numeric_only, min_count, engine, engine_kwargs)
   2429 else:
   2430     # If we are grouping on categoricals we want unobserved categories to
   2431     # return zero, rather than the default of NaN which the reindexing in
   2432     # _agg_general() returns. GH #31422
   2433     with com.temp_setattr(self, "observed", True):
-> 2434         result = self._agg_general(
   2435             numeric_only=numeric_only,
   2436             min_count=min_count,
   2437             alias="sum",
   2438             npfunc=np.sum,
   2439         )
   2441     return self._reindex_output(result, fill_value=0)

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\groupby.py:1692, in GroupBy._agg_general(self, numeric_only, min_count, alias, npfunc)
   1680 @final
   1681 def _agg_general(
   1682     self,
   (...)
   1687     npfunc: Callable,
   1688 ):
   1690     with self._group_selection_context():
   1691         # try a cython aggregation if we can
-> 1692         result = self._cython_agg_general(
   1693             how=alias,
   1694             alt=npfunc,
   1695             numeric_only=numeric_only,
   1696             min_count=min_count,
   1697         )
   1698         return result.__finalize__(self.obj, method="groupby")

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\groupby.py:1796, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count, ignore_failures, **kwargs)
   1793 if not is_ser and len(new_mgr) < orig_len:
   1794     warn_dropping_nuisance_columns_deprecated(type(self), how, numeric_only)
-> 1796 res = self._wrap_agged_manager(new_mgr)
   1797 if is_ser:
   1798     res.index = self.grouper.result_index

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\generic.py:1511, in DataFrameGroupBy._wrap_agged_manager(self, mgr)
   1509     result = result._consolidate()
   1510 else:
-> 1511     index = self.grouper.result_index
   1512     mgr.set_axis(1, index)
   1513     result = self.obj._constructor(mgr)

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\_libs\properties.pyx:36, in pandas._libs.properties.CachedProperty.__get__()

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\ops.py:995, in BaseGrouper.result_index(self)
    992 if len(self.groupings) == 1:
    993     return self.groupings[0].result_index.rename(self.names[0])
--> 995 codes = self.reconstructed_codes
    996 levels = [ping.result_index for ping in self.groupings]
    997 return MultiIndex(
    998     levels=levels, codes=codes, verify_integrity=False, names=self.names
    999 )

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\ops.py:986, in BaseGrouper.reconstructed_codes(self)
    984 @property
    985 def reconstructed_codes(self) -> list[npt.NDArray[np.intp]]:
--> 986     codes = self.codes
    987     ids, obs_ids, _ = self.group_info
    988     return decons_obs_group_ids(ids, obs_ids, self.shape, codes, xnull=True)

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\ops.py:897, in BaseGrouper.codes(self)
    894 @final
    895 @property
    896 def codes(self) -> list[npt.NDArray[np.signedinteger]]:
--> 897     return [ping.codes for ping in self.groupings]

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\ops.py:897, in <listcomp>(.0)
    894 @final
    895 @property
    896 def codes(self) -> list[npt.NDArray[np.signedinteger]]:
--> 897     return [ping.codes for ping in self.groupings]

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\grouper.py:621, in Grouping.codes(self)
    617 if self._codes is not None:
    618     # _codes is set in __init__ for MultiIndex cases
    619     return self._codes
--> 621 return self._codes_and_uniques[0]

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\_libs\properties.pyx:36, in pandas._libs.properties.CachedProperty.__get__()

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\groupby\grouper.py:692, in Grouping._codes_and_uniques(self)
    685     uniques = (
    686         self.grouping_vector.result_index._values  # type: ignore[assignment]
    687     )
    688 else:
    689     # GH35667, replace dropna=False with use_na_sentinel=False
    690     # error: Incompatible types in assignment (expression has type "Union[
    691     # ndarray[Any, Any], Index]", variable has type "Categorical")
--> 692     codes, uniques = algorithms.factorize(  # type: ignore[assignment]
    693         self.grouping_vector, sort=self._sort, use_na_sentinel=self._dropna
    694     )
    695 return codes, uniques

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\algorithms.py:818, in factorize(values, sort, na_sentinel, use_na_sentinel, size_hint)
    815             # Don't modify (potentially user-provided) array
    816             values = np.where(null_mask, na_value, values)
--> 818     codes, uniques = factorize_array(
    819         values,
    820         na_sentinel=na_sentinel_arg,
    821         size_hint=size_hint,
    822     )
    824 if sort and len(uniques) > 0:
    825     if na_sentinel is None:
    826         # TODO: Can remove when na_sentinel=na_sentinel as in TODO above

File ~\AppData\Roaming\Python\Python310\site-packages\pandas\core\algorithms.py:574, in factorize_array(values, na_sentinel, size_hint, na_value, mask)
    571 hash_klass, values = _get_hashtable_algo(values)
    573 table = hash_klass(size_hint or len(values))
--> 574 uniques, codes = table.factorize(
    575     values,
    576     na_sentinel=na_sentinel,
    577     na_value=na_value,
    578     mask=mask,
    579     ignore_na=ignore_na,
    580 )
    582 # re-cast e.g. i8->dt64/td64, uint8->bool
    583 uniques = _reconstruct_data(uniques, original.dtype, original)

File pandas\_libs\hashtable_class_helper.pxi:5943, in pandas._libs.hashtable.PyObjectHashTable.factorize()

File pandas\_libs\hashtable_class_helper.pxi:5857, in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'numpy.ndarray'

I have checked the dtype of each variable, and the categories needed for the aggregation have the proper data type. Please help me find the bug in this code and show me how to aggregate my results.

Answer 1

Score: 1

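Try these lines instead:

# Select the value columns with a list (double brackets); tuple-style selection is rejected by pandas 2.0 and later
videos_grouped = videos_df.groupby(by=['Year', 'body_focus'], as_index=False)[['viewCount', 'commentCount', 'likeCount']].sum()

# DataFrame.sort has been removed from pandas; sort_values is its replacement
videos_sorted = videos_grouped.sort_values(by='Year', ascending=True)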
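
The traceback in the question ends inside PyObjectHashTable.factorize, which is where pandas hashes the values of a grouping column. That suggests at least one cell in 'Year' or 'body_focus' may hold a NumPy array rather than a scalar, even if the column dtype looks fine (it would simply be object). The sketch below is only an illustration under that assumption: the small videos_df, the one-element arrays in 'body_focus', and the helper name unwrap are all hypothetical, since the np.where step was not posted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for videos_df. ASSUMPTION: each 'body_focus' cell
# holds a one-element NumPy array instead of a plain string.
videos_df = pd.DataFrame({
    "Year": [2021, 2020, 2020],
    "body_focus": pd.Series(
        [np.array(["arms"]), np.array(["legs"]), np.array(["legs"])],
        dtype=object,
    ),
    "viewCount": [5, 10, 20],
    "commentCount": [1, 2, 3],
    "likeCount": [2, 4, 6],
})

# Diagnose: flag the cells of the grouping column that are NumPy arrays.
is_array = videos_df["body_focus"].map(lambda cell: isinstance(cell, np.ndarray))
print("array cells in body_focus:", int(is_array.sum()))

# Grouping on such a column is expected to fail, because arrays cannot be hashed.
try:
    videos_df.groupby(["Year", "body_focus"])["viewCount"].sum()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("groupby on raw column:", error_message)


def unwrap(cell):
    """Hypothetical helper: turn a one-element array into its scalar value."""
    if isinstance(cell, np.ndarray) and cell.size == 1:
        return cell.item()
    return cell


# Repair the column, then aggregate with list-style column selection.
fixed_df = videos_df.assign(body_focus=videos_df["body_focus"].map(unwrap))
result = (
    fixed_df.groupby(["Year", "body_focus"], as_index=False)[
        ["viewCount", "commentCount", "likeCount"]
    ]
    .sum()
    .sort_values(by="Year", ascending=True)
)
print(result)
```

If the real cells hold arrays with more than one element, unwrapping to a scalar no longer applies; converting each cell to a tuple or a joined string would be one alternative way to make it hashable.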
