July 17, 2023, 22:31:59

Why can't I pivot_table a non-numeric column?

Question

I'm trying this code:

df_aggregated[['modeWinddirectiondiscrete', 'CAMPAÑA', 'ID_ESTACION', 'Month']].pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month')

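The column 'modeWinddirectiondiscrete' holds values such as "North", "West", "South", "East", ...

The result contains only the index. The same call works well with numeric columns, so the problem seems to be that this column contains strings. Is there a way to get a correct result with strings, or do I need to convert the directions to numbers?

EDIT:
For example, I have this table: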

mode  modeWinddirectiondiscrete  CAMPAÑA  ID_ESTACION  Month
    0                     South       16            0      1
    1                     North       16            0      1
    2                     North       16            0      1
    3                     East        16            0      2
    4                     West        16            0      2
    5                     East        16            0      2
    6                     South       17            1      3
    7                     West        17            1      3
    8                     North       17            1      3
    9                     West        17            1      3

And I want something like:

                                       DewpointLocalDayAvg
              Month   1   10   11   12   2   3   4   5   6   7   8   9
CAMPAÑA	ID_ESTACION
     16           0 North .............East...........................
     17           1 .......................West.......................

The result should look something like that; I'm not sure whether this example is clear.

Answer 1

Score: 1

What about this?

import pandas as pd
# toy dataset
data = {
    'modeWinddirectiondiscrete': ['North', 'South', 'East', 'West', 'North'],
    'CAMPAÑA': [1, 1, 2, 2, 3],
    'ID_ESTACION': [1, 2, 1, 2, 3],
    'Month': ['January', 'February', 'January', 'February', 'January']
}
df_aggregated = pd.DataFrame(data)
# turn modeWinddirectiondiscrete into categorical data (equivalent of factors in R)
df_aggregated['modeWinddirectiondiscrete'] = pd.Categorical(df_aggregated['modeWinddirectiondiscrete'], categories=['North', 'South', 'East', 'West'])
# aggfunc='first' keeps the first value in each group instead of averaging
df_aggregated.pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month', values='modeWinddirectiondiscrete', aggfunc='first')

Month               February January
CAMPAÑA ID_ESTACION                 
1       1                NaN   North
        2              South     NaN
2       1                NaN    East
        2               West     NaN
3       3                NaN   North

Let me know if this doesn't work for the actual data, and I'll update it.

Update: ouroboros1 helpfully pointed out that the issue with your code is that pivot_table aggregates values with their mean by default (aggfunc='mean'), which makes no sense for categorical data, since the categories don't correspond to numbers that can be averaged.

Read here for more information on data types and the operations which can be performed on them.
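
If several rows fall into the same cell, as in the table from the question, 'first' just keeps whichever row comes first. Here is a minimal sketch of an alternative, assuming that what is wanted is the most frequent direction per station and month. The helper name most_frequent is made up for this example, and ties are broken by taking the first value that Series.mode() returns:

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    "modeWinddirectiondiscrete": ["South", "North", "North", "East", "West",
                                  "East", "South", "West", "North", "West"],
    "CAMPAÑA": [16] * 6 + [17] * 4,
    "ID_ESTACION": [0] * 6 + [1] * 4,
    "Month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
})


def most_frequent(values: pd.Series) -> str:
    """Return the most common value of a group (hypothetical helper)."""
    # Series.mode() returns the modes in sorted order; take the first one
    return values.mode().iloc[0]


# A callable aggfunc replaces the default 'mean', so strings are fine
result = df.pivot_table(
    index=["CAMPAÑA", "ID_ESTACION"],
    columns="Month",
    values="modeWinddirectiondiscrete",
    aggfunc=most_frequent,
)
print(result)
```

Months with no rows for a given station come out as NaN, the same as in the toy example above.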
