Why can't I pivot_table a non-numeric column?

Question

I'm trying this code:

df_aggregated[['modeWinddirectiondiscrete','CAMPAÑA', 'ID_ESTACION','Month']].pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month')

The column 'modeWinddirectiondiscrete' holds values like "North", "West", "South", "East", ...

As a result I get just the index. I tried it with other, numeric columns and it works well, so the problem is that this column contains strings. Is there a way to get a correct result with the strings, or do I need to convert the directions to numbers?

EDIT:
For example, I have this table:

mode  modeWinddirectiondiscrete  CAMPAÑA  ID_ESTACION  Month
    0                     South       16            0      1
    1                     North       16            0      1
    2                     North       16            0      1
    3                     East        16            0      2
    4                     West        16            0      2
    5                     East        16            0      2
    6                     South       17            1      3
    7                     West        17            1      3
    8                     North       17            1      3
    9                     West        17            1      3

And I want something like:

                                                   DewpointLocalDayAvg
              Month   1   10   11   12   2   3   4   5   6   7   8   9
CAMPAÑA ID_ESTACION
     16           0 North .............East...........................
     17           1 .......................West.......................

The result should look something like that; I don't know if this example is clear enough.

Answer 1

Score: 1

What about this?

import pandas as pd

# toy dataset
data = {
    'modeWinddirectiondiscrete': ['North', 'South', 'East', 'West', 'North'],
    'CAMPAÑA': [1, 1, 2, 2, 3],
    'ID_ESTACION': [1, 2, 1, 2, 3],
    'Month': ['January', 'February', 'January', 'February', 'January']
}

df_aggregated = pd.DataFrame(data)

# turn modeWinddirectiondiscrete into categorical data (equivalent of factors in R)
df_aggregated['modeWinddirectiondiscrete'] = pd.Categorical(df_aggregated['modeWinddirectiondiscrete'], categories=['North', 'South', 'East', 'West'])

df_aggregated.pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month', values='modeWinddirectiondiscrete', aggfunc='first')
Month               February January
CAMPAÑA ID_ESTACION                 
1       1                NaN   North
        2              South     NaN
2       1                NaN    East
        2               West     NaN
3       3                NaN   North

Let me know if this doesn't work for the actual data, and I'll update it.
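As a side note, here is a minimal sketch suggesting that the aggregation function, not the categorical conversion, is what lets the pivot go through. It reuses the toy data from above but leaves the direction column as plain strings; the variable names `df` and `wide` are my own.

```python
import pandas as pd

# Same toy data as above, with the direction column left as plain strings
df = pd.DataFrame({
    'modeWinddirectiondiscrete': ['North', 'South', 'East', 'West', 'North'],
    'CAMPAÑA': [1, 1, 2, 2, 3],
    'ID_ESTACION': [1, 2, 1, 2, 3],
    'Month': ['January', 'February', 'January', 'February', 'January'],
})

# aggfunc='first' keeps the first value found in each (index, column) cell,
# which is well defined for strings, unlike a mean
wide = df.pivot_table(
    index=['CAMPAÑA', 'ID_ESTACION'],
    columns='Month',
    values='modeWinddirectiondiscrete',
    aggfunc='first',
)
print(wide)
```

Converting to `pd.Categorical` can still be useful if you want a fixed category order, but it should not be required just to pivot.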

Update: ouroboros1 helpfully pointed out that the issue with your code is that pivot_table aggregates values using their mean by default (aggfunc='mean'), which doesn't make sense for categorical data, since the categories don't correspond to numbers that can be averaged.
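Since the question's table has several rows per station and month, 'first' may not be what you want there. Below is a rough sketch, using the table from the question's edit, that puts the most frequent direction in each cell instead. The helper `most_frequent` is a name I made up for illustration, and picking the alphabetically first value on ties is my assumption, not something the question specifies.

```python
import pandas as pd

# The example table from the question's edit
df = pd.DataFrame({
    'modeWinddirectiondiscrete': ['South', 'North', 'North', 'East', 'West',
                                  'East', 'South', 'West', 'North', 'West'],
    'CAMPAÑA': [16] * 6 + [17] * 4,
    'ID_ESTACION': [0] * 6 + [1] * 4,
    'Month': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
})

def most_frequent(values):
    # Series.mode() returns all modes in sorted order;
    # on a tie this takes the first one (an arbitrary choice)
    return values.mode().iat[0]

wide = df.pivot_table(
    index=['CAMPAÑA', 'ID_ESTACION'],
    columns='Month',
    values='modeWinddirectiondiscrete',
    aggfunc=most_frequent,
)
print(wide)
```

For this data, station (16, 0) should get North for month 1 and East for month 2, and station (17, 1) should get West for month 3, with NaN wherever a station has no rows for a month.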

Read here for more information on data types and the operations which can be performed on them.

huangapple
  • Posted on July 17, 2023 at 22:31:59
  • Link to this post: https://go.coder-hub.com/76705512.html