Why can't I pivot_table a non-numeric column?
Question
I'm trying this code:
df_aggregated[['modeWinddirectiondiscrete', 'CAMPAÑA', 'ID_ESTACION', 'Month']].pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month')
where the column 'modeWinddirectiondiscrete' has values like "North", "West", "South", "East", ...
The result contains only the index. I tried it with other, numeric columns and it works well, so the problem seems to be that this column holds strings. Is it possible to get a correct result with the strings, or do I need to convert the directions to numbers?
EDIT:
For example, I have this table:
mode  modeWinddirectiondiscrete  CAMPAÑA  ID_ESTACION  Month
0     South                      16       0            1
1     North                      16       0            1
2     North                      16       0            1
3     East                       16       0            2
4     West                       16       0            2
5     East                       16       0            2
6     South                      17       1            3
7     West                       17       1            3
8     North                      17       1            3
9     West                       17       1            3
And I want something like this:
                     DewpointLocalDayAvg
Month                1      10     11     12     2      3      4      5      6      7      8      9
CAMPAÑA ID_ESTACION
16      0            North  ...    ...    ...    East   ...    ...    ...    ...    ...    ...    ...
17      1            ...    ...    ...    ...    ...    West   ...    ...    ...    ...    ...    ...
The result should look something like that; I hope the example is clear.
Answer 1
Score: 1
What about this?
import pandas as pd
# toy dataset
data = {
    'modeWinddirectiondiscrete': ['North', 'South', 'East', 'West', 'North'],
    'CAMPAÑA': [1, 1, 2, 2, 3],
    'ID_ESTACION': [1, 2, 1, 2, 3],
    'Month': ['January', 'February', 'January', 'February', 'January']
}
df_aggregated = pd.DataFrame(data)
# turn modeWinddirectiondiscrete into categorical data (equivalent of factors in R)
df_aggregated['modeWinddirectiondiscrete'] = pd.Categorical(df_aggregated['modeWinddirectiondiscrete'], categories=['North', 'South', 'East', 'West'])
df_aggregated.pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month', values='modeWinddirectiondiscrete', aggfunc='first')
Month February January
CAMPAÑA ID_ESTACION
1 1 NaN North
2 South NaN
2 1 NaN East
2 West NaN
3 3 NaN North
Let me know if this doesn't work for the actual data, and I'll update it.
Update: ouroboros1 helpfully pointed out that the issue with your code is that pivot_table defaults to aggregating values using their mean, which doesn't make any sense for categorical data, as the categories don't correspond to numbers which can be averaged.
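A minimal sketch of that point, using a few rows shaped like the question's data (plain strings, no Categorical conversion): with the default aggfunc the direction column cannot be averaged, while an aggfunc that works on strings, such as 'first', keeps it. As far as I know, pandas 2.x raises a TypeError in the default case, whereas older versions silently dropped the non-numeric column, which would match the index-only result in the question; the try/except below covers both.

```python
import pandas as pd

# A few rows shaped like the question's data (directions stay plain strings)
df = pd.DataFrame({
    "modeWinddirectiondiscrete": ["South", "North", "East", "West"],
    "CAMPAÑA": [16, 16, 16, 17],
    "ID_ESTACION": [0, 0, 0, 1],
    "Month": [1, 1, 2, 3],
})

# Default aggfunc is "mean", which has no meaning for strings.
try:
    default = df.pivot_table(index=["CAMPAÑA", "ID_ESTACION"], columns="Month")
    # Older pandas: the string column is dropped, leaving (almost) nothing
    print("string column dropped, remaining columns:", list(default.columns))
except TypeError as exc:
    # pandas 2.x: averaging strings fails outright
    print("mean over strings failed:", exc)

# An aggfunc that works on strings keeps the directions.
by_first = df.pivot_table(
    index=["CAMPAÑA", "ID_ESTACION"],
    columns="Month",
    values="modeWinddirectiondiscrete",
    aggfunc="first",
)
print(by_first)
```

Note that 'first' simply takes the first row of each group, so it works without converting the column to Categorical.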
Read here for more information on data types and the operations which can be performed on them.
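For the output asked for in the question (the most common direction per station and month, e.g. North for station 16/0 in month 1), 'first' would pick South instead. A sketch using a custom aggfunc that returns the most frequent value; the helper name most_frequent is hypothetical, and ties are broken by taking the first entry of Series.mode(), which returns the tied values sorted.

```python
import pandas as pd

# The example table from the question
df = pd.DataFrame({
    "modeWinddirectiondiscrete": ["South", "North", "North", "East", "West",
                                  "East", "South", "West", "North", "West"],
    "CAMPAÑA": [16, 16, 16, 16, 16, 16, 17, 17, 17, 17],
    "ID_ESTACION": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "Month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
})


def most_frequent(directions: pd.Series) -> str:
    # Hypothetical helper: Series.mode() returns all tied values, sorted;
    # take the first one so each group yields a single direction
    return directions.mode().iloc[0]


wide = df.pivot_table(
    index=["CAMPAÑA", "ID_ESTACION"],
    columns="Month",
    values="modeWinddirectiondiscrete",
    aggfunc=most_frequent,
)
print(wide)
```

Months with no rows for a station come out as NaN, which corresponds to the empty cells in the desired layout.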