How to make a one-liner `pd.Series` with a `pd.CategoricalIndex` index

Question

I have a working one-liner that builds a pd.Series with a pd.CategoricalIndex index from a pd.DataFrame.

Since pd.DataFrame is built on top of pd.Series, I would like to generate the same series using pd.Series only. This is a question of optimization and of pandas API skills.

The pd.DataFrame code is listed below:

import pandas as pd


pd_series_1 = pd.DataFrame(
    data=[
        ("2018-01", 0.0),
        ("2019-02", 1200.0),
        ("2019-03", 600.0),
    ],
    columns=['TIME_PERIOD', "OBS_VALUE"],
).astype(
    {"TIME_PERIOD": "category"}
).set_index(
    "TIME_PERIOD"
)["OBS_VALUE"]

assert pd_series_1.index.name == "TIME_PERIOD"
assert repr(pd_series_1.index) == "CategoricalIndex(['2018-01', '2019-02', '2019-03'], " \
                                "categories=['2018-01', '2019-02', '2019-03'], " \
                                "ordered=False, " \
                                "dtype='category', " \
                                "name='TIME_PERIOD')", repr(pd_series_1.index)

assert repr(pd_series_1) == "TIME_PERIOD\n" \
                          "2018-01       0.0\n" \
                          "2019-02    1200.0\n" \
                          "2019-03     600.0\n" \
                          "Name: OBS_VALUE, dtype: float64", repr(pd_series_1)

As you can see, the final series pd_series_1 has a CategoricalIndex named 'TIME_PERIOD' and the series name 'OBS_VALUE'.

I want the same result using only the pd.Series API, either within the constructor or with additional chained methods, like .set_index in pd_series_1.

The code I used for pd.Series is listed below:

pd_series_2 = pd.Series(dict(
    [
        ("2018-01", 0.0),
        ("2019-02", 1200.0),
        ("2019-03", 600.0),
    ]),
    name='OBS_VALUE',
)

print(pd_series_2)
# 2018-01       0.0
# 2019-02    1200.0
# 2019-03     600.0
# Name: OBS_VALUE, dtype: float64

pd_series_2.index = pd.CategoricalIndex(pd_series_2.index, name='TIME_PERIOD')
print(pd_series_2)
# TIME_PERIOD
# 2018-01       0.0
# 2019-02    1200.0
# 2019-03     600.0
# Name: OBS_VALUE, dtype: float64

As you can see, I managed to get the result, but the code is not a one-liner.
Please suggest a one-liner syntax.

Thank you in advance.


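Answer 1

Score: 2

Use Series.pipe with Series.set_axis. pipe passes the newly built Series to the lambda, so its index can be wrapped in a pd.CategoricalIndex without an intermediate variable: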

pd_series_2 = pd.Series(dict(
    [
        ("2018-01", 0.0),
        ("2019-02", 1200.0),
        ("2019-03", 600.0),
    ]),
    name='OBS_VALUE',
).pipe(lambda x: x.set_axis(pd.CategoricalIndex(x.index, name='TIME_PERIOD')))

print(pd_series_2.index)
# CategoricalIndex(['2018-01', '2019-02', '2019-03'],
#                  categories=['2018-01', '2019-02', '2019-03'],
#                  ordered=False,
#                  dtype='category',
#                  name='TIME_PERIOD')
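
As a possible alternative, assuming the labels and values can be handed over as separate sequences, the pd.CategoricalIndex can be passed directly to the pd.Series constructor through its index argument, which avoids the lambda. The sketch below is only illustrative: the names observations, periods, values, pd_series_3 and reference are made up here and do not come from the original post. It also compares the result with the DataFrame-based series from the question.

```python
import pandas as pd

# Same observations as in the question, kept as (period, value) pairs.
observations = [
    ("2018-01", 0.0),
    ("2019-02", 1200.0),
    ("2019-03", 600.0),
]

# Split the pairs into index labels and data values (illustrative helper step).
periods, values = zip(*observations)

# Build the Series in a single expression: the categorical index and both
# names are supplied to the constructor.
pd_series_3 = pd.Series(
    list(values),
    index=pd.CategoricalIndex(list(periods), name="TIME_PERIOD"),
    name="OBS_VALUE",
)

# Reference series built through pd.DataFrame, as in the question.
reference = (
    pd.DataFrame(observations, columns=["TIME_PERIOD", "OBS_VALUE"])
    .astype({"TIME_PERIOD": "category"})
    .set_index("TIME_PERIOD")["OBS_VALUE"]
)

# Raises AssertionError if values, index, dtypes or names differ.
pd.testing.assert_series_equal(pd_series_3, reference)
```

Whether this reads better than the pipe version is a matter of taste: it needs the labels and values separately, while the pipe approach works on a Series that has already been built from a dict.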

huangapple
  • Posted on June 12, 2023 at 14:37:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76454108.html