How to make a one-liner `pd.Series` with a `pd.CategoricalIndex` index

Question

I have working one-liner code that creates a pd.Series with a pd.CategoricalIndex index from a pd.DataFrame.

Since pd.DataFrame is built on top of pd.Series, I would like to generate the same series using pd.Series only. This is a question of optimization and of pandas API skills.

The pd.DataFrame code is listed below:

    import pandas as pd

    pd_series_1 = pd.DataFrame(
        data=[
            ("2018-01", 0.0),
            ("2019-02", 1200.0),
            ("2019-03", 600.0),
        ],
        columns=["TIME_PERIOD", "OBS_VALUE"],
    ).astype(
        {"TIME_PERIOD": "category"}
    ).set_index(
        "TIME_PERIOD"
    )["OBS_VALUE"]

    # The index keeps the column name and becomes a CategoricalIndex.
    assert pd_series_1.index.name == "TIME_PERIOD"
    assert repr(pd_series_1.index) == (
        "CategoricalIndex(['2018-01', '2019-02', '2019-03'], "
        "categories=['2018-01', '2019-02', '2019-03'], "
        "ordered=False, "
        "dtype='category', "
        "name='TIME_PERIOD')"
    ), repr(pd_series_1.index)
    # The series repr right-aligns the values, hence the padding.
    assert repr(pd_series_1) == (
        "TIME_PERIOD\n"
        "2018-01       0.0\n"
        "2019-02    1200.0\n"
        "2019-03     600.0\n"
        "Name: OBS_VALUE, dtype: float64"
    ), repr(pd_series_1)

As you can see, the final series pd_series_1 has a CategoricalIndex named 'TIME_PERIOD', and the series itself is named 'OBS_VALUE'.
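The two properties described above (a named CategoricalIndex and a named series) can also be checked structurally rather than by comparing repr strings, whose exact text may differ between pandas versions. A minimal sketch follows; the helper name `has_expected_labels` is hypothetical and is not part of pandas or of the original post:

```python
import pandas as pd


def has_expected_labels(series, index_name="TIME_PERIOD", series_name="OBS_VALUE"):
    """Return True if the series has a CategoricalIndex with the given
    index name and the series itself carries the given name."""
    return (
        isinstance(series.index, pd.CategoricalIndex)
        and series.index.name == index_name
        and series.name == series_name
    )


# Same construction as pd_series_1 in the question.
sample = pd.DataFrame(
    data=[("2018-01", 0.0), ("2019-02", 1200.0), ("2019-03", 600.0)],
    columns=["TIME_PERIOD", "OBS_VALUE"],
).astype({"TIME_PERIOD": "category"}).set_index("TIME_PERIOD")["OBS_VALUE"]

print(has_expected_labels(sample))
```

Any candidate one-liner can be passed to the same helper to confirm that it produces the desired index type and names.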

I would like the same result using only the pd.Series API, either within the constructor or with additional chained methods, like .set_index in pd_series_1.

The pd.Series code I used is listed below:

    pd_series_2 = pd.Series(dict(
        [
            ("2018-01", 0.0),
            ("2019-02", 1200.0),
            ("2019-03", 600.0),
        ]),
        name='OBS_VALUE',
    )
    print(pd_series_2)
    # 2018-01       0.0
    # 2019-02    1200.0
    # 2019-03     600.0
    # Name: OBS_VALUE, dtype: float64

    pd_series_2.index = pd.CategoricalIndex(pd_series_2.index, name='TIME_PERIOD')
    print(pd_series_2)
    # TIME_PERIOD
    # 2018-01       0.0
    # 2019-02    1200.0
    # 2019-03     600.0
    # Name: OBS_VALUE, dtype: float64

As you can see, I managed to get the result, but the code is not a one-liner.
Please suggest a one-liner syntax.

Thank you in advance.

Answer 1

Score: 2

Use Series.pipe with Series.set_axis:

    pd_series_2 = pd.Series(dict(
        [
            ("2018-01", 0.0),
            ("2019-02", 1200.0),
            ("2019-03", 600.0),
        ]),
        name='OBS_VALUE',
    ).pipe(lambda x: x.set_axis(pd.CategoricalIndex(x.index, name='TIME_PERIOD')))

    print(pd_series_2.index)
    # CategoricalIndex(['2018-01', '2019-02', '2019-03'],
    #                  categories=['2018-01', '2019-02', '2019-03'],
    #                  ordered=False,
    #                  dtype='category',
    #                  name='TIME_PERIOD')
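As a possible alternative that stays inside the constructor, the index can be passed directly through the `index=` argument of `pd.Series`. This is a sketch and not part of the original answer; it assumes the input is the same list of `(period, value)` tuples, and the variable names `data`, `via_pipe` and `via_constructor` are illustrative only. It also compares the result with the `pipe`/`set_axis` approach using `pd.testing.assert_series_equal`:

```python
import pandas as pd

data = [("2018-01", 0.0), ("2019-02", 1200.0), ("2019-03", 600.0)]

# Approach from the answer: build the series, then swap the index in a pipe.
via_pipe = pd.Series(dict(data), name="OBS_VALUE").pipe(
    lambda x: x.set_axis(pd.CategoricalIndex(x.index, name="TIME_PERIOD"))
)

# Alternative (an assumption, not from the answer): split the tuples into
# labels and values, and hand a ready-made CategoricalIndex to the constructor.
periods, values = zip(*data)
via_constructor = pd.Series(
    list(values),
    index=pd.CategoricalIndex(list(periods), name="TIME_PERIOD"),
    name="OBS_VALUE",
)

# Raises AssertionError if values, index, or names differ.
pd.testing.assert_series_equal(via_pipe, via_constructor)
```

The constructor form avoids the lambda, at the cost of separating labels from values beforehand; which one reads better is a matter of taste.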

huangapple
  • Posted on 2023-06-12 14:37:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76454108.html