January 9, 2023 15:16:48

Lookup value by index and name in Pandas

Question

I have a pandas dataframe with a flattened hierarchy:

Level 1 ID  Level 2 ID  Level 3 ID  Level 4 ID  Name           Path
1           null        null        null        Finance        Finance
1           4           null        null        Reporting      Finance > Reporting
1           4           5           null        Tax Reporting  Finance > Reporting > Tax Reporting

What I want to do is add 4 Level Name columns (or replace the Level ID columns with them), filled in based on the Level ID columns, like the following:

Level 1 Name  Level 2 Name  Level 3 Name   Level 4 Name  Name           Path
Finance       null          null           null          Finance        Finance
Finance       Reporting     null           null          Reporting      Finance > Reporting
Finance       Reporting     Tax Reporting  null          Tax Reporting  Finance > Reporting > Tax Reporting

I would split the Path column on its separator, but in the real dataframe the path contains IDs instead of names (formatted like "1 > 4 > 5").

How should I approach this?

Output of df.info() is the following:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135 entries, 0 to 134
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   name        135 non-null    object
 1   depid       135 non-null    object
 2   depcode     135 non-null    object
 3   parentpath  135 non-null    object
 4   DEP_LV1_ID  135 non-null    object
 5   DEP_LV2_ID  135 non-null    object
 6   DEP_LV3_ID  98 non-null     object
 7   DEP_LV4_ID  56 non-null     object
dtypes: object(8)
memory usage: 8.6+ KB

Answer 1

Score: 1

The logic is unclear; in particular, what is the source of the final values? Two different options are shown below.

Assuming the source is `df['Name']`:

# Select the "Level N ID" columns
cols = df.filter(like='Level ').columns
names = df['Name'].values
# True where a level ID is present
mask = df[cols[:len(names)]].notna()
# Column j receives names[j]; this relies on row j holding the name of level j+1, as in the sample
df[cols[:len(names)]] = mask.mul(names, axis=1).where(mask)

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance        NaN            NaN         NaN        Finance                              Finance
1    Finance  Reporting            NaN         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting

If you would rather extract the names from "Path":

cols = df.filter(like='Level ').columns
# Split the path into one column per level
names = df['Path'].str.split(' > ', expand=True)
# Overwrite as many level columns as the deepest path has parts
df.loc[:, cols[:names.shape[1]]] = names.to_numpy()

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance       None           None         NaN        Finance                              Finance
1    Finance  Reporting           None         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting
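
The question notes that in the real dataframe the path holds IDs rather than names (formatted like "1 > 4 > 5"), so splitting alone is not enough there. The following is only a sketch of one way to handle that case, not part of the original answer. It assumes that the last ID in each path identifies the row itself and that IDs are unique; the sample data, `own_id`, `id_to_name`, and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: "Path" holds IDs, as the question describes ("1 > 4 > 5").
df = pd.DataFrame({
    "Name": ["Finance", "Reporting", "Tax Reporting"],
    "Path": ["1", "1 > 4", "1 > 4 > 5"],
})

# Split the path into one column per level; shorter paths get missing values.
ids = df["Path"].str.split(" > ", expand=True)

# Assumption: the last ID in each path is the ID of the row itself.
own_id = df["Path"].str.split(" > ").str[-1]
id_to_name = dict(zip(own_id, df["Name"]))

# Translate every ID to its name and label the columns "Level N Name".
names = ids.apply(lambda col: col.map(id_to_name))
names.columns = [f"Level {i + 1} Name" for i in range(names.shape[1])]

result = pd.concat([names, df], axis=1)
print(result)
```

Only as many "Level N Name" columns are created as the deepest path has parts, so a fixed fourth level would have to be added separately if it is required.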

Reproducible input:

import pandas as pd
from numpy import nan
df = pd.DataFrame({'Level 1 ID': [1, 1, 1],
                   'Level 2 ID': [nan, 4.0, 4.0],
                   'Level 3 ID': [nan, nan, 5.0],
                   'Level 4 ID': [nan, nan, nan],
                   'Name': ['Finance', 'Reporting', 'Tax Reporting'],
                   'Path': ['Finance', 'Finance > Reporting', 'Finance > Reporting > Tax Reporting']}
                 )

Answer 2

Score: 1

You can create a mapping Series that resolves each ID to its name:

import pandas as pd

url = 'https://drive.google.com/uc?id=1-2YXvyb8QEtHrrAO0UCH6vSJ5ww6CCjK&export=download'
df = pd.read_excel(url, index_col=0)
# Select the DEP_LV1_ID ... DEP_LV4_ID columns
cols = df.columns[df.columns.str.contains(r'DEP_LV\d_ID')]
# The deepest non-null level ID in each row is the row's own ID
idx = df[cols].ffill(axis=1).iloc[:, -1].tolist()
# Lookup table: ID -> name
sr = pd.Series(df['name'].tolist(), index=idx)
# Replace every ID with the corresponding name
df[cols] = df[cols].apply(lambda x: x.map(sr))

Output:

>>> df
                                           name  depid depcode         parentpath                 DEP_LV1_ID                                  DEP_LV2_ID                            DEP_LV3_ID     DEP_LV4_ID
0                         Дотоод аудитын хэлтэс    152   61100              |152|      Дотоод аудитын хэлтэс                                         NaN                                   NaN            NaN
1                       Санхүү бүртгэлийн газар    214   31000              |214|    Санхүү бүртгэлийн газар                                         NaN                                   NaN            NaN
2    Хүний нөөцийн бодлого, төлөвлөлтийн хэлтэс    211   32100          |209|211|        Хүний нөөцийн газар  Хүний нөөцийн бодлого, төлөвлөлтийн хэлтэс                                   NaN            NaN
3                      Санхүү бүртгэлийн хэлтэс    215   31100          |214|215|    Санхүү бүртгэлийн газар                    Санхүү бүртгэлийн хэлтэс                                   NaN            NaN
4                           Хүний нөөцийн газар    209   32000              |209|        Хүний нөөцийн газар                                         NaN                                   NaN            NaN
..                                          ...    ...     ...                ...                        ...                                         ...                                   ...            ...
130                               Оёх нэгж (C1)    816   20512  |511|522|811|816|   Үйлдвэр удирдлагын газар                          Сүлжмэлийн үйлдвэр                   Сүлжмэлийн 1-р алба  Оёх нэгж (C1)
131                            Галлериа УБ нэгж    867   11209      |857|859|867|  Дотоод борлуулалтын газар                  Дотоод борлуулалтын хэлтэс                      Галлериа УБ нэгж            NaN
132        Хими цэвэрлэгээ, нөхөн засварын алба    870   11230      |857|859|870|  Дотоод борлуулалтын газар                  Дотоод борлуулалтын хэлтэс  Хими цэвэрлэгээ, нөхөн засварын алба            NaN
133                                 Дархан нэгж    868   11205      |857|859|868|  Дотоод борлуулалтын газар                  Дотоод борлуулалтын хэлтэс                           Дархан нэгж            NaN
134                            Төв дэлгүүр нэгж    869   11201      |857|859|869|  Дотоод борлуулалтын газар                  Дотоод борлуулалтын хэлтэс                      Төв дэлгүүр нэгж            NaN
[135 rows x 8 columns]
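
The same mapping idea can be tried without downloading the spreadsheet. The following is a minimal sketch on the small sample from the question, not the answerer's own code. It assumes that the deepest non-null level ID in each row is that row's own ID and that IDs are unique (a duplicated ID would make the lookup ambiguous). The names `id_cols`, `ids`, `own_id`, and `id_to_name` are illustrative, and the names are written to new "Level N Name" columns instead of overwriting the ID columns.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question (IDs 1, 4, 5 and their names).
df = pd.DataFrame({
    "Level 1 ID": [1, 1, 1],
    "Level 2 ID": [np.nan, 4, 4],
    "Level 3 ID": [np.nan, np.nan, 5],
    "Level 4 ID": [np.nan, np.nan, np.nan],
    "Name": ["Finance", "Reporting", "Tax Reporting"],
})

id_cols = [c for c in df.columns if c.startswith("Level ") and c.endswith(" ID")]

# Cast to float so integer and float IDs compare equal during the lookup.
ids = df[id_cols].astype(float)

# Assumption: the deepest non-null level ID in a row is that row's own ID.
own_id = ids.ffill(axis=1).iloc[:, -1]

# Lookup table: ID -> Name.
id_to_name = pd.Series(df["Name"].to_numpy(), index=own_id.to_numpy())

# Add one "Level N Name" column per ID column; missing IDs stay missing.
for col in id_cols:
    df[col.replace(" ID", " Name")] = ids[col].map(id_to_name)

print(df.filter(like=" Name"))
```

Keeping the ID columns alongside the new name columns makes it easy to check the mapping; they can be dropped afterwards with `df.drop(columns=id_cols)` if only the names are wanted.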
