2023年2月19日 02:55:18go评论85阅读模式

英文:

Reference dataframes using indices stored in another dataframe

问题

我试图引用来自另一个数据框中存储的索引的源数据框中的数据。

例如，假设我们有一个名为“shifts”的数据框，其中包含每个日期值班的人员的姓名（某些值可能为NaN）：

          a    b    c

2023-01-01 Sam Max NaN
2023-01-02 Mia NaN Max
2023-01-03 NaN Sam Mia


然后我们有一个名为“performance”的数据框，其中包含每个日期每个员工的表现。行索引与“shifts”数据框相同，但列名不同：

        Sam  Mia  Max  Ian

2023-01-01 4.5 NaN 3.0 NaN
2023-01-02 NaN 2.0 3.0 NaN
2023-01-03 4.0 3.0 NaN 4.0


最后我们有一个名为“salary”的数据框，其结构和索引与其他两个数据框不同：

员工工资
0 Sam 100
1 Mia 90
2 Max 80
3 Ian 70


我需要创建两个输出数据框，其结构和索引与“shifts”相同。
在第一个输出数据框中，我需要用该日期的员工表现来替换员工姓名。
在第二个输出数据框中，员工姓名将被其薪水取代。以下是预期的输出：

输出 1:
a b c
2023-01-01 4.5 3.0 NaN
2023-01-02 2.0 NaN 3.0
2023-01-03 NaN 4.0 3.0

输出 2:
a b c
2023-01-01 100.0 80.0 NaN
2023-01-02 90.0 NaN 80.0
2023-01-03 NaN 100.0 90.0


如何实现？谢谢

英文:

I'm trying to reference data from source dataframes, using indices stored in another dataframe.

For example, let's say we have a "shifts" dataframe with the names of the people on duty on each date (some values can be NaN):

              a    b    c
2023-01-01  Sam  Max  NaN
2023-01-02  Mia  NaN  Max
2023-01-03  NaN  Sam  Mia

Then we have a "performance" dataframe, with the performance of each employee on each date. Row indices are the same as the shifts dataframe, but column names are different:

            Sam  Mia  Max  Ian
2023-01-01  4.5  NaN  3.0  NaN
2023-01-02  NaN  2.0  3.0  NaN
2023-01-03  4.0  3.0  NaN  4.0

and finally we have a "salary" dataframe, whose structure and indices are different from the other two dataframes:

  Employee  Salary
0      Sam     100
1      Mia      90
2      Max      80
3      Ian      70

I need to create two output dataframes, with same structure and indices as "shifts".
In the first one, I need to substitute the employee name with his/her performance on that date.
In the second output dataframe, the employee name is replaced with his/her salary. Theses are the expected outputs:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

Any idea of how to do it? Thanks

答案1

得分: 4

对于第一个部分的代码，输出如下：

              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

对于第二个部分的代码，输出如下：

                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

英文:

For the first one:

(shifts
 .reset_index().melt(&#39;index&#39;)
 .merge(performance.stack().rename(&#39;p&#39;),
        left_on=[&#39;index&#39;, &#39;value&#39;], right_index=True)
 .pivot(index=&#39;index&#39;, columns=&#39;variable&#39;, values=&#39;p&#39;)
 .reindex_like(shifts)
)

Output:

              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

For the second:

shifts.replace(salary.set_index(&#39;Employee&#39;)[&#39;Salary&#39;])

Output:

                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

答案2

得分: 1

这是你的代码的翻译部分：

out1 = ( shifts.stack()
    .rename_axis(index=('date','shift'))
    .reset_index().rename(columns={0:'employee'})
    .pipe(lambda df: df.assign(perf=
        df.apply(lambda row: perf.loc[row.date, row.employee], axis=1)))
    .pivot(index='date', columns='shift', values='perf')
    .rename_axis(index=None, columns=None) )

out2 = shifts.replace(salary.set_index('Employee').Salary)

输出部分：

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

英文:

Here's a way to do what your question asks:

out1 = ( shifts.stack()
    .rename_axis(index=(&#39;date&#39;,&#39;shift&#39;))
    .reset_index().rename(columns={0:&#39;employee&#39;})
    .pipe(lambda df: df.assign(perf=
        df.apply(lambda row: perf.loc[row.date, row.employee], axis=1)))
    .pivot(index=&#39;date&#39;, columns=&#39;shift&#39;, values=&#39;perf&#39;)
    .rename_axis(index=None, columns=None) )

out2 = shifts.replace(salary.set_index(&#39;Employee&#39;).Salary)

Output:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用存储在另一个数据框中的索引引用数据框。

问题

答案1

答案2

打印使用Python的信息？

如何从一个 TensorFlow 张量的概率分布中抽样 5 个索引及其对应的概率？

自动根据GeoJSON属性使用geopandas设置轮廓和填充颜色。

C++与Python在特定问题上的比较

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论