使用存储在另一个数据框中的索引引用数据框。

huangapple go评论74阅读模式
英文:

Reference dataframes using indices stored in another dataframe

问题

我试图引用来自另一个数据框中存储的索引的源数据框中的数据。

例如,假设我们有一个名为“shifts”的数据框,其中包含每个日期值班的人员的姓名(某些值可能为NaN):

          a    b    c

2023-01-01 Sam Max NaN
2023-01-02 Mia NaN Max
2023-01-03 NaN Sam Mia


然后我们有一个名为“performance”的数据框,其中包含每个日期每个员工的表现。行索引与“shifts”数据框相同,但列名不同:

        Sam  Mia  Max  Ian

2023-01-01 4.5 NaN 3.0 NaN
2023-01-02 NaN 2.0 3.0 NaN
2023-01-03 4.0 3.0 NaN 4.0


最后我们有一个名为“salary”的数据框,其结构和索引与其他两个数据框不同:

员工 工资
0 Sam 100
1 Mia 90
2 Max 80
3 Ian 70


我需要创建两个输出数据框,其结构和索引与“shifts”相同。
在第一个输出数据框中,我需要用该日期的员工表现来替换员工姓名。
在第二个输出数据框中,员工姓名将被其薪水取代。以下是预期的输出:

输出 1:
a b c
2023-01-01 4.5 3.0 NaN
2023-01-02 2.0 NaN 3.0
2023-01-03 NaN 4.0 3.0

输出 2:
a b c
2023-01-01 100.0 80.0 NaN
2023-01-02 90.0 NaN 80.0
2023-01-03 NaN 100.0 90.0


如何实现?谢谢
英文:

I'm trying to reference data from source dataframes, using indices stored in another dataframe.

For example, let's say we have a "shifts" dataframe with the names of the people on duty on each date (some values can be NaN):

              a    b    c
2023-01-01  Sam  Max  NaN
2023-01-02  Mia  NaN  Max
2023-01-03  NaN  Sam  Mia

Then we have a "performance" dataframe, with the performance of each employee on each date. Row indices are the same as the shifts dataframe, but column names are different:

            Sam  Mia  Max  Ian
2023-01-01  4.5  NaN  3.0  NaN
2023-01-02  NaN  2.0  3.0  NaN
2023-01-03  4.0  3.0  NaN  4.0

and finally we have a "salary" dataframe, whose structure and indices are different from the other two dataframes:

  Employee  Salary
0      Sam     100
1      Mia      90
2      Max      80
3      Ian      70

I need to create two output dataframes, with same structure and indices as "shifts".
In the first one, I need to substitute the employee name with his/her performance on that date.
In the second output dataframe, the employee name is replaced with his/her salary. Theses are the expected outputs:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

Any idea of how to do it? Thanks

答案1

得分: 4

对于第一个部分的代码,输出如下:

              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

对于第二个部分的代码,输出如下:

                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
英文:

For the first one:

(shifts
 .reset_index().melt('index')
 .merge(performance.stack().rename('p'),
        left_on=['index', 'value'], right_index=True)
 .pivot(index='index', columns='variable', values='p')
 .reindex_like(shifts)
)

Output:

              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

For the second:

shifts.replace(salary.set_index('Employee')['Salary'])

Output:

                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

答案2

得分: 1

这是你的代码的翻译部分:

out1 = ( shifts.stack()
    .rename_axis(index=('date','shift'))
    .reset_index().rename(columns={0:'employee'})
    .pipe(lambda df: df.assign(perf=
        df.apply(lambda row: perf.loc[row.date, row.employee], axis=1)))
    .pivot(index='date', columns='shift', values='perf')
    .rename_axis(index=None, columns=None) )

out2 = shifts.replace(salary.set_index('Employee').Salary)

输出部分:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0
英文:

Here's a way to do what your question asks:

out1 = ( shifts.stack()
    .rename_axis(index=('date','shift'))
    .reset_index().rename(columns={0:'employee'})
    .pipe(lambda df: df.assign(perf=
        df.apply(lambda row: perf.loc[row.date, row.employee], axis=1)))
    .pivot(index='date', columns='shift', values='perf')
    .rename_axis(index=None, columns=None) )

out2 = shifts.replace(salary.set_index('Employee').Salary)

Output:

Output 1:
              a    b    c
2023-01-01  4.5  3.0  NaN
2023-01-02  2.0  NaN  3.0
2023-01-03  NaN  4.0  3.0

Output 2:
                a      b     c
2023-01-01  100.0   80.0   NaN
2023-01-02   90.0    NaN  80.0
2023-01-03    NaN  100.0  90.0

huangapple
  • 本文由 发表于 2023年2月19日 02:55:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/75495667.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定