June 22, 2023 20:13:52

Capture column name as value in pandas

# Question

<details>
<summary>English:</summary>

I want to capture the subject name (the column name) as a value in a new column wherever a student's marks improved after re-evaluation.

This is the dataset before re-evaluation:

|   name  |    class   |   exam  | maths | physics | chemistry |
|---------|------------|---------|-------|---------|-----------|
|  John   | Grade - 10 | model 1 |  98   |   78    |    75     |
|  Bob    | Grade - 06 | mid term|  65   |   72    |    92     |
|  Rose   | Grade - 06 | model 2 |  91   |   70    |    54     |
| Michael | Grade - 07 | model 1 |  72   |   90    |    45     |

After re-evaluation, some students' marks have improved, and there are new rows for students who took their exams recently:

|   name  |    class   |   exam  |   maths    |   physics  |   chemistry  |
|---------|------------|---------|------------|------------|--------------|
|  John   | Grade - 10 | model 1 |    98      |     78     |    **87**    |
|  Bob    | Grade - 06 | mid term|    65      |   **91**   |      92      |
|  Rose   | Grade - 06 | model 2 |    91      |     70     |      54      |
| Michael | Grade - 07 | model 1 |  **100**   |     90     |      45      |
|  Sam    | Grade - 08 | mid term|    43      |     62     |      80      |
|  James  | Grade - 10 | model ` |    76      |     66     |      96      |
| Henry   | Grade - 09 | model 1 |    34      |     91     |      70      |

We need to concatenate these two datasets and mark which rows were updated and which columns changed. The concatenated dataset looks like this:

|   name  |    class   |   exam  |   maths    |   physics  |   chemistry  |
|---------|------------|---------|------------|------------|--------------|
|  John   | Grade - 10 | model 1 |    98      |     78     |      75      |
|  Bob    | Grade - 06 | mid term|    65      |     72     |      92      |
|  Rose   | Grade - 06 | model 2 |    91      |     70     |      54      |
| Michael | Grade - 07 | model 1 |    72      |     90     |      45      |
|  John   | Grade - 10 | model 1 |    98      |     78     |    **87**    |
|  Bob    | Grade - 06 | mid term|    65      |   **91**   |      92      |
|  Rose   | Grade - 06 | model 2 |    91      |     70     |      54      |
| Michael | Grade - 07 | model 1 |  **100**   |     90     |      45      |
|  Sam    | Grade - 08 | mid term|    43      |     62     |      80      |
|  James  | Grade - 10 | model ` |    76      |     66     |      96      |
| Henry   | Grade - 09 | model 1 |    34      |     91     |      70      |

The final output should look like this, with two new columns. I was able to eliminate the duplicates and add the new column **any improvement**, but I got stuck on adding the other new column, **improved subject**:

|   name  |    class   |   exam  | maths  |physics|chemistry|any improvement|improved subject|
|---------|------------|---------|--------|-------|---------|---------------|----------------|
|  John   | Grade - 10 | model 1 |  98    |  78   | **87**  |      Yes      |    chemistry   |
|  Bob    | Grade - 06 | mid term|  65    |**91** |   92    |      Yes      |    physics     |
|  Rose   | Grade - 06 | model 2 |  91    |  70   |   54    |      No       | no improvement |
| Michael | Grade - 07 | model 1 |**100** |  90   |   45    |      Yes      |     maths      |
|  Sam    | Grade - 08 | mid term|  43    |  62   |   80    |   New Entry   |    new entry   |
|  James  | Grade - 10 | model ` |  76    |  66   |   96    |   New Entry   |    new entry   |
| Henry   | Grade - 09 | model 1 |  34    |  91   |   70    |   New Entry   |    new entry   |

Below is the code I used for this.

I added a primary key column by concatenating name, class and exam, and a secondary key column by concatenating maths, physics and chemistry.

    dupedf = concatdf.loc[concatdf.duplicated(subset=['PrimaryKey', 'SecondaryKey'], keep=False)]

    dupedf1 = concatdf.loc[concatdf.duplicated(subset=['PrimaryKey'], keep=False)]

    for i, j in dupedf.iterrows():
        for k, l in dupedf1.iterrows():
            if l['PrimaryKey'] == j['PrimaryKey']:
                dupedf = dupedf.drop_duplicates(subset=['PrimaryKey', 'SecondaryKey'], keep='last')
                dupedf['any improvement'] = 'No'
                # dupedf['improved subject'] = ' '
            else:
                dupedf1 = dupedf1.drop_duplicates(subset=['SecondaryKey'], keep=False)

                dupedf1 = dupedf1.drop_duplicates(subset=['PrimaryKey'], keep='last')
                dupedf1['any improvement'] = 'Yes'
                # dupedf1['improved subject'] = 'column name'

In the above code, I iterate only over the rows that exist in both the before and after re-evaluation datasets, going row by row to fill the two new columns, **any improvement** and **improved subject**. **I was able to do this for the any improvement column, but I need help with the improved subject column.**

</details>
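
The question describes the `PrimaryKey` and `SecondaryKey` columns but does not show how they were built. The following is a minimal sketch of one way to do it, using the sample tables from the question. The names `before`, `after`, `id_cols` and `mark_cols`, and the `"|"` separator, are assumptions made for illustration; only `concatdf`, `PrimaryKey` and `SecondaryKey` come from the question.

```python
import pandas as pd

# Sample data copied from the question's tables.
cols = ["name", "class", "exam", "maths", "physics", "chemistry"]
before = pd.DataFrame([
    ["John", "Grade - 10", "model 1", 98, 78, 75],
    ["Bob", "Grade - 06", "mid term", 65, 72, 92],
    ["Rose", "Grade - 06", "model 2", 91, 70, 54],
    ["Michael", "Grade - 07", "model 1", 72, 90, 45],
], columns=cols)
after = pd.DataFrame([
    ["John", "Grade - 10", "model 1", 98, 78, 87],
    ["Bob", "Grade - 06", "mid term", 65, 91, 92],
    ["Rose", "Grade - 06", "model 2", 91, 70, 54],
    ["Michael", "Grade - 07", "model 1", 100, 90, 45],
    ["Sam", "Grade - 08", "mid term", 43, 62, 80],
    ["James", "Grade - 10", "model `", 76, 66, 96],
    ["Henry", "Grade - 09", "model 1", 34, 91, 70],
], columns=cols)

id_cols = ["name", "class", "exam"]            # identify a student/exam pair
mark_cols = ["maths", "physics", "chemistry"]  # the marks being compared

# Stack the "before" rows on top of the "after" rows.
concatdf = pd.concat([before, after], ignore_index=True)

# Join the column values row by row into one string key each.
# The "|" separator is an assumption; any string absent from the data works.
concatdf["PrimaryKey"] = concatdf[id_cols].astype(str).agg("|".join, axis=1)
concatdf["SecondaryKey"] = concatdf[mark_cols].astype(str).agg("|".join, axis=1)

print(concatdf[["PrimaryKey", "SecondaryKey"]])
```

With these keys, rows that share a `PrimaryKey` but differ in `SecondaryKey` are the ones whose marks changed.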
# Answer 1
**Score**: 1

A possible solution:

```python
import pandas as pd

# df1: marks before re-evaluation, df2: marks after re-evaluation
pkeys = ["name", "class", "exam"]
# True where the mark after re-evaluation is higher than the one before
delta = df1.set_index(pkeys).sub(df2.set_index(pkeys)).lt(0)
mapper = {  # this handles improvements in multiple subjects
    k: "/".join(delta.columns[delta.loc[k]])  # change the separator if needed
    for k in delta.index if any(delta.loc[k])
}
tmp = pd.concat([df1.set_index(pkeys), df2.set_index(pkeys)])
m = tmp.index.duplicated(keep=False)  # True for rows present in both datasets
tmp["any improvement"] = (
    pd.Index(tmp.index.isin(mapper))
    .map({True: "Yes", False: "No"}).where(m, "New Entry")
)
tmp["improved subject"] = (
    tmp.index.map(mapper).fillna("no improvement").where(m, "New Entry")
)
out = tmp.query("~index.duplicated(keep='last')").reset_index()
```

Output:

```
print(out)
   name      class     exam  maths  physics  chemistry any improvement improved subject
   John Grade - 10  model 1     98       78         87             Yes        chemistry
    Bob Grade - 06 mid term     65       91         92             Yes          physics
   Rose Grade - 06  model 2     91       70         54              No   no improvement
Michael Grade - 07  model 1    100       90         45             Yes            maths
    Sam Grade - 08 mid term     43       62         80       New Entry        New Entry
  James Grade - 10  model `     76       66         96       New Entry        New Entry
  Henry Grade - 09  model 1     34       91         70       New Entry        New Entry
```
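
To see what the `delta` and `mapper` steps produce, here is a minimal, self-contained sketch run on the question's sample data. It assumes `df1` and `df2` hold the before and after tables. The intermediate names `before`, `after` and `diff` are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

cols = ["name", "class", "exam", "maths", "physics", "chemistry"]
df1 = pd.DataFrame([  # before re-evaluation
    ["John", "Grade - 10", "model 1", 98, 78, 75],
    ["Bob", "Grade - 06", "mid term", 65, 72, 92],
    ["Rose", "Grade - 06", "model 2", 91, 70, 54],
    ["Michael", "Grade - 07", "model 1", 72, 90, 45],
], columns=cols)
df2 = pd.DataFrame([  # after re-evaluation
    ["John", "Grade - 10", "model 1", 98, 78, 87],
    ["Bob", "Grade - 06", "mid term", 65, 91, 92],
    ["Rose", "Grade - 06", "model 2", 91, 70, 54],
    ["Michael", "Grade - 07", "model 1", 100, 90, 45],
    ["Sam", "Grade - 08", "mid term", 43, 62, 80],
    ["James", "Grade - 10", "model `", 76, 66, 96],
    ["Henry", "Grade - 09", "model 1", 34, 91, 70],
], columns=cols)

pkeys = ["name", "class", "exam"]
before = df1.set_index(pkeys)
after = df2.set_index(pkeys)

# Subtraction aligns rows on the (name, class, exam) index.
# Rows that exist only in df2 have no "before" value, so they become NaN.
diff = before.sub(after)

# A negative difference means the mark went up; NaN compares as False.
delta = diff.lt(0)

# For every row with at least one improvement, join the improved column names.
mapper = {
    key: "/".join(delta.columns[delta.loc[key].to_numpy()])
    for key in delta.index
    if delta.loc[key].any()
}

print(diff)
print(mapper)
```

Because new entries end up as `NaN` in `diff`, they never appear in `mapper`. The answer labels them separately through the `m` mask.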

With the Styler to expose the improvements:
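
A minimal sketch of one way to do that, assuming the `out` frame shown in the answer's output. The helpers `highlight_improved` and `style_output`, the `SUBJECTS` list and the bold styling are hypothetical choices made for illustration, not the original answer's code. `Styler.apply` with `axis=1` calls the function once per row and expects one CSS string per cell; rendering a Styler requires the optional `jinja2` dependency.

```python
import pandas as pd

cols = ["name", "class", "exam", "maths", "physics", "chemistry",
        "any improvement", "improved subject"]
out = pd.DataFrame([  # the final frame from the answer's output
    ["John", "Grade - 10", "model 1", 98, 78, 87, "Yes", "chemistry"],
    ["Bob", "Grade - 06", "mid term", 65, 91, 92, "Yes", "physics"],
    ["Rose", "Grade - 06", "model 2", 91, 70, 54, "No", "no improvement"],
    ["Michael", "Grade - 07", "model 1", 100, 90, 45, "Yes", "maths"],
    ["Sam", "Grade - 08", "mid term", 43, 62, 80, "New Entry", "New Entry"],
    ["James", "Grade - 10", "model `", 76, 66, 96, "New Entry", "New Entry"],
    ["Henry", "Grade - 09", "model 1", 34, 91, 70, "New Entry", "New Entry"],
], columns=cols)

SUBJECTS = ["maths", "physics", "chemistry"]  # assumed subject columns


def highlight_improved(row):
    """Return one CSS string per cell, bolding the improved subject(s)."""
    # "improved subject" may hold several names joined by "/".
    improved = set(row["improved subject"].split("/"))
    return [
        "font-weight: bold" if col in SUBJECTS and col in improved else ""
        for col in row.index
    ]


def style_output(df):
    """Build a Styler that bolds improved marks (needs jinja2 installed)."""
    return df.style.apply(highlight_improved, axis=1)


# Inspect the CSS chosen for the first row without rendering anything.
print(highlight_improved(out.iloc[0]))
```

In a notebook, `style_output(out)` would display the table with the improved marks in bold.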
