问题

以下是代码的翻译部分：

我有以下的数据框
    import pandas as pd
    df = pd.DataFrame()
    df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
    df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake)
    df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
    df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING)
    print(df)
如果某个人的工时累积达到38小时，将对达到38小时的单元格进行调整，插入一行重复的数据行，并将工时余额添加到以下行。以下代码执行此操作，可以看到原始数据和调整后数据之间的差异。
    s = df.groupby('number')['hours'].cumsum()
    m = s.gt(38)
    idx = m.groupby(df['number']).idxmax()
    delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
    out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
    out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
    out.loc[out.index.duplicated(), 'hours'] -= delta
    print(out)
对于被调整的行和被插入的行，我需要通过插入另一列并添加字符（例如 'x'）来标记它们以突出显示已调整和已插入的行。
<details>
<summary>英文:</summary>
I have the following data-frame
    import pandas as pd
    df = pd.DataFrame()
    df[&#39;number&#39;] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
    df[&#39;name&#39;]=(&#39;Alex&#39;,&#39;Alex&#39;,&#39;Alex&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Ankit&#39;,&#39;Abhishek&#39;,&#39;Abhishek&#39;,&#39;Abhishek&#39;,&#39;Blake&#39;,&#39;Blake&#39;,&#39;Blake&#39;,&#39;Blake&#39;)
    df[&#39;hours&#39;]=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
    df[&#39;loc&#39;]=(&#39;Nar&#39;,&#39;SCC&#39;,&#39;RSL&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNIT-C&#39;,&#39;UNI&#39;,&#39;UNI&#39;,&#39;UNI&#39;,&#39;UNKING&#39;,&#39;UNKING&#39;,&#39;UNKING&#39;,&#39;UNKING&#39;)
    print(df)
If the running balance of an individuals hours reach 38 an adjustment to the cell that reached the 38th hour is made, a duplicate row is inserted and the balance of hours is added to the following row. The following code performs this and the difference in output of original data to adjusted data can be seen.
    s = df.groupby(&#39;number&#39;)[&#39;hours&#39;].cumsum()
    m = s.gt(38)
    idx = m.groupby(df[&#39;number&#39;]).idxmax()
    delta = s.groupby(df[&#39;number&#39;]).shift().rsub(38).fillna(s)
    out = df.loc[df.index.repeat((df.index.isin(idx)&amp;m)+1)]
    out.loc[out.index.duplicated(keep=&#39;last&#39;), &#39;hours&#39;] = delta
    out.loc[out.index.duplicated(), &#39;hours&#39;] -= delta
    print(out)
For the row that got adjusted and the row that got inserted I need to tag them via inserting another column and adding a character such as an &#39;x&#39; to highlight the adjusted and inserted row
</details>
# 答案1
**得分**: 1
当你复制索引时，可以使用`out.index.duplicated`作为布尔掩码：
```python
# or out['mod'] = np.where(out.index.duplicated(keep=False), 'x', '-')
out.loc[out.index.duplicated(keep=False), 'mod'] = 'x'
print(out)
# 输出
    number      name  hours     loc  mod
0      651      Alex   8.25     Nar  NaN
1      651      Alex   7.50     SCC  NaN
2      651      Alex   7.50     RSL  NaN
3     4267     Ankit   7.50  UNIT-C  NaN
4     4267     Ankit  14.00  UNIT-C  NaN
5     4267     Ankit  12.00  UNIT-C  NaN
6     4267     Ankit   4.50  UNIT-C    x  # 索引6
6     4267     Ankit  10.50  UNIT-C    x  # 重复
7     4267     Ankit  11.00  UNIT-C  NaN
8     4267     Ankit   6.50  UNIT-C  NaN
9     4267     Ankit  14.00  UNIT-C  NaN
10    8806  Abhishek  15.00     UNI  NaN
11    8806  Abhishek  15.00     UNI  NaN
12    8806  Abhishek   8.00     UNI    x  # 索引12
12    8806  Abhishek   5.50     UNI    x  # 重复
13    6841     Blake   8.00  UNKING  NaN
14    6841     Blake   8.00  UNKING  NaN
15    6841     Blake   8.00  UNKING  NaN
16    6841     Blake   8.00  UNKING  NaN

请注意，这是代码的翻译，不包括代码部分。

英文:

As you duplicate index, you can use out.index.duplicated as boolean mask:

# or out[&#39;mod&#39;] = np.where(out.index.duplicated(keep=False), &#39;x&#39;, &#39;-&#39;)
out.loc[out.index.duplicated(keep=False), &#39;mod&#39;] = &#39;x&#39;
print(out)
# Output
number      name  hours     loc  mod
0      651      Alex   8.25     Nar  NaN
1      651      Alex   7.50     SCC  NaN
2      651      Alex   7.50     RSL  NaN
3     4267     Ankit   7.50  UNIT-C  NaN
4     4267     Ankit  14.00  UNIT-C  NaN
5     4267     Ankit  12.00  UNIT-C  NaN
6     4267     Ankit   4.50  UNIT-C    x  # index 6
6     4267     Ankit  10.50  UNIT-C    x  # twice
7     4267     Ankit  11.00  UNIT-C  NaN
8     4267     Ankit   6.50  UNIT-C  NaN
9     4267     Ankit  14.00  UNIT-C  NaN
10    8806  Abhishek  15.00     UNI  NaN
11    8806  Abhishek  15.00     UNI  NaN
12    8806  Abhishek   8.00     UNI    x  # index 12
12    8806  Abhishek   5.50     UNI    x  # twice
13    6841     Blake   8.00  UNKING  NaN
14    6841     Blake   8.00  UNKING  NaN
15    6841     Blake   8.00  UNKING  NaN
16    6841     Blake   8.00  UNKING  NaN

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Python Pandas DataFrame，添加列并标记已调整和插入的行。

问题

如何将数据框传递给不同的函数，并进行筛选和分组。

如何在SQLAlchemy中查询由JSON字符串组成的属性？

在Pandas数据框中动态重命名列，并将其写入S3为CSV格式。

过滤和排序CSV数据，并将其存储为PDF文件，在特定行后插入分页。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论