January 6, 2020 21:33:38

first() is not adding in first line

Question

I am trying to add a new row at the top of each id group in df.

id     name       value
111    length     46
111    status     completed
111    segment    21
555    tp         0.1
555    x          56
888    point      23.01
888    x          50
888    y          40

Expected output:

id     name       value
111    type       description    # new row
111    length     46
111    status     completed
111    segment    21
555    type       description    # new row
555    tp         0.1
555    x          56
888    type       description    # new row
888    point      23.01
888    x          50
888    y          40

This is what I am trying:

new = df.groupby("id", as_index=False).first().assign(attribute='rdf:type', value='description')
df = pd.concat([new, df]).sort_values('id')

The first new row is inserted correctly, but for 555 the new row ends up last, after x, and for some other ids it lands in the middle of the group, e.g. after y. I have thousands of rows in df. Can anybody please help?

Answer 1

Score: 1

Define the following function:

def prepend(grp):
    new_row = pd.DataFrame([[grp.iloc[0].id, 'type', 'description']],
        columns=grp.columns)
    return pd.concat([new_row, grp])

Then apply it:

result = df.groupby('id').apply(prepend).droplevel(level=0)\
    .reset_index(drop=True)
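
To try this idea on the sample data from the question, here is a minimal self-contained sketch. It is a variation rather than the answer's exact code: it selects the name and value columns explicitly before apply, on the assumption that newer pandas releases may warn about, or stop, passing the grouping column to the applied function. The id is then restored from the group key.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": [46, "completed", 21, 0.1, 56, 23.01, 50, 40],
})

def prepend(grp):
    # grp holds only the selected columns (name, value) of one id group.
    new_row = pd.DataFrame([["type", "description"]], columns=grp.columns)
    # ignore_index=True makes the row order independent of old index values.
    return pd.concat([new_row, grp], ignore_index=True)

result = (
    df.groupby("id")[["name", "value"]]
    .apply(prepend)
    .reset_index(level=0)      # move the id group key back into a column
    .reset_index(drop=True)    # replace the per-group index with 0..n-1
)
print(result)
```

Note that groupby sorts the groups by id by default, so the groups come out in ascending id order.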

Edit following the comment concerning position of new rows

For your sample data (with the automatically generated index) I got the
correct result.

One possible reason for a different row order is that some rows
in your DataFrame have negative index values.
In that case:

the added row is created with index == 0,
the other rows keep their "original" index values,

so the concatenation order might differ, e.g. in some older
version of Pandas (I tried setting such negative index values, but even then
I still got the correct sequence of rows).

Try changing the last line in prepend to:

return pd.concat([new_row, grp], ignore_index=True)

i.e. add ignore_index=True.

In this case old index values are ignored and in each group index values
will be consecutive numbers.
In the last step (reset_index) they will be overwritten with the
new sequence of consecutive numbers, but at least there should be
the proper order of rows in the results returned by each application
of prepend.
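
Another possible cause, not confirmed anywhere in this thread, lies in the question's own code: sort_values uses kind='quicksort' by default, which is not a stable sort, so rows with equal id are not guaranteed to keep the order they had after pd.concat. The sketch below assumes that is the issue and keeps the concat-then-sort approach, but asks for a stable algorithm. Using drop_duplicates to build the header rows is an illustrative choice here, not the asker's code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": [46, "completed", 21, 0.1, 56, 23.01, 50, 40],
})

# One header row per distinct id.
new = df[["id"]].drop_duplicates().assign(name="type", value="description")

# Header rows are concatenated first; a stable sort keeps tied rows in that
# order, so each header row stays ahead of the original rows of its id.
result = (
    pd.concat([new, df])
    .sort_values("id", kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Because this sorts on id, the groups end up in ascending id order even if the original frame listed them differently.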

Answer 2

Score: 0

You could iterate through the grouped dataframes:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the pieces
# in a list and concatenate them once at the end.
pieces = []
for id_, df_ in df.groupby('id'):
    pieces.append(pd.DataFrame([[id_, 'rdf:type', 'description']], columns=df.columns))
    pieces.append(df_)
final_df = pd.concat(pieces, ignore_index=True)

Then you'll get

     id      name        value
0   111  rdf:type  description
1   111    length           46
2   111    status    completed
3   111   segment           21
4   555  rdf:type  description
5   555        tp          0.1
6   555         x           56
7   888  rdf:type  description
8   888     point        23.01
9   888         x           50
10  888         y           40
