first() is not adding in first line

Question

I am trying to add a new row to df at the start of each id group.

id    name      value
111   length    46
111   status    completed
111   segment   21
555   tp        0.1
555   x         56
888   point     23.01
888   x         50
888   y         40

Expected output:

id    name      value
111   type      description   # new row
111   length    46
111   status    completed
111   segment   21
555   type      description   # new row
555   tp        0.1
555   x         56
888   type      description   # new row
888   point     23.01
888   x         50
888   y         40

This is what I am trying:

new = df.groupby("id", as_index=False).first().assign(attribute='rdf:type', value='description')
df = pd.concat([new, df]).sort_values('id')

It inserts the new row correctly only for the first id. For 555 the new row ends up last, after x, and in other places it lands in the middle of the group, e.g. after y. I have thousands of rows in df. Can anybody please help?

Answer 1

Score: 1

Define the following function:

import pandas as pd

def prepend(grp):
    # Build a one-row frame with this group's id and put it on top of the group.
    new_row = pd.DataFrame([[grp.iloc[0].id, 'type', 'description']],
        columns=grp.columns)
    return pd.concat([new_row, grp])

Then apply it:

result = df.groupby('id').apply(prepend).droplevel(level=0)\
    .reset_index(drop=True)
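
As a self-contained check, here is a minimal sketch of the same idea run on the sample data from the question. It calls prepend on each group in a plain loop rather than through groupby.apply. That is my own substitution: how apply treats the grouping column has changed between pandas versions (pandas 2.2 deprecated passing it to the function), so the loop should behave the same whichever version you have. The sample df is rebuilt by hand here.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real df).
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": [46, "completed", 21, 0.1, 56, 23.01, 50, 40],
})

def prepend(grp):
    # One-row frame carrying this group's id, placed on top of the group.
    new_row = pd.DataFrame([[grp["id"].iloc[0], "type", "description"]],
                           columns=grp.columns)
    return pd.concat([new_row, grp])

# sort=False keeps the groups in the order their ids first appear in df.
parts = [prepend(grp) for _, grp in df.groupby("id", sort=False)]
result = pd.concat(parts, ignore_index=True)
print(result)
```

Here ignore_index=True in the final concat does the job of the closing reset_index(drop=True).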

Edit following the comment concerning position of new rows

For your sample data (with the default, automatically generated index) I get
the correct result.

One possible reason for a different row order is that some rows in your
DataFrame have negative index values.
In this case:

  • the added first row is created with index == 0,
  • the other rows keep their "original" index values,

so the concatenation order might differ, e.g. in some older version of
Pandas (I tried setting such negative index values, but even then I still
got the rows in the correct order).

Try changing the last line in prepend to:

return pd.concat([new_row, grp], ignore_index=True)

i.e. add ignore_index=True.

In this case the old index values are ignored and the rows of each group
are numbered consecutively.
The last step (reset_index) overwrites those numbers with one new
consecutive sequence, but the rows returned by each call to prepend
should at least be in the correct order.
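
For what it's worth, the attempt from the question can probably be made to work too. sort_values uses quicksort by default, which is not a stable sort, so rows with the same id are not guaranteed to keep the order they had after concat. That would explain why a small sample looks fine and a df with thousands of rows does not. The minimal sketch below assumes that is the cause and passes kind='mergesort', a stable algorithm. Note also that assign(attribute=...) creates a new column called attribute. The sketch assumes the intent was to fill the existing name column, and it uses 'type' as in the expected output.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real df).
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": [46, "completed", 21, 0.1, 56, 23.01, 50, 40],
})

# One row per id; name= and value= overwrite the existing columns.
new = (
    df.groupby("id", as_index=False)
    .first()
    .assign(name="type", value="description")
)

# The new rows come first in the concat, and a stable sort keeps them first
# within each id while preserving the original order of the other rows.
result = (
    pd.concat([new, df])
    .sort_values("id", kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```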

Answer 2

Score: 0

You could iterate through the grouped dataframes:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concat once.
pieces = []
for id_, df_ in df.groupby('id'):
    pieces.append(pd.DataFrame([[id_, 'rdf:type', 'description']], columns=df.columns))
    pieces.append(df_)
final_df = pd.concat(pieces, ignore_index=True)

Then you'll get

     id      name        value
0   111  rdf:type  description
1   111    length           46
2   111    status    completed
3   111   segment           21
4   555  rdf:type  description
5   555        tp          0.1
6   555         x           56
7   888  rdf:type  description
8   888     point        23.01
9   888         x           50
10  888         y           40
