first() is not adding in first line

Question

I am trying to add a new row at the top of each id group in df.

    id   name     value
    111  length   46
    111  status   completed
    111  segment  21
    555  tp       0.1
    555  x        56
    888  point    23.01
    888  x        50
    888  y        40

Expected output:

    id   name     value
    111  type     description  # new row
    111  length   46
    111  status   completed
    111  segment  21
    555  type     description  # new row
    555  tp       0.1
    555  x        56
    888  type     description  # new row
    888  point    23.01
    888  x        50
    888  y        40

This is what I am trying:

    new = df.groupby("id", as_index=False).first().assign(attribute='rdf:type', value='description')
    df = pd.concat([new, df]).sort_values('id')

It inserts the new row correctly only for the first id. For 555 the new row ends up last, after x, and elsewhere it lands in the middle of a group, e.g. after y. I have thousands of rows in df. Can anybody please help?

Answer 1

Score: 1

Define the following function:

    import pandas as pd

    def prepend(grp):
        # Extra row carrying this group's id, placed on top of the group
        new_row = pd.DataFrame([[grp.iloc[0].id, 'type', 'description']],
            columns=grp.columns)
        return pd.concat([new_row, grp])

Then apply it:

    result = df.groupby('id').apply(prepend).droplevel(level=0)\
        .reset_index(drop=True)
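
For anyone who wants to try this on the question's sample data, here is a minimal sketch. It is a variation, not the answer's exact call: it loops over the groups instead of using groupby().apply(), because newer pandas releases warn about, or change, whether apply passes the grouping column to the function. The sample frame and the list-comprehension form are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question (values kept as strings because the column is mixed)
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": ["46", "completed", "21", "0.1", "56", "23.01", "50", "40"],
})

def prepend(grp):
    # Extra row carrying this group's id, placed on top of the group
    new_row = pd.DataFrame([[grp["id"].iloc[0], "type", "description"]],
                           columns=grp.columns)
    return pd.concat([new_row, grp])

# Iterating over the groups always yields the full sub-frame, "id" included
result = pd.concat([prepend(grp) for _, grp in df.groupby("id")],
                   ignore_index=True)
print(result)
```

Each group comes back with its new row first, and ignore_index=True renumbers the combined frame in one step.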

Edit following the comment concerning position of new rows

For your sample data (with the automatically generated index) I get the
expected result.

One possible reason for a different row order is that some rows in your
DataFrame have negative index values.
In that case:

  • the added (first) row is created with index == 0,
  • the other rows keep their "original" index values,

so the concatenation order might differ, e.g. in some older version of
Pandas (I tried setting such negative index values, but even then
I still got the proper sequence of rows).

Try changing the last line in prepend to:

    return pd.concat([new_row, grp], ignore_index=True)

i.e. add ignore_index=True.

With this change the old index values are ignored and each group gets
consecutive index values starting from 0.
The last step (reset_index) overwrites them with a single sequence of
consecutive numbers, but each call to prepend should at least return
its rows in the proper order.
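
To make the effect of ignore_index visible, here is a small sketch with one group whose index values are negative. The group contents and the variable names kept and renumbered are made up for illustration.

```python
import pandas as pd

# One group with deliberately negative index values (hypothetical data)
grp = pd.DataFrame(
    {"id": [555, 555], "name": ["tp", "x"], "value": ["0.1", "56"]},
    index=[-2, -1],
)
new_row = pd.DataFrame([[555, "type", "description"]], columns=grp.columns)

kept = pd.concat([new_row, grp])                           # old index values survive
renumbered = pd.concat([new_row, grp], ignore_index=True)  # fresh 0..n-1 index

print(kept.index.tolist())        # [0, -2, -1]
print(renumbered.index.tolist())  # [0, 1, 2]
```

In both cases the rows stay in the order they were passed to concat; only the index labels differ.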

Answer 2

Score: 0

You could iterate through the grouped dataframes:

    import pandas as pd

    # DataFrame.append was removed in pandas 2.0, so collect the pieces and concat once
    pieces = []
    for id_, df_ in df.groupby('id'):
        # New row for this id first, then the group's original rows
        pieces.append(pd.DataFrame([[id_, 'rdf:type', 'description']], columns=df.columns))
        pieces.append(df_)
    final_df = pd.concat(pieces, ignore_index=True)

Then you'll get:

         id      name        value
    0   111  rdf:type  description
    1   111    length           46
    2   111    status    completed
    3   111   segment           21
    4   555  rdf:type  description
    5   555        tp          0.1
    6   555         x           56
    7   888  rdf:type  description
    8   888     point        23.01
    9   888         x           50
    10  888         y           40
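
One more thing that may be worth checking, which neither answer mentions: sort_values uses kind='quicksort' by default, and the pandas documentation lists only 'mergesort' and 'stable' as stable sorts. On a large frame, rows with the same id are therefore not guaranteed to keep their relative order, which could explain new rows landing in the middle or at the end of a group. The sketch below keeps the concat-and-sort idea from the question but sorts stably. The sample data and the use of drop_duplicates to build the new rows are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": ["46", "completed", "21", "0.1", "56", "23.01", "50", "40"],
})

# One new row per distinct id
new = df[["id"]].drop_duplicates().assign(name="type", value="description")

# The new rows are listed first in concat; a stable sort keeps them
# ahead of the original rows that share the same id
result = (
    pd.concat([new, df])
    .sort_values("id", kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Because the sort is stable, the original rows of each id also keep the order they had in df.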

huangapple
  • Posted on January 6, 2020 at 21:33:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59613062.html