first() is not adding in first line
Question
I am trying to add a new row at the start of each id group in df, based on the first row of that group.
id name value
111 length 46
111 status completed
111 segment 21
555 tp 0.1
555 x 56
888 point 23.01
888 x 50
888 y 40
Expected output:
id name value
111 type description #new row
111 length 46
111 status completed
111 segment 21
555 type description #new row
555 tp 0.1
555 x 56
888 type description #new row
888 point 23.01
888 x 50
888 y 40
This is what I am trying:
new = df.groupby("id", as_index=False).first().assign(attribute='rdf:type', value='description')
df = pd.concat([new, df]).sort_values('id')
It inserts the new row correctly only for the first id. For 555 the new row ends up last, after x, and in some other groups it lands in the middle, e.g. after y. I have thousands of rows in df. Can anybody please help?
Answer 1
Score: 1
Define the following function:
def prepend(grp):
    # Build a one-row frame carrying this group's id and put it ahead of the group's rows
    new_row = pd.DataFrame([[grp.iloc[0].id, 'type', 'description']],
                           columns=grp.columns)
    return pd.concat([new_row, grp])
Then apply it:
result = df.groupby('id').apply(prepend).droplevel(level=0)\
    .reset_index(drop=True)
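The idea above can be sketched end to end on the question's sample data. This is only a sketch under a few assumptions: the values are stored as strings, the helper name prepend_type_row is made up for illustration, and the groups are walked with a plain loop instead of groupby(...).apply, because as far as I know the way apply hands the grouping column to the function has changed between pandas releases.

```python
import pandas as pd

# Sample data from the question (values kept as strings for simplicity).
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": ["46", "completed", "21", "0.1", "56", "23.01", "50", "40"],
})


def prepend_type_row(key, grp):
    """Return grp with a (key, 'type', 'description') row placed first."""
    new_row = pd.DataFrame([[key, "type", "description"]], columns=grp.columns)
    return pd.concat([new_row, grp], ignore_index=True)


# sort=False keeps the groups in their order of first appearance.
pieces = [prepend_type_row(key, grp) for key, grp in df.groupby("id", sort=False)]
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Iterating over the groups always yields the full sub-frame, including the id column, so the new row can reuse grp.columns directly.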
Edit following the comment about the position of the new rows
For your sample data (with the automatically generated index) I got the correct result.
One possible reason for a different row order is that some rows in your DataFrame have negative index values.
In that case:
- the first (added) row is created with index == 0,
- the other rows keep their "original" index values,
so the concatenation order could differ, e.g. in some older version of Pandas (I tried setting such negative index values, but even then I still got the rows in the correct order).
Try changing the last line in prepend to:
    return pd.concat([new_row, grp], ignore_index=True)
i.e. add ignore_index=True.
The old index values are then ignored, and within each group the index values are consecutive numbers.
The last step (reset_index) overwrites them with a new sequence of consecutive numbers, but the rows returned by each call to prepend should at least be in the correct order.
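Another possible explanation for the misplaced rows in the question's attempt, offered as an assumption and not confirmed anywhere in this thread: sort_values uses kind='quicksort' by default, which is not a stable sort, so rows sharing the same id are not guaranteed to keep their relative order. A sketch of the same concat-and-sort idea with a stable sort (kind='mergesort'), again with the sample values stored as strings:

```python
import pandas as pd

# Sample data from the question (values kept as strings for simplicity).
df = pd.DataFrame({
    "id": [111, 111, 111, 555, 555, 888, 888, 888],
    "name": ["length", "status", "segment", "tp", "x", "point", "x", "y"],
    "value": ["46", "completed", "21", "0.1", "56", "23.01", "50", "40"],
})

# One new row per distinct id.
new = df[["id"]].drop_duplicates().assign(name="type", value="description")

# The new rows come first in the concatenation; a stable sort on id keeps
# that relative order, so each new row stays ahead of its group's rows.
result = (
    pd.concat([new, df])
    .sort_values("id", kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

Unlike a loop over the groups, this variant orders the groups by id value, which matches the sample data but may reorder groups in a frame that is not already sorted by id.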
Answer 2
Score: 0
You could iterate through the grouped dataframes:
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
pieces = []
for id_, df_ in df.groupby('id'):
    pieces.append(pd.DataFrame([[id_, 'rdf:type', 'description']], columns=df.columns))
    pieces.append(df_)
final_df = pd.concat(pieces).reset_index(drop=True)
Then you'll get
     id      name        value
0   111  rdf:type  description
1   111    length           46
2   111    status    completed
3   111   segment           21
4   555  rdf:type  description
5   555        tp          0.1
6   555         x           56
7   888  rdf:type  description
8   888     point        23.01
9   888         x           50
10  888         y           40