When concatenating two DataFrames, an extra row is added
Question
I am trying to concatenate two pandas DataFrames, but it is not working. This is the code:
train_df = pd.concat([x_train, y_train], axis=1)
print(train_df)
y_train and x_train are the same length and have the correct size and row indexes. I just want to concatenate them side by side, like joining two matrices.
My current output is the following:
Age Sex HighChol BMI ... PhysHlth DiffWalk HighBP Diabetes
0 10.0 1.0 1.0 33.0 ... 30.0 0.0 1.0 NaN
1 10.0 1.0 0.0 21.0 ... 30.0 1.0 1.0 1.0
2 4.0 0.0 0.0 32.0 ... 7.0 0.0 0.0 1.0
3 11.0 1.0 1.0 35.0 ... 10.0 1.0 1.0 0.0
4 10.0 0.0 1.0 27.0 ... 0.0 0.0 1.0 1.0
... ... ... ... ... ... ... ... ... ...
996 3.0 0.0 1.0 33.0 ... 0.0 0.0 0.0 0.0
997 9.0 0.0 1.0 41.0 ... 30.0 1.0 1.0 0.0
998 12.0 0.0 1.0 34.0 ... 0.0 0.0 1.0 1.0
999 6.0 0.0 0.0 31.0 ... 0.0 0.0 0.0 0.0
1000 NaN NaN NaN NaN ... NaN NaN NaN 1.0
[1001 rows x 15 columns]
For some reason, this seems to add a row of NaN.
Edit: apparently y_train is a Series.
Answer 1
Score: 2
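There is a shift between the y_train and x_train indexes: the x_train index ranges from 0 to 999, while the y_train index ranges from 1 to 1000.
pd.concat uses the index to align rows, so labels present in only one object (0 in x_train, 1000 in y_train) produce rows padded with NaN. A workaround is: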
train_df = x_train.copy()
train_df['Diabetes'] = y_train.values
# Or
train_df = pd.concat([x_train, y_train.reset_index(drop=True)], axis=1)
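To see the effect on a small scale, here is a minimal sketch that reproduces the misalignment and applies both workarounds. The four-row x_train and y_train below are made-up stand-ins (an assumption for illustration), not the asker's data.

```python
import pandas as pd

# Hypothetical miniature data: x_train is labeled 0..3, y_train is labeled 1..4.
x_train = pd.DataFrame({"Age": [10.0, 10.0, 4.0, 11.0],
                        "BMI": [33.0, 21.0, 32.0, 35.0]})
y_train = pd.Series([1.0, 1.0, 0.0, 1.0], index=[1, 2, 3, 4], name="Diabetes")

# Label alignment takes the union of both indexes (0..4), so one extra row
# appears and the unmatched cells are filled with NaN.
misaligned = pd.concat([x_train, y_train], axis=1)
print(misaligned)

# Workaround 1: assign the raw values, which ignores y_train's index.
by_values = x_train.copy()
by_values["Diabetes"] = y_train.values

# Workaround 2: renumber y_train from 0 before concatenating.
by_reset = pd.concat([x_train, y_train.reset_index(drop=True)], axis=1)
print(by_reset)
```

Both workarounds pair rows by position, so they are only appropriate when the two objects really are in the same row order.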
But take care: you still need to find out why this shift exists.
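One possible way to start investigating is to compare the two indexes directly. The sketch below uses small made-up objects in place of the real x_train and y_train (an assumption for illustration) and only shows which labels differ, not why.

```python
import pandas as pd

# Hypothetical stand-ins for the real x_train and y_train.
x_train = pd.DataFrame({"Age": [10.0, 4.0, 11.0]})
y_train = pd.Series([1.0, 0.0, 1.0], index=[1, 2, 3], name="Diabetes")

# True only when both objects carry the same labels in the same order.
same_index = x_train.index.equals(y_train.index)

# Labels that appear in only one of the two indexes.
only_in_x = x_train.index.difference(y_train.index)
only_in_y = y_train.index.difference(x_train.index)

print(same_index)       # False
print(list(only_in_x))  # [0]
print(list(only_in_y))  # [3]
```

If the indexes differ, tracing back to where x_train and y_train were created or split is the likely next step.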
Note: y_train is a Series whose name is Diabetes, which is why the last column of train_df is Diabetes.
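As a small illustration of the note above, the sketch below shows the Series name becoming the column label, and how renaming the Series changes it. The data and the name Target are hypothetical.

```python
import pandas as pd

# Hypothetical two-row example with matching indexes.
x = pd.DataFrame({"Age": [10.0, 4.0]})
y = pd.Series([1.0, 0.0], name="Diabetes")

# The Series name is used as the label of the new column.
combined = pd.concat([x, y], axis=1)
print(list(combined.columns))  # ['Age', 'Diabetes']

# Renaming the Series first gives the column a different label.
renamed = pd.concat([x, y.rename("Target")], axis=1)
print(list(renamed.columns))  # ['Age', 'Target']
```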