March 21, 2023, 01:21:18

When concatenating two DataFrames, an extra row is added

Question


I am trying to concatenate two pandas DataFrames, but it is not working. This is the code:

train_df = pd.concat([x_train, y_train], axis=1)
print(train_df)

y_train and x_train have the same length, with the correct size and row indexes. I just want to join them side by side, like concatenating two matrices.

My current output is the following:

       Age  Sex  HighChol   BMI  ...  PhysHlth  DiffWalk  HighBP  Diabetes
0     10.0  1.0       1.0  33.0  ...      30.0       0.0     1.0       NaN
1     10.0  1.0       0.0  21.0  ...      30.0       1.0     1.0       1.0
2      4.0  0.0       0.0  32.0  ...       7.0       0.0     0.0       1.0
3     11.0  1.0       1.0  35.0  ...      10.0       1.0     1.0       0.0
4     10.0  0.0       1.0  27.0  ...       0.0       0.0     1.0       1.0
...    ...  ...       ...   ...  ...       ...       ...     ...       ...
996    3.0  0.0       1.0  33.0  ...       0.0       0.0     0.0       0.0
997    9.0  0.0       1.0  41.0  ...      30.0       1.0     1.0       0.0
998   12.0  0.0       1.0  34.0  ...       0.0       0.0     1.0       1.0
999    6.0  0.0       0.0  31.0  ...       0.0       0.0     0.0       0.0
1000   NaN  NaN       NaN   NaN  ...       NaN       NaN     NaN       1.0
[1001 rows x 15 columns]

For some reason, this seems to add a row of NaN values.

Edit:
Apparently y_train is a Series.

Answer 1

Score: 2


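The indexes of y_train and x_train are shifted relative to each other: the x_train index runs from 0 to 999, while the y_train index runs from 1 to 1000.

pd.concat uses the index to align rows, so any label that exists on only one side produces a row padded with NaN. A workaround is: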

train_df = x_train.copy()
train_df['Diabetes'] = y_train.values
# Or
train_df = pd.concat([x_train, y_train.reset_index(drop=True)], axis=1)
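
To see the effect on a small scale, here is a rough sketch. The four-row frames are made up for illustration and are not the asker's data; only the names x_train, y_train and Diabetes come from the question. It reproduces the extra row and then applies both workarounds.

```python
import pandas as pd

# Hypothetical miniature data: x_train is indexed 0..3, y_train is indexed 1..4.
x_train = pd.DataFrame(
    {"Age": [10.0, 10.0, 4.0, 11.0], "HighBP": [1.0, 1.0, 0.0, 1.0]}
)
y_train = pd.Series([1.0, 1.0, 0.0, 1.0], index=[1, 2, 3, 4], name="Diabetes")

# concat aligns on the index, so the result covers the union 0..4:
# row 0 has no Diabetes value and row 4 has no feature values.
misaligned = pd.concat([x_train, y_train], axis=1)
print(misaligned.shape)  # (5, 3)

# Workaround 1: assign the raw values, which ignores y_train's index.
by_values = x_train.copy()
by_values["Diabetes"] = y_train.values

# Workaround 2: renumber y_train from 0 before concatenating.
by_reset = pd.concat([x_train, y_train.reset_index(drop=True)], axis=1)

print(by_values.shape, by_reset.shape)  # (4, 3) (4, 3)
```

Both workarounds pair the rows purely by position, so they are only safe if the n-th row of y_train really belongs to the n-th row of x_train.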

But take care: you should find out why this shift exists in the first place.
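
The question does not show how x_train and y_train were created, so the cause cannot be pinned down here. As a starting point, a quick check like the following (again with made-up stand-in data) shows whether the two indexes match and which labels exist on one side only.

```python
import pandas as pd

# Hypothetical stand-ins with the same kind of offset as in the question.
x_train = pd.DataFrame({"Age": [10.0, 4.0, 11.0]})  # index 0..2
y_train = pd.Series([1.0, 0.0, 1.0], index=[1, 2, 3], name="Diabetes")

# Do the two objects carry identical row labels?
same_index = x_train.index.equals(y_train.index)
print(same_index)  # False

# Labels present on one side only; each one becomes a NaN-padded row
# when the two objects are concatenated with axis=1.
only_in_x = x_train.index.difference(y_train.index)
only_in_y = y_train.index.difference(x_train.index)
print(list(only_in_x), list(only_in_y))  # [0] [3]
```

If both lists are empty but the order differs, the indexes still match as sets, and concat will line the rows up by label rather than by position.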

Note: y_train is a Series named Diabetes, which is why the last column of train_df is Diabetes.
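
A small sketch of that naming behaviour, using invented two-row data (the name Target is a hypothetical choice for illustration): the Series name becomes the column label, and renaming the Series changes the label.

```python
import pandas as pd

# Made-up data; only the column name Diabetes comes from the question.
frame = pd.DataFrame({"Age": [10.0, 4.0]})
labels = pd.Series([1.0, 0.0], name="Diabetes")

# The Series name is used as the new column's label.
with_name = pd.concat([frame, labels], axis=1)
print(with_name.columns.tolist())  # ['Age', 'Diabetes']

# Renaming the Series first gives the column a different label.
renamed = pd.concat([frame, labels.rename("Target")], axis=1)
print(renamed.columns.tolist())  # ['Age', 'Target']
```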
