Cleanest way of changing dtype of several DataFrame columns
Question
I have four columns of type object in a Pandas (2.0.1) DataFrame which I want to convert to int.
Applying the following method (as described in https://stackoverflow.com/a/28648923/6630397):
import pandas as pd

cols = ['x1','x2','y1','y2']
df[cols] = df[cols].apply(pd.to_numeric)
# The same message is raised when trying to cast a single column:
df['x1'] = pd.to_numeric(df['x1'])
# The same message is also raised when using .astype():
df[cols] = df[cols].astype(int)
raises this warning:
/tmp/ipykernel_87959/2834796204.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[cols] = df[cols].apply(pd.to_numeric)
How can I properly (and rapidly) cast my four columns from object to int?
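The question does not show how df was created, but the warning text ("A value is trying to be set on a copy of a slice from a DataFrame") usually means df was itself taken from a larger DataFrame, for example by boolean filtering. Assuming that is the case, here is a minimal sketch of the usual remedy: take an explicit .copy() when creating df, after which the original assignment should run without the warning. The frame raw, its keep column and all the data are hypothetical.

```python
import pandas as pd

# Hypothetical source table with four object columns holding numeric strings.
raw = pd.DataFrame(
    {"x1": ["1", "2", "3"], "x2": ["4", "5", "6"],
     "y1": ["7", "8", "9"], "y2": ["10", "11", "12"]},
    dtype=object,
)
raw["keep"] = [True, False, True]  # hypothetical filter column

# Boolean filtering returns a frame that pandas tracks as derived from raw;
# assigning columns to it is what typically triggers SettingWithCopyWarning.
# .copy() makes df an independent DataFrame, so the assignment is unambiguous.
df = raw[raw["keep"]].copy()

cols = ["x1", "x2", "y1", "y2"]
df[cols] = df[cols].apply(pd.to_numeric)

print(df.dtypes)
```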
Answer 1
Score: 2
A possible solution:
df[cols] = df[cols].astype('int')
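A self-contained sketch of this answer on hypothetical data, assuming every value in the four columns is a string holding a whole number (the question does not show the actual values):

```python
import pandas as pd

# Hypothetical data: four object columns holding whole-number strings.
df = pd.DataFrame(
    {"x1": ["1", "2"], "x2": ["3", "4"], "y1": ["5", "6"], "y2": ["7", "8"]},
    dtype=object,
)
cols = ["x1", "x2", "y1", "y2"]

# astype('int') converts every selected column to NumPy's default integer dtype.
df[cols] = df[cols].astype("int")

print(df.dtypes)
```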
Answer 2
Score: 1
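The SettingWithCopyWarning can be avoided by using the .loc indexer to select and modify the specific columns of the original DataFrame. I would also use the downcast parameter to request an integer output dtype, because pd.to_numeric() may return a float dtype if a column contains any non-integer values.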
Code
import pandas as pd

cols = ["x1", "x2", "y1", "y2"]
df.loc[:, cols] = df[cols].apply(pd.to_numeric, downcast="integer")
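A small sketch, on hypothetical data, of what downcast="integer" actually does: it shrinks a column to the smallest integer dtype that can hold its values, but only when all of them are whole numbers, so a column containing "8.5" still comes back as float64. The assignment step is left out on purpose; since pandas has changed how df.loc[:, cols] = ... treats existing columns across versions, it may be worth checking df.dtypes after assigning.

```python
import pandas as pd

# Hypothetical data: three whole-number columns and one with a fractional value.
df = pd.DataFrame(
    {"x1": ["1", "2"], "x2": ["3", "4"], "y1": ["5", "6"], "y2": ["7", "8.5"]},
    dtype=object,
)
cols = ["x1", "x2", "y1", "y2"]

# downcast="integer" tries integer dtypes from smallest upward (int8 fits here),
# but only keeps the result if no values change; "y2" therefore stays float64.
converted = df[cols].apply(pd.to_numeric, downcast="integer")

print(converted.dtypes)
```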
Answer 3
Score: 0
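I believe pandas is sometimes a little overeager to throw warnings, and there is nothing particularly wrong with your solution, but maybe this one is slightly cleaner: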
df = df.astype({'x1': 'int', 'x2': 'int', 'y1': 'int', 'y2': 'int'})
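If the column list already exists, the mapping passed to astype does not have to be written out by hand. A minimal sketch on hypothetical data, building it with a dict comprehension over cols:

```python
import pandas as pd

# Hypothetical data: the four object columns plus one column left untouched.
df = pd.DataFrame(
    {"x1": ["1", "2"], "x2": ["3", "4"], "y1": ["5", "6"], "y2": ["7", "8"],
     "label": ["a", "b"]},
    dtype=object,
)
cols = ["x1", "x2", "y1", "y2"]

# Build the {column: dtype} mapping from the list; columns not named in the
# mapping ("label") keep their current dtype.
df = df.astype({c: "int" for c in cols})

print(df.dtypes)
```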