Cleanest way of changing dtype of several DataFrame columns

Question

I have four columns of type object in a Pandas (2.0.1) DataFrame which I want to convert to int.

Applying the following method:

cols = ['x1','x2','y1','y2']

df[cols] = df[cols].apply(pd.to_numeric)

# The same message is raised when trying to cast a single column:
df['x1'] = pd.to_numeric(df['x1'])

# The same message is also raised when using .astype():
df[cols] = df[cols].astype(int)

as described here: https://stackoverflow.com/a/28648923/6630397
raises the message:

/tmp/ipykernel_87959/2834796204.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
    
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[cols] = df[cols].apply(pd.to_numeric)

How can I properly (and rapidly) cast my four columns from object to int?

Answer 1

Score: 2

A possible solution:

df[cols] = df[cols].astype('int')
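
The question does not show how df was created. A common trigger for this warning is a df that was itself obtained by filtering or slicing another DataFrame. The sketch below assumes that situation; the frame raw, the label column and the sample values are hypothetical. Taking an explicit .copy() first makes df an independent object, so the assignment above no longer targets a possible slice.

```python
import pandas as pd

# Hypothetical source data: coordinates stored as strings.
raw = pd.DataFrame({
    "label": ["a", "b", "c"],
    "x1": ["1", "5", "9"],
    "x2": ["2", "6", "10"],
    "y1": ["3", "7", "11"],
    "y2": ["4", "8", "12"],
})
cols = ["x1", "x2", "y1", "y2"]

# Assumption: df is a filtered subset of another frame.
# The explicit .copy() makes df independent of raw.
df = raw[raw["label"] != "b"].copy()

# Same assignment as in the answer, now on an independent frame.
df[cols] = df[cols].astype("int")

print(df.dtypes)
```

If df is built directly (for example, read from a file) rather than sliced from another frame, the .copy() step is not needed.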

Answer 2

Score: 1

The SettingWithCopyWarning can be avoided by using the .loc indexer to select and modify the specific columns of the original DataFrame.

I would also pass downcast="integer", which asks pd.to_numeric() to downcast the result to the smallest integer dtype that can hold the values. Note that pd.to_numeric() may still return a float dtype if the column contains any non-integer values.

Code

cols = ["x1", "x2", "y1", "y2"]

df.loc[:, cols] = df[cols].apply(pd.to_numeric, downcast="integer")
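
A small sketch of what downcast="integer" does on its own, using two hypothetical Series (whole and mixed are illustrative names, not from the question). It only shows the behaviour of pd.to_numeric(), not of the .loc assignment.

```python
import pandas as pd

# Hypothetical sample values, stored as strings.
whole = pd.Series(["1", "2", "300"], dtype=object)
mixed = pd.Series(["1", "2.5", "300"], dtype=object)

# All values are whole numbers: the result uses the smallest signed integer
# dtype that can hold them (int16 here, because 300 does not fit in int8).
whole_result = pd.to_numeric(whole, downcast="integer")
print(whole_result.dtype)

# One value is fractional: downcasting to an integer is not possible,
# so the result stays floating point.
mixed_result = pd.to_numeric(mixed, downcast="integer")
print(mixed_result.dtype)
```

It is also worth checking df.dtypes after the assignment: in pandas 2.x, setting through df.loc[:, cols] tries to write into the existing columns first, so object columns may keep their dtype even though the values are converted.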

Answer 3

Score: 0

I believe pandas is sometimes a little overeager to throw warnings, and there is nothing particularly wrong with your solution, but this one may be slightly cleaner:

df = df.astype({'x1': 'int', 'x2': 'int', 'y1': 'int', 'y2': 'int'})
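
When the column names already live in a list, the mapping can be built from it instead of being typed out. A minimal sketch, assuming a hypothetical frame with an extra name column that should stay untouched:

```python
import pandas as pd

# Hypothetical sample data; only the four coordinate columns are converted.
df = pd.DataFrame({
    "name": ["a", "b"],
    "x1": ["1", "5"],
    "x2": ["2", "6"],
    "y1": ["3", "7"],
    "y2": ["4", "8"],
})
cols = ["x1", "x2", "y1", "y2"]

# dict.fromkeys builds {'x1': 'int', 'x2': 'int', 'y1': 'int', 'y2': 'int'}.
df = df.astype(dict.fromkeys(cols, "int"))

print(df.dtypes)
```

Because astype() returns a new DataFrame that is then rebound to df, no assignment into a possible slice takes place.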

Posted by huangapple on 2023-05-06 17:43:08
Permalink: https://go.coder-hub.com/76188195.html