Cut each string in a pandas dataframe
Question
I have a DataFrame named 'country' that looks like this:
Booking date             Country1  Country2  Country3                                  Country 4
2023-07-08T00:00:00.000  NaN       NaN       129.6119.7449.3519.7439.4819              13.018.614
2023-07-89T00:00:00.000  NaN       NaN       19.7439.4849.3516.09.8739.4834.4819.7419  67.518.616.557.629
I would like to have:
Booking date             Country1  Country2  Country3  Country 4
2023-07-08T00:00:00.000  NaN       NaN       129.61    13.02
2023-07-89T00:00:00.000  NaN       NaN       19.74     67.52
So basically, I want to cut each string in a pandas DataFrame to three decimals after the first point and then round it to two decimals. How do I go about doing this?
I tried the approach I found here: https://stackoverflow.com/questions/64885222/editing-strings-in-a-pandas-dataframe
That is, country.str[:5], but it only works on one column at a time, e.g. country['Country1'].str[:5], not on the whole DataFrame at once.
Answer 1
Score: 1
You could remove everything from the second period onward with str.replace, then convert to float and round, and finally update the DataFrame.
df.update(df.filter(regex='Country').astype(str)
            .apply(lambda x: x.str.replace(r'(\d+\.\d+).*', r'\1', regex=True))
            .astype(float).round(2))
df
              Booking_date  Country1  Country2  Country3  Country_4
0  2023-07-08T00:00:00.000       NaN       NaN    129.61      13.02
1  2023-07-89T00:00:00.000       NaN       NaN     19.74      67.52
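For a self-contained check, the sketch below rebuilds the sample frame and applies the same replace-then-round idea. The values and column names are copied from the question; the variable names country_cols and cleaned are illustrative. It assigns the cleaned columns back directly rather than calling df.update, on the assumption that float columns are wanted afterwards.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; rebuilt here only for illustration.
df = pd.DataFrame({
    "Booking date": ["2023-07-08T00:00:00.000", "2023-07-89T00:00:00.000"],
    "Country1": [np.nan, np.nan],
    "Country2": [np.nan, np.nan],
    "Country3": ["129.6119.7449.3519.7439.4819",
                 "19.7439.4849.3516.09.8739.4834.4819.7419"],
    "Country 4": ["13.018.614", "67.518.616.557.629"],
})

# Every column whose name contains "Country".
country_cols = df.filter(like="Country").columns

cleaned = (
    df[country_cols]
    .astype(str)
    # Keep the leading "digits.digits" part and drop everything after it.
    .apply(lambda s: s.str.replace(r"^(\d+\.\d+).*", r"\1", regex=True))
    .astype(float)
    .round(2)
)

# Assign back column by column so the cleaned columns end up as floats.
df[country_cols] = cleaned
print(df)
```

Missing values pass through unchanged because they do not match the pattern and are parsed back to NaN by the float conversion.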
Answer 2
Score: 0
You can first filter the columns (here, the Country* columns that hold strings), then use str.extract to pull out the leading number, convert to float and round, and finally update the DataFrame in place:
df.update(df
    # or manually list the columns here: [['Country3', 'Country 4']]
    .filter(like='Country').select_dtypes(exclude='number')
    .apply(lambda s: s.str.extract(r'^(\d+(?:\.\d*)?)', expand=False))
    .astype(float).round(2)
)
Alternative using a loop:
for col in ['Country3', 'Country 4']:
    df[col] = (df[col].str.extract(r'^(\d+(?:\.\d*)?)', expand=False)
                      .astype(float).round(2)
              )
Updated DataFrame:
              Booking date  Country1  Country2  Country3  Country 4
0  2023-07-08T00:00:00.000       NaN       NaN    129.61      13.02
1  2023-07-89T00:00:00.000       NaN       NaN     19.74      67.52
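The question literally asks for a cut at three decimals before rounding, whereas the pattern above keeps every digit up to the second period; both give the same rounded values for the sample data. If the strict three-decimal cut is wanted, a variant along these lines might work. This is only a sketch: the helper name truncate_and_round and its keep and decimals parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def truncate_and_round(s: pd.Series, keep: int = 3, decimals: int = 2) -> pd.Series:
    """Hypothetical helper: keep at most `keep` digits after the first
    period of each string, then convert to float and round."""
    # With keep=3 this builds the pattern ^(\d+(?:\.\d{1,3})?)
    pattern = rf"^(\d+(?:\.\d{{1,{keep}}})?)"
    return s.str.extract(pattern, expand=False).astype(float).round(decimals)


# Sample data copied from the question.
df = pd.DataFrame({
    "Country1": [np.nan, np.nan],
    "Country3": ["129.6119.7449.3519.7439.4819",
                 "19.7439.4849.3516.09.8739.4834.4819.7419"],
    "Country 4": ["13.018.614", "67.518.616.557.629"],
})

# Only the non-numeric Country* columns hold the concatenated strings.
text_cols = df.filter(like="Country").select_dtypes(exclude="number").columns
for col in text_cols:
    df[col] = truncate_and_round(df[col])

print(df)
```

Selecting the columns by dtype means the all-NaN float columns are skipped, so the .str accessor is only ever used on string data.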