Cut each string in a pandas dataframe

Question

I have a DataFrame 'country' like this:

Booking date            Country1 Country2 Country3                                 Country 4
2023-07-08T00:00:00.000 NaN      NaN      129.6119.7449.3519.7439.4819             13.018.614
2023-07-89T00:00:00.000 NaN      NaN      19.7439.4849.3516.09.8739.4834.4819.7419 67.518.616.557.629

I would like to have:

Booking date            Country1 Country2 Country3 Country 4
2023-07-08T00:00:00.000 NaN      NaN      129.61   13.02
2023-07-89T00:00:00.000 NaN      NaN      19.74    67.52

So basically, I want to cut each string in a pandas DataFrame to three decimals after the first point and then round it to two decimals. How do I go about doing this?

I have tried what I found here: https://stackoverflow.com/questions/64885222/editing-strings-in-a-pandas-dataframe

I tried country.str[:5], but the .str accessor only works on a single column, as in country['Country1'].str[:5], not on the whole DataFrame at once.

Answer 1

Score: 1

You could remove everything after the second period with str.replace, then convert to float and round. Finally, update the DataFrame.

# Keep only the leading number in each Country column, then round to 2 decimals.
df.update(df.filter(regex='Country').astype(str)
  .apply(lambda x: x.str.replace(r'(\d+\.\d+).*', r'\1', regex=True))
  .astype(float).round(2))

df
              Booking_date  Country1  Country2 Country3 Country_4
0  2023-07-08T00:00:00.000       NaN       NaN   129.61     13.02
1  2023-07-89T00:00:00.000       NaN       NaN    19.74     67.52
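
For anyone who wants to run this end to end, here is a minimal sketch of the same replace-based idea. The sample DataFrame is rebuilt by hand from the question, and the names country_cols and cleaned are only illustrative. It also assigns the result back to the columns directly instead of calling df.update, which is a small variation and not what the answer above does.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption, not the asker's real file).
df = pd.DataFrame({
    "Booking date": ["2023-07-08T00:00:00.000", "2023-07-89T00:00:00.000"],
    "Country1": [np.nan, np.nan],
    "Country2": [np.nan, np.nan],
    "Country3": ["129.6119.7449.3519.7439.4819",
                 "19.7439.4849.3516.09.8739.4834.4819.7419"],
    "Country 4": ["13.018.614", "67.518.616.557.629"],
})

# Every column whose name contains "Country".
country_cols = df.filter(regex="Country").columns

cleaned = (
    df[country_cols]
    .astype(str)
    # Keep "digits.digits" at the first match and drop whatever follows it.
    .apply(lambda col: col.str.replace(r"(\d+\.\d+).*", r"\1", regex=True))
    .astype(float)   # missing values come back as NaN
    .round(2)
)

# Direct assignment instead of df.update (a variation on the answer).
df[country_cols] = cleaned
print(df)
```

Cells that contain no "digits.digits" pattern, such as the missing values here, pass through the replace step unchanged, so this only works as long as every remaining cell can still be converted to float.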

Answer 2

Score: 0

You can first filter the columns (here the Country* columns that hold strings), then extract the leading number from each string, convert to float and round, and finally update the DataFrame in place:

df.update(df
   # or manually list the columns here: [['Country3', 'Country 4']]
   .filter(like='Country').select_dtypes(exclude='number')
   .apply(lambda s: s.str.extract(r'^(\d+(?:\.\d*)?)', expand=False))
   .astype(float).round(2)
)

Alternative using a loop:

for col in ['Country3', 'Country 4']:
    df[col] = (df[col].str.extract(r'^(\d+(?:\.\d*)?)', expand=False)
                .astype(float).round(2)
              )

Updated DataFrame:

              Booking date  Country1  Country2 Country3 Country 4
0  2023-07-08T00:00:00.000       NaN       NaN   129.61     13.02
1  2023-07-89T00:00:00.000       NaN       NaN    19.74     67.52
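
A self-contained sketch of the extract-based approach, for trying it out. The sample DataFrame is rebuilt by hand from the question, and leading_number and text_cols are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def leading_number(col: pd.Series) -> pd.Series:
    """Return the number at the start of each string, rounded to 2 decimals.

    Hypothetical helper: matches digits, optionally followed by '.' and more
    digits. Strings that do not start with a digit become NaN.
    """
    first = col.str.extract(r"^(\d+(?:\.\d*)?)", expand=False)
    return first.astype(float).round(2)


# Sample data rebuilt by hand from the question (an assumption).
df = pd.DataFrame({
    "Booking date": ["2023-07-08T00:00:00.000", "2023-07-89T00:00:00.000"],
    "Country1": [np.nan, np.nan],
    "Country2": [np.nan, np.nan],
    "Country3": ["129.6119.7449.3519.7439.4819",
                 "19.7439.4849.3516.09.8739.4834.4819.7419"],
    "Country 4": ["13.018.614", "67.518.616.557.629"],
})

# Only the Country* columns that are not already numeric.
text_cols = df.filter(like="Country").select_dtypes(exclude="number").columns

for name in text_cols:
    df[name] = leading_number(df[name])

print(df)
```

Compared with the replace-based version, extract turns any cell that does not start with a number into NaN instead of leaving the original text in place, which may or may not be what you want for dirty data.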

huangapple
  • Posted on July 13, 2023, 16:41:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76677460.html