2023-07-13 16:41:10

Cut each string in a pandas dataframe

Question

I have a DataFrame named 'country' like this:

Booking date            Country1 Country2 Country3                                 Country 4
2023-07-08T00:00:00.000 NaN      NaN      129.6119.7449.3519.7439.4819             13.018.614
2023-07-89T00:00:00.000 NaN      NaN      19.7439.4849.3516.09.8739.4834.4819.7419 67.518.616.557.629

I would like to have:

Booking date            Country1 Country2 Country3 Country 4
2023-07-08T00:00:00.000 NaN      NaN      129.61   13.02
2023-07-89T00:00:00.000 NaN      NaN      19.74    67.52

So basically, I want to cut each string in a pandas DataFrame to three decimals after the first point and then round it to two decimals. How do I go about doing this?

I have tried what I found here: https://stackoverflow.com/questions/64885222/editing-strings-in-a-pandas-dataframe

That suggests country.str[:5], but it only works one column at a time, as in country['Country1'].str[:5], not on the whole DataFrame at once.

Answer 1

Score: 1

You could remove everything from the second period onward with str.replace, then convert to float and round to two decimals. Finally, update the DataFrame in place.

df.update(df.filter(regex='Country').astype(str)
  # keep the leading 'digits.digits' and drop the rest of the string
  .apply(lambda x: x.str.replace(r'(\d+\.\d+).*', r'\1', regex=True))
  .astype(float).round(2))

df
              Booking_date  Country1  Country2 Country3 Country_4
0  2023-07-08T00:00:00.000       NaN       NaN   129.61     13.02
1  2023-07-89T00:00:00.000       NaN       NaN    19.74     67.52
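
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the strings in the question, the helper name keep_first_number is made up for illustration, and the result is written back by plain column assignment rather than df.update, which is a swapped-in choice and not what the answer above uses.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the malformed strings are copied verbatim.
df = pd.DataFrame({
    "Booking date": ["2023-07-08T00:00:00.000", "2023-07-89T00:00:00.000"],
    "Country1": [np.nan, np.nan],
    "Country2": [np.nan, np.nan],
    "Country3": ["129.6119.7449.3519.7439.4819",
                 "19.7439.4849.3516.09.8739.4834.4819.7419"],
    "Country 4": ["13.018.614", "67.518.616.557.629"],
})


def keep_first_number(col: pd.Series) -> pd.Series:
    """Keep the first 'digits.digits' run of each string and drop the rest.

    Hypothetical helper name, not part of pandas.
    """
    as_text = col.astype(str)  # NaN becomes the string 'nan'
    trimmed = as_text.str.replace(r"(\d+\.\d+).*", r"\1", regex=True)
    return trimmed.astype(float).round(2)  # 'nan' parses back to NaN


# Apply the helper to every column whose name contains 'Country'.
country_cols = df.filter(regex="Country").columns
df[country_cols] = df[country_cols].apply(keep_first_number)

print(df)
```

Assigning the columns back also converts Country3 and Country 4 to a float dtype, whereas df.update only overwrites the non-NaN cells of the existing columns.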

Answer 2

Score: 0

You can first select the columns (here the Country* columns that hold strings), then extract the leading number, convert it to float and round, and finally update the DataFrame in place:

df.update(df
   # or manually list the columns here: [['Country3', 'Country 4']]
   .filter(like='Country').select_dtypes(exclude='number')
   .apply(lambda s: s.str.extract(r'^(\d+(?:\.\d*)?)', expand=False))
   .astype(float).round(2)
)

Alternative using a loop:

for col in ['Country3', 'Country 4']:
    df[col] = (df[col].str.extract(r'^(\d+(?:\.\d*)?)', expand=False)
                .astype(float).round(2)
              )

Updated DataFrame:

              Booking date  Country1  Country2 Country3 Country 4
0  2023-07-08T00:00:00.000       NaN       NaN   129.61     13.02
1  2023-07-89T00:00:00.000       NaN       NaN    19.74     67.52
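
The question literally asks to cut at three decimals before rounding, while the patterns above keep every digit up to the second period. The two usually agree after rounding, but if the literal behaviour matters, a variant along these lines might work. It is only a sketch: the sample frame is rebuilt from the question, and the pattern name THREE_DECIMALS is an assumption introduced here, not something from either answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the malformed strings are copied verbatim.
df = pd.DataFrame({
    "Booking date": ["2023-07-08T00:00:00.000", "2023-07-89T00:00:00.000"],
    "Country1": [np.nan, np.nan],
    "Country2": [np.nan, np.nan],
    "Country3": ["129.6119.7449.3519.7439.4819",
                 "19.7439.4849.3516.09.8739.4834.4819.7419"],
    "Country 4": ["13.018.614", "67.518.616.557.629"],
})

# Hypothetical pattern: leading digits, optionally followed by a period
# and at most three decimal digits.
THREE_DECIMALS = r"^(\d+(?:\.\d{0,3})?)"

# Only touch the non-numeric Country columns, as in the answer above.
text_cols = df.filter(like="Country").select_dtypes(exclude="number").columns
for col in text_cols:
    df[col] = (
        df[col].str.extract(THREE_DECIMALS, expand=False)
        .astype(float)
        .round(2)
    )

print(df)
```

On the sample data this yields the same four values as the answers above; the results could only diverge for strings where the digits after the third decimal would change how the second decimal rounds.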
