April 17, 2023, 17:09:25

Is it possible to split a pandas column after last integer?

Question

I am trying to split a pandas column into two separate columns, where the first should contain just the date and the second the string. I don't want to split at a fixed character position (for example, by counting where the last integer is); instead, I want code that works in general.

My column looks like this:

Column A
01.01.2000John Doe
01.01.2002Jane Doe

And I want it to look like this:

Column A	Column B
01.01.2000	John Doe
01.01.2002	Jane Doe

df_t['date'] = df_t['date_time'].str[0:19]
df_t["name"] = df_t["date_time"].str[19:]

tid = df_t.drop(["date_time"], axis=1)

This is how I did it, but I need a general approach, as mentioned above.

Answer 1

Score: 1

You can use str.extract together with regular expressions:

import pandas as pd
# Sample data
data = {'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']}
df = pd.DataFrame(data)
# Regular expression pattern
pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'
# Extracting the date and name into separate columns
df[['Column A', 'Column B']] = df['Column A'].str.extract(pattern)
print(df)

Explanation:

The pattern variable contains the regular expression. The group (?P<Date>\d{2}\.\d{2}\.\d{4}) captures the date (the dots are escaped so that they match literal dots), and (?P<Name>.*) captures the rest of the string as the name.
The ?P<> syntax names the captured groups, which makes it easier to create the new columns in the DataFrame.
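The pattern above assumes a fixed dd.mm.yyyy layout. Since the question asks for a split after the last digit regardless of the date format, here is a minimal sketch of that idea. It assumes the name part never contains a digit; the group names date and name are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Column A": ["01.01.2000John Doe", "01.01.2002Jane Doe"]})

# Greedy `.*\d` backtracks only as far as the last digit in the string;
# `\D*` then captures the remaining non-digit text up to the end.
pattern = r"^(?P<date>.*\d)(?P<name>\D*)$"

# With named groups, str.extract returns a DataFrame whose columns
# are the group names.
parts = df["Column A"].str.extract(pattern)
parts["name"] = parts["name"].str.strip()
print(parts)
```

Rows that contain no digit at all do not match and come back as NaN in both columns, so it may be worth checking for missing values afterwards.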

EDIT

import pandas as pd
# Sample data
data = {
    '1Column A': ['2000-01-01 00:00:00John Doe', '2002-01-01 00:00:00Jane Doe'],
    '2Column B': ['2000-01-01 00:00:00Alice', '2002-01-01 00:00:00Bob'],
    '3Column C': ['Some other data', 'Not a date and name'],
}
df = pd.DataFrame(data)
# Regular expression pattern
pattern = r'(?P<Date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<Name>.*)'
# Iterate through columns and apply the pattern conditionally
for col in df.columns:
    if col.startswith("1") or col.startswith("2"):
        # Extract date and name into separate columns with suffixes
        df[[f"{col}_date", f"{col}_name"]] = df[col].str.extract(pattern)
        # Drop the original column
        df.drop(col, axis=1, inplace=True)
print(df)

Answer 2

Score: 1

You can simply use indexing:

df['Column A'], df['Column B'] = df['Column A'].str[:10], df['Column A'].str[10:]
print(df)
# Output
     Column A  Column B
0  01.01.2000  John Doe
1  01.01.2002  Jane Doe

If you want to convert the date to datetime64:

df['Column A'], df['Column B'] = \
    pd.to_datetime(df['Column A'].str[:10], dayfirst=True), df['Column A'].str[10:]
print(df)
# Output
    Column A  Column B
0 2000-01-01  John Doe
1 2002-01-01  Jane Doe
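As a variation on the conversion above, the sketch below passes an explicit format string instead of dayfirst=True, which avoids any guessing about day/month order. It assumes every date is exactly 10 characters in dd.mm.yyyy form; errors="coerce" is an optional choice that turns unparseable values into NaT rather than raising.

```python
import pandas as pd

df = pd.DataFrame({"Column A": ["01.01.2000John Doe", "01.01.2002Jane Doe"]})

# Slice first, then assign, so both slices read the original strings.
dates = pd.to_datetime(df["Column A"].str[:10], format="%d.%m.%Y", errors="coerce")
names = df["Column A"].str[10:]

df["Column A"], df["Column B"] = dates, names
print(df)
```

If the date portion can vary in length, the fixed slice at position 10 no longer applies and a regular expression is the safer choice.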
