Is it possible to split a pandas column after last integer?

Question

I am trying to split a pandas column into two separate columns: the first should contain just the date and the second the string. I don't want to split at a fixed position (for example, by counting characters up to the last digit); instead I want code that works in general.

My column looks like this:

Column A
01.01.2000John Doe
01.01.2002Jane Doe

And I want it to look like this:

Column A    Column B
01.01.2000  John Doe
01.01.2002  Jane Doe
df_t['date'] = df_t['date_time'].str[:19]
df_t['name'] = df_t['date_time'].str[19:]

tid = df_t.drop(['date_time'], axis=1)

This is how I did it, but I need a general approach, as mentioned above.

Answer 1

Score: 1

You can use str.extract together with regular expressions:

import pandas as pd

# Sample data
data = {'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']}
df = pd.DataFrame(data)

# Regular expression pattern
pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

# Extracting the date and name into separate columns
df[['Column A', 'Column B']] = df['Column A'].str.extract(pattern)

print(df)

Explanation:

  • The pattern variable holds the regular expression. The group (?P<Date>\d{2}\.\d{2}\.\d{4}) captures the date, and (?P<Name>.*) captures the rest of the string as the name.
  • The ?P<> syntax names the captured groups; str.extract uses those names as the column labels of the DataFrame it returns.

EDIT: the same approach applied to several columns whose values start with a timestamp:

import pandas as pd

# Sample data
data = {
    '1Column A': ['2000-01-01 00:00:00John Doe', '2002-01-01 00:00:00Jane Doe'],
    '2Column B': ['2000-01-01 00:00:00Alice', '2002-01-01 00:00:00Bob'],
    '3Column C': ['Some other data', 'Not a date and name'],
}

df = pd.DataFrame(data)

# Regular expression pattern
pattern = r'(?P<Date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<Name>.*)'

# Iterate through columns and apply the pattern conditionally
for col in df.columns:
    if col.startswith(("1", "2")):
        # Extract date and name into separate columns with suffixes
        df[[f"{col}_date", f"{col}_name"]] = df[col].str.extract(pattern)
        # Drop the original column
        df.drop(col, axis=1, inplace=True)

print(df)
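Instead of selecting columns by a name prefix, one could let the pattern itself decide which columns to split. The following is only a sketch under that assumption; the column names `A` and `C` and the rule "split only if every row matches" are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2000-01-01 00:00:00John Doe", "2002-01-01 00:00:00Jane Doe"],
    "C": ["Some other data", "Not a date and name"],
})

pattern = r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<name>.*)"

# Iterate over a copy of the column labels, since df changes inside the loop
for col in list(df.columns):
    extracted = df[col].str.extract(pattern)
    # Rows that do not match the pattern come back as NaN,
    # so split only the columns where every row matched.
    if extracted["date"].notna().all():
        df[f"{col}_date"] = pd.to_datetime(extracted["date"])
        df[f"{col}_name"] = extracted["name"]
        df = df.drop(columns=col)

print(df)
```

Column `C` is left untouched because none of its rows match, while `A` is replaced by `A_date` (as datetime64) and `A_name`.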

Answer 2

Score: 1

You can simply use indexing:

df['Column A'], df['Column B'] = df['Column A'].str[:10], df['Column A'].str[10:]
print(df)

# Output
     Column A  Column B
0  01.01.2000  John Doe
1  01.01.2002  Jane Doe

If you want to convert the date to datetime64:

df['Column A'], df['Column B'] = \
    pd.to_datetime(df['Column A'].str[:10], dayfirst=True), df['Column A'].str[10:]
print(df)

# Output
    Column A  Column B
0 2000-01-01  John Doe
1 2002-01-01  Jane Doe
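As a variation on the conversion above, the date layout can be stated explicitly instead of relying on dayfirst=True. This is only a sketch, assuming every value starts with a 10-character DD.MM.YYYY date; the `raw` variable is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Column A": ["01.01.2000John Doe", "01.01.2002Jane Doe"]})

# Keep the original strings so both slices read from the same source
raw = df["Column A"]

# An explicit format avoids ambiguity between day and month;
# errors="coerce" turns values that do not parse into NaT instead of raising.
df["Column A"] = pd.to_datetime(raw.str[:10], format="%d.%m.%Y", errors="coerce")
df["Column B"] = raw.str[10:]

print(df)
```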

huangapple
  • Posted on April 17, 2023, 17:09:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76033467.html