Is it possible to split a pandas column after the last integer?

Question

I am trying to split a pandas column into two separate columns, where the first should contain only the date and the second the remaining string. I don't want to split at a fixed character position (for example, by counting where the last digit is); instead, I want code that works in general.

My column looks like this:

Column A
01.01.2000John Doe
01.01.2002Jane Doe

And I want it to look like this:

Column A    Column B
01.01.2000  John Doe
01.01.2002  Jane Doe

    df_t['date'] = df_t['date_time'].str[0:19]
    df_t["name"] = df_t["date_time"].str[19:]
    tid = df_t.drop(["date_time"], axis=1)

This is how I did it, but I need a general way, as mentioned above.

Answer 1

Score: 1

You can use str.extract together with a regular expression to extract the date and the name into separate columns:

    import pandas as pd

    # Sample data
    data = {'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']}
    df = pd.DataFrame(data)

    # Regular expression pattern
    pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

    # Extract the date and name into separate columns
    df[['Column A', 'Column B']] = df['Column A'].str.extract(pattern)
    print(df)

Explanation:

  • The pattern variable holds the regular expression. The group (?P<Date>\d{2}\.\d{2}\.\d{4}) captures the date, and (?P<Name>.*) captures the rest of the string as the name.
  • The (?P<name>...) syntax names the captured groups; str.extract uses those names as the column labels of the DataFrame it returns.
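The pattern above is tied to one date layout. Since the question asks for a split after the last digit regardless of format, a minimal sketch of that idea follows. It assumes the name part never contains digits; the group names head and tail and the mixed sample rows are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Illustrative data mixing two date layouts
df = pd.DataFrame({
    "Column A": [
        "01.01.2000John Doe",
        "2002-01-01 00:00:00Jane Doe",
    ]
})

# Greedy ".*\d" runs up to the last digit in the string;
# "\D*" then takes the remaining digit-free tail.
pattern = r"^(?P<head>.*\d)(?P<tail>\D*)$"

parts = df["Column A"].str.extract(pattern)
df["Column A"] = parts["head"]
df["Column B"] = parts["tail"]
print(df)
```

Rows that contain no digit at all do not match and come back as NaN in both columns, so they are worth checking before relying on the result.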

EDIT

    import pandas as pd

    # Sample data
    data = {
        '1Column A': ['2000-01-01 00:00:00John Doe', '2002-01-01 00:00:00Jane Doe'],
        '2Column B': ['2000-01-01 00:00:00Alice', '2002-01-01 00:00:00Bob'],
        '3Column C': ['Some other data', 'Not a date and name'],
    }
    df = pd.DataFrame(data)

    # Regular expression pattern
    pattern = r'(?P<Date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<Name>.*)'

    # Iterate through the columns and apply the pattern conditionally
    for col in df.columns:
        if col.startswith("1") or col.startswith("2"):
            # Extract date and name into separate columns with suffixes
            df[[f"{col}_date", f"{col}_name"]] = df[col].str.extract(pattern)
            # Drop the original column
            df.drop(col, axis=1, inplace=True)

    print(df)
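The loop above picks columns by a name prefix. As a hedged variation, the columns could instead be chosen by content, splitting only those in which every value starts with a timestamp. The column names A and B and the lowercase group names below are hypothetical, and the sketch assumes every column holds strings.

```python
import pandas as pd

# Hypothetical sample data: one timestamp+name column, one unrelated column
df = pd.DataFrame({
    "A": ["2000-01-01 00:00:00John Doe", "2002-01-01 00:00:00Jane Doe"],
    "B": ["Some other data", "Not a date and name"],
})

pattern = r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<name>.*)"

# Iterate over a copy of the column labels because df is modified in the loop
for col in list(df.columns):
    # str.match anchors at the start of each value; na=False treats missing values as non-matching
    if df[col].str.match(pattern, na=False).all():
        parts = df[col].str.extract(pattern)
        df[f"{col}_date"] = parts["date"]
        df[f"{col}_name"] = parts["name"]
        df = df.drop(columns=col)

print(df)
```

Column B is left untouched because none of its values begin with a timestamp.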

Answer 2

Score: 1

You can simply use indexing:

    df['Column A'], df['Column B'] = df['Column A'].str[:10], df['Column A'].str[10:]
    print(df)

    # Output
    #      Column A  Column B
    # 0  01.01.2000  John Doe
    # 1  01.01.2002  Jane Doe

If you want to convert the date to datetime64 (starting again from the original, unsplit column):

    df['Column A'], df['Column B'] = \
        pd.to_datetime(df['Column A'].str[:10], dayfirst=True), df['Column A'].str[10:]
    print(df)

    # Output
    #     Column A  Column B
    # 0 2000-01-01  John Doe
    # 1 2002-01-01  Jane Doe
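As a hedged alternative to dayfirst=True, the date layout can be spelled out explicitly. This sketch assumes every row starts with a 10-character DD.MM.YYYY date, as in the sample data; the intermediate names date_text and name_text are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Column A": ["01.01.2000John Doe", "01.01.2002Jane Doe"]})

# Slice both parts from the original column before overwriting it
date_text = df["Column A"].str[:10]
name_text = df["Column A"].str[10:]

# An explicit format avoids day/month guessing; errors="coerce" turns
# unparseable values into NaT instead of raising.
df["Column A"] = pd.to_datetime(date_text, format="%d.%m.%Y", errors="coerce")
df["Column B"] = name_text
print(df)
```

Slicing first and assigning afterwards keeps the two steps independent, so the name is still taken from the unsplit string.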

huangapple
  • Posted on April 17, 2023 at 17:09:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76033467.html