Sort dates in mm/dd/yy and dd/mm/yy where I know the month they are from

Question

I have a column of date strings that I know all come from a known range of months; in this case, every date falls in January or February 2020. I want to sort them in ascending order, but the formats are mixed: some are mm/dd/yy and some are dd/mm/yy. How can I sort them?

import pandas as pd

data = {
    'date': ['1/1/2020', '20/1/2020', '1/1/2020', '1/28/2020', '21/1/2020', '1/25/2020', '29/1/2020'],
}

df = pd.DataFrame(data)

print(df)

Edit

Another sample of dates I'd like to be sorted


import pandas as pd

data = {'Tgl': {
  1: '1/1/2023',
  2: '1/1/2023',
  3: '1/3/2023',
  4: '1/5/2023',
  5: '1/5/2023',
  6: '1/9/2023',
  7: '10/1/2023',
  8: '12/1/2023',
  9: '16/1/2023'}}

df = pd.DataFrame(data)

# Attempt 1: default (month-first) parsing
month_first = pd.to_datetime(df['Tgl'])

# Attempt 2: day-first parsing
day_first = pd.to_datetime(df['Tgl'], dayfirst=True)

Answer 1

Score: 3

In the provided example the ambiguity is limited: no string has a first number ≤ 12 that differs from the second, so each date can only be read one way.

So you can use pd.to_datetime(df['date']) to convert to a clean datetime, or, to sort while keeping the original strings:

df.sort_values(by='date', key=pd.to_datetime)

Output:

       date
0   1/1/2020
2   1/1/2020
1  20/1/2020
4  21/1/2020
5  1/25/2020
3  1/28/2020
6  29/1/2020
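
A caveat worth noting: pandas 2.0 and later infer one format from the first value and apply it to the whole column, so a mixed column like this one may raise instead of being parsed. The sketch below is one way around that, assuming it is acceptable to parse each string on its own. The helper name `parse_each` is made up for illustration, and recent pandas versions may emit a UserWarning for the day-first strings.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1/1/2020", "20/1/2020", "1/1/2020", "1/28/2020",
             "21/1/2020", "1/25/2020", "29/1/2020"],
})


def parse_each(col):
    # Hypothetical helper: parse one string at a time, so no single
    # format is assumed for the whole column.
    return col.map(pd.to_datetime)


# mergesort is stable, so equal dates keep their original relative order.
out = df.sort_values(by="date", key=parse_each, kind="mergesort")
print(out)
```

On pandas 2.0 and later, `pd.to_datetime(col, format='mixed')` is the built-in way to ask for per-element inference; the element-wise `map` is slower but does not depend on that option.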

If you have ambiguous dates (like 1/2/2020) you can choose to give priority to days/months with the dayfirst parameter:

df.sort_values(by='date', key=lambda x: pd.to_datetime(x, dayfirst=True))

Example (with rows 0 and 2 changed to 1/2/2020 and 2/1/2020):

        date
2   2/1/2020  # Jan 2nd
1  20/1/2020
4  21/1/2020
5  1/25/2020
3  1/28/2020
6  29/1/2020
0   1/2/2020  # Feb 1st
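
To see what dayfirst changes, here is a minimal check on a single ambiguous string. The variable names are only for illustration.

```python
import pandas as pd

ambiguous = "1/2/2020"

# dayfirst=True reads the string as day/month/year.
as_day_first = pd.to_datetime(ambiguous, dayfirst=True)
# dayfirst=False (the default) reads it as month/day/year.
as_month_first = pd.to_datetime(ambiguous, dayfirst=False)

print(as_day_first.date())    # 2020-02-01
print(as_month_first.date())  # 2020-01-02
```

Strings with a number above 12 (like 1/25/2020) are not affected, since only one reading is a valid date.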

Custom logic

Let's parse every string day-first, and fall back to month-first parsing wherever the resulting month is > 2, i.e. outside the known January to February range.

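def custom_date(s):
    # Parse day-first, then replace any result whose month is outside
    # January-February with the month-first interpretation.
    return (
      pd.to_datetime(s, dayfirst=True)
        .mask(lambda x: x.dt.month > 2,
              pd.to_datetime(s, dayfirst=False))
    )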

df.sort_values(by='date', key=custom_date)

Output (with two extra rows, 10/1/2020 and 1/10/2020, and an additional column to see the result of the custom conversion):

        date  converted
2   2/1/2020 2020-01-02
7  10/1/2020 2020-01-10 # both converted 
8  1/10/2020 2020-01-10 # to Jan 10
1  20/1/2020 2020-01-20
4  21/1/2020 2020-01-21
5  1/25/2020 2020-01-25
3  1/28/2020 2020-01-28
6  29/1/2020 2020-01-29
0   1/2/2020 2020-02-01
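
The same rule can also be written without relying on pandas' format inference. The following is only a sketch under stated assumptions: the strings always look like a/b/yyyy, and the set of possible months is known. The function `parse_known_months` and its `months` parameter are hypothetical names, not part of pandas. It is applied here to a shuffled version of the second sample from the question, where every date is in January 2023.

```python
from datetime import date

import pandas as pd


def parse_known_months(text, months=(1, 2)):
    """Parse 'a/b/yyyy' when the month is known to be one of `months`.

    Day-first is tried before month-first, so a string that is valid both
    ways (such as '1/2/2020') is read as day/month.
    """
    a, b, year = (int(part) for part in text.split("/"))
    for day, month in ((a, b), (b, a)):
        if month in months:
            try:
                return pd.Timestamp(date(year, month, day))
            except ValueError:
                continue  # impossible day for that month; try the other reading
    raise ValueError(f"{text!r} does not fit months {months}")


# Shuffled dates from the question's second sample (all in January 2023).
df = pd.DataFrame({"Tgl": ["10/1/2023", "1/9/2023", "16/1/2023",
                           "1/3/2023", "1/1/2023"]})

out = df.sort_values(
    by="Tgl",
    key=lambda col: col.map(lambda s: parse_known_months(s, months=(1,))),
)
print(out["Tgl"].tolist())
# ['1/1/2023', '1/3/2023', '1/9/2023', '10/1/2023', '16/1/2023']
```

With the default months=(1, 2) this gives the same results as the custom conversion above, for example 1/10/2020 becomes January 10 and 1/2/2020 becomes February 1.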

huangapple
  • Posted on February 10, 2023 at 11:24:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75406634.html