Sort dates in mm/dd/yy and dd/mm/yy where I know the month they are from
Question
I have a column of date strings that I know come from a short, known period; in this case all dates are between January and February 2020. I want to sort them in ascending order. However, they are in mixed formats: some are mm/dd/yy and some are dd/mm/yy. How can I sort them?
import pandas as pd

data = {
    'date': ['1/1/2020', '20/1/2020', '1/1/2020', '1/28/2020', '21/1/2020', '1/25/2020', '29/1/2020'],
}
df = pd.DataFrame(data)
print(df)
Edit
Another sample of dates I'd like to be sorted:
import pandas as pd

data = {'Tgl': {
    1: '1/1/2023',
    2: '1/1/2023',
    3: '1/3/2023',
    4: '1/5/2023',
    5: '1/5/2023',
    6: '1/9/2023',
    7: '10/1/2023',
    8: '12/1/2023',
    9: '16/1/2023'}}
df = pd.DataFrame(data)
# Two ways of parsing the column; separate names keep df a DataFrame
month_first = pd.to_datetime(df['Tgl'])
day_first = pd.to_datetime(df['Tgl'], dayfirst=True)
Answer 1
Score: 3
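In the provided example there is no real ambiguity: no date has a day ≤ 12 that differs from its month, so each string can only mean one date.
So you can use pandas.to_datetime, i.e. pd.to_datetime(df['date']), to convert to a clean datetime. Note that pandas 2.0 and later infer one format for the whole column, so mixed formats like these likely also need format='mixed'. Or, to sort while keeping the original strings: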
df.sort_values(by='date', key=pd.to_datetime)
Output:
        date
0   1/1/2020
2   1/1/2020
1  20/1/2020
4  21/1/2020
5  1/25/2020
3  1/28/2020
6  29/1/2020
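To check whether a column really is free of ambiguity, each string can be tested directly: it is only ambiguous when both leading numbers are ≤ 12 and they differ. A minimal sketch in plain Python, where the helper name is_ambiguous is hypothetical and every string is assumed to have the form a/b/yyyy:

```python
def is_ambiguous(date_str):
    """Return True if the day-first and month-first readings are both
    valid dates and differ from each other."""
    first, second, _year = (int(part) for part in date_str.split('/'))
    # Both numbers could be a month, and swapping them changes the date.
    return first <= 12 and second <= 12 and first != second


dates = ['1/1/2020', '20/1/2020', '1/1/2020', '1/28/2020',
         '21/1/2020', '1/25/2020', '29/1/2020']
ambiguous = [d for d in dates if is_ambiguous(d)]
print(ambiguous)  # empty for the sample data
```

A string such as 1/2/2020 would be flagged, which is the case the next part deals with.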
If you have ambiguous dates (like 1/2/2020), you can choose whether the day or the month takes priority with the dayfirst parameter:
df.sort_values(by='date', key=lambda x: pd.to_datetime(x, dayfirst=True))
Example (with the sample data modified to include ambiguous dates):
        date
2   2/1/2020  # Jan 2nd
1  20/1/2020
4  21/1/2020
5  1/25/2020
3  1/28/2020
6  29/1/2020
0   1/2/2020  # Feb 1st
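What the day-first versus month-first choice means can be seen by parsing one ambiguous string with both explicit formats. This is a small sketch using only the standard library rather than pandas, so it illustrates the two readings and not pandas' own parsing logic:

```python
from datetime import datetime

ambiguous = '1/2/2020'

# Day-first reading: 1 February 2020
as_day_first = datetime.strptime(ambiguous, '%d/%m/%Y')
# Month-first reading: 2 January 2020
as_month_first = datetime.strptime(ambiguous, '%m/%d/%Y')

print(as_day_first.date(), as_month_first.date())
```

The two results are a month apart, which is why the choice changes the sort order.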
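Custom logic
Let's assume the first number is the day, unless that reading gives a month > 2 (that is, after February), in which case we treat the first number as the month instead.
def custom_date(s):
    # Parse day-first, then fall back to the month-first reading
    # wherever the day-first result lands after February.
    return (
        pd.to_datetime(s, dayfirst=True)
          .mask(lambda x: x.dt.month > 2,
                pd.to_datetime(s, dayfirst=False))
    )

df.sort_values(by='date', key=custom_date)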
Output (with an additional column to see the result of the custom conversion):
        date   converted
2   2/1/2020  2020-01-02
7  10/1/2020  2020-01-10  # both converted
8  1/10/2020  2020-01-10  # to Jan 10
1  20/1/2020  2020-01-20
4  21/1/2020  2020-01-21
5  1/25/2020  2020-01-25
3  1/28/2020  2020-01-28
6  29/1/2020  2020-01-29
0   1/2/2020  2020-02-01
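The same rule can also be sketched without relying on pandas' mixed-format parsing. The version below is plain Python; the names ALLOWED_MONTHS and parse_known_months are hypothetical, and unlike custom_date above it raises an error when neither reading falls in the allowed months:

```python
from datetime import datetime

ALLOWED_MONTHS = {1, 2}  # assumption: every date is in January or February


def parse_known_months(date_str, allowed_months=ALLOWED_MONTHS):
    """Try the day-first reading; fall back to month-first when the
    day-first reading is invalid or lands outside the allowed months."""
    for fmt in ('%d/%m/%Y', '%m/%d/%Y'):
        try:
            parsed = datetime.strptime(date_str, fmt)
        except ValueError:
            continue  # e.g. '1/25/2020' has no month 25
        if parsed.month in allowed_months:
            return parsed
    raise ValueError(f'no reading of {date_str!r} falls in the allowed months')


dates = ['1/2/2020', '20/1/2020', '2/1/2020', '1/28/2020', '21/1/2020',
         '1/25/2020', '29/1/2020', '10/1/2020', '1/10/2020']
sorted_dates = sorted(dates, key=parse_known_months)
for d in sorted_dates:
    print(d, parse_known_months(d).date())
```

With pandas, a function like this could presumably be used as the sort key via df.sort_values(by='date', key=lambda s: s.map(parse_known_months)).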