英文:
pandas create a mask according to only the day and the month
问题
我有以下数据框:
dates,qq
1900-01-01,1
1900-01-02,2
1900-01-03,3
1900-01-04,1
1900-01-05,2
1901-01-06,5
1901-01-01,2
1901-01-02,2
1901-01-03,1
1901-01-04,4
1901-01-05,5
1901-01-06,6
1902-01-01,7
1902-01-02,1
1902-01-03,1
1902-01-04,2
1902-01-05,4
1902-01-06,5
和
dates,th
01-01,1
01-02,2
01-03,2
01-04,3
01-05,3
01-06,1
我们称它们为dfr和dfr_t。
在第二个数据框dfr_t中,存储了一些阈值。
我想要将第一个数据框中的值与第二个数据框(dfr_t)中的值按照日期和月份进行比较。
正如你所注意到的,在第二个数据框中,日期列只包含月份和日期,而年份不存在。
我想知道dfr中具有特定日期和月份的值是否小于或等于第二个数据框中(dfr_t)具有相同日期和月份的阈值。
以下是预期的结果:
1900-01-01 1 01-01 1 true
1900-01-02 2 01-02 2 true
1900-01-03 3 01-03 2 false
1900-01-04 1 01-04 3 true
1900-01-05 2 01-05 3 true
1900-01-06 5 01-06 1 false
1901-02-01 2 01-01 1 false
1901-02-02 2 01-02 2 true
1901-02-03 1 01-03 2 true
1901-02-04 4 01-04 3 false
1901-02-05 5 01-05 3 false
1901-02-06 6 01-06 1 false
1903-03-01 7 01-01 1 false
1903-03-02 1 01-02 2 true
1903-01-03 1 01-03 2 true
1903-01-04 2 01-04 3 true
1903-01-05 4 01-05 3 false
1903-01-06 5 01-06 1 false
如你所见,对于日期01-01,与1900-01-01对应的值等于1(结果为True);与1901-01-01对应的值不等于1(结果为False);与1902-01-01对应的值也不等于1(结果为False)。
我是如何读取这些数据框的:
dfr = pd.read_csv('test.csv', sep=',', index_col=0, parse_dates=True)
dfr_t = pd.read_csv('treh.csv', sep=',', index_col=0)
我看到的第一个问题是:
dfr_t.index.dtype
Out[15]: dtype('O')
我无法将其转换为日期索引:
ValueError: time data "1" doesn't match format "%m-%d", at position 0. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
一种想法是复制dfr_t并将其连接起来。然而,这个选项似乎不够通用。
英文:
I have the following dataframes:
dates,qq
1900-01-01,1
1900-01-02,2
1900-01-03,3
1900-01-04,1
1900-01-05,2
1901-01-06,5
1901-01-01,2
1901-01-02,2
1901-01-03,1
1901-01-04,4
1901-01-05,5
1901-01-06,6
1902-01-01,7
1902-01-02,1
1902-01-03,1
1902-01-04,2
1902-01-05,4
1902-01-06,5
and
dates,th
01-01,1
01-02,2
01-03,2
01-04,3
01-05,3
01-06,1
let's say dfr and dfr_t.
In the second one, dfr_t, some thresholds are stored.
I would like to compare the values of the first dataframe with the ones in the second (dfr_t) according to the day and month.
As you can notice indeed in the second one the dates column has only the month and the day while the year is non present.
I would like to know where a values of dfr with a specific day and month is less or equal to the threshold value defined in the second one (dfr_t) with the same day and month.
The following is the expected outcome:
1900-01-01 1 01-01 1 true
1900-01-02 2 01-02 2 true
1900-01-03 3 01-03 2 false
1900-01-04 1 01-04 3 true
1900-01-05 2 01-05 3 true
1900-01-06 5 01-06 1 false
1901-02-01 2 01-01 1 false
1901-02-02 2 01-02 2 true
1901-02-03 1 01-03 2 true
1901-02-04 4 01-04 3 false
1901-02-05 5 01-05 3 false
1901-02-06 6 01-06 1 false
1903-03-01 7 01-01 1 false
1903-03-02 1 01-02 2 true
1903-01-03 1 01-03 2 true
1903-01-04 2 01-04 3 true
1903-01-05 4 01-05 3 false
1903-01-06 5 01-06 1 false
As you can notice, for the date 01-01 the value corresponding to 1900-01-01 is equal to 1 (result=True); the value corresponding to 1901-01-01 is not equal\less to 1 (result=False); the value corresponding to 1902-01-01 is not equal\less to 1 (result=False).
I have read those dataframe as follow:
dfr = pd.read_csv('test.csv', sep=',',index_col=0,parse_dates=True)
dfr_t = pd.read_csv('treh.csv', sep=',',index_col=0)
The first issue that I see is the following:
dfr_t.index.dtype
Out[15]: dtype('O')
I can not convert it to dateindex
ValueError: time data "1" doesn't match format "%m-%d", at position 0. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
One idea could be duplicate dfr_t and concatenate it. This option, however, does not seem general.
答案1
得分: 4
The provided code appears to be performing a merge operation on two DataFrames, dfr
and dfr_t
, based on the "month-day" or "day" values. It then calculates a "flag" column based on the comparison of "qq" and "th" values. The code is presented in both cases: merging on "month-day" and merging on "day" only.
If you have any specific questions or need further assistance, please let me know.
英文:
The exact logic is unclear given the provided output, however you can probably merge
on the month-day, then compare the values:
out = (dfr
.merge(dfr_t, left_on=dfr.index.strftime('%m-%d'),
right_index=True, how='left')
.assign(flag=lambda d: d['qq'].le(d['th']))
)
Output:
dates dates_x qq dates_y th flag
0 01-01 1900-01-01 1 01-01 1.0 True
1 01-02 1900-01-02 2 01-02 2.0 True
2 01-03 1900-01-03 3 01-03 2.0 False
3 01-04 1900-01-04 1 01-04 3.0 True
4 01-05 1900-01-05 2 01-05 3.0 True
5 01-06 1900-01-06 5 01-06 1.0 False
6 02-01 1900-02-01 2 NaN NaN False
7 02-02 1900-02-02 2 NaN NaN False
8 02-03 1900-02-03 1 NaN NaN False
9 02-04 1900-02-04 4 NaN NaN False
10 02-05 1900-02-05 5 NaN NaN False
11 02-06 1900-02-06 6 NaN NaN False
12 03-01 1900-03-01 7 NaN NaN False
13 03-02 1900-03-02 1 NaN NaN False
14 01-03 1900-01-03 1 01-03 2.0 True
15 01-04 1900-01-04 2 01-04 3.0 True
16 01-05 1900-01-05 4 01-05 3.0 False
17 01-06 1900-01-06 5 01-06 1.0 False
matching on the day only
out = (dfr
.merge(dfr_t,
left_on=pd.to_datetime(dfr['dates']).dt.strftime('%d'),
right_on=dfr_t['dates'].str.extract(r'-(\d+)$', expand=False),
how='left')
.assign(flag=lambda d: d['qq'].le(d['th']))
)
Output:
key_0 dates_x qq dates_y th flag
0 01 1900-01-01 1 01-01 1 True
1 02 1900-01-02 2 01-02 2 True
2 03 1900-01-03 3 01-03 2 False
3 04 1900-01-04 1 01-04 3 True
4 05 1900-01-05 2 01-05 3 True
5 06 1900-01-06 5 01-06 1 False
6 01 1900-02-01 2 01-01 1 False
7 02 1900-02-02 2 01-02 2 True
8 03 1900-02-03 1 01-03 2 True
9 04 1900-02-04 4 01-04 3 False
10 05 1900-02-05 5 01-05 3 False
11 06 1900-02-06 6 01-06 1 False
12 01 1900-03-01 7 01-01 1 False
13 02 1900-03-02 1 01-02 2 True
14 03 1900-01-03 1 01-03 2 True
15 04 1900-01-04 2 01-04 3 True
16 05 1900-01-05 4 01-05 3 False
17 06 1900-01-06 5 01-06 1 False
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论