全外连接与小于/大于或等于条件的左连接。

huangapple go评论100阅读模式
英文:

Full left outer join with less/more than or equal conditions

问题

我有2个数据框,我想执行一个简单的全外连接左连接,条件是小于或等于。

  1. d1 = {'object': ['A1', 'B2', 'C1'],
  2. 'start_date': ['2019-01-19', '2009-07-03', '2021-11-23'],
  3. 'end_date': ['2019-03-15', '2009-10-28', '2024-12-22']
  4. }
  5. df1 = pandas.DataFrame(data=d1)
  6. 另一个数据框只是一个日历
  7. ```python
  8. df2 = pd.DataFrame({"date": pd.date_range('1990-01-01', '2050-12-31')})
  9. df["first_day_of_month"] = df["date"].to_numpy().astype('datetime64[M]')

我想要的结果类似于以下SQL:

  1. SELECT * FROM df1 LEFT JOIN df2
  2. ON df1.start_date <= df2.date AND df1.end_date >= df2.date
  3. WHERE df2.first_day_of_month = 1

因此,df1中的每一行都会与所有在df1.start_date和df2.end_date之间的月份的df2.first_day_of_month重复。

如何实现这个结果?

编辑:结果数据框将如下所示:

Object start_date end_date df2.date
A1 2019-01-19 2019-03-15 2019-02-01
A1 2019-01-19 2019-03-15 2019-03-01
B2 2009-07-03 2009-10-28 2009-08-01
B2 2009-07-03 2009-10-28 2009-09-01
B2 2009-07-03 2009-10-28 2009-10-01
  1. <details>
  2. <summary>英文:</summary>
  3. I have 2 dataframes that I want to perform a simple full outer left join with condition less than or equal.
  4. d1 = {&#39;object&#39;: [&#39;A1&#39;, &#39;B2&#39;, &#39;C1&#39;],
  5. &#39;start_date&#39;: [&#39;2019-01-19&#39;, &#39;2009-07-03&#39;, &#39;2021-11-23&#39;],
  6. &#39;end_date&#39;: [&#39;2019-03-15&#39;, &#39;2009-10-28&#39;, &#39;2024-12-22&#39;]
  7. }
  8. df1 = pandas.DataFrame(data=d1)
  9. The other dataframe is simply a calendar:
  10. df2 = pd.DataFrame({&quot;date&quot;: pd.date_range(&#39;1990-01-01&#39;, &#39;2050-12-31&#39;)})
  11. df[&quot;first_day_of_month&quot;] = df[&quot;date&quot;].to_numpy().astype(&#39;datetime64[M]&#39;)
  12. I would like to get results that are similar with this type of sql:
  13. SELECT * FROM df1 LEFT JOIN df2
  14. ON df1.start_date &lt;= df2.date AND df1.end_date &gt;= df2.date
  15. WHERE df2.first_day_of_month = 1
  16. So every row in df1 repeated with all the ```df2.first_day_of_month```
  17. of the months that are between ```df1.start_date``` and ```df2.end_date```.
  18. How can I achieve this result?
  19. Edit: the resulted dataframe would be like:
  20. | Object | start_date | end_date | df2.date |
  21. | -------- | -------------- |--------------|--------------|
  22. | A1 | 2019-01-19 | 2019-03-15 | 2019-02-01 |
  23. | A1 | 2019-01-19 | 2019-03-15 | 2019-03-01 |
  24. | B2 | 2009-07-03 | 2009-10-28 | 2009-08-01 |
  25. | B2 | 2009-07-03 | 2009-10-28 | 2009-09-01 |
  26. | B2 | 2009-07-03 | 2009-10-28 | 2009-10-01 |
  27. </details>
  28. # 答案1
  29. **得分**: 1
  30. 你可以使用 [`janitor`](https://pyjanitor-devs.github.io/pyjanitor) 的 [`conditional_join`](https://pyjanitor-devs.github.io/pyjanitor/api/functions/#janitor.functions.conditional_join) 方法:
  31. ```python
  32. # pip install janitor
  33. import janitor
  34. df1['start_date'] = pd.to_datetime(df1['start_date'])
  35. df1['end_date'] = pd.to_datetime(df1['end_date'])
  36. df2['date'] = pd.to_datetime(df2['date'])
  37. out = df1.conditional_join(df2[df2['first_day_of_month'].dt.day.eq(1)],
  38. ('start_date', 'date', '<='),
  39. ('end_date', 'date', '>='),
  40. how='left')

输出:

  1. object start_date end_date date first_day_of_month
  2. 0 A1 2019-01-19 2022-12-15 2019-01-19 2019-01-01
  3. 1 A1 2019-01-19 2022-12-15 2019-01-20 2019-01-01
  4. 2 A1 2019-01-19 2022-12-15 2019-01-21 2019-01-01
  5. 3 A1 2019-01-19 2022-12-15 2019-01-22 2019-01-01
  6. 4 A1 2019-01-19 2022-12-15 2019-01-23 2019-01-01
  7. ... ... ... ... ... ...
  8. 5435 C1 2021-11-23 2024-12-22 2024-12-18 2024-12-01
  9. 5436 C1 2021-11-23 2024-12-22 2024-12-19 2024-12-01
  10. 5437 C1 2021-11-23 2024-12-22 2024-12-20 2024-12-01
  11. 5438 C1 2021-11-23 2024-12-22 2024-12-21 2024-12-01
  12. 5439 C1 2021-11-23 2024-12-22 2024-12-22 2024-12-01
  13. [5440 rows x 5 columns]
英文:

You can use janitor's conditional_join:

  1. # pip install janitor
  2. import janitor
  3. df1[&#39;start_date&#39;] = pd.to_datetime(df1[&#39;start_date&#39;])
  4. df1[&#39;end_date&#39;] = pd.to_datetime(df1[&#39;end_date&#39;])
  5. df2[&#39;date&#39;] = pd.to_datetime(df2[&#39;date&#39;])
  6. out = df1.conditional_join(df2[df2[&#39;first_day_of_month&#39;].dt.day.eq(1)],
  7. (&#39;start_date&#39;, &#39;date&#39;, &#39;&lt;=&#39;),
  8. (&#39;end_date&#39;, &#39;date&#39;, &#39;&gt;=&#39;),
  9. how=&#39;left&#39;)

Output:

  1. object start_date end_date date first_day_of_month
  2. 0 A1 2019-01-19 2022-12-15 2019-01-19 2019-01-01
  3. 1 A1 2019-01-19 2022-12-15 2019-01-20 2019-01-01
  4. 2 A1 2019-01-19 2022-12-15 2019-01-21 2019-01-01
  5. 3 A1 2019-01-19 2022-12-15 2019-01-22 2019-01-01
  6. 4 A1 2019-01-19 2022-12-15 2019-01-23 2019-01-01
  7. ... ... ... ... ... ...
  8. 5435 C1 2021-11-23 2024-12-22 2024-12-18 2024-12-01
  9. 5436 C1 2021-11-23 2024-12-22 2024-12-19 2024-12-01
  10. 5437 C1 2021-11-23 2024-12-22 2024-12-20 2024-12-01
  11. 5438 C1 2021-11-23 2024-12-22 2024-12-21 2024-12-01
  12. 5439 C1 2021-11-23 2024-12-22 2024-12-22 2024-12-01
  13. [5440 rows x 5 columns]

答案2

得分: 0

以下是翻译好的代码部分:

  1. import pandas as pd
  2. import numpy as np
  3. d1 = {
  4. 'object': ['A1', 'B2', 'C1'],
  5. 'start_date': ['2019-01-19', '2009-07-03', '2021-11-23'],
  6. 'end_date': ['2022-12-15', '2017-05-28', '2024-12-22']
  7. }
  8. df1 = pd.DataFrame(data=d1)
  9. df2 = pd.DataFrame({"date": pd.date_range('1990-01-01', '2050-12-31')})
  10. df2_filtered = df2[df2["date"].dt.day == 1]
  11. # 创建一个用于连接的公共键
  12. df1['key'] = 1
  13. df2_filtered['key'] = 1
  14. # 执行交叉连接
  15. merged_df = pd.merge(df1, df2_filtered, on='key', how='outer')
  16. # 删除键列
  17. merged_df = merged_df.drop(columns='key')
  18. # 应用条件
  19. filtered_df = merged_df[
  20. (merged_df['start_date'] <= merged_df['date']) & (merged_df['end_date'] >= merged_df['date'])
  21. ]
  22. filtered_df
  23. # 重置索引
  24. filtered_df = filtered_df.reset_index(drop=True)
  25. # 打印结果数据框
  26. print(filtered_df)

希望这对你有所帮助。

英文:

Hello there you can use the following logic 全外连接与小于/大于或等于条件的左连接。

  1. import pandas as pd
  2. import numpy as np
  3. d1 = {
  4. &#39;object&#39;: [&#39;A1&#39;, &#39;B2&#39;, &#39;C1&#39;],
  5. &#39;start_date&#39;: [&#39;2019-01-19&#39;, &#39;2009-07-03&#39;, &#39;2021-11-23&#39;],
  6. &#39;end_date&#39;: [&#39;2022-12-15&#39;, &#39;2017-05-28&#39;, &#39;2024-12-22&#39;]
  7. }
  8. df1 = pd.DataFrame(data=d1)
  9. df2 = pd.DataFrame({&quot;date&quot;: pd.date_range(&#39;1990-01-01&#39;, &#39;2050-12-31&#39;)})
  10. df2_filtered = df2[df2[&quot;date&quot;].dt.day == 1]
  11. # Create a common key for joining
  12. df1[&#39;key&#39;] = 1
  13. df2_filtered[&#39;key&#39;] = 1
  14. # Perform the cross join
  15. merged_df = pd.merge(df1, df2_filtered, on=&#39;key&#39;, how=&#39;outer&#39;)
  16. # Drop the key column
  17. merged_df = merged_df.drop(columns=&#39;key&#39;)
  18. # Apply the conditions
  19. filtered_df = merged_df[
  20. (merged_df[&#39;start_date&#39;] &lt;= merged_df[&#39;date&#39;]) &amp; (merged_df[&#39;end_date&#39;] &gt;= merged_df[&#39;date&#39;])
  21. ]
  22. filtered_df
  23. # Reset the index
  24. filtered_df = filtered_df.reset_index(drop=True)
  25. # Print the resulting dataframe
  26. print(filtered_df)

全外连接与小于/大于或等于条件的左连接。

huangapple
  • 本文由 发表于 2023年5月25日 15:23:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/76329790.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定