2023年7月3日 21:29:51go评论69阅读模式

英文:

Fill gaps in time intervals with other time intervals

问题

我们有两个带有时间间隔的表格。我想要填补df1中的间隙，使用df2，就像图中所示得到df3。df1保持不变，只有df2中与df1间隙（差异）重叠的部分被移到df3中。

import pandas as pd

df1 = pd.DataFrame({'Start': ['2023-01-01', '2023-02-01', '2023-03-15', '2023-04-18', '2023-05-15', '2023-05-25'],
                    'End': ['2023-01-15', '2023-02-20', '2023-04-01', '2023-05-03', '2023-05-20', '2023-05-30']})

df2 = pd.DataFrame({'Start': ['2023-01-02', '2023-01-05', '2023-01-20', '2023-02-25', '2023-03-05', '2023-04-18', '2023-05-12'],
                    'End': ['2023-01-03', '2023-01-10', '2023-02-10', '2023-03-01', '2023-04-15', '2023-05-10', '2023-06-05']})

df3 = pd.DataFrame({'Start': ['2023-01-01', '2023-01-20', '2023-02-01', '2023-02-25', '2023-03-05', '2023-03-15', '2023-04-02', '2023-04-18', '2023-05-04', '2023-05-12', '2023-05-15', '2023-05-21', '2023-05-25', '2023-05-31'],
                    'End': ['2023-01-15', '2023-01-31', '2023-02-20', '2023-03-01', '2023-03-14', '2023-04-01', '2023-04-15', '2023-05-03', '2023-05-10', '2023-05-14', '2023-05-20', '2023-05-24', '2023-05-30', '2023-06-05']})

# 你的代码部分
import plotly.express as px

df_plot = pd.concat(
    [
        df1.assign(color='df1', df='df1'),
        df2.assign(color='df2', df='df2'), 
        df3.assign(color=['df1', 'df2', 'df1', 'df2', 'df2', 'df1', 'df2', 'df1', 'df2', 'df2', 'df1', 'df2', 'df1', 'df2'], df='df3')
    ],
)
fig = px.timeline(df_plot, x_start="Start", x_end="End", y="df", color="color")
fig.update_yaxes(categoryorder='category descending')
fig.show()

英文:

We have two tables with time intervals. I want to fill gaps in df1 with df2 as in the graph to get df3. df1 is moved to df3 as it is, and only the parts of df2 that lie in the gaps of df1 (difference) are moved to df3.

df1 = pd.DataFrame({&#39;Start&#39;: [&#39;2023-01-01&#39;, &#39;2023-02-01&#39;, &#39;2023-03-15&#39;, &#39;2023-04-18&#39;, &#39;2023-05-15&#39;, &#39;2023-05-25&#39;],
                    &#39;End&#39;: [&#39;2023-01-15&#39;, &#39;2023-02-20&#39;, &#39;2023-04-01&#39;, &#39;2023-05-03&#39;,  &#39;2023-05-20&#39;, &#39;2023-05-30&#39;]})

df2 = pd.DataFrame({&#39;Start&#39;: [&#39;2023-01-02&#39;, &#39;2023-01-05&#39;, &#39;2023-01-20&#39;, &#39;2023-02-25&#39;, &#39;2023-03-05&#39;, &#39;2023-04-18&#39;, &#39;2023-05-12&#39;],
                    &#39;End&#39;: [&#39;2023-01-03&#39;, &#39;2023-01-10&#39;, &#39;2023-02-10&#39;, &#39;2023-03-01&#39;, &#39;2023-04-15&#39;, &#39;2023-05-10&#39;, &#39;2023-06-05&#39;]})

df3 =  pd.DataFrame({&#39;Start&#39;: [&#39;2023-01-01&#39;, &#39;2023-01-20&#39;, &#39;2023-02-01&#39;, &#39;2023-02-25&#39;, &#39;2023-03-05&#39;, &#39;2023-03-15&#39;, &#39;2023-04-02&#39;, &#39;2023-04-18&#39;, &#39;2023-05-04&#39;, &#39;2023-05-12&#39;, &#39;2023-05-15&#39;, &#39;2023-05-21&#39;, &#39;2023-05-25&#39;, &#39;2023-05-31&#39;],
                     &#39;End&#39;: [&#39;2023-01-15&#39;, &#39;2023-01-31&#39;, &#39;2023-02-20&#39;, &#39;2023-03-01&#39;, &#39;2023-03-14&#39;, &#39;2023-04-01&#39;, &#39;2023-04-15&#39;, &#39;2023-05-03&#39;, &#39;2023-05-10&#39;, &#39;2023-05-14&#39;, &#39;2023-05-20&#39;, &#39;2023-05-24&#39;, &#39;2023-05-30&#39;, &#39;2023-06-05&#39;]})

# df1

        Start         End
0  2023-01-01  2023-01-15
1  2023-02-01  2023-02-20
2  2023-03-15  2023-04-01
3  2023-04-18  2023-05-03
4  2023-05-15  2023-05-20
5  2023-05-25  2023-05-30

# df2

        Start         End
0  2023-01-02  2023-01-03
1  2023-01-05  2023-01-10
2  2023-01-20  2023-02-10
3  2023-02-25  2023-03-01
4  2023-03-05  2023-04-15
5  2023-04-18  2023-05-10
6  2023-05-12  2023-06-05

# df3 (desired result)

         Start         End
0   2023-01-01  2023-01-15
1   2023-01-20  2023-01-31
2   2023-02-01  2023-02-20
3   2023-02-25  2023-03-01
4   2023-03-05  2023-03-14
5   2023-03-15  2023-04-01
6   2023-04-02  2023-04-15
7   2023-04-18  2023-05-03
8   2023-05-04  2023-05-10
9   2023-05-12  2023-05-14
10  2023-05-15  2023-05-20
11  2023-05-21  2023-05-24
12  2023-05-25  2023-05-30
13  2023-05-31  2023-06-05

Code to generate plot:

import plotly.express as px

df_plot = pd.concat(
    [
        df1.assign(color=&#39;df1&#39;, df=&#39;df1&#39;),
        df2.assign(color=&#39;df2&#39;, df=&#39;df2&#39;), 
        df3.assign(color=[&#39;df1&#39;, &#39;df2&#39;, &#39;df1&#39;, &#39;df2&#39;, &#39;df2&#39;, &#39;df1&#39;, &#39;df2&#39;, &#39;df1&#39;, &#39;df2&#39;, &#39;df2&#39;, &#39;df1&#39;, &#39;df2&#39;, &#39;df1&#39;, &#39;df2&#39;], df=&#39;df3&#39;)
    ],
)
fig = px.timeline(df_plot, x_start=&quot;Start&quot;, x_end=&quot;End&quot;, y=&quot;df&quot;, color=&quot;color&quot;)
fig.update_yaxes(categoryorder=&#39;category descending&#39;)
fig.show()

答案1

得分: 3

我认为我可以让你接近：

df1 = pd.DataFrame({'开始': ['2023-01-01', '2023-02-01', '2023-03-15'],
                    '结束': ['2023-01-15', '2023-02-20', '2023-04-01']})
df2 = pd.DataFrame({'开始': ['2023-01-02', '2023-01-05', '2023-01-20', '2023-02-25', '2023-03-05'],
                    '结束': ['2023-01-03', '2023-01-10', '2023-02-10', '2023-03-01', '2023-04-15']})
df3 = pd.DataFrame({'开始': ['2023-01-01', '2023-01-20', '2023-02-01', '2023-02-25', '2023-03-05', '2023-03-15', '2023-04-02'],
                    '结束': ['2023-01-15', '2023-01-31', '2023-02-20', '2023-03-01', '2023-03-14', '2023-04-01', '2023-04-15']})


df1['日期'] = [pd.date_range(s,e) for s, e in zip(df1['开始'], df1['结束'])]    
df2['日期'] = [pd.date_range(s,e) for s, e in zip(df2['开始'], df2['结束'])]

df1e = df1.explode('日期').assign(来源='df1')
df2e = df2.explode('日期').assign(来源='df2')

df3e = df1e.set_index(df1e['日期']).combine_first(df2e.set_index(df2e['日期']))
df3e['日期'] = pd.to_datetime(df3e['日期'])

df3e['分组'] = ((df3e['来源'] != df3e['来源'].shift()) | 
                 (df3e['日期'] - df3e['日期'].shift() > pd.Timedelta(days=1))).cumsum()

df_out = df3e.groupby(['分组', '来源'])['日期'].agg([min, max])

英文:

I think I can get you close:

df1 = pd.DataFrame({&#39;Start&#39;: [&#39;2023-01-01&#39;, &#39;2023-02-01&#39;, &#39;2023-03-15&#39;],
                    &#39;End&#39;: [&#39;2023-01-15&#39;, &#39;2023-02-20&#39;, &#39;2023-04-01&#39;]})
df2 = pd.DataFrame({&#39;Start&#39;: [&#39;2023-01-02&#39;, &#39;2023-01-05&#39;, &#39;2023-01-20&#39;, &#39;2023-02-25&#39;, &#39;2023-03-05&#39;],
                    &#39;End&#39;: [&#39;2023-01-03&#39;, &#39;2023-01-10&#39;, &#39;2023-02-10&#39;, &#39;2023-03-01&#39;, &#39;2023-04-15&#39;]})
df3 =  pd.DataFrame({&#39;Start&#39;: [&#39;2023-01-01&#39;, &#39;2023-01-20&#39;, &#39;2023-02-01&#39;, &#39;2023-02-25&#39;, &#39;2023-03-05&#39;, &#39;2023-03-15&#39;, &#39;2023-04-02&#39;],
                     &#39;End&#39;: [&#39;2023-01-15&#39;, &#39;2023-01-31&#39;, &#39;2023-02-20&#39;, &#39;2023-03-01&#39;, &#39;2023-03-14&#39;, &#39;2023-04-01&#39;, &#39;2023-04-15&#39;]})


df1[&#39;dates&#39;] = [pd.date_range(s,e) for s, e in zip(df1[&#39;Start&#39;], df1[&#39;End&#39;])]    
df2[&#39;dates&#39;] = [pd.date_range(s,e) for s, e in zip(df2[&#39;Start&#39;], df2[&#39;End&#39;])]

df1e = df1.explode(&#39;dates&#39;).assign(source=&#39;df1&#39;)
df2e = df2.explode(&#39;dates&#39;).assign(source=&#39;df2&#39;)


df3e = df1e.set_index(df1e[&#39;dates&#39;]).combine_first(df2e.set_index(df2e[&#39;dates&#39;]))
df3e[&#39;dates&#39;] = pd.to_datetime(df3e[&#39;dates&#39;])

df3e[&#39;group&#39;] = ((df3e[&#39;source&#39;] != df3e[&#39;source&#39;].shift()) | 
                 (df3e[&#39;dates&#39;] - df3e[&#39;dates&#39;].shift() &gt; pd.Timedelta(days=1))).cumsum()

df_out = df3e.groupby([&#39;group&#39;, &#39;source&#39;])[&#39;dates&#39;].agg([min, max])

Output:

                    min        max
group source                      
1     df1    2023-01-01 2023-01-15
2     df2    2023-01-20 2023-01-31
3     df1    2023-02-01 2023-02-20
4     df2    2023-02-25 2023-03-01
5     df2    2023-03-05 2023-03-14
6     df1    2023-03-15 2023-04-01
7     df2    2023-04-02 2023-04-15

Graphical Output:

import plotly.express as px

df_out = df_out.reset_index().rename({&#39;source&#39;:&#39;color&#39;, &#39;min&#39;:&#39;Start&#39;, &#39;max&#39;:&#39;End&#39;}, axis=1)
df_plot = pd.concat(
    [
        df1.assign(color=&#39;df1&#39;),
        df2.assign(color=&#39;df2&#39;), 
        df_out
    ],
    keys=[&#39;df1&#39; , &#39;df2&#39;, &#39;df3&#39;]
).reset_index(level=0, names=&#39;df&#39;)
fig = px.timeline(df_plot, x_start=&quot;Start&quot;, x_end=&quot;End&quot;, y=&quot;df&quot;, color=&quot;color&quot;)
fig.update_yaxes(categoryorder=&#39;category descending&#39;)
fig.show()

Graph:

with updated dataset:

                    min        max
group source                      
1     df1    2023-01-01 2023-01-15
2     df2    2023-01-20 2023-01-31
3     df1    2023-02-01 2023-02-20
4     df2    2023-02-25 2023-03-01
5     df2    2023-03-05 2023-03-14
6     df1    2023-03-15 2023-04-01
7     df2    2023-04-02 2023-04-15
8     df1    2023-04-18 2023-05-03
9     df2    2023-05-04 2023-05-10
10    df2    2023-05-12 2023-05-14
11    df1    2023-05-15 2023-05-20
12    df2    2023-05-21 2023-05-24
13    df1    2023-05-25 2023-05-30
14    df2    2023-05-31 2023-06-05

Graph output:

答案2

得分: 2

以下是代码部分的翻译：

tmp = (pd.concat([df1, df2], keys=['ref', 'other']).rename_axis(index=['origin', 'index'])
         .sort_values(by=['Start', 'End']).astype('datetime64').reset_index()
         .assign(ref_next_start=lambda d: d['Start'].where(d['origin']=='ref').bfill(),
                 ref_prev_end=lambda d: d['End'].where(d['origin']=='ref').ffill(),
                 other_prev_end=lambda d: d['End'].where(d['origin']=='other').ffill(),
                )
      )

m = tmp['origin'].eq('ref')

out = pd.concat(
          [(tmp.loc[tmp['End'].ge(tmp['ref_prev_end'])]
               .assign(End=lambda d: d['End'].mask(d['origin'].shift(-1).eq('ref'),
                                                   d['ref_next_start']-pd.Timedelta("1d")))
           )[['origin', 'Start', 'End']],
           (tmp.loc[m & tmp['End'].le(tmp['other_prev_end']), ['End', 'other_prev_end']]
               .rename(columns={'End': 'Start', 'other_prev_end': 'End'})
               .assign(Start=lambda d: d['Start']+pd.Timedelta("1d"), origin='other')
            )]
).sort_values(by=['Start', 'End'], ignore_index=True)

tmp = pd.concat([df1, df2], keys=['df1', 'df2'], names=['origin']).reset_index(level=0)

tmp['Start'] = pd.to_datetime(tmp['Start'])
tmp['End'] = pd.to_datetime(tmp['End'])

out = (tmp
 .sort_values(by=['Start', 'End'], ignore_index=True)
 .assign(m=lambda d: d['origin'].eq('df1'),
         Start=lambda d: d['Start'].mask((~d['m']) & d['Start'].lt(s:=d['End'].where(out['m']).ffill()), s.add(pd.Timedelta('1day'))),
         End=lambda d: d['End'].mask((~d['m']) & d['End'].gt(s:=d['Start'].where(out['m']).bfill()), s.sub(pd.Timedelta('1day'))),
        )
  .query('Start<End').drop(columns='m')
)

希望这对你有帮助！

英文:

Full vectorial solution:

tmp = (pd.concat([df1, df2], keys=[&#39;ref&#39;, &#39;other&#39;]).rename_axis(index=[&#39;origin&#39;, &#39;index&#39;])
         .sort_values(by=[&#39;Start&#39;, &#39;End&#39;]).astype(&#39;datetime64&#39;).reset_index()
         .assign(ref_next_start=lambda d: d[&#39;Start&#39;].where(d[&#39;origin&#39;]==&#39;ref&#39;).bfill(),
                 ref_prev_end=lambda d: d[&#39;End&#39;].where(d[&#39;origin&#39;]==&#39;ref&#39;).ffill(),
                 other_prev_end=lambda d: d[&#39;End&#39;].where(d[&#39;origin&#39;]==&#39;other&#39;).ffill(),
                )
      )

m = tmp[&#39;origin&#39;].eq(&#39;ref&#39;)

out = pd.concat(
          [(tmp.loc[tmp[&#39;End&#39;].ge(tmp[&#39;ref_prev_end&#39;])]
               .assign(End=lambda d: d[&#39;End&#39;].mask(d[&#39;origin&#39;].shift(-1).eq(&#39;ref&#39;),
                                                   d[&#39;ref_next_start&#39;]-pd.Timedelta(&quot;1d&quot;)))
           )[[&#39;origin&#39;, &#39;Start&#39;, &#39;End&#39;]],
           (tmp.loc[m &amp; tmp[&#39;End&#39;].le(tmp[&#39;other_prev_end&#39;]), [&#39;End&#39;, &#39;other_prev_end&#39;]]
               .rename(columns={&#39;End&#39;: &#39;Start&#39;, &#39;other_prev_end&#39;: &#39;End&#39;})
               .assign(Start=lambda d: d[&#39;Start&#39;]+pd.Timedelta(&quot;1d&quot;), origin=&#39;other&#39;)
            )]
).sort_values(by=[&#39;Start&#39;, &#39;End&#39;], ignore_index=True)

Output:

  origin      Start        End
0    ref 2023-01-01 2023-01-15
1  other 2023-01-20 2023-01-31
2    ref 2023-02-01 2023-02-20
3  other 2023-02-25 2023-03-01
4  other 2023-03-05 2023-03-14
5    ref 2023-03-15 2023-04-01
6  other 2023-04-02 2023-04-15

Previous temporary solution that also doesn't compute the last interval, it should be relatively efficient:

tmp = pd.concat([df1, df2], keys=[&#39;df1&#39;, &#39;df2&#39;], names=[&#39;origin&#39;]).reset_index(level=0)

tmp[&#39;Start&#39;] = pd.to_datetime(tmp[&#39;Start&#39;])
tmp[&#39;End&#39;] = pd.to_datetime(tmp[&#39;End&#39;])

out = (tmp
 .sort_values(by=[&#39;Start&#39;, &#39;End&#39;], ignore_index=True)
 .assign(m=lambda d: d[&#39;origin&#39;].eq(&#39;df1&#39;),
         Start=lambda d: d[&#39;Start&#39;].mask((~d[&#39;m&#39;]) &amp; d[&#39;Start&#39;].lt(s:=d[&#39;End&#39;].where(out[&#39;m&#39;]).ffill()), s.add(pd.Timedelta(&#39;1day&#39;))),
         End=lambda d: d[&#39;End&#39;].mask((~d[&#39;m&#39;]) &amp; d[&#39;End&#39;].gt(s:=d[&#39;Start&#39;].where(out[&#39;m&#39;]).bfill()), s.sub(pd.Timedelta(&#39;1day&#39;))),
        )
  .query(&#39;Start&lt;End&#39;).drop(columns=&#39;m&#39;)
)

Output:

  origin      Start        End
0    df1 2023-01-01 2023-01-15
3    df2 2023-01-20 2023-01-31
4    df1 2023-02-01 2023-02-20
5    df2 2023-02-25 2023-03-01
6    df2 2023-03-05 2023-03-14
7    df1 2023-03-15 2023-04-01

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

填补时间间隔中的空白部分与其他时间间隔。

问题

答案1

答案2

你可以使用Python的`readlines`函数来按特定模式格式化文件中的行。

Conditionally modify import statements.

如何从逻辑表达式创建一个树形结构？

避免在Pandas数据框中使用for循环遍历列值，而是使用函数。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论