2023-06-29 17:14:09

How to remove duplicates within a time interval

Question

I have two pandas DataFrames, for example:

import pandas as pd

df1 = pd.DataFrame({
    'IN': ['2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01'],
    'OUT': ['2023-01-10', '2023-02-10', '2023-03-10', '2023-04-10'],
    'Ticker': ['AAPL', 'AAPL', 'GOOG', 'GOOG']
})
df2 = pd.DataFrame({
    'IN': ['2023-01-05', '2023-05-01', '2023-02-05', '2023-05-01'],
    'OUT': ['2023-01-15', '2023-05-15', '2023-02-15', '2023-05-15'],
    'Ticker': ['AAPL', 'GOOG', 'MSFT', 'XXXX']
})

The question is how to remove from df2 (or collect the index of, for a later drop) the records that are already covered by df1 (think of df1 as open trades), that is, records for the same ticker that fall within a df1 IN-OUT interval.

For example, the first row in df1 is an AAPL trade from 2023-01-01 to 2023-01-10, so the first row in df2 must be removed because its interval, 2023-01-05 to 2023-01-15, overlaps it. The second row in df2 must be kept.
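The rule in this example can be written as a small predicate. This is only a sketch: it assumes that "already in df1" means the two closed intervals overlap, and `intervals_overlap` is a hypothetical helper name, not something from pandas.

```python
import pandas as pd


def intervals_overlap(in_a, out_a, in_b, out_b):
    """Closed intervals overlap when each one starts no later than the other ends."""
    return in_a <= out_b and in_b <= out_a


ts = pd.Timestamp

# df1 AAPL trade vs. the first df2 row (AAPL): overlapping, so the df2 row goes
aapl_overlaps = intervals_overlap(ts('2023-01-01'), ts('2023-01-10'),
                                  ts('2023-01-05'), ts('2023-01-15'))

# df1 GOOG trade vs. the second df2 row (GOOG): disjoint, so the df2 row stays
goog_overlaps = intervals_overlap(ts('2023-03-01'), ts('2023-03-10'),
                                  ts('2023-05-01'), ts('2023-05-15'))

print(aapl_overlaps, goog_overlaps)
```

If the intended rule is instead "the df2 IN date lies inside a df1 interval", the test would become `in_a <= in_b <= out_a`; both readings agree on the sample data.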

Is there a simple way to do this without iterating?

I have tried something like:

df1 = df1[~((df1['Ticker'].isin(df2['Ticker'])) & (df1['IN'].between(df2['OUT'], df2['OUT'])))]

but it did not give the right result, and it does not work when the two DataFrames have different numbers of rows.
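A short sketch of why this attempt goes wrong, using made-up Series that mirror the columns above. `isin` only checks whether the ticker appears anywhere in the other frame, without pairing it with any dates, and `between` compares row i with row i, so it needs identically labeled Series.

```python
import pandas as pd

tickers_1 = pd.Series(['AAPL', 'AAPL', 'GOOG', 'GOOG'])
tickers_2 = pd.Series(['AAPL', 'GOOG', 'MSFT', 'XXXX'])

# isin ignores dates entirely: every AAPL/GOOG row is flagged
ticker_hits = tickers_1.isin(tickers_2)

in_dates = pd.Series(['2023-01-01', '2023-02-01', '2023-03-01'])  # 3 rows
out_dates = pd.Series(['2023-01-15', '2023-05-15'])               # 2 rows

# between compares element-wise by label, so differently sized Series fail
try:
    in_dates.between(out_dates, out_dates)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ticker_hits.tolist())
print(error_message)
```

Comparing each df2 row against every df1 row of the same ticker therefore needs some kind of pairing step, such as a merge on Ticker.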

Answer 1

Score: 0

You can use merge to pair rows that share a ticker, then use query to select the df2 rows whose interval overlaps a df1 interval; those are the rows to drop:

idx_to_drop = (df2.reset_index().merge(df1, on='Ticker')
                  .query('(IN_x <= OUT_y) & (IN_y <= OUT_x)')['index'].tolist())
out = df2.drop(idx_to_drop)

Output:

>>> out
          IN        OUT Ticker
1 2023-05-01 2023-05-15   GOOG
2 2023-02-05 2023-02-15   MSFT
3 2023-05-01 2023-05-15   XXXX

Intermediate step:

>>> df2.reset_index().merge(df1, on='Ticker')
   index       IN_x      OUT_x Ticker       IN_y      OUT_y
0      0 2023-01-05 2023-01-15   AAPL 2023-01-01 2023-01-10
1      0 2023-01-05 2023-01-15   AAPL 2023-02-01 2023-02-10
2      1 2023-05-01 2023-05-15   GOOG 2023-03-01 2023-03-10
3      1 2023-05-01 2023-05-15   GOOG 2023-04-01 2023-04-10
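For reuse, the merge approach can be wrapped in a function. The following is a sketch under a few assumptions: "duplicate" means an overlapping closed interval for the same ticker, the date columns are converted to datetimes before comparing, and `drop_overlapping` and `row_id` are hypothetical names chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'IN': ['2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01'],
    'OUT': ['2023-01-10', '2023-02-10', '2023-03-10', '2023-04-10'],
    'Ticker': ['AAPL', 'AAPL', 'GOOG', 'GOOG']
})
df2 = pd.DataFrame({
    'IN': ['2023-01-05', '2023-05-01', '2023-02-05', '2023-05-01'],
    'OUT': ['2023-01-15', '2023-05-15', '2023-02-15', '2023-05-15'],
    'Ticker': ['AAPL', 'GOOG', 'MSFT', 'XXXX']
})


def drop_overlapping(new_trades, open_trades):
    """Return new_trades without rows that overlap a same-ticker row of open_trades."""
    # Keep the original index as a column so it survives the merge
    left = new_trades.rename_axis('row_id').reset_index()
    # Inner merge: every new trade paired with every open trade of the same ticker
    pairs = left.merge(open_trades, on='Ticker', suffixes=('_new', '_open'))
    for col in ('IN_new', 'OUT_new', 'IN_open', 'OUT_open'):
        pairs[col] = pd.to_datetime(pairs[col])
    # Closed intervals overlap when each starts no later than the other ends
    overlaps = (pairs['IN_new'] <= pairs['OUT_open']) & (pairs['IN_open'] <= pairs['OUT_new'])
    to_drop = pairs.loc[overlaps, 'row_id'].unique()
    return new_trades.drop(index=to_drop)


out = drop_overlapping(df2, df1)
print(out)
```

On the sample data this keeps rows 1, 2 and 3 of df2. Because the pairing happens in the merge, the two frames do not need the same number of rows; note that the merge produces one row per same-ticker pair, so it can grow large when a ticker has many trades on both sides.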
