How to remove duplicates within a time interval
Question
I have two pandas DataFrames, for example:
import pandas as pd
df1 = pd.DataFrame({
'IN': ['2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01'],
'OUT': ['2023-01-10', '2023-02-10', '2023-03-10', '2023-04-10'],
'Ticker': ['AAPL', 'AAPL', 'GOOG', 'GOOG']
})
df2 = pd.DataFrame({
'IN': ['2023-01-05', '2023-05-01', '2023-02-05', '2023-05-01'],
'OUT': ['2023-01-15', '2023-05-15', '2023-02-15', '2023-05-15'],
'Ticker': ['AAPL', 'GOOG', 'MSFT', 'XXXX']
})
The question is how to remove from df2 (or collect the indices for a later drop) those records that are already in df1 (think of them as open trades) within the IN-OUT interval.
For example, the first row in df1 is an AAPL trade from 2023-01-01 to 2023-01-10, so the first trade in df2 must be removed because its interval is 2023-01-05 to 2023-01-15. The second row of df2 must be kept.
Is there a simple way to do this without iterating?
I have tried something like:
df1 = df1[~((df1['Ticker'].isin(df2['Ticker'])) & (df1['IN'].between(df2['OUT'], df2['OUT'])))]
but it did not give the right result, and besides, it does not work if the two DataFrames have different numbers of rows.
Answer 1
Score: 0
You can use merge to match tickers between the DataFrames, then use query to select the rows you want to drop:
idx_to_drop = (df2.reset_index().merge(df1, on='Ticker')
.query('(IN_y > IN_x)')['index'].tolist())
out = df2.drop(idx_to_drop)
Output:
>>> out
           IN         OUT Ticker
1  2023-05-01  2023-05-15   GOOG
2  2023-02-05  2023-02-15   MSFT
3  2023-05-01  2023-05-15   XXXX
Intermediate step:
>>> df2.reset_index().merge(df1, on='Ticker')
   index        IN_x       OUT_x Ticker        IN_y       OUT_y
0      0  2023-01-05  2023-01-15   AAPL  2023-01-01  2023-01-10
1      0  2023-01-05  2023-01-15   AAPL  2023-02-01  2023-02-10
2      1  2023-05-01  2023-05-15   GOOG  2023-03-01  2023-03-10
3      1  2023-05-01  2023-05-15   GOOG  2023-04-01  2023-04-10
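Note that the condition IN_y > IN_x compares only the two start dates. In the intermediate table it is the second AAPL row (the February trade in df1) that satisfies it, not the January trade that actually overlaps. If the intended rule is "drop a df2 trade when it starts while a df1 trade on the same ticker is still open", which is an assumption about what the question means, a minimal sketch could look like the following. The names open_trades, candidates, pairs and starts_inside and the suffixes _new/_open are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'IN': ['2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01'],
    'OUT': ['2023-01-10', '2023-02-10', '2023-03-10', '2023-04-10'],
    'Ticker': ['AAPL', 'AAPL', 'GOOG', 'GOOG'],
})
df2 = pd.DataFrame({
    'IN': ['2023-01-05', '2023-05-01', '2023-02-05', '2023-05-01'],
    'OUT': ['2023-01-15', '2023-05-15', '2023-02-15', '2023-05-15'],
    'Ticker': ['AAPL', 'GOOG', 'MSFT', 'XXXX'],
})

# Parse the date strings so comparisons are made on real timestamps.
open_trades = df1.assign(IN=pd.to_datetime(df1['IN']), OUT=pd.to_datetime(df1['OUT']))
candidates = df2.assign(IN=pd.to_datetime(df2['IN']), OUT=pd.to_datetime(df2['OUT']))

# Pair every df2 row with every df1 row of the same ticker,
# keeping the original df2 index in the 'index' column.
pairs = candidates.reset_index().merge(
    open_trades, on='Ticker', suffixes=('_new', '_open')
)

# Assumed rule: a df2 trade is a duplicate if its IN date lies
# inside the IN-OUT window of a df1 trade (bounds inclusive).
starts_inside = (pairs['IN_new'] >= pairs['IN_open']) & (pairs['IN_new'] <= pairs['OUT_open'])

idx_to_drop = pairs.loc[starts_inside, 'index'].unique()
out = df2.drop(idx_to_drop)
print(out)
```

On the sample data this also drops only row 0 of df2, but here it is the January AAPL trade that triggers the drop. If any overlap should count rather than just the start date, the condition could be changed to (IN_new <= OUT_open) & (OUT_new >= IN_open). The merge builds one row per matching pair of trades, so it can grow large when a ticker has many rows in both frames.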