May 24, 2023, 20:17:43

How to find indices/rows where a combination of columns is equal to columns in another dataframe

Question

I have two dataframes. The first has multiple rows with the same values, and the second has only one row for each combination of columns. I want to get the indices/rows of the first dataframe where columns A, B and C are pairwise equal in the two dataframes and where column D in the first dataframe is equal to n. Columns A, B and C are a subset of all columns, and I want to specify explicitly that only those columns have to be equal.
With those indices I want to change the value of column E (simply using df.loc[indices, 'E'] = 'n').

The first dataframe is the one with multiple entries for the same values. The original dataframe has date stamps, so it doesn't contain completely identical rows.

   A B C D E
0  a b c y y
1  a b c y y
2  b n m n y
3  b n m n y
4  b n m n y
5  t u j y y
6  t u j y y
7  t u j y y
8  e t y y y
9  e t y y y

The second dataframe has only one row per entry, does not contain columns D and E, and is a subset of the entries in the first dataframe.

   A B C
0  a b c
1  b n m
2  t u j
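For anyone who wants to experiment, the two sample frames shown above could be rebuilt roughly as follows. This is only a sketch of the sample data; the variable names df1 and df2 are taken from the answers, and the compact string-to-list construction is just a convenience, not something from the original post.

```python
import pandas as pd

# First dataframe: repeated (A, B, C) combinations plus columns D and E.
df1 = pd.DataFrame({
    "A": list("aabbbtttee"),
    "B": list("bbnnnuuutt"),
    "C": list("ccmmmjjjyy"),
    "D": list("yynnnyyyyy"),
    "E": list("y" * 10),
})

# Second dataframe: one row per (A, B, C) combination, no D or E.
df2 = pd.DataFrame({
    "A": ["a", "b", "t"],
    "B": ["b", "n", "u"],
    "C": ["c", "m", "j"],
})
```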

In this example the desired output is the list of indices where all conditions are fulfilled.

[2,3,4]

I've tried merge; while that does give me the right rows, I don't get the right indices to use for updating the first dataframe. I've also tried a MultiIndex on the three columns, but then I'm missing the filter on column D.

How can I achieve this?

Answer 1

Score: 2

One way to get the list of indexes is to merge the two dataframes, adding the index as a column to the first, and assigning 'n' to the D column in the second so that only rows which match on A, B and C and have 'n' in D will match. Then you can take the index column from the first as your result:

indices = (df1
    .reset_index(names='index')
    .merge(df2.assign(D='n'), on=['A', 'B', 'C', 'D'])['index']
    .to_list())

Output (for your sample data):

[2, 3, 4]

Note that to_list() is not strictly necessary; the following code works without it.

You can then use the indices to set E to 'n':

df1.loc[indices, 'E'] = 'n'

Output:

   A  B  C  D  E
0  a  b  c  y  y
1  a  b  c  y  y
2  b  n  m  n  n
3  b  n  m  n  n
4  b  n  m  n  n
5  t  u  j  y  y
6  t  u  j  y  y
7  t  u  j  y  y
8  e  t  y  y  y
9  e  t  y  y  y
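To see why the index is carried along as a column, here is a self-contained sketch of the same merge idea on a frame whose index is not 0..n-1. The labels 100..109 are hypothetical and chosen only to make the point visible. The sketch uses plain reset_index(), which names the new column "index" when the index itself is unnamed, so it does not depend on the names= parameter.

```python
import pandas as pd

# Sample data from the question, but with hypothetical non-default labels.
df1 = pd.DataFrame({
    "A": list("aabbbtttee"),
    "B": list("bbnnnuuutt"),
    "C": list("ccmmmjjjyy"),
    "D": list("yynnnyyyyy"),
    "E": list("y" * 10),
}, index=range(100, 110))
df2 = pd.DataFrame({"A": list("abt"), "B": list("bnu"), "C": list("cmj")})

keys = ["A", "B", "C"]
indices = (
    df1.reset_index()                               # labels become column "index"
       .merge(df2[keys].assign(D="n"), on=keys + ["D"])  # inner join on A, B, C, D
       ["index"]
       .to_list()
)

# The recovered labels can be used directly with .loc on the original frame.
df1.loc[indices, "E"] = "n"
```

Because the merge result gets a fresh 0..n-1 index, reading the labels from the carried-over column is what keeps them valid for df1.loc.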

Answer 2

Score: 1

Assuming df1 and df2, you can use boolean indexing:

# are the common columns (A/B/C) also in df2?
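# (merge returns a fresh RangeIndex, so this assumes df1 has the default index)
m1 = df1.merge(df2.assign(mask=True), how='left')['mask'].fillna(False).astype(bool)
# or
# (np.isin tests each value on its own against all values in df2, not the
# A/B/C combination as a unit, so mixed rows can match by accident)
import numpy as np
m1 = np.isin(df1[df2.columns], df2).all(1)
# is "D" a "n"?
m2 = df1['D'].eq('n')
# if both conditions are True, update E
df1.loc[m1&m2, 'E'] = 'n'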

Output:

    A  B  C  D  E
0   a  b  c  y  y
1   a  b  c  y  y
2   b  n  m  n  n
3   b  n  m  n  n
4   b  n  m  n  n
5   t  u  j  y  y
6   t  u  j  y  y
7   t  u  j  y  y
8   e  t  y  y  y
9   e  t  y  y  y
10  e  t  y  n  y  # not updated

Target indices:

df1.index[m1&m2]
# Index([2, 3, 4], dtype='int64')
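If the (A, B, C) combination has to match as a unit, one possible variant of the boolean mask compares whole tuples through a MultiIndex. This is a sketch rather than part of the answer above. The extra last row (a, n, j, n) is a made-up example: each of its values appears somewhere in df2, but the combination does not.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": list("aabbba"),
    "B": list("bbnnnn"),
    "C": list("ccmmmj"),
    "D": list("yynnnn"),
    "E": list("yyyyyy"),
})
df2 = pd.DataFrame({"A": list("abt"), "B": list("bnu"), "C": list("cmj")})

keys = ["A", "B", "C"]

# Row-wise membership: is each (A, B, C) tuple of df1 present in df2?
wanted = list(df2[keys].itertuples(index=False, name=None))
in_df2 = pd.MultiIndex.from_frame(df1[keys]).isin(wanted)

# Combine with the filter on D; both sides are plain boolean arrays here.
mask = in_df2 & df1["D"].eq("n").to_numpy()

target = df1.index[mask].tolist()
df1.loc[mask, "E"] = "n"
```

Working with NumPy boolean arrays sidesteps index alignment, so the mask applies positionally whatever labels df1 carries.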
