2023-01-09 03:21:52

Comparing Two Dataframes to See if One Sits Inside the Range of the Other's Columns

Question

I have 2 dataframes as follows:

```python
import pandas as pd
import itertools

df_test = {
    "col1": [1, 5, 6, 5],
    "col2": [4, 5, 6, 7],
    "col3": [5, 9, 10, 11],
    "col4": [11, 10, 10, 4],
}
df_train = {
    "col1": [3, 6, 7, 4, 4],
    "col2": [5, 10, 5, 5, 5],
    "col3": [6, 8, 9, 10, 6],
    "col4": [11, 10, 8, 12, 6],
}
df_train = pd.DataFrame(data=df_train)
df_test = pd.DataFrame(data=df_test)
```

I have already found the min and max values of df_train and want to check whether df_test sits within them. I want to iterate through each row of df_test, switching the min and max to those of each column of the training set as I go, and compare each value in the row against that column's min and max. If every value fits, I insert True into a list; otherwise I insert False.

```python
# find min and max of each column (string names avoid passing the builtins to agg)
df_train_minMax = df_train.agg(["min", "max"])

# identify number of columns and rows in test data
columns = len(df_test.columns)
rows = len(df_test.index)
```

df_train_minMax looks like:

```
     col1  col2  col3  col4
min     3     5     6     6
max     7    10    10    12
```

I have made two lists: list receives a True or False value per row of df_test, and list2 receives the index locations of the False values. I then wrote a loop to iterate through df_test, although it's becoming very complicated to read. I am certain there is a better and simpler way of doing this.

```python
list = []
list2 = []

def inRange(min, max):
    x = 0
    # i = index, j = columns
    for i, j in itertools.product(range(rows), range(columns)):
        if df_test.iloc[i, j] >= min and df_test.iloc[i, j] <= max:
            x = x + 1
        else:
            x = x + 0
            list2.append((i, j))
        # print(x)
    if x == columns:
        list.insert(i, "True")
    else:
        list.insert(i, "False")

# calling function would need the values to continually change per min/max of each column
inRange(5, 10)

print(list)
print(list2)
```

Expected:

list = [False, True, True, False]
list2 = [(0, 0), (0, 1), (0, 3), (3, 2), (3, 3)] or similar

The function call should continually update the min and max values in accordance with the columns in df_train_minMax, which is the part I'm struggling to code.

Answer 1

Score: 1

Use mask to replace every value that satisfies the condition (greater than or equal to the minimum and less than or equal to the maximum) with NaN:

```python
>>> df_mask = df_test.mask(lambda x: (x >= 5) & (x <= 10))
>>> df_mask.values.tolist()
[[1.0, 4.0, nan, 11.0], [nan, nan, nan, nan], [nan, nan, nan, nan], [nan, nan, 11.0, 4.0]]
```

At this point, call isnull() and go through each row: if a row contains only True values, insert "True", because the entire row satisfies the condition of lying between the min and max (5 and 10).

```python
>>> min_num, max_num = 5, 10
>>> df_mask = df_test.mask(lambda x: (x >= min_num) & (x <= max_num))
>>> df_mask
   col1  col2  col3  col4
0   1.0   4.0   NaN  11.0
1   NaN   NaN   NaN   NaN
2   NaN   NaN   NaN   NaN
3   NaN   NaN  11.0   4.0
>>> df_mask = df_mask.isnull()
>>> df_mask
    col1   col2   col3   col4
0  False  False   True  False
1   True   True   True   True
2   True   True   True   True
3   True   True  False  False
>>> df_mask = df_mask.values.tolist()
>>> df_mask
[[False, False, True, False], [True, True, True, True], [True, True, True, True], [True, True, False, False]]
# ...
>>> list2 = ["True" if (len(set(i)) == 1 and list(set(i))[0] == True) else "False" for i in df_mask]
>>> list2
['False', 'True', 'True', 'False']
```
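The row-wise check above (a row counts only if every value in it is in range) can also be written without the NaN round trip. This is a minimal sketch, assuming the same fixed bounds of 5 and 10; the names `in_range` and `row_ok` are illustrative and not part of the original answer.

```python
import pandas as pd

df_test = pd.DataFrame({
    "col1": [1, 5, 6, 5],
    "col2": [4, 5, 6, 7],
    "col3": [5, 9, 10, 11],
    "col4": [11, 10, 10, 4],
})

min_num, max_num = 5, 10

# True where a value lies within [min_num, max_num], inclusive on both ends
in_range = (df_test >= min_num) & (df_test <= max_num)

# A row passes only if every one of its values is in range
row_ok = in_range.all(axis=1).tolist()
print(row_ok)  # [False, True, True, False]
```

This yields real booleans rather than the strings "True" and "False", which is usually easier to work with downstream.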

EDIT:

Added an inRange function and modified its arguments:

```python
def inRange(min_num, max_num, df):
    # True marks the values that satisfy min_num <= x <= max_num
    # (optional) create a column using the list values
    return df.mask(lambda x: (x >= min_num) & (x <= max_num)).isnull()

# DataFrame has no tolist(); transpose so that each item is one column's values
for col in df_train.T.values.tolist():
    min_num = min(col)
    max_num = max(col)
    df1 = inRange(min_num, max_num, df_test)
    print(["True" if (len(set(i)) == 1 and list(set(i))[0] == True) else "False" for i in df1.values.tolist()])
```
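The loop above applies one column's min and max to the whole of df_test on each pass. If the goal is to check each df_test column against the bounds of the matching df_train column, as the question describes, one possible approach is to let pandas align the bounds by column name. This is a sketch under that assumption, not the answerer's method; `col_min`, `col_max`, `rows_ok` and `out_of_range` are illustrative names.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({
    "col1": [1, 5, 6, 5],
    "col2": [4, 5, 6, 7],
    "col3": [5, 9, 10, 11],
    "col4": [11, 10, 10, 4],
})
df_train = pd.DataFrame({
    "col1": [3, 6, 7, 4, 4],
    "col2": [5, 10, 5, 5, 5],
    "col3": [6, 8, 9, 10, 6],
    "col4": [11, 10, 8, 12, 6],
})

# Per-column bounds from the training data (each is a Series indexed by column name)
col_min = df_train.min()
col_max = df_train.max()

# axis="columns" aligns the Series index with the DataFrame's column labels,
# so every column is compared against its own min and max
in_range = df_test.ge(col_min, axis="columns") & df_test.le(col_max, axis="columns")

# One boolean per row: True only if all values in the row are in range
rows_ok = in_range.all(axis=1).tolist()

# (row, column) positions of the out-of-range values, in row-major order
out_of_range = [(int(i), int(j)) for i, j in np.argwhere(~in_range.to_numpy())]

print(rows_ok)       # [False, True, True, False]
print(out_of_range)  # [(0, 0), (0, 1), (0, 2), (3, 2), (3, 3)]
```

With per-column bounds the first row fails on col1, col2 and col3, so the positions differ slightly from those obtained with the fixed range of 5 to 10, where the first row fails on col1, col2 and col4.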
