July 23, 2023 23:21:07

Lookup a value in a matrix based on two variables of another dataframe using Python

Question

I am still a beginner in data analysis, especially in my new field of study. Currently, I have a Distance Matrix [7,000 rows x 7,000 columns] with a total of 49 million cells. I also have a Count Dataframe that contains some, but not all, of the indices from that matrix.

Distance Matrix

Zone	10001	10002	10003	10004
10001	0	3	50	40
10002	3	0	9	25
10003	50	9	0	1
10004	40	25	1	0

Count Dataframe

Origin	Destination	Counts
10001	10002	5
10001	10003	10
10002	10001	100
10002	10003	10

I need to look up the relevant distance value in the Distance Matrix for each (Origin, Destination) pair in the Count Dataframe, then multiply it by Counts to get the total distance, as in the following table. How can I do this in Python?

Total Distance Dataframe

Origin	Destination	Counts	Dist	Total_Dist
10001	10002	5	3	15
10001	10003	10	50	500
10002	10001	100	3	300
10002	10003	10	9	90

Answer 1

Score: 1

You can first create a function that looks up the distance value in the Distance Matrix based on the Origin and Destination values from the Counts DataFrame.

Then, apply this function to the Counts DataFrame to get the Distance values.

Finally, calculate the Total_Dist column by multiplying the Counts and Dist columns.

import pandas as pd
# Convert the data to DataFrames; distance_data and counts_data are assumed
# to hold the Distance Matrix and Count Dataframe from the question
distance_df = pd.DataFrame(distance_data).set_index('Zone')
counts_df = pd.DataFrame(counts_data)
# Look up the distance for one Origin/Destination pair
def get_distance(origin, destination):
    return distance_df.loc[origin, destination]
# Apply the function to each row of the Counts DataFrame to get the Dist column
counts_df['Dist'] = counts_df.apply(lambda row: get_distance(row['Origin'], row['Destination']), axis=1)
# Calculate the Total_Dist column
counts_df['Total_Dist'] = counts_df['Counts'] * counts_df['Dist']
# Display the final Total Distance DataFrame
print(counts_df)

Output:

   Origin  Destination  Counts  Dist  Total_Dist
0   10001        10002       5     3          15
1   10001        10003      10    50         500
2   10002        10001     100     3         300
3   10002        10003      10     9          90
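
The snippet above leaves distance_data and counts_data undefined. As a minimal sketch, the same lookup can be run end to end with the sample values from the question. The integer column labels are an assumption made here for illustration; a matrix loaded from a file may carry string labels instead, in which case the Destination values would need converting before the lookup.

```python
import pandas as pd

# Sample values copied from the question. Integer column labels are an
# assumption; real data read from CSV may use string labels.
distance_data = {
    "Zone": [10001, 10002, 10003, 10004],
    10001: [0, 3, 50, 40],
    10002: [3, 0, 9, 25],
    10003: [50, 9, 0, 1],
    10004: [40, 25, 1, 0],
}
counts_data = {
    "Origin": [10001, 10001, 10002, 10002],
    "Destination": [10002, 10003, 10001, 10003],
    "Counts": [5, 10, 100, 10],
}

# Zone becomes the row index so that .loc can address cells by label.
distance_df = pd.DataFrame(distance_data).set_index("Zone")
counts_df = pd.DataFrame(counts_data)

# Row-wise label lookup: row label = Origin, column label = Destination.
counts_df["Dist"] = counts_df.apply(
    lambda row: distance_df.loc[row["Origin"], row["Destination"]], axis=1
)
counts_df["Total_Dist"] = counts_df["Counts"] * counts_df["Dist"]
print(counts_df)
```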

Answer 2

Score: 0

You can set the "Zone" column as the index of the distance DataFrame (if it is not the index already) and then use .loc to locate the right elements:

# Set the index of df_dist to 'Zone' (skip this line if it is already the index):
df_dist.set_index('Zone', inplace=True)
# Look up each Origin (row label) / Destination (column label) pair
df_count['Dist'] = df_count.apply(lambda x: df_dist.loc[x['Origin'], x['Destination']], axis=1)
df_count['Total Dist'] = df_count['Dist'] * df_count['Counts']
print(df_count)

Prints:

   Origin  Destination  Counts  Dist  Total Dist
0   10001        10002       5     3          15
1   10001        10003      10    50         500
2   10002        10001     100     3         300
3   10002        10003      10     9          90
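
Row-wise apply calls .loc once per row, which can become slow when the count table is long and the matrix is 7,000 x 7,000. One possible alternative, not part of this answer, is to translate the labels to integer positions with Index.get_indexer and read all cells in a single NumPy indexing step. The sketch below assumes integer row and column labels and reuses the sample values from the question.

```python
import pandas as pd

# Sample matrix from the question, already indexed by Zone.
# Integer column labels are an assumption for this sketch.
zones = [10001, 10002, 10003, 10004]
df_dist = pd.DataFrame(
    [[0, 3, 50, 40], [3, 0, 9, 25], [50, 9, 0, 1], [40, 25, 1, 0]],
    index=pd.Index(zones, name="Zone"),
    columns=zones,
)
df_count = pd.DataFrame({
    "Origin": [10001, 10001, 10002, 10002],
    "Destination": [10002, 10003, 10001, 10003],
    "Counts": [5, 10, 100, 10],
})

# Translate labels to integer positions; get_indexer returns -1 for a
# label that does not exist in the index.
rows = df_dist.index.get_indexer(df_count["Origin"].to_numpy())
cols = df_dist.columns.get_indexer(df_count["Destination"].to_numpy())
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("Some Origin/Destination values are missing from the matrix")

# Pick one cell per (row, col) pair in a single vectorized step.
df_count["Dist"] = df_dist.to_numpy()[rows, cols]
df_count["Total Dist"] = df_count["Dist"] * df_count["Counts"]
print(df_count)
```

The explicit check for -1 matters here: without it, a missing zone would silently read the last row or column of the matrix instead of raising an error.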

Answer 3

Score: 0

If your only requirement is to use Python, then I would recommend pandas for large-scale work.

I would do it something like this:

import pandas as pd
# Declaration of the given matrix: Distance_Matrix
Distance_Matrix = {
    'Zone': [10001, 10002, 10003, 10004],
    '10001': [0, 3, 50, 40],
    '10002': [3, 0, 9, 25],
    '10003': [50, 9, 0, 1],
    '10004': [40, 25, 1, 0],
}
# Declaration of the given table: Count_Dataframe
Count_Dataframe = {
    'Origin': [10001, 10001, 10002, 10002],
    'Destination': [10002, 10003, 10001, 10003],
    'Counts': [5, 10, 100, 10],
}
# Convert both dictionaries into pandas DataFrames
dm = pd.DataFrame(Distance_Matrix)
cm = pd.DataFrame(Count_Dataframe)
# Function for selecting a value from "Distance_Matrix"
def get_distance_value(row):
    # origin_zone = the row, matched against the "Zone" column
    origin_zone = row['Origin']
    
    # destination_zone = the column (column labels are strings here)
    destination_zone = row['Destination']
    
    # Return the value located in Distance_Matrix at position [origin_zone, destination_zone]
    return dm.loc[dm['Zone'] == origin_zone, str(destination_zone)].values[0]
# Create the new table; it has the same base as Count_Dataframe, so start from a copy
Total_Distance_Dataframe_Counts_Dataframe = cm.copy()
# Create the new column "Distance" by applying the function to select values from "Distance_Matrix"
Total_Distance_Dataframe_Counts_Dataframe['Distance'] = Total_Distance_Dataframe_Counts_Dataframe.apply(get_distance_value, axis=1)
# Create the last column, the product of "Counts" * "Distance"
Total_Distance_Dataframe_Counts_Dataframe['Total_Dist'] = Total_Distance_Dataframe_Counts_Dataframe['Counts'] * Total_Distance_Dataframe_Counts_Dataframe['Distance']
# Print the new table
print(Total_Distance_Dataframe_Counts_Dataframe)

Simple explanation:
DataFrames are, in this case, your tables. I think of them as a fancy dictionary: just as with a dict, you reach the values by using a key, here the column name. With a DataFrame you get a complete column back that you can then work with.
Using .loc[] on a pandas DataFrame is comparable to a SELECT statement in SQL. By combining a selection condition with .loc[], you can search for values in your table or edit them.
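
To make the SELECT comparison concrete, here is a small sketch that reuses the dm table from the code above. The SQL in the comments is only a rough analogy, and dm_edited is a name introduced here for illustration.

```python
import pandas as pd

dm = pd.DataFrame({
    "Zone": [10001, 10002, 10003, 10004],
    "10001": [0, 3, 50, 40],
    "10002": [3, 0, 9, 25],
    "10003": [50, 9, 0, 1],
    "10004": [40, 25, 1, 0],
})

# Roughly: SELECT "10002" FROM dm WHERE Zone = 10001
# The boolean mask picks the rows, the column label picks the column.
selected = dm.loc[dm["Zone"] == 10001, "10002"]
print(selected.values[0])

# The same selector on the left-hand side edits the matching cells.
# Roughly: UPDATE dm SET "10002" = 4 WHERE Zone = 10001
dm_edited = dm.copy()  # work on a copy so the original stays unchanged
dm_edited.loc[dm_edited["Zone"] == 10001, "10002"] = 4
print(dm_edited)
```

The selection returns a Series even when only one row matches, which is why the answer's function takes .values[0] to get a single number.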
