Look up a value in a matrix based on two variables of another dataframe using Python
Question
I am still a beginner in data analysis, especially in my new field of study. Currently, I have a Distance Matrix [7,000 rows x 7,000 columns] with a total of 49 M cells. I also have a Count Dataframe that contains some of the indices from that matrix, but not all of them.
Distance Matrix
Zone | 10001 | 10002 | 10003 | 10004 |
---|---|---|---|---|
10001 | 0 | 3 | 50 | 40 |
10002 | 3 | 0 | 9 | 25 |
10003 | 50 | 9 | 0 | 1 |
10004 | 40 | 25 | 1 | 0 |
Count Dataframe
Origin | Destination | Counts |
---|---|---|
10001 | 10002 | 5 |
10001 | 10003 | 10 |
10002 | 10001 | 100 |
10002 | 10003 | 10 |
So I need to look up the relevant distance value in the Distance Matrix for each `i` (Origin) and `j` (Destination) pair in the Count Dataframe and use it to calculate the total distance, as in the following table, using Python.
Total Distance Dataframe (the Count Dataframe with two added columns)
Origin | Destination | Counts | Dist | Total_Dist |
---|---|---|---|---|
10001 | 10002 | 5 | 3 | 15 |
10001 | 10003 | 10 | 50 | 500 |
10002 | 10001 | 100 | 3 | 300 |
10002 | 10003 | 10 | 9 | 90 |
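Because the real matrix is 7,000 x 7,000, a row-by-row lookup may be slow when the Count Dataframe is long. Below is a minimal sketch of one vectorized alternative (not taken from the answers), assuming the matrix has unique integer zone IDs on both axes: `Index.get_indexer` converts the IDs to positions, and NumPy fancy indexing reads all the distances at once. The sample values are the ones from the tables above.

```python
import numpy as np
import pandas as pd

# Distance matrix from the question, with zone IDs as both index and columns (assumption)
zones = [10001, 10002, 10003, 10004]
distance_df = pd.DataFrame(
    [[0, 3, 50, 40],
     [3, 0, 9, 25],
     [50, 9, 0, 1],
     [40, 25, 1, 0]],
    index=zones, columns=zones,
)

counts_df = pd.DataFrame({
    'Origin': [10001, 10001, 10002, 10002],
    'Destination': [10002, 10003, 10001, 10003],
    'Counts': [5, 10, 100, 10],
})

# Translate zone IDs into row/column positions; get_indexer returns -1 for unknown IDs
rows = distance_df.index.get_indexer(counts_df['Origin'])
cols = distance_df.columns.get_indexer(counts_df['Destination'])
if (rows < 0).any() or (cols < 0).any():
    raise KeyError('some Origin/Destination IDs are missing from the matrix')

# Fancy indexing picks one cell per (row, col) pair from the underlying NumPy array
counts_df['Dist'] = distance_df.to_numpy()[rows, cols]
counts_df['Total_Dist'] = counts_df['Counts'] * counts_df['Dist']
print(counts_df)
```

The check for `-1` matters because the Count Dataframe only covers some zones: an ID absent from the matrix would otherwise get position `-1`, which NumPy silently treats as the last row or column.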
Answer 1
Score: 1
You can first create a function that looks up the distance value in the Distance Matrix based on the Origin and Destination values from the Counts DataFrame.
Then, apply this function to the Counts DataFrame to get the Dist values.
Finally, calculate the Total_Dist column by multiplying the Counts and Dist columns.
```python
import pandas as pd

# Convert data to DataFrames (distance_data and counts_data hold the two tables from the question)
distance_df = pd.DataFrame(distance_data).set_index('Zone')
# If the column labels are strings (e.g. read from a CSV), make them ints to match Origin/Destination
distance_df.columns = distance_df.columns.astype(int)
counts_df = pd.DataFrame(counts_data)
# Function to look up distance value based on Origin and Destination
def get_distance(origin, destination):
    return distance_df.loc[origin, destination]
# Apply the function to the Counts DataFrame to get Dist column
counts_df['Dist'] = counts_df.apply(lambda row: get_distance(row['Origin'], row['Destination']), axis=1)
# Calculate the Total_Dist column
counts_df['Total_Dist'] = counts_df['Counts'] * counts_df['Dist']
# Display the final Total Distance DataFrame
print(counts_df)
```
Output:
```
Origin Destination Counts Dist Total_Dist
0 10001 10002 5 3 15
1 10001 10003 10 50 500
2 10002 10001 100 3 300
3 10002 10003 10 9 90
```
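The first two answers index the matrix columns with integer zone IDs. If the matrix is loaded from a file, that may not hold. A minimal sketch, assuming a CSV export with a `Zone` column (the file content here is hypothetical and trimmed to two zones), shows that `pd.read_csv` keeps header labels as strings while the index values are parsed as integers, which is also why Answer 3 converts the destination with `str()`.

```python
import io
import pandas as pd

# Hypothetical CSV export of the distance matrix (first two zones only)
csv_text = """Zone,10001,10002
10001,0,3
10002,3,0
"""

distance_df = pd.read_csv(io.StringIO(csv_text), index_col='Zone')
# Header cells become string labels, while the Zone values are parsed as integers
print(distance_df.columns.tolist())   # ['10001', '10002']
print(distance_df.index.tolist())     # [10001, 10002]

# distance_df.loc[10001, 10002] would raise KeyError here, so convert the column labels
distance_df.columns = distance_df.columns.astype(int)
print(distance_df.loc[10001, 10002])  # 3
```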
Answer 2
Score: 0
You can set the index of the distance DataFrame (`df_dist`) to the "Zone" column (if it is not already) and then use `.loc` to locate the right elements:
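```python
# Set the index of df_dist to the 'Zone' column (if not already):
df_dist.set_index('Zone', inplace=True)
# If the column labels are strings (e.g. read from a CSV), make them ints to match Origin/Destination
df_dist.columns = df_dist.columns.astype(int)
df_count['Dist'] = df_count.apply(lambda x: df_dist.loc[x['Origin'], x['Destination']], axis=1)
df_count['Total Dist'] = df_count['Dist'] * df_count['Counts']
print(df_count)
```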
Prints:
```
Origin Destination Counts Dist Total Dist
0 10001 10002 5 3 15
1 10001 10003 10 50 500
2 10002 10001 100 3 300
3 10002 10003 10 9 90
```
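Instead of calling `.loc` once per row through `apply`, the matrix can also be reshaped into long format and joined. Below is a minimal sketch of that swapped-in technique (`DataFrame.stack` plus `DataFrame.merge`, not part of the answer above), assuming the matrix's index and columns are integer zone IDs, named `Origin` and `Destination` here so the join keys line up.

```python
import pandas as pd

zones = [10001, 10002, 10003, 10004]
df_dist = pd.DataFrame(
    [[0, 3, 50, 40],
     [3, 0, 9, 25],
     [50, 9, 0, 1],
     [40, 25, 1, 0]],
    index=pd.Index(zones, name='Origin'),
    columns=pd.Index(zones, name='Destination'),
)
df_count = pd.DataFrame({
    'Origin': [10001, 10001, 10002, 10002],
    'Destination': [10002, 10003, 10001, 10003],
    'Counts': [5, 10, 100, 10],
})

# Reshape the square matrix into long format: one row per (Origin, Destination) pair
long_dist = df_dist.stack().rename('Dist').reset_index()

# A left join keeps every row of df_count (in order) and attaches the matching distance
result = df_count.merge(long_dist, on=['Origin', 'Destination'], how='left')
result['Total Dist'] = result['Dist'] * result['Counts']
print(result)
```

With `how='left'`, a pair missing from the matrix gets `NaN` in `Dist` instead of raising. Stacking the full 7,000 x 7,000 matrix produces about 49 million rows, so memory use may be worth checking first.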
Answer 3
Score: 0
If your only requirement is to use Python, I would recommend pandas for large-scale work.
I would do it something like this:
```python
import pandas as pd
# Declaration of given matrix: Distance_Matrix
Distance_Matrix = {
    'Zone': [10001, 10002, 10003, 10004],
    '10001': [0, 3, 50, 40],
    '10002': [3, 0, 9, 25],
    '10003': [50, 9, 0, 1],
    '10004': [40, 25, 1, 0],
}
# Declaration of given matrix: Count_Dataframe
Count_Dataframe = {
    'Origin': [10001, 10001, 10002, 10002],
    'Destination': [10002, 10003, 10001, 10003],
    'Counts': [5, 10, 100, 10],
}
# Convert the matrices into pandas DataFrames
dm = pd.DataFrame(Distance_Matrix)
cm = pd.DataFrame(Count_Dataframe)
# Function for selecting a value from "Distance_Matrix"
def get_distance_value(row):
    # origin_zone = row, matched against the "Zone" column
    origin_zone = row['Origin']
    # destination_zone = column (the column labels are strings, hence str() below)
    destination_zone = row['Destination']
    # Return the value located in Distance_Matrix at position [origin_zone, destination_zone]
    return dm.loc[dm['Zone'] == origin_zone, str(destination_zone)].values[0]
# Create the new table; it has the same base as Count_Dataframe, so start from a copy
Total_Distance_Dataframe_Counts_Dataframe = cm.copy()
# Create the new column "Distance" by applying the function to select values from "Distance_Matrix"
Total_Distance_Dataframe_Counts_Dataframe['Distance'] = Total_Distance_Dataframe_Counts_Dataframe.apply(get_distance_value, axis=1)
# Create the last column: the product "Counts" * "Distance"
Total_Distance_Dataframe_Counts_Dataframe['Total_Dist'] = Total_Distance_Dataframe_Counts_Dataframe['Counts'] * Total_Distance_Dataframe_Counts_Dataframe['Distance']
# Print the new table
print(Total_Distance_Dataframe_Counts_Dataframe)
```
Simple explanation:
In this case, DataFrames are your tables. I see them as a fancy dictionary: just as in a dict, you can edit values by using the column name as the key. With a DataFrame, you get a complete column back that you can then work with.
Using `.loc[]` on a pandas DataFrame can be seen as a SELECT statement in SQL. By combining a row selection (such as `dm['Zone'] == origin_zone`) with `.loc[]`, you can search for values in your table or edit them.
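To make the SQL comparison concrete, here is a small sketch using the `dm` table from the code above (rebuilt so the snippet runs on its own): a boolean mask selects rows like a WHERE clause, the column label picks the column, and the same `.loc` expression can be assigned to. The value `10` written here is arbitrary and only demonstrates an edit.

```python
import pandas as pd

dm = pd.DataFrame({
    'Zone': [10001, 10002, 10003, 10004],
    '10001': [0, 3, 50, 40],
    '10002': [3, 0, 9, 25],
    '10003': [50, 9, 0, 1],
    '10004': [40, 25, 1, 0],
})

# Roughly: SELECT "10003" FROM dm WHERE Zone = 10002
before = dm.loc[dm['Zone'] == 10002, '10003'].values[0]
print(before)  # 9

# The same selection on the left-hand side of an assignment edits the table in place
dm.loc[dm['Zone'] == 10002, '10003'] = 10
after = dm.loc[dm['Zone'] == 10002, '10003'].values[0]
print(after)  # 10
```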