英文:
Why is my Mixed-Integer Linear Programming problem running slow on python compared to R?
问题
I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority.
I need it to run in both R
and in python
. My R
code runs very fast, the solver takes about 0.002 seconds to solve the problem. My python
code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more.
I highly doubt that R
is 25x more efficient than python
, especially considering that (to my knowledge) both are using C
code to solve the problem. I'm looking for help to determine how to make my python
code faster. I assume it starts with getting a different package than the one I have now? I really need the python
code to consistently be below .01 seconds.
Here is sample R
code and python
code that solve the same problem. In this example of my real problem, I'm building "fantasy basketball" lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn't important as much as the speed.
R Code
library(lpSolve)
library(dplyr)
# Make dummy dataframe with player positions, salaries, and projected points
Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>%
mutate(ProjPts = round(Salary/1000*5))
# Make positional constraint vectors
ConVec_Position <- playerTable %>%
select(Position) %>%
mutate(AnyPG = grepl('PG', Position)*1,
AnySG = grepl('SG', Position)*1,
AnySF = grepl('SF', Position)*1,
AnyPF = grepl('PF', Position)*1,
AnyC = grepl('C', Position)*1,
orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC > 0)*1
) %>%
select(-Position)
# Make salary constraint vectors
ConVec_Salary <- playerTable$Salary
# Get Objective (projected points)
ProjectedPoints <- playerTable$ProjPts
# Make constraint values and directions
ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary
# Compile constraint vectors, values, and directions
ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)
start_time <- proc.time()
sol <- lp("max",
objective.in = ProjectedPoints,
const.mat = t(ConVec_Final),
const.dir = ConDir_Final,
const.rhs = ConVal_Final,
binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)
cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")
Python Code
import pandas as pd
import numpy as np
from lp_solve import *
import time
# Make dummy dataframe with player positions, salaries, and projected points
ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions, 500)
Salary = np.random.choice(range(3000, 12000, 100), 500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary'] / 1000 * 5
# Make positional constraint vectors
AnyPG = playerTable.Position.str.contains('PG') * 1
AnySG = playerTable.Position.str.contains('SG') * 1
AnySF = playerTable.Position.str.contains('SF') * 1
AnyPF = playerTable.Position.str.contains('PF') * 1
AnyC = playerTable.Position.str.contains('C') * 1
AnyPos = [1] * 500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})
# Make salary constraint vectors
ConVec_Salary = playerTable[['Salary']]
# Get Objective (projected points)
ProjectedPoints = playerTable['ProjPts'].tolist()
# Make constraint values and directions
ConValGT = [1, 1, 1, 1, 1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3, 3, 3, 3, 2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary
# Compile constraint vectors, values, and directions
ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT
# Force all decisions to be binary
vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)
solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints,
ConVec_Final.T.values.tolist(),
ConVal_Final,
ConDir_Final,
vLB,
vUB,
xint)
solveTime
<details>
<summary>英文:</summary>
I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority.
I need it to run in both `R` and in `python`. My `R` code runs very fast, the solver takes about 0.002 seconds to solve the problem. My `python` code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more.
I highly doubt that `R` is 25x more efficient than `python`, especially considering that (to my knowledge) both are using `C` code to solve the problem. I'm looking for help to determine how to make my `python` code faster. I assume it starts with getting a different package than the one I have now? I really need the `python` code to consistently be below .01 seconds.
Here is sample `R` code and `python` code that solve the same problem. In this example of my real problem, I'm building "fantasy basketball" lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn't important as much as the speed.
**R Code**
library(lpSolve)
library(dplyr)
Make dummy dataframe with player positions, salaries, and projected points
Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>%
mutate(ProjPts = round(Salary/1000*5))
Make positional constraint vectors
ConVec_Position <- playerTable %>%
select(Position) %>%
mutate(AnyPG = grepl('PG', Position)*1,
AnySG = grepl('SG', Position)*1,
AnySF = grepl('SF', Position)*1,
AnyPF = grepl('PF', Position)*1,
AnyC = grepl('C', Position)*1,
orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC > 0)*1
) %>%
select(-Position)
Make salary constraint vectors
ConVec_Salary <- playerTable$Salary
Get Objective (projected points)
ProjectedPoints <- playerTable$ProjPts
Make constraint values and directions
ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary
Compile constraint vectors, values, and directions
ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)
start_time <- proc.time()
sol <- lp("max",
objective.in = ProjectedPoints,
const.mat = t(ConVec_Final),
const.dir = ConDir_Final,
const.rhs = ConVal_Final,
binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)
cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")
**python Code**
import pandas as pd
import numpy as np
from lp_solve import *
import time
Make dummy dataframe with player positions, salaries, and projected points
ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions,500)
Salary = np.random.choice(range(3000,12000,100),500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary']/1000*5
Make positional constraint vectors
AnyPG = playerTable.Position.str.contains('PG')*1
AnySG = playerTable.Position.str.contains('SG')*1
AnySF = playerTable.Position.str.contains('SF')*1
AnyPF = playerTable.Position.str.contains('PF')*1
AnyC = playerTable.Position.str.contains('C')*1
AnyPos = [1]*500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})
Make salary constraint vectors
ConVec_Salary = playerTable[['Salary']]
Get Objective (projected points)
ProjectedPoints = playerTable['ProjPts'].tolist()
Make constraint values and directions
ConValGT = [1,1,1,1,1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3,3,3,3,2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary
Compile constraint vectors, values, and directions
ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT
Force all decisions to be binary
vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)
solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints,
ConVec_Final.T.values.tolist(),
ConVal_Final,
ConDir_Final,
vLB,
vUB,
xint)
solveTimeCurrRun = (time.time() - solveTimefix)
print("\n--- It took %s seconds to get all LUs ---\n" % round(solveTimeCurrRun, 2))
Any ideas on how to make my `python` code run the same speed as `R`?
</details>
# 答案1
**得分**: 1
在我的笔记本电脑上,这个代码大约需要4毫秒,但重要的是它并不是一对一的比较。我认为你可能忽略了一个(很大的)约束:每个个体球员只能被分配到一个位置。
这个代码片段使用了HiGHS作为后端,我没有尝试过`lpsolve55`。输出结果如下:
```none
优化成功终止。 (HiGHS 状态 7: 最优解)
解决方案用时 3.4 毫秒
位置 薪水 预计得分
ID 位置
314 SF SF 10000 50
390 PG PG 3000 15
447 SG SG 3000 15
467 SF SF 10000 50
480 PF PF 7000 35
484 SG SG 3000 15
487 C C 7000 35
494 C C 7000 35
总计:
薪水 50000
预计得分 250
你也可以尝试在PuLP/CBC上运行(这里没有展示)。
英文:
On my craptop this takes about 4 ms, but importantly it is not apples-to-apples. I believe that you are missing a (large) constraint: each individual player can only be assigned to one position.
from time import perf_counter
import pandas as pd
import numpy as np
import scipy.sparse
from scipy.optimize import milp, LinearConstraint, Bounds
from numpy.random._generator import default_rng
rand = default_rng(seed=0)
# Make dummy dataframe with player positions, salaries, and projected points
n = 500
position_combos = ('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
salary_k = rand.integers(low=3, high=12, size=n)
players = pd.DataFrame(
{
'Positions': rand.choice(position_combos, n),
'Salary': salary_k * 1_000,
'ProjPts': salary_k * 5,
},
index=pd.RangeIndex(name='ID', stop=n),
) # 500x3
# Convert from slash-separated position option strings to a dense ID/Position multi-index
exploded_positions = players.Positions.str.split('/', expand=True) # 500x2
position_values = np.array(('PG', 'SG', 'SF', 'PF', 'C'))
position_sparse = pd.DataFrame(
(
exploded_positions.values[:, :, np.newaxis]
== position_values[np.newaxis, np.newaxis, :]
).any(axis=1),
index=players.index,
columns=pd.Index(name='Position', data=position_values),
) # 500x5
position_sparse[~position_sparse] = np.nan
positions = position_sparse.stack().index # 685x2
# Sparse 2D position assignment matrix with guaranteed single one per column
position_assignments = np.equal.outer(
positions.get_level_values('Position'),
position_values,
).astype(float).T # 5x685
# Sparse 2D player assignment matrix with guaranteed single one per column
player_assignments = np.equal.outer(
positions.get_level_values('ID'),
players.index,
).astype(float).T # 500x685
# Dense salary assignment (row) with repeats according to player ID
salary_assignments = players.Salary[positions.get_level_values('ID')] # 685
# Cost values: negative projected points, only where a player can take a given position;
# multi-indexed by ID and Position
c = -players.ProjPts[positions.get_level_values('ID')].astype(float)
c.index = positions # 685
integrality = np.ones(shape=c.size, dtype=np.uint8) # 685
bounds = Bounds(
lb=np.zeros_like(c),
ub=np.ones_like(c), # 685
)
A = scipy.sparse.csc_array(
scipy.sparse.vstack((
scipy.sparse.csr_array(np.ones_like(c)), # The total number of selected players must be fixed
scipy.sparse.csr_array(salary_assignments), # There is a maximum salary
scipy.sparse.csr_array(position_assignments), # Each position count has a min and max
scipy.sparse.csr_array(player_assignments), # One player cannot occupy multiple positions
), format='csc') # this is what milp uses; see _milp_iv()
) # 507x685
lb, ub = np.block([
[8, -np.inf, 1, 1, 1, 1, 1, np.full(shape=n, fill_value=-np.inf)],
[8, 50_000, 3, 3, 3, 3, 2, np.ones(n)],
])
t0 = perf_counter()
result = milp(
c=c, integrality=integrality, bounds=bounds,
constraints=(LinearConstraint(A=A, lb=lb, ub=ub),),
)
t1 = perf_counter()
assert result.success, result.message
print(result.message)
print(f'Solution in {1e3*(t1-t0):.1f} ms')
outputs = pd.Series(name='Assigned', data=result.x, index=positions)
selections = outputs[outputs > 0.5].index
merged_outputs = players.loc[selections.get_level_values('ID')].set_index(selections)
print(merged_outputs, end='\n\n')
print('Total:')
print(merged_outputs[['Salary', 'ProjPts']].sum().to_string())
This uses HiGHS in the backend; I have not attempted lpsolve55
. The output is
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Solution in 3.4 ms
Positions Salary ProjPts
ID Position
314 SF SF 10000 50
390 PG PG 3000 15
447 SG SG 3000 15
467 SF SF 10000 50
480 PF PF 7000 35
484 SG SG 3000 15
487 C C 7000 35
494 C C 7000 35
Total:
Salary 50000
ProjPts 250
You could also try running on PuLP/CBC (I have not shown that here).
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论