我的混合整数线性规划问题在Python上运行速度比R慢的原因是什么?

huangapple go评论68阅读模式
英文:

Why is my Mixed-Integer Linear Programming problem running slow on python compared to R?

问题

I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority.

I need it to run in both R and in python. My R code runs very fast, the solver takes about 0.002 seconds to solve the problem. My python code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more.

I highly doubt that R is 25x more efficient than python, especially considering that (to my knowledge) both are using C code to solve the problem. I'm looking for help to determine how to make my python code faster. I assume it starts with getting a different package than the one I have now? I really need the python code to consistently be below .01 seconds.

Here is sample R code and python code that solve the same problem. In this example of my real problem, I'm building "fantasy basketball" lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn't important as much as the speed.

R Code

library(lpSolve)
library(dplyr)

# Make dummy dataframe with player positions, salaries, and projected points
Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>%
  mutate(ProjPts = round(Salary/1000*5))

# Make positional constraint vectors
ConVec_Position <- playerTable %>%
  select(Position) %>%
  mutate(AnyPG =  grepl('PG', Position)*1,
         AnySG = grepl('SG', Position)*1,
         AnySF = grepl('SF', Position)*1,
         AnyPF = grepl('PF', Position)*1,
         AnyC = grepl('C', Position)*1,
         orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC > 0)*1
  ) %>%
  select(-Position)

# Make salary constraint vectors
ConVec_Salary <- playerTable$Salary

# Get Objective (projected points)
ProjectedPoints <- playerTable$ProjPts

# Make constraint values and directions
ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary

# Compile constraint vectors, values, and directions
ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)

start_time <- proc.time()

sol <- lp("max",
          objective.in = ProjectedPoints,
          const.mat	= t(ConVec_Final),
          const.dir	= ConDir_Final,
          const.rhs	= ConVal_Final,
          binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)

cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")

Python Code

import pandas as pd
import numpy as np
from lp_solve import *
import time

# Make dummy dataframe with player positions, salaries, and projected points
ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions, 500)
Salary = np.random.choice(range(3000, 12000, 100), 500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary'] / 1000 * 5

# Make positional constraint vectors
AnyPG = playerTable.Position.str.contains('PG') * 1
AnySG = playerTable.Position.str.contains('SG') * 1
AnySF = playerTable.Position.str.contains('SF') * 1
AnyPF = playerTable.Position.str.contains('PF') * 1
AnyC = playerTable.Position.str.contains('C') * 1
AnyPos = [1] * 500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})

# Make salary constraint vectors
ConVec_Salary = playerTable[['Salary']]

# Get Objective (projected points)
ProjectedPoints = playerTable['ProjPts'].tolist()

# Make constraint values and directions
ConValGT = [1, 1, 1, 1, 1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3, 3, 3, 3, 2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary

# Compile constraint vectors, values, and directions
ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT

# Force all decisions to be binary
vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)

solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints, 
                             ConVec_Final.T.values.tolist(), 
                             ConVal_Final, 
                             ConDir_Final, 
                             vLB, 
                             vUB, 
                             xint)

solveTime

<details>
<summary>英文:</summary>

I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority. 

I need it to run in both `R` and in `python`. My `R` code runs very fast, the solver takes about 0.002 seconds to solve the problem. My `python` code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more. 

I highly doubt that `R` is 25x more efficient than `python`, especially considering that (to my knowledge) both are using `C` code to solve the problem. I&#39;m looking for help to determine how to make my `python` code faster. I assume it starts with getting a different package than the one I have now? I really need the `python` code to consistently be below .01 seconds.

Here is sample `R` code and `python` code that solve the same problem. In this example of my real problem, I&#39;m building &quot;fantasy basketball&quot; lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn&#39;t important as much as the speed.

**R Code**

library(lpSolve)
library(dplyr)

Make dummy dataframe with player positions, salaries, and projected points

Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>%
mutate(ProjPts = round(Salary/1000*5))

Make positional constraint vectors

ConVec_Position <- playerTable %>%
select(Position) %>%
mutate(AnyPG = grepl('PG', Position)*1,
AnySG = grepl('SG', Position)*1,
AnySF = grepl('SF', Position)*1,
AnyPF = grepl('PF', Position)*1,
AnyC = grepl('C', Position)*1,

     orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC &gt; 0)*1

) %>%
select(-Position)

Make salary constraint vectors

ConVec_Salary <- playerTable$Salary

Get Objective (projected points)

ProjectedPoints <- playerTable$ProjPts

Make constraint values and directions

ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary

Compile constraint vectors, values, and directions

ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)

start_time <- proc.time()

sol <- lp("max",
objective.in = ProjectedPoints,
const.mat = t(ConVec_Final),
const.dir = ConDir_Final,
const.rhs = ConVal_Final,
binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)

cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")


**python Code**

import pandas as pd
import numpy as np
from lp_solve import *
import time

Make dummy dataframe with player positions, salaries, and projected points

ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions,500)
Salary = np.random.choice(range(3000,12000,100),500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary']/1000*5

Make positional constraint vectors

AnyPG = playerTable.Position.str.contains('PG')*1
AnySG = playerTable.Position.str.contains('SG')*1
AnySF = playerTable.Position.str.contains('SF')*1
AnyPF = playerTable.Position.str.contains('PF')*1
AnyC = playerTable.Position.str.contains('C')*1
AnyPos = [1]*500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})

Make salary constraint vectors

ConVec_Salary = playerTable[['Salary']]

Get Objective (projected points)

ProjectedPoints = playerTable['ProjPts'].tolist()

Make constraint values and directions

ConValGT = [1,1,1,1,1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3,3,3,3,2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary

Compile constraint vectors, values, and directions

ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT

Force all decisions to be binary

vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)

solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints,
ConVec_Final.T.values.tolist(),
ConVal_Final,
ConDir_Final,
vLB,
vUB,
xint)

solveTimeCurrRun = (time.time() - solveTimefix)
print("\n--- It took %s seconds to get all LUs ---\n" % round(solveTimeCurrRun, 2))


Any ideas on how to make my `python` code run the same speed as `R`?
</details>
# 答案1
**得分**: 1
在我的笔记本电脑上,这个代码大约需要4毫秒,但重要的是它并不是一对一的比较。我认为你可能忽略了一个(很大的)约束:每个个体球员只能被分配到一个位置。
这个代码片段使用了HiGHS作为后端,我没有尝试过`lpsolve55`。输出结果如下:
```none
优化成功终止。 (HiGHS 状态 7: 最优解)
解决方案用时 3.4 毫秒
位置    薪水   预计得分
ID  位置
314 SF              SF   10000       50
390 PG              PG    3000       15
447 SG              SG    3000       15
467 SF              SF   10000       50
480 PF              PF    7000       35
484 SG              SG    3000       15
487 C                C    7000       35
494 C                C    7000       35
总计:
薪水     50000
预计得分      250

你也可以尝试在PuLP/CBC上运行(这里没有展示)。

英文:

On my craptop this takes about 4 ms, but importantly it is not apples-to-apples. I believe that you are missing a (large) constraint: each individual player can only be assigned to one position.

from time import perf_counter

import pandas as pd
import numpy as np
import scipy.sparse
from scipy.optimize import milp, LinearConstraint, Bounds

from numpy.random._generator import default_rng


rand = default_rng(seed=0)

# Make dummy dataframe with player positions, salaries, and projected points
n = 500
position_combos = (&#39;PG&#39;, &#39;SG&#39;, &#39;SF&#39;, &#39;PF&#39;, &#39;C&#39;, &#39;PG/SG&#39;, &#39;SG/SF&#39;, &#39;PF/C&#39;)
salary_k = rand.integers(low=3, high=12, size=n)
players = pd.DataFrame(
    {
        &#39;Positions&#39;: rand.choice(position_combos, n),
        &#39;Salary&#39;: salary_k * 1_000,
        &#39;ProjPts&#39;: salary_k * 5,
    },
    index=pd.RangeIndex(name=&#39;ID&#39;, stop=n),
)  # 500x3

# Convert from slash-separated position option strings to a dense ID/Position multi-index
exploded_positions = players.Positions.str.split(&#39;/&#39;, expand=True)  # 500x2
position_values = np.array((&#39;PG&#39;, &#39;SG&#39;, &#39;SF&#39;, &#39;PF&#39;, &#39;C&#39;))
position_sparse = pd.DataFrame(
    (
        exploded_positions.values[:, :, np.newaxis]
        == position_values[np.newaxis, np.newaxis, :]
    ).any(axis=1),
    index=players.index,
    columns=pd.Index(name=&#39;Position&#39;, data=position_values),
)  # 500x5
position_sparse[~position_sparse] = np.nan
positions = position_sparse.stack().index  # 685x2

# Sparse 2D position assignment matrix with guaranteed single one per column
position_assignments = np.equal.outer(
    positions.get_level_values(&#39;Position&#39;),
    position_values,
).astype(float).T  # 5x685

# Sparse 2D player assignment matrix with guaranteed single one per column
player_assignments = np.equal.outer(
    positions.get_level_values(&#39;ID&#39;),
    players.index,
).astype(float).T  # 500x685

# Dense salary assignment (row) with repeats according to player ID
salary_assignments = players.Salary[positions.get_level_values(&#39;ID&#39;)]  # 685

# Cost values: negative projected points, only where a player can take a given position;
# multi-indexed by ID and Position
c = -players.ProjPts[positions.get_level_values(&#39;ID&#39;)].astype(float)
c.index = positions  # 685

integrality = np.ones(shape=c.size, dtype=np.uint8)  # 685
bounds = Bounds(
    lb=np.zeros_like(c),
    ub=np.ones_like(c),  # 685
)

A = scipy.sparse.csc_array(
    scipy.sparse.vstack((
        scipy.sparse.csr_array(np.ones_like(c)),       # The total number of selected players must be fixed
        scipy.sparse.csr_array(salary_assignments),    # There is a maximum salary
        scipy.sparse.csr_array(position_assignments),  # Each position count has a min and max
        scipy.sparse.csr_array(player_assignments),    # One player cannot occupy multiple positions
    ), format=&#39;csc&#39;)  # this is what milp uses; see _milp_iv()
)  # 507x685
lb, ub = np.block([
    [8, -np.inf, 1, 1, 1, 1, 1, np.full(shape=n, fill_value=-np.inf)],
    [8,  50_000, 3, 3, 3, 3, 2, np.ones(n)],
])

t0 = perf_counter()
result = milp(
    c=c, integrality=integrality, bounds=bounds,
    constraints=(LinearConstraint(A=A, lb=lb, ub=ub),),
)
t1 = perf_counter()
assert result.success, result.message

print(result.message)
print(f&#39;Solution in {1e3*(t1-t0):.1f} ms&#39;)

outputs = pd.Series(name=&#39;Assigned&#39;, data=result.x, index=positions)
selections = outputs[outputs &gt; 0.5].index
merged_outputs = players.loc[selections.get_level_values(&#39;ID&#39;)].set_index(selections)
print(merged_outputs, end=&#39;\n\n&#39;)

print(&#39;Total:&#39;)
print(merged_outputs[[&#39;Salary&#39;, &#39;ProjPts&#39;]].sum().to_string())

This uses HiGHS in the backend; I have not attempted lpsolve55. The output is

Optimization terminated successfully. (HiGHS Status 7: Optimal)
Solution in 3.4 ms
Positions  Salary  ProjPts
ID  Position                           
314 SF              SF   10000       50
390 PG              PG    3000       15
447 SG              SG    3000       15
467 SF              SF   10000       50
480 PF              PF    7000       35
484 SG              SG    3000       15
487 C                C    7000       35
494 C                C    7000       35
Total:
Salary     50000
ProjPts      250

You could also try running on PuLP/CBC (I have not shown that here).

huangapple
  • 本文由 发表于 2023年7月18日 06:49:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76708540.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定