2023年7月18日 06:49:52go评论88阅读模式

英文:

Why is my Mixed-Integer Linear Programming problem running slow on python compared to R?

问题

I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority.

I need it to run in both R and in python. My R code runs very fast, the solver takes about 0.002 seconds to solve the problem. My python code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more.

I highly doubt that R is 25x more efficient than python, especially considering that (to my knowledge) both are using C code to solve the problem. I'm looking for help to determine how to make my python code faster. I assume it starts with getting a different package than the one I have now? I really need the python code to consistently be below .01 seconds.

Here is sample R code and python code that solve the same problem. In this example of my real problem, I'm building "fantasy basketball" lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn't important as much as the speed.

R Code

library(lpSolve)
library(dplyr)

# Make dummy dataframe with player positions, salaries, and projected points
Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>%
  mutate(ProjPts = round(Salary/1000*5))

# Make positional constraint vectors
ConVec_Position <- playerTable %>%
  select(Position) %>%
  mutate(AnyPG =  grepl('PG', Position)*1,
         AnySG = grepl('SG', Position)*1,
         AnySF = grepl('SF', Position)*1,
         AnyPF = grepl('PF', Position)*1,
         AnyC = grepl('C', Position)*1,
         orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC > 0)*1
  ) %>%
  select(-Position)

# Make salary constraint vectors
ConVec_Salary <- playerTable$Salary

# Get Objective (projected points)
ProjectedPoints <- playerTable$ProjPts

# Make constraint values and directions
ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary

# Compile constraint vectors, values, and directions
ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)

start_time <- proc.time()

sol <- lp("max",
          objective.in = ProjectedPoints,
          const.mat	= t(ConVec_Final),
          const.dir	= ConDir_Final,
          const.rhs	= ConVal_Final,
          binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)

cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")

Python Code

import pandas as pd
import numpy as np
from lp_solve import *
import time

# Make dummy dataframe with player positions, salaries, and projected points
ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions, 500)
Salary = np.random.choice(range(3000, 12000, 100), 500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary'] / 1000 * 5

# Make positional constraint vectors
AnyPG = playerTable.Position.str.contains('PG') * 1
AnySG = playerTable.Position.str.contains('SG') * 1
AnySF = playerTable.Position.str.contains('SF') * 1
AnyPF = playerTable.Position.str.contains('PF') * 1
AnyC = playerTable.Position.str.contains('C') * 1
AnyPos = [1] * 500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})

# Make salary constraint vectors
ConVec_Salary = playerTable[['Salary']]

# Get Objective (projected points)
ProjectedPoints = playerTable['ProjPts'].tolist()

# Make constraint values and directions
ConValGT = [1, 1, 1, 1, 1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3, 3, 3, 3, 2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary

# Compile constraint vectors, values, and directions
ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT

# Force all decisions to be binary
vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)

solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints, 
                             ConVec_Final.T.values.tolist(), 
                             ConVal_Final, 
                             ConDir_Final, 
                             vLB, 
                             vUB, 
                             xint)

solveTime

<details>
<summary>英文:</summary>

I have a very large mixed-integer linear programming problem that I need to run thousands of times, so speed is a priority. 

I need it to run in both `R` and in `python`. My `R` code runs very fast, the solver takes about 0.002 seconds to solve the problem. My `python` code is a different story, it takes anywhere from of 0.05 to 0.80 seconds and often times much more. 

I highly doubt that `R` is 25x more efficient than `python`, especially considering that (to my knowledge) both are using `C` code to solve the problem. I&#39;m looking for help to determine how to make my `python` code faster. I assume it starts with getting a different package than the one I have now? I really need the `python` code to consistently be below .01 seconds.

Here is sample `R` code and `python` code that solve the same problem. In this example of my real problem, I&#39;m building &quot;fantasy basketball&quot; lineups with certain constraints. Player pool is 500, must choose exactly 8 players with at least 1 and no more than 3 from each position. Salary must be under 50000. I randomly sample Positions/Salary since the actual solution isn&#39;t important as much as the speed.

**R Code**

library(lpSolve)
library(dplyr)

Make dummy dataframe with player positions, salaries, and projected points

Positions <- c('PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C')
Position <- sample(Positions, 500, replace = TRUE)
Salary <- sample(seq(3000, 12000, by=100), 500, replace=TRUE)
playerTable <- data.frame(ID = 1:500, Position, Salary) %>%
mutate(ProjPts = round(Salary/1000*5))

Make positional constraint vectors

ConVec_Position <- playerTable %>%
select(Position) %>%
mutate(AnyPG = grepl('PG', Position)*1,
AnySG = grepl('SG', Position)*1,
AnySF = grepl('SF', Position)*1,
AnyPF = grepl('PF', Position)*1,
AnyC = grepl('C', Position)*1,

     orPGSGSFPFC = (AnyPG + AnySG + AnySF + AnyPF + AnyC &gt; 0)*1

) %>%
select(-Position)

Make salary constraint vectors

ConVec_Salary <- playerTable$Salary

Get Objective (projected points)

ProjectedPoints <- playerTable$ProjPts

Make constraint values and directions

ConValGT <- c(1,1,1,1,1, 8) # Must have at least one of each position, and 8 total
ConValLT <- c(3,3,3,3,2, 8, 50000) # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT <- c(rep(">=", ncol(ConVec_Position))) # Make >= directions for each position constraint
ConDirLT <- c(rep("<=", ncol(ConVec_Position) + 1)) # Make <= directions for each position constraint, plus 1 for Salary

Compile constraint vectors, values, and directions

ConVec_Final <- cbind(ConVec_Position, ConVec_Position, ConVec_Salary)
ConDir_Final <- c(ConDirGT, ConDirLT)
ConVal_Final <- c(ConValGT, ConValLT)

start_time <- proc.time()

sol <- lp("max",
objective.in = ProjectedPoints,
const.mat = t(ConVec_Final),
const.dir = ConDir_Final,
const.rhs = ConVal_Final,
binary.vec = 1:length(ProjectedPoints) # All decisions are binary
)

cat("\n--- It took ", round((proc.time() - start_time)[3], 4), " seconds to optimize lineups ---\n", sep = "")


**python Code**

import pandas as pd
import numpy as np
from lp_solve import *
import time

Make dummy dataframe with player positions, salaries, and projected points

ID = range(500)
Positions = ['PG', 'SG', 'SF', 'PF', 'C', 'PG/SG', 'SG/SF', 'PF/C']
Position = np.random.choice(Positions,500)
Salary = np.random.choice(range(3000,12000,100),500)
playerTable = pd.DataFrame({'ID': ID, 'Position': Position, 'Salary': Salary})
playerTable['ProjPts'] = playerTable['Salary']/1000*5

Make positional constraint vectors

AnyPG = playerTable.Position.str.contains('PG')*1
AnySG = playerTable.Position.str.contains('SG')*1
AnySF = playerTable.Position.str.contains('SF')*1
AnyPF = playerTable.Position.str.contains('PF')*1
AnyC = playerTable.Position.str.contains('C')*1
AnyPos = [1]*500
ConVec_Position = pd.DataFrame({'AnyPG': AnyPG, 'AnySG': AnySG, 'AnySF': AnySF, 'AnyPF': AnyPF, 'AnyC': AnyC, 'AnyPos': AnyPos})

Make salary constraint vectors

ConVec_Salary = playerTable[['Salary']]

Get Objective (projected points)

ProjectedPoints = playerTable['ProjPts'].tolist()

Make constraint values and directions

ConValGT = [1,1,1,1,1, 8] # Must have at least one of each position, and 8 total
ConValLT = [3,3,3,3,2, 8, 50000] # Can't have more than 3 of each position, 8 total, and 50000 salary
ConDirGT = [1] * (len(ConVec_Position.columns)) # Make >= directions for each position constraint
ConDirLT = [-1] * (len(ConVec_Position.columns) + 1) # Make <= directions for each position constraint, plus 1 for Salary

Compile constraint vectors, values, and directions

ConVec_Final = pd.concat([ConVec_Position, ConVec_Position, ConVec_Salary], axis=1)
ConVal_Final = ConValGT + ConValLT
ConDir_Final = ConDirGT + ConDirLT

Force all decisions to be binary

vLB = [0] * len(ProjectedPoints) # lower bound 0
vUB = [1] * len(ProjectedPoints) # upper bound 1
xint = [i+1 for i in range(len(ProjectedPoints))] # all decisions are integers (aka all are binary)

solveTimefix = time.time()
[obj, sol, duals] = lp_solve(ProjectedPoints,
ConVec_Final.T.values.tolist(),
ConVal_Final,
ConDir_Final,
vLB,
vUB,
xint)

solveTimeCurrRun = (time.time() - solveTimefix)
print("\n--- It took %s seconds to get all LUs ---\n" % round(solveTimeCurrRun, 2))


Any ideas on how to make my `python` code run the same speed as `R`?
</details>
# 答案1
**得分**: 1
在我的笔记本电脑上，这个代码大约需要4毫秒，但重要的是它并不是一对一的比较。我认为你可能忽略了一个（很大的）约束：每个个体球员只能被分配到一个位置。
这个代码片段使用了HiGHS作为后端，我没有尝试过`lpsolve55`。输出结果如下：
```none
优化成功终止。 (HiGHS 状态 7: 最优解)
解决方案用时 3.4 毫秒
位置    薪水   预计得分
ID  位置
314 SF              SF   10000       50
390 PG              PG    3000       15
447 SG              SG    3000       15
467 SF              SF   10000       50
480 PF              PF    7000       35
484 SG              SG    3000       15
487 C                C    7000       35
494 C                C    7000       35
总计:
薪水     50000
预计得分      250

你也可以尝试在PuLP/CBC上运行（这里没有展示）。

英文:

On my craptop this takes about 4 ms, but importantly it is not apples-to-apples. I believe that you are missing a (large) constraint: each individual player can only be assigned to one position.

from time import perf_counter

import pandas as pd
import numpy as np
import scipy.sparse
from scipy.optimize import milp, LinearConstraint, Bounds

from numpy.random._generator import default_rng


rand = default_rng(seed=0)

# Make dummy dataframe with player positions, salaries, and projected points
n = 500
position_combos = (&#39;PG&#39;, &#39;SG&#39;, &#39;SF&#39;, &#39;PF&#39;, &#39;C&#39;, &#39;PG/SG&#39;, &#39;SG/SF&#39;, &#39;PF/C&#39;)
salary_k = rand.integers(low=3, high=12, size=n)
players = pd.DataFrame(
    {
        &#39;Positions&#39;: rand.choice(position_combos, n),
        &#39;Salary&#39;: salary_k * 1_000,
        &#39;ProjPts&#39;: salary_k * 5,
    },
    index=pd.RangeIndex(name=&#39;ID&#39;, stop=n),
)  # 500x3

# Convert from slash-separated position option strings to a dense ID/Position multi-index
exploded_positions = players.Positions.str.split(&#39;/&#39;, expand=True)  # 500x2
position_values = np.array((&#39;PG&#39;, &#39;SG&#39;, &#39;SF&#39;, &#39;PF&#39;, &#39;C&#39;))
position_sparse = pd.DataFrame(
    (
        exploded_positions.values[:, :, np.newaxis]
        == position_values[np.newaxis, np.newaxis, :]
    ).any(axis=1),
    index=players.index,
    columns=pd.Index(name=&#39;Position&#39;, data=position_values),
)  # 500x5
position_sparse[~position_sparse] = np.nan
positions = position_sparse.stack().index  # 685x2

# Sparse 2D position assignment matrix with guaranteed single one per column
position_assignments = np.equal.outer(
    positions.get_level_values(&#39;Position&#39;),
    position_values,
).astype(float).T  # 5x685

# Sparse 2D player assignment matrix with guaranteed single one per column
player_assignments = np.equal.outer(
    positions.get_level_values(&#39;ID&#39;),
    players.index,
).astype(float).T  # 500x685

# Dense salary assignment (row) with repeats according to player ID
salary_assignments = players.Salary[positions.get_level_values(&#39;ID&#39;)]  # 685

# Cost values: negative projected points, only where a player can take a given position;
# multi-indexed by ID and Position
c = -players.ProjPts[positions.get_level_values(&#39;ID&#39;)].astype(float)
c.index = positions  # 685

integrality = np.ones(shape=c.size, dtype=np.uint8)  # 685
bounds = Bounds(
    lb=np.zeros_like(c),
    ub=np.ones_like(c),  # 685
)

A = scipy.sparse.csc_array(
    scipy.sparse.vstack((
        scipy.sparse.csr_array(np.ones_like(c)),       # The total number of selected players must be fixed
        scipy.sparse.csr_array(salary_assignments),    # There is a maximum salary
        scipy.sparse.csr_array(position_assignments),  # Each position count has a min and max
        scipy.sparse.csr_array(player_assignments),    # One player cannot occupy multiple positions
    ), format=&#39;csc&#39;)  # this is what milp uses; see _milp_iv()
)  # 507x685
lb, ub = np.block([
    [8, -np.inf, 1, 1, 1, 1, 1, np.full(shape=n, fill_value=-np.inf)],
    [8,  50_000, 3, 3, 3, 3, 2, np.ones(n)],
])

t0 = perf_counter()
result = milp(
    c=c, integrality=integrality, bounds=bounds,
    constraints=(LinearConstraint(A=A, lb=lb, ub=ub),),
)
t1 = perf_counter()
assert result.success, result.message

print(result.message)
print(f&#39;Solution in {1e3*(t1-t0):.1f} ms&#39;)

outputs = pd.Series(name=&#39;Assigned&#39;, data=result.x, index=positions)
selections = outputs[outputs &gt; 0.5].index
merged_outputs = players.loc[selections.get_level_values(&#39;ID&#39;)].set_index(selections)
print(merged_outputs, end=&#39;\n\n&#39;)

print(&#39;Total:&#39;)
print(merged_outputs[[&#39;Salary&#39;, &#39;ProjPts&#39;]].sum().to_string())

This uses HiGHS in the backend; I have not attempted lpsolve55. The output is

Optimization terminated successfully. (HiGHS Status 7: Optimal)
Solution in 3.4 ms
Positions  Salary  ProjPts
ID  Position                           
314 SF              SF   10000       50
390 PG              PG    3000       15
447 SG              SG    3000       15
467 SF              SF   10000       50
480 PF              PF    7000       35
484 SG              SG    3000       15
487 C                C    7000       35
494 C                C    7000       35
Total:
Salary     50000
ProjPts      250

You could also try running on PuLP/CBC (I have not shown that here).

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

我的混合整数线性规划问题在Python上运行速度比R慢的原因是什么？

问题

Make dummy dataframe with player positions, salaries, and projected points

Make positional constraint vectors

Make salary constraint vectors

Get Objective (projected points)

Make constraint values and directions

Compile constraint vectors, values, and directions

Make dummy dataframe with player positions, salaries, and projected points

Make positional constraint vectors

Make salary constraint vectors

Get Objective (projected points)

Make constraint values and directions

Compile constraint vectors, values, and directions

Force all decisions to be binary

如何在循环中将不同数据框的列相加？

rvecs和tvecs在OpenCV的calibrateCamera中为什么在每个视图中都不同？

Jupyter notebook在使用SVM核进行训练时需要无限的时间。

使用RGB值和坐标绘制Pandas DataFrame的图形

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论