Testing Python Package Dependencies

Question

Let's say I have a widely distributed and widely used Python package called foo that's designed to work with the following dependencies:

  • pandas>=1.3.0    
  • pyarrow>=8.0
  • python>=3.8

How do I make sure that my foo package is actually compatible with all of those dependency versions, so that people have a seamless experience using my package?

One idea that I had is to run my test suite against a whole bunch of environments with different versions of the dependent packages. For example, run the test suite 13 times under environments with the following dependency versions:

  1. pandas=1.3.0, pyarrow=11.0, python=3.11.2
  2. pandas=1.4.0, pyarrow=11.0, python=3.11.2
  3. pandas=1.5.0, pyarrow=11.0, python=3.11.2
  4. pandas=2.0.0, pyarrow=11.0, python=3.11.2
  5. pyarrow=8.0, pandas=2.0.0, python=3.11.2
  6. pyarrow=9.0, pandas=2.0.0, python=3.11.2
  7. pyarrow=10.0, pandas=2.0.0, python=3.11.2
  8. pyarrow=11.0, pandas=2.0.0, python=3.11.2
  9. python=3.8, pandas=2.0.0, pyarrow=11.0
  10. python=3.9, pandas=2.0.0, pyarrow=11.0
  11. python=3.10, pandas=2.0.0, pyarrow=11.0
  12. python=3.11, pandas=2.0.0, pyarrow=11.0
  13. python=3.11.2, pandas=2.0.0, pyarrow=11.0

Is there a more robust way to do it? For example, what if my foo package doesn't work with pandas version 1.5.3? I don't think testing every major and minor release of all the dependent packages is feasible.
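To make the "vary one dependency at a time" idea above concrete, here is a minimal sketch that generates the environment list from a baseline. The names `BASELINE`, `VERSIONS`, and `one_at_a_time` are made up for illustration; the version strings are the ones from the list above. Note that the list of 13 contains the baseline environment three times (items 4, 8, and 13), so after de-duplication only 11 distinct environments remain.

```python
# Baseline: the newest version of everything, taken from the list above.
BASELINE = {"pandas": "2.0.0", "pyarrow": "11.0", "python": "3.11.2"}

# Versions to try for each dependency while the others stay at the baseline.
VERSIONS = {
    "pandas": ["1.3.0", "1.4.0", "1.5.0", "2.0.0"],
    "pyarrow": ["8.0", "9.0", "10.0", "11.0"],
    "python": ["3.8", "3.9", "3.10", "3.11", "3.11.2"],
}


def one_at_a_time(baseline, versions):
    """Return distinct environments that differ from the baseline in at most one pin."""
    seen = set()
    envs = []
    for name, candidates in versions.items():
        for version in candidates:
            env = dict(baseline, **{name: version})
            key = tuple(sorted(env.items()))
            if key not in seen:  # skip repeats of the baseline itself
                seen.add(key)
                envs.append(env)
    return envs


ENVS = one_at_a_time(BASELINE, VERSIONS)
for env in ENVS:
    print(", ".join(f"{name}={version}" for name, version in env.items()))
```

Each printed line could then be handed to whatever creates the test environment (conda, tox, nox, a CI matrix); that wiring is left out here on purpose.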

Answer 1

Score: 1

In general we may have significantly more than three deps,
leading to combinatorial explosion.
And mutual compatibility among the deps may be fragile,
burdening you with tracking things like "A cannot import B
between breaking change X and bugfix Y".
Import renames will sometimes stir up trouble of that sort.

Testing e.g. pandas 1.5.0 may be of limited interest
once we know the bugfixes of 1.5.3 are out.

> Is there a more robust way to do it?

I recommend you adopt a "point in time" approach,
so test configs resemble real user configs.

First pick a budget of K tests, and an "earliest" date.
We will test between that date and current date,
so initially we have 2 tests for those dates,
with K - 2 remaining in the budget.

For a given historic date, look up each dep's
release dates, find the version that was current
on that date, and request installation of that
version number. Allow flexibility so that
you get e.g. pandas 1.4.4 installed ("< 1.5")
rather than the less interesting 1.4.0.
Run the test, watch it succeed.
Report the test's corresponding date,
which is max of dates for the installed dependencies.
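As a rough sketch of that lookup, assuming you already have a table of release dates per dep: the package names `dep_a` and `dep_b` and all dates below are illustrative placeholders, not real release data, and `snapshot` is a made-up helper name. In practice the dates would have to come from somewhere like the package index's release metadata.

```python
from datetime import date

# Illustrative placeholder data, NOT real release dates.
RELEASES = {
    "dep_a": {
        "1.0": date(2021, 1, 10),
        "1.1": date(2021, 6, 1),
        "2.0": date(2022, 3, 15),
    },
    "dep_b": {
        "0.9": date(2020, 11, 5),
        "1.0": date(2021, 8, 20),
    },
}


def snapshot(releases, as_of):
    """Pick, for each dep, the most recent release on or before `as_of`.

    Returns (pins, reported_date), where pins maps name -> (version, release date)
    and reported_date is the max release date among the chosen versions.
    """
    pins = {}
    for name, history in releases.items():
        available = [
            (released, version)
            for version, released in history.items()
            if released <= as_of
        ]
        if not available:
            raise ValueError(f"{name} had no release on or before {as_of}")
        released, version = max(available)  # latest release date wins
        pins[name] = (version, released)
    reported = max(released for _, released in pins.values())
    return pins, reported


pins, reported = snapshot(RELEASES, date(2021, 12, 31))
print(pins)
print("report this test as of", reported)
```

This picks by release date only, so a late bugfix release on an older branch would count as "latest"; a real version of this would probably want to compare version numbers too.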

At this point there are two ways you could go.
You might pick a single dep and constrain
it (">= 1.5" or ">= 2.0") to simulate a user
who wanted a certain released feature and
updated that specific package.
Likely a better way for you to spend the test budget
is to bisect a range of your "reported" dates,
locate when a dep bumped its minor version number,
and adjust the constraints to pull that in.
It may affect a single dep,
but likely the install solver will uprev
additional deps, as well, and that's fine.
Report the test result, lather, rinse, consume the budget.
Proudly publish the testing details on your website.
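One simplified way to spend the budget by bisection is sketched below: start with the two endpoint dates and keep splitting the widest remaining gap until K dates are chosen. This is only the date-picking half; checking whether a dep actually bumped its minor version inside a gap, and adjusting constraints accordingly, is not shown. The function name `spread_dates` is made up for illustration.

```python
from datetime import date, timedelta


def spread_dates(earliest, latest, budget):
    """Choose up to `budget` test dates by repeatedly bisecting the widest gap."""
    if budget < 2 or earliest >= latest:
        raise ValueError("need a budget of at least 2 and earliest < latest")
    dates = [earliest, latest]  # the two tests we always run
    while len(dates) < budget:
        # Index of the widest gap between neighbouring dates (first one on ties).
        widest = max(range(len(dates) - 1), key=lambda i: dates[i + 1] - dates[i])
        gap = dates[widest + 1] - dates[widest]
        if gap.days < 2:
            break  # nothing left to bisect
        midpoint = dates[widest] + timedelta(days=gap.days // 2)
        dates.insert(widest + 1, midpoint)
    return dates


for day in spread_dates(date(2021, 1, 1), date(2023, 1, 1), 5):
    print(day)
```

Each chosen date would then be resolved to concrete dependency versions and run through the test suite.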


Given that everything takes a dependency on the CPython
interpreter, one way to do "point in time" is to
simply pick K interpreter releases and constrain
the install so it demands exact match on the release
number, e.g. 3.10.8. Ratchet down the various minor version numbers
as far as you can get away with, e.g. pandas "< 1.5" or "< 1.4".
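A small sketch of what such a plan might look like as data: each interpreter release gets an exact pin plus upper bounds on the deps. The pairing of interpreter releases with pandas caps below is purely hypothetical, chosen to show the shape of the plan rather than a tested recommendation, and `spec_lines` is a made-up helper. The spec strings follow the `name<version` style used in the question.

```python
def spec_lines(python_release, caps):
    """Return an exact interpreter pin followed by upper-bound specs for each dep."""
    lines = [f"python=={python_release}"]
    for package, cap in sorted(caps.items()):
        lines.append(f"{package}<{cap}")
    return lines


# Hypothetical plan: interpreter release -> upper bounds to ratchet down.
PLAN = {
    "3.10.8": {"pandas": "1.5"},
    "3.9.16": {"pandas": "1.4"},
}

for release, caps in PLAN.items():
    print(" ".join(spec_lines(release, caps)))
```

The installer's solver then fills in the newest versions that satisfy each set of bounds, which is what gives you, say, the last 1.4.x bugfix release instead of 1.4.0.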

Answer 2

Score: 0

I just came across the article below, in case it helps anyone:

https://sentry.engineering/blog/how-we-run-our-python-tests-in-hundreds-of-environments-really-fast

They test their package against hundreds of environments. Looking at their repo, it seems they test against major versions for some packages and only the latest versions of others. It's kind of impressive that they still support Python 2.7 and a dependent package version from 8 years ago. I asked them how they pick which package versions to test against, and they said they basically test against the versions that are current when they add an integration for a framework, and then almost never remove a version once it's there.

Another idea I had is to ship the foo package with cloned versions of the dependent packages. For example, copy and paste pandas as my_pandas and include it in the foo package (along with my_numpy and the other dependencies, walking down the dependency tree). Internally, the foo package would import my_pandas instead of using the pandas version in the user's environment. I think this approach is somewhat similar to what the Spyder IDE uses.

huangapple
  • Posted on April 4, 2023 at 07:59:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75924550.html