# How to convert a conda env yaml file to a list of requirements for a settings.ini file accounting for channels and conversions for pypi
**NOTE**: you can start at [Problem formulation][problem formulation]. The motivation section just explains how I found myself asking this.
# Motivation
I use and love [nbdev]. It makes developing a python package iteratively as easy as it gets, especially as I generally do this alongside a project which uses said package. Does this question require knowledge of [nbdev]? **No**. It only motivates why I am asking. Normally when I create a new [nbdev] project (`nbdev_new`), I get a `settings.ini` file and a `setup.py` file. In order to keep working on different packages / projects simple, I will immediately create a [conda environment file][conda yml] `env.yml` for the project (see `Files > Example Conda File` [maybe this link works][example env file]).
I know that the environment one develops a package in is **NOT** necessarily the minimum dependencies it requires. Further, a package's dependencies are a subset of those one may _need_ for working on a project utilizing said package. In **MY** use case it is clear that I am double-dipping! I am *developing* the package as I _use_ it on a project.
So for the sake of this question let's assume that the `package dependencies == project dependencies`. In other words, the `env.yml` file contains all of the `requirements` for the `settings.ini` file.
## nbdev workflow
1. make new empty repo "current_project" and clone it
2. `cd path/to/current_project`
3. `nbdev_new`
4. make `env.yml` file
5. create / update environment:

```sh
# create conda environment
$ mamba env create -f env.yml

# update conda environment as needed
$ mamba env update -n current_project --file env.yml
# $ mamba env update -n current_project --file env.mac.yml
```

6. activate environment:

```sh
# activate conda environment
$ conda activate current_project
```

7. install `current_project`:

```sh
# install for local development
$ pip install -e .
```
# Problem formulation

I am developing a package in python using a `setup.py` file. My package may have requirements (listed under `settings.ini` with the key `requirements`) that get automatically imported and used in the `setup.py` file. While developing my package I have a conda environment which is specified in a yaml file `env.yml` (see `Files > Example Conda File` [maybe this link works][example env file]).

I also have some GitHub actions that test my package. I dislike having to update `settings.ini` manually (especially since it doesn't allow for multiple lines) to get the requirements into `setup.py`, especially as I have already listed them out nice and neatly in my `env.yml` file. So my question is as follows:
# Question

> Given a conda environment yaml file (e.g. `env.yml`) how can one iterate through its content and convert the dependencies (and the versions) to the correct `pypi` version (required by `setup.py`), storing them in `settings.ini` under the keyword `requirements`?
caveats:

- version specifier requirements in conda are not the same as `pypi`. Most notably `=` vs `==`, amongst others.
- package names for conda may not be the same for `pypi`. For example `pytorch` is listed as `torch` for `pypi` and `pytorch` for conda.
- the environment yaml file may have channel specifiers, e.g. `conda-forge::<package-name>`
- the environment yaml file may specify the python version, e.g. `python>=3.10`, which shouldn't be a requirement.
- **MY** ideal solution works with my workflow. That means the contents of `env.yml` need to get transferred to `settings.ini`.
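Taken together, these caveats amount to a small string transformation per dependency. The following is a minimal sketch of that conversion; the rename table and exclusion list in it are assumptions for illustration, not a complete mapping:

```python
from typing import Optional


def conda_dep_to_pip(dep: str) -> Optional[str]:
    """Convert one conda dependency string to a pip-style requirement."""
    # drop an optional channel prefix, e.g. "conda-forge::matplotlib"
    name_ver = dep.split("::", 1)[-1]
    # split the package name from the version specifier at the first
    # character that cannot appear in a package name
    idx = next(
        (i for i, ch in enumerate(name_ver) if not (ch.isalnum() or ch in "-_.")),
        len(name_ver),
    )
    name, spec = name_ver[:idx], name_ver[idx:]
    if name in ("python", "pip"):
        return None  # environment-only entries, not package requirements
    # conda -> pypi renames (incomplete; extend as needed)
    name = {"pytorch": "torch"}.get(name, name)
    # conda accepts a single '=' for equality; pip requires '=='
    if spec.startswith("=") and not spec.startswith("=="):
        spec = "=" + spec
    return name + spec
```

For example, `conda_dep_to_pip("pytorch>=2")` yields `torch>=2`, while `python>=3.10` is dropped entirely.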
# Desired outcome

My desired outcome is that I can store all of my package requirements in the conda environment file `env.yml` and have them automatically find themselves in the `setup.py` file under `install_requires`. Since my workflow is built around reading the requirements in from a `settings.ini` file (from nbdev), I would like the solution to take the values of `env.yml` and put them in `settings.ini`.
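Because nbdev's `setup.py` reads `requirements` as a single space-separated value under the `DEFAULT` section of `settings.ini`, the write-back step can be sketched with nothing but the standard library (the function name is my own, for illustration):

```python
import configparser


def write_requirements(requirements: list, path: str = "settings.ini") -> None:
    """Store a list of pip-style requirements as nbdev's single-line value."""
    config = configparser.ConfigParser()
    config.read(path)
    # nbdev keeps requirements as one space-separated string under DEFAULT
    config["DEFAULT"]["requirements"] = " ".join(requirements)
    with open(path, "w") as f:
        config.write(f)
```
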
**Note**: I am sharing my current solution as an answer below. Please help / improve it!
# Files

## Example conda file
```yaml
# EXAMPLE YAML FILE
name: current_project

channels:
  - pytorch
  - conda-forge
  - fastai

dependencies:
  - python>=3.10

  # Utilities
  # -------------------------------------------------------------------------
  - tqdm
  - rich
  - typer

  # Jupyter Notebook
  # -------------------------------------------------------------------------
  - conda-forge::notebook
  - conda-forge::ipykernel
  - conda-forge::ipywidgets
  - conda-forge::jupyter_contrib_nbextensions

  # nbdev
  # -------------------------------------------------------------------------
  - fastai::nbdev>=2.3.12

  # PyTorch & Deep Learning
  # -------------------------------------------------------------------------
  - pytorch>=2
  # NOTE: add pytorch-cuda if using a CUDA enabled GPU. You will need to
  # remove this if you are on Apple Silicon
  # - pytorch::pytorch-cuda
  - conda-forge::pytorch-lightning

  # Plotting
  # -------------------------------------------------------------------------
  - conda-forge::matplotlib
  - conda-forge::seaborn

  # Data Wrangling
  # -------------------------------------------------------------------------
  - conda-forge::scikit-learn
  - pandas>=2
  - numpy
  - scipy

  # Pip / non-conda packages
  # -------------------------------------------------------------------------
  - pip
  - pip:
      # PyTorch & Deep Learning
      # -----------------------------------------------------------------------
      - dgl
```
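Note how such a file deserializes: with PyYAML, every plain entry under `dependencies` loads as a string, while the trailing `pip:` section loads as a one-key dict, so any parser has to branch on the entry type. A small sketch, assuming PyYAML is installed:

```python
import yaml  # PyYAML

snippet = """
dependencies:
  - python>=3.10
  - conda-forge::notebook
  - pip
  - pip:
      - dgl
"""
deps = yaml.safe_load(snippet)["dependencies"]
# plain conda entries are strings; the pip section is a dict
assert deps[0] == "python>=3.10"
assert deps[-1] == {"pip": ["dgl"]}
```
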
# Answer 1 (score: 1)
# Current Solution
The current solution is the file `env_to_ini.py` (see `Files > env_to_ini.py` [maybe this link works][script]).
**NOTE**: this solution uses [`rich`][rich] and [`typer`][typer] to create an informative command line interface (CLI) which will show you which dependencies were added, changed, removed, or unchanged. This should make working with the script easier (especially should there be bugs in it), as it has not been extensively tested.
## How to use `env_to_ini.py`
Assumptions:
- `env.yml` or `env.mac.yml` under project root
- `settings.ini` under project root
- `env_to_ini.py` under project root
This script is provided so that if the `env.yml` (or `env.mac.yml`) file changes you can automatically update the dependencies of the `current_project` package (under `settings.ini`) to match.
```shell
# default usage
$ python env_to_ini.py

# show packages that didn't change
$ python env_to_ini.py --unchanged

# specify a different environment file
$ python env_to_ini.py --unchanged --file=env.mac.yml
```

## Caveats
This is a bit hacky. You can modify it per project as needed. The so-called "hackiness" is primarily located under the two `TODO`s, which I will now explain. Note that `TODO` 2 is more important than `TODO` 1.
### TODO 1: exclusion

Search for the following in the provided script:

```python
# TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
#       by default this would include python and pip.
```

Per the original question, some packages are to be excluded from the conda environment file `env.yml`. Namely, `python`. Under the first `TODO`, which achieves this currently via an `if / elif / else` statement, you could change the script to accept an additional argument containing packages to exclude, or read in an additional file containing these.
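As a sketch of that improvement, the hard-coded `if / elif` can collapse into a membership test against a configurable set. The function below is hypothetical (not part of the script); its default exclusions mirror the script's current behavior:

```python
DEFAULT_EXCLUDE = {"python", "pip"}


def filter_dependencies(deps: list, exclude: set = DEFAULT_EXCLUDE) -> list:
    """Drop dependency strings whose package name is in the exclude set."""
    kept = []
    for dep in deps:
        if not isinstance(dep, str):
            kept.append(dep)  # keep e.g. the pip: dict untouched
            continue
        name = dep.split("::", 1)[-1]  # drop any channel prefix
        for i, ch in enumerate(name):
            if not (ch.isalnum() or ch in "-_."):
                name = name[:i]  # cut off the version specifier
                break
        if name not in exclude:
            kept.append(dep)
    return kept
```
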
**NOTE**: it is unclear to me whether modifying your `env.yml` file to have a section after `dependencies` called `ignore` would mess up your usage with conda, e.g.

```yaml
dependencies:
  - python>=3.10
  # - ...

ignore:
  - python
  - pip
  # ...
```
### TODO 2: mapping

Search for the following in the provided script:

```python
# TODO: IMPROVEMENT 2: utilize a map of packages to rename.
#       by default this would include pytorch --> torch.
#       Ideally, this would figure it out automatically.
```

Per the original question, some packages need to be renamed because their package name on conda is different than it is on `pypi`. The example given is `pytorch`, which is listed as `torch` for `pypi` and `pytorch` for conda.

Under the function `requirements_to_ini` this is currently achieved by using an `if / elif / else` statement. You could change the script to accept an additional argument containing packages to rename, or read in an additional file containing these.
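A sketch of that change: replace the `if / elif` with a lookup table, which could just as easily be loaded from a file or a CLI argument. The table below contains only the one rename named in the question and would need extending in practice:

```python
# conda name -> pypi name; extend (or load from a file) as needed
CONDA_TO_PYPI = {"pytorch": "torch"}


def to_pypi_name(package: str, mapping: dict = CONDA_TO_PYPI) -> str:
    """Return the pypi name for a conda package, defaulting to itself."""
    return mapping.get(package, package)
```
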
**NOTE**: it is unclear to me whether modifying your `env.yml` file to have a section after `dependencies` called `rename` would mess up your usage with conda, e.g.

```yaml
dependencies:
  - python>=3.10
  # - ...

rename:
  - pytorch,torch
  # ...
```
**NOTE**: it is unclear to me how you could determine this automatically per your question's desired outcome.
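One partial heuristic for automating it: PyPI exposes a JSON metadata endpoint per project, so a conda name that 404s there is a candidate for renaming. The sketch below uses only the standard library and can only flag suspicious names, not discover what they map to; the helper names are my own:

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def pypi_json_url(name: str) -> str:
    """URL of PyPI's JSON metadata endpoint for a project."""
    return f"https://pypi.org/pypi/{name}/json"


def exists_on_pypi(name: str) -> bool:
    """True if PyPI knows the project under this exact name (needs network)."""
    try:
        with urlopen(pypi_json_url(name), timeout=10) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False
```

Note that a name existing on PyPI does not prove it is the *same* package as the conda one (e.g. `pytorch` also exists on PyPI but is not the package you want), so this can only narrow the search.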
# Files

## `env_to_ini.py`
```python
# env_to_ini.py
import yaml
import configparser

from rich.console import Console
from rich.table import Table
import typer

from typing import Optional, Tuple

app = typer.Typer()
console = Console()

# NOTE: utility function to print colored text
def cprint(style: str, text: str) -> None:
    console.print(f"[{style}]{text}[/{style}]")

def has_channel(requirements_str: str) -> bool:
    return '::' in requirements_str

def extract_channel(requirements_str: str) -> Tuple[Optional[str], str]:
    channel = None
    if has_channel(requirements_str):
        channel, requirements_str = requirements_str.split('::', 1)
    return channel, requirements_str

def is_not_valid_package_char(s: str) -> bool:
    return not (s.isalnum() or s in ['-', '_', '.'])

def split_str_at_first_non_alpha(s: str) -> Tuple[str, str]:
    idx = next((
        i for i, char in enumerate(s)
        if is_not_valid_package_char(char)
    ), len(s))
    return s[:idx], s[idx:]

def split_package_version(s: str) -> Tuple[str, str]:
    # NOTE: alias for split_str_at_first_non_alpha
    return split_str_at_first_non_alpha(s)

# NOTE: this parses requirements from the settings.ini file. Thus there is
# one line and each package is separated by a space.
def parse_requirements(requirements_str):
    requirements = {}
    for req in requirements_str.split():
        package, version = split_package_version(req)
        requirements[package] = version
    return requirements

# NOTE: this parses dependencies from the env.yml file.
def extract_packages(dependencies):
    packages = {}
    for dep in dependencies:
        if isinstance(dep, str):
            channel, package_version = extract_channel(dep)
            package, version = split_package_version(package_version)
            # TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
            #       by default this would include python and pip.
            # NOTE: we do not need to add python to the requirements
            if package == 'python':
                continue
            # NOTE: likewise we do not need pip
            elif package == 'pip':
                continue
            packages[package] = version
        elif isinstance(dep, dict):
            for key, values in dep.items():
                if key == 'pip':
                    for pip_dep in values:
                        package, version = split_package_version(pip_dep)
                        packages[package] = version
    return packages

# NOTE: check if the dependencies in the env.yml file vary from the ones in
# the settings.ini file.
def compare_requirements(old, new):
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    remained = {k: (old[k], new[k]) for k in old if k in new and old[k] == new[k]}
    return added, removed, changed, remained

# NOTE: I like pretty terminals
def print_changes(added, removed, changed, remained):
    table = Table(title="Changes")
    table.add_column("Package", style="cyan")
    table.add_column("Old Version", style="magenta")
    table.add_column("New Version", style="green")
    table.add_column("Status", style="yellow")
    for package, version in added.items():
        table.add_row(f':package: {package}', "", version, "Added")
    for package, version in removed.items():
        table.add_row(f':package: {package}', version, "", "Removed")
    for package, versions in changed.items():
        table.add_row(f':package: {package}', versions[0], versions[1], "Changed")
    for package, versions in remained.items():
        table.add_row(f':package: {package}', versions[0], versions[1], "Unchanged")
    console.print(table)

def requirements_to_ini(requirements: dict) -> str:
    ini = ''
    for package, version in requirements.items():
        # TODO: IMPROVEMENT 2: utilize a map of packages to rename.
        #       by default this would include pytorch --> torch.
        #       Ideally, this would figure it out automatically.
        # NOTE: this is a hack to make the env.yml file compatible with the
        # settings.ini file since the env.yml file uses pytorch and the
        # settings.ini file uses torch. Add more elif statements if you need
        # to change other package names.
        if package == 'pytorch':
            package = 'torch'
        if version:
            ini += f"{package}{version} "
        else:
            ini += f"{package} "
    return ini

@app.command()
def update_requirements(
    file: Optional[str] = typer.Option(
        'env.mac.yml',
        help="YAML file to extract the new requirements from.",
    ),
    unchanged: Optional[bool] = typer.Option(
        False,
        help="Whether to print all packages, including the ones whose versions haven't changed.",
    ),
):
    # NOTE: notice that file is `env.mac.yml` and not `env.yml`. Now with
    # Apple Silicon I have one env file for more common CUDA versions and
    # one for Apple Silicon.
    cprint("bold cyan", f"Loading environment yaml file {file}...")
    with open(file, 'r') as f:
        env = yaml.safe_load(f)

    # NOTE: read in the current dependencies from the conda env.yml file
    cprint("bold cyan", "Extracting packages and their versions...")
    new_requirements = extract_packages(env['dependencies'])

    # NOTE: read in the previous requirements from the settings.ini file
    cprint("bold cyan", "Loading settings.ini file...")
    config = configparser.ConfigParser()
    config.read('settings.ini')

    cprint("bold cyan", "Comparing the old and new requirements...")
    old_requirements = parse_requirements(config['DEFAULT']['requirements'])

    # NOTE: check for changes
    added, removed, changed, remained = compare_requirements(old_requirements, new_requirements)

    # If --unchanged option is given, print unchanged packages as well
    if unchanged:
        print_changes(added, removed, changed, remained)
    else:
        print_changes(added, removed, changed, {})

    # NOTE: update the requirements in the settings.ini file
    cprint("bold cyan", "Updating the requirements...")
    config['DEFAULT']['requirements'] = requirements_to_ini(new_requirements)

    cprint("bold cyan", "Saving the updated settings.ini file...")
    with open('settings.ini', 'w') as f:
        config.write(f)

    cprint("bold green", "Successfully updated the requirements in settings.ini!")

if __name__ == "__main__":
    app()
```
## Update: `env2ini`