How to convert a conda env yaml file to a list of requirements for a settings.ini file accounting for channels and conversions for pypi

Question

**NOTE**: you can start at "Problem formulation". The motivation section just explains how I found myself asking this.

# Motivation
I use and love nbdev. It makes developing a Python package iteratively as easy as it gets, especially as I generally do this alongside a project that uses the package. Does this question require knowledge of nbdev? **No**. It only motivates why I am asking. Normally, when I create a new nbdev project (`nbdev_new`), I get a `settings.ini` file and a `setup.py` file. To keep working on different packages / projects simple, I immediately create a conda environment file `env.yml` for the project (see `Files > Example conda file`).

I know that the environment one develops a package in is **NOT** necessarily the minimum set of dependencies it requires. Further, a package's dependencies are a subset of those one may _need_ when working on a project that uses the package. In **MY** use case it is clear that I am double-dipping! I am *developing* the package as I _use_ it on a project.

So, for the sake of this question, let's assume that `package dependencies == project dependencies`. In other words, the `env.yml` file contains all of the `requirements` for the `settings.ini` file.

## nbdev workflow

1. make a new empty repo "current_project" and clone it

2. `cd path/to/current_project`

3. `nbdev_new`

4. make an `env.yml` file

5. create / update the environment:
```sh
# create conda environment
$ mamba env create -f env.yml

# update conda environment as needed
$ mamba env update -n current_project --file env.yml
# $ mamba env update -n current_project --file env.mac.yml
```

6. activate the environment:
```sh
# activate conda environment
$ conda activate current_project
```

7. install current_project:
```sh
# install for local development
$ pip install -e .
```

# Problem formulation

I am developing a package in Python using a `setup.py` file. My package may have requirements (listed in `settings.ini` under the key `requirements`) that get automatically imported and used in the `setup.py` file. While developing my package I have a conda environment, which is specified in a YAML file `env.yml` (see `Files > Example conda file`).

I also have some GitHub Actions that test my package. I dislike having to update `settings.ini` manually (especially since it doesn't allow for multiple lines) to get the requirements into `setup.py`, especially as I have already listed them out nice and neatly in my `env.yml` file. So my question is as follows:

## Question

> Given a conda environment YAML file (e.g. `env.yml`), how can one iterate through its content and convert the dependencies (and their versions) to the correct PyPI form (required by `setup.py`), storing them in `settings.ini` under the key `requirements`?

Caveats:

- version specifiers in conda are not the same as on PyPI. Most notably `=` vs `==`, amongst others.
- package names on conda may not be the same as on PyPI. For example, `pytorch` is listed as `torch` on PyPI and `pytorch` on conda.
- the environment YAML file may have channel specifiers, e.g. `conda-forge::<package-name>`
- the environment YAML file may specify the Python version, e.g. `python>=3.10`, which shouldn't be a requirement.
- **MY** ideal solution works with my workflow. That means the contents of `env.yml` need to get transferred to `settings.ini`.
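The first four caveats can be sketched for a single dependency string. This is a minimal illustration, not a complete converter: the names `conda_to_pip`, `RENAME`, and `EXCLUDE` are hypothetical, the rename table holds only the one example from the question, and mapping conda's `=` to `==` is a simplification (conda treats a single `=` as a fuzzy match, so `numpy=1.24` would also accept `1.24.3`). Build strings such as `numpy=1.24=py310_0` are not handled.

```python
import re
from typing import Optional

# Hypothetical defaults, for illustration only.
RENAME = {"pytorch": "torch"}  # conda name -> PyPI name
EXCLUDE = {"python", "pip"}    # never written out as requirements


def conda_to_pip(spec: str) -> Optional[str]:
    """Convert one conda dependency string to a pip-style requirement.

    Returns None when the package is in EXCLUDE.
    """
    # Drop an optional channel prefix such as "conda-forge::".
    spec = spec.split("::", 1)[-1].strip()

    # Split the package name from whatever version constraint follows it.
    match = re.match(r"^([A-Za-z0-9._-]+)\s*(.*)$", spec)
    if match is None:
        raise ValueError(f"cannot parse dependency: {spec!r}")
    name, version = match.group(1), match.group(2).replace(" ", "")

    if name in EXCLUDE:
        return None

    # Turn a lone "=" into "=="; leave ">=", "<=", "==", "!=" and "~=" alone.
    version = re.sub(r"(?<![=<>!~])=(?!=)", "==", version)
    return RENAME.get(name, name) + version


for dep in ["python>=3.10", "tqdm", "conda-forge::numpy=1.24", "pytorch>=2"]:
    print(dep, "->", conda_to_pip(dep))
```

Keeping the rename table and the exclusion set as plain module-level data makes them easy to extend per project without touching the parsing logic.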

## Desired outcome

My desired outcome is that I can store all of my package requirements in the conda environment file `env.yml` and have them automatically find themselves in the `setup.py` file under `install_requires`. Since my workflow is built around reading the requirements in from a `settings.ini` file (from nbdev), I would like the solution to take the values of `env.yml` and put them in `settings.ini`.
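For context, here is a rough sketch of the last hop in that chain: turning the single-line, space-separated `requirements` value into a list suitable for `install_requires`. The `settings.ini` fragment is made up for illustration, and the `setup.py` that nbdev generates may read the file differently; this only assumes the key lives in the `[DEFAULT]` section.

```python
import configparser

# A made-up settings.ini fragment; a real nbdev file contains many more keys.
SETTINGS = """
[DEFAULT]
lib_name = current_project
requirements = tqdm rich typer torch>=2 nbdev>=2.3.12
"""

config = configparser.ConfigParser()
config.read_string(SETTINGS)

# The value is one line with packages separated by spaces, so a plain
# split() yields the list that setup(install_requires=...) expects.
install_requires = config["DEFAULT"].get("requirements", "").split()
print(install_requires)
```

Because the value is split on whitespace, each requirement must not contain spaces (for example `torch>=2`, not `torch >= 2`).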

**Note**: I am sharing my current solution as an answer below. Please help / improve it!

# Files

## Example conda file

```yaml
# EXAMPLE YAML FILE

name: current_project
channels:
  - pytorch
  - conda-forge
  - fastai

dependencies:
  - python>=3.10

  # Utilities
  # -------------------------------------------------------------------------
  - tqdm
  - rich
  - typer

  # Jupyter Notebook
  # -------------------------------------------------------------------------
  - conda-forge::notebook
  - conda-forge::ipykernel
  - conda-forge::ipywidgets
  - conda-forge::jupyter_contrib_nbextensions

  # nbdev
  # -------------------------------------------------------------------------
  - fastai::nbdev>=2.3.12

  # PyTorch & Deep Learning
  # -------------------------------------------------------------------------
  - pytorch>=2
  # NOTE: add pytorch-cuda if using a CUDA enabled GPU. You will need to
  #       remove this if you are on Apple Silicon
  # - pytorch::pytorch-cuda
  - conda-forge::pytorch-lightning

  # Plotting
  # -------------------------------------------------------------------------
  - conda-forge::matplotlib
  - conda-forge::seaborn

  # Data Wrangling
  # -------------------------------------------------------------------------
  - conda-forge::scikit-learn
  - pandas>=2
  - numpy
  - scipy

  # Pip / non-conda packages
  # -------------------------------------------------------------------------
  - pip
  - pip:
    # PyTorch & Deep Learning
    # -----------------------------------------------------------------------
    - dgl
```

Answer 1

Score: 1


# Current Solution

The current solution is the file `env_to_ini.py` (see `Files > env_to_ini.py`).

**NOTE**: this solution uses `rich` and `typer` to create an informative command-line interface (CLI) that shows which dependencies were added, changed, removed, or unchanged. This should make working with the script easier (especially should there be bugs in it), as it has not been extensively tested.

## How to use `env_to_ini.py`

Assumptions:
- `env.yml` or `env.mac.yml` under project root
- `settings.ini` under project root
- `env_to_ini.py` under project root

This script is provided so that if the `env.yml` (or `env.mac.yml`) file changes you can automatically update the dependencies of the `current_project` package (under `settings.ini`) to match.

```shell
# default usage
$ python env_to_ini.py

# show packages that didn't change
$ python env_to_ini.py  --unchanged

# specify a different environment file
$ python env_to_ini.py  --unchanged  --file=env.mac.yml
```

## Caveats

This is a bit hacky. You can modify it per project as needed. The so-called "hackiness" is primarily located under the two TODOs, which I will now explain. Note that TODO 2 is more important than TODO 1.

### TODO 1: exclusion

Search for the following in the provided script:

```python
# TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
#       by default this would include python and pip.
```

Per the original question, some packages listed in the conda environment file `env.yml`, namely `python`, should be excluded. The code under the first TODO currently does this with an `if` / `elif` statement. You could change the script to accept an additional argument listing the packages to exclude, or to read them from an additional file.

**NOTE**:
It is unclear to me whether adding a section called `ignore` after `dependencies` in your `env.yml` file would interfere with your use of the file with conda, e.g.

```yaml
dependencies:
  - python>=3.10
  # - ...
ignore:
  - python
  - pip
  # ...
```
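A minimal sketch of the first improvement, assuming the packages have already been parsed into a `{name: version}` dictionary as `extract_packages` does. The names `DEFAULT_EXCLUDE` and `drop_excluded` are hypothetical, and the exclusion list could equally come from a CLI option or an extra file.

```python
from typing import Dict, Iterable

# Hypothetical default: the two packages the script currently skips by hand.
DEFAULT_EXCLUDE = frozenset({"python", "pip"})


def drop_excluded(
    packages: Dict[str, str],
    exclude: Iterable[str] = DEFAULT_EXCLUDE,
) -> Dict[str, str]:
    """Return a copy of packages without the names listed in exclude."""
    excluded = set(exclude)
    return {
        name: version
        for name, version in packages.items()
        if name not in excluded
    }


packages = {"python": ">=3.10", "pip": "", "tqdm": "", "pandas": ">=2"}
print(drop_excluded(packages))
```

Filtering after parsing keeps the `if` / `elif` chain out of the loop, so adding another excluded package is a data change rather than a code change.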

### TODO 2: mapping

Search for the following in the provided script:

```python
# TODO: IMPROVEMENT 2: utilize a map of packages to rename.
#       by default this would include pytorch --> torch.
#       Ideally, this would figure it out automatically.
```

Per the original question, some packages need to be renamed because their package name on conda is different from their name on PyPI. The example given is `pytorch`, which is listed as `torch` on PyPI and `pytorch` on conda.

In the function `requirements_to_ini` this is currently achieved with an `if` statement. You could change the script to accept an additional argument listing the packages to rename, or to read them from an additional file.

**NOTE**:
It is unclear to me whether adding a section called `rename` after `dependencies` in your `env.yml` file would interfere with your use of the file with conda, e.g.

```yaml
dependencies:
  - python>=3.10
  # - ...
rename:
  - pytorch,torch
  # ...
```
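A minimal sketch of the second improvement, assuming rename entries use the `conda_name,pypi_name` form shown in the example and that the packages are already in a `{name: version}` dictionary. The function names `parse_rename_entries` and `rename_packages` are hypothetical.

```python
from typing import Dict, Iterable


def parse_rename_entries(entries: Iterable[str]) -> Dict[str, str]:
    """Turn entries such as "pytorch,torch" into {"pytorch": "torch"}."""
    rename = {}
    for entry in entries:
        # Raises ValueError if an entry has no comma, which flags a typo early.
        conda_name, pypi_name = (part.strip() for part in entry.split(",", 1))
        rename[conda_name] = pypi_name
    return rename


def rename_packages(
    packages: Dict[str, str],
    rename: Dict[str, str],
) -> Dict[str, str]:
    """Replace conda package names with their PyPI names, keeping versions."""
    return {rename.get(name, name): version for name, version in packages.items()}


rename = parse_rename_entries(["pytorch,torch"])
print(rename_packages({"pytorch": ">=2", "tqdm": ""}, rename))
```

The mapping itself still has to be maintained by hand; this only moves it out of the `if` statement and into data.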

**NOTE**:
It is unclear to me how you could determine this mapping automatically, per your question's desired outcome.

# Files

## env_to_ini.py

```python
# env_to_ini.py

import yaml
import configparser
from rich.console import Console
from rich.table import Table

import typer
from typing import Optional, Tuple

app = typer.Typer()
console = Console()


# NOTE: utility function to print colored text
def cprint(style: str, text: str) -> None:
    console.print(f"[{style}]{text}[/{style}]")


def has_channel(requirements_str: str) -> bool:
    return '::' in requirements_str


def extract_channel(requirements_str: str) -> Tuple[Optional[str], str]:
    channel = None
    if has_channel(requirements_str):
        channel, requirements_str = requirements_str.split('::', 1)
    return channel, requirements_str


def is_not_valid_package_char(s: str) -> bool:
    return not (s.isalnum() or s in ['-', '_', '.'])


def split_str_at_first_non_alpha(s: str) -> Tuple[str, str]:
    idx = next((
        i for i, char in enumerate(s)
        if is_not_valid_package_char(char)
    ), len(s))
    return s[:idx], s[idx:]


def split_package_version(s: str) -> Tuple[str, str]:
    # NOTE: alias for split_str_at_first_non_alpha
    return split_str_at_first_non_alpha(s)


# NOTE: this parses requirements from the settings.ini file. Thus there is
#       one line and each package is separated by a space.
def parse_requirements(requirements_str):
    requirements = {}
    for req in requirements_str.split():
        package, version = split_package_version(req)
        requirements[package] = version
    return requirements


# NOTE: this parses dependencies from the env.yml file.
def extract_packages(dependencies):
    packages = {}
    for dep in dependencies:

        if isinstance(dep, str):
            channel, package_version = extract_channel(dep)
            package, version = split_package_version(package_version)

            # TODO: IMPROVEMENT 1: utilize a list of packages to exclude.
            #       by default this would include python and pip.

            # NOTE: we do not need to add python to the requirements
            if package == 'python':
                continue

            # NOTE: likewise we do not need pip
            elif package == 'pip':
                continue

            packages[package] = version

        # NOTE: the nested "pip:" section is parsed by YAML as a dict
        elif isinstance(dep, dict):
            for key, values in dep.items():
                if key == 'pip':
                    for pip_dep in values:
                        package, version = split_package_version(pip_dep)
                        packages[package] = version
    return packages


# NOTE: check if the dependencies in the env.yml file vary from the ones in
#       the settings.ini file.
def compare_requirements(old, new):
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    remained = {k: (old[k], new[k]) for k in old if k in new and old[k] == new[k]}
    return added, removed, changed, remained


# NOTE: I like pretty terminals
def print_changes(added, removed, changed, remained):
    table = Table(title="Changes")
    table.add_column("Package", style="cyan")
    table.add_column("Old Version", style="magenta")
    table.add_column("New Version", style="green")
    table.add_column("Status", style="yellow")

    for package, version in added.items():
        table.add_row(f':package: {package}', "", version, "Added")
    for package, version in removed.items():
        table.add_row(f':package: {package}', version, "", "Removed")
    for package, versions in changed.items():
        table.add_row(f':package: {package}', versions[0], versions[1], "Changed")
    for package, versions in remained.items():
        table.add_row(f':package: {package}', versions[0], versions[1], "Unchanged")

    console.print(table)


def requirements_to_ini(requirements: dict) -> str:
    ini = ''
    for package, version in requirements.items():
        # TODO: IMPROVEMENT 2: utilize a map of packages to rename.
        #       by default this would include pytorch --> torch.
        #       Ideally, this would figure it out automatically.

        # NOTE: this is a hack to make the env.yml file compatible with the settings.ini file
        #       since the env.yml file uses pytorch and the settings.ini file uses torch.
        #       Add more elif statements if you need to change other package names.
        if package == 'pytorch':
            package = 'torch'

        if version:
            ini += f"{package}{version} "
        else:
            ini += f"{package} "
    return ini


@app.command()
def update_requirements(
    file: Optional[str] = typer.Option(
        'env.mac.yml',
        help="YAML file to extract the new requirements from.",
    ),
    unchanged: Optional[bool] = typer.Option(
        False,
        help="Whether to print all packages, including the ones whose versions haven't changed.",
    ),
):
    # NOTE: notice that file is `env.mac.yml` and not `env.yml`. Now with Apple Silicon I have
    #       one env file for more common CUDA versions and one for Apple Silicon.

    cprint("bold cyan", f"Loading environment yaml file {file}...")
    with open(file, 'r') as f:
        env = yaml.safe_load(f)

    # NOTE: read in the current dependencies from the conda env.yml file
    cprint("bold cyan", "Extracting packages and their versions...")
    new_requirements = extract_packages(env['dependencies'])

    # NOTE: read in the previous requirements from the settings.ini file
    cprint("bold cyan", "Loading settings.ini file...")
    config = configparser.ConfigParser()
    config.read('settings.ini')

    cprint("bold cyan", "Comparing the old and new requirements...")
    old_requirements = parse_requirements(config['DEFAULT']['requirements'])

    # NOTE: check for changes
    added, removed, changed, remained = compare_requirements(old_requirements, new_requirements)

    # If --unchanged option is given, print unchanged packages as well
    if unchanged:
        print_changes(added, removed, changed, remained)
    else:
        print_changes(added, removed, changed, {})

    # NOTE: update the requirements in the settings.ini file
    cprint("bold cyan", "Updating the requirements...")
    config['DEFAULT']['requirements'] = requirements_to_ini(new_requirements)

    cprint("bold cyan", "Saving the updated settings.ini file...")
    with open('settings.ini', 'w') as f:
        config.write(f)

    cprint("bold green", "Successfully updated the requirements in settings.ini!")


if __name__ == "__main__":
    app()
```
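One thing to watch in the script: the `pytorch` to `torch` rename happens in `requirements_to_ini`, after the comparison has run. As far as I can tell, on a second run the table would therefore report `pytorch` as added and `torch` as removed even though nothing changed. A minimal sketch of one way around this, assuming a hypothetical `rename` dictionary, is to apply the rename to the new requirements before comparing:

```python
from typing import Dict

Packages = Dict[str, str]


def compare_with_rename(old: Packages, new: Packages, rename: Dict[str, str]):
    """Compare requirements after mapping conda names to PyPI names."""
    # Rename first so both sides use the names stored in settings.ini.
    new = {rename.get(name, name): version for name, version in new.items()}

    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    remained = {k: (old[k], new[k]) for k in old if k in new and old[k] == new[k]}
    return added, removed, changed, remained


old = {"torch": ">=2", "tqdm": ""}                 # parsed from settings.ini
new = {"pytorch": ">=2", "tqdm": "", "rich": ""}   # parsed from env.yml
added, removed, changed, remained = compare_with_rename(
    old, new, {"pytorch": "torch"}
)
print("added:", added)
print("removed:", removed)
```

With the rename applied up front, only `rich` shows up as added in this example and nothing is reported as removed.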

# Update: env2ini

It is now a CLI, available on PyPI, conda, and GitHub, with docs.

Posted by huangapple on May 29, 2023 at 00:22:43.
Source link: https://go.coder-hub.com/76352457.html