April 7, 2023 01:52:22

Using Regex to pull ID specific columns in a dataframe

Question

I have a dataframe from a tissue slide dataset with the following columns:

Image
Name
tumor_stroma_epi_nsclc_v2: Epithelium %
tumor_stroma_epi_nsclc_v2: Epithelium area µm^2
tumor_stroma_epi_nsclc_v2: Necrosis %
tumor_stroma_epi_nsclc_v2: Necrosis area µm^2
tumor_stroma_epi_nsclc_v2: Stroma %
tumor_stroma_epi_nsclc_v2: Stroma area µm^2
tumor_stroma_epi_nsclc_v2: Tumor %
tumor_stroma_epi_nsclc_v2: Tumor area µm^2
Area µm^2

The nsclc_v2 part of the column names varies across datasets, depending on the tissue type. I want a regex that drops the % columns and recognizes every column with this format, whatever the tissue type. So far, this is all I have come up with:

tumor_temp.drop(columns=['Image','Name',
                         '^tumor_stroma_epi_[a-z0-9_]: Epithelium %$',
                         '^tumor_stroma_epi_[a-z0-9_]: Necrosis %$',
                         '^tumor_stroma_epi_[a-z0-9_]: Stroma %$',
                         '^tumor_stroma_epi_[a-z0-9_]: Tumor %$',
                         'Area µ?m^2'], inplace=True)

Apologies if this is a little basic; I mostly have an R background.
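
Two things likely keep the attempt above from working. First, `[a-z0-9_]` without a quantifier matches exactly one character, so it cannot match an eight-character tag such as `nsclc_v2`. Second, `DataFrame.drop` looks labels up literally rather than as regular expressions, so the pattern strings are treated as column names and, with the default `errors="raise"`, the call fails with a `KeyError`. The sketch below checks the patterns against the question's column names using only the standard `re` module. The `breast_v1` tag is a hypothetical stand-in for another tissue type.

```python
import re

# Column names from the question
columns = [
    "Image",
    "Name",
    "tumor_stroma_epi_nsclc_v2: Epithelium %",
    "tumor_stroma_epi_nsclc_v2: Epithelium area µm^2",
    "tumor_stroma_epi_nsclc_v2: Necrosis %",
    "tumor_stroma_epi_nsclc_v2: Necrosis area µm^2",
    "tumor_stroma_epi_nsclc_v2: Stroma %",
    "tumor_stroma_epi_nsclc_v2: Stroma area µm^2",
    "tumor_stroma_epi_nsclc_v2: Tumor %",
    "tumor_stroma_epi_nsclc_v2: Tumor area µm^2",
    "Area µm^2",
]

# As in the question: [a-z0-9_] with no quantifier matches exactly ONE character.
one_char = re.compile(r"^tumor_stroma_epi_[a-z0-9_]: Epithelium %$")

# "+" lets the tissue tag be any length, and ".*" replaces the four separate
# class names (Epithelium, Necrosis, Stroma, Tumor) with a single pattern.
any_tag = re.compile(r"^tumor_stroma_epi_[a-z0-9_]+: .*%$")

print([c for c in columns if one_char.search(c)])  # → []
print([c for c in columns if any_tag.search(c)])   # the four "%" columns

# A column from a different (hypothetical) tissue type also matches.
print(bool(any_tag.search("tumor_stroma_epi_breast_v1: Tumor %")))  # → True
```

Because the tag is matched by a character class rather than spelled out, the same pattern can be reused unchanged across datasets.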

Answer 1

Score: 2

You can use the filter() function from pandas:

import re
pattern = re.compile("^tumor_stroma_epi_[a-z0-9_]+:.*%$")  # regular expression to match columns with %
cols_to_drop = df.filter(regex=pattern).columns
df.drop(columns=cols_to_drop, inplace=True)
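
For reference, here is a self-contained version of this approach on a one-row toy frame built from the question's column names (the values are placeholders). It passes the pattern as a plain string, which is the type `DataFrame.filter(regex=...)` documents, and returns a new frame instead of using `inplace=True`. The last part goes beyond the answer. The question's attempt also listed `Image`, `Name` and `Area µm^2`, so one possible way to drop those together with the `%` columns is a boolean mask built with `Index.str.match`. The name `also_drop` is illustrative only.

```python
import pandas as pd

# Column names from the question; the single row of zeros is placeholder data.
columns = [
    "Image",
    "Name",
    "tumor_stroma_epi_nsclc_v2: Epithelium %",
    "tumor_stroma_epi_nsclc_v2: Epithelium area µm^2",
    "tumor_stroma_epi_nsclc_v2: Necrosis %",
    "tumor_stroma_epi_nsclc_v2: Necrosis area µm^2",
    "tumor_stroma_epi_nsclc_v2: Stroma %",
    "tumor_stroma_epi_nsclc_v2: Stroma area µm^2",
    "tumor_stroma_epi_nsclc_v2: Tumor %",
    "tumor_stroma_epi_nsclc_v2: Tumor area µm^2",
    "Area µm^2",
]
df = pd.DataFrame([[0] * len(columns)], columns=columns)

# The answer's pattern: the "%" columns for any tissue tag.
pct_pattern = r"^tumor_stroma_epi_[a-z0-9_]+:.*%$"

# filter(regex=...) selects the columns whose label matches the pattern;
# their labels are then passed to drop(), which returns a new frame.
cols_to_drop = df.filter(regex=pct_pattern).columns
trimmed = df.drop(columns=cols_to_drop)

# Beyond the answer: also drop Image, Name and the overall area column.
# The "^" inside "µm^2" is escaped so the regex treats it literally.
also_drop = r"^(?:Image|Name|Area µm\^2)$"
mask = df.columns.str.match(pct_pattern) | df.columns.str.match(also_drop)
area_only = df.loc[:, ~mask]  # keep only the per-class area columns

print(list(trimmed.columns))
print(list(area_only.columns))
```

Building the mask from the column labels keeps the selection in one expression, so new tissue tags need no code changes.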
