February 6, 2023, 14:45:59

Python - Insert column based on condition check from dictionary

Question

I am trying to insert a column with values 'True' and 'False' based on a validation using a separate column. The issue I'm having is that the condition is dependent on another column, acting as the dictionary (which uses regex) key.

E.g.

Table I have:

Type	Value
TypeA	a1111
TypeB	1b111
TypeC	11c11
TypeD	111d1
TypeD	1111e

Dictionary I have:

Column A	Column B
A	\w\d\d\d\d
B	\d\w\d\d\d
C	\d\d\w\d\d
D	\d\d\d\w\d

Result I want:

Type	Value	Result
TypeA	a1111	True
TypeB	1b111	True
TypeC	11c11	True
TypeD	111d1	True
TypeD	1111e	False

Any help would be appreciated!

I have tried playing around with numpy.where() but haven't had much luck.

Answer 1

Score: 0

This takes just one line (the last line of the snippet):

import re
import pandas as pd

df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
# Look up the pattern by the last character of Type, then test Value against it
df['result'] = df.apply(lambda row: re.match(re_dict[row['Type'][-1]], row['Value']) is not None, axis=1)

    Type  Value  result
0  TypeA  a1111    True
1  TypeB  1b111    True
2  TypeC  11c11    True
3  TypeD  111d1    True
4  TypeD  1111e   False
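
One detail worth keeping in mind with this approach: re.match only anchors at the start of the string, so a value with extra trailing characters would still pass. The sketch below is not part of the original answer; it assumes that values should match their pattern exactly and that an unknown type should count as invalid. The helper name is_valid and the PATTERNS constant are hypothetical.

```python
import re

# Patterns from the question, compiled once so they are not re-parsed for every row.
PATTERNS = {
    "A": re.compile(r"\w\d\d\d\d"),
    "B": re.compile(r"\d\w\d\d\d"),
    "C": re.compile(r"\d\d\w\d\d"),
    "D": re.compile(r"\d\d\d\w\d"),
}


def is_valid(type_name: str, value: str) -> bool:
    """Hypothetical helper: True only if value matches its type's pattern in full."""
    # The last character of the type name is the dictionary key ("TypeA" -> "A").
    pattern = PATTERNS.get(type_name[-1:])
    if pattern is None:
        # Assumption: a type with no pattern is treated as invalid rather than raising.
        return False
    return pattern.fullmatch(value) is not None


rows = [
    ("TypeA", "a1111"),
    ("TypeB", "1b111"),
    ("TypeC", "11c11"),
    ("TypeD", "111d1"),
    ("TypeD", "1111e"),
]
results = [is_valid(type_name, value) for type_name, value in rows]
print(results)  # [True, True, True, True, False]

# re.match accepts a value with a trailing extra character; fullmatch does not.
print(re.match(r"\w\d\d\d\d", "a11119") is not None)  # True
print(is_valid("TypeA", "a11119"))  # False
```

With a DataFrame, the same helper could be passed to df.apply in place of the lambda, for example df.apply(lambda row: is_valid(row['Type'], row['Value']), axis=1).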

Answer 2

Score: 0

You can wrap a lambda in np.vectorize so that it takes a Value and a pattern and returns True or False depending on the output of re.match.

import re
import numpy as np
import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"],
    "Value": ["a1111", "1b111", "11c11", "111d1", "1111e"]
})
# Create the dictionary dataframe
df_dict = pd.DataFrame({
    "Column A": ["A", "B", "C", "D"],
    "Column B": [r"\w\d\d\d\d", r"\d\w\d\d\d", r"\d\d\w\d\d", r"\d\d\d\w\d"]
})
# Add "Type" to the beginning of each value to match the "Type" column in the main dataframe
df_dict["Column A"] = "Type" + df_dict["Column A"]
# Merge the two dataframes to get the corresponding regex pattern for each "Type"
df = df.merge(df_dict, left_on="Type", right_on="Column A")
# Create a vectorized function that matches a value against a regex pattern
match_func = np.vectorize(lambda x, pattern: bool(re.match(pattern, x)))
df["Result"] = match_func(df["Value"], df["Column B"])  # Add the result to the dataframe
df = df.drop(columns=["Column A", "Column B"])  # Drop the columns that are no longer needed
df

    Type  Value  Result
0  TypeA  a1111    True
1  TypeB  1b111    True
2  TypeC  11c11    True
3  TypeD  111d1    True
4  TypeD  1111e   False

This code builds the dataframe df, uses np.vectorize to create a function that matches each Value against its corresponding regex pattern, and stores the outcome in the "Result" column. Finally, the helper columns that are no longer needed are dropped.
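
Two properties of this approach may matter on real data. The default merge is an inner join, so rows whose Type has no entry in the dictionary dataframe are dropped. And np.vectorize is a convenience wrapper around a Python-level loop rather than a performance optimization. The following is a minimal sketch, not the answer's own code, assuming that rows without a pattern should be kept and marked False; the TypeZ row and the merged and result names are made up for illustration.

```python
import re

import pandas as pd

# Small sample that includes a type (TypeZ) with no pattern in the dictionary.
df = pd.DataFrame({
    "Type": ["TypeA", "TypeD", "TypeZ"],
    "Value": ["a1111", "1111e", "zzzzz"],
})
df_dict = pd.DataFrame({
    "Column A": ["TypeA", "TypeD"],
    "Column B": [r"\w\d\d\d\d", r"\d\d\d\w\d"],
})

# how="left" keeps every row of df; unmatched types get NaN in "Column B".
merged = df.merge(df_dict, how="left", left_on="Type", right_on="Column A")

# A plain list comprehension does the same row-by-row work as np.vectorize.
# Assumption: a missing pattern (NaN, which is not a str) means the row fails.
merged["Result"] = [
    isinstance(pattern, str) and re.match(pattern, value) is not None
    for value, pattern in zip(merged["Value"], merged["Column B"])
]

result = merged.drop(columns=["Column A", "Column B"])
print(result)
```

Here TypeA/a1111 comes out True, TypeD/1111e comes out False because the pattern does not match, and TypeZ/zzzzz comes out False because no pattern exists for it.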

Answer 3

Score: 0

With the Series.map and re.match functions:

import re

d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
# Map the last character of Type to its regex pattern (stored temporarily in Result)
df['Result'] = df.Type.str[-1].map(d)
# Replace each pattern with whether it matches that row's Value
df['Result'] = df.apply(lambda x: bool(re.match(x.Result, x.Value)), axis=1)

    Type  Value  Result
0  TypeA  a1111    True
1  TypeB  1b111    True
2  TypeC  11c11    True
3  TypeD  111d1    True
4  TypeD  1111e   False
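
Because there are usually far fewer distinct types than rows, another option is to loop over the types instead of the rows and let pandas apply each pattern to a whole group at once. This is a sketch of that alternative, not something proposed in the answers above. It uses Series.str.match, which follows re.match semantics, and assumes that a type missing from the dictionary should yield False. The key and result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"],
    "Value": ["a1111", "1b111", "11c11", "111d1", "1111e"],
})
d = {"A": r"\w\d\d\d\d", "B": r"\d\w\d\d\d", "C": r"\d\d\w\d\d", "D": r"\d\d\d\w\d"}

# The dictionary key is the last character of each Type value.
key = df["Type"].str[-1]

# Start with False everywhere so that types without a pattern stay False.
result = pd.Series(False, index=df.index)

# One pass per distinct type: apply that type's pattern to all of its rows.
for letter, group in df.groupby(key):
    pattern = d.get(letter)
    if pattern is not None:
        result.loc[group.index] = group["Value"].str.match(pattern)

df["Result"] = result
print(df)
```

On the sample data this produces the same Result column as the answers above.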
