Python - Insert column based on condition check from dictionary
Question
I am trying to insert a column of True/False values that records whether each row's Value passes a regex check. The difficulty is that the regex to apply depends on another column (Type), which acts as the key into a dictionary of regex patterns.
E.g.
Table I have:
Type | Value |
---|---|
TypeA | a1111 |
TypeB | 1b111 |
TypeC | 11c11 |
TypeD | 111d1 |
TypeD | 1111e |
Dictionary I have:
Column A | Column B |
---|---|
A | \w\d\d\d\d |
B | \d\w\d\d\d |
C | \d\d\w\d\d |
D | \d\d\d\w\d |
Result I want:
Type | Value | Result |
---|---|---|
TypeA | a1111 | True |
TypeB | 1b111 | True |
TypeC | 11c11 | True |
TypeD | 111d1 | True |
TypeD | 1111e | False |
Any help would be appreciated!
I have tried playing around with numpy.where() but haven't had much luck.
Answer 1
Score: 0
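Just one line:
import re
import pandas as pd
df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
# The last character of Type is the dictionary key; test Value against that key's pattern
df['result'] = df.apply(lambda row: re.match(re_dict[row['Type'][-1]], row['Value']) is not None, axis=1)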
Type Value result
0 TypeA a1111 True
1 TypeB 1b111 True
2 TypeC 11c11 True
3 TypeD 111d1 True
4 TypeD 1111e False
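A side note on the regex itself, offered as a caveat rather than a correction: re.match only anchors at the start of the string, so a Value with extra trailing characters would still pass, and \w also matches digits. If the intent is that the whole Value must fit the pattern (an assumption about what the asker wants), re.fullmatch may be closer. A small sketch using the 'D' pattern from the question; the test strings '111d1XYZ' and '11111' are made up for illustration:

```python
import re

pattern = r'\d\d\d\w\d'  # pattern for key 'D' from the question

# re.match only checks that the string starts with the pattern
prefix_ok = re.match(pattern, '111d1XYZ') is not None

# re.fullmatch requires the entire string to match the pattern
full_ok = re.fullmatch(pattern, '111d1XYZ') is not None

# \w matches letters, digits and underscore, so an all-digit value also passes
all_digits_ok = re.fullmatch(pattern, '11111') is not None

print(prefix_ok, full_ok, all_digits_ok)  # True False True
```

If a letter is required in that position, a character class such as [A-Za-z] in place of \w would be one option.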
Answer 2
Score: 0
You can use np.vectorize to create a lambda function that takes Value and a pattern and returns True or False based on the output of re.match.
import re
import numpy as np
import pandas as pd
# Create the dataframe
df = pd.DataFrame({
    "Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"],
    "Value": ["a1111", "1b111", "11c11", "111d1", "1111e"]
})
# Create the dictionary dataframe
df_dict = pd.DataFrame({
    "Column A": ["A", "B", "C", "D"],
    "Column B": [r"\w\d\d\d\d", r"\d\w\d\d\d", r"\d\d\w\d\d", r"\d\d\d\w\d"]
})
# Prefix each key with "Type" so it matches the "Type" column in the main dataframe
df_dict["Column A"] = "Type" + df_dict["Column A"]
# Merge the two dataframes to get the corresponding regex pattern for each "Type"
df = df.merge(df_dict, left_on="Type", right_on="Column A")
# Create a vectorized function that matches a value against a regex pattern
match_func = np.vectorize(lambda x, pattern: bool(re.match(pattern, x)))
# Add the result to the dataframe
df["Result"] = match_func(df["Value"], df["Column B"])
# Drop the columns that are no longer needed
df = df.drop(columns=["Column A", "Column B"])
df
Type Value Result
0 TypeA a1111 True
1 TypeB 1b111 True
2 TypeC 11c11 True
3 TypeD 111d1 True
4 TypeD 1111e False
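One thing to watch with the merge above: df.merge defaults to an inner join, so any row whose Type has no entry in the dictionary would be dropped from the result. If such rows should be kept and flagged instead, a lookup with Series.map is one possibility. This is only a sketch: the "TypeZ" row is invented for illustration, and marking unknown types as False is an assumption (raising an error may suit other cases better).

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Type": ["TypeA", "TypeD", "TypeD", "TypeZ"],  # "TypeZ" is a made-up type with no pattern
    "Value": ["a1111", "111d1", "1111e", "zzzzz"],
})
patterns = {"TypeA": r"\w\d\d\d\d", "TypeD": r"\d\d\d\w\d"}

# Look up each row's pattern; unknown types become NaN instead of being dropped
looked_up = df["Type"].map(patterns)

# Rows without a pattern are marked False (an assumption, see the note above)
df["Result"] = [
    isinstance(p, str) and re.match(p, v) is not None
    for p, v in zip(looked_up, df["Value"])
]
print(df)
```

All four rows survive, and the unknown type simply ends up with False in Result.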
Answer 3
Score: 0
With Series.map and re.match functions:
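import re
# Raw strings keep the backslashes intact and avoid invalid-escape warnings
d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
# Map the last character of Type to its pattern, then test each Value against it
df['Result'] = df.Type.str[-1].map(d)
df['Result'] = df.apply(lambda x: bool(re.match(x.Result, x.Value)), axis=1)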
Type Value Result
0 TypeA a1111 True
1 TypeB 1b111 True
2 TypeC 11c11 True
3 TypeD 111d1 True
4 TypeD 1111e False
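For larger frames, a row-wise apply with axis=1 can be slow. A possible variant, sketched here under the same assumption as above that the last character of Type is the dictionary key, compiles each pattern once and loops over the two columns with zip. The name compiled is just an illustrative helper, not something from the answers.

```python
import re
import pandas as pd

df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}

# Compile every pattern once up front instead of once per row
compiled = {key: re.compile(pat) for key, pat in d.items()}

# Pair each row's key with its value and test the value against that key's pattern
keys = df['Type'].str[-1]
df['Result'] = [compiled[k].match(v) is not None for k, v in zip(keys, df['Value'])]
print(df)
```

This gives the same Result column as the answers above; a Type whose last character is not in d would raise a KeyError here.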