Python - Insert column based on condition check from dictionary
Question
I am trying to insert a column of True/False values based on validating the values in a separate column. The issue I'm having is that the condition depends on another column, which acts as the key into a dictionary of regex patterns.
E.g.
Table I have:
| Type | Value |
|---|---|
| TypeA | a1111 |
| TypeB | 1b111 |
| TypeC | 11c11 |
| TypeD | 111d1 |
| TypeD | 1111e |
Dictionary I have:
| Column A | Column B |
|---|---|
| A | \w\d\d\d\d |
| B | \d\w\d\d\d |
| C | \d\d\w\d\d |
| D | \d\d\d\w\d |
Result I want:
| Type | Value | Result |
|---|---|---|
| TypeA | a1111 | True |
| TypeB | 1b111 | True |
| TypeC | 11c11 | True |
| TypeD | 111d1 | True |
| TypeD | 1111e | False |
Any help would be appreciated!
I have tried playing around with numpy.where() but haven't had much luck.
Answer 1
Score: 0
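Just one line (after the setup):
import re
import pandas as pd
df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
# Look up the pattern by the last character of Type, then test Value against it
df['result'] = df.apply(lambda row: re.match(re_dict[row['Type'][-1]], row['Value']) is not None, axis=1)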
Type Value result
0 TypeA a1111 True
1 TypeB 1b111 True
2 TypeC 11c11 True
3 TypeD 111d1 True
4 TypeD 1111e False
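One caveat about the one-liner above: `re.match` only anchors at the start of the string, so a value with extra trailing characters still passes, and a `Type` whose last character is not in the dictionary raises `KeyError`. Note also that `\w` matches digits and underscores, so `11111` satisfies every pattern in the question. The following is a minimal sketch of a stricter check, assuming whole-string matching is what is wanted; `is_valid` is a hypothetical helper name, not something from the original answer.

```python
import re

# Patterns from the question, keyed by the last character of "Type".
re_dict = {
    "A": r"\w\d\d\d\d",
    "B": r"\d\w\d\d\d",
    "C": r"\d\d\w\d\d",
    "D": r"\d\d\d\w\d",
}


def is_valid(type_name: str, value: str) -> bool:
    """Hypothetical helper: True only if the whole value fits the pattern for its type."""
    pattern = re_dict.get(type_name[-1])  # None when the type has no pattern
    if pattern is None:
        return False
    return re.fullmatch(pattern, value) is not None


# re.match accepts trailing characters; re.fullmatch does not.
loose = re.match(re_dict["A"], "a1111xyz") is not None
strict = is_valid("TypeA", "a1111xyz")
print(loose, strict)
print(is_valid("TypeD", "111d1"), is_valid("TypeD", "1111e"), is_valid("TypeZ", "a1111"))
```

With a DataFrame like the one in the question, this could be applied as `df.apply(lambda row: is_valid(row['Type'], row['Value']), axis=1)`.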
Answer 2
Score: 0
You can wrap a lambda in np.vectorize: it takes a Value and a pattern, and returns True or False depending on whether re.match finds a match.
import re
import numpy as np
import pandas as pd
# Create the dataframe
df = pd.DataFrame({
"Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"],
"Value": ["a1111", "1b111", "11c11", "111d1", "1111e"]
})
# Create the dictionary dataframe
df_dict = pd.DataFrame({
"Column A": ["A", "B", "C", "D"],
"Column B": [r"\w\d\d\d\d", r"\d\w\d\d\d", r"\d\d\w\d\d", r"\d\d\d\w\d"]
})
# Add "Type" to the beginning of each value to match the "Type" column in the main dataframe
df_dict["Column A"] = "Type" + df_dict["Column A"]
# Merge the two dataframes to get corresponding regex pattern for "Type"
df = df.merge(df_dict, left_on="Type", right_on="Column A")
match_func = np.vectorize(lambda x, pattern: bool(re.match(pattern, x))) # Create a vectorized function to match the regex pattern
df["Result"] = match_func(df["Value"], df["Column B"]) # Add the result to the dataframe
df = df.drop(columns=["Column A", "Column B"]) # Drop the columns that are no longer needed
df
Type Value Result
0 TypeA a1111 True
1 TypeB 1b111 True
2 TypeC 11c11 True
3 TypeD 111d1 True
4 TypeD 1111e False
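One thing to watch with the merge approach: `DataFrame.merge` defaults to `how="inner"`, so any row whose `Type` has no entry in the dictionary dataframe is silently dropped from the result. The sketch below keeps those rows with `how="left"` and marks them `False`; the `TypeZ` row is made-up data added only to show the effect, and treating a missing pattern as a failed validation is an assumption about the desired behavior.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Type": ["TypeA", "TypeD", "TypeZ"],  # "TypeZ" is a made-up type with no pattern
    "Value": ["a1111", "1111e", "a1111"],
})
df_dict = pd.DataFrame({
    "Column A": ["TypeA", "TypeD"],
    "Column B": [r"\w\d\d\d\d", r"\d\d\d\w\d"],
})

# how="left" keeps every row of df; unmatched types get NaN in "Column B".
merged = df.merge(df_dict, left_on="Type", right_on="Column A", how="left")

# Treat a missing pattern as a failed validation instead of passing NaN to re.match.
merged["Result"] = [
    isinstance(pattern, str) and re.match(pattern, value) is not None
    for value, pattern in zip(merged["Value"], merged["Column B"])
]
result = merged.drop(columns=["Column A", "Column B"])
print(result)
```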
Answer 3
Score: 0
With Series.map and re.match functions:
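import re
# Raw strings keep backslashes such as \w and \d from being read as string escapes
d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}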
df['Result'] = df.Type.str[-1].map(d)
df['Result'] = df.apply(lambda x: bool(re.match(x.Result, x.Value)), axis=1)
Type Value Result
0 TypeA a1111 True
1 TypeB 1b111 True
2 TypeC 11c11 True
3 TypeD 111d1 True
4 TypeD 1111e False
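As a possible alternative to the row-wise `apply`, the same result can be sketched by looping over the (few) dictionary entries and letting pandas test each group of rows with `Series.str.match`. This is a different technique from the one in the answer, shown under the assumption that the number of distinct patterns is small compared with the number of rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"],
    "Value": ["a1111", "1b111", "11c11", "111d1", "1111e"],
})
d = {"A": r"\w\d\d\d\d", "B": r"\d\w\d\d\d", "C": r"\d\d\w\d\d", "D": r"\d\d\d\w\d"}

key = df["Type"].str[-1]  # last character selects the pattern
df["Result"] = False      # rows whose key is not in d stay False
for k, pattern in d.items():
    mask = key == k
    # Series.str.match anchors at the start of the string, like re.match.
    df.loc[mask, "Result"] = df.loc[mask, "Value"].str.match(pattern)
print(df)
```

Swapping `str.match` for `str.fullmatch` would require the whole value to fit the pattern, if that is the intended validation.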