Python - Insert column based on condition check from dictionary

Question

I am trying to add a column of True/False values by validating one column against a regex. The difficulty is that the regex to apply depends on another column, which acts as the key into a dictionary of regex patterns.

E.g.

Table I have:

Type   Value
TypeA  a1111
TypeB  1b111
TypeC  11c11
TypeD  111d1
TypeD  1111e

Dictionary I have:

Column A  Column B
A         \w\d\d\d\d
B         \d\w\d\d\d
C         \d\d\w\d\d
D         \d\d\d\w\d

Result I want:

Type   Value  Result
TypeA  a1111  True
TypeB  1b111  True
TypeC  11c11  True
TypeD  111d1  True
TypeD  1111e  False

Any help would be appreciated!

I have tried playing around with numpy.where() but haven't had much luck.

Answer 1

Score: 0

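Just one line:

import re
import pandas as pd

df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}

# Use the last character of Type as the dictionary key, then test Value against that pattern
df['result'] = df.apply(lambda row: re.match(re_dict[row['Type'][-1]], row['Value']) is not None, axis=1)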


    Type  Value  result
0  TypeA  a1111    True
1  TypeB  1b111    True
2  TypeC  11c11    True
3  TypeD  111d1    True
4  TypeD  1111e   False
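
Two caveats with the one-liner: re.match only anchors at the start of the string, so a longer value such as a11119 would still pass, and a Type whose last letter is missing from re_dict raises KeyError. Below is a minimal sketch of a stricter variant. It assumes that the whole Value must match and that unknown types should give False; neither is stated in the question. The helper name is_valid and the compiled dictionary are hypothetical.

```python
import re
import pandas as pd

df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}

# Compile each pattern once instead of on every row
compiled = {key: re.compile(pattern) for key, pattern in re_dict.items()}

def is_valid(row):
    # Hypothetical helper: unknown types give False instead of raising KeyError
    pattern = compiled.get(row['Type'][-1])
    # fullmatch requires the whole Value to match (assumption: no extra characters allowed)
    return pattern is not None and pattern.fullmatch(row['Value']) is not None

df['result'] = df.apply(is_valid, axis=1)
```

If trailing characters are acceptable, swapping fullmatch back to match restores the original behaviour.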

Answer 2

Score: 0

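You can use np.vectorize to wrap a lambda that takes a Value and a pattern and returns True or False depending on whether re.match succeeds. The regex patterns are first merged onto the main dataframe by Type, and the helper columns are dropped once the Result column has been added.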

import re

import numpy as np
import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"], 
    "Value": ["a1111", "1b111", "11c11", "111d1", "1111e"]
})

# Create the dictionary dataframe
df_dict = pd.DataFrame({
    "Column A": ["A", "B", "C", "D"], 
    "Column B": [r"\w\d\d\d\d", r"\d\w\d\d\d", r"\d\d\w\d\d", r"\d\d\d\w\d"]
})
# Add "Type" to the beginning of each value to match the "Type" column in the main dataframe
df_dict["Column A"] = "Type" + df_dict["Column A"]

# Merge the two dataframes to get corresponding regex pattern for "Type"
df = df.merge(df_dict, left_on="Type", right_on="Column A")

match_func = np.vectorize(lambda x, pattern: bool(re.match(pattern, x)))  # Create a vectorized function to match the regex pattern
df["Result"] = match_func(df["Value"], df["Column B"])  # Add the result to the dataframe
df = df.drop(columns=["Column A", "Column B"])  # Drop the columns that are no longer needed
df

    Type  Value  Result
0  TypeA  a1111    True
1  TypeB  1b111    True
2  TypeC  11c11    True
3  TypeD  111d1    True
4  TypeD  1111e   False
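
One thing to watch with the merge approach: the default inner merge silently drops any row whose Type has no entry in the dictionary dataframe. The sketch below uses how="left" so such rows are kept and marked False. It also swaps np.vectorize for a plain list comprehension, since np.vectorize is a convenience wrapper that still loops in Python. The TypeE row and the column names Pattern and merged are made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Type": ["TypeA", "TypeD", "TypeD", "TypeE"],  # TypeE is a made-up type with no pattern
    "Value": ["a1111", "111d1", "1111e", "e1111"],
})
df_dict = pd.DataFrame({
    "Type": ["TypeA", "TypeD"],
    "Pattern": [r"\w\d\d\d\d", r"\d\d\d\w\d"],
})

# A left merge keeps every row of df; rows without a pattern get a missing value in "Pattern"
merged = df.merge(df_dict, on="Type", how="left")

# Plain Python loop over the paired columns; a missing pattern counts as no match
merged["Result"] = [
    isinstance(pattern, str) and re.match(pattern, value) is not None
    for value, pattern in zip(merged["Value"], merged["Pattern"])
]
result = merged.drop(columns="Pattern")
```

Whether an unknown type should be False, missing, or an error depends on the data; False is only one possible choice here.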

Answer 3

Score: 0

With Series.map and re.match functions:

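import re

# Raw strings keep the regex backslashes intact and avoid invalid-escape warnings
d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
df['Result'] = df.Type.str[-1].map(d)  # look up each row's pattern by the last character of Type
df['Result'] = df.apply(lambda x: bool(re.match(x.Result, x.Value)), axis=1)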

    Type  Value  Result
0  TypeA  a1111    True
1  TypeB  1b111    True
2  TypeC  11c11    True
3  TypeD  111d1    True
4  TypeD  1111e   False
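
The answer above still calls re.match once per row through apply. Because there are only a few distinct patterns, another option is to loop over the dictionary and let Series.str.match test all rows of one type at once. This is a sketch, assuming (as the answers do) that the last character of Type is the dictionary key; the variable names key and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}

key = df['Type'].str[-1]   # last character of Type, used as the dictionary key
df['Result'] = False       # rows whose key is not in d stay False

for letter, pattern in d.items():
    mask = key == letter
    # str.match anchors at the start of the string, like re.match
    df.loc[mask, 'Result'] = df.loc[mask, 'Value'].str.match(pattern)
```

This runs one pass per pattern rather than one regex call per row, which may help on large frames with few types.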
