Python - Insert column based on condition check from dictionary

Question

I am trying to insert a column of True/False values based on validating a separate column. The issue I'm having is that the condition depends on another column, which acts as the key into a dictionary of regex patterns.

E.g.

Table I have:

Type   Value
TypeA  a1111
TypeB  1b111
TypeC  11c11
TypeD  111d1
TypeD  1111e

Dictionary I have:

Column A  Column B
A         \w\d\d\d\d
B         \d\w\d\d\d
C         \d\d\w\d\d
D         \d\d\d\w\d

Result I want:

Type   Value  Result
TypeA  a1111  True
TypeB  1b111  True
TypeC  11c11  True
TypeD  111d1  True
TypeD  1111e  False

Any help would be appreciated!

I have tried playing around with numpy.where() but haven't had much luck.

Answer 1

Score: 0

Just one line:

    import re
    import pandas as pd

    df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                       'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
    re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
    # Look up the pattern by the last character of Type, then test Value against it
    df['result'] = df.apply(lambda row: re.match(re_dict[row['Type'][-1]], row['Value']) is not None, axis=1)

Output:

        Type  Value  result
    0  TypeA  a1111    True
    1  TypeB  1b111    True
    2  TypeC  11c11    True
    3  TypeD  111d1    True
    4  TypeD  1111e   False
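
A caveat on the approach in this answer: re.match only anchors at the start of the string, so a value with extra trailing characters would still pass, and indexing the dictionary directly raises KeyError for a Type whose last letter has no pattern. Below is a minimal sketch of a stricter variant. It assumes that whole-string matching is wanted and that unknown types should yield False, neither of which the question states. The helper name is_valid and the last two rows are made up for illustration.

```python
import re
import pandas as pd

# Patterns from the question, keyed by the last letter of Type
re_dict = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d',
           'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}


def is_valid(type_name, value):
    """Hypothetical helper: True only if the whole value matches the pattern for its type."""
    pattern = re_dict.get(type_name[-1])  # None when the type has no pattern
    if pattern is None:
        return False
    # fullmatch requires the entire string to match, not just a prefix
    return re.fullmatch(pattern, value) is not None


# The last two rows are invented here to show the edge cases
df = pd.DataFrame({'Type': ['TypeA', 'TypeD', 'TypeA', 'TypeZ'],
                   'Value': ['a1111', '1111e', 'a1111xyz', 'a1111']})
df['result'] = [is_valid(t, v) for t, v in zip(df['Type'], df['Value'])]
print(df)
```

With plain re.match the 'a1111xyz' row would come out True, so which behaviour is right depends on how strict the validation is meant to be.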

Answer 2

Score: 0

You can use np.vectorize to wrap a lambda that takes a Value and its pattern and returns True or False based on the result of re.match.

    import re
    import numpy as np
    import pandas as pd

    # Create the dataframe
    df = pd.DataFrame({
        "Type": ["TypeA", "TypeB", "TypeC", "TypeD", "TypeD"],
        "Value": ["a1111", "1b111", "11c11", "111d1", "1111e"]
    })

    # Create the dictionary dataframe
    df_dict = pd.DataFrame({
        "Column A": ["A", "B", "C", "D"],
        "Column B": [r"\w\d\d\d\d", r"\d\w\d\d\d", r"\d\d\w\d\d", r"\d\d\d\w\d"]
    })

    # Add "Type" to the beginning of each value to match the "Type" column in the main dataframe
    df_dict["Column A"] = "Type" + df_dict["Column A"]

    # Merge the two dataframes to get the corresponding regex pattern for each "Type"
    df = df.merge(df_dict, left_on="Type", right_on="Column A")

    # Create a vectorized function that matches each value against its regex pattern
    match_func = np.vectorize(lambda x, pattern: bool(re.match(pattern, x)))
    df["Result"] = match_func(df["Value"], df["Column B"])  # Add the result to the dataframe
    df = df.drop(columns=["Column A", "Column B"])  # Drop the columns that are no longer needed
    df

Output:

        Type  Value  Result
    0  TypeA  a1111    True
    1  TypeB  1b111    True
    2  TypeC  11c11    True
    3  TypeD  111d1    True
    4  TypeD  1111e   False
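
One thing to watch with the merge approach: DataFrame.merge defaults to an inner join, so any row whose Type has no entry in the dictionary dataframe silently disappears from the result. Here is a rough sketch of a variant that keeps such rows and marks them False. That policy is an assumption, not something the question asks for, and the TypeZ row is invented to show the effect.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Type": ["TypeA", "TypeD", "TypeZ"],  # TypeZ is invented: it has no pattern
    "Value": ["a1111", "1111e", "zzzzz"],
})
df_dict = pd.DataFrame({
    "Column A": ["A", "B", "C", "D"],
    "Column B": [r"\w\d\d\d\d", r"\d\w\d\d\d", r"\d\d\w\d\d", r"\d\d\d\w\d"],
})
df_dict["Column A"] = "Type" + df_dict["Column A"]

# how="left" keeps every row of df; rows without a pattern get a missing value in "Column B"
merged = df.merge(df_dict, how="left", left_on="Type", right_on="Column A")

# A missing pattern is not a str, so those rows evaluate to False without calling re.match
merged["Result"] = [
    isinstance(pattern, str) and re.match(pattern, value) is not None
    for value, pattern in zip(merged["Value"], merged["Column B"])
]
merged = merged.drop(columns=["Column A", "Column B"])
print(merged)
```

The list comprehension stands in for np.vectorize here only because it makes the missing-pattern check easy to write; np.vectorize is itself a Python-level loop, so the swap is not expected to cost performance.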

Answer 3

Score: 0

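With Series.map and re.match (df is the dataframe from the question):

    import re

    d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}
    # Temporarily store each row's pattern in Result, looked up by the last character of Type
    df['Result'] = df.Type.str[-1].map(d)
    # Replace the pattern with the outcome of matching it against Value
    df['Result'] = df.apply(lambda x: bool(re.match(x.Result, x.Value)), axis=1)

Output:

        Type  Value  Result
    0  TypeA  a1111    True
    1  TypeB  1b111    True
    2  TypeC  11c11    True
    3  TypeD  111d1    True
    4  TypeD  1111e   False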
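
The same Series.map idea can be written without parking the patterns in the Result column and without a row-wise apply. The following is only a sketch under the question's data; the names compiled and patterns are introduced here for illustration. Since the re module already caches recently used patterns, compiling up front is mostly about readability rather than a guaranteed speedup.

```python
import re
import pandas as pd

df = pd.DataFrame({'Type': ['TypeA', 'TypeB', 'TypeC', 'TypeD', 'TypeD'],
                   'Value': ['a1111', '1b111', '11c11', '111d1', '1111e']})
d = {'A': r'\w\d\d\d\d', 'B': r'\d\w\d\d\d', 'C': r'\d\d\w\d\d', 'D': r'\d\d\d\w\d'}

# Compile each pattern once, keyed the same way as d
compiled = {key: re.compile(pattern) for key, pattern in d.items()}

# Map the last character of Type to its compiled pattern, kept outside the dataframe
patterns = df['Type'].str[-1].map(compiled)

# Pair each pattern with its value and record whether the match succeeded
df['Result'] = [bool(p.match(v)) for p, v in zip(patterns, df['Value'])]
print(df)
```

This assumes every Type ends in a letter present in d; an unknown letter would map to a missing value and the .match call would fail, so real data may need a guard.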

huangapple
  • Published on February 6, 2023 at 14:45:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75358112.html