Python代码模式:尝试不同方法直到成功的优雅方式?

huangapple go评论64阅读模式
英文:

Python code patterns: elegant way of trying different methods until one suceeds?

问题

我正在尝试从HTML页面中提取信息,比如其最后修改日期,在一个有多种声明方式且使用非统一数据的背景下(意味着无法简单循环遍历提取的数据)。

这个丑陋的任务如下:

def get_date(html):
  date = None
  # 方法1
  time_tag = html.find("time", {"datetime": True})
  if time_tag:
    date = time_tag["datetime"]
  if date:
    return date

  # 方法2
  mod_tag = html.find("meta", {"property": "article:modified_time", "content": True})
  if mod_tag:
    date = mod_tag["content"]
  if date:
    return date
  
  # 方法3
  div_tag = html.find("div", {"class": "dateline"})
  if div_tag:
    date = div_tag.get_text()
  if date:
    return date
  
  # 方法n
  # ...

  return date

我想知道Python是否没有一种简洁而优雅的通过“while”逻辑来实现这一点的方式,以便快速运行,易于阅读和易于维护:

def method_1(html):
  test = html.find("time", {"datetime": True})
  return test["datetime"] if test else None

def method_2(html):
  test = html.find("meta", {"property": "article:modified_time", "content": True})
  return test["content"] if test else None

def method_3(html):
  test = html.find("div", {"class": "dateline"})
  return test.get_text() if test else None

...

def get_date(html):
  date = None
  bag_of_methods = [method_1, method_2, method_3, ...]

  i = 0
  while not date and i < len(bag_of_methods):
    date = bag_of_methods[i](html)
    i += 1

  return date

我可以通过将第一个代码片段中的每种方法转换为一个函数,并将所有函数附加到“bag_of_methods”可迭代对象中,然后运行它们,直到其中一个起作用,从而使其正常工作。

然而,这些函数只有2行,后续不会重用,所以它似乎只是增加了更多的代码行并且污染了命名空间,毫无意义。

有没有更好的方法做到这一点?

英文:

I'm trying to extract informations from an HTML page, like its last modification date, in a context where there are more than one way of declaring it, and those ways use non-uniform data (meaning a simple loop over fetched data is not possible).

The ugly task is as follow:

def get_date(html):
  date = None
  # Approach 1
  time_tag = html.find(&quot;time&quot;, {&quot;datetime&quot;: True})
  if time_tag:
    date = time_tag[&quot;datetime&quot;]
  if date:
    return date

  # Approach 2
  mod_tag = html.find(&quot;meta&quot;, {&quot;property&quot;: &quot;article:modified_time&quot;, &quot;content&quot;: True})
  if mod_tag:
    date = mod_tag[&quot;content&quot;]
  if date:
    return date
  
  # Approach 3
  div_tag = html.find(&quot;div&quot;, {&quot;class&quot;: &quot;dateline&quot;})
  if div_tag:
    date = div_tag.get_text()
  if date:
    return date
  
  # Approach n
  # ...

  return date

I wonder if Python doesn't have some concise and elegant way of achieving this through a `while" logic, in order to run fast, be legible and maintenance-friendly:

def method_1(html):
  test = html.find(&quot;time&quot;, {&quot;datetime&quot;: True})
  return test[&quot;datetime&quot;] if test else None

def method_2(html):
  test = html.find(&quot;meta&quot;, {&quot;property&quot;: &quot;article:modified_time&quot;, &quot;content&quot;: True})
  return test[&quot;content&quot;] if test else None

def method_3(html):
  test = html.find(&quot;div&quot;, {&quot;class&quot;: &quot;dateline&quot;})
  return test.get_text() if test else None

...

def get_date(html):
  date = None
  bag_of_methods = [method_1, method_2, method_3, ...]

  i = 0
  while not date and i &lt; len(bag_of_methods):
    date = bag_of_methods[i](html)
    i += 1

  return date

I can make that work right now by turning each approach from the first snippet in a function, append all functions to the bag_of_methods iterable and run them all until one works.

However, those functions would be 2 lines each and will not be reused later in the program, so it just seems like it's adding more lines of code and polluting the namespace for nothing.

Is there a better way of doing this ?

答案1

得分: 2

为了保持命名空间的清晰,您可以在您的 get_date 函数内部定义 method_x 函数:

def get_date(HTML):

  def method_1(HTML):
    test = html.find(&quot;time&quot;, {&quot;datetime&quot;: True})
    return test[&quot;datetime&quot;] if test else None

  def method_2(HTML):
    test = html.find(&quot;meta&quot;, {&quot;property&quot;: &quot;article:modified_time&quot;, &quot;content&quot;: True})
    return test[&quot;content&quot;] if test else None

  date = None
  bag_of_methods = [method_1, method_2, ...]
  #...

当我看到一个 while 循环遍历已定义序列时,我也感到不舒服,所以我会倾向于用 for 循环和 break 重写。另外,要注意某些类型在布尔上下文中可能会被视为 False,所以您可能想重新编写退出条件,使用 is None

date = None

for test in bag_of_methods:
    date = test(html)
    if date is not None:
        break

此外,您可以利用 or 的短路特性:

date = method_1(html) or method_2(html) or method_3(html)
if date:
  return date
else:
  return None

执行将从左到右进行,一旦有函数返回一个在布尔上下文中评估为 True 的值,就会立即停止 - 尽管如果其中一个日期以某种方式评估为 False,也会陷入相同的陷阱。

英文:

To keep the namespace clean, you can define the method_x functions inside your get_date:

def get_date(HTML):

  def method_1(HTML):
    test = html.find(&quot;time&quot;, {&quot;datetime&quot;: True})
    return test[&quot;datetime&quot;] if test else None

  def method_2(HTML):
    test = html.find(&quot;meta&quot;, {&quot;property&quot;: &quot;article:modified_time&quot;, &quot;content&quot;: True})
    return test[&quot;content&quot;] if test else None

  date = None
  bag_of_methods = [method_1, method_2, ...]
  #...

I also feel hinky when I see a while loop looping over a defined sequence, so I'd be tempted to rewrite with a for and a break. Also, beware that certain types can evaluate to False in a boolean context, so you might want to re-write the escape condition using is None:

date = None

for test in bag_of_methods:
    date = test(html)
    if date is not None:
        break

Also, you can utilize the short-circuiting property of or:

date = method_1(html) or method_2(html) or method_3(html)
if date:
  return date
else:
  return None

Execution will go from left to right, and will stop as soon as something returns a value that evaluates to True - although this falls into the same trap if one of your dates somehow evaluates to False

答案2

得分: 2

这是一些可重复使用的内容:

class CallChain:
    def __init__(self, *methods):
        self.methods = methods
        
    def __call__(self, *args, **kwargs):
        for method in self.methods:
            result = method(*args, **kwargs)
            if result is not None:
                return result
        return False

要使用它:

get_date = CallChain(method_1, method_2, method_3)
date = get_date(html)

你也可以用这个类来创建其他方法链。

另一种实现这个想法的方式是使用闭包:

def call_chain(*methods):
    def call_it(*args, **kwargs):
        for method in methods:
            result = method(*args, **kwargs)
            if result is not None:
                return result
        return False
    return call_it

要使用它:

get_date = call_chain(method_1, method_2, method_3)
date = get_date(html)
英文:

Here is something reusable:

class CallChain:
    def __init__(self, *methods):
        self.methods = methods
        
    def __call__(self, *args, **kwargs):
        for method in self.methods:
            result = method(*args, **kwargs)
            if result is not None:
                return result
        return False

To use it:

get_date = CallChain(method_1, method_2, method_3)
date = get_date(html)

You can use this class for other method chains as well.

A different implementation of this idea is to use enclosure:

def call_chain(*methods):
    def call_it(*args, **kwargs):
        for method in methods:
            result = method(*args, **kwargs)
            if result is not None:
                return result
        return False
    return call_it

To use it:

get_date = call_chain(method_1, method_2, method_3)
date = get_date(html)

huangapple
  • 本文由 发表于 2023年4月19日 23:06:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76056089.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定