英文:
Python code patterns: elegant way of trying different methods until one suceeds?
问题
我正在尝试从HTML页面中提取信息,比如其最后修改日期,在一个有多种声明方式且使用非统一数据的背景下(意味着无法简单循环遍历提取的数据)。
这个丑陋的任务如下:
def get_date(html):
date = None
# 方法1
time_tag = html.find("time", {"datetime": True})
if time_tag:
date = time_tag["datetime"]
if date:
return date
# 方法2
mod_tag = html.find("meta", {"property": "article:modified_time", "content": True})
if mod_tag:
date = mod_tag["content"]
if date:
return date
# 方法3
div_tag = html.find("div", {"class": "dateline"})
if div_tag:
date = div_tag.get_text()
if date:
return date
# 方法n
# ...
return date
我想知道Python是否没有一种简洁而优雅的通过“while”逻辑来实现这一点的方式,以便快速运行,易于阅读和易于维护:
def method_1(html):
test = html.find("time", {"datetime": True})
return test["datetime"] if test else None
def method_2(html):
test = html.find("meta", {"property": "article:modified_time", "content": True})
return test["content"] if test else None
def method_3(html):
test = html.find("div", {"class": "dateline"})
return test.get_text() if test else None
...
def get_date(html):
date = None
bag_of_methods = [method_1, method_2, method_3, ...]
i = 0
while not date and i < len(bag_of_methods):
date = bag_of_methods[i](html)
i += 1
return date
我可以通过将第一个代码片段中的每种方法转换为一个函数,并将所有函数附加到“bag_of_methods”可迭代对象中,然后运行它们,直到其中一个起作用,从而使其正常工作。
然而,这些函数只有2行,后续不会重用,所以它似乎只是增加了更多的代码行并且污染了命名空间,毫无意义。
有没有更好的方法做到这一点?
英文:
I'm trying to extract informations from an HTML page, like its last modification date, in a context where there are more than one way of declaring it, and those ways use non-uniform data (meaning a simple loop over fetched data is not possible).
The ugly task is as follow:
def get_date(html):
date = None
# Approach 1
time_tag = html.find("time", {"datetime": True})
if time_tag:
date = time_tag["datetime"]
if date:
return date
# Approach 2
mod_tag = html.find("meta", {"property": "article:modified_time", "content": True})
if mod_tag:
date = mod_tag["content"]
if date:
return date
# Approach 3
div_tag = html.find("div", {"class": "dateline"})
if div_tag:
date = div_tag.get_text()
if date:
return date
# Approach n
# ...
return date
I wonder if Python doesn't have some concise and elegant way of achieving this through a `while" logic, in order to run fast, be legible and maintenance-friendly:
def method_1(html):
test = html.find("time", {"datetime": True})
return test["datetime"] if test else None
def method_2(html):
test = html.find("meta", {"property": "article:modified_time", "content": True})
return test["content"] if test else None
def method_3(html):
test = html.find("div", {"class": "dateline"})
return test.get_text() if test else None
...
def get_date(html):
date = None
bag_of_methods = [method_1, method_2, method_3, ...]
i = 0
while not date and i < len(bag_of_methods):
date = bag_of_methods[i](html)
i += 1
return date
I can make that work right now by turning each approach from the first snippet in a function, append all functions to the bag_of_methods
iterable and run them all until one works.
However, those functions would be 2 lines each and will not be reused later in the program, so it just seems like it's adding more lines of code and polluting the namespace for nothing.
Is there a better way of doing this ?
答案1
得分: 2
为了保持命名空间的清晰,您可以在您的 get_date
函数内部定义 method_x
函数:
def get_date(HTML):
def method_1(HTML):
test = html.find("time", {"datetime": True})
return test["datetime"] if test else None
def method_2(HTML):
test = html.find("meta", {"property": "article:modified_time", "content": True})
return test["content"] if test else None
date = None
bag_of_methods = [method_1, method_2, ...]
#...
当我看到一个 while
循环遍历已定义序列时,我也感到不舒服,所以我会倾向于用 for
循环和 break
重写。另外,要注意某些类型在布尔上下文中可能会被视为 False
,所以您可能想重新编写退出条件,使用 is None
:
date = None
for test in bag_of_methods:
date = test(html)
if date is not None:
break
此外,您可以利用 or
的短路特性:
date = method_1(html) or method_2(html) or method_3(html)
if date:
return date
else:
return None
执行将从左到右进行,一旦有函数返回一个在布尔上下文中评估为 True
的值,就会立即停止 - 尽管如果其中一个日期以某种方式评估为 False
,也会陷入相同的陷阱。
英文:
To keep the namespace clean, you can define the method_x
functions inside your get_date
:
def get_date(HTML):
def method_1(HTML):
test = html.find("time", {"datetime": True})
return test["datetime"] if test else None
def method_2(HTML):
test = html.find("meta", {"property": "article:modified_time", "content": True})
return test["content"] if test else None
date = None
bag_of_methods = [method_1, method_2, ...]
#...
I also feel hinky when I see a while
loop looping over a defined sequence, so I'd be tempted to rewrite with a for
and a break
. Also, beware that certain types can evaluate to False
in a boolean context, so you might want to re-write the escape condition using is None
:
date = None
for test in bag_of_methods:
date = test(html)
if date is not None:
break
Also, you can utilize the short-circuiting property of or
:
date = method_1(html) or method_2(html) or method_3(html)
if date:
return date
else:
return None
Execution will go from left to right, and will stop as soon as something returns a value that evaluates to True
- although this falls into the same trap if one of your dates somehow evaluates to False
答案2
得分: 2
这是一些可重复使用的内容:
class CallChain:
def __init__(self, *methods):
self.methods = methods
def __call__(self, *args, **kwargs):
for method in self.methods:
result = method(*args, **kwargs)
if result is not None:
return result
return False
要使用它:
get_date = CallChain(method_1, method_2, method_3)
date = get_date(html)
你也可以用这个类来创建其他方法链。
另一种实现这个想法的方式是使用闭包:
def call_chain(*methods):
def call_it(*args, **kwargs):
for method in methods:
result = method(*args, **kwargs)
if result is not None:
return result
return False
return call_it
要使用它:
get_date = call_chain(method_1, method_2, method_3)
date = get_date(html)
英文:
Here is something reusable:
class CallChain:
def __init__(self, *methods):
self.methods = methods
def __call__(self, *args, **kwargs):
for method in self.methods:
result = method(*args, **kwargs)
if result is not None:
return result
return False
To use it:
get_date = CallChain(method_1, method_2, method_3)
date = get_date(html)
You can use this class for other method chains as well.
A different implementation of this idea is to use enclosure:
def call_chain(*methods):
def call_it(*args, **kwargs):
for method in methods:
result = method(*args, **kwargs)
if result is not None:
return result
return False
return call_it
To use it:
get_date = call_chain(method_1, method_2, method_3)
date = get_date(html)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论