How should I interpret the output of tdda.rexpy.extract?

Question

I am interested in Rexpy because I am looking for a tool that infers a regular expression matching a given string. Inspecting rexpy.extract with help(), it looked like it might be what I want.

extract(examples, tag=False, encoding=None, as_object=False, extra_letters=None, full_escape=False, remove_empties=False, strip=False, variableLengthFrags=False, max_patterns=None, min_diff_strings_per_pattern=1, min_strings_per_pattern=1, size=None, seed=None, dialect='portable', verbose=0)
    Extract regular expression(s) from examples and return them.
    
    Normally, examples should be unicode (i.e. ``str`` in Python3,
    and ``unicode`` in Python2). However, encoded strings can be
    passed in provided the encoding is specified.
    
    Results will always be unicode.
    
    If as_object is set, the extractor object is returned,
    with results in .results.rex; otherwise, a list of regular
    expressions, as unicode strings is returned.

So I tried an example:

>>> from tdda import rexpy
>>> s = 'andrew.gelman@statistics.com'
>>> rexpy.extract(s)
['^[.@]$', '^[a-z]$']

I expected something similar to ['^[a-z].[a-z]@[a-z].[a-z]$'] rather than ['^[.@]$', '^[a-z]$']. Is the extractor just telling me that special symbols '.' and '@' are used 'somewhere' in the string?

Answer 1

Score: 3

The examples parameter expects an iterable of strings. Because a single string was passed, the function iterated over its individual characters and produced regular expressions that match those single-character examples.

Try providing a list of strings instead, e.g. rexpy.extract(...).
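The call in the answer is cut off in this copy of the page, so what follows is only a minimal sketch of the suggestion, assuming the intended fix is to wrap the single string in a list. The helper as_examples is hypothetical and not part of tdda. The rexpy call is skipped if tdda is not installed, and no output is shown for it because the patterns depend on the examples supplied.

```python
def as_examples(examples):
    """Hypothetical helper: wrap a lone string in a list so it is
    treated as one example rather than as a sequence of characters."""
    if isinstance(examples, str):
        return [examples]
    return list(examples)


s = 'andrew.gelman@statistics.com'

# A str is itself an iterable of one-character strings, which is what
# extract() effectively received in the question.
chars = sorted(set(s))
print(chars)

# Wrapped in a list, the whole address is a single example.
examples = as_examples(s)
print(examples)

try:
    from tdda import rexpy
except ImportError:
    rexpy = None  # tdda is not installed; skip the extraction step

if rexpy is not None:
    # Assumption: this is the call the answer intended.
    print(rexpy.extract(examples))
```

With a single example there may be little for the tool to generalise from, so passing several representative addresses in the list is probably closer to what the question is after.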

huangapple
  • Posted on May 14, 2023 at 00:57:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76243939.html