Alternative to 'find' which supports PCRE


Question

Linux's find command does not support Perl compatible regular expressions (PCRE).
Is there an alternative that supports PCRE and is concise to use (a one-liner on the command line)?

I found some one-liners, but they were long and complicated, which makes it hard to understand what they do and a pain to write every time.

Examples:

https://unix.stackexchange.com/questions/726878/is-it-possible-to-use-perl-like-regular-expressions-with-the-linux-find-command

Uses a pipeline, several options, and multiple functions.

https://stackoverflow.com/questions/19894673/unix-linux-freebsd-find-command-with-perl-regex

Uses a lot of options and also Perl.

I tried using Perl directly but didn't find a pure Perl one-liner for it.

Example:

https://stackoverflow.com/questions/55304227/perl-regex-command-line-how-to-get-matches-instead-replacing

Gives a one-liner for finding matches within a single file, but it does not find filename matches within a directory.

Answer 1

Score: 2

Using Perl's File::Find

perl -MFile::Find -wE'find( sub { say $File::Find::name if /\.pl$/ }, q(.) )'

This finds all entries whose names end with .pl, recursively, anywhere under the current directory.

Or take the directory as input, with the current directory as the default:

perl -MFile::Find -wE' $d = shift//q(.); 
    find( sub { say $File::Find::name if /\.pl$/ }, $d )' directory-name

Or collect all files found, for possible post-processing, writing to a file, etc.:

perl -MFile::Find -wE' $d = shift//q(.); 
    find( sub { push @f, $File::Find::name if /\.pl$/ }, $d ); 
    say for @f'  directory-name

(If run without a directory-name, the current directory is used.)

However, I don't see why a simple pipeline of find + grep isn't suitable. grep itself supports basic regex, with -E it supports extended regex, and with -P (Perl) it uses PCRE. So

find ... | grep -P regex 

does exactly what is asked, in one command line. The filename criteria can then be split: some handled by find's own globbing and some by grep's regex.
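As a concrete illustration of splitting the criteria, here is a minimal sketch. It assumes GNU grep built with -P support, and the function name find_txt_pcre is hypothetical. find's -name glob narrows the search to *.txt files, and grep -P applies a PCRE to each full path that find prints.

```shell
# find_txt_pcre REGEX [DIR]  (hypothetical helper)
# find selects regular files named *.txt; grep -P then filters the
# printed paths with a Perl-compatible regex.
find_txt_pcre() {
    find "${2:-.}" -type f -name '*.txt' | grep -P -- "$1"
}
```

For example, find_txt_pcre '/(?!draft)[^/]*\d{4}\.txt$' would list .txt files whose names end in four digits and do not start with "draft". Note that grep sees the whole path, one per line, so file names containing newlines are not handled correctly by this simple form.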


Finally, the question asks for PCRE, and find indeed doesn't support PCRE, as stated. However, find does support other flavors of regex. The man page has only a basic statement, while a detailed description of the differences between the flavors can be found by running info find on Linux (I couldn't find it online).

In short, the main differences from PCRE, as used by grep -P and other tools, are: 1) the regex pattern has to match the whole path, not just a substring of it, and 2) the supported syntax is very basic.

So, to find files whose names consist of letters followed by digits before a .txt extension, with anything else allowed in the path, anywhere in or under the current directory:

find . -type f -regex '.*\/[a-zA-Z]+[0-9]+\.txt'

Note that the leading .* is necessary, otherwise the path leading to the filename itself can't be matched (there's at least ./ in it).

Basic as it is compared with full PCRE, this may well be enough for most uses.
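The whole-path rule can be checked in a scratch directory. The sketch below assumes GNU find; the file names are made up for the demo. The last command uses GNU find's -regextype option to switch to POSIX extended syntax, which is still not PCRE but adds interval syntax such as {2,4}.

```shell
# Scratch directory with a few sample files (names are made up for the demo).
demo=$(mktemp -d)
touch "$demo/abc123.txt" "$demo/abc.txt" "$demo/123abc.txt"

# No leading .* : the pattern must match the whole path, so nothing is printed.
find "$demo" -type f -regex '[a-zA-Z]+[0-9]+\.txt'

# With the leading .*/ the directory part is covered; only abc123.txt is printed.
find "$demo" -type f -regex '.*/[a-zA-Z]+[0-9]+\.txt'

# GNU find only: -regextype selects another flavor and must precede -regex.
find "$demo" -regextype posix-extended -type f -regex '.*/[a-zA-Z]+[0-9]{2,4}\.txt'
```

The scratch directory is left in place so the commands can be rerun; remove it with rm -rf "$demo" when done.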

huangapple
  • Published on March 21, 2023, 02:55:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75794238-2.html