Grep doesn't work when adding an extra character at the end of operator
Question
I am grepping multiple files (on Ubuntu) with this command:
<pre>
LANG=en_US.UTF-8 grep -P -R -i -I -H -A1 "^name#.?r[AÀÁÂÃÄaàáâãä]f[AÀÁÂÃÄaàáâãä][EÈÉÊËeèéêë]l s[IÌÍÎÏiìíîï]m[OÔÒÓÕÖoòóôõö].?#.?#.?#.?#.?$" image_args_*
</pre>
It returns a few results, this being one of them:
<pre>
image_args_search_134.txt:name#Rafael Simões Vieira#1767###Emerenciana Rodrigues de Oliveira
image_args_search_134.txt-#bati.#134#somelinkhere.com##
</pre>
But if I add [EÈÉÊËeèéêë] to the pattern, as shown below:
<pre>
LANG=en_US.UTF-8 grep -P -R -i -I -H -A1 "^name#.?r[AÀÁÂÃÄaàáâãä]f[AÀÁÂÃÄaàáâãä][EÈÉÊËeèéêë]l s[IÌÍÎÏiìíîï]m[OÔÒÓÕÖoòóôõö][EÈÉÊËeèéêë].?#.?#.?#.?#.?$" image_args_*
</pre>
Then I get nothing.
Why is that?
Thanks!
Answer 1
Score: 1
From what I can see, you want to match every diacritic variant of a given letter. Regular expressions provide equivalence classes for this.
> An equivalence class expression shall represent the set of collating elements belonging to an equivalence class, as described in Collation Order. Only primary equivalence classes shall be recognized. The class shall be expressed by enclosing any one of the collating elements in the equivalence class within bracket-equal ( [= and =] ) delimiters. For example, if 'a', 'à', and 'â' belong to the same equivalence class, then [[=a=]b], [[=à=]b], and [[=â=]b] are each equivalent to [aàâb]. If the collating element does not belong to an equivalence class, the equivalence class expression shall be treated as a collating symbol.
So you might want to write something based on:
$ grep -i 'r[[=a=]]f[[=a=]][[=e=]]l s[[=i=]]m[[=o=]][[=e=]]s' file1 file2 file3
Note that equivalence classes do not exist in PCRE, so drop -P and use extended regular expressions (-E) instead:
$ grep -A1 -iIREH '^name#[^#]*r[[=a=]]f[[=a=]][[=e=]]l s[[=i=]]m[[=o=]][[=e=]]s[^#]*(#[^#]*){4}$' *
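To try the equivalence-class pattern in isolation, here is a minimal sketch, assuming GNU grep. The three sample lines and the temporary file are made up for illustration and only loosely follow the question's data. Which accented lines match depends on the collation data of the active locale, so the accented line may not match if en_US.UTF-8 is not installed.

```shell
#!/bin/sh
# Hypothetical sample data; not taken from the real image_args_* files.
tmp=$(mktemp)
printf '%s\n' \
    'name#Rafael Simoes#1' \
    'name#Rafaél Simões#2' \
    'name#Ricardo Silva#3' > "$tmp"

# -E: extended regular expressions (equivalence classes are a POSIX feature).
# -a: treat the file as text even if the locale cannot decode every byte.
# LC_ALL selects the locale whose collation data defines the classes.
matches=$(LC_ALL=en_US.UTF-8 grep -a -E -i \
    'r[[=a=]]f[[=a=]][[=e=]]l s[[=i=]]m[[=o=]][[=e=]]s' "$tmp")

printf '%s\n' "$matches"
rm -f "$tmp"
```

The unaccented line should match in any locale, because each class always contains the letter it names. If the accented line is missing from the output, the locale is the first thing to check.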
Answer 2
Score: 0
I can reproduce your problem only when I use a locale that is not installed.
Please verify that the required locales are activated (not commented out by a leading #):
grep "en_US" /etc/locale.gen
# en_US ISO-8859-1
# en_US.ISO-8859-15 ISO-8859-15
en_US.UTF-8 UTF-8
(You can configure /etc/locale.gen to your needs by removing the leading # from the locales you want.)
Make sure the configured locales have actually been generated:
sudo locale-gen
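A quick way to confirm that a locale is usable is to look for it in the output of `locale -a`. The sketch below assumes a glibc-style system; `locale_available` is a hypothetical helper name, not a standard command. It normalises the codeset spelling because `locale -a` typically prints `en_US.utf8` rather than `en_US.UTF-8`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named locale is listed by `locale -a`.
locale_available() {
    # Lower-case and drop hyphens so "en_US.UTF-8" and "en_US.utf8" compare equal.
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    locale -a 2>/dev/null \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/-//g' \
        | grep -qxF -- "$want"
}

if locale_available "en_US.UTF-8"; then
    echo "en_US.UTF-8 is available"
else
    echo "en_US.UTF-8 is missing; enable and generate it first"
fi
```

If the locale is reported as missing, a command that sets `LANG=en_US.UTF-8` will most likely fall back to the C locale, where non-ASCII characters are not treated as letters.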