Simple way to get the first non-whitespace character from a string

Question

The first non-whitespace character in '   abc' is a. I get it with:

echo '   abc' | grep -o '^\s\+.' |tr -d  ' '
a

Is there a shorter way to do this?

Answer 1

Score: 1

echo '   abc  sd ggf ' | awk '{print $1}' | cut -c 1
echo '   abc  sd ggf ' | sed 's/^[ \t]*//' | cut -d ' ' -f 1 | cut -c 1
echo '   abc  sd ggf ' | sed -r 's/\s+//' | cut -c -1

PS: Corrected to print only the first character.
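
The awk variant above can drop the extra cut by taking the substring inside awk itself. This is a minimal sketch, not part of the original answer; the function name first_char is hypothetical, and it assumes the text arrives on standard input.

```bash
# Hypothetical helper: print the first non-whitespace character read from stdin.
first_char() {
  # NF is non-zero only on lines that contain at least one field,
  # so blank or whitespace-only lines are skipped.
  awk 'NF { print substr($1, 1, 1); exit }'
}

printf '   abc  sd ggf \n' | first_char   # prints: a
printf '\t\n  xyz\n' | first_char         # prints: x
```

Because awk's default field splitting treats spaces and tabs alike, this behaves the same for tab-indented input.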

Answer 2

Score: 1

Retrieving a regex match from the BASH_REMATCH[] array:

str='   abc'
regex='[^[:space:]]'                                          # match a single non-whitespace character
[[ "${str}" =~ ${regex} ]] && char1="${BASH_REMATCH[0]}"

Result:

$ typeset -p char1
declare -- char1="a"

$ echo "${char1}"
a

NOTE: this requires a bit more typing, but it avoids the performance overhead of creating a subshell for each piped command (e.g., | grep, | tr, | awk, | sed, | cut).

Another idea, using parameter expansion to strip out whitespace characters plus a substring expansion:

str='   abc'
newstr="${str//[[:space:]]/}"                 # strip out all whitespace
char1="${newstr:0:1}"                         # extract 1st character via substring (start position 0, length of 1)

Result:

$ typeset -p char1
declare -- char1="a"

$ echo "${char1}"
a

NOTE: this takes a little less typing and likewise avoids the performance overhead of creating a subshell for each piped command (e.g., | grep, | tr, | awk, | sed, | cut).
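
One thing the BASH_REMATCH snippet leaves open is what happens when the string holds no non-whitespace character at all: char1 is simply not assigned. A minimal sketch of a wrapper that reports that case through its exit status; the function name first_nonspace is an assumption for illustration, not something from the answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-whitespace character of $1.
# Returns 1 (and prints nothing) when $1 is empty or all whitespace.
first_nonspace() {
  local regex='[^[:space:]]'        # kept in a variable so the pattern stays unquoted in [[ =~ ]]
  [[ $1 =~ $regex ]] || return 1    # no match: nothing to print
  printf '%s\n' "${BASH_REMATCH[0]}"
}

first_nonspace '   abc'                                   # prints: a
first_nonspace $'\t\n  xyz'                               # prints: x
first_nonspace '    ' || echo 'no non-whitespace character'
```

The regex is unanchored, so bash returns the leftmost match, which is exactly the first non-whitespace character.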

Answer 3

Score: 1

Let's first shorten your own command by removing useless spaces and using a here string instead of echo and a pipe. Let's also make it independent of the string by using a bash variable:

s='   abc'
grep -o '^\s\+.'<<<$s|tr -d  ' '

It is now 32 characters long. Can we find something shorter? Yes, with tr and cut. If the whitespace consists of real spaces (not tabs), here is a 22-character script:

tr -d ' '<<<$s|cut -c1

If the whitespace can also include tabs, we need two more characters, that is, 24:

tr -d ' \t'<<<$s|cut -c1

If it can be any horizontal whitespace, we need 6 more than that (30):

tr -d '[:blank:]'<<<$s|cut -c1

And if it can be any horizontal or vertical whitespace (although if the string contains newlines, your grep-based solution no longer works), the script is still 30 characters long:

tr -d '[:space:]'<<<$s|cut -c1
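
To see why the choice of tr set matters, the variants can be run against a string that starts with a tab. This is only an illustrative sketch: the sample value of s is an assumption, and the here strings are quoted and spaced out for readability rather than for minimal length.

```bash
#!/usr/bin/env bash
s=$'\t  abc'                          # assumed sample: a tab, two spaces, then text

tr -d ' '         <<<"$s" | cut -c1   # prints the tab: only spaces were deleted
tr -d ' \t'       <<<"$s" | cut -c1   # prints: a
tr -d '[:blank:]' <<<"$s" | cut -c1   # prints: a
tr -d '[:space:]' <<<"$s" | cut -c1   # prints: a
```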

Answer 4

Score: 0

You can also do it with just parameter expansion forms:

```sh
shopt -s extglob # Turn on extended globs to get +()
foo='   abc'
trimmed=${foo##+([[:space:]])} # Remove leading whitespace, if any
echo "${trimmed:0:1}" # And display the first character of the remaining text
```
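
If turning on extglob is not desirable, the same trimming can be sketched with ordinary patterns in two steps. This is a different technique from the one in the answer, shown only as an assumption-laden alternative; the variable name lead is hypothetical.

```bash
#!/usr/bin/env bash
foo='   abc'
lead=${foo%%[![:space:]]*}   # leading whitespace: drop everything from the first non-whitespace character on
trimmed=${foo#"$lead"}       # remove exactly that leading whitespace
echo "${trimmed:0:1}"        # prints: a
```

When foo contains no non-whitespace character, the first pattern does not match, lead equals the whole string, and the result is empty.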

Answer 5

Score: 0

UPDATE 1 ::: MUCH cleaner

mawk '_ {exit} _ = _$(NF = 1)' RS='[ \t-\r]+' FS=

It assigns the first character to _, so as long as the input records are empty, it does not exit.

The _$( ) isn't anything special: it's merely a straight string concatenation of the variable _ with the field $( … ).

echo '   abc' |
gawk '_ {exit} _ = "" < $(NF=1)' RS='[ \t-\r]+' FS=

a

huangapple
  • Posted on July 6, 2023, 20:49:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76629019.html