Can my JavaScript regex handle dots within quotes?

Question

I have created a regular expression to add ?. to JavaScript expressions within a string.

For example: bar.foo.some becomes bar?.foo?.some.

It works perfectly, except when there are dots between quotes (' or ").
I mean, when I test the regex on this expression: data.foo ? 'yes. I can do it': 'no.', it becomes data?.foo ? 'yes?. I can do it': 'no?.'.

What I need from this regex is to ignore the dots inside the quotes. However, I am unsure how to instruct the regex to do so.

The regex I created is quite complex as it covers cases such as ?.[ and )?. for index access, as well as function calls (some[...] and some.fn().bar).

Below are some possible expressions with the expected output as comments:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->

[
  `bla.params.query ? 'yes.' : 'no.'`, // expected output: bla?.params?.query ? 'yes.' : 'no.'
  `bla[foo]`, // expected output: bla?.[foo]
  `bla[foo].bar }}`, // expected output: bla?.[foo]?.bar }}
  `bar.some().bar`, // expected output: bar?.some()?.bar
  `bla`, // expected output: bla
  `bla.some`, // expected output: bla?.some
  `bla.some.go`, // expected output: bla?.some?.go
  `bla ? 'bla.some': 'you.go'`, // expected output: bla ? 'bla.some': 'you.go'
  `bla.`, // invalid expression; should be ignored because there is nothing after the dot. expected output: bla.
].forEach((v) => {
  v = v.replace(
    /(?<=[a-z_])\.(?!\?)|(?<=[\]a-z_])(?=\[)|(?<=[\]a-z_])(?=^\()|(?<=]|\))(?=\[)|(?<=]|\))\./gi,
    '?.'
  );

  console.log(v);
});

<!-- end snippet -->

Answer 1

Score: 1

To skip a . inside single or double quotes, you can add a positive lookahead that only succeeds when the dot is outside a quoted string. Here's the updated regex:

/(?<=[a-z_])\.(?!\?)(?=(?:[^'"]|'[^']*'|"[^"]*")*$)|(?<=[\]a-z_])(?=\[)|(?<=[\]a-z_])(?=^\()|(?<=]|\))(?=\[)|(?<=]|\))\./gi

This regex is similar to the previous one, but the first alternative now ends with the lookahead (?=(?:[^'"]|'[^']*'|"[^"]*")*$). It requires everything from the dot to the end of the string to be either a non-quote character or a complete quoted string. A dot inside a quoted string is followed by an unmatched closing quote, so the lookahead fails there and the dot is left alone. Note that the lookahead is attached only to the first (dot after an identifier) alternative; the other alternatives are unchanged.

<!-- begin snippet: js hide: true console: true babel: false -->

<!-- language: lang-js -->

[
  `bla.params.query ? 'yes.' : 'no.'`, // expected output: bla?.params?.query ? 'yes.' : 'no.'
  `bla[foo]`, // expected output: bla?.[foo]
  `bla[foo].bar }}`, // expected output: bla?.[foo]?.bar }}
  `bar.some().bar`, // expected output: bar?.some()?.bar
  `bla`, // expected output: bla
  `bla.some`, // expected output: bla?.some
  `bla.some.go`, // expected output: bla?.some?.go
  `bla ? 'bla.some': 'you.go'`, // expected output: bla ? 'bla.some': 'you.go'
  `bla.`, // invalid expression; should be ignored because there is nothing after the dot. expected output: bla.
].forEach((v) => {
  v = v.replace(
    /(?<=[a-z_])\.(?!\?)(?=(?:[^'"]|'[^']*'|"[^"]*")*$)|(?<=[\]a-z_])(?=\[)|(?<=[\]a-z_])(?=^\()|(?<=]|\))(?=\[)|(?<=]|\))\./gi,
    '?.'
  );

  console.log(v);
});

<!-- end snippet -->
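
In the regex above, the quote-aware lookahead guards only the first alternative, so a dot that follows `)` or `]` inside a quoted string would, as far as I can tell, still be rewritten. The sketch below is one possible variation and not part of the original answer: it wraps all of the answer's alternatives in a group and applies the same lookahead to the whole group. The names `accessors`, `outsideQuotes`, and `guarded` are made up for illustration.

```javascript
// The alternatives are copied from the answer; only the grouping is new.
const accessors = String.raw`(?<=[a-z_])\.(?!\?)|(?<=[\]a-z_])(?=\[)|(?<=[\]a-z_])(?=^\()|(?<=]|\))(?=\[)|(?<=]|\))\.`;

// Succeeds only if the rest of the string consists of non-quote characters
// and complete '...' or "..." strings, i.e. the position is outside quotes.
const outsideQuotes = String.raw`(?=(?:[^'"]|'[^']*'|"[^"]*")*$)`;

// Assumption: guard every alternative, not just the first one.
const guarded = new RegExp(`(?:${accessors})${outsideQuotes}`, 'gi');

const input = "x ? 'f().g' : y.z";
const output = input.replace(guarded, '?.');
console.log(output); // x ? 'f().g' : y?.z
```

Like the answer's regex, this still assumes that quotes are balanced and that strings contain no escaped quotes.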

Answer 2

Score: 1

To state explicitly which characters may precede each property accessor that is to be replaced (either a dot or an opening bracket), one can use Unicode property escapes.

A character class that allows letters, numbers, underscore, closing bracket, and closing parenthesis, embedded in a positive lookbehind, looks like this ...

(?<=[\p{L}\p{N}_\])])

Only dots that are additionally followed (positive lookahead) by a letter, number, or underscore count as valid replacement points (matches) ...

\.(?=[\p{L}\p{N}_])

... whereas no such additional rule applies to an opening bracket.

Assembling the above into an alternation of either an opening bracket or a dot with its follower rule yields ...

(?<=[\p{L}\p{N}_\])])(?:\[|\.(?=[\p{L}\p{N}_]))

To prevent matches enclosed in single quotes, one can add another lookbehind, this time a negative one ...

(?<!'.*?)

The final regex ... /(?<!'.*?)(?<=[\p{L}\p{N}_\])])(?:\[|\.(?=[\p{L}\p{N}_]))/gu ... meets the OP's requirements insofar as it covers the use cases the OP provided. That does not make it the ultimate solution; the OP may need to refine the pattern further.

A bulletproof solution would in any case have to be built on a parser/lexer.

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->

const regXDotBracketAccessors =
  // see ... [https://regex101.com/r/fmfcwR/1]
  /(?<!'.*?)(?<=[\p{L}\p{N}_\])])(?:\[|\.(?=[\p{L}\p{N}_]))/gu;

console.log([                           // expected:

  `bla.params.query ? 'yes.' : 'no.'`,  // bla?.params?.query ? 'yes.' : 'no.'
  `bla[foo]`,                           // bla?.[foo]
  `{{ bla[foo].bar }}`,                 // {{ bla?.[foo]?.bar }}
  `bar.some().bar`,                     // bar?.some()?.bar
  `bla`,                                // bla
  `bla.some`,                           // bla?.some
  `bla.some.go`,                        // bla?.some?.go
  `bla ? 'bla.some' : 'you.go'`,        // bla ? 'bla.some' : 'you.go'
  `bla.`,                               // bla.

  ].map(item =>
    item.replace(
      regXDotBracketAccessors,
      match => `?.${ match !== '.' && match || '' }`
    )
  )
);

<!-- language: lang-css -->

.as-console-wrapper { min-height: 100%!important; top: 0; }

<!-- end snippet -->
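
The remark about a parser/lexer can be made a little more concrete. The `(?<!'.*?)` lookbehind rejects every position that has a single quote anywhere before it on the same line, so an accessor that comes after a quoted string (for example the `b.c` in `a ? 'x' : b.c`) appears to be skipped as well. Below is a minimal sketch of a hand-written scanner that tracks quote state instead. It is not a real JavaScript lexer, and the function name `addOptionalChaining` is hypothetical.

```javascript
// Hypothetical helper, not from the answer: a single left-to-right scan that
// tracks whether the current character sits inside a quoted string.
function addOptionalChaining(source) {
  // Same "word" characters as the answer's class [\p{L}\p{N}_].
  const isWord = (ch) => ch !== undefined && /[\p{L}\p{N}_]/u.test(ch);
  let out = '';
  let quote = null; // quote character of the string we are inside, or null

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];

    if (quote !== null) {
      out += ch;
      if (ch === '\\' && i + 1 < source.length) {
        out += source[++i]; // copy the escaped character untouched
      } else if (ch === quote) {
        quote = null; // closing quote: back to code
      }
      continue;
    }

    if (ch === "'" || ch === '"') {
      quote = ch;
      out += ch;
      continue;
    }

    const prev = source[i - 1];
    const canChain = isWord(prev) || prev === ']' || prev === ')';
    if (ch === '.' && canChain && isWord(source[i + 1])) {
      out += '?.'; // dot accessor followed by a name
    } else if (ch === '[' && canChain) {
      out += '?.['; // bracket accessor
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(addOptionalChaining("bla ? 'bla.some' : you.go"));
// bla ? 'bla.some' : you?.go
```

The sketch deliberately ignores template literals, regex literals, and comments, and it would also rewrite numeric literals such as `1.5`, so it only narrows the gap to a real parser.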

Answer 3

Score: 0

It seems the regex fails to distinguish between dots inside strings and dots used as property access operators.

RegExp is designed for linear pattern matching. In your case it would have to ignore everything between double or single quotes, which is not RegExp's forte.

One solution is to break the string into pieces and apply the regex only to the non-string parts, like this:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->

function safeRegexReplace(str, regex, replacement) {
    // The capturing group makes split() keep the quoted pieces (at odd indices)
    var parts = str.split(/(".*?"|'.*?')/); // or \.(?!\?|$)

    // for loop for the parts without quotes
    for (var i = 0; i < parts.length; i++) {
        if (i % 2 === 0) { // If the index is even, we are not inside a string
            parts[i] = parts[i].replace(regex, replacement);
        }
    }

    // Join all
    return parts.join('');
}

[
  `bla.params.query ? 'yes.' : 'no.'`, // expected output: bla?.params?.query ? 'yes.' : 'no.'
  `bla[foo]`, // expected output: bla?.[foo]
  `bla[foo].bar }}`, // expected output: bla?.[foo]?.bar }}
  `bar.some().bar`, // expected output: bar?.some()?.bar
  `bla`, // expected output: bla
  `bla.some`, // expected output: bla?.some
  `bla.some.go`, // expected output: bla?.some?.go
  `bla ? 'bla.some': 'you.go'`, // expected output: bla ? 'bla.some': 'you.go'
  `bla.`, // invalid expression; should be ignored because there is nothing after the dot. expected output: bla.
].forEach((v) => {
  v = safeRegexReplace(v, /(?<=[a-z_])\.(?!\?)|(?<=[\]a-z_])(?=\[)|(?<=[\]a-z_])(?=^\()|(?<=]|\))(?=\[)|(?<=]|\))\./gi, '?.'); //every little combination
  console.log(v);
});

<!-- end snippet -->

I'm assuming that the string does not contain nested strings.
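
The split pattern above also assumes that strings contain no escaped quotes such as `\'`. A possible variation, sketched here under that extra requirement, lets the quoted-string pattern consume backslash escapes. `QUOTED`, `DOT_ACCESSOR`, and `replaceOutsideStrings` are illustrative names, and `DOT_ACCESSOR` is a simplified stand-in for the full accessor regex.

```javascript
// Variation on the split pattern: (?:[^"\\]|\\.)* consumes either an ordinary
// character or a backslash plus the character it escapes.
const QUOTED = /("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')/;

// Simplified stand-in for the accessor regex, for illustration only.
const DOT_ACCESSOR = /(?<=[a-z_])\.(?=[a-z_])/gi;

function replaceOutsideStrings(str, regex, replacement) {
  return str
    .split(QUOTED) // quoted pieces end up at the odd indices
    .map((part, i) => (i % 2 === 0 ? part.replace(regex, replacement) : part))
    .join('');
}

const sample = String.raw`a.b ? 'it\'s.here' : c.d`;
console.log(replaceOutsideStrings(sample, DOT_ACCESSOR, '?.'));
// a?.b ? 'it\'s.here' : c?.d
```

With the original `'.*?'` pattern, the quoted piece would end at the escaped quote, so `s.here` would be treated as code and rewritten.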

huangapple
  • Published on June 16, 2023, 14:55:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76487640.html