How to extract key-value pairs from valid JavaScript source code (not JSON) via regex?

The string, though perfectly valid JS source code, is not valid JSON. Therefore I don't think there's a simple solution that allows the use of JSON.parse, although I may be wrong.
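You can confirm the JSON.parse assumption by wrapping a fragment of the input in braces and passing it to JSON.parse. The sketch below uses a hypothetical helper, tryJsonParse, and a shortened excerpt of the string. Unquoted keys and single-quoted keys are both outside JSON's grammar, so parsing fails. A fragment with properly double-quoted keys parses without error:

```javascript
// Shortened excerpt of the question's input (hypothetical variable name).
const excerpt = `value1 : true,\n'value5': "['a', 'b']"`;

// Hypothetical helper: add the missing braces and try JSON.parse.
function tryJsonParse(text) {
  try {
    return { ok: true, value: JSON.parse(`{${text}}`) };
  } catch (err) {
    // JSON only allows double-quoted keys and strings, so the unquoted
    // key value1 (and the single-quoted 'value5') cause a SyntaxError.
    return { ok: false, error: err.name };
  }
}

console.log(tryJsonParse(excerpt)); // { ok: false, error: 'SyntaxError' }
console.log(tryJsonParse(`"value1": true`)); // { ok: true, value: { value1: true } }
```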


Problem

I have a string of key-value pairs and would like to extract them using regex.

  • The keys are all known.
  • The separator is a colon.
  • The key may or may not be surrounded by single or double quotes, e.g. key:value, 'key':value, "key":value
  • There may or may not be whitespace between the key and the separator, e.g. key:value, key :value
  • There may or may not be whitespace between the separator and the value, e.g. key:value, key: value
  • The value may or may not be surrounded by single or double quotes, e.g. key:value, key:"value", key:'value'
  • The value may span multiple lines, e.g.
key: {
  val1: 1,
  val2: 2,
  val3: 3,
}
key: [
  val1,
  val2,
  val3,
]
key: (arg1, arg2) => {
  return {
    arg1,
    arg2
  }
}
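The key rules above can be combined into a single pattern. This is a minimal sketch that assumes the known keys contain only word characters. The hypothetical buildKeyPattern helper joins the keys into an alternation. It then uses a backreference, so an opening quote must be closed by the same quote, and an unquoted key needs no closing quote:

```javascript
// Hypothetical helper: one regex for the key part, built from the known keys.
// Assumes the keys contain only word characters (no regex metacharacters).
function buildKeyPattern(keys) {
  // (["']?) captures an optional opening quote and \1 requires the same
  // closing quote (or none), so a mismatched 'key" is rejected.
  return new RegExp(`^(["']?)(${keys.join("|")})\\1\\s*:\\s*`);
}

const keyPattern = buildKeyPattern(["value1", "value2", "value3"]);

for (const line of ["value1:x", "'value2' :x", '"value3": x', `'value1":x`]) {
  const match = keyPattern.exec(line);
  console.log(line, "->", match ? match[2] : "no match");
}
```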

Example

The string:

value1         :        true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false

Ultimately I'd like to end up with a two-dimensional array containing the key-value pairs, but I can handle that once the keys and values have been extracted with the regex.

The desired result:

 [
   ['value1', true],
   ['value2', 'something, something-else'],
   ['value3', "{
                 a: 'a',
                 b: true,
                 c: 3
               }, {
                 a: Thing,
                 func: () => {
                   return new Thing()
                 }
               }"],
   ['value4', "[1, 2, 3, 4]"],
   ['value5', "['a', 'b', 'c', 'd']"],
   ['value6', false]
 ]

Attempted solution

This is what I've come up with so far:

(?<key>value1|value2|value3|value4|value5|value6)["'\s]*?:\s*(?<value>(?!value1|value2|value3|value4|value5).*)
  1. Use a named capture group to explicitly match the key to the left of the colon, taking into account the optional single or double quotes and the whitespace on either side
(?<key>value1|value2|value3|value4|value5|value6)["'\s]*?:
  2. Use a negative lookahead to match the value up to the next key
\s*(?<value>(?!value1|value2|value3|value4|value5).*)

But this doesn't appear to do what I thought it did: if you remove all the words and replace them with something arbitrary, the result is still the same

\s*(?<value>(?!a).*)

I realise that this isn't actually checking for a newline, but I'm not sure how to incorporate that.
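There is a likely explanation for this behaviour. The negative lookahead appears once, at the start of the value group, so it is tested at that single position only. In addition, . does not match line breaks without the s flag, so .* runs to the end of the line whatever the lookahead says. A common fix is to repeat the lookahead before every character; this construct is often called a tempered greedy token, and it is used lazily here. The sketch below is a hedged illustration, not a complete solution. It assumes a shortened sample, keys known in advance, and no key name followed by a colon inside a value. Values also keep their quotes:

```javascript
// Shortened sample (hypothetical); the keys are assumed to be known in advance.
const sample = `value1 : true,\nvalue3: [\n  { a: 'a' }\n],\n"value4": [1, 2, 3, 4]`;

const keys = ["value1", "value3", "value4"].join("|");

// "A key (optionally quoted) followed by a colon starts here."
const keyStart = `["']?(?:${keys})["']?\\s*:`;

// (?:(?!keyStart)[\s\S])*? takes one character at a time, but only while the
// text ahead is not the start of another key. [\s\S] also matches newlines.
const pairPattern = new RegExp(
  `["']?(?<key>${keys})["']?\\s*:\\s*` +
    `(?<value>(?:(?!${keyStart})[\\s\\S])*?)` +
    `\\s*,?\\s*(?=${keyStart}|$)`,
  "g"
);

for (const { groups } of sample.matchAll(pairPattern)) {
  console.log(groups.key, "=>", JSON.stringify(groups.value));
}
```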

Attempted solution on regex101

Nice to have

For the value, only extract what's inside the optional single or double quotes, not the quotes or the comma, e.g. something, something-else rather than 'something, something-else',
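Quote stripping can use a backreference, so that only a matching pair of quotes is removed. Below is a small sketch with a hypothetical helper, unquote. It assumes the raw value has already been isolated, and it removes one trailing comma and one pair of surrounding quotes. Note that a value such as 'a' + 'b' would be mangled by it:

```javascript
// Hypothetical helper: remove one trailing comma and one pair of matching
// surrounding quotes from an already isolated raw value.
function unquote(raw) {
  const trimmed = raw.trim().replace(/,$/, "").trim();
  // (["']) captures the opening quote; \1 demands the same closing quote.
  const m = /^(["'])([\s\S]*)\1$/.exec(trimmed);
  return m ? m[2] : trimmed;
}

console.log(unquote(`'something, something-else',`)); // something, something-else
console.log(unquote(`"['a', 'b', 'c', 'd']"`)); // ['a', 'b', 'c', 'd']
console.log(unquote("true,")); // true
```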

Note

The regex101 example is set to PCRE so that I can use the regex debugger, but I'm looking for a solution that uses valid JavaScript regex.

Answer 1

Score: 1

Based on a regex like ...

/^(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?),*$/

... whose global multiline variant produces matches like those shown at this regex playground, one could choose a reduce-based approach. It processes line-based tokens, which are the result of splitting the source string at each newline.

The implementation is then very straightforward. For each line token, one executes the regex and tries to access the named capturing groups of the regex result. As soon as at least the value capture exists, there is a valid match. One can then push a valid key-value pair into the OP's result array, where the key is either unquoted or was captured as its dequoted variant. If value is null, nothing matched, which indicates a continuation line of the last collected pair's multiline value. In that case, the line is concatenated onto that pair's value.

const sampleData =
`value1         :        true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false`;

const regXKeyValueCaptures =
  /^(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?),*$/;

console.log(
  sampleData
    .split(/\n/)
    .reduce((result, lineToken) => {
      const {
        dequotedKey = null,
        unquotedKey = null,
        value = null,
      } = regXKeyValueCaptures.exec(lineToken)?.groups ?? {};

      if (value === null) {
        // no key on this line: append it to the previous pair's multiline value
        result.at(-1)[1] = `${ result.at(-1)[1] }\n${ lineToken }`;
      } else {
        result.push([dequotedKey || unquotedKey, value]);
      }
      return result;
    }, [])
);
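The line-based approach relies on one detail that is worth making explicit. The regex is anchored with ^, and neither key alternative can start with whitespace. As a result, the indented lines inside a multiline value never match and are treated as continuation lines. The quick check below runs the answer's regex on three lines modelled on the sample. This is my own illustration, and it also implies that a nested property written at column 0 would be mistaken for a new pair:

```javascript
// The per-line regex from this answer, unchanged.
const regXKeyValueCaptures =
  /^(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?),*$/;

// Two lines that start a pair and one indented line that does not.
const lines = [`value2 : "something, something-else",`, `    a: 'a',`, `'value5': "x",`];

for (const line of lines) {
  const groups = regXKeyValueCaptures.exec(line)?.groups;
  console.log(
    JSON.stringify(line),
    groups ? [groups.dequotedKey ?? groups.unquotedKey, groups.value] : "continuation line"
  );
}
```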

If the OP needs the exact result requested in the question, which differs from the raw result parsed above, the code needs an additional sanitizing step. That step strips the leading and trailing quotes from each key-value pair's value.

// sampleData: the same example string as in the first snippet
const regXKeyValueCaptures =
  /^(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?),*$/;

console.log(
  sampleData
    .split(/\n/)
    .reduce((result, lineToken) => {
      const {
        dequotedKey = null,
        unquotedKey = null,
        value = null,
      } = regXKeyValueCaptures.exec(lineToken)?.groups ?? {};

      if (value === null) {
        result.at(-1)[1] = `${ result.at(-1)[1] }\n${ lineToken }`;
      } else {
        result.push([dequotedKey || unquotedKey, value]);
      }
      return result;
    }, [])
    // sanitize: strip one leading and one trailing quote from each value
    .map(([key, value]) =>
      [key, value.replace(/^["']|["']$/g, '')]
    )
);

EDIT

Bonus ... entirely "regex and capturing groups only" based approach

Having dealt a lot with the first regex pattern, I thought it couldn't be that difficult to find a regex-only approach. So after some experimenting, and giving the earlier pattern ...

/^(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?),*$/

... a twist, the regex that can handle the parsing all by itself is ...

/(?:^|\n)(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?)(?=,\n['"\w]|$)/gs

An explanation of how it works can be found at this regex playground.

The main changes are ...

  • treating the multiline string as a single line, hence the s (dotAll) flag, which lets the value group capture multiline matches via the lazy .*? quantifier,
  • and combining a non-capturing alternation (?:^|\n) at the pattern's beginning with a positive lookahead (also containing an alternation, (?=,\n['"\w]|$)) at its end. Together they identify the beginning and the end of a key-value pair, even if it spans several lines.

Using the new pattern and adapting the previously introduced code accordingly leads to the following implementation ...

// sampleData: the same example string as in the first snippet
const regXKeyValueCaptures =
  /(?:^|\n)(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?)(?=,\n['"\w]|$)/gs;

console.log(
  [...sampleData.matchAll(regXKeyValueCaptures)]
    .map(({ groups: { dequotedKey, unquotedKey, value } }) => [
      (dequotedKey || unquotedKey),
      value.replace(/^["']|["']$/g, ''),
    ])
);
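As I read the pattern, the end-of-pair lookahead (?=,\n['"\w]|$) depends on a layout convention. A comma followed by a newline only ends a value when the next line starts at column 0 with a quote or a word character. The small illustration below uses hypothetical inputs and the same regex. Indented nested properties stay inside the value, while unindented ones are split off as separate pairs:

```javascript
// The single-regex pattern from this answer, unchanged.
const regX =
  /(?:^|\n)(?:(?<quote>['"])(?<dequotedKey>[^'"]+)\k<quote>|(?<unquotedKey>\w+))\s*\:\s*(?<value>.*?)(?=,\n['"\w]|$)/gs;

// Nested properties are indented, so ",\n    b" does not end the value ...
const indented = "value3: {\n    a: 1,\n    b: 2\n},\nvalue6: false";
// ... but unindented nested properties are mistaken for the next pair.
const flat = "value3: {\na: 1,\nb: 2\n},\nvalue6: false";

const pairs = (text) =>
  [...text.matchAll(regX)].map(({ groups }) => [groups.dequotedKey || groups.unquotedKey, groups.value]);

console.log(pairs(indented));
console.log(pairs(flat));
```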

Answer 2

Score: 0

Is the order of the keys known? If so, you could slice the source string from one key to the next, and then remove the unwanted bits (spaces, line breaks, commas, quotes) from the start and end of each individual value:

const str = `
value1         :        true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false
`

function clean(dirtyValue) {
  return dirtyValue
    // start: the key's closing quote, the colon, whitespace and the value's opening quote
    .replace(/^['"]?\s*\:\s*['"]?/, '')
    // end: the value's closing quote, the comma, whitespace and the next key's opening quote
    .replace(/['"]?,?\s*['"]?$/, '')
}

const keys = ['value1', 'value2', 'value3', 'value4', 'value5', 'value6']

const parsed = keys.reduce((acc, key, i) => {
  const indexOfKey = str.indexOf(key);
  const indexOfNextKey = i < keys.length - 1 ? str.indexOf(keys[i + 1]) : str.length

  acc[key] = clean(str.slice(indexOfKey + key.length, indexOfNextKey))

  return acc;
}, {})

Object.entries(parsed).forEach(([key, value]) => console.log(key, '=', value))

You can also adapt the example above to work with unsorted keys:

const str = `
value1         :        true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false
`

function clean(dirtyValue) {
  return dirtyValue
    // start: the key's closing quote, the colon, whitespace and the value's opening quote
    .replace(/^['"]?\s*\:\s*['"]?/, '')
    // end: the value's closing quote, the comma, whitespace and the next key's opening quote
    .replace(/['"]?,?\s*['"]?$/, '')
}

// Fisher-Yates shuffle, used only to show that the key order no longer matters
function shuffleArray(arr) {
  const shuffledArray = arr.slice(0)

  for (let i = shuffledArray.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffledArray[i], shuffledArray[j]] = [shuffledArray[j], shuffledArray[i]]
  }

  return shuffledArray;
}

const keys = shuffleArray(['value1', 'value2', 'value3', 'value4', 'value5', 'value6'])

const parsed = keys.reduce((acc, key) => {
  const indexOfKey = str.indexOf(key);

  // the closest occurrence of any key after the current one marks the end of its value
  const closestIndexOfNextKey = keys.map((possibleNextKey) => {
    const possibleNextKeyIndex = str.indexOf(possibleNextKey, indexOfKey + 1)

    return possibleNextKeyIndex <= 0 ? Infinity : possibleNextKeyIndex
  }).sort((a, b) => a - b)[0]

  acc[key] = clean(str.slice(indexOfKey + key.length, closestIndexOfNextKey))

  return acc;
}, {})

Object.entries(parsed).forEach(([key, value]) => console.log(key, '=', value))

Note: if you have many keys, you might want to optimize this code by removing the keys you have already found from the keys array.
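One way to apply that optimization is to find every key's position once, sort the hits by position, and slice between neighbours. This avoids rescanning the whole key list for every key. The sketch assumes that each key occurs exactly once and never appears inside another value. sliceByKeyPositions and the cleaning function passed to it are hypothetical, and this cleaner also strips the value's own surrounding quotes:

```javascript
// Hypothetical variant: locate every key once, sort the hits by position and
// slice between neighbouring hits. Assumes each key occurs exactly once and
// never appears inside another value.
function sliceByKeyPositions(str, keys, cleanValue) {
  const hits = keys
    .map((key) => ({ key, index: str.indexOf(key) }))
    .filter(({ index }) => index !== -1)
    .sort((a, b) => a.index - b.index);

  return hits.map(({ key, index }, i) => {
    const end = i + 1 < hits.length ? hits[i + 1].index : str.length;
    return [key, cleanValue(str.slice(index + key.length, end))];
  });
}

// Hypothetical cleaner: drop quote, colon and whitespace at the start, and
// quote, comma, whitespace and the next key's opening quote at the end.
const clean = (v) => v.replace(/^['"]?\s*:\s*['"]?/, "").replace(/['"]?,?\s*['"]?$/, "");

const src = `value2 : "b, c",\n'value1': [1, 2],\nvalue3: false`;
console.log(sliceByKeyPositions(src, ["value1", "value2", "value3"], clean));
```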

Answer 3

Score: 0

I've managed to solve the problem with the following regex (which can probably be optimised, but does the job I needed it to do):

/(?<key>value1|value2|value3|value4|value5|value6)["' ]*?:[ '"]*(?<value>[\s\S]*?)(?:[ '",]*?\n*?[ '",]*?)(?=value1|value2|value3|value4|value5|value6|$)/g

It captures the key to the left of the colon in the named group 'key' and allows for optional whitespace or single/double quotes after the key:

(?<key>value1|value2|value3|value4|value5|value6)["' ]*?:

It then matches the value to the right of the colon, using a positive lookahead to find either the next key or the end of the string. Preceding the lookahead it looks for:

  • Optional spaces and single/double quotes
  • The value (captured by the named group 'value')
  • An optional mixture of spaces, single/double quotes and commas
  • Optional newlines
  • Another optional mixture of spaces, single/double quotes and commas
[ '"]*(?<value>[\s\S]*?)(?:[ '",]*?\n*?[ '",]*?)(?=value1|value2|value3|value4|value5|value6|$)

Working demo here


const str = `value1         :        true,
value2 : 'something, something-else',
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false`;

const keyAndValue = /(?<key>value1|value2|value3|value4|value5|value6)["' ]*?:[ '"]*(?<value>[\s\S]*?)(?:[ '",]*?\n*?[ '",]*?)(?=value1|value2|value3|value4|value5|value6|$)/g;

console.log([...str.matchAll(keyAndValue)])
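The demo above logs the raw match objects. To get the two-dimensional array that the question asks for, one could keep only the named groups of each match. Here is a minimal sketch with a shortened sample. The values stay strings, so true comes back as "true":

```javascript
// This answer's regex, unchanged; the sample is a shortened, hypothetical one.
const keyAndValue =
  /(?<key>value1|value2|value3|value4|value5|value6)["' ]*?:[ '"]*(?<value>[\s\S]*?)(?:[ '",]*?\n*?[ '",]*?)(?=value1|value2|value3|value4|value5|value6|$)/g;

const sample = `value1 : true,\nvalue2 : 'something, something-else',\n"value4": [1, 2, 3, 4]`;

// Keep only the two named groups of each match: [key, value].
const pairs = Array.from(sample.matchAll(keyAndValue), ({ groups }) => [groups.key, groups.value]);
console.log(pairs);
```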

Answer 4

Score: -1

The data you're getting is not JSON. It looks like a JavaScript object literal with the opening { and closing } taken off.

As such, you could just parse it using eval, but be aware of the issues with eval; in other words, make sure you trust the source.

Your source references identifiers like Thing, so those would need stubbing. Below I've used a little hack that traps the exceptions and adds Thing automatically. It works here for me in Chrome and Firefox, but parsing an error string feels a little hacky, so that's something to be aware of.

I've noticed that your result doesn't wrap the booleans in strings, so I've added a test for that. Because I've just used JSON for the value part, the output isn't exactly the same as yours, but that could be replaced with a custom serializer that gets closer to what you're after.

For anything more detailed than that, I would suggest parsing it into an AST. It might be possible with regex, but I feel there could be some edge cases that will catch you out. If you do use an AST, don't forget to add the { and } back so that it can be parsed.

const src = `value1         :        true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false`;

function keyValue(src) {
  const stubs = [];
  // retry up to 100 times, stubbing one more undefined identifier per attempt
  for (let l = 0; l < 100; l++) {
    try {
      const stubsTxt = stubs.map(m => `function ${m}(){}`).join(';');
      // re-add the braces so the text is evaluated as an object literal
      const p = eval(`${stubsTxt};({${src}})`);
      return Object.entries(p).map(([k, v]) => {
        return [k, typeof v === 'boolean' ? v : JSON.stringify(v)]
      });
    } catch (e) {
      // e.g. "ReferenceError: Thing is not defined" -> stub "Thing"
      const t = e.toString().split(' ');
      if (t[0] === 'ReferenceError:') {
        stubs.push(t[1]);
      } else break;
    }
  }
}

console.log(JSON.stringify(keyValue(src), null, '  '));
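A hand-written scanner is another option in the spirit of the parser suggestion. It is deliberately not an AST and not part of this answer, but it avoids both eval and fragile regexes. It walks the text once, tracks whether a quote is open and how deep the brackets are, and splits on commas only at depth 0. The sketch below assumes the input contains no comments, template literals, or regex literals. splitTopLevel and toPair are hypothetical names:

```javascript
// Split the text on commas that are outside strings and outside brackets.
function splitTopLevel(src) {
  const parts = [];
  let depth = 0;
  let quote = null; // the currently open quote character, if any
  let start = 0;

  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (quote) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if ("([{".includes(ch)) {
      depth++;
    } else if (")]}".includes(ch)) {
      depth--;
    } else if (ch === "," && depth === 0) {
      parts.push(src.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(src.slice(start));
  return parts.map((p) => p.trim()).filter(Boolean);
}

// Split one "key: value" entry at its first colon and strip quotes around the key.
function toPair(entry) {
  const colon = entry.indexOf(":");
  const key = entry.slice(0, colon).trim().replace(/^(["'])(.*)\1$/, "$2");
  return [key, entry.slice(colon + 1).trim()];
}

const src = `value1 : true,
value2 : "something, something-else",
value3: [{ a: 'a', func: () => { return 1 } }],
'value5': "['a', 'b']",
value6: false`;

console.log(splitTopLevel(src).map(toPair));
```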

huangapple
  • Posted on 2023-07-13 00:13:15
  • Original link: https://go.coder-hub.com/76672573.html