NodeJS regex returns 0 on string.search()

Question

I'm working on a NodeJS script (run from CMD with the node command) that retrieves some HTML content as a string, from which I need to extract the data inside a specific <div> element. I can't figure out why this portion of code doesn't give me the desired output.

const input = '<div class="some_class">Some data</div><div class="some_other_class"><div class="some_other_other_class">...</div></div>';
const regex = new RegExp(/<div class="some_class">(.*?)<\/div>/g);
let obj = {
    'tmp': input.search(regex),
};
console.log(obj); // outputs { tmp: 0 }
console.log(input.search(/<div class="some_class">(.*?)<\/div>/g)); // outputs 0

const x = input.search(/<div class="some_class">(.*?)<\/div>/g);
console.log(x); // outputs 0

I know this seems to be a common issue here, but I have tried passing the regex as a string (between single quotes), as a regex literal (between / delimiters), and finally as a new RegExp object, all without success. I always get 0 as the output.

However, when I test it in an online tool, it does match and captures the desired data in group #1: https://www.regextester.com/?fam=131034

I don't know whether I'm missing something or doing something wrong, but after spending several hours on this issue, I'm struggling to think it through clearly.


Answer 1

Score: 1

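String::search() returns the position of the match, which in your case is 0, and that is perfectly correct: the match starts at the very first character of input. (search() also ignores the g flag.)
You need String::match() instead. Drop the g flag so that the result includes the capture groups, and don't forget to read the right group index: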

const input = '<div class="some_class">Some data</div><div class="some_other_class"><div class="some_other_other_class">...</div></div>';
// Without the g flag, match() returns the full match at [0] and capture group #1 at [1].
console.log(input.match(/<div class="some_class">(.*?)<\/div>/)?.[1]); // Some data
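To make the difference concrete, here is a small sketch comparing search(), match() with and without the g flag, and matchAll(). The two-div sample string is made up for illustration. It shows that search() only reports where the first match starts (ignoring g), that match() without g exposes the capture groups, that match() with g returns only the whole matches, and that matchAll() is one way to collect group #1 from every match.

```javascript
// Hypothetical sample: two <div class="some_class"> elements after a <p>.
const html = '<p>intro</p><div class="some_class">First</div><div class="some_class">Second</div>';
const re = /<div class="some_class">(.*?)<\/div>/g;

// search() returns the index of the first match (or -1); the g flag is ignored.
const position = html.search(re);
console.log(position); // 12

// match() without the g flag returns the full match plus its capture groups.
const single = html.match(/<div class="some_class">(.*?)<\/div>/);
console.log(single?.[1]); // First

// match() with the g flag returns only the full matches, without groups.
const all = html.match(re);
console.log(all); // the two complete <div ...>...</div> strings

// matchAll() (which requires the g flag) yields every match with its groups.
const contents = [...html.matchAll(re)].map((m) => m[1]);
console.log(contents); // [ 'First', 'Second' ]
```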

To avoid bothering with groups, I sometimes prefer to use lookaround assertions:

const input = '<div class="some_class">Some data</div><div class="some_other_class"><div class="some_other_other_class">...</div></div>';
// The lookbehind/lookahead are not part of the match, so [0] is already the inner text.
console.log(input.match(/(?<=<div class="some_class">).*?(?=<\/div>)/)?.[0]); // Some data
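One consequence of the assertion approach, sketched below with a made-up string containing two matching divs: because the pattern has no capture group, adding the g flag makes match() return the inner text of every matching div directly. Lookbehind assertions ((?<=...)) need a reasonably recent JavaScript engine; current Node.js releases support them.

```javascript
// Hypothetical sample with two matching <div> elements and one unrelated tag.
const markup = '<div class="some_class">First</div><span>skip</span><div class="some_class">Second</div>';

// The lookbehind and lookahead are zero-width, so each match is exactly
// the text between the opening tag and the next </div>.
// "?? []" keeps the result an array when nothing matches (match() returns null).
const inner = markup.match(/(?<=<div class="some_class">).*?(?=<\/div>)/g) ?? [];
console.log(inner); // [ 'First', 'Second' ]
```

Both this pattern and the capture-group version rely on ., which does not match line breaks; if the real HTML spans several lines, adding the s (dotAll) flag is one option.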

If your HTML changes often, I recommend using jsdom (https://www.npmjs.com/package/jsdom), which lets you use the DOM to access the content inside the tags you need.


huangapple
  • Published on June 8, 2023 02:54:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76426286.html