JS: replacing all occurrences of a word in html with <span> element ONLY for p, span & divs. Not working if parent node contains the word

Question

I have this html:

```html
<div>
  hello world
  <p>
    the world is round
    <img src="domain.com/world.jpg">
  </p>
</div>
```

I want to replace the word "world" (including any mixed-case variant) with `<span style='color:red;'>BARFOO</span>`, but only in `<p>`, `<div>` and a few other specific elements.

With the following code, the text in the `<div>` is changed, but the text in the `<p>` is not. A replace operation is performed (on something), but the result does not show up in the browser's HTML.

If I pass only `p` to querySelectorAll and then run the code again for `div`, it works fine.

My guess is that once the code processes the `<div>` and finds that it has child elements, putting those elements back into the HTML string loses the element reference for the `<p>`.

jsfiddle is set up here <https://jsfiddle.net/limeygent/t5q8ch23/12/> with more debug statements.

Any thoughts on what is happening & how to fix? (js only solution please)

```js
var newspan = "<span style='color:red;'>BOOFAR</span>";

var regExNameSearch = new RegExp('World','gi');
var lc= 'World'.toLowerCase();

const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).filter(
  (element) => {
    for (let child of element.childNodes) {
      if (child.nodeType === Node.TEXT_NODE && child.textContent.toLowerCase().includes(lc)) {
        console.log('found ' + child.textContent);
        let parent = child.parentNode;
        let html = parent.innerHTML;

        // Find all the child elements in the element
        var excludeElements = parent.querySelectorAll('*');

        if (excludeElements.length == 0){
          console.log('no child elements');
          parent.innerHTML = parent.innerHTML.replace(regExNameSearch, newspan);
          // (also tried this) parent.innerHTML = html;
        }else{
          // Replace the text of each child element with placeholder
          excludeElements.forEach(excludeElement => {
            console.log('phase 1 - replacing - BEFORE');
            html = html.replace(excludeElement.outerHTML, 'FOOBAR');
            console.log('phase 1 - replacing - AFTER');
          });
          html = html.replace(regExNameSearch, newspan);

          // Replace the text of each child element back to its original HTML
          excludeElements.forEach(excludeElement => {
            console.log('phase 2 - replacing - BEFORE:');
            html = html.replace('FOOBAR', excludeElement.outerHTML);
            console.log('phase 2 - replacing - AFTER:');
          });

          // Update the element's innerHTML with the updated HTML
          parent.innerHTML = html;
        }
        return true;
      }
    }
    return false;
  }
);
```

Edit: if your answer recommends editing innerHTML, make sure it doesn't affect any child nodes. The code I present here got super complex because I had to avoid editing anything further inside the node.
Oh, and if you present recommendations from ChatGPT (useful as it can be), please test what you post first.

Answer 1

Score: 1

You can use the [`TreeWalker`](https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker) API to achieve the desired results.
The essential logic is this:
Iterate [text](https://developer.mozilla.org/en-US/docs/Web/API/Text) nodes that meet the specified criteria: the text content matches the case-insensitive regular expression pattern and the node is the direct child (or, if desired, a [descendant](https://developer.mozilla.org/en-US/docs/Web/API/Element/closest)) of an element that [matches](https://developer.mozilla.org/en-US/docs/Web/API/Element/matches) your selector.
For each matched text node: [remove](https://developer.mozilla.org/en-US/docs/Web/API/Node/removeChild) it from its parent, but first [split](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split) the node's text content on the regular expression pattern, and for each resulting string:
- If it is non-empty, re-[insert](https://developer.mozilla.org/en-US/docs/Web/API/Node/insertBefore) it into the parent node (just before the matched node) as a new text node. Before each string except the first (even an empty one), create a copy of your substitute `<span>` node and insert it as well.
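The split-and-insert step can be checked in isolation, without a DOM. The following is a minimal JavaScript sketch, not the answer's code: the helper `planSegments` and the `SPAN` sentinel (standing in for one fresh copy of the substitute `<span>`) are hypothetical names, and the function only returns the sequence of nodes the loop would insert.

```javascript
// Hypothetical sentinel standing in for one fresh copy of the substitute <span>.
const SPAN = Symbol("substitute-span");

// Mirrors the loop in transformTextNode: split the text on the pattern,
// keep non-empty pieces as text, and emit a span before every piece except
// the first (even when that piece is empty).
function planSegments(text, regexp) {
  const [first, ...rest] = text.split(regexp);
  const plan = [];
  // Same role as the `firstResult.done` check in the answer's code.
  if (first === undefined) return plan;
  if (first.length > 0) plan.push(first);
  for (const piece of rest) {
    plan.push(SPAN);
    if (piece.length > 0) plan.push(piece);
  }
  return plan;
}

console.log(planSegments("the world is round", /world/i));
console.log(planSegments("World", /world/i));
```

For example, `planSegments("hello world\n", /world/i)` returns the text `"hello "`, one `SPAN`, and the text `"\n"`. This relies on the pattern having no capturing groups: `String.prototype.split` splices captured text into its result, which would put the matched word back in as a text segment.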
[TS Playground](https://www.typescriptlang.org/play?noUncheckedIndexedAccess=true&amp;target=99&amp;jsx=4&amp;useUnknownInCatchVariables=true&amp;exactOptionalPropertyTypes=true#code/GYVwdgxgLglg9mABAQwM6oKYCcqIBQYAeADlgFyLgDWYcA7mADSIC2qA5gPwWpRYxh2ASgppMOVIiKlEAbwBQiRDGD4AhNKxDEUABZZ6iMBjqIAolgNY8bYQG55AX3nzQkWAkQQsGZFAwAKkRQAHJwACYYAGIwADb+WFFIeD7s0hQAShjsZiTMyJAYvHBYAMoYsRjQJTx8AsIUeP6EoREYFEEt2gC8AHxGICwARthyiog+UCBYyU3BYZEdwSIDw6N9Y0pKKup441s6820AdM1QAMII-mBQ+1sAZPcT2dKnRVBzLQsYby2XNxgbkI7kJtJNpkhvjF4thjlEAJIAGQCZgyAH0MmYAFJmc4BBx3AD0hMQATgiGAAnCKDAAE9WH4ILp6jSIO8SvhaLgAFYgXg6XQYRDhGA+aCIYjIHxAshEkkQBD8gBuyFiMGp3UQACE4HBKgVPq1IsdJdKoGZKixAVBOMcILE4JheHgCmzimUKlUoCVQQ4DgqwMrVerEJqzt8TVLrRaMFabraWIzBagXYV3eVKtUtH6tjs8Cq1eEwRgpjNEFC4gk4UiUeiAILnc5mAAK+JcB3BZYrMKw1eRqIx2NxbaUjiEDmcrnA0HgSG8vn8pRAQ14MCm-m++BWAAkAgBZRGlSVgGNx3AKJQB-moY+h4VwCCDa12nx+DCn614ABEN4KX-H4y-mAvwXFc1p3l+Wp1hkUQAPKwV+OZAccvC0pUKEls2BjENgUC0t+CoOlgX7MF+PjhP+OadkgQETi4bgzp4fAFKgwAlCwnRGkKeC0Is5ZtMwqTpIgWQ5CQKxKnAIYXl4irnqa1rfMwZz-NcUCOHevEYDmYi4XgCk3EpiBfs2UY3EYbQWbgbHgBRAFKLpOCGqp1qkZxskAuZXIUnAtmUe2Hn8mu6yHH8YE3ChxBqh8QniQA2qUtLDHqxzBVgfglAAungAHjFe1mirwWSoCA8R3mlxzGC0OU5nmlJYEVRSlVAxzhAgGDFqWYC1aoeD1Y1JXxMcBYgD8lSCHoiD9AADNoMlKAZXGpYGuFahgbE+DxJiksEfWFVAxXNcNqqjUIzBafZiCTkoG34PliC8FgiBwKoaVzXci0RgI4hQGtG0YHg85vkuK6wOuGDfDl51tJd2y9Y9xzjewk3dKjiCzR5sBgKNOYLWZS3fat60lADximJxeCPWdFmRJd10SvjEY+CwcBKhg5zMrE4Q8TDdFTu4s4MgIW6bIFuABNBADiZgBIO0sABrNnehJ0CUXOEjAOb3RLGTS7LpRmIiw6wRkEEikqzC6AAjJbABMlsAMyWwALMwxDMEBHt8Ag7CIQF91QKYmptY+Z4vgugQ+BgADqqpUNgewHCHT4RUMES0owdzdlWpTbrB0doii8sBJnByyMgEBssQXEUED-icdn2BJHgOt63LZiK8wrcy2iBtG3iJtCI4peILlSiVLgWl3oHlVHJENUBXQnPcRdotKMSiB1uEKqFAKQoBFHsexPHWAAOSSGlGVPbwb6ICM-2sCAUB+CyehCo+ljgVpsr+nJrCxuSTUWlcY0yFGGOgs8vhtAXgcZigYNocTngDK0LMu5Sx7piBWzY6ZOBcImAQC8gA)
```lang-ts
function assert (expr: unknown, msg?: string): asserts expr {
  if (!expr) throw new Error(msg);
}

function createTextNodeFilterFn (regexp: RegExp, ancestorSelector: string): (textNode: Text) => number {
  return ((textNode: Text): number => {
    if (!(
      textNode.textContent
      && regexp.test(textNode.textContent)
    )) return NodeFilter.FILTER_REJECT;

    // To find any matching ancestor (not just the direct parent):
    // const valid = Boolean(textNode.parentElement?.closest(ancestorSelector));
    const valid = textNode.parentElement?.matches(ancestorSelector);

    if (valid) return NodeFilter.FILTER_ACCEPT;
    return NodeFilter.FILTER_REJECT;
  });
}

function createSubstituteNode (): HTMLSpanElement {
  const span = document.createElement("span");
  span.textContent = "BARFOO";
  span.style.setProperty("color", "red");
  return span;
}

function transformTextNode (node: Node, regexp: RegExp): void {
  const {parentNode, textContent} = node;
  assert(parentNode, "Parent node not found");
  assert(textContent, "Text content not found");

  const iter = textContent.split(regexp)[Symbol.iterator]();
  const firstResult = iter.next();
  if (firstResult.done) return;

  if (firstResult.value.length > 0) {
    parentNode.insertBefore(new Text(firstResult.value), node);
  }

  for (const str of iter) {
    parentNode.insertBefore(createSubstituteNode(), node);
    if (str.length === 0) continue;
    parentNode.insertBefore(new Text(str), node);
  }

  parentNode.removeChild(node);
}

function main () {
  const TARGET_REGEXP = /world/i;
  const TARGET_SELECTOR = "div, h1, h2, h3, h4, p, span, strong";

  const tw = document.createTreeWalker(
    document.body,
    NodeFilter.SHOW_TEXT,
    {acceptNode: createTextNodeFilterFn(TARGET_REGEXP, TARGET_SELECTOR)},
  );

  let node = tw.nextNode();

  while (node) {
    // Advance the TreeWalker's iterator state before mutating the current node:
    const memo = node;
    node = tw.nextNode();
    transformTextNode(memo, TARGET_REGEXP);
  }
}

main();
```

The TS code above, compiled to plain JavaScript in a runnable snippet:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->

```js
"use strict";
function assert(expr, msg) {
    if (!expr)
        throw new Error(msg);
}
function createTextNodeFilterFn(regexp, ancestorSelector) {
    return ((textNode) => {
        if (!(textNode.textContent
            && regexp.test(textNode.textContent)))
            return NodeFilter.FILTER_REJECT;
        // To find any matching ancestor (not just the direct parent):
        // const valid = Boolean(textNode.parentElement?.closest(ancestorSelector));
        const valid = textNode.parentElement?.matches(ancestorSelector);
        if (valid)
            return NodeFilter.FILTER_ACCEPT;
        return NodeFilter.FILTER_REJECT;
    });
}
function createSubstituteNode() {
    const span = document.createElement("span");
    span.textContent = "BARFOO";
    span.style.setProperty("color", "red");
    return span;
}
function transformTextNode(node, regexp) {
    const { parentNode, textContent } = node;
    assert(parentNode, "Parent node not found");
    assert(textContent, "Text content not found");
    const iter = textContent.split(regexp)[Symbol.iterator]();
    const firstResult = iter.next();
    if (firstResult.done)
        return;
    if (firstResult.value.length > 0) {
        parentNode.insertBefore(new Text(firstResult.value), node);
    }
    for (const str of iter) {
        parentNode.insertBefore(createSubstituteNode(), node);
        if (str.length === 0)
            continue;
        parentNode.insertBefore(new Text(str), node);
    }
    parentNode.removeChild(node);
}
function main() {
    const TARGET_REGEXP = /world/i;
    const TARGET_SELECTOR = "div, h1, h2, h3, h4, p, span, strong";
    const tw = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, { acceptNode: createTextNodeFilterFn(TARGET_REGEXP, TARGET_SELECTOR) });
    let node = tw.nextNode();
    while (node) {
        // Advance the TreeWalker's iterator state before mutating the current node:
        const memo = node;
        node = tw.nextNode();
        transformTextNode(memo, TARGET_REGEXP);
    }
}
main();
```

<!-- language: lang-html -->

```html
<div>
  hello world
  <p>
    the world is round
    <img src="domain.com/world.jpg">
  </p>
</div>
```

<!-- end snippet -->

Answer 2

Score: 0

With the help of a friend, who explained that the NodeList "array" returned by querySelectorAll is static, I now understand why nodes were being missed or overwritten. The suggestion was to start at the lowest level of the DOM tree, perform the innerHTML replacement there, and then work up the tree.

Hat tip to Rob for his explanation: document.querySelectorAll returns a static NodeList, which is accurate when the function is called but not after the document changes. Using .innerHTML to replace "world" deletes and recreates all existing content in the tag, including the `<p>` tag and its contents. The `<p>` tag now on the page is a completely new one that isn't referenced by the NodeList returned by document.querySelectorAll, so later edits through that NodeList go to the old, detached `<p>`.
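Rob's explanation can be reproduced without a browser. In this minimal sketch, elements are plain objects rather than DOM nodes (all names are hypothetical), and assigning `innerHTML` is modeled, as an assumption, by serializing an element's children and parsing them back. That keeps the one property that matters here: every child comes back as a new object. The loop visits elements in document order, like the question's code.

```javascript
// Plain-object stand-ins for elements (hypothetical, not DOM nodes).
const p = { tag: "p", text: "the world is round", children: [] };
const div = { tag: "div", text: "hello world", children: [p] };

// References captured up front, like the static NodeList from querySelectorAll.
const snapshot = [div, p];

// Model of `el.innerHTML = html`: edit the element's own text, then rebuild
// its children from a serialized copy, so each child becomes a new object.
function rewriteViaInnerHTML(el) {
  const html = JSON.stringify(el.children);
  el.text = el.text.replace(/world/gi, "BOOFAR");
  el.children = JSON.parse(html);
}

// Document order: the <div> is rewritten first, then the stale <p> reference.
for (const el of snapshot) {
  if (el.children.length > 0) {
    rewriteViaInnerHTML(el);
  } else {
    el.text = el.text.replace(/world/gi, "BOOFAR");
  }
}

console.log(p.text);                // edited, but this <p> is detached
console.log(div.children[0] === p); // the live <p> is a different object
console.log(div.children[0].text);  // the live <p> still contains "world"
```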

querySelectorAll returns its matches in document order (as a NodeList, which is "array"-like rather than a true array), i.e. the order of a depth-first, pre-order traversal, so every element comes before its descendants. Read more about tree traversal methods here: <https://en.wikipedia.org/wiki/Tree_traversal>.

I needed to start at the lowest levels of the tree, i.e. at the end of the list, so as not to mangle references to child nodes that still had to be processed.

Here is the change:

(old)

```js
const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).filter(
```

(new)

```js
const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).reverse().filter(
```
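A DOM-free sketch of why `.reverse()` helps, assuming a hypothetical `{ tag, children }` object model of a page rather than real DOM nodes: collecting matching elements in pre-order mirrors the document order of querySelectorAll, and reversing that list places every element before all of its ancestors. An ancestor's innerHTML rewrite can then only detach descendants that have already been processed.

```javascript
// Hypothetical plain-object model of a page (not real DOM nodes).
const tree = {
  tag: "body",
  children: [
    { tag: "div", children: [
      { tag: "p", children: [{ tag: "img", children: [] }] },
      { tag: "span", children: [] },
    ] },
    { tag: "h1", children: [] },
  ],
};

const TARGET_TAGS = new Set(["p", "span", "div", "strong", "h1", "h2", "h3", "h4"]);

// Depth-first pre-order: each node is listed before its descendants,
// the same order querySelectorAll uses for its results.
function preOrder(node, out = []) {
  out.push(node);
  for (const child of node.children) preOrder(child, out);
  return out;
}

// True if `a` is a proper ancestor of `b`.
function isAncestor(a, b) {
  return a.children.some((c) => c === b || isAncestor(c, b));
}

// True if no element in the list comes before one of its own descendants.
function descendantsFirst(order) {
  return order.every((el, i) =>
    order.slice(i + 1).every((later) => !isAncestor(el, later)));
}

const documentOrder = preOrder(tree).filter((n) => TARGET_TAGS.has(n.tag));
const reversed = [...documentOrder].reverse();

console.log(documentOrder.map((n) => n.tag)); // div, p, span, h1
console.log(reversed.map((n) => n.tag));      // h1, span, p, div
console.log(descendantsFirst(documentOrder), descendantsFirst(reversed));
```

One pitfall worth keeping in mind: reversing fixes the order, but assigning innerHTML still recreates every descendant of the rewritten element, so event listeners or other references held on those descendants are lost. The TreeWalker approach in the other answer avoids this by replacing only the matched text nodes.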

It works fine on the sample HTML in this question and on some other variations. I'll continue to test it further.

Comments / pitfalls welcomed.

New fiddle <https://jsfiddle.net/9vwo6a3q/>

huangapple
  • Published on February 16, 2023, 04:50:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75465314.html