Regex replace html and add spacing on div

Question

Hello, I would like help creating a regex that removes all HTML tags but adds a space wherever a closing div is immediately followed by an opening div. For example:

This <b>is</b> <div>a</div><div>test</div>

should become:

This is a test

What I currently have is /(<([^>]+)>)/ig, which removes all HTML tags, but I'm wondering how to also add a space whenever a closing div and an opening div are next to each other.
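
To make the gap concrete, here is a minimal reproduction of the current behaviour. The variable names are only for illustration:

```javascript
const input = 'This <b>is</b> <div>a</div><div>test</div>';

// The pattern from the question: every tag is replaced with an empty string.
const stripped = input.replace(/(<([^>]+)>)/ig, '');

console.log(stripped); // This is atest
```

The two div contents run together because nothing is inserted where `</div><div>` used to be.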

Answer 1

Score: 1

Browsers have built-in HTML parsing that JS can reach through the DOM. Use that instead of a regex:

function getSpaceSeparatedText(html) {
  // Create an element and use it as a parser
  const parser = document.createElement('div');

  parser.innerHTML = html;

  const result = [];

  for (const node of parser.childNodes) {
    // Get the trimmed text
    const text = node.textContent.trim();

    // If text is not empty, add it to result
    if (text) {
      result.push(text);
    }
  }

  return result.join(' ');
}

Try it:

const html = `
This <b>is</b> 
<div>a</div><div>test</div>
`;

console.log(getSpaceSeparatedText(html)); // This is a test

Answer 2

Score: 0

Update: adding a new capture group at the front shifted every subsequent backreference by one. This has been fixed.

This removes all HTML tags and invisible content (https://regex101.com/r/2ACiDg/1), but you need a callback to insert a space between a closing div and an opening div.

var text = "This <b>is</b> <div>a</div><div>test</div>";
text = text.replace(/(<\/div\s*><div\s*>)|<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?=((?:"[\S\s]*?"|'[\S\s]*?'|(?:(?!\/>)[^>])?)+))\3)?\s*>)[\S\s]*?<\/\2\s*(?=>))|(?:\/?[\w:]+\s*\/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*\/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>/g, function(match, grp1)
    {
       // grp1 is set only when a closing div is directly followed by an opening div
       if ( grp1 > "" )
          return " ";
       else
          return "";
    }
);

console.log( text );
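
The callback technique is easier to see with the simple tag pattern from the question in place of the full pattern. What follows is a minimal sketch, not the answer's own code: `stripTags` is a hypothetical name, and unlike the full pattern it makes no attempt to handle script or style content, comments, or a `>` inside an attribute value.

```javascript
// Hypothetical helper for illustration only.
function stripTags(html) {
  // Group 1 matches a closing div directly followed by an opening div.
  // The second alternative is the catch-all tag pattern from the question.
  return html.replace(/(<\/div\s*><div\b[^>]*>)|<[^>]+>/gi, (match, boundary) =>
    // boundary is undefined when the second alternative matched
    boundary ? ' ' : ''
  );
}

console.log(stripTags('This <b>is</b> <div>a</div><div>test</div>'));
// This is a test
```

The div-boundary alternative has to come first: regex alternation tries the left side first, so the pair `</div><div>` is consumed as one match before the catch-all can remove each tag separately.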

huangapple
  • Posted on 2023-05-11 05:03:27
  • Permalink: https://go.coder-hub.com/76222535.html
