Convert HTML to readable text

Question

You can modify it by stripping the HTML tags from the text, using either a regular expression or string operations. The following snippet shows how to remove HTML tags:

import re

def remove_html_tags(text):
    # Match anything that looks like a tag, including tags that span several lines.
    clean = re.compile(r'<[^>]+>')
    return clean.sub('', text)

# Example usage
html_text = '''
<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
'''

plain_text = remove_html_tags(html_text)
print(plain_text)

This outputs:

Dear Customer,
An amount of Rs. @Payment - Amount to Pay for your Loan Account Number @Payment - Loan Number has been received on @Payment - Date ,vide receipt no @Payment - Receipt Number.
Payment Mode: @Payment - Payment Mode.
Please find the receipt attached below.

You can then process the plain text further as needed, for example by replacing specific placeholders.

Original question:

In MongoDB, the data is saved as:

<div>Dear Customer,</div><div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>

When I send the email via an external API, I read this value from MongoDB and send it directly as the body.

So in the mail, the body goes out as:

<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>

However, I need to send it like this

Dear Customer, An amount of Rs. 111 for your Loan Account Number 8204221103679 has been received on 24-Jul-2023 ,vide receipt no 1690195463903291.Payment Mode: Cash. Please find the receipt attached below.

How can I modify it?

Answer 1

Score: 2


You can use the DOMParser API to convert HTML to text. Here's how:

var html = `
<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
`;

// Parse the HTML string and read its text content.
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
var text = doc.body.textContent || "";
console.log(text);

// Alternatively, read the values from the page by id (or by tag):
// let values = {
//   "@Payment - Amount to Pay": document.getElementById('amountToPay').value,
//   "@Payment - Loan Number": document.getElementById('loanNumber').value,
//   "@Payment - Date": document.getElementById('paymentDate').value,
//   "@Payment - Receipt Number": document.getElementById('receiptNumber').value,
//   "@Payment - Payment Mode": document.getElementById('paymentMode').value
// };

// Map each placeholder to its actual value.
let values = {
  "@Payment - Amount to Pay": "111",
  "@Payment - Loan Number": "8204221103679",
  "@Payment - Date": "24-Jul-2023",
  "@Payment - Receipt Number": "1690195463903291",
  "@Payment - Payment Mode": "Cash"
};

// Replace every occurrence of each placeholder in the HTML.
for (let key in values) {
  let regex = new RegExp(key, "g");
  html = html.replace(regex, values[key]);
}

// Parse again to get the text with the values filled in.
doc = parser.parseFromString(html, 'text/html');
text = doc.body.textContent || "";
console.log(text);
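
The loop above builds each RegExp directly from the placeholder text. That works here because none of the keys contain regex metacharacters, but it would break for a key such as `@Payment (Total)`. The following is a minimal sketch of a more defensive variant, not part of the original answer; `escapeRegExp` and `fillPlaceholders` are hypothetical helper names, and the shortened sample text is made up for illustration.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace every placeholder key found in `text`
// with its value, in a single pass over the string.
function fillPlaceholders(text, values) {
  // Try longer keys first so a key that is a prefix of another cannot win.
  const keys = Object.keys(values).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;
  const pattern = new RegExp(keys.map(escapeRegExp).join('|'), 'g');
  // A function replacement keeps "$&"-style sequences in values literal.
  return text.replace(pattern, (match) => values[match]);
}

const sampleValues = {
  '@Payment - Amount to Pay': '111',
  '@Payment - Payment Mode': 'Cash',
};

const filled = fillPlaceholders(
  'An amount of Rs. @Payment - Amount to Pay. Payment Mode: @Payment - Payment Mode.',
  sampleValues
);
console.log(filled);
```

Doing the substitution in one pass also means a value that happens to contain another placeholder's text is not replaced a second time.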

This is how you can read the HTML of the current page and strip its <script> tags before converting it:

// Get the entire HTML of the page, including <script> tags
var htmlWithScripts = document.documentElement.outerHTML;

// Create a new DOMParser
var parser = new DOMParser();

// Parse the HTML string into a new document object
var doc = parser.parseFromString(htmlWithScripts, 'text/html');

// Get all script tags
var scripts = doc.getElementsByTagName('script');

// Loop backwards through the live collection and remove each script tag
for (let i = scripts.length; i--;) {
  scripts[i].parentNode.removeChild(scripts[i]);
}

// Get the updated HTML string without <script> tags
var htmlWithoutScripts = doc.documentElement.outerHTML;

// Print the text of the page before the placeholders are replaced
var html = htmlWithoutScripts;
doc = parser.parseFromString(html, 'text/html');
var text = doc.body.textContent || "";
console.log(text);

// Alternatively, read the values from the page by id (or by tag), as in the first snippet.

// Map each placeholder to its actual value.
let values = {
  "@Payment - Amount to Pay": "111",
  "@Payment - Loan Number": "8204221103679",
  "@Payment - Date": "24-Jul-2023",
  "@Payment - Receipt Number": "1690195463903291",
  "@Payment - Payment Mode": "Cash"
};

// Replace every occurrence of each placeholder in the HTML.
for (let key in values) {
  let regex = new RegExp(key, "g");
  html = html.replace(regex, values[key]);
}

// Parse again to get the text with the values filled in.
doc = parser.parseFromString(html, 'text/html');
text = doc.body.textContent || "";
console.log(text);

The snippet runs against a page containing this HTML:

<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>

Answer 2

Score: 0


You can convert your HTML to plain text either with DOMParser, or by creating an element, setting its innerHTML property to the HTML, and then reading the string back from its innerText property.

Like this:

var code = 'Apple is a <span style="color: red">red</span> colored fruit';

var tmp = document.createElement('div');
tmp.innerHTML = code;
var result = tmp.innerText;

console.log(result);
// --> Apple is a red colored fruit
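
Both approaches in this answer need a browser DOM. If the email is assembled somewhere without one, for example in a Node.js service (an assumption; the question does not say where the code runs), a rough fallback is to strip the tags with a regular expression and decode a few entities by hand. This is only a sketch for simple, trusted markup like the stored template: `decodeEntities` and `htmlToText` are hypothetical names, the entity table is deliberately tiny, and a regex cannot handle every HTML input (a `>` inside an attribute value would confuse it).

```javascript
// Assumption: these five named entities are enough for the stored templates.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Hypothetical helper: decode numeric entities and a small set of named ones.
function decodeEntities(s) {
  return s.replace(/&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (whole, body) => {
    if (body[0] === '#') {
      const isHex = body[1].toLowerCase() === 'x';
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : whole; // unknown named entity: keep as written
  });
}

// Hypothetical helper: rough HTML-to-text conversion without a DOM.
function htmlToText(html) {
  const withoutTags = html
    .replace(/<\/(div|p)>|<br\s*\/?>/gi, ' ') // block ends and <br> become spaces
    .replace(/<[^>]+>/g, '');                 // drop every remaining tag
  // Decode after stripping so that text like "&lt;b&gt;" is not mistaken for a tag.
  return decodeEntities(withoutTags).replace(/\s+/g, ' ').trim();
}

const sampleText = htmlToText(
  '<div>Dear Customer,</div>\n<div>Payment Mode: <a href="x">Cash</a>.</div>'
);
console.log(sampleText);
```

Collapsing the whitespace at the end gives a single line of text, which is the shape the question asks for.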

Answer 3

Score: 0


If you really want to display the whole text without any line breaks, a simple solution would be to insert a <style> tag in the head of the mail that contains:

div {
  display: inline;
}
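
As a rough sketch of how that could be wired up before the body is handed to the mail API, the stored fragment can be wrapped in a full HTML document that carries the rule. `wrapWithInlineDivStyle` is a hypothetical helper name, and this assumes the mail is sent as HTML; mail clients may not all honor a <style> block in the head, so it is worth checking in the clients you care about.

```javascript
// Hypothetical helper: wrap an HTML fragment in a document whose <head>
// carries the "div { display: inline; }" rule from this answer.
function wrapWithInlineDivStyle(bodyHtml) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head><style>div { display: inline; }</style></head>',
    `<body>${bodyHtml}</body>`,
    '</html>',
  ].join('\n');
}

const mailHtml = wrapWithInlineDivStyle(
  '<div>Dear Customer,</div><div>Payment Mode: Cash.</div>'
);
console.log(mailHtml);
```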

huangapple
  • Posted on July 27, 2023 at 17:29:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76778342.html