Convert HTML to readable text
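Question
You can modify it by removing the HTML tags from the text, using either regular expressions or string operations. The following snippet shows how to strip the tags:
import re
def remove_html_tags(text):
    # Non-greedy match of everything between '<' and '>'
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)
# Example usage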
html_text = '''
<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
'''
plain_text = remove_html_tags(html_text)
print(plain_text)
This will output:
Dear Customer,
An amount of Rs. @Payment - Amount to Pay for your Loan Account Number @Payment - Loan Number has been received on @Payment - Date ,vide receipt no @Payment - Receipt Number.
Payment Mode: @Payment - Payment Mode.
Please find the receipt attached below.
You can then process the plain text further as needed, for example by replacing specific placeholders.
Original question:
In MongoDB, the data is saved as:
<div>Dear Customer,</div><div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
When I send the email via an external API, I read this value from MongoDB and send it directly as the body.
So the mail body goes out as:
<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
However, I need to send it like this:
Dear Customer, An amount of Rs. 111 for your Loan Account Number 8204221103679 has been received on 24-Jul-2023 ,vide receipt no 1690195463903291.Payment Mode: Cash. Please find the receipt attached below.
How can I modify it?
Answer 1
Score: 2
You can use the DOMParser API to convert HTML to text. Here's how:
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-js -->
var html = `
<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
`;
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
var text = doc.body.textContent || "";
console.log(text);
// Or read the values from the page instead, e.g. with getElementById or by tag name:
//let values = {
// "@Payment - Amount to Pay": document.getElementById('amountToPay').value,
// "@Payment - Loan Number": document.getElementById('loanNumber').value,
// "@Payment - Date": document.getElementById('paymentDate').value,
// "@Payment - Receipt Number": document.getElementById('receiptNumber').value,
// "@Payment - Payment Mode": document.getElementById('paymentMode').value
//};
let values = {
    "@Payment - Amount to Pay": "111",
    "@Payment - Loan Number": "8204221103679",
    "@Payment - Date": "24-Jul-2023",
    "@Payment - Receipt Number": "1690195463903291",
    "@Payment - Payment Mode": "Cash"
};
for (let key in values) {
    let regex = new RegExp(key, "g");
    html = html.replace(regex, values[key]);
}
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
var text = doc.body.textContent || "";
console.log(text);
<!-- end snippet -->
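One caveat with the loop above: `new RegExp(key, "g")` treats each placeholder as a pattern, so a key containing characters such as `.`, `(` or `+` would not be matched literally. The keys in this example happen to be safe. A minimal sketch of a literal substitution follows, assuming a hypothetical helper name `fillPlaceholders` (not part of the original answer) and plain-text values:

```javascript
// Hypothetical helper: replaces every literal occurrence of each placeholder.
// split/join does plain string matching, so no regex escaping is needed.
function fillPlaceholders(template, values) {
  let result = template;
  // Handle longer keys first so a key that is a prefix of another key
  // cannot partially overwrite it.
  const keys = Object.keys(values).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    result = result.split(key).join(String(values[key]));
  }
  return result;
}

const filled = fillPlaceholders(
  "Paid Rs. @Payment - Amount to Pay (mode: @Payment - Payment Mode).",
  { "@Payment - Amount to Pay": "111", "@Payment - Payment Mode": "Cash" }
);
console.log(filled); // Paid Rs. 111 (mode: Cash).
```

Because this works on strings only, it can run before or after the tags are stripped, in a browser or on a server.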
This is how you can read the HTML of the current page:
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-js -->
// Get the entire HTML of the page including <script> tags
var htmlWithScripts = document.documentElement.outerHTML;
// Create a new DOMParser
var parser = new DOMParser();
// Parse the HTML string into a new document object
var doc = parser.parseFromString(htmlWithScripts, 'text/html');
// Get all script tags
var scripts = doc.getElementsByTagName('script');
// Loop through the script tags and remove each one
for (let i = scripts.length; i--;) {
    scripts[i].parentNode.removeChild(scripts[i]);
}
// Get the updated HTML string without <script> tags
var htmlWithoutScripts = doc.documentElement.outerHTML;
// Convert the HTML string without <script> tags to plain text
var html = htmlWithoutScripts;
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
var text = doc.body.textContent || "";
console.log(text);
// Or read the values from the page instead, e.g. with getElementById or by tag name:
//let values = {
// "@Payment - Amount to Pay": document.getElementById('amountToPay').value,
// "@Payment - Loan Number": document.getElementById('loanNumber').value,
// "@Payment - Date": document.getElementById('paymentDate').value,
// "@Payment - Receipt Number": document.getElementById('receiptNumber').value,
// "@Payment - Payment Mode": document.getElementById('paymentMode').value
//};
let values = {
    "@Payment - Amount to Pay": "111",
    "@Payment - Loan Number": "8204221103679",
    "@Payment - Date": "24-Jul-2023",
    "@Payment - Receipt Number": "1690195463903291",
    "@Payment - Payment Mode": "Cash"
};
for (let key in values) {
    let regex = new RegExp(key, "g");
    html = html.replace(regex, values[key]);
}
var parser = new DOMParser();
var doc = parser.parseFromString(html, 'text/html');
var text = doc.body.textContent || "";
console.log(text);
<!-- language: lang-html -->
<div>Dear Customer,</div>
<div>An amount of Rs. <a data-value="Payment - Amount to Pay" data-mention="" class="wysiwyg-mention" href="payment_new.amount_to_pay_3xkhkenphf">@Payment - Amount to Pay</a> for your Loan Account Number <a data-value="Payment - Loan Number" data-mention="" class="wysiwyg-mention" href="payment_new.loan_number_rtcivc45ok">@Payment - Loan Number</a> has been received on <a data-value="Payment - Date" data-mention="" class="wysiwyg-mention" href="payment_new.payment_date">@Payment - Date</a> ,vide receipt no <a data-value="Payment - Receipt Number" data-mention="" class="wysiwyg-mention" href="payment_new.receipt_number_5dow863ae">@Payment - Receipt Number</a>.</div>
<div>Payment Mode: <a data-value="Payment - Payment Mode" data-mention="" class="wysiwyg-mention" href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>
<div>Please find the receipt attached below.</div>
<!-- end snippet -->
Answer 2
Score: 0
You can convert your HTML code to plain text by using DOMParser, or by creating an element, setting its innerHTML property to the code, and then reading the string back from its innerText property.
Like this:
var code = 'Apple is a <span style="color: red">red</span> colored fruit';
var tmp = document.createElement('div');
tmp.innerHTML = code;
var result = tmp.innerText;
console.log(result);
// --> Apple is a red colored fruit
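Both `DOMParser` and `innerText` need a browser DOM. Since the question reads the template from MongoDB and sends it through an external API, the code may well run on a server without one. A rough DOM-free sketch for that case is below; `htmlToText` is a hypothetical helper name, and the regex approach assumes simple, trusted markup like the template in the question, since a regex is not a full HTML parser:

```javascript
// Hypothetical helper for environments without a DOM (e.g. a Node.js backend).
// Assumes simple markup: no '>' inside attribute values, no <script>/<style> blocks.
function htmlToText(html) {
  const entities = {
    "&nbsp;": " ", "&lt;": "<", "&gt;": ">",
    "&quot;": '"', "&#39;": "'", "&amp;": "&"
  };
  return html
    .replace(/<\/(div|p)>|<br\s*\/?>/gi, " ")   // block ends and <br> become a space
    .replace(/<[^>]*>/g, "")                     // drop all remaining tags
    .replace(/&(nbsp|lt|gt|quot|#39|amp);/g, (m) => entities[m]) // decode a few common entities in one pass
    .replace(/\s+/g, " ")                        // collapse newlines and repeated spaces
    .trim();
}

const sample =
  '<div>Dear Customer,</div>\n' +
  '<div>Payment Mode: <a href="payment_new.payment_mode_chw0gfq6vo">@Payment - Payment Mode</a>.</div>';
console.log(htmlToText(sample));
// Dear Customer, Payment Mode: @Payment - Payment Mode.
```

Collapsing whitespace at the end yields the single-line form asked for in the question. For untrusted or more complex HTML, a real parser is the safer choice.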
Answer 3
Score: 0
If you really want to display the whole text without any line breaks, a simple solution would be to insert a <style> tag in the head of the mail that contains:
div {
    display: inline;
}
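Some mail clients may strip `<style>` blocks, so an alternative is to put the same rule inline on each `<div>` before sending. A minimal sketch, assuming a hypothetical helper `inlineDivs` and a body that contains only attribute-less `<div>` tags, as in the question:

```javascript
// Hypothetical helper: adds the inline rule to every bare <div> tag.
// Tags that already carry attributes are left untouched.
function inlineDivs(html) {
  return html.replace(/<div>/gi, '<div style="display: inline;">');
}

const body = "<div>Dear Customer,</div><div>Please find the receipt attached below.</div>";
console.log(inlineDivs(body));
```

Inline elements are laid out directly next to each other, so a space may need to be added between the `<div>` elements to keep the sentences from running together.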