Strip HTML with regex, except tags that contain a character
Question
I have a program that imports emails into a database. To make the emails more readable in another program, I have to strip the HTML from them. I am using this string extension to do it:
```C#
public static string StripHtml(this string input)
{
    // Lazily match everything between '<' and the next '>' and remove it.
    return Regex.Replace(input, "<.*?>", String.Empty);
}
```
The problem is that when I copy forwarded mails, the sender's email address is written inside angle brackets, so it is treated as a tag and removed:
`<br>< example@forwared.com >`
Is there a way to use regex to remove all the tags, except tags that contain `@` or an email address?
The solution here is one possible way: https://stackoverflow.com/questions/16708158/remove-html-tags-except-br-or-br-tags-with-javascript. But if there is a way to do it with just regex, I would prefer that.
# Answer 1
**Score**: 2
You can meet your requirement by adding one extra condition to your original regex, so that a tag is only matched when its contents do not include `@`:
`<.[^@]*?>`
Working demo: https://regex101.com/r/CNOvS7/1/
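As a minimal sketch of how this pattern could be dropped into the extension method from the question: the class names `StringExtensions` and `Program` and the sample mail text are assumptions made for illustration, not part of the original post.

```C#
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Removes anything that looks like a tag, unless an '@' appears
    // after the first character following '<'.
    public static string StripHtml(this string input)
    {
        return Regex.Replace(input, "<.[^@]*?>", String.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample: real tags plus a forwarded sender address.
        string mail = "<p>Forwarded by</p><br>< example@forwared.com >";

        // Prints: Forwarded by< example@forwared.com >
        Console.WriteLine(mail.StripHtml());
    }
}
```

Note that the leading `.` in the pattern consumes one character of any kind, so a bracketed span whose first character is `@` (for example `<@name>`) would still be removed.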
# Answer 2
**Score**: -1
Use `[^@]*` instead of `.*`
`[^@]` is a character class that matches any character except `@`; the leading `^` stands for "not". In the same way, you could use `[^0-9]*` to exclude all digits, for example.
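To see what negated character classes do in practice, here is a small sketch. It assumes the substitution is applied to the question's lazy pattern, giving `<[^@]*?>`; the class name `NegatedClassDemo` and the sample strings are made up for illustration.

```C#
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    public static void Main()
    {
        // [^@] matches any single character except '@', so a bracketed
        // span that contains '@' can never be matched and is left alone.
        string stripped = Regex.Replace("<b>x</b>< a@b.c >", "<[^@]*?>", String.Empty);
        Console.WriteLine(stripped); // x< a@b.c >

        // [^0-9]* matches a run of non-digit characters; here it stops
        // at the first digit.
        string letters = Regex.Match("abc123", "[^0-9]*").Value;
        Console.WriteLine(letters); // abc
    }
}
```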