How to extract a specific set of characters from a string and save them to an array or list?
Question
I have a string with Unicode characters in it, and I am trying to extract each of them from the string and save it to a list or array.
This is the overall string:
"test &#128311; test &#128153; test &#128313;"
I want the following list:
1. &#128311;
2. &#128153;
3. &#128313;
Right now I am trying the following:
string[] emojiSeparators = new string[] { "&#", ";" };
string[] resultEmojis;
resultEmojis = noHtmlEmoji.Split(
    emojiSeparators, StringSplitOptions.RemoveEmptyEntries);
But the word "test" is also being added to the list.
I only want the Unicode characters saved to my list, so that I can iterate over them and do things with them.
Answer 1
Score: 3
I suggest matching with a regular expression:
using System.Linq;
using System.Text.RegularExpressions;
...
string[] resultEmojis = Regex
    .Matches(noHtmlEmoji, @"&#[1-9][0-9]{5}(?=;)")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();
The pattern &#[1-9][0-9]{5}(?=;) explained:
&# - the literal characters &#
[1-9] - one digit in the 1..9 range
[0-9]{5} - exactly 5 digits in the 0..9 range
(?=;) - a lookahead: the next character must be ;, but it is not included in the match
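If the goal is to iterate over the actual characters rather than the "&#128311"-style text, the matched digits can be converted to a code point. Below is a minimal sketch of that idea, not part of the original answer: the class and method names (EmojiExtractor, ExtractEntities, ExtractCharacters) are made up for illustration, the input string is assumed to contain six-digit decimal references as in the question, and the only change to the pattern is a capture group around the digits.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class EmojiExtractor
{
    // Same pattern as in the answer, plus a capture group around the six digits.
    private static readonly Regex EntityPattern =
        new Regex(@"&#([1-9][0-9]{5})(?=;)");

    // Returns the matched references as text, e.g. "&#128311" (without the trailing ';').
    public static string[] ExtractEntities(string html) =>
        EntityPattern.Matches(html)
            .Cast<Match>()
            .Select(match => match.Value)
            .ToArray();

    // Converts each captured decimal code point into the string for that character.
    public static string[] ExtractCharacters(string html) =>
        EntityPattern.Matches(html)
            .Cast<Match>()
            .Select(match => int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture))
            .Select(codePoint => char.ConvertFromUtf32(codePoint))
            .ToArray();
}

public static class Program
{
    public static void Main()
    {
        // Assumed input: the string from the question with its numeric references intact.
        string noHtmlEmoji = "test &#128311; test &#128153; test &#128313;";

        string[] entities = EmojiExtractor.ExtractEntities(noHtmlEmoji);
        Console.WriteLine(string.Join(", ", entities)); // &#128311, &#128153, &#128313

        // Each element is one emoji, stored as a surrogate pair (two UTF-16 chars).
        foreach (string emoji in EmojiExtractor.ExtractCharacters(noHtmlEmoji))
        {
            Console.WriteLine(emoji.Length);
        }
    }
}
```

Note that the pattern only matches references with exactly six digits, so shorter or longer code points are skipped. If those need to be picked up as well, the quantifier would have to be relaxed.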