Remove special characters and punctuation from a string
Question
I have a function that gets all the hashtag words in a post and outputs the words separated by a comma (because a post can have many hashtags) to be stored in a database column.
function getHashtags($text) {
    // explode on spaces
    $text = explode(" ", $text);
    $hashtag = "";
    $hashReg = "/^[a-zA-Z0-9]+$/";
    // for every word in post
    foreach ($text as $word) {
        // 1st character of the word
        $char = substr($word, 0, 1);
        // rest of the word after the 1st character
        $ref = substr($word, 1);
        // if 1st character in word is #
        if ($char == "#") {
            // check if only letters & numbers
            if (preg_match($hashReg, $ref)) {
                // check hashtag length
                if (strlen($ref) <= 11) {
                    // set hashtag
                    $hashtag .= substr($word, 1) . ",";
                }
            }
        }
    }
    return $hashtag;
}
The function works well, e.g.:
$post = "#rock #music is good";
echo getHashtags($post);
// output: rock,music,
However, if the example were $post = "#rock, #music, is good", the comma after #rock and #music makes the function fail. The same happens with any other character, such as full stops, question marks, etc. I have tried adding preg_replace('/[^A-Za-z0-9]/', '', $post), but it does not work. How can I fix it so that "#rock, #music," or "#rock. #music." still outputs the desired result of rock,music?
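A likely reason the preg_replace attempt has no useful effect is that the negated class [^A-Za-z0-9] also matches '#' and spaces, so the markers the function relies on are removed along with the punctuation. The short sketch below illustrates this, and shows a narrower class as one possible pre-cleaning step; the variable names $stripped and $kept are only for illustration.

```php
<?php
$post = "#rock, #music, is good";

// The negated class matches '#' and spaces as well as punctuation,
// so every word boundary and every hashtag marker is removed.
$stripped = preg_replace('/[^A-Za-z0-9]/', '', $post);
echo $stripped, "\n"; // prints: rockmusicisgood

// A narrower class that still allows '#' and whitespace keeps the
// structure that the explode-based function depends on.
$kept = preg_replace('/[^A-Za-z0-9#\s]/', '', $post);
echo $kept, "\n"; // prints: #rock #music is good
```

Passing the second string to the original getHashtags would then behave like the working example, under the assumption that dropping all punctuation from the post is acceptable.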
Answer 1
Score: 1
You can simply use preg_replace to remove the character and space that precede each tag, and then explode the string on #.
Example:
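function getHashtags($text) {
    // drop the single non-alphanumeric character and space before each '#'
    $clean = preg_replace("/[^A-Za-z0-9] #/", "#", $text);
    $text = explode("#", $clean);
    $hashtag = [];
    foreach ($text as $word) {
        // skip empty pieces (e.g. the one before the first '#')
        if ($word) {
            $hashtag[] = $word;
        }
    }
    return implode(',', $hashtag);
}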
Output should be:
getHashtags("#rock, #music, is good, #metal, #is not so good");
=> string(40) "rock,music, is good,metal,is not so good"
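As the output above shows, exploding on # keeps everything up to the next #, so words such as "is good" stay in the result. If only the tag words are wanted, one alternative is to capture them directly with preg_match_all. The following is a minimal sketch, not the answerer's code; the function name extractHashtags and the $maxLength parameter are hypothetical, and the 11-character limit is taken from the question.

```php
<?php
// Hypothetical helper (not from the original answer): collect every run of
// letters and digits that directly follows a '#'.
function extractHashtags(string $text, int $maxLength = 11): string
{
    // Capture group 1 holds the tag without '#'. Matching stops at the
    // first character that is not a letter or digit (comma, period, space).
    preg_match_all('/#([A-Za-z0-9]+)/', $text, $matches);

    // Keep the length limit used in the question.
    $tags = array_filter($matches[1], function ($tag) use ($maxLength) {
        return strlen($tag) <= $maxLength;
    });

    return implode(',', $tags);
}

echo extractHashtags("#rock, #music, is good, #metal, #is not so good");
// prints: rock,music,metal,is
```

Unlike the function in the question, this sketch also matches a '#' in the middle of a word and returns no trailing comma, so it may need adjusting if either behavior matters.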
Answer 2
Score: 0
To handle the case where hashtags are separated by non-alphanumeric characters, you can modify the regular expression used to match hashtags. Currently, the regular expression /^[a-zA-Z0-9]+$/ matches only alphanumeric characters.
You can update it to allow non-alphanumeric characters in the word that follows the '#' symbol. One way to do this is to use a character class that matches any non-whitespace character, like this:
$hashReg = "/^#[^\s]+$/";
Here is the modified getHashtags function:
function getHashtags($text) {
    // explode on spaces
    $text = explode(" ", $text);
    $hashtags = [];
    $hashReg = "/^#[^\s]+$/";
    // for every word in post
    foreach ($text as $word) {
        // 1st character of the word
        $char = substr($word, 0, 1);
        // rest of the word after the 1st character
        $ref = substr($word, 1);
        // if 1st character in word is #
        if ($char == "#") {
            // check if hashtag matches pattern
            if (preg_match($hashReg, $word)) {
                // check hashtag length
                if (strlen($ref) <= 11) {
                    // add hashtag to array
                    $hashtags[] = $ref;
                }
            }
        }
    }
    // join hashtags with comma and return as string
    return implode(",", $hashtags);
}
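Note that the function above accepts "#rock," but stores $ref with the comma still attached, so the joined result would read rock,,music, rather than rock,music. One possible adjustment, sketched below under the assumption that only trailing punctuation should be dropped, is to take the tag from a capture group. The name getHashtagsTrimmed is hypothetical, and splitting on any whitespace with preg_split is a small change from the original explode(" ").

```php
<?php
// Hypothetical variant of the function above: same word-by-word loop, but
// the tag is taken from a capture group so trailing punctuation is dropped.
function getHashtagsTrimmed(string $text): string
{
    $hashtags = [];
    // '#' at the start of the word, then letters/digits (group 1), then
    // optional trailing punctuation ([[:punct:]] is the POSIX class).
    $hashReg = '/^#([a-zA-Z0-9]+)[[:punct:]]*$/';

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY skips empty pieces.
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Keep the 11-character limit from the question.
        if (preg_match($hashReg, $word, $m) && strlen($m[1]) <= 11) {
            $hashtags[] = $m[1];
        }
    }

    // Join hashtags with commas and return as a string.
    return implode(',', $hashtags);
}

echo getHashtagsTrimmed("#rock, #music, is good");
// prints: rock,music
```

A word such as "#rock.band" is rejected by this pattern because letters follow the punctuation; whether that is desirable depends on how strictly tags should be validated.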