Find search string and n surrounding characters without breaking words

Question

I'm building a custom search result where I want to return n characters from left and right of the searched keyword. I would also like to preserve whole words at the beginning and the end.

> For example this is the text where I searched the keyword and I
> need the text around it too.

So if I say n characters is 10 I would preferably get:

> ..searched the keyword and I need..

A simpler acceptable solution would be to break the words so the result would be:

> ..rched the keyword and I nee..

I started with this but got stuck on the string part before the keyword:

private function getSubstring($content, $keyword, $nOfChars) {
    $content = strtolower(strip_tags($content));
    $noOffoundStrings = substr_count($content, $keyword);
    $position = strpos($content, $keyword);
    $keywordLength = strlen($keyword);
    $afterKey = substr($content, $position + $keywordLength, $nOfChars);
    $beforeKey = substr($content, $position , -???); // how to get string part before the searched keyword
}

Answer 1

Score: 1

I have concentrated only on building the result set.

The adornment (... before and after) is static and does not handle the edge cases where the keyword occurs at the very beginning or end of the text.

Keeping whole words isn't handled either (that would add too much complexity to this answer). If you are satisfied with the answer to this question, you may want to ask a new question for that.

The mb_* variants of the string functions work with multibyte, non-English text (Latin letters with diacritics [ő, ű, â, î, ș, ț, etc.], Hebrew, Arabic, Hindi, etc.).

$str = strip_tags('<p>This is a search text <span>with</span> some content blabla blabla search text of length</p>');

$keyword = 'search';

// Lowercase both strings (multibyte-safe) so the split is case-insensitive.
$a = explode(mb_strtolower($keyword), mb_strtolower($str));
$resultArray = [];
$keepChars = 10;

// Each pair of neighbouring pieces surrounds one occurrence of the keyword.
for ($i = 0; $i < count($a) - 1; $i++) {
    $beforeKey = $a[$i];
    $afterKey = $a[$i + 1];
    // A negative start counts from the end; mb_substr clamps it when the piece is shorter.
    $resultArray[] = '...'
                   . mb_substr($beforeKey, -$keepChars)
                   . $keyword
                   . mb_substr($afterKey, 0, $keepChars)
                   . '...';
}

var_dump($resultArray);

This should output the following:

array(2) {
  [0]=>
  string(32) "...this is a search text with..."
  [1]=>
  string(32) "...la blabla search text of l..."
}
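
The edge cases mentioned above (keyword at the very beginning or end of the text) could be handled roughly as follows. This is only a sketch: excerptAround is a hypothetical helper name, the mbstring extension is assumed to be available, and the keyword is assumed to be non-empty. It clamps the window to the bounds of the text and adds the dots only on a side where something was actually cut off. Whole words are still not preserved.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the first
// occurrence of $keyword with up to $keepChars characters on each side.
function excerptAround(string $text, string $keyword, int $keepChars): ?string
{
    $pos = mb_stripos($text, $keyword); // case-insensitive, multibyte-aware search
    if ($pos === false) {
        return null; // keyword not found
    }

    $length = mb_strlen($text);
    $start = max(0, $pos - $keepChars); // clamp at the beginning of the text
    $end = min($length, $pos + mb_strlen($keyword) + $keepChars); // clamp at the end

    // Add the dots only where text was really truncated.
    return ($start > 0 ? '...' : '')
         . mb_substr($text, $start, $end - $start)
         . ($end < $length ? '...' : '');
}

var_dump(excerptAround('This is a search text with some content', 'search', 10));
```

Because the excerpt is cut from the original string rather than from a lowercased copy, the original casing of the text is kept.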

Answer 2

Score: 0

You could use the explode function:

    $numChar = 12;
    $string = "apelle figlio di apollo fece una palla";
    $searched = "apollo";

    // Split the text around the keyword; a single piece means no match.
    $exploded = explode($searched, $string);

    if (count($exploded) == 1) {
        // no match
        return "";
    }

    // Walk backwards through the words before the keyword, prepending
    // whole words until at least $numChar characters are collected.
    $explodedBefore = array_reverse(explode(" ", trim($exploded[0])));

    $before = "";

    foreach ($explodedBefore as $word) {
        if (strlen($before) >= $numChar) {
            break;
        }
        $before = $word . " " . $before;
    }

    // Do the same going forward through the words after the keyword.
    $explodedAfter = explode(" ", trim($exploded[1]));

    $after = "";

    foreach ($explodedAfter as $word) {
        if (strlen($after) >= $numChar) {
            break;
        }
        $after .= " " . $word;
    }

    $complete = $before . $searched . $after;
    echo $complete;

Answer 3

Score: 0

I am comfortable recommending a regex approach because it concisely handles needles at the start, middle, and end of the haystack string.

This will try to show full words on both sides of the needle. Logically, if there are no words on a given side, no dots will be added on that side.

Code:

$needle = "keyword";
$extra = 10;

foreach ($texts as $text) {
    $new = preg_replace_callback(
               "/.*?(\S+.{0,$extra})?($needle)(.{0,$extra}\S+)?.*/",
               function($m) {
                   return sprintf(
                       '%s<b>%s</b>%s',
                       strlen($m[1]) ? "..{$m[1]}" : '',
                       $m[2],
                       strlen($m[3] ?? '') ? "{$m[3]}.." : ''
                   );
               },
               $text,
               1,
               $count
           );
    echo ($count ? $new : '') . "\n";
}

Input:

$texts = [
    "For example this is the text where I searched the keyword and I need the text around it too.",
    "keyword at the very start",
    "Or it can end with keyword",
    "Nothing to see here officer.",
    "keyword",
];

Output:

..searched the <b>keyword</b> and I need..
<b>keyword</b> at the very..
..can end with <b>keyword</b>

<b>keyword</b>

Pattern breakdown:

/               #starting pattern delimiter
.*?             #lazily match zero or more characters (as few as possible)
(               #start capture group 1
  \S+           #match one or more non-whitespace characters
  .{0,$extra}   #match between 0 and $extra (here 10) characters
)?              #end capture group 1 and make matching optional
($needle)       #match the needle string as capture group 2
(               #start capture group 3
  .{0,$extra}   #match between 0 and $extra (here 10) characters
  \S+           #match one or more non-whitespace characters
)?              #end capture group 3 and make matching optional
.*              #greedily match zero or more characters
/               #ending pattern delimiter

  • Add the u pattern modifier if multibyte characters might be encountered.

  • Add the i pattern modifier for case-insensitive matching.

  • Add the s pattern modifier if your string might contain newline characters.

  • Wrap the needle string in \b (word boundary metacharacters) for whole word matching.
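
To illustrate, the four adjustments listed above could be combined as in the sketch below. highlightExcerpt is a hypothetical wrapper name and not part of the original answer. The preg_quote call is an addition as well, on the assumption that the needle may contain characters that are special in a regex. Whether you want all of the modifiers at once depends on your data.

```php
<?php
// Hypothetical wrapper (an assumption, not the answer's own code) that applies
// the u, i and s modifiers plus \b word boundaries to the same pattern.
function highlightExcerpt(string $text, string $needle, int $extra = 10): string
{
    // Escape regex metacharacters in the needle, including the "/" delimiter.
    $quoted = preg_quote($needle, '/');
    $pattern = "/.*?(\S+.{0,$extra})?(\b$quoted\b)(.{0,$extra}\S+)?.*/ius";

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return sprintf(
                '%s<b>%s</b>%s',
                strlen($m[1]) ? "..{$m[1]}" : '',
                $m[2],
                strlen($m[3] ?? '') ? "{$m[3]}.." : ''
            );
        },
        $text,
        1,
        $count
    );

    // preg_replace_callback returns null on error (for example invalid UTF-8 with /u).
    return ($result !== null && $count > 0) ? $result : '';
}

echo highlightExcerpt("Or it can end with KEYWORD", "keyword"), "\n";
```

With the word boundaries in place, a needle such as "keyword" no longer matches inside "keywords", so the function returns an empty string for that input.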

huangapple
  • Published on April 17, 2023 at 17:52:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76033839.html