April 17, 2023, 17:52:54

Find search string and n surrounding characters without breaking words

Question

I'm building a custom search result where I want to return n characters from left and right of the searched keyword. I would also like to preserve whole words at the beginning and the end.

> For example this is the text where I searched the keyword and I
> need the text around it too.

So if I say n characters is 10 I would preferably get:

> ..searched the keyword and I need..

A simpler acceptable solution would be to break the words so the result would be:

> ..rched the keyword and I nee..

I started with this but got stuck on the string part before the keyword:

private function getSubstring($content,$keyword, $nOfChars) {
    $content = strtolower(strip_tags($content));
    $noOffoundStrings = substr_count($content, $keyword);
    $position = strpos($content, $keyword);
    $keywordLength = strlen($keyword);
    $afterKey = substr($content, $position + $keywordLength, $nOfChars);
    $beforeKey = substr($content, $position , -???); // how to get string part before the searched keyword
}

Answer 1

Score: 1

I have concentrated on the building of the result set only.

The adornment (... before and after) is static and doesn't handle the edge cases where the keyword occurs at the very beginning or end of the text.

Keeping whole words isn't handled either (that adds too much complexity to the answer). If you are satisfied with an answer to this question you may want to ask a new question for that.

The mb_* variants of the string functions work with non-English text (Latin alphabet with diacritics [ő, ű, â, î, ș, ț, etc.], Hebrew, Arabic, Hindi, etc.).
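
To illustrate why the mb_* functions matter here, a minimal sketch comparing byte-based and character-based functions. It assumes the mbstring extension is loaded and that the text is UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: mbstring is available and strings are UTF-8 encoded.
$word = "\u{0151}\u{0171}"; // the two characters "őű"

$byteLength = strlen($word);                    // counts bytes, not characters
$charLength = mb_strlen($word, 'UTF-8');        // counts characters
$firstChar  = mb_substr($word, 0, 1, 'UTF-8');  // cuts on a character boundary

echo $byteLength, ' ', $charLength, ' ', $firstChar, "\n"; // prints "4 2 ő"
```

Cutting such text with substr() could split a character in the middle, which is why the snippet code uses mb_substr().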

$str = strip_tags('<p>This is a search text <span>with</span> some content blabla blabla search text of length</p>');

$keyword = 'search';

$a = explode(strtolower($keyword), strtolower($str));
$resultArray = [];
$keepChars = 10;

for ($i = 0; $i < count($a) - 1; $i++) {
    $beforeKey = $a[$i];
    $afterKey = $a[$i + 1];
    $resultArray[] = '...'
                   . mb_substr($beforeKey, -$keepChars) // last $keepChars characters before the keyword
                   . $keyword
                   . mb_substr($afterKey, 0, $keepChars) // first $keepChars characters after the keyword
                   . '...';
}

var_dump($resultArray);

This should output the following:

array(2) {
  [0]=>
  string(32) "...this is a search text with..."
  [1]=>
  string(32) "...la blabla search text of l..."
}
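
The edge cases mentioned above (keyword at the very beginning or end of the text) could be handled along these lines. This is only a sketch under a few assumptions: snippetAround() is a hypothetical helper, it looks at the first occurrence only, it keeps the original letter case, and it relies on the mbstring extension with UTF-8 input.

```php
<?php
// Hypothetical helper, not part of the answer's code.
// Assumption: mbstring is available and the input is UTF-8.
function snippetAround(string $text, string $keyword, int $keep): ?string
{
    // Case-insensitive, multibyte-aware search for the first occurrence.
    $pos = mb_stripos($text, $keyword, 0, 'UTF-8');
    if ($pos === false) {
        return null; // keyword not found
    }

    $textLen = mb_strlen($text, 'UTF-8');
    $keyLen  = mb_strlen($keyword, 'UTF-8');
    $start   = max(0, $pos - $keep);
    $end     = min($textLen, $pos + $keyLen + $keep);

    // Add the ellipsis only on a side where text was actually cut off.
    return ($start > 0 ? '...' : '')
         . mb_substr($text, $start, $end - $start, 'UTF-8')
         . ($end < $textLen ? '...' : '');
}

echo snippetAround('search text of length', 'search', 10), "\n"; // search text of l...
echo snippetAround('This is a search', 'search', 10), "\n";      // This is a search
```

Whole words are still not preserved here; the sketch only makes the dots depend on whether anything was truncated.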

Answer 2

Score: 0

You could use the explode function:

    $numChar = 12;
    $string = "apelle figlio di apollo fece una palla";
    $searched = "apollo";
    
    $exploded = explode($searched, $string);
    
    if(count($exploded) == 1) {
        //no match
        return "";
    } 
    
    // Walk the words before the match from right to left.
    $explodedBefore = array_reverse(explode(" ", trim($exploded[0])));

    $before = "";

    foreach ($explodedBefore as $word) {
        if (strlen($before) >= $numChar) {
            break;
        }
        $before = $word . " " . $before;
    }

    // Walk the words after the match from left to right.
    $explodedAfter = explode(" ", trim($exploded[1]));

    $after = "";

    foreach ($explodedAfter as $word) {
        if (strlen($after) >= $numChar) {
            break;
        }
        $after .= " " . $word;
    }

    $complete = $before . $searched . $after;
    echo $complete;

Answer 3

Score: 0

I am comfortable recommending a regex approach because it concisely affords precise handling of needles at the start, middle, and end of the haystack string.

This will try to show full words on both sides of the needle. Logically if there are no words on either side, no dots will be added.

Code:

$needle = "keyword";
$extra = 10;

foreach ($texts as $text) {
    $new = preg_replace_callback(
               "/.*?(\S+.{0,$extra})?($needle)(.{0,$extra}\S+)?.*/",
               function($m) {
                   return sprintf(
                       '%s<b>%s</b>%s',
                       strlen($m[1]) ? "..{$m[1]}" : '',
                       $m[2],
                       strlen($m[3] ?? '') ? "{$m[3]}.." : ''
                   );
               },
               $text,
               1,
               $count
           );
    echo ($count ? $new : '') . "\n";
}

Input:

$texts = [
    "For example this is the text where I searched the keyword and I need the text around it too.",
    "keyword at the very start",
    "Or it can end with keyword",
    "Nothing to see here officer.",
    "keyword",
];

Output:

..searched the <b>keyword</b> and I need..
<b>keyword</b> at the very..
..can end with <b>keyword</b>

<b>keyword</b>

Pattern breakdown:

/               #starting pattern delimiter
.*?             #lazily match zero or more characters (giving back as much as possible)
(               #start capture group 1
  \S+           #match one or more visible characters
  .{0,$extra}   #match between 0 and 10 characters
)?              #end capture group 1 and make matching optional
($needle)       #match the needle string as capture group 2
(               #start capture group 3
  .{0,$extra}   #match between 0 and 10 characters
  \S+           #match one or more visible characters
)?              #end capture group 3 and make matching optional
.*              #greedily match zero or more characters
/
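
To see what each capture group from the breakdown holds, the same pattern can be run through preg_match() on the first sample text. This is just an illustrative sketch; the variable names are not from the answer.

```php
<?php
$needle = 'keyword';
$extra  = 10;
$text   = 'For example this is the text where I searched the keyword and I need the text around it too.';

// Same pattern as in the answer, used only to inspect the groups.
preg_match("/.*?(\S+.{0,$extra})?($needle)(.{0,$extra}\S+)?.*/", $text, $m);

var_dump($m[1]); // string(13) "searched the "
var_dump($m[2]); // string(7) "keyword"
var_dump($m[3]); // string(11) " and I need"
```

Group 1 starts at "searched" because it is the earliest word from which at most 10 further characters reach the needle.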

Add the u pattern modifier if multibyte characters might be encountered.
Add the i pattern modifier for case-insensitive matching.
Add the s pattern modifier if your string might contain newline characters.
Wrap the needle string in \b (word boundary metacharacters) for whole word matching.
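
The modifier notes above could be combined as in the following sketch. The excerpt() wrapper, its name, and its parameters are hypothetical; preg_quote() is added on the assumption that the needle is a literal string that may contain regex metacharacters. Note that \b only behaves as expected when the needle begins and ends with word characters.

```php
<?php
// Hypothetical wrapper around the answer's pattern; not the answer's own code.
function excerpt(string $text, string $needle, int $extra): string
{
    // Escape regex metacharacters (and the delimiter) in the literal needle.
    $quoted = preg_quote($needle, '/');

    // u: UTF-8 input, i: case-insensitive, s: dot also matches newlines,
    // \b ... \b: match the needle as a whole word only.
    $pattern = "/.*?(\S+.{0,$extra})?(\b$quoted\b)(.{0,$extra}\S+)?.*/uis";

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return sprintf(
                '%s<b>%s</b>%s',
                strlen($m[1]) ? "..{$m[1]}" : '',
                $m[2],
                strlen($m[3] ?? '') ? "{$m[3]}.." : ''
            );
        },
        $text,
        1,
        $count
    );

    // An empty string signals that the needle was not found.
    return $count ? $result : '';
}

echo excerpt('Find the KEYWORD here', 'keyword', 10), "\n";
echo excerpt('keywords everywhere', 'keyword', 10), "\n"; // empty line: no whole-word match
```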
