2023-02-24 05:31:57

Find matches for multiple words

Question

I have this PostgreSQL table for storing words:

    CREATE TABLE IF NOT EXISTS words
    (
        id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
        keyword character varying(300) COLLATE pg_catalog."default"
    );

    insert into words (keyword)
    VALUES ('while swam is interesting'),
           ('ibm is a company like bmw');

    CREATE TABLE IF NOT EXISTS trademarks
    (
        id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
        trademark character varying(300) COLLATE pg_catalog."default"
    );

    insert into trademarks (trademark)
    VALUES ('while swam'),
           ('ibm'),
           ('bmw');

The `trademarks` table will hold thousands of registered trademark names.
I want to check whether the keywords stored in `words.keyword` match them, not only for single words but also for a word that is part of a group of words (an expression). For example:

I have the keyword `while swam is interesting` stored in `words.keyword`, and the trademark `while swam` stored in `trademarks.trademark`. The same applies to `ibm`: the keyword contains the trademark, so I want to detect this match using Java code.

First I want to select all blacklisted keywords, convert them into, for example, a `List`, and compare `ibm is a company like bmw` with the elements of the list. How can I do this not only for a single word but also for expressions?

Something like this?

    Optional<ProcessedWords> keywords = processedWordsService.findRandomKeywordWhereTrademarkBlacklistedIsEmpty();

    if (keywords.isPresent())
    {
        List<BlacklistedWords> blacklistedWords = blacklistedWordsService.findAll();
        List<String> list = new ArrayList<>();
        for (BlacklistedWords item : blacklistedWords) {
            list.add(item.getKeyword());
        }

        ProcessedWords processedWords = keywords.get();
        String keyword = processedWords.getKeyword();

        // Only true when the whole keyword equals one of the blacklisted entries.
        if (list.contains(keyword))
        {
            System.out.println("Found blacklisted word in keyword: " + keyword);
        }
    }


    @Getter
    @Setter
    @NoArgsConstructor
    @AllArgsConstructor
    @Builder(toBuilder = true)
    @Entity
    @Table(name = "trademarks")
    public class BlacklistedWords implements Serializable {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        @Column(name = "id", unique = true, updatable = false, nullable = false)
        private long id;

        @Column(name = "trademark", length = 200, unique = true)
        private String keyword;
    }

Can you guide me on how this can be implemented?

Answer 1

Score: 1

This is how to do the matching with Java streams:

    // Requires java.util.Arrays, java.util.List and java.util.stream.Collectors.
    public static void main(String[] args) {
        // Stub the persistence layer with in-memory data.
        List<BlacklistedWords> blacklistedWords = Arrays.asList(
            new BlacklistedWords(1, "while swam"),
            new BlacklistedWords(2, "ibm"),
            new BlacklistedWords(3, "bmw"));
        List<ProcessedWords> keyWords = Arrays.asList(
            new ProcessedWords(1, "while swam is interesting"),
            new ProcessedWords(2, "ibm is a company like bmw"),
            new ProcessedWords(3, "miss"));

        // Keep every keyword that contains at least one trademark as a substring.
        List<ProcessedWords> hits = keyWords.stream()
            .filter(pw -> blacklistedWords.stream()
                .anyMatch(bw -> pw.getKeyword().contains(bw.getTrademark())))
            .collect(Collectors.toList());

        System.out.println(hits);
    }

Output:

    [ProcessedWords(id=1, keyword=while swam is interesting), ProcessedWords(id=2, keyword=ibm is a company like bmw)]

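Note that I stubbed out the persistence layer, added an extra ProcessedWords entry ("miss") that should not match, and annotated BlacklistedWords (table trademarks) and ProcessedWords (table words) with @Data to get a decent toString(). You should not do that in real code, because they are @Entity classes.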
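
One caveat with substring matching is that it also fires when a trademark appears inside a longer word. The sketch below is a minimal, self-contained illustration of that behaviour. The class name `SubstringPitfall`, the helper `containsAny`, and the sample keyword `ibmx is not a company` are made up for the demonstration and are not from the question; plain strings stand in for the entities, and `List.of` assumes Java 9 or later.

```java
import java.util.List;

class SubstringPitfall {
    // True if any trademark occurs anywhere in the keyword, even inside a longer word.
    static boolean containsAny(String keyword, List<String> trademarks) {
        return trademarks.stream().anyMatch(keyword::contains);
    }

    public static void main(String[] args) {
        List<String> trademarks = List.of("while swam", "ibm", "bmw");

        // Whole-word occurrences of "ibm" and "bmw": prints true.
        System.out.println(containsAny("ibm is a company like bmw", trademarks));

        // "ibm" only occurs inside "ibmx", but contains() still reports it: prints true.
        System.out.println(containsAny("ibmx is not a company", trademarks));

        // No trademark occurs at all: prints false.
        System.out.println(containsAny("miss", trademarks));
    }
}
```

Whether such hits are acceptable depends on the requirement. If only whole words or phrases should count, the comparison has to take word boundaries into account.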

Answer 2

Score: 1

To satisfy the requirement for whole words, the following should be done.

    List<BlacklistedWords> blacklistedWords = Arrays.asList(
        new BlacklistedWords(1, "while swam"),
        new BlacklistedWords(2, "ibm"),
        new BlacklistedWords(3, "bmw"));
    List<ProcessedWords> keyWords = Arrays.asList(
        new ProcessedWords(1, "while swam is interesting"),
        new ProcessedWords(2, "ibm is a company like bmw"),
        new ProcessedWords(3, "miss"));

    // A plain HashSet is not safe to modify from parallel streams, so use a
    // concurrent set (java.util.concurrent.ConcurrentHashMap).
    Set<ProcessedWords> hits = ConcurrentHashMap.newKeySet();
    blacklistedWords.parallelStream().forEach(bw -> {
        final String trademark = bw.getTrademark();
        final String startsWith = trademark + " ";  // trademark at the start of the keyword
        final String contains = " " + startsWith;   // trademark in the middle of the keyword
        final String endsWith = " " + trademark;    // trademark at the end of the keyword
        keyWords.parallelStream().forEach(pw -> {
            final String keyword = pw.getKeyword();
            if (keyword.contains(contains) || keyword.startsWith(startsWith)
                    || keyword.endsWith(endsWith) || keyword.equals(trademark)) {
                hits.add(pw);
            }
        });
    });

Because the output we require is a subset of keyWords and we do not want to recalculate " " + trademark + " ", etc. in the inner stream, the following is not advised, but it works:

    List<ProcessedWords> hits1 = keyWords.parallelStream()
        .filter(pw -> blacklistedWords.parallelStream().anyMatch(bw -> {
            final String keyword = pw.getKeyword();
            final String trademark = bw.getTrademark();
            return keyword.contains(" " + trademark + " ") || keyword.startsWith(trademark + " ")
                    || keyword.endsWith(" " + trademark) || keyword.equals(trademark);
        }))
        .collect(Collectors.toList());
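
As an alternative to the four string checks, the same whole-word rule can probably be expressed with a regular expression that is compiled once per trademark. This is a sketch under assumptions, not part of the original answer: the class name `WholeWordMatcher` and its methods `compile` and `findHits` are hypothetical, plain strings stand in for the entities, the keyword `ibmx is not a company` is an invented negative example, and `List.of` assumes Java 9 or later. Note one behavioural difference: it treats any character that is not a letter or digit (or the start or end of the string) as a separator, so unlike the space-based checks it would also match `ibm,` or `(bmw)`.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WholeWordMatcher {
    // Compile each trademark once. Pattern.quote escapes regex metacharacters,
    // and the lookarounds require that no letter or digit touches the match.
    static List<Pattern> compile(List<String> trademarks) {
        return trademarks.stream()
            .map(t -> Pattern.compile(
                "(?<![\\p{L}\\p{N}])" + Pattern.quote(t) + "(?![\\p{L}\\p{N}])"))
            .collect(Collectors.toList());
    }

    // Keep the keywords in which at least one trademark occurs as a whole word or phrase.
    static List<String> findHits(List<String> keywords, List<Pattern> patterns) {
        return keywords.stream()
            .filter(k -> patterns.stream().anyMatch(p -> p.matcher(k).find()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compile(List.of("while swam", "ibm", "bmw"));
        List<String> keywords = List.of(
            "while swam is interesting",
            "ibm is a company like bmw",
            "ibmx is not a company",
            "miss");

        // Prints: [while swam is interesting, ibm is a company like bmw]
        System.out.println(findHits(keywords, patterns));
    }
}
```

Compiling the patterns up front keeps the per-keyword work to matching only, which may matter once the trademark table holds thousands of rows.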
