find matches for multiple words
Question
I have these PostgreSQL tables for storing words and trademarks:
```sql
CREATE SEQUENCE IF NOT EXISTS processed_words_id_seq;
CREATE TABLE IF NOT EXISTS words
(
    id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
    keyword character varying(300) COLLATE pg_catalog."default"
);

insert into words (keyword)
VALUES ('while swam is interesting'),
       ('ibm is a company like bmw');

CREATE SEQUENCE IF NOT EXISTS trademarks_id_seq;
CREATE TABLE IF NOT EXISTS trademarks
(
    id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
    trademark character varying(300) COLLATE pg_catalog."default"
);

insert into trademarks (trademark)
VALUES ('while swam'),
       ('ibm'),
       ('bmw');
```
The `trademarks` table will hold thousands of registered trademark names.
I want to check whether each keyword in `words.keyword` contains a trademark. The check should cover not only single words but also trademarks that consist of several words. For example:
I have the keyword `while swam is interesting` stored in `words.keyword`, and the trademark `swam` in `trademarks.trademark`. As with `ibm`, that is a word match, and I want to detect it in Java code.
First I want to select all blacklisted keywords, convert them into a collection such as a `List`, and compare `ibm is a company like bmw` with the elements of that list. How can I do this not only for single words but also for multi-word expressions?
Something like this?
```java
Optional<ProcessedWords> keywords = processedWordsService.findRandomKeywordWhereTrademarkBlacklistedIsEmpty();
if (keywords.isPresent()) {
    List<BlacklistedWords> blacklistedWords = blacklistedWordsService.findAll();
    List<String> list = new ArrayList<>();
    for (BlacklistedWords item : blacklistedWords) {
        list.add(item.getKeyword());
    }
    ProcessedWords processedWords = keywords.get();
    String keyword = processedWords.getKeyword();
    // List.contains() compares whole elements with equals(), so this only
    // fires when the entire keyword is equal to one trademark
    if (list.contains(keyword)) {
        System.out.println("Found blacklisted word in keyword: " + keyword);
    }
}
```
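The attempt above does not report `ibm is a company like bmw`. The reason is that `List.contains` compares whole elements with `equals`, while `String.contains` looks for a substring. The following standalone illustration works on plain strings instead of the entities. The class name `ContainsDemo` is hypothetical, and `List.of` assumes Java 9+.

```java
import java.util.List;

public class ContainsDemo {
    // Exact-element lookup: true only if some trademark equals the whole keyword.
    public static boolean listContains(List<String> trademarks, String keyword) {
        return trademarks.contains(keyword);
    }

    // Substring lookup: true if any trademark occurs somewhere inside the keyword.
    public static boolean anyTrademarkInside(List<String> trademarks, String keyword) {
        return trademarks.stream().anyMatch(keyword::contains);
    }

    public static void main(String[] args) {
        List<String> trademarks = List.of("while swam", "ibm", "bmw");
        String keyword = "ibm is a company like bmw";
        System.out.println(listContains(trademarks, keyword));       // false
        System.out.println(anyTrademarkInside(trademarks, keyword)); // true
    }
}
```

A plain substring test also matches `ibm` inside a longer word such as `ibmx`. Answer 2 addresses that whole-word issue.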
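```java
import java.io.Serializable;
import javax.persistence.*; // jakarta.persistence.* on Jakarta EE 9+ / Spring Boot 3
import lombok.*;

@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
@Builder(toBuilder = true)
@Entity
@Table(name = "trademarks")
public class BlacklistedWords implements Serializable {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", unique = true, updatable = false, nullable = false)
    private long id;

    // the "trademark" column is mapped to the field "keyword"
    @Column(name = "trademark", length = 200, unique = true)
    private String keyword;
}
```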
Can you guide me on how to implement this?
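The `trademarks` table may hold thousands of names. Scanning every trademark for every keyword therefore costs time that grows with the table size. One alternative is sketched below; it is not taken from either answer. It loads the trademarks into a `HashSet` and then looks up every run of consecutive words in the keyword, up to the word count of the longest trademark.

The sketch assumes whitespace-separated, case-sensitive matching. The class `NGramTrademarkIndex` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class NGramTrademarkIndex {
    private final Set<String> trademarks = new HashSet<>();
    private int maxWords = 0; // word count of the longest trademark

    public NGramTrademarkIndex(List<String> names) {
        for (String name : names) {
            String[] words = name.trim().split("\\s+");
            // store with single spaces so lookups match keywords tokenised the same way
            trademarks.add(String.join(" ", words));
            maxWords = Math.max(maxWords, words.length);
        }
    }

    // Returns a trademark found as a run of whole words (earliest start position first).
    public Optional<String> findMatch(String keyword) {
        String[] words = keyword.trim().split("\\s+");
        for (int start = 0; start < words.length; start++) {
            // try every run of 1..maxWords consecutive words beginning at 'start'
            for (int len = 1; len <= maxWords && start + len <= words.length; len++) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, start, start + len));
                if (trademarks.contains(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        NGramTrademarkIndex index = new NGramTrademarkIndex(List.of("while swam", "ibm", "bmw"));
        System.out.println(index.findMatch("while swam is interesting")); // Optional[while swam]
        System.out.println(index.findMatch("ibm is a company like bmw"));  // Optional[ibm]
        System.out.println(index.findMatch("miss"));                       // Optional.empty
    }
}
```

For each keyword, this performs at most (number of words × `maxWords`) set lookups, however many trademarks are stored. In the application, the names would presumably come from `blacklistedWordsService.findAll()` mapped through `getKeyword()`.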
Answer 1
Score: 1
This is how to do the matching with Java streams:
```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public static void main(String[] args) {
    // stubbing up the persistence layer
    List<BlacklistedWords> blacklistedWords = Arrays.asList(
            new BlacklistedWords(1, "while swam"),
            new BlacklistedWords(2, "ibm"),
            new BlacklistedWords(3, "bmw"));
    List<ProcessedWords> keyWords = Arrays.asList(
            new ProcessedWords(1, "while swam is interesting"),
            new ProcessedWords(2, "ibm is a company like bmw"),
            new ProcessedWords(3, "miss"));
    // keep each keyword that contains at least one trademark as a substring;
    // the entity maps the trademark column to the field keyword, hence getKeyword()
    List<ProcessedWords> hits = keyWords.stream()
            .filter(pw -> blacklistedWords.stream()
                    .anyMatch(bw -> pw.getKeyword().contains(bw.getKeyword())))
            .collect(Collectors.toList());
    System.out.println(hits);
}
```
Output:
[ProcessedWords(id=1, keyword=while swam is interesting), ProcessedWords(id=2, keyword=ibm is a company like bmw)]
Note that I stubbed out the persistence layer and added an extra `ProcessedWords` entry, `"miss"`, that should not match. I also annotated `BlacklistedWords` (table `trademarks`) and `ProcessedWords` (table `words`) with `@Data` to get a decent `toString()`. You shouldn't do that in real code, because they are `@Entity` classes.
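The `contains` check above is a plain substring test, so `ibm` would also match inside a longer word such as `ibmx`. A swapped-in alternative, not taken from either answer, is a regular expression with the `\b` word-boundary matcher. `Pattern.quote` escapes any regex metacharacters in the trademark.

This sketch assumes that trademarks begin and end with letters or digits, because `\b` behaves differently next to punctuation. The class `WordBoundaryMatch` is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordBoundaryMatch {
    // Compile one pattern per trademark, once; \b anchors the match at word boundaries.
    public static List<Pattern> compile(List<String> trademarks) {
        return trademarks.stream()
                .map(tm -> Pattern.compile("\\b" + Pattern.quote(tm) + "\\b"))
                .collect(Collectors.toList());
    }

    // True if any trademark occurs in the keyword as whole word(s).
    public static boolean matchesAny(String keyword, List<Pattern> patterns) {
        return patterns.stream().anyMatch(p -> p.matcher(keyword).find());
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compile(List.of("while swam", "ibm", "bmw"));
        System.out.println(matchesAny("ibm is a company like bmw", patterns)); // true
        System.out.println(matchesAny("ibmx is not a match", patterns));      // false
    }
}
```

Compiling the patterns once and reusing them avoids rebuilding a regex for every keyword/trademark pair.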
Answer 2
Score: 1
To satisfy the requirement of matching whole words only, do the following:
```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
List<BlacklistedWords> blacklistedWords = Arrays.asList(
        new BlacklistedWords(1, "while swam"),
        new BlacklistedWords(2, "ibm"),
        new BlacklistedWords(3, "bmw"));
List<ProcessedWords> keyWords = Arrays.asList(
        new ProcessedWords(1, "while swam is interesting"),
        new ProcessedWords(2, "ibm is a company like bmw"),
        new ProcessedWords(3, "miss"));
// thread-safe set: the nested parallel streams below add to it concurrently (a HashSet is not safe here)
Set<ProcessedWords> hits = ConcurrentHashMap.newKeySet();
blacklistedWords.parallelStream().forEach(bw -> {
    // build the padded variants once per trademark (the field is named keyword)
    final String trademark = bw.getKeyword();
    final String startsWith = trademark + " ";
    final String contains = " " + startsWith;
    final String endsWith = " " + trademark;
    keyWords.parallelStream().forEach(pw -> {
        final String keyword = pw.getKeyword();
        // whole-word match: in the middle, at the start, at the end, or the entire keyword
        if (keyword.contains(contains) || keyword.startsWith(startsWith)
                || keyword.endsWith(endsWith) || keyword.equals(trademark)) {
            hits.add(pw);
        }
    });
});
```
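The following version also works, but it is not advised. To produce the required subset of `keyWords` directly, it makes `keyWords` the outer stream. As a result, the padded strings such as `" " + trademark + " "` are recalculated in the inner stream for every keyword/trademark pair: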
```java
import java.util.stream.Collectors;
List<ProcessedWords> hits1 = keyWords.parallelStream()
        .filter(pw -> blacklistedWords.parallelStream().anyMatch(bw -> {
            final String keyword = pw.getKeyword();
            final String trademark = bw.getKeyword();
            return keyword.contains(" " + trademark + " ") || keyword.startsWith(trademark + " ")
                    || keyword.endsWith(" " + trademark) || keyword.equals(trademark);
        }))
        .collect(Collectors.toList());
```
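The padded strings can be built only once even when `keyWords` is the outer stream. If both the keyword and the trademark are padded with one space on each side, the four checks (`contains`, `startsWith`, `endsWith`, `equals`) collapse into a single `contains`.

This sketch works on plain strings rather than the entities. It assumes that words are separated by single spaces, as in the sample data, and that trademarks have no leading or trailing spaces. `PaddedWholeWordMatch` and `hits` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PaddedWholeWordMatch {
    // Returns the keywords that contain at least one trademark as whole word(s).
    public static List<String> hits(List<String> keywords, List<String> trademarks) {
        // Pad each trademark once, outside the per-keyword loop.
        List<String> padded = trademarks.stream()
                .map(tm -> " " + tm + " ")
                .collect(Collectors.toList());
        return keywords.stream()
                .filter(kw -> {
                    // " kw " contains " tm " exactly when tm is in the middle,
                    // at the start, at the end, or equal to the whole keyword
                    String paddedKw = " " + kw + " ";
                    return padded.stream().anyMatch(paddedKw::contains);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> trademarks = List.of("while swam", "ibm", "bmw");
        List<String> keywords = List.of("while swam is interesting", "ibm is a company like bmw", "miss");
        System.out.println(hits(keywords, trademarks));
        // prints [while swam is interesting, ibm is a company like bmw]
    }
}
```

Because this uses `filter` on the keyword stream, the result is the required subset of the keywords in their original order. No shared mutable set is needed.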