Suffix Trie匹配,匹配操作存在问题。

huangapple go评论84阅读模式
英文:

Suffix Trie matching, problem with matching operation

问题

我面临了后缀 Trie 匹配的问题,我设计了一个后缀 Trie,其中包含一个 26 路树来表示节点中的字符,以及与每个节点关联的值。每个节点的值要么表示字符串在主字符串中的起始索引(如果它是后缀),要么表示-1(如果不是后缀)。然后,我试图让匹配操作正常工作,但显然没有成功,我无法找出其中的错误。需要更多澄清,请参考此 PDF 中的第二个问题。请帮忙。

期望输出:

输入:
AATCGGGTTCAATCGGGGT
2
ATCG
GGGT
输出:
1 4 11 15

我的输出:

AATCGGGTTCAATCGGGGT
2
ATCG
11
1
GGGT
4

如您所见,我没有得到15作为答案。

英文:

I am facing a problem with suffix Trie matching, I designed a suffix trie with a 26-way tree to represent characters in a node plus a value associated with each node. The value of each node denotes either the index where the string ( if it is a suffix ) starts in the main string or -1 otherwise. Thereafter I am trying to get matching operation to work but apparently it doesn't and I am not able to find out bugs in here. For any more clarification refer Second Question in this Pdf. Help, please.

import java.util.*;

class node{
    public int val;
    public node ptrs[];
    node(){
        this.val =0;
        ptrs = new node[26];
        for (node ptr : ptrs) {
            ptr = null;
        }
    }    
}
class Tree{
    public node root = new node();
    int pass =0;
    void insert(String s,int indx) {
        node trv = root;
        for (int i = 0; i < s.length(); i++) {
            if (trv.ptrs[s.charAt(i) - 'A'] == null) {
                trv.ptrs[s.charAt(i) - 'A'] = new node();
                if(i==s.length()-1){
                    trv.ptrs[s.charAt(i)-'A'].val = indx;
                }else{
                    trv.ptrs[s.charAt(i)-'A'].val = -1;
                }
            }
            trv = trv.ptrs[s.charAt(i) - 'A'];
        }
    }
    
    private void visit(node trv){
        for(int i =0;i<26;i++){
            if(trv.ptrs[i]!=null){
                System.out.println(trv.ptrs[i].val+":"+((char)(i+'A')));
                visit(trv.ptrs[i]);
            }
        }
    }

    void visit(){
        this.visit(root);
    }
    void leaf(node trv){
        if(trv.val>=0){
            System.out.println(trv.val);
        }else{
            for(int i=0;i<26;i++){
                if(trv.ptrs[i]!=null){
                    if(trv.ptrs[i].val>=0){
                        System.out.println(trv.ptrs[i].val);
                    }else{
                        leaf(trv.ptrs[i]);
                    }
                }
            }
        }
    }
    private void search(node trv,String s,int i){
        if(i<=s.length()-1){
            if(trv.ptrs[s.charAt(i)-'A']!=null){
                if(i==s.length()-1){
                    leaf(trv.ptrs[s.charAt(i)-'A']);
                }else{
                    search(trv.ptrs[s.charAt(i)-'A'], s, i+1);
                }
            }
        }
    }
    void query(String s){
        this.search(root, s, 0);
    }
}
public class Trie {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        Tree t = new Tree();
        String txt = sc.next();
        int txtLen = txt.length();
        for(int i =0;i<txtLen;i++){
            t.insert(txt.substring(i,txtLen),i);
        }
        int q = sc.nextInt();
        while(q-->0){
           String m = sc.next();
           t.query(m);
        }
        sc.close();
    }
}

Expected :

Input:
AATCGGGTTCAATCGGGGT
2
ATCG
GGGT
Output:
1 4 11 15

My Output :

AATCGGGTTCAATCGGGGT
2
ATCG
11
1
GGGT
4

I am not getting 15 as the answer as you can see.

答案1

得分: 1

我尝试调查您的程序,但您的程序编写得不好,不幸的是我找不到问题。我尝试使用visit来打印它,但没有有用的信息。

但以下尝试使用后缀树查找模式,该后缀树在Fast Pattern Matching of Strings Using Suffix Tree中有描述。可能会有帮助:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class Node {
    // ...(省略了类Node的部分代码,包括构造函数、方法等)
}

public class SuffixTree {
    // ...(省略了类SuffixTree的部分代码,包括构造函数、方法等)
    
    public static void main(String[] args) {
        SuffixTree suffixTree = new SuffixTree("AATCGGGTTCAATCGGGGT");
        List<String> matches = suffixTree.searchText("ATCG");
        matches.stream().forEach(m -> { System.out.println(m); });
        List<String> matches2 = suffixTree.searchText("GGGT");
        matches2.stream().forEach(m -> { System.out.println(m); });
    }
}

请注意,我已经省略了一些类和方法的代码,以保持翻译简洁。您可以在提供的链接Fast Pattern Matching of Strings Using Suffix Tree中查找完整的代码和更多细节。

英文:

I try to investigate your program but your program is not well-written and unfortunately I cannot find your problem. I try to print it by visit but there is no helpful information.

But the following try to find pattern by suffix tree which is described at Fast Pattern Matching of Strings Using Suffix Tree. Maybe helpful:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
class Node {
private String text;
private List&lt;Node&gt; children;
private int position;
public Node(String word, int position) {
this.text = word;
this.position = position;
this.children = new ArrayList&lt;&gt;();
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
public int getPosition() {
return position;
}
public void setPosition(int position) {
this.position = position;
}
public List&lt;Node&gt; getChildren() {
return children;
}
public void setChildren(List&lt;Node&gt; children) {
this.children = children;
}
public String printTree(String depthIndicator) {
String str = &quot;&quot;;
String positionStr = position &gt; -1 ? &quot;[&quot; + String.valueOf(position) + &quot;]&quot; : &quot;&quot;;
str += depthIndicator + text + positionStr + &quot;\n&quot;;
for (int i = 0; i &lt; children.size(); i++) {
str += children.get(i)
.printTree(depthIndicator + &quot;\t&quot;);
}
return str;
}
@Override
public String toString() {
return printTree(&quot;&quot;);
}
}
public class SuffixTree {
private static final Logger LOGGER = LoggerFactory.getLogger(SuffixTree.class);
private static final String WORD_TERMINATION = &quot;$&quot;;
private static final int POSITION_UNDEFINED = -1;
private Node root;
private String fullText;
public SuffixTree(String text) {
root = new Node(&quot;&quot;, POSITION_UNDEFINED);
for (int i = 0; i &lt; text.length(); i++) {
addSuffix(text.substring(i) + WORD_TERMINATION, i);
}
fullText = text;
}
public List&lt;String&gt; searchText(String pattern) {
LOGGER.info(&quot;Searching for pattern \&quot;{}\&quot;&quot;, pattern);
List&lt;String&gt; result = new ArrayList&lt;&gt;();
List&lt;Node&gt; nodes = getAllNodesInTraversePath(pattern, root, false);
if (nodes.size() &gt; 0) {
Node lastNode = nodes.get(nodes.size() - 1);
if (lastNode != null) {
List&lt;Integer&gt; positions = getPositions(lastNode);
positions = positions.stream()
.sorted()
.collect(Collectors.toList());
positions.forEach(m -&gt; result.add((markPatternInText(m, pattern))));
}
}
return result;
}
private void addSuffix(String suffix, int position) {
LOGGER.info(&quot;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Adding new suffix {}&quot;, suffix);
List&lt;Node&gt; nodes = getAllNodesInTraversePath(suffix, root, true);
if (nodes.size() == 0) {
addChildNode(root, suffix, position);
LOGGER.info(&quot;{}&quot;, printTree());
} else {
Node lastNode = nodes.remove(nodes.size() - 1);
String newText = suffix;
if (nodes.size() &gt; 0) {
String existingSuffixUptoLastNode = nodes.stream()
.map(a -&gt; a.getText())
.reduce(&quot;&quot;, String::concat);
// Remove prefix from newText already included in parent
newText = newText.substring(existingSuffixUptoLastNode.length());
}
extendNode(lastNode, newText, position);
LOGGER.info(&quot;{}&quot;, printTree());
}
}
private List&lt;Integer&gt; getPositions(Node node) {
List&lt;Integer&gt; positions = new ArrayList&lt;&gt;();
if (node.getText()
.endsWith(WORD_TERMINATION)) {
positions.add(node.getPosition());
}
for (int i = 0; i &lt; node.getChildren()
.size(); i++) {
positions.addAll(getPositions(node.getChildren()
.get(i)));
}
return positions;
}
private String markPatternInText(Integer startPosition, String pattern) {
String matchingTextLHS = fullText.substring(0, startPosition);
String matchingText = fullText.substring(startPosition, startPosition + pattern.length());
String matchingTextRHS = fullText.substring(startPosition + pattern.length());
return matchingTextLHS + &quot;[&quot; + matchingText + &quot;]&quot; + matchingTextRHS;
}
private void addChildNode(Node parentNode, String text, int position) {
parentNode.getChildren()
.add(new Node(text, position));
}
private void extendNode(Node node, String newText, int position) {
String currentText = node.getText();
String commonPrefix = getLongestCommonPrefix(currentText, newText);
if (commonPrefix != currentText) {
String parentText = currentText.substring(0, commonPrefix.length());
String childText = currentText.substring(commonPrefix.length());
splitNodeToParentAndChild(node, parentText, childText);
}
String remainingText = newText.substring(commonPrefix.length());
addChildNode(node, remainingText, position);
}
private void splitNodeToParentAndChild(Node parentNode, String parentNewText, String childNewText) {
Node childNode = new Node(childNewText, parentNode.getPosition());
if (parentNode.getChildren()
.size() &gt; 0) {
while (parentNode.getChildren()
.size() &gt; 0) {
childNode.getChildren()
.add(parentNode.getChildren()
.remove(0));
}
}
parentNode.getChildren()
.add(childNode);
parentNode.setText(parentNewText);
parentNode.setPosition(POSITION_UNDEFINED);
}
private String getLongestCommonPrefix(String str1, String str2) {
int compareLength = Math.min(str1.length(), str2.length());
for (int i = 0; i &lt; compareLength; i++) {
if (str1.charAt(i) != str2.charAt(i)) {
return str1.substring(0, i);
}
}
return str1.substring(0, compareLength);
}
private List&lt;Node&gt; getAllNodesInTraversePath(String pattern, Node startNode, boolean isAllowPartialMatch) {
List&lt;Node&gt; nodes = new ArrayList&lt;&gt;();
for (int i = 0; i &lt; startNode.getChildren()
.size(); i++) {
Node currentNode = startNode.getChildren()
.get(i);
String nodeText = currentNode.getText();
if (pattern.charAt(0) == nodeText.charAt(0)) {
if (isAllowPartialMatch &amp;&amp; pattern.length() &lt;= nodeText.length()) {
nodes.add(currentNode);
return nodes;
}
int compareLength = Math.min(nodeText.length(), pattern.length());
for (int j = 1; j &lt; compareLength; j++) {
if (pattern.charAt(j) != nodeText.charAt(j)) {
if (isAllowPartialMatch) {
nodes.add(currentNode);
}
return nodes;
}
}
nodes.add(currentNode);
if (pattern.length() &gt; compareLength) {
List&lt;Node&gt; nodes2 = getAllNodesInTraversePath(pattern.substring(compareLength), currentNode, isAllowPartialMatch);
if (nodes2.size() &gt; 0) {
nodes.addAll(nodes2);
} else if (!isAllowPartialMatch) {
nodes.add(null);
}
}
return nodes;
}
}
return nodes;
}
public String printTree() {
return root.printTree(&quot;&quot;);
}
public static void main(String[] args) {
SuffixTree suffixTree = new SuffixTree(&quot;AATCGGGTTCAATCGGGGT&quot;);
List&lt;String&gt; matches = suffixTree.searchText(&quot;ATCG&quot;);
matches.stream().forEach(m -&gt; { System.out.println(m);});
List&lt;String&gt; matches2 = suffixTree.searchText(&quot;GGGT&quot;);
matches2.stream().forEach(m -&gt; { System.out.println(m);});
}
}

huangapple
  • 本文由 发表于 2020年8月12日 13:05:00
  • 转载请务必保留本文链接:https://go.coder-hub.com/63370104.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定