Golang multiline regexp parsing issue

Question

I am building a project in Go that parses Solidity code. In it, I wrote a function analyzeFile() that statically detects issues in each smart contract (.sol) using regular expressions:

func analyzeFile(issues []Issue, file string) (map[string][]Finding, error) {
    findings := make(map[string][]Finding)
    readFile, err := os.Open(file)
    if err != nil {
        return nil, err
    }
    defer readFile.Close()
    contents, _ := ioutil.ReadFile(file)
    scanner := bufio.NewScanner(readFile)
    lineNumber := 0
    for scanner.Scan() {
        lineNumber++
        line := scanner.Text()
        for _, issue := range issues {
            if issue.ParsingMode == "SingleLine" {
                matched, _ := regexp.MatchString(issue.Pattern, line)
                if matched {
                    findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                        IssueIdentifier: issue.Identifier,
                        File:            file,
                        LineNumber:      lineNumber,
                        LineContent:     strings.TrimSpace(line),
                    })
                }
            }
        }
    }

When a regex only has to check a single line of code, everything works fine.
However, I also need to check constructs in the .sol files that span multiple lines, for instance to detect this piece of code:

require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);

I tried to add the following code in the analyzeFile() function:

contents, _ := ioutil.ReadFile(file)
for _, issue := range issues {
    if issue.ParsingMode == "MultiLine" {
        contents_to_string := string(contents)
        //s := strings.ReplaceAll(contents_to_string, "\n", " ")
        //sr := strings.ReplaceAll(s, "\r", " ")
        r := regexp.MustCompile(`((require)([(])\n.*[&&](?s)(.*?)([;]))`)
        finds := r.FindStringSubmatch(contents_to_string)
        for _, find := range finds {
            findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                IssueIdentifier: issue.Identifier,
                File:            file,
                LineContent:     (find),
            })
        }
    }
}

But I get wrong results: after converting the file contents to a string, the whole source is one string with embedded \n line-break characters, and my regex checks fail on it.
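One thing that may contribute to the wrong results, independent of the pattern itself: FindStringSubmatch returns only the first match, as a slice holding the full match followed by each capture group, so ranging over that slice records every group as a separate finding. Below is a minimal sketch of the difference. The simplified pattern and the helper names firstMatchGroups and allMatches are assumptions made for illustration and are not part of the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// requireRe is a simplified, hypothetical pattern used only for illustration.
var requireRe = regexp.MustCompile(`(?s)(require)\((.*?)\);`)

// firstMatchGroups returns what FindStringSubmatch yields: the first match
// followed by its capture groups, not a list of separate matches.
func firstMatchGroups(src string) []string {
	return requireRe.FindStringSubmatch(src)
}

// allMatches returns the full text of every match in src.
func allMatches(src string) []string {
	var out []string
	for _, m := range requireRe.FindAllStringSubmatch(src, -1) {
		out = append(out, m[0])
	}
	return out
}

func main() {
	src := "require(\n  a && b,\n  \"x\"\n);\nrequire(c, \"y\");\n"
	// Full match plus the two capture groups of the first require call.
	fmt.Println(len(firstMatchGroups(src)))
	// One entry per require call in the input.
	fmt.Println(len(allMatches(src)))
}
```

To collect one finding per require statement, FindAllStringSubmatch (or FindAllString) is probably the closer fit.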

Answer 1

Score: 0

One workaround is to capture the text between require( and ); with (?s)require\((.*?)\); and then split the captured group on \n:


package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	// (?s) lets . match newlines, so the group captures the whole argument list.
	var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
	var str = `require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);`

	matches := re.FindAllStringSubmatch(str, -1)
	for _, match := range matches {
		// match[1] is the text between "require(" and ");".
		lines := strings.Split(match[1], "\n")
		for _, line := range lines {
			fmt.Println(line)
		}
	}
}

https://go.dev/play/p/Omn5ULHun_-
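The (?s) flag in the pattern above is what allows . to match newline characters; without it, .*? stops at the end of the line and a require call spread over several lines is not matched. A small sketch of that difference follows. The helper name matchesRequire is hypothetical and only serves the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesRequire reports whether pattern finds a match in src.
// It is a hypothetical helper used only to compare the two patterns.
func matchesRequire(pattern, src string) bool {
	return regexp.MustCompile(pattern).MatchString(src)
}

func main() {
	src := "require(\n  a && b,\n  \"x\"\n);"

	// Without (?s), "." does not match "\n", so the lazy group
	// cannot reach the closing ");" on a later line.
	fmt.Println(matchesRequire(`require\((.*?)\);`, src))

	// With (?s), "." also matches "\n" and the call is found.
	fmt.Println(matchesRequire(`(?s)require\((.*?)\);`, src))
}
```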


To match line by line instead, the multi-line pattern (?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$ could be used, applied to the content captured between require( and );:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
	var str = `require(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
);`

	// (?m) makes ^ and $ match at the start and end of each line.
	var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
	matches := re.FindAllStringSubmatch(str, -1)
	for _, match := range matches {
		submatches := multilineRe.FindAllStringSubmatch(match[1], -1)
		for _, submatch := range submatches {
			fmt.Println(submatch[0])
		}
	}
}

https://go.dev/play/p/LJsVy5vN6Ej
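If only the trimmed lines are needed, a plain split plus strings.TrimSpace may be enough and also copes with Windows line endings (\r\n), since TrimSpace strips the trailing \r. This is a swapped-in alternative to the multi-line regex above, not the answerer's method, and the function name requireArgLines is an assumption made for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var requireRe = regexp.MustCompile(`(?s)require\((.*?)\);`)

// requireArgLines returns the trimmed, non-empty lines found between
// "require(" and ");" for every require call in src.
func requireArgLines(src string) []string {
	var out []string
	for _, m := range requireRe.FindAllStringSubmatch(src, -1) {
		for _, line := range strings.Split(m[1], "\n") {
			// TrimSpace also removes a trailing "\r" from CRLF files.
			if t := strings.TrimSpace(line); t != "" {
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	src := "require(\r\n  _disputeID < disputeCount &&\r\n  disputes[_disputeID].status == Status.Active,\r\n  \"Disputes::!Resolvable\"\r\n);"
	for _, line := range requireArgLines(src) {
		fmt.Println(line)
	}
}
```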

Answer 2

Score: 0

By tinkering with the code I managed to get it to work:

	contents, _ := ioutil.ReadFile(file)
	for _, issue := range issues {
		if issue.ParsingMode == "MultiLineG015" {
			str := string(contents)
			var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
			//var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
			// Get every require statement in the .sol file.
			matches := re.FindAllStringSubmatch(str, -1)
			r := regexp.MustCompile(issue.Pattern)
			for _, match := range matches {
				// Apply the issue's own pattern to each full require statement.
				submatches := r.FindAllStringSubmatch(match[0], -1)
				for _, submatch := range submatches {
					findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
						IssueIdentifier: issue.Identifier,
						File:            file,
						LineContent:     ([]string{submatch[0]}),
					})
				}
			}
		}
	}

This is the output:

2022-08-rigor\contracts\Community.sol::0 => [(
            _lendingNeeded >= _communityProject.totalLent &&
                _lendingNeeded <= IProject(_project).projectCost(),
            "Community::invalid lending"
        );]
2022-08-rigor\contracts\Disputes.sol::0 => [(
            _disputeID < disputeCount &&
                disputes[_disputeID].status == Status.Active,
            "Disputes::!Resolvable"
        );]
2022-08-rigor\contracts\Disputes.sol::0 => [(
            _actionType > 0 && _actionType <= uint8(ActionType.TaskPay),
            "Disputes::!ActionType"
        );]
2022-08-rigor\contracts\Project.sol::0 => [(
            _sender == builder || _sender == homeFi.communityContract(),
            "Project::!Builder&&!Community"
        );]
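
The ::0 after each file name is presumably the Finding's LineNumber, which the multi-line branch never sets. If a line number is wanted, one option is to take the match offset from FindAllStringIndex and count the newlines before it. The sketch below assumes that approach; the function name requireLineNumbers and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var requireRe = regexp.MustCompile(`(?s)require\((.*?)\);`)

// requireLineNumbers returns the 1-based line number on which each
// require(...) call in src starts.
func requireLineNumbers(src string) []int {
	var lines []int
	for _, loc := range requireRe.FindAllStringIndex(src, -1) {
		// loc[0] is the byte offset of the match start; the number of
		// newlines before it gives the zero-based line index.
		lines = append(lines, strings.Count(src[:loc[0]], "\n")+1)
	}
	return lines
}

func main() {
	src := "contract C {\n  function f() public {\n    require(\n      a && b,\n      \"x\"\n    );\n    require(c, \"y\");\n  }\n}\n"
	fmt.Println(requireLineNumbers(src))
}
```

The resulting number could then be stored in the LineNumber field when the Finding is appended.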

Thanks zangw for your help!

huangapple
  • Posted on 2022-10-18 17:38:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74108831.html