Golang multiline regexp parsing issue

Question

I am creating a project in Go that parses Solidity code. In my project, I created a function analyzeFile() that statically detects issues in each smart contract (.sol) using regular expressions:

  func analyzeFile(issues []Issue, file string) (map[string][]Finding, error) {
      findings := make(map[string][]Finding)
      readFile, err := os.Open(file)
      if err != nil {
          return nil, err
      }
      defer readFile.Close()
      // Whole-file contents, used by the multi-line checks further down.
      contents, _ := ioutil.ReadFile(file)
      scanner := bufio.NewScanner(readFile)
      lineNumber := 0
      for scanner.Scan() {
          lineNumber++
          line := scanner.Text()
          for _, issue := range issues {
              if issue.ParsingMode == "SingleLine" {
                  // Single-line issues are checked one scanned line at a time.
                  matched, _ := regexp.MatchString(issue.Pattern, line)
                  if matched {
                      findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                          IssueIdentifier: issue.Identifier,
                          File:            file,
                          LineNumber:      lineNumber,
                          LineContent:     strings.TrimSpace(line),
                      })
                  }
              }
          }
      }
      // The multi-line checks shown below go here.
      return findings, nil
  }

When a regex only has to check a single line of code, everything works.
However, I also need to check constructs in the .sol files that span multiple lines, for instance to detect this piece of code:

  require(
      _disputeID < disputeCount &&
      disputes[_disputeID].status == Status.Active,
      "Disputes::!Resolvable"
  );

I tried to add the following code in the analyzeFile() function:

  contents, _ := ioutil.ReadFile(file)
  for _, issue := range issues {
      if issue.ParsingMode == "MultiLine" {
          contents_to_string := string(contents)
          //s := strings.ReplaceAll(contents_to_string, "\n", " ")
          //sr := strings.ReplaceAll(s, "\r", " ")
          r := regexp.MustCompile(`((require)([(])\n.*[&&](?s)(.*?)([;]))`)
          finds := r.FindStringSubmatch(contents_to_string)
          for _, find := range finds {
              findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                  IssueIdentifier: issue.Identifier,
                  File:            file,
                  LineContent:     (find),
              })
          }
      }
  }

But I get wrong results: once the source code is converted to a string, all the code ends up in a single string with \n line-break characters embedded in it, which makes every regex check fail.

Answer 1

Score: 0

One workaround is to capture the body of each require call with (?s)require\((.*?)\); and then split the captured group on \n:

  package main

  import (
      "fmt"
      "regexp"
      "strings"
  )

  func main() {
      // (?s) lets . match newlines, so the lazy group captures
      // everything between "require(" and the first ");".
      re := regexp.MustCompile(`(?s)require\((.*?)\);`)
      str := `require(
      _disputeID < disputeCount &&
      disputes[_disputeID].status == Status.Active,
      "Disputes::!Resolvable"
  );`
      matches := re.FindAllStringSubmatch(str, -1)
      for _, match := range matches {
          // match[1] is the captured body; print it line by line.
          lines := strings.Split(match[1], "\n")
          for _, line := range lines {
              fmt.Println(line)
          }
      }
  }

https://go.dev/play/p/Omn5ULHun_-
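
To make the role of the (?s) flag concrete, here is a minimal sketch that counts matches of the same pattern with and without the flag. The helper countMatches and the sample input are assumptions made for this illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches (hypothetical helper) reports how many times pattern matches src.
func countMatches(pattern, src string) int {
	return len(regexp.MustCompile(pattern).FindAllString(src, -1))
}

func main() {
	src := "require(\n    a < b &&\n    c == d,\n    \"msg\"\n);"

	// Without (?s), "." stops at every newline, so the multi-line call is missed.
	fmt.Println(countMatches(`require\((.*?)\);`, src)) // 0

	// With (?s), "." also matches "\n" and the whole statement is found.
	fmt.Println(countMatches(`(?s)require\((.*?)\);`, src)) // 1
}
```

Note that the lazy group stops at the first ");" it encounters, so a require whose arguments themselves contain ");" (for example inside a string literal) would be cut short by this pattern.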


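To process the captured text line by line with a regex instead, a pattern such as (?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$ could be used: with the (?m) flag, ^ and $ match at the start and end of every line. The multiline matching is then applied only to the content between require( and );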

  package main

  import (
      "fmt"
      "regexp"
  )

  func main() {
      // First pass: capture the body of each require(...); statement.
      re := regexp.MustCompile(`(?s)require\((.*?)\);`)
      str := `require(
      _disputeID < disputeCount &&
      disputes[_disputeID].status == Status.Active,
      "Disputes::!Resolvable"
  );`
      // Second pass: (?m) makes ^ and $ match per line inside the body.
      multilineRe := regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
      matches := re.FindAllStringSubmatch(str, -1)
      for _, match := range matches {
          submatches := multilineRe.FindAllStringSubmatch(match[1], -1)
          for _, submatch := range submatches {
              fmt.Println(submatch[0])
          }
      }
  }

https://go.dev/play/p/LJsVy5vN6Ej

Answer 2

Score: 0

By tinkering with the code I managed to get it to work:

  contents, _ := ioutil.ReadFile(file)
  for _, issue := range issues {
      if issue.ParsingMode == "MultiLineG015" {
          str := string(contents)
          var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
          //var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
          // Get every require statement in the .sol file.
          matches := re.FindAllStringSubmatch(str, -1)
          r := regexp.MustCompile(issue.Pattern)
          for _, match := range matches {
              // Apply the issue pattern to the whole require statement (match[0]).
              submatches := r.FindAllStringSubmatch(match[0], -1)
              for _, submatch := range submatches {
                  findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
                      IssueIdentifier: issue.Identifier,
                      File:            file,
                      LineContent:     ([]string{submatch[0]}),
                  })
              }
          }
      }
  }

This is the output:

  2022-08-rigor\contracts\Community.sol::0 => [(
  _lendingNeeded >= _communityProject.totalLent &&
  _lendingNeeded <= IProject(_project).projectCost(),
  "Community::invalid lending"
  );]
  2022-08-rigor\contracts\Disputes.sol::0 => [(
  _disputeID < disputeCount &&
  disputes[_disputeID].status == Status.Active,
  "Disputes::!Resolvable"
  );]
  2022-08-rigor\contracts\Disputes.sol::0 => [(
  _actionType > 0 && _actionType <= uint8(ActionType.TaskPay),
  "Disputes::!ActionType"
  );]
  2022-08-rigor\contracts\Project.sol::0 => [(
  _sender == builder || _sender == homeFi.communityContract(),
  "Project::!Builder&&!Community"
  );]

Thanks zangw for your help!

huangapple
  • Posted on 2022-10-18 17:38:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74108831.html