Golang multiline regexp parsing issue
Question
I am creating a project in Go that parses Solidity code. In my project, I created a function, analyzeFile(), which uses regular expressions to statically detect issues in each smart contract (.sol):
func analyzeFile(issues []Issue, file string) (map[string][]Finding, error) {
	findings := make(map[string][]Finding)
	readFile, err := os.Open(file)
	if err != nil {
		return nil, err
	}
	defer readFile.Close()
	// Scan the contract line by line so each finding gets a line number.
	scanner := bufio.NewScanner(readFile)
	lineNumber := 0
	for scanner.Scan() {
		lineNumber++
		line := scanner.Text()
		for _, issue := range issues {
			if issue.ParsingMode == "SingleLine" {
				// MatchString returns an error if the pattern is invalid.
				matched, err := regexp.MatchString(issue.Pattern, line)
				if err != nil {
					return nil, err
				}
				if matched {
					findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
						IssueIdentifier: issue.Identifier,
						File:            file,
						LineNumber:      lineNumber,
						LineContent:     strings.TrimSpace(line),
					})
				}
			}
		}
	}
	return findings, scanner.Err()
}
When a regex only has to inspect code on a single line, everything works fine.
However, I also need to check constructs in the .sol files that span multiple lines, for instance to detect this piece of code:
require(
    _disputeID < disputeCount &&
        disputes[_disputeID].status == Status.Active,
    "Disputes::!Resolvable"
);
I tried to add the following code in the analyzeFile() function:
contents, _ := ioutil.ReadFile(file)
for _, issue := range issues {
	if issue.ParsingMode == "MultiLine" {
		contents_to_string := string(contents)
		//s := strings.ReplaceAll(contents_to_string, "\n", " ")
		//sr := strings.ReplaceAll(s, "\r", " ")
		// [&&] is a character class matching a single '&'; \n right after "("
		// fails on CRLF files; (?s) only affects the rest of its enclosing group.
		r := regexp.MustCompile(`((require)([(])\n.*[&&](?s)(.*?)([;]))`)
		// FindStringSubmatch returns only the first match, followed by the
		// text of each capture group, so finds mixes the two.
		finds := r.FindStringSubmatch(contents_to_string)
		for _, find := range finds {
			findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
				IssueIdentifier: issue.Identifier,
				File:            file,
				LineContent:     find,
			})
		}
	}
}
But I get wrong results: after converting the source code to a string, the whole file becomes one long string with embedded \n line-break characters, which makes my regex checks fail.
Answer 1
Score: 0
One workaround is to capture the arguments of each require call with the group (?s)require\((.*?)\); and then split the captured text on \n:
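package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	// (?s) lets . match newlines, so the lazy group spans the whole argument list.
	var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
	var str = `require(
_disputeID < disputeCount &&
disputes[_disputeID].status == Status.Active,
"Disputes::!Resolvable"
);`
	matches := re.FindAllStringSubmatch(str, -1)
	for _, match := range matches {
		// match[1] is the text between "require(" and ");".
		lines := strings.Split(match[1], "\n")
		for _, line := range lines {
			fmt.Println(line)
		}
	}
}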
https://go.dev/play/p/Omn5ULHun_-
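To process the captured text line by line, the multiline pattern (?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$ could be applied to the content between require( and ). In (?m) mode, ^ and $ match at every line boundary, and [^\S\r\n] is any whitespace character except \r and \n, so the pattern matches only lines that contain such a character before a final run of non-whitespace characters: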
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
	var str = `require(
_disputeID < disputeCount &&
disputes[_disputeID].status == Status.Active,
"Disputes::!Resolvable"
);`
	// (?m) makes ^ and $ match at the start and end of every line.
	var multilineRe = regexp.MustCompile(`(?m)^[^\S\r\n]*(.*)[^\S\r\n](\S+)$`)
	matches := re.FindAllStringSubmatch(str, -1)
	for _, match := range matches {
		submatches := multilineRe.FindAllStringSubmatch(match[1], -1)
		for _, submatch := range submatches {
			fmt.Println(submatch[0])
		}
	}
}
Answer 2
Score: 0
By tinkering with the code I managed to get it to work:
contents, err := os.ReadFile(file)
if err != nil {
	return nil, err
}
// Collect every require(...); statement in the .sol file once, up front.
str := string(contents)
var re = regexp.MustCompile(`(?s)require\((.*?)\);`)
matches := re.FindAllStringSubmatch(str, -1)
for _, issue := range issues {
	if issue.ParsingMode == "MultiLineG015" {
		// Apply the issue's own pattern inside each require statement.
		r := regexp.MustCompile(issue.Pattern)
		for _, match := range matches {
			submatches := r.FindAllStringSubmatch(match[0], -1)
			for _, submatch := range submatches {
				findings[issue.Identifier] = append(findings[issue.Identifier], Finding{
					IssueIdentifier: issue.Identifier,
					File:            file,
					LineContent:     []string{submatch[0]},
				})
			}
		}
	}
}
This is the output:
2022-08-rigor\contracts\Community.sol::0 => [(
_lendingNeeded >= _communityProject.totalLent &&
_lendingNeeded <= IProject(_project).projectCost(),
"Community::invalid lending"
);]
2022-08-rigor\contracts\Disputes.sol::0 => [(
_disputeID < disputeCount &&
disputes[_disputeID].status == Status.Active,
"Disputes::!Resolvable"
);]
2022-08-rigor\contracts\Disputes.sol::0 => [(
_actionType > 0 && _actionType <= uint8(ActionType.TaskPay),
"Disputes::!ActionType"
);]
2022-08-rigor\contracts\Project.sol::0 => [(
_sender == builder || _sender == homeFi.communityContract(),
"Project::!Builder&&!Community"
);]
Thanks zangw for your help!