Matching email addresses on a website with a regexp in Go

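Question

I am trying to count the email addresses found on websites in Go, reading the URLs from a file. For example, if I put "http://facebook.com" in the file, the program should find all the email addresses on that site, but the result is always 0. I think I chose the wrong function, but the other functions I tried gave the same result. Here is the code: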

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"log"
	"net/http"
	"os"
	"regexp"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go emailWeb(os.Args[1], &wg)
	wg.Wait()

}

func emailWeb(name string, wg *sync.WaitGroup) {
	file, err := os.Open(name)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		str := scanner.Text()
		nb_arobase := numberEmail(str)
		fmt.Println("URL : ", str, " nb email: ", nb_arobase)
	}

	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	(*wg).Done()
}

func numberEmail(url string) int {
	count := 0
	reg := regexp.MustCompile(`[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}`)
	response, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	} else {
		str := response.Body
		buf := new(bytes.Buffer)
		buf.ReadFrom(str)
		bodyStr := buf.String()

		for i := 0; i < len(bodyStr); i++ {
			if reg.MatchString(string(bodyStr[i])) {
				count += 1
			}
		}
	}
	return count
}

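Answer 1

Score: 0

You are matching the regexp against each individual character of the HTTP response body, and a single character can never match a whole email address, so the count stays at 0. To count the matches in the entire body, run the regexp over the full body and count the matched indexes.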
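To make the difference concrete, here is a minimal sketch that runs both approaches on a made-up sample string rather than a real HTTP response. The sample text and the names `perChar` and `wholeBody` are illustrative assumptions, not part of the original code; the pattern is the one from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the question.
var reg = regexp.MustCompile(`[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}`)

// perChar mimics the loop from the question: it tests one byte at a time,
// so the pattern never sees enough text to match.
func perChar(s string) int {
	count := 0
	for i := 0; i < len(s); i++ {
		if reg.MatchString(string(s[i])) {
			count++
		}
	}
	return count
}

// wholeBody counts non-overlapping matches across the entire input.
func wholeBody(s string) int {
	return len(reg.FindAllStringIndex(s, -1))
}

func main() {
	// Hypothetical sample text standing in for a page body.
	sample := "Contact alice@example.com or bob@example.org for details."
	fmt.Println("per character:", perChar(sample))
	fmt.Println("whole body:", wholeBody(sample))
}
```

Applied to an HTTP response, the same whole-body count looks like this: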

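// Body of numberEmail; add "io/ioutil" to the import list.
resp, err := http.Get(url)
if err != nil {
	log.Println(err)
	return 0
}
defer resp.Body.Close()

// Read the whole body (io.ReadAll replaces ioutil.ReadAll from Go 1.16 on).
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
	log.Println(err)
	return 0
}

// FindAllIndex takes a limit as its second argument; -1 returns all matches.
return len(reg.FindAllIndex(body, -1))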

huangapple
  • Posted on January 13, 2017, 01:22:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/41619311.html