问题

我为亚马逊产品标题做了爬取，但是亚马逊的验证码抓住了我的爬虫。我尝试了10次运行main.go（8次被抓住，2次成功爬取了产品标题）。

我进行了研究，但没有找到针对golang的解决方案（只有Python的解决方案），是否有适用于我的解决方案？

package main

import (
	"fmt"
	"strings"

	"github.com/gocolly/colly"
)

func main() {

	// 创建一个专门用于 Shopify 的 Collector
	c := colly.NewCollector(
		colly.AllowedDomains("www.amazon.com", "amazon.com"),
	)
	c.OnHTML("div", func(h *colly.HTMLElement) {
		capctha := h.Text
		title := h.ChildText("span#productTitle")
		fmt.Println(strings.TrimSpace(title))
		fmt.Println(strings.TrimSpace(capctha))
	})

	// 启动 Collector
	c.Visit("https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ")
}

输出：

输入下面看到的字符。抱歉，我们只是需要确保您不是机器人。为了获得最佳结果，请确保您的浏览器接受 cookie。

英文:

I did make Scraping for Amazon Product Titles but Amazon captcha catches my scraper. I tried 10 times- go run main.go(8 times catches me - 2 times I scraped the product title)

I researched this but I did not find any solution for golang(there is just python) is there any solution for me?

package main

import (
	&quot;fmt&quot;
	&quot;strings&quot;0

	&quot;github.com/gocolly/colly&quot;
)

func main() {

	// Create a Collector specifically for Shopify
	c := colly.NewCollector(
		colly.AllowedDomains(&quot;www.amazon.com&quot;, &quot;amazon.com&quot;),
	)
	c.OnHTML(&quot;div&quot;, func(h *colly.HTMLElement) {
		capctha := h.Text
		title := h.ChildText(&quot;span#productTitle&quot;)
		fmt.Println(strings.TrimSpace(title))
		fmt.Println(strings.TrimSpace(capctha))
	})

	// Start the collector
	c.Visit(&quot;https://www.amazon.com/Bluetooth-Over-Ear-Headphones-Foldable-Prolonged/dp/B07K5214NZ&quot;)
}

> Output:
>
> Enter the characters you see below Sorry, we just need to make sure
> you're not a robot. For best results, please make sure your browser is
> accepting cookies.

答案1

得分: 2

如果您不介意使用不同的软件包，我写了一个用于搜索HTML的软件包（实际上是github.com/tdewolff/parse的薄包装）：

package main

import (
   "github.com/89z/parse/html"
   "net/http"
   "os"
)

func main() {
   req, err := http.NewRequest(
      "GET", "https://www.amazon.com/dp/B07K5214NZ", nil,
   )
   req.Header = http.Header{
      "User-Agent": {"Mozilla"},
   }
   res, err := new(http.Transport).RoundTrip(req)
   if err != nil {
      panic(err)
   }
   defer res.Body.Close()
   lex := html.NewLexer(res.Body)
   lex.NextAttr("id", "productTitle")
   os.Stdout.Write(lex.Bytes())
}

结果：

Bluetooth Headphones Over-Ear, Zihnic Foldable Wireless and Wired Stereo
Headset Micro SD/TF, FM for Cell Phone,PC,Soft Earmuffs &amp;Light Weight for
Prolonged Waring(Rose Gold)

https://github.com/89z/parse

英文:

If you don't mind a different package, I wrote a package to search HTML
(essentially thin wrapper around github.com/tdewolff/parse):

package main

import (
   &quot;github.com/89z/parse/html&quot;
   &quot;net/http&quot;
   &quot;os&quot;
)

func main() {
   req, err := http.NewRequest(
      &quot;GET&quot;, &quot;https://www.amazon.com/dp/B07K5214NZ&quot;, nil,
   )
   req.Header = http.Header{
      &quot;User-Agent&quot;: {&quot;Mozilla&quot;},
   }
   res, err := new(http.Transport).RoundTrip(req)
   if err != nil {
      panic(err)
   }
   defer res.Body.Close()
   lex := html.NewLexer(res.Body)
   lex.NextAttr(&quot;id&quot;, &quot;productTitle&quot;)
   os.Stdout.Write(lex.Bytes())
}

Result:

Bluetooth Headphones Over-Ear, Zihnic Foldable Wireless and Wired Stereo
Headset Micro SD/TF, FM for Cell Phone,PC,Soft Earmuffs &amp;Light Weight for
Prolonged Waring(Rose Gold)

https://github.com/89z/parse

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Golang Colly爬虫 – 网站验证码阻碍了我的爬取

问题

答案1

Why can I not use func literal as custom func type despite matching parameters in golang?

go-staticcheck: file的值从未被使用（SA4006）

当密钥包含点（.）时，如何引用密钥的值？

Parsing a date from a mongoDB with golang time.Now()

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论