Is it possible to get a webpage that requires JS in Golang?

Question

I am creating a microservice that fetches the main-page HTML of a couple of websites. One of them checks whether JavaScript is enabled and redirects to an error page if no JS is detected.

Is there a way around this with Golang?

EDIT: I attempted to play with this package (a JavaScript interpreter), but with no luck.

EDIT 2: It's 2020, and I have moved to Puppeteer (JS).

It uses an embedded browser and is a very mature library packed with utilities. For complex web apps, an embedded browser is really the only way to go.

For backends written in languages other than JS, I would still use Puppeteer as a microservice.

Hope this helps anyone in the future.

Thanks
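
For a Go backend, the "Puppeteer as a microservice" idea from EDIT 2 could look roughly like the sketch below. It assumes a separate Puppeteer service that returns the rendered HTML of a page passed as a "url" query parameter; the endpoint http://localhost:3000/render, the parameter name, and the helper names renderURL and fetchRendered are all hypothetical and not part of Puppeteer itself.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// renderURL builds the request URL for a hypothetical rendering service
// that takes the target page as a "url" query parameter.
func renderURL(serviceURL, pageURL string) (string, error) {
	u, err := url.Parse(serviceURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("url", pageURL)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// fetchRendered asks the service for the HTML of pageURL after JS has run.
func fetchRendered(ctx context.Context, serviceURL, pageURL string) (string, error) {
	target, err := renderURL(serviceURL, pageURL)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("render service returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	// Hypothetical local endpoint; replace with wherever the service runs.
	html, err := fetchRendered(ctx, "http://localhost:3000/render", "https://example.com")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		return
	}
	fmt.Println(html)
}
```

Keeping the browser behind a small HTTP service means the Go code only needs the standard library, and the timeout on the context guards against pages that never finish loading.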

Answer 1

Score: 2

Yes, it is possible. As Gonzalez mentioned earlier, PhantomJS is a good choice. There are a few things I would like to clarify, though. First, there is a problem using the phantomgo repository on Linux, because the developer doesn't provide a PhantomJS binary for Linux.

To use this repository, follow these steps:

  1. go get github.com/k4s/phantomgo
  2. go get github.com/k4s/webrowser
  3. Download the pre-compiled PhantomJS binary for your platform on this page.
  4. Add the binary to the $GOPATH/src/github.com/k4s/phantomgo/phantomgojs folder.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	. "github.com/k4s/webrowser"
)

func main() {
	// Describe the request; UsePhantomJS asks the library to fetch via PhantomJS.
	p := &Param{
		Method:       "GET",
		Url:          "http://google.com",
		Header:       http.Header{"Cookie": []string{"your cookie"}},
		UsePhantomJS: true,
	}
	browser := NewWebrowse()
	resp, err := browser.Download(p)
	if err != nil {
		// resp is not usable after an error, so stop here.
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
	fmt.Println(resp.Cookies())
}

Answer 2

Score: 1

Try PhantomJS for Go: https://github.com/k4s/phantomgo

I tried it once and it worked for me, so maybe it will help you.

Answer 3

Score: 0

Use Headless Chrome.
Here is an example.
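
One possible way to drive headless Chrome from Go without extra dependencies is to shell out to the browser and read the DOM it prints. This is only a sketch: it assumes a Chrome or Chromium binary is installed and reachable as "google-chrome" (the name differs between systems), and the helper names chromeArgs and dumpDOM are made up for illustration. The --headless, --disable-gpu, and --dump-dom flags are Chrome's own command-line switches.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// chromeArgs returns the flags that make Chrome load url headlessly and
// print the resulting DOM to stdout.
func chromeArgs(url string) []string {
	return []string{"--headless", "--disable-gpu", "--dump-dom", url}
}

// dumpDOM runs the given browser binary and returns the serialized DOM,
// i.e. the HTML after the page's JavaScript has executed.
func dumpDOM(ctx context.Context, binary, url string) (string, error) {
	out, err := exec.CommandContext(ctx, binary, chromeArgs(url)...).Output()
	if err != nil {
		return "", fmt.Errorf("running %s: %w", binary, err)
	}
	return string(out), nil
}

func main() {
	// Bound the whole run so a hanging page cannot block the service.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// "google-chrome" is an assumption; it may be "chromium" or a full path.
	html, err := dumpDOM(ctx, "google-chrome", "https://example.com")
	if err != nil {
		fmt.Fprintln(os.Stderr, "headless fetch failed:", err)
		return
	}
	fmt.Println(html)
}
```

Starting a browser process per request is slow, so for anything beyond occasional fetches a long-lived browser controlled over the DevTools protocol is likely a better fit.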

huangapple
  • Posted on May 16, 2017, 19:24:19
  • Source link: https://go.coder-hub.com/44000154.html