Websocket URL timeout reached error in AWS Lambda

Question

I'm working with Go on AWS Lambda. I deploy Go code that uses chromedp as a Docker image, and I get a "websocket url timeout reached" error. My Lambda is configured with 3008 MB of memory, 512 MB of storage, and a 15-minute timeout. Can you tell what is wrong and how to fix it? Here are main.go and the Dockerfile.

File main.go (chromedp part)

func getPage(URL string, lineNum string, stationNm string) {
	// settings for crawling
	ctx, cancel := chromedp.NewContext(
		context.Background(),
		chromedp.WithLogf(log.Printf),
	)
	defer cancel()

	opts := []chromedp.ExecAllocatorOption{
		chromedp.DisableGPU,
		chromedp.NoSandbox,
		chromedp.Headless,
		chromedp.Flag("no-zygote", true),
		chromedp.Flag("single-process", true),
		chromedp.Flag("homedir", "/tmp"),
		chromedp.Flag("data-path", "/tmp/data-path"),
		chromedp.Flag("disk-cache-dir", "/tmp/cache-dir"),
		chromedp.Flag("remote-debugging-port", "9222"),
		chromedp.Flag("remote-debugging-address", "0.0.0.0"),
		chromedp.Flag("disable-dev-shm-usage", true),
	}

	allocCtx, cancel := chromedp.NewExecAllocator(ctx, opts...)
	defer cancel()

	ctx, cancel = chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
	defer cancel()

	var htmlContent string

	ch := chromedp.WaitNewTarget(ctx, func(i *target.Info) bool {
		return strings.Contains(i.URL, "/timetable/web/")
	})

}

File Dockerfile

FROM public.ecr.aws/lambda/provided:al2 AS build

ENV GO111MODULE=on \
    CGO_ENABLED=0 \
    GOOS=linux \
    GOARCH=amd64

# Get rid of the extension warning
RUN mkdir -p /opt/extensions
RUN yum -y install golang
RUN go env -w GOPROXY=direct

# Clone git, copying go.mod, go.sum, main.go
WORKDIR /var/task/
RUN yum install git -y
RUN git clone https://github.com/seedspirit/NaverCrawler-CICD-go.git
RUN cp NaverCrawler-CICD-go/main.go /var/task/
RUN cp NaverCrawler-CICD-go/go.mod /var/task/
RUN cp NaverCrawler-CICD-go/go.sum /var/task/

# cache dependencies
RUN go mod download
RUN go build -o main .

FROM public.ecr.aws/lambda/provided:al2
COPY --from=build /var/task/main /var/task/main

# Install Chrome dependencies
RUN curl https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm -o chrome.rpm && \
    yum install -y ./chrome.rpm && \
    yum install -y fontconfig libX11 GConf2 dbus-x11

ENTRYPOINT ["/var/task/main"]

Answer 1

Score: 1

It's recommended to use chromedp/headless-shell because it's small and more suitable for AWS Lambda.

I just tested a simple demo with chromedp/headless-shell, and it works.

Dockerfile:

FROM golang:1.20.4-alpine3.17 AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

RUN go build -o main

FROM chromedp/headless-shell:113.0.5672.93

WORKDIR /app

COPY --from=builder /app/main .

ENTRYPOINT ["./main"]

main.go:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/chromedp/chromedp"
)

func Handler(_ context.Context, _ json.RawMessage) error {
	opts := []chromedp.ExecAllocatorOption{
		chromedp.NoSandbox,
		chromedp.Flag("disable-setuid-sandbox", true),
		chromedp.Flag("disable-dev-shm-usage", true),
		chromedp.Flag("single-process", true),
		chromedp.Flag("no-zygote", true),
	}
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()

	ctx, cancel = chromedp.NewContext(ctx, chromedp.WithDebugf(log.Printf))
	defer cancel()

	var content string
	if err := chromedp.Run(ctx, chromedp.Tasks{
		chromedp.Navigate("https://example.com/"),
		chromedp.Text("body > div > p:nth-child(2)", &content),
	}); err != nil {
		log.Fatal(err)
	}
	fmt.Println(content)
	return nil
}

func main() {
	if _, exists := os.LookupEnv("AWS_LAMBDA_RUNTIME_API"); exists {
		lambda.Start(Handler)
	} else {
		err := Handler(context.Background(), nil)
		if err != nil {
			log.Fatal(err)
		}
	}
}

This example is based on https://github.com/Andiedie/chromedp-aws-lambda-example. Note that the chromedp.ExecAllocatorOption values listed above are copied directly from that repository. They work, but I'm not sure this is the best set of options; you may need to adjust them to your needs.

Answer 2

Score: 0

For anyone who lands here: this is how I solved it.

Dockerfile

FROM golang:1.20.4-alpine3.17 AS builder

ENV GO111MODULE=on \
    CGO_ENABLED=0 \
    GOOS=linux \
    GOARCH=amd64

WORKDIR /app

RUN apk update && apk add ca-certificates && rm -rf /var/cache/apk/*

COPY go.mod go.sum main.go ./
RUN go mod download

COPY . .

RUN go build -o main

FROM chromedp/headless-shell:113.0.5672.93

WORKDIR /app

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /app/main .

ENTRYPOINT ["./main"]

function

func getPage(URL string, lineNum string, stationNm string) {
	// settings for crawling
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.NoSandbox,
		chromedp.Flag("disable-setuid-sandbox", true),
		chromedp.Flag("disable-dev-shm-usage", true),
		chromedp.Flag("single-process", true),
		chromedp.Flag("no-zygote", true),
	)

	// Keep the allocator's cancel function so the browser is stopped on return.
	allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancelAlloc()

	ctx, cancel := chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
	defer cancel()

	var htmlContent string

	// Receives the ID of the tab whose URL contains the timetable path.
	ch := chromedp.WaitNewTarget(ctx, func(i *target.Info) bool {
		return strings.Contains(i.URL, "/timetable/web/")
	})

	// Open the page and click the button that opens the timetable tab.
	err := chromedp.Run(ctx,
		chromedp.Navigate(URL),
		chromedp.WaitVisible(".end_footer_area"),
		chromedp.Click("button"),
	)
	checkErr(err)

	// Attach to the new tab and read the schedule markup.
	newContext, cancel := chromedp.NewContext(ctx, chromedp.WithTargetID(<-ch))
	defer cancel()

	if err := chromedp.Run(newContext,
		chromedp.WaitReady(".table_schedule", chromedp.ByQuery),
		chromedp.OuterHTML(".schedule_wrap", &htmlContent, chromedp.ByQuery),
	); err != nil {
		panic(err)
	}

	crawler(htmlContent, lineNum, stationNm)
}

huangapple
  • Published on May 9, 2023, 14:28:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76206426.html