Websocket URL timeout reached error in AWS Lambda
Question
I'm working with Go on AWS Lambda. I'm deploying Go code that uses chromedp as a Docker image, and I get a "websocket url timeout reached" error. My Lambda is configured with 3008 MB of memory, 512 MB of storage, and a 15-minute timeout. Can you see what is wrong and how to fix it? Here are main.go and the Dockerfile.
File main.go (chromedp part)
func getPage(URL string, lineNum string, stationNm string) {
	// settings for crawling
	ctx, cancel := chromedp.NewContext(
		context.Background(),
		chromedp.WithLogf(log.Printf),
	)
	defer cancel()
	opts := []chromedp.ExecAllocatorOption{
		chromedp.DisableGPU,
		chromedp.NoSandbox,
		chromedp.Headless,
		chromedp.Flag("no-zygote", true),
		chromedp.Flag("single-process", true),
		chromedp.Flag("homedir", "/tmp"),
		chromedp.Flag("data-path", "/tmp/data-path"),
		chromedp.Flag("disk-cache-dir", "/tmp/cache-dir"),
		chromedp.Flag("remote-debugging-port", "9222"),
		chromedp.Flag("remote-debugging-address", "0.0.0.0"),
		chromedp.Flag("disable-dev-shm-usage", true),
	}
	allocCtx, cancel := chromedp.NewExecAllocator(ctx, opts...)
	defer cancel()
	ctx, cancel = chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
	defer cancel()
	var htmlContent string
	ch := chromedp.WaitNewTarget(ctx, func(i *target.Info) bool {
		return strings.Contains(i.URL, "/timetable/web/")
	})
}
File Dockerfile
FROM public.ecr.aws/lambda/provided:al2 AS build
ENV GO111MODULE=on \
    CGO_ENABLED=0 \
    GOOS=linux \
    GOARCH=amd64
# Get rid of the extension warning
RUN mkdir -p /opt/extensions
RUN yum -y install golang
RUN go env -w GOPROXY=direct
# Clone git, copying go.mod, go.sum, main.go
WORKDIR /var/task/
RUN yum install git -y
RUN git clone https://github.com/seedspirit/NaverCrawler-CICD-go.git
RUN cp NaverCrawler-CICD-go/main.go /var/task/
RUN cp NaverCrawler-CICD-go/go.mod /var/task/
RUN cp NaverCrawler-CICD-go/go.sum /var/task/
# cache dependencies
RUN go mod download
RUN go build -o main .
FROM public.ecr.aws/lambda/provided:al2
COPY --from=build /var/task/main /var/task/main
# Install Chrome dependencies
RUN curl https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm -o chrome.rpm && \
    yum install -y ./chrome.rpm && \
    yum install -y fontconfig libX11 GConf2 dbus-x11
ENTRYPOINT ["/var/task/main"]
Answer 1
Score: 1
I recommend using chromedp/headless-shell: the image is small, which makes it a better fit for AWS Lambda.
I just tested a simple demo with chromedp/headless-shell, and it works.
Dockerfile:
FROM golang:1.20.4-alpine3.17 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o main
FROM chromedp/headless-shell:113.0.5672.93
WORKDIR /app
COPY --from=builder /app/main .
ENTRYPOINT [ "./main" ]
main.go:
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/chromedp/chromedp"
)

func Handler(_ context.Context, _ json.RawMessage) error {
	opts := []chromedp.ExecAllocatorOption{
		chromedp.NoSandbox,
		chromedp.Flag("disable-setuid-sandbox", true),
		chromedp.Flag("disable-dev-shm-usage", true),
		chromedp.Flag("single-process", true),
		chromedp.Flag("no-zygote", true),
	}
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()
	ctx, cancel = chromedp.NewContext(ctx, chromedp.WithDebugf(log.Printf))
	defer cancel()
	var content string
	if err := chromedp.Run(ctx, chromedp.Tasks{
		chromedp.Navigate("https://example.com/"),
		chromedp.Text("body > div > p:nth-child(2)", &content),
	}); err != nil {
		// Return the error instead of calling log.Fatal, so Lambda
		// reports a failed invocation rather than a crashed process.
		return err
	}
	fmt.Println(content)
	return nil
}

func main() {
	// AWS_LAMBDA_RUNTIME_API is set inside Lambda; otherwise run once locally.
	if _, exists := os.LookupEnv("AWS_LAMBDA_RUNTIME_API"); exists {
		lambda.Start(Handler)
	} else {
		err := Handler(context.Background(), nil)
		if err != nil {
			log.Fatal(err)
		}
	}
}
This example is based on https://github.com/Andiedie/chromedp-aws-lambda-example. Please note that the chromedp.ExecAllocatorOption list is copied directly from that repository. It works, but I'm not sure it is the best set of options, so you may need to adjust it to your needs.
Answer 2
Score: 0
For anyone who lands here: I solved it this way.
Dockerfile
FROM golang:1.20.4-alpine3.17 AS builder
ENV GO111MODULE=on \
    CGO_ENABLED=0 \
    GOOS=linux \
    GOARCH=amd64
WORKDIR /app
RUN apk update && apk add ca-certificates && rm -rf /var/cache/apk/*
COPY go.mod go.sum main.go ./
RUN go mod download
COPY . .
RUN go build -o main
FROM chromedp/headless-shell:113.0.5672.93
WORKDIR /app
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /app/main .
ENTRYPOINT [ "./main" ]
function
func getPage(URL string, lineNum string, stationNm string) {
	// settings for crawling
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.NoSandbox,
		chromedp.Flag("disable-setuid-sandbox", true),
		chromedp.Flag("disable-dev-shm-usage", true),
		chromedp.Flag("single-process", true),
		chromedp.Flag("no-zygote", true),
	)
	allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(), opts...)
	// Keep the allocator's cancel func so the browser is released on return.
	defer cancelAlloc()
	ctx, cancel := chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
	defer cancel()
	var htmlContent string
	// Wait for the tab that the button click opens.
	ch := chromedp.WaitNewTarget(ctx, func(i *target.Info) bool {
		return strings.Contains(i.URL, "/timetable/web/")
	})
	err := chromedp.Run(ctx,
		chromedp.Navigate(URL),
		chromedp.WaitVisible(".end_footer_area"),
		chromedp.Click("button"),
	)
	checkErr(err)
	// Attach a second context to the newly opened tab.
	newContext, cancel := chromedp.NewContext(ctx, chromedp.WithTargetID(<-ch))
	defer cancel()
	if err := chromedp.Run(newContext,
		chromedp.WaitReady(".table_schedule", chromedp.ByQuery),
		chromedp.OuterHTML(".schedule_wrap", &htmlContent, chromedp.ByQuery),
	); err != nil {
		panic(err)
	}
	crawler(htmlContent, lineNum, stationNm)
}