crawlee - How to add the same URL back to the requestQueue

Question

How do I enqueue the same URL that I am currently handling the request for?
I have this code and want to scrape the same URL again (possibly with a delay). I added environment variables so that cached results are deleted, according to this answer.

import { RequestQueue, CheerioCrawler, Configuration } from "crawlee";

const config = Configuration.getGlobalConfig();
config.set('persistStorage', false);
config.set('purgeOnStart', false);

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: "https://www.google.com/" });

const crawler = new CheerioCrawler({
    requestQueue,
    async requestHandler({ $, request }) {
        console.log("Do something with scraped data...");
        await crawler.addRequests([{url: "https://www.google.com/"}]);
    }
})

await crawler.run();

Answer 1

Score: 0

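I found a solution:
By default, crawlee derives a request's uniqueKey from its URL, so a second request for the same URL is treated as a duplicate and skipped. Adding an explicit unique key to the request object, for example a counter that is incremented each time before a new request is queued, solves this problem.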

{url: "https://www.google.com/", uniqueKey: counter.toString()}
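
A minimal sketch of the counter approach, assuming a small helper that hands out request objects for one URL. The name createRepeatRequestFactory and the "url#counter" key format are hypothetical and not part of crawlee; only the url and uniqueKey fields come from the answer above.

```javascript
// Hypothetical helper (not a crawlee API): returns a function that builds
// request objects for the same URL, each with a different uniqueKey, so the
// request queue does not treat them as duplicates.
function createRepeatRequestFactory(url) {
    let counter = 0;
    return () => {
        counter += 1; // incremented before every new request is built
        return { url, uniqueKey: `${url}#${counter}` };
    };
}

const nextRequest = createRepeatRequestFactory("https://www.google.com/");
const first = nextRequest();
const second = nextRequest();

console.log(first.uniqueKey);  // https://www.google.com/#1
console.log(second.uniqueKey); // https://www.google.com/#2

// Inside the requestHandler from the question, the call would become:
//   await crawler.addRequests([nextRequest()]);
```

Because every handled request enqueues another one, the crawl never finishes on its own. A stop condition, such as a maximum counter value, is probably needed in practice.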

huangapple
  • Published on January 9, 2023, 00:37:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75049539.html