How do I scrape my correct submissions on LeetCode?
Question
I am looking into how I can scrape my correct LeetCode submissions and upload them to GitHub. Being a beginner at web scraping, I read through a few blogs, and I understand that we can use Python libraries like BeautifulSoup, Scrapy, Selenium, etc., to perform scraping. But is it true that we can only scrape the routes which aren't disallowed in the robots.txt of the website? I ask because LeetCode's robots.txt has disallowed the submissions route.
The robots.txt page of LeetCode
If it is true that disallowed pages cannot be scraped, then is there any other way I can scrape my correct submissions? Any advice is welcome, as I am an absolute beginner here.
P.S. An outline of the process is more than enough; I do not need the exact code. Thank you.
Answer 1
Score: 1
> But is it true that we can only scrape the routes which aren't disallowed in the robots.txt of the website
Technically, robots.txt is aimed at massive indexer bots like Google, Yandex, Majestic12, etc. You're also not obligated to follow robots.txt, but it's the nice thing to do.
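If you want to see what a robots.txt rule actually says about a given path, Python's standard library can parse one. The following is only a minimal sketch: the robots.txt text, the `my-scraper` user agent, and the `example.com` URLs are made up for illustration and are not copied from LeetCode.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /submissions/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/submissions/", "/problems/"):
        url = "https://example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", url))
```

Note that this only reports what the file asks for; nothing in the parser enforces it, which is why following it is a courtesy rather than a technical restriction.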
Since you're not doing a massive scrape and just want your own submissions, you should be fine unless you code it wrong and start DDoSing the website.
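One simple way to avoid hammering the site by accident is to pause between requests. Here is a rough sketch of that idea, assuming you already have some function that downloads a single page; `fetch_politely`, the `fetch` callback, and the one-second default delay are all hypothetical choices, not anything LeetCode specifies.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch(url) for each URL in order, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Stand-in fetch function; a real one would perform an HTTP request.
    pages = fetch_politely(["page-1", "page-2"], lambda url: url.upper(), delay_seconds=0.1)
    print(pages)
```

Because the requests run one at a time with a fixed pause, a bug elsewhere in the loop is much less likely to turn into a flood of traffic.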
If you don't want to code it yourself, you can check GitHub for other people's code, such as https://github.com/world177/Leetcode-Downloader-for-Submissions, which has 75 stars, so it should be safe. Regardless, I can't guarantee its safety or suitability for your specific needs, so make sure to review the repository yourself.