Ruby on Rails sitemap using Amazon ECS (Fargate)
Question
I have scoured the interwebs for months trying to find a solution, so any guidance will be a huge help to me.
My task is this: I have a RoR app running on Fargate, with a sitemap index and three sitemaps (links split up in 50k increments). These sitemaps need to be accessible via my own URL (mysite.com/sitemap...).
So from my understanding, containers are ephemeral and adding the sitemap to my public folder will have undesirable results with indexing on Google.
I have found countless tutorials on how to upload the sitemap to S3 from Heroku - but this option appears to use the public URL of the S3 bucket and not a URL on my domain.
My guess is I need to use something like Elastic File System or maybe even S3 - but I am lost. I can even put it this way: how do companies like Airbnb and GitHub store their sitemaps?
Answer 1
Score: 1
I don't know about Airbnb's or GitHub's sitemaps, but if you can get your app running on Fargate then you can figure out anything.
> So from my understanding, containers are ephemeral and adding the sitemap to my public folder will have undesirable results with indexing on Google.
It's true that containers are ephemeral, but that by itself has nothing to do with undesirable results with Google.
You can host the sitemaps on S3 or Elastic File System (EFS). You can configure S3 to use your domain as well (see below), but I'm not sure if that is worth the effort.
The easiest thing to do is to host the sitemaps in your public folder. The process would be to generate the files on your dev machine and add them to the repo. When they are deployed, they will be in the public folder of each container and available to the Rails app. Because the files are baked into the image, every container serves the same copy, so it doesn't matter that individual containers come and go.
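For the generation step, here is a minimal plain-Ruby sketch that splits a list of URLs into files of at most 50,000 entries (the per-file limit in the sitemap protocol) and writes an index pointing at them. The function name `write_sitemaps`, the file names, and the parameters are assumptions made for illustration, not something from the question.

```ruby
require "cgi"
require "fileutils"
require "time"

MAX_URLS_PER_SITEMAP = 50_000 # per-file limit from the sitemap protocol

# Hypothetical helper: writes sitemap1.xml, sitemap2.xml, ... and an index
# named sitemap.xml into out_dir. Returns the file names, index first.
def write_sitemaps(urls, base_url:, out_dir:, per_file: MAX_URLS_PER_SITEMAP)
  FileUtils.mkdir_p(out_dir)
  lastmod = Time.now.utc.iso8601

  # One <urlset> file per slice of URLs.
  names = urls.each_slice(per_file).each_with_index.map do |chunk, i|
    name = "sitemap#{i + 1}.xml"
    entries = chunk.map { |u| "  <url><loc>#{CGI.escapeHTML(u)}</loc></url>" }
    File.write(File.join(out_dir, name), <<~XML)
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      #{entries.join("\n")}
      </urlset>
    XML
    name
  end

  # The index lists each sitemap file under the site's own domain.
  index_entries = names.map do |name|
    loc = CGI.escapeHTML("#{base_url}/#{name}")
    "  <sitemap><loc>#{loc}</loc><lastmod>#{lastmod}</lastmod></sitemap>"
  end
  File.write(File.join(out_dir, "sitemap.xml"), <<~XML)
    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{index_entries.join("\n")}
    </sitemapindex>
  XML

  ["sitemap.xml"] + names
end
```

Calling it with `out_dir: "public"` and `base_url: "https://mysite.com"` would put the files where Rails serves static assets from, so they can be committed along with the rest of the repo.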
If you decide that you don't want the Rails app to serve the sitemaps (which may make sense for certain use cases), then the next easiest thing would probably be to host them on S3.
You can configure S3 to use a subdomain. I'm not sure if this would have an effect on how Google sees your site, or if the sitemap index is supposed to be hosted on the same domain.
If you want to host the sitemaps on S3 with your own domain, then you might be able to use CloudFront to forward all requests to your Rails app, with the exception of the sitemaps. The sitemaps could be served from S3.
Reference: Using S3 with Subdomain
EDIT: If you decide to use CloudFront, then it's not necessary to use S3. CloudFront can cache the sitemap for days or weeks, and your app would only serve it once in that time.
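To let a CDN hold on to the response, the app has to send a caching header with it. Below is a minimal sketch of a Rack-compatible endpoint that does this, assuming the app builds the sitemap XML on request. The class name `CachedSitemap` and the one-week lifetime are illustrative assumptions, and how long CloudFront actually keeps the object also depends on the TTL settings of the distribution.

```ruby
# Hypothetical Rack endpoint: returns the sitemap XML with a Cache-Control
# header so a CDN in front of the app may cache it for max_age seconds.
class CachedSitemap
  ONE_WEEK = 7 * 24 * 60 * 60

  # The block is expected to return the sitemap XML as a String.
  def initialize(max_age: ONE_WEEK, &builder)
    @max_age = max_age
    @builder = builder
  end

  # Rack interface: takes the request env, returns [status, headers, body].
  def call(_env)
    headers = {
      "content-type" => "application/xml",
      "cache-control" => "public, max-age=#{@max_age}"
    }
    [200, headers, [@builder.call]]
  end
end
```

An instance such as `CachedSitemap.new { build_sitemap_xml }` (where `build_sitemap_xml` stands in for whatever the app uses to produce the XML) could be mounted at the sitemap path from `config.ru` or the Rails routes.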
Answer 2
Score: 0
> My guess is I need to use something like Elastic File System or maybe
> even S3 - but I am lost. I can even put it this way: how do companies
> like Airbnb and GitHub store their sitemaps?
Big companies like that would certainly have a CDN in front of their website. You can also have a CDN in front of your website. The AWS solution is CloudFront, but I would also recommend looking into Cloudflare.
In either case, once you have a CDN in front of your website, you can configure it to serve different content from different origins, based on the URL path. So for instance you could set up the default origin as your Ruby app, and set up the /sitemap origin as an S3 bucket that has your sitemap files in it.
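The routing idea can be modeled in a few lines of Ruby. This is only an illustration of path-based origin selection, not CloudFront configuration: the origin names and the `/sitemap*` pattern are made up, and `File.fnmatch` is used as a rough stand-in for the CDN's wildcard matching.

```ruby
# Hypothetical routing table: checked top to bottom, first match wins.
BEHAVIORS = [
  { pattern: "/sitemap*", origin: "s3-sitemap-bucket" }
].freeze

# Everything that matches no pattern goes to the Rails app.
DEFAULT_ORIGIN = "rails-app"

# Returns the name of the origin that would handle the given request path.
def origin_for(path, behaviors: BEHAVIORS, default: DEFAULT_ORIGIN)
  match = behaviors.find do |behavior|
    # Without FNM_PATHNAME, "*" also matches "/" characters.
    File.fnmatch(behavior[:pattern], path, File::FNM_DOTMATCH)
  end
  match ? match[:origin] : default
end
```

With this table, `/sitemap.xml` and `/sitemap1.xml` resolve to the bucket while `/users/1` falls through to the app, so the sitemaps stay on the site's own domain even though they live in S3.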
Alternatively, you could store the sitemaps in EFS, map the EFS volume to your Fargate tasks, and configure your Ruby app (or Nginx running in front of your Ruby app?) to serve the file from the sitemap volume when a request comes in for /sitemap.
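On the app side, serving from the mounted volume mostly comes down to mapping the requested name to a file under the mount point without letting the request escape that directory. A minimal sketch, assuming the volume is mounted at `/mnt/sitemaps` (a made-up path, overridable through a hypothetical `SITEMAP_DIR` environment variable) and that the files are named like `sitemap1.xml`:

```ruby
require "pathname"

# Assumed mount point of the EFS volume inside the container.
SITEMAP_DIR = Pathname.new(ENV.fetch("SITEMAP_DIR", "/mnt/sitemaps"))

# Only plain sitemap file names are accepted, which rules out "../" tricks.
SITEMAP_NAME = /\Asitemap[\w-]*\.xml(?:\.gz)?\z/

# Returns the Pathname of the requested sitemap, or nil if the name is not
# allowed or the file does not exist on the volume.
def sitemap_path_for(name, root: SITEMAP_DIR)
  return nil unless name.is_a?(String) && name.match?(SITEMAP_NAME)

  candidate = root.join(name)
  candidate.file? ? candidate : nil
end

# In a Rails controller action, the result could be used roughly like this:
#   path = sitemap_path_for(requested_name)
#   path ? send_file(path, type: "application/xml", disposition: "inline")
#        : head(:not_found)
```

Whatever job regenerates the sitemaps would write into the same volume, and every task that mounts it picks up the new files without a redeploy.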