Redshift Loading Data Points To Different Bucket

Question

This post was going to be about my issues loading some simple CSV data into a Redshift table, but halfway through writing it I realised that, for whatever reason, when I selected the S3 location in the COPY command, Redshift was pointing to the wrong one!

Can someone explain why this is the case? For context, the bucket I selected was

s3://soccer-project/Player

but Redshift defaulted to

s3://soccer-project/Player_Attributes

which is another file in my bucket

I'm new to Redshift. Can someone help me understand this?

Thanks

Answer 1

Score: 1

Only the top (leftmost) part of the S3 object path is the bucket. In both of your cases this is "soccer-project".
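As a small illustration of that split, here is a Python sketch that separates an S3 URI into its bucket and object name using only the standard library. The helper name `split_s3_uri` is made up for this example; the two URIs are the ones from the question.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, object name). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the part right after "s3://"; dropping the single
    # separating slash leaves the object name, further slashes included.
    return parsed.netloc, parsed.path[1:]


for uri in ("s3://soccer-project/Player", "s3://soccer-project/Player_Attributes"):
    bucket, name = split_s3_uri(uri)
    print(f"bucket={bucket!r} object name starts with {name!r}")
```

Both URIs yield the same bucket, which is why the difference between them is a difference in object names, not in buckets.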

Now I expect that what you are showing is only part of the object name - "Player" and "Player_Attributes" in your question. These are not the full names of the objects. The full object names are these parts plus a slash and more text. Redshift is set up to take partial object names so that it can expand the set of files copied to include all objects whose names match the partial name. Correct me if I'm interpreting the question incorrectly.

To understand what is going on, you need to understand that S3 is an object store and not a file system. This means that all objects are stored "flat" under each bucket. Only two things identify an object: the bucket name and the object name. There is no real hierarchy in the storage. However, to make things a little more organized when us humans look at it, S3 looks at the slashes in the object names and makes things seem hierarchical. But in reality everything after the bucket name and the slash that follows it is the object name, including any further slashes, "folder" names, or anything else you think has unique meaning. It is all the object name.
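To make the "flat names, apparent folders" idea concrete, here is a minimal Python sketch. It does not talk to S3 at all; it only mimics how a listing can group flat object names on a slash. The object names and the function `list_top_level` are invented for the illustration and are not taken from your bucket.

```python
def list_top_level(keys, delimiter="/"):
    """Group flat object names into apparent 'folders' and plain 'files'."""
    folders, files = set(), set()
    for key in keys:
        head, sep, _rest = key.partition(delimiter)
        if sep:
            # A slash in the name is shown as a folder, but the slash
            # is still just a character in the object name.
            folders.add(head + delimiter)
        else:
            files.add(key)
    return sorted(folders), sorted(files)


# Hypothetical object names, all stored side by side in one bucket.
keys = [
    "Player/part-0.csv",
    "Player/part-1.csv",
    "Player_Attributes/part-0.csv",
    "README.txt",
]

folders, files = list_top_level(keys)
print(folders)
print(files)
```

The "folders" exist only in the output of the grouping step; the stored names are still the four full strings in `keys`.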

Now to your situation: You likely have object names in your bucket that start with "Player" or "Player_Attributes", with the next character in the name being a slash. This is all just the first part of the object name. I'd also guess that your COPY command has a FROM clause like "s3://soccer-project/Player". (Providing your COPY command in the question would really help clear up what is going on.) Redshift treats everything after the bucket name as an object name prefix, so it behaves as if the path ended in a wildcard that matches all following characters in the object name, and that will match "Player_Attributes". If all of this is correct, then you can fix this by changing the FROM clause to "s3://soccer-project/Player/" (slash added).
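The difference the trailing slash makes can be sketched in plain Python. This only simulates prefix matching on a list of strings; it does not call Redshift or S3, and the object names and the function `keys_matching_prefix` are assumptions made up for the example.

```python
def keys_matching_prefix(keys, prefix):
    """Return every object name that begins with the given prefix."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical object names; your real ones may look different.
keys = [
    "Player/part-0.csv",
    "Player/part-1.csv",
    "Player_Attributes/part-0.csv",
]

without_slash = keys_matching_prefix(keys, "Player")
with_slash = keys_matching_prefix(keys, "Player/")

print(without_slash)  # also picks up the Player_Attributes object
print(with_slash)     # only the objects under "Player/"
```

If your objects turn out not to sit under a "Player/" prefix (for example, if they are single files whose names merely begin with "Player"), the slash will not help, and the FROM clause would need a prefix long enough to exclude the other names.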

As I said above, this is a best guess based on the partial info provided. Please update the question if this is incorrect.

huangapple
  • Published on July 7, 2023 at 03:10:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631895.html