How to get the CSV headers of an S3 Object without downloading entire file?

Question

I have a very large CSV file in S3, and just need to get the headers of that file (the top row of a CSV that has column names, not HTTP headers). Is there a way to do this without downloading the entire file first? I'm using the Java AWS SDK. I don't think this information is stored in the object metadata, but I may be wrong.

Edit:

The chosen answer below worked; it uses S3 Select, but the query that worked for me was:

select s.* from S3Object s limit 1

Answer 1

Score: 4

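You can use S3 Select to query the data in an S3 object without downloading it, provided the object is in a format S3 Select supports (CSV, JSON, or Parquet).

The AWS documentation includes a Java example of this.

To get the column headers of a CSV file, limit the result to one record (see the SELECT command reference in the AWS documentation). Note that the first record is the header row only when the request's CSV input setting FileHeaderInfo is NONE, so that the header is treated as an ordinary record; with USE or IGNORE the header is skipped and the first data row is returned instead.

For example:

QUERY = "select s.* from S3Object s limit 1";

More query examples are available in the AWS documentation.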
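
Once that single record comes back, it is still one line of CSV text, so it has to be split into column names. Below is a minimal sketch of that step using only the Java standard library. The class name CsvHeaderLine and the method splitRecord are made up for illustration, and the sketch assumes a comma delimiter and double-quote quoting; a CSV library would be the safer choice for anything unusual.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
final class CsvHeaderLine {

    // Splits one CSV record into fields.
    // Assumes ',' as the delimiter and '"' as the quote character,
    // with "" inside a quoted field standing for a literal quote.
    static List<String> splitRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else if (c != '\r' && c != '\n') {
                current.append(c);           // drop the trailing line break
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For instance, calling CsvHeaderLine.splitRecord on the line id,"last, first",email gives three fields: id, then last, first, then email.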

Answer 2

Score: 2

You can download just a range of bytes from the object. Download only the start of the file, say the first 10% (you'll have to work out a suitable size yourself; it only needs to cover the header row), then decode those bytes into a string.

The output will probably contain the header plus some data rows, so you'll need to parse the content and keep only the first line.

// Request only bytes 0 through 9 (the first 10 bytes) of the object; widen the range to cover the header row.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key)
        .withRange(0, 9);
// Assumes an existing AmazonS3 client named s3Client (not shown in the original answer).
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
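
The range approach leaves open how many bytes to request and how to cut the result down to the header. One possible way to handle both is to fetch small chunks until the first line break appears. The sketch below uses only the Java standard library: RangeReader, readHeaderLine, chunkSize and maxBytes are hypothetical names, and RangeReader stands in for whatever code performs the ranged GET (for example a wrapper around GetObjectRequest.withRange). It assumes UTF-8 text and a header row with no line breaks inside quoted fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the AWS SDK.
final class HeaderByRange {

    // Stand-in for a ranged GET: returns bytes start..endInclusive,
    // or fewer bytes (possibly none) if the object ends earlier.
    interface RangeReader {
        byte[] read(long start, long endInclusive);
    }

    // Reads chunk by chunk until the first '\n' and returns the text before it.
    static String readHeaderLine(RangeReader reader, int chunkSize, long maxBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        long offset = 0;
        while (offset < maxBytes) {
            byte[] chunk = reader.read(offset, offset + chunkSize - 1);
            for (int i = 0; i < chunk.length; i++) {
                if (chunk[i] == '\n') {
                    buffer.write(chunk, 0, i);   // keep bytes before the line break
                    return decode(buffer);
                }
            }
            buffer.write(chunk, 0, chunk.length);
            offset += chunk.length;
            if (chunk.length < chunkSize) {
                return decode(buffer);           // short read: object has a single line
            }
        }
        throw new IllegalStateException("No line break within the first " + maxBytes + " bytes");
    }

    // Decodes only once all bytes are collected, so multi-byte characters
    // split across chunk boundaries are handled correctly.
    private static String decode(ByteArrayOutputStream buffer) {
        String line = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }
}
```

The maxBytes limit keeps a file with no line break from being downloaded in full. A real RangeReader would also need to decide how to report a range that starts past the end of the object; that detail is left out here.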

Posted by huangapple on July 27, 2020 at 01:49:03
Source: https://go.coder-hub.com/63103677.html