Can page size be set with dynamodb.ScanPages?

Question

The documentation for working with DynamoDB scans, found here, refers to a page-size parameter for the AWS CLI.

The documentation for the Go AWS SDK, found here, describes a function ScanPages. There is an example of how to use the function, but nowhere in the documentation is there a way to specify something like the page-size option that the AWS CLI has. I can't determine how the paging occurs, other than assuming from the Go documentation and the general scan documentation that a page ends once the results exceed 1 MB.

I'm also aware of the Limit value that can be set on the ScanInput, but the documentation indicates that this value would function as a page size only if every item processed matched the filter expression of the scan:
> The maximum number of items to evaluate (not necessarily the number of matching items)

Is there a way to set something equivalent to page-size with the Go SDK?

Answer 1

Score: 3

How does pagination work in DynamoDB?

> DynamoDB paginates the results from Scan operations. With pagination,
> the Scan results are divided into "pages" of data that are 1 MB in
> size (or less). An application can process the first page of results,
> then the second page, and so on.

So for each request, if there are more items left to read, the response includes a LastEvaluatedKey. You have to re-issue the scan request with this LastEvaluatedKey as the ExclusiveStartKey to get the complete result.

For example, if a scan would return 400 items in total and each request returns at most 100 of them, you have to re-issue the scan request until LastEvaluatedKey comes back empty. You would do something like the following (documentation):

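// Sketch using aws-sdk-go v1; needs the imports "log",
// "github.com/aws/aws-sdk-go/aws" and "github.com/aws/aws-sdk-go/service/dynamodb".
// svc is assumed to be a *dynamodb.DynamoDB client; "Movies" is a placeholder table name.
input := &dynamodb.ScanInput{
    TableName: aws.String("Movies"),
    // Set the remaining parameters of the original scan request here.
}
for {
    output, err := svc.Scan(input)
    if err != nil {
        log.Fatal(err)
    }
    // Process output.Items here.
    if len(output.LastEvaluatedKey) == 0 {
        break // no more pages
    }
    // Resume the next request where this page stopped.
    input.ExclusiveStartKey = output.LastEvaluatedKey
}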

What does page-size do in the AWS CLI?

The scan operation reads every item in the DynamoDB table and returns the results that pass the filter. Ordinarily, the AWS CLI handles pagination automatically: it keeps re-issuing the scan request for us, and this request and response pattern continues until the final response.

The page-size option tells each request to scan only page-size items of the table at a time and to apply the filter to those. If the complete table has not been scanned yet, or the result is larger than 1 MB, the response contains a LastEvaluatedKey and the CLI re-issues the request.

Here is a sample request and response from the documentation.

aws dynamodb scan \
    --table-name Movies \
    --projection-expression "title" \
    --filter-expression 'contains(info.genres,:gen)' \
    --expression-attribute-values '{":gen":{"S":"Sci-Fi"}}' \
    --page-size 100  \
    --debug
b'{"Count":7,"Items":[{"title":{"S":"Monster on the Campus"}},{"title":{"S":"+1"}},
{"title":{"S":"100 Degrees Below Zero"}},{"title":{"S":"About Time"}},{"title":{"S":"After Earth"}},
{"title":{"S":"Age of Dinosaurs"}},{"title":{"S":"Cloudy with a Chance of Meatballs 2"}}],
"LastEvaluatedKey":{"year":{"N":"2013"},"title":{"S":"Curse of Chucky"}},"ScannedCount":100}'

We can clearly see ScannedCount:100 and the filtered count Count:7, so out of the 100 items scanned, only 7 matched the filter. (documentation)

From the documentation for Limit:
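The relationship between the two counters can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions, not SDK code: `item`, `page`, `scanOnce`, `sampleTable` and the `NextStart` field (a stand-in for LastEvaluatedKey) are hypothetical names, and the table contents are made up so that 7 of the first 100 items match.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a DynamoDB item.
type item struct {
	Title string
	SciFi bool
}

// page mimics the fields of a Scan response that matter here.
type page struct {
	Items        []item // items that matched the filter
	Count        int    // number of matching items
	ScannedCount int    // number of items evaluated
	NextStart    int    // stand-in for LastEvaluatedKey; -1 when the scan is done
}

// scanOnce evaluates at most limit items starting at start and then filters
// them. It assumes 0 <= start <= len(table) and limit > 0.
func scanOnce(table []item, start, limit int, keep func(item) bool) page {
	end := start + limit
	if end > len(table) {
		end = len(table)
	}
	p := page{NextStart: -1}
	for _, it := range table[start:end] {
		p.ScannedCount++
		if keep(it) {
			p.Items = append(p.Items, it)
			p.Count++
		}
	}
	if end < len(table) {
		p.NextStart = end // more items remain, so the caller must ask again
	}
	return p
}

// sampleTable builds n items and marks every 15th one as Sci-Fi.
func sampleTable(n int) []item {
	table := make([]item, n)
	for i := range table {
		table[i] = item{Title: fmt.Sprintf("movie-%03d", i), SciFi: i%15 == 0}
	}
	return table
}

func main() {
	table := sampleTable(400)
	p := scanOnce(table, 0, 100, func(it item) bool { return it.SciFi })
	// prints "Count=7 ScannedCount=100 NextStart=100"
	fmt.Printf("Count=%d ScannedCount=%d NextStart=%d\n", p.Count, p.ScannedCount, p.NextStart)
}
```

The limit is applied before the filter, which is why a page can contain far fewer items than the limit while still reporting that more data remains.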

    // The maximum number of items to evaluate (not necessarily the number of matching
    // items). If DynamoDB processes the number of items up to the limit while processing
    // the results, it stops the operation and returns the matching values up to
    // that point, and a key in LastEvaluatedKey to apply in a subsequent operation,
    // so that you can pick up where you left off.

So basically, page-size and Limit are the same thing: Limit caps the number of items that a single Scan request evaluates. To get the equivalent of page-size in the Go SDK, set Limit on the ScanInput.
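In aws-sdk-go v1, ScanPages takes a ScanInput and a callback of the form `func(*dynamodb.ScanOutput, bool) bool`, so setting Limit on that input should give each callback invocation a page of at most Limit evaluated items. The sketch below imitates that callback shape against an in-memory slice so it runs without AWS access; `scanOutput`, `scanPages` and `collect` are hypothetical names, not SDK functions, and `limit` is assumed to be greater than zero.

```go
package main

import "fmt"

// scanOutput is a hypothetical, trimmed-down stand-in for a Scan response.
type scanOutput struct {
	Items        []int // values that matched the filter
	ScannedCount int   // values evaluated for this page
}

// scanPages walks table in pages of at most limit evaluated values and calls
// fn once per page. It stops early when fn returns false. This mirrors the
// callback shape of ScanPages but is only a simulation.
func scanPages(table []int, limit int, keep func(int) bool, fn func(out scanOutput, lastPage bool) bool) {
	for start := 0; start < len(table); start += limit {
		end := start + limit
		if end > len(table) {
			end = len(table)
		}
		out := scanOutput{ScannedCount: end - start}
		for _, v := range table[start:end] {
			if keep(v) {
				out.Items = append(out.Items, v)
			}
		}
		if !fn(out, end == len(table)) {
			return
		}
	}
}

// collect returns every matching value and the number of pages it took.
func collect(table []int, limit int, keep func(int) bool) (matches []int, pages int) {
	scanPages(table, limit, keep, func(out scanOutput, lastPage bool) bool {
		pages++
		matches = append(matches, out.Items...)
		return true // keep paging until the last page
	})
	return matches, pages
}

func main() {
	table := make([]int, 400)
	for i := range table {
		table[i] = i
	}
	matches, pages := collect(table, 100, func(v int) bool { return v%50 == 0 })
	// prints "pages=4 matches=8"
	fmt.Printf("pages=%d matches=%d\n", pages, len(matches))
}
```

With 400 values and a limit of 100 the callback runs four times, even though only 8 values match, which is the same behavior the CLI shows with --page-size 100.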

huangapple
  • Posted on August 7, 2021, 11:40:45
  • Source: https://go.coder-hub.com/68689228.html