2023-04-17 03:03:34

using lambda (python) to read csv from s3 and return csv to client via api gateway

Question

I have an API Gateway with path and query string params set up and a Lambda integration using a lambda with a pandas layer (awswrangler). I want the user to specify the bucket as the path param, plus a query string param that dictates whether the data is returned to them as JSON or as CSV (downloadable).
Basically, I want it to download the CSV automatically if they specify fmt=csv in the URL.
Here is the gateway with the two formats:

https://m0fhdyq5.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=json

https://m0fhddyq5.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=csv

br-candles is the bucket

Here is my lambda code. Currently, I only have "statusCode": 200 for the csv format since I don't know how to do what I want.

import json
import awswrangler as wr

def lambda_handler(event, context):

    print(event)
    bucket_name = event['params']['path']['bucket']
    format = event['params']['querystring']['fmt']
    full_path = f"s3://{bucket_name}"
    print(bucket_name, format)

    raw_df = wr.s3.read_csv(path=full_path, path_suffix=['.csv'], use_threads=True)
    for df in raw_df:
        if format == 'json':

            df = raw_df.to_json(orient="records")
            parsed = json.loads(df)

            return {
                'body': (parsed)
            }
        elif format == 'csv':
            for df in raw_df:
                #df = df.to_string(index=False)
                #print (df)

                return {
                    "statusCode": 200
                }
        else:
            return {
                "statusCode": 300
            }

Answer 1

Score: 1

Specify the bucket as a path param

To specify the bucket as a path param, the simplest way is to define a path variable of the form /base-url-of-your-api/{bucket} in your API Gateway route. It is up to your lambda to check whether the given parameter is correct and to return an HTTP error if not.

You will end up with something like this:

API Gateway resource: {bucket}
In your code: bucket_name = event['requestContext']['path'].split('/')[-1]

I'd rather use a functional name instead of a physical bucket name, though. This avoids disclosing the names of your S3 buckets to the end user and keeps the flexibility to organize your buckets differently in the future without changing the API contract:

API call: /your-api-base-url/{functionalName}
In your code:

# with some proper error handling to be added

buckets_mapping = {
    'functionalNameAlpha': 'bucket-name-for-alpha',
    'functionalNameBeta': 'bucket-name-for-beta',
    'functionalNameGamma': 'bucket-name-for-gamma'
}

functional_name = event['requestContext']['path'].split('/')[-1]
bucket = buckets_mapping[functional_name]
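
The error handling mentioned in the comment above could look something like the following. This is only a sketch: the helper name resolve_bucket, the choice of a 404 status and the error message are assumptions for illustration, and the mapping entries are the same placeholders as above.

```python
import json

# Placeholder mapping from public (functional) names to physical bucket names.
BUCKETS_MAPPING = {
    "functionalNameAlpha": "bucket-name-for-alpha",
    "functionalNameBeta": "bucket-name-for-beta",
    "functionalNameGamma": "bucket-name-for-gamma",
}


def resolve_bucket(event):
    """Return (bucket, None) on success or (None, error_response) on failure."""
    # Same extraction as above: the last segment of the request path.
    functional_name = event["requestContext"]["path"].rstrip("/").split("/")[-1]
    bucket = BUCKETS_MAPPING.get(functional_name)
    if bucket is None:
        # Assumption: an unknown name is reported as 404 in proxy response format.
        return None, {
            "statusCode": 404,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": f"Unknown resource: {functional_name}"}),
        }
    return bucket, None
```

The handler would call resolve_bucket(event) first and return the error response as-is when one is produced.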

Return format

Regarding the return format, the cleaner way is to rely on the standard Accept HTTP header. This header is set by the caller and lists the formats it accepts. For example, Accept: application/json,application/xml,text/csv means that JSON, XML and CSV are the three formats the caller understands. Preference among them is expressed with optional q weights (as in text/csv;q=0.9); a type without a weight defaults to q=1.
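
As a rough illustration of how such a header can be read, here is a minimal parsing sketch. The function name parse_accept is hypothetical, and treating a malformed weight as 0 is an assumption of this sketch, not something the standard prescribes.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue  # skip empty items such as a trailing comma
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed weight means "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal weights keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)
```

Wildcards such as text/* or */* are returned unchanged; matching them against the formats the application supports is left to the caller.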

The headers are available in the event passed to your lambda_handler when the integration is mounted in proxy mode. As above, it is up to your lambda to check whether the accepted formats are compatible with your application, and to return an HTTP 406 "Not Acceptable" if not:

accepted_formats = event['headers']['Accept']  # for example: "application/xml,text/csv;q=0.9,application/json,text/*;q=0.2"
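
Putting this together, a handler in proxy mode might build its response roughly as follows. This is a sketch under several assumptions: the helper names choose_format and build_response are hypothetical, q weights are ignored for brevity (the first supported type in header order wins), the data is assumed to be a list of dicts already loaded, and the filename data.csv is a placeholder. The Content-Disposition: attachment header is what prompts a browser to download the CSV.

```python
import csv
import io
import json

SUPPORTED = ("application/json", "text/csv")  # formats this sketch can produce


def choose_format(accept_header):
    """Return the first supported media type the caller lists, or None."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "*/*":
            return SUPPORTED[0]  # caller accepts anything: default to JSON
        if media_type in SUPPORTED:
            return media_type
    return None


def build_response(event, rows):
    """Build a Lambda proxy response for rows, a list of dicts."""
    # Header names may arrive in any case, so normalise them first.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    media_type = choose_format(headers.get("accept", "*/*"))
    if media_type is None:
        return {"statusCode": 406, "body": "Not Acceptable"}
    if media_type == "text/csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
        return {
            "statusCode": 200,
            "headers": {
                "Content-Type": "text/csv",
                # Asks the browser to save the body as a file (placeholder name).
                "Content-Disposition": 'attachment; filename="data.csv"',
            },
            "body": buffer.getvalue(),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows),
    }
```

With a pandas DataFrame, as in the question, df.to_csv(index=False) and df.to_json(orient="records") would produce the two bodies directly. The statusCode/headers/body shape applies to proxy integrations; a non-proxy integration with mapping templates would need the headers configured in API Gateway instead.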

Sidenote

I maintain an open source library dedicated to making all this easier by providing ways to (among other things):

determine the return format to select,
automatically return a 406 if Accept does not list any acceptable format,
set up response content transformers to convert your internal format to the format expected by the caller.

It is called awsmate and it is available here: https://github.com/shlublu/awsmate
I'll try to find some time to edit this answer and post some code here should you be interested (let me know in the comments). In the meantime, it comes with an example application that I hope shows how to quickly do what you wish.
