Using Lambda (Python) to read a CSV from S3 and return the CSV to the client via API Gateway

Question

I have an API Gateway set up with path and query string parameters, integrated with a Lambda that uses a pandas layer (awswrangler). I want the user to specify the bucket as the path parameter, and a query string parameter should dictate whether the data is returned to them as JSON or as a downloadable CSV.
Basically, I want the CSV to download automatically if they specify fmt=csv in the URL.
Here is the gateway with the two formats:

https://m0fhdyq5.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=json

https://m0fhddyq5.execute-api.us-east-1.amazonaws.com/v1/br-candles?fmt=csv

br-candles is the bucket.

Here is my Lambda code. Currently, I only return "statusCode": 200 for the CSV format since I don't know how to do what I want.

import json
import awswrangler as wr

def lambda_handler(event, context):
    print(event)
    bucket_name = event['params']['path']['bucket']
    # Renamed from "format" to avoid shadowing the Python built-in
    fmt = event['params']['querystring']['fmt']
    full_path = f"s3://{bucket_name}"
    print(bucket_name, fmt)

    # Without chunksize, read_csv returns a single DataFrame,
    # so there is no need to loop over the result
    raw_df = wr.s3.read_csv(path=full_path, path_suffix=['.csv'], use_threads=True)

    if fmt == 'json':
        parsed = json.loads(raw_df.to_json(orient="records"))
        return {
            'body': parsed
        }
    elif fmt == 'csv':
        # Placeholder: this is the part I don't know how to implement
        return {
            "statusCode": 200
        }
    else:
        return {
            "statusCode": 300
        }

Answer 1

Score: 1

Specify the bucket as a path param

To specify the bucket as a path param, the simplest approach is to define a path variable of the form /base-url-of-your-api/{bucket} in your API Gateway route. It is up to your lambda to check whether the given parameter is correct and to return an HTTP error if not.

You will end up with something like this:

  • API Gateway resource: {bucket}
  • In your code: bucket_name = event['requestContext']['path'].split('/')[-1]
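
When the lambda is integrated in proxy mode, API Gateway also passes the matched path variable in event['pathParameters'], which avoids splitting the path by hand. Here is a minimal sketch, assuming a resource named {bucket}; the helper name extract_bucket and the simplified bucket-name check are illustrative assumptions, not part of the original setup:

```python
import re

# Simplified S3 naming check (assumption): 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def extract_bucket(event):
    """Return the {bucket} path variable, or None if it is missing or malformed."""
    params = event.get("pathParameters") or {}
    bucket = params.get("bucket")
    if bucket is None:
        # Fall back to the last segment of the request path
        path = event.get("requestContext", {}).get("path", "")
        bucket = path.rstrip("/").split("/")[-1]
    return bucket if bucket and _BUCKET_RE.match(bucket) else None
```

A None result is the cue for the lambda to return an HTTP error (400 or 404, for instance) instead of calling S3.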

I'd rather use a functional name instead of a physical bucket name, though. This avoids disclosing the names of your S3 buckets to the end user and keeps the flexibility to organize your buckets differently in the future without changing the API contract:

  • API call: /your-api-base-url/{functionalName}
  • In your code:

# Proper error handling still needs to be added

buckets_mapping = {
    'functionalNameAlpha': 'bucket-name-for-alpha',
    'functionalNameBeta': 'bucket-name-for-beta',
    'functionalNameGamma': 'bucket-name-for-gamma'
}

functional_name = event['requestContext']['path'].split('/')[-1]
bucket = buckets_mapping[functional_name]
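
The error handling mentioned above could look like the following sketch. It assumes proxy integration (so the lambda returns statusCode, headers and body itself); the helper name resolve_bucket and the choice of a 404 with a JSON body are assumptions for illustration:

```python
import json

# Placeholder functional names and bucket names, as in the mapping above
BUCKETS_MAPPING = {
    "functionalNameAlpha": "bucket-name-for-alpha",
    "functionalNameBeta": "bucket-name-for-beta",
    "functionalNameGamma": "bucket-name-for-gamma",
}


def resolve_bucket(event):
    """Return (bucket, None) on success, or (None, error_response) for an unknown name."""
    functional_name = event["requestContext"]["path"].rstrip("/").split("/")[-1]
    bucket = BUCKETS_MAPPING.get(functional_name)
    if bucket is None:
        # Unknown functional name: answer 404 without revealing any bucket name
        return None, {
            "statusCode": 404,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": f"unknown resource '{functional_name}'"}),
        }
    return bucket, None
```

In the handler, this would be used as `bucket, error = resolve_bucket(event)`, returning `error` right away when it is not None.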

Return format

Regarding the return format, the cleaner way is to rely on the standard Accept HTTP header. This header is set by the caller and defines a list of accepted formats. For example, Accept: application/json,application/xml,text/csv means that the caller understands three formats: JSON, XML and CSV. A caller can rank formats with q weights, such as text/csv;q=0.9; without weights, a common convention is to treat the listed order as the order of preference.
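
To make this concrete, here is a minimal sketch of parsing an Accept header into media ranges ordered by preference. The function name parse_accept is an assumption, and using the listing order as a tie-breaker between equal weights is a convention chosen here, not something the HTTP specification requires:

```python
def parse_accept(header_value):
    """Return the media ranges of an Accept header, most preferred first."""
    ranges = []
    for position, item in enumerate(header_value.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # skip empty items such as a trailing comma
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed weight: treated as not acceptable
        if q > 0:  # q=0 means "not acceptable", so the range is dropped
            ranges.append((-q, position, media_range))
    # Sort by descending weight, then by position in the header
    return [media_range for _, _, media_range in sorted(ranges)]
```

The lambda can then walk this list and pick the first entry it knows how to produce.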

The headers are available in the event passed to your lambda_handler when the lambda is integrated in proxy mode. As above, it is up to your lambda to check whether the accepted formats are compatible with your application, and to return an HTTP 406 "Not Acceptable" if not:

accepted_formats = event['headers']['Accept'] # for example: "application/xml,text/csv;q=0.9,application/json,text/*;q=0.2"
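
Once the format is known, the response headers decide whether the browser displays or downloads the data: Content-Disposition: attachment asks for a download. Below is a minimal sketch, assuming proxy integration and rows available as a list of dicts (with a pandas DataFrame, df.to_csv(index=False) would produce the CSV text instead). The function name build_response, the default file name, and the plain substring match on Accept, which ignores q weights, are all assumptions for illustration:

```python
import csv
import io
import json


def build_response(event, rows, filename="data.csv"):
    """Return JSON if the caller accepts it, else a CSV attachment, else a 406."""
    # Header names may arrive in any case, so look them up case-insensitively
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    accept = headers.get("accept", "*/*").lower()

    if "application/json" in accept or "*/*" in accept:
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(rows),
        }
    if "text/csv" in accept or "text/*" in accept:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
        return {
            "statusCode": 200,
            "headers": {
                "Content-Type": "text/csv",
                # "attachment" asks the browser to download rather than display
                "Content-Disposition": f'attachment; filename="{filename}"',
            },
            "body": buffer.getvalue(),
        }
    # None of the formats this sketch supports is accepted by the caller
    return {"statusCode": 406, "body": "Not Acceptable"}
```

Calling the endpoint with Accept: text/csv would then trigger a file download in a browser, while Accept: application/json returns the records inline.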

Sidenote

I maintain an open source library dedicated to making all this easier by providing ways to (among other things):

  • determine the return format to select,
  • automatically return a 406 if Accept does not list any acceptable format,
  • set up response content transformers to convert your internal format to the format expected by the caller.

It is called awsmate and it is available here: https://github.com/shlublu/awsmate
I'll try to find some time to edit this answer and post some code here if you are interested (let me know in the comments). In the meantime, it comes with an example application that should show how to quickly do what you want.

Posted on April 17, 2023. When reposting, please keep the link to this article: https://go.coder-hub.com/76029827.html