Write JSON files to an S3 bucket without saving files locally

Question

I have some datasets in a GitHub repo and I want to move them to S3 using Python, without saving anything locally.

This is my public source repo:
https://github.com/statsbomb/open-data/tree/master/data

I have seen how boto3 works, but I had to save each file in my workspace before uploading it to S3. This is too much data to download, so I want to move it directly to S3 and then start wrangling it.

Answer 1

Score: 0

import requests
import boto3

s3 = boto3.client('s3')
bucket_name = 'your_bucket_name'

# List of datasets you want to download. Note: in this repo only
# competitions.json is a single file; events, matches, and lineups
# are directories of many JSON files, so they need per-file URLs.
datasets = [
    'competitions'
]

# Stream each dataset straight to S3: the streamed response body is a
# file-like object, so nothing is written to local disk.
for dataset in datasets:
    url = f'https://github.com/statsbomb/open-data/blob/master/data/{dataset}.json?raw=true'
    response = requests.get(url, stream=True)
    response.raise_for_status()
    response.raw.decode_content = True  # undo gzip transfer encoding, if any
    s3.upload_fileobj(response.raw, bucket_name, f'{dataset}.json')
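In the statsbomb/open-data repo, events, matches, and lineups are directories containing many JSON files, so a single per-directory URL will 404. A rough sketch of one way to handle that, assuming the public GitHub contents API (its `type`, `path`, and `download_url` fields) and an existing bucket named `your_bucket_name`; unauthenticated API calls are rate-limited and directory listings are truncated at 1,000 entries, so for the full dataset cloning the repo or using the git trees API may be more practical:

```python
import requests

API_BASE = 'https://api.github.com/repos/statsbomb/open-data/contents'


def raw_url(path):
    # Direct raw-content URL for a single file under data/ in the repo.
    return f'https://raw.githubusercontent.com/statsbomb/open-data/master/data/{path}'


def stream_tree_to_s3(s3, bucket, path):
    # List one level of the repo tree via the GitHub contents API,
    # recurse into subdirectories, and stream each file straight to S3.
    # Caveat: the contents API truncates listings over 1,000 entries.
    listing = requests.get(f'{API_BASE}/{path}')
    listing.raise_for_status()
    for entry in listing.json():
        if entry['type'] == 'dir':
            stream_tree_to_s3(s3, bucket, entry['path'])
        elif entry['type'] == 'file':
            resp = requests.get(entry['download_url'], stream=True)
            resp.raise_for_status()
            resp.raw.decode_content = True  # undo gzip transfer encoding
            s3.upload_fileobj(resp.raw, bucket, entry['path'])


if __name__ == '__main__':
    # boto3 imported here so the helpers above are importable without it.
    import boto3

    s3 = boto3.client('s3')
    for d in ('events', 'matches', 'lineups'):
        stream_tree_to_s3(s3, 'your_bucket_name', f'data/{d}')
```

Each file goes to an S3 key mirroring its repo path (e.g. `data/lineups/<match_id>.json`), so the tree structure is preserved in the bucket.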

huangapple
  • Published on 2023-02-16 04:35:11
  • Please keep this link when republishing: https://go.coder-hub.com/75465194.html