Write JSON files to an S3 bucket without saving a file locally
Question
I have some datasets in a GitHub repo and I want to move them to S3 using Python, without saving anything locally.
This is my source public repo:
https://github.com/statsbomb/open-data/tree/master/data
I have seen boto3 working, but I have to save each file in my workspace before I can upload it to S3. This is too much data to download, so I want to move it directly to S3 and then start wrangling the data.
Answer 1
Score: 0
import requests
import boto3

s3 = boto3.client('s3')
bucket_name = 'your_bucket_name'

# List of datasets you want to download ('.json' is appended in the URL below)
datasets = [
    'events',
    'matches',
    'competitions',
    'lineups'
]

# Download the datasets and stream them to S3 without touching the local disk
for dataset in datasets:
    url = f'https://github.com/statsbomb/open-data/blob/master/data/{dataset}.json?raw=true'
    response = requests.get(url, stream=True)
    response.raise_for_status()
    response.raw.decode_content = True  # decompress the stream if it is gzip-encoded
    s3.upload_fileobj(response.raw, bucket_name, f'{dataset}.json')
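
One caveat about the snippet above: in the statsbomb/open-data repo, only competitions.json is a single file under data/; events, matches and lineups are directories containing many JSON files, so those three URLs will return 404. Below is a minimal sketch (not part of the original answer) of one way around this, assuming an unauthenticated call to GitHub's contents API is acceptable; that API returns at most 1,000 entries per directory, so the largest folders may need the git trees API or a clone instead.

import requests
import boto3

s3 = boto3.client('s3')
bucket_name = 'your_bucket_name'  # placeholder, as in the answer above

# List one directory of the repo via GitHub's contents API (unauthenticated
# calls are rate-limited, and each directory listing is capped at 1,000 entries)
api_url = 'https://api.github.com/repos/statsbomb/open-data/contents/data/lineups'
listing = requests.get(api_url)
listing.raise_for_status()

for entry in listing.json():
    if entry['type'] != 'file':
        continue  # 'matches' has nested subdirectories; those would need recursion
    response = requests.get(entry['download_url'], stream=True)
    response.raise_for_status()
    response.raw.decode_content = True  # decompress the stream if gzip-encoded
    s3.upload_fileobj(response.raw, bucket_name, f"lineups/{entry['name']}")

This reuses the same streaming pattern as the answer, so nothing is written to the local disk; only the directory listing step is new.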