Write JSON files to an S3 bucket without saving files locally

Question

I have some datasets in a GitHub repo and I want to move them to S3 using Python, without saving anything locally.

This is my public source repo:
https://github.com/statsbomb/open-data/tree/master/data

I have seen how boto3 works, but I had to save each file in my workspace before uploading it to S3. This is too much data to download, so I want to move it directly to S3 and then start wrangling it.

Answer 1

Score: 0

import requests
import boto3

s3 = boto3.client('s3')
bucket_name = 'your_bucket_name'

# List of datasets you want to download. Note: in this repo only
# competitions.json is a single file; events, matches, and lineups
# are directories of many JSON files, so they need per-file URLs.
datasets = [
    'competitions'
]

# Stream each dataset straight to S3: the streamed response body is a
# file-like object, so nothing is written to local disk.
for dataset in datasets:
    url = f'https://github.com/statsbomb/open-data/blob/master/data/{dataset}.json?raw=true'
    response = requests.get(url, stream=True)
    response.raise_for_status()
    response.raw.decode_content = True  # undo gzip transfer encoding, if any
    s3.upload_fileobj(response.raw, bucket_name, f'{dataset}.json')
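In the statsbomb/open-data repo, events, matches, and lineups are directories containing many JSON files, so a single per-directory URL will 404. A rough sketch of one way to handle that, assuming the public GitHub contents API (its `type`, `path`, and `download_url` fields) and an existing bucket named `your_bucket_name`; unauthenticated API calls are rate-limited and directory listings are truncated at 1,000 entries, so for the full dataset cloning the repo or using the git trees API may be more practical:

```python
import requests

API_BASE = 'https://api.github.com/repos/statsbomb/open-data/contents'


def raw_url(path):
    # Direct raw-content URL for a single file under data/ in the repo.
    return f'https://raw.githubusercontent.com/statsbomb/open-data/master/data/{path}'


def stream_tree_to_s3(s3, bucket, path):
    # List one level of the repo tree via the GitHub contents API,
    # recurse into subdirectories, and stream each file straight to S3.
    # Caveat: the contents API truncates listings over 1,000 entries.
    listing = requests.get(f'{API_BASE}/{path}')
    listing.raise_for_status()
    for entry in listing.json():
        if entry['type'] == 'dir':
            stream_tree_to_s3(s3, bucket, entry['path'])
        elif entry['type'] == 'file':
            resp = requests.get(entry['download_url'], stream=True)
            resp.raise_for_status()
            resp.raw.decode_content = True  # undo gzip transfer encoding
            s3.upload_fileobj(resp.raw, bucket, entry['path'])


if __name__ == '__main__':
    # boto3 imported here so the helpers above are importable without it.
    import boto3

    s3 = boto3.client('s3')
    for d in ('events', 'matches', 'lineups'):
        stream_tree_to_s3(s3, 'your_bucket_name', f'data/{d}')
```

Each file goes to an S3 key mirroring its repo path (e.g. `data/lineups/<match_id>.json`), so the tree structure is preserved in the bucket.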

huangapple
  • Published on 2023-02-16 04:35:11
  • Please keep this link when republishing: https://go.coder-hub.com/75465194.html