Rename columns in a Pandas Dataframe dynamically and write it to S3 as CSV

Question
I am new to pandas and Python. I have a requirement to read a CSV file from S3 into a Pandas DataFrame and then dynamically rename the columns using a Python list. The new column names in the list will be in the same order as the columns in the CSV.
Then I need to write this renamed DataFrame as a CSV file to a folder on S3 named datetime=20230506112312. The timestamp part of the folder name should be the current timestamp, and the CSV file itself should be named export_current_20230506112312.csv.
Please let me know if additional information is required.
Thanks
Answer 1
Score: 1
We can do this with boto3. First install it with pip install boto3, then set up our credentials in the shared credentials file (~/.aws/credentials), replacing the placeholders with your own keys:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
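Alternatively, the keys can be passed directly when creating the client instead of using the credentials file. A configuration sketch (the key values here are placeholders, not real credentials):

```python
import boto3

# Placeholder credentials; in practice load them from a secure source,
# not hard-coded strings
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)
```

The credentials file is generally preferable, since it keeps secrets out of the code.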
Then, given a list of new column names such as new_columns = ["new_col1", "new_col2", "new_col3", ...], the script looks like this:
import datetime
from io import StringIO

import boto3
import pandas as pd

# Read the CSV from S3 into a DataFrame
s3 = boto3.client('s3', region_name='us-east-1')
response = s3.get_object(Bucket='mybucket', Key='mykey')
df = pd.read_csv(response['Body'])

# Rename the columns; the list order matches the column order in the CSV
new_columns = ["new_col1", "new_col2", "new_col3"]
df.columns = new_columns

# Current timestamp, e.g. 20230506112312, used for the folder and file names
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

# Serialize to CSV in memory and upload to the timestamped folder
csv_buffer = StringIO()
df.to_csv(csv_buffer, index=False)
s3.put_object(Bucket='mybucket', Key=f'datetime={timestamp}/export_current_{timestamp}.csv', Body=csv_buffer.getvalue())
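One caveat with the renaming step: assigning to df.columns raises a ValueError if the list length doesn't match the number of columns, so it can help to check up front and fail with a clearer message. A minimal local sketch, using hypothetical sample data in place of the CSV read from S3:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV read from S3
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
new_columns = ["new_col1", "new_col2", "new_col3"]

# Fail early with a clear message if the list doesn't match the CSV
if len(new_columns) != len(df.columns):
    raise ValueError(
        f"expected {len(df.columns)} column names, got {len(new_columns)}"
    )

df.columns = new_columns
print(list(df.columns))  # ['new_col1', 'new_col2', 'new_col3']
```

Because the list is applied positionally, this rename works no matter what the original headers were, which is what makes it "dynamic".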