Rename columns in a pandas DataFrame dynamically and write it to S3 as CSV

Question

I am new to pandas and Python. I need to read a CSV file from S3 into a pandas DataFrame and then dynamically rename its columns using the names in a Python list. The new column names in the list are in the same order as the columns in the CSV.

I then need to write the renamed DataFrame as a CSV file to a folder on S3 named datetime=20230506112312. The timestamp part of the folder name should be the current timestamp. The CSV file should be named export_current_20230506112312.csv.

Please let me know if additional information is required.
Thanks

Answer 1

Score: 1

We can do this with boto3. First install it with pip install boto3, then set up your AWS credentials (typically in the ~/.aws/credentials file), replacing the placeholder values below with your own:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

Then, for example, with a list of new column names such as new_columns = ["new_col1", "new_col2", "new_col3", ...], the script looks like this:

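import datetime
from io import StringIO

import boto3
import pandas as pd

# Placeholders: replace the bucket, key, and column names with your own values.
bucket = 'mybucket'
new_columns = ["new_col1", "new_col2", "new_col3"]

# Read the CSV object from S3 into a DataFrame.
s3 = boto3.client('s3', region_name='us-east-1')
response = s3.get_object(Bucket=bucket, Key='mykey')
df = pd.read_csv(response['Body'])

# Rename positionally; the list must contain exactly one name per CSV column.
df.columns = new_columns

# Build the timestamped key and upload the renamed data as CSV.
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
csv_buffer = StringIO()
df.to_csv(csv_buffer, index=False)
s3.put_object(Bucket=bucket, Key=f'datetime={timestamp}/export_current_{timestamp}.csv', Body=csv_buffer.getvalue())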
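
The renaming and key-building steps in the script above can also be pulled into small functions that can be tried locally without touching S3. This is only a sketch: the helper names rename_columns and build_export_key are hypothetical (they are not part of pandas or boto3), and raising an error on a length mismatch is an added assumption about how a list that does not match the CSV should be handled.

```python
import datetime
from io import StringIO

import pandas as pd


def rename_columns(df, new_columns):
    """Return a copy of df whose columns are renamed positionally.

    Hypothetical helper: raises ValueError if the list does not provide
    exactly one name per existing column.
    """
    if len(new_columns) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} column names, got {len(new_columns)}"
        )
    renamed = df.copy()
    renamed.columns = list(new_columns)
    return renamed


def build_export_key(now=None):
    """Build the S3 key 'datetime=<ts>/export_current_<ts>.csv'.

    Hypothetical helper: 'now' defaults to the current local time.
    """
    if now is None:
        now = datetime.datetime.now()
    timestamp = now.strftime("%Y%m%d%H%M%S")
    return f"datetime={timestamp}/export_current_{timestamp}.csv"


# Local demonstration with an in-memory CSV instead of an S3 object.
sample_csv = StringIO("a,b,c\n1,2,3\n4,5,6\n")
df = rename_columns(pd.read_csv(sample_csv), ["new_col1", "new_col2", "new_col3"])
key = build_export_key(datetime.datetime(2023, 5, 6, 11, 23, 12))
print(list(df.columns))
print(key)
```

The key returned by build_export_key can be passed as the Key argument of s3.put_object, with the renamed DataFrame serialized through df.to_csv(csv_buffer, index=False) as in the script.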

huangapple
  • Posted on June 13, 2023 at 15:01:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76462379.html