Rename columns in a Pandas Dataframe dynamically and write it to S3 as CSV

Question
I am new to pandas and Python. I have a requirement to read a CSV file from S3 into a Pandas DataFrame and then dynamically rename the columns using a Python list. The new column names in the list will be in the same order as the columns in the CSV.
Then I need to write this renamed DataFrame as a CSV file to a folder on S3 named datetime=20230506112312. The timestamp part of the folder name should be the current timestamp, and the CSV file itself should be named export_current_20230506112312.csv.
Please let me know if additional information is required.
Thanks
Answer 1
Score: 1
We can do this with boto3. First install it with pip install boto3, then set up our credentials in the shared credentials file (~/.aws/credentials), replacing the placeholders with your own keys:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
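Alternatively, the keys can be passed directly when creating the client instead of using the credentials file. A configuration sketch (the key values here are placeholders, not real credentials):

```python
import boto3

# Placeholder credentials; in practice load them from a secure source,
# not hard-coded strings
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)
```

The credentials file is generally preferable, since it keeps secrets out of the code.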
Then, given a list of new column names such as new_columns = ["new_col1", "new_col2", "new_col3", ...], the script looks like this:
import datetime
from io import StringIO

import boto3
import pandas as pd

# Read the CSV from S3 into a DataFrame
s3 = boto3.client('s3', region_name='us-east-1')
response = s3.get_object(Bucket='mybucket', Key='mykey')
df = pd.read_csv(response['Body'])

# Rename the columns; the list order matches the column order in the CSV
new_columns = ["new_col1", "new_col2", "new_col3"]
df.columns = new_columns

# Current timestamp, e.g. 20230506112312, used for the folder and file names
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

# Serialize to CSV in memory and upload to the timestamped folder
csv_buffer = StringIO()
df.to_csv(csv_buffer, index=False)
s3.put_object(Bucket='mybucket', Key=f'datetime={timestamp}/export_current_{timestamp}.csv', Body=csv_buffer.getvalue())
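One caveat with the renaming step: assigning to df.columns raises a ValueError if the list length doesn't match the number of columns, so it can help to check up front and fail with a clearer message. A minimal local sketch, using hypothetical sample data in place of the CSV read from S3:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV read from S3
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
new_columns = ["new_col1", "new_col2", "new_col3"]

# Fail early with a clear message if the list doesn't match the CSV
if len(new_columns) != len(df.columns):
    raise ValueError(
        f"expected {len(df.columns)} column names, got {len(new_columns)}"
    )

df.columns = new_columns
print(list(df.columns))  # ['new_col1', 'new_col2', 'new_col3']
```

Because the list is applied positionally, this rename works no matter what the original headers were, which is what makes it "dynamic".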