Mocking file-like gzipped csv for boto3's StreamingBody
# Question
My real S3 helper does the following:
```python
def read_gzipped_csv_from_s3(self, key):
    return self.bucket.Object(key).get()

obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')
```
I need to mock the `read_gzipped_csv_from_s3()` method for unit tests. The problem is that the response should be a gzipped CSV, which I must construct from a string because I cannot store anything while the tests run in a GitLab pipeline.

So I have some CSV as a string:
```python
CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""
```
Then I have some example code for mocking `botocore.response.StreamingBody` with a regular CSV file:
```python
body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))
```
but I can't figure out how to create a gzipped CSV in memory. Here is the beginning I found somewhere:
```python
import gzip

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        # <can't figure out what goes here>
```
Help would be much appreciated.

I tried tons of other snippets from SO and modified them, but no luck. What I expect: a gzipped CSV file-like object to pass to `StreamingBody`.
# Answer 1
**Score**: 1
You could use `.write()` to write the data into the `BytesIO` object. You also need `.seek()` to reset the file position to the beginning before you can read it.
```python
import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        wrapper.write(CSV_DATA)

buffer.seek(0)
df = pd.read_csv(buffer, compression='gzip')
```
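If you need the `StreamingBody` itself rather than reading the buffer directly, here is a minimal sketch of wiring the gzipped bytes into a mocked response. It assumes `S3Helper` is importable from your code under test and that its method returns a dict with a `'Body'` key, as in the question:
```python
import gzip
import io
from unittest import mock

import pandas as pd
from botocore.response import StreamingBody

# Compress the CSV string into an in-memory gzip "file".
gz_buffer = io.BytesIO()
with gzip.GzipFile(fileobj=gz_buffer, mode='wb') as compressed:
    compressed.write(CSV_DATA.encode('utf-8'))
gz_bytes = gz_buffer.getvalue()

# StreamingBody takes the raw stream and its content length,
# mirroring what boto3 returns as obj['Body'] from Object.get().
mock_stream = StreamingBody(io.BytesIO(gz_bytes), len(gz_bytes))

# Patch the helper (assumed importable) so the code under test
# receives the fake response instead of hitting S3.
with mock.patch.object(S3Helper, 'read_gzipped_csv_from_s3',
                       return_value={'Body': mock_stream}):
    obj = S3Helper().read_gzipped_csv_from_s3('any-key')
    df = pd.read_csv(obj['Body'], compression='gzip')
```
Note that `getvalue()` is called only after the `GzipFile` context exits, so the gzip trailer has been flushed and the stream is a complete, valid gzip file.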