Mocking a file-like gzipped CSV for boto3's StreamingBody

Question

My real S3 helper does the following:
```python
def read_gzipped_csv_from_s3(self, key):
    return self.bucket.Object(key).get()

obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')
```

I need to mock the `read_gzipped_csv_from_s3()` method for unit tests. The problem is that the response has to be a gzipped CSV, which I must construct from a string, because I cannot store any files while the tests run in GitLab's pipeline.

So I have some CSV data as a string:

CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""

I also have some example code that mocks `botocore.response.StreamingBody` using a regular (uncompressed) CSV file:

```python
import io
from botocore.response import StreamingBody

body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))
```

but I can't figure out how to create a gzipped CSV in memory. Here is a starting point I found somewhere:

```python
import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        <can't figure out what goes here>
```

Help would be much appreciated.

I have tried and modified lots of other snippets from SO, but no luck. What I expect: a gzipped-CSV file-like object to pass to `StreamingBody`.


<details>
<summary>英文:</summary>

My real S3 helper does the following:

def read_gzipped_csv_from_s3(self, key):
return self.bucket.Object(key).get()

obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')


I need to mock `read_gzipped_csv_from_s3()` method for unit tests. The problem is that the response should be a gzipped CSV which I must construct from a string because I cannot store anything as tests are running in a Gitlab&#39;s pipeline.

So I have some csv as a string:

CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""


Then I have some example code for using a regular CSV file to mock botocore.response.StreamingBody:

body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))


but I can&#39;t figure out how to create gzipped CSV in memory: there&#39;s the beginning I&#39;ve found somewhere:

import gzip

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
<can't figure out what's here>


Help would be much appreciated.

Tried tons of other snippets from SO and modified them but no luck. What I expect: gzipped CSV file-like object to pass to StreamingBody

</details>


# Answer 1
**Score**: 1

You could use `.write()` to write the data into the `BytesIO` object. You also need `.seek()` to reset the file position to the beginning before you can read it.

```python
import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        wrapper.write(CSV_DATA)
buffer.seek(0)
df = pd.read_csv(buffer, compression='gzip')
```
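
To feed this into the code under test, the same compressed bytes can be wrapped in `botocore.response.StreamingBody`, just as the question's plain-CSV snippet does, so the mock returns the same `{'Body': ...}` shape as `bucket.Object(key).get()`. Below is a minimal sketch; the `mocked_get_response` name is purely illustrative, and `gzip.compress` is used here only as a shorthand for the `GzipFile` approach above:

```python
import gzip
import io

import pandas as pd
from botocore.response import StreamingBody

# Compress the CSV string entirely in memory; no files are written.
gzipped_bytes = gzip.compress(CSV_DATA.encode('utf-8'))

# Wrap the bytes the same way the question's example does for a plain CSV.
mock_stream = StreamingBody(io.BytesIO(gzipped_bytes), len(gzipped_bytes))

# Shape the mock like the dict returned by bucket.Object(key).get().
mocked_get_response = {'Body': mock_stream}

# The code under test can then run unchanged.
df = pd.read_csv(mocked_get_response['Body'], compression='gzip')
```

In an actual test, `read_gzipped_csv_from_s3` could then be patched (for example with `unittest.mock.patch.object`) to return `mocked_get_response`, so `pd.read_csv(obj['Body'], compression='gzip')` reads the in-memory data instead of hitting S3.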