Mocking a file-like gzipped CSV for boto3's StreamingBody

Question

My real S3 helper does the following:
```python
def read_gzipped_csv_from_s3(self, key):
    return self.bucket.Object(key).get()

obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')
```

I need to mock the `read_gzipped_csv_from_s3()` method for unit tests. The problem is that the response has to be a gzipped CSV, which I must construct from a string, because I cannot store any files while the tests run in GitLab's pipeline.

So I have some CSV data as a string:

CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""

I also have some example code that mocks `botocore.response.StreamingBody` using a regular (uncompressed) CSV file:

```python
import io
from botocore.response import StreamingBody

body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))
```

but I can't figure out how to create a gzipped CSV in memory. Here is a starting point I found somewhere:

```python
import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        <can't figure out what goes here>
```

Help would be much appreciated.

I have tried and modified lots of other snippets from SO, but no luck. What I expect: a gzipped-CSV file-like object to pass to `StreamingBody`.


<details>
<summary>英文:</summary>

My real S3 helper does the following:

def read_gzipped_csv_from_s3(self, key):
return self.bucket.Object(key).get()

obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')


I need to mock `read_gzipped_csv_from_s3()` method for unit tests. The problem is that the response should be a gzipped CSV which I must construct from a string because I cannot store anything as tests are running in a Gitlab&#39;s pipeline.

So I have some csv as a string:

CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""


Then I have some example code for using a regular CSV file to mock botocore.response.StreamingBody:

body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))


but I can&#39;t figure out how to create gzipped CSV in memory: there&#39;s the beginning I&#39;ve found somewhere:

import gzip

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
<can't figure out what's here>


Help would be much appreciated.

Tried tons of other snippets from SO and modified them but no luck. What I expect: gzipped CSV file-like object to pass to StreamingBody

</details>


# Answer 1
**Score**: 1

You could use `.write()` to write the data into the `BytesIO` object. You also need `.seek()` to reset the file position to the beginning before you can read it.

```python
import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        wrapper.write(CSV_DATA)
buffer.seek(0)
df = pd.read_csv(buffer, compression='gzip')
```
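
To feed this into the code under test, the same compressed bytes can be wrapped in `botocore.response.StreamingBody`, just as the question's plain-CSV snippet does, so the mock returns the same `{'Body': ...}` shape as `bucket.Object(key).get()`. Below is a minimal sketch; the `mocked_get_response` name is purely illustrative, and `gzip.compress` is used here only as a shorthand for the `GzipFile` approach above:

```python
import gzip
import io

import pandas as pd
from botocore.response import StreamingBody

# Compress the CSV string entirely in memory; no files are written.
gzipped_bytes = gzip.compress(CSV_DATA.encode('utf-8'))

# Wrap the bytes the same way the question's example does for a plain CSV.
mock_stream = StreamingBody(io.BytesIO(gzipped_bytes), len(gzipped_bytes))

# Shape the mock like the dict returned by bucket.Object(key).get().
mocked_get_response = {'Body': mock_stream}

# The code under test can then run unchanged.
df = pd.read_csv(mocked_get_response['Body'], compression='gzip')
```

In an actual test, `read_gzipped_csv_from_s3` could then be patched (for example with `unittest.mock.patch.object`) to return `mocked_get_response`, so `pd.read_csv(obj['Body'], compression='gzip')` reads the in-memory data instead of hitting S3.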