February 19, 2023, 19:54:10

S3 GetObject Returning Weird Encoding for HTML file

Question

I am trying to read an HTML file from S3 in a Lambda@Edge function and return it in the response body, but I cannot get the function to return the contents of the file correctly.

Here is my code (simplified):

const AWS = require('aws-sdk');

const S3 = new AWS.S3({
    signatureVersion: 'v4'
});

const { Body } = await S3.getObject({ Bucket: 'my-bucket', Key: 'index.html' }).promise();

console.log(Body.toString());

Instead of seeing <html... in the console log, I see the dreaded question mark characters, which (I think) implies a bad encoding:

> ��Y�r#��x�`�,b�J�ٲ��NR�yIٮ�!!��"��n��޴��Is�>}�n4pF�_��de�nq�~�]� f��v��۔*�㺮Ý� Hǆ�<�! �c�5�1B��,#|Ŵ;ֶ�U��z� �Qi��j�0��V ��H��...etc

I have tried everything I can think of, including but not limited to:

Body.toString('utf-8');
Body.toString('ascii');
Body.toString('base64');
decoder.write(Body.toString('base64'));
and a lot more...

I must be missing something really obvious, as I cannot find anyone else facing the same issue. I thought it might be related to the encryption, but my other Lambda@Edge function reads an image file without issues, so I assume it has to be an encoding problem that I haven't thought of.

UPDATE

I believe the issue may be related to the fact that the file is gzipped.

Here is a print of the response from S3:

{
  AcceptRanges: 'bytes',
  LastModified: 2023-02-17T19:44:41.000Z,
  ContentLength: 1598,
  ETag: 'some-key',
  CacheControl: 'max-age=31536000',
  ContentEncoding: 'gzip',
  ContentType: 'text/html; charset=UTF-8',
  ServerSideEncryption: 'AES256',
  Metadata: { etag: 'some-key' },
  Body: <Buffer 1f 8b 08 00 00 00 00 00 00 03 cd 59 db 72 23 b7 11 fd 15 78 f2 60 bb 2c 62 ee b7 8d c8 4a b2 d9 b2 b7 ca 4e 52 bb 79 49 d9 ae 14 06 e8 21 21 cd 0c a6 ... 1548 more bytes>
}
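
The first two bytes of the Body in that print, 1f 8b, are the gzip magic number, which agrees with the ContentEncoding: 'gzip' field and suggests the object is stored compressed. A minimal sketch of that check follows; the helper name isGzip is hypothetical and not part of the AWS SDK.

```javascript
// Hypothetical helper (not part of aws-sdk): detects a gzip stream by its
// two-byte magic number, 0x1f 0x8b, as defined in RFC 1952.
function isGzip(buffer) {
  return Buffer.isBuffer(buffer)
    && buffer.length >= 2
    && buffer[0] === 0x1f
    && buffer[1] === 0x8b;
}

// First bytes of the Body printed above.
const head = Buffer.from([0x1f, 0x8b, 0x08, 0x00]);
console.log(isGzip(head));                   // true
console.log(isGzip(Buffer.from('<html>')));  // false
```

If the check returns true, calling toString() with any text encoding will only produce garbage, because the bytes are compressed data rather than encoded text.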

Answer 1

Score: 0

The issue was caused by the fact that the index.html file was gzipped. The solution to handling this is as follows:

const AWS  = require('aws-sdk');
const zlib = require('zlib');

const S3 = new AWS.S3({
    signatureVersion: 'v4'
});

const { Body } = await S3.getObject({ Bucket: 'my-bucket', Key: 'index.html' }).promise();

const body = zlib.unzipSync(Body).toString();

console.log(body);
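
If some objects in the bucket are gzipped and others are not, the decompression step could be made conditional on the ContentEncoding field that getObject returns. The following is only a sketch: bodyToString is a hypothetical helper, and the response object is built locally as a stand-in so it runs without AWS credentials.

```javascript
const zlib = require('zlib');

// Hypothetical helper: returns the object body as a UTF-8 string,
// decompressing only when S3 reports ContentEncoding: 'gzip'.
function bodyToString({ Body, ContentEncoding }) {
  const raw = ContentEncoding === 'gzip' ? zlib.gunzipSync(Body) : Body;
  return raw.toString('utf-8');
}

// Stand-in for a getObject response (an assumption for illustration only).
const fakeResponse = {
  ContentEncoding: 'gzip',
  Body: zlib.gzipSync('<html><body>Hello</body></html>'),
};

console.log(bodyToString(fakeResponse));
```

In the real function, the object returned by S3.getObject(...).promise() would be passed in place of fakeResponse.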

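However, because this code is deployed in a Lambda@Edge function, speed is essential. Note that fetching the object with https.get instead would not remove the decompression step: Node's built-in https module does not decode Content-Encoding: gzip automatically, so the response stream would still have to be piped through zlib (for example zlib.createGunzip()). Whether bypassing the SDK is actually faster would need to be measured.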
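
Since the concern is speed, another option worth considering is to skip decompression in the function entirely and hand the gzipped bytes back to CloudFront, using the base64 body encoding that Lambda@Edge generated responses support. This is a sketch under assumptions: toGzipResponse is a hypothetical helper, the viewer is assumed to accept gzip, and the body must stay within the Lambda@Edge generated-response size limits.

```javascript
const zlib = require('zlib');

// Hypothetical helper: builds a Lambda@Edge generated response that forwards
// the gzipped bytes untouched. Assumes the viewer sent Accept-Encoding: gzip.
function toGzipResponse({ Body, ContentType }) {
  return {
    status: '200',
    statusDescription: 'OK',
    headers: {
      'content-type': [{ key: 'Content-Type', value: ContentType }],
      'content-encoding': [{ key: 'Content-Encoding', value: 'gzip' }],
    },
    // Binary payloads are passed as base64 text, flagged via bodyEncoding.
    body: Body.toString('base64'),
    bodyEncoding: 'base64',
  };
}

// Stand-in for the S3 object (an assumption so the sketch runs locally).
const response = toGzipResponse({
  ContentType: 'text/html; charset=UTF-8',
  Body: zlib.gzipSync('<html></html>'),
});

console.log(response.bodyEncoding, response.body.length > 0);
```

With this approach the browser does the decompression, so the function only pays for a base64 conversion.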
