Reusing stream when calculating MD5 and SHA256 of Azure Blob

Question

I have to calculate the MD5 and SHA256 hashes of a file stored in an Azure Blob Storage account, using an Azure Function. I tried downloading the file into memory with the DownloadContent method, but with large files (more than 4 GB) I run out of memory (I'd have to scale up, which is expensive). I found another method, DownloadStreaming, which works fine for the first calculation (almost no additional memory usage), but before calculating the second hash I have to download the file again, because the stream has already been read to the end. I tried setting Position = 0 on that stream, but I got a "not supported" exception. Is there any way to use DownloadStreaming without re-downloading the file?

Here is the code I'm using:

var sourceFile = await blobClient.DownloadStreamingAsync();
byte[] md5Result;
byte[] sha256Result;

using (var md5 = MD5.Create())
{
   md5Result = md5.ComputeHash(sourceFile.Value.Content);
}

sourceFile = await blobClient.DownloadStreamingAsync();
using (var sha256 = SHA256.Create())
{
   sha256Result = sha256.ComputeHash(sourceFile.Value.Content);
}

Answer 1

Score: 0

You can read the stream in a loop and call md5.TransformBlock and sha256.TransformBlock manually to compute both hashes chunk by chunk in a single pass.

  • You don't need to pass an outputBuffer to TransformBlock; you can pass null.
  • You need to call TransformFinalBlock before retrieving the hash.
  • Consider passing a CancellationToken to the async calls.

var inputBuffer = new byte[4000];
// DownloadStreamingAsync returns a Response<BlobDownloadStreamingResult>;
// the readable stream is exposed as Value.Content.
using var sourceFile = (await blobClient.DownloadStreamingAsync()).Value;
using var md5 = MD5.Create();
using var sha256 = SHA256.Create();

int bytesRead;
while ((bytesRead = await sourceFile.Content.ReadAsync(inputBuffer.AsMemory())) > 0)
{
    // Feed the same chunk to both algorithms so the blob is read only once.
    md5.TransformBlock(inputBuffer, 0, bytesRead, null, 0);
    sha256.TransformBlock(inputBuffer, 0, bytesRead, null, 0);
}
md5.TransformFinalBlock(inputBuffer, 0, 0);  // finish hash calculation
var md5Result = md5.Hash;

sha256.TransformFinalBlock(inputBuffer, 0, 0);  // finish hash calculation
var sha256Result = sha256.Hash;
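
The loop above does not depend on anything Azure-specific, so it could be pulled into a small helper that accepts any readable Stream and a CancellationToken, as the last bullet suggests. The following is only a sketch: StreamHasher and ComputeHashesAsync are hypothetical names, the 80 KB buffer size is an arbitrary choice, and a MemoryStream stands in for the blob content so the example runs without a storage account (Convert.ToHexString assumes .NET 5 or later).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class StreamHasher
{
    // Hypothetical helper (not part of any SDK): reads the stream once and
    // feeds every chunk to both hash algorithms.
    public static async Task<(byte[] Md5, byte[] Sha256)> ComputeHashesAsync(
        Stream content, CancellationToken cancellationToken = default)
    {
        using var md5 = MD5.Create();
        using var sha256 = SHA256.Create();
        var buffer = new byte[81920]; // arbitrary chunk size

        int bytesRead;
        while ((bytesRead = await content.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
        {
            // outputBuffer may be null when only the hash is needed.
            md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            sha256.TransformBlock(buffer, 0, bytesRead, null, 0);
        }

        // An empty final block finalizes each hash so that Hash can be read.
        md5.TransformFinalBlock(buffer, 0, 0);
        sha256.TransformFinalBlock(buffer, 0, 0);
        return (md5.Hash, sha256.Hash);
    }
}

public static class Program
{
    public static async Task Main()
    {
        // A MemoryStream stands in for the blob's content stream here.
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes("hello"));
        var (md5Result, sha256Result) = await StreamHasher.ComputeHashesAsync(stream);

        Console.WriteLine(Convert.ToHexString(md5Result));
        Console.WriteLine(Convert.ToHexString(sha256Result));
    }
}
```

With the blob client from the question, the call would presumably become StreamHasher.ComputeHashesAsync(sourceFile.Value.Content, cancellationToken), so the file is still downloaded only once.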

huangapple
  • Posted on June 19, 2023 at 21:13:03
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76506987.html