Reusing stream when calculating MD5 and SHA256 of Azure Blob
Question
I have to calculate the MD5 and SHA256 hashes of a file stored in an Azure Blob Storage account, using an Azure Function. I've tried downloading the file into memory with the DownloadContent method, but with large files (more than 4 GB) I run out of memory (I'd have to scale up, which is expensive). I found another method, DownloadStreaming, which works fine for the first calculation (almost no additional memory usage), but before calculating the second hash I have to download the file again, because the stream is already consumed and looks empty. I've tried setting Position = 0 on that stream, but I got a NotSupportedException. Is there any way to use DownloadStreaming without re-downloading the file?
Here is the code I'm using:
var sourceFile = await blobClient.DownloadStreamingAsync();
byte[] md5Result;
byte[] sha256Result;
using (var md5 = MD5.Create())
{
    // Reads the download stream to its end; it cannot be rewound afterwards
    md5Result = md5.ComputeHash(sourceFile.Value.Content);
}
// Second download, needed only because the first stream is already consumed
sourceFile = await blobClient.DownloadStreamingAsync();
using (var sha256 = SHA256.Create())
{
    sha256Result = sha256.ComputeHash(sourceFile.Value.Content);
}
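Both symptoms in the question (a stream that looks empty on the second pass, and an exception when setting Position = 0) are what a forward-only, non-seekable stream does. The sketch below is a minimal reproduction under that assumption: ForwardOnlyStream and ConsumedStreamDemo are hypothetical names, the wrapper merely mimics the download stream over a MemoryStream (it is not the Azure SDK type), and .NET 5 or later is assumed for Convert.ToHexString.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical stand-in for a forward-only download stream: it can be read
// once from start to end, but it cannot report its position or be rewound.
public sealed class ForwardOnlyStream : Stream
{
    private readonly Stream _inner;

    public ForwardOnlyStream(Stream inner) => _inner = inner;

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    // Setting Position = 0 on a non-seekable stream throws NotSupportedException.
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        _inner.Read(buffer, offset, count);

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override void Flush() { }
}

public static class ConsumedStreamDemo
{
    // Hashes the same forward-only stream twice, as the question's code would
    // without the second download. Returns both SHA-256 digests as hex strings.
    public static (string First, string Second) HashTwice(byte[] data)
    {
        using var stream = new ForwardOnlyStream(new MemoryStream(data));
        using var sha256 = SHA256.Create();
        // First pass reads the stream to its end.
        string first = Convert.ToHexString(sha256.ComputeHash(stream));
        // Second pass receives no bytes, so it yields the hash of empty input.
        string second = Convert.ToHexString(sha256.ComputeHash(stream));
        return (first, second);
    }
}
```

In this reproduction the second ComputeHash call does not fail; it silently returns the SHA-256 of zero bytes, which may be why the stream looked "kind of empty" rather than broken.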
Answer 1
Score: 0
You can read the stream in a loop and use md5.TransformBlock and sha256.TransformBlock to compute both hashes incrementally, chunk by chunk, from a single download.
- You don't need to pass an outputBuffer to that method; you can pass null.
- You need to call TransformFinalBlock before retrieving the hash.
- Consider passing a CancellationToken to the async calls.
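The points above can also be packaged as a reusable helper that works on any readable Stream, including one that cannot seek. This is a sketch, not the answerer's code: StreamHashing and ComputeMd5AndSha256Async are hypothetical names, the 80 KB buffer size is an arbitrary assumption, and .NET Core 2.1 or later is assumed for the Memory-based ReadAsync overload.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

public static class StreamHashing
{
    // Reads a (possibly non-seekable) stream exactly once and feeds every chunk
    // to both MD5 and SHA-256, so no second download is needed.
    // Usage (assuming blobClient is an Azure.Storage.Blobs BlobClient):
    //   using var download = (await blobClient.DownloadStreamingAsync()).Value;
    //   var (md5, sha256) = await StreamHashing.ComputeMd5AndSha256Async(download.Content, ct);
    public static async Task<(byte[] Md5, byte[] Sha256)> ComputeMd5AndSha256Async(
        Stream source, CancellationToken cancellationToken = default)
    {
        var buffer = new byte[81920]; // assumed size; any positive length works
        using var md5 = MD5.Create();
        using var sha256 = SHA256.Create();

        int bytesRead;
        while ((bytesRead = await source.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
        {
            // outputBuffer is null: only the running hash state is needed.
            md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            sha256.TransformBlock(buffer, 0, bytesRead, null, 0);
        }

        // TransformFinalBlock must run before the Hash property is read.
        md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        sha256.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        return (md5.Hash!, sha256.Hash!);
    }
}
```

Memory use stays at one buffer regardless of blob size, since each chunk is discarded after both algorithms have consumed it.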
var inputBuffer = new byte[4000];
// DownloadStreamingAsync returns Response<BlobDownloadStreamingResult>; the stream is .Value.Content
using var download = (await blobClient.DownloadStreamingAsync()).Value;
using var md5 = MD5.Create();
using var sha256 = SHA256.Create();
int bytesRead;
// Read the blob once and feed each chunk to both hash algorithms
while ((bytesRead = await download.Content.ReadAsync(inputBuffer.AsMemory())) > 0)
{
    md5.TransformBlock(inputBuffer, 0, bytesRead, null, 0);
    sha256.TransformBlock(inputBuffer, 0, bytesRead, null, 0);
}
md5.TransformFinalBlock(inputBuffer, 0, 0); // finish hash calculation
var md5Result = md5.Hash;
sha256.TransformFinalBlock(inputBuffer, 0, 0); // finish hash calculation
var sha256Result = sha256.Hash;
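A variant of the same single-pass idea swaps TransformBlock/TransformFinalBlock for System.Security.Cryptography.IncrementalHash, where AppendData adds a chunk and GetHashAndReset finishes the hash, so there is no separate finalization call to forget. This is an alternative technique, not what the answer uses; IncrementalStreamHashing and ComputeAsync are hypothetical names, and the buffer size is again an assumption.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

public static class IncrementalStreamHashing
{
    // Single pass over the stream using IncrementalHash for both algorithms.
    public static async Task<(byte[] Md5, byte[] Sha256)> ComputeAsync(
        Stream source, CancellationToken cancellationToken = default)
    {
        var buffer = new byte[81920]; // assumed size
        using var md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5);
        using var sha256 = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        int bytesRead;
        while ((bytesRead = await source.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
        {
            // Append the same chunk to both running hashes.
            md5.AppendData(buffer, 0, bytesRead);
            sha256.AppendData(buffer, 0, bytesRead);
        }

        // GetHashAndReset returns the final digest and readies the object for reuse.
        return (md5.GetHashAndReset(), sha256.GetHashAndReset());
    }
}
```

Either version reads the blob only once; choosing between them is mostly a matter of which API reads more clearly in your codebase.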