HDF5 files and plotting using chunks

Question
I'm new to HDF5 files and I don't understand how to access the chunks in a dataset.
I have a fairly large dataset (1536, 2048, 11, 18, 2) which is chunked as (768, 1024, 1, 1, 1), so each chunk holds half of one image.
I want to plot the dataset, showing the mean value of each (whole) image (using matplotlib).
Question: how do I access individual chunks and how do I work with them? (How does h5py use them?)
This is my code:
import h5py
import numpy as np

bla = np.random.randint(0, 100, (1536, 2048, 11, 18, 2))
with h5py.File('test.h5', 'w') as f:
    grp = f.create_group('Measurement 1')
    grp.create_dataset('data', data=bla, chunks=(768, 1024, 1, 1, 1))
# no explicit f.close() needed: the "with" block closes the file
I have the following code to access the dataset, but I don't know how to access the chunks:
with h5py.File('test.h5', 'r') as hf:
    for dset in hf['Measurement 1'].keys():
        print(dset)
    ds_hf = hf['Measurement 1']['data']      # returns an HDF5 dataset object
    print(ds_hf)
    print(ds_hf.shape, ds_hf.dtype)
    data_f = hf['Measurement 1']['data'][:]  # adding [:] returns a numpy array
# no explicit hf.close() needed: the "with" block closes the file
I need the program to open each chunk, compute its mean value, and close it again before opening the next one, so my RAM doesn't fill up.
Answer 1

Score: 1

Here is some sample code showing how chunks work in HDF5. I wrote it in a general way; you can modify it based on your requirements:
import h5py
import numpy as np

# Generate random data
bla = np.random.randint(0, 100, (1536, 2048, 11, 18, 2))

# Create the HDF5 file and dataset
with h5py.File('test.h5', 'w') as f:
    grp = f.create_group('Measurement 1')
    grp.create_dataset('data', data=bla, chunks=(768, 1024, 1, 1, 1))

# Open the HDF5 file
with h5py.File('test.h5', 'r') as hf:
    # Access the dataset
    ds_hf = hf['Measurement 1']['data']
    print(ds_hf)
    print(ds_hf.shape, ds_hf.dtype)

    # Iterate over the chunks: iter_chunks() yields one tuple of slices per
    # stored chunk, so only one chunk is read into memory at a time
    for chunk_slices in ds_hf.iter_chunks():
        chunk = ds_hf[chunk_slices]
        # Process the chunk
        chunk_mean = np.mean(chunk)
        print(f"Chunk {chunk_slices}: mean value = {chunk_mean}")
# no explicit hf.close() needed: the "with" block closes the file
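Since each chunk here holds exactly half of one image (and both halves are the same size), the per-image means can be accumulated chunk by chunk without ever holding more than one chunk in memory. A minimal sketch of that idea, using a smaller stand-in shape so it runs quickly (the file name `chunk_means.h5` and the shapes are illustrative, not from the question; `iter_chunks()` requires a recent h5py):

```python
import h5py
import numpy as np

# Stand-in layout, same idea as the question: axes 0-1 are the image pixels,
# axes 2-4 index the images, and each chunk holds half of one image.
rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(4, 6, 2, 3, 2))

with h5py.File('chunk_means.h5', 'w') as f:
    f.create_group('Measurement 1').create_dataset(
        'data', data=data, chunks=(2, 6, 1, 1, 1))

with h5py.File('chunk_means.h5', 'r') as hf:
    ds = hf['Measurement 1']['data']
    # Accumulate per-image pixel sums chunk by chunk;
    # only one chunk is in RAM at a time.
    sums = np.zeros(ds.shape[2:])
    for sl in ds.iter_chunks():   # sl is a tuple of slices covering one chunk
        chunk = ds[sl]
        sums[sl[2:]] += chunk.sum()
    # Each image has shape[0] * shape[1] pixels
    means = sums / (ds.shape[0] * ds.shape[1])

# Sanity check against the direct in-memory computation
assert np.allclose(means, data.mean(axis=(0, 1)))
print(means.shape)  # (2, 3, 2): one mean per image
```

The same accumulation works unchanged for the full-size dataset, because two equal-sized chunks per image together contribute exactly that image's pixel sum.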
Answer 2

Score: 1
Chunks are used to optimize I/O performance. HDF5 (and h5py) write/read data in chunked blocks (1 chunk at a time). This is handled in the background, and you do not have to worry about the chunking mechanism. The chunk size/shape is defined when you create the dataset and cannot be changed afterwards. If you need to change it, there are HDF5 utilities to do this (for example, h5repack can rewrite a dataset with a new chunk layout).
When reading data you don't have to worry about the chunk size (in general; see the comment at the end for more details). Use NumPy slice notation to read the desired slice, and h5py/HDF5 will do the chunked reads for you. You do not have to write your code to read exactly 1 chunk at a time.
Assuming axis 0 is the image index, the code below reads each image array into the image object (as a numpy array). It's much easier and cleaner than working with chunk objects.
with h5py.File('test.h5', 'r') as hf:
    ds_hf = hf['Measurement 1']['data']  # returns an HDF5 dataset object
    print(ds_hf.shape)
    for i in range(ds_hf.shape[0]):
        image = ds_hf[i]  # this returns a numpy array for image i
Although you don't have to worry about chunk size to read and write data, it's important to set an appropriate size for your use case. That discussion goes beyond your question; your chunk size is good for your application.
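For the dataset in the question specifically, axes 0-1 are the image pixels and the last three axes index the images, so one full image is the slice `ds_hf[:, :, i, j, k]` (which h5py assembles from that image's two chunks). A sketch of the per-image means with a smaller stand-in shape so it runs quickly (the file name `images.h5` and shapes are illustrative; the matplotlib call is left as a comment):

```python
import h5py
import numpy as np

# Stand-in data: axes 0-1 are pixels, the last three axes index the images
data = np.arange(4 * 6 * 2 * 3 * 2).reshape(4, 6, 2, 3, 2)
with h5py.File('images.h5', 'w') as f:
    f.create_group('Measurement 1').create_dataset(
        'data', data=data, chunks=(2, 6, 1, 1, 1))

means = []
with h5py.File('images.h5', 'r') as hf:
    ds_hf = hf['Measurement 1']['data']
    n0, n1, ni, nj, nk = ds_hf.shape
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                image = ds_hf[:, :, i, j, k]  # one full image as a numpy array
                means.append(image.mean())

print(len(means))  # one mean per image: 2 * 3 * 2 = 12
# To plot: import matplotlib.pyplot as plt; plt.plot(means); plt.show()
```

Only one image (two chunks) is in memory per iteration, which satisfies the RAM constraint in the question without touching the chunk machinery directly.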
Comments