HDF5 files and plotting using chunks

Question

I'm new to HDF5 files and I don't understand how to access chunks in a dataset.
I have quite a big dataset (1536, 2048, 11, 18, 2) which is chunked into (768, 1024, 1, 1, 1), so each chunk spans half of an image along each axis (a quarter of the image).
I want to plot the dataset, giving the mean values of each (whole) image (using matplotlib).

Question: how do I access separate chunks and how do I work with them? (How does h5py use them?)

This is my code:

import h5py
import numpy as np

bla = np.random.randint(0, 100, (1536, 2048, 11, 18, 2))

# The with block closes the file automatically on exit
with h5py.File('test.h5', 'w') as f:
    grp = f.create_group('Measurement 1')
    grp.create_dataset('data', data=bla, chunks=(768, 1024, 1, 1, 1))

I have this to get access to the dataset, but I don't know how to access the chunks.

with h5py.File('test.h5', 'r') as hf:
    for dset in hf['Measurement 1'].keys():
        print(dset)
        ds_hf = hf['Measurement 1']['data']  # returns HDF5 dataset object
        print(ds_hf)
        print(ds_hf.shape, ds_hf.dtype)
        data_f = hf['Measurement 1']['data'][:]  # adding [:] reads the whole dataset into a numpy array

I need the program to open each chunk, get the mean value and close it again before opening the next one, so my RAM doesn't get full.

Answer 1

Score: 1

Here is sample code that shows how chunks work in HDF5. I wrote it in a general way; you can modify it to suit your requirements:

import h5py
import numpy as np

# Generate random data
bla = np.random.randint(0, 100, (1536, 2048, 11, 18, 2))

# Create the HDF5 file and dataset
with h5py.File('test.h5', 'w') as f:
    grp = f.create_group('Measurement 1')
    grp.create_dataset('data', data=bla, chunks=(768, 1024, 1, 1, 1))

# Open the HDF5 file (the with block closes it automatically)
with h5py.File('test.h5', 'r') as hf:
    # Access the dataset
    ds_hf = hf['Measurement 1']['data']
    print(ds_hf)
    print(ds_hf.shape, ds_hf.dtype)

    # Iterate over the chunks: iter_chunks() (h5py 3.x) yields one tuple of slices per chunk
    for chunk_slices in ds_hf.iter_chunks():
        chunk = ds_hf[chunk_slices]  # reads only this chunk into memory
        # Process the chunk
        chunk_mean = np.mean(chunk)
        print(f"Chunk {chunk_slices}: Mean value = {chunk_mean}")
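
If you do read chunk by chunk, the per-chunk results can be combined into one mean per whole image. Below is a minimal sketch on a small stand-in dataset. It assumes h5py 3.x (for Dataset.iter_chunks) and that every chunk has length 1 along the trailing axes, as in the question. The file name, array sizes and variable names are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Small stand-in for the question's layout: (rows, cols, a, b, c),
# chunked so that every chunk lies inside a single image.
shape = (8, 12, 3, 2, 2)
chunks = (4, 6, 1, 1, 1)
data = np.random.default_rng(0).integers(0, 100, size=shape)

path = os.path.join(tempfile.mkdtemp(), "chunks_demo.h5")  # hypothetical file
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=data, chunks=chunks)

with h5py.File(path, "r") as f:
    ds = f["data"]
    sums = np.zeros(ds.shape[2:], dtype=np.float64)  # one running sum per image
    n_chunks = 0
    for sel in ds.iter_chunks():          # one tuple of slice objects per chunk
        block = ds[sel]                   # only this chunk is held in memory
        # Chunks have length 1 along axes 2-4, so the slice starts on those
        # axes identify the image this chunk belongs to.
        image_idx = tuple(s.start for s in sel[2:])
        sums[image_idx] += block.sum()
        n_chunks += 1
    # Every image has rows * cols pixels in total.
    chunk_means = sums / (ds.shape[0] * ds.shape[1])

print(n_chunks, chunk_means.shape)
```

Accumulating sums rather than averaging the chunk means keeps the result correct even if chunks at the edge of a dataset were smaller than the others.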

Answer 2

Score: 1

Chunks are used to optimize I/O performance. HDF5 (and h5py) reads and writes chunked datasets one chunk at a time. This is handled in the background, so you do not have to manage the chunking mechanism yourself. The chunk shape is defined when you create the dataset and cannot be changed afterwards. If you need a different chunk shape, the data has to be rewritten into a new dataset; HDF5 utilities such as h5repack can do this.

When reading data, you generally don't have to worry about the chunk size (see the note at the end for more details). Use NumPy slice notation to read the desired slice, and h5py/HDF5 will read the required chunks for you. YOU DO NOT HAVE TO WRITE YOUR CODE TO READ EXACTLY 1 CHUNK AT A TIME.
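
To illustrate that slices need not line up with chunks, here is a small sketch. The file name, dataset shape and chunk shape are invented for the demonstration. It reads a window that straddles chunk boundaries on every axis and compares it with the in-memory array.

```python
import os
import tempfile

import h5py
import numpy as np

shape = (8, 12, 3)
data = np.arange(np.prod(shape)).reshape(shape)

path = os.path.join(tempfile.mkdtemp(), "slice_demo.h5")  # hypothetical file
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=data, chunks=(4, 6, 1))

with h5py.File(path, "r") as f:
    ds = f["data"]
    chunk_shape = ds.chunks        # chunk shape chosen at creation time
    # This window crosses chunk boundaries on all three axes; HDF5 fetches
    # whichever chunks it needs and h5py returns a single NumPy array.
    window = ds[2:7, 3:10, 0:2]

print(chunk_shape, window.shape)
```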

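Assuming axis 0 is the image index, the code below reads each image into the image variable as a NumPy array. It's much easier and cleaner than working with the chunks directly. (For the dataset in the question, where axes 0 and 1 are the image rows and columns, index the trailing axes instead, for example ds_hf[:, :, i, j, k].)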

import h5py

with h5py.File('test.h5', 'r') as hf:
    ds_hf = hf['Measurement 1']['data']  # returns the HDF5 dataset object
    print(ds_hf.shape)
    for i in range(ds_hf.shape[0]):
        image = ds_hf[i]  # returns a NumPy array for image i

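Although you don't have to worry about chunk size to read and write data, it's important to set an appropriate size for your use. That discussion goes beyond your question. Your chunk shape is good for your application: each chunk lies within a single image, so reading one image never pulls in data from other images.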
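
For the layout in the question (image rows and columns on axes 0 and 1, image indices on the remaining axes), the per-image means could be computed one image at a time, roughly as sketched below. The helper names image_means and plot_means, the file name and the small demo shape are assumptions made for illustration, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def image_means(path, name="data"):
    """Mean of every image in a (rows, cols, ...) dataset, one image in RAM at a time."""
    with h5py.File(path, "r") as f:
        ds = f[name]
        means = np.empty(ds.shape[2:], dtype=np.float64)
        for idx in np.ndindex(*ds.shape[2:]):
            # Full rows and columns, fixed indices on the trailing axes.
            image = ds[(slice(None), slice(None)) + idx]
            means[idx] = image.mean()
    return means


def plot_means(means):
    """Plot the means against a flattened image number (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(means.ravel(), marker="o")
    ax.set_xlabel("image number (flattened index)")
    ax.set_ylabel("mean pixel value")
    plt.show()


# Small demo file standing in for the real measurement.
shape = (8, 12, 3, 2, 2)
data = np.random.default_rng(1).integers(0, 100, size=shape)
path = os.path.join(tempfile.mkdtemp(), "means_demo.h5")  # hypothetical file
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=data, chunks=(4, 6, 1, 1, 1))

means = image_means(path)
print(means.shape)
```

Calling plot_means(means) then draws one point per image. Only a single image is held in memory at any time, which is what keeps RAM usage low for the full-size dataset.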

huangapple
  • Published on July 13, 2023 at 15:56:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76677131.html