Question

I have an HDF5 file I am trying to open with Python or MATLAB. The HDF5 file consists of several datasets all in the root folder, which were saved to the file in some order. I want to extract the datasets in the order they were written. I know that the order they were written is encoded in the HDF5 file, because when I open it with HDFView there is an "Object Ref" number associated with each dataset. These Object Ref IDs are lower for datasets that were written earlier / higher for datasets that are written later.

When I open the file with Python (h5py package), the datasets are extracted in alphabetical order. I can't figure out any way to extract the Object Ref I see in HDFView to process in Python. Is there any way to extract the datasets in order in Python or MATLAB (or any other platform)?

This is the code I used in Python to get the datasets in alphabetical order:

import h5py

# `file` is the path to the HDF5 file
with h5py.File(file) as f:
    keys = f.keys()
    for k in keys: print(k)

I'm looking for a way to do something like this

with h5py.File(file) as f:
    keys = f.keys()
    object_refs = f.object_refs()
    indexes_in_sorted_order = object_refs.sorted_order() # pseudocode
    for i in indexes_in_sorted_order: print(keys[i])

Answer 1

Score: 1

@Homer512 is correct: h5py doesn't have an API to get that value. That said, you might be able to use the dataset's "offset" value (its byte address in the file). I did some limited testing with datasets that are NOT created in alphabetical order, and the offset values appear to increase in order of creation. To read the offset you have to use the low-level API, through the dataset's DatasetID object (the .id attribute).

Here is an example that creates 6 datasets that are not in alphabetical order. After creating them, it loops over the datasets, builds a dictionary of [name]:offset, then reorders the dictionary by value. Finally, it loops over the names in the sorted dictionary to get the datasets in offset order. (You could also create a sorted list of the dataset names if you're not interested in the offset value.)

Note: If you are going to do this frequently, I suggest adding creation time as a dataset attribute.

See code below:

import h5py
import numpy as np

ds_names = ['alpha', 'zebra', 'bravo', 'yankee', 'charlie', 'xray']
cnt = 1
# Create six small datasets in a deliberately non-alphabetical order.
with h5py.File('SO_75624797.h5', 'w') as h5f:
    for name in ds_names:
        h5f.create_dataset(name, data=np.arange(cnt, cnt + 10))
        cnt += 10

offset_dict = {}
with h5py.File('SO_75624797.h5') as h5f:
    # Iterating the file yields names alphabetically; record each byte offset.
    # Note: get_offset() returns None for datasets without a single offset
    # (e.g. chunked or compact storage), which would break the sort below.
    for dset in h5f:
        print(f"for dset: {dset}, Offset: {h5f[dset].id.get_offset()}")
        offset_dict[dset] = h5f[dset].id.get_offset()

    # Rebuild the dict sorted by offset (dicts preserve insertion order).
    offset_dict = dict(sorted(offset_dict.items(), key=lambda item: item[1]))

    print('')
    for dset in offset_dict:
        print(f"for dset: {dset}, Offset: {h5f[dset].id.get_offset()}")
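
If you control how the file is written, the note above about storing creation time as an attribute could look something like the sketch below. This is only an illustration: the attribute names `created_at` and `creation_index`, the helper function names, and the demo file name are all made up for this example. The index is stored as a tie-breaker in case two timestamps happen to be equal.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def write_with_creation_info(path, names):
    """Create one small dataset per name and tag it with creation metadata."""
    with h5py.File(path, "w") as h5f:
        for index, name in enumerate(names):
            dset = h5f.create_dataset(name, data=np.arange(10) + 10 * index)
            # Hypothetical attribute names, chosen for this sketch only.
            dset.attrs["created_at"] = np.int64(time.time_ns())
            dset.attrs["creation_index"] = np.int64(index)


def names_in_creation_order(path):
    """Return root-level dataset names sorted by the stored creation metadata."""
    with h5py.File(path, "r") as h5f:
        tagged = [
            (int(obj.attrs["created_at"]), int(obj.attrs["creation_index"]), name)
            for name, obj in h5f.items()
            if isinstance(obj, h5py.Dataset)
        ]
    # Sort by timestamp first, then by the index as a tie-breaker.
    return [name for _, _, name in sorted(tagged)]


ds_names = ["alpha", "zebra", "bravo", "yankee", "charlie", "xray"]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "creation_order_demo.h5")
    write_with_creation_info(demo_path, ds_names)
    ordered = names_in_creation_order(demo_path)

print(ordered)
```

Unlike the offset trick, this does not depend on how HDF5 lays out data in the file, but it only works for files you write yourself (or can modify afterwards).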
