Extracting Datasets from HDF5 File in Order Created
Question
I have an HDF5 file that I am trying to open with Python or MATLAB. The file consists of several datasets, all in the root group, which were saved to the file in some order. I want to extract the datasets in the order they were written. I know that this order is encoded in the HDF5 file, because when I open it with HDFView there is an "Object Ref" number associated with each dataset. These Object Ref IDs are lower for datasets that were written earlier and higher for datasets that were written later.
When I open the file with Python (h5py package), the datasets are returned in alphabetical order. I can't figure out any way to get the Object Ref I see in HDFView so that I can process it in Python. Is there any way to extract the datasets in creation order in Python or MATLAB (or any other platform)?
This is the code I used in Python to get the datasets in alphabetical order:
import h5py

# file: path to the HDF5 file
with h5py.File(file, "r") as f:
    for k in f.keys():
        print(k)
I'm looking for a way to do something like this:
with h5py.File(file, "r") as f:
    keys = list(f.keys())
    object_refs = f.object_refs()  # pseudocode
    indexes_in_sorted_order = object_refs.sorted_order()  # pseudocode
    for i in indexes_in_sorted_order:
        print(keys[i])
Answer 1
Score: 1
@Homer512 is correct: h5py doesn't have an API to get that value. That said, you might be able to use each dataset's "offset" value instead. I did some limited testing with datasets that are NOT created in alphabetical order, and the offset values appear to increase in order of creation. To get the offset you have to use the low-level API, through the dataset's DatasetID object (its .id attribute). Be aware that get_offset() returns None for datasets without a single contiguous offset (for example chunked or compact datasets), so this only works for contiguous storage.
Here is an example that creates 6 datasets that are not in alphabetical order. After creating them, it loops over the datasets, builds a dictionary of name: offset pairs, then reorders the dictionary by value. Finally, it loops over the names in the sorted dictionary to get the datasets in offset order. (You could also create a sorted list of the dataset names if you're not interested in the offset values.)
Note: If you are going to do this frequently, I suggest adding the creation time as a dataset attribute.
See code below:
import h5py
import numpy as np

ds_names = ['alpha', 'zebra', 'bravo', 'yankee', 'charlie', 'xray']
cnt = 1
# Create the datasets in non-alphabetical order.
with h5py.File('SO_75624797.h5', 'w') as h5f:
    for name in ds_names:
        h5f.create_dataset(name, data=np.arange(cnt, cnt+10))
        cnt += 10

offset_dict = {}
with h5py.File('SO_75624797.h5') as h5f:
    # Iterating over the file yields names alphabetically; record each offset.
    for dset in h5f:
        print(f"for dset: {dset}, Offset: {h5f[dset].id.get_offset()}")
        offset_dict[dset] = h5f[dset].id.get_offset()

    # Rebuild the dictionary sorted by offset value.
    offset_dict = {k: v for k, v in sorted(offset_dict.items(), key=lambda item: item[1])}
    print('')
    for dset in offset_dict:
        print(f"for dset: {dset}, Offset: {h5f[dset].id.get_offset()}")
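If you control the code that writes the file, the note about storing creation information as a dataset attribute could look roughly like the sketch below. It is only an illustration under a few assumptions: the attribute name creation_index and the temporary file path are made up for this example, and a simple counter is stored instead of a timestamp so that two datasets written in quick succession can never tie. A timestamp attribute would be read back and sorted the same way.

```python
import os
import tempfile

import h5py
import numpy as np

ds_names = ['alpha', 'zebra', 'bravo', 'yankee', 'charlie', 'xray']

# Hypothetical file location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'creation_order_demo.h5')

# Write: tag each dataset with its position in the creation sequence.
# 'creation_index' is an attribute name invented for this example.
with h5py.File(path, 'w') as h5f:
    for index, name in enumerate(ds_names):
        dset = h5f.create_dataset(name, data=np.arange(10))
        dset.attrs['creation_index'] = index

# Read: sort the dataset names by the stored attribute.
with h5py.File(path, 'r') as h5f:
    names_in_creation_order = sorted(
        h5f, key=lambda name: int(h5f[name].attrs['creation_index'])
    )

print(names_in_creation_order)
```

Unlike the offset approach, this does not depend on how the HDF5 library lays out data in the file, but it only helps for files written with the attribute in place.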