Pair files using Python

Question

I have a folder with several .tif files that I would like to pair to perform some functions inside a for loop.

For example:

smp001_GFP.tif

smp001_mCherry.tif
(this should be a pair)

smp002_GFP.tif

smp002_mCherry.tif
(this is another pair)

I would like the for loop to iterate over each pair and perform some functions. For example:

for pair in folder:
    img_GFP=cv2.imread(pair.__contains__("GFP"))
    img_mCherry=cv2.imread(pair.__contains__("mCherry"))

I've been told that I could pair the files using dictionaries, but which strategy would you recommend?

Thanks!

Answer 1

Score: 1

Some additional info or code would help, but as a general idea: create a dictionary, loop through your file names, and add a key for each numbered pair. Essentially:

import os
import cv2

folder = 'path/to/tifs'  # placeholder: the directory holding the .tif files

pairs_dict = {}
for file_name in os.listdir(folder):
    # Get the prefix for the pair,
    # assuming the filename format 'smp000_...'
    key = file_name.split('_')[0]  # grabs 'smpXXX'
    # Then create a key in our dictionary for it.
    pairs_dict[key] = []
...
for pair_prefix in pairs_dict:
    # cv2.imread is the loader from the question; swap in whatever
    # function your module has for grabbing files by name
    img_GFP = cv2.imread(os.path.join(folder, pair_prefix + '_GFP.tif'))
    img_mCherry = cv2.imread(os.path.join(folder, pair_prefix + '_mCherry.tif'))
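
Since the loop above only ever uses the dictionary keys, a set of prefixes would carry the same information. Here is a minimal sketch of that variation, assuming every name follows the '<prefix>_<channel>.tif' pattern from the question. The function name `pair_names` and its parameters are made up for illustration and are not part of the answer.

```python
def pair_names(file_names, channels=("GFP", "mCherry"), ext=".tif"):
    """Return one tuple of expected file names per sample prefix.

    Assumes names look like '<prefix>_<channel><ext>', as in the question.
    The names are only constructed, not checked for existence.
    """
    # A set is enough here: only the distinct prefixes are needed.
    prefixes = sorted(
        {name.split("_")[0] for name in file_names if name.endswith(ext)}
    )
    return [
        tuple(f"{prefix}_{channel}{ext}" for channel in channels)
        for prefix in prefixes
    ]


names = [
    "smp002_mCherry.tif",
    "smp001_GFP.tif",
    "smp001_mCherry.tif",
    "smp002_GFP.tif",
    "notes.txt",  # ignored: not a .tif file
]
for gfp_name, mcherry_name in pair_names(names):
    print(gfp_name, mcherry_name)
```

Inside that loop, each name could be joined with the folder path and handed to cv2.imread, as in the question. Because the sketch does not verify that both files of a pair exist, a missing file would only show up when it is read.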

Answer 2

Score: 1

Nested dicts would work well. The outer dict keys (001, 002, etc.) map to inner dicts that hold {"GFP": filename, "mCherry": filename} items. If you use defaultdict for the outer dict, it will automatically create the inner dicts on first access. Use a regular expression to extract the identifiers from each file name.

import re
from collections import defaultdict
import os

tif_name_re = re.compile(r"smp(\d+)_(GFP|mCherry)\.tif")
tif_map = defaultdict(dict)

for name in os.listdir("some/directory"):
    m = tif_name_re.match(name)
    if m:
        tif_map[m.group(1)][m.group(2)] = m.group(0)

for key, value in tif_map.items():
    print(key, value)

Output

001 {'GFP': 'smp001_GFP.tif', 'mCherry': 'smp001_mCherry.tif'}
002 {'GFP': 'smp002_GFP.tif', 'mCherry': 'smp002_mCherry.tif'}
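
One thing the nested dict makes easy is spotting samples where one channel is missing. The following is a rough sketch of that check, not part of the original answer. The helper names `build_tif_map` and `split_complete` are hypothetical, and it works on a plain list of names rather than os.listdir so it can be run without any files on disk.

```python
import re
from collections import defaultdict

TIF_NAME_RE = re.compile(r"smp(\d+)_(GFP|mCherry)\.tif")
CHANNELS = ("GFP", "mCherry")


def build_tif_map(names):
    """Group file names by sample number, then by channel."""
    tif_map = defaultdict(dict)
    for name in names:
        # fullmatch() rejects names with trailing text after '.tif'
        m = TIF_NAME_RE.fullmatch(name)
        if m:
            tif_map[m.group(1)][m.group(2)] = name
    return tif_map


def split_complete(tif_map):
    """Separate samples that have every channel from those missing one."""
    complete, incomplete = {}, {}
    for sample_id, files in tif_map.items():
        target = complete if all(c in files for c in CHANNELS) else incomplete
        target[sample_id] = files
    return complete, incomplete


names = ["smp001_GFP.tif", "smp001_mCherry.tif", "smp002_GFP.tif"]
complete, incomplete = split_complete(build_tif_map(names))
for sample_id, files in complete.items():
    print(sample_id, files["GFP"], files["mCherry"])
print("incomplete:", sorted(incomplete))
```

Looping over `complete` only means the processing code can index both channels without guarding against a KeyError, while `incomplete` can be logged or handled separately.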

Answer 3

Score: 0

Here's a different view. Let's assume that the GFP and mCherry parts of the filenames are irrelevant but that the common part is actually that which precedes the underscore.

If that's the case then:

from glob import glob
from os.path import basename, join

DIRECTORY = './tifs' # directory contains the tif files
result = dict()
 
for filename in sorted(map(basename, glob(join(DIRECTORY, '*.tif')))):
    key, _ = filename.split('_')
    result.setdefault(key, []).append(filename)

print(result)

Output:

{'smp001': ['smp001_GFP.tif', 'smp001_mCherry.tif'], 'smp002': ['smp002_GFP.tif', 'smp002_mCherry.tif']}

This gives us a dictionary keyed on the preamble (the part before the underscore), with the "pairs" as a list for each key.
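
To connect this back to the loop in the question, here is a small sketch that groups the names and then unpacks each pair. It is an illustration rather than the answerer's code: the function name `group_by_prefix` is invented, it takes a list of names instead of globbing a directory, and it uses str.partition so that a name with more than one underscore does not raise the ValueError that unpacking `filename.split('_')` into two variables would.

```python
def group_by_prefix(filenames):
    """Map the text before the first underscore to the sorted names sharing it."""
    result = {}
    for filename in sorted(filenames):
        # partition() splits on the first underscore only, so a name such as
        # 'smp001_GFP_v2.tif' (hypothetical) still yields the key 'smp001'.
        key, sep, _ = filename.partition("_")
        if sep:  # skip names that contain no underscore at all
            result.setdefault(key, []).append(filename)
    return result


names = [
    "smp002_mCherry.tif",
    "smp002_GFP.tif",
    "smp001_mCherry.tif",
    "smp001_GFP.tif",
]
for key, files in group_by_prefix(names).items():
    if len(files) == 2:
        # Names are sorted, and 'G' sorts before 'm', so GFP comes first.
        gfp_name, mcherry_name = files
        print(key, gfp_name, mcherry_name)
```

Relying on sort order to tell the two channels apart is fragile if other channel names are added later. In that case, keying the inner values by channel name is probably the safer design.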

huangapple
  • Posted on January 9, 2023, 01:45:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75050043.html