Python: re.findall result not appending to dictionary key

Question


So I'm trying to go through a text file, grab the timings using re.findall, and then append the resulting list to the dictionary key. Here are the contents of the file:

Download Time: 1.234 sec

Commit Time: 4.321 sec

Reset Time: 6.96969 sec
***
Download Time: 8.313412 sec

Commit Time: 4.20420 sec

Reset Time: 9.699234 sec
***
Download Time: 5.678 sec

Commit Time: 2.3151 sec

Reset Time: 9.325346 sec
***

Now my issue is that I'm getting an empty dictionary result after the process is done. Not entirely sure why this is happening. Here's my code:

import re

timings = {
    "download": [],
    "commit": [],
    "reset": []
}   

# List of key phrases to look for in the text file
pattern = [r"Download Time: ([\d.]+) sec", 
           r"Commit Time: ([\d.]+) sec", 
           r"Reset Time: ([\d.]+) sec"]

# Loop through each key phrase and put timing data into proper list
# This is not well optimized lol (nested for loops)
for i in range(len(pattern)):
    with open(file, 'r') as f:
        # Open file contents and search for pattern
        content = f.read()
        matches = re.findall(pattern[i], content)

        # From collected matching phrases, append timing data into proper lists
        for key in timings:
            timings.update({key: matches})

The problem most likely lies in the last 2 lines of code. However, I'm not exactly sure what it would be since .update is a valid way to append a list into a key.

Answer 1

Score: 2

The `for key in timings` loop replaces the value of every key in the dictionary with the current set of matches, so at the end they all contain the matches of the last pattern.

You should loop over the keys and the patterns in parallel, updating each key with the results of the corresponding pattern.

```python
with open(file, 'r') as f:
    content = f.read()

# `patterns` is the list named `pattern` in the question, renamed so it
# does not clash with the loop variable below.
for key, pattern in zip(timings, patterns):
    timings[key] = re.findall(pattern, content)
```
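To see both the overwriting behaviour described in this answer and the fix side by side, here is a minimal self-contained sketch. It assumes the log text is held in a string (`sample`, a shortened stand-in for the file in the question) instead of being read from disk, and it uses the name `patterns` for the question's `pattern` list; the `buggy` dictionary is a hypothetical name used only for this illustration.

```python
import re

# Shortened stand-in for the file contents shown in the question (an
# assumption for illustration; real data would come from open(file).read()).
sample = """Download Time: 1.234 sec
Commit Time: 4.321 sec
Reset Time: 6.96969 sec
***
Download Time: 8.313412 sec
Commit Time: 4.20420 sec
Reset Time: 9.699234 sec
***
"""

patterns = [r"Download Time: ([\d.]+) sec",
            r"Commit Time: ([\d.]+) sec",
            r"Reset Time: ([\d.]+) sec"]

# Question's approach: every key is overwritten on every pass, so all
# three keys end up holding the matches of the last pattern.
buggy = {"download": [], "commit": [], "reset": []}
for p in patterns:
    matches = re.findall(p, sample)
    for key in buggy:
        buggy.update({key: matches})

# Fix: pair each key with its own pattern. Dictionaries keep insertion
# order (Python 3.7+), so zip lines the keys up with the patterns.
timings = {"download": [], "commit": [], "reset": []}
for key, p in zip(timings, patterns):
    timings[key] = re.findall(p, sample)

print(buggy)
print(timings)
```

Note that `re.findall` returns the captured text as strings; converting to `float` would be a separate step if numeric values are needed.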



Answer 2

Score: 1

Here is another solution that doesn't change your code much:

```python
for key in pattern:
    with open(file, 'r') as f:
        # Open file contents and search for pattern
        content = f.read()
        matches = re.findall(key, content)

    # From collected matching phrases, append timing data into proper lists
    timings[key.split(" ")[0].lower()].extend(matches)
```

I changed the for loop at the beginning, as you can see, so we no longer use an index.
As for the last line: to get the name of the key, you split the regex pattern into substrings using " " (a space) as the delimiter, keep the first element, and lowercase it.

For example:

The string: "Download Time: ([\d.]+) sec"

will break into: ['Download', 'Time:', '([\\d.]+)', 'sec'] when split(" ") is applied.

If we keep the first element (index 0) and lowercase it, we get: download.
The same happens for "commit" and "reset".

You can also use .partition(" ") instead of split(" "). It splits the string at the first occurrence of the delimiter and returns three parts: the text before the delimiter, the delimiter itself, and the text after it. In our case, the delimiter is a space.

That said, I would recommend learning to write solutions like the one @Barmar provided. My solution is just an alternative if you would like to make smaller changes to the existing code.
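The difference between `split(" ")` and `partition(" ")` mentioned in this answer can be checked with a short sketch. The variable names (`parts`, `head`, `sep`, `tail`, and the two `key_from_*` names) are only illustrative and do not come from the answer.

```python
pattern = r"Download Time: ([\d.]+) sec"

# split(" ") cuts at every space; the first piece is the label.
parts = pattern.split(" ")
key_from_split = parts[0].lower()

# partition(" ") cuts only at the first space and always returns a
# 3-tuple: (text before, the separator itself, text after).
head, sep, tail = pattern.partition(" ")
key_from_partition = head.lower()

print(parts)
print((head, sep, tail))
print(key_from_split, key_from_partition)
```

Both approaches give the same key here. Either one relies on the dictionary keys matching the lowercased first word of each pattern, which holds for the three patterns in the question but is an assumption worth keeping in mind if the patterns change.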

Answer 3

Score: 1


A different approach, which requires named groups instead of ordinary groups, and finditer to iterate over the matches.

Named groups are declared with the syntax (?P<name>...).

The advantage is that the name of the group can be used as the key of the dictionary.

import re

# Read the whole file into a string (`file` is the path from the question)
with open(file, 'r') as f:
    text = f.read()

patterns = [r"Download Time: (?P<download>[\d.]+) sec",
            r"Commit Time: (?P<commit>[\d.]+) sec",
            r"Reset Time: (?P<reset>[\d.]+) sec"]

timings = {}
for p in patterns:
    for m in re.finditer(p, text):
        for key, v in m.groupdict().items():
            timings.setdefault(key, []).append(v)

print(timings)
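To make the two moving parts of this answer easier to follow, here is a small sketch of what `groupdict()` returns for each match and how `setdefault` builds the lists. The inline `text` is a shortened stand-in for the file contents (an assumption for illustration), only one of the three patterns is used, and `group_dicts` is a hypothetical helper name.

```python
import re

# Shortened stand-in for the file contents (an assumption for illustration).
text = """Download Time: 1.234 sec
Commit Time: 4.321 sec
***
Download Time: 5.678 sec
Commit Time: 2.3151 sec
"""

pattern = r"Download Time: (?P<download>[\d.]+) sec"

# Each match exposes its named groups as a dict: {group name: captured text}.
group_dicts = [m.groupdict() for m in re.finditer(pattern, text)]
print(group_dicts)

# setdefault creates the list the first time a key is seen and then returns
# it, so the dictionary does not need to be pre-populated with empty lists.
timings = {}
for d in group_dicts:
    for key, value in d.items():
        timings.setdefault(key, []).append(value)

print(timings)
```

Because the keys come from the group names, a key only appears in `timings` once its pattern has matched at least once.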

huangapple
  • Published on June 22, 2023, 15:56:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76529694.html