June 8, 2023 02:32:22


List of lists of dictionaries with the same keys but different values

Question


First off, let me preface this by asking you not to downvote the post. I've tried other posts with a 'minimal reproducible example' but it didn't work. The log is too complex. So far no one's been able to help.

I need to collect certain key/value pairs from logs from the antivirus. I’ve been able to collect all key/value pairs I need except for one, the action taken by the antivirus.
Everything revolves around the ‘indicators’ key, which contains a list of dicts with each containing a certain piece of information about the virus/malware found. Each of these dicts starts with an id number (1,2,3,4…) which varies depending on the virus/malware. While these dicts have the exact same structure (and same key names), their values differ. Take a gander at the excerpt below:

"indicators": [
            {
                "id": 1,
                "type": "detection_name",
                "field": "malName",
                "value": "HKTL_CAIN",
                "relatedEntities": [
                    "C888A5B2"
                ],
                "filterIds": [
                    "a665ee2c"
                ],
                "provenance": [
                    "Alert"
                ]
            },
            {
                "id": 2,
                "type": "file_sha1",
                "field": "fileHash",
                "value": "",
                "relatedEntities": [
                    "C888A5B2"
                ],
                "filterIds": [
                    "a665ee2c"
                ],
                "provenance": [
                    "Alert"
                ]
            },
            {
                "id": 3,
                "type": "filename",
                "field": "fileName",
                "value": "D:\\ECIH",
                "relatedEntities": [
                    "C888A5B2"
                ],
                "filterIds": [
                    "a665ee2c"
                ],
                "provenance": [
                    "Alert"
                ]
            },
            {
                "id": 4,
                "type": "fullpath",
                "field": "fullPath",
                "value": "D:\\ECIH",
                "relatedEntities": [
                    "C888A5B2"
                ],
                "filterIds": [
                    "a665ee2c"
                ],
                "provenance": [
                    "Alert"
                ]
            },
            {
                "id": 5,
                "type": "text",
                "field": "actResult",
                "value": "File cleaned",
                "relatedEntities": [
                    "C888A5B2"
                ],
                "filterIds": [
                    "a665ee2c"
                ],
                "provenance": [
                    "Alert"
                ]
            },
            {
                "id": 6,
                "type": "text",
                "field": "scanType",
                "value": "Scheduled Scan",
                "relatedEntities": [
                    "C888A5B2"
                ],
                "filterIds": [
                    "a665ee2c"
                ],
                "provenance": [
                    "Alert"
                ]
            }
        ]

Note that id 1 relates to the malware name. Id 2 is the hash value, id 3 is the file name, etc. What I'm interested in is id 5, which contains the action taken by the antivirus. There is a long list of possible actions, but for the sake of example, the actions are 'File cleaned' and 'File quarantined'. The action is always found in the key 'value', but the problem is that 'value' appears in every dict. I noticed that the 'value' I need (server action) is always paired with the 'actResult' value in the 'field' key, and the 'field' key also appears in every dict.

        {
            "id": 5,
            "type": "text",
            "field": "actResult",
            "value": "File cleaned",
            "relatedEntities": [
                "C888A5B2"
            ],
            "filterIds": [
                "a665ee2c"
            ],
            "provenance": [
                "Alert"
            ]
        }

Another issue is that these logs aren't always the same length, so some have ids 1 through 5 whereas others might have 11 ids. Nothing is consistent. Regardless, what I need to do is capture the values I need and put them into a dataframe, but given that the 'value' key appears in every dict, the script is faulty.

In the sample I provide, there are 10 records total, so I'll have 10 IDs. But since the info on the virus changes based on the virus type, so do the 'indicators'. Hence, where a record is missing, I replace it with a ' - '. But since the 'value' key appears everywhere, I end up with an uneven number of ID/Action.

Please refer to the log here (https://codeshare.io/j0yX1A). The script is below:

actions = ['File cleaned', 'File deleted', 'File quarantined']
actions_list = []
action_list = []

id = [id['id'] for id in logs]
print(id)

for log in logs:
    for indicator in log['indicators']:
        if indicator['value'] in actions:
            action_list.append(indicator['value'])
        else:
            action_list.append('-')
print(action_list)

Current output:

As you can see, the current script picks up all 'value' keys rather than just the ones whose values are in the actions list.
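
To make the mismatch concrete, here is a minimal sketch that runs the same loop on a hypothetical two-record sample. The record IDs and the trimmed indicator dicts are made up for illustration and are not taken from the linked log. Because the append sits inside the inner loop, the list grows by one entry per indicator rather than one per log.

```python
# Hypothetical sample shaped like the logs in the question (not real data).
logs = [
    {"id": "example_id_0", "indicators": [
        {"id": 1, "field": "malName", "value": "HKTL_CAIN"},
        {"id": 5, "field": "actResult", "value": "File cleaned"},
    ]},
    {"id": "example_id_1", "indicators": [
        {"id": 1, "field": "malName", "value": "HKTL_CAIN"},
    ]},
]
actions = ["File cleaned", "File deleted", "File quarantined"]

ids = [log["id"] for log in logs]

# Same logic as the script above: one append per indicator, not per log.
action_list = []
for log in logs:
    for indicator in log["indicators"]:
        if indicator["value"] in actions:
            action_list.append(indicator["value"])
        else:
            action_list.append("-")

print(len(ids), ids)
print(len(action_list), action_list)
```

With this sample there are two IDs but three entries in action_list, which is the uneven ID/action count described above.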

Expected Output

If there is no key/value, replace it with a ' - '.

['WB-13273-20230604-00000', 'WB-13273-20230603-00000', 'WB-13273-20230601-00001', 'WB-13273-20230601-00000', 'WB-13273-20230529-00000', 'WB-13273-20230526-00000', 'WB-13273-20230523-00001', 'WB-13273-20230523-00000', 'WB-13273-20230510-00002', 'WB-13273-20230510-00003']
[' - ',' - ','File cleaned', 'File cleaned', ' - ','File cleaned', ' - ','File quarantined', 'File quarantined', 'File quarantined']

Same number of IDs and actions.

So how can I collect the action value from the "field" key and replace the missing records with a ' - ' while ignoring the other 'value' keys that aren't needed?

Answer 1

Score: 1


It looks like you might be looking for the else clause of the for loop to account for the cases where there is no "actResult" indicator. I call this the "no break" clause, as the else is what runs when the for loop finishes without hitting a break.

Given your data:

import json

with open("log.json", "r") as file_in:
    log_data = json.load(file_in)

action_list = []
for log_entry in log_data:
    for indicator in log_entry["indicators"]:
        if indicator.get("field") == "actResult":
            action_list.append((log_entry["id"], indicator["value"]))
            break
    else:
        action_list.append((log_entry["id"], "--"))

for action in action_list:
    print(action)

Will give you back your list of 10:

('WB-13273-20230601-00001', '--')
('WB-13273-20230601-00000', '--')
('WB-13273-20230529-00000', '--')
('WB-13273-20230526-00000', 'File cleaned')
('WB-13273-20230523-00001', '--')
('WB-13273-20230523-00000', '--')
('WB-13273-20230510-00002', 'File quarantined')
('WB-13273-20230510-00003', 'File quarantined')
('WB-13273-20230510-00001', 'File quarantined')
('WB-13273-20230510-00000', 'File quarantined')
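
If the for/else behaviour is unfamiliar, it may help to see it in isolation. The following is only a sketch: the helper name first_act_result and the two tiny indicator lists are hypothetical, chosen so it runs without log.json. The else branch runs only when the loop ends without a break, which also covers an empty indicator list.

```python
def first_act_result(indicators, default="--"):
    """Return the value of the first actResult indicator, or default."""
    for indicator in indicators:
        if indicator.get("field") == "actResult":
            result = indicator["value"]
            break
    else:
        # Reached only when the loop finished without hitting break.
        result = default
    return result


# Hypothetical indicator lists, trimmed to the keys that matter here.
with_action = [
    {"id": 1, "field": "malName", "value": "HKTL_CAIN"},
    {"id": 5, "field": "actResult", "value": "File quarantined"},
]
without_action = [
    {"id": 1, "field": "malName", "value": "HKTL_CAIN"},
]

print(first_act_result(with_action))
print(first_act_result(without_action))
print(first_act_result([]))
```

Each call yields exactly one result per indicator list, so the number of results always matches the number of log entries.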

Answer 2

Score: -1


Since you're still a relatively new user and you've shown some amount of effort at asking your (somewhat incomplete) question, I'll give you the benefit of the doubt. I think what you're looking for is:

actions_to_track = ['File cleaned', 'File deleted', 'File quarantined']

def get_actresult(log):
    for ind in log['indicators']:
        if ind.get('field') == 'actResult':  # an action indicator
            if ind.get('value') in actions_to_track:
                return ind.get('value')

    return '-'

log_ids_to_actresult = {log['id']: get_actresult(log) for log in logs}
# => {'example_id_0': 'File cleaned'}

Running against logs from https://codeshare.io/j0yX1A, this produces:

{
  'WB-13273-20230510-00000': 'File quarantined',
  'WB-13273-20230510-00001': 'File quarantined',
  'WB-13273-20230510-00002': 'File quarantined',
  'WB-13273-20230510-00003': 'File quarantined',
  'WB-13273-20230523-00000': '-',
  'WB-13273-20230523-00001': '-',
  'WB-13273-20230526-00000': 'File cleaned',
  'WB-13273-20230529-00000': '-',
  'WB-13273-20230601-00000': '-',
  'WB-13273-20230601-00001': '-',
}

However, I'm not sure because your question is unclear and missing key information. My answer is based on the following guesses I've had to make that you should've clarified in your question:

You've got a list of log dictionaries, each with "id" and "indicators" keys
There is a variable number of indicators per log. They are all dicts of the same format
An "action indicator" will have a "field" of "actResult"
For each log, you want to extract the "value" from an action indicator if it exists and if the value is one of an explicit list of actions you care about, otherwise use "-"
The actResult values you care about are ['File cleaned', 'File deleted', 'File quarantined']

A complete but minimal-ish example of the logs and indicator data is:

logs = [
    {
        "id": "example_id_0",
        "indicators": [
            {
                "id": 1,
                "type": "detection_name",
                "field": "malName",
                "value": "HKTL_CAIN",
            },
            {
                "id": 5,
                "type": "text",
                "field": "actResult",
                "value": "File cleaned",
            },
        ],
    },
]
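
Under the guesses listed above, the two equal-length lists the question asks for could be built roughly as follows. This is a sketch, not a definitive implementation: it swaps the explicit loop for the built-in next() with a default, and the second record ("example_id_1") is a hypothetical entry added only to show the fallback.

```python
ACTIONS_TO_TRACK = {"File cleaned", "File deleted", "File quarantined"}


def get_actresult(log, default="-"):
    """Return the first tracked actResult value in a log, or default."""
    return next(
        (
            ind["value"]
            for ind in log.get("indicators", [])
            if ind.get("field") == "actResult"
            and ind.get("value") in ACTIONS_TO_TRACK
        ),
        default,
    )


# Hypothetical sample: one log with an action indicator, one without.
logs = [
    {"id": "example_id_0", "indicators": [
        {"id": 1, "field": "malName", "value": "HKTL_CAIN"},
        {"id": 5, "field": "actResult", "value": "File cleaned"},
    ]},
    {"id": "example_id_1", "indicators": [
        {"id": 1, "field": "malName", "value": "HKTL_CAIN"},
    ]},
]

# One entry per log in both lists, so their lengths always match.
ids = [log["id"] for log in logs]
actions = [get_actresult(log) for log in logs]

print(ids)
print(actions)
```

Since both lists are derived from the same iteration over logs, they line up by position and could be used as two columns of a dataframe.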

It was good that you included your attempt code. It would've helped if you included annotations explaining what you were trying to do (even if you don't know how) and omitted extraneous information (e.g. about antiviruses or something).

Don't simply dismiss all the users giving you critique as "pedantic". People want to answer your question, but they can only do so if you include relevant information and exclude irrelevant information. Please take the above as an example of how you can ask a better question next time.
