February 18, 2023, 20:43:54

PHP preg_match_all extract id and name, where id in tag is optional

Question

I have the following code:

<?php
$html = '<div>
    <div class="block">
        <div class="id">10</div>
        <div class="name">first element</div>
    </div>
    <div class="block">
        <div class="name">second element</div>
    </div>
    <div class="block">
        <div class="id">30</div>
        <div class="name">third element</div>
    </div>
</div>';

preg_match_all('/<div class="block">[\s]+<div class="id">(.*?)<\/div>[\s]+<div class="name">(.*?)<\/div>[\s]+<\/div>/ms', $html, $matches);

print_r($matches);

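I want to get an array of id and name pairs, but the second block has no id, so my preg_match_all pattern skips it entirely. How can I build the array without skipping that block, producing something like [ ... [id => 0 // or null, name => 'second element'] ...]?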
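For reference, the skipping happens because the pattern requires the id div in every block. A minimal sketch of one way to tolerate its absence, assuming the markup stays exactly as regular as in the sample, is to wrap the id part in an optional non-capturing group. The variable names $pattern and $rows are illustrative only, and the pattern remains brittle against any change in attribute order, nesting, or whitespace:

```php
<?php
$html = '<div>
    <div class="block">
        <div class="id">10</div>
        <div class="name">first element</div>
    </div>
    <div class="block">
        <div class="name">second element</div>
    </div>
    <div class="block">
        <div class="id">30</div>
        <div class="name">third element</div>
    </div>
</div>';

// (?: ... )? makes the whole id div optional; group 1 stays empty when it is absent.
$pattern = '/<div class="block">\s+'
         . '(?:<div class="id">(.*?)<\/div>\s+)?'
         . '<div class="name">(.*?)<\/div>\s+<\/div>/s';

// PREG_SET_ORDER yields one array per matched block instead of one array per group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    $rows[] = [
        // An unmatched optional group is reported as an empty string here.
        'id'   => $m[1] !== '' ? $m[1] : null,
        'name' => $m[2],
    ];
}

print_r($rows);
```

With the sample input this should produce three rows, the second one carrying a null id. Note that a block whose id div is present but empty would also map to null under this check.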

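Answer 1

Score: 1

Use DOMDocument for this task; there are many good reasons not to parse HTML with regular expressions.

Assuming your HTML is stored in the $html variable, create a DOMDocument instance, load the HTML, and initialize a DOMXPath object: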

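$dom = new DOMDocument();
// Collect libxml parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
// LIBXML_NOBLANKS asks libxml to drop blank nodes while parsing.
$dom->loadHTML($html, LIBXML_NOBLANKS);
// Only affects serialized output; not needed for the queries below.
$dom->formatOutput = true;
$xpath = new DOMXPath($dom);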
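Because libxml_use_internal_errors() silences parser warnings, it can be useful to look at what was collected. The following is a minimal sketch, not part of the original answer, that restores the previous error setting afterwards; the variable names $previous, $loaded, and $errors and the shortened sample markup are illustrative assumptions:

```php
<?php
// Shortened sample markup, assumed for illustration only.
$html = '<div><div class="block"><div class="name">second element</div></div></div>';

// Switch to internal error collection and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Fetch whatever the parser complained about, then reset the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the setting that was active before this snippet ran.
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with line and message properties.
    echo sprintf("line %d: %s\n", $error->line, trim($error->message));
}
```

Clearing the buffer matters in long-running scripts, since collected errors otherwise accumulate across successive loadHTML() calls.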

Use DOMXPath to find all <div> nodes with the class "name", and prepare an empty array for the results:

$nodes = $xpath->query('//div[@class="name"]');
$result = array();

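For each node found, run an additional query against its parent node to look for the optional sibling with the class "id", then add a record to the results array: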

foreach ($nodes as $node) {
    $id = $xpath->query('div[@class="id"]', $node->parentNode);
    
    $result[] = array(
        'id' => $id->count() ? $id->item(0)->nodeValue : null,
        'name' => $node->nodeValue
    );
}

print_r($result);

This is the result:

Array
(
    [0] => Array
        (
            [id] => 10
            [name] => first element
        )

    [1] => Array
        (
            [id] => 
            [name] => second element
        )

    [2] => Array
        (
            [id] => 30
            [name] => third element
        )

)
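The same idea can also be sketched by iterating over the "block" containers themselves, so that a block with neither child still yields a row. This is a variation under stated assumptions, not the answer's code: extractBlocks() and classTest() are hypothetical helper names, and the class test uses the common XPath 1.0 idiom for matching one class among several, which the exact-match @class="name" comparison would miss:

```php
<?php
// Hypothetical helper: XPath 1.0 predicate matching a single class token.
function classTest(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Hypothetical helper: returns one ['id' => ..., 'name' => ...] row per block.
function extractBlocks(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];

    foreach ($xpath->query('//div[' . classTest('block') . ']') as $block) {
        // Relative queries: only direct child divs of the current block.
        $id   = $xpath->query('./div[' . classTest('id') . ']', $block)->item(0);
        $name = $xpath->query('./div[' . classTest('name') . ']', $block)->item(0);

        // item(0) returns null when the node list is empty.
        $rows[] = [
            'id'   => $id   ? trim($id->textContent)   : null,
            'name' => $name ? trim($name->textContent) : null,
        ];
    }

    return $rows;
}

// Usage with a shortened sample; the second block has no id div.
$sample = '<div>
    <div class="block"><div class="id">10</div><div class="name">first element</div></div>
    <div class="block extra"><div class="name">second element</div></div>
</div>';

print_r(extractBlocks($sample));
```

Iterating by block keeps each id tied to its own container, at the cost of two small queries per block.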
