Parse formatted string with labels in ALL-CAPS followed by their value to generate an associative array


Question

$string = 'Audi MODEL 80 ENGINE 1.9 TDi';
list($make,$model,$engine) = preg_split('/( MODEL | ENGINE )/',$string);

Anything before "MODEL" would be considered "MAKE string".
Anything before "ENGINE" will be considered "MODEL string".
Anything after "ENGINE" is the "ENGINE string".

But we usually have more information in this string.

//  possible variations:
$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto';    

$string = 'Audi MODEL 80 ENGINE 1.9 TDi GEAR man YEAR 1996';

$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 DRIVE 2wd';

MODEL and ENGINE are always present, and they always come first, right after the make.

The rest (POWER, TORQUE, GEAR, DRIVE, YEAR, NOTE) may vary, both in their order and in whether they are present at all.

Since we can't know how the ENGINE string ends, or which of the other keywords comes right after it, I thought it would be possible to create an array of the keywords and then search the string for the first occurrence of any word that matches one of them.

I do need to keep the matched word.

Another way of putting this might be: "How can I split the string on/before each occurrence of a word from an array?"

Answer 1

Score: 1


To keep the "bits" intact with the keyword included, you can use preg_split with a lookahead that will split on a space followed by any one of your keywords. For example:

$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

$bits = preg_split('~\s+(?=(MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE)\b)~', $string);

Results in:

array(8) {
	[0] => string(4) "Audi"
	[1] => string(8) "MODEL 80"
	[2] => string(14) "ENGINE 1.9 TDi"
	[3] => string(10) "POWER 90Hk"
	[4] => string(12) "TORQUE 202Nm"
	[5] => string(8) "GEAR man"
	[6] => string(9) "DRIVE 2wd"
	[7] => string(9) "YEAR 1996"
}

If you want to parse these into key/value pairs, it's simple:

// Initialize array; get the "unnamed" make:
$data = [
	'MAKE' => array_shift($bits),
];

// Iterate any other known keys found:
foreach($bits as $bit) {
	$pair = explode(' ', $bit, 2);
	$data[$pair[0]] = $pair[1];
}

Results in:

array(8) {
	["MAKE"] => string(4) "Audi"
	["MODEL"] => string(2) "80"
	["ENGINE"] => string(7) "1.9 TDi"
	["POWER"] => string(4) "90Hk"
	["TORQUE"] => string(5) "202Nm"
	["GEAR"] => string(3) "man"
	["DRIVE"] => string(3) "2wd"
	["YEAR"] => string(4) "1996"
}
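
Since the question asks about keeping the keywords in an array, the two steps above can be combined into one helper that builds the pattern from that array. This is only a sketch: the function name parseLabelledString, its parameters, and the 'MAKE' default are illustrative assumptions, not part of the answer, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not from the answer): name, signature and the
// 'MAKE' default are illustrative assumptions.
function parseLabelledString(string $string, array $keywords, string $firstKey = 'MAKE'): array
{
    // Escape each keyword so it is safe inside the pattern, then join with pipes.
    $alternation = implode('|', array_map(
        fn(string $k): string => preg_quote($k, '~'),
        $keywords
    ));

    // Split on whitespace that is immediately followed by a whole keyword.
    $bits = preg_split('~\s+(?=(?:' . $alternation . ')\b)~', trim($string));

    // The leading, unlabelled chunk is stored under the first key.
    $data = [$firstKey => array_shift($bits)];

    foreach ($bits as $bit) {
        // A label with nothing after it gets an empty value instead of a warning.
        [$label, $value] = array_pad(preg_split('~\s+~', $bit, 2), 2, '');
        $data[$label] = trim($value);
    }

    return $data;
}

$keywords = ['MODEL', 'ENGINE', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR', 'NOTE'];
$result = parseLabelledString(
    'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto',
    $keywords
);
var_export($result);
```

Because the match is case-sensitive, the lowercase word "engine" inside the NOTE value is not mistaken for the ENGINE label.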

Answer 2

Score: 1


If you'd like to have a dynamic associative array:

  1. Prepend MAKE to the string
  2. Use preg_match_all() to capture pairs of labels and values in the formatted string
  3. Use array_column() to restructure the columns of matches into an associative array.

Code: (Demo)

$strings = [
    'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996',
    'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto',
    'Audi MODEL 80 ENGINE 1.9 TDi GEAR man YEAR 1996',
    'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 DRIVE 2wd'
];

foreach ($strings as $string) {
    preg_match_all('/\b([A-Z]+)\s+(\S+(?:\s+\S+)*?)(?=$|\s+[A-Z]+\b)/', 'MAKE ' . $string, $m, PREG_SET_ORDER);
    var_export(array_column($m, 2, 1));
    echo "\n---\n";
}

Output:

array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'POWER' => '90Hk',
  'TORQUE' => '202Nm',
  'GEAR' => 'man',
  'DRIVE' => '2wd',
  'YEAR' => '1996',
)
---
array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'YEAR' => '1996',
  'NOTE' => 'this engine needs custom stage',
  'GEAR' => 'auto',
)
---
array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'GEAR' => 'man',
  'YEAR' => '1996',
)
---
array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'YEAR' => '1996',
  'DRIVE' => '2wd',
)
---

This is not a new concept or technique. The only adjustment to make is how the keys/labels are identified in the original string. Instead of [A-Z]+, you may wish to name each label explicitly and separate the labels in the pattern with pipes.
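
A minimal sketch of that explicit-label variant follows. The label list is taken from the question, and the input string is a hypothetical one whose engine value ends in the all-caps word "TDI"; with [A-Z]+ that word would be read as a label, whereas the explicit list keeps it inside the ENGINE value.

```php
<?php
// Assumption: this label list mirrors the keywords named in the question.
$labels = ['MAKE', 'MODEL', 'ENGINE', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR', 'NOTE'];

// Escape each label for use inside a '/'-delimited pattern and join with pipes.
$alternation = implode('|', array_map(
    fn(string $label): string => preg_quote($label, '/'),
    $labels
));

// Same shape as the pattern above, with [A-Z]+ replaced by the explicit labels.
$pattern = '/\b(' . $alternation . ')\s+(\S+(?:\s+\S+)*?)(?=$|\s+(?:' . $alternation . ')\b)/';

// Hypothetical input: the all-caps value "TDI" is not in the original examples.
$string = 'Audi MODEL 80 ENGINE 1.9 TDI GEAR man';

preg_match_all($pattern, 'MAKE ' . $string, $m, PREG_SET_ORDER);

// Column 1 holds the labels (keys), column 2 the values.
$parsed = array_column($m, 2, 1);
var_export($parsed);
```

The trade-off is that every label has to be listed up front, so a new label in the data will be absorbed into the preceding value until the list is updated.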


Alternatively, instead of using a regex to parse the string, you could manipulate the string into a standardized format that a native PHP function can parse. (Demo)

foreach ($strings as $string) {
    var_export(
        parse_ini_string(
            preg_replace(
                '~\s*\b(MAKE|MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE)\s+~',
                "\n$1=",
                'MAKE ' . $string
            )
        )
    );
    echo "\n---\n";
}
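
One thing to watch with this approach: in its default scanner mode, parse_ini_string() interprets unquoted values such as no, off, yes or true as booleans and returns "" or "1" for them. The sketch below uses a hypothetical input (a GEAR value of "no") to compare the default mode with INI_SCANNER_RAW, which leaves values as written. Values containing characters that are special in INI syntax may still need extra care; that is not covered here.

```php
<?php
// Hypothetical input: "no" is a keyword for the default INI scanner.
$string = 'Audi MODEL 80 ENGINE 1.9 TDi GEAR no';

// Same transformation as above: put each label on its own "LABEL=value" line.
$ini = preg_replace(
    '~\s*\b(MAKE|MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE)\s+~',
    "\n$1=",
    'MAKE ' . $string
);

$normal = parse_ini_string($ini);                         // default INI_SCANNER_NORMAL
$raw    = parse_ini_string($ini, false, INI_SCANNER_RAW); // values are not interpreted

var_export($normal['GEAR']);
echo "\n";
var_export($raw['GEAR']);
```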

Answer 3

Score: 0


If you'd prefer a non-regex method, you could also break the string into individual tokens (words) and build an array from them. The code below assumes the tokens are separated by single spaces; if that is a problem, the whitespace could be normalized with a replace first.

// The first group to assign un-prefixed items to
$firstGroup = 'MAKE';

// Every possible word grouping
$wordList = ['ENGINE', 'MODEL', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR', 'NOTE'];

// Test string
$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

// Key/value of group name and values
$groups = [];

// Default to the first group
$currentWord = $firstGroup;
foreach (explode(' ', $string) as $word) {

    // Found a special word, reset and continue the hunt
    if (in_array($word, $wordList)) {
        $currentWord = $word;
        continue;
    }

    // Assign. The subsequent foreach loop could be removed by doing string concatenation here instead
    $groups[$currentWord][] = $word;
}

// Optional, join each back into a string
foreach ($groups as $key => $values) {
    $groups[$key] = implode(' ', $values);
}

var_dump($groups);

Outputs:

array(8) {
  ["MAKE"]=>
  string(4) "Audi"
  ["MODEL"]=>
  string(2) "80"
  ["ENGINE"]=>
  string(7) "1.9 TDi"
  ["POWER"]=>
  string(4) "90Hk"
  ["TORQUE"]=>
  string(5) "202Nm"
  ["GEAR"]=>
  string(3) "man"
  ["DRIVE"]=>
  string(3) "2wd"
  ["YEAR"]=>
  string(4) "1996"
}

Demo: https://3v4l.org/D4pvl
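
As a sketch of the two tweaks mentioned above (tolerating irregular whitespace and concatenating in the first loop), the variant below tokenizes with strtok(), which skips runs of delimiter characters, so it stays regex-free. The input string with extra spaces and a tab is a hypothetical example, and 'NOTE' is included in the word list on the assumption that it should be treated like the other labels.

```php
<?php
$firstGroup = 'MAKE';
$wordList = ['MODEL', 'ENGINE', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR', 'NOTE'];

// Hypothetical input with padding, double spaces and a tab.
$string = "  Audi  MODEL 80\tENGINE 1.9   TDi YEAR 1996 ";

$groups = [];
$currentWord = $firstGroup;
$delimiters = " \t\r\n";

// strtok() treats consecutive delimiters as one, so no empty tokens appear.
for ($word = strtok($string, $delimiters); $word !== false; $word = strtok($delimiters)) {
    if (in_array($word, $wordList, true)) {
        $currentWord = $word;
        $groups[$currentWord] = ''; // register the label even if no value follows
        continue;
    }

    // Concatenate directly, so the second loop with implode() is not needed.
    $groups[$currentWord] = isset($groups[$currentWord]) && $groups[$currentWord] !== ''
        ? $groups[$currentWord] . ' ' . $word
        : $word;
}

var_export($groups);
```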

huangapple
  • Posted on February 14, 2023 at 22:26:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75449236.html