Convert a string that contains a substring to a dictionary

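# Question

I am trying to convert strings in a particular format into Python dictionaries.
The string format looks like this:

```python
st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'
```

I want to parse it and convert it into the following dictionary:

```python
dict1 = {
    "key1": None,
    "key2": "value2",
    "key3": {
        "key3.1": None,
        "key3.2": "value3.2",
        "key3.3": "value3.3",
        "key3.4": None,
    },
    "key4": None,
}
```

I tried Python's re package and the string split function, but could not get the desired result. I have thousands of strings in the same format and I am trying to automate the processing. Could someone help?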



# Answer 1
**Score**: 0

If all your strings are consistent and have only one level of sub-dictionary, the code below should do the trick. You may need to tweak it for your data.

```python
import json

st1 = 'key1 key2=item2 key3="key3.1, key3.2=item3.2 , key3.3 = item3.3, key3.4" key4'
# Normalise spacing around "=" and "," so that top-level tokens contain no spaces.
st1 = st1.replace(' = ', '=')
st1 = st1.replace(' ,', ',')
new_dict = {}

st1 = st1.strip()
while st1:
    # Take the next space-delimited token from the front of the string.
    item = st1.split(" ")[0]

    if '="' in item:
        # Quoted value: everything up to the closing quote becomes a sub-dictionary.
        key = item.split('=')[0]
        sub_items = st1.split('"')[1]
        new_dict[key] = {}

        for sub_item in sub_items.split(','):
            if "=" in sub_item:
                sub_key, sub_value = sub_item.split('=', 1)
                new_dict[key][sub_key.strip()] = sub_value.strip()
            else:
                new_dict[key][sub_item.strip()] = None

        # Consume the key, the "=", both quotes and the quoted text.
        st1 = st1[len(key) + len(sub_items) + 3:]
    elif '=' in item:
        key, value = item.split('=', 1)
        new_dict[key] = value
        st1 = st1[len(item):]
    else:
        new_dict[item] = None
        st1 = st1[len(item):]

    # Slicing from the front (instead of str.replace) removes only the token
    # just handled, so the loop always makes progress and terminates.
    st1 = st1.lstrip()

print(json.dumps(new_dict, indent=4))
```
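
Since the question mentions the `re` package and thousands of input strings, the same idea can also be sketched as a reusable function. This is only an illustration under some assumptions: keys and bare values contain no whitespace, commas, or double quotes, and quotes are balanced. The names `TOKEN` and `parse` are made up for this sketch and do not come from either answer.

```python
import re

# One token: a key, optionally followed by "=" and either a double-quoted
# group or a bare value. Whitespace around "=" is tolerated.
TOKEN = re.compile(r'''
    (?P<key>[^\s=,"]+)            # the key
    (?:\s*=\s*                    # optional "=" part
        (?:"(?P<quoted>[^"]*)"    # a quoted group
          |(?P<value>[^\s,"]+)    # or a bare value
        )
    )?
''', re.VERBOSE)


def parse(text):
    """Parse 'k k=v k="a, b=c"' style text into a (possibly nested) dict."""
    result = {}
    for match in TOKEN.finditer(text):
        key = match.group("key")
        quoted = match.group("quoted")
        if quoted is not None:
            # The quoted group uses the same syntax, so parse it recursively.
            result[key] = parse(quoted)
        else:
            # group("value") is None when the key has no "=" part.
            result[key] = match.group("value")
    return result


st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'
print(parse(st1))
```

To process many strings, something like `[parse(s) for s in strings]` should be enough. Inputs that break the assumptions above (for example an unclosed quote) are not rejected and may produce unexpected keys, so real data would likely need extra validation.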
# Answer 2

**Score**: 0

Consider using a parsing library such as lark. A simple example for your case:

```python
from lark import Lark

_grammar = r'''
    ?start: value
    
    ?value: object
           | NON_SEPARATOR_STRING?

    object : "\"" [pair (_SEPARATOR pair)*] "\""
    pair : NON_SEPARATOR_STRING [_PAIRTOR] value

    
    NON_SEPARATOR_STRING: /[a-zA-Z0-9\.]+/
    _SEPARATOR: /[,  ]+/
            | ","
    _PAIRTOR: " = "
            | "="
'''

parser = Lark(_grammar)

st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'

# Wrap the whole string in double quotes so the top level parses as an `object`.
tree = parser.parse(f'"{st1}"')
print(tree.pretty())

"""
object
  pair
    key1
    value
  pair
    key2
    value2
  pair
    key3
    object
      pair
        key3.1
        value
      pair
        key3.2
        value3.2
      pair
        key3.3
        value3.3
      pair
        key3.4
        value
  pair
    key4
    value

"""
```

You can then write your own Transformer to convert this tree into the data type you want.

huangapple
  • Published on February 16, 2023 at 11:15:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75467500.html