Convert a string that contains substrings to a dictionary

Question

I am trying to convert strings in a particular format to Python dictionaries.

The string format looks like this:

```python
st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'
```

I want to parse it and convert it to a dictionary like the one below:

```python
dict1 = {
    "key1": None,
    "key2": "value2",
    "key3": {
        "key3.1": None,
        "key3.2": "value3.2",
        "key3.3": "value3.3",
        "key3.4": None,
    },
    "key4": None,
}
```

I tried using Python's re package and the string split function, but could not get the result I want. I have thousands of strings in the same format and am trying to automate the conversion. Could someone help?
Answer 1

Score: 0

If all your strings are consistent and have only one layer of sub-dictionary, the code below should do the trick. You may need to tweak it for your data.

```python
import json

st1 = 'key1 key2=item2 key3="key3.1, key3.2=item3.2 , key3.3 = item3.3, key3.4" key4'

# Normalize the spacing around "=" and "," so that splitting on spaces is reliable.
st1 = st1.replace(' = ', '=')
st1 = st1.replace(' ,', ',')

new_dict = {}
no_keys = False
while not no_keys:
    st1 = st1.lstrip()
    # The next token runs up to the first space (or to the end of the string).
    if " " in st1:
        item = st1.split(" ")[0]
    else:
        item = st1
    if '=' in item:
        if '="' in item:
            # key="..." introduces a quoted sub-dictionary.
            item = item.split('=')[0]
            new_dict[item] = {}
            st1 = st1[len(item) + 1:]       # drop the leading 'key='
            sub_items = st1.split('"')[1]   # text between the two quotes
            sub_values = sub_items.split(',')
            for sub_item in sub_values:
                if "=" in sub_item:
                    sub_key, sub_value = sub_item.split('=', 1)
                    new_dict[item].update({sub_key.strip(): sub_value.strip()})
                else:
                    new_dict[item].update({sub_item.strip(): None})
            st1 = st1[len(sub_items) + 2:]  # drop the quoted block and its quotes
        else:
            # Plain key=value pair.
            key, value = item.split('=', 1)
            new_dict.update({key: value})
            st1 = st1[len(item):]
    else:
        # Bare key without a value.
        new_dict.update({item: None})
        st1 = st1[len(item):]
    if st1.strip() == "":
        no_keys = True
print(json.dumps(new_dict, indent=4))
```
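For comparison, the same token-by-token idea can be sketched with a single regular expression. This is only an illustration under a few assumptions: keys consist of word characters and dots, quoted groups never contain a double quote themselves, and every quoted group should become a sub-dictionary. The names `TOKEN` and `parse_pairs` are made up for this sketch.

```python
import re

# Hypothetical pattern for this format: a key made of word characters or dots,
# optionally followed by "=" and either a double-quoted group or a bare value.
TOKEN = re.compile(r'([\w.]+)(?:\s*=\s*(?:"([^"]*)"|([^\s,"]+)))?')


def parse_pairs(text):
    # Build a dict from space- or comma-separated "key" / "key=value" items.
    result = {}
    for match in TOKEN.finditer(text):
        key, quoted, bare = match.groups()
        if quoted is not None:
            # A quoted group holds its own list of items, so parse it recursively.
            result[key] = parse_pairs(quoted)
        else:
            # bare is None when the key has no "=value" part.
            result[key] = bare
    return result


st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'
dict1 = parse_pairs(st1)
print(dict1)
```

Because the parsing lives in a function, a list of thousands of strings can be handled with a comprehension such as `[parse_pairs(s) for s in strings]`, where `strings` stands for your own data.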

Answer 2

Score: 0

Consider using a parsing tool such as lark. A simple example for your case:

```python
from lark import Lark

_grammar = r'''
?start: value
?value: object
      | NON_SEPARATOR_STRING?
object : "\"" [pair (_SEPARATOR pair)*] "\""
pair : NON_SEPARATOR_STRING [_PAIRTOR] value
NON_SEPARATOR_STRING: /[a-zA-Z0-9\.]+/
_SEPARATOR: /[, ]+/
          | ","
_PAIRTOR: " = "
        | "="
'''
parser = Lark(_grammar)
st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'
# Wrap the whole string in quotes so that the top level is parsed as an object too.
tree = parser.parse(f'"{st1}"')
print(tree.pretty())
"""
object
  pair
    key1
    value
  pair
    key2
    value2
  pair
    key3
    object
      pair
        key3.1
        value
      pair
        key3.2
        value3.2
      pair
        key3.3
        value3.3
      pair
        key3.4
        value
  pair
    key4
    value
"""
```

Then you can write your own Transformer to transform this tree into your desired data type.

huangapple
  • Posted on 2023-02-16 11:15:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75467500.html