Extract value from a string based on certain key value pairs

Question

I am pulling comment data from JIRA, and it arrives in the format below.

comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]

comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]

comment text is: [{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]

comment text is: [{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]

I would like to extract every value stored under the key 'text' and, when a row contains several such values, concatenate them into one string. The expected output is given below.

Expected output:

In conversation with the customer @Kev Text 123

@xyz  Text abc

@Hey FYI

Output: New Text goes here

Answer 1

Score: 1

Assuming your JSON values can be very complex and differ widely from one another, you can use a positive lookbehind to locate the strings you need:

(?<='text': ['\"]).*?(?=['\"][},])

Regex Explanation:

  • (?<='text': ['\"]): positive lookbehind requiring 'text': followed by a space and either a single or a double quote
  • .*?: any number of characters, matched lazily
  • (?=['\"][},]): positive lookahead requiring a single or double quote followed by either a comma or a closing brace.
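
To see the three parts of the pattern working in isolation, here is a minimal sketch that applies it to a single string with the standard re module. The names sample and pieces are placeholders chosen for illustration and are not part of the original answer.

```python
import re

# One comment string in the same shape as the JIRA data above.
sample = "[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]"

# Lookbehind: the match must be preceded by 'text': and an opening quote.
# Lazy body:  take as few characters as possible.
# Lookahead:  the match must be followed by a closing quote and then , or }.
pattern = r'''(?<='text': ['\"]).*?(?=['\"][},])'''

pieces = re.findall(pattern, sample)
print(pieces)           # ['@Hey', ' FYI']
print(''.join(pieces))  # @Hey FYI
```

Note that 'type': 'text' is not matched, because there the word 'text' is followed by a comma rather than a colon.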

Your Python code would then look like this:

import pandas as pd

# Sample JIRA comments, one string per row
comments_df = pd.DataFrame([
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]''',
    '''[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]'''
], columns=['string'])

# Match whatever sits between the quotes that follow 'text':
pattern = r'''(?<='text': ['\"]).*?(?=['\"][},])'''
# Find all matches in each row and join them into a single string
print(comments_df['string'].str.findall(pattern).str.join(''))

Output:

0    In conversation with the customer @Kev Text 123
1                                     @xyz  Text abc
2                                           @Hey FYI
3                         Output: New Text goes here
Name: string, dtype: object

Check the Regex demo and the Python demo.
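
The regex treats each row as plain text, so it can cut a value short if the text itself contains a quote followed by a comma or a closing brace. As an alternative sketch, which is not part of the original answer and assumes every row is a valid Python literal (as the samples above are), the string can be parsed with ast.literal_eval and walked recursively. The helper name collect_text is hypothetical.

```python
import ast

def collect_text(node):
    """Return every string stored under a 'text' key, in the order encountered."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'text' and isinstance(value, str):
                found.append(value)
            else:
                # Descend into nested dicts/lists such as 'content' or 'attrs'.
                found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found

row = '''[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]'''

parsed = ast.literal_eval(row)  # safely evaluates literals only, no arbitrary code
result = ''.join(collect_text(parsed))
print(result)  # Output: New Text goes here
```

With pandas, the same helper could be applied per row, for example comments_df['string'].apply(lambda s: ''.join(collect_text(ast.literal_eval(s)))). This is slower than the vectorised regex but does not depend on how the quotes happen to be laid out.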

huangapple
  • Posted on June 1, 2023, 01:46:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76376097.html