Extract value from a string based on certain key value pairs
Question
I am pulling data from JIRA that comes in the format below.
comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]
comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz Text abc'}]}]
comment text is: [{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]
comment text is: [{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]
I would like to extract every value stored under the text key and, where a row contains several such values, concatenate them into a single string.
Expected output:
In conversation with the customer @Kev Text 123
@xyz Text abc
@Hey FYI
Output: New Text goes here
Answer 1
Score: 1
Assuming your JSON values can be complex and differ widely from one another, you can use a positive lookbehind to locate the strings you need:
(?<='text': ['\"]).*?(?=['\"][},])
Regex Explanation:
(?<='text': ['\"]) : positive lookbehind matching 'text': and either a single or double quote
.*? : any amount of characters, matched in a lazy fashion
(?=['\"][},]) : positive lookahead requiring a single or double quote, followed by either a comma or a closing brace
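To see how the three parts of the pattern behave, here is a minimal sketch that applies it to one sample row with the standard `re` module instead of pandas. The names `PATTERN`, `sample`, and `matches` are illustrative only and are not part of the original answer.

```python
import re

# Same pattern as above; escaping the double quote is optional inside a regex.
PATTERN = re.compile(r'''(?<='text': ['"]).*?(?=['"][},])''')

# One sample row from the question, held as a plain string.
sample = """[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]"""

# findall returns only the text between lookbehind and lookahead.
matches = PATTERN.findall(sample)
print(matches)           # ['@Hey', ' FYI']
print("".join(matches))  # @Hey FYI
```

Note that `'type': 'text'` is not picked up, because the lookbehind requires `'text'` to be followed by a colon, that is, to appear as a key.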
The Python code would look like this:
import pandas as pd
import re
comments_df = pd.DataFrame([
'''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]''',
'''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz Text abc'}]}]''',
'''[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]''',
'''[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]'''
], columns=['string'])
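# The lookbehind anchors on the 'text' key; the lazy match stops at the closing quote before a comma or brace.
pattern = r'''(?<='text': ['\"]).*?(?=['\"][},])'''
# findall returns a list of matches per row; join concatenates them into one string.
print(comments_df['string'].str.findall(pattern).str.join(''))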
Output:
0 In conversation with the customer @Kev Text 123
1 @xyz Text abc
2 @Hey FYI
3 Output: New Text goes here
Name: string, dtype: object
Check the Regex demo and the Python demo.
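As a possible alternative to the regex, and only under the assumption that every cell is a well-formed Python literal (balanced quotes and brackets, as in the sample rows), the string can be parsed with `ast.literal_eval` and walked recursively. This is a different technique from the one in the answer above; the helper names `collect_text` and `extract_text` are hypothetical.

```python
import ast


def collect_text(node):
    """Recursively gather every string stored under a 'text' key, in order."""
    if isinstance(node, dict):
        parts = []
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                parts.append(value)
            else:
                parts.extend(collect_text(value))
        return parts
    if isinstance(node, list):
        return [part for item in node for part in collect_text(item)]
    return []  # plain strings, numbers, etc. carry no 'text' key


def extract_text(raw):
    """Parse one cell (assumed to be a valid Python literal) and join its text values."""
    return "".join(collect_text(ast.literal_eval(raw)))


row_mention = """[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]"""
row_breaks = """[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]"""

print(extract_text(row_mention))  # In conversation with the customer @Kev Text 123
print(extract_text(row_breaks))   # Output: New Text goes here
```

With the DataFrame from the answer, this could be applied as `comments_df['string'].apply(extract_text)`. Unlike the regex, it does not depend on which characters follow the closing quote, but it raises an exception on malformed cells, so the regex may still be the better fit for messy input.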