June 1, 2023 01:46:28


Extract value from a string based on certain key value pairs

Question

I am pulling comment data from JIRA, and it arrives in the format below.

comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]
comment text is: [{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]
comment text is: [{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]
comment text is: [{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]

I would like to extract every value stored under the key text and, when a row contains more than one such value, concatenate them. The expected output is given below.

Expected output:

In conversation with the customer @Kev Text 123
@xyz  Text abc
@Hey FYI
Output: New Text goes here

Answer 1

Score: 1

Assuming your JSON values can be very complex and differ widely from one another, you can use a positive lookbehind to locate the strings you need:

(?<='text': ['\"]).*?(?=['\"][},])
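To see the lookarounds at work outside pandas, here is a minimal sketch that applies the same pattern to one sample row from the question with the standard `re.findall`. The variable names `row` and `matches` are only illustrative. Because the key and the quotes sit inside lookarounds, they are not part of any match.

```python
import re

# Same pattern as above; the lookbehind and lookahead are zero-width,
# so only the characters between the quotes end up in each match.
pattern = r'''(?<='text': ['"]).*?(?=['"][},])'''

# Third sample row from the question
row = "[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]"

matches = re.findall(pattern, row)
print(matches)           # ['@Hey', ' FYI']
print(''.join(matches))  # @Hey FYI
```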

Regex Explanation:

(?<='text': ['\"]): positive lookbehind; the match must be preceded by 'text': and a space, followed by a single or double quote
.*?: any number of characters, matched lazily (as few as possible)
(?=['\"][},]): positive lookahead; the match must be followed by a single or double quote and then either a comma or a closing brace
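Two consequences of this explanation can be checked with the standard `re` module. First, Python's `re` only accepts fixed-width lookbehinds, which is why the pattern spells out exactly one space after the colon; a flexible `\s*` is rejected. Second, the lazy `.*?` stops at the first quote that is followed by a comma or brace, so a text value that itself contains such a sequence gets cut short. The `tricky` row below is a made-up example, not real JIRA data.

```python
import re

# 1) A variable-width lookbehind (\s* after the colon) is rejected by re.
try:
    re.compile(r'''(?<='text':\s*['"]).*?(?=['"][},])''')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False

# 2) The lazy match ends at the first quote followed by ',' or '}'.
pattern = r'''(?<='text': ['"]).*?(?=['"][},])'''
tricky = """[{'type': 'text', 'text': "x', y"}]"""  # hypothetical text containing ',
print(re.findall(pattern, tricky))  # ['x'] rather than the full value
```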

Your Python code would then look like this:

import pandas as pd

# One raw JIRA comment string per row
comments_df = pd.DataFrame([
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'text', 'text': '@xyz  Text abc'}]}]''',
    '''[{'type': 'paragraph', 'content': [{'type': 'mention', 'attrs': {'id': '3445343', 'text': '@Hey', 'accessLevel': ''}}, {'type': 'text', 'text': ' FYI'}]}]''',
    '''[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]'''
], columns=['string'])

# Capture what sits between "'text': " plus an opening quote and the next quote followed by ',' or '}'
pattern = r'''(?<='text': ['\"]).*?(?=['\"][},])'''

# str.findall collects every match per row; str.join('') concatenates them into one string
print(comments_df['string'].str.findall(pattern).str.join(''))

Output:

0    In conversation with the customer @Kev Text 123
1                                     @xyz  Text abc
2                                           @Hey FYI
3                         Output: New Text goes here
Name: string, dtype: object

Check the Regex demo and the Python demo.
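If every row is a Python-literal repr like the samples (an assumption: the strings are single-quoted, so `json.loads` would reject them), parsing the rows is an alternative to the regex that is not confused by quotes inside the text. This is a minimal sketch, not part of the answer above; `collect_text` and `extract_text` are hypothetical helper names. Like the regex, it concatenates every value stored under a `text` key, including mention labels, and skips `hardBreak` nodes because they carry no text.

```python
import ast


def collect_text(node):
    """Recursively yield every string stored under a 'text' key, in document order."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == 'text' and isinstance(value, str):
                yield value
            else:
                yield from collect_text(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_text(item)


def extract_text(raw):
    """Parse one comment string as a Python literal and concatenate its 'text' values."""
    # ast.literal_eval only evaluates literals (lists, dicts, strings, ...),
    # so it is safe to use on untrusted input, unlike eval.
    return ''.join(collect_text(ast.literal_eval(raw)))


rows = [
    "[{'type': 'paragraph', 'content': [{'type': 'text', 'text': 'In conversation with the customer '}, {'type': 'mention', 'attrs': {'id': '04445152', 'text': '@Kev', 'accessLevel': ''}}, {'type': 'text', 'text': ' Text 123'}]}]",
    """[{'content': [{'text': 'Output: ', 'type': 'text'}, {'type': 'hardBreak'}, {'type': 'hardBreak'}, {'text': "New Text goes here", 'type': 'text'}], 'type': 'paragraph'}]""",
    """[{'type': 'text', 'text': "x', y"}]""",  # hypothetical value that trips up the regex
]
for raw in rows:
    print(extract_text(raw))
```

With pandas, `comments_df['string'].map(extract_text)` would apply the same function row by row. A malformed row raises `ValueError` or `SyntaxError` from `ast.literal_eval`, which may be preferable to silently wrong regex output.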
