July 28, 2023, 05:17:49

python pandas: Generate (three) cells from one cell

Question


I have a simple dataframe consisting of some metadata in a few columns and then a column with a sentence in it. I would like to use textacy's SVO extractor to generate three new columns, one each for the subject, verb, and object. I am trying to do this in as pandas a way as possible:

metadata   sentence
1-0	       Thank you so much, Chris.
1-1	       And it's truly a great honor to be here.
1-2	       I have been blown away by this conference.
1-3	       And I say that sincerely.

To which I tried this:

def svo(text):
    svotriple = textacy.extract.triples.subject_verb_object_triples(nlp(text))
    for item in svotriple:
        df['subject'] = str(item[0][-1])
        df['verb']    = str(item[1][-1])
        df['object']  = str(item[2])
df.apply(svo(df['sentence'].values[0]))

I've tried to get just the sentence as a string out of the sentence column a couple of ways. Most of them returned the fact that I was actually getting a series. I want this to work row-by-row. My impulse was to go with a for loop, but I really want to try to do this the pandas way. (Not that my for loops were working terribly well.)

Answer 1

Score: 1


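The way you use apply is incorrect. You call svo(...) once on the first sentence and hand its return value (None) to df.apply, instead of passing the function itself so that apply can call it for every row. Inside svo you also assign directly to columns of the existing DataFrame on each iteration, which sets the whole column at once and overwrites the previous values. Have svo return the subject, verb, and object for a single sentence, and build the three new columns from the results.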

Try it this way:

import pandas as pd
import spacy
import textacy

nlp = spacy.load('en_core_web_sm')

def svo(text):
    # Return [subject, verb, object] for the first SVO triple found in the
    # text, or three Nones when the sentence yields no triple, so that every
    # row produces exactly three values.
    svotriples = textacy.extract.triples.subject_verb_object_triples(nlp(text))
    for item in svotriples:
        subject = str(item[0][-1])  # last token of the subject
        verb = str(item[1][-1])     # last token of the verb
        obj = str(item[2])          # string form of the whole object
        return [subject, verb, obj]
    return [None, None, None]

data = {
    'sentence': [
        'Thank you so much, Chris.',
        "And it's truly a great honor to be here.",
        'I have been blown away by this conference.',
        'And I say that sincerely.'
    ]
}
df = pd.DataFrame(data)
# Each three-item list is expanded into three columns by apply(pd.Series).
df[['subject', 'verb', 'object']] = df['sentence'].apply(svo).apply(pd.Series)
print(df)
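
If it helps to see the row-by-row pattern on its own, here is a minimal sketch that runs without spaCy or textacy. Note that toy_svo is a hypothetical stand-in, not a real extractor: it simply takes the first three words of a sentence. The sample sentences are made up for illustration as well. The point is only the shape of the code: one function call per sentence, three values back per call, and a new frame built on the same index.

```python
import pandas as pd

def toy_svo(text):
    # Hypothetical stand-in for a real SVO extractor (an assumption for this
    # sketch): treat the first three whitespace-separated words as subject,
    # verb, and object, and return three Nones for shorter sentences.
    words = text.rstrip('.').split()
    if len(words) < 3:
        return [None, None, None]
    return words[:3]

df = pd.DataFrame({'sentence': ['I like tea.', 'Thanks.', 'She reads books daily.']})

# Call the function once per sentence, then build a three-column frame that
# shares df's index so the rows line up when joined back.
triples = pd.DataFrame(
    df['sentence'].apply(toy_svo).tolist(),
    index=df.index,
    columns=['subject', 'verb', 'object'],
)
df = df.join(triples)
print(df)
```

Building the frame from a list of lists is one alternative to apply(pd.Series), which tends to be slower on large frames because it constructs a Series for every row. Swapping toy_svo for a textacy-based function should leave the rest unchanged, as long as that function always returns exactly three values.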
