June 22, 2023 00:53:54

Change Pandas column based on values in two other columns

Question

I've tried a number of different solutions to this but can't find one that works as needed.

I have a pandas DataFrame as follows:

component   ...   rdfs:range
-----------------------------
class             NaN
property          xsd:int
instance          NaN
property          obj:ObjectName
property          xsd:string
property          obj:Object2Name
instance          NaN
class             NaN

What I need as an output is the following:

component ... subcomponent ... rdfs:range
-----------------------------------------------
class         NaN              NaN
property      data property    xsd:int
instance      NaN              NaN
property      object property  obj:ObjectName
property      data property    xsd:string
property      object property  obj:Object2Name
instance      NaN              NaN
class         NaN              NaN

I have a dictionary that enumerates all of the rdfs:range values that are data properties, but the object properties are too numerous to catalogue manually. Unlike in the example above, the prefixes of the data properties and object properties also vary, so string matching is out of the question.

Ideal behavior is:

1. Keep "ontology subcomponent" as NaN for classes and instances.
2. Set "ontology subcomponent" to whatever the dictionary gives for the value in the rdfs:range column.
3. Set "ontology subcomponent" to "object property" for all properties not found in the dictionary in step 2.

I have figured out how to achieve steps 1 and 2 separately, but they keep overwriting one another. For example, sample_df["ontology subcomponent"] = sample_df["rdfs:range"].map(sample_dict) solves step 2, but every solution to steps 1 and 3 that I've found either overwrites that result or fails to keep NaN in the right places for classes and instances.

Any help or pointing in the right direction would be extremely helpful!

Answer 1

Score: 1

I would just fillna all the subcomponents with "object property" right after the mapping, then mask the rows whose component is "class" or "instance":

sub_comp = (
    sample_df["rdfs:range"].map(sample_dict)
        .fillna("object property")  # 1st chain: values not in the dictionary become "object property"
        .mask(sample_df["component"].isin(["class", "instance"]))  # 2nd chain: class/instance rows go back to NaN
)

# Place the new column right after "component" (position 1).
sample_df.insert(1, "subcomponent", sub_comp)

Output:

print(sample_df)

  component     subcomponent       rdfs:range
0     class              NaN              NaN
1  property    data property          xsd:int
2  instance              NaN              NaN
3  property  object property   obj:ObjectName
4  property    data property       xsd:string
5  property  object property  obj:Object2Name
6  instance              NaN              NaN
7     class              NaN              NaN
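
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same fillna-then-mask chain. The frame is rebuilt from the table in the question, and the contents of sample_dict are an assumption, since the question never shows the actual dictionary.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question.
sample_df = pd.DataFrame(
    {
        "component": [
            "class", "property", "instance", "property",
            "property", "property", "instance", "class",
        ],
        "rdfs:range": [
            np.nan, "xsd:int", np.nan, "obj:ObjectName",
            "xsd:string", "obj:Object2Name", np.nan, np.nan,
        ],
    }
)

# ASSUMPTION: hypothetical dictionary contents; the real one is not shown.
sample_dict = {"xsd:int": "data property", "xsd:string": "data property"}

sub_comp = (
    sample_df["rdfs:range"]
    .map(sample_dict)           # step 2: look up known data properties
    .fillna("object property")  # step 3: everything unmapped becomes "object property"
    .mask(sample_df["component"].isin(["class", "instance"]))  # step 1: NaN for classes/instances
)

# Insert the result as the second column.
sample_df.insert(1, "subcomponent", sub_comp)
print(sample_df)
```

The order of the chain matters: fillna also fills the class and instance rows, so the mask has to come last to set them back to NaN.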
