Change Pandas column based on values in two other columns

Question

I've tried a number of different solutions to this but can't find one that works as needed.

So basically I have a pandas dataframe as follows:

component   ...   rdfs:range
-----------------------------
class             NaN
property          xsd:int
instance          NaN
property          obj:ObjectName
property          xsd:string
property          obj:Object2Name
instance          NaN
class             NaN

What I need as an output is the following:

component ... subcomponent ... rdfs:range
-----------------------------------------------
class         NaN              NaN
property      data property    xsd:int
instance      NaN              NaN
property      object property  obj:ObjectName
property      data property    xsd:string
property      object property  obj:Object2Name
instance      NaN              NaN
class         NaN              NaN

I have a dictionary that enumerates all of the rdfs:range values that are data properties, but the number of object properties is too large to catalogue manually. Also, unlike in the example above, the prefixes of both the data properties and the object properties vary, so string matching is out of the question.

Ideal behavior is:

  1. Keep "ontology subcomponent" as NaN for classes and instances.
  2. Turn "ontology subcomponent" to whatever is in the dictionary based on the rdfs:range column.
  3. Turn "ontology subcomponent" to "object property" for all properties not in the dictionary in step 2.
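
The three rules above can be written out as a plain Python function, which may help when checking a vectorized solution against the intended behavior. This is only a sketch: the function name classify_subcomponent and the contents of data_property_ranges are assumptions made for illustration and do not come from the question.

```python
import math

# Hypothetical dictionary: rdfs:range values known to be data properties.
data_property_ranges = {
    "xsd:int": "data property",
    "xsd:string": "data property",
}


def classify_subcomponent(component, rdfs_range):
    """Apply the three rules to a single row and return the subcomponent."""
    # Rule 1: classes and instances stay NaN.
    if component in ("class", "instance"):
        return math.nan
    # Rule 2: use the dictionary value when the range is listed there.
    # Rule 3: otherwise fall back to "object property".
    return data_property_ranges.get(rdfs_range, "object property")


rows = [
    ("class", math.nan),
    ("property", "xsd:int"),
    ("property", "obj:ObjectName"),
]
results = [classify_subcomponent(c, r) for c, r in rows]
```

A row-wise function like this is slow on large frames, so it is probably best treated as a reference for the expected result rather than as the solution itself.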

I have figured out how to achieve steps 1 and 2 separately, but they keep overwriting one another. For example, sample_df["ontology subcomponent"] = sample_df["rdfs:range"].map(sample_dict) solves step 2, but every solution to steps 1 and 3 that I've found either overwrites that result or doesn't keep NaN in the right places for classes and instances.
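
To make the difficulty concrete, the sketch below rebuilds the sample frame and runs only the map step. The contents of sample_dict are assumed here, since the question does not show them. After mapping, the unmapped properties and the class/instance rows are all NaN, so the new column alone cannot tell them apart and the component column is still needed.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the table in the question.
sample_df = pd.DataFrame({
    "component": ["class", "property", "instance", "property",
                  "property", "property", "instance", "class"],
    "rdfs:range": [np.nan, "xsd:int", np.nan, "obj:ObjectName",
                   "xsd:string", "obj:Object2Name", np.nan, np.nan],
})

# Assumed dictionary contents (not shown in the question).
sample_dict = {"xsd:int": "data property", "xsd:string": "data property"}

# Step 2 only: values missing from the dictionary become NaN.
mapped = sample_df["rdfs:range"].map(sample_dict)

# Components of the rows that are NaN after mapping.
nan_components = sample_df.loc[mapped.isna(), "component"].tolist()
```

Here nan_components mixes "property" with "class" and "instance", which is why a plain fillna applied afterwards would also fill the rows that are supposed to stay NaN.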

Any help or pointing in the right direction would be extremely helpful!

Answer 1

Score: 1

I would call fillna("object property") right after the mapping, so that every range missing from the dictionary becomes an object property, and then use mask to set the rows whose component is "class" or "instance" back to NaN:

sub_comp = (
    sample_df["rdfs:range"].map(sample_dict)  # step 2: look up data properties
        .fillna("object property")  # step 3: anything not in the dictionary
        .mask(sample_df["component"].isin(["class", "instance"]))  # step 1: back to NaN
)

sample_df.insert(1, "subcomponent", sub_comp)

Output:

print(sample_df)

  component     subcomponent       rdfs:range
0     class              NaN              NaN
1  property    data property          xsd:int
2  instance              NaN              NaN
3  property  object property   obj:ObjectName
4  property    data property       xsd:string
5  property  object property  obj:Object2Name
6  instance              NaN              NaN
7     class              NaN              NaN
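
A closely related variant, sketched below, keeps values only where the component is "property" by using where instead of mask. This is an alternative formulation, not the answer's code. The frame and dictionary contents are reconstructed or assumed from the question, and it also assumes that "property" is the only component that should ever receive a subcomponent.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the table in the question.
sample_df = pd.DataFrame({
    "component": ["class", "property", "instance", "property",
                  "property", "property", "instance", "class"],
    "rdfs:range": [np.nan, "xsd:int", np.nan, "obj:ObjectName",
                   "xsd:string", "obj:Object2Name", np.nan, np.nan],
})

# Assumed dictionary contents (not shown in the question).
sample_dict = {"xsd:int": "data property", "xsd:string": "data property"}

# Boolean Series: True only for property rows.
is_property = sample_df["component"].eq("property")

sub_comp = (
    sample_df["rdfs:range"].map(sample_dict)  # dictionary lookup
        .fillna("object property")            # default for unmapped ranges
        .where(is_property)                   # keep property rows, NaN elsewhere
)

# Place the new column directly after "component".
sample_df.insert(1, "subcomponent", sub_comp)
```

The difference from mask with isin(["class", "instance"]) only shows up if other component values exist: where(is_property) leaves them as NaN, while the mask version would label them "object property" or "data property".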

Posted by huangapple on June 22, 2023 at 00:53:54
Source link: https://go.coder-hub.com/76525547.html