Change Pandas column based on values in two other columns
Question
I've tried a number of different solutions to this but can't find one that works as needed.
So basically I have a pandas dataframe as follows:
component ... rdfs:range
-----------------------------
class NaN
property xsd:int
instance NaN
property obj:ObjectName
property xsd:string
property obj:Object2Name
instance NaN
class NaN
What I need as an output is the following:
component ... subcomponent ... rdfs:range
-----------------------------------------------
class NaN NaN
property data property xsd:int
instance NaN NaN
property object property obj:ObjectName
property data property xsd:string
property object property obj:Object2Name
instance NaN NaN
class NaN NaN
I have a dictionary that enumerates all of the rdfs:range values that are data properties, but the number of object properties is too large to catalogue manually. In addition, unlike in the example above, the data properties and object properties in my real data do not have consistent prefixes, so string matching is out of the question.
Ideal behavior is:
1. Keep "ontology subcomponent" as NaN for classes and instances.
2. Turn "ontology subcomponent" to whatever is in the dictionary based on the rdfs:range column.
3. Turn "ontology subcomponent" to "object property" for all properties not in the dictionary in step 2.
I have figured out how to achieve steps 1 and 2 separately, but they keep overwriting one another. For example, sample_df["ontology subcomponent"] = sample_df["rdfs:range"].map(sample_dict) solves step 2, but any solution to steps 1 and 3 that I've found either overwrites this result or doesn't retain NaN in the correct spot for classes and instances.
Any help or pointing in the right direction would be extremely helpful!
Answer 1
Score: 1
I would just fillna all the subcomponents with "object property" right after the mapping, then mask those whose component isin "class" or "instance":
sub_comp = (
sample_df["rdfs:range"].map(sample_dict)
.fillna("object property") # <- 1st chain added
.mask(sample_df["component"].isin(["class", "instance"])) # <- 2nd chain
)
sample_df.insert(1, "subcomponent", sub_comp)
Output:
print(sample_df)
component subcomponent rdfs:range
0 class NaN NaN
1 property data property xsd:int
2 instance NaN NaN
3 property object property obj:ObjectName
4 property data property xsd:string
5 property object property obj:Object2Name
6 instance NaN NaN
7 class NaN NaN
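For anyone who wants to reproduce this, here is a minimal self-contained sketch. The contents of sample_df and sample_dict are assumptions reconstructed from the tables in the question (the real dictionary is not shown there), so treat them as placeholders rather than the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
sample_df = pd.DataFrame({
    "component": ["class", "property", "instance", "property",
                  "property", "property", "instance", "class"],
    "rdfs:range": [np.nan, "xsd:int", np.nan, "obj:ObjectName",
                   "xsd:string", "obj:Object2Name", np.nan, np.nan],
})

# Assumed dictionary: only the data-property ranges are enumerated.
sample_dict = {"xsd:int": "data property", "xsd:string": "data property"}

sub_comp = (
    sample_df["rdfs:range"].map(sample_dict)   # step 2: dictionary lookup, NaN if missing
    .fillna("object property")                 # step 3: everything unmapped becomes object property
    .mask(sample_df["component"].isin(["class", "instance"]))  # step 1: back to NaN
)

# Place the new column right after "component".
sample_df.insert(1, "subcomponent", sub_comp)
print(sample_df)
```

The order of the chain matters: fillna runs first and fills every unmapped row, including classes and instances, and mask then resets exactly those class and instance rows to NaN. Swapping the two calls would fill the class and instance rows with "object property" again.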