How to check whether adjacent row values with a common group ID in a column are equal in a pandas DataFrame?

Question

I have a DataFrame with 7 columns in which each FID is a study object. Objects that share the same value in "FamilyID" belong to the same "group", and "YDDW" holds the names of the study objects.

Now I would like to check, within each group and in order of date ("Order_Date"), whether two adjacent FIDs have the same name (value in "YDDW"). If they do, I would like to assign "A" to the object with the earlier date in a new column "Classification"; if the names differ, I would like to assign "B" to it instead.

Below is a snapshot of the DataFrame. FIDs 1506 and 3388, both in group 006, have the same name in "YDDW", so "A" should be assigned to the row of 3388 in the "Classification" column; objects 40 and 2369 in group 027 have different names in "YDDW", so "B" should be assigned to the row of 2369.

How can I implement this? Thanks in advance!
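A minimal sketch of the rule as stated, ordering explicitly by "Order_Date" instead of relying on the physical row order of the frame. Assumptions: the "Order_Date" strings parse with `pd.to_datetime`, the sample rows are copied from the output table in the answer (only the columns used here), the helper name `classify_adjacent` is hypothetical, and the latest object in each group, which has no later neighbour, is left empty because the question does not say what it should get.

```python
import pandas as pd

# Sample rows copied from the output table in the answer (only the columns needed here).
df = pd.DataFrame({
    "FamilyID": [6, 6, 27, 27, 55, 55],
    "FID": [1506, 3388, 40, 2369, 1203, 3238],
    "Order_Date": ["2021-08-25", "2019-01-14", "2023-02-23",
                   "2020-11-10", "2021-11-24", "2019-07-09"],
    "YDDW": ["val1", "val1", "val2", "val3", "val4", "val4"],
})


def classify_adjacent(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label the earlier object of each adjacent pair in a group."""
    out = frame.copy()
    out["Order_Date"] = pd.to_datetime(out["Order_Date"])
    # Within each FamilyID, order the rows from earliest to latest date.
    out = out.sort_values(["FamilyID", "Order_Date"])
    # Name of the next (later) object in the same group; NaN for the latest one.
    next_name = out.groupby("FamilyID")["YDDW"].shift(-1)
    # Same name -> "A", different name -> "B"; rows without a later neighbour stay NaN.
    out["Classification"] = (
        out["YDDW"].eq(next_name).map({True: "A", False: "B"}).where(next_name.notna())
    )
    # Put the rows back in their original order.
    return out.sort_index()


result = classify_adjacent(df)
print(result)
```

On these rows, 3388 and 3238 get "A" and 2369 gets "B", matching the examples given above; 1506, 40 and 1203 (the latest object in each group) are left as NaN. Sorting by date first means the result does not depend on how the rows happen to be ordered in the frame.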

Answer 1

Score: 0

IIUC, and building on @Quang Hoang's comment, you can try this:

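import pandas as pd

d = {True: "A", False: "B"}
# Compare each row's YDDW with the row above it in the same FamilyID group;
# the first row of each group has no predecessor (NaN), so it gets "B"
df["Classification"] = df["YDDW"].eq(df.groupby("FamilyID")["YDDW"].shift()).map(d)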

Output:

print(df)

   HighOverID  FamilyID   FID  Order_Date  Order_Year  YDDW  FamilyOrder Classification
0        1506         6  1506  2021-08-25        2021  val1            2              B
1        3388         6  3388  2019-01-14        2019  val1            1              A
2          40        27    40  2023-02-23        2023  val2            2              B
3        2369        27  2369  2020-11-10        2020  val3            1              B
4        1203        55  1203  2021-11-24        2021  val4            2              B
5        3238        55  3238  2019-07-09        2019  val4            1              A
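To reproduce the output above, the sketch below rebuilds the sample rows from the table (assumed to match the asker's data) and runs the same one-liner, wrapped in a hypothetical helper `label_against_previous_row`. It also illustrates that the one-liner compares each row with the row above it, so it relies on rows being ordered from later to earlier date within each FamilyID: after sorting each group by ascending "Order_Date", the "A" in group 6 moves from 3388 to 1506.

```python
import pandas as pd

# Rows copied from the output table above, without the Classification column.
df = pd.DataFrame({
    "HighOverID": [1506, 3388, 40, 2369, 1203, 3238],
    "FamilyID": [6, 6, 27, 27, 55, 55],
    "FID": [1506, 3388, 40, 2369, 1203, 3238],
    "Order_Date": ["2021-08-25", "2019-01-14", "2023-02-23",
                   "2020-11-10", "2021-11-24", "2019-07-09"],
    "Order_Year": [2021, 2019, 2023, 2020, 2021, 2019],
    "YDDW": ["val1", "val1", "val2", "val3", "val4", "val4"],
    "FamilyOrder": [2, 1, 2, 1, 2, 1],
})

d = {True: "A", False: "B"}


def label_against_previous_row(frame: pd.DataFrame) -> pd.Series:
    # The answer's one-liner: compare each row's YDDW with the row above it in its FamilyID.
    return frame["YDDW"].eq(frame.groupby("FamilyID")["YDDW"].shift()).map(d)


df["Classification"] = label_against_previous_row(df)
print(df)

# Same one-liner after sorting each family from earliest to latest date
# (ISO date strings sort correctly as plain text).
asc = df.drop(columns="Classification").sort_values(["FamilyID", "Order_Date"])
asc["Classification"] = label_against_previous_row(asc)
print(asc)
```

If the real data is not guaranteed to be in that order, sorting by "Order_Date" first (and choosing `shift()` or `shift(-1)` according to which row should receive the label) keeps the result stable.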

huangapple
  • Published on April 19, 2023 at 22:29:20
  • Please keep the link to this post when reposting: https://go.coder-hub.com/76055708.html