ID column based on condition in another column

Question

Hej,
I have the following column A, which shows continuous data sampling. If the value changes, the sensor is measuring data; if the value repeats, the sensor is not measuring actual values.
I would like to loop through A and assign an ID column under the following conditions: while a measurement is running and the values keep changing, the ID should stay the same. When the values stop changing, the ID should increment by one. When the values start changing again, the ID should increment once more, and so forth.
This is what I have tried so far; it does not assign the ID correctly:

import pandas as pd

data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]

df = pd.DataFrame({'A': data})

# Initialize ID
df['ID'] = 0

# Assign ID
id_value = 1
for i in range(1, len(df)):
    if df.at[i, 'A'] != df.at[i-1, 'A']:
        df.at[i, 'ID'] = df.at[i-1, 'ID']
    else:
        df.at[i, 'ID'] = id_value
        id_value += 1

print(df)


Expected output:
      A  ID
0   3.5   0
1   3.6   0
2   3.7   0
3   3.8   0
4   1.0   1
5   1.0   1
6   1.0   1
7   1.0   1
8   1.0   1
9   3.9   2
10  4.0   2

In my output the ID is incremented continuously, so each row ends up with its own ID.

Answer 1

Score: 2

If the expected output is correct, you can do this by comparing each value with both the previous and the next row using shift. Assign the result to a new column so that a cumulative sum can be computed over it.

>>> df = pd.DataFrame({'A': data})
>>> df['Changing'] = (df['A'] != df['A'].shift(-1)) & (df['A'] != df['A'].shift())
>>> df
      A  Changing
0   3.5      True
1   3.6      True
2   3.7      True
3   3.8      True
4   1.0     False
5   1.0     False
6   1.0     False
7   1.0     False
8   1.0     False
9   3.9      True
10  4.0      True
11  4.2      True
12  4.4      True
13  4.6      True
14  4.8      True
15  3.0     False
16  3.0     False
17  3.0     False
18  3.0     False
19  3.2      True
20  3.3      True
21  3.5      True
22  2.1     False
23  2.1     False
24  2.1     False
25  2.1     False
>>> df['ID'] = (df['Changing'] != df['Changing'].shift()).cumsum() - 1
>>> df
      A  Changing  ID
0   3.5      True   0
1   3.6      True   0
2   3.7      True   0
3   3.8      True   0
4   1.0     False   1
5   1.0     False   1
6   1.0     False   1
7   1.0     False   1
8   1.0     False   1
9   3.9      True   2
10  4.0      True   2
11  4.2      True   2
12  4.4      True   2
13  4.6      True   2
14  4.8      True   2
15  3.0     False   3
16  3.0     False   3
17  3.0     False   3
18  3.0     False   3
19  3.2      True   4
20  3.3      True   4
21  3.5      True   4
22  2.1     False   5
23  2.1     False   5
24  2.1     False   5
25  2.1     False   5

I subtracted 1 from the cumsum to follow the expected output. You can then drop the added column as needed.
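
For reference, the same shift-and-cumsum idea can be wrapped in a small function that returns the IDs directly, so no helper column has to be dropped afterwards. This is only a sketch: the function name assign_run_ids is made up for illustration, and it assumes that two constant stretches are always separated by at least one changing value (adjacent constant stretches with different values would share one ID).

```python
import pandas as pd


def assign_run_ids(values: pd.Series) -> pd.Series:
    """Label alternating 'changing' and 'constant' stretches 0, 1, 2, ...

    assign_run_ids is a hypothetical helper name, not a pandas function.
    """
    # A row counts as "changing" when it differs from both neighbours.
    # shift() pads the ends with NaN, which never compares equal, so the
    # first and last rows are judged on their single real neighbour.
    changing = values.ne(values.shift()) & values.ne(values.shift(-1))
    # A new group starts wherever the flag flips. The first row always
    # registers as a flip, hence the "- 1" to start numbering at 0.
    return changing.ne(changing.shift()).cumsum() - 1


data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8,
        3, 3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})
df['ID'] = assign_run_ids(df['A'])
print(df)
```

Keeping the flag as a local variable rather than a column avoids the extra drop step; the logic is otherwise the same as in the session above.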

Answer 2

Score: 0

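After several failed attempts, I got close to the output you wanted. Note that in the output below the first row of each constant run (rows 4, 15 and 22) still carries the ID of the preceding group, which differs from the expected output in the question.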

Code

import pandas as pd

data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3, 3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})

# Initialize ID
df['ID'] = 0

# Assign ID
id_value = 0
prev_value = df.at[0, 'A']
value_was_changing = True
for i in range(1, len(df)):
    # Compare the current value with the previous one
    value_is_changing = df.at[i, 'A'] != prev_value
    if value_is_changing:
        # Transition from constant to changing
        if not value_was_changing:
            id_value += 1
    else:
        # Transition from changing to constant
        if value_was_changing:
            id_value += 1
    df.at[i, 'ID'] = id_value
    prev_value = df.at[i, 'A']
    value_was_changing = value_is_changing

print(df)

Output

      A  ID
0   3.5   0
1   3.6   0
2   3.7   0
3   3.8   0
4   1.0   0
5   1.0   1
6   1.0   1
7   1.0   1
8   1.0   1
9   3.9   2
10  4.0   2
11  4.2   2
12  4.4   2
13  4.6   2
14  4.8   2
15  3.0   2
16  3.0   3
17  3.0   3
18  3.0   3
19  3.2   4
20  3.3   4
21  3.5   4
22  2.1   4
23  2.1   5
24  2.1   5
25  2.1   5

Explanation

I edited the script to accurately track whether the values are continuously changing or staying the same, and assign IDs accordingly.

  1. Added prev_value variable to keep track of the previous value in the loop.
  2. Added value_was_changing flag to keep track of whether the previous value was changing or not.
  3. Added value_is_changing to determine if the current value is different from the previous value.
  4. The loop increments id_value when there's a transition between changing and not changing states.
  5. Updated prev_value and value_was_changing at the end of each iteration for use in the next iteration.
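
The loop above only recognises a row as constant once it equals its predecessor, so the first row of each constant run keeps the ID of the preceding group. One possible way to match the expected output from the question is to also look at the next value. The following is a minimal sketch in plain Python under that assumption; the function name assign_ids and the variable names are chosen here for illustration and are not part of the original answer.

```python
def assign_ids(values):
    """Return one ID per value.

    The ID advances whenever the series switches between a changing
    stretch and a constant stretch. A value belongs to a constant
    stretch if it equals the previous OR the next value.
    """
    ids = []
    current_id = 0
    prev_constant = None  # state of the previous row; None before the first row
    n = len(values)
    for i, value in enumerate(values):
        same_as_prev = i > 0 and value == values[i - 1]
        same_as_next = i < n - 1 and value == values[i + 1]
        is_constant = same_as_prev or same_as_next
        # Increment only on a transition between the two states.
        if prev_constant is not None and is_constant != prev_constant:
            current_id += 1
        ids.append(current_id)
        prev_constant = is_constant
    return ids


data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8,
        3, 3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
ids = assign_ids(data)
print(ids)
```

The result can be attached to the DataFrame with df['ID'] = ids. The comparison uses exact equality, which suits sensor values that repeat verbatim; noisy data would need a tolerance instead.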

huangapple
  • Posted on June 13, 2023, 03:59:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76459921.html