ID column based on condition in another column
Question
Hej,
I have the following column A, which shows continuous data sampling. If the value changes, the sensor is measuring data; if the values repeat themselves, the sensor is not measuring actual values.
I would like to loop through A and assign an ID column with the following rule: while the measurement is running and the values keep changing, the ID stays the same. When the values stop changing (i.e. they repeat), the ID should increment by one. When the values start changing again, the ID should increment again, and so forth.
I have tried the following so far, but it does not assign the IDs correctly:
```python
import pandas as pd

data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})

# Initialize ID
df['ID'] = 0

# Assign ID
id_value = 1
for i in range(1, len(df)):
    if df.at[i, 'A'] != df.at[i-1, 'A']:
        df.at[i, 'ID'] = df.at[i-1, 'ID']
    else:
        df.at[i, 'ID'] = id_value
        id_value += 1

print(df)
```
Expected output:
```
      A  ID
0   3.5   0
1   3.6   0
2   3.7   0
3   3.8   0
4   1.0   1
5   1.0   1
6   1.0   1
7   1.0   1
8   1.0   1
9   3.9   2
10  4.0   2
…
```
In my output the ID is incremented on every repeated value, so each of those rows gets its own ID instead of the whole segment sharing one.
Answer 1
Score: 2
If the expected output is correct, you can do this by comparing each value with both the previous AND the next row using `shift`. A row counts as "changing" only if it differs from both neighbours. Assign the result to a new column so that we can run a cumulative sum against it.
```python
>>> df = pd.DataFrame({'A': data})
>>> df['Changing'] = (df['A'] != df['A'].shift(-1)) & (df['A'] != df['A'].shift())
>>> df
      A  Changing
0   3.5      True
1   3.6      True
2   3.7      True
3   3.8      True
4   1.0     False
5   1.0     False
6   1.0     False
7   1.0     False
8   1.0     False
9   3.9      True
10  4.0      True
11  4.2      True
12  4.4      True
13  4.6      True
14  4.8      True
15  3.0     False
16  3.0     False
17  3.0     False
18  3.0     False
19  3.2      True
20  3.3      True
21  3.5      True
22  2.1     False
23  2.1     False
24  2.1     False
25  2.1     False
>>> df['ID'] = (df['Changing'] != df['Changing'].shift()).cumsum() - 1
>>> df
      A  Changing  ID
0   3.5      True   0
1   3.6      True   0
2   3.7      True   0
3   3.8      True   0
4   1.0     False   1
5   1.0     False   1
6   1.0     False   1
7   1.0     False   1
8   1.0     False   1
9   3.9      True   2
10  4.0      True   2
11  4.2      True   2
12  4.4      True   2
13  4.6      True   2
14  4.8      True   2
15  3.0     False   3
16  3.0     False   3
17  3.0     False   3
18  3.0     False   3
19  3.2      True   4
20  3.3      True   4
21  3.5      True   4
22  2.1     False   5
23  2.1     False   5
24  2.1     False   5
25  2.1     False   5
```
I subtracted 1 from the cumsum so the IDs start at 0, matching the expected output. You can then drop the helper `Changing` column as needed.
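The two REPL steps above can be wrapped into a small reusable function that never adds the helper column to the DataFrame in the first place. This is only a sketch: the function name `assign_segment_ids` is hypothetical, and it applies the same `shift`/`cumsum` rule as the answer. It assumes repeated sensor readings are exactly equal floats; if they can jitter slightly, the `!=` comparisons would need a tolerance instead.

```python
import pandas as pd


def assign_segment_ids(values: pd.Series) -> pd.Series:
    """Hypothetical helper: number alternating runs of changing / repeated samples.

    A sample counts as changing when it differs from both its previous and its
    next neighbour (exact float equality is assumed for repeats).
    """
    changing = (values != values.shift(-1)) & (values != values.shift())
    # A new run starts wherever the changing state flips; cumsum numbers the runs.
    run_starts = changing != changing.shift()
    # Subtract 1 so the first run gets ID 0.
    return run_starts.cumsum() - 1


data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})
df['ID'] = assign_segment_ids(df['A'])
print(df['ID'].tolist())
```

If you prefer to keep the answer's two-column version, `df.drop(columns='Changing')` removes the helper column afterwards.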
Answer 2
Score: 0
After several failed attempts, I managed to get close to the output you wanted:
Code
```python
import pandas as pd

data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3, 3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})

# Initialize ID
df['ID'] = 0

# Assign ID
id_value = 0
prev_value = df.at[0, 'A']
value_was_changing = True
for i in range(1, len(df)):
    value_is_changing = df.at[i, 'A'] != prev_value
    if value_is_changing:
        if not value_was_changing:
            id_value += 1
    else:
        if value_was_changing:
            id_value += 1
    df.at[i, 'ID'] = id_value
    prev_value = df.at[i, 'A']
    value_was_changing = value_is_changing

print(df)
```
Output
```
      A  ID
0   3.5   0
1   3.6   0
2   3.7   0
3   3.8   0
4   1.0   0
5   1.0   1
6   1.0   1
7   1.0   1
8   1.0   1
9   3.9   2
10  4.0   2
11  4.2   2
12  4.4   2
13  4.6   2
14  4.8   2
15  3.0   2
16  3.0   3
17  3.0   3
18  3.0   3
19  3.2   4
20  3.3   4
21  3.5   4
22  2.1   4
23  2.1   5
24  2.1   5
25  2.1   5
```
Explanation
I edited the script to accurately track whether the values are continuously changing or staying the same, and assign IDs accordingly.
- Added the `prev_value` variable to keep track of the previous value in the loop.
- Added the `value_was_changing` flag to keep track of whether the previous value was changing or not.
- Added `value_is_changing` to determine whether the current value differs from the previous value.
- The loop increments `id_value` when there is a transition between the changing and not-changing states.
- `prev_value` and `value_was_changing` are updated at the end of each iteration for use in the next iteration.
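In the output shown above, the first row of each repeated run (rows 4, 15 and 22) still carries the previous ID, whereas the expected output assigns it the new one. This happens because the loop compares each value only with the previous value, so a run of repeats is detected only on its second row. A minimal sketch of one fix that keeps the explicit loop is to also look one row ahead, the same idea as the `shift(-1)` in Answer 1. The helper `is_changing` is hypothetical and not part of the original answer; exact float equality is assumed for repeated readings.

```python
import pandas as pd

data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})
values = df['A'].tolist()


def is_changing(i):
    """Hypothetical helper: True if values[i] differs from both neighbours.

    A missing neighbour (first or last row) counts as different.
    """
    differs_from_prev = i == 0 or values[i] != values[i - 1]
    differs_from_next = i == len(values) - 1 or values[i] != values[i + 1]
    return differs_from_prev and differs_from_next


ids = [0]
id_value = 0
value_was_changing = is_changing(0)
for i in range(1, len(values)):
    value_is_changing = is_changing(i)
    # Start a new ID whenever we switch between the changing and repeated states.
    if value_is_changing != value_was_changing:
        id_value += 1
    ids.append(id_value)
    value_was_changing = value_is_changing

df['ID'] = ids
print(df)
```

Collecting the IDs in a list and assigning the column once also avoids writing into the DataFrame row by row with `df.at`.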