If statement to add column to pandas dataframe gives the same values
Question
I have a pandas dataframe called week5_233C. My Python version is 3.19.13.
I wrote an if-statement to add a new column to my data set: Spike. If the value in Value [pV] is not equal to 0, I want to add a 1 to that row. If Value [pV] is 0, then I want the Spike column to contain 0 for that row.
The data looks like this:
TimeStamp [µs] Value [pV]
0 1906200 0
1 1906300 0
2 1906400 0
3 1906500 -149012
4 1906600 -149012
And I want it to look like this:
TimeStamp [µs] Value [pV] Spike
0 1906200 0 0
1 1906300 0 0
2 1906400 0 0
3 1906500 -149012 1
4 1906600 -149012 1
I tried:
week5_233C.loc[week5_233C[' Value [pV]'] != 0, 'Spike'] = 1
week5_233C.loc[week5_233C[' Value [pV]'] == 0, 'Spike'] = 0
but all rows in column Spike get the same value.
I also tried:
week5_233C['Spike'] = week5_233C[' Value [pV]'].apply(lambda x: 0 if x == 0 else 1)
Again, it adds only 0s or only 1s, and the if/else has no effect. See example data:
TimeStamp [µs] Value [pV] Spike
0 1906200 0 1
1 1906300 0 1
2 1906400 0 1
3 1906500 -149012 1
4 1906600 -149012 1
Doing it like this:
for i in week5_233C[' Value [pV]']:
    if i != 0:
        week5_233C['Spike'] = 1
    elif i == 0:
        week5_233C['Spike'] = 0
does not do anything: does not add a column, does not give an error, and makes Python crash.
However, when I run this if-statement with just a print as such:
for i in week5_233C[' Value [pV]']:
    if i != 0:
        print(1)
    elif i == 0:
        print(0)
then it does print the exact values I want. I cannot figure out how to save these values in a new column.
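In the crashing loop, each assignment week5_233C['Spike'] = 1 (or = 0) writes one scalar to every row of the column. At best, the column therefore ends up holding whatever value the last row decided. The sketch below adapts the print loop so that it collects its results in a list and assigns them once. It assumes the sample data shown above, and the leading space in ' Value [pV]' is copied from the question's code:

```python
import pandas as pd

# Sample data from the question; the leading space in ' Value [pV]'
# mirrors the column name used in the question's code.
week5_233C = pd.DataFrame({
    'TimeStamp [µs]': [1906200, 1906300, 1906400, 1906500, 1906600],
    ' Value [pV]': [0, 0, 0, -149012, -149012],
})

# Collect one result per row instead of overwriting the whole column each time.
spike_values = []
for i in week5_233C[' Value [pV]']:
    if i != 0:
        spike_values.append(1)
    else:
        spike_values.append(0)

# Assign the whole list at once; its length equals the number of rows.
week5_233C['Spike'] = spike_values
print(week5_233C)
```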
This:
for i in week5_233C[' Value [pV]']:
    if i != 0:
        week5_233C.concat([1, df.iloc['Spike']])
    elif i == 0:
        week5_233C.concat([0, df.iloc['Spike']])
gives me an error: AttributeError: 'DataFrame' object has no attribute 'concat'
How can I make a new column Spike and add the values 0 and 1 based on the value in column Value [pV]?
Answer 1
Score: 2
I think you should check the dtype of the Value [pV] column. You probably have strings, which is why you get the same value in every row. Try print(df['Value [pV]'].dtype). If it is object, try converting with astype(float) or pd.to_numeric(df['Value [pV]']).
You can also try:
df['spike'] = np.where(df['Value [pV]'] == '0', 0, 1)
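The sketch below reproduces the string-dtype explanation. It is a hypothetical reproduction, assuming the values were loaded as text; it is not the asker's actual file. It shows why the question's apply call returns 1 everywhere and compares the two fixes suggested here:

```python
import numpy as np
import pandas as pd

# Hypothetical reproduction (an assumption, not the asker's file): the values
# were loaded as text, so every cell is a string rather than a number.
df = pd.DataFrame({'Value [pV]': ['0', '0', '0', '-149012', '-149012']})
print(df['Value [pV]'].dtype)

# '0' == 0 is False in Python, so this lambda returns 1 for every row.
broken = df['Value [pV]'].apply(lambda x: 0 if x == 0 else 1)

# Option 1: compare against the string '0', as in the np.where line above.
as_text = np.where(df['Value [pV]'] == '0', 0, 1)

# Option 2: convert to numbers first, then compare against the number 0.
df['Value [pV]'] = pd.to_numeric(df['Value [pV]'])
df['Spike'] = np.where(df['Value [pV]'] == 0, 0, 1)
print(df)
```

Option 2 is the more robust choice. The string comparison in option 1 only matches cells that are exactly '0', so a value such as ' 0' or '0.0' would be marked as a spike.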
Update
To show bad rows and debug your dataframe, use the following code:
df.loc[pd.to_numeric(df['Value [pV]'], errors='coerce').isna(), 'Value [pV]']
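The debugging line can be seen at work on a small column. The messy values below are hypothetical: a thousands separator and a blank cell, chosen only to show what the filter reports. With errors='coerce', anything pd.to_numeric cannot parse becomes NaN, and the isna() mask selects the raw text of those rows:

```python
import pandas as pd

# Hypothetical messy column: one value has a thousands separator, one is blank.
df = pd.DataFrame({'Value [pV]': ['0', '0', '1,234', '-149012', '']})

# Unparseable entries become NaN; use that mask to show their original text.
numeric = pd.to_numeric(df['Value [pV]'], errors='coerce')
bad_rows = df.loc[numeric.isna(), 'Value [pV]']
print(bad_rows)
```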
Answer 2
Score: 0
import pandas as pd
df = pd.DataFrame({'TimeStamp [µs]':[1906200, 1906300, 1906400, 1906500, 1906600],
'Value [pV] ':[0, 0, 0, -149012, -149012],
})
df['Spike'] = df.agg({'Value [pV] ': lambda v: int(bool(v))})
print(df)
TimeStamp [µs] Value [pV] Spike
0 1906200 0 0
1 1906300 0 0
2 1906400 0 0
3 1906500 -149012 1
4 1906600 -149012 1
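The question's code uses ' Value [pV]' with a leading space, while this answer uses 'Value [pV] ' with a trailing space. The sketch below assumes the real column names may carry such stray whitespace. It strips the names once and then casts the boolean comparison to int, a vectorized alternative to agg or apply:

```python
import pandas as pd

# Sample data from the question; note the stray spaces in the column name.
df = pd.DataFrame({
    'TimeStamp [µs]': [1906200, 1906300, 1906400, 1906500, 1906600],
    ' Value [pV] ': [0, 0, 0, -149012, -149012],
})

# Remove leading/trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# Boolean mask (True where the value is non-zero) cast to 0/1 integers.
df['Spike'] = (df['Value [pV]'] != 0).astype(int)
print(df)
```

This operates on the whole column at once, so no Python function is called per row. It still requires a numeric column; if the values are strings, convert them with pd.to_numeric first.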