Problem with if statement? Not-equal rule doesn't work?
# Question
I have 1000 dataframes with the same structure, but some of them may contain strings as values. I need to run the same calculations on all of them, excluding the rows where these substrings occur. Here is an example of a dataset that contains strings:
```
time  x            y            z
0.00  run_failed   run_failed   run_failed
0.02  run_failed   run_failed   run_failed
0.03  test_failed  test_failed  test_failed
0.04  44           321          644
0.04  44           321          644
0.04  44           321          644
0.03  test_failed  test_failed  test_failed
0.04  44           321          644
0.04  44           321          644
```
According to `df.dtypes`, the dataframes that contain the substring always have `object` columns, while the "normal" ones are `float64`.

To deal with this, I wrote the following script:
```python
import numpy as np
import pandas as pd

for df in dfs:
    add = 0
    z = pd.read_csv(df)
    if type(z["x"]) != np.float64:
        bmb = z[z['x'].str.contains('failed')]
        z = z.drop(bmb.index)
        add = len(bmb)
    print(add)
    # ... then the code for doing calculations, assuming that any string
    # rows were dropped inside the if statement
```
When I run the code, it raises the error "Can only use .str accessor with string values!" and points inside the `if` block. That dataset is entirely `float64`, so I don't understand why it tries to run `bmb = z[z['x'].str.contains('failed')]` at all.
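You can reproduce this behaviour without the real files. The sketch below uses two hypothetical in-memory CSVs read through `io.StringIO`: one has `failed` marker rows and the other is purely numeric. The dtype name reported for the text column can differ between pandas versions.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for two of the CSV files.
FAILED_CSV = """time,x,y,z
0.00,run_failed,run_failed,run_failed
0.03,test_failed,test_failed,test_failed
0.04,44,321,644
"""
CLEAN_CSV = """time,x,y,z
0.04,44.0,321.0,644.0
0.05,45.0,322.0,645.0
"""

failed = pd.read_csv(io.StringIO(FAILED_CSV))
clean = pd.read_csv(io.StringIO(CLEAN_CSV))

# The mixed column is read as text; the clean one as float64.
print(failed["x"].dtype)  # object (newer pandas may report a string dtype)
print(clean["x"].dtype)   # float64

# The question's condition compares the container class (Series) with
# np.float64, so it is True for BOTH frames.
print(type(failed["x"]) != np.float64)  # True
print(type(clean["x"]) != np.float64)   # True

# Hence the .str line also runs on the clean frame, where it fails.
try:
    clean["x"].str.contains("failed")
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```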
# Answer 1
**Score**: 2
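`if type(z["x"]) != np.float64:`

Here you are getting the type of a *column of the DataFrame*, which is a `Series`. Consider the following simple example:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, dtype="int32")
print(type(df["x"]))
```

This prints:

```
<class 'pandas.core.series.Series'>
```

Therefore your condition always holds (it is the same as writing `if True:`). If you are interested in the type of the values the column holds, use the `.dtype` attribute:

```python
print(df["x"].dtype)
```

This prints:

```
int32
```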
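In the question's code, the fix is to test the column's dtype instead of its Python type. The sketch below uses a hypothetical helper, `needs_cleaning`. It relies on `pandas.api.types.is_numeric_dtype` rather than a comparison with `'object'`, because pandas versions that infer a dedicated string dtype may not report `object` for text columns.

```python
import numpy as np
import pandas as pd


def needs_cleaning(column: pd.Series) -> bool:
    """Hypothetical helper: True when the column is not numeric,
    e.g. because it was read with markers such as 'run_failed'."""
    # Look at the dtype of the values, not type(column), which is always Series.
    return not pd.api.types.is_numeric_dtype(column)


numeric = pd.Series([0.1, 0.2, 0.3])
mixed = pd.Series(["run_failed", "44", "321"])

print(type(numeric))                # <class 'pandas.core.series.Series'>
print(numeric.dtype)                # float64
print(numeric.dtype == np.float64)  # True
print(needs_cleaning(numeric))      # False
print(needs_cleaning(mixed))        # True
```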
# Answer 2
**Score**: 0
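I think I figured out the problem. First, I changed

```python
if type(z["x"]) != np.float64:
```

to

```python
if z['x'].dtype == 'object':
```

Then, in the later calculations, I added `.astype(float)` wherever I take a pandas column as a Series. In the dataframes that went through the `if` statement, the column values were still strings, because dropping the failed rows does not change the column's `object` dtype. They came back like this:

```
array(['0.00260045', '0.00257398', '0.00247482', ..., '0.02017634', '0.01997158','0.02019846'])
```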
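The approach in this answer has three steps: check the dtype, drop the `failed` rows, then cast with `.astype(float)`. The sketch below implements it. `SAMPLE_CSV`, `VALUE_COLUMNS` and both function names are hypothetical.

The second function uses a different technique that the answer did not use: `pd.to_numeric(..., errors="coerce")` followed by `dropna`. It drops rows with any non-numeric marker, not only strings containing `failed`.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the 1000 CSV files.
SAMPLE_CSV = """time,x,y,z
0.00,run_failed,run_failed,run_failed
0.03,test_failed,test_failed,test_failed
0.04,44,321,644
0.05,45,322,645
"""

VALUE_COLUMNS = ["x", "y", "z"]  # assumption: markers appear in these columns


def drop_failed_rows(z: pd.DataFrame) -> tuple[pd.DataFrame, int]:
    """Drop rows whose 'x' contains 'failed', then cast value columns to float.

    Returns the cleaned frame and the number of dropped rows.
    """
    if pd.api.types.is_numeric_dtype(z["x"]):
        return z, 0  # already numeric, nothing to drop
    # na=False treats missing cells as "no match" instead of NaN.
    mask = z["x"].str.contains("failed", na=False)
    cleaned = z[~mask].copy()
    # The remaining values are still strings, so convert them explicitly.
    for col in VALUE_COLUMNS:
        cleaned[col] = cleaned[col].astype(float)
    return cleaned, int(mask.sum())


def drop_non_numeric_rows(z: pd.DataFrame) -> pd.DataFrame:
    """Alternative: coerce anything non-numeric to NaN, then drop those rows."""
    out = z.copy()
    for col in VALUE_COLUMNS:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.dropna(subset=VALUE_COLUMNS)


z = pd.read_csv(io.StringIO(SAMPLE_CSV))
cleaned, add = drop_failed_rows(z)
print(add)             # 2
print(cleaned.dtypes)  # time, x, y and z are all float64
print(drop_non_numeric_rows(z)["x"].tolist())  # [44.0, 45.0]
```

In the question's loop, `z, add = drop_failed_rows(pd.read_csv(df))` would replace the whole `if` block.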