Problem with an if statement? Why doesn't the not-equal check work?

Question

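I have 1000 dataframes with the same structure, but some of them may contain strings as values. I need to run the same calculations on all of them, excluding only the rows where those substrings occur. An example of a dataset that contains strings: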

    time    x           y               z
    0.00  run_failed  run_failed   run_failed
    0.02  run_failed  run_failed   run_failed
    0.03  test_failed test_failed  test_failed
    0.04  44            321         644
    0.04  44            321         644
    0.04  44            321         644
    0.03  test_failed test_failed  test_failed
    0.04  44            321         644
    0.04  44            321         644

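According to `df.dtypes`, the columns of dataframes that contain such substrings are always of `object` dtype, while those of "normal" dataframes are `float64`.

To handle this, I wrote the following script: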

    for df in dfs:
        add = 0
        z = pd.read_csv(df)
        if type(z["x"]) != np.float64:
            bmb = z[z['x'].str.contains('failed')]
            z = z.drop(bmb.index)
            add = len(bmb)
            print(add)
        # ... followed by the calculations, which assume that any string rows were dropped inside the if block

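But when I run the code, it raises the error "Can only use .str accessor with string values!", pointing inside the if block. The dataset in question is entirely `float64`, so it is not clear to me at all why it tried to execute `bmb = z[z['x'].str.contains('failed')]`.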



Answer 1

Score: 2

    if type(z["x"]) != np.float64:

Here you are getting the type of a *column of the DataFrame*, which is a `Series`. Consider the following simple example:

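    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, 3]}, dtype="int32")
    print(type(df["x"]))

This outputs:

    <class 'pandas.core.series.Series'>

Therefore your condition always holds (i.e. it is the same as writing `if True:`). If you are interested in the type of the values the column holds, use the `.dtype` attribute:

    print(df["x"].dtype)

This outputs:

    int32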

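As a rough sketch of how the `.dtype` check could guard the `.str` call from the question: the in-memory CSV strings and the helper name `drop_failed_rows` below are made up for illustration, and the guard uses `pandas.api.types.is_numeric_dtype` on the column's dtype rather than comparing against one specific dtype, which is my own choice and not something stated in the thread.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSVs standing in for two of the files in `dfs`.
clean_csv = "time,x\n0.00,44\n0.02,45\n"
mixed_csv = "time,x\n0.00,run_failed\n0.03,test_failed\n0.04,44\n"


def drop_failed_rows(z):
    """Return (frame without 'failed' rows, number of rows dropped)."""
    # Inspect the dtype of the column's values, not the type of the Series object.
    if not pd.api.types.is_numeric_dtype(z["x"].dtype):
        # na=False treats missing values as "not failed" so the mask stays boolean.
        failed = z["x"].str.contains("failed", na=False)
        return z[~failed], int(failed.sum())
    # Numeric column: nothing to drop, and .str is never touched.
    return z, 0


clean, clean_dropped = drop_failed_rows(pd.read_csv(io.StringIO(clean_csv)))
mixed, mixed_dropped = drop_failed_rows(pd.read_csv(io.StringIO(mixed_csv)))

# The original condition compares a Series class with a NumPy scalar type,
# so it is true for every frame, numeric or not.
original_condition = type(clean["x"]) != np.float64
```

With this guard the numeric frame skips the `.str` branch entirely, so the "Can only use .str accessor with string values!" error should not be triggered for it.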



# Answer 2
**Score**: 0

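I think I figured out the problem. First, I changed

    if type(z["x"]) != np.float64

to

    z['x'].dtype == 'object'

Then, in the later calculations, wherever I take a pandas column as a Series, I added `.astype(float)`. It seems that in a DataFrame that went through the if block the values in the column are strings (the column was read as `object` because of the failure markers, and dropping rows does not change that), since they were returned like this:

    array(['0.00260045', '0.00257398', '0.00247482', ..., '0.02017634', '0.01997158','0.02019846'])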


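The leftover-strings effect described above can be reproduced in a small sketch. The CSV text and variable names are invented for illustration; `pd.to_numeric(..., errors="coerce")` is shown only as a possible alternative to `.astype(float)` and is not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for one file that contains failure markers.
csv_text = "time,x\n0.00,run_failed\n0.03,test_failed\n0.04,44.5\n0.05,321.25\n"
z = pd.read_csv(io.StringIO(csv_text))

# Dropping the marker rows does not change the column's dtype:
# the remaining values are still strings such as '44.5'.
kept = z[~z["x"].str.contains("failed", na=False)]
still_text = isinstance(kept["x"].iloc[0], str)

# Option 1: explicit cast once the marker rows are gone.
x_cast = kept["x"].astype(float)

# Option 2: coerce anything non-numeric to NaN and drop it,
# which needs no string matching at all.
x_coerced = pd.to_numeric(z["x"], errors="coerce").dropna()
```

Option 2 also discards any other non-numeric value, not just the `failed` markers, so whether it fits depends on what else may appear in the files.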



huangapple
  • Published on April 7, 2023, 00:57:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75951990.html