April 7, 2023 00:57:27

Problem with if statement? Not-equal rule doesn't work?

Question


I have 1000 dataframes with the same structure, but some of them may contain strings as values. I need to run the same calculations on all of them, excluding only the rows where those strings occur. An example of a dataset that contains strings:

    time    x           y               z
    0.00  run_failed  run_failed   run_failed
    0.02  run_failed  run_failed   run_failed
    0.03  test_failed test_failed  test_failed
    0.04  44            321         644
    0.04  44            321         644
    0.04  44            321         644
    0.03  test_failed test_failed  test_failed
    0.04  44            321         644
    0.04  44            321         644

Judging by `df.dtypes`, the dataframes that contain such strings always have columns of object type, while the "normal" ones are float64.

So, to deal with this, I wrote the following script:

    for df in dfs:
        add = 0
        z = pd.read_csv(df)
        if type(z["x"]) != np.float64:
            bmb = z[z['x'].str.contains('failed')]
            z = z.drop(bmb.index)
            add = len(bmb)
            print(add)
        # ....
        # and then the code for doing the calculations, assuming that if
        # strings occurred, they were dropped inside the if statement

But when I run the code, it raises the error "Can only use .str accessor with string values!", pointing inside the if block. The dataset is entirely of float64 type, so it is not clear to me at all why it tried to execute `bmb = z[z['x'].str.contains('failed')]`.

Answer 1

Score: 2


    if type(z["x"]) != np.float64:

Here you are getting the type of a *column of the DataFrame*, which is a `Series`. Consider the following simple example:

    import pandas as pd
    df = pd.DataFrame({"x":[1,2,3]},dtype="int32")
    print(type(df["x"]))

which gives the output

    <class 'pandas.core.series.Series'>

Therefore your condition always holds (i.e. it is the same as writing `if True:`). If you are interested in the type of the values the column holds, use the `.dtype` attribute:

    print(df["x"].dtype)

which gives the output

    int32
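Applied to the loop from the question, the advice above might look like the sketch below. It is an illustration under stated assumptions, not the asker's final code: `load_and_filter` and the two in-memory CSV strings are hypothetical, and `pd.api.types.is_numeric_dtype` is swapped in for a direct `.dtype` comparison so that integer and float columns both count as "normal".

```python
import io

import pandas as pd


def load_and_filter(source, column="x", marker="failed"):
    """Read one CSV and drop rows whose `column` contains `marker`.

    Hypothetical helper: returns the cleaned frame and the number of dropped rows.
    """
    z = pd.read_csv(source)
    dropped = 0
    # Inspect the dtype of the values. type(z[column]) is always Series,
    # so comparing it with np.float64 can never be equal.
    if not pd.api.types.is_numeric_dtype(z[column]):
        mask = z[column].astype(str).str.contains(marker, regex=False)
        dropped = int(mask.sum())
        z = z[~mask]
    return z, dropped


# Hypothetical in-memory stand-ins for two of the CSV files.
clean_csv = "time,x,y,z\n0.04,44,321,644\n0.05,45,322,645\n"
dirty_csv = (
    "time,x,y,z\n"
    "0.00,run_failed,run_failed,run_failed\n"
    "0.03,test_failed,test_failed,test_failed\n"
    "0.04,44,321,644\n"
)

clean_frame, clean_dropped = load_and_filter(io.StringIO(clean_csv))
dirty_frame, dirty_dropped = load_and_filter(io.StringIO(dirty_csv))
print(clean_dropped, dirty_dropped)
```

With this check, the `.str` accessor is only reached for columns that were actually parsed as text, so an all-numeric file skips the filtering branch entirely.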



# Answer 2
**Score**: 0


I think I figured out the problem.
First, I changed

    if type(z["x"]) != np.float64:

to

    if z['x'].dtype == 'object':

Then, in the later calculations, wherever I take a pandas column as a Series, I added `.astype(float)`, because it seems that when a dataframe has gone through the if statement, the values in its columns are strings. They were returned like this:

    array(['0.00260045', '0.00257398', '0.00247482', ..., '0.02017634', '0.01997158','0.02019846'])
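A likely reason the surviving values are strings is that the whole column was parsed as text when the file was read, and dropping rows does not change a column's dtype. The `.astype(float)` call above handles that after the rows are dropped. As an alternative sketch (not the method used in this answer), `pd.to_numeric` with `errors="coerce"` converts the columns first and marks the failed rows as NaN, so no dtype check is needed. The CSV text and the names `value_columns`, `numeric`, and `cleaned` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for one CSV file with failure markers.
csv_text = (
    "time,x,y,z\n"
    "0.00,run_failed,run_failed,run_failed\n"
    "0.04,44,321,644\n"
    "0.03,test_failed,test_failed,test_failed\n"
    "0.05,45,322,645\n"
)
z = pd.read_csv(io.StringIO(csv_text))

value_columns = ["x", "y", "z"]
# Entries such as "run_failed" become NaN; numeric text becomes real numbers.
numeric = z[value_columns].apply(pd.to_numeric, errors="coerce")

# Rows where x could not be parsed are the ones to exclude.
failed_rows = numeric["x"].isna()
add = int(failed_rows.sum())

# Rebuild the frame from the converted columns and keep only the good rows.
cleaned = pd.concat([z[["time"]], numeric], axis=1)[~failed_rows]
print(add)
print(cleaned.dtypes)
```

The same code path works for files with and without failure markers, at the cost of also turning any other unparseable entry into NaN.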
