March 7, 2023, 02:48:01

Creating a new dataframe where a field is blank in the original dataframe

Question

Using Python3 and Pandas. I am admittedly pretty new and I'm having a hard time searching for an answer to this question.

I have a dataframe that contains lots of information and I'm trying to get a dataframe that is just the items where one specific field in the original is blank.

I have queried my database to get a dataframe I am calling full_df which is all information on all items in the database. I want to now create a new dataframe that selects just the items where one field in full_df is blank.

This is what I've tried:

no_rate = full_df[(full_df['rate'] == "")]

Which is returning nothing even though I know for a fact that there are loads of items where 'rate' is blank. I expected the dataframe no_rate to be populated with all the items where 'rate' is blank.

How do I select those items for this new dataframe?

Answer 1

Score: 0

There are two things to check. First, is the data type of your rate column a string (which pandas reports as object)? df.dtypes will tell you. If it is not, comparing it against "" will never match, because the blanks are stored as missing values rather than empty strings.
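As a rough illustration of that first check, the sketch below builds a small stand-in for full_df (the item names and values are made up for the example) in which the blank rates arrive as None, as they often do when a database column is NULL. Under that assumption, comparing against "" matches nothing, while isna() finds the rows.

```python
import pandas as pd

# Hypothetical stand-in for full_df; the real data comes from the database query.
full_df = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "rate": [1.5, None, 2.0, None],
})

# A numeric column with missing values is float64, not object (string).
print(full_df.dtypes)

# Comparing against "" matches no rows, because the blanks are NaN, not "".
empty_string_matches = full_df[full_df["rate"] == ""]

# isna() flags the missing values instead.
missing_matches = full_df[full_df["rate"].isna()]

print(len(empty_string_matches), len(missing_matches))
```

If the dtype printed for rate is not object, the == "" test from the question cannot find the blank rows.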

Second, and more to the point, one way to do a conditional select is by using loc.

So, if your rate column looks like this:

import pandas as pd

df = pd.DataFrame({'Rate': ['good', 'good', 'bad', 'medium', '', 'bad', '', 'good']})
df

	Rate
0	good
1	good
2	bad
3	medium
4	
5	bad
6	
7	good

then you could write

df.loc[df['Rate'] == ""]

and get

	Rate
4	
6

These are the matching rows, but because Rate is empty in each one, the output looks like nothing but row numbers. Let's add another column to see the results more plainly.

df['Color'] = ['Red', 'Blue', 'Yellow', 'Red', 'Yellow', 'Red', 'Green', 'Blue']
df
	Rate	Color
0	good	Red
1	good	Blue
2	bad	Yellow
3	medium	Red
4		Yellow
5	bad	Red
6		Green
7	good	Blue

and

df.loc[df['Rate'] == ""]

shows

	Rate	Color
4		Yellow
6		Green

So, what if your rate is actually a number?

import numpy as np  # needed for np.nan
df['Decimal_Rate'] = [.8, .8, .3, .6, np.nan, .2, np.nan, .9]
df
	Rate	Color	Decimal_Rate
0	good	Red	0.8
1	good	Blue	0.8
2	bad	Yellow	0.3
3	medium	Red	0.6
4		Yellow	NaN
5	bad	Red	0.2
6		Green	NaN
7	good	Blue	0.9

If you want to isolate the rows where the number is missing, you can use isna():

df.loc[df['Decimal_Rate'].isna()]

which results in

	Rate	Color	Decimal_Rate
4		Yellow	NaN
6		Green	NaN
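When it is not clear in advance how the blanks are stored, the two tests above can be combined into one mask. The following is only a sketch under assumed data: the full_df contents, the item column, and the helper names is_missing and is_empty_text are invented for illustration. It treats NaN/None, empty strings, and whitespace-only strings as blank.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the rate column mixes real values, "", whitespace, and NaN.
full_df = pd.DataFrame({
    "item": ["a", "b", "c", "d", "e"],
    "rate": ["0.8", "", "   ", np.nan, "0.3"],
})

rate = full_df["rate"]

# True where the value is missing (NaN/None).
is_missing = rate.isna()

# True where the value is an empty or whitespace-only string.
is_empty_text = rate.astype(str).str.strip().eq("")

# .copy() gives no_rate its own data, so later edits do not touch full_df.
no_rate = full_df.loc[is_missing | is_empty_text].copy()
print(no_rate)
```

The .copy() call is optional for selection alone; it is included here because the question asks for a separate new dataframe.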
