Creating a new dataframe where a field is blank in the original dataframe


Question

Using Python3 and Pandas. I am admittedly pretty new and I'm having a hard time searching for an answer to this question.

I have a dataframe that contains lots of information and I'm trying to get a dataframe that is just the items where one specific field in the original is blank.

I have queried my database to get a dataframe I am calling full_df which is all information on all items in the database. I want to now create a new dataframe that selects just the items where one field in full_df is blank.

This is what I've tried:

no_rate = full_df[(full_df['rate'] == "")]

This returns nothing, even though I know for a fact that there are loads of items where 'rate' is blank. I expected the dataframe no_rate to be populated with all the items where 'rate' is blank.

How do I select those items for this new dataframe?

Answer 1

Score: 0

There are a few things to check. First, what is the data type of your rate column: string (object) or something else? df.dtypes will tell you. If it isn't a string column, comparing it against "" will never match, so your filter comes back empty.
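
A minimal sketch of why the original filter can come back empty, assuming (hypothetically) that rate came out of the database as a numeric column with missing values. The full_df below is a made-up stand-in, not the asker's data. If the data came from a SQL query, NULLs typically arrive in pandas as None or NaN rather than as empty strings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for full_df: 'rate' is numeric and blanks are NaN
full_df = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "rate": [0.8, np.nan, 0.3, np.nan],
})

# 'rate' shows up as float64, not object, so it cannot hold ""
print(full_df.dtypes)

# Comparing a float column to "" is False for every row, so the filter is empty
print((full_df["rate"] == "").sum())

# Missing numbers are found with isna() instead
print(full_df["rate"].isna().sum())
```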

Second, and more to the point, one way to do a conditional select is with loc.
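
A toy illustration of the .loc selection (the values are invented for the example). With a boolean mask, df.loc[mask] returns the same rows as df[mask]; .loc also lets you choose columns in the same step.

```python
import pandas as pd

df = pd.DataFrame({
    "Rate": ["good", "", "bad", ""],
    "Color": ["Red", "Yellow", "Red", "Green"],
})

# Boolean mask: True where Rate is an empty string
mask = df["Rate"] == ""

# With a boolean mask, plain [] indexing and .loc select the same rows
via_brackets = df[mask]
via_loc = df.loc[mask]

# .loc can also pick a column in the same call
colors_only = df.loc[mask, "Color"]
print(colors_only)
```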

So, if your rate column looks like this:

import pandas as pd
df = pd.DataFrame({'Rate': ['good', 'good', 'bad', 'medium', '', 'bad', '', 'good']})
df

	Rate
0	good
1	good
2	bad
3	medium
4	
5	bad
6	
7	good

then you could write

df.loc[df['Rate']==""]

and get

	Rate
4	
6	

which does show the matching rows, but since their Rate cells are empty, all you see are the row index numbers. Let's add another column to see the results more plainly.

df['Color'] = ['Red', 'Blue', 'Yellow', 'Red', 'Yellow', 'Red', 'Green', 'Blue']
df
	Rate	Color
0	good	Red
1	good	Blue
2	bad	Yellow
3	medium	Red
4		Yellow
5	bad	Red
6		Green
7	good	Blue

and

df.loc[df['Rate'] == ""]

shows

	Rate	Color
4		Yellow
6		Green
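
In practice a "blank" cell can be an empty string, a whitespace-only string, None, or NaN, depending on how the data was loaded. The helper below, blank_mask (a hypothetical name, not a pandas function), is one way to treat all of these as blank at once; it is a sketch, not the only approach.

```python
import numpy as np
import pandas as pd

def blank_mask(s: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a value is None/NaN or an empty/whitespace-only string."""
    # isna() catches None and NaN; astype(str) would turn those into "None"/"nan",
    # so they are covered by the first test rather than the string comparison
    return s.isna() | (s.astype(str).str.strip() == "")

rates = pd.Series(["good", "", "  ", None, np.nan, "bad"])
print(blank_mask(rates).tolist())
# [False, True, True, True, True, False]
```

Because of the astype(str) step, the same helper also works on a numeric column, where only NaN counts as blank.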

So, what if your rate is actually a number? Then a blank cell is a missing value (NaN) rather than an empty string, so == "" won't find it:

import numpy as np
df['Decimal_Rate'] = [.8, .8, .3, .6, np.nan, .2, np.nan, .9]
df
	Rate	Color	Decimal_Rate
0	good	Red	0.8
1	good	Blue	0.8
2	bad	Yellow	0.3
3	medium	Red	0.6
4		Yellow	NaN
5	bad	Red	0.2
6		Green	NaN
7	good	Blue	0.9

If you want to isolate the rows where that number is missing, use isna():

df.loc[df['Decimal_Rate'].isna()]

which results in

	Rate	Color	Decimal_Rate
4		Yellow	NaN
6		Green	NaN
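
Putting it together for the original question: a sketch that builds a separate no_rate DataFrame from the rows where rate is missing. full_df here is a hypothetical stand-in for the asker's query result; if the blanks in your data are empty strings instead, use the == "" test shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's full_df (None becomes NaN in a float column)
full_df = pd.DataFrame({
    "item": ["a", "b", "c", "d", "e"],
    "rate": [0.8, np.nan, 0.3, None, 0.9],
})

# Keep only the rows whose rate is missing; .copy() gives no_rate its own data,
# so later assignments to it don't raise SettingWithCopyWarning (in pandas
# versions that emit it) and never touch full_df
no_rate = full_df.loc[full_df["rate"].isna()].copy()

print(no_rate)
```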

huangapple
  • Published on March 7, 2023, 02:48:01
  • If you repost this article, please keep its link: https://go.coder-hub.com/75654687.html