June 15, 2023, 02:53:51

Loop for filtering NaN's in dictionaries

Question

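I have loaded my workbook into a dictionary of DataFrames, and I would like to write a loop that keeps only the rows where the "favourite color" column is NaN.

My code so far:

# Import modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Define the file path
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Load every sheet of the file into a dictionary of DataFrames
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)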

my_dict output:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color   
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 2  Christina   Rodwell  Christina Rodwell  100.0     3.0           black   
 3      Ziggy  Stardust     Ziggy Stardust   40.0     7.0             red   ,
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  
 0   Lucy  Diamonds  Lucy Diamonds   22.0     9.0           brown   
 1  Grace     Kelly    Grace Kelly   50.0     7.0           white   
 2    Uma   Thurman    Uma Thurman  105.0     7.0          purple   
 3   Lola      King      Lola King    NaN     NaN             NaN     ,
 'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  
 0  Eleanor     Rigby  Eleanor  Rigby  104.0     6.0            blue   
 1  Barbara       Ann    Barbara  Ann  168.0     8.0            pink   
 2    Polly   Cracker  Polly  Cracker  450.0     7.0           black   
 3   Little       Joe     Little  Joe    NaN     NaN             NaN   }

My desired output:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN,
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color
 3   Lola      King      Lola King    NaN     NaN             NaN,
 'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color
 3   Little       Joe     Little  Joe    NaN     NaN             NaN}

Finally, I would like to write the desired output to a new Excel file, with each DataFrame on its own sheet. Any guidance would be appreciated; I am new to Python.

Answer 1

Score: 1

I would do it this way:
    with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
        for sn, df in my_dict.items():
            (df.loc[df["favourite color"].isnull()]  # boolean indexing keeps only the NaN rows
                 .to_excel(writer, sheet_name=sn, index=False))  # optionally pass startrow, startcol
        # this is optional
        for ws in writer.sheets:
            writer.sheets[ws].autofit()  # xlsxwriter 3.0.6+

Output (only Sheet_1): https://i.stack.imgur.com/iUPhm.png
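
The boolean indexing used in this answer can be tried without an Excel file. The following is a minimal sketch, assuming a small in-memory dictionary that stands in for the result of `pd.read_excel(..., sheet_name=None)`. The names `sheets` and `rows_with_missing` are hypothetical and only a few columns from the question are reproduced. In pandas, `isna()` and `isnull()` are aliases, so either works.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dictionary of DataFrames loaded from Excel;
# the values are copied from the question, with most columns omitted.
sheets = {
    "Sheet_1": pd.DataFrame({
        "Name": ["Delilah", "Christina", "Ziggy"],
        "favourite color": [np.nan, "black", "red"],
    }),
    "Sheet_2": pd.DataFrame({
        "Name": ["Lucy", "Grace", "Uma", "Lola"],
        "favourite color": ["brown", "white", "purple", np.nan],
    }),
}


def rows_with_missing(df, column):
    """Return only the rows of df whose value in `column` is missing."""
    mask = df[column].isna()  # boolean Series: True where the value is NaN
    return df.loc[mask]       # keep the rows where the mask is True


for sheet_name, df in sheets.items():
    print(sheet_name)
    print(rows_with_missing(df, "favourite color"))
```

The mask is an ordinary boolean Series, so it can be inspected or combined with other conditions (using `&` and `|`) before it is passed to `.loc`.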

Update:

If you want to update my_dict first, you can use this:

    for sn, df in my_dict.items():
        my_dict[sn] = df.loc[df["favourite color"].isnull()]

Output:

print(my_dict)
{'Sheet_1':       Name   Surname     Concatenation  ID_  Grade_  favourite color
 0  Delilah  Gonzalez  Delilah Gonzalez  NaN     NaN              NaN,
 'Sheet_2':    Name Surname Concatenation  ID_  Grade_  favourite color
 0  Lola    King     Lola King  NaN     NaN              NaN,
 'Sheet_3':      Name Surname Concatenation  ID_  Grade_  favourite color
 0  Little     Joe    Little Joe  NaN     NaN              NaN}
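
As a variation on the update loop, the filtered dictionary can also be built with a dict comprehension, which leaves the original dictionary untouched. This is only a sketch: the helper `keep_missing` and the sample `frames` are hypothetical names, not part of the answer. Boolean indexing keeps the original row labels, so the optional `reset_index(drop=True)` is included here to renumber the surviving rows from 0.

```python
import numpy as np
import pandas as pd


def keep_missing(frames, column):
    """Return a new dict with, per sheet, only the rows where `column` is NaN."""
    return {
        name: df.loc[df[column].isna()].reset_index(drop=True)  # renumber rows from 0
        for name, df in frames.items()
    }


# Hypothetical sample data; only two columns from the question are reproduced.
frames = {
    "Sheet_1": pd.DataFrame({
        "Name": ["Delilah", "Christina"],
        "favourite color": [np.nan, "black"],
    }),
    "Sheet_3": pd.DataFrame({
        "Name": ["Eleanor", "Little"],
        "favourite color": ["blue", np.nan],
    }),
}

filtered = keep_missing(frames, "favourite color")
for name, df in filtered.items():
    print(name)
    print(df)
```

Because `keep_missing` returns a new dictionary, `frames` still holds the full sheets afterwards, which can be handy if both versions are needed.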

Then (if needed) you can loop through each filtered df and write it to its own sheet of a spreadsheet:

    with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
        for sn, df in my_dict.items():
            df.to_excel(writer, sheet_name=sn, index=False)


