循环以筛选字典中的NaN值

huangapple go评论69阅读模式
英文:

Loop for filtering NaN's in dictionaries

问题

我打开了我的字典,我想写一个循环,以便只获得"favorite color"列等于NaN的行作为输出。

我的代码到目前为止:

# 导入模块
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook

# 定义文件路径
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# 将文件加载到数据框字典中
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)

my_dict输出:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color   
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 2  Christina   Rodwell  Christina Rodwell  100.0     3.0           black   
 3      Ziggy  Stardust     Ziggy Stardust   40.0     7.0             red   ,
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  
 0   Lucy  Diamonds  Lucy Diamonds   22.0     9.0           brown   
 1  Grace     Kelly    Grace Kelly   50.0     7.0           white   
 2    Uma   Thurman    Uma Thurman  105.0     7.0          purple   
 3   Lola      King      Lola King    NaN     NaN             NaN     ,
 'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  
 0  Eleanor     Rigby  Eleanor  Rigby  104.0     6.0            blue   
 1  Barbara       Ann    Barbara  Ann  168.0     8.0            pink   
 2    Polly   Cracker  Polly  Cracker  450.0     7.0           black   
 3   Little       Joe     Little  Joe    NaN     NaN             NaN   }

我想要的输出:

my_dict = {'Sheet_1':         Name   Surname      Concatenation    ID_  Grade_ favourite color  
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 'Sheet_2':     Name   Surname  Concatenation    ID_  Grade_ favourite color  
 3   Lola      King      Lola King    NaN     NaN             NaN   
 'Sheet_3':        Name  Surname   Concatenation    ID_  Grade_ favourite color  
 3   Little       Joe     Little  Joe    NaN     NaN             NaN  

最后,我想将"desired output"写入一个新的Excel文件(在单独的工作表中)。请为我提供指导。我是Python新手。

英文:

I have opened my dictionary and I would like to write a loop so to obtain as an output only those rows for which the ''favorite color'' column equals to NaN.

  • My code so far:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook

# Defining the file path
my_file_path = r&#39;C:\Users\machukovich\Desktop\stack.xlsx&#39;
# Loading the file into a dictionary of Dataframes
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)

<!-- end snippet -->

  • my_dict output:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

my_dict = {&#39;Sheet_1&#39;:         Name   Surname      Concatenation    ID_  Grade_ favourite color   
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 2  Christina   Rodwell  Christina Rodwell  100.0     3.0           black   
 3      Ziggy  Stardust     Ziggy Stardust   40.0     7.0             red    ,
 &#39;Sheet_2&#39;:     Name   Surname  Concatenation    ID_  Grade_ favourite color  \
 0   Lucy  Diamonds  Lucy Diamonds   22.0     9.0           brown   
 1  Grace     Kelly    Grace Kelly   50.0     7.0           white   
 2    Uma   Thurman    Uma Thurman  105.0     7.0          purple   
 3   Lola      King      Lola King    NaN     NaN             NaN     ,
 &#39;Sheet_3&#39;:        Name  Surname   Concatenation    ID_  Grade_ favourite color  \
 0  Eleanor     Rigby  Eleanor  Rigby  104.0     6.0            blue   
 1  Barbara       Ann    Barbara  Ann  168.0     8.0            pink   
 2    Polly   Cracker  Polly  Cracker  450.0     7.0           black   
 3   Little       Joe     Little  Joe    NaN     NaN             NaN    }

<!-- end snippet -->

  • My desired output:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

my_dict = {&#39;Sheet_1&#39;:         Name   Surname      Concatenation    ID_  Grade_ favourite color  
 1    Delilah  Gonzalez   Delilah Gonzalez    NaN     NaN             NaN   
 &#39;Sheet_2&#39;:     Name   Surname  Concatenation    ID_  Grade_ favourite color  \ 
 3   Lola      King      Lola King    NaN     NaN             NaN   
  &#39;Sheet_3&#39;:        Name  Surname   Concatenation    ID_  Grade_ favourite color  \
 3   Little       Joe     Little  Joe    NaN     NaN             NaN   

<!-- end snippet -->

And, finally I would like to write the desired output to a new excel file (in separate sheets).
Please, enlighten me. I am new to python.

答案1

得分: 1

这是您提供的代码的翻译:

我会这样做

    使用 pd.ExcelWriter("output.xlsx", engine="xlsxwriter") 作为 writer:
        对于 sn, df in my_dict.items():
            (df.loc[df["favourite color"].isnull()] # 我们使用布尔索引
                 .to_excel(writer, sheet_name=sn, index=False)) # 是否使用 startrow, startcol ?
        # 这是可选的
        对于 ws in writer.sheets:
            writer.sheets[ws].autofit() # xlsxwriter 3.0.6+

输出(只有 Sheet_1):

循环以筛选字典中的NaN值

更新:

如果您想先更新 my_dict,可以使用以下方式:

对于 sn, df in my_dict.items():
    my_dict[sn] = df.loc[df["favourite color"].isnull()]

输出:

print(my_dict)

{'Sheet_1':       Name   Surname     Concatenation  ID_  Grade_  favourite color
 0  Delilah  Gonzalez  Delilah Gonzalez  NaN     NaN              NaN,
 'Sheet_2':    Name Surname Concatenation  ID_  Grade_  favourite color
 0  Lola    King     Lola King  NaN     NaN              NaN,
 'Sheet_3':      Name Surname Concatenation  ID_  Grade_  favourite color
 0  Little     Joe    Little Joe  NaN     NaN              NaN}

然后(如果需要),您可以循环遍历每个筛选后的 df 将其存储在电子表格中:

使用 pd.ExcelWriter("output.xlsx", engine="xlsxwriter") 作为 writer:
    对于 sn, df in my_dict.items():
        df.to_excel(writer, sheet_name=sn, index=False)

请注意,我已经翻译了您提供的代码和相关注释,但没有翻译其他内容。

<details>
<summary>英文:</summary>

I would do it this way :

    with pd.ExcelWriter(&quot;output.xlsx&quot;, engine=&quot;xlsxwriter&quot;) as writer:
        for sn, df in my_dict.items():
            (df.loc[df[&quot;favourite color&quot;].isnull()] # we use boolean indexing
                 .to_excel(writer, sheet_name=sn, index=False)) # with startrow, starcol ?
        #this is optional    
        for ws in writer.sheets:
            writer.sheets[ws].autofit() # xlsxwriter 3.0.6+

Output (*only* `Sheet_1`):

[![enter image description here][1]][1]

***Update :***

If you want to update `my_dict` first, you can use this :

    for sn, df in my_dict.items():
        my_dict[sn] = df.loc[df[&quot;favourite color&quot;].isnull()]

Output :

    print(my_dict)
    
    {&#39;Sheet_1&#39;:       Name   Surname     Concatenation  ID_  Grade_  favourite color
     0  Delilah  Gonzalez  Delilah Gonzalez  NaN     NaN              NaN,
     &#39;Sheet_2&#39;:    Name Surname Concatenation  ID_  Grade_  favourite color
     0  Lola    King     Lola King  NaN     NaN              NaN,
     &#39;Sheet_3&#39;:      Name Surname Concatenation  ID_  Grade_  favourite color
     0  Little     Joe    Little Joe  NaN     NaN              NaN}

Then (*if needed*) you can loop through each filtered `df` to store it in a spreadsheet :

    with pd.ExcelWriter(&quot;output.xlsx&quot;, engine=&quot;xlsxwriter&quot;) as writer:
        for sn, df in my_dict.items():
            df.to_excel(writer, sheet_name=sn, index=False)

  [1]: https://i.stack.imgur.com/iUPhm.png

</details>



huangapple
  • 本文由 发表于 2023年6月15日 02:53:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76476734.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定