英文:
Loop for filtering NaN's in dictionaries
问题
我打开了我的字典,我想写一个循环,以便只获得"favorite color"列等于NaN
的行作为输出。
我的代码到目前为止:
# 导入模块
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# 定义文件路径
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# 将文件加载到数据框字典中
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)
my_dict输出:
my_dict = {'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
1 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN
2 Christina Rodwell Christina Rodwell 100.0 3.0 black
3 Ziggy Stardust Ziggy Stardust 40.0 7.0 red ,
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color
0 Lucy Diamonds Lucy Diamonds 22.0 9.0 brown
1 Grace Kelly Grace Kelly 50.0 7.0 white
2 Uma Thurman Uma Thurman 105.0 7.0 purple
3 Lola King Lola King NaN NaN NaN ,
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color
0 Eleanor Rigby Eleanor Rigby 104.0 6.0 blue
1 Barbara Ann Barbara Ann 168.0 8.0 pink
2 Polly Cracker Polly Cracker 450.0 7.0 black
3 Little Joe Little Joe NaN NaN NaN }
我想要的输出:
my_dict = {'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
1 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color
3 Lola King Lola King NaN NaN NaN
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color
3 Little Joe Little Joe NaN NaN NaN
最后,我想将"desired output"写入一个新的Excel文件(在单独的工作表中)。请为我提供指导。我是Python新手。
英文:
I have opened my dictionary and I would like to write a loop so to obtain as an output only those rows for which the ''favorite color'' column equals to NaN
.
- My code so far:
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the file path
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the file into a dictionary of Dataframes
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)
<!-- end snippet -->
- my_dict output:
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
my_dict = {'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
1 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN
2 Christina Rodwell Christina Rodwell 100.0 3.0 black
3 Ziggy Stardust Ziggy Stardust 40.0 7.0 red ,
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color \
0 Lucy Diamonds Lucy Diamonds 22.0 9.0 brown
1 Grace Kelly Grace Kelly 50.0 7.0 white
2 Uma Thurman Uma Thurman 105.0 7.0 purple
3 Lola King Lola King NaN NaN NaN ,
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color \
0 Eleanor Rigby Eleanor Rigby 104.0 6.0 blue
1 Barbara Ann Barbara Ann 168.0 8.0 pink
2 Polly Cracker Polly Cracker 450.0 7.0 black
3 Little Joe Little Joe NaN NaN NaN }
<!-- end snippet -->
- My desired output:
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-html -->
my_dict = {'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
1 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color \
3 Lola King Lola King NaN NaN NaN
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color \
3 Little Joe Little Joe NaN NaN NaN
<!-- end snippet -->
And, finally I would like to write the desired output
to a new excel file (in separate sheets).
Please, enlighten me. I am new to python.
答案1
得分: 1
这是您提供的代码的翻译:
我会这样做:
使用 pd.ExcelWriter("output.xlsx", engine="xlsxwriter") 作为 writer:
对于 sn, df in my_dict.items():
(df.loc[df["favourite color"].isnull()] # 我们使用布尔索引
.to_excel(writer, sheet_name=sn, index=False)) # 是否使用 startrow, startcol ?
# 这是可选的
对于 ws in writer.sheets:
writer.sheets[ws].autofit() # xlsxwriter 3.0.6+
输出(只有 Sheet_1
):
更新:
如果您想先更新 my_dict
,可以使用以下方式:
对于 sn, df in my_dict.items():
my_dict[sn] = df.loc[df["favourite color"].isnull()]
输出:
print(my_dict)
{'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
0 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN,
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color
0 Lola King Lola King NaN NaN NaN,
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color
0 Little Joe Little Joe NaN NaN NaN}
然后(如果需要),您可以循环遍历每个筛选后的 df
将其存储在电子表格中:
使用 pd.ExcelWriter("output.xlsx", engine="xlsxwriter") 作为 writer:
对于 sn, df in my_dict.items():
df.to_excel(writer, sheet_name=sn, index=False)
请注意,我已经翻译了您提供的代码和相关注释,但没有翻译其他内容。
<details>
<summary>英文:</summary>
I would do it this way :
with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
for sn, df in my_dict.items():
(df.loc[df["favourite color"].isnull()] # we use boolean indexing
.to_excel(writer, sheet_name=sn, index=False)) # with startrow, starcol ?
#this is optional
for ws in writer.sheets:
writer.sheets[ws].autofit() # xlsxwriter 3.0.6+
Output (*only* `Sheet_1`):
[![enter image description here][1]][1]
***Update :***
If you want to update `my_dict` first, you can use this :
for sn, df in my_dict.items():
my_dict[sn] = df.loc[df["favourite color"].isnull()]
Output :
print(my_dict)
{'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
0 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN,
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color
0 Lola King Lola King NaN NaN NaN,
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color
0 Little Joe Little Joe NaN NaN NaN}
Then (*if needed*) you can loop through each filtered `df` to store it in a spreadsheet :
with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
for sn, df in my_dict.items():
df.to_excel(writer, sheet_name=sn, index=False)
[1]: https://i.stack.imgur.com/iUPhm.png
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论