Trying to use Python to run a complex macro in Excel; code not working
Question
Here is the general idea of the macro I want to perform. I have a big Excel document with drug names in column B and their adverse effects in column D, and I am trying to sort the drugs by adverse effect. Here is an example of my table: [screenshot not available]
What I want the macro to do:
- Open the Excel file saved on my desktop at "/Desktop/Pharm Exam III Drugscopy.xlsx" and do the following for each sheet in the file (there are 3 sheets: "Antidepressants and Mood", "Immunomodulators", and "Drugs of Abuse"):
- Search column D for every unique value that comes either immediately after a "-" or after a "-" and a space. For example, "-HTN" and "- HTN" should be considered the same.
- For every instance of a unique value, copy the corresponding value in column B (e.g., if column D has "-HTN", look over to column B and copy its value, "Venlafaxine").
- Create a table in a new sheet to store all of these unique values and their corresponding column B values: the title of each new column is a unique value, and the cells below it list the corresponding values from column B.
- Format the table so that each column is a different color.
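The steps above boil down to pulling each dash-prefixed effect out of a column D cell, treating "-HTN" and "- HTN" as the same key, and collecting the column B drug names under it. A minimal, Excel-independent sketch of that core, assuming (since the table screenshot is not available) that each effect in a cell starts its own line with "-"; the function name group_drugs_by_effect and every sample row other than Venlafaxine/HTN are hypothetical:

```python
import re
from collections import defaultdict

# Assumption: each adverse effect in a column D cell starts its own line with
# "-", optionally followed by spaces, so "-HTN" and "- HTN" both yield "HTN".
EFFECT_PATTERN = re.compile(r"^\s*-\s*(.+?)\s*$", re.MULTILINE)


def group_drugs_by_effect(rows):
    """Map each unique adverse effect to the drug names that list it.

    rows: iterable of (column B value, column D value) pairs.
    Effects keep the order in which they are first seen.
    """
    groups = defaultdict(list)
    for drug, effects_text in rows:
        if not isinstance(effects_text, str):
            continue  # empty cell: NaN from pandas, None from openpyxl
        for effect in EFFECT_PATTERN.findall(effects_text):
            if drug not in groups[effect]:  # list each drug once per effect
                groups[effect].append(drug)
    return dict(groups)


# Hypothetical rows; only Venlafaxine / HTN come from the question.
rows = [
    ("Venlafaxine", "-HTN"),
    ("Drug X", "- HTN\n-Effect Y"),
    ("Drug Z", None),
]
print(group_drugs_by_effect(rows))
# {'HTN': ['Venlafaxine', 'Drug X'], 'Effect Y': ['Drug X']}
```

Keying on the text after the dash, with surrounding spaces stripped, is what makes "-HTN" and "- HTN" collapse into one column. Matching whole lines instead of splitting on every "-" also keeps hyphenated effect names such as "QT-prolongation" intact, which value_d.split('-') in the question's code would break apart.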
Here is the code I came up with (I only have a few months of experience with Codecademy, plus ChatGPT for debugging help):
import pandas as pd
import openpyxl
# read in the excel file with the specified path
file_path = "/Desktop/Pharm Exam III Drugscopy.xlsx"
xl = pd.ExcelFile(file_path)
# create an empty dictionary to hold the data for each sheet
sheet_data = {}
# loop through each sheet and extract the data
for sheet_name in ['Antidepressants and Mood Stabil', 'Immunomodulators', 'Drugs of Abuse']:
    # read in the sheet as a dataframe
    df = xl.parse(sheet_name)
    # create an empty dictionary to hold the data for this sheet
    sheet_dict = {}
    # loop through each row and extract the data
    for index, row in df.iterrows():
        value_d = row['D']
        value_b = row['B']
        # check if the value in column D matches the pattern of interest
        if isinstance(value_d, str) and ('-' in value_d or '- ' in value_d):
            split_values = value_d.split('-')
            for sv in split_values:
                sv = sv.strip()  # remove any extra spaces
                if sv != '':
                    sheet_dict[sv] = sheet_dict.get(sv, []) + [value_b]
    # add the sheet data to the overall dictionary
    sheet_data[sheet_name] = sheet_dict
# create a new dataframe to hold the results
result_df = pd.DataFrame(sheet_data)
# write the dataframe to a new sheet in the original excel file
result_sheet_name = "Results"
result_df.to_excel(file_path, sheet_name=result_sheet_name, index=False)
# format the table with alternating column colors
writer = pd.ExcelWriter(file_path, engine='openpyxl')
book = writer.book
sheet = book[result_sheet_name]
for i, col in enumerate(sheet.columns):
    color = 'FFFFFF' if i % 2 == 0 else 'F2F2F2'
    for cell in sheet[f'{col}']:
        cell.fill = PatternFill(start_color=color, end_color=color, fill_type='solid')
writer.save()
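One likely reason the code above fails early: pandas labels DataFrame columns by the sheet's header row, not by Excel column letters, so row['D'] raises a KeyError unless a header cell literally reads "D". The sketch below uses a hypothetical in-memory DataFrame (the header names "Header A" and "Header C" are placeholders; "Drug Name" and "Adverse Effects" are the headers the answer assumes) to show the error and a positional alternative with iloc, where column B is position 1 and column D is position 3:

```python
import pandas as pd

# Hypothetical stand-in for one sheet as pandas loads it: columns are labelled
# by the header row ("Header A" and "Header C" are placeholders), not A-D.
df = pd.DataFrame(
    {
        "Header A": ["..."],
        "Drug Name": ["Venlafaxine"],
        "Header C": ["..."],
        "Adverse Effects": ["-HTN"],
    }
)

row = df.iloc[0]
try:
    row["D"]  # the lookup the question's loop performs
except KeyError:
    print("KeyError: no column is labelled 'D'")

# Select Excel columns B and D by position (0-based: B -> 1, D -> 3).
b_and_d = df.iloc[:, [1, 3]]
for drug, effects in b_and_d.itertuples(index=False, name=None):
    print(drug, effects)  # Venlafaxine -HTN
```

When reading the workbook itself, pd.read_excel(file_path, sheet_name=None, usecols="B,D") should select the same two columns by letter on every sheet, since usecols accepts a comma-separated string of Excel column letters.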
Answer 1
Score: 1
Your expected output is unclear, but IIUC, you can use str.findall (which is a good starting point):
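import pandas as pd
from openpyxl.utils import get_column_letter
from openpyxl.styles import Alignment
file_path = "/Desktop/Pharm Exam III Drugscopy.xlsx"  # path from the question
# read every sheet into a dict {sheet name: DataFrame}; assumes the header
# cells of columns B and D read "Drug Name" and "Adverse Effects"
dfs = pd.read_excel(file_path, sheet_name=None, usecols=["Drug Name", "Adverse Effects"])
df = (
    pd.concat(dfs.values(), ignore_index=True)
      # extract the text after every "-" that is preceded by whitespace
      # (a space after the dash is optional); drop cells with no match
      .assign(Adverse_Effects=lambda df_: df_.pop("Adverse Effects")
                                             .str.findall(r"\s+-\s*(.*)")
                                             .loc[lambda s: s.str.len().gt(0)])
      # one row per (drug, effect) pair
      .dropna(subset=["Adverse_Effects"]).explode("Adverse_Effects")
      # join each effect's drug names with newlines; .T turns effects into columns
      .drop_duplicates().groupby("Adverse_Effects", sort=False).agg("\n".join).T
)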
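The pattern r"\s+-\s*(.*)" requires at least one whitespace character before the dash, and the .loc step then drops cells with no match, so an effect written at the very start of a cell would be skipped silently. A quick check on hypothetical cell strings (the real cell layout isn't shown here, so these are assumptions), alongside one possible tweak that is not part of the answer and also accepts a dash at the start of the cell:

```python
import pandas as pd

# Hypothetical column D cells: an effect at the very start of a cell, one
# preceded by a space, and a multi-line cell.
cells = pd.Series(["-HTN", " - HTN", "Effects:\n- HTN\n-Effect Y"])

# The answer's pattern: at least one whitespace character must precede "-".
answer_matches = cells.str.findall(r"\s+-\s*(.*)")
print(answer_matches.tolist())
# [[], ['HTN'], ['HTN', 'Effect Y']]

# Possible tweak (not from the answer): also accept "-" at the start of a cell.
tweaked_matches = cells.str.findall(r"(?:^|\s)-\s*(.*)")
print(tweaked_matches.tolist())
# [['HTN'], ['HTN'], ['HTN', 'Effect Y']]
```

Because (.*) runs to the end of the line, both patterns assume one effect per line.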
Then, to write the result to a new sheet of the same workbook, you can use ExcelWriter with the openpyxl engine:
with pd.ExcelWriter(file_path, mode="a", engine="openpyxl") as writer:
    # append a "Results" sheet to the existing workbook
    df.to_excel(writer, sheet_name="Results")
    worksheet = writer.sheets["Results"]
    # row 2 holds the newline-joined drug names: wrap the text and align it left
    cell_range = f"A2:{get_column_letter(worksheet.max_column)}2"
    align = Alignment(horizontal="left", vertical="center", wrap_text=True)
    for row in worksheet[cell_range]:
        for cell in row:
            cell.alignment = align
    # widen the effect columns (B onward) and the index column A
    for column in range(2, worksheet.max_column+1):
        col_letter = get_column_letter(column)
        worksheet.column_dimensions[col_letter].width = 25
    worksheet.column_dimensions["A"].width = 11
    worksheet.row_dimensions[2].height = 50
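The question's last step (a different color for each column) is not covered by the formatting code above. A minimal sketch of that step with openpyxl's PatternFill, applied to a small in-memory worksheet shaped like the "Results" sheet; the palette, the sample values, and the color_columns helper are hypothetical, and colors repeat when there are more columns than palette entries:

```python
from itertools import cycle

from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Hypothetical palette of light RGB hex colors; add more as needed.
PALETTE = ["FFF2CC", "DDEBF7", "E2EFDA", "FCE4D6", "EDE2F6"]


def color_columns(worksheet, first_col=1, palette=PALETTE):
    """Fill every used column from first_col onward with its own solid color.

    Colors repeat once the palette is exhausted; only cells up to
    worksheet.max_row / worksheet.max_column are touched.
    """
    colors = cycle(palette)
    for column in worksheet.iter_cols(min_col=first_col,
                                      max_col=worksheet.max_column,
                                      max_row=worksheet.max_row):
        color = next(colors)
        fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
        for cell in column:
            cell.fill = fill


# Small in-memory worksheet (hypothetical values).
wb = Workbook()
ws = wb.active
ws.append([None, "HTN", "Effect Y"])
ws.append(["Drug Name", "Venlafaxine\nDrug X", "Drug X"])
color_columns(ws, first_col=2)  # leave index column A unfilled
```

Inside the answer's with pd.ExcelWriter(...) block, calling color_columns(worksheet, first_col=2) after the alignment code should color the effect columns and leave column A alone. Separately, with current pandas versions mode="a" raises a ValueError on a second run once a "Results" sheet exists; passing if_sheet_exists="replace" (pandas 1.3 or later) overwrites it instead.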
Output (spreadsheet): [screenshot not available]