Trying to use Python to run a complex macro in Excel, code not working

Question


Here is the general idea of the macro I want to perform: I have a big Excel document with drug names in column B and their adverse effects in column D. I am trying to sort the drugs by adverse effect. Here is an example of my table:

[image: example of the table]

  1. Open an Excel file saved on my desktop with the path "/Desktop/Pharm Exam III Drugscopy.xlsx" and perform the following for each sheet in the file (there are 3 sheets: "Antidepressants and Mood", "Immunomodulators", and "Drugs of Abuse").
  2. Search column D for every unique value that either comes immediately after a "-" or has a space between the "-" and the value. For example, "-HTN" and "- HTN" are considered the same.
  3. For every instance of a unique value, copy the corresponding value in column B (e.g., if column D has -HTN, look over to column B and copy its value, "Venlafaxine").
  4. Create a table in a new sheet to store all of these unique values and their corresponding column B values. Each new column is titled with the unique value, and everything listed below it is the corresponding value from column B.
  5. Format the table so that each column is a different color.
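Steps 2 to 4 can be sketched in plain Python before any Excel I/O is involved. This is only an illustration under stated assumptions: `extract_effects` and `group_by_effect` are hypothetical helper names, and the pattern assumes each effect sits on its own line inside the column D cell and starts with a dash.

```python
import re
from collections import defaultdict

# Assumption: one effect per line, each starting with "-" and optional spaces.
EFFECT_PATTERN = re.compile(r"^\s*-\s*(.+?)\s*$", re.MULTILINE)

def extract_effects(cell_text):
    """Return the labels that follow a leading dash, with surrounding spaces removed."""
    return EFFECT_PATTERN.findall(cell_text)

def group_by_effect(rows):
    """Map each effect to the drugs that list it; rows are (drug, column D text) pairs."""
    grouped = defaultdict(list)
    for drug, cell_text in rows:
        for effect in extract_effects(cell_text):
            if drug not in grouped[effect]:  # skip repeated drug names
                grouped[effect].append(drug)
    return dict(grouped)

# Made-up rows, for illustration only.
sample_rows = [("Venlafaxine", "-HTN\n- Nausea"), ("Bupropion", "- HTN")]
result = group_by_effect(sample_rows)
```

Anchoring the pattern at the start of a line means a hyphen inside a label (for example "QT-prolongation") is not treated as a separator, which a plain `split('-')` would do.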

Here is the code I came up with (I only have a few months of experience with Codecademy, plus ChatGPT for debugging help):

import pandas as pd
import openpyxl

# read in the excel file with the specified path
file_path = "/Desktop/Pharm Exam III Drugscopy.xlsx"
xl = pd.ExcelFile(file_path)

# create an empty dictionary to hold the data for each sheet
sheet_data = {}

# loop through each sheet and extract the data
for sheet_name in ['Antidepressants and Mood Stabil', 'Immunomodulators', 'Drugs of Abuse']:
    # read in the sheet as a dataframe
    df = xl.parse(sheet_name)
    
    # create an empty dictionary to hold the data for this sheet
    sheet_dict = {}
    
    # loop through each row and extract the data
    for index, row in df.iterrows():
        value_d = row['D']
        value_b = row['B']
        
        # check if the value in column D matches the pattern of interest
        if isinstance(value_d, str) and ('-' in value_d or '- ' in value_d):
            split_values = value_d.split('-')
            for sv in split_values:
                sv = sv.strip() # remove any extra spaces
                if sv != '':
                    sheet_dict[sv] = sheet_dict.get(sv, []) + [value_b]
    
    # add the sheet data to the overall dictionary
    sheet_data[sheet_name] = sheet_dict

# create a new dataframe to hold the results
result_df = pd.DataFrame(sheet_data)

# write the dataframe to a new sheet in the original excel file
result_sheet_name = "Results"
result_df.to_excel(file_path, sheet_name=result_sheet_name, index=False)

# format the table with alternating column colors
writer = pd.ExcelWriter(file_path, engine='openpyxl')
book = writer.book
sheet = book[result_sheet_name]
for i, col in enumerate(sheet.columns):
    color = 'FFFFFF' if i % 2 == 0 else 'F2F2F2'
    for cell in sheet[f'{col}']:
        cell.fill = PatternFill(start_color=color, end_color=color, fill_type='solid')
writer.save()

NEW ERROR CODES

[images: error output, PAGE 1 to PAGE 3]

Answer 1

Score: 1


Your expected output is unclear, but if I understand correctly, you can use findall (which is a good starting point):

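import pandas as pd

from openpyxl.utils import get_column_letter
from openpyxl.styles import Alignment

# path taken from the question
file_path = "/Desktop/Pharm Exam III Drugscopy.xlsx"

# sheet_name=None reads every sheet into a dict of DataFrames keyed by sheet name
dfs = pd.read_excel(file_path, sheet_name=None, usecols=["Drug Name", "Adverse Effects"])

df = (
        pd.concat(dfs.values(), ignore_index=True)
            # capture the text after each "-" that is preceded by whitespace,
            # keeping only the rows with at least one match
            .assign(Adverse_Effects= lambda df_: df_.pop("Adverse Effects")
                    .str.findall(r"\s+-\s*(.*)")
                    .loc[lambda s: s.str.len().gt(0)])
            # one row per (drug, effect) pair
            .dropna(subset=["Adverse_Effects"]).explode("Adverse_Effects")
            # one column per effect, with the matching drug names joined by newlines
            .drop_duplicates().groupby("Adverse_Effects", sort=False).agg("\n".join).T
     )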
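To see what the findall, explode, and groupby steps do, here is a small sketch on a made-up DataFrame. The drug and effect pairings, the "Notes:" prefix, and the `effect` column name are assumptions for illustration; the steps are written out one by one instead of in a single chain.

```python
import pandas as pd

# Toy stand-in for the concatenated sheets (made-up data).
toy = pd.DataFrame({
    "Drug Name": ["Venlafaxine", "Bupropion"],
    "Adverse Effects": ["Notes:\n-HTN\n- Nausea", "Notes:\n- HTN"],
})

# Same pattern as above: whitespace, a dash, optional spaces, then the label.
effects = toy["Adverse Effects"].str.findall(r"\s+-\s*(.*)")

# One row per (drug, effect) pair; rows without any match become NaN and are dropped.
long = (
    toy.assign(effect=effects)
       .explode("effect")
       .dropna(subset=["effect"])
       .drop_duplicates(subset=["Drug Name", "effect"])
)

# One entry per effect, with the drug names joined by newlines.
grouped = long.groupby("effect", sort=False)["Drug Name"].agg("\n".join)
```

Note that the pattern requires at least one whitespace character (such as a line break) before the dash, so a cell that begins directly with "-HTN" would not match on its first entry.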

Then, to make the new spreadsheet, you can use ExcelWriter with the openpyxl engine:

with pd.ExcelWriter(file_path, mode="a", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Results")
    worksheet = writer.sheets["Results"]

    cell_range = f"A2:{get_column_letter(worksheet.max_column)}2"
    align = Alignment(horizontal="left", vertical="center", wrap_text=True)
    
    for row in worksheet[cell_range]:
        for cell in row:
            cell.alignment = align
            
    for column in range(2, worksheet.max_column+1):
        col_letter = get_column_letter(column)
        worksheet.column_dimensions[col_letter].width = 25
    
    worksheet.column_dimensions["A"].width = 11
    worksheet.row_dimensions[2].height = 50
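The code above handles alignment and sizing but not the question's step 5 (a different color per column). A minimal sketch of one way to do that with openpyxl's PatternFill follows; `color_columns` and `PALETTE` are hypothetical names, and the hex colors are arbitrary choices.

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Hypothetical palette of RGB hex strings; colors repeat if there are more columns than entries.
PALETTE = ["FFC7CE", "C6EFCE", "FFEB9C", "BDD7EE", "E4DFEC"]

def color_columns(worksheet, palette=PALETTE):
    """Give every used column of the worksheet a solid fill, cycling through the palette."""
    for index, column_cells in enumerate(worksheet.iter_cols()):
        color = palette[index % len(palette)]
        fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
        for cell in column_cells:
            cell.fill = fill

# Small in-memory demonstration; no file is written.
wb = Workbook()
ws = wb.active
ws.append(["HTN", "Nausea"])
ws.append(["Venlafaxine", "Bupropion"])
color_columns(ws)
```

Inside the `with` block above, calling `color_columns(worksheet)` after `df.to_excel(...)` should apply the fills before the workbook is saved.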

Output (spreadsheet):

[image: resulting spreadsheet]

huangapple
  • Posted on April 4, 2023, 12:09:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75925449.html
