April 4, 2023 12:09:14

Trying to use Python to run a complex macro in Excel, code not working

Question

Here is the general idea of the macro I want to perform: a big Excel document with drug names in column B and their adverse effects in column D. I am trying to sort the drugs by adverse effect. Here is an example of my table (image not included):

1. Open an Excel file saved on my desktop at "/Desktop/Pharm Exam III Drugscopy.xlsx" and perform the following for each sheet in the file (there are 3 sheets: "Antidepressants and Mood", "Immunomodulators", and "Drugs of Abuse").
2. Search column D for every unique value that comes either immediately after a "-" or after a "-" and a space. For example, "-HTN" and "- HTN" are considered the same.
3. For every instance of a unique value, copy the corresponding value in column B (e.g., if column D has -HTN, look over to column B and copy its value, "Venlafaxine").
4. Create a table in a new sheet to store all of these unique values and their corresponding column B values: the title of each new column is a unique value, and everything listed below it is the corresponding value from column B.
5. Format the table so that each column is a different color.
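The "-HTN" / "- HTN" rule above can be sketched with Python's standard `re` module. This is a minimal sketch under an assumption that cannot be checked here (the example table image is missing): each adverse effect sits on its own line inside a column D cell and starts with a dash. `extract_effects` is a hypothetical helper name.

```python
import re

# Assumption: each effect is on its own line in the cell and starts with "-".
# [ \t]* (not \s*) keeps a match on one line; \S requires an actual name after the dash.
EFFECT_LINE = re.compile(r"^[ \t]*-[ \t]*(\S.*?)[ \t\r]*$", re.MULTILINE)


def extract_effects(cell_text):
    """Return the effect names in one column D cell; "-HTN" and "- HTN" both give "HTN"."""
    if not isinstance(cell_text, str):  # empty Excel cells arrive as None or NaN
        return []
    return EFFECT_LINE.findall(cell_text)


print(extract_effects("-HTN\n- Insomnia\nTake with food"))  # ['HTN', 'Insomnia']
```

Lines without a leading dash (like "Take with food") are ignored, so free-text notes in the same cell do not become column titles.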

Here is the code I came up with (I only have a few months of experience from Codecademy, plus ChatGPT for debugging help):

import pandas as pd
import openpyxl
# read in the excel file with the specified path
file_path = "/Desktop/Pharm Exam III Drugscopy.xlsx"
xl = pd.ExcelFile(file_path)
# create an empty dictionary to hold the data for each sheet
sheet_data = {}
# loop through each sheet and extract the data
for sheet_name in ['Antidepressants and Mood Stabil', 'Immunomodulators', 'Drugs of Abuse']:
    # read in the sheet as a dataframe
    df = xl.parse(sheet_name)

    # create an empty dictionary to hold the data for this sheet
    sheet_dict = {}

    # loop through each row and extract the data
    for index, row in df.iterrows():
        value_d = row['D']
        value_b = row['B']

        # check if the value in column D matches the pattern of interest
        if isinstance(value_d, str) and ('-' in value_d or '- ' in value_d):
            split_values = value_d.split('-')
            for sv in split_values:
                sv = sv.strip() # remove any extra spaces
                if sv != '':
                    sheet_dict[sv] = sheet_dict.get(sv, []) + [value_b]

    # add the sheet data to the overall dictionary
    sheet_data[sheet_name] = sheet_dict
# create a new dataframe to hold the results
result_df = pd.DataFrame(sheet_data)
# write the dataframe to a new sheet in the original excel file
result_sheet_name = "Results"
result_df.to_excel(file_path, sheet_name=result_sheet_name, index=False)
# format the table with alternating column colors
writer = pd.ExcelWriter(file_path, engine='openpyxl')
book = writer.book
sheet = book[result_sheet_name]
for i, col in enumerate(sheet.columns):
    color = 'FFFFFF' if i % 2 == 0 else 'F2F2F2'
    for cell in sheet[f'{col}']:
        cell.fill = PatternFill(start_color=color, end_color=color, fill_type='solid')
writer.save()
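Independent of the data, a few lines in this code fail on their own: `row['D']` and `row['B']` look columns up by header text, not by Excel letter, so they raise `KeyError` unless the headers are literally "D" and "B"; `PatternFill` is used but never imported (it lives in `openpyxl.styles`); `result_df.to_excel(file_path, ...)` writes a new workbook over the original file, and `pd.ExcelWriter(file_path, engine='openpyxl')` in its default write mode also starts from an empty workbook, so `book[result_sheet_name]` has nothing to find; and `writer.save()` was removed in pandas 2.0. The target layout (one column per effect, drugs listed underneath) can be sketched in plain Python. The function names are hypothetical, and the input is assumed to be (drug, effect) pairs already extracted from columns B and D:

```python
from itertools import zip_longest


def build_effect_table(pairs):
    """Group (drug, effect) pairs into {effect: [drug, ...]}, keeping first-seen order without repeats."""
    table = {}
    for drug, effect in pairs:
        drugs = table.setdefault(effect, [])
        if drug not in drugs:
            drugs.append(drug)
    return table


def to_sheet_rows(table):
    """Header row of effects, then one drug per cell underneath; shorter columns are padded with None."""
    header = list(table)
    body = [list(row) for row in zip_longest(*table.values())]
    return [header] + body


# Made-up sample pairs for illustration only
pairs = [("Venlafaxine", "HTN"), ("Bupropion", "Insomnia"),
         ("Venlafaxine", "Insomnia"), ("Venlafaxine", "HTN")]
for row in to_sheet_rows(build_effect_table(pairs)):
    print(row)
# ['HTN', 'Insomnia']
# ['Venlafaxine', 'Bupropion']
# [None, 'Venlafaxine']
```

Each row can then be written with openpyxl's `Worksheet.append`, where `None` ends up as an empty cell.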

NEW ERROR CODES (screenshots not included): PAGE 1, PAGE 2, PAGE 3

Answer 1

Score: 1

Your expected output is unclear, but IIUC (if I understand correctly), you can use findall (which is a good starting point):

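import pandas as pd
from openpyxl.utils import get_column_letter
from openpyxl.styles import Alignment

file_path = "/Desktop/Pharm Exam III Drugscopy.xlsx"  # path from the question; adjust as needed

# sheet_name=None reads every sheet into a dict of DataFrames;
# assumes the header cells are literally "Drug Name" and "Adverse Effects"
dfs = pd.read_excel(file_path, sheet_name=None, usecols=["Drug Name", "Adverse Effects"])
df = (
        pd.concat(dfs.values(), ignore_index=True)
            # collect every "- effect" entry of the Adverse Effects text into a list
            .assign(Adverse_Effects= lambda df_: df_.pop("Adverse Effects")
                    .str.findall(r"\s+-\s*(.*)")
                    .loc[lambda s: s.str.len().gt(0)])
            # one row per (drug, effect), then one newline-joined cell of drugs per effect
            .dropna(subset=["Adverse_Effects"]).explode("Adverse_Effects")
            .drop_duplicates().groupby("Adverse_Effects", sort=False).agg("\n".join).T
     )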
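`Series.str.findall` applies `re.findall` to each cell, so the pattern in this chain can be tried on a single made-up cell with the standard library. One caveat worth checking against the real data: `\s+-` requires whitespace before the dash, so an effect at the very start of a cell (like the question's "-HTN") is skipped. The alternative pattern below is an untested suggestion, not part of the original answer:

```python
import re

ANSWER_PATTERN = r"\s+-\s*(.*)"      # pattern used in the answer
ALT_PATTERN = r"(?:^|\s)-\s*(.*)"    # hypothetical variant: also accepts a dash at the very start of the cell

cell = "-HTN\n- Insomnia\nTake with food"  # made-up column D cell

print(re.findall(ANSWER_PATTERN, cell))  # ['Insomnia']
print(re.findall(ALT_PATTERN, cell))     # ['HTN', 'Insomnia']
```

Both patterns still need whitespace (or the start of the cell) before the dash, so hyphenated words such as "beta-blocker" are not split.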

Then, to make the new sheet, you can use ExcelWriter with the openpyxl engine:

# mode="a" appends to the existing workbook instead of overwriting it;
# if_sheet_exists="replace" lets the script be re-run once "Results" exists
with pd.ExcelWriter(file_path, mode="a", engine="openpyxl", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Results")
    worksheet = writer.sheets["Results"]
    # row 2 holds the newline-joined drug names: left-align and wrap them
    cell_range = f"A2:{get_column_letter(worksheet.max_column)}2"
    align = Alignment(horizontal="left", vertical="center", wrap_text=True)

    for row in worksheet[cell_range]:
        for cell in row:
            cell.alignment = align

    # widen the effect columns (B onward) so the drug lists are readable
    for column in range(2, worksheet.max_column+1):
        col_letter = get_column_letter(column)
        worksheet.column_dimensions[col_letter].width = 25

    worksheet.column_dimensions["A"].width = 11
    worksheet.row_dimensions[2].height = 50

Output (spreadsheet; screenshot not included).
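The answer formats row 2 but does not cover the question's last requirement, giving each column a different color. A minimal sketch with openpyxl's `PatternFill`, assuming the layout the answer writes (index label in column A, one effect per column from B onward); `color_columns` is a hypothetical helper and the palette is an arbitrary choice:

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Arbitrary light palette (RRGGBB hex); colors repeat if there are more columns than entries
PALETTE = ["FFF2CC", "DDEBF7", "E2EFDA", "FCE4D6", "EDE7F6", "D9D9D9"]


def color_columns(worksheet, first_col=2, palette=PALETTE):
    """Fill every used cell of each column from first_col onward with its own solid color."""
    columns = worksheet.iter_cols(min_col=first_col, max_col=worksheet.max_column,
                                  min_row=1, max_row=worksheet.max_row)
    for index, column_cells in enumerate(columns):
        color = palette[index % len(palette)]
        fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
        for cell in column_cells:
            cell.fill = fill


# Small in-memory demo shaped like the answer's "Results" sheet (made-up values)
wb = Workbook()
ws = wb.active
ws.append([None, "HTN", "Insomnia"])
ws.append(["Drug Name", "Venlafaxine", "Venlafaxine\nBupropion"])
color_columns(ws)
```

Inside the answer's `with pd.ExcelWriter(...)` block, calling `color_columns(worksheet)` after the alignment loop should apply the fills before the file is saved.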
