Pandas 读取 Excel 文件并用删除线标记行。

huangapple go评论110阅读模式
英文:

Pandas read excel and flag rows with striketrough

问题

我需要使用Python在Excel表格上工作,以便稍后将其放入数据库。我的问题是,我有相当多的行/单元格有删除线。我希望正常读取Excel,并标记具有一个或多个删除线单元格的每一行。我找到了一种跳过这些行的解决方案,但我想要标记它。

我确实找到了使用openpyxl打印有删除线的行的方法,但我想使用pandas,因为我以后需要添加和删除许多列,而pandas在这方面表现最佳。

英文:

I have to work on an excel sheet with python so I can later put it in a database.
My problem is, that I have quite a lot of rows/cells with striketrough. I want to read the excel normally and flag every row with one or more striketrough cells. I just found a solution for skipping these rows but I want to flag it.

I did find a way to print the strikethrough rows using openpyxl but I wanted to use pandas because I have to add and cut a lot of columns later on and pandas is best for that.

答案1

得分: 0

这是一个示例,用于说明一般的逻辑。基本上,我们使用Font.strikethrough来检查单元格的字体是否被划掉。

from openpyxl import load_workbook

ws = load_workbook("file.xlsx")["Sheet1"]

data = [[cell for cell in row] for row in ws.iter_rows(values_only=True)]

df = pd.DataFrame(data[1:], columns=data[0])

df["Flag"] = [
    any(cell.font.strikethrough for cell in row) for row in ws.iter_rows(min_row=2)
]

输出:

print(df)

  col1  col2   col3   Flag
0  foo     1  test1   True
1  bar     2  test2  False
2  qux     3  test3   True
3  baz     4  test4  False

*使用的输入:*

[![进入图片描述][2]][2]


  [1]: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.styles.fonts.html?highlight=strike#openpyxl.styles.fonts.Font.strikethrough
  [2]: https://i.stack.imgur.com/PMNgP.png
英文:

It's hard to imagine how looks like your spreadsheet but here is an example to give you the general logic. Basically, we use Font.strikethrough from [tag:openpyxl] to check if a the font of a cell is striked or not.

from openpyxl import load_workbook

ws = load_workbook("file.xlsx")["Sheet1"]

data = [[cell for cell in row] for row in ws.iter_rows(values_only=True)]

df = pd.DataFrame(data[1:], columns=data[0])

df["Flag"] = [
    any(cell.font.strikethrough for cell in row) for row in ws.iter_rows(min_row=2)
]

Output :

print(df)

  col1  col2   col3   Flag
0  foo     1  test1   True
1  bar     2  test2  False
2  qux     3  test3   True
3  baz     4  test4  False

Input used :

Pandas 读取 Excel 文件并用删除线标记行。

huangapple
  • 本文由 发表于 2023年5月22日 20:30:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/76306226.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定