Question

I am trying to import tables from multiple PDF files into Power Query, but the table appears on a different page in each file. In one PDF the table is on page 5, while in another it is on page 10. The table headers are the same in every file. How can I use Power Query to find this table header and pull these tables into Excel?

Thanks in advance for the help.

Answer 1

Score: 1

This seems to work on your three sample files:

    let
        // PullPDF reads one PDF and returns the table that starts at the "Volume" header row
        PullPDF = (variable) =>
            let
                Source = Pdf.Tables(File.Contents(variable)),
                // collect every column name used by any table found in the PDF, then stack all tables
                List = List.Union(List.Transform(Source[Data], each Table.ColumnNames(_))),
                #"Expanded Data" = Table.ExpandTableColumn(Source, "Data", List, List),
                // assume all data above row containing word Volume is garbage
                Row = Table.RemoveFirstN(#"Expanded Data", List.PositionOf(#"Expanded Data"[Column2], "Volume")),
                #"Removed Other Columns" = Table.SelectColumns(Row, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7"}),
                #"Promoted Headers" = Table.PromoteHeaders(#"Removed Other Columns", [PromoteAllScalars=true]),
                // change or remove next row as needed based on real column names and formats
                #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"A3 Unit", type text}, {"Volume", Int64.Type}, {"∆V", Int64.Type}, {"Method", type text}, {"Volume_1", Int64.Type}, {"∆V_2", Int64.Type}, {"Delta#(lf)Volume", Int64.Type}})
            in
                #"Changed Type",

        // run PullPDF on every PDF in the folder and combine the results
        Source2 = Folder.Files("c:\temp\"),
        #"Filtered Rows" = Table.SelectRows(Source2, each ([Extension] = ".pdf")),
        #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Data", each PullPDF([Folder Path] & [Name])),
        List = List.Union(List.Transform(#"Added Custom"[Data], each Table.ColumnNames(_))),
        #"Expanded Data" = Table.ExpandTableColumn(#"Added Custom", "Data", List, List)
    in
        #"Expanded Data"
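
The query above looks for the word Volume only in Column2, and List.PositionOf returns -1 when the word is not found. If the header might land in a different column in some files, a variation like the following could be tried. It is only a sketch, not part of the tested answer: FindHeaderRow is a hypothetical helper name, and Sample is a made-up in-memory table standing in for one expanded result of Pdf.Tables.

```powerquery
let
    // Hypothetical helper: returns the zero-based index of the first row in which
    // any cell equals headerWord, or -1 when no row matches.
    FindHeaderRow = (tbl as table, headerWord as text) as number =>
        let
            // Turn each row into a list of its cell values.
            Rows = Table.ToRows(tbl),
            // Flag the rows that contain the header word in any column.
            Flags = List.Transform(Rows, each List.Contains(_, headerWord)),
            Position = List.PositionOf(Flags, true)
        in
            Position,

    // Made-up stand-in for the expanded PDF data (assumption, not real file content).
    Sample = #table(
        {"Column1", "Column2", "Column3"},
        {
            {"Report title", null, null},
            {"A3 Unit", "Volume", "Method"},
            {"U1", 10, "X"}
        }
    ),
    HeaderRow = FindHeaderRow(Sample, "Volume"),
    // Only trim and promote when the header row was actually found.
    Result =
        if HeaderRow >= 0 then
            Table.PromoteHeaders(Table.RemoveFirstN(Sample, HeaderRow), [PromoteAllScalars = true])
        else
            #table({}, {})
in
    Result
```

In a real query, Sample would be replaced by the #"Expanded Data" step inside PullPDF, and the empty-table fallback lets a PDF without the header be skipped instead of breaking the whole folder load.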

I looked on the other site since I was bored, but really, requiring people to log into another site to help you on this one is not optimal.


How to use Power Query to find a word in a PDF document and pull a table out
