February 27, 2023, 07:36:42

Create clean data frame in R from list generated by ocr

Question

I imported a table into R using the ocr function from tesseract, and I need to format it into a data frame that I can use to create graphs.

I have done a bunch of different transformations on the data and so far have the first 3 columns that are needed for my final output.

My output for the table so far looks like this:

> table_df2
     V1 month_list us_total
1  2020   December      6.7
2  2021    January      6.3
3  2021   February      6.2
4  2021      March        6
5  2021      April      6.1
6  2021        May      5.8
7  2021       June      5.9
8  2021       July      5.4
9  2021     August      5.2
10 2021  September      4.8
11 2021    October      4.6
12 2021   November      4.2
13 2021   December      3.9

"V1" needs to be renamed to "Year", "month_list" to "Month", and "us_total" to "U.S. Totals".

I also have cleaned all of the values in the remaining part of the table and have the following dataframe output for these:

> dput(values)
structure(list(table_df = c("5.6", "5.8", "14.4", "8.5", "10.5", 
"24.2", "9.1", "8.9", "16.9", "5.1", "5.5", "14.5", "8.5", "9.4", 
"17.3", "8.8", "7.7", "17.4", "5.2", "5.3", "13.1", "8.9", "10.2", 
"19.8", "8.5", "7.7", "17.3", "5", "5.2", "11.8", "8.7", "9.8", 
"18.1", "7.3", "7.5", "16.3", "4.8", "5.3", "11.1", "8.6", "10.2", 
"18.9", "7.5", "7.5", "17", "4.8", "5.1", "8.8", "8.2", "9.8", 
"12.1", "7.4", "6.7", "14.2", "5", "5.2", "9", "8.5", "10", "9.3", 
"7.9", "6.6", "13.2", "", "4.5", "4.9", "8.2", "7.6", "8.4", 
"13.3", "6.7", "6.2", "10.8", "4.3", "4.4", "9.7", "7.8", "9.0", 
"17.4", "6", "5.6", "14.9", "3.7", "4.2", "10.6", "7.2", "7.9", 
"14.6", "5.5", "5.6", "17.5", "3.8", "3.6", "10.3", "6.8", "8.2", 
"16.0", "5.6", "5", "15.6", "3.7", "3.3", "9", "4.9", "7.2", 
"22", "5.3", "4.5", "12.1", "3.1", "3.0", "8.6", "6.2", "7", 
"21.0", "4.9", "4.2", "12.2")), row.names = c(NA, -118L), class = "data.frame")

These values need to fill 9 columns to the right of the "U.S. Totals" column in my current output: the first 9 values belong to the December 2020 row, the next 9 to the January 2021 row, and so on.

If I could get this to work correctly, the final output would look like this table (in dataframe format):

Answer 1

Score: 1

I was able to get this to work using the matrix function.

# values is a one-column data frame, so extract the column as a vector first;
# also drop the empty string left by OCR so the 117 remaining values fill 13 rows of 9
vals <- values$table_df[values$table_df != ""]
my_matrix <- matrix(as.numeric(vals), ncol = 9, byrow = TRUE)
values_df <- data.frame(my_matrix)
colnames(values_df) <- c("W.F", "W.M", "W.16-19", "B.F", "B.M", "B.16-19",
                         "HL.F", "HL.M", "HL.16-19")
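
For the remaining step the question asks about (renaming the first three columns and attaching the nine new ones), something along these lines might work. This is only a sketch under a few assumptions: table_df2 and values are small stand-ins built from the first two rows of the question's data, the blank string is placed there by hand to mimic the stray empty OCR entry, and raw and final_df are names made up for the illustration.

```r
# Stand-in for the first two rows of table_df2 from the question.
table_df2 <- data.frame(
  V1 = c("2020", "2021"),
  month_list = c("December", "January"),
  us_total = c("6.7", "6.3")
)

# Stand-in for `values`: the first 18 OCR strings from the question, plus one
# empty string inserted by hand to mimic a stray blank OCR entry (assumption).
values <- data.frame(table_df = c(
  "5.6", "5.8", "14.4", "8.5", "10.5", "24.2", "9.1", "8.9", "16.9", "",
  "5.1", "5.5", "14.5", "8.5", "9.4", "17.3", "8.8", "7.7", "17.4"
))

# Pull the column out as a character vector and drop blank entries.
raw <- trimws(values$table_df)
raw <- raw[nzchar(raw)]

# Fail early if the cleaned values do not fill exactly 9 columns per row.
stopifnot(length(raw) == 9 * nrow(table_df2))

# byrow = TRUE fills the matrix one row (one month) at a time.
values_df <- as.data.frame(matrix(as.numeric(raw), ncol = 9, byrow = TRUE))
names(values_df) <- c("W.F", "W.M", "W.16-19", "B.F", "B.M", "B.16-19",
                      "HL.F", "HL.M", "HL.16-19")

# Rename the first three columns, then bind the two data frames side by side.
# cbind() on data frames does not rewrite non-syntactic names such as
# "U.S. Totals" or "W.16-19".
names(table_df2) <- c("Year", "Month", "U.S. Totals")
final_df <- cbind(table_df2, values_df)
```

Because the names contain spaces and hyphens, the columns have to be accessed with backticks or double brackets, e.g. final_df[["U.S. Totals"]]. The first three columns are still character at this point; as.numeric() can be applied to them if they are needed for plotting.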
