Create a clean data frame in R from a list generated by OCR

Question

I imported a table into R using the ocr function from tesseract, and I need to format it into a table that I can use to create graphical representations.

So far I have applied several transformations to the data and have the first 3 columns needed for my final output.

My output for the table so far looks like this:

> table_df2
     V1 month_list us_total
1  2020   December      6.7
2  2021    January      6.3
3  2021   February      6.2
4  2021      March        6
5  2021      April      6.1
6  2021        May      5.8
7  2021       June      5.9
8  2021       July      5.4
9  2021     August      5.2
10 2021  September      4.8
11 2021    October      4.6
12 2021   November      4.2
13 2021   December      3.9

"V1" will need to be renamed to "Year", "month_list" to "Month", and "us_total" to "U.S. Totals".

I have also cleaned all of the values in the remaining part of the table and have the following data frame output for them:
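
One way to do the renaming, as a minimal sketch: the small data frame below is only a stand-in for the first rows of table_df2, and its columns are assumed to be character, as OCR output usually is. Because "U.S. Totals" contains a space, it is not a syntactic R name and needs backticks when accessed with $.

```r
# Stand-in for the first rows of table_df2 (columns assumed to be character)
table_df2 <- data.frame(
  V1 = c("2020", "2021", "2021"),
  month_list = c("December", "January", "February"),
  us_total = c("6.7", "6.3", "6.2"),
  stringsAsFactors = FALSE
)

# Assign the new column names in order
names(table_df2) <- c("Year", "Month", "U.S. Totals")

# Non-syntactic names need backticks with $; convert the totals to numeric
table_df2$`U.S. Totals` <- as.numeric(table_df2$`U.S. Totals`)
str(table_df2)
```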

> dput(values)
structure(list(table_df = c("5.6", "5.8", "14.4", "8.5", "10.5", 
"24.2", "9.1", "8.9", "16.9", "5.1", "5.5", "14.5", "8.5", "9.4", 
"17.3", "8.8", "7.7", "17.4", "5.2", "5.3", "13.1", "8.9", "10.2", 
"19.8", "8.5", "7.7", "17.3", "5", "5.2", "11.8", "8.7", "9.8", 
"18.1", "7.3", "7.5", "16.3", "4.8", "5.3", "11.1", "8.6", "10.2", 
"18.9", "7.5", "7.5", "17", "4.8", "5.1", "8.8", "8.2", "9.8", 
"12.1", "7.4", "6.7", "14.2", "5", "5.2", "9", "8.5", "10", "9.3", 
"7.9", "6.6", "13.2", "", "4.5", "4.9", "8.2", "7.6", "8.4", 
"13.3", "6.7", "6.2", "10.8", "4.3", "4.4", "9.7", "7.8", "9.0", 
"17.4", "6", "5.6", "14.9", "3.7", "4.2", "10.6", "7.2", "7.9", 
"14.6", "5.5", "5.6", "17.5", "3.8", "3.6", "10.3", "6.8", "8.2", 
"16.0", "5.6", "5", "15.6", "3.7", "3.3", "9", "4.9", "7.2", 
"22", "5.3", "4.5", "12.1", "3.1", "3.0", "8.6", "6.2", "7", 
"21.0", "4.9", "4.2", "12.2")), row.names = c(NA, -118L), class = "data.frame")

These values need to make up 9 columns to the right of the U.S. Totals column in my current output, where the first 9 values are for the 2020 December row, the next 9 are for the 2021 January row, and so on.
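
Before reshaping, it may help to check that the vector really splits into rows of 9. The dput output above holds 118 strings, one of which is empty, while 13 months of 9 values would be 117. A minimal sketch of that check follows; the short vector x is a made-up stand-in for values$table_df, and it uses 3 columns instead of 9 to keep the example small.

```r
# Made-up stand-in for values$table_df: two rows of 3 plus one empty OCR entry
x <- c("5.6", "5.8", "14.4", "", "5.1", "5.5", "14.5")
n_cols <- 3  # 9 in the real table

# Count empty strings and test whether the length is a multiple of n_cols
n_empty <- sum(!nzchar(x))
fits_before <- length(x) %% n_cols == 0

# Drop the empty entries and test again
x_clean <- x[nzchar(x)]
fits_after <- length(x_clean) %% n_cols == 0

cat("empty entries:", n_empty, "\n")
cat("fits before cleaning:", fits_before, "\n")
cat("fits after cleaning:", fits_after, "\n")
```

If the length is still not a multiple of the column count after cleaning, a value was probably lost or split during OCR, and the rows would shift when filled by row.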

If I could get this to work correctly, the final output would look like this table (in dataframe format):
(image link: target table)

Answer 1

Score: 1

I was able to get this to work using the matrix function. Note that values is a data frame, so its column values$table_df has to be passed to as.numeric, and the one empty string in the data has to be dropped first so that the remaining 117 values fill 13 rows of 9.

# Pull the character column out of the data frame and drop the empty OCR entry
vals <- values$table_df
vals <- as.numeric(vals[nzchar(vals)])
# Fill row by row: each run of 9 values becomes one month
my_matrix <- matrix(vals, ncol = 9, byrow = TRUE)
values_df <- data.frame(my_matrix)
colnames(values_df) <- c("W.F", "W.M", "W.16-19", "B.F", "B.M", "B.16-19",
                         "HL.F", "HL.M", "HL.16-19")
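
To finish the table, the reshaped values still have to be attached to the right of the first three columns. A minimal sketch with base R's cbind is shown here; the two data frames are made-up stand-ins for table_df2 and values_df, with two rows and only two of the nine value columns.

```r
# Made-up stand-in for the first three columns (two rows only)
table_df2 <- data.frame(Year = c("2020", "2021"),
                        Month = c("December", "January"),
                        us_total = c(6.7, 6.3))
names(table_df2)[3] <- "U.S. Totals"

# Made-up stand-in for two of the nine value columns, filled row by row
values_df <- data.frame(matrix(c(5.6, 5.8, 5.1, 5.5), ncol = 2, byrow = TRUE))
colnames(values_df) <- c("W.F", "W.M")

# Both pieces must have one row per month before binding side by side
stopifnot(nrow(table_df2) == nrow(values_df))
final_df <- cbind(table_df2, values_df)
print(final_df)
```

cbind matches rows purely by position, so this only gives the right table if both pieces are in the same month order.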

huangapple
  • Published on 2023-02-27 07:36:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75575678.html