Create a clean data frame in R from a list generated by OCR

Question

I imported a table into R using the ocr function from tesseract, and I need to format it into a table that I can use to create graphical representations.

After a number of transformations on the data, I have the first 3 columns needed for my final output.

My table so far looks like this:

    > table_df2
         V1 month_list us_total
    1  2020   December      6.7
    2  2021    January      6.3
    3  2021   February      6.2
    4  2021      March        6
    5  2021      April      6.1
    6  2021        May      5.8
    7  2021       June      5.9
    8  2021       July      5.4
    9  2021     August      5.2
    10 2021  September      4.8
    11 2021    October      4.6
    12 2021   November      4.2
    13 2021   December      3.9

"V1" will need to be renamed to "Year", "month_list" to "Month", and "us_total" to "U.S. Totals".
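A minimal sketch of that rename, assuming table_df2 has exactly these three columns in this order. The three sample rows are copied from the printed output above, and storing them as text is an assumption (the printed "6" for March suggests the column is not numeric yet):

```r
# Rebuild the first rows of table_df2 from the printed output
# (assumption: values stored as character, as the printout suggests)
table_df2 <- data.frame(
  V1         = c("2020", "2021", "2021"),
  month_list = c("December", "January", "February"),
  us_total   = c("6.7", "6.3", "6.2")
)

# Assign the new names by position
names(table_df2) <- c("Year", "Month", "U.S. Totals")

# "U.S. Totals" contains a space, so it needs backticks when used with $
table_df2$`U.S. Totals`
```

Because the last name is not a syntactic R name, indexing it as table_df2[["U.S. Totals"]] works as well and avoids the backticks.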

I have also cleaned all of the values in the rest of the table, and have the following data frame for them:

    > dput(values)
    structure(list(table_df = c("5.6", "5.8", "14.4", "8.5", "10.5",
    "24.2", "9.1", "8.9", "16.9", "5.1", "5.5", "14.5", "8.5", "9.4",
    "17.3", "8.8", "7.7", "17.4", "5.2", "5.3", "13.1", "8.9", "10.2",
    "19.8", "8.5", "7.7", "17.3", "5", "5.2", "11.8", "8.7", "9.8",
    "18.1", "7.3", "7.5", "16.3", "4.8", "5.3", "11.1", "8.6", "10.2",
    "18.9", "7.5", "7.5", "17", "4.8", "5.1", "8.8", "8.2", "9.8",
    "12.1", "7.4", "6.7", "14.2", "5", "5.2", "9", "8.5", "10", "9.3",
    "7.9", "6.6", "13.2", "", "4.5", "4.9", "8.2", "7.6", "8.4",
    "13.3", "6.7", "6.2", "10.8", "4.3", "4.4", "9.7", "7.8", "9.0",
    "17.4", "6", "5.6", "14.9", "3.7", "4.2", "10.6", "7.2", "7.9",
    "14.6", "5.5", "5.6", "17.5", "3.8", "3.6", "10.3", "6.8", "8.2",
    "16.0", "5.6", "5", "15.6", "3.7", "3.3", "9", "4.9", "7.2",
    "22", "5.3", "4.5", "12.1", "3.1", "3.0", "8.6", "6.2", "7",
    "21.0", "4.9", "4.2", "12.2")), row.names = c(NA, -118L), class = "data.frame")

These values need to make up the 9 columns to the right of the U.S. Total column in my current output: the first 9 values belong to the 2020 December row, the next 9 to the 2021 January row, and so on.

If I could get this to work correctly, the final output would look like this table (in data frame format):

[image link]

Answer 1

Score: 1

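I was able to get this to work using the matrix function. With values as shown in the question's dput output (a one-column data frame holding 118 strings, one of them empty), the column has to be extracted first and the empty string dropped, so that the remaining 117 values fill 13 rows of 9.

    # values is a one-column data frame, so pull out the character vector
    vals <- values$table_df
    # drop the empty string so the length is a multiple of 9
    vals <- vals[vals != ""]
    my_matrix <- matrix(as.numeric(vals), ncol = 9, byrow = TRUE)
    values_df <- data.frame(my_matrix)
    colnames(values_df) <- c("W.F", "W.M", "W.16-19", "B.F", "B.M", "B.16-19",
                             "HL.F", "HL.M", "HL.16-19")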
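To finish the table, the reshaped values still have to be attached to the three columns the question already has. The following is a sketch of the whole path rather than the answerer's exact code. It assumes that values is the one-column data frame from the question's dput output, that its single empty string is an OCR leftover that can be dropped, and that the first three columns can be rebuilt as numbers and month names. The name final_df is introduced here for illustration.

```r
# Cleaned values copied from the question's dput() output (118 strings, one empty)
values <- data.frame(table_df = c(
  "5.6", "5.8", "14.4", "8.5", "10.5", "24.2", "9.1", "8.9", "16.9",
  "5.1", "5.5", "14.5", "8.5", "9.4", "17.3", "8.8", "7.7", "17.4",
  "5.2", "5.3", "13.1", "8.9", "10.2", "19.8", "8.5", "7.7", "17.3",
  "5", "5.2", "11.8", "8.7", "9.8", "18.1", "7.3", "7.5", "16.3",
  "4.8", "5.3", "11.1", "8.6", "10.2", "18.9", "7.5", "7.5", "17",
  "4.8", "5.1", "8.8", "8.2", "9.8", "12.1", "7.4", "6.7", "14.2",
  "5", "5.2", "9", "8.5", "10", "9.3", "7.9", "6.6", "13.2", "",
  "4.5", "4.9", "8.2", "7.6", "8.4", "13.3", "6.7", "6.2", "10.8",
  "4.3", "4.4", "9.7", "7.8", "9.0", "17.4", "6", "5.6", "14.9",
  "3.7", "4.2", "10.6", "7.2", "7.9", "14.6", "5.5", "5.6", "17.5",
  "3.8", "3.6", "10.3", "6.8", "8.2", "16.0", "5.6", "5", "15.6",
  "3.7", "3.3", "9", "4.9", "7.2", "22", "5.3", "4.5", "12.1",
  "3.1", "3.0", "8.6", "6.2", "7", "21.0", "4.9", "4.2", "12.2"
))

# Extract the column and drop the empty string (assumed to be OCR debris)
vals <- values$table_df
vals <- vals[nzchar(vals)]

# Guard against misaligned rows: the length must be a multiple of 9
stopifnot(length(vals) %% 9 == 0)

# Fill row by row: each run of 9 values becomes one month's row
values_df <- as.data.frame(matrix(as.numeric(vals), ncol = 9, byrow = TRUE))
names(values_df) <- c("W.F", "W.M", "W.16-19", "B.F", "B.M", "B.16-19",
                      "HL.F", "HL.M", "HL.16-19")

# First three columns as printed in the question (assumption: stored as numbers)
table_df2 <- data.frame(
  V1         = c(2020, rep(2021, 12)),
  month_list = c("December", month.name),
  us_total   = c(6.7, 6.3, 6.2, 6, 6.1, 5.8, 5.9, 5.4, 5.2, 4.8, 4.6, 4.2, 3.9)
)

# Bind side by side; cbind on data frames keeps names such as "W.16-19" as they are
final_df <- cbind(table_df2, values_df)
```

The length check matters because matrix() only warns when the data does not divide evenly into rows and then recycles values, which would silently shift every row after the gap. If the empty string actually stands for a missing cell rather than debris, it should be replaced with NA in place instead of dropped, and the remaining values realigned by hand.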

huangapple
  • Posted on 2023-02-27 07:36:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75575678.html