Read table from a website into R Studio and create a dataframe with the info

Question

I'm working on a project that requires a table I found on a website to be read into RStudio and formatted so I can create graphical representations of the data. I have been attempting to do this with the magick package, using the image_read and image_display functions. After I attempt to display the image, I keep getting the following error:

Error: ImageMagick was built without X11 support

and no real output in my dataframe object.

Here is what I most recently tried to get this to work:

img <- image_read("https://i2.wp.com/www.brookings.edu/wp-content/uploads/2022/01/Table-2.png?w=768&crop=0%2C0px%2C100%2C9999px&ssl=1",
                  density = "300")
image_display(img)

img_data <- image_data(img)
table_df <- data.frame(img_data)
table_df

This returns the following error:

Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class ‘c("bitmap", "rgb")’ to a data.frame

I need it to return the data from the image as a dataframe that I can then manipulate for different graphical representations.

Answer 1

Score: 1

I was able to solve this with the tesseract package in R, running OCR on a PDF version of the table.

library(tesseract)

# Load the PDF file and convert to text
pdf_file <- "C:/Users/Table_Q2.pdf"
text <- tesseract::ocr(pdf_file)
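
The ocr() call returns plain text, not a data frame, so one more step is needed. As a minimal sketch of that step, assuming the OCR output is whitespace-separated with one table row per line and no spaces inside any cell, base R's read.table can parse it. The sample string and the names ocr_text and table_df below are made up for illustration; they are not the actual table contents.

```r
# Hypothetical OCR output; the real text returned by tesseract::ocr() will differ.
ocr_text <- "Region Q1 Q2\nNorth 10 12\nSouth 8 15\n"

# Parse the whitespace-separated text, treating the first line as the header.
# This assumes every row has the same number of fields and no cell contains a space.
table_df <- read.table(text = ocr_text, header = TRUE, stringsAsFactors = FALSE)

# Inspect the resulting columns and their types.
str(table_df)
```

Real OCR output often has stray characters or rows with a different number of fields, in which case read.table will stop with an error and the text will likely need cleaning first.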

Posted by huangapple on 2023-02-27 05:49:56
When reposting, please keep the link to this article: https://go.coder-hub.com/75575221.html