How to convert the information extracted with pytesseract.image_to_string into a DataFrame?

Question

I have this image:
[image]

I extracted the text using pytesseract:

```python
import pytesseract

# Use the image_to_string method to extract text from the image
extractedInformation = pytesseract.image_to_string(image, lang='eng')
# Print the extracted text
print(extractedInformation)
```

This is the output:

```
Moustafa
Engineer
Upwork Company

Ahmed
Teacher
School
```

How can I convert this output to a DataFrame with three columns, `cols = ["name", "job", "company"]`?

I tried converting `extractedInformation` to a list with:

```python
list_word = extractedInformation.split('\n')
```

and this was the output:

```python
['Moustafa',
 'Engineer',
 'Upwork Company',
 '',
 'Ahmed',
 'Teacher',
 'School',
 '\x0c']
```

but I still couldn't turn it into a DataFrame with three columns.
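The split returns eight items instead of six: the blank line between the two records becomes an empty string, and the trailing `'\x0c'` is a form feed, which Tesseract typically emits as a page separator. With those two extra items the list no longer divides evenly into rows of three. A minimal sketch of the cleanup, using the list above as hardcoded sample data:

```python
# Sample data: the list printed above (the result of extractedInformation.split('\n'))
list_word = ['Moustafa', 'Engineer', 'Upwork Company', '',
             'Ahmed', 'Teacher', 'School', '\x0c']

# str.strip() treats '\x0c' (form feed) as whitespace, so both the blank entry
# and the form feed strip down to '' and are filtered out by the `if`
cleaned = [w.strip() for w in list_word if w.strip()]

print(len(list_word), len(cleaned))  # 8 6
print(cleaned)
```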

Answer 1

Score: 0

You can use:

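```python
import numpy as np
import pandas as pd

# Assumed cleanup step: drop the blank entry and the trailing form feed ('\x0c')
# so that the list holds exactly 3 values per person
list_word = [w for w in list_word if w.strip()]

# Reshape the flat list into rows of 3: name, job, company
df = pd.DataFrame(
    np.array(list_word).reshape(-1, 3), columns=["name", "job", "company"]
)
```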

Another variant (without numpy):

```python
# Slice the cleaned list into consecutive chunks of 3 (one chunk per person)
data = [list_word[i:i+3] for i in range(0, len(list_word), 3)]
df = pd.DataFrame(data, columns=["name", "job", "company"])
```
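Both variants above assume every record is exactly three lines long. Since the OCR output in the question separates records with a blank line, another option is to let those blank lines define the records. A minimal sketch, assuming blank lines reliably separate people; the sample string is hardcoded to mirror the question's output:

```python
import pandas as pd

# Sample OCR text mirroring the question: records separated by a blank line,
# followed by a trailing form feed
extractedInformation = "Moustafa\nEngineer\nUpwork Company\n\nAhmed\nTeacher\nSchool\n\x0c"

cols = ["name", "job", "company"]
records = []
# Assumption: a blank line separates one person's block from the next.
# strip() first removes the trailing newline and form feed.
for block in extractedInformation.strip().split("\n\n"):
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    if lines:
        records.append(lines)

df = pd.DataFrame(records, columns=cols)
print(df)
```

Because each block becomes its own row, a record with a missing line only affects that row (pandas pads short rows with missing values) instead of shifting every row after it. A block with more than three lines would still raise an error.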

Output:

```
print(df)

       name       job         company
0  Moustafa  Engineer  Upwork Company
1     Ahmed   Teacher          School
```

huangapple
  • Posted on June 5, 2023 at 00:38:00
  • Please keep this link when reposting: https://go.coder-hub.com/76401418.html