How to convert text extracted with pytesseract.image_to_string to a dataframe?

Question

I have this image:
[image]

I extracted the text using pytesseract:

# Use the image_to_string method to extract text from the image
extractedInformation = pytesseract.image_to_string(image, lang='eng')
     

# Print the extracted text
print(extractedInformation)

This is the output:

Moustafa
Engineer
Upwork Company

Ahmed
Teacher
School

How can I convert the output to a dataframe with 3 columns, cols = ["name", "job", "company"]?

I tried to convert extractedInformation to a list with

list_word = extractedInformation.split('\n')

and this was the output:

['Moustafa',
 'Engineer',
 'Upwork Company',
 '',
 'Ahmed',
 'Teacher',
 'School',
 '\x0c']

but I still couldn't convert it to a dataframe with 3 columns.

Answer 1

Score: 0

You can use:

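import numpy as np
import pandas as pd

# The original assignment was cut off on this page. Dropping the empty string
# and the trailing '\x0c' form feed is an assumed reconstruction; without it,
# the 8-item list cannot be reshaped into rows of 3.
list_word = [w for w in list_word if w.strip()]

df = pd.DataFrame(np.array(list_word).reshape(-1, 3), columns=["name", "job", "company"])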

Another variant (without numpy):

data = [list_word[i:i+3] for i in range(0, len(list_word), 3)]

df = pd.DataFrame(data, columns=["name", "job", "company"])

Output:

print(df)

       name       job         company
0  Moustafa  Engineer  Upwork Company
1     Ahmed   Teacher          School
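
Both variants above rely on every record having exactly three lines. As a hedged alternative, here is a minimal sketch that instead uses the blank lines in the OCR output as record separators. The helper name records_from_text and the sample string are assumptions made for illustration; the string simply mimics the output shown in the question, so no image or Tesseract install is needed to try it.

```python
import pandas as pd

# Sample text standing in for the pytesseract output shown in the question
# (an assumption for illustration; in practice use extractedInformation).
extracted = "Moustafa\nEngineer\nUpwork Company\n\nAhmed\nTeacher\nSchool\n\x0c"


def records_from_text(text, n_fields=3):
    """Group non-empty lines into records, using blank lines as separators."""
    records = []
    current = []
    # Remove the form feed character that Tesseract appends at the end of a page.
    for line in text.replace("\x0c", "").split("\n"):
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the record collected so far.
            records.append(current)
            current = []
    if current:
        records.append(current)
    # Keep only records with the expected number of fields.
    return [r for r in records if len(r) == n_fields]


df = pd.DataFrame(records_from_text(extracted), columns=["name", "job", "company"])
print(df)
```

Records that OCR splits into more or fewer than three lines are dropped here rather than shifting every following row, which is one possible design choice; whether to drop or repair such records depends on the data.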
huangapple
  • Posted on June 5, 2023, 00:38:00
  • Please keep this link when reposting: https://go.coder-hub.com/76401418.html