How to convert extractedInformation (from pytesseract.image_to_string) to a DataFrame?
Question
I have this image:
I extracted the text using pytesseract:
# Use the image_to_string method to extract text from the image
extractedInformation = pytesseract.image_to_string(image, lang='eng')
# Print the extracted text
print(extractedInformation)
This is the output:
Moustafa
Engineer
Upwork Company
Ahmed
Teacher
School
How can I convert this output to a DataFrame with three columns, cols = ["name", "job", "company"]?
I tried converting extractedInformation to a list with
list_word = extractedInformation.split('\n')
and this was the output:
['Moustafa',
'Engineer',
'Upwork Company',
'',
'Ahmed',
'Teacher',
'School',
'\x0c']
but I still couldn't convert it to a DataFrame with three columns.
Answer 1
Score: 0
You can use:
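import numpy as np
import pandas as pd

# Drop the blank separator lines and the trailing form feed ('\x0c') from the OCR text
list_word = [line.strip() for line in extractedInformation.split("\n") if line.strip()]
df = pd.DataFrame(
    np.array(list_word).reshape(-1, 3), columns=["name", "job", "company"]
)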
Another variant (without NumPy):
data = [list_word[i:i+3] for i in range(0, len(list_word), 3)]
df = pd.DataFrame(data, columns=["name", "job", "company"])
Output:
print(df)
       name       job         company
0  Moustafa  Engineer  Upwork Company
1     Ahmed   Teacher          School
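Both variants above assume every record is exactly three non-empty lines. If the OCR output keeps a blank line between records, as the split list in the question suggests, grouping on those blank lines may be more forgiving when a line is missed. A minimal sketch under that assumption; `records_from_text` is a hypothetical helper name, not part of pytesseract or pandas, and the sample string simply mirrors the output shown in the question:

```python
import pandas as pd

COLUMNS = ["name", "job", "company"]


def records_from_text(text, columns=COLUMNS):
    """Group OCR lines into records, treating blank lines as separators."""
    records, current = [], []
    for line in text.split("\n"):
        line = line.strip()  # also strips the trailing form feed '\x0c'
        if line:
            current.append(line)
        elif current:
            # A blank line closes the record collected so far
            records.append(current)
            current = []
    if current:
        records.append(current)
    # Keep only records that have one value per expected column
    complete = [r for r in records if len(r) == len(columns)]
    return pd.DataFrame(complete, columns=columns)


# Sample text mirroring the OCR output from the question (assumed layout)
sample = "Moustafa\nEngineer\nUpwork Company\n\nAhmed\nTeacher\nSchool\n\x0c"
df = records_from_text(sample)
print(df)
```

Dropping incomplete records silently is just one choice for this sketch; raising an error or padding the missing fields with None may suit real OCR output better.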