March 3, 2023, 23:30:49

Opening file from Windows gives UnicodeDecodeError: 'utf-8' codec can't decode byte: invalid start byte

Question

Recently I upgraded my Windows 10 to Ubuntu, and the scripts that I brought with me don't work. They worked perfectly well on Windows, though. Now when I try running them I get a UTF-8 codec error.

I installed venvs, pip and the required modules (pip list) because I thought it might be missing a UTF or Unicode module, but that didn't fix it.

Here's the code for the file in question. It's a vocabulary app that automatically scrapes results for entered words. It's far from finished, but it starts up and runs fine on Windows (no terminal errors).

from tkinter import *
import csv
import tkinter
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
def FkinIndex(number):
    if number >> 1:
        print("NUMBER:::", number)
    elif number == 0:
        number += 1
    return number
# -------------------------------------------------- Function: strVarSet
def strVarSet(keys, values):
    y = 0
    vars = {}
    l = []
    for item in values:
        x = eval(item)
        for xtem in x:
            y = y + 1
            vars[y] = x[xtem]
    print("\nstrVarSet()\nReturning\n", vars, "\n\n", keys)
    return vars, keys
# -------------------------------------------------- Function: DictExtract
def dictExtract():
    with open("dir/dess.txt", "r") as y:
        x = y.readlines()
    d = str(x)
    Dict = eval(d)
    print(Dict)
    print(type(Dict))
    l = []
    for item in Dict:
        print("first print -----\n", item)
        x = eval(item)
        for key in x:
            l.append(key)
    print("\ndictExtract()\nreturning\n", l, "\n", Dict)
    return l, Dict
# -------------------------------------------------- Function: Next
def Next(index, ):
    index = index+1
    print(f"-------------\n{index}\n-------------")
    l, keys = dictExtract()
    vars, momo = strVarSet(l, keys)
    count = 0 # key for dict
    descs = {} # dict
    for f in vars[index]: # for values in DictList[index]
        count = count + 1 # key for dict
        descs[count] = f # Extracting descriptions for Labels
        print("ff\n\n", f)
    return index
# -------------------------------------------------- Function: Current
def Current(index):
    print(f"-------------\n{index}\n-------------")
    l, keys = dictExtract()
    vars, momo = strVarSet(l, keys)
    count = 0  # key for dict
    descs = {}  # dict
    for f in vars[index]:  # for values in DictList[index]
        count = count + 1  # key for dict
        descs[count] = f  # Extracting descriptions for Labels
        print("ff\n\n", f)
    window.update()
    return index, vars, descs
# -------------------------------------------------- Function: Previous
def Previous(index, ):
    index = index-1
    l, keys = dictExtract()
    vars, momo = strVarSet(l, keys)
    count = 0
    descs = {}
    for f in vars[index]:
        count = count + 1
        descs[count] = f
        print(f)
# -------------------------------------------------- Function: DictSaver
def dictSaver(d):
    with open("dir/dess.txt", "a") as y:
        #y = csv.writer(y)
        d = str(d)
        y.write(d + "\n")
    d = {}
# -------------------------------------------------- DictFormer
def DictFormer(l, name):
    d = {name:l}
    print(d)
    dictSaver(d)
# -------------------------------------------------- Function: Button_Words
def Button_Words(words):
    l = []
    for word in words:
        keys, dictList = dictExtract()
        if word not in keys:
            print(word)
            with urlopen(f"https://www.dictionary.com/browse/{word}") as token:
                bsobj = BeautifulSoup(token, "html.parser")
                section = bsobj.find("div", {"class": "css-69s207 e1hk9ate3"})
                l.append(section.get_text())
                for span in section.find_next_sibling("div"):
                    l.append(span.get_text())
                print(l)
                x = DictFormer(l, word)
                l = []
        else:
            continue
# --------------------------------------------------
token1 = urlopen("https://www.dictionary.com/")
token = requests.Request("https://www.dictionary.com/", headers={'User-Agent': 'Mozilla/5.0'})
bsobj = BeautifulSoup(token1, "html.parser")
table = bsobj.find("section", {"class": "g6v6DANjsJKOolEk5qVH"})
step = table.find("span", {"class": re.compile(".*")})
xstep = step.find("a", {"data-linkid": "nx1fkx"})
print(xstep.get_text())
with open("dir/word_list.csv") as word_list:
    word_list = csv.reader(word_list)
    count1 = 0
    for row in word_list:
        count1 = count1 + 1
        row = str(row).strip("[").strip("]").strip("\'")
        print(row)
v1 = "Hello World"
def WordInput(x):
    f = []
    f.append(x)
    with open("dir/word_list.csv", "a") as y:
        writer = csv.writer(y)
        writer.writerow(f)
        y.close()
def wordUnload():
    x=[]
    with open("dir/word_list.csv", "r") as y:
        reader = csv.reader(y)
        for read in reader:
            f = str(read)
            f = f.strip("[").strip("]").strip("\'")
            print(f)
            x.append(f)
    print(x)
    return x
indexxternal = FkinIndex(0)
unloadedw = wordUnload()
window = Tk()
# -------------------------------------------------- Button: Add New Word
NewWordButton = Button(window, text="+", command=lambda: WordInput(input("Add new Word\n> ")))
NewWordButton.grid(row=0, column=1)
# -------------------------------------------------- Button: Load Words
WordsButton = Button(window, text="Words", command=lambda: Button_Words(unloadedw))
WordsButton.grid(row=0, column=0)
# -------------------------------------------------- Button: Next
NextButton = Button(window, text=">", command=lambda: indexxternal==Next(indexxternal))
NextButton.grid(row=0, column=99)
# -------------------------------------------------- Button: Previous
PreviousButton = Button(window, text="<", command=lambda: indexxternal==Previous(indexxternal))
PreviousButton.grid(row=0, column=2)
# -------------------------------------------------- Initial Extraction
l, keys = dictExtract()
count = 0
index, vars, descs = Current(indexxternal)
for f in vars[index]:
    count = count + 1
    descs[count] = f
    print(f)
# -------------------------------------------------- WORD Label
bar1 = tkinter.StringVar(window, str(l[0]).capitalize())
wrd1 = Label(window, textvariable=bar1, font="helvetica 11 underline")
wrd1.grid(row=0, column=3)
try: # --------------------------------------------- Description Label 1
    var1 = tkinter.StringVar(window, str(descs[1]))
    dsc1 = Label(window, textvariable=var1, font="Helvetica 9 italic")
    dsc1.grid(row=1, column=3, pady=5, sticky="W")
except KeyError:
    print("Description out of index in Label 1")
try: # --------------------------------------------- Description Label 2
    var2 = tkinter.StringVar(window, str(descs[2]))
    dsc2 = Label(window, textvariable=var2)
    dsc2.grid(row=2, column=3, pady=1, sticky="W")
except KeyError:
    print("Description out of index in Label 2")
try: # --------------------------------------------- Description Label 3
    var3 = tkinter.StringVar(window, str(descs[3]))
    dsc3 = Label(window, textvariable=var3)
    dsc3.grid(row=3, column=3,pady=1, sticky="W")
except KeyError:
    print("Description out of index in Label 3")
try: # --------------------------------------------- Description Label 4
    var4 = tkinter.StringVar(window, str(descs[4]))
    dsc4 = Label(window, textvariable=var4)
    dsc4.grid(row=4, column=3,pady=1, sticky="W")
except KeyError:
    print("Description out of index in Label 4")
try: # --------------------------------------------- Description Label 5
    var5 = tkinter.StringVar(window, str(descs[5]))
    dsc5 = Label(window, textvariable=var5)
    dsc5.grid(row=5, column=3,pady=1, sticky="W")
except KeyError:
    print("Description out of index in Label 5")
try: # --------------------------------------------- Description Label 6
    var6 = tkinter.StringVar(window, str(descs[6]))
    dsc6 = Label(window, textvariable=var6)
    dsc6.grid(row=6, column=3,pady=1, sticky="W")
except KeyError:
    print("Description out of index in Label 6")
try: # --------------------------------------------- Description Label 7
    var7 = tkinter.StringVar(window, str(descs[7]))
    dsc7 = Label(window, textvariable=var7)
    dsc7.grid(row=7, column=3,pady=1, sticky="W")
except KeyError:
    print("Description out of index in Label 7")
try: # --------------------------------------------- Description Label 8
    var8 = tkinter.StringVar(window, str(descs[8]))
    dsc8 = Label(window, textvariable=var8)
    dsc8.grid(row=8, column=3, sticky="W")
except KeyError:
    print("Description out of index in Label 8")
try: # --------------------------------------------- Description Label 9
    var9 = tkinter.StringVar(window, str(descs[9]))
    dsc9 = Label(window, textvariable=var9)
    dsc9.grid(row=9, column=3, sticky="W")
except KeyError:
    print("Description out of index in Label 9")
try: # --------------------------------------------- Description Label 10
    var10 = tkinter.StringVar(window, str(descs[10]))
    dsc10 = Label(window, textvariable=var10)
    dsc10.grid(row=10, column=3, sticky="W")
except KeyError:
    print("Description out of index in Label 10")
loadword1 = tkinter.StringVar(window)
window.mainloop()

Here's the error I get when I try running it (the original post linked a screenshot of the error here). Does anyone know a way to fix this?

Answer 1

Score: 1

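SOLVED!
The issue was the encoding of dess.txt. The file was created on Windows, where text files are often saved in a legacy code page rather than UTF-8, so reading it as UTF-8 on Ubuntu raises the UnicodeDecodeError.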
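
To illustrate the kind of mismatch described above, here is a minimal sketch. It assumes the file was written in the cp1252 code page (a common Windows default, but an assumption; the post does not say which encoding dess.txt actually used), and the sample text and the try_decode helper are made up for the demonstration.

```python
# Hypothetical sample text; the en dash (U+2013) is not an ASCII character.
sample = "word \u2013 definition"

# Assumption: the file was saved as cp1252 on Windows.
# In cp1252 the en dash is stored as the single byte 0x96.
raw = sample.encode("cp1252")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# Decoding with the encoding that was used for writing works.
print(try_decode(raw, "cp1252"))

# Decoding the same bytes as UTF-8 fails, because 0x96 cannot start
# a UTF-8 sequence ("invalid start byte").
print(try_decode(raw, "utf-8"))
```

The bytes on disk are the same in both cases; only the codec used to interpret them changes, which is why installing extra modules does not help.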

For anyone facing a similar problem:

1. cd into the directory that contains your txt file.
2. Open the file in gedit (you can get it from the Ubuntu Software app) from the terminal by typing gedit name.txt
3. Click Save, change the file's encoding (to UTF-8) at the bottom of the window, and overwrite the file.
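
The same conversion can also be scripted instead of done by hand in gedit. The following is only a sketch under assumptions: the function name convert_to_utf8 is hypothetical, and cp1252 is a guess at the source encoding that you would need to adjust to whatever the file was really saved in. It keeps a .bak copy of the original bytes in case the guess is wrong.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file at `path` as UTF-8 (hypothetical helper).

    `source_encoding` is an assumption about how the file was saved.
    Returns the path of the backup holding the original bytes.
    """
    path = Path(path)
    raw = path.read_bytes()
    # May raise UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = raw.decode(source_encoding)
    backup = path.with_name(path.name + ".bak")
    backup.write_bytes(raw)                 # keep the untouched original
    path.write_bytes(text.encode("utf-8"))  # overwrite with UTF-8 bytes
    return backup


if __name__ == "__main__":
    # Demonstration on a throwaway file, so nothing real is modified.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "dess.txt"
        demo.write_bytes("caf\u00e9 \u2013 test\n".encode("cp1252"))
        convert_to_utf8(demo)
        print(demo.read_text(encoding="utf-8"))
```

Working on bytes (read_bytes and write_bytes) leaves the line endings exactly as they were, so only the character encoding changes.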

Huge thanks to @snakecharmerb - I'm charmed indeed.
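
A possible way to avoid the problem in the first place, sketched here rather than taken from the original answer: when open() is called without an encoding argument, Python uses a platform-dependent default, so a file written on Windows and read on Ubuntu can end up with mismatched codecs. Passing encoding="utf-8" on every open() makes reads and writes agree on any system. The helper names append_line and read_lines are hypothetical, and the demo uses a temporary file rather than the real dir/dess.txt.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given; it differs by platform.
print("default text encoding here:", locale.getpreferredencoding(False))


def append_line(path, text):
    """Append one line, always encoded as UTF-8 (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(text + "\n")


def read_lines(path):
    """Read all lines, always decoded as UTF-8 (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "dess.txt")
        # Same storage style as the script: one dict literal per line.
        append_line(demo, str({"dash": ["a \u2013 b"]}))
        print(read_lines(demo))
```

Files that were already written with another encoding still need a one-time conversion; the explicit argument only keeps new reads and writes consistent.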
