Split long sentences of a text file around the middle on comma (multiple commas)
Question
I have a .srt file that I'd like to split up so I can watch it with mpv. It's a whole book converted to .srt for language learning, with an audiobook to go along with it.
My problem is that it's in Japanese, which doesn't put spaces between words, so mpv doesn't break long sentences; instead it shrinks the text until it fits on one line.
I tried Subtitle Edit, but it doesn't work for Japanese.
So I'm trying to write my own script, although I don't know much about scripting.
I'm stuck on how to break a sentence that has multiple commas: how do I choose one near the middle?
Here's what I have so far:

with open("test.txt", encoding="utf8") as file:
    for line in file:
        #print(line)
        size = len(line)
        if size > 45:
            # break sentence in half, using Japanese comma 、
            pass

Here's the text file I'm using for testing:
10
00:00:55,640 --> 00:01:09,580
クラスで一番、明るくて、優しくて、運動神経がよくて、しかも、頭もよくて、みんなその子と友達になりたがる。

11
00:01:11,090 --> 00:01:24,500
だけどその子は、たくさんいるクラスメートの中に私がいることに気づいて、その顔にお日様みたいな眩しく、優しい微笑みをふわーっと浮かべる。

12
00:01:24,730 --> 00:01:32,250
私に近づき、「こころちゃん、ひさしぶり!」

13
00:01:32,910 --> 00:01:35,180
と挨拶をする。

14
00:01:37,450 --> 00:01:41,730
周りの子がみんな息を吞む中、「前から知ってるの。

15
00:01:42,000 --> 00:01:42,820
ね?」

16
00:01:43,820 --> 00:01:46,550
と私に目配せをする。

Answer 1
Score: 0
I ran into problems when I tried to open the file only once, so my solution does the following: read every line and store them in a list, go through the list and find all the lines longer than 45 characters, find the comma nearest the middle, then replace the line with the part before the comma and insert the part after it as a new line. Once done, write the list back to the file.

fileLines = []

def findCommaNearMiddle(line):
    length = len(line)
    middle = length // 2
    # check values on either side of the middle until a comma is found
    distance = 0
    while distance <= middle:
        # the bounds check avoids an IndexError on even-length lines without a comma
        if middle+distance < length and line[middle+distance] == '、':
            return middle+distance
        elif line[middle-distance] == '、':
            return middle-distance
        distance += 1
    return -1  # no comma in the line

with open("test.txt", "r", encoding="utf8") as file:
    fileLines = file.read().split('\n')

i = 0
while i < len(fileLines):  # the list grows as lines are split
    line = fileLines[i]
    if len(line) > 45:
        middleComma = findCommaNearMiddle(line)
        if middleComma != -1:  # leave lines without a comma untouched
            fileLines[i] = line[:middleComma]
            fileLines.insert(i+1, line[middleComma+1:])  # +1 to get rid of comma
    i += 1

with open("test.txt", "w", encoding="utf8") as file:
    file.write('\n'.join(fileLines))

If you want to be able to split on characters other than '、', just add another condition to each of the two if statements, something like: or line[middle+distance] == '。' (with middle-distance in the second one).
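The multi-character variant mentioned above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the names `SPLIT_CHARS` and `find_split_near_middle` are made up here, and the default set of split characters is an assumption.

```python
SPLIT_CHARS = "、。"  # assumed set of characters to split on; adjust as needed


def find_split_near_middle(line, split_chars=SPLIT_CHARS):
    """Return the index of the split character closest to the middle, or -1."""
    candidates = [i for i, ch in enumerate(line) if ch in split_chars]
    if not candidates:
        return -1
    middle = len(line) // 2
    # min() returns the first of equally distant candidates, i.e. the left one
    return min(candidates, key=lambda i: abs(i - middle))


sentence = "クラスで一番、明るくて、優しくて、運動神経がよくて、しかも、頭もよくて、みんなその子と友達になりたがる。"
index = find_split_near_middle(sentence)
if index != -1:
    print(sentence[:index + 1])  # first part, keeping the split character
    print(sentence[index + 1:])  # second part
```

Collecting all candidate positions first keeps the condition in one place, so adding another character only means changing the string of split characters.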
Answer 2
Score: 0
You can locate the comma that is closest to the middle of the sentence and split the sentence at that comma.

with open("test.txt", encoding="utf8") as file:
    for line in file:
        line = line.strip()  # drop the newline so it doesn't count towards the length
        size = len(line)
        if size > 45:
            # Find the comma closest to the middle of the line
            middle = size // 2
            before = line.rfind("、", 0, middle)  # last comma before the middle, or -1
            after = line.find("、", middle, size - 1)  # first comma from the middle on, ignoring a trailing one
            if before == -1 and after == -1:  # no usable comma: split at the middle
                split_index = middle
            elif after == -1 or (before != -1 and middle - before <= after - middle):
                split_index = before + 1  # split after the comma on the left
            else:
                split_index = after + 1  # split after the comma on the right
            # Split the line at split_index
            first_line = line[:split_index].strip()
            second_line = line[split_index:].strip()
            print(first_line)
            print(second_line)
        else:
            print(line)
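If one split still leaves a piece that is too long, the same idea can be applied repeatedly. The sketch below is only an illustration: the function name `split_at_middle_comma` and its parameters are hypothetical, the 45-character limit is taken from the question, and keeping the comma at the end of the left piece is a design choice, not a requirement.

```python
def split_at_middle_comma(text, max_len=45, comma="、"):
    """Split text at the comma nearest its middle until every piece fits max_len."""
    if len(text) <= max_len:
        return [text]
    middle = len(text) // 2
    # Ignore a comma in the final position: cutting there would leave an empty piece.
    positions = [i for i, ch in enumerate(text[:-1]) if ch == comma]
    if not positions:
        return [text]  # nothing to split on; leave the line as it is
    # Cut just after the nearest comma so the comma stays on the left piece.
    cut = min(positions, key=lambda i: abs(i - middle)) + 1
    return (split_at_middle_comma(text[:cut], max_len, comma)
            + split_at_middle_comma(text[cut:], max_len, comma))


long_line = "だけどその子は、たくさんいるクラスメートの中に私がいることに気づいて、その顔にお日様みたいな眩しく、優しい微笑みをふわーっと浮かべる。"
for piece in split_at_middle_comma(long_line):
    print(piece)
```

Because each cut falls strictly inside the text, both pieces are non-empty and shorter than the input, so the recursion always terminates. A line with no comma is returned unchanged even if it exceeds the limit.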