Group list of 4-strings into list of pairs

Question

I have the following list of strings:

['word1 word2 word3 word4', 'word5 word6 word7 word8']

(I have shown only two strings, but there can be many.)
I want to create a new list that looks like this:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

I tried the following:

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
[[word1 + ' ' + word2, word3 + ' ' + word4] for line in lines for word1, word2, word3, word4 in line.split()]

But it gives the following error:

ValueError: too many values to unpack (expected 4)

How do I do this in the most Pythonic way?

Answer 1

Score: 2

With short regex matching:

import re

lst = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
res = [pair for words in lst for pair in re.findall(r'\S+ \S+', words)]
  • \S+ \S+ - matches 2 consecutive "words" separated by a single space

Result:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
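One caveat worth noting: the pattern above expects exactly one space between the two words of a pair, and a trailing unpaired word is silently dropped. A minimal sketch, assuming the input may contain runs of whitespace (an assumption, not something the question states), matches with `\S+\s+\S+` and normalizes each pair. `pair_up` is a hypothetical helper name:

```python
import re

def pair_up(strings):
    """Hypothetical helper: split each string into two-word pairs."""
    pairs = []
    for text in strings:
        # \S+\s+\S+ tolerates any run of whitespace between the two words
        for match in re.findall(r'\S+\s+\S+', text):
            # Collapse the whitespace run to a single space
            pairs.append(' '.join(match.split()))
    return pairs

# Second string deliberately uses a double space and a tab
lst = ['word1 word2 word3 word4', 'word5  word6\tword7 word8']
print(pair_up(lst))
# ['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
```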

Answer 2

Score: 0

Pythonic doesn't mean "fewer lines". This is easily done with a simple for loop:

result = []
for line in lines:
    words = line.split()
    result.append(' '.join(words[:2]))
    result.append(' '.join(words[2:]))

Which gives your desired result:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

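[Try it online!](https://tio.run/##jY1LCgMhDED3niI7FYpQpz@EnmRwMTCWWiSKWkpPbyeZCzSLF5K8JOXbnxmnMVLE0OAOs/zkuh6BaJkT8yQPwKMzlxfmlXmTXtTQ3qnTvhePXIHOQUTOzQnYglT6QC3TSopdaR7su2YpJeCqJEjzyhEV@7OzXv@hWUeaKDViV7upx/gB)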

If you want to make this more general for strings with more words, you can write a function that will yield chunks of the desired size, and use that with str.join:

def chunks(iterable, chunk_size):
    c = []
    for item in iterable:
        c.append(item)
        if len(c) == chunk_size:
            yield c
            c = []

    if c: yield c

result = []
for line in lines:
    words = line.split()
    for chunk in chunks(words, 2):
        result.append(' '.join(chunk))

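[Try it online!](https://tio.run/##VVDLjsIwDLznK3xLI1WVFtiHkPolK4QgdYV3QxolQQh@vtROKbs@jBJ7xvY43PJp8OtxdOQxQQvf@jrE7g0YV4JrwY2uQUrv8v0Q/BT80julOuzBni7@N1WUMR6ODuuS2Ce6o9kqmMLyhJ08@yHCxDwDeXgqCkmIzSEE9B03O5slTT049JU10LZ/ur90HDdC14H9l5sHq7mJ3S4sFTFdXC51XoovwUvJRUpndsnH4VSTgqNcmcWErMGC2b6Qa1iZ11ZlxNOSBt38DDTZYIExSoVIPleFZcbxAQ)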
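
To see the generalized version handle longer or uneven inputs, here is a self-contained sketch that repeats the `chunks` generator from above. The sample `lines` (six words, then three words) are made up for illustration:

```python
def chunks(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    c = []
    for item in iterable:
        c.append(item)
        if len(c) == chunk_size:
            yield c
            c = []
    if c:
        # Leftover items form a final, shorter chunk
        yield c

# Made-up sample input: six words, then an odd count of three
lines = ['word1 word2 word3 word4 word5 word6', 'word7 word8 word9']

result = [' '.join(chunk) for line in lines for chunk in chunks(line.split(), 2)]
print(result)
# ['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8', 'word9']
```

Because chunking is done per line, a leftover word stays on its own rather than pairing with the first word of the next string.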

Answer 3

Score: 0

A modified version of @jsbueno's earlier answer, which was slightly incorrect:

>>> words = [item for line in lines for item in line.split()]
>>> words
['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8']
>>> [words[i] + ' ' + words[i+1] for i in range(0, len(words), 2)]
['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
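
Note that indexing with `i+1` raises `IndexError` if the total number of words is odd. A small sketch, assuming that a trailing unpaired word should be kept on its own (an assumption; the question only shows four-word strings), uses slicing instead:

```python
lines = ['word1 word2 word3 word4', 'word5 word6 word7']  # seven words in total

# Flatten all strings into a single list of words
words = [item for line in lines for item in line.split()]

# Slicing past the end of a list is safe, so the last slice may hold one word
pairs = [' '.join(words[i:i + 2]) for i in range(0, len(words), 2)]
print(pairs)
# ['word1 word2', 'word3 word4', 'word5 word6', 'word7']
```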

Answer 4

Score: 0

An optimized solution that pushes all the per-item work to the C layer:

from itertools import chain

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
words = chain.from_iterable(map(str.split, lines))
paired = list(map('{} {}'.format, words, words))
print(paired)

[Try it online!][TIO-ldt8uubn]

`chain.from_iterable(map(str.split, lines))` creates an iterator over the individual words. `map('{} {}'.format, words, words)` passes the same iterator twice, so each call consumes two consecutive words and joins them back into a pair (`map(' '.join, zip(words, words))` has the same effect, but builds an intermediate tuple for each pair; feel free to test which is faster in practice). The `list` wrapper consumes the result to produce the final list.

This beats the existing answers by avoiding all per-item work at the Python layer (no additional bytecode is executed as the input grows), and by sidestepping one of Python's weirdly high-overhead operations (indexing and simple integer math).
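
To follow up on the suggestion to test which variant is faster, a rough timing sketch might look like this. The function names `with_format` and `with_join`, the input size, and the repeat count are all arbitrary choices for illustration, and the timings will vary by machine and Python version:

```python
from itertools import chain
from timeit import timeit

# Synthetic input for illustration; the size is arbitrary
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8'] * 1000

def with_format():
    # Same iterator passed twice: each call consumes two consecutive words
    words = chain.from_iterable(map(str.split, lines))
    return list(map('{} {}'.format, words, words))

def with_join():
    # zip builds a tuple per pair, which ' '.join then consumes
    words = chain.from_iterable(map(str.split, lines))
    return list(map(' '.join, zip(words, words)))

# Both variants must agree before their timings are worth comparing
assert with_format() == with_join()

for func in (with_format, with_join):
    print(func.__name__, timeit(func, number=100))
```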

[TIO-ldt8uubn]: https://tio.run/##LY3LCsIwEEX3@YrZpYVS0Pra@CUiEm1KB/JiMiBS@u2xGd2cyzD3kT48xzCUMlH0gGyJY3QZ0KdIDK/ZYFDKYbAZrnDT70jjDir3wkF40B3I6yjnSXgWXvRdVa1xaevr0qMumaezjTepyUx9Tg65A1lqW5UMkh23jMPMYtLLCsuq@ymSN5tTSv9SA4SBm1@sLeUL "Python 3 – Try It Online"

huangapple
  • Posted on 2023-02-07 03:39:16
  • When reposting, please keep this article's link: https://go.coder-hub.com/75365818.html