February 7, 2023 03:39:16

Group list of 4-strings into list of pairs

Question


I have the following list of strings:

['word1 word2 word3 word4', 'word5 word6 word7 word8']

(I have shown only two strings, but there can be many.)
I want to create a new list that looks like this:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

I tried the following:

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
[[word1 + ' ' + word2, word3 + ' ' + word4] for line in lines for word1, word2, word3, word4 in line.split()]

But it gives the following error:

ValueError: too many values to unpack (expected 4)

How do I do this in the most Pythonic way?
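For illustration, a minimal sketch of why the attempt fails and one way to keep its unpacking style, assuming every line has exactly four words. In the attempt, `for word1, word2, word3, word4 in line.split()` iterates over the individual words and tries to unpack the characters of each one; `'word1'` has five characters, hence "expected 4". Iterating over a one-element list `[line.split()]` unpacks the four words once per line instead (the names `w1`..`w4` are just illustrative).

```python
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']

# Assumes exactly four words per line.
# `for w1, w2, w3, w4 in [line.split()]` loops once, unpacking the whole
# split result; the innermost loop flattens the two pairs into one list.
result = [
    pair
    for line in lines
    for w1, w2, w3, w4 in [line.split()]
    for pair in (w1 + ' ' + w2, w3 + ' ' + w4)
]

print(result)  # ['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
```

A line with more or fewer than four words would raise `ValueError` here, which the answers below avoid.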

Answer 1

Score: 2

With a short regex match:

import re

lst = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
res = [pair for words in lst for pair in re.findall(r'\S+ \S+', words)]

\S+ \S+ matches two consecutive "words" (runs of non-whitespace characters separated by a single space). Result:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
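Because `\S+ \S+` needs two tokens per match, a line with an odd number of words loses its last word. A small sketch showing that behaviour on a hypothetical odd-length input, plus a variant (an assumption, not part of the answer) that makes the second token optional so the leftover word is kept:

```python
import re

# Hypothetical input: the second line has an odd number of words.
lst = ['word1 word2 word3 word4', 'word5 word6 word7']

# Pattern from the answer: exactly two space-separated tokens per match.
strict = [pair for words in lst for pair in re.findall(r'\S+ \S+', words)]

# Variant: the non-capturing group (?: \S+)? is optional, so a trailing
# unpaired word is returned on its own instead of being dropped.
lenient = [pair for words in lst for pair in re.findall(r'\S+(?: \S+)?', words)]

print(strict)   # ['word1 word2', 'word3 word4', 'word5 word6']
print(lenient)  # ['word1 word2', 'word3 word4', 'word5 word6', 'word7']
```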

Answer 2

Score: 0

Pythonic doesn't mean "fewer lines". This is easily done with a simple for loop:

result = []
for line in lines:
    words = line.split()
    result.append(' '.join(words[:2]))
    result.append(' '.join(words[2:]))

Which gives your desired result:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

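[Try it online!](https://tio.run/##jY1LCgMhDED3niI7FYpQpz@EnmRwMTCWWiSKWkpPbyeZCzSLF5K8JOXbnxmnMVLE0OAOs/zkuh6BaJkT8yQPwKMzlxfmlXmTXtTQ3qnTvhePXIHOQUTOzQnYglT6QC3TSopdaR7su2YpJeCqJEjzyhEV@7OzXv@hWUeaKDViV7upx/gB)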

If you want to make this more general for strings with more words, you can write a function that will yield chunks of the desired size, and use that with str.join:

def chunks(iterable, chunk_size):
    # Collect items until a chunk is full, then yield it and start a new one.
    c = []
    for item in iterable:
        c.append(item)
        if len(c) == chunk_size:
            yield c
            c = []

    # Yield any leftover items as a final, shorter chunk.
    if c:
        yield c

result = []
for line in lines:
    words = line.split()
    for chunk in chunks(words, 2):
        result.append(' '.join(chunk))

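[Try it online!](https://tio.run/##VVDLjsIwDLznK3xLI1WVFtiHkPolK4QgdYV3QxolQQh@vtROKbs@jBJ7xvY43PJp8OtxdOQxQQvf@jrE7g0YV4JrwY2uQUrv8v0Q/BT80julOuzBni7@N1WUMR6ODuuS2Ce6o9kqmMLyhJ08@yHCxDwDeXgqCkmIzSEE9B03O5slTT049JU10LZ/ur90HDdC14H9l5sHq7mJ3S4sFTFdXC51XoovwUvJRUpndsnH4VSTgqNcmcWErMGC2b6Qa1iZ11ZlxNOSBt38DDTZYIExSoVIPleFZcbxAQ)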
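To illustrate how the generator handles leftovers, here is a sketch on hypothetical input with a five-word line, assuming the same `chunks` function as above. It also shows a slicing-based alternative (not from the answer; `slice_chunks` is a hypothetical name) that works for sequences supporting `len()` and indexing, such as the lists returned by `str.split()`.

```python
def chunks(iterable, chunk_size):
    # Same generator as in the answer: collect items until a chunk is full.
    c = []
    for item in iterable:
        c.append(item)
        if len(c) == chunk_size:
            yield c
            c = []
    # Leftover items form a final, shorter chunk.
    if c:
        yield c

# Hypothetical helper: chunk a sequence by slicing in steps of n.
def slice_chunks(seq, n):
    return [seq[i:i + n] for i in range(0, len(seq), n)]

lines = ['word1 word2 word3 word4 word5', 'word6 word7']

pairs = [' '.join(chunk) for line in lines for chunk in chunks(line.split(), 2)]
triples = [' '.join(chunk) for line in lines for chunk in slice_chunks(line.split(), 3)]

print(pairs)    # ['word1 word2', 'word3 word4', 'word5', 'word6 word7']
print(triples)  # ['word1 word2 word3', 'word4 word5', 'word6 word7']
```

The generator version accepts any iterable (including other generators), while the slicing version needs a sequence but avoids building chunks item by item.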

Answer 3

Score: 0

Modified version of @jsbueno's earlier answer, which was slightly incorrect:

>>> words = [item for line in lines for item in line.split()]
>>> words
['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8']
>>> [words[i] + ' ' + words[i+1] for i in range(0, len(words), 2)]
['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
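The index arithmetic above raises `IndexError` when the total number of words is odd. A minimal sketch of an alternative that avoids `i + 1` indexing, using extended slicing with `zip` (an idiom swapped in here, not part of the original answer). Like the answer, it flattens all lines first, so a line with an odd word count would pair its last word with the next line's first word.

```python
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']

# Flatten all lines into one list of words, as in the answer.
words = [item for line in lines for item in line.split()]

# Even-indexed words start each pair, odd-indexed words end it.
# zip() stops at the shorter slice, so an unpaired final word is dropped
# instead of raising IndexError.
pairs = [first + ' ' + second for first, second in zip(words[::2], words[1::2])]

print(pairs)  # ['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
```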

Answer 4

Score: 0

An optimized solution that pushes all the per-item work to the C layer:

from itertools import chain

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
words = chain.from_iterable(map(str.split, lines))
paired = list(map('{} {}'.format, words, words))
print(paired)

[Try it online!][TIO-ldt8uubn]

`chain.from_iterable(map(str.split, lines))` creates an iterator over the individual words. `map('{} {}'.format, words, words)` passes the same iterator twice; because `map` pulls one item from each argument per call, consecutive words are put back together in pairs. (`map(' '.join, zip(words, words))` would get the same effect, but with an additional intermediate product, the tuples built by `zip`; feel free to test which is faster in practice.) The `list` wrapper consumes the iterator to produce the final result.
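A short sketch of the shared-iterator behaviour described above, including the `zip` variant and what happens with an odd total word count (the odd-length input is hypothetical). Note that each pipeline needs a fresh `chain` iterator, since the first `list(...)` exhausts it.

```python
from itertools import chain

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']

# map() takes one item from each iterable argument per call, so passing the
# same iterator twice feeds consecutive words to '{} {}'.format.
words = chain.from_iterable(map(str.split, lines))
paired_format = list(map('{} {}'.format, words, words))

# zip variant: zip(words, words) builds 2-tuples, which ' '.join turns into strings.
words = chain.from_iterable(map(str.split, lines))
paired_join = list(map(' '.join, zip(words, words)))

# Odd word count: the final unpaired word is consumed and silently dropped.
odd = chain.from_iterable(map(str.split, ['word1 word2 word3']))
paired_odd = list(map('{} {}'.format, odd, odd))

print(paired_format)                 # ['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
print(paired_join == paired_format)  # True
print(paired_odd)                    # ['word1 word2']
```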

This beats the existing answers by avoiding all per-item work at the Python layer (no additional bytecode executed as the input grows), and avoids one of the weirdly high overhead aspects of Python (indexing and simple integer math).
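The speed claim is easy to check on your own machine. A rough `timeit` sketch comparing this approach with the regex answer and the corrected indexing answer, on a hypothetical input of 1,000 copies of the example lines. No timings are given here because they depend on the Python version and hardware.

```python
import re
import timeit
from itertools import chain

# Hypothetical benchmark input.
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8'] * 1_000

def with_chain_map():
    # This answer: pairing done by map() over a shared iterator.
    words = chain.from_iterable(map(str.split, lines))
    return list(map('{} {}'.format, words, words))

def with_regex():
    # Regex answer: two tokens per match.
    return [pair for line in lines for pair in re.findall(r'\S+ \S+', line)]

def with_indexing():
    # Indexing answer: flatten, then step through in twos.
    words = [item for line in lines for item in line.split()]
    return [words[i] + ' ' + words[i + 1] for i in range(0, len(words), 2)]

# The approaches must agree before their timings are worth comparing.
assert with_chain_map() == with_regex() == with_indexing()

for fn in (with_chain_map, with_regex, with_indexing):
    seconds = timeit.timeit(fn, number=10)
    print(f'{fn.__name__}: {seconds:.4f} s for 10 runs')
```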

[TIO-ldt8uubn]: https://tio.run/##LY3LCsIwEEX3@YrZpYVS0Pra@CUiEm1KB/JiMiBS@u2xGd2cyzD3kT48xzCUMlH0gGyJY3QZ0KdIDK/ZYFDKYbAZrnDT70jjDir3wkF40B3I6yjnSXgWXvRdVa1xaevr0qMumaezjTepyUx9Tg65A1lqW5UMkh23jMPMYtLLCsuq@ymSN5tTSv9SA4SBm1@sLeUL "Python 3 – Try It Online"
