June 12, 2023 01:43:47

Regular Expression works in Regex101 but not Jupyter notebook

Question

import re
with open('names.txt') as f:
    data = f.readlines()

twitter_pattern = re.compile(r"\s{1}[@]\w+")

twitter_match = twitter_pattern.findall(str(data))
print(twitter_match)

names.txt is a list of full names, phone numbers and Twitter handles.
\s{1}[@]\w+ should return only the Twitter handles, but it returns an empty list. Everything works fine in Regex101, but not when I run it in Jupyter Notebook.

The content of the file is identical to the data provided in the Regex101 link:

Osterberg, Sven-Erik	governor@norrbotten.co.se		Governor, Norrbotten	@sverik
, Tim	tim@killerrabbit.com		Enchanter, Killer Rabbit Cave
Butz, Ryan	ryanb@codingtemple.com	(555) 555-5543	CEO, Coding Temple	@ryanbutz
Doctor, The	doctor+companion@tardis.co.uk		Time Lord, Gallifrey
Exampleson, Example	me@example.com	555-555-5552	Example, Example Co.	@example
Pael, Ripal	ripalp@codingtemple.com	(555) 555-5553	Teacher, Coding Temple	@ripalp

Answer 1

Score: -2

readlines() reads the file as a list of strings, one per line. Each string keeps its trailing newline character.

The file

Hello
World

gives the list ['Hello\n', 'World\n'] (assuming the file ends with a newline).
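A quick check with io.StringIO (used here in place of a real file so the snippet runs on its own) shows what readlines() returns for that two-line text:

```python
import io

# io.StringIO stands in for a file containing "Hello\nWorld\n"
fake_file = io.StringIO("Hello\nWorld\n")
lines = fake_file.readlines()

# Each item keeps its trailing newline
print(lines)  # ['Hello\n', 'World\n']
```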

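str(data) is the textual representation of that list, built from the repr() of each item. In Python that's the text ['Hello\n', 'World\n'], brackets, quotes and commas included. Note that the line break is no longer a real newline: repr() writes it out as the two characters \ and n. Tab characters are escaped the same way, as \t.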

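In your case that means you get [ and ] as well as a lot of additional ' and , characters. More importantly, the tab in front of each Twitter handle becomes the literal text \t, so no @ is preceded by a whitespace character anymore, and \s{1}[@]\w+ matches nothing.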
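The effect can be reproduced with a single row in the same tab-separated layout as names.txt (the row is copied from the question; the variable names are only for illustration):

```python
import re

# One row from the question's data; columns are separated by tab characters
line = "Butz, Ryan\tryanb@codingtemple.com\t(555) 555-5543\tCEO, Coding Temple\t@ryanbutz\n"
data = [line]  # what readlines() would return for a one-line file

as_text = str(data)
# repr() escaped the tab: the string now holds a backslash followed by 't'
print("\t" in as_text)   # False
print("\\t" in as_text)  # True

pattern = re.compile(r"\s{1}[@]\w+")
print(pattern.findall(as_text))  # []
print(pattern.findall(line))     # ['\t@ryanbutz']
```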

To fix your code, don't read the file as a list of lines; read it as a single string instead:

with open('names.txt') as f:
    data = f.read()              # instead of readlines()
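A self-contained sketch of the fix, assuming the same tab-separated layout as the question's data. It writes three sample rows from the question to a temporary names.txt so it can run anywhere; note that each match still includes the leading tab:

```python
import os
import re
import tempfile

# Sample rows copied from the question (tab-separated)
SAMPLE = (
    "Osterberg, Sven-Erik\tgovernor@norrbotten.co.se\t\tGovernor, Norrbotten\t@sverik\n"
    ", Tim\ttim@killerrabbit.com\t\tEnchanter, Killer Rabbit Cave\n"
    "Butz, Ryan\tryanb@codingtemple.com\t(555) 555-5543\tCEO, Coding Temple\t@ryanbutz\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.txt")
    with open(path, "w") as f:
        f.write(SAMPLE)

    # read() returns the whole file as one str, so the tabs stay real whitespace
    with open(path) as f:
        data = f.read()

twitter_match = re.findall(r"\s@\w+", data)
print(twitter_match)  # ['\t@sverik', '\t@ryanbutz']
```

The e-mail addresses are not matched because the character before their @ is a letter, not whitespace.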

Also, don't make your regex more complicated than necessary: \s@\w+ is equivalent and less confusing.
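Both patterns return the same matches on a sample row from the question. The capture-group variant at the end is an optional extra, not part of the original answer, for getting the handle without the leading whitespace:

```python
import re

text = "Pael, Ripal\tripalp@codingtemple.com\t(555) 555-5553\tTeacher, Coding Temple\t@ripalp\n"

verbose = re.findall(r"\s{1}[@]\w+", text)
simple = re.findall(r"\s@\w+", text)
print(verbose == simple)  # True

# Optional variant: with one capture group, findall returns only the group's text
handles = re.findall(r"\s(@\w+)", text)
print(handles)  # ['@ripalp']
```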
