May 22, 2023 17:36:07

English:

Python regex to match identical city names but formatted differently

Question

In your provided code, you are attempting to compare identical city names that are formatted differently. However, there are a couple of issues in the code. To resolve the problem you mentioned, I've made the necessary adjustments:

import re
import unicodedata


class CompareCities:
    def __init__(self):
        self.city_regex = re.compile(
            r"^[A-Za-z]+([ -]?[A-Za-z]+)*$"
        )

    def compare(self, city1, city2):
        city1_normalized = self._normalize(city1)
        city2_normalized = self._normalize(city2)

        # Two names that both fail validation must not compare equal.
        if city1_normalized is None or city2_normalized is None:
            return False
        return city1_normalized == city2_normalized

    def _normalize(self, city):
        # Reject anything that is not letters optionally joined by a space or hyphen.
        if not self.city_regex.match(city):
            return None

        city = (
            unicodedata.normalize("NFD", city)
            .encode("ascii", "ignore")
            .decode("ascii")
            .upper()
        )
        # Treat hyphens and spaces as the same separator.
        return city.replace("-", " ")


compare_cities = CompareCities()
result = compare_cities.compare("New York", "nEw-yOrk")

if result:
    print("same city")
else:
    print("not same city")

I have updated the regular expression so that it accepts mixed-case input, replaced hyphens with spaces during normalization, and removed the HTML entity encoding (e.g., &quot;) from the code for better readability and functionality. With these changes, "New York" and "nEw-yOrk" compare as equal.

English:

Using a Python regex, I am trying to compare identical city names that are formatted differently (uppercase/lowercase, words separated by spaces or hyphens). For example, "Paris" and "paris", or "New York" and "neW-york", should match, but "Paris" and "New York" should not.

My code:

import re
import unicodedata


class CompareCities:
    def __init__(self):
        self.city_regex = re.compile(
            r"^(([A-Z]+[a-z]*)|([a-z]+[A-Z]*))[ -]?(([A-Z]+[a-z]*)|([a-z]+[A-Z]*))$"
        )

    def compare(self, city1, city2):

        city1_normalized = self._normalize(city1)
        city2_normalized = self._normalize(city2)

        return city1_normalized == city2_normalized

    def _normalize(self, city):
        city = (
            self.city_regex.match(city).group()
            if self.city_regex.match(city)
            else False
        )

        if not city:
            return False

        city = (
            unicodedata.normalize("NFD", city)
            .encode("ascii", "ignore")
            .decode("utf-8")
            .upper()
        )
        return city


compare_cities = CompareCities()
result = compare_cities.compare("New York", "nEw-yOrk")

if result:
    print("same city")
else:
    print("not same city")

The problem: for example, "New York" and "nEw-yOrk" should match, but they do not.

Thank you for your help.

Answer 1

Score: 0

If you can use something other than a regex, you could process the strings instead: convert them to lowercase with .lower() and remove any delimiter characters, such as spaces and hyphens, with .replace().

There are likely some cases where two different cities differ only by a space or a hyphen, but depending on the application this might be a very small issue.
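
A minimal sketch of this approach, assuming that spaces and hyphens are the only delimiters that matter. The names `normalize_city` and `same_city` are illustrative and do not come from the question:

```python
def normalize_city(name: str) -> str:
    """Lowercase a city name and drop spaces and hyphens."""
    return name.lower().replace(" ", "").replace("-", "")


def same_city(city1: str, city2: str) -> bool:
    """Compare two city names, ignoring case and delimiters."""
    return normalize_city(city1) == normalize_city(city2)


print(same_city("New York", "nEw-yOrk"))  # True
print(same_city("Paris", "paris"))        # True
print(same_city("Paris", "New York"))     # False
```

Because the delimiters are removed entirely, "Newyork" would also compare equal to "New York", which is the trade-off described above.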

Answer 2

Score: 0

def compare(city1, city2):
    city1_normalized = city1.casefold().split()  # city1 should be the actual geographical name, e.g. Paris, New York, New Delhi
    city2_normalized = city2.casefold().split('-')  # city2 can mix upper and lower case and use hyphens
    return city1_normalized == city2_normalized

print(compare("New York", "nEw-yOrk"))
# True
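
The function above treats its two arguments differently: city1 is split on whitespace and city2 on hyphens, so swapping the arguments changes the result. A hedged sketch of a symmetric variant follows, assuming either argument may use either separator. The name `compare_symmetric` is hypothetical and not part of the answer:

```python
import re


def compare_symmetric(city1: str, city2: str) -> bool:
    """Compare city names, splitting both on runs of spaces or hyphens."""
    def parts(name: str):
        # casefold() removes case differences; the regex treats " " and "-" alike.
        return re.split(r"[ -]+", name.strip().casefold())

    return parts(city1) == parts(city2)


print(compare_symmetric("New York", "nEw-yOrk"))  # True
print(compare_symmetric("nEw-yOrk", "New York"))  # True
print(compare_symmetric("Paris", "New York"))     # False
```

Comparing lists of words rather than joined strings keeps word boundaries significant, so "Newyork" and "New York" stay distinct.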

Answer 3

Score: -2

First, I would suggest making the city names all lowercase with .lower(), then replacing every "-" with " " (or the other way around), and then doing the check. There is no need for an overcomplicated regex.
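
This suggestion might look roughly like the sketch below, which maps hyphens to spaces instead of deleting them. The helper names `to_key` and `cities_match` are made up for illustration:

```python
def to_key(name: str) -> str:
    """Lowercase a city name and turn hyphens into spaces."""
    return name.lower().replace("-", " ")


def cities_match(city1: str, city2: str) -> bool:
    """Compare two city names, ignoring case and hyphen/space differences."""
    return to_key(city1) == to_key(city2)


print(cities_match("New York", "nEw-yOrk"))  # True
print(cities_match("Paris", "New York"))     # False
print(cities_match("New York", "Newyork"))   # False
```

Since the separator is kept (as a space), names that differ only by the presence of a separator are not merged.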
