March 23, 2023, 12:03:52

Get Closest match for a column in data frame

Question

I have a DataFrame that contains different call types, with the values below:

    CallType
0         IN
1        OUT
2       a_in
3       asms
4   INCOMING
5   OUTGOING
6  A2P_SMSIN
7        ain
8       aout

I want to map these values so that the output is:

    CallType
0       IN
1       OUT
2       IN
3       SMS
4       IN
5       OUT
6       SMS
7       IN
8       OUT

I am trying to use difflib.get_close_matches, but it returns no match for most values. Below is my code:

import difflib
import pandas as pd

CALL_TYPE = ['IN', 'OUT', 'SMS', 'VOICE', 'SMT']

def test1():
    final_file_data = pd.DataFrame({
        'CallType': ['IN', 'OUT', 'a_in',
                     'asms', 'INCOMING', 'OUTGOING', 'A2P_SMSIN',
                     'ain', 'aout']})
    print(final_file_data)
    # Look up the single closest match (n=1) for each value, using the default cutoff
    final_file_data['CallType'] = final_file_data['CallType'].apply(lambda x: difflib.get_close_matches(x, CALL_TYPE, n=1))

The output I get is below, which has results only for IN and OUT:

 CallType
0     [IN]
1    [OUT]
2       []
3       []
4       []
5       []
6       []
7       []
8       []

I am not sure where I am going wrong.

Answer 1

Score: 1

The problem has two causes: get_close_matches is case-sensitive, and its cutoff, the minimum similarity score a candidate must reach (0.6 by default), is too strict for these values. You can convert the string x with upper() and lower the cutoff. This is what I did:

final_file_data['CallType'] = final_file_data['CallType'].apply(lambda x: difflib.get_close_matches(x.upper(), CALL_TYPE, n=1, cutoff=0))

final_file_data is now:

  CallType
0     [IN]
1    [OUT]
2     [IN]
3    [SMS]
4     [IN]
5    [OUT]
6    [SMS]
7     [IN]
8    [OUT]
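
Each cell above holds a one-element list, whereas the question asks for plain strings. The following is a minimal sketch of one way to unwrap the result. The helper name closest_call_type and the choice to return None when nothing matches are assumptions for illustration, not part of the original answer, and a plain list stands in for the DataFrame column so the snippet needs only the standard library.

```python
import difflib

CALL_TYPE = ['IN', 'OUT', 'SMS', 'VOICE', 'SMT']


def closest_call_type(value, cutoff=0.0):
    # Hypothetical helper: upper-case the input, ask for the single best
    # match, and unwrap it from the list that get_close_matches returns.
    matches = difflib.get_close_matches(value.upper(), CALL_TYPE, n=1, cutoff=cutoff)
    # Assumption: an input with no match above the cutoff becomes None.
    return matches[0] if matches else None


raw_values = ['IN', 'OUT', 'a_in', 'asms', 'INCOMING',
              'OUTGOING', 'A2P_SMSIN', 'ain', 'aout']
mapped = [closest_call_type(v) for v in raw_values]
print(mapped)
```

With the DataFrame from the question, the same helper could be passed to apply directly: final_file_data['CallType'].apply(closest_call_type).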

You can read more about get_close_matches and its cutoff argument in the Python difflib documentation.
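
To see why the default cutoff rejects most of the values, it may help to print the similarity scores themselves. This sketch assumes a small hypothetical helper, score, that calls difflib.SequenceMatcher with the same argument order get_close_matches uses internally (candidate first, looked-up word second).

```python
import difflib


def score(word, candidate):
    # Hypothetical helper: similarity ratio between a looked-up word and one
    # candidate, with the candidate as sequence 1 and the word as sequence 2.
    return difflib.SequenceMatcher(None, candidate, word).ratio()


# Case mismatch: lowercase 'a_in' shares no characters with 'IN'.
print(round(score('a_in', 'IN'), 3))       # 0.0
# Upper-casing alone lifts this value above the default cutoff of 0.6.
print(round(score('A_IN', 'IN'), 3))       # 0.667
# Longer inputs still score below 0.6 against their short targets.
print(round(score('OUTGOING', 'OUT'), 3))  # 0.545
print(round(score('A2P_SMSIN', 'SMS'), 3)) # 0.5
print(round(score('INCOMING', 'IN'), 3))   # 0.4
```

Keep in mind that cutoff=0 accepts every candidate, so even an unrelated string is mapped to whichever call type happens to score highest. The lowest score among the question's values is 0.4 (INCOMING), so any cutoff stricter than that would leave that row unmatched.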
