Regex on numbers - Python

Question

I am new to regex and need some help. I have a DataFrame with an amount column whose values mostly look like 869,850.0. I need only the rows where the number ends with 950.00 or 999.00; I don't need values like 999.1. I haven't come up with a way to filter these values in pandas.

So I am trying to match with a regex. Because I am new to this, I only know how to get the part before the decimal point, with something like [^.]*, but I don't know how to apply a condition or how to continue. Can someone please help me?

Answer 1

Score: 0

If you want to use regex, try the following:

[0-9]{3}\.0{2}

It first matches three digits, then a literal dot (.), and then two zeros. I hope this regex is easy enough to understand and tweak.

You can try the regex out here:

https://regex101.com/
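As a rough illustration of how that pattern might be tweaked, the sketch below compares it with a stricter variant using Python's `re` module. The stricter pattern and the sample strings are assumptions made for this example, not part of the answer. It assumes the amounts are text with exactly two decimals.

```python
import re

# The pattern from the answer: any three digits, a literal dot, two zeros.
loose = re.compile(r"[0-9]{3}\.0{2}")

# Hypothetical stricter variant: only 950 or 999, and ".00" must end the string.
strict = re.compile(r"(?:950|999)\.00$")

# Made-up sample amounts for illustration.
samples = ["869,950.00", "1,999.00", "869,850.00", "999.10", "950.001"]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)   # the loose pattern also accepts 850.00 and 950.001
print(strict_hits)  # only the values ending in 950.00 or 999.00
```

The loose pattern is not anchored and accepts any three digits, so it also matches values such as 869,850.00. Adding the alternation and the trailing `$` is one way to narrow it down.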

Answer 2

Score: 0

Use modulo (%) instead of regex. It gives you the remainder after division. When the divisor is a power of 10, that remainder is the "tail" of the number, which you can then check against your conditions.

In your case, the remainder after dividing by 1000 is the tail you are looking for. See the example below:

import pandas as pd

s = pd.Series([
    1950.,
    1012950.,
    2999.,
    1950.1,
])

s % 1000

# Returns
# 950.
# 950.
# 999.
# 950.1 (approximately, because of floating-point rounding)

(s % 1000).isin([999., 950.])  # allowed values

# Returns
# True
# True
# True
# False

s[(s % 1000).isin([999., 950.])]

# Returns
# 1950.
# 1012950.
# 2999.

# 1950.1 is excluded, because its remainder (about 950.1) is not in [999., 950.]
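The modulo approach needs numeric values, while the question shows amounts such as 869,850.0 that may be stored as text. The following is a minimal sketch of one way to handle that. It assumes the column holds strings with comma thousands separators and at most two decimals; the sample values and variable names are made up for illustration. Working in integer cents is a substitution made here to avoid comparing floating-point remainders.

```python
import pandas as pd

# Hypothetical raw column: amounts stored as text with thousands separators.
raw = pd.Series(["869,950.00", "1,999.00", "869,850.00", "1,950.10"])

# Strip the commas and convert to float before doing arithmetic.
amounts = pd.to_numeric(raw.str.replace(",", "", regex=False))

# Convert to integer cents so the modulo comparison is exact.
cents = (amounts * 100).round().astype("int64")

# 950.00 -> 95_000 cents and 999.00 -> 99_900 cents, modulo 1000.00 (100_000 cents).
mask = (cents % 100_000).isin([95_000, 99_900])

print(raw[mask].tolist())
```

If the column is already numeric, the conversion step can be skipped and only the cents-based mask is needed.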

Answer 3

Score: 0

You can also try this:

    import re
    import pandas as pd

    df = pd.DataFrame({'a': ['850890.0', '850999.0', '850990.0', '850950.0']})
    print(df)
              a
    0  850890.0
    1  850999.0
    2  850990.0
    3  850950.0

    # the endings to look for
    numbers = ['999.0', '950.0']

    # put these values into one pattern; re.escape makes the dot literal
    pattern = r'(?:{})'.format('|'.join(re.escape(n) for n in numbers))
    pattern
    '(?:999\\.0|950\\.0)'

    # findall returns a list of matches per row; join it into one string
    dfnew = df['a'].str.findall(pattern).apply(''.join)
    print(dfnew)
    0
    1    999.0
    2
    3    950.0
    Name: a, dtype: object
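The result above contains the matched text rather than the filtered rows. To keep only the matching rows, one possible variation is to build a boolean mask with `Series.str.contains`, as sketched below. The trailing `$` anchor and the extra sample value `'85099910'` are assumptions added for illustration. That value would match an unescaped `999.0`, because the dot would stand for any character.

```python
import re
import pandas as pd

# Same idea as above, plus one made-up value that an unescaped dot would match.
df = pd.DataFrame({"a": ["850890.0", "850999.0", "850990.0", "850950.0", "85099910"]})

numbers = ["999.0", "950.0"]

# re.escape makes the dot literal; the trailing $ requires the value to end with the suffix.
pattern = r"(?:{})$".format("|".join(re.escape(n) for n in numbers))

# Boolean mask: True where the string ends with one of the allowed suffixes.
mask = df["a"].str.contains(pattern, regex=True)
filtered = df[mask]

print(filtered)
```

Here `filtered` keeps the original rows and their index, so the rest of the DataFrame's columns stay available.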

huangapple
  • Published on January 3, 2020, 20:42:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59578857.html