Is there already something in Python or NumPy to determine a number's format?

Question

I need to determine whether a string is a plain int, a plain float, a float using `e` notation, or not parsable as a number. Here's what I came up with, but it feels like something that probably already exists, perhaps in NumPy. I briefly searched the libraries and Google and didn't find anything. Is this already a thing that I'm just not seeing?

```python
PLAIN_INT, PLAIN_FLOAT, E_FLOAT, STRING = range(4)

# just an optional '-' followed by digits
sample_plain_ints = ['1', '0', '-5', '333333333']
# must contain a dot
plain_floats = ['1.0', '-5.0', '-33.212', '0.0', '-1.', '-3.']
# do not need to contain a dot
e_floats = ['1.3e5', '-1.2e5', '0.0e0', '5e-3', '3e23', '3E5', '-3E-12']
# everything else
strings = ['aether', '1ee3', 'buzz', 'eeep', '121212beep']


def determine_str_type(item):
    try:
        float(item)
        try:
            int(item)
            return PLAIN_INT
        except ValueError:
            return E_FLOAT if 'E' in item.upper() else PLAIN_FLOAT
    except ValueError:
        return STRING


assert all([determine_str_type(item) == PLAIN_INT for item in sample_plain_ints])
assert all([determine_str_type(item) == PLAIN_FLOAT for item in plain_floats])
assert all([determine_str_type(item) == E_FLOAT for item in e_floats])
assert all([determine_str_type(item) == STRING for item in strings])
```
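
One caveat with the `float()`/`int()` approach, shown as a minimal sketch: both built-ins accept more than the sample lists cover (surrounding whitespace, underscores between digits, a leading `+`, and `inf`/`nan` for `float`), so such inputs are classified as numbers too. The function is repeated here only so the snippet runs on its own; the `edge_cases` inputs are illustrative assumptions, not part of the samples above.

```python
PLAIN_INT, PLAIN_FLOAT, E_FLOAT, STRING = range(4)


def determine_str_type(item):
    try:
        float(item)
        try:
            int(item)
            return PLAIN_INT
        except ValueError:
            return E_FLOAT if 'E' in item.upper() else PLAIN_FLOAT
    except ValueError:
        return STRING


# Hypothetical extra inputs (not in the original samples) mapped to the
# category the try/except approach assigns to them.
edge_cases = {
    ' 7 ': PLAIN_INT,      # int() and float() ignore surrounding whitespace
    '1_000': PLAIN_INT,    # underscores between digits are accepted
    'inf': PLAIN_FLOAT,    # float() parses 'inf'; int() does not
    'nan': PLAIN_FLOAT,    # float() parses 'nan'; int() does not
    '+.5': PLAIN_FLOAT,    # leading '+' and a missing integer part are fine
    '1e+5': E_FLOAT,       # signed exponent
}

for text, expected in edge_cases.items():
    assert determine_str_type(text) == expected
```

Whether these count as "plain" numbers depends on the use case; if they should be rejected, a stricter check than `float()`/`int()` is needed.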

Answer 1

Score: 5

I would use a regex for that, `(-?\d+)(\.\d*)?([eE]-?\d+)?$`: capture the different parts and decide based on which groups matched.

```python
import re

lst = ['1', '0', '-5', '333333333', '1.0', '-5.0', '-33.212', '0.0', '-1.', '-3.', '1.3e5', '-1.2e5', '0.0e0', '5e-3', '3e23', '3E5', '-3E-12', 'aether', '1ee3', 'buzz', 'eeep', '121212beep']

# Compiled once. Group 1: integer part, group 2: fraction (if any),
# group 3: exponent (if any).
pat = re.compile(r'(-?\d+)(\.\d*)?([eE]-?\d+)?$')

def determine_str_type(s):
    # The match statement requires Python 3.10+.
    match m.groups() if (m := pat.match(s)) else None:
        case None:
            return 'STRING'
        case (_, None, None):
            return 'PLAIN_INT'
        case (_, _, None):
            return 'PLAIN_FLOAT'
        case (_, _, _):
            return 'E_FLOAT'

for s in lst:
    print(f'{s: <11}: {determine_str_type(s)}')
```

Output:

```
1          : PLAIN_INT
0          : PLAIN_INT
-5         : PLAIN_INT
333333333  : PLAIN_INT
1.0        : PLAIN_FLOAT
-5.0       : PLAIN_FLOAT
-33.212    : PLAIN_FLOAT
0.0        : PLAIN_FLOAT
-1.        : PLAIN_FLOAT
-3.        : PLAIN_FLOAT
1.3e5      : E_FLOAT
-1.2e5     : E_FLOAT
0.0e0      : E_FLOAT
5e-3       : E_FLOAT
3e23       : E_FLOAT
3E5        : E_FLOAT
-3E-12     : E_FLOAT
aether     : STRING
1ee3       : STRING
buzz       : STRING
eeep       : STRING
121212beep : STRING
```

[regex demo](https://regex101.com/r/J41gmM/1)
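
If the regex should also accept a leading `+`, floats such as `.5` with no integer part, and signed exponents such as `1e+5` (all of which `float()` parses), a slightly broader pattern is one option. The following is only a sketch: the name `classify_number`, the `NUMBER` pattern, and the exact grammar are assumptions rather than part of the answer above. It returns the question's integer constants and deliberately rejects `inf`, `nan`, underscores, and surrounding whitespace.

```python
import re

PLAIN_INT, PLAIN_FLOAT, E_FLOAT, STRING = range(4)

# Hypothetical pattern: optional sign, then either digits with an optional
# fraction ("1", "1.", "1.5") or a bare fraction (".5"), then an optional
# exponent. re.ASCII restricts \d to the characters 0-9.
NUMBER = re.compile(r'[+-]?(?:\d+\.?\d*|\.\d+)(?P<exp>[eE][+-]?\d+)?', re.ASCII)


def classify_number(text):
    # fullmatch() requires the whole string to match, so no '$' is needed.
    m = NUMBER.fullmatch(text)
    if m is None:
        return STRING
    if m['exp'] is not None:
        return E_FLOAT
    # No exponent: a dot anywhere means a plain float, otherwise a plain int.
    return PLAIN_FLOAT if '.' in text else PLAIN_INT


samples = {
    PLAIN_INT: ['1', '0', '-5', '333333333'],
    PLAIN_FLOAT: ['1.0', '-5.0', '-33.212', '0.0', '-1.', '-3.'],
    E_FLOAT: ['1.3e5', '-1.2e5', '0.0e0', '5e-3', '3e23', '3E5', '-3E-12'],
    STRING: ['aether', '1ee3', 'buzz', 'eeep', '121212beep'],
}
for expected, items in samples.items():
    assert all(classify_number(item) == expected for item in items)
```

Using `fullmatch` instead of `match` with a trailing `$` also rejects a string that ends in a newline, which `$` would otherwise let through.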



huangapple
  • Posted on July 27, 2023, 22:42:30
  • When reposting, please keep this link: https://go.coder-hub.com/76780886.html