How to use `genfromtxt` with brackets in the data format?

Question

I have a **CSV file** with the following values:

```text
# number,array1,array2
0,[1,2,3,4,5],[6,7,8,9,10]
```

Now I would like to load these two arrays, but when I run:

```python
import numpy as np

new_array = np.genfromtxt(fname='file_name.csv',
                          skip_header=1,
                          defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
                          deletechars='[,]',
                          usecols=(1, 2, 3, 4, 5),
                          dtype=int,
                          delimiter=',',
                          comments='# ')
```

Then I get an array with the values:

```text
[-1  2  3  4 -1]
```

Instead of:

```text
[1  2  3  4 5]
```

If I understand correctly, the problem is the brackets, but I expected `deletechars='[,]'` to do the trick. How do I get `genfromtxt` to read these values correctly?
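To see why only the first and last values go wrong, here is a small pure-Python sketch that splits the data line the way `delimiter=','` does and picks the same columns as `usecols=(1,2,3,4,5)`. The brackets stay glued to the first and last element, so `'[1'` and `'5]'` are not valid integers; as far as I can tell, `genfromtxt` then substitutes its default integer fill value, -1, which matches the output above.

```python
line = "0,[1,2,3,4,5],[6,7,8,9,10]"

# Splitting on ',' cuts through the bracketed lists, so '[' and ']'
# remain attached to the first and last element of each list.
fields = line.split(",")
selected = [fields[i] for i in (1, 2, 3, 4, 5)]
print(selected)  # ['[1', '2', '3', '4', '5]']

def parses_as_int(token):
    """Return True if int() accepts the token unchanged."""
    try:
        int(token)
        return True
    except ValueError:
        return False

print([parses_as_int(t) for t in selected])  # [False, True, True, True, False]
```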

Answer 1

Score: 1

I think `deletechars` only affects the column names rather than their data, so you need a converter to remove the square brackets:

```python
import re
conv = lambda x: int(re.sub(rb"[\[\]]", b"", x))  # drop '[' and ']'; field arrives as bytes with the default encoding
```
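As a quick check, the same bracket-stripping idea can be tried outside `genfromtxt` on the tokens this line produces. This is a minimal sketch: `strip_brackets` is a hypothetical stand-in for `conv` that also accepts `str`, since whether a converter receives `bytes` or `str` can depend on the `encoding` argument of `genfromtxt`.

```python
import re

def strip_brackets(x):
    """Hypothetical stand-in for `conv`: drop '[' and ']', then parse an int."""
    if isinstance(x, bytes):  # e.g. genfromtxt's default encoding='bytes'
        x = x.decode("latin1")
    return int(re.sub(r"[\[\]]", "", x))

tokens = ["[1", "2", "3", "4", "5]"]
print([strip_brackets(t) for t in tokens])  # [1, 2, 3, 4, 5]
print(strip_brackets(b"[6"), strip_brackets(b"10]"))  # 6 10
```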

Then you can use:

```python
a = np.genfromtxt(fname='file.csv',
                  skip_header=1,
                  defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
                  usecols=(1, 2, 3, 4, 5),
                  dtype=int,
                  delimiter=',',
                  comments='# ',
                  # apply the bracket-stripping converter to every selected column
                  converters={1: conv, 2: conv, 3: conv, 4: conv, 5: conv})
print(repr(a))  # array([1, 2, 3, 4, 5])
```
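The question asks for both arrays, while the call above reads only the first. Below is a sketch extending the same converter approach to columns 1 to 10 and then splitting the result. It assumes every bracketed list has exactly five elements (a fixed `usecols` cannot handle lists of varying length). The in-memory sample with a second row and the `strip_brackets` helper are hypothetical, and the helper uses `str.strip` instead of `re.sub`.

```python
import io

import numpy as np

def strip_brackets(x):
    """Hypothetical converter: genfromtxt may pass bytes or str, so handle both."""
    if isinstance(x, bytes):
        x = x.decode("latin1")
    return int(x.strip("[]"))

# In-memory stand-in for file.csv (hypothetical sample with two data rows).
text = ("# number,array1,array2\n"
        "0,[1,2,3,4,5],[6,7,8,9,10]\n"
        "1,[11,12,13,14,15],[16,17,18,19,20]\n")

cols = tuple(range(1, 11))  # assumes each bracketed list has exactly 5 elements
data = np.genfromtxt(io.StringIO(text),
                     skip_header=1,
                     delimiter=",",
                     usecols=cols,
                     dtype=int,
                     converters={i: strip_brackets for i in cols})
data = np.atleast_2d(data)  # a single data row would otherwise come back 1-D
array1, array2 = data[:, :5], data[:, 5:]
print(array1)
print(array2)
```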

Answer 2

Score: 1

In your more complex case, you can load all the arrays by parsing them with a regular expression, using `numpy.fromregex` and `numpy.fromstring`:

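```python
import numpy as np

test_txt = 'file_name.csv'  # assumed: the question's CSV file (a path or an open file object)
# Each regex group becomes one object field; fromstring then parses "1,2,..." into ints.
rows = np.fromregex(test_txt, regexp=r'\d+,\[([\d,]+)\],\[([\d,]+)\]', dtype=[('c1', 'O'), ('c2', 'O')])
arr = [np.fromstring(c, sep=',', dtype=np.int32) for row in rows for c in row]
print(arr)
```

```text
[array([1, 2, 3, 4, 5], dtype=int32), array([ 6,  7,  8,  9, 10], dtype=int32)]
```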
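`np.fromregex` turns the capture groups of every match into the fields of a structured array. The same pattern can be tried with Python's `re.findall` to see exactly what the two groups capture. This pure-Python sketch uses the sample text from the question and parses the captured strings with `int` in place of `np.fromstring`.

```python
import re

text = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"
pattern = r'\d+,\[([\d,]+)\],\[([\d,]+)\]'

# With two groups, findall returns one (group1, group2) tuple per match;
# the header line does not match because no digits precede ",[".
matches = re.findall(pattern, text)
print(matches)  # [('1,2,3,4,5', '6,7,8,9,10')]

# Split each captured "1,2,..." string and convert the pieces to ints.
arrays = [[int(v) for v in group.split(",")] for row in matches for group in row]
print(arrays)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```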

huangapple
  • Published on March 8, 2023, 19:13:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75672279.html