How to use genfromtxt with brackets in the format?
Question
I have a CSV file with the following values:

    # number,array1,array2
    0,[1,2,3,4,5],[6,7,8,9,10]

Now I would like to load these two arrays, but when I run:

    new_array = np.genfromtxt(fname='file_name.csv',
                              skip_header=1,
                              defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
                              deletechars='[,]',
                              usecols=(1,2,3,4,5),
                              dtype=int,
                              delimiter=',',
                              comments='# ')

I get an array with the values:

    [-1 2 3 4 -1]

instead of:

    [1 2 3 4 5]

If I understand correctly, the problem is the brackets, but I expected deletechars='[,]' to do the trick. How do I get genfromtxt to read these values correctly?
Answer 1
Score: 1
I think deletechars only affects the column names, not the data. I think you need a "converter" to remove the square brackets:

    import re

    # The bytes pattern assumes the converter receives bytes, as genfromtxt does
    # with encoding='bytes'; use a str pattern if your converter receives str.
    conv = lambda x: int(re.sub(rb"[\[\]]", b"", x))

Then you can use:

    In [84]: a = np.genfromtxt(fname='file.csv',
        ...:                   skip_header=1,
        ...:                   defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
        ...:                   usecols=(1,2,3,4,5),
        ...:                   dtype=int,
        ...:                   delimiter=',',
        ...:                   comments='# ',
        ...:                   converters={1:conv,2:conv,3:conv,4:conv,5:conv})

    In [85]: a
    Out[85]: array([1, 2, 3, 4, 5])
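
For reference, here is a minimal self-contained sketch of the same converter idea that reads both arrays in one call. It is not the exact code from the answer: the helper name strip_brackets and the in-memory csv_text are made up for illustration, and the converter accepts either bytes or str because what genfromtxt hands to converters can depend on the NumPy version and the encoding argument. defaultfmt and deletechars are left out, since both only concern field names.

```python
import io

import numpy as np


def strip_brackets(field):
    """Turn a field such as '[1' or '5]' into an int, ignoring square brackets."""
    if isinstance(field, bytes):  # converters may be handed bytes or str
        field = field.decode()
    return int(field.strip("[] "))


# Hypothetical stand-in for the file from the question.
csv_text = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"

# Assumption: columns 1-5 hold array1 and columns 6-10 hold array2.
cols = tuple(range(1, 11))
data = np.genfromtxt(
    io.StringIO(csv_text),
    skip_header=1,
    delimiter=",",
    dtype=int,
    usecols=cols,
    converters={i: strip_brackets for i in cols},
)

# A file with a single data row comes back one-dimensional, so normalise it.
data = np.atleast_2d(data)
array1, array2 = data[:, :5], data[:, 5:]
print(array1)
print(array2)
```

This only works when every bracketed list has the same, known length, because the column split is positional.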
Answer 2
Score: 1
In your more complicated case, you can load all the arrays by regex parsing with numpy.fromregex and numpy.fromstring:

    import numpy as np

    # test_txt is the file (a path or an open file object) passed to fromregex
    rows = np.fromregex(test_txt, regexp=r'\d+,\[([\d,]+)\],\[([\d,]+)\]', dtype=[('c1', 'O'), ('c2', 'O')])
    arr = [np.fromstring(c, sep=',', dtype=np.int32) for row in rows for c in row]
    print(arr)

    [array([1, 2, 3, 4, 5], dtype=int32), array([ 6, 7, 8, 9, 10], dtype=int32)]
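
Since test_txt is not defined in the answer, the following is a small runnable sketch of the same regex approach under some assumptions: the CSV content is supplied through a hypothetical in-memory csv_text, and the fields are named array1 and array2 (names chosen here, not taken from the answer) so each column can be accessed by name.

```python
import io

import numpy as np

# Hypothetical stand-in for the file from the question.
csv_text = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"

# One capture group per bracketed list; each group keeps the raw "1,2,3" text.
pattern = r"\d+,\[([\d,]+)\],\[([\d,]+)\]"
rows = np.fromregex(
    io.StringIO(csv_text),
    pattern,
    dtype=[("array1", "O"), ("array2", "O")],
)

# Parse the comma-separated text of each field into an integer array,
# giving one array per matched line of the file.
array1 = [np.fromstring(text, sep=",", dtype=np.int32) for text in rows["array1"]]
array2 = [np.fromstring(text, sep=",", dtype=np.int32) for text in rows["array2"]]
print(array1)
print(array2)
```

Unlike the converter approach, this does not require the bracketed lists to have a fixed length, because each list is captured as a whole before being parsed.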