How to use genfromtxt with brackets in format?

Question

I have a CSV file with the following contents:

    # number,array1,array2
    0,[1,2,3,4,5],[6,7,8,9,10]

I would like to load these two arrays, but when I run:

    new_array = np.genfromtxt(fname='file_name.csv',
                              skip_header=1,
                              defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
                              deletechars='[,]',
                              usecols=(1,2,3,4,5),
                              dtype=int,
                              delimiter=',',
                              comments='# ')

I get an array with the values:

    [-1 2 3 4 -1]

instead of:

    [1 2 3 4 5]

If I understand correctly, the problem is the brackets, but I expected that

    deletechars='[,]'

would do the trick. How do I get genfromtxt to read these values correctly?

Answer 1

Score: 1

I think `deletechars` only affects the column names, not the data. I think you need a "converter" to remove the square brackets:

    import re
    conv = lambda x: int(re.sub(rb"[\[\]]", b"", x))

Then you can use:

    In [84]: a = np.genfromtxt(fname='file.csv',
        ...:                   skip_header=1,
        ...:                   defaultfmt=['%i','%i','%i','%i','%i','%i','%i','%i','%i','%i','%i'],
        ...:                   usecols=(1,2,3,4,5),
        ...:                   dtype=int,
        ...:                   delimiter=',',
        ...:                   comments='# ',
        ...:                   converters={1:conv,2:conv,3:conv,4:conv,5:conv})

    In [85]: a
    Out[85]: array([1, 2, 3, 4, 5])
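A self-contained sketch of the same converter idea, in case it helps to try it without a file on disk. The names `csv_text` and `strip_brackets` are made up for illustration and are not part of the answer. The converter accepts both bytes and str, on the assumption that what `genfromtxt` passes to a converter depends on the NumPy version and the `encoding` argument. It also reads columns 1 to 10 so that both arrays come back in one call.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for the CSV file from the question.
csv_text = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"


def strip_brackets(field):
    """Convert a field such as '[1' or '5]' to int, ignoring square brackets."""
    # Converters may be handed bytes or str, so normalise to str first.
    if isinstance(field, bytes):
        field = field.decode()
    return int(field.strip("[] "))


# Columns 1-5 hold array1 and columns 6-10 hold array2 once the line is
# split on every comma.
cols = tuple(range(1, 11))
both = np.genfromtxt(
    io.StringIO(csv_text),
    skip_header=1,
    delimiter=",",
    dtype=int,
    usecols=cols,
    converters={c: strip_brackets for c in cols},
)

# A single data row comes back one-dimensional, so force a (rows, 10) shape.
both = both.reshape(-1, len(cols))
array1, array2 = both[:, :5], both[:, 5:]
print(array1)
print(array2)
```

For the sample row, `array1` holds the values 1 to 5 and `array2` holds 6 to 10. This only works when every bracketed array has the same fixed length, because the columns are addressed by position.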

Answer 2

Score: 1

In a more involved case like yours, you can load all the arrays by regex parsing with numpy.fromregex and numpy.fromstring:

    # test_txt is not defined in the answer; presumably the CSV file name or an open file object
    rows = np.fromregex(test_txt, regexp=r'\d+,\[([\d,]+)\],\[([\d,]+)\]', dtype=[('c1', 'O'), ('c2', 'O')])
    arr = [np.fromstring(c, sep=',', dtype=np.int32) for row in rows for c in row]
    print(arr)

The result:

    [array([1, 2, 3, 4, 5], dtype=int32), array([ 6, 7, 8, 9, 10], dtype=int32)]
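If the number of bracketed arrays per line is not fixed, the same regex idea can be sketched with the standard `re` module instead of `numpy.fromregex`. This is a swapped-in variant, not the answer's method. The helper name `parse_bracketed_arrays` and the `csv_text` sample are hypothetical, and the sketch assumes every value inside the brackets is an integer.

```python
import re

import numpy as np

# Hypothetical in-memory copy of the sample data from the question.
csv_text = "# number,array1,array2\n0,[1,2,3,4,5],[6,7,8,9,10]\n"


def parse_bracketed_arrays(text):
    """Return, for each data line, a list with one int array per [...] group."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the commented header line.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Capture whatever sits between each pair of square brackets.
        groups = re.findall(r"\[([^\]]*)\]", line)
        rows.append(
            [
                np.array([int(v) for v in g.split(",") if v.strip()], dtype=int)
                for g in groups
            ]
        )
    return rows


rows = parse_bracketed_arrays(csv_text)
array1, array2 = rows[0]
print(array1)
print(array2)
```

Unlike the fixed two-group pattern, this does not check the leading `number` column or the overall line layout, so it is more forgiving of malformed lines than may be desirable.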

huangapple
  • Posted on March 8, 2023, 19:13:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75672279.html