Parse the dictionary values in CSV

Question

The CSV file contains:

    id,type,attributes
    1,xx,{'data': { 'attributes': {'aggregations': [{'space': 'sum','time': 'sum'}],'created_at': '2020-03-25T09:48:37.463835Z','include_percentiles': true,'metric_type': 'count','modified_at': '2020-03-25T09:48:37.463835Z','tags': ['app','datacenter']},'id': 'test.metric.latency','type': 'manage_tags'}}

How can I parse the attributes from this CSV file into a pandas DataFrame?

Expected output:

    id  type  space  created_at                   include_percentiles  metric_type  tags
    1   xx    sum    2020-03-25T09:48:37.463835Z  true                 count        app
    1   xx    sum    2020-03-25T09:48:37.463835Z  true                 count        datacenter

Answer 1

Score: 0

The challenge here is that the CSV data are not in a format that can be handled by a traditional parser.

This is because the data are essentially comma-delimited but the 'attributes' value contains commas that are not value separators.
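To illustrate the point about embedded commas, here is a minimal sketch using a shortened, made-up line in the same shape as the question's data. It compares a plain split with one limited to the first two commas via the `maxsplit` argument of `str.split`; the variable names are only illustrative.

```python
# Shortened sample line (an assumption, modelled on the question's data).
line = "1,xx,{'data': {'id': 'test.metric.latency', 'tags': ['app', 'datacenter']}}"

# A plain split also cuts at every comma inside the dictionary text.
naive = line.split(',')

# Limiting the split to the first two commas keeps the dictionary text in one piece.
row_id, row_type, attributes = line.split(',', 2)

print(len(naive))
print(attributes)
```

This only works because the unquoted dictionary is the last column; if it sat in the middle of the row, the split would need a different strategy.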

The 'attributes' value is neither a valid string representation of a Python dictionary (it contains the bare name true) nor valid JSON (its strings are single-quoted).
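A quick way to check that claim is to hand a fragment to both parsers and see which ones accept it. The sketch below uses a trimmed-down fragment (an assumption, not the full value from the question) and a small hypothetical helper named `accepts`.

```python
import json
from ast import literal_eval

# Trimmed-down fragment in the same style as the 'attributes' column.
raw = "{'include_percentiles': true, 'metric_type': 'count'}"


def accepts(parser, text):
    """Return True if parser(text) succeeds, False if it rejects the input."""
    try:
        parser(text)
        return True
    except (ValueError, SyntaxError):
        return False


json_ok = accepts(json.loads, raw)        # single-quoted strings are not valid JSON
literal_ok = accepts(literal_eval, raw)   # the bare name true is not a Python literal
fixed_ok = accepts(literal_eval, raw.replace('true', 'True'))

print(json_ok, literal_ok, fixed_ok)
```

Both parsers reject the raw text, while `literal_eval` accepts it once `true` is spelled `True`.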

If the data are meant to represent a Python dictionary then the only issue (with the data as shown) is the value 'true'. We can overcome that by changing it to True. Let's also assume that there might be a value of 'false' so we'll deal with that too.

    from ast import literal_eval

    from pandas import DataFrame

    alldata = []

    with open('/Volumes/G-Drive/foo.csv') as data:
        next(data)  # skip column names
        for line in data:
            # The first two fields are id and type; everything after belongs to 'attributes'
            _id, _type, *dr = line.split(',')
            # Rejoin the dictionary text and turn true/false into Python literals.
            # Caveat: this also rewrites 'true'/'false' occurring inside string values.
            ds = ','.join(dr).replace('true', 'True').replace('false', 'False')
            attrs = literal_eval(ds)['data']['attributes']
            rd = {
                'id': _id,
                'type': _type,
                'space': attrs['aggregations'][0]['space'],
                'created_at': attrs['created_at'],
                'include_percentiles': attrs['include_percentiles'],
                'metric_type': attrs['metric_type']
            }
            # One output row per tag. Build a new dict each time; appending the
            # same dict repeatedly would leave every row showing the last tag.
            for tag in attrs['tags']:
                alldata.append({**rd, 'tags': tag})

    print(DataFrame(alldata))

Output:

      id type space                   created_at  include_percentiles metric_type        tags
    0  1   xx   sum  2020-03-25T09:48:37.463835Z                 True       count         app
    1  1   xx   sum  2020-03-25T09:48:37.463835Z                 True       count  datacenter
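As an alternative to looping over the tags by hand, pandas can expand a list-valued column into one row per element with `DataFrame.explode`. The sketch below is not part of the original answer; it assumes each parsed record keeps `tags` as a list, uses the sample values from the question, and relies on the `ignore_index` parameter, which requires pandas 1.1 or later.

```python
import pandas as pd

# One record per CSV row, with 'tags' still held as a list
# (values copied from the question's sample row).
records = [{
    'id': '1',
    'type': 'xx',
    'space': 'sum',
    'created_at': '2020-03-25T09:48:37.463835Z',
    'include_percentiles': True,
    'metric_type': 'count',
    'tags': ['app', 'datacenter'],
}]

# explode() repeats the other columns once per element of the 'tags' list.
df = pd.DataFrame(records).explode('tags', ignore_index=True)
print(df)
```

With this approach the per-row dictionary is built once, and the fan-out into one row per tag happens in a single call.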

huangapple
  • Posted on May 11, 2023 at 11:05:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76223887.html