Save the output of BigQuery in JSON from Python

Question

The following modified code saves the results in JSON format:

    from google.cloud import bigquery
    import json

    def query_stackoverflow(project_id="gwas-386212"):
        client = bigquery.Client()
        query = """
            WITH SNP_info AS (
                SELECT
                    CONCAT(CAST(rs_id AS string)) AS identifier
                FROM
                    `gwas-386212.gwas_dataset_1.SNPs_intergenic_vep_pha005199`
            )
            SELECT
                SNP_info.identifier AS identifier,
                variants.identifier AS identifier_1,
                variants.chr_id AS chr_id,
                variants.position AS position,
                variants.ref AS ref,
                variants.alt AS alt,
                variants.most_severe_consequence AS most_severe_consequence,
                variants.gene_id_any_distance AS gene_id_any_distance,
                variants.gene_id_any AS gene_id_any,
                variants.gene_id_prot_coding_distance AS gene_id_prot_coding_distance,
                variants.gene_id_prot_coding AS gene_id_prot_coding
            FROM SNP_info
            JOIN (
                SELECT
                    CONCAT(CAST(rs_id AS string)) AS identifier,
                    chr_id AS chr_id,
                    position AS position,
                    ref_allele AS ref,
                    alt_allele AS alt,
                    most_severe_consequence AS most_severe_consequence,
                    gene_id_any_distance AS gene_id_any_distance,
                    gene_id_any AS gene_id_any,
                    gene_id_prot_coding_distance AS gene_id_prot_coding_distance,
                    gene_id_prot_coding AS gene_id_prot_coding
                FROM
                    `bigquery-public-data.open_targets_genetics.variants`
            ) variants
            ON SNP_info.identifier = variants.identifier
        """
        results = client.query(query)
        # Convert the results to a list of dictionaries
        result_list = [dict(row) for row in results]
        # Save the results as JSON
        with open('output.json', 'w') as json_file:
            json.dump(result_list, json_file)

    # Call the function to execute the query and save the results as JSON
    query_stackoverflow()

This code runs the query and saves the results to a JSON file named "output.json".

How can I modify this script so that I can print some of the results and write the output to a JSON file?

    from google.cloud import bigquery

    def query_stackoverflow(project_id="gwas-386212"):
        client = bigquery.Client()
        query_job = client.query(
            """
            WITH
              SNP_info AS (
              SELECT
                CONCAT(CAST(rs_id AS string)) AS identifier
              FROM
                `gwas-386212.gwas_dataset_1.SNPs_intergenic_vep_pha005199`)
            SELECT
              *
            FROM
              SNP_info
            JOIN (
              SELECT
                CONCAT(CAST(rs_id AS string)) AS identifier,
                chr_id AS chr_id,
                position AS position,
                ref_allele AS ref,
                alt_allele AS alt,
                most_severe_consequence AS most_severe_consequence,
                gene_id_any_distance AS gene_id_any_distance,
                gene_id_any AS gene_id_any,
                gene_id_prot_coding_distance AS gene_id_prot_coding_distance,
                gene_id_prot_coding AS gene_id_prot_coding
              FROM
                `bigquery-public-data.open_targets_genetics.variants`) variants
            ON
              SNP_info.identifier = variants.identifier"""
        )
        # Wait for the query job to finish, then iterate over its rows
        results = query_job.result()
        for row in results:
            identifier = row['identifier']
            print(identifier)

This only prints one column, the identifier. I want to save the resulting table in JSON format. The JSON exported from the Google Cloud Platform looks something like this:

    [{
      "identifier": "rs62063022",
      "identifier_1": "rs62063022",
      "chr_id": "17",
      "position": "51134537",
      "ref": "T",
      "alt": "G",
      "most_severe_consequence": "intergenic_variant",
      "gene_id_any_distance": "13669",
      "gene_id_any": "ENSG00000008294",
      "gene_id_prot_coding_distance": "13669",
      "gene_id_prot_coding": "ENSG00000008294"
    }, {
      "identifier": "rs12944420",
      "identifier_1": "rs12944420",
      "chr_id": "17",
      "position": "42640692",
      "ref": "T",
      "alt": "C",
      "most_severe_consequence": "intergenic_variant",
      "gene_id_any_distance": "18592",
      "gene_id_any": "ENSG00000037042",
      "gene_id_prot_coding_distance": "18592",
      "gene_id_prot_coding": "ENSG00000037042"
    },
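
Every value in the sample above is a quoted string, including numeric-looking fields such as position. A minimal sketch of reproducing that shape, assuming the query returns integers for those columns (which the question does not confirm). The helper name `stringify_values` is hypothetical, and plain dicts stand in for BigQuery rows:

```python
import json


def stringify_values(record):
    """Return a copy of record with every non-None value converted to str."""
    return {key: (None if value is None else str(value))
            for key, value in record.items()}


# Plain dicts standing in for rows returned by the query. The values are
# taken from the sample above; storing position as an int is an assumption.
rows = [
    {"identifier": "rs62063022", "chr_id": "17", "position": 51134537},
    {"identifier": "rs12944420", "chr_id": "17", "position": 42640692},
]

records = [stringify_values(dict(row)) for row in rows]
json_text = json.dumps(records, indent=2)
print(json_text)
```

If the numeric types are acceptable in the output file, this step can be skipped and the records written as they are.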

Answer 1

Score: 0

Check out the json documentation for further information.

    import json

    # Convert each result row to a plain dict so that json can serialize it
    records = [dict(row) for row in results]
    # Write the records as indented JSON; the file is closed automatically
    with open("bigquery_response.json", "w") as out_file:
        json.dump(records, out_file, indent=6)
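
The snippet above writes the file but does not show any rows. A minimal sketch that does both, assuming each row can be converted with `dict(row)` as in the snippet. The helper name `save_rows_as_json`, its `preview` parameter, and the stand-in rows are hypothetical. Passing `default=str` is a choice made here so that values `json` cannot encode natively (dates or decimals, for example) are written as strings instead of raising an error:

```python
import json
import os
import tempfile


def save_rows_as_json(rows, path, preview=3):
    """Print the first `preview` rows, write all rows to `path`, return them."""
    records = []
    for index, row in enumerate(rows):
        record = dict(row)
        if index < preview:
            print(record)
        records.append(record)
    with open(path, "w") as out_file:
        json.dump(records, out_file, indent=2, default=str)
    return records


# Stand-in rows so the sketch runs without BigQuery access.
sample_rows = [
    {"identifier": "rs62063022", "chr_id": "17", "ref": "T", "alt": "G"},
    {"identifier": "rs12944420", "chr_id": "17", "ref": "T", "alt": "C"},
]
output_path = os.path.join(tempfile.gettempdir(), "bigquery_response.json")
saved = save_rows_as_json(sample_rows, output_path, preview=1)
```

With the real query, the `results` iterable from the question would be passed in place of `sample_rows`. Note that the whole result set is held in memory before it is written, which may matter for very large joins.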

huangapple
  • Posted on June 1, 2023, 21:44:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382547.html