JSON 转换为 Python 数据帧:从另一个 API 映射值

huangapple go评论111阅读模式
英文:

JSON to Python dataframe: mapping values from another API

问题

以下是您要翻译的部分:

  1. import requests
  2. import json
  3. import pandas as pd
  4. def getdata(link):
  5. try:
  6. response = s.get(link)
  7. except requests.exceptions.RequestException as e:
  8. print('Request Exception Found!')
  9. json_data = response.json()
  10. return json_data
  11. s = requests.session()
  12. json_data = getdata('https://api.school.com/2020/students.json')
  13. print(json_data)
  14. def normalize(json_data, record_path):
  15. temp = json.dumps(json_data)
  16. ar = json.loads(temp)
  17. df = pd.json_normalize(ar[record_path])
  18. return df
  19. Student_df = normalize(json_data, 'Students')
  20. Student_ids = Student_df["Sid"].tolist()
  21. print(Student_ids)
  22. links = []
  23. for i, row in Student_df.groupby('Sid').size().items():
  24. link = "https://api.school.com/2020/mark.json?sid="+str(i)
  25. links.append(link)
  26. for x in links:
  27. normalize(getdata(x),'marks')

如果您有任何其他问题,或需要进一步的帮助,请随时提问。

英文:

I have an API with student data like this, for every student id there will be a corresponding API link with mark details.

for example:

https://api.school.com/2020/students.json

  1. {
  2. "Students": [
  3. {
  4. "StudentName": "AAA",
  5. "Sid": 1020,
  6. "Saddress": "st.aaa",
  7. "Sdob": "10-11-1999"
  8. },
  9. {
  10. "StudentName": "BBB",
  11. "Sid": 1021,
  12. "Saddress": "st.bbb",
  13. "Sdob": "11-11-1999"
  14. },
  15. {
  16. "StudentName": "CCC",
  17. "Sid": 1022,
  18. "Saddress": "st.fff",
  19. "Sdob": "05-12-1999"
  20. },
  21. {
  22. "StudentName": "DDD",
  23. "Sid": 1023,
  24. "Saddress": "st.ddd",
  25. "Sdob": "15-09-1999"
  26. },
  27. {
  28. "StudentName": "EEE",
  29. "Sid": 1024,
  30. "Saddress": "st.eee",
  31. "Sdob": "10-11-1999"
  32. },
  33. {
  34. "StudentName": "FFF",
  35. "Sid": 1025,
  36. "Saddress": "st.ddd",
  37. "Sdob": "20-11-1999"
  38. },
  39. {
  40. "StudentName": "GGG",
  41. "Sid": 1026,
  42. "Saddress": "st.ggg",
  43. "Sdob": "25-11-1999"
  44. },
  45. {
  46. "StudentName": "JJJ",
  47. "Sid": 1019,
  48. "Saddress": "st.aaa",
  49. "Sdob": "18-11-1999"
  50. }
  51. ]
  52. }

https://api.school.com/2020/mark.json?sid=1020

  1. {
  2. "marks": [
  3. {
  4. "English": 11,
  5. "Math": 12,
  6. "Art": 13,
  7. "Science": 14,
  8. "History": 15,
  9. "Geography": 16,
  10. "Physical Education": 17,
  11. "Chemistry": 18,
  12. "Physics": 19,
  13. "Biology": 20
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1021

  1. {
  2. "marks": [
  3. {
  4. "English": 21,
  5. "Math": 22,
  6. "Art": 23,
  7. "Science": 24,
  8. "History": 25,
  9. "Geography": 26,
  10. "Physical Education": 27,
  11. "Chemistry": 28,
  12. "Physics": 29,
  13. "Biology": 30
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1022

  1. {
  2. "marks": [
  3. {
  4. "English": 31,
  5. "Math": 32,
  6. "Art": 33,
  7. "Science": 34,
  8. "History": 35,
  9. "Geography": 36,
  10. "Physical Education": 37,
  11. "Chemistry": 38,
  12. "Physics": 39,
  13. "Biology": 40
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1023

  1. {
  2. "marks": [
  3. {
  4. "English": 41,
  5. "Math": 42,
  6. "Art": 43,
  7. "Science": 44,
  8. "History": 45,
  9. "Geography": 46,
  10. "Physical Education": 47,
  11. "Chemistry": 48,
  12. "Physics": 49,
  13. "Biology": 50
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1024

  1. {
  2. "marks": [
  3. {
  4. "English": 51,
  5. "Math": 52,
  6. "Art": 53,
  7. "Science": 54,
  8. "History": 55,
  9. "Geography": 56,
  10. "Physical Education": 57,
  11. "Chemistry": 58,
  12. "Physics": 59,
  13. "Biology": 60
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1025

  1. {
  2. "marks": [
  3. {
  4. "English": 61,
  5. "Math": 62,
  6. "Art": 63,
  7. "Science": 64,
  8. "History": 65,
  9. "Geography": 66,
  10. "Physical Education": 67,
  11. "Chemistry": 68,
  12. "Physics": 69,
  13. "Biology": 70
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1026

  1. {
  2. "marks": [
  3. {
  4. "English": 71,
  5. "Math": 72,
  6. "Art": 73,
  7. "Science": 74,
  8. "History": 75,
  9. "Geography": 76,
  10. "Physical Education": 77,
  11. "Chemistry": 78,
  12. "Physics": 79,
  13. "Biology": 80
  14. }
  15. ]
  16. }

https://api.school.com/2020/mark.json?sid=1019

  1. {
  2. "marks": [
  3. {
  4. "English": 1,
  5. "Math": 2,
  6. "Art": 3,
  7. "Science": 4,
  8. "History": 5,
  9. "Geography": 6,
  10. "Physical Education": 7,
  11. "Chemistry": 8,
  12. "Physics": 9,
  13. "Biology": 10
  14. }
  15. ]
  16. }

I need to get an output like this in a csv file
JSON 转换为 Python 数据帧:从另一个 API 映射值

My code(not completed):

  1. import requests
  2. import json
  3. import pandas as pd
  4. def getdata(link):
  5. try:
  6. response = s.get(link)
  7. except requests.exceptions.RequestException as e:
  8. print('Request Exception Found!')
  9. json_data = response.json()
  10. return json_data
  11. s = requests.session()
  12. json_data = getdata('https://api.school.com/2020/students.json')
  13. print(json_data)
  14. def normalize(json_data,record_path):
  15. temp = json.dumps(json_data)
  16. ar = json.loads(temp)
  17. df = pd.json_normalize(ar[record_path])
  18. return df
  19. Student_df = normalize(json_data, 'Students')
  20. Student_ids = Student_df["Sid"].tolist()
  21. print(Student_ids)
  22. links = []
  23. for i, row in Student_df.groupby('Sid').size().items():
  24. link = "https://api.school.com/2020/mark.json?sid="+str(i)
  25. links.append(link)
  26. for x in links:
  27. normalize(getdata(x),'marks')

is it possible to insert the StudentName and Sid along with the mark? when the program fetches the mark from each API, it should be able to map and add the StudentName and Sid.

if i saves each mark api data to a json file in the local machine, then i can use this code and it works perfectly

答案1

得分: 0

创建一个记录字典,其中键是Sid,以便可以合并Student_dfMarks_df

  1. link = 'https://api.school.com/2020/mark.json?sid={sid}'
  2. marks = {sid: normalize(getdata(link.format(sid=sid)), 'marks')
  3. for sid in Student_df['Sid'].unique()}
  4. Marks_df = pd.concat(marks).droplevel(1)
  5. out = Student_df[['StudentName', 'Sid']].merge(Marks_df, left_on='Sid', right_index=True)

输出:

  1. >>> out
  2. StudentName Sid English Math Art Science History Geography Physical Education Chemistry Physics Biology
  3. 0 AAA 1020 11 12 13 14 15 16 17 18 19 20
  4. 1 BBB 1021 21 22 23 24 25 26 27 28 29 30
  5. 2 CCC 1022 31 32 33 34 35 36 37 38 39 40
  6. 3 DDD 1023 41 42 43 44 45 46 47 48 49 50
  7. 4 EEE 1024 51 52 53 54 55 56 57 58 59 60
  8. 5 FFF 1025 61 62 63 64 65 66 67 68 69 70
  9. 6 GGG 1026 71 72 73 74 75 76 77 78 79 80
  10. 7 JJJ 1019 1 2 3 4 5 6 7 8 9 10

使用FastAPI的可重现示例:

  1. # api.py
  2. from fastapi import FastAPI
  3. Students = [{'StudentName': 'AAA', 'Sid': 1020, 'Saddress': 'st.aaa', 'Sdob': '10-11-1999'},
  4. {'StudentName': 'BBB', 'Sid': 1021, 'Saddress': 'st.bbb', 'Sdob': '11-11-1999'},
  5. {'StudentName': 'CCC', 'Sid': 1022, 'Saddress': 'st.fff', 'Sdob': '05-12-1999'},
  6. {'StudentName': 'DDD', 'Sid': 1023, 'Saddress': 'st.ddd', 'Sdob': '15-09-1999'},
  7. {'StudentName': 'EEE', 'Sid': 1024, 'Saddress': 'st.eee', 'Sdob': '10-11-1999'},
  8. {'StudentName': 'FFF', 'Sid': 1025, 'Saddress': 'st.ddd', 'Sdob': '20-11-1999'},
  9. {'StudentName': 'GGG', 'Sid': 1026, 'Saddress': 'st.ggg', 'Sdob': '25-11-1999'},
  10. {'StudentName': 'JJJ', 'Sid': 1019, 'Saddress': 'st.aaa', 'Sdob': '18-11-1999'}]
  11. Marks = {'1020': {'marks': [{'English': 11, 'Math': 12, 'Art': 13, 'Science': 14, 'History': 15, 'Geography': 16, 'Physical Education': 17, 'Chemistry': 18, 'Physics': 19, 'Biology': 20}]},
  12. '1021': {'marks': [{'English': 21, 'Math': 22, 'Art': 23, 'Science': 24, 'History': 25, 'Geography': 26, 'Physical Education': 27, 'Chemistry': 28, 'Physics': 29, 'Biology': 30}]},
  13. '1022': {'marks': [{'English': 31, 'Math': 32, 'Art': 33, 'Science': 34, 'History': 35, 'Geography': 36, 'Physical Education': 37, 'Chemistry': 38, 'Physics': 39, 'Biology': 40}]},
  14. '1023': {'marks': [{'English': 41, 'Math': 42, 'Art': 43, 'Science': 44, 'History': 45, 'Geography': 46, 'Physical Education': 47, 'Chemistry': 48, 'Physics': 49, 'Biology': 50}]},
  15. '1024': {'marks': [{'English': 51, 'Math': 52, 'Art': 53, 'Science': 54, 'History': 55, 'Geography': 56, 'Physical Education': 57, 'Chemistry': 58, 'Physics': 59, 'Biology': 60}]},
  16. '1025': {'marks': [{'English': 61, 'Math': 62, 'Art': 63, 'Science': 64, 'History': 65, 'Geography': 66, 'Physical Education': 67, 'Chemistry': 68, 'Physics': 69, 'Biology': 70}]},
  17. '1026': {'marks': [{'English': 71, 'Math': 72, 'Art': 73, 'Science': 74, 'History': 75, 'Geography': 76, 'Physical Education': 77, 'Chemistry': 78, 'Physics': 79, 'Biology': 80}]},
  18. '1019': {'marks': [{'English': 1, 'Math': 2, 'Art': 3, 'Science': 4, 'History': 5, 'Geography': 6, 'Physical Education': 7, 'Chemistry': 8, 'Physics': 9, 'Biology': 10}]}
  19. app = FastAPI()
  20. @app.get('/2020/students.json')
  21. def students():
  22. return Students
  23. @app.get('/2020/mark.json')
  24. def mark(sid):
  25. return Marks[sid]

用法:

  1. [...]$ uvicorn api:app --reload
  2. # requests.get('http://localhost:8000/2020/students.json')
  3. # requests.get('http://localhost:8000/2020/mark.json?sid=1019')
英文:

Create a dict of records where the key is Sid so you can merge Student_df and Marks_df:

  1. link = 'https://api.school.com/2020/mark.json?sid={sid}'
  2. marks = {sid: normalize(getdata(link.format(sid=sid)), 'marks')
  3. for sid in Student_df['Sid'].unique()}
  4. Marks_df = pd.concat(marks).droplevel(1)
  5. out = Student_df[['StudentName', 'Sid']].merge(Marks_df, left_on='Sid', right_index=True)

Output:

  1. >>> out
  2. StudentName Sid English Math Art Science History Geography Physical Education Chemistry Physics Biology
  3. 0 AAA 1020 11 12 13 14 15 16 17 18 19 20
  4. 1 BBB 1021 21 22 23 24 25 26 27 28 29 30
  5. 2 CCC 1022 31 32 33 34 35 36 37 38 39 40
  6. 3 DDD 1023 41 42 43 44 45 46 47 48 49 50
  7. 4 EEE 1024 51 52 53 54 55 56 57 58 59 60
  8. 5 FFF 1025 61 62 63 64 65 66 67 68 69 70
  9. 6 GGG 1026 71 72 73 74 75 76 77 78 79 80
  10. 7 JJJ 1019 1 2 3 4 5 6 7 8 9 10

Reproducible example with FastAPI:

  1. # api.py
  2. from fastapi import FastAPI
  3. Students = [{'StudentName': 'AAA', 'Sid': 1020, 'Saddress': 'st.aaa', 'Sdob': '10-11-1999'}, {'StudentName': 'BBB', 'Sid': 1021, 'Saddress': 'st.bbb', 'Sdob': '11-11-1999'}, {'StudentName': 'CCC', 'Sid': 1022, 'Saddress': 'st.fff', 'Sdob': '05-12-1999'}, {'StudentName': 'DDD', 'Sid': 1023, 'Saddress': 'st.ddd', 'Sdob': '15-09-1999'}, {'StudentName': 'EEE', 'Sid': 1024, 'Saddress': 'st.eee', 'Sdob': '10-11-1999'}, {'StudentName': 'FFF', 'Sid': 1025, 'Saddress': 'st.ddd', 'Sdob': '20-11-1999'}, {'StudentName': 'GGG', 'Sid': 1026, 'Saddress': 'st.ggg', 'Sdob': '25-11-1999'}, {'StudentName': 'JJJ', 'Sid': 1019, 'Saddress': 'st.aaa', 'Sdob': '18-11-1999'}]
  4. Marks = {'1020': {'marks': [{'English': 11, 'Math': 12, 'Art': 13, 'Science': 14, 'History': 15, 'Geography': 16, 'Physical Education': 17, 'Chemistry': 18, 'Physics': 19, 'Biology': 20}]}, '1021': {'marks': [{'English': 21, 'Math': 22, 'Art': 23, 'Science': 24, 'History': 25, 'Geography': 26, 'Physical Education': 27, 'Chemistry': 28, 'Physics': 29, 'Biology': 30}]}, '1022': {'marks': [{'English': 31, 'Math': 32, 'Art': 33, 'Science': 34, 'History': 35, 'Geography': 36, 'Physical Education': 37, 'Chemistry': 38, 'Physics': 39, 'Biology': 40}]}, '1023': {'marks': [{'English': 41, 'Math': 42, 'Art': 43, 'Science': 44, 'History': 45, 'Geography': 46, 'Physical Education': 47, 'Chemistry': 48, 'Physics': 49, 'Biology': 50}]}, '1024': {'marks': [{'English': 51, 'Math': 52, 'Art': 53, 'Science': 54, 'History': 55, 'Geography': 56, 'Physical Education': 57, 'Chemistry': 58, 'Physics': 59, 'Biology': 60}]}, '1025': {'marks': [{'English': 61, 'Math': 62, 'Art': 63, 'Science': 64, 'History': 65, 'Geography': 66, 'Physical Education': 67, 'Chemistry': 68, 'Physics': 69, 'Biology': 70}]}, '1026': {'marks': [{'English': 71, 'Math': 72, 'Art': 73, 'Science': 74, 'History': 75, 'Geography': 76, 'Physical Education': 77, 'Chemistry': 78, 'Physics': 79, 'Biology': 80}]}, '1019': {'marks': [{'English': 1, 'Math': 2, 'Art': 3, 'Science': 4, 'History': 5, 'Geography': 6, 'Physical Education': 7, 'Chemistry': 8, 'Physics': 9, 'Biology': 10}]}}
  5. app = FastAPI()
  6. @app.get('/2020/students.json')
  7. def students():
  8. return Students
  9. @app.get('/2020/mark.json')
  10. def mark(sid):
  11. return Marks[sid]

Usage:

  1. [...]$ uvicorn api:app --reload
  2. # requests.get('http://localhost:8000/2020/students.json')
  3. # requests.get('http://localhost:8000/2020/mark.json?sid=1019')

答案2

得分: 0

import concurrent.futures 使用 concurrent.futures 来异步从 API 获取分数数据。(你可以在 merge_frames() 方法中删除你不需要的列。)

  1. import concurrent.futures
  2. import json
  3. import os
  4. import pandas as pd
  5. import requests
  6. class StudentsMarks:
  7. def __init__(self):
  8. self.root_url = "https://api.school.com/2020/"
  9. self.students = self.get_students()
  10. self.processors = os.cpu_count()
  11. def get_students(self) -> pd.DataFrame:
  12. url = "students.json"
  13. return self.get_data(url=url, record="Students")
  14. def process_marks(self) -> pd.DataFrame:
  15. student_ids = self.students["Sid"].tolist()
  16. with concurrent.futures.ProcessPoolExecutor(max_workers=self.processors) as executor:
  17. grades = pd.concat(executor.map(self.get_data, student_ids)).reset_index(drop=True)
  18. return self.merge_frames(grades)
  19. def get_data(self, sid: int = "", url: str = "mark.json?sid=", record: str = "marks") -> pd.DataFrame:
  20. url = f"{self.root_url}{url}{sid}"
  21. with requests.Session() as request:
  22. response = request.get(url, timeout=30)
  23. if response.status_code != 200:
  24. print(response.raise_for_status())
  25. data = json.loads(response.text)
  26. return pd.json_normalize(data=data, record_path=record).assign(id=sid)
  27. def merge_frames(self, grades: pd.DataFrame) -> pd.DataFrame:
  28. df = pd.merge(left=self.students, right=grades, left_on="Sid", right_on="id", suffixes=("students", "grades"))
  29. return df[df.columns.drop(list(df.filter(regex="students|grades"))]
  30. if __name__ == "__main__":
  31. final = StudentsMarks().process_marks()
  32. print(final)

输出:

  1. Saddress Sdob Sid StudentName Art Biology Chemistry English Geography History Math Physical Education Physics Science
  2. 0 st.aaa 10-31-1999 1020 AAA 13 20 18 11 16 15 12 17 19 14
  3. 1 st.bbb 11-19-1999 1021 BBB 23 30 28 21 26 25 22 27 29 24
  4. 2 st.fff 12-22-1999 1022 CCC 33 40 38 31 36 35 32 37 39 34
  5. 3 st.ddd 09-15-1999 1023 DDD 43 50 48 41 46 45 42 47 49 44
  6. 4 st.eee 02-08-1999 1024 EEE 53 60 58 51 56 55 52 57 59 54
  7. 5 st.ddd 05-14-1999 1025 FFF 63 70 68 61 66 65 62 67 69 64
  8. 6 st.ggg 11-25-1999 1026 GGG 73 80 78 71 76 75 72 77 79 74
  9. 7 st.aaa 11-18-1999 1019 JJJ 3 10 8 1 6 5 2 7 9 4
英文:

Using concurrent.futures to asynchronously fetch mark data from api. (you can drop any columns you don't need in merge_frames() method.)

  1. import concurrent.futures
  2. import json
  3. import os
  4. import pandas as pd
  5. import requests
  6. class StudentsMarks:
  7. def __init__(self):
  8. self.root_url = "https://api.school.com/2020/"
  9. self.students = self.get_students()
  10. self.processors = os.cpu_count()
  11. def get_students(self) -> pd.DataFrame:
  12. url = "students.json"
  13. return self.get_data(url=url, record="Students")
  14. def process_marks(self) -> pd.DataFrame:
  15. student_ids = self.students["Sid"].tolist()
  16. with concurrent.futures.ProcessPoolExecutor(max_workers=self.processors) as executor:
  17. grades = pd.concat(executor.map(self.get_data, student_ids)).reset_index(drop=True)
  18. return self.merge_frames(grades)
  19. def get_data(self, sid: int = "", url: str = "mark.json?sid=", record: str = "marks") -> pd.DataFrame:
  20. url = f"{self.root_url}{url}{sid}"
  21. with requests.Session() as request:
  22. response = request.get(url, timeout=30)
  23. if response.status_code != 200:
  24. print(response.raise_for_status())
  25. data = json.loads(response.text)
  26. return pd.json_normalize(data=data, record_path=record).assign(id=sid)
  27. def merge_frames(self, grades: pd.DataFrame) -> pd.DataFrame:
  28. df = pd.merge(left=self.students, right=grades, left_on="Sid", right_on="id", suffixes=("_students", "_grades"))
  29. return df[df.columns.drop(list(df.filter(regex="students|grades")))]
  30. if __name__ == "__main__":
  31. final = StudentsMarks().process_marks()
  32. print(final)

Output:

  1. Saddress Sdob Sid StudentName Art Biology Chemistry English Geography History Math Physical Education Physics Science
  2. 0 st.aaa 10-31-1999 1020 AAA 13 20 18 11 16 15 12 17 19 14
  3. 1 st.bbb 11-19-1999 1021 BBB 23 30 28 21 26 25 22 27 29 24
  4. 2 st.fff 12-22-1999 1022 CCC 33 40 38 31 36 35 32 37 39 34
  5. 3 st.ddd 09-15-1999 1023 DDD 43 50 48 41 46 45 42 47 49 44
  6. 4 st.eee 02-08-1999 1024 EEE 53 60 58 51 56 55 52 57 59 54
  7. 5 st.ddd 05-14-1999 1025 FFF 63 70 68 61 66 65 62 67 69 64
  8. 6 st.ggg 11-25-1999 1026 GGG 73 80 78 71 76 75 72 77 79 74
  9. 7 st.aaa 11-18-1999 1019 JJJ 3 10 8 1 6 5 2 7 9 4

huangapple
  • 本文由 发表于 2023年2月26日 23:36:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75573060.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定