Iterate over specific csv rows and rerun code when new string detected after empty cell in first column

Question

I have written a script that creates a report for a student.
I want logic that iterates over the student rows and re-runs my script each time a new student name is detected.

  1. student_name,gender,subject,marks
  2. Mark,M,english,90
  3. biology,80
  4. chemistry,77
  5. physics,89
  6. George,M,french,81
  7. biology,66
  8. chemistry,82
  9. physics,79
  10. economic,57
  11. Lisa,F,german,77
  12. biology,89
  13. chemistry,90
  14. physics,92
  15. economic,96

For example:
For Mark, the script should run up to the 5th row, i.e. his last subject.
For George, the script should re-run up to the 10th row, i.e. his last subject.
And so on for many students.

How can I achieve this and print the student name?

I have already done this when each student's details are in a separate CSV file in a directory, but how can I do it when they are all in a single file?

Answer 1

Score: 2

Update

You can fix your CSV file while reading it:

    import io
    import pandas as pd

    def read_csv(filename):
        with open(filename) as fp:
            buf = io.StringIO()
            header = fp.readline()
            num_col = header.count(',')
            buf.write(header)
            for row in fp:
                # Short rows are missing their leading fields: pad them with commas
                diff = num_col - row.count(',')
                buf.write(',' * diff + row.lstrip())
            buf.seek(0)
            # Forward-fill the empty name and gender cells from the row above
            return pd.read_csv(buf).ffill()

    df = read_csv('marks.csv')
    # Use the loop with groupby
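To see what the padding step produces before pandas parses it, here is a minimal standard-library sketch of the same idea. The helper name `pad_rows` and the sample rows are assumptions for illustration, and it assumes short rows only ever leave out the leading columns.

```python
def pad_rows(lines):
    """Left-pad rows that have fewer fields than the header with commas."""
    header = lines[0]
    num_sep = header.count(",")
    padded = [header]
    for row in lines[1:]:
        row = row.strip()
        if row:  # ignore blank lines
            padded.append("," * (num_sep - row.count(",")) + row)
    return padded

# Hypothetical sample in the question's layout: continuation rows
# leave out student_name and gender.
sample = [
    "student_name,gender,subject,marks",
    "Mark,M,english,90",
    "  biology,80",
    "  chemistry,77",
    "George,M,french,81",
]

padded = pad_rows(sample)
print("\n".join(padded))
```

Each short row gains two leading commas (for example `,,biology,80`), so every line has the same number of fields and the empty cells can then be forward-filled.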

IIUC, what you are looking for is groupby.

Suppose the following file marks.csv:

    student_name,gender,subject,marks
    Mark,M,english,90
    Mark,M,biology,80
    Mark,M,chemistry,77
    Mark,M,physics,89
    George,M,french,81
    George,M,biology,66
    George,M,chemistry,82
    George,M,physics,79
    George,M,economic,57
    Lisa,F,german,77
    Lisa,F,biology,89
    Lisa,F,chemistry,90
    Lisa,F,physics,92
    Lisa,F,economic,96

You can do:

    import pandas as pd

    df = pd.read_csv('marks.csv')
    # sort=False keeps the students in the order they appear in the file
    for name, subdf in df.groupby('student_name', sort=False):
        print(f"[{name}]")
        print(subdf, end='\n\n')
        # Do stuff here

Output:

    [Mark]
      student_name gender    subject  marks
    0         Mark      M    english     90
    1         Mark      M    biology     80
    2         Mark      M  chemistry     77
    3         Mark      M    physics     89

    [George]
      student_name gender    subject  marks
    4       George      M     french     81
    5       George      M    biology     66
    6       George      M  chemistry     82
    7       George      M    physics     79
    8       George      M   economic     57

    [Lisa]
       student_name gender    subject  marks
    9          Lisa      F     german     77
    10         Lisa      F    biology     89
    11         Lisa      F  chemistry     90
    12         Lisa      F    physics     92
    13         Lisa      F   economic     96
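As a rough sketch of what the per-student step could look like, the loop body can hand each sub-frame to a function. `make_report` and the in-memory sample data are hypothetical stand-ins for your own report script and for marks.csv.

```python
import io
import pandas as pd

# Small in-memory stand-in for marks.csv (illustrative data only).
CSV_TEXT = """student_name,gender,subject,marks
Mark,M,english,90
Mark,M,biology,80
George,M,french,81
George,M,biology,66
George,M,chemistry,82
"""

def make_report(name, subdf):
    """Hypothetical per-student step: summarize one student's rows."""
    return {
        "student": name,
        "subjects": subdf["subject"].tolist(),
        "average": float(subdf["marks"].mean()),
    }

df = pd.read_csv(io.StringIO(CSV_TEXT))
# sort=False keeps students in the order they first appear in the data.
reports = [make_report(name, subdf)
           for name, subdf in df.groupby("student_name", sort=False)]

for report in reports:
    print(report)
```

Each call only sees the rows of one student, so the existing report code can be reused once per group without knowing where a student starts or ends in the file.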

More on groupby: see "Group by: split-apply-combine" in the pandas user guide.
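If pandas is not an option, the "new name detected" logic from the question can also be written with the standard csv module. This is only a sketch and not part of the approach above: `iter_students` is a made-up name, and it assumes that a full-width row with a non-empty first cell starts a new student, while any other row carries just a subject and a mark for the current student.

```python
import csv
import io

def iter_students(lines):
    """Yield (name, gender, [(subject, marks), ...]) for each student."""
    reader = csv.reader(lines)
    width = len(next(reader))            # number of columns in the header
    current = None
    for row in reader:
        row = [cell.strip() for cell in row]
        if not any(row):
            continue                     # skip blank lines
        if len(row) == width and row[0]:
            # A new student name was detected: emit the previous student.
            if current is not None:
                yield current
            current = (row[0], row[1], [])
        if current is not None:
            # Subject and marks are always the last two cells of a row.
            current[2].append((row[-2], int(row[-1])))
    if current is not None:
        yield current

# Illustrative data in the question's layout.
SAMPLE = """student_name,gender,subject,marks
Mark,M,english,90
biology,80
chemistry,77
George,M,french,81
biology,66
"""

students = list(iter_students(io.StringIO(SAMPLE)))
for name, gender, subjects in students:
    print(f"[{name}] {subjects}")
```

Because the rule keys on the first cell being non-empty, the same generator also handles files where continuation rows keep their empty leading cells (for example `,,biology,80`).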

huangapple
  • Posted on March 31, 2023 at 17:25:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75896827.html