Iterate over specific csv rows and rerun code when new string detected after empty cell in first column

Question

I have written a script that creates a report for a student.
I want logic that iterates over the students in the file below and re-runs my script whenever a new student name is detected.

student_name,gender,subject,marks
Mark,M,english,90
         biology,80
         chemistry,77
         physics,89 
George,M,french,81
         biology,66
         chemistry,82
         physics,79
         economic,57
Lisa,F,german,77
         biology,89
         chemistry,90
         physics,92
         economic,96

For example:
For Mark, the script should run through the 5th row, i.e. his last subject.
For George, the script should re-run through the 10th row, i.e. his last subject.
And so on for many students.

How can I achieve this and print each student's name?

I have already done this when each student's details are in a separate CSV file in a directory, but how can I do it when all the students are in a single file?

Answer 1

Score: 2

Update

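You can repair your CSV file as you read it: pad the short rows with empty cells, then forward-fill the missing values:

import io

import pandas as pd

def read_csv(filename):
    with open(filename) as fp:
        buf = io.StringIO()
        header = fp.readline()
        # Number of separators in a complete row
        num_col = header.count(',')
        buf.write(header)
        for row in fp:
            # Prepend one empty cell per missing leading column
            diff = num_col - row.count(',')
            buf.write(',' * diff + row.lstrip())
        buf.seek(0)
    # Empty cells are read as NaN; ffill copies name and gender down
    return pd.read_csv(buf).ffill()

df = read_csv('marks.csv')
# Then use the groupby loop shown further down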
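To see what the padding and forward-fill do, here is a minimal sketch that applies the same idea to an in-memory sample instead of a file. The helper name pad_ragged_csv and the shortened sample are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Sample in the question's layout: continuation rows omit name and gender.
RAGGED = """\
student_name,gender,subject,marks
Mark,M,english,90
         biology,80
         chemistry,77
George,M,french,81
         biology,66
"""


def pad_ragged_csv(text):
    """Left-pad short rows with empty cells, then forward-fill them."""
    lines = text.splitlines()
    num_sep = lines[0].count(",")  # separators in a complete row
    padded = [lines[0]]
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        missing = num_sep - line.count(",")
        padded.append("," * missing + line.strip())
    df = pd.read_csv(io.StringIO("\n".join(padded)))
    # The padded cells were read as NaN; copy the last seen value down.
    return df.ffill()


df = pad_ragged_csv(RAGGED)
print(df)
```

After this step every row carries its own student_name and gender, so the frame has the same shape as the regular marks.csv used in the rest of the answer.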

IIUC, what you are looking for is groupby.

Suppose you have the following file, marks.csv:

student_name,gender,subject,marks
Mark,M,english,90
Mark,M,biology,80
Mark,M,chemistry,77
Mark,M,physics,89
George,M,french,81
George,M,biology,66
George,M,chemistry,82
George,M,physics,79
George,M,economic,57
Lisa,F,german,77
Lisa,F,biology,89
Lisa,F,chemistry,90
Lisa,F,physics,92
Lisa,F,economic,96

You can do:

import pandas as pd

df = pd.read_csv('marks.csv')
for name, subdf in df.groupby('student_name', sort=False):
    print(f"[{name}]")
    print(subdf, end='\n\n')
    # Do stuff here

Output:

[Mark]
  student_name gender    subject  marks
0         Mark      M    english     90
1         Mark      M    biology     80
2         Mark      M  chemistry     77
3         Mark      M    physics     89

[George]
  student_name gender    subject  marks
4       George      M     french     81
5       George      M    biology     66
6       George      M  chemistry     82
7       George      M    physics     79
8       George      M   economic     57

[Lisa]
   student_name gender    subject  marks
9          Lisa      F     german     77
10         Lisa      F    biology     89
11         Lisa      F  chemistry     90
12         Lisa      F    physics     92
13         Lisa      F   economic     96
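As a rough idea of what the "Do stuff here" step could look like, the sketch below calls a report function once per student. The make_report function and the fields it returns are hypothetical stand-ins for your own script, and the data is a shortened in-memory copy of marks.csv.

```python
import io

import pandas as pd

CSV_TEXT = """\
student_name,gender,subject,marks
Mark,M,english,90
Mark,M,biology,80
Mark,M,chemistry,77
Mark,M,physics,89
George,M,french,81
George,M,biology,66
"""


def make_report(name, subdf):
    """Hypothetical report step: summarise one student's rows."""
    # idxmax returns the index label of the row with the highest mark.
    best = subdf.loc[subdf["marks"].idxmax()]
    return {
        "student": name,
        "subjects": len(subdf),
        "average": float(subdf["marks"].mean()),
        "best_subject": best["subject"],
    }


df = pd.read_csv(io.StringIO(CSV_TEXT))
# sort=False keeps the students in the order they appear in the file.
reports = [make_report(name, subdf)
           for name, subdf in df.groupby("student_name", sort=False)]
for report in reports:
    print(report)
```

Each subdf contains only one student's rows, so the function never needs to know where one student ends and the next begins.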

More on groupby: see "Group by: split-apply-combine" in the pandas user guide.
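If you would rather not depend on pandas, the same "new name starts a new student" logic can be sketched with the standard csv module. This is an alternative to the groupby approach, not part of the original answer. It assumes the question's layout, where a full row has four cells and a continuation row has only subject and marks, and that the first data row is a full one. The name iter_students is made up for the example.

```python
import csv
import io

RAGGED = """\
student_name,gender,subject,marks
Mark,M,english,90
         biology,80
George,M,french,81
         biology,66
         chemistry,82
"""


def iter_students(lines):
    """Yield (name, gender, [(subject, marks), ...]) per student."""
    reader = csv.reader(lines)
    header = next(reader)
    width = len(header)
    current = None
    for raw in reader:
        row = [cell.strip() for cell in raw]
        if not any(row):
            continue  # skip blank lines
        if len(row) == width:
            # A full row carries a name, so a new student starts here.
            if current is not None:
                yield current
            name, gender, subject, marks = row
            current = (name, gender, [(subject, int(marks))])
        else:
            # Short row: subject and marks for the current student.
            subject, marks = row[-2:]
            current[2].append((subject, int(marks)))
    if current is not None:
        yield current


students = list(iter_students(io.StringIO(RAGGED)))
for name, gender, subjects in students:
    print(name, gender, subjects)
```

To read from disk instead, pass an open file object (opened with newline='') in place of the StringIO.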

huangapple
  • Posted on March 31, 2023 at 17:25:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75896827.html