Iterate over specific CSV rows and re-run code when a new string is detected after empty cells in the first column
Question
I have written a script that creates a report for a student.
I want logic that iterates over the students in a CSV file and re-runs my script each time a new student name is detected.
student_name,gender,subject,marks
Mark,M,english,90
biology,80
chemistry,77
physics,89
George,M,french,81
biology,66
chemistry,82
physics,79
economic,57
Lisa,F,german,77
biology,89
chemistry,90
physics,92
economic,96
For example:
For Mark, the script should run through the 5th row, i.e. his last subject.
For George, the script should re-run through the 10th row, i.e. his last subject.
And so on for many students.
How can I achieve this and print each student's name?
I have already done this when each student's details are in a separate CSV file in a directory, but how can I do it when they are all in a single file?
Answer 1
Score: 2
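Update

If the continuation rows are missing the leading columns, you can repair the CSV file while loading it:

import io
import pandas as pd

def read_csv(filename):
    with open(filename) as fp:
        buf = io.StringIO()
        header = fp.readline()
        num_col = header.count(',')  # separators expected per row
        buf.write(header)
        for row in fp:
            if not row.strip():
                continue  # skip blank lines
            # Prepend one empty field for each missing column
            diff = num_col - row.count(',')
            buf.write(',' * diff + row.lstrip())
    buf.seek(0)
    # Forward-fill the blanks from the last full row above
    return pd.read_csv(buf).ffill()

df = read_csv('marks.csv')
# Then use the groupby loop shown below on df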
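
To make the effect of the repair visible, here is a minimal self-contained sketch that applies the same idea to an in-memory copy of a few rows. It assumes the continuation rows are indented with whitespace and simply lack the first two fields; `RAW` and `repair` are hypothetical names used only for this illustration.

```python
import io
import pandas as pd

# Assumed shape of the ragged input: continuation rows carry only subject,marks
RAW = """student_name,gender,subject,marks
Mark,M,english,90
    biology,80
    chemistry,77
George,M,french,81
    biology,66
"""

def repair(text):
    """Pad short rows with leading empty fields, then forward-fill them."""
    lines = text.splitlines(keepends=True)
    header = lines[0]
    num_col = header.count(',')
    buf = io.StringIO()
    buf.write(header)
    for row in lines[1:]:
        if not row.strip():
            continue  # ignore blank lines
        diff = num_col - row.count(',')
        buf.write(',' * diff + row.lstrip())
    buf.seek(0)
    return pd.read_csv(buf).ffill()

df = repair(RAW)
print(df)
```

After the repair, every row carries its own student_name and gender, so the frame can be grouped by student.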
If I understand correctly, what you are looking for is groupby.
Suppose you have the following file, marks.csv:
student_name,gender,subject,marks
Mark,M,english,90
Mark,M,biology,80
Mark,M,chemistry,77
Mark,M,physics,89
George,M,french,81
George,M,biology,66
George,M,chemistry,82
George,M,physics,79
George,M,economic,57
Lisa,F,german,77
Lisa,F,biology,89
Lisa,F,chemistry,90
Lisa,F,physics,92
Lisa,F,economic,96
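You can then loop over each student's rows:

import pandas as pd

df = pd.read_csv('marks.csv')
# sort=False keeps the students in the order they appear in the file
for name, subdf in df.groupby('student_name', sort=False):
    print(f"[{name}]")
    print(subdf, end='\n\n')
    # Do stuff here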
Output:
[Mark]
  student_name gender    subject  marks
0         Mark      M    english     90
1         Mark      M    biology     80
2         Mark      M  chemistry     77
3         Mark      M    physics     89

[George]
  student_name gender    subject  marks
4       George      M     french     81
5       George      M    biology     66
6       George      M  chemistry     82
7       George      M    physics     79
8       George      M   economic     57

[Lisa]
   student_name gender    subject  marks
9          Lisa      F     german     77
10         Lisa      F    biology     89
11         Lisa      F  chemistry     90
12         Lisa      F    physics     92
13         Lisa      F   economic     96
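
As a rough idea of what "do stuff here" could look like, the sketch below calls a per-student function once for each group. `make_report` is a hypothetical stand-in for your existing report script, and the inline `CSV_TEXT` is shortened sample data, not your real file.

```python
import io
import pandas as pd

# Shortened sample data in the already-filled layout
CSV_TEXT = """student_name,gender,subject,marks
Mark,M,english,90
Mark,M,biology,80
George,M,french,81
George,M,economic,57
"""

def make_report(name, rows):
    """Hypothetical per-student report: returns a one-line summary."""
    best = rows.loc[rows['marks'].idxmax()]  # row with the highest mark
    return (f"{name}: {len(rows)} subjects, "
            f"average {rows['marks'].mean():.1f}, "
            f"best {best['subject']} ({best['marks']})")

df = pd.read_csv(io.StringIO(CSV_TEXT))
# One call per student, in file order
reports = [make_report(name, subdf)
           for name, subdf in df.groupby('student_name', sort=False)]
for line in reports:
    print(line)
```

Each call receives only that student's rows, so the report code never needs to know where one student ends and the next begins.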
More on groupby: see "Group by: split-apply-combine" in the pandas user guide.
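
If pandas is not available, the same "start over when a new name appears" logic can be sketched with the standard library alone. This is an alternative to the groupby approach, not part of it. It assumes every row ends with subject,marks and that a new student starts on any row with a non-empty first field; `iter_students` and `RAW` are hypothetical names.

```python
import csv
import io

# Assumed ragged input: continuation rows carry only subject,marks
RAW = """student_name,gender,subject,marks
Mark,M,english,90
    biology,80
George,M,french,81
    biology,66
    economic,57
"""

def iter_students(lines):
    """Yield (name, gender, [(subject, marks), ...]) once per student."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    current = None
    for row in reader:
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip blank lines
        *lead, subject, marks = cells
        if len(lead) == 2 and lead[0]:
            # A non-empty name marks the start of a new student
            if current is not None:
                yield current
            current = (lead[0], lead[1], [])
        # Assumes the first data row is always a full row
        current[2].append((subject, int(marks)))
    if current is not None:
        yield current

students = list(iter_students(io.StringIO(RAW)))
for name, gender, subjects in students:
    print(f"[{name}] {len(subjects)} subjects")
```

The generator works both for short rows and for rows whose first two cells are present but empty, since it only looks at whether the name field is filled.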