How to fetch data and store it in multiple files based on a condition

Question

test.csv

name,age,n1,n2,n3
a,21,1,2,3
b,22,4,9,0
c,25,4,5,6
d,25,41,5,6
e,25,4,66,6
f,25,4,5,66
g,25,4,55,6
h,25,4,5,56
i,25,41,5,61
j,25,4,51,60
k,20,40,50,60
l,21,40,51,60

My code so far, which reads the CSV and stores each row in a dict:

import pandas as pd

input_file = pd.read_csv("test.csv")
for i in range(0, len(input_file['name'])):   
    dict1 = {}
    dict1["name"] = str(input_file['name'][i])
    dict1["age"] = str(input_file['age'][i])
    dict1["n1"] = str(input_file['n1'][i])
    dict1["n2"] = str(input_file['n2'][i])
    dict1["n3"] = str(input_file['n3'][i]) 

I want to write the output to multiple files, one file for every 5 rows of data. I need to do this with the writelines function in Python, because I have a lot of per-row processing to do before writing. The file names should be generated dynamically, and the input is dynamic too (meaning more rows can come).

Example of the expected output (here the file name must be dynamic):

out_file = open('File1.xml', 'w')
out_file.writelines(I will process with dictionary data row by row)
out_file.writelines("\n")

File1

a,21,1,2,3
b,22,4,9,0
c,25,4,5,6
d,25,41,5,6
e,25,4,66,6

File2

f,25,4,5,66
g,25,4,55,6
h,25,4,5,56
i,25,41,5,61
j,25,4,51,60

File3

k,20,40,50,60
l,21,40,51,60

Answer 1

Score: 2

If the DataFrame has the default RangeIndex, you can loop over a groupby whose key is the index integer-divided by the group size N (the number of rows per file):

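import numpy as np
import pandas as pd

input_file = pd.read_csv("test.csv")

N = 5
for name, g in input_file.groupby(input_file.index // N):
    # index=False drops the row labels; header=False drops the column names
    g.to_csv(f'file_{name}.csv', index=False, header=False)

For any index (not only the default RangeIndex), group by the row position instead:

N = 5
for name, g in input_file.groupby(np.arange(len(input_file)) // N):
    g.to_csv(f'file_{name}.csv', index=False, header=False)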
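
To see why integer division produces the groups, here is a small illustration in plain Python (no pandas), assuming 12 data rows as in test.csv and N = 5. The names `n_rows`, `keys` and `n_files` are only for illustration.

```python
import math

N = 5          # rows per file
n_rows = 12    # number of data rows in test.csv

# Row position // N gives the same key for each run of N consecutive rows.
keys = [pos // N for pos in range(n_rows)]
print(keys)    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2]

# Number of output files needed, including a shorter last one.
n_files = math.ceil(n_rows / N)
print(n_files)  # 3
```

Each distinct key becomes one group, so the last group simply holds whatever rows are left over.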

EDIT: If you really need to write line by line, use:

N = 5
for name, g in input_file.groupby(input_file.index // N): 
    with open(f'File{name+1}.xml', 'w') as out_file:
        for data in g.to_numpy():
            out_file.write(','.join(str(x) for x in data))
            out_file.write('\n')
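
Since the question asks for a dict per row and for `writelines`, the two can be combined. The following is a minimal sketch and not part of the original answer: the helper name `write_in_chunks`, the `format_row` callback and the `out_dir` parameter are assumptions made for illustration. It accepts any iterable of row dicts, so it needs only the standard library.

```python
import os
from itertools import islice


def format_row(row):
    """Hypothetical per-row processing: join the values with commas."""
    return ",".join(str(v) for v in row.values()) + "\n"


def write_in_chunks(rows, n, out_dir=".", format_row=format_row):
    """Write rows (an iterable of dicts) to File1.xml, File2.xml, ... with n rows each."""
    rows = iter(rows)
    paths = []
    file_no = 1
    while True:
        chunk = list(islice(rows, n))   # next n rows, fewer for the last file
        if not chunk:
            break
        path = os.path.join(out_dir, f"File{file_no}.xml")
        with open(path, "w") as out_file:
            # writelines takes an iterable of strings and adds no newlines itself
            out_file.writelines(format_row(row) for row in chunk)
        paths.append(path)
        file_no += 1
    return paths
```

With pandas, the rows could be supplied as `input_file.to_dict('records')`; without pandas, `csv.DictReader` yields row dicts directly. Replacing `format_row` is where any extra per-row processing would go.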

huangapple
  • Posted on February 27, 2023 at 14:30:44
  • Please keep this link when reposting: https://go.coder-hub.com/75577350.html