pandas read_table with stopping strings to delimit different dataframes to assign

Question

I have a CSV file of the following form:

```text
LINE 1 to SKIP
LINE 2 to SKIP
2.13999987 0.139999986 -0.398405492 1
2.61999989 6.0000062E-2 0.450082362 1
2.74000001 5.99999428E-2 1.04403841 1
2.84000015 4.00000811E-2 6.17375337E-2 1
IGN IGN IGN IGN 
21.4200001 0.420000076 1.53572667 1
22.3199997 0.479999542 -0.595370948 1
23.3199997 0.520000458 0.136062101 1
24.3600006 0.519999504 -0.520044923 1
25.3999996 0.520000458 2.45230961 1
26.4399986 0.519999504 -2.08248448 1
27.4799995 0.520000458 -0.263438225 1
IGN IGN IGN IGN 
58.6800003 0.520000458 -0.789233088 1
59.7200012 0.520000458 -1.02961564 1
60.7600021 0.51999855 -0.889572859 1
61.7999992 0.520000458 -1.03346229 1
62.8400002 0.520000458 4.94940579E-2 1
```

I would like to read it with pandas along these lines:

```python
# Read the first block only: skip the two header lines, then take its 4 rows
df_first = pd.read_table('file.txt', names=names, delimiter=' ', skiprows=2, nrows=4)
```

(where names is the list of column names for file.txt).
I want to assign each run of rows to its own dataframe with a given name (perhaps taken from a list of names): the rows go into one dataframe until the IGN IGN IGN IGN line is reached, the following rows go into the next dataframe until the next IGN IGN IGN IGN line, and so on until the end of the file.

What is a good way to do that?
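The skiprows/nrows call above can be generalized: scan the file once for the separator lines, then compute skiprows and nrows for every block. This is a minimal sketch, assuming whitespace-separated columns, exactly two header lines and no blank lines between blocks; read_blocks, n_header and marker are hypothetical names, and io.StringIO stands in for file.txt so the example is self-contained.

```python
import io

import pandas as pd

# Sample file contents copied from the question
RAW = """LINE 1 to SKIP
LINE 2 to SKIP
2.13999987 0.139999986 -0.398405492 1
2.61999989 6.0000062E-2 0.450082362 1
2.74000001 5.99999428E-2 1.04403841 1
2.84000015 4.00000811E-2 6.17375337E-2 1
IGN IGN IGN IGN
21.4200001 0.420000076 1.53572667 1
22.3199997 0.479999542 -0.595370948 1
23.3199997 0.520000458 0.136062101 1
24.3600006 0.519999504 -0.520044923 1
25.3999996 0.520000458 2.45230961 1
26.4399986 0.519999504 -2.08248448 1
27.4799995 0.520000458 -0.263438225 1
IGN IGN IGN IGN
58.6800003 0.520000458 -0.789233088 1
59.7200012 0.520000458 -1.02961564 1
60.7600021 0.51999855 -0.889572859 1
61.7999992 0.520000458 -1.03346229 1
62.8400002 0.520000458 4.94940579E-2 1
"""


def read_blocks(text, names, n_header=2, marker="IGN"):
    """Read each marker-delimited block with its own skiprows/nrows call."""
    lines = text.splitlines()
    # 0-based line numbers of the separator lines ("IGN IGN IGN IGN")
    seps = [i for i, line in enumerate(lines) if line.split()[:1] == [marker]]
    # A block starts after the header or after a separator,
    # and ends at the next separator or at the end of the file
    starts = [n_header] + [i + 1 for i in seps]
    ends = seps + [len(lines)]
    frames = []
    for start, end in zip(starts, ends):
        if end > start:  # skip empty blocks (two separators in a row)
            frames.append(
                pd.read_table(
                    io.StringIO(text),
                    names=names,
                    sep=r"\s+",  # any run of whitespace is one delimiter
                    skiprows=start,
                    nrows=end - start,
                )
            )
    return frames


frames = read_blocks(RAW, names=["1", "2", "3", "4"])
```

Because the separator lines are never parsed, every column comes back numeric. Each read_table call starts parsing from the top of the input again, which is fine for small files; for a large file, a single pass over the data scales better.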

Answer 1

Score: 1

I ran into this problem a couple of years ago. My solution:

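```python
import pandas as pd

names = ['1', '2', '3', '4']
# Read the data: skip the two header lines; sep=r'\s+' splits on runs of
# whitespace and tolerates the trailing space on the IGN lines
df = pd.read_table('file.txt', names=names, sep=r'\s+', skiprows=2)
index = list(df.loc[df['1'] == 'IGN'].index)  # row labels where IGN occurs
df_list = []  # list that stores the individual dataframes
start = df.index.min()  # first row label of the current block
for end in index:  # loop through all separator rows
    # .loc slicing is inclusive, so stop one row before the separator;
    # the IGN rows turn every column into strings, hence astype(float)
    df_list.append(df.loc[start:end - 1].astype(float))
    start = end + 1
df_list.append(df.loc[start:].astype(float))  # the last slice of the main dataframe
```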
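The same split can also be done without an explicit index loop, using a common pandas idiom: mark the separator rows, take a cumulative sum so that every block gets its own number, and group by that number. This sketch assumes two header lines and whitespace-separated columns, with io.StringIO standing in for file.txt.

```python
import io

import pandas as pd

# Sample file contents copied from the question
RAW = """LINE 1 to SKIP
LINE 2 to SKIP
2.13999987 0.139999986 -0.398405492 1
2.61999989 6.0000062E-2 0.450082362 1
2.74000001 5.99999428E-2 1.04403841 1
2.84000015 4.00000811E-2 6.17375337E-2 1
IGN IGN IGN IGN
21.4200001 0.420000076 1.53572667 1
22.3199997 0.479999542 -0.595370948 1
23.3199997 0.520000458 0.136062101 1
24.3600006 0.519999504 -0.520044923 1
25.3999996 0.520000458 2.45230961 1
26.4399986 0.519999504 -2.08248448 1
27.4799995 0.520000458 -0.263438225 1
IGN IGN IGN IGN
58.6800003 0.520000458 -0.789233088 1
59.7200012 0.520000458 -1.02961564 1
60.7600021 0.51999855 -0.889572859 1
61.7999992 0.520000458 -1.03346229 1
62.8400002 0.520000458 4.94940579E-2 1
"""

names = ["1", "2", "3", "4"]
# Read the whole file at once; the IGN rows force every column to strings
df = pd.read_table(io.StringIO(RAW), names=names, sep=r"\s+", skiprows=2)

# True on separator rows; the running count gives each block its own id
is_sep = df["1"] == "IGN"
block_id = is_sep.cumsum()

# Drop the separator rows, group the rest by block id,
# convert the numeric strings to floats and renumber the rows from 0
df_list = [
    group.astype(float).reset_index(drop=True)
    for _, group in df[~is_sep].groupby(block_id[~is_sep])
]
```

One difference from the loop: two IGN lines in a row produce no group here, whereas the loop appends an empty dataframe for them.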

You can then access the individual dataframes like this:

```python
df_list[0]
df_list[1]
...
df_list[n]
```
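If each block should get a name, as the question asks, a dictionary keyed by those names is usually easier to work with than creating separate variables. This sketch uses a shortened copy of the question's sample (the first two rows of each block); block_names is a hypothetical list of names, and the split groups rows by a cumulative count of the separator rows.

```python
import io

import pandas as pd

# Shortened copy of the question's sample (two rows per block)
RAW = """LINE 1 to SKIP
LINE 2 to SKIP
2.13999987 0.139999986 -0.398405492 1
2.61999989 6.0000062E-2 0.450082362 1
IGN IGN IGN IGN
21.4200001 0.420000076 1.53572667 1
22.3199997 0.479999542 -0.595370948 1
IGN IGN IGN IGN
58.6800003 0.520000458 -0.789233088 1
59.7200012 0.520000458 -1.02961564 1
"""

names = ["1", "2", "3", "4"]
block_names = ["first", "second", "third"]  # hypothetical names, in file order

df = pd.read_table(io.StringIO(RAW), names=names, sep=r"\s+", skiprows=2)
is_sep = df["1"] == "IGN"
block_id = is_sep.cumsum()  # block number for every row
pieces = [
    g.astype(float).reset_index(drop=True)
    for _, g in df[~is_sep].groupby(block_id[~is_sep])
]

# Fail loudly if the number of names does not match the number of blocks
if len(pieces) != len(block_names):
    raise ValueError(f"expected {len(block_names)} blocks, found {len(pieces)}")

frames = dict(zip(block_names, pieces))
```

frames["second"] then returns the second block, and iterating over frames.items() visits the blocks in file order.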

Greetings

huangapple
  • Published on March 3, 2023 at 19:43:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75626660.html