Extracting contents of a file as variables in Python

Question

I have a file like below in Linux.

file_name is batch_file.txt.

sub_directory is code_base/workflow_1

script_name is code_base/workflow_1/session_1.py

batch_file.txt contents are:

1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2

I want to read the contents of batch_file.txt in the session_1.py file and create variables based on the file_name and sub_directory. The variables would be:

batch_id = value before 1st #
workflow_id = value between 1st and 2nd #
workflow_name = value between 2nd and 3rd #
session_id = value between 3rd and 4th #
session_name = value between 4th and 5th #
run_date = value between 5th and 6th #
flow_name = value between 6th and 7th #
flow_id = value after 7th #

I have this:

batch_content = open('batch_file.txt', 'r')
batch_content.readlines()

But I am not sure how to proceed further?

Answer 1

Score: 1

If you want variables that are named at runtime, you can do that, but shouldn't.

Instead I would use a list of dictionaries.

# Read the whole file into one string.
with open('batch_file.txt') as f:
    text = f.read()

rows = [
  {'batch_id': x[0], 'workflow_id': x[1], 'workflow_name': x[2],
   'session_id': x[3], 'session_name': x[4], 'run_date': x[5],
   'flow_name': x[6], 'flow_id': x[7]}
  for line in text.splitlines()
  for x in (line.split('#'),)  # bind the split fields to x once per line
]

Result (the value of rows):

[
  {'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
  {'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
  {'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
]
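
If attribute access (record.workflow_name) feels closer to the "variables" the question asks for, a typing.NamedTuple is one alternative to plain dictionaries. This is only a minimal sketch: it assumes every non-empty line has exactly eight '#'-separated fields, the names BatchRecord and parse_batch are made up for illustration, and the sample text is inlined instead of being read from batch_file.txt.

```python
from typing import NamedTuple


class BatchRecord(NamedTuple):
    # Hypothetical record type; field order follows the list in the question.
    batch_id: str
    workflow_id: str
    workflow_name: str
    session_id: str
    session_name: str
    run_date: str
    flow_name: str
    flow_id: str


def parse_batch(text):
    """Return one BatchRecord per non-empty line of '#'-separated text."""
    return [
        BatchRecord(*line.split('#'))
        for line in text.splitlines()
        if line.strip()
    ]


# Inline copy of the sample data from the question.
sample = (
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n"
)

records = parse_batch(sample)
print(records[0].workflow_name, records[0].session_name)
```

A line with more or fewer than eight fields raises a TypeError here, which can be useful as an early warning about malformed input.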

Answer 2

Score: 1

Use the csv module to read the data into a dictionary (or optionally use pandas to read into a dataframe).

As an example:

import csv

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']

# newline='' is what the csv module documentation recommends for files it reads
with open('batch_file.txt', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=FIELDS, delimiter='#')
    for line in csv_reader:
        print(line)

Output:

{'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}

For each line read from the file, you get a dictionary with your "variable" names as the key and your file contents as the value. With this you can do whatever you wish.

For instance:

import csv

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']

with open('batch_file.txt', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=FIELDS, delimiter='#')
    for line in csv_reader:
        # each line is a dict keyed by the names in FIELDS
        print(f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}")

Output:

The workflow name of workflow id 1 is workflow_1
The workflow name of workflow id 2 is workflow_2
The workflow name of workflow id 3 is workflow_1_2
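
Note that DictReader returns every value as a string. If the IDs and the date are needed as typed values, a conversion step can follow the read. The sketch below assumes the four *_id fields are always integers and run_date is always in ISO format (YYYY-MM-DD), as in the sample; typed_rows and INT_FIELDS are illustrative names, and an in-memory io.StringIO stands in for the real file.

```python
import csv
import io
from datetime import date

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']
INT_FIELDS = ('batch_id', 'workflow_id', 'session_id', 'flow_id')  # assumed numeric


def typed_rows(lines):
    """Yield one dict per line, with IDs as int and run_date as datetime.date."""
    reader = csv.DictReader(lines, fieldnames=FIELDS, delimiter='#')
    for row in reader:
        for key in INT_FIELDS:
            row[key] = int(row[key])
        row['run_date'] = date.fromisoformat(row['run_date'])
        yield row


# Inline copy of the sample data; a real script would pass an open file instead.
sample = io.StringIO(
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n"
)

rows = list(typed_rows(sample))
print(rows[0]['workflow_id'] + 1, rows[0]['run_date'].year)
```

With the real file, the call would be typed_rows(csv_file) inside the same with block used above.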

If you need to work with this data further or transform it, pandas may be more appropriate.

A silly example:

import pandas as pd

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']

df = pd.read_csv('batch_file.txt', header=None, names=FIELDS, sep='#')

# Build one message per row (axis=1 passes each row to the lambda)
df['message'] = df.apply(lambda line: f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}", axis=1)

# In a Jupyter notebook, display(df['message']) also works
print(df['message'])

Output:

0     The workflow name of workflow id 1 is workflow_1
1     The workflow name of workflow id 2 is workflow_2
2    The workflow name of workflow id 3 is workflow...

Answer 3

Score: 1

You can use split to achieve your output

with open("batch_file.txt") as search:
    for line in search:
        fields = line.rstrip().split('#')  # remove '\n' at end of line, then split
        # Compare whole fields: a substring test would also match 'workflow_1_2'
        if len(fields) == 8 and fields[2] == 'workflow_1' and fields[4] == 'session_1':
            batch_id, workflow_id, workflow_name, session_id, session_name, run_date, flow_name, flow_id = fields
            break  # stop at the first matching line

print(batch_id)
print(workflow_id)
print(workflow_name)
print(session_id)
print(session_name)
print(run_date)
print(flow_name)
print(flow_id)
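
The question asks for the variables to depend on the script's location (sub_directory and script_name). One way to read that is to take the workflow name from the parent directory and the session name from the file name, then pick the matching row. This is a sketch under that interpretation only: find_row is a hypothetical helper, the path is passed in as a string, and the sample data is inlined instead of being read from batch_file.txt.

```python
import csv
import io
from pathlib import PurePosixPath

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']


def find_row(lines, script_path):
    """Return the first row matching the script's workflow and session, else None."""
    path = PurePosixPath(script_path)
    workflow_name = path.parent.name  # e.g. 'workflow_1'
    session_name = path.stem          # e.g. 'session_1'
    reader = csv.DictReader(lines, fieldnames=FIELDS, delimiter='#')
    for row in reader:
        if (row['workflow_name'] == workflow_name
                and row['session_name'] == session_name):
            return row
    return None


# Inline copy of the sample data from the question.
sample = io.StringIO(
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n"
)

row = find_row(sample, 'code_base/workflow_1/session_1.py')
print(row['workflow_id'], row['run_date'])
```

Inside session_1.py itself, the hard-coded path could be replaced by __file__, and the StringIO by the opened batch file. Returning None for "no match" keeps the caller in charge of deciding whether a missing row is an error.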

huangapple
  • Published on April 4, 2023, 05:39:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75923964.html