Extracting the contents of a file as variables in Python
Question
I have a file like the one below on Linux.
file_name is batch_file.txt
sub_directory is code_base/workflow_1
script_name is code_base/workflow_1/session_1.py
The contents of batch_file.txt are:
1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2
I want to read the contents of batch_file.txt in the session_1.py file and create variables based on the file_name and sub_directory. The variables would be:
batch_id = value before the 1st #
workflow_id = value between the 1st and 2nd #
workflow_name = value between the 2nd and 3rd #
session_id = value between the 3rd and 4th #
session_name = value between the 4th and 5th #
run_date = value between the 5th and 6th #
flow_name = value between the 6th and 7th #
flow_id = value after the 7th #
I have this:
batch_content = open('batch_file.txt', 'r')
batch_content.readlines()
But I am not sure how to proceed further.
Answer 1
Score: 1
If you want variables that are named at runtime, you can do that, but you shouldn't.
Instead, I would use a list of dictionaries.
# Read the whole file into a string first
with open('batch_file.txt') as f:
    text = f.read()

records = [
    {'batch_id': x[0], 'workflow_id': x[1], 'workflow_name': x[2],
     'session_id': x[3], 'session_name': x[4], 'run_date': x[5],
     'flow_name': x[6], 'flow_id': x[7]}
    for line in text.splitlines()
    for x in (line.split('#'),)  # bind the split fields to x once per line
]
Result:
[
{'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
{'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
{'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
]
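The question also asks for the values that belong to the script being run. Here is a minimal sketch of one way to pick that record out of the list, assuming the workflow name is the script's parent directory and the session name is the file name without its extension (that mapping is my reading of the paths in the question, not something the question states). `parse_batch` and `find_record` are hypothetical helper names, and the sample text stands in for the file contents.

```python
from pathlib import Path

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']


def parse_batch(text):
    """Turn '#'-separated lines into a list of dictionaries."""
    return [dict(zip(FIELDS, line.split('#')))
            for line in text.splitlines() if line.strip()]


def find_record(records, script_path):
    """Hypothetical helper: return the record matching the script path, or None."""
    path = Path(script_path)
    workflow_name = path.parent.name  # assumed: directory name is the workflow name
    session_name = path.stem          # assumed: file name without '.py' is the session name
    for record in records:
        if (record['workflow_name'] == workflow_name
                and record['session_name'] == session_name):
            return record
    return None


# Sample data copied from the question, used here instead of reading the file
sample = """1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2"""

record = find_record(parse_batch(sample), 'code_base/workflow_1/session_1.py')
print(record['batch_id'], record['run_date'])
```

Inside session_1.py itself you could probably pass `__file__` as the script path, provided the script really lives under a directory named after the workflow.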
Answer 2
Score: 1
Use the csv module to read the data into a dictionary (or, optionally, use pandas to read it into a DataFrame).
As an example:
import csv
with open('batch_file.txt', mode='r', newline='') as csv_file:
    # The file has no header row, so the field names are supplied explicitly
    csv_reader = csv.DictReader(csv_file, fieldnames=['batch_id','workflow_id','workflow_name','session_id','session_name','run_date','flow_name','flow_id'], delimiter='#')
    for line in csv_reader:
        print(line)
Output:
{'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
For each line read from the file, you get a dictionary with your "variable" names as the keys and the file contents as the values. With this you can do whatever you wish.
For instance:
import csv
with open('batch_file.txt', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=['batch_id','workflow_id','workflow_name','session_id','session_name','run_date','flow_name','flow_id'], delimiter='#')
    for line in csv_reader:
        # Look up values by field name instead of by position
        print(f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}")
Output:
The workflow name of workflow id 1 is workflow_1
The workflow name of workflow id 2 is workflow_2
The workflow name of workflow id 3 is workflow_1_2
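Note that csv.DictReader gives you every value as a string. If you want the IDs as integers and run_date as a date, a small conversion step is enough. The following is only a sketch: `typed` is a hypothetical helper, the choice of which fields become `int` is my assumption from the sample data, and an `io.StringIO` stands in for the opened file.

```python
import csv
import io
from datetime import date

FIELDS = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']


def typed(row):
    """Hypothetical helper: convert ID fields to int and run_date to a date."""
    converted = dict(row)
    for key in ('batch_id', 'workflow_id', 'session_id', 'flow_id'):
        converted[key] = int(row[key])
    # run_date is assumed to be in ISO format (YYYY-MM-DD), as in the sample
    converted['run_date'] = date.fromisoformat(row['run_date'])
    return converted


# In-memory stand-in for open('batch_file.txt', newline='')
sample = io.StringIO(
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
)

reader = csv.DictReader(sample, fieldnames=FIELDS, delimiter='#')
rows = [typed(row) for row in reader]
print(rows[0]['workflow_id'] + 1, rows[0]['run_date'].year)
```

With typed values you can do arithmetic on the IDs or compare dates directly, which is awkward while everything is still a string.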
If you need to work more with this data or perform any transformation on it, pandas may be more appropriate.
A silly example:
import pandas as pd
df = pd.read_csv('batch_file.txt', header=None, names=['batch_id','workflow_id','workflow_name','session_id','session_name','run_date','flow_name','flow_id'], sep='#')
# Build a message column from two existing columns, one row at a time
df['message'] = df.apply(lambda line: f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}", axis=1)
# display() is available in IPython/Jupyter; use print() in a plain script
display(df['message'])
Output:
0 The workflow name of workflow id 1 is workflow_1
1 The workflow name of workflow id 2 is workflow_2
2 The workflow name of workflow id 3 is workflow...
Answer 3
Score: 1
You can use split to achieve your output:
with open("batch_file.txt") as search:
    for line in search:
        line = line.rstrip()  # remove '\n' at end of line
        # Each substring needs its own `in` test; `'a' and 'b' in line` only checks 'b'
        if 'workflow_1' in line and 'session_1' in line:
            batch_id, workflow_id, workflow_name, session_id, session_name, run_date, flow_name, flow_id = line.split('#')
            print(batch_id)
            print(workflow_id)
            print(workflow_name)
            print(session_id)
            print(session_name)
            print(run_date)
            print(flow_name)
            print(flow_id)
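One caveat with a substring test such as `'workflow_1' in line`: it also matches the row for workflow_1_2. A possible variation, sketched below, splits first and compares the name fields exactly. `matching_lines` is a hypothetical helper, the field positions (index 2 and 4) follow the layout given in the question, and the list of strings stands in for the open file.

```python
def matching_lines(lines, workflow_name, session_name):
    """Yield the split fields of lines whose name fields match exactly."""
    for line in lines:
        fields = line.rstrip('\n').split('#')
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        # fields[2] is workflow_name, fields[4] is session_name
        if fields[2] == workflow_name and fields[4] == session_name:
            yield fields


# Sample data copied from the question, used here instead of the open file
sample = [
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n",
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n",
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n",
]

# Exact comparison finds no row for workflow_1 / session_2 ...
exact = list(matching_lines(sample, 'workflow_1', 'session_2'))
# ... while the substring test wrongly picks up the workflow_1_2 row
substring = [l for l in sample if 'workflow_1' in l and 'session_2' in l]
print(len(exact), len(substring))

# Unpacking works the same way as before once a row is matched
for fields in matching_lines(sample, 'workflow_1', 'session_1'):
    (batch_id, workflow_id, workflow_name, session_id,
     session_name, run_date, flow_name, flow_id) = fields
    print(batch_id, workflow_name, session_name, run_date)
```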