Extracting contents of a file as variables in Python


Question


I have a file like the one below on Linux.

file_name is batch_file.txt.

sub_directory is code_base/workflow_1

script_name is code_base/workflow_1/session_1.py

batch_file.txt contents are:

```text
1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2
```

I want to read the contents of batch_file.txt in the session_1.py file and create variables based on the file_name and sub_directory. The variables would be:

  1. batch_id = value before the 1st #
  2. workflow_id = value between the 1st and 2nd #
  3. workflow_name = value between the 2nd and 3rd #
  4. session_id = value between the 3rd and 4th #
  5. session_name = value between the 4th and 5th #
  6. run_date = value between the 5th and 6th #
  7. flow_name = value between the 6th and 7th #
  8. flow_id = value after the 7th #
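The workflow and session names that identify the right line do not have to be hard-coded; they can be read off the script's own path. A minimal sketch, assuming the layout `code_base/workflow_1/session_1.py` described above; `names_from_script_path` is a hypothetical helper name:

```python
from pathlib import Path


def names_from_script_path(script_path):
    """Return (workflow_name, session_name) for a path like code_base/workflow_1/session_1.py."""
    path = Path(script_path)
    # The parent directory name is the workflow; the file name without ".py" is the session.
    return path.parent.name, path.stem


workflow_name, session_name = names_from_script_path("code_base/workflow_1/session_1.py")
print(workflow_name, session_name)  # workflow_1 session_1
```

Inside session_1.py itself, passing `__file__` instead of the literal path should yield the same two names.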

I have this:

```python
with open('batch_file.txt', 'r') as batch_content:
    lines = batch_content.readlines()  # list with one string per line (newline characters included)
```

But I am not sure how to proceed from here.

Answer 1

Score: 1


If you want variables whose names are decided at runtime, you _can_ do that, but you shouldn't.

Instead I would use a list of dictionaries.

```python
with open('batch_file.txt') as f:
    text = f.read()

records = [
    {'batch_id': x[0], 'workflow_id': x[1], 'workflow_name': x[2],
     'session_id': x[3], 'session_name': x[4], 'run_date': x[5],
     'flow_name': x[6], 'flow_id': x[7]}
    for line in text.splitlines()
    for x in (line.split('#'),)  # binds the split fields to x once per line
]
```

Result:

```python
[
    {'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
    {'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
    {'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
]
```
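With the rows in a list of dictionaries, the row for the current script can be picked out by comparing whole fields. A minimal sketch: the sample text is copied from the question, each dictionary is built with `dict(zip(...))` instead of spelling out every key, and returning `None` when no row matches is an assumption about how a missing entry should be handled:

```python
# Sample contents of batch_file.txt, copied from the question.
text = """1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2"""

fields = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
          'session_name', 'run_date', 'flow_name', 'flow_id']

# zip() pairs each field name with the value at the same position in the line.
records = [dict(zip(fields, line.split('#'))) for line in text.splitlines()]

# Exact comparison, so 'workflow_1' does not also match 'workflow_1_2'.
current = next(
    (r for r in records
     if r['workflow_name'] == 'workflow_1' and r['session_name'] == 'session_1'),
    None,  # returned when no line matches
)
print(current['run_date'], current['flow_id'])  # 2023-04-02 2
```

Note that `zip()` stops at the shorter input, so a malformed line with too few fields would silently produce a dictionary with missing keys.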

Answer 2

Score: 1


Use the csv module to read each line into a dictionary (or, alternatively, use pandas to read the file into a DataFrame).

As an example:

```python
import csv
with open('batch_file.txt', mode='r') as csv_file:
    fieldnames = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
                  'session_name', 'run_date', 'flow_name', 'flow_id']
    csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter='#')
    for line in csv_reader:
        print(line)
```

Output:

```text
{'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
```

For each line read from the file, you get a dictionary with your "variable" names as the keys and the corresponding fields from that line as the values. From there you can use the values however you need.

For instance:

```python
import csv
with open('batch_file.txt', mode='r') as csv_file:
    fieldnames = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
                  'session_name', 'run_date', 'flow_name', 'flow_id']
    csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter='#')
    for line in csv_reader:
        print(f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}")
```

Output:

```text
The workflow name of workflow id 1 is workflow_1
The workflow name of workflow id 2 is workflow_2
The workflow name of workflow id 3 is workflow_1_2
```
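`csv.DictReader` returns every field as a string. If numeric IDs and a real date are wanted, each row can be converted once right after reading. A sketch, assuming the IDs are always integers and `run_date` is always in ISO `YYYY-MM-DD` form; the `BatchRecord` dataclass and `parse_rows` function are hypothetical names, and `io.StringIO` stands in for the opened file:

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

FIELDNAMES = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']


@dataclass
class BatchRecord:
    batch_id: int
    workflow_id: int
    workflow_name: str
    session_id: int
    session_name: str
    run_date: date
    flow_name: str
    flow_id: int


def parse_rows(csv_file):
    """Yield one BatchRecord per '#'-separated line, converting the IDs and the date."""
    reader = csv.DictReader(csv_file, fieldnames=FIELDNAMES, delimiter='#')
    for row in reader:
        yield BatchRecord(
            batch_id=int(row['batch_id']),
            workflow_id=int(row['workflow_id']),
            workflow_name=row['workflow_name'],
            session_id=int(row['session_id']),
            session_name=row['session_name'],
            run_date=date.fromisoformat(row['run_date']),
            flow_name=row['flow_name'],
            flow_id=int(row['flow_id']),
        )


# Stand-in for open('batch_file.txt', newline=''), using the question's data.
sample = io.StringIO(
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n"
)
records = list(parse_rows(sample))
print(records[2].workflow_name, records[2].run_date.year)  # workflow_1_2 2023
```

A malformed line (for example a non-numeric ID) raises `ValueError` here instead of slipping through as a string, which may or may not be the behaviour you want.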

If you need to work more with this data or transform it, pandas may be more appropriate.

A silly example:

```python
import pandas as pd
fieldnames = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']
df = pd.read_csv('batch_file.txt', header=None, names=fieldnames, sep='#')
df['message'] = df.apply(lambda line: f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}", axis=1)
# display() only exists in Jupyter/IPython; printing each entry also works in a plain script
for message in df['message']:
    print(message)
```

This adds a `message` column to the DataFrame and prints each of its entries:

```text
The workflow name of workflow id 1 is workflow_1
The workflow name of workflow id 2 is workflow_2
The workflow name of workflow id 3 is workflow_1_2
```

Answer 3

Score: 1


You can use `str.split` to get the output you want:

```python
with open("batch_file.txt") as search:
    for line in search:
        line = line.rstrip()  # remove '\n' at end of line
        if not line:
            continue  # skip blank lines, which would not unpack into 8 fields
        batch_id, workflow_id, workflow_name, session_id, session_name, run_date, flow_name, flow_id = line.split('#')
        # Compare whole fields: `'workflow_1' and 'session_1' in line` only tests 'session_1',
        # and a substring test would also match 'workflow_1_2'.
        if workflow_name == 'workflow_1' and session_name == 'session_1':
            print(batch_id)
            print(workflow_id)
            print(workflow_name)
            print(session_id)
            print(session_name)
            print(run_date)
            print(flow_name)
            print(flow_id)
```
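A condition written as `'workflow_1' and 'session_1' in line` does not check both substrings: `in` binds more tightly than `and`, and a non-empty string literal is truthy, so the expression reduces to `'session_1' in line`. A plain substring test can also match `workflow_1` inside `workflow_1_2`. A small demonstration; the first line is a hypothetical example, the second is copied from the question:

```python
# Hypothetical line: session_1 belongs to workflow_3, not workflow_1.
other = "1#4#workflow_3#4#session_1#2023-04-02#FDR#2"
# Third line from the question's batch_file.txt.
third = "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2"

# Parsed as 'workflow_1' and ('session_1' in other): only session_1 is checked.
chained = 'workflow_1' and 'session_1' in other
# Substring test matches 'workflow_1' inside 'workflow_1_2'.
substring = 'workflow_1' in third
# Comparing whole split fields avoids both problems.
fields = other.split('#')
exact = fields[2] == 'workflow_1' and fields[4] == 'session_1'

print(chained, substring, exact)  # True True False
```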

  • Posted by huangapple on April 4, 2023 at 05:39:26
  • If reposting, please keep the link to this article: https://go.coder-hub.com/75923964.html