April 4, 2023 05:39:26

Extracting contents of a file as variables in Python

Question

I have a file like below in Linux.

file_name is batch_file.txt.

sub_directory is code_base/workflow_1

script_name is code_base/workflow_1/session_1.py

batch_file.txt contents are:

1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2

I want to read the contents of batch_file.txt in the session_1.py file and create variables based on the file_name and sub_directory. The variables would be:

batch_id = value before the 1st #
workflow_id = value between the 1st and 2nd #
workflow_name = value between the 2nd and 3rd #
session_id = value between the 3rd and 4th #
session_name = value between the 4th and 5th #
run_date = value between the 5th and 6th #
flow_name = value between the 6th and 7th #
flow_id = value after the 7th #

I have this:

batch_content = open('batch_file.txt', 'r')
batch_content.readlines()

But I am not sure how to proceed from here.

Answer 1

Score: 1

If you want variables that are named at runtime, you can do that, but shouldn't.

Instead I would use a list of dictionaries.

with open('batch_file.txt') as f:
    text = f.read()

records = [
  {'batch_id': x[0], 'workflow_id': x[1], 'workflow_name': x[2],
   'session_id': x[3], 'session_name': x[4], 'run_date': x[5],
   'flow_name': x[6], 'flow_id': x[7]}
  for line in text.splitlines()
  for x in (line.split('#'),)  # bind the split fields to x once per line
]

Result:

[
  {'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
  {'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'},
  {'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
]
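
If attribute-style access (record.workflow_name) is preferred over dictionary keys, the same split can feed a typing.NamedTuple. This is only a sketch: the BatchRecord class and the parse_batch helper are hypothetical names introduced here, and the field order is the one given in the question.

```python
from typing import List, NamedTuple


class BatchRecord(NamedTuple):
    """One '#'-separated line of batch_file.txt (hypothetical helper type)."""
    batch_id: str
    workflow_id: str
    workflow_name: str
    session_id: str
    session_name: str
    run_date: str
    flow_name: str
    flow_id: str


def parse_batch(text: str) -> List[BatchRecord]:
    """Split each non-empty line on '#' and wrap the eight fields in a BatchRecord."""
    return [BatchRecord(*line.split('#')) for line in text.splitlines() if line.strip()]


# Sample content copied from the question; in the real script it would be read from the file.
sample = """1#1#workflow_1#1#session_1#2023-04-02#FDR#2
1#2#workflow_2#2#session_2#2023-04-02#FDR#2
1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2"""

records = parse_batch(sample)
print(records[0].workflow_name, records[0].session_name)
```

A line with more or fewer than eight fields would raise a TypeError here, which may or may not be the behaviour you want for malformed input.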

Answer 2

Score: 1

Use the csv module to read the data into a dictionary (or optionally use pandas to read into a dataframe).

As an example:

import csv

fieldnames = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']
with open('batch_file.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter='#')
    for line in csv_reader:
        print(line)

{'batch_id': '1', 'workflow_id': '1', 'workflow_name': 'workflow_1', 'session_id': '1', 'session_name': 'session_1', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '2', 'workflow_name': 'workflow_2', 'session_id': '2', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}
{'batch_id': '1', 'workflow_id': '3', 'workflow_name': 'workflow_1_2', 'session_id': '3', 'session_name': 'session_2', 'run_date': '2023-04-02', 'flow_name': 'FDR', 'flow_id': '2'}

For each line read from the file, you get a dictionary with your "variable" names as the key and your file contents as the value. With this you can do whatever you wish.

For instance:

import csv

fieldnames = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']
with open('batch_file.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter='#')
    for line in csv_reader:
        print(f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}")

The workflow name of workflow id 1 is workflow_1
The workflow name of workflow id 2 is workflow_2
The workflow name of workflow id 3 is workflow_1_2
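
Note that csv.DictReader returns every value as a string. If the ids are needed as numbers and run_date as a date, the rows can be converted while reading. The following is a rough sketch rather than part of the original answer: read_typed_rows and INT_FIELDS are hypothetical names, and it assumes the four id columns always hold integers and run_date is always in YYYY-MM-DD form.

```python
import csv
import io
from datetime import date

FIELDNAMES = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']
INT_FIELDS = ('batch_id', 'workflow_id', 'session_id', 'flow_id')  # assumed to be integers


def read_typed_rows(handle):
    """Yield one dict per line, with ids as int and run_date as datetime.date."""
    for row in csv.DictReader(handle, fieldnames=FIELDNAMES, delimiter='#'):
        for name in INT_FIELDS:
            row[name] = int(row[name])
        row['run_date'] = date.fromisoformat(row['run_date'])
        yield row


# io.StringIO stands in for open('batch_file.txt', newline='') so the sketch runs on its own.
sample = io.StringIO(
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n"
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n"
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n"
)
rows = list(read_typed_rows(sample))
print(rows[0]['workflow_id'] + 1, rows[0]['run_date'].year)
```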

If you need to work more with this data or transform it, pandas may be more appropriate.

A silly example:

import pandas as pd

names = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
         'session_name', 'run_date', 'flow_name', 'flow_id']
df = pd.read_csv('batch_file.txt', header=None, names=names, sep='#')
df['message'] = df.apply(lambda line: f"The workflow name of workflow id {line['workflow_id']} is {line['workflow_name']}", axis=1)
display(df['message'])  # display() is provided by Jupyter/IPython; use print() in a plain script

0     The workflow name of workflow id 1 is workflow_1
1     The workflow name of workflow id 2 is workflow_2
2    The workflow name of workflow id 3 is workflow...

Answer 3

Score: 1

You can use split to get the values you need:

with open("batch_file.txt") as search:
    for line in search:
        line = line.rstrip()  # remove '\n' at end of line
        fields = line.split('#')
        # match on the exact workflow_name and session_name fields
        if fields[2] == 'workflow_1' and fields[4] == 'session_1':
            batch_id, workflow_id, workflow_name, session_id, session_name, run_date, flow_name, flow_id = fields
            break
print(batch_id)
print(workflow_id)
print(workflow_name)
print(session_id)
print(session_name)
print(run_date)
print(flow_name)
print(flow_id)
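
The question asks for the values to be chosen based on the script's file name and sub-directory, so the two names need not be hard-coded. Below is a minimal sketch using pathlib; it assumes the directory name (workflow_1) equals workflow_name and the script name without .py (session_1) equals session_name, which the question suggests but does not state outright. The find_record helper is a hypothetical name.

```python
from pathlib import Path

FIELDNAMES = ['batch_id', 'workflow_id', 'workflow_name', 'session_id',
              'session_name', 'run_date', 'flow_name', 'flow_id']


def find_record(lines, script_path):
    """Return the row whose workflow_name/session_name match the script's location."""
    path = Path(script_path)
    workflow_name = path.parent.name  # 'workflow_1' from code_base/workflow_1
    session_name = path.stem          # 'session_1' from session_1.py
    for line in lines:
        row = dict(zip(FIELDNAMES, line.strip().split('#')))
        if row.get('workflow_name') == workflow_name and row.get('session_name') == session_name:
            return row
    return None  # no line matched


# Sample lines copied from the question; an open file object can be passed instead.
sample_lines = [
    "1#1#workflow_1#1#session_1#2023-04-02#FDR#2\n",
    "1#2#workflow_2#2#session_2#2023-04-02#FDR#2\n",
    "1#3#workflow_1_2#3#session_2#2023-04-02#FDR#2\n",
]

# Inside session_1.py, Path(__file__) could be passed instead of this literal path.
record = find_record(sample_lines, 'code_base/workflow_1/session_1.py')
print(record)
```

Returning None when nothing matches lets the caller decide how to handle a missing entry, instead of failing later with a NameError on an unassigned variable.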
