June 5, 2023 17:05:01

Extracting columns and skipping certain rows in a file for data processing

Question

import re

# Tags whose rows are dropped until the next [XXXYY] {TAG} header
SKIP_TAGS = ('GND', 'POS_3V3')

# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()

# Process the data and extract the required information
result = []
component_name = ""
extract = False
for line in content:
    line = line.strip()
    header = re.match(r'\[\d+\]\s+(\S+)', line)
    if header:
        # New [XXXYY] {TAG} header: remember the tag and decide whether to keep its rows
        component_name = header.group(1)
        extract = component_name not in SKIP_TAGS
    elif extract and line.startswith('J'):
        column_data = line.split()
        if len(column_data) >= 3:
            # Column 1, column 2 and the first 3 characters of column 3
            result.append("%s\t%s %s %s" % (component_name, column_data[0], column_data[1], column_data[2][:3]))

for item in result:
    print(item)

This modified test.py code should produce the expected output:

DEBUG_SCAR_RX	J1 B30 PIO
DEBUG_SCAR_TX	J1 B29 PIO
DYOR_DAT_0	J2 B12 APB

It extracts the required columns for each [XXXYY] {TAG} header and drops the rows that follow [00033] GND and [00272] POS_3V3 until the next [XXXYY] {TAG} header. It avoids f-strings so that it also runs on Python 2.7.5.

I am trying to process input.txt with the test.py script to extract specific information, as shown in the expected output. I have a basic stub, but the regex is not extracting the column details I expect. The expected output is shown below for reference.

In general, I am looking for a [XXXYY] {TAG} pattern. Once I find that pattern, if the first column of a following row starts with J, I want to extract column 1, column 2, and the first 3 characters of column 3. I would also like to know how to skip the lines after [00033] GND (and [00272] POS_3V3) until the next [XXXYY] {TAG} pattern. I am restricted to Python 2.7.5 with the re and csv libraries and cannot use pandas.

input.txt

<<< Test List >>>
Mounting Hole                   MH1            APBC_MH_3.2x7cm
Mounting Hole                   MH2            APBC_MH_3.2x7cm
Mounting Hole                   MH3            APBC_MH_3.2x7cm
Mounting Hole                   MH4            APBC_MH_3.2x7cm
[00001] DEBUG_SCAR_RX
        J1         B30     PIO37          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        R2         2       2              PASSIVE     4.7kR
[00002] DEBUG_SCAR_TX
        J1         B29     PIO36          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
[00003] DYOR_DAT_0
        J2         B12     APB10_CC_P     PASSIVE     TRA6-70-01.7-R-4-7-F-UG
[00033] GND
        DP1        5       5              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        6       6              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        7       7              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
[00271] POS_3.3V_INH
        Q2         3       DRAIN          PASSIVE     2N7002
        R34        2       2              PASSIVE     4.7kR
[00272] POS_3V3
        J1         B13     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J1         B14     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J2         B59     FETO_HDB       PASSIVE     TRA6-70-01.7-R-4-7-F-UG

test.py

import re
# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()
# Process the data and extract the required information
result = []
component_name = ""
for line in content:
    line = line.strip()
    if line.startswith("["):
        s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
    elif line.startswith("J"):
        sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
        print("%s\t%s" % (s, sp))

output

DEBUG_SCAR_RX	J1B30          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DEBUG_SCAR_TX	J1B29          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DYOR_DAT_0	J2B12     PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3	J1B13       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3	J1B14       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3	J2B59       PASSIVE     TRA6-70-01.7-R-4-7-F-UG

expected

DEBUG_SCAR_RX	J1 B30 PIO
DEBUG_SCAR_TX	J1 B29 PIO
DYOR_DAT_0	J2 B12 APB
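
For reference, the two per-line checks described in the question can be sketched with capture groups and `str.split` instead of `re.sub`. This is only a minimal illustration, not the asker's code: the names `HEADER`, `parse_header` and `parse_row` are hypothetical, and it assumes a tag never contains whitespace.

```python
import re

# Hypothetical pattern for a "[XXXYY] {TAG}" header; \S+ also accepts tags with dots
HEADER = re.compile(r'^\[(\d+)\]\s+(\S+)$')

def parse_header(line):
    """Return (number, tag) for a header line, or None for any other line."""
    match = HEADER.match(line.strip())
    return (match.group(1), match.group(2)) if match else None

def parse_row(line):
    """Return (col1, col2, first 3 chars of col3) if col1 starts with 'J', else None."""
    cols = line.split()
    if len(cols) >= 3 and cols[0].startswith('J'):
        return cols[0], cols[1], cols[2][:3]
    return None

print(parse_header('[00001] DEBUG_SCAR_RX'))   # ('00001', 'DEBUG_SCAR_RX')
print(parse_row('        J1         B30     PIO37          PASSIVE     TRA6-70-01.7-R-4-7-F-UG'))   # ('J1', 'B30', 'PIO')
```

Splitting on whitespace avoids having to describe the column widths in a regex, which is where the original `re.sub` call drops the third column.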

Answer 1

Score: 1

Maybe you can use:

import re

# Tags to keep; rows under any other tag (e.g. GND, POS_3V3) are ignored
TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']
data = []
tag = None  # no [XXXYY] {TAG} header seen yet
with open('input.txt') as file:
    for row in file:
        row = row.strip()
        if row.startswith('['):
            tag = row.split(']')[1].strip()
        elif row == '':
            continue
        else:
            cols = re.split(r'\s+', row)
            if cols[0].startswith('J') and tag in TAGS:
                data.append([tag, cols[0], cols[1], cols[2][:3]])

Output:

# '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
>>> data
[['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
 ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
 ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
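
Since the question allows the csv library, the rows collected in `data` could be written out as tab-separated lines in the expected layout. The sketch below is an illustration only: `write_rows` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import csv
import sys

def write_rows(rows, stream):
    """Write each [tag, col1, col2, col3] row as one tab-separated line."""
    writer = csv.writer(stream, delimiter='\t', lineterminator='\n')
    for tag, ref, pin, name in rows:
        # Join the three extracted columns with spaces, as in the expected output
        writer.writerow([tag, ' '.join([ref, pin, name])])

# Sample rows copied from the output above
data = [['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
        ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
        ['DYOR_DAT_0', 'J2', 'B12', 'APB']]

write_rows(data, sys.stdout)
```

Passing the stream in as a parameter means the same function can write to `sys.stdout` or to an opened file.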

Answer 2

Score: 0

You don't really need re for something so trivial.

Just read the input file one line at a time. Check whether a line starts with a left bracket. If it does, save the key value. Read the next line and split it into tokens. Check whether the first character of the first token is 'J'. Print the data as required:

with open('/Volumes/G-Drive/input.txt') as data:
    for line in data:
        if line.startswith('['):
            k = line.split()[-1]
            dl = next(data).split()
            if len(dl) > 2 and dl[0][0] == 'J':
                print(k, dl[0], dl[1], dl[2][:3])

Output:

DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
POS_3V3 J1 B13 FET
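
The same regex-free idea can be extended to check every row under a header rather than only the first one, and to skip the tags the question wants dropped. The following is a sketch under assumptions, not the answerer's code: `extract_rows`, `SKIP_TAGS` and the `sample` text are illustrative names, and `print(' '.join(row))` is used so the output looks the same on Python 2.7 and Python 3.

```python
# Assumption: tags whose rows are dropped until the next header
SKIP_TAGS = ('GND', 'POS_3V3')

def extract_rows(lines, skip_tags=SKIP_TAGS):
    """Collect (tag, col1, col2, col3[:3]) for every 'J' row under a kept tag."""
    rows = []
    tag = None  # None means "no header yet" or "currently skipping"
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith('['):
            # Header line such as "[00001] DEBUG_SCAR_RX"
            tag = tokens[-1]
            if tag in skip_tags:
                tag = None
        elif tag is not None and len(tokens) > 2 and tokens[0].startswith('J'):
            rows.append((tag, tokens[0], tokens[1], tokens[2][:3]))
    return rows

# A shortened copy of input.txt from the question
sample = """\
[00001] DEBUG_SCAR_RX
        J1         B30     PIO37          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        R2         2       2              PASSIVE     4.7kR
[00033] GND
        DP1        5       5              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
[00272] POS_3V3
        J1         B13     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
"""

for row in extract_rows(sample.splitlines()):
    print(' '.join(row))   # DEBUG_SCAR_RX J1 B30 PIO
```

Tracking the current tag in a variable, instead of calling `next()` once per header, lets a header be followed by any number of rows.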
