Extracting columns, skipping certain rows in a file for data processing

Question

import re

# Tags whose rows are skipped until the next [XXXYY] {TAG} header
SKIP_TAGS = ('GND', 'POS_3V3')

# Read the input file
with open('input.txt', 'r') as f:
    content = f.readlines()

# Process the data and extract the required information
result = []
component_name = ""
skip = False
for line in content:
    line = line.strip()
    header = re.match(r'\[\d+\]\s+(\S+)$', line)
    if header:
        # New [XXXYY] {TAG} header: remember the tag and decide whether to skip its block
        component_name = header.group(1)
        skip = component_name in SKIP_TAGS
    elif not skip and line.startswith('J'):
        # Column 1, column 2 and the first 3 characters of column 3
        column_data = line.split()
        if len(column_data) >= 3:
            result.append("%s\t%s %s %s" % (component_name, column_data[0],
                                             column_data[1], column_data[2][:3]))

for item in result:
    print(item)

This version of test.py produces the expected output:

DEBUG_SCAR_RX	J1 B30 PIO
DEBUG_SCAR_TX	J1 B29 PIO
DYOR_DAT_0	J2 B12 APB

It extracts column 1, column 2 and the first 3 characters of column 3 from every J row, and skips the rows between [00033] GND (or [00272] POS_3V3) and the next [XXXYY] {TAG} header.

Original (English):

I am trying to process input.txt with the test.py script to extract specific information, as shown in the expected output. I have a basic stub, but the regex is not extracting the column details I expect. The expected output is included for reference.

In general, I am looking for a [XXXYY] {TAG} pattern. Once I find it, for each following line whose first column starts with J, I want to extract column 1, column 2 and the first 3 characters of column 3. I would also like to know how to skip the lines after [00033] GND (and [00272] POS_3V3) until the next [XXXYY] {TAG} pattern. I am restricted to Python 2.7.5 and the re and csv libraries, and cannot use pandas.
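One way to handle the skipping part on its own is to filter the lines before extracting anything: a header whose tag is in a skip set switches skipping on, and any other header switches it off. A minimal sketch, assuming the header format shown in input.txt; HEADER and drop_blocks are hypothetical names, and the code runs on Python 2.7 and 3:

```python
import re

# Matches a "[XXXYY] {TAG}" header line and captures the tag name.
HEADER = re.compile(r'^\[\d+\]\s+(\S+)\s*$')

def drop_blocks(lines, skip_tags):
    """Yield the lines, leaving out every header whose tag is in skip_tags
    together with all lines after it, up to (not including) the next header."""
    skipping = False
    for line in lines:
        m = HEADER.match(line.strip())
        if m:
            # Each header decides afresh whether its block is kept
            skipping = m.group(1) in skip_tags
        if not skipping:
            yield line
```

The filtered lines can then go to whatever J-row extraction is used, which no longer needs to know about GND or POS_3V3.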

input.txt
<<< Test List >>>
Mounting Hole                   MH1            APBC_MH_3.2x7cm
Mounting Hole                   MH2            APBC_MH_3.2x7cm
Mounting Hole                   MH3            APBC_MH_3.2x7cm
Mounting Hole                   MH4            APBC_MH_3.2x7cm

[00001] DEBUG_SCAR_RX
        J1         B30     PIO37          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        R2         2       2              PASSIVE     4.7kR

[00002] DEBUG_SCAR_TX
        J1         B29     PIO36          PASSIVE     TRA6-70-01.7-R-4-7-F-UG

[00003] DYOR_DAT_0
        J2         B12     APB10_CC_P     PASSIVE     TRA6-70-01.7-R-4-7-F-UG

[00033] GND
        DP1        5       5              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        6       6              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V
        DP1        7       7              PASSIVE     MECH, DIP_SWITCH, FFFN-04F-V

[00271] POS_3.3V_INH
        Q2         3       DRAIN          PASSIVE     2N7002
        R34        2       2              PASSIVE     4.7kR

[00272] POS_3V3
        J1         B13     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J1         B14     FETO_FAT       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
        J2         B59     FETO_HDB       PASSIVE     TRA6-70-01.7-R-4-7-F-UG

test.py
import re

# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()

# Process the data and extract the required information
result = []
component_name = ""
for line in content:
    line = line.strip()
    if line.startswith("["):
        s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
    elif line.startswith("J"):
        sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
        print("%s\t%s" % (s, sp))

output
DEBUG_SCAR_RX	J1B30          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DEBUG_SCAR_TX	J1B29          PASSIVE     TRA6-70-01.7-R-4-7-F-UG
DYOR_DAT_0	J2B12     PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3	J1B13       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3	J1B14       PASSIVE     TRA6-70-01.7-R-4-7-F-UG
POS_3V3	J2B59       PASSIVE     TRA6-70-01.7-R-4-7-F-UG

expected
DEBUG_SCAR_RX	J1 B30 PIO
DEBUG_SCAR_TX	J1 B29 PIO
DYOR_DAT_0	J2 B12 APB
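In test.py, re.sub replaces only the part of each line that the pattern matched and keeps the rest, which is why PASSIVE and the part number still appear in the output. Capturing the columns with re.match and formatting only the captured groups avoids that. A minimal sketch; ROW and j_columns are hypothetical names:

```python
import re

# Hypothetical pattern: a J reference, a pin, and the third column of a J row.
ROW = re.compile(r'^(J\w*)\s+(\w+)\s+(\w+)')

def j_columns(line):
    """Return e.g. 'J1 B30 PIO' for a J row, or None if the line is not a J row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    ref, pin, name = m.groups()
    # Only the captured groups are used, so the rest of the line is dropped
    return '%s %s %s' % (ref, pin, name[:3])
```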

Answer 1

Score: 1

Maybe you can use:

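import re

TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']  # tags to keep

data = []
tag = None  # no [XXXYY] {TAG} header seen yet
with open('input.txt') as file:
    for row in file:
        row = row.strip()
        if row.startswith('['):
            # "[00001] DEBUG_SCAR_RX" -> "DEBUG_SCAR_RX"
            tag = row.split(']')[1].strip()
        elif row == '':
            continue
        else:
            cols = re.split(r'\s+', row)
            # J rows under a wanted tag: column 1, column 2, first 3 characters of column 3
            if cols[0].startswith('J') and tag in TAGS:
                data.append([tag, cols[0], cols[1], cols[2][:3]])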

Output:

# '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
>>> data
[['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
 ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
 ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
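Since the question allows the csv module, the data list could be written out with csv.writer using a tab delimiter. A minimal sketch; write_tsv is a hypothetical helper, and the layout (tag, a tab, then the three columns separated by spaces) is assumed from the expected output in the question. The io.StringIO usage is for Python 3; on Python 2.7, pass sys.stdout or a file opened with 'wb' instead:

```python
import csv
import io

def write_tsv(rows, out):
    """Write [tag, col1, col2, col3] rows as 'tag<TAB>col1 col2 col3' lines."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for tag, ref, pin, name in rows:
        # Space-join the three columns to match the expected output layout
        writer.writerow([tag, ' '.join([ref, pin, name])])

# Usage on Python 3: collect the text in memory instead of writing a file
rows = [['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'], ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
buf = io.StringIO()
write_tsv(rows, buf)
tsv_text = buf.getvalue()
```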

Answer 2

Score: 0

You don't really need re for something so trivial.

Just read the input file one line at a time. Check whether a line starts with a left bracket. If it does, save the key value, read the next line and split it into tokens, and check whether the first character of the first token is 'J'. Print the data as required:

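from __future__ import print_function  # without this, Python 2.7 prints a tuple

with open('/Volumes/G-Drive/input.txt') as data:
    for line in data:
        if line.startswith('['):
            k = line.split()[-1]         # tag name, e.g. DEBUG_SCAR_RX
            dl = next(data, '').split()  # tokens of the line right after the header
            # column 1, column 2 and the first 3 characters of column 3 of a J row
            if len(dl) > 2 and dl[0][0] == 'J':
                print(k, dl[0], dl[1], dl[2][:3])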

Output:

DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
POS_3V3 J1 B13 FET
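Because this answer reads only the line right after each header, it prints POS_3V3 J1 B13 but not the J1 B14 and J2 B59 rows of the same block, and it has no way to skip a block on request. A variant in the same regex-free spirit first groups every row under its header and then filters. A minimal sketch; parse_blocks and j_rows are hypothetical names, and OrderedDict keeps the file order on Python 2.7:

```python
from collections import OrderedDict

def parse_blocks(lines):
    """Group whitespace-split rows under the tag of the preceding '[XXXYY] TAG' header."""
    blocks = OrderedDict()
    tag = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line
        if line.startswith('['):
            tag = tokens[-1]
            blocks.setdefault(tag, [])
        elif tag is not None:
            # Rows before the first header (Test List, Mounting Hole) are ignored
            blocks[tag].append(tokens)
    return blocks

def j_rows(blocks, skip_tags):
    """Yield (tag, col1, col2, first 3 chars of col3) for every J row of every kept block."""
    for tag, rows in blocks.items():
        if tag in skip_tags:
            continue
        for tokens in rows:
            if len(tokens) > 2 and tokens[0].startswith('J'):
                yield (tag, tokens[0], tokens[1], tokens[2][:3])
```

Calling j_rows(parse_blocks(f), {'GND', 'POS_3V3'}) on the open file would then give only the three rows from the expected output.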

huangapple
  • Published on June 5, 2023 17:05:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76404918.html