Extracting columns and skipping certain rows in a file for data processing
Question
import re
# Tags whose rows are skipped until the next [XXXYY] {TAG} header
SKIP_TAGS = ('GND', 'POS_3V3')
# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()
# Process the data and extract the required information
result = []
component_name = ""
extract = False
for line in content:
    line = line.strip()
    # \S+ rather than \w+ so that tags containing a dot (POS_3.3V_INH) are captured whole
    header = re.match(r'\[\d+\]\s+(\S+)', line)
    if header:
        # New header: remember the tag and decide whether its rows are wanted
        component_name = header.group(1)
        extract = component_name not in SKIP_TAGS
    elif extract and line.startswith('J'):
        column_data = line.split()
        if len(column_data) >= 3:
            # Column 1, column 2 and the first 3 characters of column 3
            result.append("%s\t%s\t%s\t%s" % (component_name, column_data[0], column_data[1], column_data[2][:3]))
for item in result:
    print(item)
This modified test.py code should produce the expected output (columns are tab-separated):
DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
It extracts the columns according to your specified patterns and skips the lines between [00033] GND (and [00272] POS_3V3) and the next [XXXYY] {TAG} pattern. It avoids f-strings so that it also runs on Python 2.7.
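One detail of the [XXXYY] {TAG} pattern is worth checking on its own: the sample input contains the tag POS_3.3V_INH, and \w does not match a dot. The sketch below uses a hypothetical helper, parse_header (not part of the original script), to show the difference between \S+ and \w+ on that line. It is only an illustration under the assumption that a tag never contains whitespace.

```python
import re

# Hypothetical helper for illustration; assumes a tag never contains whitespace.
HEADER = re.compile(r'^\[(\d+)\]\s+(\S+)$')

def parse_header(line):
    """Return (number, tag) for a '[XXXYY] TAG' line, or None for any other line."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

# A header whose tag contains a dot is still captured whole with \S+
print(parse_header('[00271] POS_3.3V_INH'))

# A data row is not a header
print(parse_header('J1 B30 PIO37 PASSIVE TRA6-70-01.7-R-4-7-F-UG'))

# With \w+ the capture stops at the dot
print(re.match(r'\[\d+\] (\w+)', '[00271] POS_3.3V_INH').group(1))  # POS_3
```

Comparing tag names against a skip list only works reliably when the whole tag is captured, which is why the pattern matters here.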
Original question:
I am trying to process input.txt with the test.py script to extract specific information, as shown in the expected output. I have a basic stub, but the regex is not extracting the column details I expect. The expected output is included below for reference.
In general, I am looking for a [XXXYY] {TAG} pattern. Once I find that pattern, if the first column of a following row starts with J, I want to extract column 1, column 2 and the first 3 characters of column 3. I would also like to know how to skip the lines after [00033] GND (and [00272] POS_3V3) until the next [XXXYY] {TAG} pattern. I am restricted to Python 2.7.5 with the re and csv libraries and cannot use pandas.
input.txt
<<< Test List >>>
Mounting Hole MH1 APBC_MH_3.2x7cm
Mounting Hole MH2 APBC_MH_3.2x7cm
Mounting Hole MH3 APBC_MH_3.2x7cm
Mounting Hole MH4 APBC_MH_3.2x7cm
[00001] DEBUG_SCAR_RX
J1 B30 PIO37 PASSIVE TRA6-70-01.7-R-4-7-F-UG
R2 2 2 PASSIVE 4.7kR
[00002] DEBUG_SCAR_TX
J1 B29 PIO36 PASSIVE TRA6-70-01.7-R-4-7-F-UG
[00003] DYOR_DAT_0
J2 B12 APB10_CC_P PASSIVE TRA6-70-01.7-R-4-7-F-UG
[00033] GND
DP1 5 5 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
DP1 6 6 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
DP1 7 7 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
[00271] POS_3.3V_INH
Q2 3 DRAIN PASSIVE 2N7002
R34 2 2 PASSIVE 4.7kR
[00272] POS_3V3
J1 B13 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG
J1 B14 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG
J2 B59 FETO_HDB PASSIVE TRA6-70-01.7-R-4-7-F-UG
test.py
import re
# Read the input file
with open('input.txt', 'r') as file:
    content = file.readlines()
# Process the data and extract the required information
result = []
component_name = ""
for line in content:
    line = line.strip()
    if line.startswith("["):
        # Keep only the tag name from a "[XXXYY] TAG" header
        s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
    elif line.startswith("J"):
        # Joins columns 1 and 2, drops column 3, and leaves the rest of the row
        sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
        print("%s\t%s" % (s, sp))
output
DEBUG_SCAR_RX J1B30 PASSIVE TRA6-70-01.7-R-4-7-F-UG
DEBUG_SCAR_TX J1B29 PASSIVE TRA6-70-01.7-R-4-7-F-UG
DYOR_DAT_0 J2B12 PASSIVE TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B13 PASSIVE TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J1B14 PASSIVE TRA6-70-01.7-R-4-7-F-UG
POS_3V3 J2B59 PASSIVE TRA6-70-01.7-R-4-7-F-UG
expected
DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
Answer 1
Score: 1
Maybe you can use:
import re
# Tags whose J rows should be kept
TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']
data = []
tag = None  # most recent [XXXYY] {TAG} header seen so far
with open('input.txt') as file:
    for row in file:
        row = row.strip()
        if row.startswith('['):
            # Header line: keep the text after the closing bracket
            tag = row.split(']')[1].strip()
        elif row == '':
            continue
        else:
            cols = re.split(r'\s+', row)
            if len(cols) > 2 and cols[0].startswith('J') and tag in TAGS:
                data.append([tag, cols[0], cols[1], cols[2][:3]])
Output:
# '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
>>> data
[['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
['DYOR_DAT_0', 'J2', 'B12', 'APB']]
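Since the question allows the csv module, the collected rows could also be written out with it instead of being formatted by hand. The following is a minimal sketch, not part of the original answer: write_rows is a hypothetical helper, and the tab delimiter is an assumption based on the tab-separated print in the question's script.

```python
import csv
import sys

def write_rows(rows, stream):
    """Write each row as one tab-separated line to an open text stream."""
    # delimiter and lineterminator are assumptions chosen to match the question's output
    writer = csv.writer(stream, delimiter='\t', lineterminator='\n')
    for row in rows:
        writer.writerow(row)

# Sample rows copied from the output above
data = [
    ['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
    ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
    ['DYOR_DAT_0', 'J2', 'B12', 'APB'],
]
write_rows(data, sys.stdout)
```

Passing an open file object instead of sys.stdout would write the same lines to disk.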
Answer 2
Score: 0
You don't really need re for something this simple.
Read the input file one line at a time. Check whether a line starts with a left bracket. If it does, save the key. Read the next line and split it into tokens. Check whether the first character of the first token is 'J'. Print the data as required:
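from __future__ import print_function  # so print() behaves the same on Python 2.7
with open('/Volumes/G-Drive/input.txt') as data:
    for line in data:
        if line.startswith('['):
            # Header line: the tag is the last whitespace-separated token
            k = line.split()[-1]
            # Only the row directly after the header is examined;
            # the default '' avoids StopIteration if the header is the last line
            dl = next(data, '').split()
            if len(dl) > 2 and dl[0][0] == 'J':
                print(k, dl[0], dl[1], dl[2][:3])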
Output:
DEBUG_SCAR_RX J1 B30 PIO
DEBUG_SCAR_TX J1 B29 PIO
DYOR_DAT_0 J2 B12 APB
POS_3V3 J1 B13 FET
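Note that the output above still contains a POS_3V3 row and that only the first row after each header is inspected. A possible variation that stays free of re, checks every row under a header, and skips the GND and POS_3V3 sections is sketched below. The extract_rows function and its skip_tags parameter are hypothetical names introduced for illustration, and the sample lines are copied from input.txt.

```python
def extract_rows(lines, skip_tags=('GND', 'POS_3V3')):
    """Yield (tag, col1, col2, col3[:3]) for each J row under a non-skipped header."""
    tag = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith('['):
            # New header: a skipped tag disables extraction until the next header
            tag = tokens[-1] if tokens[-1] not in skip_tags else None
        elif tag is not None and len(tokens) > 2 and tokens[0].startswith('J'):
            yield (tag, tokens[0], tokens[1], tokens[2][:3])

sample = [
    '[00003] DYOR_DAT_0',
    'J2 B12 APB10_CC_P PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    '[00033] GND',
    'DP1 5 5 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V',
    '[00272] POS_3V3',
    'J1 B13 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    'J1 B14 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG',
]
for row in extract_rows(sample):
    print(' '.join(row))  # DYOR_DAT_0 J2 B12 APB
```

Because the function accepts any iterable of lines, an open file object can be passed in directly in place of the sample list.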