Extracting columns and skipping certain rows in a file for data processing

Question

    import re

    # Rows under these tags are skipped until the next [NNNNN] TAG header
    SKIP_TAGS = ('GND', 'POS_3V3')
    HEADER = re.compile(r'\[\d+\]\s+(\S+)')

    # Read the input file
    with open('input.txt', 'r') as infile:
        content = infile.readlines()

    # Process the data and extract the required information
    result = []
    component_name = ""
    extract = False
    for line in content:
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # New header: remember the tag and decide whether to keep its rows
            component_name = header.group(1)
            extract = component_name not in SKIP_TAGS
        elif extract and line.startswith('J'):
            column_data = line.split()
            if len(column_data) >= 3:
                # %-formatting instead of an f-string, so this runs on Python 2.7
                result.append("%s\t%s\t%s\t%s" % (
                    component_name, column_data[0], column_data[1], column_data[2][:3]))

    for item in result:
        print(item)

This modified test.py code should produce the expected output (fields are tab-separated):

    DEBUG_SCAR_RX J1 B30 PIO
    DEBUG_SCAR_TX J1 B29 PIO
    DYOR_DAT_0 J2 B12 APB

It extracts the required columns from each J row under a [XXXYY] {TAG} header and skips the lines between [00033] GND (or [00272] POS_3V3) and the next [XXXYY] {TAG} pattern.

I am trying to process input.txt with the test.py script to extract specific information, as shown in the expected output. I have a basic stub, but the regex is not extracting the column details I expect. The expected output is included for reference.

In general, I am looking for a [XXXYY] {TAG} pattern. Once I find that pattern, if the line that follows starts with J, I want to extract column 1, column 2, and the first 3 characters of column 3. I would also like to know how to skip the lines after [00033] GND (and [00272] POS_3V3) until the next [XXXYY] {TAG} pattern. I am restricted to Python 2.7.5 with the re and csv libraries, and cannot use pandas.

input.txt

    <<< Test List >>>
    Mounting Hole MH1 APBC_MH_3.2x7cm
    Mounting Hole MH2 APBC_MH_3.2x7cm
    Mounting Hole MH3 APBC_MH_3.2x7cm
    Mounting Hole MH4 APBC_MH_3.2x7cm
    [00001] DEBUG_SCAR_RX
    J1 B30 PIO37 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    R2 2 2 PASSIVE 4.7kR
    [00002] DEBUG_SCAR_TX
    J1 B29 PIO36 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    [00003] DYOR_DAT_0
    J2 B12 APB10_CC_P PASSIVE TRA6-70-01.7-R-4-7-F-UG
    [00033] GND
    DP1 5 5 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
    DP1 6 6 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
    DP1 7 7 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V
    [00271] POS_3.3V_INH
    Q2 3 DRAIN PASSIVE 2N7002
    R34 2 2 PASSIVE 4.7kR
    [00272] POS_3V3
    J1 B13 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG
    J1 B14 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG
    J2 B59 FETO_HDB PASSIVE TRA6-70-01.7-R-4-7-F-UG

test.py

    import re

    # Read the input file
    with open('input.txt', 'r') as file:
        content = file.readlines()

    # Process the data and extract the required information
    result = []
    component_name = ""
    for line in content:
        line = line.strip()
        if line.startswith("["):
            s = re.sub(r"([\[0-9]+\]) (\w+)$", r"\2", line)
        elif line.startswith("J"):
            sp = re.sub(r"^(\w+)\s+(\w+)\s+(\w+)", r"\1\2", line)
            print("%s\t%s" % (s, sp))

output

    DEBUG_SCAR_RX J1B30 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    DEBUG_SCAR_TX J1B29 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    DYOR_DAT_0 J2B12 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    POS_3V3 J1B13 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    POS_3V3 J1B14 PASSIVE TRA6-70-01.7-R-4-7-F-UG
    POS_3V3 J2B59 PASSIVE TRA6-70-01.7-R-4-7-F-UG

expected

    DEBUG_SCAR_RX J1 B30 PIO
    DEBUG_SCAR_TX J1 B29 PIO
    DYOR_DAT_0 J2 B12 APB

Answer 1

Score: 1


Maybe you can use:

    import re

    # Only J rows under these tags are kept
    TAGS = ['DEBUG_SCAR_RX', 'DEBUG_SCAR_TX', 'DYOR_DAT_0']
    data = []
    tag = None  # most recent [NNNNN] TAG header seen so far

    with open('input.txt') as file:
        for row in file:
            row = row.strip()
            if row.startswith('['):
                # "[00001] DEBUG_SCAR_RX" -> "DEBUG_SCAR_RX"
                tag = row.split(']')[1].strip()
            elif row == '':
                continue
            else:
                cols = re.split(r'\s+', row)
                if cols[0].startswith('J') and tag in TAGS:
                    data.append([tag, cols[0], cols[1], cols[2][:3]])

Output:

    # '2.7.18 (default, Jan 23 2023, 08:22:06) \n[GCC 12.2.0]'
    >>> data
    [['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
     ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
     ['DYOR_DAT_0', 'J2', 'B12', 'APB']]
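
Since the question allows the csv library, a list shaped like `data` could be written out as tab-separated text with `csv.writer`. This is only a sketch: the helper name `write_rows` is hypothetical, and the rows are copied from the output above rather than read from input.txt.

```python
import csv
import sys


def write_rows(rows, stream):
    """Write each row to an open text stream as one tab-separated line."""
    writer = csv.writer(stream, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)


# Same shape as the `data` list built by the answer (values copied from its output)
data = [
    ['DEBUG_SCAR_RX', 'J1', 'B30', 'PIO'],
    ['DEBUG_SCAR_TX', 'J1', 'B29', 'PIO'],
    ['DYOR_DAT_0', 'J2', 'B12', 'APB'],
]

write_rows(data, sys.stdout)
```

To write to a file instead of the console, pass an open file object as `stream`. On Python 2.7 the csv documentation asks for the file to be opened in binary mode ('wb'); on Python 3 it should be opened in text mode with newline=''.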

Answer 2

Score: 0


You don't really need re for something so trivial.

Just read the input file one line at a time. Check whether a line starts with a left bracket. If it does, save the key value, then read the next line and split it into tokens. Check whether the first character of the first token is 'J'. Print the data as required:

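    from __future__ import print_function  # print(...) as a function on Python 2.7 too

    with open('/Volumes/G-Drive/input.txt') as data:
        for line in data:
            if line.startswith('['):
                # Last token of "[00001] DEBUG_SCAR_RX" is the key
                k = line.split()[-1]
                # Read the line right after the header ('' if the file ends here)
                dl = next(data, '').split()
                if len(dl) > 2 and dl[0][0] == 'J':
                    print(k, dl[0], dl[1], dl[2][:3])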

Output:

    DEBUG_SCAR_RX J1 B30 PIO
    DEBUG_SCAR_TX J1 B29 PIO
    DYOR_DAT_0 J2 B12 APB
    POS_3V3 J1 B13 FET
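
Note that this output still contains a POS_3V3 row, and only the first line after each header is inspected. One possible variation, still without re, is sketched below: it looks at every line under a header and drops rows under the tags the question wants skipped. The function name `iter_rows` and the `skip_tags` parameter are assumptions made for illustration, and the sample lines are a shortened copy of input.txt.

```python
from __future__ import print_function


def iter_rows(lines, skip_tags=('GND', 'POS_3V3')):
    """Yield (tag, col1, col2, col3[:3]) for every J row under a kept tag."""
    tag = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line
        if tokens[0].startswith('[') and len(tokens) > 1:
            tag = tokens[-1]  # "[00001] DEBUG_SCAR_RX" -> "DEBUG_SCAR_RX"
        elif tag is not None and tag not in skip_tags:
            if len(tokens) > 2 and tokens[0].startswith('J'):
                yield (tag, tokens[0], tokens[1], tokens[2][:3])


# Shortened copy of input.txt, used here so the sketch runs without the file
sample = [
    '[00001] DEBUG_SCAR_RX',
    'J1 B30 PIO37 PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    'R2 2 2 PASSIVE 4.7kR',
    '[00033] GND',
    'DP1 5 5 PASSIVE MECH, DIP_SWITCH, FFFN-04F-V',
    '[00272] POS_3V3',
    'J1 B13 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG',
    'J1 B14 FETO_FAT PASSIVE TRA6-70-01.7-R-4-7-F-UG',
]

for row in iter_rows(sample):
    print(*row)
```

Because `iter_rows` accepts any iterable of lines, an open file object can be passed in place of `sample`. Passing `skip_tags=()` keeps the POS_3V3 rows as well.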

huangapple
  • Posted on June 5, 2023, 17:05:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76404918.html