2023-08-10 23:06:08

How do I split a field in tab separated values?

Question

I'm trying to automate a series of regular expressions that I run regularly, using Sublime Text 3 and RegReplace. After running a few rules, I get data that looks like this (hundreds of lines, tab-separated):

M	543	E0385-C-R	BSC	SDA	SV	After	Quarterly	N/A	N/A
M	543	A-2	Room	SDA	AV	N/A	Quarterly	N/A	N/A
M	543	H9	BSC	SDA	SV	After	Quarterly	N/A	N/A

My next step is to split the third column at the first dash '-', if it exists, by replacing the '-' with a \t. If there is no dash, I would like to add a \t at the end of the column (this is to keep my columns consistent across all the rows). Here is the output I'm looking for:

M	543	E0385	C-R	BSC	SDA	SV	After	Quarterly	N/A	N/A
M	543	A	2	Room	SDA	AV	N/A	Quarterly	N/A	N/A
M	543	H9		BSC	SDA	SV	After	Quarterly	N/A	N/A

The first two lines split at the first dash. The third line does not have a dash, so I insert a tab at the end of the third field.

So far I have figured out how to match the content of the columns (It still seems a little clumsy) with:

(?<=\t|^)[a-zA-Z0-9-\/]*(?=\t|$)

I would then find the third match and process it further to replace the first dash. I'm not really sure how to do this (I would be happy to have someone show me how to do this kind of 'nested' evaluation).

A different approach brought me here:

(?<=\t\d{3}\t)([a-zA-Z0-9-]*)\K-

This assumes that there are always three digits in the second column... an assumption I don't trust.

So a generalization of the pattern got me:

(?:\S+\t){2}\S*?\K(-)

which is pretty good, I think. See it here.

How do I get this expression to match the end of the 'word' if there is no dash?
Can I make it more robust (so a dash in a subsequent column does not match)?
Is there a better approach to this problem (like the two-level match above, maybe) and how do I implement it?

Answer 1

Score: 1

It is indeed not a good idea to rely on the presence of 3 digits as an indication for the second column.

Instead, make use of the start-of-line assertion (^) and match the first two columns and the third column up to the first hyphen (if it is there). \K then resets the start of the match at that point, so only what follows is replaced: the hyphen if it is there, otherwise an empty string. Then replace that with a TAB:

Find: ^((?:[^\t]*\t){2}[^\t-]*)\K-?

Replace: \t
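
If you ever need the same transformation outside Sublime Text, here is a minimal Python sketch of the idea. A few things in it are my assumptions rather than part of the answer above: Python's built-in re module does not support \K, so the prefix is captured and put back with a backreference; \n is added to the negated character classes so a match cannot run across lines; and the function names are just illustrative. The second function is one possible take on the 'two-level' approach from the question: split the line into fields first, then split only the third field.

```python
import re

# Capture the first two columns plus the third column up to the first
# hyphen, then consume that hyphen if it is present. Python's re has no \K,
# so the captured prefix is written back via \1 in the replacement.
PATTERN = re.compile(r"^((?:[^\t\n]*\t){2}[^\t\n-]*)-?", re.MULTILINE)


def split_third_column(text: str) -> str:
    """Regex version: works on a whole multi-line string at once."""
    return PATTERN.sub(r"\1\t", text)


def split_third_column_by_fields(line: str) -> str:
    """Two-level version: split one line into fields, then split field 3."""
    fields = line.split("\t")
    if len(fields) >= 3:
        # partition() returns (head, sep, tail); tail is "" when no hyphen
        # is found, which yields the empty extra column.
        head, _, tail = fields[2].partition("-")
        fields[2:3] = [head, tail]
    return "\t".join(fields)


if __name__ == "__main__":
    sample = "M\t543\tE0385-C-R\tBSC\nM\t543\tA-2\tRoom\nM\t543\tH9\tBSC"
    print(split_third_column(sample))
    print("\n".join(split_third_column_by_fields(l) for l in sample.split("\n")))
```

Both versions leave lines with fewer than three columns untouched. The field-based version is easier to extend if you later need to treat other columns differently.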
