February 24, 2023 16:16:33

Is there a way to split a string in fixed width chunks in XPath?

Question

Using xidel, I'm extracting //Assertion//Signature//KeyInfo//X509Certificate/text() from a SAMLResponse. This is an X.509 certificate as one long Base64 string.

I want to split this string into 64-character blocks.

I tried tokenize() and replace(), but I couldn't make either work.

It seems that replace() does not allow me to use a newline (\n) in the replacement string:

echo "$SAMLRESPONSE" | base64 -D | xidel --xpath 'replace(//Assertion//Signature//KeyInfo//X509Certificate/text(),"(.{64})","$1\n")' -
**** Processing: stdin:/// ****
Error:
err:FORX0004: Invalid replacement: $1\n after $1\n
Possible backtrace:
  $000000010203F668: perhaps TXQTermTryCatch + 222920 ? but unlikely
  $0000000102068BBE: perhaps Q{http://www.w3.org/2005/xpath-functions}tokenize + 166350 ? but unlikely
  $000000010203FF78: Q{http://www.w3.org/2005/xpath-functions}replace + 376
  $0000000101FF853F: TXQTermNamedFunction + 767
  $0000000101F71CE7: perhaps ? ? but unlikely
Call xidel with --trace-stack to get an actual backtrace

And tokenize() treats the whole match as the separator, and separators are not included in the output:

echo "$SAMLRESPONSE" | base64 -D | xidel --xpath 'tokenize(//Assertion//Signature//KeyInfo//X509Certificate/text(),"(?:.{64})")' -
**** Processing: stdin:/// ****
XACcI5tcJbgsvr+ivGPos/WrhywkROwbEBh6OTNXTnaBiiIK

Is there any way to split a string into fixed-width chunks in XPath?

Answer 1

Score: 2

Your first idea wasn't wrong; you just have to use the [codepoints-to-string](https://www.w3.org/TR/xpath-functions-31/#func-codepoints-to-string) function to generate the newline character:

printf %s "$SAMLRESPONSE" |
base64 -D |
xidel --xpath '
    let
        $cert := //Assertion//Signature//KeyInfo//X509Certificate
    return
        "-----BEGIN CERTIFICATE-----" || codepoints-to-string(10) ||
        replace( $cert, ".{1,64}", "$0" || codepoints-to-string(10) ) ||
        "-----END CERTIFICATE-----" || codepoints-to-string(10)
' -

Note: I changed the regex to .{1,64} so that the final, shorter chunk is matched too and the "replaced" string always ends with a linefeed.
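
The difference between the two patterns can also be seen outside Xidel. The following is only a sketch, assuming a POSIX shell and a grep that supports -o (GNU and BSD grep both do). `chunks` is a hypothetical helper name, and a 150-character dummy string stands in for the certificate. grep -o prints each match on its own line, so it shows which parts of the input each pattern covers:

```sh
# chunks PATTERN: print every non-overlapping match of PATTERN found on
# stdin, one match per line (hypothetical helper, for illustration only).
chunks() {
    grep -oE "$1"
}

# 150 dummy characters standing in for the Base64 certificate body.
sample=$(printf '%150s' '' | tr ' ' 'A')

# ".{64}" matches only full 64-character runs; the 22-character tail is left unmatched.
printf '%s\n' "$sample" | chunks '.{64}' | awk '{ print length($0) }'

# ".{1,64}" also matches the shorter final chunk.
printf '%s\n' "$sample" | chunks '.{1,64}' | awk '{ print length($0) }'
```

The first pipeline prints 64 twice; the second prints 64, 64 and 22. replace() copies unmatched text through unchanged, so with .{64} that 22-character tail would be emitted without a trailing linefeed.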

Aside: you don't even need to build the full output with XPath in the first place.

{
    echo '-----BEGIN CERTIFICATE-----'
    printf %s "$SAMLRESPONSE" |
    base64 -D |
    xidel --xpath '//Assertion//Signature//KeyInfo//X509Certificate' - |
    fold -w 64
    echo '-----END CERTIFICATE-----'
}
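
The fold step can be tried on its own, without Xidel or a real SAMLResponse. This is a minimal sketch, assuming a POSIX shell. `to_pem` is a hypothetical helper name, and stripping whitespace first is an added assumption (that the element text might already contain line breaks or indentation), not something the pipeline above does:

```sh
# to_pem: read Base64 text on stdin and print it as a PEM certificate block
# (hypothetical helper, for illustration only).
to_pem() {
    echo '-----BEGIN CERTIFICATE-----'
    # Assumption: the input may contain stray whitespace. Remove it,
    # then re-wrap the remaining characters at 64 columns.
    tr -d '[:space:]' | fold -w 64
    # tr removed the final newline, so terminate the last line explicitly.
    echo
    echo '-----END CERTIFICATE-----'
}

# Usage with dummy data in place of the xidel output:
printf 'AAAA\n  BBBB\n' | to_pem
```

With the dummy input, the body is the single line AAAABBBB between the BEGIN and END markers. Longer input is broken into lines of at most 64 characters.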

Answer 2

Score: 2

> It seems that replace() does not allow me to use newlines \n in the replacement string:

That's because regex escapes such as \n aren't recognized in the replacement string, where a backslash may only be followed by \ or $. You have to use a character reference or x:cps():

replace(...,"(.{1,64})","$1&#10;")
replace(...,"(.{1,64})","$1&#x0A;")
replace(...,"(.{1,64})","$1"||x:cps(10))

> And tokenize will treat the whole match as separator

https://www.w3.org/TR/xpath-functions-31/#func-tokenize:
> Returns a sequence of strings constructed by splitting the input wherever a separator is found

You want to split the input on a separator it doesn't have, so tokenize() is unsuitable. Instead, as an alternative to replace(), you could use Xidel's own x:extract(). But above all, together with parse-xml() and x:binary-to-string(), this can be done much more simply, and entirely within Xidel:

$ echo "$SAMLRESPONSE" | xidel -se '
  "-----BEGIN CERTIFICATE-----",
  binary-to-string(base64Binary($raw)) ! extract(
    parse-xml(.)//Assertion//Signature//KeyInfo//X509Certificate,
    ".{1,64}",0,"*"
  ),
  "-----END CERTIFICATE-----"
'

And because a newline is the default value for --output-separator, there's no need for codepoints-to-string(10) either.

Answer 3

Score: 1

If you know a character that is certain not to appear in the original string (for example, $ is not a legal character in base64 or base64url), then you can combine tokenize() and replace() to achieve the expected result:

echo "$SAMLRESPONSE" | base64 -D | xidel -s --xpath 'tokenize(replace(//Assertion//Signature//KeyInfo//X509Certificate/text(),"(.{64})","$1\$"),"\$")' - | cat <(echo "-----BEGIN CERTIFICATE-----") - <(echo "-----END CERTIFICATE-----")
-----BEGIN CERTIFICATE-----
MIIC8DCCAdigAwIBAgIQGSvclGcZ8oRINlIUmlg7WzANBgkqhkiG9w0BAQsFADA0
MTIwMAYDVQQDEylNaWNyb3NvZnQgQXp1cmUgRmVkZXJhdGVkIFNTTyBDZXJ0aWZp
Y2F0ZTAeFw0yMDA2MjIwODI4NTlaFw0yMzA2MjIwODI4NTlaMDQxMjAwBgNVBAMT
KU1pY3Jvc29mdCBBenVyZSBGZWRlcmF0ZWQgU1NPIENlcnRpZmljYXRlMIIBIjAN
BgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuJds5ZQxHlRF7j10Qey++JJ84vqm
uKjSAsSqCS/JynVs5oDO7oIZvxSdbmwUWDnuBUr8bHyqd/MUYOVCjZvt0zN6+kP0
bmB7B8IP8E2amZB4Hn7bYdrPELcCPjO01gLx6ymLn/kHVUrnYjP0/+r0pos/MeM7
vY6jbCrxLt9cR6e1loC1Z04dyHw0jBHBhqKO5iXe1AVUtmt2zKt27Hck4zndQgMo
Gb8JwekQhRzL+SHLydhVZ5QctyEoT/PkAkrflmhllAGzCYBJkxqAYOk2GTWt5Gi6
/GLm6cxp2KTH7bCJWJTOmfDbJMOEAgAlcXk2KKKPRYFc96Pd5BRyIAlcpQIDAQAB
MA0GCSqGSIb3DQEBCwUAA4IBAQCBmIXI9oVTX7BSiT+hY98UTsc64G4gkuBvwKuh
xxY9oUxrRo6VM/uuArDCjtupk5Wx5YGDWTvcNXmN+h2QQnjK/83hwjsbRP4hAitF
NcvdeQNcfeXTK7Woe1Dmdms2b2U77NnEhD23mv4/IoFnfDDunkOnoottjyQqSOIz
hrO4LIQriCPsHmm/8MYGrHX1KDN69gWYAVSQi7dPcbjhdnNQN00RKQ5XrbktWcFN
GrqVOI0Usy4i7hkcitrOmZfjet5VepXzNfWA2gxgWtWJNbhSBqGT/S+OEdZfNp6s
XACcI5tcJbgsvr+ivGPos/WrhywkROwbEBh6OTNXTnaBiiIK
-----END CERTIFICATE-----

In the above command, replace() first matches each group of 64 characters and replaces it with itself plus a trailing $. tokenize() then uses that $ as the separator.

Again, this only works if there is some character that you know cannot appear in the original string, like $ in the base64 case.
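
Whether the sentinel really is absent can be checked before relying on it. This is a minimal sketch, assuming a POSIX shell. `is_base64_text` is a hypothetical helper, not part of Xidel, and it treats whitespace as unexpected too:

```sh
# is_base64_text STRING: succeed only if STRING is non-empty and consists
# solely of characters from the base64 / base64url alphabets plus "=" padding
# (hypothetical helper, for illustration only).
is_base64_text() {
    case $1 in
        '' | *[!A-Za-z0-9+/=_-]*) return 1 ;;  # empty, or contains a foreign character
        *) return 0 ;;
    esac
}

# First line of the certificate shown above, used as sample input.
cert='MIIC8DCCAdigAwIBAgIQGSvclGcZ8oRINlIUmlg7WzANBgkqhkiG9w0BAQsFADA0'
if is_base64_text "$cert"; then
    echo 'no $ present: safe to use it as the separator'
else
    echo 'unexpected characters found: pick another approach' >&2
fi
```

The character ranges in the pattern are interpreted in the current locale, so running the check under LC_ALL=C gives the strictest ASCII-only behavior.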
