How to get attributes of a JSX element using Python regular expressions?

Question

I want to write a simple parser for JSX code. I'm doing it for static analysis, to enforce some policies in our company.

These are some of the possibilities:

    <Component firstAttribute={dynamicValue} />
    <Component firstAttribute="stringValue" secondAttribute={integerValue} />
    <Component
      firstAttribute={dynamicValue}
      secondAttribute="stringValue"
      thirdAttribute={numericValue}
      booleanAttribute
    />
    <Component
      handlerAttribute={() => {
        // JS code here
      }}
    />
    <Component
      jsonAttribute={{
        key: value,
        key2: value2,
      }}
    />

As you can see, it's really a mess, and these are not the only possible code snippets.

For simple attributes, it's not a big deal. I can easily extract them using this regular expression:

    <\w+(\s+[\w*]*=['\"\{][^'\"\}]+['\"\}])*

But it gets really complicated when JSX and JS are mixed or when developers use string interpolation to provide a dynamic value for an attribute.
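To make the failure mode concrete, here is a small sketch that runs the regex above against three inputs. It is only an illustration: the `matched_text` helper and the `run()` call inside the handler are made up for this example and do not come from any real code base.

```python
import re

# The regex from the question, copied unchanged.
ATTR_RE = re.compile(r"""<\w+(\s+[\w*]*=['\"\{][^'\"\}]+['\"\}])*""")


def matched_text(source):
    """Return the part of `source` the regex matches from the start, or None."""
    match = ATTR_RE.match(source)
    return match.group(0) if match else None


simple = '<Component firstAttribute="stringValue" secondAttribute={integerValue} />'
boolean = "<Component booleanAttribute />"
handler = "<Component handlerAttribute={() => { run(); }} />"  # run() is made up

for source in (simple, boolean, handler):
    print(repr(matched_text(source)))
```

The first input is matched up to the end of `secondAttribute={integerValue}`. The boolean attribute is skipped entirely, because the pattern requires an `=`, so only `<Component` matches. For the handler, the match stops at the first `}`, so the nested braces are cut off and the attribute value is incomplete.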

It seems to me that I'm not on the right path and that regular expressions are not the right tool here.

Am I on the right path? How should I deal with this complexity? How can I extract attributes of elements reliably?

Answer 1

Score: 2

You're not going to have a good, reliable time with regular expressions; you'll need a parser.

Happily, esprima is a thing (the Python port of the Esprima JavaScript parser), and it can parse JSX:

    from pprint import pprint

    import esprima

    code = """
    <Component
      firstAttribute={dynamicValue}
      secondAttribute="stringValue"
      thirdAttribute={numericValue}
      booleanAttribute
      handlerAttribute={() => {
        // JS code here
      }}
      jsonAttribute={{
        key: value,
        key2: value2,
      }}
    />
    """

    # The "jsx" option makes the parser accept JSX element syntax.
    pprint(esprima.parseScript(code, {"jsx": True}))

This will print out the AST for the code, e.g.:

    {
      type: "Program",
      sourceType: "script",
      body: [
        {
          type: "ExpressionStatement",
          expression: {
            type: "JSXElement",
            openingElement: {
              type: "JSXOpeningElement",
              name: {
                type: "JSXIdentifier",
                name: "Component"
              },
              selfClosing: True,
              attributes: [
                {
                  type: "JSXAttribute",
                  name: {
                    type: "JSXIdentifier",
                    name: "firstAttribute"
                  },
                  value: {
                    type: "JSXExpressionContainer",
                    expression: {
                      type: "Identifier",
                      name: "dynamicValue"
                    }
                  }
                },
                {
                  type: "JSXAttribute",
                  name: {
                    type: "JSXIdentifier",
                    name: "secondAttribute"
                  },
                  value: {
                    type: "Literal",
                    value: "stringValue",
                    raw: "\"stringValue\""
                  }
                },
                {
                  type: "JSXAttribute",
                  name: {
                    type: "JSXIdentifier",
                    name: "thirdAttribute"
                  },
                  value: {
                    type: "JSXExpressionContainer",
                    expression: {
                      type: "Identifier",
                      name: "numericValue"
                    }
                  }
                },
                {
                  type: "JSXAttribute",
                  name: {
                    type: "JSXIdentifier",
                    name: "booleanAttribute"
                  }
                },
                {
                  type: "JSXAttribute",
                  name: {
                    type: "JSXIdentifier",
                    name: "handlerAttribute"
                  },
                  value: {
                    type: "JSXExpressionContainer",
                    expression: {
                      type: "ArrowFunctionExpression",
                      generator: False,
                      isAsync: False,
                      params: [],
                      body: {
                        type: "BlockStatement",
                        body: []
                      },
                      expression: False
                    }
                  }
                },
                {
                  type: "JSXAttribute",
                  name: {
                    type: "JSXIdentifier",
                    name: "jsonAttribute"
                  },
                  value: {
                    type: "JSXExpressionContainer",
                    expression: {
                      type: "ObjectExpression",
                      properties: [
                        {
                          type: "Property",
                          key: {
                            type: "Identifier",
                            name: "key"
                          },
                          computed: False,
                          value: {
                            type: "Identifier",
                            name: "value"
                          },
                          kind: "init",
                          method: False,
                          shorthand: False
                        },
                        {
                          type: "Property",
                          key: {
                            type: "Identifier",
                            name: "key2"
                          },
                          computed: False,
                          value: {
                            type: "Identifier",
                            name: "value2"
                          },
                          kind: "init",
                          method: False,
                          shorthand: False
                        }
                      ]
                    }
                  }
                }
              ]
            },
            children: []
          }
        }
      ]
    }

From there, it's just a matter of some tree traversal.
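As a rough sketch of what that traversal could look like, the code below walks an AST that has already been converted to plain dicts and lists and collects every JSX attribute it finds. This is an assumption-laden illustration, not the only way to do it: the names `iter_nodes`, `jsx_attributes` and `sample_ast` are made up here, and the input is a hand-written fragment shaped like the AST printed above rather than live parser output. (To the best of my knowledge the Python port of esprima lets you convert nodes to dicts with a `toDict()` method, but check that against the version you install.)

```python
def iter_nodes(node):
    """Yield every dict node in a dict/list AST, depth first."""
    if isinstance(node, dict):
        yield node
        for child in node.values():
            yield from iter_nodes(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_nodes(child)


def jsx_attributes(ast):
    """Return (element name, attribute name, value kind) for each JSX attribute."""
    found = []
    for node in iter_nodes(ast):
        if node.get("type") != "JSXOpeningElement":
            continue
        element = node["name"].get("name")
        for attr in node.get("attributes", []):
            if attr.get("type") != "JSXAttribute":
                continue  # skip anything that is not a plain named attribute
            value = attr.get("value")
            if value is None:
                kind = None  # boolean attribute such as `booleanAttribute`
            elif value["type"] == "JSXExpressionContainer":
                kind = value["expression"]["type"]  # type of the {...} expression
            else:
                kind = value["type"]  # e.g. "Literal" for a quoted string
            found.append((element, attr["name"]["name"], kind))
    return found


# Hand-written fragment (illustrative only), shaped like the AST above.
sample_ast = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "JSXElement",
            "openingElement": {
                "type": "JSXOpeningElement",
                "name": {"type": "JSXIdentifier", "name": "Component"},
                "selfClosing": True,
                "attributes": [
                    {"type": "JSXAttribute",
                     "name": {"type": "JSXIdentifier", "name": "firstAttribute"},
                     "value": {"type": "JSXExpressionContainer",
                               "expression": {"type": "Identifier",
                                              "name": "dynamicValue"}}},
                    {"type": "JSXAttribute",
                     "name": {"type": "JSXIdentifier", "name": "secondAttribute"},
                     "value": {"type": "Literal", "value": "stringValue"}},
                    {"type": "JSXAttribute",
                     "name": {"type": "JSXIdentifier", "name": "booleanAttribute"}},
                ],
            },
            "children": [],
        },
    }],
}

for element, name, kind in jsx_attributes(sample_ast):
    print(element, name, kind)
```

Because the walk visits every node, nested elements inside `children` or inside attribute expressions are picked up too. A policy check then becomes a plain filter over the returned tuples.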

huangapple
  • Posted on June 2, 2023, 14:35:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76387687.html