How to get attributes of a JSX element using Python regular expressions?
Question
I want to write a simple parser for JSX code. I'm doing it for static analysis, to enforce some policies in our company.
These are possibilities:
<Component firstAttribute={dynamicValue} />

<Component firstAttribute="stringValue" secondAttribute={integerValue} />

<Component
  firstAttribute={dynamicValue}
  secondAttribute="stringValue"
  thirdAttribute={numericValue}
  booleanAttribute
/>

<Component
  handlerAttribute={() => {
    // JS code here
  }}
/>

<Component
  jsonAttribute={{
    key: value,
    key2: value2,
  }}
/>
As you can see, it's really a mess, and these are not the only possible forms.
For simple attributes, it's not a big deal. I can easily extract them using this regular expression:
<\w+(\s+[\w*]*=['\"\{][^'\"\}]+['\"\}])*
But it gets really complicated when JSX and JS are mixed, or when developers use string interpolation to provide a dynamic value for an attribute.
It seems to me that I'm on the wrong path and that regular expressions are not the right tool here.
Am I on the right path? How should I deal with this complexity? How can I reliably extract the attributes of an element?
Answer 1
Score: 2
You're not going to have a good, reliable time with regular expressions; you'll need a parser.
Happily, esprima is a thing:
from pprint import pprint

import esprima

code = """
<Component
  firstAttribute={dynamicValue}
  secondAttribute="stringValue"
  thirdAttribute={numericValue}
  booleanAttribute
  handlerAttribute={() => {
    // JS code here
  }}
  jsonAttribute={{
    key: value,
    key2: value2,
  }}
/>
"""

pprint(esprima.parseScript(code, {"jsx": True}))
This will print out the AST for the code, e.g.:
{
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "JSXElement",
        openingElement: {
          type: "JSXOpeningElement",
          name: {
            type: "JSXIdentifier",
            name: "Component"
          },
          selfClosing: True,
          attributes: [
            {
              type: "JSXAttribute",
              name: {
                type: "JSXIdentifier",
                name: "firstAttribute"
              },
              value: {
                type: "JSXExpressionContainer",
                expression: {
                  type: "Identifier",
                  name: "dynamicValue"
                }
              }
            },
            {
              type: "JSXAttribute",
              name: {
                type: "JSXIdentifier",
                name: "secondAttribute"
              },
              value: {
                type: "Literal",
                value: "stringValue",
                raw: "\"stringValue\""
              }
            },
            {
              type: "JSXAttribute",
              name: {
                type: "JSXIdentifier",
                name: "thirdAttribute"
              },
              value: {
                type: "JSXExpressionContainer",
                expression: {
                  type: "Identifier",
                  name: "numericValue"
                }
              }
            },
            {
              type: "JSXAttribute",
              name: {
                type: "JSXIdentifier",
                name: "booleanAttribute"
              }
            },
            {
              type: "JSXAttribute",
              name: {
                type: "JSXIdentifier",
                name: "handlerAttribute"
              },
              value: {
                type: "JSXExpressionContainer",
                expression: {
                  type: "ArrowFunctionExpression",
                  generator: False,
                  isAsync: False,
                  params: [],
                  body: {
                    type: "BlockStatement",
                    body: []
                  },
                  expression: False
                }
              }
            },
            {
              type: "JSXAttribute",
              name: {
                type: "JSXIdentifier",
                name: "jsonAttribute"
              },
              value: {
                type: "JSXExpressionContainer",
                expression: {
                  type: "ObjectExpression",
                  properties: [
                    {
                      type: "Property",
                      key: {
                        type: "Identifier",
                        name: "key"
                      },
                      computed: False,
                      value: {
                        type: "Identifier",
                        name: "value"
                      },
                      kind: "init",
                      method: False,
                      shorthand: False
                    },
                    {
                      type: "Property",
                      key: {
                        type: "Identifier",
                        name: "key2"
                      },
                      computed: False,
                      value: {
                        type: "Identifier",
                        name: "value2"
                      },
                      kind: "init",
                      method: False,
                      shorthand: False
                    }
                  ]
                }
              }
            }
          ]
        },
        children: []
      }
    }
  ]
}
From there, it's just a matter of some tree traversal.
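As a rough idea of what that traversal could look like, here is a minimal sketch that works on the AST once it is in plain dict/list form. The helper names `iter_nodes` and `jsx_attributes` are made up for this example, and it assumes you have converted the esprima result to nested dicts first (the Python port's nodes appear to offer a `toDict()` method for that, but check your installed version). The `sample` tree is a hand-written excerpt shaped like the output above, so the snippet runs without esprima.

```python
from typing import Any, Dict, Iterator, List


def iter_nodes(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict carrying a "type" key in a nested dict/list tree."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for child in node.values():
            yield from iter_nodes(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_nodes(child)


def jsx_attributes(tree: Any) -> List[Dict[str, Any]]:
    """Collect (element, attribute, value type) for every JSX opening element."""
    found = []
    for node in iter_nodes(tree):
        if node.get("type") != "JSXOpeningElement":
            continue
        for attr in node.get("attributes", []):
            if attr.get("type") != "JSXAttribute":
                continue  # skip anything else, e.g. spread attributes
            value = attr.get("value")  # absent/None for bare boolean attributes
            found.append({
                "element": node["name"].get("name"),
                "attribute": attr["name"].get("name"),
                "value_type": None if value is None else value.get("type"),
            })
    return found


# Hand-written excerpt of the AST shown above (assumption: plain-dict form).
sample = {
    "type": "JSXElement",
    "openingElement": {
        "type": "JSXOpeningElement",
        "name": {"type": "JSXIdentifier", "name": "Component"},
        "selfClosing": True,
        "attributes": [
            {
                "type": "JSXAttribute",
                "name": {"type": "JSXIdentifier", "name": "secondAttribute"},
                "value": {"type": "Literal", "value": "stringValue"},
            },
            {
                "type": "JSXAttribute",
                "name": {"type": "JSXIdentifier", "name": "booleanAttribute"},
            },
        ],
    },
    "children": [],
}

for item in jsx_attributes(sample):
    print(item)
```

Because the walk is generic over dicts and lists, it also finds JSX elements nested inside handlers or child elements; policy checks can then be written against the collected attribute records instead of raw source text.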