How to get attributes of a JSX element using Python regular expressions?

Question

I want to write a simple parser for JSX code. I need it for static analysis, to enforce some of our company's policies.

These are some of the forms an element can take:

<Component firstAttribute={dynamicValue} />

<Component firstAttribute="stringValue" secondAttribute={integerValue} />

<Component
   firstAttribute={dynamicValue}
   secondAttribute="stringValue"
   thirdAttribute={numericValue}
   booleanAttribute
/>

<Component
    handlerAttribute={() => {
        // JS code here
    }}
/>

<Component
   jsonAttribute={{
      key: value,
      key2: value2,
   }}
/>

As you can see, it's really a mess, and these are not the only possible code snippets.

For simple attributes, it's not a big deal. I can easily extract them using this regular expression:

<\w+(\s+[\w*]*=['\"\{][^'\"\}]+['\"\}])*

But it gets really complicated when JSX and JS are mixed or when developers use string interpolation to provide a dynamic value for an attribute.

It seems to me that I'm not on the right path and that regular expressions are not the right tool here.

Am I on the right path? How should I deal with this complexity? How can I extract attributes of elements reliably?

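Answer 1

Score: 2

Regular expressions won't give you good, reliable results here; you need a parser. Attribute values can hold arbitrary JavaScript with nested braces, quotes, and comments, and a single pattern like the one in the question cannot keep those balanced.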
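To see the problem concretely, here is a small sketch that runs the pattern from the question against one simple tag and one tag with a nested object literal. The pattern is copied from the question; the two sample strings and the name ATTR_RE are made up for illustration.

```python
import re

# Pattern copied from the question (ATTR_RE is just an illustrative name).
ATTR_RE = re.compile(r"<\w+(\s+[\w*]*=['\"\{][^'\"\}]+['\"\}])*")

# Hypothetical inputs: one flat tag, one with a nested {{ ... }} value.
simple = '<Component firstAttribute="stringValue" secondAttribute={integerValue} />'
nested = '<Component jsonAttribute={{ key: value }} other="x" />'

simple_match = ATTR_RE.match(simple).group(0)
nested_match = ATTR_RE.match(nested).group(0)

# Both simple attributes are covered by the match.
print(simple_match)

# The match ends at the first "}", so the second "}" and other="x" are lost.
print(nested_match)
```

On the nested input the match stops at the first closing brace, so every attribute after the object literal is silently dropped.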

Happily, esprima is a thing, and its Python port can parse JSX when the jsx option is enabled:

from pprint import pprint

import esprima

code = """
<Component
   firstAttribute={dynamicValue}
   secondAttribute="stringValue"
   thirdAttribute={numericValue}
   booleanAttribute
    handlerAttribute={() => {
        // JS code here
    }}
   jsonAttribute={{
      key: value,
      key2: value2,
   }}
/>
"""

# The "jsx" option turns on JSX support in the parser.
pprint(esprima.parseScript(code, {"jsx": True}))

This prints the abstract syntax tree (AST) for the code, e.g.:

{
    type: "Program",
    sourceType: "script",
    body: [
        {
            type: "ExpressionStatement",
            expression: {
                type: "JSXElement",
                openingElement: {
                    type: "JSXOpeningElement",
                    name: {
                        type: "JSXIdentifier",
                        name: "Component"
                    },
                    selfClosing: True,
                    attributes: [
                        {
                            type: "JSXAttribute",
                            name: {
                                type: "JSXIdentifier",
                                name: "firstAttribute"
                            },
                            value: {
                                type: "JSXExpressionContainer",
                                expression: {
                                    type: "Identifier",
                                    name: "dynamicValue"
                                }
                            }
                        },
                        {
                            type: "JSXAttribute",
                            name: {
                                type: "JSXIdentifier",
                                name: "secondAttribute"
                            },
                            value: {
                                type: "Literal",
                                value: "stringValue",
                                raw: "\"stringValue\""
                            }
                        },
                        {
                            type: "JSXAttribute",
                            name: {
                                type: "JSXIdentifier",
                                name: "thirdAttribute"
                            },
                            value: {
                                type: "JSXExpressionContainer",
                                expression: {
                                    type: "Identifier",
                                    name: "numericValue"
                                }
                            }
                        },
                        {
                            type: "JSXAttribute",
                            name: {
                                type: "JSXIdentifier",
                                name: "booleanAttribute"
                            }
                        },
                        {
                            type: "JSXAttribute",
                            name: {
                                type: "JSXIdentifier",
                                name: "handlerAttribute"
                            },
                            value: {
                                type: "JSXExpressionContainer",
                                expression: {
                                    type: "ArrowFunctionExpression",
                                    generator: False,
                                    isAsync: False,
                                    params: [],
                                    body: {
                                        type: "BlockStatement",
                                        body: []
                                    },
                                    expression: False
                                }
                            }
                        },
                        {
                            type: "JSXAttribute",
                            name: {
                                type: "JSXIdentifier",
                                name: "jsonAttribute"
                            },
                            value: {
                                type: "JSXExpressionContainer",
                                expression: {
                                    type: "ObjectExpression",
                                    properties: [
                                        {
                                            type: "Property",
                                            key: {
                                                type: "Identifier",
                                                name: "key"
                                            },
                                            computed: False,
                                            value: {
                                                type: "Identifier",
                                                name: "value"
                                            },
                                            kind: "init",
                                            method: False,
                                            shorthand: False
                                        },
                                        {
                                            type: "Property",
                                            key: {
                                                type: "Identifier",
                                                name: "key2"
                                            },
                                            computed: False,
                                            value: {
                                                type: "Identifier",
                                                name: "value2"
                                            },
                                            kind: "init",
                                            method: False,
                                            shorthand: False
                                        }
                                    ]
                                }
                            }
                        }
                    ]
                },
                children: []
            }
        }
    ]
}

From there, it's just a matter of some tree traversal.
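A minimal sketch of that traversal, assuming the tree is available as plain Python dicts and lists shaped like the output above. How to get from esprima's node objects to plain dicts is not shown and is left as an assumption. SAMPLE_AST is a hand-trimmed copy of the printed tree, and iter_jsx_attributes and describe_value are hypothetical helper names.

```python
from typing import Any, Iterator, Optional, Tuple


def iter_jsx_attributes(node: Any) -> Iterator[Tuple[str, str, Optional[dict]]]:
    """Yield (element_name, attribute_name, value_node) for every JSX attribute.

    Assumes `node` is a dict/list tree shaped like the AST printed above.
    Spread attributes ({...props}) are skipped in this sketch.
    """
    if isinstance(node, list):
        for item in node:
            yield from iter_jsx_attributes(item)
        return
    if not isinstance(node, dict):
        return  # strings, booleans, None: nothing to walk
    if node.get("type") == "JSXOpeningElement":
        element_name = node["name"].get("name", "?")
        for attr in node.get("attributes", []):
            if attr.get("type") == "JSXAttribute":
                yield element_name, attr["name"].get("name"), attr.get("value")
    # Keep walking: attribute values and children may contain nested JSX.
    for child in node.values():
        yield from iter_jsx_attributes(child)


def describe_value(value: Optional[dict]) -> str:
    """Summarize an attribute value node as a short string."""
    if value is None:
        return "boolean shorthand"  # e.g. <Component booleanAttribute />
    if value.get("type") == "Literal":
        return "literal " + repr(value.get("value"))
    if value.get("type") == "JSXExpressionContainer":
        return "expression " + value["expression"]["type"]
    return value.get("type", "unknown")


# Hand-trimmed copy of the AST shown above (three of the six attributes).
SAMPLE_AST = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "JSXElement",
            "openingElement": {
                "type": "JSXOpeningElement",
                "name": {"type": "JSXIdentifier", "name": "Component"},
                "selfClosing": True,
                "attributes": [
                    {"type": "JSXAttribute",
                     "name": {"type": "JSXIdentifier", "name": "firstAttribute"},
                     "value": {"type": "JSXExpressionContainer",
                               "expression": {"type": "Identifier",
                                              "name": "dynamicValue"}}},
                    {"type": "JSXAttribute",
                     "name": {"type": "JSXIdentifier", "name": "secondAttribute"},
                     "value": {"type": "Literal", "value": "stringValue",
                               "raw": '"stringValue"'}},
                    {"type": "JSXAttribute",
                     "name": {"type": "JSXIdentifier", "name": "booleanAttribute"}},
                ],
            },
            "children": [],
        },
    }],
}

found = [(element, name, describe_value(value))
         for element, name, value in iter_jsx_attributes(SAMPLE_AST)]
for row in found:
    print(row)
```

A policy check then becomes an ordinary loop over these tuples, for example flagging any element that lacks a required attribute or passes a string literal where an expression is expected.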

huangapple
  • Posted on June 2, 2023, 14:35:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76387687.html