Omit tags from the text using Golang?

Question

I want to build a template using Go, and I want to omit some tags from an Excel-style XML document (similar to xlsx). The XML source looks like this:

  input := `<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?> 
  <Worksheet ss:Name="sheet1">
  <Names>
   <NamedRange ss:Name="_FilterDatabase" ss:RefersTo="=sheet!R3C1:R3C13"
    ss:Hidden="1"/>
  </Names>
  <Table ss:ExpandedColumnCount="15" ss:ExpandedRowCount="7" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="52.8" ss:DefaultRowHeight="15.45">
   <Column ss:AutoFitWidth="0" ss:Width="37.200000000000003"/>
   <Column ss:AutoFitWidth="0" ss:Width="67.2"/>
   <Column ss:AutoFitWidth="0" ss:Width="75.600000000000009"/>
   <Column ss:AutoFitWidth="0" ss:Width="71.400000000000006"/>  
 
   <Row ss:AutoFitHeight="0">
    <Cell ss:MergeAcross="12" ss:MergeDown="1" ss:StyleID="s63"><Data
      ss:Type="String">This is a title of the sheet!</Data></Cell>
   </Row>

   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s69"><Data ss:Type="String">{{range $prj:=.prj}}</Data></Cell>
    <Cell ss:StyleID="s70"/>
    <Cell ss:StyleID="s70"/>
    <Cell ss:StyleID="s70"/>
   </Row>
   <Row ss:AutoFitHeight="0" ss:Height="45.449999999999996">
    <Cell ss:StyleID="s72"/>
    <Cell ss:StyleID="s70"><Data ss:Type="String">{{$prj.PrjName}}</Data></Cell>
    <Cell ss:StyleID="s70"><Data ss:Type="String">{{$prj.ConstrDept}}</Data></Cell>
    <Cell ss:StyleID="s71"><Data ss:Type="String">{{$prj.Assumer}}</Data></Cell>
    <Cell ss:StyleID="s71"><Data ss:Type="String">{{$prj.ReplyNo}}</Data></Cell>
    <Cell ss:StyleID="s71"><Data ss:Type="String">{{$prj.AnPingNo}}</Data></Cell>   
   
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s73"><Data ss:Type="String">{{end}}</Data></Cell>
   </Row>
  </Table> 
 </Worksheet>
</Workbook>`

I want to achieve the following:

1.

   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s69"><Data ss:Type="String">{{range $prj:=.prj}}</Data></Cell>
    <Cell ss:StyleID="s70"/>
    <Cell ss:StyleID="s70"/>
    <Cell ss:StyleID="s70"/>
   </Row>

In this row, I only want to keep "`{{range $prj:=.prj}}`" and omit the tags around it, including "`<Row>`".

2.

   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s73"><Data ss:Type="String">{{end}}</Data></Cell>
   </Row>

In this row, I only want to keep "`{{end}}`" and omit the tags around it, including "`<Row>`".

3. All other tags should remain as they are. How can I do this?

Answer 1

Score: 1

If nested {{}} can be ignored, you could use a regexp like {{[^}]*?}} with Regexp.FindAllString().
This example extracts the template actions from the input:

re := regexp.MustCompile("{{[^}]*?}}")
res := re.FindAllString(input, -1)
for _, s := range res {
	fmt.Println(s)
}

Output:

{{range $prj:=.prj}}
{{$prj.PrjName}}
{{$prj.ConstrDept}}
{{$prj.Assumer}}
...
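
The same regular-expression idea can be stretched toward the original goal of dropping the <Row> wrapper around the control actions. The following is only a sketch and not part of the original answer: the names collapseControlRows, rowRe, and controlRe are hypothetical, and it assumes that rows are never nested or self-closing and that only {{range ...}} and {{end}} need unwrapping.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// rowRe matches one <Row ...>...</Row> element; (?s) lets . span newlines.
	rowRe = regexp.MustCompile(`(?s)<Row\b[^>]*>.*?</Row>`)
	// controlRe matches {{range ...}} or {{end}}, the actions to unwrap.
	controlRe = regexp.MustCompile(`\{\{\s*(?:range\b[^}]*|end)\s*\}\}`)
)

// collapseControlRows replaces every row that holds a control action
// with the action alone and leaves all other rows untouched.
func collapseControlRows(doc string) string {
	return rowRe.ReplaceAllStringFunc(doc, func(row string) string {
		if action := controlRe.FindString(row); action != "" {
			return action
		}
		return row
	})
}

func main() {
	// Shortened stand-in for the worksheet in the question.
	doc := `<Table>
<Row><Cell><Data>{{range $prj:=.prj}}</Data></Cell><Cell/></Row>
<Row><Cell><Data>{{$prj.PrjName}}</Data></Cell></Row>
<Row><Cell><Data>{{end}}</Data></Cell></Row>
</Table>`
	fmt.Println(collapseControlRows(doc))
}
```

With this input, the two control rows collapse to the bare actions while the data row keeps its tags, so the result can be handed to text/template. If a row could contain several actions, or rows could be self-closing, this approach would need more care.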

But if the data depends on its position in the XML structure, then a simple regexp is ill-suited for the task (obligatory reference to "The Center cannot Hold").

An XML unmarshaller or Decoder would be better (see pkg/encoding/xml/), using techniques like "Parse both XML element value and attributes for groups".

See this example:

type Data struct {
	Type  string `xml:"Type,attr"`
	Value string `xml:",chardata"`
}
type Cell struct {
	StyleID string `xml:"StyleID,attr"`
	Data    Data
}
type Row struct {
	Afh    string `xml:"AutoFitHeight,attr"`
	Height string `xml:"Height,attr"`
	Cells  []Cell `xml:"Cell"`
}
type Column struct{}
type Table struct {
	Rows []Row `xml:"Row"`
}
type Worksheet struct {
	Table Table `xml:"Table"`
}
// w is already a pointer, so it can be passed to Unmarshal directly.
w := &Worksheet{}
if err := xml.Unmarshal([]byte(input), w); err != nil {
	fmt.Printf("error: %v\n", err)
	return
}
fmt.Printf("%+v\n", w)

That extracts all cells; you can then filter for the ones whose Data you want (those containing {{}}).

Output:

&{Table:
    { Rows:[
    {Afh:0 Height: 
    Cells:[
        {StyleID:s63 Data:{Type:String Value:This is a title of the sheet!}}
    ]} 
    {Afh:0 Height: 
    Cells:[
        {StyleID:s69 Data:{Type:String Value:{{range $prj:=.prj}}}} 
        {StyleID:s70 Data:{Type: Value:}}
    ]} 
    {Afh:0 Height:45.449999999999996 
    Cells:[
        {StyleID:s72 Data:{Type: Value:}} 
        {StyleID:s70 Data:{Type:String Value:{{$prj.PrjName}}}} 
        {StyleID:s70 Data:{Type:String Value:{{$prj.ConstrDept}}}} 
        {StyleID:s71 Data:{Type:String Value:{{$prj.Assumer}}}}
    ]} 
    {Afh:0 Height: 
    Cells:[
        {StyleID:s73 Data:{Type:String Value:{{end}}}}
    ]}
]}}
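
The filtering step mentioned above could look roughly like this. It is a minimal sketch, not code from the original answer: templateActions is a hypothetical helper, the structs are trimmed to the fields the filter needs, and the sample document is a shortened stand-in for the worksheet in the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type Data struct {
	Value string `xml:",chardata"`
}
type Cell struct {
	StyleID string `xml:"StyleID,attr"`
	Data    Data
}
type Row struct {
	Cells []Cell `xml:"Cell"`
}
type Table struct {
	Rows []Row `xml:"Row"`
}
type Worksheet struct {
	Table Table `xml:"Table"`
}

// templateActions returns the text of every cell whose data contains a {{...}} action.
func templateActions(doc string) ([]string, error) {
	var w Worksheet
	if err := xml.Unmarshal([]byte(doc), &w); err != nil {
		return nil, err
	}
	var out []string
	for _, row := range w.Table.Rows {
		for _, cell := range row.Cells {
			v := strings.TrimSpace(cell.Data.Value)
			if strings.Contains(v, "{{") && strings.Contains(v, "}}") {
				out = append(out, v)
			}
		}
	}
	return out, nil
}

func main() {
	doc := `<Worksheet ss:Name="sheet1"><Table>
<Row><Cell ss:StyleID="s63"><Data ss:Type="String">This is a title of the sheet!</Data></Cell></Row>
<Row><Cell ss:StyleID="s69"><Data ss:Type="String">{{range $prj:=.prj}}</Data></Cell><Cell ss:StyleID="s70"/></Row>
<Row><Cell ss:StyleID="s73"><Data ss:Type="String">{{end}}</Data></Cell></Row>
</Table></Worksheet>`
	actions, err := templateActions(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range actions {
		fmt.Println(a)
	}
}
```

Because the rows and cells are walked in document order, the same loop could also record which row each action came from, which is the structural context a plain regexp cannot see.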

huangapple
  • Posted on January 22, 2015 at 14:54:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/28082879.html