Docx4j functionality to turn a document into JSON representation?

Question

Is there a good way to convert a document into a JSON representation so that it can then be displayed on a web page? (Converting the document to JSON is a requirement.)

If there is no built-in way to do this, my idea is to represent the Run/Paragraph structure as JSON objects, but I suspect that approach would not hold up well once I start working with more complex Word documents.
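A minimal sketch of what the Run/Paragraph idea might look like, for illustration only. It assumes the main document part is already available as an XML string (for example from docx4j) and uses only the JDK's DOM parser. The class name `ParagraphRunSketch`, its methods, and the `runs`/`text`/`bold` keys are hypothetical names, not part of docx4j.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reduces WordprocessingML to a paragraph/run model.
class ParagraphRunSketch {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one map per w:p, each holding a "runs" list of {text, bold} maps. */
    static List<Map<String, Object>> paragraphs(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            List<Map<String, Object>> result = new ArrayList<>();
            // Document order; this also picks up paragraphs inside table cells.
            // Limitation: paragraphs nested in text boxes would be visited twice.
            NodeList paragraphNodes = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphNodes.getLength(); i++) {
                Element paragraph = (Element) paragraphNodes.item(i);
                List<Map<String, Object>> runs = new ArrayList<>();
                NodeList runNodes = paragraph.getElementsByTagNameNS(W_NS, "r");
                for (int j = 0; j < runNodes.getLength(); j++) {
                    runs.add(run((Element) runNodes.item(j)));
                }
                Map<String, Object> entry = new LinkedHashMap<>();
                entry.put("runs", runs);
                result.add(entry);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    static Map<String, Object> run(Element run) {
        StringBuilder text = new StringBuilder();
        NodeList textNodes = run.getElementsByTagNameNS(W_NS, "t");
        for (int k = 0; k < textNodes.getLength(); k++) {
            text.append(textNodes.item(k).getTextContent());
        }
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("text", text.toString());
        entry.put("bold", isBold(run));
        return entry;
    }

    /** Only looks at direct run formatting; bold inherited from styles is not resolved. */
    static boolean isBold(Element run) {
        NodeList bold = run.getElementsByTagNameNS(W_NS, "b");
        if (bold.getLength() == 0) {
            return false;
        }
        String val = ((Element) bold.item(0)).getAttributeNS(W_NS, "val");
        return !("0".equals(val) || "false".equals(val));
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
                + "<w:r><w:t xml:space=\"preserve\"> world</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        System.out.println(paragraphs(xml));
    }
}
```

The resulting list of maps can be handed to any JSON library for serialization. The sketch also hints at why the concern about complex documents is reasonable: tables, numbering, styles, images and text boxes would each need their own handling.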

Answer 1

Score: 1

If you add:

<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-xml</artifactId>
    <version>2.11.3</version>
</dependency>

you can try something like this:

import java.io.File;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class ConvertOutJSON {

    // Path to the input .docx, relative to the working directory.
    static String inputfilepath = System.getProperty("user.dir") + "/sample-docs/sample-docxv2.docx";

    public static void main(String[] args) throws Exception {

        // Load the docx package and get the main document part as an XML string.
        WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(inputfilepath));
        String xml = wordMLPackage.getMainDocumentPart().getXML();
        //System.out.println(xml);

        // Parse the XML into Jackson's generic tree model.
        XmlMapper xmlMapper = new XmlMapper();
        JsonNode node = xmlMapper.readTree(xml);

        // Serialize the tree as pretty-printed JSON.
        ObjectMapper jsonMapper = new ObjectMapper();
        //String json = jsonMapper.writeValueAsString(node);
        String json = jsonMapper.writerWithDefaultPrettyPrinter().writeValueAsString(node);

        System.out.println(json);
    }
}

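However, in a quick test I noticed that some w:p nodes were not emitted as JSON. I haven't checked whether Jackson drops them at the readTree step or when ObjectMapper writes its output; you'll need to dig into Jackson to fix that.

It currently produces output like: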

{
  "Ignorable" : "w14 wp14",
  "body" : {
    "p" : {
      "rsidR" : "00D15781",
      "rsidRDefault" : "00D15781",
      "pPr" : {
        "ind" : {
          "left" : "0"
        }
      }
    },
    "tbl" : {
      "tblPr" : {
        "tblStyle" : {
          "val" : "TableGrid"
        },
        "tblW" : {
          "w" : "0",
          "type" : "auto"
        },
        "tblLook" : {
          "firstRow" : "1",
          "lastRow" : "0",
          "firstColumn" : "1",
          "lastColumn" : "0",
          "noHBand" : "0",
          "noVBand" : "1",
          "val" : "04A0"
        }
      },
      "tblGrid" : {
        "gridCol" : {
          "w" : "3561"
        }
      },
      "tr" : {
        "rsidR" : "00D15781",
        "tc" : {
          "tcPr" : {
            "tcW" : {
              "w" : "7122",
              "type" : "dxa"
            },
            "gridSpan" : {
              "val" : "2"
            }
          },
          "p" : {
            "rsidR" : "00D15781",
            "rsidRDefault" : "00945132",
            "pPr" : {
              "ind" : {
                "left" : "0"
              }
            },
            "r" : {
              "t" : "Horizontal merge"
            }
          }
        }
      }
    },
    "sectPr" : {
      "rsidR" : "00D15781",
      "headerReference" : {
        "type" : "default",
        "id" : "rId12"
      },
      "pgSz" : {
        "w" : "11907",
        "h" : "16839",
        "code" : "9"
      },
      "pgMar" : {
        "top" : "720",
        "right" : "720",
        "bottom" : "720",
        "left" : "720",
        "header" : "720",
        "footer" : "720",
        "gutter" : "0"
      },
      "cols" : {
        "space" : "720"
      },
      "docGrid" : {
        "linePitch" : "360"
      }
    }
  }
}
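One possible explanation for the missing w:p nodes, offered as an assumption rather than something verified against this document: a JSON object holds one value per key, so repeated sibling elements with the same name can overwrite each other when they are read into a tree. The output above is consistent with that, since each object has at most one `p`, `tr`, `tc` or `gridCol` key. Newer jackson-dataformat-xml releases (2.12 and later, as far as I know) may gather repeated elements into arrays, which is worth checking before writing custom code. The sketch below shows the general technique using only the JDK's DOM parser; `RepeatedElementsSketch` and its methods are hypothetical names, and the `#text` key is an arbitrary choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j or Jackson.
class RepeatedElementsSketch {

    /** Parses an XML string into nested Maps, Lists and Strings. */
    static Object toTree(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return convert(factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static Object convert(Element element) {
        Map<String, Object> object = new LinkedHashMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            String name = attribute.getNodeName();
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                continue; // namespace declarations are not document content
            }
            add(object, attribute.getLocalName(), attribute.getNodeValue());
        }
        boolean hasChildElements = false;
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElements = true;
                add(object, child.getLocalName(), convert((Element) child));
            } else if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        String content = text.toString();
        if (object.isEmpty()) {
            return content; // leaf element without attributes becomes a plain string
        }
        // Keep text verbatim for leaves; ignore indentation between child elements.
        boolean keepText = hasChildElements ? !content.trim().isEmpty() : !content.isEmpty();
        if (keepText) {
            add(object, "#text", content);
        }
        return object;
    }

    /** Stores a value; a second value under the same key turns the entry into a list. */
    @SuppressWarnings("unchecked")
    static void add(Map<String, Object> object, String key, Object value) {
        Object existing = object.get(key);
        if (existing == null) {
            object.put(key, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            object.put(key, values);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:body><w:p><w:r><w:t>One</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Two</w:t></w:r></w:p></w:body></w:document>";
        System.out.println(toTree(xml));
    }
}
```

Because the result is built from plain Maps, Lists and Strings, it could be passed to the `jsonMapper.writerWithDefaultPrettyPrinter().writeValueAsString(...)` call from the answer in place of `node`. One trade-off of this design: an element that occurs once becomes an object and one that occurs several times becomes an array, so the consuming web page has to handle both shapes.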

huangapple
  • Posted on October 6, 2020 at 05:19:03
  • Please keep the link to this article when reposting: https://go.coder-hub.com/64216316.html