Transform Microsoft Excel .xls (NOT .xlsx) with Mule 4
Question
We need Mule 4.3 to transform an old MS Excel 2003 file to CSV format. We have no control over the source system, so receiving the file in another format is not an option.
The Mule documentation states that the older format is not supported, and there doesn't seem to be a workaround online:
> Only .xlsx files are supported (Excel 2007). .xls files are not supported by Mule.
However, I'm wondering whether we could do something like this:
- Read the input as a 'blob' with no associated format
- Immediately hand it off to the Mule 4 Java module
- Have the Java module use a Java library for the transformation
- Pass the transformed payload back to DataWeave as CSV
Is this possible at all?
Or can you suggest another solution?
Answer 1
Score: 1
This was achieved as proposed in the question.
Add the Mule Java Module dependency, and Apache POI for handling the Microsoft xls file:
<dependency>
    <groupId>org.mule.module</groupId>
    <artifactId>mule-java-module</artifactId>
    <version>1.2.5</version>
    <classifier>mule-plugin</classifier>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.2</version>
</dependency>
The file was then read by an On New or Updated File source with no MIME type or encoding configured. This is because we want to avoid Mule knowing anything about the file type. The file could of course be received in any other way, e.g. over FTP.
At this point the payload just looks like a pile of gobbledygook (the raw xls file).
The source (file reader) is immediately followed by a transform that converts the payload to plain text and Base64-encodes it:
%dw 2.0
import * from dw::core::Binaries
output text/plain
---
toBase64(payload as Binary)
This is done because we initially had a lot of trouble passing the raw file to Java (if you have a better solution, let me know!), with errors such as:
- Cannot convert so and so to object
- invalid end of file
This makes sense: Java doesn't know what we are passing it, so it cannot know which object type to convert it to.
Next, we instantiate the Java class with Mule's Java 'New' operation. The class itself looks like this:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Base64;

import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class Transformer {
    public String transform(String file) {
        String cellValue = "";
        // Decode the Base64 text, stream the bytes to an InputStream (as if we were
        // reading the file from disk) and create the .xls Apache POI object.
        // Both resources are closed automatically when the block exits.
        try (InputStream targetStream = new ByteArrayInputStream(Base64.getDecoder().decode(file));
             HSSFWorkbook workbook = new HSSFWorkbook(targetStream)) {
            // Process the rows/cells etc...
            HSSFSheet sheet = workbook.getSheetAt(0);
            // For example, read the first cell of the first row
            // (assumes that cell exists and holds text)
            cellValue = sheet.getRow(0).getCell(0).getStringCellValue();
        } catch (Exception e) {
            System.out.println("FAIL: " + e.getMessage());
        }
        return cellValue;
    }
}
Next, we pass the payload to this method with Mule's Java 'Invoke' operation, configured as follows:
- Instance: vars.instanceName
- Args: {arg0: payload as String}
- Class: the package and class name of the Java class
- Method: the method to invoke; ours was transform(java.lang.String)
The payload is passed as a String because Java knows how to handle a String object; in effect, we are hiding the fact that it is a file.
From there, Java does the following (see the Java class above):
- Decodes the file
- Reads it into an InputStream
- Creates the Apache POI workbook object
- Performs the transformation
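The decoding steps listed above can also be guarded with a quick sanity check before the bytes reach Apache POI. The following is only a sketch: the class name XlsPayloads and its methods are hypothetical and not part of the original flow. It relies on the fact that legacy .xls workbooks are stored in OLE2 compound files, which begin with a fixed 8-byte signature. Other OLE2 documents (old .doc files, for example) share that signature, so the check can only rule files out, not confirm that they are workbooks.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Base64;

// Hypothetical helper, not part of the original answer.
final class XlsPayloads {
    // Signature at the start of every OLE2 compound file (the container used by .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private XlsPayloads() {}

    // Decodes the Base64 text produced by the DataWeave step back into raw bytes.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64.trim());
    }

    // Returns true when the data starts with the OLE2 signature.
    static boolean hasOle2Signature(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (data[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes, validates and wraps the payload in a stream that could be
    // handed to new HSSFWorkbook(stream).
    static InputStream toStream(String base64) {
        byte[] data = decode(base64);
        if (!hasOle2Signature(data)) {
            throw new IllegalArgumentException("Payload does not start with the OLE2 signature");
        }
        return new ByteArrayInputStream(data);
    }
}
```

Failing early with a clear message like this may be easier to diagnose than an "invalid end of file" style error raised deeper inside the workbook parser.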
In the sample above, we just return the value of one cell to Mule as a String. You can also build a CSV-style String such as a,b,c\nd,e,f (\n for a new line) and then convert it to CSV with a Transform Message component:
%dw 2.0
output application/java
---
write( (read(payload,"application/csv",{"header" : false})),"application/csv",{"quoteValues" : "false","header" : false})
This would output a CSV file like this:
a,b,c
d,e,f
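A CSV string like the one above could be assembled on the Java side along these lines. This is a minimal sketch, not the original author's code: the CsvText class is hypothetical, and it assumes the cell values have already been read from the sheet into a list of rows. Values containing a comma, a double quote or a line break are wrapped in quotes, with embedded quotes doubled, which is the common CSV convention. The DataWeave reader's quote and escape settings may need to be aligned with this, so check them before relying on it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
final class CsvText {
    private CsvText() {}

    // Quotes a value only when it contains a comma, a double quote or a line break;
    // embedded double quotes are doubled. Null becomes an empty field.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    // Joins the cells of each row with commas and the rows with \n.
    static String toCsv(List<List<String>> rows) {
        return rows.stream()
                .map(row -> row.stream().map(CsvText::escape).collect(Collectors.joining(",")))
                .collect(Collectors.joining("\n"));
    }
}
```

With two rows holding a, b, c and d, e, f, toCsv returns the a,b,c\nd,e,f string used in the example, which the transform method could return instead of a single cell value.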
And there you have it. Mule can now process a Microsoft Excel xls file.
Answer 2
Score: 0
Yes, you can pass an InputStream to a Java module method invocation and use, for example, Apache POI (which can read both xls and xlsx) to convert your stream to CSV.