Convert an XML file with a nested hierarchy stored in Azure Data Lake to CSV using a C# Azure Function

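Question

I have an XML file with the structure shown below that I need to convert to CSV using a C# Azure Function. The XML file is located in Azure Data Lake. The structure of the file is as follows.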

<root id="1" created_date="01/01/2023" asof_date="01/01/2023">
    <level1>
        <data1>sdfs</data1>
        <data2>true</data2>
        <level2 rec="4">
            <level_record>
                <groupid>1</groupid>
                <groupname>somegroup</groupname>
                <groupdate>01/01/2023</groupdate>
                <groupvalue>5</groupvalue>
                <groupkey>ag55</groupkey>
            </level_record>  
            <level_record>
                <groupid>2</groupid>
                <groupname>somegroup1</groupname>
                <groupdate>02/01/2023</groupdate>
                <groupvalue>6</groupvalue>
                <groupkey>ag56</groupkey>
            </level_record> 
       </level2> 
    </level1>
</root>

How do I read the file from Azure Data Lake and convert it to a CSV file?

Answer 1

Score: 0

Here is an example of an Azure Function in C# that reads an XML file from Azure Data Lake Storage and converts it to a CSV file:

using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

namespace YourNamespace
{
    public static class ConvertXmlToCsvFunction
    {
        // Column order of the CSV output; matches the child elements of <level_record>
        private static readonly string[] Columns = { "groupid", "groupname", "groupdate", "groupvalue", "groupkey" };

        [Function("ConvertXmlToCsvFunction")]
        public static void Run([BlobTrigger("your-container/{name}", Connection = "AzureWebJobsStorage")] Stream xmlStream, string name, FunctionContext context)
        {
            var logger = context.GetLogger("ConvertXmlToCsvFunction");
            logger.LogInformation($"Processing file: {name}");

            try
            {
                // Parse the XML directly from the blob stream
                XDocument xDoc = XDocument.Load(xmlStream);

                // Navigate root -> level1 -> level2; a missing element yields null
                XElement level2Element = xDoc.Element("root")?.Element("level1")?.Element("level2");
                if (level2Element == null)
                {
                    throw new InvalidDataException($"'{name}' does not contain root/level1/level2.");
                }

                // Create the CSV header
                var csv = new StringBuilder();
                csv.Append(string.Join(",", Columns)).Append('\n');

                // Iterate over level_record elements and append one CSV row per record
                foreach (XElement recordElement in level2Element.Elements("level_record"))
                {
                    // The string cast returns null (written as an empty field) when a child element is missing
                    csv.Append(string.Join(",", Columns.Select(c => Escape((string)recordElement.Element(c))))).Append('\n');
                }

                // Save the CSV content to a file in the host's temp directory
                string csvFileName = Path.ChangeExtension(Path.GetFileName(name), "csv");
                string csvFilePath = Path.Combine(Path.GetTempPath(), csvFileName);
                File.WriteAllText(csvFilePath, csv.ToString());

                logger.LogInformation($"CSV file created: {csvFilePath}");
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "An error occurred while converting {Name}", name);
                throw;
            }
        }

        // Quote a field if it contains a comma, a double quote, or a line break
        private static string Escape(string value)
        {
            if (string.IsNullOrEmpty(value)) return string.Empty;
            return value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
                ? "\"" + value.Replace("\"", "\"\"") + "\""
                : value;
        }
    }
}

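Note that this is sample C# code for an Azure Function that converts the XML file to a CSV file. As written, it saves the CSV to the function host's temporary directory, not back to Azure Data Lake.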
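
The function above is triggered by a blob and leaves its output in a temp directory. If the goal is to read the XML from the lake and write the CSV back next to it, the following is a minimal sketch under stated assumptions: it uses the Azure.Storage.Files.DataLake NuGet package, and the class name DataLakeXmlToCsv, the helper ToCsvPath, and the parameters connectionString, fileSystemName and xmlPath are all hypothetical placeholders, not part of the original answer. It also assumes field values contain no commas or quotes.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Xml.Linq;
using Azure.Storage.Files.DataLake; // NuGet package: Azure.Storage.Files.DataLake

public static class DataLakeXmlToCsv
{
    // Hypothetical helper: "incoming/test.xml" becomes "incoming/test.csv".
    public static string ToCsvPath(string xmlPath) => Path.ChangeExtension(xmlPath, "csv");

    // connectionString, fileSystemName and xmlPath are placeholders supplied by the caller.
    public static async Task ConvertAsync(string connectionString, string fileSystemName, string xmlPath)
    {
        var service = new DataLakeServiceClient(connectionString);
        DataLakeFileSystemClient fileSystem = service.GetFileSystemClient(fileSystemName);

        // Download and parse the XML file.
        XDocument doc;
        using (Stream input = await fileSystem.GetFileClient(xmlPath).OpenReadAsync())
        {
            doc = await XDocument.LoadAsync(input, LoadOptions.None, CancellationToken.None);
        }

        // One CSV row per <level_record>; the header assumes the element order shown in the question.
        var csv = new StringBuilder("groupid,groupname,groupdate,groupvalue,groupkey\n");
        foreach (XElement record in doc.Descendants("level_record"))
        {
            csv.Append(string.Join(",", record.Elements().Select(e => e.Value))).Append('\n');
        }

        // Upload the CSV next to the source file, replacing any previous output.
        using (var output = new MemoryStream(Encoding.UTF8.GetBytes(csv.ToString())))
        {
            await fileSystem.GetFileClient(ToCsvPath(xmlPath)).UploadAsync(output, overwrite: true);
        }
    }
}
```

A call might look like `await DataLakeXmlToCsv.ConvertAsync(connectionString, "your-container", "incoming/test.xml");`, where the container and path are again placeholders. A connection string is used only to keep the sketch short; other credential types may suit a production setup better.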

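Answer 2

Score: 0

Try the following. Note that the XML as originally posted was not valid: the closing tag of groupdate was misspelled as </groudate>, so the start and end tags did not match. The code below assumes that has been corrected.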

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

namespace ConsoleApplication52
{
    class Program
    {
        const string INPUT_FILENAME = @"c:\temp\test.xml";
        const string OUTPUT_FILENAME = @"c:\temp\test.csv";

        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(INPUT_FILENAME);

            // The using block flushes and closes the writer even if an exception is thrown
            using (StreamWriter writer = new StreamWriter(OUTPUT_FILENAME))
            {
                int rowCount = 0;
                foreach (XElement record in doc.Descendants("level_record"))
                {
                    rowCount++;
                    if (rowCount == 1)
                    {
                        // Write the CSV header row from the first record's element names
                        string[] headers = record.Elements().Select(x => x.Name.LocalName).ToArray();
                        writer.WriteLine(string.Join(",", headers));
                    }
                    // Assumes every record has the same elements in the same order
                    string[] data = record.Elements().Select(x => (string)x).ToArray();
                    writer.WriteLine(string.Join(",", data));
                }
            }
        }
    }
}
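
Since the question is about a nested hierarchy, it may also be useful to carry parent-level values onto each row. The sketch below is one possible way to do that in memory, not a definitive implementation. The class NestedXmlToCsv, the methods Convert and Escape, and the choice of parent columns (id, asof_date, data1) are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class NestedXmlToCsv
{
    // Child elements of <level_record> written to each row, in this order.
    static readonly string[] RecordColumns = { "groupid", "groupname", "groupdate", "groupvalue", "groupkey" };

    // Quotes a field when it contains a comma, a double quote, or a line break.
    static string Escape(string value)
    {
        if (string.IsNullOrEmpty(value)) return "";
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }

    // Flattens every <level_record> into one row, prefixed with values from its ancestors.
    public static string Convert(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        var csv = new StringBuilder();
        csv.Append("id,asof_date,data1,").Append(string.Join(",", RecordColumns)).Append('\n');

        foreach (XElement level1 in root.Elements("level1"))
        {
            // Parent-level values repeated on every row under this <level1>.
            string[] parent =
            {
                (string)root.Attribute("id"),
                (string)root.Attribute("asof_date"),
                (string)level1.Element("data1"),
            };

            foreach (XElement record in level1.Descendants("level_record"))
            {
                // A missing child element yields null, which Escape writes as an empty field.
                var fields = parent.Concat(RecordColumns.Select(c => (string)record.Element(c)));
                csv.Append(string.Join(",", fields.Select(f => Escape(f)))).Append('\n');
            }
        }
        return csv.ToString();
    }
}
```

For the sample file in the question, Convert would return a header line followed by `1,01/01/2023,sdfs,1,somegroup,01/01/2023,5,ag55` and `1,01/01/2023,sdfs,2,somegroup1,02/01/2023,6,ag56`. Returning a string keeps the conversion independent of where the file is stored, so the same method can be called from a console program or an Azure Function.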

huangapple
  • Posted on June 1, 2023 at 18:18:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76380888.html