在转换数值时捕获错误并返回错误列表与 CSV Helper 吗?

huangapple go评论70阅读模式
英文:

Catch errors while converting values and return list of errors with CSV Helper?

问题

I understand your question. You want to catch conversion errors that occur while parsing a CSV file using CSV Helper and continue parsing the CSV file, collecting all the errors in a list. Here's a suggestion on how to approach this:

You can modify your ReadingExceptionOccurred delegate in the CSV Helper Configuration to catch and handle the conversion errors. Instead of returning false, you can return true to indicate that you've handled the exception and want to continue parsing. Here's an example of how you can modify your ReadingExceptionOccurred delegate:

ReadingExceptionOccurred = (ex) =>
{
    if (ex is TypeConverterException typeConverterException)
    {
        var rowNumber = csvReader.Context.Parser.Row;
        var propertyName = typeConverterException.MemberMapData.Member.Name;
        var attemptedValue = typeConverterException.Text;

        var vr = new RecordValidationResult()
        {
            RowNumber = rowNumber,
            PropertyName = propertyName,
            AttemptedValue = attemptedValue,
            Message = ex.Message,
        };

        errorsList.Add(vr);

        return true; // Continue parsing
    }

    return false; // Handle other exceptions as needed
}

With this modification, when a conversion error occurs, it will be caught, and a RecordValidationResult object will be created and added to the errorsList. Parsing will continue for the next rows, allowing you to collect all conversion errors.

Remember to adjust your error handling logic in the DummyModelController.cs to handle any other exceptions as needed. This way, you can continue parsing the entire CSV file and collect all the errors before returning them to the client.

I hope this helps! Let me know if you have further questions or need more assistance.

英文:

I'm writing an endpoint that takes in a CSV file using the CSV Helper library, does some validation checks/conversions, and then persists that data to the database using SQL Bulk Copy (CSV will have over a million records). For this endpoint, I'd like to return a list of errors that occurred while reading the CSV, such as a conversion gone wrong or a value not in the correct format. Currently, I'm using CSV Helper to do some conversions, like converting strings to enums and dates to a valid UTC date/time. I also have a property that needs to be a specific format, so I'm using Fluent Validation to create a rule that checks for that. I'm able to grab all the errors that occur from that just fine since Fluent Validation provides a collection of errors when validating a given record based off the rules given.

The issue I'm having is when I'm doing conversions. I was able to do my enum conversions and throw an exception if it was unable to parse the string value to my enum, but this was before I decided I wanted to collect all the errors as they occur and continue reading the CSV to the end before returning all the errors to the client so they could then edit the CSV and fix the errors.

I'm going to use dummy fields as it will still illustrate the same issue I'm having:

DummyModel.cs:

public class DummyModel
{
        public string DummyId { get; set; }
        public DateTime DummyDate { get; set; }
        public MyCode DummyCode { get; set; }
        public int? DummyNumberId { get; set; }
        public DateTime? PickUpDate { get; set; }
}

The class mapping for my DummyModel class (also in DummyModel.cs):

public class DummyModelMap : ClassMap<DummyModel>
{
        public DummyModelMap()
        {
            Map(p => p.DummyId).Name("DummyId", "Dummy Id");
            Map(p => p.DummyDate).Name("Date", "Dt").TypeConverterOption.DateTimeStyles(DateTimeStyles.AdjustToUniversal);
            Map(p => p.DummyCode).Name("DummyCode", "Dummy Cd").TypeConverter<DummyCodeEnumConverter<MyCode>>();
            Map(p => p.DummyNumberId).Name("DummyNumberId", "Dummy Number Id").TypeConverter<EmptyStringToIntConverter<int>>();
            Map(p => p.PickUpDate).Name("PickUpDate", "Pick Up Dt").TypeConverterOption.DateTimeStyles(DateTimeStyles.AdjustToUniversal);
        }
}

MyCode Enumeration (Utilities.cs):

public enum MyCode
{
    ABC = 0,
    DEF = 1,
    GHI = 2,
    JKL = 3,
    M_AND_N = 4 // This will look like M&A in the CSV.
}

My custom TypeConverters (Converters.cs):

    public class DummyCodeEnumConverter<T> : EnumConverter where T : struct
    {
        public DummyCodeEnumConverter() : base(typeof(T)) { }

        public override object ConvertFromString(string text, IReader row, MemberMapData memberMapData)
        {
            if (!Enum.TryParse(text, out MyCode code))
            {
               /* SPECIAL CASE: The string value (M_AND_N) does not match what will
                * appear in the CSVs that are uploaded (M&A), so we need to check for that here
                * and convert to the correct MyCode enum.
                */
                if (text == "M&A")
                {
                    return MyCode.M_AND_N;
                }

                // If an invalid value is found in the CSV for the Dummy Code column, throw an exception.
                throw new InvalidCastException($"Invalid value to TypeConverter. Type: {typeof(T)} Value: {text}");
            }

            return code;
         }
    }

    /* This custom Type Converter will convert any empty strings
     * to a null value for fields that are integers.
     */
    public class EmptyStringToIntConverter<T> : TypeConverter where T : struct
    {
        public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
        {
            if (!int.TryParse(text, out int result))
            {
                // Convert any empty strings to return a null value for the record's int data field.
                if (text == " " || text == "")
                {
                    return null;
                }

                // If an invalid value is found in the CSV for an int data type column, throw an exception.
                throw new InvalidCastException($"Invalid value to TypeConverter. Type: {typeof(T)} Value: {text}");
            }

            return result;
        }
    }

My validator for Fluent Validation (Validators):

 public class DummyModelValidator : AbstractValidator<DummyModel>
 {
        public DummyModelValidator()
        {
            RuleFor(dm => dm.DummyId).Matches(@"^([A-Za-z0-9]-[A-Za-z0-9]-([A-Za-z0-9]{2})){1,1}$")
                .WithMessage("Dummy Id must be in the following format: B-B-BB (i.e. 2-B-CC).")
                .NotEmpty()
                .WithMessage("Dummy Id cannot be empty.");
        }
 }

My RecordValidationResult class. This will be the collection of errors that are returned to the client (this looks very similar to the error object in Fluent Validation's Errors collection):

public class RecordValidationResult
{
    public int RowNumber { get; set; }
    public string PropertyName { get; set; }
    public string Message { get; set; }
    public object AttemptedValue { get; set; }
}

CSV Helper Configuration (DummyModelController.cs):

var errorsList = new List<RecordValidationResult>();

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
                    {
                        // Delimiter may differ from ','.
                        DetectDelimiter = true,
                        IgnoreBlankLines = true,
                        ExceptionMessagesContainRawData = true,

                        /* Add any conversion exceptions that occur while the CSV Reader
                         * parses the file. Fluent Validation will handle other
                         * validations.
                         */
                        ReadingExceptionOccurred = x =>
                        {
                            // Catch any conversion/null errors here and create
                            // a new RecordValidationResult for each and add it
                            // to the errorsList.
                            return false;
                        }
                    };

This is where I'm handling the Fluent Validation errors (DummyModelController.cs):

using (var csvReader = new CsvReader(streamReader, config))
                    {
                        //...

                        csvReader.Context.RegisterClassMap<DummyModelMap>();

                        // Validate records and add those that return errors to a list.
                        DummyModelValidator validator = new DummyModelValidator();

                        foreach (var record in csvReader.GetRecords<DummyModel>())
                        {
                            var rowNumber = csvReader.Context.Parser.Row;

                            var results = validator.Validate(record);

                            if (!results.IsValid)
                            {
                                foreach (var failure in results.Errors)
                                {
                                    var vr = new RecordValidationResult()
                                    {
                                        RowNumber = rowNumber,
                                        PropertyName = failure.PropertyName,
                                        AttemptedValue = failure.AttemptedValue,
                                        Message = failure.ErrorMessage,
                                    };

                                    errorsList.Add(vr);
                                }
                            }
                        }

                        // Do not persist data if the errors list is greater than 0.
                        if (errorsList.Count() > 0)
                            return BadRequest(errorsList);

                        //...SQL Bulk Copy stuff...
                    }

So where I'm throwing the InvalidCastExceptions in my custom TypeConverters, is there something I can do to where the ReadingExceptionOccurred delegate in the CSV Helper Configuration can catch the conversion error so I can create a new RecordValidationResult object with that row's error information? For example, if a given string, 'BYT' is not parsed and converted to a MyCode enum, is there a way I can return a RecordValidationResult object to add to my errorsList and then continue to parse the CSV for more errors?

I've tried "tricking" my custom converters to throw a TypeConversion exception by purposefully returning a different data type and using the information from the exception and inner exception messages to create the RecordValidationResult object. That worked until I realized if I had more than one error in a row, the first error caught would only be added to the list and the other errors would not. The code was absolutely atrocious too as I was searching for the different property names to determine what to put as my "PropertyName" value in my RecordValidationResult object.

I've also attempted to try and let Fluent Validation do the parsing work, such as catching if a string is an invalid DateTime, or if the string could be parsed to a MyCode enum (I had adjusted the DummyModel properties to all be strings) so I could simply grab my errors from the Errors collection but it was SO slow in validating (like past twelve minutes), I just stopped running it. I also had another custom converter that would change all the strings that were "M&N" to "M_AND_N" so when I went to convert later, they'd do so, but I ended up getting a TypeConversion error when I was trying to manipulate the same string data type?

I think the best route would be to keep my converters the way they are but catch the conversion errors with the row error information, add that to my errorsList, ignore the exception and continue on to the next row. Is this possible, or should I just consider logging all this information to a file and showing that to the client once done?

Thank you for any help/suggestions!

答案1

得分: 1

I not only had an issue with getting CsvHelper to continue processing after exceptions are thrown, but also not being able to grab all the errors on a row. If I did encounter an error where a value couldn't be converted, etc., the rest of the values on that row weren't validated and thus weren't being added to my errorsList.

Upon reading this GitHub thread, I found out that ReadingExceptionOccurred is thrown if the row can't be read, not by the column. So I figured I could use the solution offered in that thread, but did not have luck in getting that to work with all the conversions and validation checks I had to do.

Instead, I found I could utilize the speed of CsvHelper's converters with the simplicity that Fluent Validation offers with being able to set rules for each property.

First, I created a replica class of my original DummyModel class, but made all the data types for the properties strings. I found I can be more flexible with my validation if I'm starting with strings instead of specific data types with Fluent Validation.

public class DummyModelTwo
{
    public string DummyId { get; set; }
    public string DummyDate { get; set; }
    public string DummyCode { get; set; }
    public string DummyNumberId { get; set; }
    public string PickUpDate { get; set; }
}

I kept the same CSV mapping to my original DummyModel class so I could utilize the converters later when persisting to the database with clean data.

public class DummyModelMap : ClassMap<DummyModel>
{
    public DummyModelMap()
    {
        Map(p => p.DummyId).Name("DummyId", "Dummy Id");
        Map(p => p.DummyDate).Name("Date", "Dt").TypeConverter<CustDateTimeConverter>();
        Map(p => p.DummyCode).Name("DummyCode", "Dummy Cd").TypeConverter<DummyCodeEnumConverter<MyCode>>();
        Map(p => p.DummyNumberId).Name("DummyNumberId", "Dummy Number Id").TypeConverter<EmptyStringToIntConverter<int>>();
        Map(p => p.PickUpDate).Name("PickUpDate", "Pick Up Dt").TypeConverter<CustDateTimeConverter>();
    }
}

I won't show all my converters, but I'm now throwing a serialized object of the record instead of throwing an InvalidCastException:

public class DummyCodeEnumConverter<T> : EnumConverter where T : struct
{
    public DummyCodeEnumConverter() : base(typeof(T)) { }

    public override object ConvertFromString(string text, IReader row, MemberMapData memberMapData)
    {
        if (!Enum.TryParse(text, out MyCode code))
        {
           /* SPECIAL CASE: The string value (M_AND_N) does not match what will
            * appear in the CSVs that are uploaded (M&A), so we need to check for that here
            * and convert to the correct MyCode enum.
            */
            if (text == "M&A")
            {
                return MyCode.M_AND_N;
            }

            // Convert any empty strings to return a null value for the record's Dummy Code enum (this enum is nullable).
            if (string.IsNullOrWhiteSpace(text))
            {
                return null;
            }

            // If an invalid value is found in the CSV for the Dummy Code column, throw an exception.
            throw new Exception(JsonConvert.SerializeObject(row.Context.Parser.Record));
        }

        return code;
     }
}

And honestly, I could probably remove the serialization I do when throwing the exception in my converters, but for testing purposes I left it to know which row was coming back. Then, under ReadingExceptionOccurred, I grab the record from that Exception and set the values from that to a new DummyModelTwo object.

ReadingExceptionOccurred = x => {
    var rawRecord = x.Exception.Context.Parser.Record;
    var dmt = new DummyModelTwo()
    {
        DummyId = rawRecord[0],
        DummyDate = rawRecord[1],
        DummyCode = rawRecord[2],
        DummyNumberId = rawRecord[3],
        PickUpDate = rawRecord[4],
    };
    ...//Validation stuff happens here.
    return false;
}

Of course, I'm assuming the column headers will always be the same, so this could be improved in the event the headers are in a different order.

From there, I create a new DummyModelValidator, validate the new DummyModelTwo object, and for each of the errors found, I add a new RecordValidationResult object to my errorsList (process is similar to what I was doing before).

So this probably isn't the cleanest solution, but works and is pretty quick in validating approx. a million records.

英文:

I not only had an issue with getting CsvHelper to continue processing after exceptions are thrown, but also not being able to grab all the errors on a row. If I did encounter an error where a value couldn't be converted, etc., the rest of the values on that row weren't validated and thus weren't being added to my errorsList.

Upon reading this GitHub thread, I found out that ReadingExceptionOccurred is thrown if the row can't be read, not by the column. So I figured I could use the solution offered in that thread, but did not have luck in getting that to work with all the conversions and validation checks I had to do.

Instead, I found I could utilize the speed of CsvHelper's converters with the simplicity that Fluent Validation offers with being able to set rules for each property.

First, I created a replica class of my original DummyModel class, but made all the data types for the properties strings. I found I can be more flexible with my validation if I'm starting with strings instead of specific data types with Fluent Validation.

public class DummyModelTwo
{
        public string DummyId { get; set; }
        public string DummyDate { get; set; }
        public string DummyCode { get; set; }
        public string DummyNumberId { get; set; }
        public string PickUpDate { get; set; }
}

I kept the same CSV mapping to my original DummyModel class so I could utilize the converters later when persisting to the database with clean data.

public class DummyModelMap : ClassMap&lt;DummyModel&gt;
{
        public DummyModelMap()
        {
            Map(p =&gt; p.DummyId).Name(&quot;DummyId&quot;, &quot;Dummy Id&quot;);
            Map(p =&gt; p.DummyDate).Name(&quot;Date&quot;, &quot;Dt&quot;).TypeConverter&lt;CustDateTimeConverter&gt;();
            Map(p =&gt; p.DummyCode).Name(&quot;DummyCode&quot;, &quot;Dummy Cd&quot;).TypeConverter&lt;DummyCodeEnumConverter&lt;MyCode&gt;&gt;();
            Map(p =&gt; p.DummyNumberId).Name(&quot;DummyNumberId&quot;, &quot;Dummy Number Id&quot;).TypeConverter&lt;EmptyStringToIntConverter&lt;int&gt;&gt;();
            Map(p =&gt; p.PickUpDate).Name(&quot;PickUpDate&quot;, &quot;Pick Up Dt&quot;).TypeConverter&lt;CustDateTimeConverter&gt;();
        }
}

I won't show all my converters, but I'm now throwing a serialized object of the record instead of throwing an InvalidCastException:

public class DummyCodeEnumConverter&lt;T&gt; : EnumConverter where T : struct
    {
        public DummyCodeEnumConverter() : base(typeof(T)) { }

        public override object ConvertFromString(string text, IReader row, MemberMapData memberMapData)
        {
            if (!Enum.TryParse(text, out MyCode code))
            {
               /* SPECIAL CASE: The string value (M_AND_N) does not match what will
                * appear in the CSVs that are uploaded (M&amp;A), so we need to check for that here
                * and convert to the correct MyCode enum.
                */
                if (text == &quot;M&amp;A&quot;)
                {
                    return MyCode.M_AND_N;
                }

                // Convert any empty strings to return a null value for the record&#39;s Dummy Code enum (this enum is nullable).
                if (string.IsNullOrWhiteSpace(text))
                {
                    return null;
                }

                // If an invalid value is found in the CSV for the Dummy Code column, throw an exception.
                throw new Exception(JsonConvert.SerializeObject(row.Context.Parser.Record));
            }

            return code;
         }
    }

And honestly, I could probably remove the serialization I do when throwing the exception in my converters, but for testing purposes I left it to know which row was coming back. Then, under ReadingExceptionOccurred, I grab the record from that Exception and set the values from that to a new DummyModelTwo object.

ReadingExceptionOccurred = x =&gt; {
    var rawRecord = x.Exception.Context.Parser.Record;
    var dmt = new DummyModelTwo()
    {
        DummyId = rawRecord[0],
        DummyDate = rawRecord[1],
        DummyCode = rawRecord[2],
        DummyNumberId = rawRecord[3],
        PickUpDate = rawRecord[4],
    };
    ...//Validation stuff happens here.
    return false;
}

Of course, I'm assuming the column headers will always be the same, so this could be improved in the event the headers are in a different order.

From there, I create a new DummyModelValidator, validate the new DummyModelTwo object, and for each of the errors found, I add a new RecordValidationResult object to my errorsList (process is similar to what I was doing before).

So this probably isn't the cleanest solution, but works and is pretty quick in validating approx. a million records.

答案2

得分: 0

Here's the translated content you requested:

也许不是完美的,但类似这样的代码是否可行?

ReadingExceptionOccurred = x =>
{
	errorsList.Add(new RecordValidationResult
	{
		RowNumber = x.Exception.Context.Parser.Row,
		PropertyName = x.Exception.Context.Reader.HeaderRecord[x.Exception.Context.Reader.CurrentIndex],
		Message = x.Exception.InnerException.Message,
		AttemptedValue = x.Exception.Context.Parser.Record[x.Exception.Context.Reader.CurrentIndex]
	});
	return false;
}

对我来说,这会添加一个具有以下数据的RecordValidationResult并继续处理。

RecordValidationResult
{
  RowNumber = 2,
  PropertyName = "DummyCode",
  Message = "Invalid value to TypeConverter. Type: MyCode  Value: BYT",
  AttemptedValue = "BYT"
}

至于您看到的异常,我不确定为什么它认为它是用户未处理的。在我取消选中Visual Studio选项,以使程序"当此异常类型为用户未处理时中断"之后,程序正常运行。
在转换数值时捕获错误并返回错误列表与 CSV Helper 吗?

我在控制台应用程序中获得了这些结果。
在转换数值时捕获错误并返回错误列表与 CSV Helper 吗?

英文:

Maybe not perfect, but would something like this work?

ReadingExceptionOccurred = x =&gt;
{
	errorsList.Add(new RecordValidationResult
	{
		RowNumber = x.Exception.Context.Parser.Row,
		PropertyName = x.Exception.Context.Reader.HeaderRecord[x.Exception.Context.Reader.CurrentIndex],
		Message = x.Exception.InnerException.Message,
		AttemptedValue = x.Exception.Context.Parser.Record[x.Exception.Context.Reader.CurrentIndex]
	});
	return false;
}

For me, this adds a RecordValidationResult with the following data and continues processing.

RecordValidationResult 
{
  RowNumber = 2, 
  PropertyName = &quot;DummyCode&quot;, 
  Message = &quot;Invalid value to TypeConverter. Type: MyCode  Value: BYT&quot;,
  AttemptedValue = &quot;BYT&quot;
}

As far as the exception you are seeing, I'm not sure why it thinks it is user unhandled. After I unchecked the Visual Studio option for the program to "Break when this exception type is user-unhandled", the program ran just fine.
在转换数值时捕获错误并返回错误列表与 CSV Helper 吗?

I got these results in a console app
在转换数值时捕获错误并返回错误列表与 CSV Helper 吗?

huangapple
  • 本文由 发表于 2023年5月26日 01:03:06
  • 转载请务必保留本文链接:https://go.coder-hub.com/76334700.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定