Load a huge txt file into a WinForms app quickly

Question

I am making a Sinhala-English dictionary. I have a file that contains the Sinhala meaning of every English word, and I want to load it while the form is loading. To read the whole file into a string variable, I used the following code in the `Form1_Load` method:

private string DictionaryWords = "";

private string ss = null;

...

private void Form1_Load(object sender, EventArgs e)
{
    this.BackColor = ColorTranslator.FromHtml("#AFC3E0");

    string fileName = @"SI-utf8.Txt";

    using (StreamReader sr = File.OpenText(fileName))
    {
        while ((ss = sr.ReadLine()) != null)
        {
            DictionaryWords += ss;
        }
    }
}

Unfortunately, the txt file has 130,000+ lines and is larger than 5 MB, so my WinForm does not load.

See the image.

I need to load this file faster so that the WinForm can use Regex to get the right meaning for every English word. Could anybody tell me a way to do this? I have tried everything.

I need to load this huge file into my project within 15 seconds or less, and then use Regex to find each English word.

Answer 1

Score: 2

<details>
<summary>English:</summary>

Well, there is too little code to analyze. I suspect that

    DictionaryWords += ss;

is the culprit: appending to a string `130000` times means *re-creating* a quite long string *over and over again*, which can well bring the system to its knees, but I have no rigorous proof (I've asked about `DictionaryWords` in the comments). Another possible culprit is your *regular expression*, which I haven't seen.
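
To put a rough number on that suspicion, here is a back-of-the-envelope sketch. It assumes, hypothetically, an average of 40 characters per line (the real file may differ) and counts how many characters are copied when every `+=` rebuilds the whole string. `ConcatCost` and its method names are illustrative, not part of the question's code.

```csharp
using System;

public static class ConcatCost
{
    // Characters copied when `lines` strings of `avgLength` chars each are
    // appended one by one with +=: the i-th append builds a new string of
    // i * avgLength chars, so the total is avgLength * lines * (lines + 1) / 2.
    public static long CharsCopied(long lines, long avgLength)
    {
        return avgLength * lines * (lines + 1) / 2;
    }

    // Characters appended when a pre-sized StringBuilder is used instead:
    // each character is written into the buffer once.
    public static long CharsAppendedWithBuilder(long lines, long avgLength)
    {
        return avgLength * lines;
    }
}
```

Under these assumptions, `ConcatCost.CharsCopied(130000, 40)` gives 338,002,600,000 copied characters, against 5,200,000 for `ConcatCost.CharsAppendedWithBuilder(130000, 40)`. The exact figures depend on the real line lengths, but the quadratic growth is the point.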

That's why I'll try to solve the problem from scratch.
- We have a (long) dictionary in `SI-utf8.Txt`.
- We should load the dictionary without freezing the UI.
- We should use the dictionary loaded to translate the English texts.

I have got something like this:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;

    ...

    // Loading dictionary (async, since dictionary can be quite long)
    // static: we want just one dictionary for all the instances
    private static readonly Task<IReadOnlyDictionary<string, string>> s_Dictionary =
      Task.Run(() => {
        char[] delimiters = { ' ', '\t' };

        // Each non-blank line is expected to hold exactly two tokens: word and meaning
        IReadOnlyDictionary<string, string> result = File
          .ReadLines(@"SI-utf8.Txt")
          .Where(line => !string.IsNullOrWhiteSpace(line))
          .Select(line => line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
          .Where(items => items.Length == 2)
          .ToDictionary(items => items[0],
                        items => items[1],
                        StringComparer.OrdinalIgnoreCase);

        return result;
      });
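
The loader above relies on two things the question does not confirm: that each line holds exactly two tokens, and that no English word occurs twice (`ToDictionary` throws an `ArgumentException` on a duplicate key). Below is a minimal sketch of a more forgiving parser. It assumes, hypothetically, that the first token of a line is the English word and the rest of the line is the meaning; `DictionaryParser` and `Parse` are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryParser
{
    private static readonly char[] Delimiters = { ' ', '\t' };

    // Assumed line format: <english word><space or tab><meaning, may contain spaces>
    public static IReadOnlyDictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (string line in lines)
        {
            // Split into at most two parts: the word and everything after it
            string[] items = line.Trim().Split(Delimiters, 2, StringSplitOptions.RemoveEmptyEntries);

            if (items.Length != 2)
                continue; // blank or malformed line: skip it

            string word = items[0];
            string meaning = items[1].Trim();

            // Keep the first meaning if a word is repeated, instead of throwing
            if (!result.ContainsKey(word))
                result.Add(word, meaning);
        }

        return result;
    }
}
```

It takes any `IEnumerable<string>`, so it can be fed `File.ReadLines(@"SI-utf8.Txt")` inside the same `Task.Run` call. Whether "first meaning wins" is the right policy depends on the actual file.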


Then we need a translation part:

    // Let it be the simplest regex: English letters and apostrophes;
    // you can improve it if you like
    private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");

    // Translation is async, since we have to wait for the dictionary to be loaded
    private static async Task<string> Translate(string englishText) {
      if (string.IsNullOrWhiteSpace(englishText))
        return englishText;

      var dictionary = await s_Dictionary;

      return s_EnglishWords.Replace(englishText,
        match => dictionary.TryGetValue(match.Value, out var translation)
          ? translation   // if we know the translation
          : match.Value); // if we don't know the translation
    }
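
To see what the `Regex.Replace` callback does without touching the real file, here is a small self-contained sketch with a two-entry in-memory dictionary. The entries and the bracketed placeholder "meanings" are made up for illustration, and `TranslateDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TranslateDemo
{
    private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");

    // Stand-in for the loaded dictionary; the values are placeholders, not real Sinhala
    private static readonly IReadOnlyDictionary<string, string> s_Sample =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", "[hello-si]" },
            { "world", "[world-si]" },
        };

    public static string Translate(string englishText)
    {
        if (string.IsNullOrWhiteSpace(englishText))
            return englishText;

        // Each matched word is looked up; unknown words are left untouched
        return s_EnglishWords.Replace(englishText, match =>
            s_Sample.TryGetValue(match.Value, out var translation)
                ? translation
                : match.Value);
    }
}
```

`TranslateDemo.Translate("Hello, big world!")` returns `"[hello-si], big [world-si]!"`: the lookup ignores case, and punctuation and unknown words pass through unchanged. Because each word is a hash lookup, the cost depends on the length of the input text and not on the size of the dictionary.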


Usage:

    // Note that the button event handler should be async as well
    private async void button1_Click(object sender, EventArgs e) {
      TranslationTextBox.Text = await Translate(OriginalTextBox.Text);
    }
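
The design leans on one property of `Task<T>`: it can be awaited any number of times, and the work behind it runs only once. Here is a console-style sketch of that behaviour, with a counter standing in for the expensive file load. `SharedLoad`, `LoadCount` and the fake payload are all illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SharedLoad
{
    private static int s_loadCount;

    // Number of times the (fake) load has actually run
    public static int LoadCount => s_loadCount;

    // Started once, when the class is initialized; every caller awaits the same task
    private static readonly Task<string> s_Data = Task.Run(() =>
    {
        Interlocked.Increment(ref s_loadCount);
        Thread.Sleep(100); // pretend this is the slow file read
        return "loaded";
    });

    public static async Task<string> GetAsync()
    {
        return await s_Data;
    }
}
```

Awaiting `SharedLoad.GetAsync()` from several button clicks yields `"loaded"` each time, while `LoadCount` stays at 1. Clicks that arrive before the load finishes simply wait for it without blocking the UI thread.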


**Edit:** So, `DictionaryWords` is a `string` and thus

    DictionaryWords += ss;

is the *culprit*. Please don't append to a `string` in a (deep) loop: each append *re-creates* the string, which is slow. If you insist on looping, use `StringBuilder`:

    // Let's pre-allocate a buffer for about 6 million chars
    StringBuilder sb = new StringBuilder(6 * 1024 * 1024);

    using (StreamReader sr = File.OpenText(fileName))
    {
        while ((ss = sr.ReadLine()) != null)
        {
            sb.Append(ss);
        }
    }

    DictionaryWords = sb.ToString();


Or, why should you loop at all? Let .NET do the work for you:

    DictionaryWords = File.ReadAllText(@"SI-utf8.Txt");
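
One difference is worth knowing before swapping the loop for `File.ReadAllText`: `ReadLine` strips the line terminators, so the loop glues all lines together, while `ReadAllText` keeps them. Here is a sketch that makes both behaviours easy to compare on a throwaway file (`ReadComparison` and its method names are illustrative):

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadComparison
{
    // Same approach as the StringBuilder loop: line terminators are dropped
    public static string ReadWithLoop(string path)
    {
        var sb = new StringBuilder();

        using (StreamReader sr = File.OpenText(path))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
                sb.Append(line);
        }

        return sb.ToString();
    }

    // Whole file in one call: line terminators are kept
    public static string ReadWhole(string path)
    {
        return File.ReadAllText(path);
    }
}
```

If a later regex depends on line boundaries (for example `^` with `RegexOptions.Multiline`), only the second form preserves them.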

**Edit 2:** If the actual file size is not *that huge* (that is, `DictionaryWords += ss;` *alone* spoils the fun), you can stick to a simple synchronous solution:

    private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");

    private static readonly IReadOnlyDictionary<string, string> s_Dictionary = File
      .ReadLines(@"SI-utf8.Txt")
      .Where(line => !string.IsNullOrWhiteSpace(line))
      .Select(line => line.Split(new char[] { ' ', '\t' },
                                 StringSplitOptions.RemoveEmptyEntries))
      .Where(items => items.Length == 2)
      .ToDictionary(items => items[0],
                    items => items[1],
                    StringComparer.OrdinalIgnoreCase);

    private static string Translate(string englishText) {
      if (string.IsNullOrWhiteSpace(englishText))
        return englishText;

      return s_EnglishWords.Replace(englishText,
        match => s_Dictionary.TryGetValue(match.Value, out var translation)
          ? translation
          : match.Value);
    }


And then the usage is quite simple:

    // No async needed here: Translate is synchronous
    private void button1_Click(object sender, EventArgs e) {
      TranslationTextBox.Text = Translate(OriginalTextBox.Text);
    }
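
One caveat with a static field initializer that reads a file: if the file is missing, the failure surfaces as a `TypeInitializationException` the first time the class is touched, which can be awkward to diagnose. A possible alternative, not part of the answer above, is to wrap the load in `Lazy<T>` so that it runs on first use. `LazyDictionary`, `TryTranslate`, `Load` and the `path` parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

public sealed class LazyDictionary
{
    private readonly Lazy<IReadOnlyDictionary<string, string>> _entries;

    public LazyDictionary(string path)
    {
        // Nothing is read here; the file is opened on the first lookup.
        // ExecutionAndPublication: the load runs at most once, even with concurrent callers
        _entries = new Lazy<IReadOnlyDictionary<string, string>>(
            () => Load(path), LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // True once the file has actually been read
    public bool IsLoaded => _entries.IsValueCreated;

    public bool TryTranslate(string word, out string meaning)
    {
        return _entries.Value.TryGetValue(word, out meaning);
    }

    // Same two-token line format the LINQ query above assumes
    private static IReadOnlyDictionary<string, string> Load(string path)
    {
        return File
            .ReadLines(path)
            .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(items => items.Length == 2)
            .ToDictionary(items => items[0], items => items[1], StringComparer.OrdinalIgnoreCase);
    }
}
```

The trade-off is that the first translation pays for the load, so for a file that takes noticeable time the `Task`-based version earlier in this answer remains the better fit for a responsive form.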


</details>



huangapple
  • Posted on February 14, 2023, 20:45:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75448038.html