Load huge txt file for winform quickly
Question
I am going to make a Sinhala-English dictionary, so I have a file that contains the Sinhala meaning of every English word. I thought I would load it while the form is loading, so I added the following code to the form's `Load` handler (`Form1_Load`) to read the whole file into a string variable:
private string DictionaryWords = "";
private string ss = null;
...
private void Form1_Load(object sender, EventArgs e)
{
    this.BackColor = ColorTranslator.FromHtml("#AFC3E0");
    string fileName = @"SI-utf8.Txt";
    using (StreamReader sr = File.OpenText(fileName))
    {
        while ((ss = sr.ReadLine()) != null)
        {
            DictionaryWords += ss;
        }
    }
}
Unfortunately, the text file has more than 130,000 lines and is larger than 5 MB, so my WinForm does not load.
See the image.
I need to load it faster so that the WinForm can use a regex to find the right meaning of every English word.
Could anybody tell me a way to do this? I have tried everything.
I need to load this huge file into my project in 15 seconds or less, and I need to use Regex to find the meaning of each English word.
Answer 1
Score: 2
<details>
<summary>English:</summary>
Well, there is too little code to analyze. I suspect that
DictionaryWords += ss;
is the felon: appending a string 130,000 times means *re-creating* an ever longer string *over and over again*, which can well bring the system to its knees, but I have no rigorous proof (I've asked about `DictionaryWords` in the comments). Another possible culprit is your *regular expression*, which I haven't seen.
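To put the `+=` suspicion into numbers: the k-th `+=` allocates a new string and copies all k·L accumulated chars into it. So n appends of L chars copy L·n(n+1)/2 chars in total, while a pre-sized `StringBuilder` copies only L·n. The console sketch below (C# 9 top-level statements) is an illustration, not code from the question. The line length is an assumption derived from the question's figures: 5 MB over 130,000 lines is roughly 40 bytes per line, and a UTF-8 char takes at least one byte, so a line has at most about 40 chars. With those figures, `+=` copies about 3.4 × 10^11 chars, compared with about 5.2 × 10^6 for the builder.

```csharp
using System;
using System.Text;

// Chars copied when a string is built from n appends of lineLength chars via "s += line":
// the k-th append allocates a new string and copies all k * lineLength chars into it.
static long CopiedByConcatenation(long n, long lineLength) => lineLength * n * (n + 1) / 2;

// Chars copied by a StringBuilder whose capacity was pre-allocated: each line once.
static long CopiedByBuilder(long n, long lineLength) => lineLength * n;

// Assumed figures: about 130,000 lines of at most ~40 chars each.
long lines = 130_000, avgLength = 40;
Console.WriteLine($"+=            : {CopiedByConcatenation(lines, avgLength):N0} chars copied");
Console.WriteLine($"StringBuilder : {CopiedByBuilder(lines, avgLength):N0} chars copied");

// Both approaches build the same text; only the amount of copying differs.
string concatenated = "";
var sb = new StringBuilder();
foreach (string line in new[] { "alpha", "beta", "gamma" })
{
    concatenated += line;  // allocates a brand-new string on every iteration
    sb.Append(line);       // appends into the builder's existing buffer
}
Console.WriteLine(concatenated == sb.ToString()); // True
```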
So let me try to solve the problem from scratch:
- We have a (long) dictionary in `SI-utf8.Txt`.
- We should load the dictionary without freezing the UI.
- We should use the loaded dictionary to translate English text.
I have come up with something like this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
...
// Loading the dictionary (async, since the dictionary can be quite long)
// static: we want just one dictionary for all the instances
private static readonly Task<IReadOnlyDictionary<string, string>> s_Dictionary =
    Task.Run(() => {
        char[] delimiters = { ' ', '\t' };
        // Assumes every line is "englishWord meaning" (exactly two tokens); other lines are skipped
        IReadOnlyDictionary<string, string> result = File
            .ReadLines(@"SI-utf8.Txt")
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
            .Where(items => items.Length == 2)
            .ToDictionary(items => items[0],
                          items => items[1],
                          StringComparer.OrdinalIgnoreCase);
        return result;
    });
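One caveat about `ToDictionary`: it throws an `ArgumentException` as soon as a key repeats. With `StringComparer.OrdinalIgnoreCase`, a line for "Run" and a line for "run" already count as a repeat, and a 130,000-line dictionary file may well contain such pairs. The exception would then surface when `s_Dictionary` is awaited. Below is a hedged console sketch (C# 9 top-level statements) that keeps the first meaning instead. It assumes the same line format as the code above. `ParseDictionary` and the sample lines are hypothetical. It uses `ContainsKey` plus `Add` rather than `TryAdd`, so the helper also compiles on .NET Framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns "englishWord meaning" lines into a case-insensitive lookup.
// Unlike ToDictionary, it does not throw on a repeated English word: the first entry wins.
static IReadOnlyDictionary<string, string> ParseDictionary(IEnumerable<string> lines)
{
    char[] delimiters = { ' ', '\t' };
    var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (string line in lines)
    {
        if (string.IsNullOrWhiteSpace(line))
            continue;
        string[] items = line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        if (items.Length != 2)
            continue; // same filter as the original code: skip lines that are not "word meaning"
        if (!result.ContainsKey(items[0]))
            result.Add(items[0], items[1]);
    }
    return result;
}

// Made-up sample lines; in the form they would come from File.ReadLines(@"SI-utf8.Txt").
IReadOnlyDictionary<string, string> dictionary = ParseDictionary(new[]
{
    "run\t[run-si]",
    "Run [run-si-2]",
    "",
    "a line with too many parts",
    "cat [cat-si]",
});
Console.WriteLine(dictionary.Count);  // 2
Console.WriteLine(dictionary["RUN"]); // [run-si]
```

In the form, the helper can be pasted in as a `private static` method and called inside `Task.Run` as `ParseDictionary(File.ReadLines(@"SI-utf8.Txt"))`.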
Then we need a translation part:
// Let it be the simplest regex: English letters and apostrophes;
// you can improve it if you like
private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");
// Translation is async, since we have to wait for the dictionary to be loaded
private static async Task<string> Translate(string englishText) {
    if (string.IsNullOrWhiteSpace(englishText))
        return englishText;
    var dictionary = await s_Dictionary;
    return s_EnglishWords.Replace(englishText,
        match => dictionary.TryGetValue(match.Value, out var translation)
            ? translation   // if we know the translation
            : match.Value); // if we don't know the translation
}
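The translation step can be tried outside WinForms with a tiny in-memory dictionary. The minimal console sketch below (C# 9 top-level statements) uses made-up placeholder meanings; the `[...-si]` strings are not real Sinhala. It shows that the regex only cuts the input into words, and each word is then looked up in the hash-based `Dictionary`. Nothing scans the 5 MB file text at translation time.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as above: runs of English letters and apostrophes count as one word.
var englishWords = new Regex("[A-Za-z']+");

// Tiny stand-in dictionary with placeholder meanings.
var dictionary = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["hello"] = "[hello-si]",
    ["world"] = "[world-si]",
};

string Translate(string englishText) =>
    englishWords.Replace(englishText,
        match => dictionary.TryGetValue(match.Value, out var translation)
            ? translation   // known word: substitute its meaning
            : match.Value); // unknown word: leave it as is

Console.WriteLine(Translate("Hello, big world!")); // [hello-si], big [world-si]!
```

Punctuation and spaces are never matched by the pattern, so they pass through unchanged.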
Usage:
// Note that the button event handler has to be async as well
private async void button1_Click(object sender, EventArgs e) {
    TranslationTextBox.Text = await Translate(OriginalTextBox.Text);
}
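Keeping the loading `Task` in a `static readonly` field means the load starts once, and every later `await` reuses the same result. In the WinForms handler, `await` resumes on the UI thread, so assigning `TranslationTextBox.Text` afterwards is safe. Below is a console sketch of the pattern (C# 9 top-level statements). `Thread.Sleep` stands in for reading the real file. The counter `loadCount` and the `LookupAsync` helper are hypothetical, there only to show that the load runs a single time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

int loadCount = 0; // hypothetical counter, only here to show the load runs once

// Started once; every caller awaits the same Task, so the slow part runs a single time.
Task<IReadOnlyDictionary<string, string>> dictionaryTask = Task.Run(() =>
{
    Interlocked.Increment(ref loadCount);
    Thread.Sleep(200); // stand-in for reading and parsing the real file
    IReadOnlyDictionary<string, string> result =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { ["cat"] = "[cat-si]" };
    return result;
});

// Same shape as Translate above, reduced to a single-word lookup.
async Task<string> LookupAsync(string word)
{
    var dictionary = await dictionaryTask; // waits only while loading is still in progress
    return dictionary.TryGetValue(word, out var meaning) ? meaning : word;
}

string first = await LookupAsync("Cat");  // may wait for the load to finish
string second = await LookupAsync("dog"); // already loaded: no waiting, no reload
Console.WriteLine($"{first} {second} loads={loadCount}"); // [cat-si] dog loads=1
```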
**Edit:** So, `DictionaryWords` is a `string` and thus
DictionaryWords += ss;
is a *felon*. Please don't append `string`s in a (deep) loop: each append *re-creates* the whole string, which is slow. If you insist on looping, use `StringBuilder`:
// requires using System.Text;
// Let's pre-allocate a buffer for about 6 million chars (6 * 1024 * 1024)
StringBuilder sb = new StringBuilder(6 * 1024 * 1024);
using (StreamReader sr = File.OpenText(fileName))
{
    while ((ss = sr.ReadLine()) != null)
    {
        sb.Append(ss);
    }
}
DictionaryWords = sb.ToString();
Or, why loop at all? Let .NET do the work for you:
DictionaryWords = File.ReadAllText(@"SI-utf8.Txt");
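The two snippets above differ in one way worth knowing before you write the regex. `ReadLine` strips line terminators, so the `StringBuilder` loop glues all lines together, and the meaning on one line runs straight into the English word on the next. `File.ReadAllText` keeps the line breaks. Below is a console sketch (C# 9 top-level statements) comparing both on a hypothetical two-line temp file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical two-line file written to a temp path just for this comparison.
string path = Path.GetTempFileName();
File.WriteAllText(path, "cat [cat-si]\ndog [dog-si]\n");

// 1) ReadLine + Append: ReadLine drops the line terminators, so the lines run together.
var sb = new StringBuilder();
using (StreamReader sr = File.OpenText(path))
{
    string? line;
    while ((line = sr.ReadLine()) != null)
        sb.Append(line);
}
string glued = sb.ToString();

// 2) File.ReadAllText: the whole file, line breaks included.
string whole = File.ReadAllText(path);

File.Delete(path);
Console.WriteLine(glued);               // cat [cat-si]dog [dog-si]
Console.WriteLine(whole.Contains("\n")); // True
```

If you keep the loop, `sb.AppendLine(ss)` instead of `sb.Append(ss)` adds a line break (`Environment.NewLine`) after every line.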
**Edit 2:** If the actual file size is not *that huge* (and it is `DictionaryWords += ss;` *alone* that spoils the fun), you can stick to a simple synchronous solution:
private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");
// Loaded once, synchronously, when the class's static fields are initialized;
// an exception thrown here surfaces as a TypeInitializationException
private static readonly IReadOnlyDictionary<string, string> s_Dictionary = File
    .ReadLines(@"SI-utf8.Txt")
    .Where(line => !string.IsNullOrWhiteSpace(line))
    .Select(line => line.Split(new char[] { ' ', '\t' },
                               StringSplitOptions.RemoveEmptyEntries))
    .Where(items => items.Length == 2)
    .ToDictionary(items => items[0],
                  items => items[1],
                  StringComparer.OrdinalIgnoreCase);
private static string Translate(string englishText) {
    if (string.IsNullOrWhiteSpace(englishText))
        return englishText;
    return s_EnglishWords.Replace(englishText,
        match => s_Dictionary.TryGetValue(match.Value, out var translation)
            ? translation
            : match.Value);
}
And then the usage is quite simple:
// The synchronous version needs no async event handler
private void button1_Click(object sender, EventArgs e) {
    TranslationTextBox.Text = Translate(OriginalTextBox.Text);
}
</details>