In MS Outlook VBA, how to find hidden data in a MailItem

Question

Environment: MS Office LTSC Pro Plus 2021 under Windows 11 Pro 64

Background

A couple of weeks ago, I had the startling experience of watching my Outlook inbox fill up with thousands of copies of old e-mails stored in various Outlook folders. A Google search informed me that this is not a new problem. It's apparently caused by a bug in Outlook that has never been fixed, and I found nothing about what triggers the bug or how to prevent it. It's a big problem because I use my inbox to keep a backlog of e-mails requiring later attention, and now those e-mails are drowning in a sea of random old e-mails. So I need to find a way to reliably identify and remove those copies.

To do this, I've been learning Outlook VBA. I'm experienced in VBA for Excel and Access, but new to it in Outlook. What I've discovered so far is that the property CreationTime is set to the time those copies were made. This has allowed me to determine that my inbox has about 8,000 of those errant copies, made in seven spurts on May 3, with each copied e-mail appearing between about two and seven times in the inbox. I have no idea why it happened those seven times on that day and why it hasn't happened again since then, and I live in fear of it suddenly happening again.

In a filesystem, one can run a comparison of two files to determine if they are identical. As far as I know, there is no such comparison facility that can be run between two e-mails. One has to pick a set of properties to compare and hope for the best. The system I've come up with looks for e-mails with the same values of the MailItem properties SenderName, To, and Subject and the same value of "timestamp", which I define as property SentOn if SenderEmailAddress is one of mine, and otherwise ReceivedTime. I suppose it would be more accurate to compare bodies, but I'm doing this by exporting properties to Excel, where the comparisons are run, and the e-mail bodies are too large to do that with. If I were more proficient in Outlook VBA, I could perhaps write a routine to do the comparisons there, but I haven't figured out how to do that. I thought of including Size as a proxy for body content, but I discovered that size can be mysteriously different for two e-mails that appear to be otherwise identical. More about that below.
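A minimal sketch of that comparison, written in Python rather than VBA and assuming the properties have already been exported as one record per e-mail. The record layout, the `MY_ADDRESSES` set, and the function names are hypothetical. The body is reduced to a SHA-256 digest, so the full text never has to fit in a spreadsheet cell, and the record with the earliest CreationTime in each group is assumed to be the original.

```python
import hashlib
from collections import defaultdict

# Hypothetical: the mailbox owner's own addresses, in lower case.
MY_ADDRESSES = {"me@example.com"}


def timestamp(rec):
    """SentOn for mail sent from one of my addresses, ReceivedTime otherwise."""
    if rec["SenderEmailAddress"].lower() in MY_ADDRESSES:
        return rec["SentOn"]
    return rec["ReceivedTime"]


def dedupe_key(rec):
    """Comparison key: the four properties plus a fixed-size digest of the body."""
    body_digest = hashlib.sha256(rec["Body"].encode("utf-8")).hexdigest()
    return (rec["SenderName"], rec["To"], rec["Subject"], timestamp(rec), body_digest)


def find_duplicates(records):
    """Group records by key and return every record except the earliest of each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedupe_key(rec)].append(rec)
    duplicates = []
    for group in groups.values():
        # Assumption: the original has the earliest CreationTime.
        group.sort(key=lambda r: r["CreationTime"])
        duplicates.extend(group[1:])
    return duplicates
```

The EntryID values of the returned records could then be used to locate the copies in Outlook. Comparing digests only shows that two bodies are textually identical; it says nothing about attachments or other properties.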

That is the background of the question I ask below. In addition to an answer to that question, I'd also be grateful if someone can direct me to any technical information about the bug that caused those copies, what triggers it, and how to prevent it being triggered again in the future.

Hidden data

In my inspection of the data of those copies, I've made two (Edit: three) strange observations:

  1. In order to keep down the size of my pst file, when I archive an e-mail in Outlook, I remove any sizable attachments for storage elsewhere. So I was flabbergasted to find that when Outlook generated these errant copies of old e-mails, many of them include attachments that I removed. This means that removing attachments from an e-mail does not reduce the size of the pst file at all, but merely hides the attachments!
  2. There is a case of an e-mail whose size is 21 kb. As far as I can tell, it never had an attachment. There are seven copies of it in the inbox with CreationTime at seven different times on May 3. Six of those copies are 21 kb, but one is 136 kb. I've opened the original and the large copy, and I see no difference in the content. This means that there are 115 kb of data hiding somewhere in the data structure of that larger copy. If these were files, I would open them in Notepad++ to see if I could find where the differences are. But I don't know how to open the full content of an e-mail like that. I ran a routine to load one of the e-mails in VBA by its EntryID and then added it to the watch window to look at its structure. That 115 kb has to be hiding somewhere in there, but I couldn't tell from this where it is.
  3. Edit: It's worse than that. Another thing I've been doing to try to keep down the size of the pst is that when an e-mail is large because of embedded images, I forward the e-mail to myself with the images deleted, and then permanently delete (or so I thought) the original e-mail. By "permanently delete," I mean that I first move the e-mail to a subfolder of "Deleted Items" called "Too big." Then, every so often, I delete this folder. When I delete a folder elsewhere, it gets moved to the "Deleted Items" folder. But when I delete a subfolder of "Deleted Items", I get a message, "Delete this folder and everything in it?" and when I click "Yes", the folder and its contents disappear. But guess what? Included in the errant copies in my inbox are copies of old e-mails that I thought I had removed from Outlook in this way. This means that when I delete a subfolder of "Deleted Items", Outlook does not discard its contents, but hides them somewhere.

My question

Both the hidden attachments in the first observation above and the hidden 115 kb in the second have to be somewhere in the structure of the MailItem object. And (Edit) the hidden e-mails in the third observation must also be hidden somewhere, but I don't see any evidence of MailItem objects still existing for them. I have two questions about all this:

  • Where is this stuff hidden? Or how can I find out where it is?
  • Is there a way to actually remove it? I could trim gigabytes off the size of my pst file if all those attachments (Edit: and e-mails) that I've been removing for years could actually be removed instead of just hidden.

(Edit 2:) Second question

There's something that doesn't make sense in what I wrote above. The first and third observations tell me that Outlook never discards any data, whether deleted attachments or the contents of deleted subfolders of Deleted Items, but only hides it. For me, since I've been keeping almost all my e-mails for over ten years, I haven't been surprised to see the size of my pst file grow to over 10 GB. And I've always assumed that other people who allow their Deleted Items folder to regularly purge would have much smaller pst files. If that's correct and if my observations are correct, then it must be that:

  • Outlook does discard the data of e-mails purged from Deleted Items.
  • Outlook does not discard, but only hides, deleted attachments and deleted subfolders of Deleted Items.

That would seem like a strange modus operandi. Is that really how Outlook works, or is there something wrong in my reasoning, or maybe something in my settings that is causing Outlook to keep deleted data?

Answer 1

Score: 1

Regarding your second observation under Hidden data, you could save the 136 kb e-mail (msg file) as HTML, which creates a folder containing all the files inside the e-mail. You could also save one of the small, seemingly identical e-mails as HTML and compare the two to see where the size difference comes from and what the hidden content actually is. I'm sorry if this doesn't help much.

Answer 2

Score: 1

Use MFCMAPI (can be a bit overwhelming unless you are an Extended MAPI developer) or OutlookSpy (I am its author, click IMessage button) to look at the message properties on the MAPI level. Pay particular attention to the large binary (PT_BINARY) and string (PT_UNICODE) properties. Also make sure there aren't large attachments (GetAttachmentTable tab).

Answer 3

Score: 0

Answer about observation #2 in the section Hidden data of my question.

Observation #2 is about two e-mails on my computer, which I will call A and B. E-mail A is in the folder Deleted Items and B is in the Inbox. They have the same values of the MailItem properties SenderName, To, Subject, and ReceivedTime. The CreationTime of A is about 1.5 hours after ReceivedTime, but for B, it is May 3, 2023, about a year later. I therefore conclude that A is the original e-mail received, while B is an errant copy made against my will by the bug in Outlook discussed in the Background section of the question.

The property MailItem.Size is 21,679 bytes for A and 136,367 bytes for B, a difference of 114,688 bytes. This difference in size is the "strange observation" I'm talking about here.

Short answer:

The difference in the size of the copy is mostly due to a string at the beginning of the e-mail headers of just that copy, consisting of 57,330 Unicode characters of unknown (if any) meaning, taking up 114,660 bytes, i.e., accounting for all but 28 bytes of the size difference.

Contents of the string:

The string consists of 735 lines of 76 characters (plus CR LF) each. I suspect it's garbage, but in case it's not and in case someone here knows how to interpret it, I am giving you the first and last three lines. I'll be interested to know if someone can attribute any meaning to it:

yFRnGMDvXrYeo/YqPXqdFNX0Zua7b6v4hsbqYyw3Js4jtSNiwKrk8HAzg8fQ1wHgbWo7u1v9MvrW
NJJkP+lIcbUPbHOOma67wTrdjoeoQWEjj7FM3lThjl4Vc4yevQ/pWR4o8LW3g7XrhtKieSwnfzba
djlSsm07MDGcHcPpVRfuSg/kbSPbLDVrmTwAdDt7dY7aNN0mHJOQMDr0444rz3QNBttPhuNS0yPd
...
XPH32enRwTV2kZukfF/WI0fU/GOoafHaRKCVs4Cjlj94A5xlTkHBAzkda77xNqf9i6dE/h1jPfXD
rMAo3Dyzz8y19Y6LF4DtPDFr4d8C+HktvDk0ckcNq1qWlnjOdwVZAZBux35/i96828F/Cfw7rWoa
hZ6zrWpRauoV/wCx9MRbG5ghA+VnMgZ8MMHhl4INfM4mMcTUvA641JRjr0/rU5Cy0u71/SLDV7SS
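A quick check of the figures above, as a Python sketch that does not touch Outlook. It recomputes the size difference and the length of the string from the numbers quoted in this answer. It also tests one sample line against the Base64 alphabet and the 76-character line limit that MIME uses for Base64-encoded content. The helper name `looks_like_base64_line` is made up for illustration, and passing the test is only an observation about the line's shape, not proof that the string is Base64 data.

```python
import re

# Figures quoted in this answer.
SIZE_A = 21_679           # MailItem.Size of e-mail A, in bytes
SIZE_B = 136_367          # MailItem.Size of e-mail B, in bytes
LINES = 735
CHARS_PER_LINE = 76 + 2   # 76 visible characters plus CR LF

size_difference = SIZE_B - SIZE_A
string_chars = LINES * CHARS_PER_LINE
string_bytes = string_chars * 2   # two bytes per character, as assumed above
unexplained = size_difference - string_bytes

BASE64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64_line(line, max_len=76):
    """True if the line is non-empty, at most max_len long, and uses only Base64 characters."""
    return 0 < len(line) <= max_len and BASE64_LINE.fullmatch(line) is not None


first_line = "yFRnGMDvXrYeo/YqPXqdFNX0Zua7b6v4hsbqYyw3Js4jtSNiwKrk8HAzg8fQ1wHgbWo7u1v9MvrW"

print(size_difference, string_chars, string_bytes, unexplained)  # 114688 57330 114660 28
print(looks_like_base64_line(first_line))                        # True
```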

In case it's relevant to know, the sender is at a "ymail.com" address and the headers include a field X-YMail-OSG, which has a similar structure of 2,410 characters that look meaningless but are apparently a Yahoo anti-spam device [1, 2].

I have not found any match between the contents of the field X-YMail-OSG and the much larger string at the beginning of the headers of the large copy.

How I found it:

I checked out both tools recommended in the answer by Dmitry Streblechenko, MFCMAPI and OutlookSpy. The first was indeed difficult to understand; the second could also benefit from better documentation, but is pretty useful.

In OutlookSpy, I first tried IMessage > Save to File. The resulting file for A was smaller than the size of the e-mail in Outlook, so it definitely did not contain all the data of the e-mail. Then I noticed that in OutlookSpy's window for IMessage, in column "Value", three properties show a value of "MAPI_E_NOT_ENOUGH_MEMORY": PR_BODY_W, PR_RTF_COMPRESSED, and PR_TRANSPORT_MESSAGE_HEADERS_W. The details panel on the right side of the window does show their values when each is selected, but in the saved file, their values also show as "MAPI_E_NOT_ENOUGH_MEMORY". That's why the saved file is smaller than the e-mail, and it's not helpful.

I took a closer look at those three properties. PR_BODY_W is an associated property of MAPI property PidTagBody, which is presumably associated with MailItem.Body. PR_RTF_COMPRESSED is the associated property of MAPI property PidTagRtfCompressed, which is presumably associated with MailItem.RTFBody. I didn't find any differences between A and B in those two properties.

PR_TRANSPORT_MESSAGE_HEADERS_W is an associated property of MAPI property PidTagTransportMessageHeaders. The linked documentation says it's related to "message header information for inbound messages." I don't see any MailItem property that might be associated with this, and apparently, message headers are not included in the Outlook Object Model, which seems strange to me since the headers contain important information about inbound e-mails.

In e-mail A, in OutlookSpy's window for IMessage, with PR_TRANSPORT_MESSAGE_HEADERS_W selected, in the details pane on the right side, the field "Symbol" looks like a set of e-mail headers. The field "Value", when viewed as text, looks like those headers with a space between each character, which suggests that the data are stored as Unicode. So I dropped out of OutlookSpy and got the headers from "Outlook > File > Info > Properties > Internet headers". I did that for both A and B and copied the headers to Notepad++, where I ran the plugin "ComparePlus" on them. There I found that the headers of the two e-mails are identical except for a string of 57,330 characters (in which I saw no meaning or pattern) tacked onto the beginning of the headers of B.
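The same comparison can be sketched in a few lines of Python, assuming the two sets of headers have been copied out of Outlook as plain strings. This is an illustration only, not what ComparePlus does internally, and the function name `extra_prefix` is hypothetical. It covers only the case found here, where one set of headers is the other with extra text in front.

```python
def extra_prefix(headers_a, headers_b):
    """Return the text prepended to headers_b if it ends with headers_a, else None."""
    if headers_a and headers_b.endswith(headers_a):
        return headers_b[: len(headers_b) - len(headers_a)]
    return None


# Toy data standing in for the real headers.
original = "Received: from x\r\nSubject: Hi\r\n"
junk = "AAAA\r\nBBBB\r\n"

prefix = extra_prefix(original, junk + original)
print(len(prefix))  # 12
```

The check is exact, so any difference in line endings or trailing whitespace introduced while copying the headers would make it return None.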

Conclusion:

I suspect that that large string at the beginning of the headers of B is just garbage caused by some sort of corruption in the process of Outlook's bug generating the errant copy, e-mail B.

It's interesting that if it's true that message headers are not included in the Outlook Object Model, then I would never have found that garbage string by looking at the entire contents of the Outlook objects. The only way to find it was to either look at the MAPI property PR_TRANSPORT_MESSAGE_HEADERS_W or to look at the headers of the e-mail in "Outlook > File > Info > Properties > Internet headers".

huangapple
  • Posted on May 21, 2023 at 12:39:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76298321.html