Telegram Telethon: Sharing media downloads across multiple different clients
Question
We tried to use one Telegram client to continuously stream messages from a list of channels and produce them to Kafka. A second Telegram client then consumes the messages and downloads the associated media (photos/videos) using client.download_media(). Our issue is that this only works if clients 1 and 2 are the same account, not when they are different accounts. We are not sure whether this has to do with the session files, the access hash, or something else.
Is our use case supported? The main problem we are trying to address is that asynchronous media download can build up a large backlog, and that backlog is lost if our server dies. That is why we wanted to put the messages into Kafka for short-term storage in the first place. We would also appreciate any better suggestions.
This is the producer side:
    async with client:
        messages = client.iter_messages(channel_id, limit=10)
        async for message in messages:
            print(message)
            if message.media is not None:
                # orig_media = message.media
                # converted_media = BinaryReader(bytes(orig_media)).tgread_object()
                # print('orig, media', orig_media)
                # print('converted media', converted_media)
                message_bytes = bytes(message)  # convert to bytes
                producer.produce(topic, message_bytes)
This is the consumer side, which uses a different client:
    with self._client:
        # Fails with:
        # telethon.errors.rpcerrorlist.FileReferenceExpiredError: The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)
        try:
            self._client.loop.run_until_complete(self._client.download_media(orig_media, in_memory))
        except Exception as e:
            print(e)
Answer 1
Score: 1
Media files (among many other things in Telegram) contain an access_hash. While Account-A and Account-B will both see media with ID 1234, Account-A may have a hash of 5678 and Account-B may have a hash of 8765.
This is a roundabout way of saying that every account sees an access_hash that is only valid within that account. If a different account tries to use that same hash, the request fails, because that other account needs its own hash.
There is no way to bypass this other than giving the other account actual access to the right media files (or whatever the object is) so that it can obtain its own hash.
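One way to apply this to the Kafka pipeline from the question, sketched under assumptions rather than taken from the answer: instead of producing bytes(message), produce only the chat ID and message ID, and let the consumer account fetch the message itself so that it receives its own access_hash. The helper names (encode_reference, decode_reference, download_from_reference) and the JSON payload layout are hypothetical. The sketch assumes the consumer account can see the same channels and that its Telethon session can already resolve the chat ID (for example, because it has joined the channel and loaded its dialogs). The client argument is expected to behave like a Telethon TelegramClient.

```python
import json


def encode_reference(message):
    """Serialize only identifiers that mean the same thing for every account.

    `message` is expected to expose `chat_id` and `id`, as Telethon messages do.
    """
    reference = {"chat_id": message.chat_id, "message_id": message.id}
    return json.dumps(reference).encode("utf-8")


def decode_reference(payload):
    """Inverse of encode_reference: returns (chat_id, message_id)."""
    reference = json.loads(payload.decode("utf-8"))
    return reference["chat_id"], reference["message_id"]


async def download_from_reference(client, payload):
    """Re-fetch the message with the consumer's own client, then download it.

    Returns the media as bytes, or None if the message is not visible to this
    account or carries no media.
    """
    chat_id, message_id = decode_reference(payload)
    # With a single integer in `ids`, get_messages returns one message
    # (or None), fetched under the consumer account's own credentials.
    message = await client.get_messages(chat_id, ids=message_id)
    if message is None or message.media is None:
        return None
    # file=bytes asks Telethon to return the download as an in-memory bytes object.
    return await client.download_media(message, file=bytes)
```

On the producer side this would replace the bytes(message) call with producer.produce(topic, encode_reference(message)). The backlog still survives a crash because the references stay in Kafka, and because the message is fetched right before the download, the consumer also works with current media metadata rather than a stale serialized copy.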