June 13, 2023, 15:09:23

Masking and unmasking a string in Python

Question

I have Python code to generate SQL queries from English queries. But while predicting, I might have to send sensitive data in my English query to the model. I want to mask sensitive information like nouns and numbers in my English query. When I receive the predicted query, I want to unmask that data again.

In short, I need a Python program that can mask the nouns and numbers in my string and then unmask them whenever I need to. The masked values can be replaced with anything you suggest.

Sample English Query:

How many Chocolate Orders for a customer with ID 123456?

Masking Expected Output:

How many xxxxxxxxxx Orders for a customer with ID xxxxxxxxx?

My algorithm will create a query like:

Select count(1) from `sample-bucket` as d where d.Type ='xxxxxxxx' and d.CustId = 'xxxxxxx'

Now I need the unmasked query, like below:

Select count(1) from `sample-bucket` as d where d.Type ='Chocolate' and d.CustId = '123456'

Answer 1

Score: 0

You can use the code below to mask and unmask a string. It keeps each masked word in a dictionary, so the original words can be restored later when you want to unmask the string. I think this can be helpful for people who send text to third-party tools.

import base64
import re

import nltk

# Tagger model used by nltk.pos_tag; newer NLTK releases may ask for
# 'averaged_perceptron_tagger_eng' instead.
nltk.download('averaged_perceptron_tagger')

NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}

def base_64_encoding(text):
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")

def base_64_decoding(text):
    return base64.b64decode(text.encode("utf-8")).decode("utf-8")

masked_element = {}  # original word -> Base64 token
english_query = "How many Chocolate Orders for a customer with ID 123456?"
print("English Query: ", english_query)

def mask_word(match):
    # Each word arrives without punctuation, so "123456?" is still seen as a number.
    word = match.group()
    tag = nltk.pos_tag([word])[0][1]
    if tag in NOUN_TAGS or word.isdigit():
        masked_element[word] = base_64_encoding(word)
        return masked_element[word]
    return word

# One pass over whole words, so a short word never rewrites part of a longer one.
masked_query = re.sub(r"\w+", mask_word, english_query)
print("Masked Query: ", masked_query)

# Restore longer tokens first so a token contained in another is not replaced early.
unmasked_query = masked_query
for word, token in sorted(masked_element.items(), key=lambda kv: -len(kv[1])):
    unmasked_query = unmasked_query.replace(token, word)
print("Unmasked English Query: ", unmasked_query)
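
One caveat: Base64 is an encoding rather than encryption, so anyone who receives the masked text can decode it back to the original words. If the goal is to keep the values away from the model, an opaque placeholder whose mapping stays on your side may be a better fit. The following is only a minimal sketch of that idea, not part of the original answer. The names `mask_query`, `unmask_query`, `is_sensitive`, `SENSITIVE_WORDS` and the `MASK_n` placeholder format are all made up for illustration, the hand-written word set stands in for the POS-tag check, and the hard-coded `predicted_sql` stands in for the model's output. It also assumes the model copies the placeholders into the SQL unchanged.

```python
import re


def mask_query(text, is_sensitive):
    """Swap each sensitive word for a placeholder; return (masked_text, mapping)."""
    mapping = {}  # placeholder -> original word (kept locally, never sent out)
    seen = {}     # original word -> placeholder, so repeats reuse the same one

    def _swap(match):
        word = match.group()
        if not is_sensitive(word):
            return word
        if word not in seen:
            placeholder = f"MASK_{len(seen)}"
            seen[word] = placeholder
            mapping[placeholder] = word
        return seen[word]

    # \w+ visits whole words only, so punctuation such as "?" is left alone.
    return re.sub(r"\w+", _swap, text), mapping


def unmask_query(text, mapping):
    """Put the original words back wherever a known placeholder appears."""
    return re.sub(r"MASK_\d+", lambda m: mapping.get(m.group(), m.group()), text)


# Hypothetical rule for the demo: numbers plus a hand-written word list.
SENSITIVE_WORDS = {"Chocolate"}


def is_sensitive(word):
    return word.isdigit() or word in SENSITIVE_WORDS


masked, mapping = mask_query(
    "How many Chocolate Orders for a customer with ID 123456?", is_sensitive
)
print(masked)

# Stand-in for the query the model would predict from the masked text.
predicted_sql = (
    "Select count(1) from `sample-bucket` as d "
    "where d.Type ='MASK_0' and d.CustId = 'MASK_1'"
)
print(unmask_query(predicted_sql, mapping))
```

Because the placeholders carry no information about the original values, the mapping dictionary is the only way back, and it never has to leave your process.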

Below is the output of the above program:
