Masking and unmasking a string in Python
Question
I have Python code that generates SQL queries from English queries. At prediction time, however, I may have to send sensitive data in my English query to the model. I want to mask sensitive information, such as nouns and numbers, in the English query. When I receive the predicted query, I want to unmask that data again.
In short, I need a Python program that can mask nouns and numbers in a string and then unmask them whenever I want. The masked values can be replaced with anything you suggest.
Sample English Query:
How many Chocolate Orders for a customer with ID 123456?
Masking Expected Output:
How many xxxxxxxxxx Orders for a customer with ID xxxxxxxxx?
My algorithm will create a query like:
Select count(1) from `sample-bucket` as d where d.Type ='xxxxxxxx' and d.CustId = 'xxxxxxx'
Now I need an unmasked query like the one below:
Select count(1) from `sample-bucket` as d where d.Type ='Chocolate' and d.CustId = '123456'
Answer 1
Score: 0
You can use the code below to mask and unmask a string. It stores each original word in a dictionary next to its masked (Base64) form, so you can look the words up later when you want to unmask the string. I think this can be very helpful for people who send text to third-party tools.
import base64
import re

import nltk

# Model for nltk.pos_tag; newer NLTK releases may ask for
# 'averaged_perceptron_tagger_eng' instead.
nltk.download('averaged_perceptron_tagger')

NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}


def base_64_encoding(text):
    return base64.b64encode(text.encode('utf-8')).decode('utf-8')


# Not needed while the dictionary is kept, but decodes a token directly.
def base_64_decoding(text):
    return base64.b64decode(text.encode('utf-8')).decode('utf-8')


masked_element = {}  # original word -> Base64 token
english_query = "How many Chocolate Orders for a customer with ID 123456?"
print("English Query: ", english_query)

# Tag the whole sentence so the tagger sees context; \w+ also separates
# "123456" from the trailing "?" so that isdigit() can match it.
words = re.findall(r"\w+", english_query)
for word, tag in nltk.pos_tag(words):
    if tag in NOUN_TAGS or word.isdigit():
        masked_element[word] = base_64_encoding(word)

# Replace whole words in a single pass instead of str.replace on substrings.
english_query = re.sub(
    r"\w+", lambda m: masked_element.get(m.group(), m.group()), english_query)
print("Masked Query: ", english_query)

# Restore longer tokens first in case one token is a prefix of another.
for key, val in sorted(masked_element.items(), key=lambda kv: -len(kv[1])):
    english_query = english_query.replace(val, key)
print("Unmasked English Query: ", english_query)
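The snippet above only round-trips the English sentence, while the question asks for the values to be restored inside the generated SQL. Here is a minimal sketch of that step, assuming the model copies placeholders through to its output unchanged. It uses opaque placeholders (`MASK_0`, `MASK_1`, ...) rather than Base64, because Base64 is a reversible encoding and not encryption: anyone who receives the masked text can decode it. The names `mask_terms` and `unmask`, the placeholder format, and the explicit list of sensitive terms are all hypothetical; the list could just as well come from a POS-tagging step.

```python
import re


def mask_terms(text, sensitive_terms):
    """Replace each sensitive term with an opaque placeholder.

    Returns the masked text and a placeholder -> original mapping.
    Assumes every term starts and ends with a word character.
    """
    if not sensitive_terms:
        return text, {}
    placeholders = {term: f"MASK_{i}" for i, term in enumerate(sensitive_terms)}
    # Longest terms first so that "123456" wins over "123".
    alternatives = sorted(placeholders, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, alternatives)))
    # Single pass, so a placeholder is never re-scanned as input.
    masked = pattern.sub(lambda m: placeholders[m.group()], text)
    mapping = {ph: term for term, ph in placeholders.items()}
    return masked, mapping


def unmask(text, mapping):
    """Put the original values back wherever a known placeholder appears."""
    return re.sub(r"\bMASK_\d+\b",
                  lambda m: mapping.get(m.group(), m.group()), text)


question = "How many Chocolate Orders for a customer with ID 123456?"
masked, mapping = mask_terms(question, ["Chocolate", "123456"])
print(masked)  # How many MASK_0 Orders for a customer with ID MASK_1?

# Stand-in for the model's prediction on the masked question.
predicted_sql = ("Select count(1) from `sample-bucket` as d "
                 "where d.Type ='MASK_0' and d.CustId = 'MASK_1'")
print(unmask(predicted_sql, mapping))
```

Because the original values are spliced back into the SQL text, it is worth validating them or passing them as query parameters where the database client supports that.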