Masking and unmasking a string in Python

Question

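I have Python code that generates SQL queries from English queries. At prediction time, however, I may have to send sensitive data in my English query to the model. I want to mask sensitive information such as nouns and numbers in the English query, and when I receive the predicted query, I want to unmask that data again.

In short, I need a Python program that can mask the nouns and numbers in a string and then unmask them whenever I want. The masked values can be replaced with anything you suggest.

Sample English query:

    How many Chocolate Orders for a customer with ID 123456?

Expected output after masking:

    How many xxxxxxxxxx Orders for a customer with ID xxxxxxxxx?

My algorithm will then create a query like:

    Select count(1) from `sample-bucket` as d where d.Type ='xxxxxxxx' and d.CustId = 'xxxxxxx'

Now I need the unmasked query, like below:

    Select count(1) from `sample-bucket` as d where d.Type ='Chocolate' and d.CustId = '123456'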

Answer 1

Score: 0

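You can use the code below to mask and unmask a string. It keeps the masked words in a dictionary so that you can use them later when you want to unmask the string. I think this can be helpful for people who send queries to third-party tools.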

    import base64
    import re

    import nltk

    # Tagger model required by nltk.pos_tag (the resource name can vary by NLTK version).
    nltk.download('averaged_perceptron_tagger')

    NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}

    def base_64_encoding(text):
        return base64.b64encode(text.encode('utf-8')).decode('utf-8')

    def base_64_decoding(text):
        return base64.b64decode(text.encode('utf-8')).decode('utf-8')

    masked_element = {}  # original word -> Base64 token

    def mask_word(match):
        # Mask nouns and numbers; leave every other word untouched.
        word = match.group(0)
        tag = nltk.pos_tag([word])[0][1]
        if tag in NOUN_TAGS or word.isdigit():
            masked_element[word] = base_64_encoding(word)
            return masked_element[word]
        return word

    english_query = "How many Chocolate Orders for a customer with ID 123456?"
    print("English Query: ", english_query)

    # Match whole words so punctuation ("123456?") cannot hide a number from
    # isdigit(), and so masking one word never rewrites part of another.
    english_query = re.sub(r'\w+', mask_word, english_query)
    print("Masked Query: ", english_query)

    # Restore longer tokens first so one token is never replaced inside another.
    for word, token in sorted(masked_element.items(), key=lambda kv: -len(kv[1])):
        english_query = english_query.replace(token, word)
    print("Unmasked English Query: ", english_query)

Below is the output of the above program:

[Image: program output]
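
Base64 is an encoding rather than encryption, so anyone who sees the masked query can decode it. If the values must stay hidden from the model, one option is to swap them for opaque placeholders and keep the mapping locally. The sketch below is only an illustration under stated assumptions: `mask_query`, `unmask`, `is_sensitive`, and the `MASK<n>X` placeholder format are hypothetical names, the sensitivity rule is hard-coded instead of using a POS tagger, and `predicted_sql` stands in for the model's output, assuming the model echoes the placeholders unchanged.

```python
import re


def mask_query(text, should_mask):
    """Replace each word selected by should_mask with an opaque placeholder."""
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder, so repeats share one placeholder

    def replace(match):
        word = match.group(0)
        if not should_mask(word):
            return word
        if word not in seen:
            placeholder = f"MASK{len(seen)}X"  # hypothetical placeholder format
            seen[word] = placeholder
            mapping[placeholder] = word
        return seen[word]

    return re.sub(r"\w+", replace, text), mapping


def unmask(text, mapping):
    """Restore original values in any text that contains the placeholders."""
    return re.sub(r"MASK\d+X", lambda m: mapping.get(m.group(0), m.group(0)), text)


def is_sensitive(word):
    # Hypothetical rule for illustration: numbers and one known product name.
    return word.isdigit() or word == "Chocolate"


question = "How many Chocolate Orders for a customer with ID 123456?"
masked, mapping = mask_query(question, is_sensitive)
print(masked)

# Stand-in for the model's prediction (assumed to echo the placeholders back).
predicted_sql = (
    "Select count(1) from `sample-bucket` as d "
    "where d.Type ='MASK0X' and d.CustId = 'MASK1X'"
)
print(unmask(predicted_sql, mapping))
```

The same `mapping` restores values in any text that carries the placeholders, which is why it works on the predicted SQL as well as on the English query.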

Posted by huangapple on 2023-06-13 15:09:23
Source link: https://go.coder-hub.com/76462443.html