Python mapping debugging

Question

I have 50+ CSV files that I want to standardize so they all share the same header. For each file I'm building a dictionary that maps its headers to a master list of output headers, e.g. BusinessName -> LastOrBusinessName. If I'm not interested in keeping a field, I map it to skip.

Something like this:

npi_num,skip
provider_type,providertype
specialty,providertype
speciality,providertype
exclusion_effective_date,exclusiondate
actiondate,exclusiondate
sanctiondate,exclusiondate #NOTE MULTIPLE mappings per field to handle different files
businessname,businessname
firstname,firstname
lastname,lastname
npi_4digits,skip
exclusionstatemedicaid,exclusionstatemedicaid
postdate,skip # full dictionary is ~300 lines this is a small sample.

Next, I read each file with Python, look at its header, and rename the headers using my list. This is where I have issues. I pass my function the dictionary (above) as well as the headers of the input file.

def map_headers(mapping, input_fields):
    output_fields = {}
    for field in input_fields:
        if field in mapping:
            if mapping[field]['Mapped To'].lower() != 'skip':
                output_fields[field] = mapping[field]['Mapped To']
        else:
            output_fields[field] = field

    # Make sure all output fields are present in the header
    for output_field in output_fields.values():
        if output_field not in headers: #I want a empty field even if none of the data matches so all my files have the same headers
            headers.append(output_field) 


    # Map the input fields to the output fields
    output_mapping = {}
    for output_field, input_field in output_fields.items():
        if input_field in headers:
            output_mapping[output_field] = input_field
        else:
            output_mapping[output_field] = ''
    return headers, output_mapping

Then I have a bunch of code that writes the data into a new CSV file.

So I'm expecting the same fixed header for every file after processing (with the data kept in the matching columns):

rowid,exclusiondate,lastnameorbusinessname,firstname,providertype,exclusionauthority,exclusionreason,lastname,businessname,exclusionstatemedicaid

Except that, depending on the file, I'm not getting the same header.
I'm also not sure how to make sure the columns are in the right order (I want them to match that header exactly), but at this point I'll settle for figuring out why my dictionary isn't mapping the fields correctly.

Sample input file:

rowid,lastnameorbusinessname,firstname,npi,begindate,reason,lastname,businessname,npi_4digits,exclusionstatemedicaid
1,Bhullar-Ball,Ramneet,1255336244.0,9/17/2015,Failure to respond to requests for records on Tenncare patients,Bhullar-Ball,,6244,TN
2,"Cumberland Neurology, LLC",,1407898018.0,9/17/2015,Failure to respond to requests for records on Tenncare patients,,"Cumberland Neurology, LLC",8018,TN
3,Clabough,Kenneth,1043631401.0,10/30/2015,Failure to disclose required information,Clabough,,1401,TN

Desired output:

rowid,exclusiondate,lastnameorbusinessname,firstname,providertype,exclusionauthority,exclusionreason,lastname,businessname,exclusionstatemedicaid

So there are 3 lists:
1. The input list (e.g. the sample header above).
2. The mapping list; for the above file it would be
rowid->rowid,lastnameorbusinessname->lastnameorbusinessname,firstname->firstname,npi->skip,begindate->exclusiondate,reason->exclusionreason,lastname->lastname,businessname->businessname,npi_4digits->skip,exclusionstatemedicaid->exclusionstatemedicaid
3. The output header list, which is the standardized header I want every file to have (see the desired output).

So basically it would rename begindate to exclusiondate and reason to exclusionreason, drop the npi and npi_4digits columns, and leave everything else the same.
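To make the intended transformation concrete, here is a minimal sketch of the rename-and-drop step applied to the sample header. It assumes the mapping is a flat dict of input name to output name (the map_headers function in the question indexes mapping[field]['Mapped To'] instead), and rename_header is a hypothetical helper name, not code from the question.

```python
# Mapping for the sample file, as listed above ('skip' means drop the column).
mapping = {
    'rowid': 'rowid',
    'lastnameorbusinessname': 'lastnameorbusinessname',
    'firstname': 'firstname',
    'npi': 'skip',
    'begindate': 'exclusiondate',
    'reason': 'exclusionreason',
    'lastname': 'lastname',
    'businessname': 'businessname',
    'npi_4digits': 'skip',
    'exclusionstatemedicaid': 'exclusionstatemedicaid',
}

def rename_header(input_fields, mapping):
    """Return the renamed header, dropping fields mapped to 'skip'.

    Fields missing from the mapping keep their original name,
    mirroring the else-branch of map_headers in the question.
    """
    renamed = []
    for field in input_fields:
        target = mapping.get(field, field)
        if target.lower() != 'skip':
            renamed.append(target)
    return renamed

sample_header = ('rowid,lastnameorbusinessname,firstname,npi,begindate,reason,'
                 'lastname,businessname,npi_4digits,exclusionstatemedicaid').split(',')
print(rename_header(sample_header, mapping))
```

This only renames and drops columns in input order. Adding the missing columns (providertype, exclusionauthority) and forcing the exact output order would additionally need the standardized header list.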

What I actually got as output:

rowid,lastnameorbusinessname,firstname,npi,begindate,reason,lastname,businessname,npi_4digits,exclusionstatemedicaid
1,Bhullar-Ball,Ramneet,1255336244.0,9/17/2015,Failure to respond to requests for records on Tenncare patients,Bhullar-Ball,,6244,TN
2,"Cumberland Neurology, LLC",,1407898018.0,9/17/2015,Failure to respond to requests for records on Tenncare patients,,"Cumberland Neurology, LLC",8018,TN
3,Clabough,Kenneth,1043631401.0,10/30/2015,Failure to disclose required information,Clabough,,1401,TN

For clarity about what is right/wrong with the output: begindate is mapped to exclusiondate but wasn't renamed, and npi_4digits was supposed to be dropped. npi was correctly dropped. The headers are still out of order.

This is what's in the dictionary:

npi,skip
npi_4digits,skip
begindate,exclusiondate 
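
One way to narrow this down, offered as a debugging sketch rather than a confirmed diagnosis: list which input fields fail an exact lookup in the mapping, and whether they would match after stripping whitespace and lowercasing. check_fields and normalize are hypothetical helpers, and the example keys are made up for illustration. Mismatches of this kind (a trailing space, a different case, a byte-order mark on the first header) are a common reason a field silently falls through to the keep-as-is branch.

```python
def normalize(name):
    """Strip a leading BOM and surrounding whitespace, then lowercase."""
    return name.lstrip('\ufeff').strip().lower()

def check_fields(mapping_keys, input_fields):
    """Classify each input field as 'exact', 'normalized', or 'missing'."""
    exact = set(mapping_keys)
    normalized = {normalize(key) for key in mapping_keys}
    report = {}
    for field in input_fields:
        if field in exact:
            report[field] = 'exact'
        elif normalize(field) in normalized:
            report[field] = 'normalized'  # matches only after cleanup
        else:
            report[field] = 'missing'
    return report

# Hypothetical keys, as they might come out of a hand-edited mapping file.
keys = ['npi', 'npi_4digits ', 'BeginDate']
fields = ['npi', 'npi_4digits', 'begindate', 'reason']
for field, status in check_fields(keys, fields).items():
    print(repr(field), status)
```

Separately, map_headers reads and appends to a headers variable that is not one of its parameters, so it has to exist as a global. Whether that contributes to the problem depends on code that is not shown.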

Answer 1

Score: 1

Another try. The two lines between the ### markers are only there as a test: they inject a random key/value that has multiple mappings into the small sample you gave.

from pprint import pp
from random import choice
import csv

includes = {
    'npi': 'skip',
    'rowid': 'rowid',
    'begindate': 'exclusiondate',
    'exclusion_effective_date': 'exclusiondate',
    'actiondate': 'exclusiondate',
    'sanctiondate': 'exclusiondate',
    'lastnameorbusinessname': 'lastnameorbusinessname',
    'firstname': 'firstname',
    'providertype': 'providertype',
    'provider_type': 'providertype',
    'specialty': 'providertype',
    'speciality': 'providertype',
    'exclusionauthority': 'exclusionauthority',
    'exclusionreason': 'exclusionreason',
    'reason': 'exclusionreason',
    'lastname': 'lastname',
    'businessname': 'businessname',
    'exclusionstatemedicaid': 'exclusionstatemedicaid',
    'npi_4digits': 'skip',
    'postdate': 'skip',
    }
result = []

with open('data.csv', encoding='utf-8') as file:
    reader = csv.DictReader(file)
    for row in reader:
        curr_row = {}
        ### test only: inject a value under a randomly chosen alias of providertype
        test = choice(['providertype', 'provider_type', 'specialty', 'speciality'])
        row.update({test: f'mapped from {test}'})
        ###
        for key, val in includes.items():
            if val == 'skip':
                continue
            try:
                curr_row[val] = row[key]
            except KeyError:
                # input column absent: keep a value already set by another alias, else None
                curr_row[val] = curr_row.get(val)
        result.append(curr_row)
        pp(curr_row)

Output:

{'rowid': '1',
 'exclusiondate': '9/17/2015',
 'lastnameorbusinessname': 'Bhullar-Ball',
 'firstname': 'Ramneet',
 'providertype': 'mapped from provider_type',
 'exclusionauthority': None,
 'exclusionreason': 'Failure to respond to requests for records on Tenncare '
                    'patients',
 'lastname': 'Bhullar-Ball',
 'businessname': '',
 'exclusionstatemedicaid': 'TN'}
{'rowid': '2',
 'exclusiondate': '9/17/2015',
 'lastnameorbusinessname': 'Cumberland Neurology, LLC',
 'firstname': '',
 'providertype': 'mapped from providertype',
 'exclusionauthority': None,
 'exclusionreason': 'Failure to respond to requests for records on Tenncare '
                    'patients',
 'lastname': '',
 'businessname': 'Cumberland Neurology, LLC',
 'exclusionstatemedicaid': 'TN'}
{'rowid': '3',
 'exclusiondate': '10/30/2015',
 'lastnameorbusinessname': 'Clabough',
 'firstname': 'Kenneth',
 'providertype': 'mapped from speciality',
 'exclusionauthority': None,
 'exclusionreason': 'Failure to disclose required information',
 'lastname': 'Clabough',
 'businessname': '',
 'exclusionstatemedicaid': 'TN'}
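
The rows collected in result can then be written out in a fixed column order. The sketch below is an assumption about how one might finish the job, not part of the answer above: MASTER_HEADER is copied from the desired output in the question, write_rows is a hypothetical helper, and the demo writes to an in-memory buffer instead of a file.

```python
import csv
import io

MASTER_HEADER = [
    'rowid', 'exclusiondate', 'lastnameorbusinessname', 'firstname',
    'providertype', 'exclusionauthority', 'exclusionreason', 'lastname',
    'businessname', 'exclusionstatemedicaid',
]

def write_rows(rows, out_file, header=MASTER_HEADER):
    """Write dict rows as CSV with columns in exactly `header` order."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=header,
        restval='',             # value for columns a row does not have
        extrasaction='ignore',  # drop keys that are not in the header
    )
    writer.writeheader()
    for row in rows:
        # Replace None with '' explicitly so empty cells are unambiguous.
        writer.writerow({k: ('' if v is None else v) for k, v in row.items()})

# Demo row: missing most columns, one None value, one extra key ('npi').
buffer = io.StringIO(newline='')
demo = [{'rowid': '1', 'firstname': 'Ramneet', 'exclusionauthority': None, 'npi': 'x'}]
write_rows(demo, buffer)
print(buffer.getvalue())
```

Because fieldnames fixes both the set and the order of columns, every file written this way gets the same header regardless of which input columns were present. With a real file, open it with newline='' as the csv module documentation recommends.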

Answer 2

Score: 0

Taking a guess, because there is so much going on in your post.

import csv

includes = {
    'rowid': 'rowid',
    'begindate': 'exclusiondate',
    'lastnameorbusinessname': 'lastnameorbusinessname',
    'firstname': 'firstname',
    'providertype': 'providertype',
    'exclusionauthority': 'exclusionauthority',
    'exclusionreason': 'exclusionreason',
    'lastname': 'lastname',
    'businessname': 'businessname',
    'exclusionstatemedicaid': 'exclusionstatemedicaid',
    }
result = []

with open('data.csv', encoding='utf-8', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        curr_dict = {}
        for key, val in includes.items():
            curr_dict[val] = row.get(key)
        result.append(curr_dict)

with open('results.csv', 'w', encoding='utf-8', newline='') as file:
    headers = list(includes.values())
    writer = csv.DictWriter(file, fieldnames=headers)
    writer.writeheader()
    for row in result:
        writer.writerow(row)

This uses a dictionary to map the keys from the input to the output. Each dictionary in the result list looks like this:

{'rowid': '1',
 'exclusiondate': '9/17/2015',
 'lastnameorbusinessname': 'Bhullar-Ball',
 'firstname': 'Ramneet',
 'providertype': None,
 'exclusionauthority': None,
 'exclusionreason': None,
 'lastname': 'Bhullar-Ball',
 'businessname': '',
 'exclusionstatemedicaid': 'TN'}

The resulting CSV looks like this:

rowid,exclusiondate,lastnameorbusinessname,firstname,providertype,exclusionauthority,exclusionreason,lastname,businessname,exclusionstatemedicaid
1,9/17/2015,Bhullar-Ball,Ramneet,,,,Bhullar-Ball,,TN
2,9/17/2015,"Cumberland Neurology, LLC",,,,,,"Cumberland Neurology, LLC",TN
3,10/30/2015,Clabough,Kenneth,,,,Clabough,,TN

Note that your sample input file does not contain all the field names in your desired output, hence the None values in the dict and the empty fields in the CSV. If there are other keys, map them in the includes dictionary: the key is the input field name and the value is the desired field name.
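
If includes is extended with several input names per output name, as in the question's full mapping, two details of the code above need care: list(includes.values()) would repeat a header once per alias, and a later alias that is absent from the row would overwrite a value found earlier with None. The following is a sketch of one way to handle both, under those assumptions; output_header and map_row are hypothetical helper names, and the small includes dict is a cut-down example.

```python
def output_header(includes):
    """Unique output names in first-seen order, without 'skip'."""
    return list(dict.fromkeys(v for v in includes.values() if v != 'skip'))

def map_row(row, includes):
    """Build an output row; the first alias present in `row` wins."""
    out = dict.fromkeys(output_header(includes))  # every column starts as None
    for key, val in includes.items():
        if val != 'skip' and key in row and out[val] is None:
            out[val] = row[key]
    return out

includes = {
    'npi': 'skip',
    'rowid': 'rowid',
    'begindate': 'exclusiondate',
    'actiondate': 'exclusiondate',
    'reason': 'exclusionreason',
    'exclusionreason': 'exclusionreason',
}
row = {'rowid': '3', 'npi': '1043631401.0', 'begindate': '10/30/2015',
       'reason': 'Failure to disclose required information'}
print(output_header(includes))
print(map_row(row, includes))
```

The header order here follows the order of the dict, which may not be the order you want. To force an exact order, pass an explicit list of field names to csv.DictWriter instead of deriving it from the mapping.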

huangapple
  • Published on June 13, 2023 at 08:00:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76460942.html