Looking for conditional string-appended values in csv.reader

Question


I'm trying to automate a vendor payables aging report that is exported from a financial system as a .csv file. For each vendor, the report contains a line labeled 'Functional Amount Not Invoiced:' followed by a dollar amount ($xx.xx). Below is an example of the report output (with numbers changed):

    1000000 Vendor1 USD PO Number 1/1/1900
    Item1, Description
    100 Each $1.00
    INV000000 1/1/1900 000 Each 100 0 $1.00 $24.00
    0 0 $24.00
    INV000001 1/1/1900 000 Each 50 0 $1.00 $10.50
    0 0 $10.50
    -------------------
    Functional Amount Not Invoiced: $250.00
    Amount Not Invoiced Less Returned: $250.00
    1000001 Vendor2 USD PO2061994 6/2/2015
    Item2, Description 30 Each $38.00
    INV000002 7/23/2015 000 Each 9 0 $38.00 $342.00
    0 0 $342.00
    INV000003 7/23/2015 000 Each 7 0 $38.00 $266.00
    0 0 $266.00
    -------------------
    Functional Amount Not Invoiced: $346,955.00
    Amount Not Invoiced Less Returned: $1,245.00

I would like to know how I can parse the .csv file for all instances of 'Functional Amount Not Invoiced' greater than or equal to $10,000.00 and, in those cases, return the first two strings of that vendor's header line (for the first vendor above, these would be 1000000 Vendor1). Here's my code so far:

    import csv

    companyList = {'1000000': 'Vendor1', ...}
    with open('Vendor Report.csv', mode='r', encoding='latin1') as file:
        csvreader = csv.reader(file)
        for row in csvreader:
            print(' '.join(row))
            if 'Functional Amount Not Invoiced:' in row:
                ...

I've gotten to the ... part, and I know the logic is: if the amount after the string is at least $10,000.00, find the vendor ID and vendor name and return them. The goal is to have every vendor at or above $10,000.00 appended automatically to a list. My expected output would be as follows:

    Vendor ID Vendor Name $346,955.00
    ...

Answer 1

Score: 1


IIUC, here is one option with pandas, using read_fwf and str.extract:

    # pip install pandas
    import pandas as pd

    MIN_AMOUNT = 10000
    df = pd.read_fwf("input.csv", header=None)
    # Capture "ID Name" from each vendor header line and forward-fill it downward.
    vendor_vals = df[0].str.extract(r"(\d+ [a-zA-Z]+\d+)", expand=False).ffill()
    # "\$" matches the literal dollar sign; commas are dropped before the float cast.
    fani_vals = (
        df.pop(0).str.extract(r"Functional Amount Not Invoiced: \$(.*)", expand=False)
        .str.replace(",", "", regex=False).astype(float)
    )
    # Keep rows whose amount is at least MIN_AMOUNT, as the question asks.
    companyList = (
        df.assign(VENDOR=vendor_vals, FANI=fani_vals).dropna()
        .loc[lambda df_: df_["FANI"].ge(MIN_AMOUNT)].to_dict("list")
    )

Output:

    >>> print(companyList)
    {'VENDOR': ['1000001 Vendor2'], 'FANI': [346955.0]}
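
One detail worth checking in the amount pattern: in a regular expression an unescaped `$` is an end-of-string anchor, so matching a literal dollar sign needs `\$`. The following is a minimal sketch using only the standard `re` module on a single sample line from the report; the variable names are illustrative and not part of the answer's code.

```python
import re

line = "Functional Amount Not Invoiced: $346,955.00"

# Unescaped "$" asserts end-of-string, so this pattern cannot match the line.
unescaped = re.search(r"Functional Amount Not Invoiced: $(.*)", line)

# Escaped "\$" matches the literal dollar sign and captures the amount text.
escaped = re.search(r"Functional Amount Not Invoiced: \$(.*)", line)

# Strip the thousands separators before converting to a float.
amount = float(escaped.group(1).replace(",", ""))

print(unescaped)  # None
print(amount)     # 346955.0
```

Removing only the commas leaves the decimal part intact, so an amount such as $1.05 still parses as 1.05.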

Update:

If you need a DataFrame (for example, to write a .csv), use this:

    # Reuses pd and MIN_AMOUNT from the snippet above.
    df = pd.read_fwf("input.csv", header=None)
    out = (
        # Two capture groups give separate VENDOR_ID and VENDOR_NAME columns.
        df.join(df[0].str.extract(r"(\d+) ([a-zA-Z]+\d+)")
                .rename(columns={0: "VENDOR_ID", 1: "VENDOR_NAME"}).ffill())
        .assign(FANI=lambda df_: df_.pop(0)
                .str.extract(r"Functional Amount Not Invoiced: \$(.*)", expand=False)
                .str.replace(",", "", regex=False).astype(float))
        .dropna().loc[lambda df_: df_["FANI"].ge(MIN_AMOUNT)].reset_index(drop=True)
    )

Output:

    >>> print(out)
      VENDOR_ID VENDOR_NAME      FANI
    0   1000001     Vendor2  346955.0
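
For readers who would rather stay with csv.reader, as in the question, the same logic can be sketched without pandas: remember the most recent vendor header row, and when a 'Functional Amount Not Invoiced:' row appears, parse its amount and compare it to the threshold. This is only a sketch under stated assumptions: `company_list` mirrors the question's `companyList` and is used to recognize vendor header rows, the label is assumed to sit in its own cell with the amount in the next non-empty cell, and the `sample` text is a hypothetical cell layout, since the real export may split cells differently.

```python
import csv
import io

LABEL = "Functional Amount Not Invoiced:"
MIN_AMOUNT = 10_000.00

# Hypothetical lookup in the style of the question's companyList.
company_list = {"1000000": "Vendor1", "1000001": "Vendor2"}

# Hypothetical cell layout; the real export may split cells differently.
sample = """\
1000000,Vendor1,USD,PO Number,1/1/1900
Functional Amount Not Invoiced:,$250.00
Amount Not Invoiced Less Returned:,$250.00
1000001,Vendor2,USD,PO2061994,6/2/2015
Functional Amount Not Invoiced:,"$346,955.00"
Amount Not Invoiced Less Returned:,"$1,245.00"
"""


def parse_amount(text):
    """Convert a string such as '$346,955.00' to the float 346955.0."""
    return float(text.replace("$", "").replace(",", ""))


def vendors_over_threshold(rows, vendors, min_amount=MIN_AMOUNT):
    """Collect (vendor_id, vendor_name, amount) for amounts >= min_amount."""
    results = []
    current = None  # most recent vendor header seen so far
    for row in rows:
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue
        if cells[0] in vendors:
            # A row whose first cell is a known vendor ID starts a new block.
            current = (cells[0], vendors[cells[0]])
        elif LABEL in cells and current is not None:
            position = cells.index(LABEL)
            if position + 1 < len(cells):
                amount = parse_amount(cells[position + 1])
                if amount >= min_amount:
                    results.append((*current, amount))
    return results


matches = vendors_over_threshold(csv.reader(io.StringIO(sample)), company_list)
for vendor_id, vendor_name, amount in matches:
    print(f"{vendor_id} {vendor_name} ${amount:,.2f}")
```

To run this against the real report, pass `csv.reader(file)` from the question's `with open(...)` block in place of the `io.StringIO` sample. If the export puts the label and amount in a single cell, the row test would need to search within cells instead of comparing whole cells.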

Posted by huangapple on April 4, 2023 at 13:23:18
When reposting, please keep the link to this article: https://go.coder-hub.com/75925764.html