重写此代码为PL/SQL是否比在Python中执行3个for循环更快,以及如何进行转换?

huangapple go评论100阅读模式
英文:

Would re-writing this code in PL/SQL be faster than preforming 3 for-loops in Python & how would I transform this?

问题

我有一个查询结果如下:

  1. 位置编号:lct_nbr
  2. 财务周结束日期:fsc_wk_end_dt
  3. 商品编号:itm_nbr
  4. 产品编号:pg_nbr
  5. SUM(quantity):qty
  6. 本地时间戳:refresh_date

我正在尝试确定商品渗透率或权重。这些循环可以使用PL/SQL完成吗?我对这个概念不太熟悉。这样做比在Python中执行要快吗?它需要很多小时...

  1. item_penetration_by_units = pd.read_sql_query(query, trino_conn)
  2. item_penetration_by_units['total'] = math.nan
  3. item_penetration_by_units['weight'] = math.nan
  4. LCT_NBRs = item_penetration_by_units['lct_nbr']
  5. LCT_NBRs = LCT_NBRs.drop_duplicates()
  6. LCT_NBRs = LCT_NBRs.reset_index(drop=True)
  7. LCT_NBRs = np.array(LCT_NBRs)
  8. i = 0
  9. for i in range(len(LCT_NBRs)):
  10. x = item_penetration_by_units.loc[item_penetration_by_units['lct_nbr'] == LCT_NBRs[i]]
  11. print("i: " + str(i) + " and LCT_NBR: " + str(LCT_NBRs[i]))
  12. PG_NBRs = x['pg_nbr']
  13. PG_NBRs = PG_NBRs.drop_duplicates()
  14. PG_NBRs = PG_NBRs.reset_index(drop=True)
  15. PG_NBRs = np.array(PG_NBRs)
  16. n = len(PG_NBRs)
  17. j = 0
  18. for j in range(n):
  19. y = x.loc[x['pg_nbr'] == PG_NBRs[j]]
  20. print("j: " + str(j) + " and PG_NBR: " + str(PG_NBRs[j]))
  21. WK_End_DTs = y['fsc_wk_end_dt']
  22. WK_End_DTs = WK_End_DTs.drop_duplicates()
  23. WK_End_DTs = WK_End_DTs.reset_index(drop=True)
  24. WK_End_DTs = np.array(WK_End_DTs)
  25. m = len(WK_End_DTs)
  26. k = 0
  27. for k in range(m):
  28. z = y.loc[y['fsc_wk_end_dt'] == WK_End_DTs[k]]
  29. index = z.index
  30. indx = index.values
  31. z = z.reset_index(drop=True)
  32. total_k = float(z.qty.sum())
  33. l = 0
  34. for l in range(len(z)):
  35. item_penetration_by_units.loc[indx[l], 'total'] = total_k
  36. if (float(z.loc[l, 'qty']) == 0 and total_k == 0):
  37. item_penetration_by_units.loc[indx[l], 'weight'] = 0.0
  38. else:
  39. item_penetration_by_units.loc[indx[l], 'weight'] = float(z.loc[l, 'qty']) / total_k

每个 itm_nbr 都有唯一的 qtylct_nbrpg_nbrfsc_wk_end_dt。权重是通过将特定商品编号的 qty 除以 总 qty(即特定位置、特定周的特定产品组中所有商品编号的 qty 总和)来计算的。

英文:

I have a query that outputs:

  1. location number: lct_nbr
  2. fiscal week end date: fsc_wk_end_dt
  3. item number: itm_nbr
  4. product number: pg_nbr
  5. SUM(quantity): qty
  6. localtimestamp: refresh_date

and I am trying to determine item penetration, or weight. Can these for loops be done with PL/SQL? I am not very familiar with the concept. Would it be faster than doing this in Python? It takes many many hours...

  1. item_penetration_by_units = pd.read_sql_query(query, trino_conn)
  2. item_penetration_by_units['total'] = math.nan
  3. item_penetration_by_units['weight'] = math.nan
  4. LCT_NBRs = item_penetration_by_units['lct_nbr']
  5. LCT_NBRs = LCT_NBRs.drop_duplicates()
  6. LCT_NBRs = LCT_NBRs.reset_index(drop=True)
  7. LCT_NBRs = np.array(LCT_NBRs)
  8. i = 0
  9. for i in range(len(LCT_NBRs)):
  10. x = item_penetration_by_units.loc[item_penetration_by_units['lct_nbr'] == LCT_NBRs[i]]
  11. print("i: " + str(i) + " and LCT_NBR: " + str(LCT_NBRs[i]))
  12. PG_NBRs = x['pg_nbr']
  13. PG_NBRs = PG_NBRs.drop_duplicates()
  14. PG_NBRs = PG_NBRs.reset_index(drop=True)
  15. PG_NBRs = np.array(PG_NBRs)
  16. n = len(PG_NBRs)
  17. j = 0
  18. for j in range(n):
  19. y = x.loc[x['pg_nbr'] == PG_NBRs[j]]
  20. print("j: " + str(j) + " and PG_NBR: " + str(PG_NBRs[j]))
  21. WK_End_DTs = y['fsc_wk_end_dt']
  22. WK_End_DTs = WK_End_DTs.drop_duplicates()
  23. WK_End_DTs = WK_End_DTs.reset_index(drop=True)
  24. WK_End_DTs = np.array(WK_End_DTs)
  25. m = len(WK_End_DTs)
  26. k = 0
  27. for k in range(m):
  28. z = y.loc[y['fsc_wk_end_dt'] == WK_End_DTs[k]]
  29. index = z.index
  30. indx = index.values
  31. z = z.reset_index(drop=True)
  32. total_k = float(z.qty.sum())
  33. l = 0
  34. for l in range(len(z)):
  35. item_penetration_by_units.loc[indx[l], 'total'] = total_k
  36. if (float(z.loc[l, 'qty']) == 0 and total_k == 0):
  37. item_penetration_by_units.loc[indx[l], 'weight'] = 0.0
  38. else:
  39. item_penetration_by_units.loc[indx[l], 'weight'] = float(z.loc[l, 'qty']) / total_k

Each itm_nbr has a unique qty, lct_nbr, pg_nbr and fsc_wk_end_dt. Weight is calculated by dividing the qty a particular item number has by the total qty (which is the qty sum for all the item numbers in that particular product group at that particular location for that particular week).

答案1

得分: 1

好的,以下是代码部分的中文翻译:

  1. import pandas as pd
  2. import math
  3. # lct fsc_wk_end_dt pg_nbr itm_nbr qty
  4. data = [
  5. [ 1, '2023-08-01', 1, 1, 5 ],
  6. [ 1, '2023-08-01', 1, 2, 6 ],
  7. [ 1, '2023-08-01', 1, 3, 7 ],
  8. [ 1, '2023-08-01', 2, 1, 2 ],
  9. [ 1, '2023-08-01', 2, 2, 3 ],
  10. [ 1, '2023-08-01', 2, 3, 4 ],
  11. [ 1, '2023-09-01', 1, 1, 1 ],
  12. [ 1, '2023-09-01', 1, 2, 2 ],
  13. [ 1, '2023-09-01', 1, 3, 3 ],
  14. [ 1, '2023-09-01', 2, 1, 9 ],
  15. [ 1, '2023-09-01', 2, 2, 8 ],
  16. [ 1, '2023-09-01', 2, 3, 7 ],
  17. [ 2, '2023-08-01', 1, 1, 3 ],
  18. [ 2, '2023-08-01', 1, 2, 6 ],
  19. [ 2, '2023-08-01', 1, 3, 9 ],
  20. [ 2, '2023-08-01', 2, 1, 7 ],
  21. [ 2, '2023-08-01', 2, 2, 4 ],
  22. [ 2, '2023-08-01', 2, 3, 1 ],
  23. [ 2, '2023-09-01', 1, 1, 2 ],
  24. [ 2, '2023-09-01', 1, 2, 4 ],
  25. [ 2, '2023-09-01', 1, 3, 2 ],
  26. [ 2, '2023-09-01', 2, 1, 9 ],
  27. [ 2, '2023-09-01', 2, 2, 8 ],
  28. [ 2, '2023-09-01', 2, 3, 5 ]
  29. ]
  30. item_penetration_by_units = pd.DataFrame( data, columns = ['lct','fsc_wk_end_dt','pg_nbr','item_nbr','qty'] )
  31. print(item_penetration_by_units)
  32. item_penetration_by_units.set_index(['lct','fsc_wk_end_dt','pg_nbr'],inplace=True)
  33. x1 = item_penetration_by_units.groupby(['lct','fsc_wk_end_dt','pg_nbr']).sum()
  34. item_penetration_by_units['total'] = item_penetration_by_units.join( x1, how='left', rsuffix='x')['qtyx']
  35. item_penetration_by_units['weight'] = item_penetration_by_units['qty']/item_penetration_by_units['total']
  36. item_penetration_by_units.reset_index(inplace=True)
  37. print(item_penetration_by_units)

希望这有所帮助!

英文:

OK, I think you can do all of this with a .groupby, .sum, and a division. The key to that is making a multi-field index using location, date, and product group. And given this, yes, I believe you could do the same thing in SQL, although this seems pretty slick.

  1. import pandas as pd
  2. import math
  3. # lct fsc_wk_end_dt pg_nbr itm_nbr qty
  4. data = [
  5. [ 1, '2023-08-01', 1, 1, 5 ],
  6. [ 1, '2023-08-01', 1, 2, 6 ],
  7. [ 1, '2023-08-01', 1, 3, 7 ],
  8. [ 1, '2023-08-01', 2, 1, 2 ],
  9. [ 1, '2023-08-01', 2, 2, 3 ],
  10. [ 1, '2023-08-01', 2, 3, 4 ],
  11. [ 1, '2023-09-01', 1, 1, 1 ],
  12. [ 1, '2023-09-01', 1, 2, 2 ],
  13. [ 1, '2023-09-01', 1, 3, 3 ],
  14. [ 1, '2023-09-01', 2, 1, 9 ],
  15. [ 1, '2023-09-01', 2, 2, 8 ],
  16. [ 1, '2023-09-01', 2, 3, 7 ],
  17. [ 2, '2023-08-01', 1, 1, 3 ],
  18. [ 2, '2023-08-01', 1, 2, 6 ],
  19. [ 2, '2023-08-01', 1, 3, 9 ],
  20. [ 2, '2023-08-01', 2, 1, 7 ],
  21. [ 2, '2023-08-01', 2, 2, 4 ],
  22. [ 2, '2023-08-01', 2, 3, 1 ],
  23. [ 2, '2023-09-01', 1, 1, 2 ],
  24. [ 2, '2023-09-01', 1, 2, 4 ],
  25. [ 2, '2023-09-01', 1, 3, 2 ],
  26. [ 2, '2023-09-01', 2, 1, 9 ],
  27. [ 2, '2023-09-01', 2, 2, 8 ],
  28. [ 2, '2023-09-01', 2, 3, 5 ]
  29. ]
  30. item_penetration_by_units = pd.DataFrame( data, columns = ['lct','fsc_wk_end_dt','pg_nbr','item_nbr','qty'] )
  31. print(item_penetration_by_units)
  32. item_penetration_by_units.set_index(['lct','fsc_wk_end_dt','pg_nbr'],inplace=True)
  33. x1 = item_penetration_by_units.groupby(['lct','fsc_wk_end_dt','pg_nbr']).sum()
  34. item_penetration_by_units['total'] = item_penetration_by_units.join( x1, how='left', rsuffix='x')['qtyx']
  35. item_penetration_by_units['weight'] = item_penetration_by_units['qty']/item_penetration_by_units['total']
  36. item_penetration_by_units.reset_index(inplace=True)
  37. print(item_penetration_by_units)

Output:

  1. lct fsc_wk_end_dt pg_nbr item_nbr qty
  2. 0 1 2023-08-01 1 1 5
  3. 1 1 2023-08-01 1 2 6
  4. 2 1 2023-08-01 1 3 7
  5. 3 1 2023-08-01 2 1 2
  6. 4 1 2023-08-01 2 2 3
  7. 5 1 2023-08-01 2 3 4
  8. 6 1 2023-09-01 1 1 1
  9. 7 1 2023-09-01 1 2 2
  10. 8 1 2023-09-01 1 3 3
  11. 9 1 2023-09-01 2 1 9
  12. 10 1 2023-09-01 2 2 8
  13. 11 1 2023-09-01 2 3 7
  14. 12 2 2023-08-01 1 1 3
  15. 13 2 2023-08-01 1 2 6
  16. 14 2 2023-08-01 1 3 9
  17. 15 2 2023-08-01 2 1 7
  18. 16 2 2023-08-01 2 2 4
  19. 17 2 2023-08-01 2 3 1
  20. 18 2 2023-09-01 1 1 2
  21. 19 2 2023-09-01 1 2 4
  22. 20 2 2023-09-01 1 3 2
  23. 21 2 2023-09-01 2 1 9
  24. 22 2 2023-09-01 2 2 8
  25. 23 2 2023-09-01 2 3 5
  26. lct fsc_wk_end_dt pg_nbr item_nbr qty total weight
  27. 0 1 2023-08-01 1 1 5 18 0.277778
  28. 1 1 2023-08-01 1 2 6 18 0.333333
  29. 2 1 2023-08-01 1 3 7 18 0.388889
  30. 3 1 2023-08-01 2 1 2 9 0.222222
  31. 4 1 2023-08-01 2 2 3 9 0.333333
  32. 5 1 2023-08-01 2 3 4 9 0.444444
  33. 6 1 2023-09-01 1 1 1 6 0.166667
  34. 7 1 2023-09-01 1 2 2 6 0.333333
  35. 8 1 2023-09-01 1 3 3 6 0.500000
  36. 9 1 2023-09-01 2 1 9 24 0.375000
  37. 10 1 2023-09-01 2 2 8 24 0.333333
  38. 11 1 2023-09-01 2 3 7 24 0.291667
  39. 12 2 2023-08-01 1 1 3 18 0.166667
  40. 13 2 2023-08-01 1 2 6 18 0.333333
  41. 14 2 2023-08-01 1 3 9 18 0.500000
  42. 15 2 2023-08-01 2 1 7 12 0.583333
  43. 16 2 2023-08-01 2 2 4 12 0.333333
  44. 17 2 2023-08-01 2 3 1 12 0.083333
  45. 18 2 2023-09-01 1 1 2 8 0.250000
  46. 19 2 2023-09-01 1 2 4 8 0.500000
  47. 20 2 2023-09-01 1 3 2 8 0.250000
  48. 21 2 2023-09-01 2 1 9 22 0.409091
  49. 22 2 2023-09-01 2 2 8 22 0.363636
  50. 23 2 2023-09-01 2 3 5 22 0.227273

huangapple
  • 本文由 发表于 2023年7月28日 03:29:45
  • 转载请务必保留本文链接:https://go.coder-hub.com/76782861.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定