I need to calculate profit/loss for a given stock dataset, ensuring that the first-bought items are sold first (FIFO)

Question:
Consider the following sample dataset.
| Date | symbol | qty | price per qty | type |
| --- | --- | --- | --- | --- |
| 07 July 2022 | REL2300PE | 200 | 50 | buy |
| 07 July 2022 | IDBI2300PE | 200 | 50 | sell |
| 15 July 2022 | REL2300PE | 100 | 50 | buy |
| 15 July 2022 | IDBI2300PE | 20 | 50 | buy |
| 16 July 2022 | REL2300PE | 200 | 35 | buy |
| 30 July 2022 | IDBI2300PE | 60 | 50 | sell |
| 30 July 2022 | REL2300PE | 450 | 45 | sell |
| 30 July 2022 | IDBI2300PE | 200 | 25 | sell |
If we focus on the symbol 'REL2300PE', a total of 450 units are sold at $45, giving a total sale value of $20,250. Now let's calculate the FIFO cost of those first 450 units: sum the product of quantity and price per unit over the earliest buy trades for this symbol, i.e. 200 units * $50 + 100 units * $50 + 150 units * $35 = $20,250. Since the sale value and the cost of the first 450 units are both $20,250, the profit/loss for this stock should be 0.
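The FIFO arithmetic above can be verified with a short plain-Python sketch. The `fifo_profit` helper below is purely illustrative (not part of the question) and only handles sells that are covered by earlier buys, so it is applied just to the REL2300PE rows (the dataset's first IDBI2300PE trade is a short sell, which this sketch does not model):

```python
from collections import deque

def fifo_profit(trades):
    """FIFO profit/loss per symbol; trades are (symbol, qty, price, type) in date order."""
    lots = {}  # symbol -> deque of [remaining_qty, buy_price], oldest first
    pnl = {}
    for symbol, qty, price, typ in trades:
        if typ == 'buy':
            lots.setdefault(symbol, deque()).append([qty, price])
        else:  # sell: consume the oldest open buy lots first
            remaining = qty
            while remaining > 0:
                lot = lots[symbol][0]
                used = min(remaining, lot[0])
                pnl[symbol] = pnl.get(symbol, 0) + used * (price - lot[1])
                lot[0] -= used
                remaining -= used
                if lot[0] == 0:
                    lots[symbol].popleft()
    return pnl

# REL2300PE trades from the table above, oldest first
rel_trades = [
    ('REL2300PE', 200, 50, 'buy'),
    ('REL2300PE', 100, 50, 'buy'),
    ('REL2300PE', 200, 35, 'buy'),
    ('REL2300PE', 450, 45, 'sell'),
]
print(fifo_profit(rel_trades))  # -> {'REL2300PE': 0}
```

The sell consumes 200 @ $50, 100 @ $50, and 150 @ $35, matching the hand calculation and yielding a profit/loss of 0.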
Answer 1
Score: 1
You could follow this algorithm:
- Calculate the cumulative sum of buys and sells over the current row and all previous rows, counting sell quantities as negative
- Calculate the cumulative sum of sells only, over the current row and all following rows, which gives the total quantity sold from this point onward, so the oldest buys are consumed first
- Take the difference between the two to determine how much of each buy row is actually consumed
- Group by symbol/product and calculate the profit/loss from the consumed quantities
Input:
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, when, sum

spark = SparkSession.builder.getOrCreate()

data = [
    ('05 July 2022', 'IDBI2300PE', 500, 45, 'buy'),
    ('07 July 2022', 'REL2300PE', 200, 50, 'buy'),
    ('07 July 2022', 'IDBI2300PE', 200, 50, 'sell'),
    ('15 July 2022', 'REL2300PE', 100, 50, 'buy'),
    ('15 July 2022', 'IDBI2300PE', 20, 50, 'buy'),
    ('16 July 2022', 'REL2300PE', 200, 35, 'buy'),
    ('20 July 2022', 'REL2300PE', 200, 45, 'sell'),
    ('30 July 2022', 'IDBI2300PE', 60, 50, 'sell'),
    ('30 July 2022', 'REL2300PE', 250, 45, 'sell'),
    ('31 July 2022', 'IDBI2300PE', 200, 25, 'sell')]
df = spark.createDataFrame(data, ["Date", "symbol", "qty", "price", "type"])
Calculate consumed qty:
# Cumulative sum of buys (positive) and sells (negative) over the current and all previous rows
cum_sum_wind = Window.partitionBy('symbol').orderBy('Date').rangeBetween(Window.unboundedPreceding, 0)
df = df.withColumn('cum_sum', sum(when(col('type') == 'sell', -1 * col('qty'))
                                  .otherwise(col('qty'))).over(cum_sum_wind))
# Cumulative sum of sells only, over the current and all following rows
sell_cum_sum_wind = Window.partitionBy('symbol').orderBy('Date').rangeBetween(0, Window.unboundedFollowing)
df = df.withColumn('sell_cum_sum', sum(when(col('type') == 'sell', col('qty'))
                                       .otherwise(0)).over(sell_cum_sum_wind))
# Calculate the actual consumed qty (parentheses instead of backslashes,
# so the inline comment below is legal Python)
df = df.withColumn('cons_qty', (
    when(col('type') == 'sell', col('qty'))
    .when(col('sell_cum_sum') > col('cum_sum'), col('qty'))
    # If negative then nothing is consumed from this buy row
    .when((col('qty') - (col('cum_sum') - col('sell_cum_sum'))) < 0, 0)
    .otherwise(col('qty') - (col('cum_sum') - col('sell_cum_sum')))))
df.show()
+------------+----------+---+-----+----+-------+------------+--------+
| Date| symbol|qty|price|type|cum_sum|sell_cum_sum|cons_qty|
+------------+----------+---+-----+----+-------+------------+--------+
|05 July 2022|IDBI2300PE|500| 45| buy| 500| 460| 460|
|07 July 2022|IDBI2300PE|200| 50|sell| 300| 460| 200|
|15 July 2022|IDBI2300PE| 20| 50| buy| 320| 260| 0|
|30 July 2022|IDBI2300PE| 60| 50|sell| 260| 260| 60|
|31 July 2022|IDBI2300PE|200| 25|sell| 60| 200| 200|
|07 July 2022| REL2300PE|200| 50| buy| 200| 450| 200|
|15 July 2022| REL2300PE|100| 50| buy| 300| 450| 100|
|16 July 2022| REL2300PE|200| 35| buy| 500| 450| 150|
|20 July 2022| REL2300PE|200| 45|sell| 300| 450| 200|
|30 July 2022| REL2300PE|250| 45|sell| 50| 250| 250|
+------------+----------+---+-----+----+-------+------------+--------+
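The cons_qty values for REL2300PE in the table above can be cross-checked by replaying the two cumulative sums in plain Python. This is a standalone sketch of the window logic, assuming rows are already sorted by date and dates are unique per symbol (so the range-frame tie behavior of the Spark windows does not come into play):

```python
# (qty, price, type) for REL2300PE, in date order
rows = [
    (200, 50, 'buy'), (100, 50, 'buy'), (200, 35, 'buy'),
    (200, 45, 'sell'), (250, 45, 'sell'),
]

# cumulative sum over current and all previous rows, sells counted negative
cum_sum, running = [], 0
for qty, _, typ in rows:
    running += -qty if typ == 'sell' else qty
    cum_sum.append(running)

# cumulative sum of sells over current and all following rows
sell_cum_sum, running = [], 0
for qty, _, typ in reversed(rows):
    running += qty if typ == 'sell' else 0
    sell_cum_sum.append(running)
sell_cum_sum.reverse()

# actual consumed quantity per row
cons_qty = []
for (qty, _, typ), cs, scs in zip(rows, cum_sum, sell_cum_sum):
    if typ == 'sell' or scs > cs:
        cons_qty.append(qty)
    else:
        cons_qty.append(max(qty - (cs - scs), 0))

print(cum_sum)       # -> [200, 300, 500, 300, 50]
print(sell_cum_sum)  # -> [450, 450, 450, 450, 250]
print(cons_qty)      # -> [200, 100, 150, 200, 250]
```

The three lists match the cum_sum, sell_cum_sum, and cons_qty columns shown for REL2300PE.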
Calculate global profit/loss per symbol:
# Group by symbol: sells add price * cons_qty, buys subtract it
result = df.groupby('symbol').agg(
    sum(when(col('type') == 'buy', -1 * col('price') * col('cons_qty'))
        .otherwise(col('price') * col('cons_qty'))).alias("profit_loss"))
result.show()
+----------+-----------+
| symbol|profit_loss|
+----------+-----------+
|IDBI2300PE| -2700|
| REL2300PE| 0|
+----------+-----------+
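The final aggregation can be sanity-checked by replaying the signed sum over the cons_qty table above in plain Python (illustration only, rows copied from the intermediate output):

```python
# (symbol, price, type, cons_qty) rows from the table above
rows = [
    ('IDBI2300PE', 45, 'buy', 460), ('IDBI2300PE', 50, 'sell', 200),
    ('IDBI2300PE', 50, 'buy', 0),   ('IDBI2300PE', 50, 'sell', 60),
    ('IDBI2300PE', 25, 'sell', 200),
    ('REL2300PE', 50, 'buy', 200),  ('REL2300PE', 50, 'buy', 100),
    ('REL2300PE', 35, 'buy', 150),  ('REL2300PE', 45, 'sell', 200),
    ('REL2300PE', 45, 'sell', 250),
]

# buys subtract price * cons_qty, sells add it
profit = {}
for symbol, price, typ, cons in rows:
    sign = -1 if typ == 'buy' else 1
    profit[symbol] = profit.get(symbol, 0) + sign * price * cons

print(profit)  # -> {'IDBI2300PE': -2700, 'REL2300PE': 0}
```

This reproduces the profit_loss figures in the result table.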
Answer 2
Score: 0
Add a new column total_price using withColumn, with an additional calculation: multiply by -1 if type is 'buy' and by 1 if type is 'sell'.
from pyspark.sql.functions import col, when, sum

df = df.withColumn('total_price', when(df.type == 'buy', df.qty * df.price_per_qty * (-1))
                   .otherwise(df.qty * df.price_per_qty))
df = df.groupBy(col('symbol')).agg(sum('total_price').alias('profit_loss'))
Please try to understand my concept here. There may be syntax errors in the code, as I don't have a laptop right now.
Thanks
Answer 3
Score: 0
If you want to use Hive to solve this problem, you can use the query below to get the desired result (buys signed negative so the running total is revenue minus cost):
select date, symbol, total_profit from (
    select date,
           symbol,
           sum(qty * (CASE
                          WHEN type = 'buy' THEN price_per_qty * (-1)
                          ELSE price_per_qty
                      END)
           ) over (partition by symbol order by date rows between unbounded preceding and current row) as total_profit,
           row_number() over (partition by symbol order by date desc) as rn
    from stock1
) as running_sum where rn = 1;
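The same query shape can be tried locally with Python's stdlib sqlite3 (window functions require SQLite >= 3.25). This is a sketch under assumptions, not Hive itself: dates are rewritten as ISO strings so text ordering matches chronology, the CASE is signed so buys subtract, and only the question's REL2300PE rows are loaded:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE stock1 (date TEXT, symbol TEXT, qty INT, price_per_qty INT, type TEXT)")
con.executemany("INSERT INTO stock1 VALUES (?, ?, ?, ?, ?)", [
    ('2022-07-07', 'REL2300PE', 200, 50, 'buy'),
    ('2022-07-15', 'REL2300PE', 100, 50, 'buy'),
    ('2022-07-16', 'REL2300PE', 200, 35, 'buy'),
    ('2022-07-30', 'REL2300PE', 450, 45, 'sell'),
])

# running signed total per symbol, keeping only each symbol's last row
result = con.execute("""
    SELECT symbol, total_profit FROM (
        SELECT symbol,
               SUM(qty * (CASE WHEN type = 'buy' THEN -price_per_qty ELSE price_per_qty END))
                   OVER (PARTITION BY symbol ORDER BY date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_profit,
               ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date DESC) AS rn
        FROM stock1
    ) AS running_sum WHERE rn = 1
""").fetchall()
print(result)  # -> [('REL2300PE', -1750)]
```

Like Answer 2, this computes the net cash flow per symbol (-1750 here, because 50 bought units remain unsold), not the FIFO profit/loss of 0, which is why a running sum alone is not enough for the FIFO requirement.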