Converting JSON records to Parquet using Python

# Question

I am trying to convert JSON input records to Parquet format and return them as output. I receive the sample JSON record below as input.

**Input record:**

```json
{"id": "37547594730892523208777", "timestamp": 1518747, "message": "10-05-2023 04:21:58.092 [pool-2987-thread-1] INFO com.github.vjhdgk.loggenerator.SellRequest - id=32802,ip=188.219.135.214, email=cbhdg3@gmail.com,sex=F,brand=redjh,name=imac Touch,color=cert,options=Disk 32Go,price=329.0"}
```

I am using the code below in a Lambda function to convert the JSON logs above to Parquet format and return them.

**index.py**

```python
import base64
import gzip
import json

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def lambda_handler(event, context):
    print('Received event: %s' % event)

    output = []
    print(event['records'])
    for record in event['records']:
        print(record['data'])

        # Each record's data is base64-encoded, gzip-compressed JSON
        data = json.loads(gzip.decompress(
            base64.b64decode(record['data'])))
        print(data['logEvents'])

        # Collect the log events as rows for the DataFrame
        processed_messages = []
        for log_event in data['logEvents']:
            print(log_event)
            processed_messages.append(log_event)

        df = pd.DataFrame(data=processed_messages)
        print("df", df.head())

        # Convert the pandas DataFrame to an Arrow table
        table = pa.Table.from_pandas(df)

        # Write the Arrow table to a Parquet file in memory
        parquet_bytes = pa.BufferOutputStream()
        pq.write_table(table, parquet_bytes)

        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            # Return the in-memory Parquet bytes, base64-encoded
            'data': base64.b64encode(
                parquet_bytes.getvalue().to_pybytes()).decode('utf-8')
        }
        output.append(output_record)

    print(json.dumps(output))

    return {'records': output}
```

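Can anyone help me modify this code to convert the JSON records above to Parquet?
I don't want to write the converted Parquet records to a file; I just want to return them as output. Thanks in advance.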
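For local testing, it can help to build an event shaped like the one the handler reads: a `records` list whose `data` field is base64-encoded, gzip-compressed JSON containing `logEvents`. The sketch below uses only the standard library; the helper name `make_test_event`, the `record-1` ID, and the shortened message are hypothetical and chosen for illustration.

```python
import base64
import gzip
import json


def make_test_event(log_events, record_id="record-1"):
    """Build an event in the shape the handler expects (hypothetical helper)."""
    payload = json.dumps({"logEvents": log_events}).encode("utf-8")
    # gzip-compress, then base64-encode, mirroring what the handler decodes
    data = base64.b64encode(gzip.compress(payload)).decode("utf-8")
    return {"records": [{"recordId": record_id, "data": data}]}


event = make_test_event([
    {"id": "37547594730892523208777", "timestamp": 1518747,
     "message": "id=32802,price=329.0"},  # shortened sample message
])

# Decode the record the same way the handler does
record = event["records"][0]
decoded = json.loads(gzip.decompress(base64.b64decode(record["data"])))
print(decoded["logEvents"])
```

Passing `event` to `lambda_handler(event, None)` should then exercise the same decoding path as a real invocation, assuming the real payload has this shape.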




# Answer 1
**Score**: 1

Since the data is already in a DataFrame, I think `df.to_parquet(...)` should work.




huangapple
  • Published on May 10, 2023 at 14:36:21
  • When reposting, please keep this link: https://go.coder-hub.com/76215512.html