Why is my Python file watcher not writing the data from Parquet files to a data frame?

Question

I have written a file watcher in Python that watches a specific folder on my laptop. Whenever a new Parquet file is created in it, the watcher picks it up, reads the data with pandas, and constructs a data frame from it.

Issue: It does all of this perfectly except the last step, where it has to write the data to the data frame.

Here is the code I have written:

# Imports and declarations

import os
import sys
import time
import pathlib
import pandas as pd
import pyarrow.parquet as pq

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler, PatternMatchingEventHandler
# Eventhandler class

class Handler(FileSystemEventHandler):
    
    def on_created(self, event):
        
        # Import Data

        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10) # To allow time to complete file write to disk
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        dataset.head()

# Code to run for Python Interpreter

if __name__ == "__main__":
    
    path = r"D:\Folder1\Folder2\Folder3" # Path to watch
    
    observer = Observer()
    event_handler = Handler()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    
    try:
        while(True):
            pass
            
    except KeyboardInterrupt:
        observer.stop()
        observer.join()

The expected output is the first five rows of the data frame. However, nothing is displayed, and I get no error either.

Some Useful Information

  • I have been running this code in Jupyter Notebook.

  • However, I have also run it in Spyder to see whether a data frame appears at all in its Variable Explorer section. But it didn't.

From this, the natural conclusion would be that the data frame isn't being created at all. But this is what baffles me, because yesterday I successfully read this same Parquet file with a somewhat less sophisticated script (below), where I supplied the file path as a raw string.

# Less Sophisticated Code

filepath = r"D:\Folder1\Folder2\Folder3\filename.parquet"

dataset = pd.read_parquet(filepath, engine='pyarrow')
dataset = dataset.reset_index(drop=True) # Resets index of dataframe and replaces with integers
dataset.head()

Is the filepath the issue then? I am very happy to provide any other information you may need.

Edit: I have added a screenshot of the output from the code that did not have a file watcher

Answer 1

Score: 2

If you don't `print` `dataset.head()`, nothing is displayed, unlike with `dataset.info()`, which prints its output itself:

class Handler(FileSystemEventHandler):

    def on_created(self, event):

        # Import data
        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10)  # Allow time for the file write to disk to complete
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        print(dataset.head())  # <- HERE

Otherwise, your code works for me.

Note: prefer `Path` over `PureWindowsPath`.
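To see why the `print` call matters: a Jupyter cell or the Python console echoes the value of a bare expression only at the top level. Inside a function or method such as `on_created`, a bare `dataset.head()` is evaluated and its result is simply discarded. Below is a minimal sketch of the difference, using a made-up ten-row frame; the function names and the data are illustrative and not taken from the question.

```python
import pandas as pd


def bare_head(frame: pd.DataFrame) -> None:
    # The expression is evaluated, but its value is thrown away:
    # nothing is displayed and nothing is returned.
    frame.head()


def printed_head(frame: pd.DataFrame) -> None:
    # print() writes the preview to standard output explicitly.
    print(frame.head())


def returned_head(frame: pd.DataFrame) -> pd.DataFrame:
    # Returning the preview lets the caller decide what to do with it.
    return frame.head()


frame = pd.DataFrame({"value": range(10)})  # made-up data for illustration

result = bare_head(frame)        # shows nothing; result is None
printed_head(frame)              # prints the first five rows
preview = returned_head(frame)   # preview holds the first five rows
```

Of the three, only `printed_head` produces visible output when it is called from inside other code, which is the situation the watcher's `on_created` method is in.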

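On the `Path` note in the answer: `PureWindowsPath` only manipulates the path string and always applies Windows rules, whereas `Path` uses the conventions of the machine the code runs on and can also query the file system. As far as I know, `pd.read_parquet` accepts a path object directly, so the `as_posix()` conversion should not be required. A small sketch follows; `is_parquet` is a hypothetical helper, not part of watchdog or pandas.

```python
from pathlib import Path, PureWindowsPath

raw = r"D:\Folder1\Folder2\Folder3\filename.parquet"  # path from the question

# PureWindowsPath is pure string manipulation: it never touches the disk.
posix_text = PureWindowsPath(raw).as_posix()
print(posix_text)  # D:/Folder1/Folder2/Folder3/filename.parquet


def is_parquet(src_path: str) -> bool:
    """Hypothetical filter: True if the path ends in .parquet (any letter case)."""
    return Path(src_path).suffix.lower() == ".parquet"


print(is_parquet("incoming/new_file.parquet"))
print(is_parquet("incoming/notes.txt"))
```

A check like `is_parquet(event.src_path)` at the top of `on_created` would let the handler skip files of other types that land in the watched folder.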

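The question also mentions that no data frame shows up in Spyder's Variable Explorer. That is consistent with `dataset` being a local variable of `on_created`: it disappears when the method returns and never reaches the global namespace. One possible way to keep the result, sketched under assumptions, is to hand it over through a `queue.Queue`. To stay runnable without watchdog or a real file, `CollectingHandler` below is a stand-in that does not subclass `FileSystemEventHandler`, and `loader` is a hypothetical parameter that would be `pd.read_parquet` in the actual watcher.

```python
import queue
from types import SimpleNamespace


class CollectingHandler:
    """Stand-in for a watchdog handler (hypothetical name).

    In the real watcher this class would subclass FileSystemEventHandler.
    """

    def __init__(self, loader):
        self.loader = loader          # e.g. pd.read_parquet in the real watcher
        self.results = queue.Queue()  # thread-safe hand-off to the main thread

    def on_created(self, event):
        if event.is_directory:
            return  # ignore newly created folders
        # Store the loaded object instead of leaving it in a local variable.
        self.results.put(self.loader(event.src_path))


# Simulate two "created" events with a fake loader, so no file is needed.
handler = CollectingHandler(loader=lambda path: f"loaded {path}")
handler.on_created(SimpleNamespace(src_path="new_file.parquet", is_directory=False))
handler.on_created(SimpleNamespace(src_path="new_folder", is_directory=True))

latest = handler.results.get_nowait()  # the main thread picks the result up here
print(latest)
```

In the watcher itself, the main loop could call `handler.results.get()` and assign the returned frame to a top-level variable, which is where a tool like the Variable Explorer would be able to see it.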

huangapple
  • Posted on May 25, 2023, 17:32:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76330816.html