Why is my Python file watcher not writing the data from Parquet files to a data frame?
Question
I have written a file watcher in Python that watches a specific folder on my laptop. Whenever a new Parquet file is created in it, the watcher picks it up, reads the data inside using Pandas, and constructs a data frame from it.
Issue: it does all of that correctly except the last step, where it should load the data into the data frame.
Here is the code I have written:
# Imports and declarations
import os
import sys
import time
import pathlib
import pandas as pd
import pyarrow.parquet as pq
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler, PatternMatchingEventHandler
# Eventhandler class
class Handler(FileSystemEventHandler):

    def on_created(self, event):
        # Import Data
        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10)  # To allow time to complete file write to disk
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        dataset.head()

# Code to run for Python Interpreter
if __name__ == "__main__":
    path = r"D:\Folder1\Folder2\Folder3"  # Path to watch
    observer = Observer()
    event_handler = Handler()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while(True):
            pass
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
The expected output is the first five rows of the data frame. However, it shows me nothing, and I get no error either.
Some Useful Information
- I have been running this code in Jupyter Notebook.
- However, I have also run it in Spyder to see whether a data frame appears at all in its Variable Explorer section. But it didn't.
From this, the natural conclusion would be that the data frame isn't getting created at all. But this is what baffles me, because yesterday I successfully read this same Parquet file with the simpler code below, where I passed the file path as a raw string.
# Less Sophisticated Code
filepath = r"D:\Folder1\Folder2\Folder3\filename.parquet"
dataset = pd.read_parquet(filepath, engine='pyarrow')
dataset = dataset.reset_index(drop=True) # Resets index of dataframe and replaces with integers
dataset.head()
Is the filepath the issue then? I am very happy to provide any other information you may need.
Edit: I have added a screenshot of the output from the code that did not have a file watcher
Answer 1
Score: 2
If you don't `print` `dataset.head()`, nothing is displayed: inside a method, a bare `dataset.head()` just returns a DataFrame that is then discarded, unlike `dataset.info()`, which prints its summary itself:

```
class Handler(FileSystemEventHandler):

    def on_created(self, event):
        # Import Data
        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10)  # To allow time to complete file write to disk
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        print(dataset.head())  # <- HERE
```

Otherwise, your code works for me.

Note: prefer `Path` over `PureWindowsPath`.
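The same reasoning also explains the empty Variable Explorer: `dataset` is a local variable of `on_created`, so it disappears when the callback returns, whether or not the read succeeded. The sketch below illustrates both points with the standard library only. It rests on several assumptions that are not in the original code: `RecordingHandler` and `fake_reader` are hypothetical names, a list of rows stands in for the DataFrame (so `dataset[:5]` plays the role of `dataset.head()`), and a `SimpleNamespace` mimics the watchdog event.

```python
import contextlib
import io
from types import SimpleNamespace


class RecordingHandler:
    """Stand-in for the question's Handler (hypothetical name).

    In the real watcher this would subclass watchdog's FileSystemEventHandler
    and `reader` would be pandas.read_parquet; both are left out here so the
    sketch runs with the standard library only.
    """

    def __init__(self, reader):
        self.reader = reader
        self.frames = {}  # src_path -> loaded data, still reachable after the callback returns

    def on_created(self, event):
        dataset = self.reader(event.src_path)
        dataset[:5]         # bare expression: evaluated, then discarded, nothing is shown
        print(dataset[:5])  # explicit print: written to stdout
        self.frames[event.src_path] = dataset  # keep a reference outside the local scope


def fake_reader(path):
    # Assumption: returns a list of rows instead of a DataFrame.
    return [{"row": i} for i in range(8)]


handler = RecordingHandler(fake_reader)
event = SimpleNamespace(src_path="example.parquet")  # mimics event.src_path

# Capture stdout to show that only the print() call produces output.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    handler.on_created(event)

printed = buffer.getvalue()
```

After the call, `printed` holds only the output of the `print` line, and `handler.frames` still holds the loaded data, which is one way to inspect it later from a notebook cell or console.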
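On the fixed `time.sleep(10)` in the handler: it waits ten seconds even for small files and may still be too short for large ones. One possible alternative, sketched here as a heuristic rather than a guarantee, is to poll the file size until it stops changing. The helper name `wait_until_stable` and its parameters are hypothetical, and a writer that pauses mid-write could still fool it.

```python
import os
import tempfile
import time


def wait_until_stable(path, interval=0.5, checks=2, timeout=30.0):
    """Return True once the file size has stayed the same across
    `checks` consecutive comparisons, or False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible yet
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed (or file missing): start counting again
        last_size = size
        time.sleep(interval)
    return False


# Usage with a throwaway file (in the watcher, the path would be event.src_path).
with tempfile.NamedTemporaryFile(delete=False, suffix=".parquet") as tmp:
    tmp.write(b"placeholder bytes")
ready = wait_until_stable(tmp.name, interval=0.05, timeout=5.0)
os.remove(tmp.name)
```

In the handler, the call would replace the `time.sleep(10)` line, and the file would be read only if the helper returns `True`.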