Custom log processing/parsing
Question
I have the following log format:
[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]
GET /foo
{"controller"=>"foo", "action"=>"index"}
[26830431.7966868][666][2.1876697540283203][30398][api][1374829888.4944339][request_end]
200 OK
Each entry is constructed using this pattern:
[request_id][user_id][time_from_request_started][process_id][app][timestamp][tagline]
payload
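As a rough illustration of the pattern above, a header line could be split into its seven fields with a regular expression. This is only a sketch: the `Header` type, its field names, and `parseHeader` are hypothetical names chosen for the example, and it assumes that no field contains a `]` character.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Header holds the seven bracketed fields of an entry's first line.
// All fields stay strings so the float-like IDs are kept verbatim.
// (Type and field names are assumptions made for this sketch.)
type Header struct {
	RequestID, UserID, Elapsed, PID, App, Timestamp, Tag string
}

// headerRe matches a line consisting of exactly seven [..] groups.
var headerRe = regexp.MustCompile("^" + strings.Repeat(`\[([^\]]*)\]`, 7) + "$")

// parseHeader returns the parsed header and true, or false when the
// line is not a header (for example, a payload line).
func parseHeader(line string) (Header, bool) {
	m := headerRe.FindStringSubmatch(line)
	if m == nil {
		return Header{}, false
	}
	return Header{m[1], m[2], m[3], m[4], m[5], m[6], m[7]}, true
}

func main() {
	h, ok := parseHeader("[26830431.7966868][4][0.013590574264526367][30398][api][1374829886.320353][init]")
	fmt.Println(ok, h.RequestID, h.UserID, h.Tag)
}
```

Any line for which `parseHeader` returns false can be treated as payload belonging to the most recent header.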
During a request I log at many points, because the app has complex behaviour. This helps me a lot when debugging user behaviour.
I would like to parse the log into a directory structure like this:
req_id
|
|----[time_from_request_started][process_id][timestamp][tagline]
|
etc
Basically, each directory will be named after the req_id and will contain files whose names are built from the rest of the header line. Each file will hold the entry's payload.
I will also have another directory, keyed by user ID, containing symlinks to the requests made by each user.
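One possible way to lay this out on disk is sketched below in Go. The `requests` and `users` top-level directory names, the `Entry` type, and the `paths` and `store` helpers are assumptions made for the example and are not part of the original description. It also assumes a Unix-like filesystem, where brackets are legal in file names and symlinks are available.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Entry is one parsed log entry (hypothetical type for this sketch).
type Entry struct {
	RequestID, UserID, Elapsed, PID, Timestamp, Tag, Payload string
}

// paths computes the request directory, the payload file inside it,
// and the per-user symlink that points back at the request directory.
func paths(root string, e Entry) (reqDir, file, link string) {
	reqDir = filepath.Join(root, "requests", e.RequestID)
	name := fmt.Sprintf("[%s][%s][%s][%s]", e.Elapsed, e.PID, e.Timestamp, e.Tag)
	file = filepath.Join(reqDir, name)
	link = filepath.Join(root, "users", e.UserID, e.RequestID)
	return reqDir, file, link
}

// store writes the payload file and makes sure the user symlink exists.
func store(root string, e Entry) error {
	reqDir, file, link := paths(root, e)
	if err := os.MkdirAll(reqDir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(file, []byte(e.Payload), 0o644); err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(link), 0o755); err != nil {
		return err
	}
	// Relative target, so the whole tree can be moved without breaking links.
	target := filepath.Join("..", "..", "requests", e.RequestID)
	if err := os.Symlink(target, link); err != nil && !os.IsExist(err) {
		return err
	}
	return nil
}

func main() {
	root, err := os.MkdirTemp("", "logtree")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	e := Entry{"26830431.7966868", "4", "0.013590574264526367", "30398",
		"1374829886.320353", "init", "GET /foo\n"}
	if err := store(root, e); err != nil {
		panic(err)
	}
	_, file, link := paths(root, e)
	fmt.Println(file)
	fmt.Println(link)
}
```

In the sample log the same request ID appears with two different user IDs (4 and 666), so under this layout that request would be linked from both user directories.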
First question: Is this structure correct? In my opinion it will make log access easy and fast. The reason I want to use directories and files is that I like the Unix approach and want to try it (to experience its advantages and drawbacks for myself).
Second question: I would have no problem using Ruby to build this, but I would like to learn a new tool that is better suited to the task. I am thinking about using just Unix tools (pipes, awk, etc.), or writing a parser in Go, which I am learning right now (I even have time to implement a simple map-reduce). Which tool is best suited for this?
Answer 1
Score: 1
I would not store logs in a directory to see how the users behave.
Depending on what behaviour you want to keep track of, you could use different tools. Two options are mixpanel and keen.io.
Instead of recording what the user did in a log file, you would send an event to either of those (they are pretty similar; pick the one you think has better docs and libraries), then graph those events to better understand the behaviour of your users. I've done this a lot recently, and I've used rickshaw to display the data in a nice way.
The key reason I'm suggesting this is that if you go the file route you will still have to find a way to understand your data, and graphs help a lot with that. Also, visualization is something keen.io does by default. You may still want to build your own graphs, but it's a good start.
Hope this helped.
Answer 2
Score: 0
> Is this structure correct?
Only you can know that; it depends directly on how the data needs to be accessed and used.
> What tool is best suited for this?
You could probably use UNIX tools to achieve this, but writing it yourself would also be a good exercise for practicing your Go skills. It would also be more extensible.