Question: Indexing multiple CSV files into one index with nested fields/objects

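I want to use Logstash to load data from multiple CSV files (Users, Scores, Messages) into one index.

All the CSV files share the same "userId" field, which links their data together.

My goal is to end up with a users index that holds the data from the Users CSV file as simple fields and the data from the Scores and Messages files as nested fields.

Is there a way to achieve this?

One user can have multiple messages and scores.

I am not sure I understood the idea of merging correctly. Here is the Logstash config I tried: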

input {
    file {
        path => "C:/resources/files/users.csv"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
    file {
        path => "C:/resources/files/scores.csv"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
    file {
        path => "C:/resources/files/messages.csv"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
}

filter {
    if [log][file][path] == "C:/resources/files/users.csv" {
        csv {
            separator => ","
            columns => ["userId", "username", "email"]
        }
        mutate {
            remove_field => ["[event][original]", "[log][file][path]", "[log][file][path][keyword]", "[message]", "[message][keyword]"]
        }
    }

    if [log][file][path] == "C:/resources/files/scores.csv" {
        csv{
            separator => ","
            columns => ["userId", "field", "score"]
        }
        translate { 
            destination => "[@metadata][scores]" 
            dictionary_path => "C:/resources/files/scores.csv"
            field => "userId" 
        }
        dissect { 
            mapping => { 
                "[@metadata][scores]" => "%{field};%{score}" 
            } 
        }
    }

    if [log][file][path] == "C:/resources/files/messages.csv" {
        csv {
            separator => ","
            columns => ["userId", "message", "tag"]
        }
        translate { 
            destination => "[@metadata][messages]" 
            dictionary_path => "C:/resources/files/messages.csv"
            field => "userId" 
        }
        dissect { 
            mapping => { 
                "[@metadata][messages]" => "%{message};%{tag}" 
            } 
        }
    }
}

output {
    elasticsearch{
        action => "create"
        hosts => "localhost:9200"
        index => "users-index"
    }
}

Answer 1

Score: 0

It's not going to work the way you expect. You first need to load the scores and messages data into separate indices.
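A minimal sketch of this first step, assuming Logstash 8.x with ECS compatibility enabled (so the file input stores the source path in `[log][file][path]`) and using the hypothetical index names `scores-index` and `messages-index`. Each CSV file is routed to its own index; no joining is attempted in Logstash.

```
input {
    file {
        path => "C:/resources/files/scores.csv"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
    file {
        path => "C:/resources/files/messages.csv"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
}

filter {
    if [log][file][path] == "C:/resources/files/scores.csv" {
        csv {
            separator => ","
            columns => ["userId", "field", "score"]
        }
        mutate {
            # Hypothetical target index for score rows; drop the raw CSV line
            add_field => { "[@metadata][target_index]" => "scores-index" }
            remove_field => ["message"]
        }
    } else if [log][file][path] == "C:/resources/files/messages.csv" {
        # The "message" column overwrites the raw line, so it is kept
        csv {
            separator => ","
            columns => ["userId", "message", "tag"]
        }
        mutate {
            add_field => { "[@metadata][target_index]" => "messages-index" }
        }
    }
    # Keep only the CSV columns on the indexed documents
    mutate {
        remove_field => ["[event][original]", "[log]", "[host]"]
    }
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "%{[@metadata][target_index]}"
    }
}
```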

Then, for each of those two indices, you can build an enrich policy.
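The two enrich policies could be created roughly like this. It is a sketch that uses only the Python standard library against the enrich policy REST API (`PUT /_enrich/policy/<name>` and `PUT /_enrich/policy/<name>/_execute`). The policy names `scores-policy` and `messages-policy`, the source index names `scores-index` and `messages-index`, and an unauthenticated cluster on `localhost:9200` are all assumptions. It also assumes `userId` is mapped as `keyword` in both source indices.

```python
import json
import urllib.request

# Assumption: the same local cluster as the Logstash output, without authentication.
ES_URL = "http://localhost:9200"


def match_policy(source_index, enrich_fields):
    """Body of a 'match' enrich policy that looks records up by userId."""
    return {
        "match": {
            "indices": source_index,
            "match_field": "userId",
            "enrich_fields": enrich_fields,
        }
    }


# Hypothetical policy and index names; enrich_fields are the CSV columns.
POLICIES = {
    "scores-policy": match_policy("scores-index", ["field", "score"]),
    "messages-policy": match_policy("messages-index", ["message", "tag"]),
}


def send(method, path, body=None):
    """Send one JSON request to Elasticsearch and return the parsed response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_policies():
    for name, body in POLICIES.items():
        send("PUT", f"/_enrich/policy/{name}", body)
        # Executing the policy builds the enrich index that the processor reads from.
        send("PUT", f"/_enrich/policy/{name}/_execute")
```

Call `create_policies()` once, after the scores and messages have been loaded. A policy works from a snapshot of its source index taken at execution time, so after reloading `scores.csv` or `messages.csv` the policies have to be executed again.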

Next, you can create an ingest pipeline that uses the two enrich policies built in the previous step.

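Finally, you can change your Logstash configuration so that each user record goes through the ingest pipeline created in the previous step, which enriches it with that user's scores and messages.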
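With that in place, the Logstash config only needs to read the Users file and set the `pipeline` option of the `elasticsearch` output. This is a sketch; the pipeline name `users-enrich` is a hypothetical name that must match whatever the ingest pipeline was called.

```
input {
    file {
        path => "C:/resources/files/users.csv"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
}

filter {
    csv {
        separator => ","
        columns => ["userId", "username", "email"]
    }
    # Keep only the CSV columns on the user document
    mutate {
        remove_field => ["[event][original]", "[log]", "[host]", "message"]
    }
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "users-index"
        # Elasticsearch runs every user document through this ingest pipeline (hypothetical name)
        pipeline => "users-enrich"
    }
}
```

The join happens inside Elasticsearch at index time, so the scores and messages indices and their enrich policies must be ready before this config runs.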
