May 4, 2020, 21:20:27

Loading data to Elasticsearch v7.3 using the Bulk API

Question

I need to load data into an Elasticsearch index. I am using the Elasticsearch Bulk API to load JSON files into the index.

    private String FOLDER_PATH = "src/main/resources/allJsons";
    private String index = "test1";
    private static final String TYPE = "test_type";

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public String loadBulkData() throws IOException {

        BulkRequest bulkRequest = new BulkRequest();
        AtomicInteger counter = new AtomicInteger();
        try (Stream<Path> filePathStream = Files.walk(Paths.get(FOLDER_PATH))) {
            filePathStream.forEach(filePath -> {
                if (Files.isRegularFile(filePath)) {
                    counter.getAndIncrement();
                    try {
                        String content = Files.readString(filePath);
                        JSONObject jsonObject1 = new JSONObject(content);
                        HashMap yourHashMap1 = new Gson().fromJson(jsonObject1.toString(), HashMap.class);
                        IndexRequest indexRequest = new IndexRequest(index, TYPE).source(yourHashMap1);
                        bulkRequest.add(indexRequest);

                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        try {
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return "Bulk data loaded to index " + index + "";
    }
}

I have multiple JSON files in the following format:

[
 {
  "Nutrient" : "Calories",
  "Amount" : " 289.00",
  "Unit" : " kcal"
}, {
  "Nutrient" : "Fat",
  "Amount" : " 17.35",
  "Unit" : " g"
}
]

When I run the code, it fails with this error:
>org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]

I think the problem is that the data is a JSONArray while the code expects a JSONObject. Could anyone please explain how to handle this?

Answer 1

Score: 1

You can do the bulk insert by passing a map built from each of your JSON objects to the Elasticsearch Bulk API.
Parse the JSON file with JSONParser to get the array, then convert each element into a map.

Here is the code:

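    // Imports assumed by this snippet. JSONParser, JSONArray and JSONObject are the
    // json-simple classes (org.json.simple), not the org.json ones used in the question.
    import java.io.FileReader;
    import java.io.Reader;
    import java.util.Map;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.json.simple.JSONArray;
    import org.json.simple.JSONObject;
    import org.json.simple.parser.JSONParser;

    // Assumed: a Jackson ObjectMapper field; the original snippet uses it without declaring it.
    private final ObjectMapper objectMapper = new ObjectMapper();

    // Counter used to assign explicit document ids.
    private Integer id = 1;

    // Call this method to insert the documents in bulk. It internally calls
    // `createBulkRequest` and `parseObjectList`.
    // It uses JSONParser to parse your file and convert it into a JSONArray.
    public String insertBulkDocuments() throws Exception {
        try (Reader reader = new FileReader(<path-of-file>)) {
            JSONArray objList = (JSONArray) new JSONParser().parse(reader);
            BulkRequest request = createBulkRequest(objList);
            BulkResponse bulkresp = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
            return bulkresp.status().toString();
        }
    }

    // Each JSONArray element obtained in the first method is parsed individually
    // with Gson and converted into your own object.
    // That object is then converted to a Map and passed to an IndexRequest.
    private BulkRequest createBulkRequest(JSONArray objList) {
        BulkRequest request = new BulkRequest();
        objList.forEach(obj -> parseObjectList((JSONObject) obj, request, id++));
        return request;
    }

    private void parseObjectList(JSONObject obj, BulkRequest request, int id) {
        Gson gson = new GsonBuilder().create();
        NutrientDocument doc = gson.fromJson(obj.toJSONString(), NutrientDocument.class);

        Map<String, Object> documentMapper = objectMapper.convertValue(doc, Map.class);

        IndexRequest indexRequest = new IndexRequest(<your-index-name>).id(Integer.toString(id)).source(documentMapper);
        request.add(indexRequest);
    }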

You need to create a custom class whose fields match those in your JSON. For testing I created NutrientDocument, which has the same fields as your JSON, and I use it in the parseObjectList method.

public class NutrientDocument {
    // Field names are capitalised to match the JSON keys, so Gson can bind
    // them by name without extra annotations.
    private String Nutrient;
    private Float Amount;
    private String Unit;

    public String getNutrient() {
        return Nutrient;
    }

    public void setNutrient(String nutrient) {
        Nutrient = nutrient;
    }

    public Float getAmount() {
        return Amount;
    }

    public void setAmount(Float amount) {
        Amount = amount;
    }

    public String getUnit() {
        return Unit;
    }

    public void setUnit(String unit) {
        Unit = unit;
    }
}
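
The sample values in the question carry leading spaces (" 289.00", " kcal"). If you would rather normalise them before they reach the index, one option is to trim every value first. The following is only a minimal sketch using the standard library; the class name NutrientValueSketch and the helper trimValues are hypothetical and not part of the answer's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NutrientValueSketch {

    // Returns a copy of the raw row with surrounding whitespace removed from every value.
    static Map<String, String> trimValues(Map<String, String> raw) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        raw.forEach((key, value) -> cleaned.put(key, value.trim()));
        return cleaned;
    }

    public static void main(String[] args) {
        // One row of the sample data, exactly as it appears in the question.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("Nutrient", "Calories");
        raw.put("Amount", " 289.00");
        raw.put("Unit", " kcal");

        Map<String, String> cleaned = trimValues(raw);

        // After trimming, the amount parses as a number and the unit has no padding.
        float amount = Float.parseFloat(cleaned.get("Amount"));
        System.out.println(cleaned.get("Unit") + " " + amount);
    }
}
```

Whether trimming belongs in the loader or in an ingest step on the Elasticsearch side is a design choice; doing it in the loader keeps the stored documents clean without any cluster configuration.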

NOTE:

Elasticsearch generates a unique id for any document that is indexed without one.

The code above sets its own id values through the id variable instead of relying on the autogenerated ones. If you would rather use the ids that Elasticsearch generates, create the IndexRequest in the parseObjectList method as shown below and remove the id variable everywhere it is passed.

IndexRequest indexRequest = new IndexRequest(<your-index-name>).source(documentMapper);
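
To make the difference between the two id strategies concrete, it can help to look at the newline-delimited body that the Bulk REST endpoint accepts: one action line followed by one source line per document, each terminated by a newline. The following is only a sketch built with the standard library, not the client's actual serialiser; the class BulkBodySketch and its helpers are hypothetical, and the index name "test1" is taken from the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkBodySketch {

    // Minimal JSON string escaping: enough for the sample data, not a full JSON encoder.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Serialises a flat map of string fields as a single-line JSON object.
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    // Builds the bulk body: an "index" action line, then the document source line.
    // With explicitIds the action line carries "_id"; without it, the id is left
    // for Elasticsearch to generate.
    static String buildBulkBody(String index, List<Map<String, String>> docs, boolean explicitIds) {
        StringBuilder body = new StringBuilder();
        int id = 1;
        for (Map<String, String> doc : docs) {
            body.append("{\"index\":{\"_index\":").append(quote(index));
            if (explicitIds) {
                body.append(",\"_id\":").append(quote(Integer.toString(id++)));
            }
            body.append("}}\n");
            body.append(toJson(doc)).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> calories = new LinkedHashMap<>();
        calories.put("Nutrient", "Calories");
        calories.put("Amount", "289.00");
        calories.put("Unit", "kcal");

        // First with explicit ids, then leaving the id to Elasticsearch.
        System.out.print(buildBulkBody("test1", List.of(calories), true));
        System.out.print(buildBulkBody("test1", List.of(calories), false));
    }
}
```

Explicit ids make a reload overwrite the same documents, while autogenerated ids create new documents on every run, so the choice depends on whether the load is meant to be repeatable.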
