Loading data to Elasticsearch v7.3 using the Bulk API

Question

I need to load data into an Elasticsearch index. I am using the Elasticsearch Bulk API to load JSON files into the index.

private String FOLDER_PATH = "src/main/resources/allJsons";
private String index = "test1";
private static final String TYPE = "test_type";

@Autowired
private RestHighLevelClient restHighLevelClient;

public String loadBulkData() throws IOException {

    BulkRequest bulkRequest = new BulkRequest();
    AtomicInteger counter = new AtomicInteger();
    try (Stream<Path> filePathStream = Files.walk(Paths.get(FOLDER_PATH))) {
        filePathStream.forEach(filePath -> {
            if (Files.isRegularFile(filePath)) {
                counter.getAndIncrement();
                try {
                    String content = Files.readString(filePath);
                    JSONObject jsonObject1 = new JSONObject(content);
                    HashMap yourHashMap1 = new Gson().fromJson(jsonObject1.toString(), HashMap.class);
                    IndexRequest indexRequest = new IndexRequest(index, TYPE).source(yourHashMap1);
                    bulkRequest.add(indexRequest);

                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        });
    }
    try {
        restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return "Bulk data loaded to index " + index + "";
}
}

I have multiple JSON files in the following format:

[
 {
  "Nutrient" : "Calories",
  "Amount" : " 289.00",
  "Unit" : " kcal"
}, {
  "Nutrient" : "Fat",
  "Amount" : " 17.35",
  "Unit" : " g"
}
]

Running the code gives me this error:
>org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]

I think the data is a JSONArray, while the code expects a JSONObject. Could anyone guide me on how to handle this?

Answer 1

Score: 1

You can do a bulk insert by passing a map built from each of your JSON objects to the Elasticsearch Bulk API.
You can build those maps by parsing your JSON file with JSONParser.

Here is the code:

// JSONParser, JSONArray and JSONObject here appear to be the json-simple classes
// (org.json.simple), not the org.json classes used in the question.
// Assumption: objectMapper is a Jackson ObjectMapper; the original snippet used it
// without declaring it.
private final ObjectMapper objectMapper = new ObjectMapper();
private Integer id = 1;

// Call this method to insert the documents in bulk; it internally calls
// `createBulkRequest` and `parseObjectList`.
// It uses JSONParser to parse your file and convert it into a JSONArray.
public String insertBulkDocuments() throws Exception {
    Object obj = new JSONParser().parse(new FileReader(<path-of-file>));
    JSONArray objList = (JSONArray) obj;
    BulkRequest request = createBulkRequest(objList);
    BulkResponse bulkresp = restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
    return bulkresp.status().toString();
}

// Each JSONArray element obtained by the first method is parsed individually
// with Gson and converted into the object you defined.
// That object is then converted to a Map and passed to the IndexRequest.
private BulkRequest createBulkRequest(JSONArray objList) {
    BulkRequest request = new BulkRequest();
    objList.forEach(obj -> parseObjectList((JSONObject) obj, request, id++));
    return request;
}

private void parseObjectList(JSONObject obj, BulkRequest request, int id) {
    Gson gson = new GsonBuilder().create();
    NutrientDocument doc = gson.fromJson(obj.toJSONString(), NutrientDocument.class);

    Map<String, Object> documentMapper = objectMapper.convertValue(doc, Map.class);

    IndexRequest indexRequest = new IndexRequest(<your-index-name>).id(Integer.toString(id)).source(documentMapper);
    request.add(indexRequest);
}
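
The cast to JSONArray above assumes every file holds a top-level array. If some files might hold a single object instead, one option is to check the top-level value before choosing how to parse it. The following is a minimal sketch using only the standard library; the class name TopLevelJsonKind and the method detect are hypothetical and not part of any of the libraries used here. It only inspects the first non-whitespace character and does not validate the rest of the JSON.

```java
public class TopLevelJsonKind {

    public enum Kind { ARRAY, OBJECT, OTHER }

    // Classifies the top-level JSON value by its first non-whitespace character.
    // This is a cheap pre-check, not a validating parse.
    public static Kind detect(String content) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (c == '[') {
                return Kind.ARRAY;
            }
            if (c == '{') {
                return Kind.OBJECT;
            }
            return Kind.OTHER;
        }
        // Empty or whitespace-only input.
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(detect("[ { \"Nutrient\" : \"Calories\" } ]"));
        System.out.println(detect("{ \"Nutrient\" : \"Fat\" }"));
    }
}
```

With a check like this, a file reported as ARRAY can go through the JSONArray path shown above, and a file reported as OBJECT can be wrapped in a single IndexRequest.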

You need to create a custom class with the same fields as your JSON. For testing I created NutrientDocument, which has the same fields as your JSON, and I use it in the parseObjectList method.

public class NutrientDocument {
	private String Nutrient;
	private Float Amount;
	private String Unit;
	public String getNutrient() {
		return Nutrient;
	}
	public void setNutrient(String nutrient) {
		Nutrient = nutrient;
	}
	public Float getAmount() {
		return Amount;
	}
	public void setAmount(Float amount) {
		Amount = amount;
	}
	public String getUnit() {
		return Unit;
	}
	public void setUnit(String unit) {
		Unit = unit;
	}
}

Note:

Elasticsearch generates a unique id for each document if you do not supply one.

The code above sets its own id value through the id variable instead of relying on the Elasticsearch auto-generated value. If you would rather use the auto-generated id, create the IndexRequest in parseObjectList as shown below and remove the id variable wherever it is passed.

IndexRequest indexRequest = new IndexRequest(<your-index-name>).source(documentMapper);
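
To make the difference between the two id options concrete, here is a rough sketch of the newline-delimited body that the REST `_bulk` endpoint accepts: each document is preceded by an action line, and leaving `_id` out of that line lets Elasticsearch assign one. This illustrates the wire format only and is not how RestHighLevelClient is implemented; the class BulkBodySketch and its methods are hypothetical, and the sketch assumes the index name and ids contain no characters that need JSON escaping.

```java
import java.util.List;

public class BulkBodySketch {

    // Builds the "index" action line that precedes a document in a _bulk body.
    // Passing a null id omits "_id", so Elasticsearch generates one.
    public static String actionLine(String index, String id) {
        if (id == null) {
            return "{\"index\":{\"_index\":\"" + index + "\"}}";
        }
        return "{\"index\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}";
    }

    // Joins action/source pairs. Every line, including the last, ends with a newline.
    // With explicitIds set, ids count up from 1, as in the answer's code.
    public static String body(String index, List<String> sources, boolean explicitIds) {
        StringBuilder sb = new StringBuilder();
        int id = 1;
        for (String source : sources) {
            String docId = explicitIds ? Integer.toString(id++) : null;
            sb.append(actionLine(index, docId)).append('\n');
            sb.append(source).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "{\"Nutrient\":\"Calories\",\"Amount\":289.0,\"Unit\":\"kcal\"}",
                "{\"Nutrient\":\"Fat\",\"Amount\":17.35,\"Unit\":\"g\"}");
        System.out.print(body("test1", docs, true));
    }
}
```

Calling body with explicitIds set to false produces the same pairs without `_id`, which corresponds to the IndexRequest variant without `.id(...)`.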

huangapple
  • Posted on May 4, 2020 at 21:20:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61593206.html