Read data saved by spark redis using Java
Question
I am using [spark-redis][1] to save a Dataset to Redis.
I then try to read this data back with [Spring Data Redis][2].
This is the object I save to Redis:
```java
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
@Builder
@RedisHash("collaborative_filtering")
public class RatingResult implements Serializable {
    private static final long serialVersionUID = 8755574422193819444L;

    @Id
    private String id;

    @Indexed
    private int user;

    @Indexed
    private String product;

    private double productN;
    private double rating;
    private float prediction;

    public static RatingResult convert(Row row) {
        int user = row.getAs("user");
        String product = row.getAs("product");
        double productN = row.getAs("productN");
        double rating = row.getAs("rating");
        float prediction = row.getAs("prediction");
        String id = user + product;
        return RatingResult.builder().id(id).user(user).product(product).productN(productN).rating(rating)
                .prediction(prediction).build();
    }
}
```
Saving the objects with spark-redis:
```java
JavaRDD<RatingResult> result = ...
...
sparkSession.createDataFrame(result, RatingResult.class).write().format("org.apache.spark.sql.redis")
        .option("table", "collaborative_filtering").mode(SaveMode.Overwrite).save();
```
Repository:
```java
@Repository
public interface RatingResultRepository extends JpaRepository<RatingResult, String> {
}
```
I cannot read the data that spark-redis saved to Redis through Spring Data Redis, because the two libraries store it with different structures. I checked with `redis-cli -p 6379 keys \*` and `redis-cli hgetall $key`: the keys created by spark-redis and by Spring Data Redis hold different values.
So how can I read this saved data from Java, using any Java library?
[1]: https://github.com/RedisLabs/spark-redis/blob/master/doc/java.md
[2]: https://www.baeldung.com/spring-data-redis-tutorial
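Answer 1
Score: 1
The following works for me.
Writing the data with spark-redis:
I use Scala here, but it is essentially the same as what you do in Java. The only change is that I added `.option("key.column", "id")` to specify the hash id.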
```scala
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.sql.SaveMode

// `spark` is an existing SparkSession
val ratingResult = new RatingResult("1", 1, "product1", 2.0, 3.0, 4)
val result: JavaRDD[RatingResult] = spark.sparkContext.parallelize(Seq(ratingResult)).toJavaRDD()
spark
  .createDataFrame(result, classOf[RatingResult])
  .write
  .format("org.apache.spark.sql.redis")
  .option("key.column", "id") // use the id column as the hash id
  .option("table", "collaborative_filtering")
  .mode(SaveMode.Overwrite)
  .save()
```
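To illustrate why `key.column` matters, here is a minimal Java sketch of the key naming involved. It assumes spark-redis names each hash `<table>:<key column value>` and falls back to a generated identifier when no key column is set, and that Spring Data Redis looks up `<keyspace>:<id>` for `findById`. The class and method names are hypothetical; they only model the naming scheme and are not part of either library.

```java
import java.util.UUID;

class RedisKeyLayout {

    // Assumed key when key.column is set: "<table>:<value of the key column>".
    static String sparkKey(String table, String keyColumnValue) {
        return table + ":" + keyColumnValue;
    }

    // Assumed key when key.column is NOT set: "<table>:<generated identifier>".
    // A random UUID is used here only as a stand-in for whatever spark-redis generates.
    static String sparkGeneratedKey(String table) {
        return table + ":" + UUID.randomUUID().toString().replace("-", "");
    }

    // Assumed key that a repository's findById(id) reads for @RedisHash(keyspace).
    static String springLookupKey(String keyspace, String id) {
        return keyspace + ":" + id;
    }

    public static void main(String[] args) {
        String table = "collaborative_filtering";
        String lookup = springLookupKey(table, "1");

        // With key.column = id, both sides agree on the key.
        System.out.println(sparkKey(table, "1").equals(lookup));

        // With a generated identifier, findById("1") has nothing to find.
        System.out.println(sparkGeneratedKey(table).equals(lookup));
    }
}
```

Under these assumptions, the write in the question produces keys that no known id maps to, which would explain why the repository could not find the records.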
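In spring-data-redis I have the following:
```java
import java.io.Serializable;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.index.Indexed;

@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
@Builder
@RedisHash("collaborative_filtering")
public class RatingResult implements Serializable {
    private static final long serialVersionUID = 8755574422193819444L;

    @Id
    private String id;

    @Indexed
    private int user;

    @Indexed
    private String product;

    private double productN;
    private double rating;
    private float prediction;

    @Override
    public String toString() {
        return "RatingResult{" +
                "id='" + id + '\'' +
                ", user=" + user +
                ", product='" + product + '\'' +
                ", productN=" + productN +
                ", rating=" + rating +
                ", prediction=" + prediction +
                '}';
    }
}
```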
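I use `CrudRepository` instead of `JpaRepository`:
```java
import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface RatingResultRepository extends CrudRepository<RatingResult, String> {
}
```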
Querying:
```java
RatingResult found = ratingResultRepository.findById("1").get();
System.out.println("found = " + found);
```
The output:
```
found = RatingResult{id='null', user=1, product='product1', productN=2.0, rating=3.0, prediction=4.0}
```
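You may notice that the `id` field was not populated. This is because spark-redis stores the id as part of the hash key rather than as an attribute inside the hash.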
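If the `id` is needed on the Java side, one option is to rebuild it from the key. The sketch below is a minimal, library-free illustration. It assumes the key has the form `<table>:<id>` and that the hash fields arrive as strings, for example from an `HGETALL` issued through whichever Redis client you use. `RatingResultMapper`, `Rating`, `idFromKey` and `fromHash` are hypothetical names, not part of spark-redis or Spring Data Redis.

```java
import java.util.HashMap;
import java.util.Map;

class RatingResultMapper {

    // Plain holder mirroring the fields of RatingResult from the post.
    static final class Rating {
        final String id;
        final int user;
        final String product;
        final double productN;
        final double rating;
        final float prediction;

        Rating(String id, int user, String product, double productN, double rating, float prediction) {
            this.id = id;
            this.user = user;
            this.product = product;
            this.productN = productN;
            this.rating = rating;
            this.prediction = prediction;
        }
    }

    // Strips the assumed "<table>:" prefix from a key such as "collaborative_filtering:1".
    static String idFromKey(String table, String key) {
        String prefix = table + ":";
        if (!key.startsWith(prefix)) {
            throw new IllegalArgumentException("Key " + key + " does not belong to table " + table);
        }
        return key.substring(prefix.length());
    }

    // Builds a Rating from the key plus the hash fields, taking the id from the key.
    static Rating fromHash(String table, String key, Map<String, String> fields) {
        return new Rating(
                idFromKey(table, key),
                Integer.parseInt(fields.get("user")),
                fields.get("product"),
                Double.parseDouble(fields.get("productN")),
                Double.parseDouble(fields.get("rating")),
                Float.parseFloat(fields.get("prediction")));
    }

    public static void main(String[] args) {
        // Stand-in for the result of HGETALL collaborative_filtering:1
        Map<String, String> fields = new HashMap<>();
        fields.put("user", "1");
        fields.put("product", "product1");
        fields.put("productN", "2.0");
        fields.put("rating", "3.0");
        fields.put("prediction", "4.0");

        Rating r = fromHash("collaborative_filtering", "collaborative_filtering:1", fields);
        System.out.println("id = " + r.id + ", user = " + r.user + ", product = " + r.product);
    }
}
```

With the repository approach, the same effect can be had by setting the id on the returned object yourself, since you already know the id you queried by.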