英文:
How to convert date column from INT to DATE type in large CSV files?
问题
我有一个超过30 GB的CSV文件,我想使用loadTextEx
函数将其加载并存储到DolphinDB数据库中。文件包含一个名为“date”的列,该列以INT类型存储,其中的值类似于“20230731”。我想将该列转换为DATE类型(即,2023.07.31)。我该如何做呢?
英文:
I have a CSV file of over 30 GB, and I want to load and store it into a DolphinDB database using the loadTextEx
function. The file contains a column named “date” which is stored as INT type, with values like “20230731”. I would like to convert the column into the DATE type (i.e., 2023.07.31). How can I do this?
答案1
得分: 1
你可以定义一个UDF,在该函数中应用temporalParse函数以转换数据类型,然后将这个UDF传递给loadTextEx
中的transform参数。以下是一个例子:
// 建立数据库连接
db = database("dfs://snapshot_L2_TSDB")
// 指定转换后的日期和时间格式,以匹配按日期和哈希分区的数据库
def transType(mutable memTable)
{
return memTable.replaceColumn!(`DataTimeStamp,temporalParse("20221001"+string(memTable.DataTimeStamp),"yyyyMMddHH:mm:ss.SSS"))
}
// 导入数据
tmpTB=loadTextEx(dbHandle=db, tableName=`snapshot_L2_TSDB_t, partitionColumns=`DataTimeStamp`SecurityID, filename=dataFilePath, schema=schemaTB ,sortColumns=`SecurityID`DataTimeStamp, transform=transType);
英文:
You can define a UDF where the function temporalParse can be applied to convert data types, and then pass this UDF to the transform parameter in loadTextEx
. Here is an example:
// establish a database connection
db = database("dfs://snapshot_L2_TSDB")  
// specify the format for converted date and time to match the database which is partitioned by date and hash
def transType(mutable memTable)
{
   return memTable.replaceColumn!(`DataTimeStamp,temporalParse( "20221001"+string(memTable.DataTimeStamp),"yyyyMMddHH:mm:ss.SSS"))
}
// import the data
tmpTB=loadTextEx(dbHandle=db, tableName=`snapshot_L2_TSDB_t, partitionColumns=`DataTimeStamp`SecurityID, filename=dataFilePath,schema=schemaTB ,sortColumns=`SecurityID`DataTimeStamp,transform=transType);
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论