英文:
Apache Ignite Persistence Issues/Warnings
问题
我们正在对Apache Ignite进行写入密集型的负载测试。我们也在进行读取操作。在测试几小时后,我们经常收到来自sys-stripe线程的以下警告消息。
数据库配置:
RAM - 8GB,CPU核心 - 64,
持久化 - 开启,
堆内存 - 2GB,持久化内存 - 2GB,XX:MaxDirectMemorySize - 1GB,
WAL归档 - 关闭,
checkpointBufferSize - 1GB,
walSegmentSize - 256MB
我们在应用服务器(客户端)中初始化了70个查询线程,客户端具有64核CPU和2GB最大堆配置。
另一种类型的警告:
[WARNING][grid-timeout-worker-#135][G] >>> 可能出现分片池中的饥饿情况。
线程名称:sys-stripe-0-#1
队列:[消息闭包 [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=a1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=b1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
死锁:false
完成:1396153
线程 [name="sys-stripe-0-#1",id=28,state=TIMED_WAITING,blockCnt=61,waitCnt=1874266]
锁定 [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@1b08480,ownerName=null,ownerId=-1]
在sun.misc.Unsafe.park(Native Method)处休眠
在java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)处休眠
在java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)处休眠
在java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)处休眠
在java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)处休眠
在o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1638)处休眠
在o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:368)处休眠
在o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1304)处休眠
...
池度量信息:
分片线程池 [活动=7,空闲=57,队列大小=9]
由于上述警告,我们认为SQL查询会出现以下延迟:
[WARNING][long-qry-#170][LongRunningQueryManager] 查询执行时间过长 [持续时间=3424ms,类型=MAP,分布式联接=false,强制联接顺序=false,懒加载=false,模式=PUBLIC,SQL='SELECT
"__Z0"."ID" "__C0_0",
"__Z0"."URL" "__C0_1",
"__Z0"."SCORE" "__C0_2",
"__Z0"."APPNAME_ID" "__C0_3"
FROM "PUBLIC"."URLS" "__Z0"
WHERE "__Z0"."APPNAME_ID" = ?1
ORDER BY 3 FETCH FIRST ?2 ROWS ONLY',计划=SELECT
__Z0.ID AS __C0_0,
__Z0.URL AS __C0_1,
__Z0.SCORE AS __C0_2,
__Z0.APPNAME_ID AS __C0_3
FROM PUBLIC.URLS __Z0
/* PUBLIC.IDX_2_URLS */
/* scanCount: 101020 */
WHERE __Z0.APPNAME_ID = ?1
ORDER BY 3
FETCH FIRST ?2 ROWS ONLY
/* index sorted */, node=TcpDiscoveryNode [id=2ab5710f-6568-4940-b3cc-ce756a634f4e, consistentId=2ab5710f-6568-4940-b3cc-ce756a634f4e, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.20.46.195], sockAddrs=HashSet [/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /172.20.46.195:0], discPort=0, order=3, intOrder=3, lastExchangeTime=1588604898769, loc=false, ver=8.7.10#20191227-sha1:c481441d, isClient=true], reqId=292181, segment=0]
在运行12小时后,检查点需要大约1分钟才能完成。
还有一些其他警告经常打印在日志中,请查看以下内容:
2个检查点页面尚未写入,因为未成功获取页面写入锁,将进行重试
页面修改受到限制 [percentOfPartTime=0.62,markDirty=2440页/
<details>
<summary>英文:</summary>
We are running a write-intensive load test on Apache Ignite. We are also doing reads. We are getting the following warning messages from sys-stripe threads quite often after a few hours of testing.
DB Config ->
RAM - 8GB, CPU cores - 64,
Persistence - ON,
Heap - 2GB, Durable memory Off-heap - 2GB, XX:MaxDirectMemorySize - 1GB,
WAL Archiving - Off,
checkpointBufferSize - 1GB,
walSegmentSize - 256mb
We initialized 70 threads(which query) in App Server (Client) and the client has 64 cores CPU with 2GB maxheap configuration.
[WARNING][grid-timeout-worker-#135][G] >>> Possible starvation in striped pool.
Thread name: sys-stripe-1-#2
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=49990d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=59990d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateFilterRequest [filter=[o.a.i.i.processors.cache.CacheEntrySerializablePredicate@653d7f85], parent=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=129, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=29605669, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=needRes|keepBinary]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxFinishRequest [miniId=1, mvccSnapshot=null, super=GridDistributedTxFinishRequest [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], futId=97a90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, threadId=1251, commitVer=null, invalidate=false, commit=false, baseVer=null, txSize=0, sys=true, plc=2, subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, flags=32, syncMode=FULL_SYNC, txState=null, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652521556, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59316508, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest [futId=cca90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null, flags=, super=GridDistributedTxPrepareRequest [threadId=2528, concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion [topVer=200084669, order=1588652521620, nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry [txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last|sys, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652521620, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59316562, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=36c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=46c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest [futId=39c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null, flags=, super=GridDistributedTxPrepareRequest [threadId=2141, concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion [topVer=200084669, order=1588652522004, nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry [txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last|sys, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652522004, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59316868, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=8cc90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=9cc90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=4cd90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=5cd90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=28e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=38e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest [futId=d9e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null, flags=, super=GridDistributedTxPrepareRequest [threadId=3430, concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion [topVer=200084669, order=1588652522452, nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry [txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last|sys, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652522452, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59317224, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearSingleGetRequest [futId=1588665032453, key=KeyCacheObjectImpl [part=385, val=null, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
Deadlock: false
Completed: 1094133
Thread [name="sys-stripe-1-#2", id=29, state=RUNNABLE, blockCnt=32, waitCnt=1280827]
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.dataSize(RecordDataV1Serializer.java:2083)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.plainSize(RecordDataV1Serializer.java:386)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV2Serializer.plainSize(RecordDataV2Serializer.java:101)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.size(RecordDataV1Serializer.java:181)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordV2Serializer$2.sizeWithHeaders(RecordV2Serializer.java:96)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordV2Serializer.size(RecordV2Serializer.java:226)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:837)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:796)
at o.a.i.i.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4307)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.remove(GridCacheMapEntry.java:6505)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6177)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5863)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3820)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3714)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1969)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1940)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1847)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1654)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1637)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2436)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:433)
at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2576)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2036)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3241)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
at o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
at o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Another type of warning
[WARNING][grid-timeout-worker-#135][G] >>> Possible starvation in striped pool.
Thread name: sys-stripe-0-#1
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=a1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=b1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
Deadlock: false
Completed: 1396153
Thread [name="sys-stripe-0-#1", id=28, state=TIMED_WAITING, blockCnt=61, waitCnt=1874266]
Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@1b08480, ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1638)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:368)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1304)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:709)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1102)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:410)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:576)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:373)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:182)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:160)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:122)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:204)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:202)
at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
at o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
at o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Pool metric
Striped thread pool [active=7, idle=57, qSize=9]
I am assuming because of the above warnings we are getting the below delay on SQL queries.
[WARNING][long-qry-#170][LongRunningQueryManager] Query execution is too long [duration=3424ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=PUBLIC, sql='SELECT
"__Z0"."ID" "__C0_0",
"__Z0"."URL" "__C0_1",
"__Z0"."SCORE" "__C0_2",
"__Z0"."APPNAME_ID" "__C0_3"
FROM "PUBLIC"."URLS" "__Z0"
WHERE "__Z0"."APPNAME_ID" = ?1
ORDER BY 3 FETCH FIRST ?2 ROWS ONLY', plan=SELECT
__Z0.ID AS __C0_0,
__Z0.URL AS __C0_1,
__Z0.SCORE AS __C0_2,
__Z0.APPNAME_ID AS __C0_3
FROM PUBLIC.URLS __Z0
/* PUBLIC.IDX_2_URLS /
/ scanCount: 101020 /
WHERE __Z0.APPNAME_ID = ?1
ORDER BY 3
FETCH FIRST ?2 ROWS ONLY
/ index sorted */, node=TcpDiscoveryNode [id=2ab5710f-6568-4940-b3cc-ce756a634f4e, consistentId=2ab5710f-6568-4940-b3cc-ce756a634f4e, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.20.46.195], sockAddrs=HashSet [/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /172.20.46.195:0], discPort=0, order=3, intOrder=3, lastExchangeTime=1588604898769, loc=false, ver=8.7.10#20191227-sha1:c481441d, isClient=true], reqId=292181, segment=0]
We don't think the there is any issue with query because we ran a read/write speed test on big data and found the query time to be under 10ms. But here it's getting delayed with even less data in database.
After 12 hours of uptime, checkpointing is taking approximately 1 minute to finish.
There are a few more warnings that are being printed in logs often. Please find them belo
2 checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried
Throttling is applied to page modifications [percentOfPartTime=0.62, markDirty=2440 pages/sec, checkpointWrite=1971 pages/sec, estIdealMarkDirty=0 pages/sec, curDirty=0.00, maxDirty=0.02, avgParkTime=253807 ns, pages: (total=132474, evicted=0, written=831, synced=0, cpBufUsed=543, cpBufTotal=259107)]
[sys-stripe-38-#39][GridContinuousProcessor] Failed to wait for ack message. [node=2ab5710f-6568-4940-b3cc-ce756a634f4e, routine=b24f7959-546b-4242-81d4-c51de3ce0fc2]
Page replacements started, pages will be rotated with disk, this will affect storage performance (consider increasing DataRegionConfiguration#setMaxSize for data region)
</details>
# 答案1
**得分**: 1
[answer][1]是在Ignite用户列表上提供的。
[1]: http://apache-ignite-users.70518.x6.nabble.com/Apache-Ignite-Persistence-Issues-Warnings-td32313.html
<details>
<summary>英文:</summary>
The [answer][1] is provided on the Ignite user list.
[1]: http://apache-ignite-users.70518.x6.nabble.com/Apache-Ignite-Persistence-Issues-Warnings-td32313.html
</details>
# 答案2
**得分**: 1
除了用户列表上的答案之外,我认为在进行大量写入操作时,您的检查点页面缓冲区可能已经用完了。
我建议在数据区域中增加`checkpointPageBufferSize`属性。如果没有指定,它会从您的数据区域大小的20%开始。
<details>
<summary>英文:</summary>
In addition to answers on user list, I think you are running out of checkpoint page buffer when doing intensive writes.
I recommend increasing `checkpointPageBufferSize` attribute if your data regions. It starts at 20% of your data region size if not specified, I think.
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论