Apache Ignite Persistence Issues/Warnings

Question

<details>
<summary>English:</summary>

We are running a write-intensive load test against Apache Ignite, with reads running at the same time. After a few hours of testing, we frequently get the following warnings from sys-stripe threads.

DB config:
RAM - 8 GB, CPU cores - 64,
Persistence - ON,
Heap - 2 GB, Durable memory (off-heap) - 2 GB, -XX:MaxDirectMemorySize - 1 GB,
WAL archiving - Off,
checkpointBufferSize - 1 GB,
walSegmentSize - 256 MB

We start 70 query threads on the app server (the client). The client machine has a 64-core CPU and is configured with a 2 GB max heap.


[WARNING][grid-timeout-worker-#135][G] >>> Possible starvation in striped pool.
Thread name: sys-stripe-1-#2
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=49990d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=59990d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateFilterRequest [filter=[o.a.i.i.processors.cache.CacheEntrySerializablePredicate@653d7f85], parent=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=129, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=29605669, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=needRes|keepBinary]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxFinishRequest [miniId=1, mvccSnapshot=null, super=GridDistributedTxFinishRequest [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], futId=97a90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, threadId=1251, commitVer=null, invalidate=false, commit=false, baseVer=null, txSize=0, sys=true, plc=2, subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, flags=32, syncMode=FULL_SYNC, txState=null, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652521556, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59316508, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, 
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest [futId=cca90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null, flags=, super=GridDistributedTxPrepareRequest [threadId=2528, concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion [topVer=200084669, order=1588652521620, nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry [txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last|sys, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652521620, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59316562, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=36c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=46c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure 
[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest [futId=39c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null, flags=, super=GridDistributedTxPrepareRequest [threadId=2141, concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion [topVer=200084669, order=1588652522004, nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry [txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last|sys, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652522004, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59316868, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=8cc90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=9cc90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, 
mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=4cd90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=5cd90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=28e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=38e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest [futId=d9e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null, flags=, super=GridDistributedTxPrepareRequest [threadId=3430, concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion [topVer=200084669, order=1588652522452, nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry [txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, 
expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last|sys, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669, order=1588652522452, nodeOrder=3], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage [msgId=59317224, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearSingleGetRequest [futId=1588665032453, key=KeyCacheObjectImpl [part=385, val=null, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
Deadlock: false
Completed: 1094133
Thread [name="sys-stripe-1-#2", id=29, state=RUNNABLE, blockCnt=32, waitCnt=1280827]
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.dataSize(RecordDataV1Serializer.java:2083)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.plainSize(RecordDataV1Serializer.java:386)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV2Serializer.plainSize(RecordDataV2Serializer.java:101)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.size(RecordDataV1Serializer.java:181)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordV2Serializer$2.sizeWithHeaders(RecordV2Serializer.java:96)
at o.a.i.i.processors.cache.persistence.wal.serializer.RecordV2Serializer.size(RecordV2Serializer.java:226)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:837)
at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:796)
at o.a.i.i.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4307)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.remove(GridCacheMapEntry.java:6505)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6177)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5863)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3820)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3714)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1969)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1940)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1847)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1654)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1637)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2436)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:433)
at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2576)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2036)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3241)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
at o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
at o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)


Another type of warning:

[WARNING][grid-timeout-worker-#135][G] >>> Possible starvation in striped pool.
Thread name: sys-stripe-0-#1
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest [futId=a1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=b1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null, keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
Deadlock: false
Completed: 1396153
Thread [name="sys-stripe-0-#1", id=28, state=TIMED_WAITING, blockCnt=61, waitCnt=1874266]
Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@1b08480, ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1638)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:368)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1304)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:709)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1102)
at o.a.i.i.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:410)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:576)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:373)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:182)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:160)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:122)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:204)
at o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:202)
at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
at o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
at o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)


Pool metric

Striped thread pool [active=7, idle=57, qSize=9]



I assume the warnings above are the reason we see the following delay on SQL queries:

[WARNING][long-qry-#170][LongRunningQueryManager] Query execution is too long [duration=3424ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=PUBLIC, sql='SELECT
"__Z0"."ID" "__C0_0",
"__Z0"."URL" "__C0_1",
"__Z0"."SCORE" "__C0_2",
"__Z0"."APPNAME_ID" "__C0_3"
FROM "PUBLIC"."URLS" "__Z0"
WHERE "__Z0"."APPNAME_ID" = ?1
ORDER BY 3 FETCH FIRST ?2 ROWS ONLY', plan=SELECT
__Z0.ID AS __C0_0,
__Z0.URL AS __C0_1,
__Z0.SCORE AS __C0_2,
__Z0.APPNAME_ID AS __C0_3
FROM PUBLIC.URLS __Z0
/* PUBLIC.IDX_2_URLS */
/* scanCount: 101020 */
WHERE __Z0.APPNAME_ID = ?1
ORDER BY 3
FETCH FIRST ?2 ROWS ONLY
/* index sorted */, node=TcpDiscoveryNode [id=2ab5710f-6568-4940-b3cc-ce756a634f4e, consistentId=2ab5710f-6568-4940-b3cc-ce756a634f4e, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.20.46.195], sockAddrs=HashSet [/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /172.20.46.195:0], discPort=0, order=3, intOrder=3, lastExchangeTime=1588604898769, loc=false, ver=8.7.10#20191227-sha1:c481441d, isClient=true], reqId=292181, segment=0]

We don't think there is any issue with the query itself: we ran a read/write speed test on a large data set and found the query time to be under 10 ms. Here, however, it is delayed even though the database holds less data.

After 12 hours of uptime, checkpointing is taking approximately 1 minute to finish.

A few more warnings are printed in the logs often. Please find them below:

2 checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried

Throttling is applied to page modifications [percentOfPartTime=0.62, markDirty=2440 pages/sec, checkpointWrite=1971 pages/sec, estIdealMarkDirty=0 pages/sec, curDirty=0.00, maxDirty=0.02, avgParkTime=253807 ns, pages: (total=132474, evicted=0, written=831, synced=0, cpBufUsed=543, cpBufTotal=259107)]

[sys-stripe-38-#39][GridContinuousProcessor] Failed to wait for ack message. [node=2ab5710f-6568-4940-b3cc-ce756a634f4e, routine=b24f7959-546b-4242-81d4-c51de3ce0fc2]

Page replacements started, pages will be rotated with disk, this will affect storage performance (consider increasing DataRegionConfiguration#setMaxSize for data region)
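
To make the throttling warning above easier to read, here is a minimal sketch that pulls the numeric fields out of that log line and compares them. It uses only the JDK. The class and method names (`ThrottleLine`, `parse`, `checkpointBufferUsage`, `dirtyToWriteRatio`) are hypothetical helpers, not part of Ignite, and the interpretation of each field is an assumption based on its name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the "Throttling is applied" warning; not an Ignite API.
final class ThrottleLine {
    // Matches name=number pairs such as "markDirty=2440" or "curDirty=0.00".
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=([0-9]+(?:\\.[0-9]+)?)");

    // Collects every numeric key=value pair found in the log line, in order of appearance.
    static Map<String, Double> parse(String line) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(line);
        while (m.find())
            metrics.put(m.group(1), Double.parseDouble(m.group(2)));
        return metrics;
    }

    // Fraction of the checkpoint buffer in use (assumes cpBufUsed and cpBufTotal are page counts).
    static double checkpointBufferUsage(Map<String, Double> metrics) {
        return metrics.get("cpBufUsed") / metrics.get("cpBufTotal");
    }

    // Ratio of the page-dirtying rate to the checkpoint write rate (both in pages/sec).
    static double dirtyToWriteRatio(Map<String, Double> metrics) {
        return metrics.get("markDirty") / metrics.get("checkpointWrite");
    }

    public static void main(String[] args) {
        String line = "Throttling is applied to page modifications [percentOfPartTime=0.62, "
            + "markDirty=2440 pages/sec, checkpointWrite=1971 pages/sec, estIdealMarkDirty=0 pages/sec, "
            + "curDirty=0.00, maxDirty=0.02, avgParkTime=253807 ns, pages: (total=132474, evicted=0, "
            + "written=831, synced=0, cpBufUsed=543, cpBufTotal=259107)]";

        Map<String, Double> metrics = parse(line);
        System.out.printf("checkpoint buffer usage: %.4f%n", checkpointBufferUsage(metrics));
        System.out.printf("markDirty / checkpointWrite: %.2f%n", dirtyToWriteRatio(metrics));
    }
}
```

In this sample line, pages are being marked dirty faster than the checkpointer writes them (2440 vs. 1971 pages/sec). Tracking both ratios over the length of the test may help show whether the disk write rate or the buffer size is the tighter limit.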


</details>


# Answer 1
**Score**: 1


<details>
<summary>English:</summary>

The [answer][1] is provided on the Ignite user list.


  [1]: http://apache-ignite-users.70518.x6.nabble.com/Apache-Ignite-Persistence-Issues-Warnings-td32313.html

</details>



# Answer 2
**Score**: 1


<details>
<summary>English:</summary>

In addition to the answers on the user list, I think you are running out of checkpoint page buffer during intensive writes.

I recommend increasing the `checkpointPageBufferSize` attribute of your data regions. If it is not specified, I think it starts at 20% of the data region size.
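
As a rough sketch of where that attribute lives, the snippet below sets it through the Java configuration API (`DataRegionConfiguration#setCheckpointPageBufferSize`). It needs `ignite-core` on the classpath. The sizes are placeholders chosen to match the question's setup, not recommendations, and the class name `CheckpointBufferConfig` is hypothetical.

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical example class; it only builds a configuration and does not start a node.
final class CheckpointBufferConfig {
    static IgniteConfiguration build(long regionMaxBytes, long checkpointBufBytes) {
        // Persistent default data region with an explicit checkpoint page buffer size.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setPersistenceEnabled(true)
            .setMaxSize(regionMaxBytes)
            .setCheckpointPageBufferSize(checkpointBufBytes);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;

        // Placeholder values: a 2 GB region as in the question, 1 GB checkpoint buffer.
        IgniteConfiguration cfg = build(2 * gb, 1 * gb);

        long configured = cfg.getDataStorageConfiguration()
            .getDefaultDataRegionConfiguration()
            .getCheckpointPageBufferSize();
        System.out.println("checkpointPageBufferSize (bytes): " + configured);
    }
}
```

The resulting `IgniteConfiguration` would then be passed to `Ignition.start(cfg)`. Which buffer size actually helps depends on the write rate and disk throughput, so it is worth re-checking the throttling warnings after each change.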

</details>



huangapple
  • Posted on May 5, 2020 at 14:00:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61606696.html