mirror of https://github.com/xcat2/confluent.git synced 2026-06-24 16:30:48 +00:00

92 Commits

Author SHA1 Message Date
Jarrod Johnson 4c8ba92856 Change configuration sync to use msgpack
This removes use of pickle for config sync over network.
2020-01-27 15:53:29 -05:00
Jarrod Johnson 97ca6dc48e Provide more detail on leader when leader is lost 2019-10-21 13:55:43 -04:00
Jarrod Johnson a84b88e269 Fix mistake in the expression change 2019-10-14 15:02:45 -04:00
Jarrod Johnson fc626d36ba Fix greenlet 'isAlive'
There is no 'isAlive' in a greenlet.
2019-10-14 13:59:24 -04:00
Jarrod Johnson 8cab591a8b Add collective member deletion
This allows deletion of a dead member, all the way down to
non-collective mode.
2019-10-10 11:30:03 -04:00
Jarrod Johnson c1953bdad3 Another set of Python 3 compatibility fixes
Numerous issues arose, particularly
when participating in a mixed
collective.
2019-10-08 10:45:43 -04:00
Jarrod Johnson 8fc3b7c9c0 Implement cross-python collective compat
This enables cross-version compatibility
for a collective.
2019-10-07 15:41:38 -04:00
Jarrod Johnson 3105b9b1f9 Significantly rework the collective startup behavior
One, make the tracking bools enforce a lock to reduce confusion

Treat an initializing peer as failed, to avoid getting too fixated
on an uncertain target.

Make sure that no more than one follower is tried at a time by
killing before starting a new one, and syncing up the configmanager
state

Decline to act on an assimilation request if we are trying to connect
and also if the current leader asks us to connect and we already are.

Avoid calling get_leader while connecting, as that can cause a member
to decide to become a leader while trying to connect, by swapping
the reactions to the connect request.

Avoid trying to assimilate existing followers.

Fix some logging.
2018-10-12 11:45:23 -04:00
Jarrod Johnson f525c25ba6 Provide more verbose collective logging
This helps understand the flow in practice of collective behavior.
2018-10-11 15:15:11 -04:00
Jarrod Johnson be930fc076 Add missing subsystem marker from a collective log 2018-10-10 16:30:28 -04:00
Jarrod Johnson 32ddb33de3 Fix error when trying to do fullsync without globals yet
If globals is missing, then do not break the sync trying to handle it
2018-10-10 13:11:15 -04:00
Jarrod Johnson b77ed8dbff Fix config sync on dead writer
The sync thread can die without clearing syncrunning.  Make sure that
the thread is alive *and* that the thread has not indicated
intent to give up.
2018-10-10 13:07:27 -04:00
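
The check described in the commit above can be sketched as follows. This is a minimal illustration, not confluent's code: it uses the standard library `threading` module as a stand-in for whatever thread type confluent uses, and `SyncState`, `giving_up`, and `sync_in_progress` are hypothetical names.

```python
import threading


class SyncState:
    """Hypothetical holder for the writer thread and its 'giving up' flag."""

    def __init__(self):
        self.thread = None      # the writer thread, once one has been started
        self.giving_up = False  # set by the writer just before it exits on purpose


def sync_in_progress(state):
    """Return True only if a writer is alive and has not announced it is quitting.

    Checking a bare 'running' flag is not enough: a writer that died from an
    exception never gets the chance to clear it.
    """
    thread = state.thread
    return thread is not None and thread.is_alive() and not state.giving_up
```

A caller would start a fresh writer whenever `sync_in_progress(state)` is false, instead of trusting a flag that a dead thread may have left set.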
Jarrod Johnson d86e1fc4eb Give the cfg init a lock
Move collective manager and configmanager to share a configinitlock,
so that bad timings during internal initialization and collective
activity cannot interfere and produce a corrupt database.

This became an issue with the fix for 'everything' disappearing.
2018-10-02 10:17:44 -04:00
Jarrod Johnson 78a1741e0e Fix usage of check_quorum()
It is not a boolean, it is exception driven.
2018-10-01 16:02:16 -04:00
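
The distinction in the commit above matters because a function that reports failure by raising returns `None` on success, so testing its result for truth always fails. A minimal sketch, assuming a hypothetical signature and a hypothetical `QuorumError` (neither is taken from confluent):

```python
class QuorumError(Exception):
    """Hypothetical exception name; the real one in confluent may differ."""


def check_quorum(active_members, total_members):
    """Raise QuorumError unless a strict majority of members is active.

    Returns None on success, so the result must not be used as a boolean.
    """
    if active_members * 2 <= total_members:
        raise QuorumError('quorum not present')


def have_quorum(active_members, total_members):
    """Adapt the exception-driven check for callers that need a boolean."""
    try:
        check_quorum(active_members, total_members)
    except QuorumError:
        return False
    return True
```

Writing `if check_quorum(...):` would take the false branch even with quorum present, which is the kind of misuse the commit title describes.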
Jarrod Johnson 4329c1d388 Have collective start bail out if leader
Leader should not relinquish if quorum, so don't bother in such
a case.
2018-10-01 15:50:49 -04:00
Jarrod Johnson b0b5493ff7 Cancel retry if we become leader
If an instance is first to start, its retry should be canceled
when other members prod it to become leader.
2018-10-01 15:29:18 -04:00
Jarrod Johnson 61e7c90ad1 Do not restart on intentional kill
Additionally, add some output to help filter the events log
2018-10-01 10:32:55 -04:00
Jarrod Johnson e57cdf9a7b Add more collective event log handling
More detail to analyze how the collective membership is handled.
2018-09-27 15:15:05 -04:00
Jarrod Johnson 10ce7a9de9 Add more logging to collective process 2018-09-27 10:51:06 -04:00
Jarrod Johnson 0724ad812b Add logging to the assimilation phase of collective
When attempting assimilation, provide logging about the attempt.
2018-09-27 10:51:01 -04:00
Jarrod Johnson a3b0b0240d Abort assimilation attempt on non-member cleanly
If a confluent instance has forgotten the collective, more cleanly
handle the situation, and abort the assimilation rather than assuming
the peer should be leader, unless txcount specifically is called out
as the reason.
2018-09-27 10:50:54 -04:00
Jarrod Johnson 784e4bed2f Force cleanup if follow thread dies of exception
If something killed a follow thread, it was not always able to fire the
recovery off.  Wrap the risky code in a try statement.
2018-08-20 15:02:34 -04:00
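
One way to guarantee that recovery fires, as the commit above describes, is a `try`/`finally` around the risky call. The sketch below is an assumption about the shape of such a wrapper; `run_follower`, `follow`, and `recover` are hypothetical names, not confluent functions.

```python
import logging

log = logging.getLogger(__name__)


def run_follower(follow, recover):
    """Run follow(), then always call recover(), even if follow() raised."""
    try:
        follow()
    except Exception:
        # Record the failure rather than letting it kill the thread silently.
        log.exception('follow loop died unexpectedly')
    finally:
        # Runs on normal return and on any exception, so recovery always fires.
        recover()
```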
Jarrod Johnson f0edbbad39 Have collective show present some info when not in quorum 2018-07-20 14:11:38 -04:00
Jarrod Johnson 5cf1671350 Make the takeover process more deterministic
Try to avoid submitting to be a follower while we are currently
becoming a leader
2018-07-20 13:50:42 -04:00
Jarrod Johnson e5c4219ee9 Reorder certificate check
First order of business is to verify the certificate before even thinking
about whether the request is possible
2018-07-20 13:34:14 -04:00
Jarrod Johnson a1ba5f59a8 Fix collective show on non-collective 2018-07-19 17:21:01 -04:00
Jarrod Johnson 9bcca6bfad Provide collective show on all members 2018-07-19 17:08:20 -04:00
Jarrod Johnson 54d93571d1 Have leader provide more data in collective show 2018-07-19 16:26:05 -04:00
Jarrod Johnson f2f902de7b Have collective show report when collective inactive
Collective show was misleading if not in a collective.
2018-07-19 15:59:15 -04:00
Jarrod Johnson a09792f969 Schedule periodic attempts to restart collective
If collective is lost due to connectivity, this will cause
occasional attempts to bring it back.
2018-07-19 15:49:05 -04:00
Jarrod Johnson 7d16c943a8 Handle updating address of collective member on connect
If a collective member changes its IP address, update at the next
possible opportunity.
2018-07-19 15:24:08 -04:00
Jarrod Johnson 497ca40492 Do not abort connecting process on bad cert
The target may be non-viable, but don't let that ruin the party
for everyone.  Let it keep going as if the system were down.
2018-07-18 14:58:16 -04:00
Jarrod Johnson fc5472065a Catch missing '@' in token as invalid token 2018-07-17 11:46:40 -04:00
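
A token check of the kind named in the commit above might look like the sketch below. The source only says that a token is expected to contain `@`; the `name@secret` layout, the `parse_token` name, and the use of `ValueError` are assumptions for illustration.

```python
def parse_token(token):
    """Split a 'name@secret' token, raising ValueError if it is malformed."""
    name, sep, secret = token.partition('@')
    # partition() returns an empty separator when '@' is absent, so a
    # malformed token is reported cleanly instead of failing on unpacking.
    if not sep or not name or not secret:
        raise ValueError('invalid token')
    return name, secret
```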
Jarrod Johnson 1dad69097b Be consistent with sync during load of leader cfg
Pass through sync as appropriate.

Also includes changes meant for the previous commit
2018-07-13 21:52:17 -04:00
Jarrod Johnson fd7c428d1f Cleanup leftover sockets and more reliably be following or leading
Before there was a chance to be in a half state, leading to an inability
to reach consensus on leader.
2018-07-13 21:20:42 -04:00
Jarrod Johnson c74fdf5924 More collective join errors 2018-07-13 11:07:39 -04:00
Jarrod Johnson 58bf226d23 Relay error from server about token issue 2018-07-13 10:50:17 -04:00
Jarrod Johnson c80ebb0e8d Explicitly close connection before replacement
If an existing follower is stalled out, close the socket explicitly
to avoid leaving it open in lsof.
2018-07-13 09:14:36 -04:00
Jarrod Johnson efaf1dae70 Make cfgleader modifications more robust
If cfgleader is about to forget a socket, explicitly try to close
it first.
2018-07-13 09:05:28 -04:00
Jarrod Johnson 7cdc3c1400 Implement clear config rollback
Should something go awry during config
load, rollback the clear and load.
2018-07-12 08:48:21 -04:00
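
The rollback described in the commit above can be sketched with a snapshot taken before the clear. This is a simplified model that treats the configuration as a plain dictionary; `load_with_rollback` and `loader` are hypothetical names, and confluent's real storage is more involved.

```python
import copy


def load_with_rollback(store, new_data, loader):
    """Clear store and load new_data into it; restore the old contents on failure."""
    snapshot = copy.deepcopy(store)  # keep the pre-clear state for rollback
    store.clear()
    try:
        loader(store, new_data)
    except Exception:
        # Undo both the clear and any partial load, then report the error.
        store.clear()
        store.update(snapshot)
        raise
```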
Jarrod Johnson beedfb0600 If a drone doesn't exist, treat it as if it's an invalid certificate 2018-07-11 16:29:45 -04:00
Jarrod Johnson ce59a36351 Avoid excessive syncs on connect
This removes some redundancy and avoids writing and loading to disk
during the initialization process.
2018-07-11 16:07:56 -04:00
Jarrod Johnson 8e9bcbb44f Clear txcount on enroll
The transaction count on 'join' was being honored as high, when
it never should be.
2018-07-11 09:40:22 -04:00
Jarrod Johnson 704aaeecf9 Tolerate newline in myname
vim is quite insistent on adding a newline, tolerate that.
2018-07-11 09:36:51 -04:00
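
Tolerating the trailing newline mentioned in the commit above amounts to stripping whitespace from the stored name before using it. A minimal sketch, with `normalize_myname` as a hypothetical helper name:

```python
def normalize_myname(raw):
    """Return the stored name without surrounding whitespace or newlines."""
    return raw.strip()
```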
Jarrod Johnson 11968faffc Numerous fixes to collective
If client has higher transaction count, do not close the connection
before extracting peer address.

If our connect session is rudely terminated, abort rather than trying
to continue.

On assimilate failure, ignore a failed assimilate with no data.

Fix problem where a follower getting double deleted was causing an error.
2018-07-10 14:55:57 -04:00
Jarrod Johnson 298e11f60f Allow invite from non-leader role
A non-leader transaction is modified such that the enroll node
can be connected to the leader and have validation.
2018-07-09 16:40:43 -04:00
Jarrod Johnson 67d6e9a6c7 Add collective show
Provide a harmless way to look at collective state
2018-07-09 15:07:24 -04:00
Jarrod Johnson 2342fe717e Remove superfluous call to sync to file
load_from_json already makes the call, remove the extra call that is
redundant.
2018-07-09 12:59:37 -04:00
Jarrod Johnson 1eaf5357ca Resolve race conditions on simultaneous collective outage
Implement random backoff strategy for serializing connect out and
connect in.
2018-07-03 14:09:09 -04:00
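
A random backoff like the one named in the commit above keeps members that lost the collective at the same moment from all reconnecting in lockstep. The sketch below uses a capped exponential ceiling with full jitter; that specific scheme, the `backoff_delay` name, and the default values are assumptions, since the source does not state the intervals confluent uses.

```python
import random


def backoff_delay(attempt, base=0.5, cap=15.0, rng=random):
    """Return a random delay in seconds between 0 and a capped exponential ceiling.

    attempt is the zero-based retry count; rng may be a seeded random.Random
    instance to make the delays reproducible.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

Each member would sleep for `backoff_delay(attempt)` before its next connection attempt, so simultaneous retries become unlikely.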
Jarrod Johnson 956faee052 Correct typo in variable name 2018-06-28 14:12:56 -04:00