NetworkDB's Watch() facility is problematic to use in practice. The
stream of events begins when the watch is started, so the watch cannot
be used to process table entries that existed beforehand. Either option
to process existing table entries is racy: walking the table before
starting the watch leaves a race window where events could be missed,
and walking the table after starting the watch leaves a race window
where created/updated entries could be processed twice.
Modify Watch() to initialize the channel with synthetic CREATE events
for all existing entries owned by remote nodes before hooking it up to
the live event stream. This way watchers observe an equivalent sequence
of events irrespective of whether the watch was started before or after
entries from remote nodes are added to the database. Remove the bespoke
and racy synthetic event replay logic for driver watches from the
libnetwork agent.
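For illustration, a minimal sketch of the priming logic with simplified
stand-in types (the real NetworkDB types, locking, and event delivery
differ):

    package sketch

    import "sync"

    // CreateEvent is a simplified stand-in for a table-watch create event.
    type CreateEvent struct {
        Table, NetworkID, Key string
        Value                 []byte
    }

    type entry struct {
        owner string // name of the node that owns the entry
        value []byte
    }

    type DB struct {
        mu       sync.Mutex
        nodeName string
        // table -> network -> key -> entry
        entries  map[string]map[string]map[string]entry
        watchers []chan CreateEvent // live-event subscribers (delivery elided)
    }

    // Watch queues synthetic CREATE events for every entry owned by a
    // remote node before registering the channel for live events, all
    // under the same lock, so no event can be missed or observed twice.
    func (d *DB) Watch(table string) <-chan CreateEvent {
        d.mu.Lock()
        defer d.mu.Unlock()

        var replay []CreateEvent
        for nid, keys := range d.entries[table] {
            for k, e := range keys {
                if e.owner == d.nodeName {
                    continue // only entries owned by remote nodes are replayed
                }
                replay = append(replay, CreateEvent{Table: table, NetworkID: nid, Key: k, Value: e.value})
            }
        }

        ch := make(chan CreateEvent, len(replay)+16)
        for _, ev := range replay {
            ch <- ev // synthetic events land ahead of any live event
        }
        d.watchers = append(d.watchers, ch)
        return ch
    }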
Signed-off-by: Cory Snider <csnider@mirantis.com>
NetworkDB gossips changes to table entries to other nodes using distinct
CREATE, UPDATE and DELETE events. It is unfortunate that the wire
protocol distinguishes CREATEs from UPDATEs as nothing useful can be
done with this information. Newer events for an entry invalidate older
ones, so there is no guarantee that a CREATE event is broadcast to any
node before an UPDATE is broadcast. And due to the nature of gossip
protocols, even if the CREATE event is broadcast from the originating
node, there is no guarantee that any particular node will receive the
CREATE before an UPDATE. Any code which handles an UPDATE event
differently from a CREATE event is therefore going to behave in
unexpected ways under less-than-perfect conditions.
NetworkDB table watchers also receive CREATE, UPDATE and DELETE events.
Since the watched tables are local to the node, the events could all
have well-defined meanings that are actually useful. Unfortunately
NetworkDB is just bubbling up the wire-protocol event types to the
watchers. Redefine the table-watch events such that a CREATE event is
broadcast when an entry pops into existence in the local NetworkDB, an
UPDATE event is broadcast when an entry which was already present in the
NetworkDB state is modified, and a DELETE event is broadcast when an
entry which was already present in the NetworkDB state is marked for
deletion. DELETE events are broadcast with the same value as the most
recent CREATE or UPDATE event for the entry.
The handler for endpoint table events in the libnetwork agent assumed
incorrectly that CREATE events always correspond to adding a new active
endpoint and that UPDATE events always correspond to disabling an
endpoint. Fix up the handler to handle CREATE and UPDATE events using
the same code path, checking the table entry's ServiceDisabled flag to
determine which action to take.
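A condensed sketch of the unified handling; the record shape and helper
names below are stand-ins, not the actual agent code:

    package sketch

    // EndpointRecord is a trimmed-down stand-in for the endpoint table
    // entry gossiped through NetworkDB.
    type EndpointRecord struct {
        Name            string
        ServiceDisabled bool
    }

    // handleEpTableEvent takes CREATE and UPDATE through the same path:
    // the entry's ServiceDisabled flag, not the event type, decides
    // whether the endpoint is activated or disabled.
    func handleEpTableEvent(isDelete bool, rec EndpointRecord, add, disable, remove func(EndpointRecord)) {
        switch {
        case isDelete:
            remove(rec)
        case rec.ServiceDisabled:
            disable(rec)
        default:
            add(rec)
        }
    }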
Signed-off-by: Cory Snider <csnider@mirantis.com>
Commit 8b7af1d0f added some code to update the DNSNames of all
endpoints attached to a sandbox by loading a new instance of each
affected endpoint from the datastore through a call to
`Network.EndpointByID()`.
This method then calls `Network.getEndpointFromStore()`, which in
turn calls `store.GetObject()`, which then calls `cache.get()`,
which calls `o.CopyTo(kvObject)`. This effectively creates a brand
new instance of an Endpoint. However, endpoints are already kept in
memory by Sandbox, meaning we now have two in-memory instances of
the same Endpoint.
As it turns out, libnetwork is built around the idea that no two objects
representing the same thing should live in memory at the same time;
otherwise mutex locking and optimistic locking break (as both instances
will have a drifting version-tracking ID -- dbIndex in libnetwork parlance).
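For illustration, a rough sketch of why two live copies break optimistic
locking (the object and store shapes are heavily simplified):

    package sketch

    import "errors"

    // kvObject is a simplified stand-in for a datastore object carrying
    // the version counter (dbIndex) used for optimistic locking.
    type kvObject struct {
        dbIndex uint64
        data    string
    }

    type store struct{ current kvObject }

    // update succeeds only if the caller holds the latest version.
    func (s *store) update(o *kvObject) error {
        if o.dbIndex != s.current.dbIndex {
            return errors.New("outdated object version")
        }
        o.dbIndex++
        s.current = *o
        return nil
    }

    func demo() error {
        s := &store{}
        ep1 := s.current // copy held by the Sandbox
        ep2 := s.current // second copy loaded via EndpointByID()

        ep1.data = "rename #1"
        if err := s.update(&ep1); err != nil { // bumps dbIndex to 1
            return err
        }
        ep2.data = "rename #2"
        return s.update(&ep2) // fails: ep2 still carries dbIndex 0
    }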
In this specific case, the bug manifests as container rename failing
when applied a second time to a given container. An integration test is
added to make sure this won't happen again.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Instead of special-casing anonymous endpoints, use the list of DNS names
associated with the endpoint.
`(*Endpoint).isAnonymous()` has no more uses, so let's delete it.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
We don't need C-style callback functions which accept a void* context
parameter: Go has closures. Drop the unnecessary httpHandlerCustom type
and refactor the diagnostic server handler functions into closures which
capture whatever context they need implicitly.
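For illustration, the before/after shape of such a handler (the original
type's exact definition and the handler names are approximate):

    package sketch

    import (
        "fmt"
        "net/http"
    )

    // Before (approximate): a C-style handler carrying an untyped
    // context value alongside the function.
    type httpHandlerCustom struct {
        ctx interface{}
        F   func(ctx interface{}, w http.ResponseWriter, r *http.Request)
    }

    func (h httpHandlerCustom) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        h.F(h.ctx, w, r)
    }

    // After: a closure captures whatever it needs; no context parameter.
    type networkDB struct{ name string } // placeholder for the real NetworkDB

    func networkStatsHandler(db *networkDB) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "stats for %s\n", db.name) // db captured implicitly
        }
    }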
If the node leaves and rejoins a swarm, the cluster agent and its
associated NetworkDB are discarded and replaced with new instances. Upon
rejoin, the agent registers its NetworkDB instance with the diagnostic
server. These handlers would all conflict with the handlers registered
by the previous NetworkDB instance. Attempting to register a second
handler on an http.ServeMux with the same pattern will panic, which the
diagnostic server would historically deal with by ignoring the duplicate
handler registration. Consequently, the first NetworkDB instance to be
registered would "stick" to the diagnostic server for the lifetime of
the process, even after it is replaced with another instance. Improve
duplicate-handler registration such that the most recently-registered
handler for a pattern is used for all subsequent requests.
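For illustration, one way to get last-registration-wins semantics is to
dispatch through a map instead of registering directly on an
http.ServeMux; a sketch, not the actual implementation:

    package sketch

    import (
        "net/http"
        "sync"
    )

    // mux dispatches to the most recently registered handler for a
    // pattern; re-registering a pattern replaces the previous handler
    // instead of panicking. Exact-path matching only, for brevity.
    type mux struct {
        mu       sync.Mutex
        handlers map[string]http.Handler
    }

    func (m *mux) Handle(pattern string, h http.Handler) {
        m.mu.Lock()
        defer m.mu.Unlock()
        if m.handlers == nil {
            m.handlers = make(map[string]http.Handler)
        }
        m.handlers[pattern] = h // later registrations win
    }

    func (m *mux) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        m.mu.Lock()
        h, ok := m.handlers[r.URL.Path]
        m.mu.Unlock()
        if !ok {
            http.NotFound(w, r)
            return
        }
        h.ServeHTTP(w, r)
    }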
Signed-off-by: Cory Snider <csnider@mirantis.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- use named return variables to make the function more self-describing
- rename a variable for readability
- slightly optimize slice initialization, and keep linters happy (illustrated below)
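A tiny, hypothetical illustration of these patterns (not the function
actually touched by this patch):

    package sketch

    import (
        "fmt"
        "strings"
    )

    // Named returns document what comes back, and the result slices are
    // preallocated to the expected capacity.
    func parseEndpoints(specs []string) (names, addrs []string, err error) {
        names = make([]string, 0, len(specs))
        addrs = make([]string, 0, len(specs))
        for _, s := range specs {
            name, addr, ok := strings.Cut(s, "=")
            if !ok {
                return nil, nil, fmt.Errorf("invalid endpoint spec %q", s)
            }
            names = append(names, name)
            addrs = append(addrs, addr)
        }
        return names, addrs, nil
    }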
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function was used to check if the network is a multi-host, swarm-scoped
network. Part of this check involved checking whether the cluster agent
was present.
In all places where this function was used, the next step after checking if
the network was "cluster eligible", was to get the agent, and (again) check
if it was not nil.
This patch rewrites the isClusterEligible utility into a clusterAgent utility,
which both checks whether the network is cluster-eligible and returns the
agent (if set). For convenience, an "ok" bool is added, which callers can use to
return early (although just checking for nilness would likely have been
sufficient).
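A rough sketch of the resulting helper and call pattern (types and the
cluster-eligibility checks are reduced to placeholders):

    package sketch

    import "sync"

    type nwAgent struct{} // stand-in for the cluster agent type

    type controller struct {
        mu    sync.Mutex
        agent *nwAgent
    }

    func (c *controller) getAgent() *nwAgent {
        c.mu.Lock()
        defer c.mu.Unlock()
        return c.agent
    }

    type network struct {
        ctrlr      *controller
        swarmScope bool // placeholder for the real scope check
        multiHost  bool // placeholder for the real multi-host driver check
    }

    // clusterAgent reports whether the network is a multi-host,
    // swarm-scoped network and, if so, returns the cluster agent.
    func (n *network) clusterAgent() (agent *nwAgent, ok bool) {
        if !n.swarmScope || !n.multiHost {
            return nil, false
        }
        agent = n.ctrlr.getAgent()
        return agent, agent != nil
    }

Callers can then bail out in a single step:

    agent, ok := n.clusterAgent()
    if !ok {
        return nil
    }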
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This removes redundant nil-checks in Endpoint.deleteServiceInfoFromCluster
and Endpoint.addServiceInfoToCluster.
These functions return early if the network is not ["cluster eligible"][1],
and the function used for that (`Network.isClusterEligible`) requires the
[agent to not be `nil`][2].
This check moved around a few times ([3][3], [4][4]), but was originally
added in [libnetwork 1570][5] which, among others, tried to avoid a nil-pointer
exception reported in [moby 28712][6], which accessed the `Controller.agent`
[without locking][7]. That issue was addressed by adding locks, adding a
`Controller.getAgent` accessor, and updating deleteServiceInfoFromCluster
to use a local var. It also sprinkled this `nil` check to be on the safe
side, but as `Network.isClusterEligible` already checks for the agent
to not be `nil`, this should not be redundant.
[1]: 5b53ddfcdd/libnetwork/agent.go (L529-L534)
[2]: 5b53ddfcdd/libnetwork/agent.go (L688-L696)
[3]: f2307265c7
[4]: 6426d1e66f
[5]: 8dcf9960aa
[6]: https://github.com/moby/moby/issues/28712
[7]: 75fd88ba89/vendor/github.com/docker/libnetwork/agent.go (L452)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add a fast-path to some functions, to avoid locking/unlocking or other
operations that are not needed (see the sketch after the list):
- Network.addDriverInfoToCluster
- Network.deleteDriverInfoFromCluster
- Network.addServiceInfoToCluster
- Network.deleteServiceInfoFromCluster
- Network.addDriverWatches
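Roughly, the fast-path amounts to the following (names and types heavily
simplified):

    package sketch

    type clusterAgent struct{}

    // network is grossly simplified; the real type carries a lock,
    // an ID, and more.
    type network struct{ agent *clusterAgent }

    // Bail out before taking any locks or building any table entries
    // when there is no cluster agent to publish to.
    func (n *network) addDriverInfoToCluster(entries map[string]string) error {
        if n.agent == nil {
            return nil // fast-path: not part of a swarm, nothing to do
        }
        // ... existing logic that walks entries and gossips them ...
        _ = entries
        return nil
    }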
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- return early when failing to fetch the driver
- store the network ID and controller in local variables to prevent
  repeatedly locking/unlocking. We don't expect the network's ID to change
  during this operation.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
There were quite a few places where the type collided with variables
named `agent`. Let's rename the type.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The "Capability" type defines DataScope and ConnectivityScope fields,
but their values were set from consts in the datastore package, which
required importing that package and its dependencies for the consts
only.
This patch (sketched below):
- Moves the consts to a separate "scope" package
- Adds aliases for the consts in the datastore package.
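For illustration, a simplified sketch of the resulting layout (const
names and values approximate):

    // libnetwork/scope/scope.go
    package scope

    // Data scopes for objects and capabilities.
    const (
        Local  = "local"
        Global = "global"
        Swarm  = "swarm"
    )

    // libnetwork/datastore/datastore.go (excerpt)
    package datastore

    import "github.com/docker/docker/libnetwork/scope"

    // Aliases kept so existing references keep compiling.
    const (
        LocalScope  = scope.Local
        GlobalScope = scope.Global
        SwarmScope  = scope.Swarm
    )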
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Most drivers do not implement this, so detect whether a driver implements
the discoverAPI, and remove the (no-op) implementations from drivers that
do not support it.
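Detection boils down to a type assertion at the call site; a hedged
sketch with approximate method names, not the exact interface:

    package sketch

    // discoverer mirrors the shape of the discovery API that only some
    // drivers implement.
    type discoverer interface {
        DiscoverNew(dType int, data interface{}) error
        DiscoverDelete(dType int, data interface{}) error
    }

    type driver interface {
        Type() string
    }

    // notifyDiscovery only forwards the event to drivers that actually
    // implement the discovery API; the rest need no stub methods.
    func notifyDiscovery(d driver, dType int, data interface{}) error {
        dd, ok := d.(discoverer)
        if !ok {
            return nil // driver does not participate in discovery
        }
        return dd.DiscoverNew(dType, data)
    }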
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function was implemented in dd4950f36d
which added a "key" field, but that field was never used anywhere, and
still appears unused.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Basically every exported method which takes a libnetwork.Sandbox
argument asserts that the value's concrete type is *sandbox. Passing any
other implementation of the interface is a runtime error! This interface
is a footgun, and clearly not necessary. Export and use the concrete
type instead.
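A sketch of the pattern that made the interface pointless (simplified):

    package sketch

    import "errors"

    // Sandbox was an exported interface...
    type Sandbox interface {
        ID() string
    }

    type sandbox struct{ id string }

    func (s *sandbox) ID() string { return s.id }

    // ...yet every method taking a Sandbox immediately asserted the
    // concrete type, so any other implementation failed at runtime.
    func endpointJoin(sb Sandbox) error {
        s, ok := sb.(*sandbox)
        if !ok {
            return errors.New("not a valid Sandbox")
        }
        _ = s // proceed with the concrete *sandbox
        return nil
    }

Exporting the concrete type and using it directly in signatures turns
that runtime error into a compile-time guarantee.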
Signed-off-by: Cory Snider <csnider@mirantis.com>
Embedded structs are part of the exported surface of a struct type.
Boxing a struct value into an interface value does not erase that;
any code could gain access to the embedded struct value with a simple
type assertion. The mutex is supposed to be a private implementation
detail, but *controller implements sync.Locker because the mutex is
embedded.
    c, _ := libnetwork.New()
    c.(sync.Locker).Lock()
Change the mutex to an unexported field so *controller no longer
spuriously implements the sync.Locker interface.
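A minimal sketch of the fixed shape:

    package sketch

    import "sync"

    // An unexported, non-embedded mutex keeps locking an implementation
    // detail: *controller no longer satisfies sync.Locker, so callers
    // cannot reach it with a type assertion as shown above.
    type controller struct {
        mu sync.Mutex
        // ... other fields ...
    }

    func (c *controller) doSomething() {
        c.mu.Lock()
        defer c.mu.Unlock()
        // ...
    }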
Signed-off-by: Cory Snider <csnider@mirantis.com>
libnetwork/etchosts/etchosts_test.go:167:54: empty-lines: extra empty line at the end of a block (revive)
libnetwork/osl/route_linux.go:185:74: empty-lines: extra empty line at the start of a block (revive)
libnetwork/osl/sandbox_linux_test.go:323:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/bitseq/sequence.go:412:48: empty-lines: extra empty line at the start of a block (revive)
libnetwork/datastore/datastore_test.go:67:46: empty-lines: extra empty line at the end of a block (revive)
libnetwork/datastore/mock_store.go:34:60: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/firewalld.go:202:44: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/firewalld_test.go:76:36: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/iptables.go:256:67: empty-lines: extra empty line at the end of a block (revive)
libnetwork/iptables/iptables.go:303:128: empty-lines: extra empty line at the start of a block (revive)
libnetwork/networkdb/cluster.go:183:72: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipams/null/null_test.go:44:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/macvlan/macvlan_store.go:45:52: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipam/allocator_test.go:1058:39: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/bridge/port_mapping.go:88:111: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/link.go:26:90: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/setup_ipv6_test.go:17:34: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/bridge/setup_ip_tables.go:392:4: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/bridge/bridge.go:804:50: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/ov_serf.go:183:29: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/ov_utils.go:81:64: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/overlay/peerdb.go:172:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:209:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:344:89: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/peerdb.go:436:63: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/overlay.go:183:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/drivers/overlay/encryption.go:69:28: empty-lines: extra empty line at the end of a block (revive)
libnetwork/drivers/overlay/ov_network.go:563:81: empty-lines: extra empty line at the start of a block (revive)
libnetwork/default_gateway.go:32:43: empty-lines: extra empty line at the start of a block (revive)
libnetwork/errors_test.go:9:40: empty-lines: extra empty line at the start of a block (revive)
libnetwork/service_common.go:184:64: empty-lines: extra empty line at the end of a block (revive)
libnetwork/endpoint.go:161:55: empty-lines: extra empty line at the end of a block (revive)
libnetwork/store.go:320:33: empty-lines: extra empty line at the end of a block (revive)
libnetwork/store_linux_test.go:11:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/sandbox.go:571:36: empty-lines: extra empty line at the start of a block (revive)
libnetwork/service_common.go:317:246: empty-lines: extra empty line at the start of a block (revive)
libnetwork/endpoint.go:550:17: empty-lines: extra empty line at the end of a block (revive)
libnetwork/sandbox_dns_unix.go:213:106: empty-lines: extra empty line at the start of a block (revive)
libnetwork/controller.go:676:85: empty-lines: extra empty line at the end of a block (revive)
libnetwork/agent.go:876:60: empty-lines: extra empty line at the end of a block (revive)
libnetwork/resolver.go:324:69: empty-lines: extra empty line at the end of a block (revive)
libnetwork/network.go:1153:92: empty-lines: extra empty line at the end of a block (revive)
libnetwork/network.go:1955:67: empty-lines: extra empty line at the start of a block (revive)
libnetwork/network.go:2235:9: empty-lines: extra empty line at the start of a block (revive)
libnetwork/libnetwork_internal_test.go:336:26: empty-lines: extra empty line at the start of a block (revive)
libnetwork/resolver_test.go:76:35: empty-lines: extra empty line at the end of a block (revive)
libnetwork/libnetwork_test.go:303:38: empty-lines: extra empty line at the end of a block (revive)
libnetwork/libnetwork_test.go:985:46: empty-lines: extra empty line at the end of a block (revive)
libnetwork/ipam/allocator_test.go:1263:37: empty-lines: extra empty line at the start of a block (revive)
libnetwork/errors_test.go:9:40: empty-lines: extra empty line at the end of a block (revive)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It was unclear what the distinction was between these configuration
structs, so they have been merged to simplify things.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Issue - "index out of range" panic in drivers/overlay/encryption.go:539
due to a mismatch in indices between curKeys and spis due to
case where updateKeys might bail out due to an error and
not update the spis
Fix - Reconfigure keys when there is a key update failure
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
This patch attempts to allow endpoints to complete servicing connections
while being removed from a service. The change adds a flag to the
endpoint.deleteServiceInfoFromCluster() method to indicate whether this
removal should fully remove connectivity through the load balancer
to the endpoint or should just disable directing further connections to
the endpoint. If the flag is 'false', then the load balancer assigns
a weight of 0 to the endpoint but does not remove it as a linux load
balancing destination. It does remove the endpoint as a docker load
balancing endpoint but tracks it in a special map of "disabled-but-not-
destroyed" load balancing endpoints. This allows traffic to continue
flowing, at least under Linux. If the flag is 'true', then the code
removes the endpoint entirely as a load balancing destination.
The sandbox.DisableService() method invokes deleteServiceInfoFromCluster()
with the flag set to 'false', while the endpoint.sbLeave() method invokes
it with the flag set to 'true' to complete the removal on endpoint
finalization. Renaming the endpoint invokes deleteServiceInfoFromCluster()
with the flag set to 'true' because renaming attempts to completely
remove and then re-add each endpoint service entry.
The controller.rmServiceBinding() method, which carries out the operation,
similarly gets a new flag for whether to fully remove the endpoint. If
the flag is false, it does the job of moving the endpoint from the
load balancing set to the 'disabled' set. It then removes or
de-weights the entry in the OS load balancing table via
network.rmLBBackend(). It removes the service entirely via said method
ONLY IF there are no more live or disabled load balancing endpoints.
Similarly network.addLBBackend() requires slight tweaking to properly
manage the disabled set.
Finally, this change requires propagating the status of disabled
service endpoints via the networkDB. Accordingly, the patch includes
both code to generate and handle service update messages. It also
augments the service structure with a ServiceDisabled boolean to convey
whether an endpoint should ultimately be removed or just disabled.
This, naturally, required a rebuild of the protocol buffer code as well.
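In rough outline, the flag-controlled removal looks like this (names and
structures simplified from the actual code):

    package sketch

    // EndpointRecord carries the new flag gossiped through networkDB.
    type EndpointRecord struct {
        Name            string
        ServiceDisabled bool
    }

    type loadBalancer struct {
        active   map[string]EndpointRecord // endpoints taking new connections
        disabled map[string]EndpointRecord // weight-0, still draining traffic
    }

    func newLoadBalancer() *loadBalancer {
        return &loadBalancer{
            active:   make(map[string]EndpointRecord),
            disabled: make(map[string]EndpointRecord),
        }
    }

    // With fullRemove=false the endpoint is only disabled (weight 0,
    // moved to the disabled set so existing connections can drain);
    // with fullRemove=true it is removed as a destination entirely.
    func (lb *loadBalancer) deleteServiceInfo(name string, fullRemove bool) {
        ep, ok := lb.active[name]
        if !ok {
            ep, ok = lb.disabled[name]
        }
        if !ok {
            return
        }
        delete(lb.active, name)
        if fullRemove {
            delete(lb.disabled, name)
            return
        }
        ep.ServiceDisabled = true
        lb.disabled[name] = ep
    }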
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This commit introduces the possibility to enable a debug mode for the
networkDB. This opens a TCP port on localhost that exposes the networkDB
API for debugging purposes.
The API can be discovered using curl localhost:<port>/help
It supports JSON output when "json" is passed as a URL query parameter,
and pretty-printing when passing json=pretty.
All binary values are serialized in base64 encoding; this can be skipped
by passing the "unsafe" option as a URL query parameter.
A simple Go client will follow up
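Such a client could look roughly like this (flag names are made up; only
the /help endpoint and the query parameters described above come from
the actual API):

    package main

    import (
        "flag"
        "fmt"
        "io"
        "net/http"
        "os"
    )

    func main() {
        port := flag.String("port", "", "networkdb debug port, as configured on the daemon")
        path := flag.String("path", "/help", "debug endpoint to query")
        flag.Parse()
        if *port == "" {
            fmt.Fprintln(os.Stderr, "usage: client -port <port> [-path /endpoint]")
            os.Exit(1)
        }

        resp, err := http.Get("http://localhost:" + *port + *path + "?json=pretty")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer resp.Body.Close()
        _, _ = io.Copy(os.Stdout, resp.Body)
    }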
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Separate the hostname from the node identifier. All the messages
exchanged on the network contain a nodeName field, which until now was
hostname-uniqueid. Because these names are encoded as strings in the
protobuf without any length restriction, they affect the efficiency of
the protocol itself. If the hostname is very long the overhead increases
and degrades the performance of the database, which by default allows a
1400-byte payload per gossip cycle.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>