fix(kubernetes source)!: Use Pod's `UID` to exclude Vector's logs by ktff · Pull Request #2188 · vectordotdev/vector

ktff · 2020-03-31T10:26:05Z

Closes #2171.

With this change, Vector always excludes the logs of its own pod.

Enables testing of v1.13.12 in CI.

Deployment update

To update your previous Vector deployment .yaml, add this section to the container definition of Vector:

        env:
        - name: VECTOR_POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid

This exposes the Pod's UID to Vector so it knows which logs not to collect. A full example can be found here.
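A minimal sketch of how a UID could be used to decide which log files to skip. It is not the code from this PR. It assumes the kubelet writes pod logs under `/var/log/pods/<uid>/...` on older clusters or `/var/log/pods/<namespace>_<pod>_<uid>/...` on newer ones, and the helper name `is_own_pod_log` is hypothetical.

```rust
use std::path::Path;

/// Returns true when `log_path` appears to belong to the pod with `pod_uid`.
///
/// Assumed directory layouts (an assumption, not taken from this PR):
///   /var/log/pods/<uid>/<container>/<n>.log
///   /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
fn is_own_pod_log(log_path: &Path, pod_uid: &str) -> bool {
    // An empty UID must never match, otherwise every path would be excluded.
    if pod_uid.is_empty() {
        return false;
    }
    let parts: Vec<String> = log_path
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    let suffix = format!("_{}", pod_uid);
    // Look at the directory that directly follows a "pods" component.
    parts
        .windows(2)
        .any(|w| w[0] == "pods" && (w[1] == pod_uid || w[1].ends_with(&suffix)))
}
```

With `VECTOR_POD_UID` set to `abc-123`, a path such as `/var/log/pods/abc-123/vector/0.log` would be skipped, while paths of other pods would still be collected.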

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

ktff · 2020-04-01T10:59:32Z

I'll mark this as a breaking change, because it changes how Vector with the kubernetes source has to be deployed. Simple steps for updating previous deployments are in the opening comment.

binarylogic · 2020-04-01T12:14:48Z

Thanks @ktff, could you make sure that #1450 is accurate with these changes as well?

LucioFranco · 2020-04-01T14:56:56Z


-        let (file_recv, file_source) =
-            file_source_builder::FileSourceBuilder::new(self).build(name, globals, shutdown)?;
+        let vector_pod_uid = env::var(VECTOR_POD_UID_ENV).map_err(|error| BuildError::PodUid {


I know I asked this before, but is this env var always available? If it is, let's make sure we document this very clearly. If not, we need to find a way to work around it, and that may be just ignoring it and letting Vector's logs come through.
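The fallback raised here (ignore a missing variable and let Vector's own logs through) could look roughly like the sketch below. This is not what the PR does: the diff above turns a missing variable into a build error. The constant value `"VECTOR_POD_UID"` is taken from the deployment snippet in the opening comment, while the `SelfExclusion` type and both function names are hypothetical.

```rust
use std::env;

// Name of the variable from the deployment snippet in the opening comment.
const VECTOR_POD_UID_ENV: &str = "VECTOR_POD_UID";

/// Hypothetical result type: either a UID to exclude, or no self-exclusion.
#[derive(Debug, PartialEq)]
enum SelfExclusion {
    Uid(String),
    Disabled,
}

/// Maps the result of an environment lookup to an exclusion policy.
/// A missing, non-unicode, or blank value disables self-exclusion
/// instead of failing the build of the source.
fn self_exclusion_from(value: Result<String, env::VarError>) -> SelfExclusion {
    match value {
        Ok(uid) if !uid.trim().is_empty() => SelfExclusion::Uid(uid.trim().to_owned()),
        _ => SelfExclusion::Disabled,
    }
}

/// Reads the policy from the process environment.
fn self_exclusion() -> SelfExclusion {
    self_exclusion_from(env::var(VECTOR_POD_UID_ENV))
}
```

Splitting the lookup from the decision keeps the policy testable without touching the process environment. The tradeoff is the one discussed in this thread: a silent fallback avoids a hard deployment requirement but reopens the door to log recursion.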

Also curious.

It is present if the .yaml is configured with

        env:
        - name: VECTOR_POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid

and if it isn't configured like that, I wouldn't consider it a proper configuration, in the same way that it isn't a proper configuration to run Vector without mapping the folders necessary for its operation.

Passing the logs through would make it easy for us to ~~brick~~ (EDIT: too strong a word; the GitHub runners became unresponsive, so they were stopped automatically after some time) the node, as happened with GitHub's nodes, if we accidentally add, or forget about, an info log somewhere that is emitted in proportion to the number of logs passing through. It is even worse if the size of that info log depends on the size of the log.

As for other ways: we filtered out logs for containers whose names started with vector, which could be surprising for users who name their containers that way. We could name the Vector container in some unique way, but then we are back at the beginning, where we need the .yaml to be configured properly.

There is also another way that just popped into my head: we could intentionally log a message containing a unique string of characters, and once we detect that string as a message in the log, we would know which UID is ours and could filter those logs out, although the file source would still unnecessarily be picking them up from the log file.
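The marker idea could be sketched as follows. This only illustrates the proposal; none of it exists in the PR, and the `SelfDetector` type and its methods are hypothetical. It assumes each event already carries the UID of the pod it came from.

```rust
/// Learns which pod UID belongs to this process by spotting a unique marker
/// that the process itself logged at startup.
struct SelfDetector {
    marker: String,
    own_uid: Option<String>,
}

impl SelfDetector {
    fn new(marker: impl Into<String>) -> Self {
        SelfDetector { marker: marker.into(), own_uid: None }
    }

    /// Returns true if the event should be dropped as one of our own logs.
    /// Until the marker has been seen, nothing is dropped.
    fn should_drop(&mut self, pod_uid: &str, message: &str) -> bool {
        // An empty marker would match every message, so it is ignored.
        if self.own_uid.is_none()
            && !self.marker.is_empty()
            && message.contains(&self.marker)
        {
            self.own_uid = Some(pod_uid.to_owned());
        }
        self.own_uid.as_deref() == Some(pod_uid)
    }
}
```

Two limitations follow from this design: any of our own lines read before the marker still pass through, and, as noted above, the file source keeps reading the file even after the UID is known.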

What do you think?

I am not really happy with this required env var, personally. I think it's a pretty bad UX hazard.

Do we know what other solutions do here? What does fluentd do?

For example, this daemonset seems quite simple compared to ours: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml

Yes! I am very much in favor of anything that improves reliability and confidence for v1 of our Kubernetes integration. I do not want to compromise that supporting old versions right now. If requiring >= 1.14 achieves that then let's do it.

AWS EKS still supports v1.13.

That's ok. EKS also offers 1.14 and 1.15.

I agree, simplifying the source would be great. It seems like 1.14 is over a year old and technically not supported anymore (see here and here).

I'll add my 2 cents. We've been using fluentd in production for quite some time, there has been an attempt to use fluent-bit, and now we're replacing fluent-bit with Vector. In all our configurations we've been deploying the entire log-related stack in its own namespace, mostly because with an appropriate Role and PSP it can be easily restricted.
Even with the current (0.8.2) kubernetes source it works perfectly fine (because you can just whitelist every namespace except Vector's).
Just saying that it can be a suggested deployment pattern, and in that case the solution can be less invasive. Otherwise, filtering by Pod's UID can be opt-in.

@Alexx-G we really appreciate your feedback! So if I understand correctly, for you all it would make sense to have some way in the Vector config to set the namespace you are deploying your log stack in, and from this we can ignore all logs from that namespace? This to me seems like the ideal solution.
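A rough sketch of namespace-based exclusion, under the assumption that the pod log directory is named `<namespace>_<pod>_<uid>`, which is not the case on every Kubernetes version. Both helper names are hypothetical, and the excluded namespaces would come from the Vector config in a real implementation.

```rust
/// Splits a pod log directory name of the assumed form
/// `<namespace>_<pod>_<uid>` into its three parts.
/// Namespace and pod names cannot contain `_`, so the split is unambiguous.
fn parse_pod_dir(dir_name: &str) -> Option<(&str, &str, &str)> {
    let mut parts = dir_name.splitn(3, '_');
    let namespace = parts.next()?;
    let pod = parts.next()?;
    let uid = parts.next()?;
    if namespace.is_empty() || pod.is_empty() || uid.is_empty() {
        return None;
    }
    Some((namespace, pod, uid))
}

/// Returns true when the directory belongs to one of the excluded namespaces.
/// Directory names that do not follow the assumed layout are never excluded.
fn is_excluded_namespace(dir_name: &str, excluded: &[&str]) -> bool {
    match parse_pod_dir(dir_name) {
        Some((namespace, _, _)) => excluded.contains(&namespace),
        None => false,
    }
}
```

For example, with `excluded = ["logging"]`, the directory `logging_vector-abc_1234` would be skipped and `default_app_5678` would not.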

Using a separate namespace is a particular deployment pattern, and not everyone can use that. So we can't solely rely on that - it would be a very significant limiting factor.

As an alternative to env vars, we could use downward API mounts: https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/

We can also communicate with the cluster API to determine the current container info: https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/#accessing-the-api-from-within-a-pod

Also, a hostname in the container environment is by default set to the pod name.
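These alternatives could be combined into a lookup chain, sketched below under several assumptions: the env value, the downward API file path (for instance a hypothetical `/etc/podinfo/uid` mount), and the hostname are supplied by the caller, and the `PodIdentity` type and function names are made up for illustration. The cluster API option is left out to keep the sketch free of network calls.

```rust
use std::fs;
use std::path::Path;

/// What we managed to learn about our own pod.
#[derive(Debug, PartialEq)]
enum PodIdentity {
    Uid(String),
    /// Only the pod name is known (from the container hostname).
    Name(String),
}

/// Reads a file and returns its trimmed content, or None if the file is
/// missing, unreadable, or blank.
fn read_trimmed(path: &Path) -> Option<String> {
    let content = fs::read_to_string(path).ok()?;
    let trimmed = content.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_owned())
    }
}

/// Tries the env var value first, then a downward API file, then the hostname.
fn resolve_identity(
    env_uid: Option<String>,
    uid_file: &Path,
    hostname: Option<String>,
) -> Option<PodIdentity> {
    env_uid
        .filter(|uid| !uid.is_empty())
        .map(PodIdentity::Uid)
        .or_else(|| read_trimmed(uid_file).map(PodIdentity::Uid))
        .or_else(|| hostname.filter(|h| !h.is_empty()).map(PodIdentity::Name))
}
```

A caller might pass `std::env::var("VECTOR_POD_UID").ok()` and `std::env::var("HOSTNAME").ok()`. Keeping the two variants distinct matters because a pod name and a pod UID have to be matched against different parts of a log path or event.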

That said, I like the explicit configuration as this PR proposes. This, being explicit, is much easier to understand and maintain for infra teams. I.e. when they operate on a configuration for vector - it becomes very simple to reason which logs are ignored and which are picked up. And this is, arguably, a better tradeoff when we're talking k8s.
Also, since the industry standard in k8s-compatible apps is to ship the deployment configuration as part of the release, I don't see this configuration detail as particularly problematic. We should provide a helm chart for our stuff anyway, as well as applicable daemonset/sidecar deployment files. This would be a proper implementation of our "good defaults" paradigm in the k8s sense.

binarylogic · 2020-04-01T20:27:07Z

Just noting, I'd like to get this in 0.9.0, and will wait to release until we get this merged. I'd like to get 0.9.0 out this week.

Hoverbear

Other than @LucioFranco's comment, looks good.

ktff · 2020-04-03T12:47:07Z

> Thanks @ktff, could you make sure that #1450 is accurate with these changes as well?

No need, they are the same. This only changes vector-daemonset.yaml; those who are using it, or who have created their own custom .yaml based on it, should also update it once they update Vector. But we could avoid all that if we implement the last option in this comment.

LucioFranco

I'd like us not to merge this PR until we have found a proper solution for this, unless we are happy to merge an improper one.

binarylogic · 2020-04-05T23:48:49Z

I'd like for us to agree on our approach in #2222 before merging this. It's likely that our approach will allow us to exclude Vector's logs through deployment conventions.

binarylogic · 2020-04-08T14:50:36Z

@MOZGIII feel free to close if you'd like to take a different approach.

Hoverbear · 2020-05-25T18:04:43Z

Is this issue still relevant? (cc @MOZGIII )

MOZGIII · 2020-05-25T22:46:59Z

No, closing in favor of #2222.

Exclude with UID

9cd67c4

Signed-off-by: Kruno Tomola Fabro <krunotf@gmail.com>

ktff requested a review from LucioFranco April 1, 2020 08:15

ktff self-assigned this Apr 1, 2020

ktff changed the title ~~fix(kubernetes source): Use Pod's UID to exclude Vector's logs~~ fix(kubernetes source)!: Use Pod's UID to exclude Vector's logs Apr 1, 2020

ktff added the meta: breaking change Anything that breaks backward compatibility. label Apr 1, 2020

LucioFranco reviewed Apr 1, 2020

Hoverbear approved these changes Apr 2, 2020

ktff mentioned this pull request Apr 3, 2020

Initial AWS EKS support #814

Closed

ktff mentioned this pull request Apr 3, 2020

Log recursion detection feature #2218

Closed

LucioFranco suggested changes Apr 3, 2020

Hoverbear self-requested a review April 5, 2020 23:06

binarylogic assigned MOZGIII and unassigned ktff Apr 8, 2020

Hoverbear removed their request for review April 9, 2020 18:08

Merge branch 'master' into exclude_self

61e1add

Hoverbear self-requested a review April 16, 2020 16:28

MOZGIII closed this May 25, 2020

binarylogic deleted the exclude_self branch July 23, 2020 17:31


7 participants
