How To Use Kubernetes’ Job and CronJob
Schedule code to run in your containerized environment
Welcome to another installment of the “Kubernetes in a Nutshell” blog series. So far, we have covered Kubernetes resources (objects) such as Deployments, Services, Volumes, etc.
In this blog, we will explore Job and CronJob. With the help of examples, you will learn how to:
- Use these components.
- Specify constraints such as time limits and concurrency.
- Handle failures, etc.
The code (lots of YAML) is available on GitHub.
Kubernetes Job
You can use a Kubernetes Job to run batch processes, ETL jobs, ad-hoc operations, etc. It starts a Pod and lets it run to completion. This is quite different from other Pod controllers such as Deployment or ReplicaSet, which keep their Pods running indefinitely.
As always, we will learn by doing. So, let’s dive in!
Hello Job!
Here is what a typical Job manifest looks like:
apiVersion: batch/v1
kind: Job
metadata:
  name: job1
spec:
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 90s; echo exiting...; date
      restartPolicy: Never
This Job starts a busybox container that executes a few shell commands. Let's create this Job and investigate what's going on.
To keep things simple, the YAML file is being referenced directly from the GitHub repo, but you can also download the file to your local machine and use it in the same way.
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/jobs/job1.yaml
Check the Job and its associated Pod.
kubectl get job/job1
NAME   COMPLETIONS   DURATION   AGE
job1   0/1           8s         8s
You should see a Pod in the Running state, for example:
kubectl get pod -l=job-name=job1
job1-bptmd   1/1   Running
If you check the Pod logs, you should see something similar to this:
kubectl logs <pod_name>
Thu Jan 9 10:10:05 UTC 2020
sleeping....
Check the Job again after ~90 seconds.
kubectl get job/job1
NAME   COMPLETIONS   DURATION   AGE
job1   1/1           95s        102s
The Job ran for a little over 90 seconds, and COMPLETIONS shows that one Pod completed successfully. The Pod logs reflect this as well.
Thu Jan 9 10:10:05 UTC 2020
sleeping....
exiting...
Thu Jan 9 10:11:35 UTC 2020
Also, the Pod status should change to Completed.
kubectl get pod -l=job-name=job1
job1-bptmd   0/1   Completed
If all the Job did was create a Pod to run a container, why can’t we use a plain old Pod?
That's because the Job controller can create a replacement Pod if the original one fails or is lost (for example, when its node goes down); that does not happen with an isolated Pod. In addition, the Job controller provides many other capabilities, which we will explore going forward.
To delete this Job, simply run kubectl delete job/job1.
Enforcing a time limit
Suppose you are running a batch job and it takes too long to finish for some reason. This might be undesirable. You can limit how long a Job is allowed to run by setting the activeDeadlineSeconds attribute in the spec.
Here is an example:
apiVersion: batch/v1
kind: Job
metadata:
  name: job2
spec:
  activeDeadlineSeconds: 5
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 10s; echo exiting...; date
      restartPolicy: Never
Notice that activeDeadlineSeconds has been set to 5 seconds while the container process is designed to run for 10 seconds.
Create the Job, wait for a few seconds (~10 seconds) and check the Job.
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/jobs/job2.yaml
kubectl get job/job2 -o yaml
Scroll down to check the status field and you will see that the Job is in a Failed state due to DeadlineExceeded.
status:
  conditions:
  - lastProbeTime: "2020-01-09T10:57:13Z"
    lastTransitionTime: "2020-01-09T10:57:13Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
To delete the job, simply run kubectl delete job/job2.
Handling failures
What if there are issues due to container failure (process exited) or Pod failure? Let's try this out by simulating a failure.
In this Job, the container prints the date, sleeps for 5 seconds, and exits with status 1 to simulate failure.
apiVersion: batch/v1
kind: Job
metadata:
  name: job3
spec:
  backoffLimit: 2
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 5s; exit 1;
      restartPolicy: OnFailure
Notice that restartPolicy: OnFailure differs from the previous examples, where it was set to Never. We will come back to this in a moment.
Create the Job and keep an eye on its Pod.
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/jobs/job3.yaml
kubectl get pod -l=job-name=job3 -w
You should see something similar to below:
NAME         READY   STATUS              RESTARTS   AGE
job3-qgv4b   0/1     ContainerCreating   0          4s
job3-qgv4b   1/1     Running             0          6s
job3-qgv4b   0/1     Error               0          12s
job3-qgv4b   1/1     Running             1          17s
job3-qgv4b   0/1     Error               1          22s
job3-qgv4b   0/1     CrashLoopBackOff    1          34s
job3-qgv4b   1/1     Running             2          40s
job3-qgv4b   1/1     Terminating         2          40s
job3-qgv4b   0/1     Terminating         2          45s
job3-qgv4b   0/1     Terminating         2          51s
Notice how the Pod status transitions.
- It starts off by pulling and running the container.
- It transitions to the Error state since the container exits with status 1 (after sleeping for 5 seconds).
- It goes back to Running status again (notice that the RESTARTS count is now 1).
- As expected, it goes into the Error state again and is restarted once more; the RESTARTS count is now 2.
- Finally, it’s terminated.
Kubernetes (the kubelet on the node, to be specific) restarted the container in place because we specified restartPolicy: OnFailure.
But this could continue indefinitely, so we put a limit on it using backoffLimit: 2, which ensures that Kubernetes retries only twice before marking this Job as Failed.
Note that this was an example of the container being restarted. The Job controller can also create a new Pod in case of a Pod failure.
If you check the Job status...
kubectl get job/job3 -o yaml
… you will see that it has Failed due to BackoffLimitExceeded.
status:
  conditions:
  - lastProbeTime: "2020-01-09T11:16:24Z"
    lastTransitionTime: "2020-01-09T11:16:24Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
A restartPolicy of Never means that a failed container is not restarted in place. Instead, the Job controller creates a new Pod for each retry until backoffLimit is reached. Also, the default value of backoffLimit is 6.
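To compare the two retry modes, here is a minimal sketch of the same failing Job with restartPolicy: Never. The name job3-never is hypothetical and is not part of the series' GitHub repo. With this policy, each retry should show up as a new Pod rather than as a higher RESTARTS count on the same Pod.

```yaml
# Hypothetical variant of job3; the name is an assumption for illustration.
apiVersion: batch/v1
kind: Job
metadata:
  name: job3-never
spec:
  backoffLimit: 2            # retries allowed before the Job is marked Failed
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 5s; exit 1;
      restartPolicy: Never   # failed containers are not restarted in place
```

If you watch it with kubectl get pod -l=job-name=job3-never -w, you should see a few separate Pods end in the Error state before the Job is marked Failed.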
To delete this job, just run kubectl delete job/job3.
More is better
There are cases where you might want the Job to spin up more than one Pod to get things done.
For example, consider a scenario where you are running a batch job to process records from a database; having multiple Pods share the load can definitely help.
One way of doing this is for the Pods to run sequentially: each Pod records the number of rows it processed in an external store (e.g. another DB table), and the next Pod picks up from there.
This can be done by adding the completions property to the Job spec.
apiVersion: batch/v1
kind: Job
metadata:
  name: job4
spec:
  completions: 2
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 10s; echo exiting...; date
      restartPolicy: Never
Create the Job and keep an eye on how it progresses.
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/jobs/job4.yaml
kubectl get job/job4 -w
You should see something similar to this:
NAME   COMPLETIONS   DURATION   AGE
job4   0/2           3s         3s
job4   1/2           20s        20s
job4   2/2           37s        37s
Since we had set completions to two:
- Two Pods were instantiated one after the other (sequentially).
- The Job was marked Completed (successful) only after both Pods ran to completion. Otherwise, the failure conditions would have applied (as discussed above).
Let’s check the Pod logs as well.
kubectl get pods -l=job-name=job4
kubectl logs <pod_name>
If you look at the logs for both Pods, you will be able to confirm that they started one after the other in sequence (and each ran for ~10 seconds).
Logs for Pod 1:
Thu Jan 9 11:31:57 UTC 2020
sleeping....
exiting...
Thu Jan 9 11:32:07 UTC 2020
Logs for Pod 2:
Thu Jan 9 11:32:15 UTC 2020
sleeping....
exiting...
Thu Jan 9 11:32:25 UTC 2020
How about running the batch processing in parallel, where all the Pods are instantiated at once (instead of sequentially)?
To handle this case, our processing logic needs to be tuned accordingly, since the parallel Pods must coordinate on which set of work items to pick and how to update their completion status.
We will not dive into that, but I hope you get the idea in terms of the requirement.
This can be achieved by using parallelism along with completions. Here is an example:
apiVersion: batch/v1
kind: Job
metadata:
  name: job5
spec:
  completions: 3
  parallelism: 3
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 10s; echo exiting...; date
      restartPolicy: Never
The parallelism attribute puts a cap on the maximum number of Pods that can run at a time. In this case, since both parallelism and completions are set to three:
- Three Pods will be instantiated all at once.
- The Job will be marked Completed (successful) only if all three run to completion. Otherwise, the failure conditions apply (as discussed above).
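The capping behavior is easier to see when parallelism is lower than completions. The following is a minimal sketch under that assumption; the name job6 and the values 6 and 2 are hypothetical and not part of the series' GitHub repo.

```yaml
# Hypothetical example: six successful Pods are required,
# but no more than two should run at the same time.
apiVersion: batch/v1
kind: Job
metadata:
  name: job6
spec:
  completions: 6    # total number of Pods that must succeed
  parallelism: 2    # upper bound on Pods running concurrently
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo sleeping....; sleep 10s; echo exiting...; date
      restartPolicy: Never
```

Watching kubectl get job/job6 -w should show COMPLETIONS climbing towards 6/6, roughly two at a time, as finished Pods are replaced by new ones.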
Once you’re done
You can use ttlSecondsAfterFinished to specify the number of seconds after which a finished Job (either Completed or Failed) is automatically deleted. This also removes dependent objects such as the Pods spawned by the Job.
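As a rough illustration, the sketch below asks Kubernetes to clean up the Job about a minute after it finishes. The name job-ttl and the 60-second value are assumptions chosen for this example, and the field needs a cluster version where the TTL-after-finished feature is available.

```yaml
# Hypothetical example; not part of the series' GitHub repo.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-ttl
spec:
  ttlSecondsAfterFinished: 60   # delete the Job (and its Pods) ~60s after it finishes
  template:
    spec:
      containers:
      - name: job
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo done
      restartPolicy: Never
```

Once the TTL has elapsed, kubectl get job/job-ttl should report that the Job is not found.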
CronJob
A CronJob object allows you to schedule Job executions rather than starting them manually.
It uses the cron format to run a Job on a schedule. Basically, a CronJob is a higher-level abstraction that embeds a Job template (as seen above) along with a schedule (in cron format) and other attributes.
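For readers less familiar with the cron format, here is a small fragment showing how the five fields of a schedule are laid out. The alternative schedules in the comments are illustrative values only, not taken from the series' repo.

```yaml
# Field order: minute  hour  day-of-month  month  day-of-week
schedule: "*/1 * * * *"     # every minute
# schedule: "0 * * * *"     # at the start of every hour
# schedule: "30 2 * * *"    # every day at 02:30
# schedule: "0 9 * * 1"     # every Monday at 09:00
```

Only one schedule key belongs in a CronJob spec; the commented lines are alternatives to swap in.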
Let’s create a simple CronJob that repeats every minute.
# CronJob is served as batch/v1 from Kubernetes 1.21 onwards;
# use batch/v1beta1 only on older clusters (it was removed in 1.25).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob1
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cronjob
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo sleeping....; sleep 5s; echo exiting...;
          restartPolicy: Never
The jobTemplate section is the same as the spec of a Job. It’s simply embedded within this CronJob spec. It’s the same container that we were using for the Job examples.
Create the CronJob and check it:
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/jobs/cronjob1.yaml
kubectl get cronjob/cronjob1
The output:
NAME       SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob1   */1 * * * *   False     0        <none>          4s
Keep track of the Jobs that this CronJob spawns.
kubectl get job -w
NAME                  COMPLETIONS   DURATION   AGE
cronjob1-1578572340   0/1           2s         2s
cronjob1-1578572340   1/1           11s        11s
cronjob1-1578572400   0/1                      0s
cronjob1-1578572400   0/1           0s         0s
cronjob1-1578572400   1/1           10s        10s
cronjob1-1578572460   0/1                      0s
cronjob1-1578572460   0/1           0s         0s
cronjob1-1578572460   1/1           11s        11s
A new Job is created every minute, and each one runs for ~10 seconds as expected. You can also check the logs of the individual Pod that each Job created (just like you did in the previous examples).
kubectl get pod -l=job-name=<job_name>
kubectl logs <pod_name>
There are other (optional) CronJob properties in addition to the schedule attribute. Let's look at one of these.
concurrencyPolicy
It has three possible values: Forbid, Allow, and Replace.
Choose Forbid if you don't want concurrent executions of your Job. When it’s time to trigger a Job as per the schedule and a Job instance is already running, the current iteration is skipped.
If you choose Replace as the concurrency policy, the currently running Job will be stopped and a new Job will be spawned.
Specifying Allow will let multiple Job instances run concurrently.
Here is an example:
# CronJob is served as batch/v1 from Kubernetes 1.21 onwards;
# use batch/v1beta1 only on older clusters (it was removed in 1.25).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob2
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cronjob
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo sleeping....; sleep 90s; echo exiting...;
          restartPolicy: Never
You can create this CronJob and then track the individual Jobs to observe the behavior.
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/jobs/cronjob2.yaml
kubectl get job -w
Since the schedule fires every minute and the container runs for 90 seconds, you will see multiple Jobs running at the same time. This overlap is possible because we have applied concurrencyPolicy: Allow.
You might see something like this:
cronjob2-1578573480   0/1                      0s
cronjob2-1578573480   0/1           0s         0s
cronjob2-1578573540   0/1                      0s
cronjob2-1578573540   0/1           0s         0s
cronjob2-1578573480   1/1           95s        95s
Notice that job cronjob2-1578573540 was triggered before cronjob2-1578573480 could finish.
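For contrast, here is a minimal sketch of the same workload with concurrencyPolicy: Forbid. The name cronjob3 is hypothetical and not part of the series' GitHub repo. Since each run takes about 90 seconds on a one-minute schedule, you would expect runs that come due while a Job is still active to be skipped.

```yaml
# Hypothetical variant of cronjob2; the name is an assumption for illustration.
# Use batch/v1beta1 instead on clusters older than Kubernetes 1.21.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob3
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid   # skip a run if the previous Job is still active
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cronjob
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo sleeping....; sleep 90s; echo exiting...;
          restartPolicy: Never
```

With kubectl get job -w, you should no longer see two cronjob3 Jobs sitting at 0/1 at the same time.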
The other properties of a CronJob are:
- Job history: successfulJobsHistoryLimit and failedJobsHistoryLimit can be used to specify how much history you want to retain for completed and failed Jobs.
- Start deadline, specified by startingDeadlineSeconds.
- Suspend, specified by suspend.
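To show where these properties live, here is a sketch of a CronJob that sets all of them. The name cronjob4 and the 30-second deadline are assumptions made for this example; the two history limits are shown with their default values.

```yaml
# Hypothetical example; not part of the series' GitHub repo.
# Use batch/v1beta1 instead on clusters older than Kubernetes 1.21.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob4
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 3   # keep the last 3 completed Jobs (default)
  failedJobsHistoryLimit: 1       # keep the last failed Job (default)
  startingDeadlineSeconds: 30     # a run that cannot start within 30s of its slot counts as missed
  suspend: false                  # set to true to pause future runs
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cronjob
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo sleeping....; sleep 5s; echo exiting...;
          restartPolicy: Never
```

Setting suspend to true only stops new runs from being scheduled; Jobs that have already started are not affected.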
Conclusion
That’s it for this part of the “Kubernetes in a Nutshell” series. Stay tuned for more.
I really hope you enjoyed and learned something from this article. Happy to get your feedback as a comment.