Flex-start VMs and calendar-mode reservations are generally available (GA).
Both consumption options use Dynamic Workload Scheduler pricing, which offers discounts of up to 53% off on-demand pricing. For more information, see Create and run a job that uses GPUs and Ensure resource availability using VM reservations.

Dynamic Workload Scheduler for Batch (Preview) has been replaced with the following consumption options:
Flex-start VMs (Preview): We recommend Flex-start VMs if your job can withstand best-effort availability in exchange for discounted pricing and up to 7 days to finish running.
Calendar-mode reservations (Preview): We recommend calendar-mode reservations if your job needs a very high level of assurance of resource availability for at least 1 day and up to 90 days.
Both consumption options use Dynamic Workload Scheduler pricing, which offers discounts of up to 53% off on-demand pricing. For more information, see Create and run a job that uses GPUs.
Documentation has been updated to clarify the machine types that jobs can use.

Pub/Sub might not send notifications for all intermediate states when a job or task changes very quickly. You can mitigate this issue by viewing state history through status events. For more information, see Known issues.

Cancelling jobs is generally available (GA).

You can use the Google Cloud console to create jobs that use GPUs.

Dependent jobs are available in Preview. Dependent jobs let you schedule an automated chain of jobs, which can help you optimize resource consumption. For example, you can use different types of VMs for data preparation and for compute-intensive data processing.

Dynamic Workload Scheduler for Batch is available in Preview. We recommend using Dynamic Workload Scheduler to improve resource availability for jobs that run on A3 GPU VMs when you don't intend to use a reservation. For more information, see Create and run a job that uses GPUs.

Documentation has been added to explain how to export job information. Exporting a job's information is useful when you want to retain the information after a job is deleted or analyze the information outside of Batch. For more information, see Export job information.

The documentation has been updated to clarify that a Batch OS stops being supported when its base Compute Engine OS is deprecated. This restriction only applies to Batch OSes that have not already reached the end of development as of the date of this notice.
For more information, see Restrictions for VM OS images.
Batch CentOS (batch-centos) and Batch HPC CentOS (batch-hpc-centos) have reached end of development due to the end of support (EOS) of Compute Engine CentOS 7 images on June 30, 2024.
The final image versions of these Batch OSes—batch-centos-7-official-20240628-00-p00 and batch-hpc-centos-7-official-20240628-00-p00 from June 28, 2024—are only supported until August 27, 2024. By then, migrate any job that uses Batch CentOS or Batch HPC CentOS to a different OS.
Cancelling jobs is available in Preview.

Documentation has been added to explain how to view resource metrics for your jobs in Cloud Monitoring. The metrics provide resource utilization and performance information, which you can use to help optimize the performance and costs of future jobs. For more information, see Monitor and optimize job resources by viewing metrics.
You can configure a job to automatically install the Ops Agent, which provides additional resource metrics in Cloud Monitoring. For more information, see Collect additional resource metrics using the Ops Agent.

When a job fails due to exceeding a timeout, the job's logs don't indicate whether the failure was caused by the relevant task's timeout or the relevant runnable's timeout. For more information, including a workaround, see Known issues.
You can set maximum time limits for tasks and runnables. For more information, see Limit run times for tasks and runnables using timeouts.
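As a rough illustration of the two timeout levels, the following Python sketch builds a job configuration with one limit on the whole task and one on a single runnable. It assumes the Batch API field names `maxRunDuration` (on the task spec) and `timeout` (on a runnable), with durations written as a number of seconds followed by `s`. The script contents and the durations are hypothetical, so verify the field names against the linked documentation before relying on them.

```python
import json


def seconds(n: int) -> str:
    """Format a duration as a seconds string such as "300s" (assumed format)."""
    return f"{n}s"


# Hypothetical job configuration: field names are assumptions to verify
# against the Batch documentation on timeouts.
job = {
    "taskGroups": [
        {
            "taskCount": 1,
            "taskSpec": {
                # Upper bound on the run time of each task as a whole.
                "maxRunDuration": seconds(300),
                "runnables": [
                    # This runnable gets its own, shorter limit.
                    {"script": {"text": "echo step one"}, "timeout": seconds(60)},
                    # This runnable has no limit of its own in this sketch.
                    {"script": {"text": "echo step two"}},
                ],
            },
        }
    ]
}

print(json.dumps(job, indent=2))
```

Because both limits can apply to the same task, giving them clearly different values may make it easier to tell which one was exceeded when a job fails.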

In the Google Cloud console, the Job list page has been updated to reduce latency. Although the console no longer summarizes the statuses of your jobs, you can filter based on job state when you view a list of your jobs.
Fixed the issue causing latency when listing jobs in projects that contain more than 10,000 jobs.

You can configure custom status events, which describe important events for a job's runnables. By providing additional information about a job's progress, custom status events can help make a job easier to analyze and troubleshoot.
For more information, see Configure custom status events to describe runnables and View a job's history through status events.
You can run Batch jobs as a non-root user to meet workload or security requirements. For more information, see Create and run jobs as a non-root user.
You can write unstructured and structured task logs:
By allowing you to surface custom information in Cloud Logging, task logs can help make a job easier to analyze and troubleshoot.
For more information, see Write task logs.

Jobs that try to consume reserved VMs might be incorrectly delayed or prevented from running. For more information, including workarounds, see Known issues.

The limit for concurrent VMs per job now varies based on the number of zones allowed for a job's VMs:
Learn more about Quotas and limits and Batch locations.

You can use Image streaming to enable Batch jobs to initialize without waiting for a container image to finish downloading. For more information, see Use Image streaming to reduce container startup time.

Logs from Batch jobs created after December 19, 2023 will no longer use the general-purpose generic_task monitored resource type and instead use the new batch.googleapis.com/Job monitored resource type. The batch.googleapis.com/Job monitored resource type is specific to Batch, which makes it simpler to query Cloud Logging for logs only from Batch.
When querying Cloud Logging for logs from Batch, update any filters that require the generic_task monitored resource type to specify the batch.googleapis.com/Job monitored resource type accordingly. Alternatively, you can enable the use_generic_task_monitored_resource field for your jobs to continue using the generic_task monitored resource type instead.
For more information, see the documentation for Cloud Logging monitored resources types and Batch job logs.
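The filter update described above can be sketched as a small string rewrite. This example assumes that existing filters select the resource type with the Cloud Logging expression `resource.type="generic_task"`. The helper name `migrate_filter` and the sample filter are hypothetical, and any labels in a real filter might also need to change.

```python
BATCH_RESOURCE_TYPE = "batch.googleapis.com/Job"
LEGACY_RESOURCE_TYPE = "generic_task"


def migrate_filter(log_filter: str) -> str:
    """Swap the legacy monitored resource type for the Batch-specific one."""
    return log_filter.replace(
        f'resource.type="{LEGACY_RESOURCE_TYPE}"',
        f'resource.type="{BATCH_RESOURCE_TYPE}"',
    )


# Hypothetical filter that selected Batch logs before the change.
old_filter = 'resource.type="generic_task" AND severity>=ERROR'
print(migrate_filter(old_filter))
```

Filters that don't mention the legacy type pass through unchanged, so the helper is safe to apply to a mixed set of saved queries.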

Documentation has been added to explain how to configure jobs that can run on reserved VMs. Using reserved VMs helps minimize a job's scheduling time and prevent resource availability errors.
For more information, see Ensure resource availability using VM reservations.

You might experience latency when listing jobs in projects that contain more than 10,000 jobs. For more information, see Known issues.
Documentation has been added to explain how to configure jobs to send status notifications using Pub/Sub and how to query those notifications using BigQuery.
For more information, see the following pages:
To configure your project to support status notifications, see Monitor job status using Pub/Sub notifications and BigQuery.
To configure a job to send status notifications, see Create and run a job that sends Pub/Sub status notifications.
Documentation has been added to explain how to run dsub pipelines on Batch. For more information, see Orchestrate jobs by running dsub pipelines on Batch.
Documentation has been added to explain how to colocate the VMs for a job by using a compact placement policy. For example, use compact placement policies to reduce the latency between VMs for jobs with tightly coupled tasks, such as tasks that communicate using MPI libraries.
For more information, see Reduce latency by using compact placement policies.

Documentation has been added to explain how to securely reference sensitive data in a job by using Secret Manager secrets for encryption. For example, use secrets to protect sensitive data when defining custom environment variables or protect login credentials when accessing private container images from Docker Registry.
For more information, see Protect sensitive data using Secret Manager with Batch.

Job limits have increased to 100,000 tasks per task group and 5,000 parallel tasks per job. Learn more about Quotas and limits.
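A quick pre-submission check against these two limits could look like the following sketch. The function name and its parameters are hypothetical; only the two numeric limits come from the note above.

```python
MAX_TASKS_PER_TASK_GROUP = 100_000
MAX_PARALLEL_TASKS_PER_JOB = 5_000


def check_job_limits(task_count: int, parallelism: int) -> list[str]:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    if task_count > MAX_TASKS_PER_TASK_GROUP:
        problems.append(
            f"task count {task_count} exceeds {MAX_TASKS_PER_TASK_GROUP} per task group"
        )
    if parallelism > MAX_PARALLEL_TASKS_PER_JOB:
        problems.append(
            f"parallelism {parallelism} exceeds {MAX_PARALLEL_TASKS_PER_JOB} per job"
        )
    return problems


print(check_job_limits(100_000, 5_000))  # exactly at both limits
print(check_job_limits(250_000, 8_000))  # over both limits
```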

Batch is available in the following regions:
australia-southeast2 (Melbourne)
europe-west8 (Milan)
europe-west12 (Turin)
me-west1 (Tel Aviv)
northamerica-northeast2 (Toronto)
southamerica-east1 (São Paulo)
us-east5 (Columbus)
For more information, see Locations.

Documentation has been updated to reflect new default options for jobs that use GPUs:
For more information, see Create and run a job that uses GPUs.

Documentation has been added to explain how to automatically retry some or all of the failed tasks for a job. For example, automatic task retries can help prevent job failures from temporary issues like Spot VM preemption, host events, and transient networking errors.
For more information, see Automate task retries.

Documentation has been added to explain the VM OS environment for Batch. For a job's VMs, you can optionally configure the OS image and/or boot disk properties. Otherwise, a job uses the default configuration.
For more information, see the following pages:

Batch is available in the europe-west10 (Berlin) region.
For more information, see Locations.

Batch is available in the following regions:
asia-south2 (Delhi)
asia-southeast2 (Jakarta)
europe-southwest1 (Madrid)
me-central1 (Doha)
For more information, see Locations.

Batch is enforcing a 60-day retention policy for all finished (failed or succeeded) jobs:
Any existing jobs that finished before August 17, 2023 are automatically deleted on October 16, 2023, which is 60 days after that date.
All new and existing jobs that are not yet finished on August 17, 2023 are automatically deleted 60 days after they finish running.
If you need to retain the information for a job for more than 60 days, you can export the job. For more information, see Delete and export jobs.
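The retention dates above can be reproduced with a short date calculation. This sketch assumes day-level granularity, and the function name `deletion_date` is hypothetical. It only illustrates the arithmetic and does not query Batch.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60
POLICY_START = date(2023, 8, 17)


def deletion_date(finished_on: date) -> date:
    """Return the date on which a finished job is expected to be deleted."""
    # Jobs that finished before the policy began are counted from the
    # policy start date rather than from their own finish date.
    baseline = max(finished_on, POLICY_START)
    return baseline + timedelta(days=RETENTION_DAYS)


print(deletion_date(date(2023, 7, 1)))  # finished before the policy began
print(deletion_date(date(2023, 9, 1)))  # finished after the policy began
```

A job that needs to be kept longer would have to be exported before the computed date.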