Tips for Closing Timing in Vivado

Article by Rishov Sarkar. Last updated on January 29, 2023.

Introduction

When I first started using Vivado, timing closure was incredibly frustrating, especially because implementation would run for hours only to report that my design had failed to meet timing. However, I eventually learned that the tool has several settings that tell it to be more aggressive in trying to close timing on the design (which I affectionately call Vivado’s “please try harder” settings).

In this article, I document a few of these techniques (mostly for my own reference). As I am still pretty new to Vivado, I might have made mistakes in my explanations; if so, please shoot me an email with corrections!

Step 1: Choose a Vivado Implementation Strategy

Reference: “Implementation Strategy Descriptions” from Vivado Design Suite User Guide: Implementation (UG904)

By default, Vivado runs implementation with a strategy that “balances runtime with trying to achieve timing closure.” Unfortunately, it often falls short on the latter part of that statement. I usually find it beneficial to change the implementation strategy right away, before even running implementation for the first time.

You can change the implementation strategy under the “Design Runs” tab by right-clicking the implementation run (usually impl_1), selecting “Change Run Settings…”, and choosing your desired strategy from the “Strategy” dropdown menu. (Note that this is not the same as the “Report Strategy” dropdown menu.)
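If you would rather script this than click through the GUI, the same change can be made from the Tcl console. The sketch below is a small Python helper that only builds the Tcl text; it assumes a project-mode flow, and the run name, strategy, and job count are placeholders. The function name strategy_tcl is my own invention, and the Tcl commands it emits (set_property, launch_runs, wait_on_run) should be checked against the Tcl command reference for your Vivado version.

```python
def strategy_tcl(run: str = "impl_1",
                 strategy: str = "Performance_ExploreWithRemap",
                 jobs: int = 4) -> str:
    """Return Tcl lines that set a run's strategy, launch it, and wait for it.

    Assumption: project mode, where implementation runs are addressed
    with [get_runs <name>]. Nothing here talks to Vivado; it only
    builds a string to paste into the Tcl console or save as a script.
    """
    lines = [
        # Equivalent of picking a strategy in "Change Run Settings..."
        f"set_property strategy {strategy} [get_runs {run}]",
        # Start the run using the given number of parallel jobs
        f"launch_runs {run} -jobs {jobs}",
        # Block until the run finishes
        f"wait_on_run {run}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(strategy_tcl())
```

If the run has already been launched once, Vivado will likely want it reset (reset_run) before it can be launched again.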

Vivado offers a lot of implementation strategies, but I’ve found two to be particularly useful:

Performance_Auto_1

“Best predicted directive for place_design.”

Like all Performance strategies, this strategy puts more effort than the default strategy into overall design performance, including timing closure. The Performance_Auto strategies, in particular, use Vivado’s ML models to try to predict the best directives for certain steps in the implementation flow. Performance_Auto_1 uses Vivado’s top choice for these directives (in contrast to Performance_Auto_2 and Performance_Auto_3, which use Vivado’s second and third choices, respectively).

This strategy took a little over three hours on one of my designs with a resulting WNS of −0.111 ns (TNS −71.536 ns) on a 300 MHz clock.

Performance_ExploreWithRemap

“Similar to Performance_ExplorePostRoutePhysOpt, but enables logic optimization step (opt_design) with the ExploreWithRemap directive.”

This is probably my current preference for implementation strategy. As far as I can tell, it basically means “run all the optimizations.” As the description implies, it’s a variant of Performance_ExplorePostRoutePhysOpt, which itself is a variant of Performance_Explore. Performance_Explore is described as “Us[ing] multiple algorithms for optimization, placement, and routing to get potentially better results.” Performance_ExplorePostRoutePhysOpt adds a post-route physical optimization step (shocker, I know), and Performance_ExploreWithRemap changes the opt_design directive from Explore to ExploreWithRemap, which, as far as I can tell from the documentation, “adds the remap optimization to compress logic levels.” Sure, why not.

On the same design, this took about 2 hours and 45 minutes to achieve a WNS of −0.031 ns (TNS −1.099 ns). That’s a better WNS in less time, but of course your mileage may vary.

Step 2: Try Incremental Implementation

Reference: “Incremental Implementation” from Vivado Design Suite User Guide: Implementation (UG904)

As you might have noticed, neither of the strategies I mentioned above was able to achieve timing closure on its own. The next thing you might want to try is incremental implementation.

To quote the Vivado documentation (emphasis mine):

“Incremental Implementation refers to the implementation phase of the incremental compile design flow that:

- Preserves QoR predictability by reusing prior placement and routing from a reference design.
- Speeds up place and route run time or attempts last mile timing closure.”

The way that I went about this was to copy the implementation run (right-click → “Copy Run…”), then right-click the newly copied run and choose “Set Incremental Implementation…”. A dialog box will appear, in which you should choose the following settings:

- Select “Specify design checkpoint”.
- Incremental Directive: TimingClosure
- New design checkpoint: Add new design checkpoint
  - From the file selector, browse to the directory of the previous implementation run (e.g., impl_1) and choose the last modified .dcp file (e.g., design_1_wrapper_postroute_physopt.dcp).
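Finding the last modified .dcp file by eye is easy to get wrong when a run directory holds several checkpoints. Here is a minimal Python sketch that picks it by modification time. The helper name latest_checkpoint and the example path are my own placeholders; substitute your project's actual run directory.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the most recently modified .dcp file in run_dir, or None.

    Only the top level of run_dir is searched; subdirectories are ignored.
    """
    checkpoints = list(Path(run_dir).glob("*.dcp"))
    if not checkpoints:
        return None
    # The newest modification time corresponds to the last checkpoint written.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Hypothetical path: replace with the directory of your previous run.
    print(latest_checkpoint("my_project.runs/impl_1"))
```

This relies on file modification times, so it can pick the wrong file if checkpoints were copied or touched after the run finished.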

From here, just start the new implementation run and cross your fingers.

In my case, I tried this on the previously mentioned implementation run using Performance_ExploreWithRemap. Incremental implementation closed timing on the design, improving the WNS from −0.031 ns to 0.005 ns in about 2 hours and 15 minutes.

Step 3: Close Timing using Intelligent Design Runs

Reference: “Intelligent Design Runs” from Vivado Design Suite User Guide: Design Analysis and Closure Techniques (UG906)

This is probably the most powerful technique for closing timing, and therefore the most likely to succeed. Simply right-click an implementation run and choose the menu option “Close Timing using Intelligent Design Runs” to create a new intelligent design run (IDR, usually named i_impl_1). Then just launch the newly generated IDR flow and wait.

The main downside of this flow is how excruciatingly slow it is. Vivado has to run implementation several times over, so expect this to take at least a full work day. However, assuming the machine you are using has sufficient CPU power, you can run this in parallel with incremental implementation (step 2) in case that step fails.

I ran an IDR flow for each of the two implementation runs mentioned in step 1. For the Performance_Auto_1 run, the IDR flow closed timing in 19 hours, achieving a WNS of 0.003 ns. For the Performance_ExploreWithRemap run, it closed timing in 18 hours and 30 minutes, achieving the same WNS.

I am not certain that it makes a difference which implementation run you start from. I am not even sure that you need a completed implementation run first; you might be able to run this right at the start, in parallel with steps 1 and 2. I’ll update this if I ever find out.

Step 4: Hope for the Best

If none of the above techniques closes timing, then depending on how close to zero your best (least negative) WNS is, you might be able to use the bitstream from that run anyway. I am not sure if the clock frequency gets adjusted accordingly or not, but I was able to run the Performance_ExploreWithRemap bitstream (with 300 MHz clock and WNS −0.031 ns) on-board without any issues.

If you know whether this is because Vivado is adjusting the clock frequency the design runs at, or because I just got lucky, please let me know!
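For a rough sense of how small these misses are, a negative WNS can be converted into the clock frequency at which the worst path would just meet timing. The Python sketch below is only arithmetic, under the assumption that the worst path is an ordinary single-cycle path on the 300 MHz clock. The function names are my own, and it says nothing about what Vivado or the board actually does with the clock.

```python
def min_period_ns(clock_mhz: float, wns_ns: float) -> float:
    """Shortest clock period (ns) at which the worst path would have zero slack.

    Assumption: WNS belongs to a single-cycle path on this clock, so the
    required period is simply the target period minus the slack.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns - wns_ns


def implied_fmax_mhz(clock_mhz: float, wns_ns: float) -> float:
    """Clock frequency (MHz) corresponding to min_period_ns."""
    return 1000.0 / min_period_ns(clock_mhz, wns_ns)


if __name__ == "__main__":
    # WNS values reported in this article for the 300 MHz design.
    for name, wns in [("Performance_Auto_1", -0.111),
                      ("Performance_ExploreWithRemap", -0.031)]:
        print(f"{name}: {implied_fmax_mhz(300.0, wns):.1f} MHz")
```

Under that assumption, the −0.031 ns result corresponds to roughly 297 MHz and the −0.111 ns result to roughly 290 MHz, which is a miss of about 1% and 3% of the target, respectively.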