In this post we talk about the FPGA implementation process. This process involves taking an existing HDL based design and creating a programming file for our target FPGA. In this post we cover the three major steps involved in this process - synthesis, place and route and finally programming file generation.
In the previous post in this series we talked about the process of creating an FPGA design.
Once we have proven our design works, we then transfer the functional HDL code into an actual FPGA.
We normally carry this out in three separate stages - synthesis, place and route and generation of the programming file.
We discuss each of these steps in more detail in the rest of this post.
The first stage in building the FPGA is known as synthesis.
This process transforms the functional RTL design into an array of gate level macros.
This has the effect of creating a flattened, gate level circuit diagram which implements the RTL design.
In this context, the macros are actually models of the internal FPGA cells. This can be any digital element in the FPGA such as flip-flops, RAM or look up tables (LUT).
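To make the idea of a LUT more concrete, the short Python sketch below models one as a precomputed truth table. The function names and the example logic function are hypothetical and chosen purely for illustration. A real synthesis tool maps logic onto LUTs in a far more sophisticated way.

```python
def make_lut(func, num_inputs):
    """Precompute the truth table of func as a list of output bits."""
    table = []
    for index in range(2 ** num_inputs):
        # Bit i of the table index holds the value of input i.
        bits = [(index >> i) & 1 for i in range(num_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Return the stored output bit for the given input bits."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


# Hypothetical example function: out = (a AND b) XOR c
xor_and_table = make_lut(lambda a, b, c: (a & b) ^ c, 3)
print(xor_and_table)
print(lut_read(xor_and_table, 1, 1, 0))
```

Here the three input function is reduced to an eight entry table, and evaluating the logic becomes a simple memory lookup. This is the basic principle behind a LUT cell.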
There are a number of different tools we can use to run the synthesis process.
Both of the major FPGA vendors (Xilinx and Intel) offer free synthesis tools which are suitable for most projects.
In addition to this, there are also a number of open source synthesis tools which we can use. The most popular of these tools is yosys which is frequently used with Lattice FPGAs.
There are also paid tools which we can use for this purpose. The most well known of these tools are Synplify Pro from Synopsys and Leonardo Spectrum from Mentor Graphics.
The paid tools are typically capable of delivering more optimised netlists than the free tools. We normally only need the paid tools for large or high speed designs.
The synthesis process requires at least two inputs.
The first of these is the source code for our design.
We also need a script or project file which defines the configuration of the synthesis tool. This script typically tells the tool which FPGA to target, the pinout of the design and which strategy to use when running the synthesis.
In addition to this, it is good practice to create a file which defines the timing constraints of the design.
We use timing constraints to define details about the FPGA which can’t be specified in the source code.
This includes information such as the clock frequency, the number of clock domains and the timing of external interfaces.
These details determine how much effort the synthesiser puts into optimising timing within the FPGA.
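As a small worked example, the clock frequency which we want to constrain is normally given to the tool as a period. The sketch below does this conversion in Python and formats an SDC style `create_clock` line. The port name `clk` and the 100 MHz figure are assumptions for illustration, and the exact constraint syntax depends on the tool we are using.

```python
def clock_period_ns(frequency_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / frequency_mhz


def create_clock_line(port_name, frequency_mhz):
    """Format an SDC style clock constraint for the given port."""
    period = clock_period_ns(frequency_mhz)
    return (
        f"create_clock -name {port_name} "
        f"-period {period:.3f} [get_ports {port_name}]"
    )


# Hypothetical 100 MHz clock on a port called clk.
print(create_clock_line("clk", 100))
```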
We can also perform some analyses of our design as a part of the synthesis process. However, this information is typically more reliable after the place and route process.
The first of these analyses is the logic utilisation of the design. This analysis details how many of each of the different types of FPGA cells our design uses.
The individual cells within a device vary from chip to chip, as well as between vendors.
Almost all modern chips will include RAM, some form of LUT and flip flops.
High end chips can also include dedicated DSP cores, clock management blocks such as PLLs and other peripherals such as ADCs or dedicated high speed interfaces.
After completing the synthesis process, we can generate a report which tells us how many cells are used in our design both in absolute terms and as a percentage of all available cells in the device.
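The arithmetic behind such a report is straightforward, and the sketch below reproduces it in Python. The cell counts are made up for illustration and do not describe any real device or design. The function names are also hypothetical.

```python
def utilisation_report(used, available):
    """Return {cell_type: (used, available, percent)} for each cell type."""
    report = {}
    for cell_type, count in used.items():
        total = available[cell_type]
        report[cell_type] = (count, total, 100.0 * count / total)
    return report


def over_capacity(report):
    """List the cell types which need more cells than the device has."""
    return [name for name, (count, total, _) in report.items() if count > total]


# Hypothetical figures, not taken from any real device or design.
used = {"LUT": 3200, "FF": 1500, "BRAM": 40}
available = {"LUT": 8000, "FF": 8000, "BRAM": 32}

report = utilisation_report(used, available)
for name, (count, total, percent) in report.items():
    print(f"{name}: {count}/{total} ({percent:.1f}%)")
print("Over capacity:", over_capacity(report))
```

In this made up example the design fits comfortably in terms of LUTs and flip-flops but needs more block RAM than the device provides, so it would not fit as it stands.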
After running the synthesis process it is not uncommon to find that our design is too big for our device. There are a number of options available to us when this occurs.
It is normally possible to reduce utilisation by changing the configuration of the synthesis tool. Examples of this could be changing FSM encoding or selecting a different synthesis algorithm.
This reduction can be enough if our design is only slightly larger than our chosen FPGA.
If this doesn't sufficiently reduce utilisation then we must select a new FPGA or make our original code more efficient.
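To see why FSM encoding affects utilisation, the sketch below counts the flip-flops needed for the state register under different encodings. This is a simplified model which only considers the state register itself and ignores the next state logic. The function name is hypothetical.

```python
def state_register_bits(num_states, encoding):
    """Return the number of flip-flops needed to hold the FSM state."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    if encoding == "one-hot":
        # One flip-flop per state, exactly one of which is set at a time.
        return num_states
    if encoding in ("binary", "gray"):
        # Smallest number of bits which can represent every state.
        return max(1, (num_states - 1).bit_length())
    raise ValueError(f"unknown encoding: {encoding}")


for encoding in ("binary", "gray", "one-hot"):
    print(encoding, state_register_bits(10, encoding))
```

One-hot encoding uses more flip-flops but generally needs simpler next state logic. Which encoding gives the smaller or faster result therefore depends on the FPGA architecture and on the state machine itself.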
We can also analyse the timing of our FPGA after we have run the synthesis.
We use this analysis to determine whether the FPGA can run our design at the required frequency.
When our design can't be run at the desired frequency, we cannot be sure that there will be no timing violations on the internal flip flops. As a result of this, we can't guarantee that our device will operate as expected.
We typically analyse the timing of a design in more detail following the place and route process. As the timing is dependent on the location of cells in the FPGA, results are more accurate following this process.
During the synthesis process, we can request that the tool generates a netlist in either VHDL or Verilog.
This process also generates a set of timing delays which model the propagation of signals through the FPGA.
We can then use this information to run simulations of our synthesised netlist.
As these simulations also model the timing of our design, they give a more accurate model of the behaviour of our final device.
We typically generate the post-synthesis simulation model using Verilog, regardless of the language we used in our design.
The reason for this is that Verilog based models are faster to simulate than their VHDL equivalents. This is especially important for post-synthesis simulations as they typically have long execution times.
There are two main advantages to running post-synthesis simulations.
Firstly, these simulations help to ensure that our generated netlist matches the behaviour of our original RTL model.
Secondly, the timing of the chip can be more closely considered. This helps us to find bugs which may relate to timing based errors, such as race conditions or timing violations.
Although there are advantages to running post-synthesis simulations, we normally won't do this as part of our design flow.
One reason for this is that these simulations require a long time to run. It is not uncommon for post-synthesis simulations to require several days to run a full set of tests.
Another reason is that we can also run simulations on the netlist which is generated by our place and route tool.
As these netlists are more representative of the final silicon solution, it is preferable to carry out any timing simulations using this netlist instead.
After completing the synthesis, we then need to map the netlist to actual resources in our FPGA. This process is known as place and route and it actually consists of a few different steps.
Typically, the first stage of this process involves optimising the netlist. We use this process to remove or replace any elements of the netlist which are redundant or duplicated.
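As a rough illustration of removing duplicated elements, the sketch below merges cells which have the same type and the same inputs. The netlist representation, a list of `(output, cell_type, inputs)` tuples in topological order, is an assumption made for this example and is not how any particular tool stores its netlist. Real tools apply many more optimisations than this.

```python
def merge_duplicate_cells(cells):
    """Merge cells with identical type and inputs.

    cells is a list of (output_net, cell_type, input_nets) tuples in
    topological order. Returns the kept cells and a map from each removed
    output net to the equivalent net which replaces it.
    """
    seen = {}     # (cell_type, inputs) -> output net of the cell we kept
    renamed = {}  # removed output net -> equivalent kept net
    kept = []
    for output, cell_type, inputs in cells:
        # Rewire inputs which were driven by cells we have already removed.
        inputs = tuple(renamed.get(net, net) for net in inputs)
        key = (cell_type, inputs)
        if key in seen:
            renamed[output] = seen[key]
        else:
            seen[key] = output
            kept.append((output, cell_type, inputs))
    return kept, renamed


# Hypothetical netlist in which n2 duplicates n1 and n4 duplicates n3.
cells = [
    ("n1", "AND", ("a", "b")),
    ("n2", "AND", ("a", "b")),
    ("n3", "OR", ("n1", "c")),
    ("n4", "OR", ("n2", "c")),
]
kept, renamed = merge_duplicate_cells(cells)
print(kept)
print(renamed)
```

Note that this simple version treats the input order as significant, so it would not merge `AND(a, b)` with `AND(b, a)`.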
The optimised netlist is then mapped to the physical cells in the FPGA, which is commonly known as placement.
After we have completed the placement process, we then run a process known as routing.
We use this part of the build process to define the interconnection between the different cells in our chosen FPGA.
We will often need to perform several runs of this process in order to meet the timing requirements of our design. However, the place and route tool is responsible for scheduling these multiple runs based on our configuration.
When we are having difficulties getting our design to meet with our timing requirements, it is common to increase the number and types of run which we allow the tool to perform.
There are no third party place and route tools for Xilinx or Intel parts, meaning we must use the vendor specific tools. These are freely available for download.
There are also paid versions of these tools, although they are normally only required for designs which target high end FPGAs.
For Lattice FPGAs, the open source nextpnr software is a popular place and route tool.
Depending on the size of our design, the place and route process can take several hours to complete.
As with the synthesis process, the place and route tool requires a number of inputs to run correctly.
The netlist which we generated using the synthesis tool is the most important input. This netlist is typically an .edf file, although this varies between tools.
We typically also use a project file or script to determine the configuration of our place and route tools.
We use this to define important information, such as the FPGA part number and package, as well as the settings which control how the tool runs.
We also need to provide a constraint file to the tool which defines the timing characteristics of our design. This is often the same file as we use in the synthesis process which defines information about the clock frequencies and domains.
We also use a constraint file to define physical characteristics of our design which we can't describe in our HDL code. As a minimum this will include the mapping of the inputs and outputs to the physical pins of the device.
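As an illustration of a pin mapping, the sketch below generates constraints in the PCF format used by the open source nextpnr-ice40 flow, where each line has the form `set_io <port> <pin>`. The port names and pin numbers are hypothetical. Real values come from the board schematic, and other vendors use different constraint formats.

```python
def pcf_lines(pin_map):
    """Return one PCF set_io line for each port to pin assignment."""
    return [f"set_io {port} {pin}" for port, pin in pin_map.items()]


# Hypothetical port names and package pins, for illustration only.
pin_map = {"clk": "21", "led": "99", "uart_tx": "8"}
print("\n".join(pcf_lines(pin_map)))
```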
As with the synthesis process, we can generate a number of reports after the place and route has finished. This allows us to further analyse our design to ensure it works correctly.
We usually run a utilisation report after completing the place and route process.
This report details the number of each of the different types of cell which we have used to implement our design within the FPGA.
This report contains the same information as the report which we can generate during the synthesis process. However, its figures are more accurate when we generate it after completing the place and route process.
Another analysis which we typically perform after completing the place and route process is the Static Timing Analysis (STA).
We use this process to calculate the delay times through all of the logic chains in our design. By calculating this information, the place and route tool can determine whether the chip is capable of running at the specified clock frequency.
The place and route tool performs this analysis with both the worst and best case timing conditions. However, it is more common for timing issues to arise with the worst case delays in the silicon.
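The core calculation for a single register to register path can be sketched as follows. This is a simplified model which assumes an ideal clock with no skew or jitter, and all of the delay values are made up for illustration. A real STA tool repeats this kind of check for every path in the design using delays extracted from the placed and routed netlist.

```python
def setup_slack_ps(period_ps, clk_to_q_ps, logic_ps, routing_ps, setup_ps):
    """Return the setup slack of one path in picoseconds.

    A negative result means the data arrives too late for the capturing
    flip-flop, which is a timing violation.
    """
    data_arrival = clk_to_q_ps + logic_ps + routing_ps
    data_required = period_ps - setup_ps
    return data_required - data_arrival


def max_frequency_mhz(clk_to_q_ps, logic_ps, routing_ps, setup_ps):
    """Return the highest clock frequency this path can support, in MHz."""
    min_period_ps = clk_to_q_ps + logic_ps + routing_ps + setup_ps
    return 1_000_000 / min_period_ps


# Hypothetical worst case delays for a single path, in picoseconds.
path = dict(clk_to_q_ps=500, logic_ps=4000, routing_ps=3000, setup_ps=300)

print(setup_slack_ps(period_ps=10_000, **path))  # 100 MHz clock
print(setup_slack_ps(period_ps=7_000, **path))   # faster clock
print(round(max_frequency_mhz(**path), 1))
```

With these example figures the path has positive slack for a 10 ns clock period but fails when the period is reduced to 7 ns.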
We typically use the STA report as a crucial part of our design verification.
If our design fails the STA then we can't guarantee that our FPGA will work reliably. When this happens we either have to run the implementation process again with different settings or we must change our design.
The final stage in the implementation of the FPGA design is the generation of the programming file.
We normally use the place and route tool to generate our programming file.
However, we typically run this as a separate process.
This process can only be run once the place and route process has generated its outputs. We only need to tell the tool which file type we require in order to generate this output.
Once this process has completed, we can use the generated file to program our FPGA.
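To tie the three stages together, the sketch below builds the command lines for one possible open source flow which targets a Lattice iCE40 device using yosys, nextpnr-ice40 and icepack. The file names, top level module name, device and package are all assumptions for illustration. This flow passes the netlist between stages as a JSON file rather than an .edf file, and vendor tools use their own commands and project files. The function only builds the commands and does not run them.

```python
def build_flow_commands(top, sources, pcf_file, device="hx8k", package="ct256"):
    """Return the synthesis, place and route and bitstream commands."""
    json_netlist = f"{top}.json"  # netlist produced by synthesis
    asc_file = f"{top}.asc"       # textual bitstream from place and route
    bin_file = f"{top}.bin"       # binary programming file

    synthesis = [
        "yosys", "-p", f"synth_ice40 -top {top} -json {json_netlist}", *sources,
    ]
    place_and_route = [
        "nextpnr-ice40", f"--{device}", "--package", package,
        "--json", json_netlist, "--pcf", pcf_file, "--asc", asc_file,
    ]
    programming_file = ["icepack", asc_file, bin_file]
    return [synthesis, place_and_route, programming_file]


# Hypothetical design with a single source file and a top module called top.
commands = build_flow_commands("top", ["top.v"], "top.pcf")
for command in commands:
    print(" ".join(command))
```

If the tools are installed, each list could be passed to `subprocess.run` in turn. Each stage consumes the output file of the previous one, which mirrors the ordering described in this post.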