Guide VLIW Microprocessor Hardware Design: On ASIC and FPGA (Electronics)

Editorial Reviews. About the Author. Weng Fook Lee is a distinguished Senior Member of the Technical Staff at Emerald Systems Design Center. He holds.

Use nonblocking statements when writing code for a sequential always block. When synthesized, a sequential always block translates to a rising-edge flip-flop for posedge clock and a falling-edge flip-flop for negedge clock. A combinational always block uses only the signals in its sensitivity list and, when synthesized, translates to combinational logic only. Do not mix blocking and nonblocking statements in one always block. Most synthesis tools do not allow an always block to contain a combination of blocking and nonblocking statements.
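As a minimal sketch of the rule above, the module below keeps nonblocking assignments in the sequential always block and blocking assignments in the combinational one. The module name, the port names other than Q1, and the choice of an AND function are assumptions made only for illustration. Using blocking assignments for combinational logic is a common convention rather than something the surviving text states.

```verilog
// Hypothetical example: names and function are illustrative assumptions.
module ff_and_logic (
    input  wire clock,
    input  wire A,
    input  wire B,
    output reg  Q1,
    output reg  Q2,
    output reg  Y
);
    // Sequential always block: nonblocking assignments only.
    // Q2 receives the value Q1 held before this clock edge,
    // so the two registers form a two-stage shift.
    always @(posedge clock) begin
        Q1 <= A;
        Q2 <= Q1;
    end

    // Combinational always block: blocking assignments only.
    always @(A or B) begin
        Y = A & B;
    end
endmodule
```

Keeping the two assignment styles in separate always blocks also satisfies the rule against mixing them within one block.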

Although such a combination allows the code to be simulated, synthesis will fail. It is good coding practice to ensure that RTL code does not violate this rule. Avoid using initial statements in RTL code. An initial statement is used to initialize the values of signals in Verilog code. Some synthesis tools fail when there are initial statements in the RTL code, while others ignore them. However a synthesis tool treats initial statements, they should not be used in RTL code. Initial statements in RTL can cause problems during verification because of mismatches between pre- and post-synthesis simulation.
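The text does not say what to use in place of an initial statement. One common alternative, shown here purely as an assumption, is an explicit reset that gives the register a known value in both simulation and the synthesized circuit. The module name, the port names, and the active-high synchronous reset are all hypothetical.

```verilog
// Hypothetical sketch: an explicit reset instead of an initial statement.
module reset_instead_of_initial (
    input  wire clock,
    input  wire reset,   // assumed active-high, synchronous
    input  wire D,
    output reg  Q
);
    // Avoided in RTL:  initial Q = 1'b0;
    always @(posedge clock) begin
        if (reset)
            Q <= 1'b0;   // known starting value, present in hardware too
        else
            Q <= D;
    end
endmodule
```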

Using bitwise operators and logical operators. When bitwise operators are used on a bus, they operate on each bit of the bus and return the result in bus format. When using an if-else statement, ensure that an unwanted latch is not inferred. All possible combinations must be specified. Alternatively, the if-else statement can end with an else condition.
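The difference between bitwise and logical operators on a bus can be sketched as follows. The module name, port names, and 4-bit width are assumptions chosen for illustration and are not taken from the book's own examples.

```verilog
// Hypothetical example: names and bus width are illustrative assumptions.
module bitwise_vs_logical (
    input  wire [3:0] A,
    input  wire [3:0] B,
    output wire [3:0] Y_bitwise,
    output wire       Y_logical
);
    // Bitwise AND: bit i of A is ANDed with bit i of B; the result is a bus.
    assign Y_bitwise = A & B;

    // Logical AND: each operand is treated as true when it is nonzero;
    // the result is a single bit.
    assign Y_logical = A && B;
endmodule
```

For instance, with A = 4'b1010 and B = 4'b0101 the bitwise result is 4'b0000, while the logical result is 1'b1 because both operands are nonzero.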

This will ensure that unwanted latch is not inferred in the design. If the operation in bus form when the inputs are value of A is , this translates to A being in bus form. Therefore, A and B is Only bit 0 is built using NOR gates. Synthesized circuit for module logical is shown in the diagram below. Synthesized circuit for module bitwise is shown in the diagram below. When using a case statement, ensure that unwanted latch is not inferred. When using a case statement, all possible combinations must be specified. Alternatively, the case statement can use a default condition at the end of the case statement.

The logic synthesized from the Verilog code is an AND gate. This is in accordance with the functionality of Q being generated from an AND function in Verilog. Specifying all the possible combinations of A and B allows the synthesis tool to evaluate every combination and synthesize the best logic that fits the functionality. If incomplete Verilog code is synthesized instead, the synthesis tool infers a latch to enable output Q to hold its previous value. This is referred to as latch inference.

There may be occasions where it is difficult or tedious to specify all the possible combinations, especially for cases that involve many signals; using the keyword else at the end of the if-else statement is then adequate. If Verilog code has an incomplete if-else statement, a latch is inferred.

This is undesirable because it increases the amount of logic. To avoid latch inference, use a complete if-else statement that specifies all the different combinations of A and B, or use the keyword else at the end of the if-else statement. The else ensures that a latch is not inferred even when not all possible combinations of A and B are specified.
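The two cases can be sketched side by side. Both modules below describe Q as an AND of A and B using an if statement. The module bodies and the name incomplete_if are assumptions for illustration, and the body given for useelse is a guess at its shape, not the book's listing.

```verilog
// Hypothetical sketch of latch inference from an incomplete if statement.
module incomplete_if (
    input  wire A,
    input  wire B,
    output reg  Q
);
    always @(A or B) begin
        if (A & B)
            Q = 1'b1;
        // No else branch: when the condition is false, Q must hold its
        // previous value, so a synthesis tool infers a latch.
    end
endmodule

// Same function with a closing else: Q is assigned on every path,
// so only combinational logic is needed.
module useelse (
    input  wire A,
    input  wire B,
    output reg  Q
);
    always @(A or B) begin
        if (A & B)
            Q = 1'b1;
        else
            Q = 1'b0;
    end
endmodule
```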

The logic synthesized for module useelse is the same as the AND logic shown in the diagram above. Partition a design such that each design module is between 5, gates and 50, gates. A design can consist of many modules. Each module should be partitioned such that it should not be less than 5, gates or more than 50, gates.

A design module that is partitioned too small causes inefficient synthesis, while a design module that is partitioned too large has a very long synthesis run-time. Figure 3. Using X in coding for synthesis. X is interpreted differently in simulation and in synthesis: a simulator treats X as an unknown value, whereas a synthesis tool treats X as a don't-care. Therefore, when coding for synthesis, RTL code can use X. However, use of X in RTL should be limited to internal signals and only used when necessary. Example 3. The logic synthesized from the Verilog code uses combinational logic of a multiplexer and a NOR gate to form the required functionality.

As with the if-else statement, there may be occasions where it is difficult or tedious to specify all the possible combinations, especially for cases that involve many signals; using the keyword default at the end of the case statement is then adequate. An incomplete case statement infers a latch, just as an incomplete if-else statement does. Ending the case statement with default ensures that a latch is not inferred even when not all possible combinations of sel are specified.
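A minimal sketch of a case statement closed with default is given below. Only the signal name sel comes from the text; the module name, the other ports, the 2-bit width of sel, and the value assigned in the default branch are assumptions.

```verilog
// Hypothetical example: a 3-input selector where sel == 2'b11 is not listed.
module usedefault (
    input  wire [1:0] sel,
    input  wire       A,
    input  wire       B,
    input  wire       C,
    output reg        Q
);
    always @(sel or A or B or C) begin
        case (sel)
            2'b00:   Q = A;
            2'b01:   Q = B;
            2'b10:   Q = C;
            // Covers every value of sel not listed above, so Q is
            // assigned on all paths and no latch is inferred.
            default: Q = 1'b0;
        endcase
    end
endmodule
```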

Avoid infinite timing loops. Combinational logic whose output feeds back into its own input creates an infinite timing loop and complicates timing analysis. Designs with an infinite timing loop on combinational logic must be avoided. Ensure that the sensitivity list is complete. The sensitivity list is the list of signals that trigger an always block. Whenever a signal within the sensitivity list changes, the Verilog code in the always block is evaluated by the simulator.
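The effect of the sensitivity list can be sketched with two modules that describe the same AND function. The module names are hypothetical, and the bodies are assumptions in the spirit of the A, B, and Q signals discussed in the text.

```verilog
// Complete sensitivity list: the block is evaluated when A or B changes.
module complete_list (
    input  wire A,
    input  wire B,
    output reg  Q
);
    always @(A or B)
        Q = A & B;
endmodule

// Incomplete sensitivity list: B is missing, so in simulation a change
// on B alone does not re-evaluate Q until A changes again.
module incomplete_list (
    input  wire A,
    input  wire B,
    output reg  Q
);
    always @(A)
        Q = A & B;
endmodule
```

In simulation, the second module can show a stale Q after B changes, which is the kind of mismatch the rule is meant to prevent.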

With both A and B in the sensitivity list, the always block is triggered whenever signal A or B changes. With only A in the sensitivity list, the code is not evaluated by the simulator when signal B changes. Referring to Figure 3. At time t1, signal A changes. The sensitivity list is triggered and Q is evaluated. Since B is 0, Q is also 0. At time t2, signal B changes. The sensitivity list is triggered and the evaluation of Q occurs.

Since A and B are both 1, Q is 1. At time t3, signal A changes. Since A is 0, Q is 0. At time t4, signal B changes. Since B is 0, Q is 0. In the example with the incomplete sensitivity list, however, signal B is not in the sensitivity list, so nothing occurs when B changes. When simulating Verilog code, an incomplete sensitivity list yields simulation results that do not accurately reflect the functionality of the RTL code. However, when synthesized, both Example 3.


But from simulation waveforms of Example 3. When writing RTL code, it is good practice to always ensure that the sensitivity list is complete to avoid a mismatch between simulation and synthesis. These rules must be followed in order to obtain good RTL code that translates to optimal synthesis results.

Referring to the architectural diagram of Figure 2. For ease of understanding, each operation is numbered and categorized as pipe1, pipe2, and pipe3, with pipe1 executing operation 1, pipe2 executing operation 2, and pipe3 executing operation 3.

All three operations within the VLIW instruction word have access to a sixteen bit register file. The RTL code for the VLIW microprocessor can be split into five separate modules: fetch, decode, execute, writeback, and register file (refer to Figure 2.). The fetched information is passed to the decode module to allow the instruction to be decoded. It is also passed to the register file module to allow the execute module to retrieve data from the register file for those operations that access internal registers. The VLIW microprocessor processes three operations in parallel within one instruction word.

The data provided for each operation are represented by this bit data bus. As shown in Table 2. Input clock pin. The VLIW microprocessor is active on rising edge of clock. This is a global signal that flushes all the modules, indicating that a branch is to occur. This is a bit bus to pass the data fetched from external instruction memory to the decode unit.


Represents the destination register for operation 1. Represents the destination register for operation 2. Represents the destination register for operation 3. Represents the instruction of operation 1. Represents the instruction of operation 2. Represents the instruction of operation 3. Represents the source1 register for operation 1. Represents the source1 register for operation 2. Represents the source1 register for operation 3. Represents the source2 register for operation 1.

Represents the source2 register for operation 2. Represents the source2 register for operation 3. Based on the interface signals shown in Table 3. The bit VLIW instruction word is fetched and passed on to decode module as instruction for operation 1. Reset the signals to its default when flushing occurs. However, referring to Example 3. An example is the decoding of word[] to form source1, source2, destination, and instruction.

This decoding logic is placed in the fetch module rather than the decode module so that the decoding logic is shared between the two modules. Commonly, all decoding logic is located in the decode module. Sharing it balances the critical path between the fetch module and the decode module, allowing better overall speed performance. The clock is targeted to run at 100 MHz, or 10 ns per clock period. However, the first set of logic in the circuit has a total propagation delay of 13 ns while the second set of logic has a total propagation delay of 2 ns. This creates a negative slack of 3 ns.

Note: Negative slack occurs when a design is not able to meet its timing specification. A design with a negative slack of 3 ns misses its timing specification by 3 ns. To optimize the design shown in Figure 3. Because the first set of logic requires 13 ns while the second set requires only 2 ns, some of the logic from the first set can be moved to the second set. This shares the logic functionality of the design more evenly between both sets of logic. After sharing, the first set of logic gates has a propagation delay of 8 ns while the second set has a propagation delay of 7 ns, allowing the design to achieve the specified 100 MHz clock speed.
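The slack figures quoted above can be checked with a short calculation. This is a simplified sketch that assumes slack is the clock period minus the propagation delay of a logic stage, and it ignores flip-flop setup time, clock-to-output delay, and clock skew, none of which are given in the text.

```latex
\begin{align*}
T_{\text{clk}} &= 10\ \text{ns} \quad (\text{a } 10\ \text{ns period corresponds to } 100\ \text{MHz}) \\
\text{slack} &= T_{\text{clk}} - t_{\text{pd}} \\
\text{Before balancing, stage 1:}\quad 10 - 13 &= -3\ \text{ns} \\
\text{Before balancing, stage 2:}\quad 10 - 2 &= 8\ \text{ns} \\
\text{After balancing, stage 1:}\quad 10 - 8 &= 2\ \text{ns} \\
\text{After balancing, stage 2:}\quad 10 - 7 &= 3\ \text{ns}
\end{align*}
```

The total delay stays at 15 ns in both cases; only its distribution across the two stages changes, which is why moving logic removes the negative slack.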

Contents of regname. One is for representation of the register names as reg0, reg1, reg2, and so on. This allows for simple representation using names rather than numbers. The Verilog code of regname. The operation is passed to the execute module for execution.
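Representing registers by name rather than by number is commonly done with text macros. The sketch below is an assumption about how such a file might look, not the book's regname listing. The 4-bit encodings assume sixteen registers, only the first four names are shown, and the small demo module and its ports are hypothetical.

```verilog
// Hypothetical register-name macros; the 4-bit encodings are assumptions.
`define reg0 4'b0000
`define reg1 4'b0001
`define reg2 4'b0010
`define reg3 4'b0011

// Hypothetical use: compare a destination field against a named register.
module regname_demo (
    input  wire [3:0] destination,
    output wire       is_reg2
);
    // Reads more clearly than comparing against the literal 4'b0010.
    assign is_reg2 = (destination == `reg2);
endmodule
```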



The RTL verilog code for decode module is shown in Example 3. Represents the instruction of operation 2. Represents the data for operation 1 of the VLIW instruction. The data bus is used only during load instruction. Represents the data for operation 2 of the VLIW instruction.

Represents the data for operation 3 of the VLIW instruction. This allows for simplifying the Verilog RTL code of Example 3. Both Example 3. Contents of the register file module are read and passed to the execute module, while results of operations are written to the register file module by the writeback module.

To maintain simplicity and ease of understanding on the register file module, the function of register scoreboarding is not implemented. Represents the bit result of operation 1 executed by execute module. Represents the bit result of operation 2 executed by execute module. Represents the bit result of operation 3 executed by execute module. Represents the write signal from writeback module to register file module.


Represents the destination register of operation 1. Represents the destination register of operation 2. Represents the destination register of operation 3. The contents are the source1 data of operation 1. The contents are the source1 data of operation 2. The contents are the source1 data of operation 3. The contents are the source2 data of operation 1. The contents are the source2 data of operation 2.

The contents are the source2 data of operation 3. Represents the source1 register of operation 1. Represents the source1 register of operation 2. Represents the source1 register of operation 3. Represents the source2 register of operation 1.


Represents the source2 register of operation 2. Represents the source2 register of operation 3. During nop, the source1 and source2 contents passed to execute module are zero. In multiply operation, only lower 32 bits are used to form a bit multiply result. A possible data corruption situation may occur. If the write and read operations to the register file module occur at the same time to the same register location, data corruption may happen.

A possible stale data situation may occur when there is data dependency between different VLIW instructions. Consider a first VLIW instruction that loads data into three registers, followed by a second VLIW instruction that consists of an addition of the contents of reg1 and reg2, a subtraction of the contents of reg2 and reg3, and a read of reg6. Before the writeback stage can complete writing the data into the three loaded registers, the contents of registers reg1, reg2, and reg3 are read. The contents of reg1, reg2, and reg3 are passed to the execute stage to allow the execution of the operations of the second VLIW instruction add r1,r2,r4:sub r2,r3,r5:read r6.

This creates a situation of stale data being passed to the execute stage, as the data from the load operation have not yet been written into the registers. To work around the problems of data corruption and stale data, a design concept called register bypassing is introduced to the VLIW microprocessor. This allows data that is about to be written into the register file to be used directly by the execute stage instead of reading stale data from the register file. From an architectural perspective, implementation of register bypassing is shown in Figure 3. In Figure 3. The multiplexer chooses between the contents read from the register file and the output of the writeback stage as the data to be used by the execute stage.

Which data are chosen depends on whether instruction dependency is detected by the VLIW microprocessor. This will require additional logic to monitor for instruction dependency. For the VLIW microprocessor, the feature of register bypass and instruction dependency monitoring is implemented in the execute stage, which is discussed in Section 3.
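The multiplexer and dependency check described above can be sketched for a single source operand. This is a simplified illustration under stated assumptions, not the book's execute module: the module name, all signal names, the 4-bit register address, and the default 64-bit data width are hypothetical.

```verilog
// Hypothetical sketch of one register-bypass multiplexer.
module bypass_mux #(
    parameter WIDTH = 64            // assumed data width
) (
    input  wire [3:0]       src_reg,       // register the operation reads
    input  wire [WIDTH-1:0] regfile_data,  // value read from the register file
    input  wire             wb_write,      // writeback is writing this cycle
    input  wire [3:0]       wb_dest_reg,   // register being written
    input  wire [WIDTH-1:0] wb_data,       // value being written
    output wire [WIDTH-1:0] operand        // value passed to execution
);
    // Dependency: the register being read is the one being written.
    wire dependency = wb_write && (wb_dest_reg == src_reg);

    // On a dependency, forward the writeback value instead of the
    // possibly stale register file contents.
    assign operand = dependency ? wb_data : regfile_data;
endmodule
```

A full design would need one such check per source operand per pipe, which is consistent with the text's remark that the bypass logic grows quickly as more operations run in parallel.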

The functionality of the execute module is to execute the operations of the VLIW instruction. The feature of register bypassing is also implemented in the execute module. TABLE 3. Represents the data for operation 1. The data bus is used only for load instruction. Represents the data for operation 2.

Represents the data for operation 3. Represents the bit result of operation 1. Represents the bit result of operation 2. Represents the bit result of operation 3. This signal is passed from writeback module to register file module. Represents the write signal from execute module to writeback module. This signal indicates to the external instruction memory module that a branch to another VLIW instruction is occurring. The external instruction memory module will fetch the newly branched instruction. A significant portion of the logic required for the execute module is for the register bypassing mechanism for avoiding data corruption and stale data as discussed in Section 3.

Among the different conditions which require register bypassing are the following: 1. Before the data is written into register r3 in the register file, the contents of register r3 are read and passed from the register file to the execute module for use on the VLIW instruction add r3, r0, r sub r0, r1, r mul r1, r2, r This causes stale data on the contents of register r3.

To work around this problem, a register bypass is required between source1 data of operation1 to operation1 of VLIW instruction. As such, the amount of logic required in the execute module for register bypass is rather significant and complex. This complexity increases when the VLIW microprocessor increases the number of operations in parallel. For four parallel pipes, a total of different conditions require register bypassing. Note: To ease understanding on the RTL code of the execute module, only partial bypassing logic is implemented.

For load instructions, a written register value can only be used for instruction dependency after 2 clocks.

For other instructions, a written register value can only be used for instruction dependency after 1 clock for intrapipe bypass and 2 clocks for interpipe bypass. The functionality of the writeback module is to write the results of executed operations into the register file. Represents the write signal to register file module.


Represents the bit data read out of the VLIW microprocessor for operation2. Represents the bit data read out of the VLIW microprocessor for operation3. Represents a data valid condition on the output port of readdatapipe1, readdatapipe2, and readdatapipe3. It is a top level instantiation of the five modules of fetch, decode, execute, writeback, and register file.

The top level interface signals for the VLIW microprocessor and its interface signal functionality are shown in Table 2. The module vliwtop is integrated with periphery modules as shown in Figure 2. Testbenches can be in many different forms. Whatever the language used for writing testbench, the end result is the same: creation of testbenches used for simulating the design.

A testbench is a wrapper around a design: it pumps stimulus into the design under test and monitors the outputs of the design. If the output waveforms of the design are not as expected, a bug has occurred. The bug could be in the design or in the testbench. When a bug is found, the designer must debug the waveforms and decide whether it comes from the design or the testbench. Either way, the bug must be fixed and the simulation performed again.

Only when the output waveform of the design is as expected can the designer proceed to the next phase of the design flow, synthesis. The testbench and RTL code of the design are simulated using a Verilog simulator. The simulation waveform is checked against the expected waveform. If the simulated waveform is not what is expected, the designer has to modify the RTL code of the design or the testbench, depending on which causes the error. A resimulation is performed, and this is repeated until the designer is satisfied that the simulated waveform matches what is expected of the design.
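The stimulus-and-check loop described above can be sketched with a very small self-checking testbench. The design under test here is a hypothetical two-input AND gate rather than the VLIW microprocessor, and all names are assumptions. The initial block is acceptable because a testbench is simulated, not synthesized.

```verilog
`timescale 1ns/1ps

// Hypothetical design under test: a two-input AND gate.
module and_gate (
    input  wire A,
    input  wire B,
    output wire Q
);
    assign Q = A & B;
endmodule

// Testbench: wraps the design, drives stimulus, and checks the output.
module and_gate_tb;
    reg     A, B;
    wire    Q;
    integer i;
    integer errors;

    and_gate dut (.A(A), .B(B), .Q(Q));

    initial begin
        errors = 0;
        // Apply all four input combinations.
        for (i = 0; i < 4; i = i + 1) begin
            {A, B} = i;   // only the two low bits of i are used
            #10;          // let the output settle before checking
            if (Q !== (A & B)) begin
                errors = errors + 1;
                $display("MISMATCH: A=%b B=%b Q=%b", A, B, Q);
            end
        end
        if (errors == 0)
            $display("PASS");
        else
            $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Checking the output inside the testbench complements manual waveform inspection, since a mismatch is reported as soon as it occurs.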

Appendix A shows some of the testbenches that are used to verify the VLIW microprocessor and their corresponding simulation waveforms. Each of these features has to be thoroughly simulated in order to ensure that the design is fully functional. As such, it is good practice to always create a testplan that defines the different testbenches needed to fully validate the design. The testplan serves as a useful guide to achieving the targeted verification milestone.

In a design project, it is common for a design module to be checked for its code coverage to ensure that most portions of the RTL Verilog code have been verified. In code coverage, a coverage tool analyzes all the testbenches together with the RTL Verilog code of the design and reports the portions of the RTL code that are not exercised by the testbenches. The more RTL Verilog code that is exercised, the better the code coverage.

If the code coverage is less than the targeted rate, more testbenches must be written to verify those parts of the RTL Verilog code that are not exercised. The code coverage analysis tool analyzes all the different testbenches with the RTL code and provides a detailed report on portions of the RTL code that are not exercised. Referring to Example 3. Statements with a zero count have not been exercised.

The designer will have to expand existing testbenches or create new ones to exercise those statements with zero count.

Synthesis is the process of converting and mapping the RTL Verilog code into logic gates based on a standard cell library. The process of synthesis requires three separate inputs: a standard cell library, design constraints, and the RTL design code. Synthesis can be categorized into pre-layout synthesis and post-layout synthesis. Pre-layout synthesis is synthesis on the RTL code using estimates of the interconnects between gates. It uses wireload models, which are statistical models for estimating interconnects.

Post-layout synthesis is an incremental synthesis process that is performed after layout. The interconnects between gates are accurately extracted after layout and back-annotated for post-layout synthesis. The letter in a gate name represents the size of the gate. A later letter represents a larger gate, which has larger drive strength.

The standard cell library is an important requirement during synthesis, as the RTL code is mapped to the logic gates of the standard cell library. The size of a standard cell library varies greatly between different designs. Typically a standard cell library has from about 50 to several hundred types of gates.

A larger standard cell library can have better synthesis optimization compared to a smaller standard cell library. However, a large standard cell library is difficult to create and maintain. The size of a standard cell library is dependent on the type of application for which the design is targeted. Designs targeted for high speed performance commonly have a large standard cell library that consists of hundreds of logic gates while designs that do not require high speed performance commonly have a smaller standard cell library.

This file contains all the information about the different logic gates, its input capacitance, its area, fan-out information, and timing information on each pin of the logic gate. Each of these synthesis tools uses its own binary format for the standard cell library. The synthesis tool provides for a Library Compiler that can compile the. Apart from requiring an RTL code and standard cell library, the process of synthesis also requires design constraints. Design constraints specify requirements of the synthesized circuit, for example: 1. What is the clock frequency?

Should synthesis focus on synthesizing a circuit for performance, or should it optimize for area? What is the allowed fan-out for the logic gates? Are there any multicycle paths? Multicycle paths are paths in a design that require more than one clock cycle (refer to Figure 3.). They must be specified; otherwise, the synthesis tool may spend much of its computational resources optimizing a path that does not need it, since the path is allowed more than one clock cycle.

Are there any false paths? False paths are paths that are asynchronous in nature and can occur at any given time, irrespective of the clock reference. False paths must be specified so that the synthesis tool knows which paths are false. An example of a false path is the reset path. This is commonly referred to as synthesis tweaks.

Several methods of performing synthesis tweaks are described in Section 3. The output of synthesis is a structural gate-level netlist, which is passed to layout. Appendix B shows the structural gate-level netlist of the VLIW microprocessor generated from synthesis. The designer will need to perform synthesis tweaks to push the synthesis tool toward improved synthesis results. There are several ways to tweak the synthesis process.
