https://doi.org/10.52326/jes.utm.2024.31(2).04 UDC 621.3.049.77:004.8



# METHODOLOGY AND BACKEND FLOW OPTIMIZATION FOR 3D

Ionica-Marcela Pletea, ORCID 0009-0001-9698-6200

Technical University of Moldova, 168 Stefan cel Mare Blvd., Chisinau, Republic of Moldova \* Corresponding author: Ionica-Marcela Pletea, ipletea@etti.tuiasi.ro

> Received: 05. 10. 2024 Accepted: 06. 18. 2024

**Abstract**. The article addresses the complexity of the workflow used to implement integrated circuits with 2D electronic design automation (EDA) tools, and the adaptation of that workflow to 3D integrated circuits. If issues are not identified promptly in the workflow, the time needed to produce an integrated circuit increases. By analyzing and refining the workflow and identifying bottlenecks and problems early in a design, the optimization process ensures a path with minimal errors from synthesis to the final design prepared for manufacturing. Considering all the details at every stage of the flow helps the backend designer solve problems faster and save time. Detailed scripts were created to automate the process at every stage, including floorplan, power plan, placement, clock tree synthesis, routing, and all physical and logical analyses. The workflow was optimized using loops at all levels, extracting important information at every level and improving the working process. Moreover, optimization techniques contribute significantly to precision and quality in design implementation. This workflow can be used to implement 3D integrated circuits with automated 2D tools from Synopsys.

Keywords: optimization, workflow, integrated circuits, automatization, 3D.


### 1. Introduction

Moore's Law, formulated about 60 years ago, predicts that the number of transistors on an integrated circuit doubles roughly every two years, and the trend has largely held [1,2]. If it is to continue, a new approach to manufacturing integrated circuits will be needed within a few years; since 2D scaling alone is not a solution, moving to 3D is an alternative. Industry is focused on finding solutions for 3D integration, and a few options exist, such as through-silicon vias and monolithic 3D integration. Besides the technology used for 3D integration, tools that support this new form of circuit integration will also be necessary.

There are at least two directions that can "pave" the way to 3D integrated circuits: one is developing technology that allows 3D integration, and the other is developing tools that are "aware" of the 3<sup>rd</sup> dimension. Progress in both will greatly advance the workflow for new 3D integrated circuits.

The detailed design of a 2D integrated circuit, which covers all the steps from the first stage of the backend flow, floorplan, to the last stage, routing, can be the starting point for developing a flow for 3D integrated circuits. The usual steps in the workflow for 2D integrated circuits are floorplan, power plan, placement and optimization, clock tree synthesis, and routing; all of them can be adapted and optimized for implementing 3D integrated circuits [3]. Besides these mandatory steps, the 2D workflow can be optimized and made faster through early checks, extracting information from the beginning of the process, and rechecking all the reports after each step.

Research on 3D integration can be carried out with 2D design tools by adapting the technology files so that the 2D tools can superimpose memories on logic. One goal of 3D integration is to reduce the length and complexity of interconnections, as well as the delays associated with 2D integration [4,5]. In addition, 3D integration makes it possible to use a different technology on each stratum, a strategy that can have a large impact on speed and power consumption.

A suitable design should consist of about half memory area and half standard-cell area, with the memories going to the 2<sup>nd</sup> stratum and the standard cells remaining on the 1<sup>st</sup> stratum. Usually, these two strata will have different placement utilizations, both well below 100% (this depends strongly on the design size and shape, as well as on the size, shape, and number of memories) [6,7].

Considering the memory area and 2<sup>nd</sup> stratum utilization together with the standard-cell area and 1<sup>st</sup> stratum utilization, a shrinking VALUE for the memories was computed. Each memory must be shrunk by this VALUE on each of its four sides, so that its width and height are each reduced by 2\*VALUE. This is necessary because of the way 2D tools evaluate the design area at the beginning of block placement: the tool (area checker) does not know about the 2<sup>nd</sup> stratum and assumes that all the logic goes on the same stratum, so a utilization above 100% is not acceptable. The total reduced memory area should be "equivalent" to the standard-cell area; for example, 90% utilization on the 2<sup>nd</sup> stratum results in a total memory area reduction equal to the standard-cell area (assuming 100% utilization) minus 10% of the 2<sup>nd</sup> stratum's area [7-9].
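The shrinking step can be illustrated with a short sketch. The paper does not give the formula used to compute VALUE, so the code below is only one possible approach under stated assumptions: the designer supplies a hypothetical `target_area` (the total memory area the 2D area checker is allowed to see), and a bisection search finds the smallest uniform shrink for which the shrunk memories fit. The function names and the example numbers are illustrative and not taken from the original flow.

```python
from typing import List, Tuple

Memory = Tuple[float, float]  # (width, height) of one memory macro


def shrunk_area(memories: List[Memory], shrink: float) -> float:
    """Total area after shrinking every memory by `shrink` on each of its 4 sides.

    Width and height are each reduced by 2 * shrink, as described in the text.
    """
    return sum(
        max(w - 2.0 * shrink, 0.0) * max(h - 2.0 * shrink, 0.0)
        for w, h in memories
    )


def find_shrink_value(
    memories: List[Memory], target_area: float, tol: float = 1e-6
) -> float:
    """Smallest uniform shrink (within `tol`) so the shrunk area fits target_area.

    ASSUMPTION: `target_area` is a hypothetical input chosen by the designer;
    the paper does not state how the target is derived.
    """
    if shrunk_area(memories, 0.0) <= target_area:
        return 0.0  # no shrinking needed
    lo, hi = 0.0, max(max(w, h) for w, h in memories) / 2.0
    # Invariant: area(lo) > target_area >= area(hi); the area is monotonic in shrink.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if shrunk_area(memories, mid) > target_area:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    mems = [(100.0, 100.0)]
    value = find_shrink_value(mems, target_area=6400.0)
    print(f"shrink VALUE ~ {value:.3f}")
```

For a single 100 x 100 memory and a target of 6400 area units, the search converges to a shrink of about 10, since (100 - 2\*10)<sup>2</sup> = 6400.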

### 2. Floorplan and placement in 3D

The methodology used to place memories above logic in 3D integration consists mostly of steps taken from existing 2D integrated circuit design methodologies, complemented by specific steps required for 3D integration [4]. From the first step, floorplan, where the area of the design is estimated, the tool needs to be aware of the size of the memories so that the subsequent steps can proceed correctly.

Using the 2D integrated circuit design flow, the memories and input-output pins were placed in the first stage, and their positions were fixed. For each memory on layer 2, a special representation was generated that allowed the 2D design tool to place logic under the memory. To prevent memories from overlapping other memories (the separation onto the two layers was: memories on layer 2 and logic on layer 1), the special memory representation took the form of a serpentine (coil) whose pitch was smaller than the smallest memory in the design, as shown in Figure 1. The serpentine allows the design tool to place logic under the memories but does not allow other memories to be placed there.



Figure 1. 3D placement of memories above standard cells.

With memories represented as serpentines, the 2D design tool "sees" a smaller used area than with memories in their real form. This allows the total area of the circuit to be reduced, so that overlapping the two layers delivers the benefits of 3D placement [10].
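To make the area effect concrete, the sketch below generates a serpentine as a list of rectangles inside a memory's bounding box and compares its area with the real footprint. The exact geometry used in the original flow is not given, so the shape here is an assumption: horizontal bars of thickness `bar` repeated at a vertical `pitch`, joined by short connectors on alternating sides. All names and dimensions are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def serpentine_rects(width: float, height: float, pitch: float, bar: float) -> List[Rect]:
    """Rectangles forming a serpentine inside a width x height bounding box.

    ASSUMED geometry (not from the paper): full-width horizontal bars every
    `pitch`, linked by vertical connectors that alternate right/left.
    """
    if not 0.0 < bar < pitch:
        raise ValueError("bar thickness must be positive and smaller than the pitch")
    rects: List[Rect] = []
    y, row = 0.0, 0
    while y + bar <= height:
        rects.append((0.0, y, width, y + bar))  # horizontal bar
        next_y = y + pitch
        if next_y + bar <= height:
            # connector to the next bar, on the right for even rows, left for odd rows
            x0 = width - bar if row % 2 == 0 else 0.0
            rects.append((x0, y + bar, x0 + bar, next_y))
        y, row = next_y, row + 1
    return rects


def total_area(rects: List[Rect]) -> float:
    """Sum of rectangle areas (the rectangles above do not overlap)."""
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in rects)


if __name__ == "__main__":
    rects = serpentine_rects(width=10.0, height=10.0, pitch=4.0, bar=1.0)
    seen = total_area(rects)
    print(f"area seen by the tool: {seen} of {10.0 * 10.0} real")
```

With these example numbers the serpentine covers 36 of the 100 area units of the real footprint. The gaps between bars are `pitch - bar` wide, which is why a pitch below the smallest memory dimension would keep other memories out while still leaving room for standard cells.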

Before the serpentine form was adopted, other shapes were generated and tried, but with them the memory area as interpreted by the tool exceeded the area available for placing all the logic in the floorplan [11,12].

In the second stage, global placement of the logic under the memories was performed according to the timing constraints and the connections between components. The utilization of the area available for logic decreases with 3D integration, because the area in which logic can be placed increases. The length of the connections between components is also reduced, because the vertical distance between them is shorter [13].

After global placement, detailed placement of the logic was performed. This involved optimizing the logic components and their connections, which showed in the reduced area of the combinational cells and of the buffers and inverters used to meet the timing constraints of the design.

Reaching this phase of the 3D implementation required modifying the technology files as well as the scripts used in the implementation flow. The changes were made iteratively, adapting the implementation method according to the results [11].

After the memories were placed as serpentines, the virtual representations were replaced with the real memories from the technology library. The virtual representations are necessary to "force" the 2D tool to accept the 3D placement, which then makes it possible to generate all the area and power reports.

The flow was implemented using Synopsys Design Compiler and Synopsys IC Compiler tools, and the memory libraries were adjusted manually.

The virtual serpentine forms of the memories are replaced after each stage of the flow in order to perform all the memory-related checks: the legality of memory placement, the timing paths to and from memories, and the correctness of the routing to and from memories [6,7]. After the replacement, all the area and timing reports are generated, covering not only paths between memories and cells but also cell-to-cell paths.

The transition from one stage of the flow to the next is made with the virtual memory representations, to avoid the design rule violations that could arise if the real memories were used. In practice, virtual memory representations were used when moving from floorplan to placement and from placement to clock tree synthesis, and at the end they were replaced by the real, technology ones [14].
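The alternation between virtual and real memories can be sketched as a small driver loop. This is not the author's script: the three callables (`run_stage`, `swap_memories`, `run_memory_checks`) are hypothetical hooks that would wrap whatever tool commands the flow actually uses, and the stage names are only examples.

```python
from typing import Callable, Dict, List


def run_flow_with_memory_swap(
    stages: List[str],
    run_stage: Callable[[str], None],
    swap_memories: Callable[[str], None],
    run_memory_checks: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run each stage on virtual memories, then check it on real memories.

    All three callables are HYPOTHETICAL hooks, not part of any EDA tool API.
    """
    reports: Dict[str, List[str]] = {}
    for stage in stages:
        swap_memories("virtual")   # stages run with serpentine placeholders
        run_stage(stage)
        swap_memories("real")      # checks and reports need the real macros
        reports[stage] = run_memory_checks(stage)
    return reports                 # the design ends with real memories in place


if __name__ == "__main__":
    log: List[str] = []
    result = run_flow_with_memory_swap(
        ["floorplan", "placement", "clock_tree"],
        run_stage=lambda s: log.append(f"run {s}"),
        swap_memories=lambda kind: log.append(f"swap to {kind}"),
        run_memory_checks=lambda s: [f"{s}: placement legality, timing, routing"],
    )
    print("\n".join(log))
    print(result)
```

Keeping the swap inside the loop means every stage starts from the virtual form and every report is generated on the real form, which matches the order described above.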

### 3. Results and Discussion

#### 3.1 Clock Tree Synthesis

To synchronize data in a design, data transfer between sequential elements of the design is done using one or more clock signals.

The clock tree is the most widely used structure for distributing clock signals across the design. A high-performance clock tree helps the design reach its target frequency [15].

Clock skew and power consumption are the main objectives when optimizing the clock tree. The clock tree can account for over 30% of the power consumed by the entire design, due to the high switching activity of the clock signal.

Delays on the branches of the clock tree are balanced, in order to obtain a low skew, by inserting buffers and/or inverters on the clock path. Clock tree synthesis is an important step in the digital integrated circuit design flow and plays a key role in building a balanced clock tree that helps meet the timing constraints of the design. The main goals in building the clock tree are to reduce skew, reach all registers in the design, and keep the design area as small as possible.
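As a brief illustration of the quantity being minimized, global clock skew can be computed as the difference between the largest and smallest clock arrival times at the registers. The register names and latency values in this sketch are made up for the example and are not results from the paper.

```python
from typing import Dict, Tuple


def clock_skew(latencies: Dict[str, float]) -> float:
    """Global skew: latest minus earliest clock arrival time over all sinks."""
    if not latencies:
        raise ValueError("at least one clock sink is required")
    return max(latencies.values()) - min(latencies.values())


def extreme_sinks(latencies: Dict[str, float]) -> Tuple[str, str]:
    """Names of the earliest and latest sinks, the pair that defines the skew."""
    earliest = min(latencies, key=latencies.get)
    latest = max(latencies, key=latencies.get)
    return earliest, latest


if __name__ == "__main__":
    # Hypothetical clock latencies in nanoseconds at three registers.
    arrivals = {"ff_a": 1.20, "ff_b": 1.35, "ff_c": 1.28}
    print(f"skew = {clock_skew(arrivals):.2f} ns, sinks = {extreme_sinks(arrivals)}")
```

Inserting a buffer on the branch of the earliest sink raises its latency toward that of the latest sink, which is how balancing reduces this difference.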

The clock tree was implemented in 3D with the 2D integrated circuit design tool, Synopsys IC Compiler, after the memories had been placed in their serpentine representation, as in Figure 1.



Figure 2. Distribution of the 3D clock tree under virtual memories.

The modified memories have a pitch smaller than the smallest memory in the design, so the design rule violations that would occur if the virtual memories were replaced with the real ones can be avoided [8].

Buffers and inverters from the technology library, with sizes from 2X to 32X, were used in the clock tree to obtain a balanced tree with a smaller area. The clock tree was also built using only buffers and using only inverters; after comparing the reports for these scenarios, using buffers and inverters together proved to be the best solution.

In the 2D clock tree, the buffers and inverters in the structure are placed in the area around the memories, whereas with 3D placement the clock tree structure is spread over the entire surface of the core. With the area constraint removed, the 3D clock tree is distributed much more uniformly, and the number of buffers and inverters needed decreases significantly.

Figure 2 shows the distribution of the clock tree under the virtual memories, with the clock buffers and inverters highlighted as white cells; the balanced distribution of the tree is visible. A balanced clock tree matters most for timing: it helps fix timing violations and reduces unnecessary pessimism in the design.

#### 3.2 Routing

After the clock tree is implemented, the last stage of the flow follows (apart from the analyses and verifications done at the end), namely routing. In this phase, the connections between the components of the design are made at the physical level [10].

Routing comprises two phases: global routing, in which the length of the nets is estimated and the regions through which the nets will be routed are chosen, and detailed routing, in which the nets are drawn on the routing grid [9].

The technology used to implement this design provides 10 metal layers, from M1 to MDRL, with MDRL being the top metal. Only 6 of the 10 metals were used for routing signal nets; the remaining ones were reserved for power rings and stripes.

The routing directions alternate: metal M1 is routed horizontally, M2 vertically, M3 horizontally, and so on. In other words, odd metals are used for horizontal routing and even metals for vertical routing. Usually, the metals with the greatest width and lowest resistance are used for the power network.

An important factor in the routing stage is routing congestion, the ratio of the number of routing tracks required to route the connections to the number of routing tracks available. This ratio can be at most 1; otherwise, design rule violations will occur. In this design, where the memories are placed above the standard cells, routing to and from the memories was the sensitive part, and the memory pins were modified to make them accessible [16].
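A congestion check of this kind can be sketched in a few lines. The sketch takes congestion as required tracks divided by available tracks, so that a value above 1 means more tracks are needed than exist; the grid coordinates and track counts are invented for the example, and real tools report congestion in their own formats.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (column, row) of a routing-grid cell


def congestion(required: int, available: int) -> float:
    """Congestion ratio of one grid cell: required tracks / available tracks."""
    if available <= 0:
        raise ValueError("a grid cell must offer at least one routing track")
    return required / available


def overflowing_cells(demand: Dict[Cell, Tuple[int, int]]) -> List[Cell]:
    """Grid cells whose congestion exceeds 1, sorted for stable reporting.

    `demand` maps each cell to (required_tracks, available_tracks).
    """
    return sorted(
        cell
        for cell, (required, available) in demand.items()
        if congestion(required, available) > 1.0
    )


if __name__ == "__main__":
    # Hypothetical track counts for three grid cells.
    grid = {(0, 0): (8, 10), (0, 1): (12, 10), (1, 0): (10, 10)}
    print("overflowing cells:", overflowing_cells(grid))
```

Cells at exactly 1 are fully used but still routable, so only ratios strictly above 1 are flagged.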

#### 3.3 2D Flow

The steps are the classic ones of the 2D integrated circuit design flow shown in Figure 3. In the first phase, floorplan, the shape of the die and core was outlined, followed by the placement of the input-output pins and of the memories according to their connections. After placement, the memories were fixed in position, and in the second stage, placement, the standard cells were placed in the remaining area within the core. Both memory placement and standard-cell placement took into account technology constraints and timing constraints. After the placement stage, the clock tree was implemented, aiming to reduce skew and use as few buffers and inverters as possible so that the area stays as small as possible. This was followed by timing optimization and then the routing stage, in which metals were drawn to create the connections between components.

Timing optimization should be done after every step of the flow, especially from placement onward, when the timing paths are estimated and the frequency information begins to be realistic. Some timing information is available at an earlier stage, synthesis, but at that point there is no information about real delays; only when technology files, such as library exchange format (LEF) files, enter the flow do the net delays become realistic. After clock tree synthesis, crosstalk and process variation are checked, and if there are timing or power problems, the designer needs to go back to a previous stage, either floorplan or placement.

Checking timing, voltage drop, electromigration, and any additional requirements after every stage can shorten the overall design time and leave fewer problems to be discovered at the end of the flow. More checks are done after routing, since the timing information is then real; signal integrity as well as setup and hold are checked with sign-off tools, which are more precise in calculating delays and check all the timing paths. Even though the implementation tool incorporates all the checkers, sign-off tools are still necessary to validate the final results in terms of timing, design rule constraints, power, and so on, because they calculate more precisely and generate clean reports.

This methodology saves time only if scripts are created to automate the workflow. Besides the time saved, scripts significantly decrease the probability of missing important errors or defects.
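The check-after-every-stage loop described in this section can be sketched as a small driver. It is a minimal illustration, not the scripts used in the paper: `run_stage` and `check_stage` are hypothetical hooks around the real tool commands, the `fallback` map (which stage to return to when a check fails) is an assumption, and `max_runs` is an added safeguard against endless iteration.

```python
from typing import Callable, Dict, List


def run_backend_flow(
    stages: List[str],
    run_stage: Callable[[str], None],
    check_stage: Callable[[str], bool],
    fallback: Dict[str, str],
    max_runs: int = 20,
) -> List[str]:
    """Run stages in order; after a failed check, resume from the fallback stage.

    `run_stage`, `check_stage`, and `fallback` are HYPOTHETICAL inputs.
    Returns the sequence of stages actually executed.
    """
    history: List[str] = []
    i = 0
    while i < len(stages):
        if len(history) >= max_runs:
            raise RuntimeError("flow did not converge within max_runs stage runs")
        stage = stages[i]
        run_stage(stage)
        history.append(stage)
        if check_stage(stage):
            i += 1  # checks are clean: move on to the next stage
        else:
            # go back to an earlier stage (or rerun this one if none is given)
            i = stages.index(fallback.get(stage, stage))
    return history


if __name__ == "__main__":
    remaining_failures = {"clock_tree": 1}  # simulate one failed check after CTS

    def check(stage: str) -> bool:
        if remaining_failures.get(stage, 0) > 0:
            remaining_failures[stage] -= 1
            return False
        return True

    order = run_backend_flow(
        ["floorplan", "placement", "clock_tree", "routing"],
        run_stage=lambda s: None,
        check_stage=check,
        fallback={"clock_tree": "placement"},
    )
    print(" -> ".join(order))
```

In the example, a single failed check after clock tree synthesis sends the flow back to placement once before it proceeds to routing, which mirrors the return to floorplan or placement described above.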



Figure 3. Detailed 2D backend workflow.

Journal of Engineering Science

In the end, when timing is closed, the final checks are done using the sign-off tool: design rule constraints, formal verification, antenna rules, layout versus schematic, and electrical rule checks. Once there are no more violations, the final netlist can be generated and, together with all the technology files, sent for manufacturing.

### 4. Conclusions

The advantages of three-dimensional integration can be seen after each step in the flow. After the floorplan stage, the total area in which the circuit is integrated is reduced. In the placement and placement optimization that follow, the number of buffers and inverters is significantly reduced because the connections between the design components are shorter.

The three-dimensional organization of the circuit reduces both the average and the maximum length of the connections needed to link the system components, reducing power dissipation and increasing performance at the same time. Shorter connections affect not only the layout of the circuit but also its speed, since the delays on shorter connections between cells decrease.

Besides the development of new technologies that allow 3D integration, the development of new tools able to "understand" the new technology will be a must, and all research in this direction makes an important contribution to the new 3D integrated circuits.

Using the 2D integrated circuit flow presented in Figure 3, and adapting all the technology files and automation scripts to the requirements of 3D implementation, a new flow was created that can help extend 2D tools with new features supporting 3D implementation.

The focus of this paper was on developing an automated flow adapted for 3D integration using existing 2D tools, and on optimization after each stage of the flow. The 2D flow described in Figure 3 is complex, covers all the steps necessary to implement an integrated circuit, and is an important basis for developing a flow for 3D integrated circuits using 2D tools.

Conflicts of Interest: The authors declare no conflict of interest.

#### References

- 1. Moore, G.E. Cramming more components onto integrated circuits. *Solid-State Circuits Society Newsletter* 2006, 11, pp. 33-35.
- 2. Muhibul, H.B. History and Evolution of CMOS Technology and its Application in Semiconductor Industry. *SEU Journal of Science and Engineering*. 2017, 11, pp. 28-42.
- 3. Xie, Y.; Cong, J.; Sapatnekar, S. *Three-Dimensional Integrated Circuit Design*. Springer, State College, Pennsylvania, USA, 2010, pp. 63-77.
- 4. Bobba, S.; Chakraborty, A.; Thomas, O.; Batude, P.; Ernst, T.; Faynot, O.; Micheli, G. Effective Design Technique for 3-D Monolithic Integration targeting High Performance Integrated Circuits. *Asia and South Pacific Design Automation Conference* 2011, pp. 337-343.
- 5. Chiang, C.; Sinha, S. The road to 3D EDA tool readiness. *Asia and South Pacific Design Automation Conference* 2009, pp. 429-436.
- 6. Kim, D.H.; Lim, S.K. Impact of through-silicon-via scaling on the wirelength distribution of current and future 3D ICs. In: *Interconnect Technology Conference, IEEE*, 2011.
- 7. Pletea, I.; Wurman, Z.E.; Or-Bach, Z. Monolithic 3D layout using 2D EDA for embedded memory-rich designs. In: *Interconnect Technology Conference, IEEE*, 2015, pp. 1–2.

- 8. Sakuma, K.; Andry, P.S.; Tsang, C.K.; Wright, S.L.; Dang, B.; Patel, C.S.; Webb, B.C.; Maria, J.; Sprogis, E.J.; Kang, S.L.; Polastre, R.J.; Horton, R.R.; Knickerbocker, J.U. 3D chip-stacking technology with through-silicon vias and low-volume lead-free interconnections. *IBM J. Res. & Dev.* 2008, 52 (6), pp. 611-622.
- 9. Synopsys. 3D Integration. Available online: https://past.date-conference.com/date09/files/file/09-workshops/date09-3dws-digestv2-090504.pdf (accessed on 23.04.2021).
- 10. Pavlidis, V.F.; Friedman, E.G. Three-dimensional Integrated Circuit Design. *Morgan Kaufmann Publishers*, Burlington, Massachusetts, USA, 2009, pp. 79-98.
- 11. Jiang, I.H.R. Generic Integer Linear Programming Formulation for 3D IC Partitioning. In: *IEEE SOC Conference*. 2009, pp. 321-324.
- 12. Tan, C.S.; Peng, L.; Fan, Ji.; Hongyu, L.; Gao, S. Three-dimensional wafer stacking using Cu–Cu bonding for simultaneous formation of electrical, mechanical, and hermetic bonds. *IEEE Transactions on Device and Materials Reliability* 2012, 12(2), pp. 194–200.
- 13. Panth, S.; Samadi, K.; Du, Y.; Lim, S.K. Shrunk-2-D: A physical design methodology to build commercial-quality monolithic 3-D ICs. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 2017, 36(10), pp. 1716–1724.
- 14. Beyne, E. The 3-D interconnect technology landscape. *IEEE Design & Test* 2016, 33(3), pp. 8-20.
- 15. Tsai, J.L. Clock Tree Synthesis for Timing Convergence and Timing Yield Improvement in Nanometer Technologies. Dissertation submitted in partial fulfillment for the degree of Doctor in Philosophy (Electrical Engineering). *University of Wisconsin-Madison*, Madison, Wisconsin, USA, 2005, pp. 56-67.
- 16. Ababei, C.; Feng, Y.; Goplen, B.; Mogal, H.; Zhang, T.; Bazargan, K.; Sapatnekar, S. Placement and routing in 3D integrated circuits. *IEEE Design & Test of Computers* 2005, 22(6), pp. 520-531.

**Citation**: Pletea, I.-M. Methodology and backend flow optimization for 3D. Journal of Engineering Science 2024, XXXI (2), pp. 39-47. https://doi.org/10.52326/jes.utm.2024.31(2).04.

**Publisher's Note:** JES stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.



**Copyright:** 2024 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
