At 65nm and 40nm, hold fixing is becoming as important as setup fixing. But most P&R tools won't spend the same amount of effort fixing hold violations as they do on setup. Traditionally, a path that violates hold is fixed by inserting delay cells. Nothing is really wrong with this approach, but if there are 50K or 100K hold violations, inserting that many (or more) delay cells is overkill in terms of area.
Here are some other ways hold violations can be fixed so that adding too many cells can be avoided:
a. Area recovery: Often overlooked, but area recovery downsizes cells on paths with positive setup slack. That slows those paths down and can reduce the number of hold violations.
b. Using more HVT cells: Leakage power optimization has its benefits for hold fixing as well. HVT cells are slower, so using more of them slows down the timing path and can help fix hold violations.
c. CTS: Clock trees can be built so that there is not much skew in the best-case corner. Multi-corner CTS can help address this.
d. Useful skew: Clock trees can be deliberately skewed to address hold.
e. Driver and receiver sizing can also help fix hold violations.
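To see why slowing the data path or adjusting the clock helps, it is useful to write the hold check out. The following is only a minimal sketch in Python, not taken from any P&R tool. The function name and all the numbers (in picoseconds) are made up for illustration.

```python
def hold_slack_ps(launch_latency, capture_latency, clk_to_q, data_delay, hold_time):
    """Hold slack in ps: data arrival minus data required. Negative means a violation."""
    arrival = launch_latency + clk_to_q + data_delay
    required = capture_latency + hold_time
    return arrival - required

# Hypothetical short path: the capture clock arrives 100 ps later than the launch clock.
violating = hold_slack_ps(1000, 1100, 80, 30, 50)

# Options a/b/e: a slower (downsized or HVT) cell adds 50 ps to the data path.
slower_data = hold_slack_ps(1000, 1100, 80, 80, 50)

# Options c/d: the clock tree is adjusted so the capture clock arrives 50 ps earlier.
less_skew = hold_slack_ps(1000, 1050, 80, 30, 50)

print(violating, slower_data, less_skew)
```

In this made-up case, both fixes turn a 40 ps violation into 10 ps of positive slack without inserting a single delay cell.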
Wednesday, June 17, 2009
Friday, May 15, 2009
When NOT to have a "shut-down" power domain?
There has been a lot of talk about low power these days, and one of the widely published and used techniques is to create a shut-down region, sometimes also referred to as multi-supply design or MTCMOS-based design. You can search the net to find thousands of articles on why this is useful and what its possible advantages are.
But have you ever wondered when NOT to create a shut-down power domain in your chip?
Here are possible disadvantages of creating a shut-down region in your design:
- If your circuit is such that there is very little opportunity to put it to sleep.
- If the time needed to wake up the circuit is not acceptable for the application in which it is being used.
- Most EDA tools want the shut-down power domain in a separate hierarchy module. While this is not difficult to do, if you are inheriting RTL that is already 90% written for a previous chip, it becomes challenging to re-group your hierarchy to meet this need.
- Moreover, even if you have identified a hierarchy (module) to be shut down, not every element in it may be able to shut down. This can introduce additional design constraints, such as adding retention flops or creating always-on islands for these elements.
- MTCMOS switches occupy a finite area in your chip, so adding them gives you a bigger die size. In addition, you'll have to add control circuitry to shut down and wake up these cells. This also means more cells and thus a bigger die.
- MTCMOS switches are leaky in the on-state. This adds to the leakage power of the design when it is on, which becomes an issue when your leakage power specs are very tight.
- MTCMOS requires accurate analysis of rush current and circuit transients caused by the sudden wake-up of shut-down regions. While this is doable, it puts an extra burden on the back-end designer to come up with the right number of switch cells and a robust power plan.
- Using shut-down regions adds extra steps to your chip design flow in terms of IR analysis, verification and implementation. If you have a tight design schedule and this flow is not in place, it is a sure no-go. It also means adding new tools and licenses to your flow, which may be another reason from a business standpoint. (Yeah, a lot of you will disagree with this, but how many times have you used a certain tool just because it was the only one available to you?)
Any more you can think of? Feel free to comment...
Saturday, May 2, 2009
Temperature Inversion. Is this a new 45nm effect?
TSMC recently announced that temperature inversion will be one of the challenging effects for the 45nm process.
Here is the article I am referring to: http://www.edn.com/article/CA6434612.html
So, what is temperature inversion and how does it affect delay and leakage power?
Actually, this effect was there at the 65nm process node as well, but people just didn't notice it so much. The basic concept is that a transistor's threshold voltage increases at lower temperature, which in turn can cause more delay at lower temperature than at higher temperature.
This is contrary to the traditional assumption when you design your chip for worst-case and best-case conditions, with worst case being worst PVT. At 65nm and below this is no longer always true: your worst case may not be PV-Tmax but PV-Tmin. So if you design your chip for the traditional worst-case condition (say 125C) and meet the clock frequency, you may still have timing paths failing at Tmin (say -40C).
This is easy to handle. Most P&R tools support multi-corner multi-mode analysis, so you can just plug in the data and let the tool optimize for all those scenarios, or determine your worst delay scenario and work on just that. The challenging part is that you need to have your libs characterized not only for the worst (high) temperature corner but also for the low temperature corner under worst-case conditions.
That's not all. Your leakage power measurement is also affected. Leakage power is generally low at Tmin but is worst at Tmax. If you optimize timing at Tmin and also measure leakage at Tmin, you get the wrong impression that your chip dissipates very little power. You need to measure leakage power at Tmax. There are two ways today's P&R tools handle this:
a) The vendor will tell you to cut and paste the worst leakage numbers into the Tmin libraries, so that you are measuring worst delay and worst leakage using the same .libs. These libraries can only be used by the P&R tool; you still use the original libs when you go to sign-off.
b) Some tools allow leakage-corner-based reporting. Here you don't need to create separate mix-and-match libs as in case (a); instead you just instruct the P&R tool which libs to measure leakage power on.
There are some more effects, like HVT cells being more leaky at lower temperature nodes, and its implications. I'll discuss that in one of the later posts.
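The delay side of temperature inversion can be illustrated with a toy model. The sketch below is an assumption on my part, not foundry data: it uses an alpha-power-law style expression in which mobility degrades as temperature rises while the threshold voltage drops. Every parameter value (reference Vth, temperature coefficient, exponents) is hypothetical and chosen only to show the trend.

```python
def relative_gate_delay(vdd, temp_c, vth_ref=0.45, vth_tempco=0.001,
                        mobility_exp=1.5, alpha=1.3, ref_temp_c=27.0):
    """Unitless gate delay from a toy alpha-power-law model (illustrative values only)."""
    t_kelvin = temp_c + 273.15
    ref_kelvin = ref_temp_c + 273.15
    # Threshold voltage rises as the temperature falls (assumed linear, 1 mV per degree).
    vth = vth_ref - vth_tempco * (temp_c - ref_temp_c)
    # Carrier mobility falls as the temperature rises.
    mobility = (t_kelvin / ref_kelvin) ** (-mobility_exp)
    overdrive = vdd - vth
    if overdrive <= 0:
        raise ValueError("supply is at or below threshold in this toy model")
    return vdd / (mobility * overdrive ** alpha)

for vdd in (1.2, 0.65):
    cold = relative_gate_delay(vdd, -40.0)
    hot = relative_gate_delay(vdd, 125.0)
    slower = "Tmin" if cold > hot else "Tmax"
    print(f"Vdd={vdd} V: slower corner is {slower}")
```

With these made-up numbers, the high supply behaves the traditional way (Tmax is slower) because mobility dominates, while at the low supply the shrinking gate overdrive dominates and Tmin becomes the slow corner.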
Thursday, April 30, 2009
Which Clock tree is better?
There are two clock trees built for the same 65nm block: one has 1.2ns latency and 150ps skew, and the other has 2ns latency and 100ps skew. Which clock tree is better and why?
Answer: the clock tree with 1.2ns latency and 150ps skew is better.
Why?
At 65nm, timing derates and OCV effects come into play. Clock trees with longer latency are more susceptible to all of this. Also, the amount of CRPR that needs to be claimed on a 1.2ns clock tree is much less than on a 2.0ns clock tree. Unless the branching happens at the very end of the tree, you cannot claim much CRPR on the 2.0ns clock tree, and that in turn causes a lot of timing disturbance in the design. Of course, this assumes the design is using timing derates.
Even if the design is not using timing derates, the 1.2ns latency tree is still favoured because of OCV-related effects. In addition, the 2.0ns tree will most likely use more clock drivers and thus more clock tree power. Since the latency is large, the clock waveform can become sluggish if clock transitions are not maintained properly. From a design practice standpoint too, you don't want the clock latency to be too long, or it becomes harder to predictably deliver the clock to all parts of the chip.
But wait, what about skew? The clock tree with 2ns latency has 100ps skew. Isn't that good?
This skew is global skew, so it may be between two branches that have no timing relationship. Claiming that we have better skew may not mean anything, since the two flops that are really talking to each other (from a timing standpoint) may have much better skew, and that is what we are really concerned about.
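The latency argument can be made concrete with a little arithmetic. The sketch below is a simplified, hypothetical model and not how any particular timer implements it. It assumes a flat +/-5% clock derate and that half of each tree is common to the launch and capture flops, and it ignores skew. Those numbers are my assumptions, not from the question above.

```python
def ocv_clock_pessimism_ps(latency_ps, common_path_ps, late_pct=105, early_pct=95):
    """Setup pessimism (ps) left on the clock network after the CRPR credit."""
    spread_pct = late_pct - early_pct
    # Launch clock is derated late and capture clock early over the full latency.
    raw_pessimism = latency_ps * spread_pct / 100
    # CRPR gives back the part that comes from the shared (common) clock path.
    crpr_credit = common_path_ps * spread_pct / 100
    return raw_pessimism - crpr_credit

# Assumed: both trees branch halfway down, so half of the latency is common.
short_tree = ocv_clock_pessimism_ps(1200, 600)
long_tree = ocv_clock_pessimism_ps(2000, 1000)
print(short_tree, long_tree)
```

Under these assumptions the 2.0ns tree carries 100 ps of clock pessimism against 60 ps for the 1.2ns tree, and the gap widens if the longer tree branches earlier.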
Tuesday, April 28, 2009
Why do people use timing derates at 90nm and below as opposed to using clock uncertainty?
The purpose of both timing derates and clock uncertainty is to over-constrain the design. At larger technology nodes (0.18um and above), people used to rely on clock uncertainty to make sure they hit the required frequency target for the chip. At 90nm and below, people use timing derates and/or clock uncertainty to over-constrain the design.
The reason for using timing derates at smaller geometries is that they take OCV-related and other path-specific effects into account. For example, assume you have one path that is 5 stages long and another that is 40 stages long. With clock uncertainty you penalize both paths by the same amount, thus over-constraining your design. With timing derates you allocate margin according to the length and nature of each path. This over-constrains your design less and thus saves you silicon area :)
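Here is the 5-stage versus 40-stage example worked out as a small Python sketch. The stage delay, the uncertainty value and the derate percentage are all made-up numbers for illustration, and real derates are applied per cell and net rather than to the total path delay.

```python
STAGE_DELAY_PS = 50          # hypothetical delay of one logic stage
CLOCK_UNCERTAINTY_PS = 100   # hypothetical flat uncertainty applied to every path
DERATE_PCT = 5               # hypothetical late derate on data path delay

def uncertainty_margin_ps(stages):
    """Clock uncertainty charges the same margin regardless of path length."""
    return CLOCK_UNCERTAINTY_PS

def derate_margin_ps(stages):
    """A timing derate charges a margin proportional to the path delay."""
    return stages * STAGE_DELAY_PS * DERATE_PCT / 100

for stages in (5, 40):
    print(stages, uncertainty_margin_ps(stages), derate_margin_ps(stages))
```

With these numbers the 40-stage path pays 100 ps either way, but the 5-stage path pays only 12.5 ps under derating instead of the full 100 ps, which is where the reduced over-constraining comes from.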
Friday, April 17, 2009
Why do people build the clock tree with SVT cells and the rest of the design with HVT cells?
There are several reasons and advantages for doing this:
- SVT cells are faster than HVT cells, so you can get better clock latency by using SVT cells.
- Better latency can also result in less OCV effect (i.e. less CRPR to claim etc.) and thus better timing.
- Fewer clock buffers are needed to build the clock tree. This improves dynamic power, since the clock tree is always active and dynamic power grows with the number of switching cells in the design. So although SVT cells give you more leakage power, your overall chip power can still come out substantially lower.
- You can also save area by using fewer clock buffers. This is useful if you have a complex clock tree and the design is already congested after place_opt.
- HVT cells are more prone to SI/noise effects. Using SVT cells helps keep SI effects on the clock network to a minimum.
HVT = High Vth (threshold voltage)
SVT = Standard Vth
LVT = Low Vth
Why this blog?
Most of my time is spent interacting with physical designers. Sometimes I also interview potential AE candidates. What I have seen is that a lot of people know common physical design terms but don't really know the details beneath them, or the potential implications and advantages of one approach over another.
With the economy in a dire state, a lot of people are looking for a job or trying to improve their skills. But they are left with very little opportunity to learn new things or to explore in depth why things have to be done a certain way.
I have been in the physical implementation field for the past 10 years and work with a lot of customers, which puts me in a unique position to understand some of these things and share them with readers. A lot of this may be really fundamental, and in some cases you may already know what I am talking about, but I am sure there will be some people who can find value in it or learn something from it.
I will try to come up with some topics every week and post at least once a week. Please post your comments so that I can adjust my future posts accordingly...
And last but not least, the views expressed here are all my own and are not endorsed by, or in any way representative of, the company I work with or used to work with :)