PowerNotebooks.com - Laptop & Notebook News & Articles
Benchmarking: Facts and Fiction
29 July, 2007 in Laptops - General
Deceptions, benchmarks, and statistics... what they have in common is their ability to seem trustworthy even when they actually misrepresent the truth. In this article, I will discuss real-world vs. synthetic benchmarks and what this means for you as you decide what, or even whether, to buy.
First, when discussing benchmarks, we must establish that a good benchmark changes as few variables as possible. If computer A has X CPU, X graphics card, and X memory, and computer B has Y CPU, Y graphics card, and Y memory, then the measurements compare the systems as a whole and do not really help you determine whether upgrading from the X to the Y graphics card is worth it. A benchmark might depend more on CPU or memory performance, which skews the apparent value of the GPU. The ideal scenario is that the benchmarked systems are 100% identical except for the part being measured.
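The "one change only" rule is easy to check mechanically before trusting a comparison. Here is a minimal sketch in Python; the function names and the X/Y configurations are made up for illustration and simply mirror the example above.

```python
def differing_parts(system_a: dict, system_b: dict) -> list:
    """Return the sorted names of components that differ between two systems."""
    all_parts = set(system_a) | set(system_b)
    return sorted(p for p in all_parts if system_a.get(p) != system_b.get(p))


def is_controlled_comparison(system_a: dict, system_b: dict, part: str) -> bool:
    """True only if `part` is the single component that differs."""
    return differing_parts(system_a, system_b) == [part]


# Hypothetical configurations, for illustration only.
baseline = {"cpu": "X", "gpu": "X", "memory": "X"}
gpu_swap = {"cpu": "X", "gpu": "Y", "memory": "X"}
all_new = {"cpu": "Y", "gpu": "Y", "memory": "Y"}

print(is_controlled_comparison(baseline, gpu_swap, "gpu"))  # True
print(is_controlled_comparison(baseline, all_new, "gpu"))   # False
print(differing_parts(baseline, all_new))  # ['cpu', 'gpu', 'memory']
```

If the second kind of result comes back, the review is comparing whole systems, not graphics cards.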
Secondly, the benchmarked system will probably not be identical or even close to your computer. A better graphics card will make a greater difference on a system with the fastest CPU than it will on a system with the slowest CPU. The reason is that on the slower system, the slower CPU creates a bottleneck that the graphics card must wait on. It can be difficult to estimate these things accurately, and most reviews/benchmarks are done on top-of-the-line or near top-of-the-line systems to create the greatest possible margin for the part being tested. Consider benchmarks in reviews to be a "best case" difference between X part and Y part, and be aware that if your system specs don't match up to the benchmark system specs, then you probably won't see quite that amount of difference.
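To see why the same graphics upgrade pays off differently on different systems, consider a deliberately simplified model in which each frame takes as long as the slower of the CPU and GPU stages. Real systems overlap work in more complicated ways, and every number below is invented for illustration, so treat this as a sketch of the bottleneck idea and not a prediction.

```python
def frames_per_second(cpu_ms: float, gpu_ms: float) -> float:
    """Toy model: a frame takes as long as the slower of the two stages."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def gpu_upgrade_gain(cpu_ms: float, old_gpu_ms: float, new_gpu_ms: float) -> float:
    """Fractional FPS gain from swapping the GPU while keeping the CPU."""
    before = frames_per_second(cpu_ms, old_gpu_ms)
    after = frames_per_second(cpu_ms, new_gpu_ms)
    return after / before - 1.0


# Made-up per-frame costs in milliseconds (assumptions, not measurements).
# The new GPU halves the GPU cost per frame in both cases.
fast_cpu_gain = gpu_upgrade_gain(cpu_ms=8.0, old_gpu_ms=20.0, new_gpu_ms=10.0)
slow_cpu_gain = gpu_upgrade_gain(cpu_ms=16.0, old_gpu_ms=20.0, new_gpu_ms=10.0)

print(f"fast CPU: {fast_cpu_gain:.0%} more FPS")  # fast CPU: 100% more FPS
print(f"slow CPU: {slow_cpu_gain:.0%} more FPS")  # slow CPU: 25% more FPS
```

In this toy model the review system with the fast CPU shows the full doubling, while the slower system gets a quarter of that because the CPU becomes the limit.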
Third, every benchmark should be judged by percentage of difference. This is very important to understand. If a given part is 10% faster, it's not going to make a noticeable difference. Let's use a best-case scenario. If a given CPU is 10% faster, and you have a calculation that takes 3:30 to complete, that calculation will instead complete in about 3:09. An example of this sort of process would be mp3 encoding, not gaming or business applications. If you really think you're going to notice about 20 seconds over a 3.5-minute process, and you run that process all the time, then by all means, go for the performance increase. A more realistic scenario is that instead of taking 3s for an application to start, it takes 2.7s. Good luck noticing that. What I personally look for when considering paying significantly more for a part is A) whether that part is at least 50% faster (a sizeable boost) and B) how important that part is to my use of the computer. I might spend the extra $200 for a jump from integrated graphics to an Nvidia 8600M GT graphics card (a huge performance difference), but if I don't play games or work with 3D apps, it doesn't really matter.
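The arithmetic is worth doing yourself, because "10% faster" can be read two ways: the task takes 10% less time, or the part does 10% more work per second. A small sketch (the function names are my own) shows that both readings land within a couple of seconds of each other for the 3:30 example.

```python
def time_if_ten_percent_less_time(seconds: float, percent: float) -> float:
    """Reading 1: the task takes `percent` less time."""
    return seconds * (1 - percent / 100)


def time_if_more_throughput(seconds: float, percent: float) -> float:
    """Reading 2: the part does `percent` more work per second."""
    return seconds / (1 + percent / 100)


def mmss(seconds: float) -> str:
    """Format a duration as minutes:seconds, rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


task = 3 * 60 + 30  # the 3:30 calculation from the example

print(mmss(time_if_ten_percent_less_time(task, 10)))  # 3:09
print(mmss(time_if_more_throughput(task, 10)))        # 3:11
print(mmss(time_if_more_throughput(task, 50)))        # 2:20
```

The last line is the 50% threshold: over a minute saved on the same task, which is the kind of gap you will actually feel.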
Real-World Benchmarks
The two types of benchmarks are real-world and synthetic benchmarks. A real-world benchmark is a measurement of how long it takes a specific application to complete a specific task. That task is then run and measured on different hardware to quantify the performance difference. This results in a very concrete figure. You'll know that X part is 37% faster than Y part doing this specific task in that specific application. If you use that application a lot, then that will mean everything, and you should lean toward X part. However, that level of detail is also its downside. That particular task could be perfectly suited to X part, and in general X and Y parts are actually equivalent. You have no way of knowing if you only look at one task or one application. So, for real-world benchmarks, you have to view many of them to see the big picture. If you use your computer for many tasks and applications, as do most people, then you have to really dig into the benchmarks to get a feel for what is actually equivalent. If there are big gaps on some applications, then you have to decide how important those applications are to you. It is very common that one part will be good at one thing while another part will be better at something else.
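At its core, a real-world benchmark is just a stopwatch around a task. The sketch below uses Python's standard `time.perf_counter` and `statistics.median`; the workload is a stand-in I made up, and `percent_faster` defines "faster" as the ratio of run times, which is one common convention and not the only one.

```python
import statistics
import time


def time_task(task, repeats: int = 5) -> float:
    """Run `task` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """How much faster the quicker run is, based on the ratio of run times."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def sample_task() -> None:
    # Stand-in workload; a real benchmark would run the actual application task.
    sum(i * i for i in range(200_000))


elapsed = time_task(sample_task)
print(f"median run time: {elapsed:.4f} s")

# Hypothetical timings for the same task on two parts: 13.7 s vs. 10.0 s.
print(f"{percent_faster(13.7, 10.0):.0f}% faster")  # 37% faster
```

Taking the median of several runs keeps one slow outlier (a background process waking up, for instance) from distorting the figure.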
Examples of real-world benchmarks: running a timedemo on a game to measure FPS, measuring how fast a codec can encode the same mp3, measuring how fast an application can convert MPEG-2 to DivX, and measuring how fast a Photoshop filter runs. The BAPCo SYSmark benchmarks are generally real-world benchmarks because they run scripts on different applications. However, the results are typically rolled into a single "SYSmark" score, which is somewhat generic and open to interpretation. You also have to find out which applications are being tested and how the individual scores are weighted into the final score. It can be slightly misleading.
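Weighting matters more than you might expect. The sketch below uses a plain weighted average with invented scores and weights; it is not how SYSmark or any other real suite computes its number, only an illustration of how the same per-application results can crown a different winner depending on the weights.

```python
def composite_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-application scores (illustrative formula only)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Hypothetical per-application scores for two parts.
part_x = {"office": 120.0, "encoding": 90.0}
part_y = {"office": 100.0, "encoding": 110.0}

# Two hypothetical weightings of the same results.
office_heavy = {"office": 3.0, "encoding": 1.0}
encoding_heavy = {"office": 1.0, "encoding": 3.0}

print(composite_score(part_x, office_heavy))    # 112.5
print(composite_score(part_y, office_heavy))    # 102.5
print(composite_score(part_x, encoding_heavy))  # 97.5
print(composite_score(part_y, encoding_heavy))  # 107.5
```

X "wins" under the first weighting and Y under the second, even though nothing about the hardware changed.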
Synthetic Benchmarks
Synthetic benchmarks are the opposite of real-world benchmarks. Instead of measuring a specific application, a synthetic benchmark tries to quantify the general difference between X part and Y part, so that you can get a feel for the performance difference from a single number. Synthetic benchmarks have other advantages as well. They are generally updated to support the newest features of the newest hardware before real-world applications that use those features have been released. They can also sometimes be written to isolate a single subsystem, whereas a real-world application MUST run on otherwise identical hardware before you can draw any real conclusions about the part being compared.
The downside of synthetic benchmarks is that a specific application will always be written differently and therefore depend on the hardware differently. So the synthetic benchmark is more of a guess than a determination. The scores often include more than simply X is faster than Y, but also try to compare the features of X and Y, even if those features aren't supported in applications yet. Synthetic benchmarks occasionally disagree dramatically with current real-world applications, though they may end up agreeing with future applications if those applications implement the newer features of Y. But you can't bank on the future as some parts never have their full potential realized.
A more sinister negative of synthetic benchmarks occasionally pops up. A manufacturer sometimes creates specific optimizations or even specific hardware to run a certain benchmark faster and so look better than the competition. In fact, there have been times when a manufacturer has degraded the "user experience" within that benchmark to get a better score. This would be unacceptable in a real-world application, but since it's just a benchmark that no one actually uses, it doesn't really "matter." However, it does mean the numbers that come out of the benchmark are even less trustworthy. At least with a real game or application, if a manufacturer creates a game-specific optimization, everyone who plays that game will benefit.
Examples of synthetic benchmarks: 3DMark, SPEC, Whetstone, Dhrystone
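| | Real-world | Synthetic |
| Advantages | Concrete figure for a specific task in a specific application; application-specific optimizations benefit everyone who uses that application | Single number gives a general feel for the difference; updated early to support new hardware features; can sometimes isolate a single subsystem |
| Disadvantages | One task or application may not reflect general performance; you have to view many results to see the big picture; needs otherwise identical hardware for a fair comparison | More of a guess than a determination; may score features applications don't use yet; can disagree with current applications; vulnerable to benchmark-specific optimizations |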
Most hardware reviews done today offer a number of scores from a number of applications and application types, from games, to media encoding, to business/professional applications. Multitasking tests have even entered the fray as Hyper-Threading and multi-core CPUs have taken hold. Reading these reviews gives a good idea of how a given part compares to its competition. On the other hand, ignore the 3DMark scores from forum posters. The only purposes they serve are bragging and making sure that your computer isn't getting dramatically worse performance than it should, since the scores for the same configuration should be roughly equivalent.
Sadly, laptops are the most difficult to compare because they are complete systems instead of off-the-shelf parts. So most reviews that compare an 8400M to an HD2400 Go will have those two cards in entirely different laptops, often with somewhat different configurations. That will be less useful unless you are actually comparing those exact configurations of those two laptops. GPUs in particular are a sticking point for laptop benchmarks because the clock speeds of the GPU core and memory are up to the laptop ODM, and ODMs vary widely in the speeds they choose. So if you're comparing an 8600M GT to a 7600M GT, you can't be sure that a different laptop with an 8600M GT will really score that well or that poorly.
If that all sounds confusing, it's because understanding benchmarks is confusing. The right attitude to have with any benchmark is one of distrust, especially when it's only one or two benchmarks. The good news is that most of the time, once you have a general handle on X part vs. Y part, the deviations from that will be less than 10% (again, in the range you won't notice), so you can figure you have a decent understanding. As I said earlier, go for the big gains, and you'll always win. The more you try to squeeze out a few extra percentage points, the more likely you are to be completely wasting your time.


