Saturday, October 22, 2011

Work Units and Meeting Deadlines


Picking up where I left off, I’m going to write more about work units and their deadlines, the energy bill, and begin to mention some of the folding hardware. From the beginning, I stated that although I was going to explore distributed computing, I was going to focus on Folding@Home. From what I understand from screenshots, the clients all share a similar layout with an almost identical method of operation. When the program is first run, the user is asked to type in a username, with “Anonymous” as the default; the next option is the team number, where the user can create his or her own team or join an existing one. Other, less important options include choosing whether to use a proxy, selecting the memory usage (small, medium, or large), and indicating whether to be prompted before fetching a new work unit. Now comes the good part: the work unit is automatically downloaded to the computer, where the central processing unit (CPU) or graphics processing unit (GPU) crunches that data.
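To make that setup concrete, here is a rough sketch of those first-run choices written out as a Python dictionary. The key names and everything besides the “Anonymous” and team-0 defaults are my own labels for illustration, not the client’s actual file format:

    # A sketch of the first-run choices described above. "Anonymous" and
    # team 0 are the client's defaults; the key names are illustrative.
    client_settings = {
        "username": "Anonymous",    # default donor name
        "team": 0,                  # join an existing team number, or 0 for none
        "use_proxy": False,         # only needed behind a proxy server
        "memory_usage": "small",    # "small", "medium", or "large"
        "ask_before_fetch": False,  # prompt before downloading a new work unit
    }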
A bigadv work unit in progress, netting about 25,000 points per day.
The fastest single Nvidia GPU today produces over 16,000 PPD; notice that each percent of the WU completes in about 9 seconds.
There are two main ways to fold on a computer and only one on the PlayStation 3. For the PC, there are the CPU client and the GPU client. As time progresses, so do advances in processor architecture. Take a look at Intel’s new high-end CPU, the 2600K, built on the “Sandy Bridge” architecture; it uses Hyper-Threading, allowing multiple tasks to run simultaneously, so it has four physical cores but eight threads for the computer to take advantage of. The SMP (symmetric multiprocessing) client is the one used by multicore processors. But running a 2600K on the regular client is a waste of time and money; instead, it is put to the test on bigadv units, high-point-value, high-priority work units that require a minimum of eight cores. To be eligible for bonus points, the user must first complete ten work units and must also finish the bigadv unit before its deadline, a feat that requires at least 16 hours of folding. The meat of the points comes from these bonuses, and the faster you complete a unit, the more points you get (the sketch after this paragraph shows the arithmetic). The GPU client, in my opinion, is simpler to set up and get running. Another advantage is that the work units are short and manageable: with a pretty decent Nvidia video card, you can finish a work unit in more or less an hour. Sure, you won’t match a 2600K’s PPD, but you also don’t need to fold for hours on end. Lastly, the PS3 is also capable of producing points using its Cell processor, but as the name implies, it’s a gaming console, not suited to completing units en masse. A quick search indicates that it produces about 900 points a day and takes about 8 hours to complete one work unit.
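That speed bonus follows a quick-return idea: credit grows with the square root of how far ahead of the deadline you finish. Here is a minimal sketch; the square-root formula matches my understanding of Stanford’s bonus scheme, but the base points, k-factor, and deadline below are made-up examples, not real project values:

    import math

    # Quick-return bonus sketch: the earlier a unit is returned relative to
    # its deadline, the bigger the multiplier applied to its base points.
    def bonus_points(base_points, k_factor, deadline_days, days_taken):
        multiplier = max(1.0, math.sqrt(k_factor * deadline_days / days_taken))
        return base_points * multiplier

    # Hypothetical bigadv unit: 8,955 base points, k = 2, 6-day deadline,
    # returned in 16 hours (two-thirds of a day):
    print(round(bonus_points(8955, 2, 6, 2 / 3)))  # about 37,993 points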
With FahMon, users are able to keep track of progress in real time.
When you finish that particular unit, it will automatically upload the processed data back to Stanford’s servers and a new replacement is given. Not much info on the work unit is shown unless you use a folding monitor program such as FahMon; it is extremely helpful as it lists the type of unit, the point value, the preferred and final deadlines, and probably the most important feature: the estimated time of completion. If a unit isn’t completed by the final deadline, the results are still uploaded but no credit is given to your name. A general rule is that the faster work units are finished, the more can be churned out. Now, there are hardcore folders out there who will run their systems day and night for the sole purpose of folding. Running your components at this rate will reduce the lifespan of your hardware to a certain extent, but the hardware will probably be outdated and considered obsolete in the ever-changing technology field before that matters. Since I fold only when my computer is not in 3D mode, and not nonstop, I can’t accurately determine how much electricity my computer consumes when folding. Furthermore, I fold sporadically and without any schedule on my video card. My next post will mention some of the hardware involved and folding farms.
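As a footnote to the FahMon discussion: the estimates it displays boil down to simple arithmetic on the time per frame (each frame being 1% of a unit). A minimal sketch, with hypothetical numbers rather than anything read from a real client log:

    # Sketch of the arithmetic a monitor like FahMon performs. Real monitors
    # read frame times from the client's logs; these inputs are hypothetical.
    def estimate(seconds_per_frame, frames_done, points_per_unit):
        frames_left = 100 - frames_done              # each frame is 1% of the WU
        eta_hours = frames_left * seconds_per_frame / 3600
        unit_days = 100 * seconds_per_frame / 86400  # time for a whole unit
        ppd = points_per_unit / unit_days            # points per day at this pace
        return eta_hours, ppd

    # A GPU unit at 9 seconds per frame, 40% done, worth (say) 150 points:
    eta, ppd = estimate(9, 40, 150)
    print(f"ETA: {eta:.2f} h, PPD: {ppd:.0f}")       # ETA: 0.15 h, PPD: 14400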

Tuesday, October 4, 2011

What is Distributed Computing?

When the term “distributed computing” comes up, something complex usually comes to mind. In fact, it’s quite simple, despite the term not being used very often. I feel that once distributed computing is defined, everything will fall into place as I attempt to get my meaning across. Without further ado, let’s begin.
Distributed computing is the linking of computers, either in one area or around the world, to collaborate on a common goal. The workload is divided up between multiple systems. In our case, it means using computers around the world to assist scientists in research, from finding a cure for cancer to searching for life forms in outer space. There are many programs available, depending on where your interests lie. Simply put, you run a piece of software downloaded from a project’s website and, just like that, you’ve begun to make a difference for the benefit of humanity.
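The core idea, cutting one big job into chunks that many machines can crunch independently, fits in a few lines of Python. The “work” below (summing squares) is a toy stand-in, not anything a real project actually computes:

    # Toy illustration of distributed computing: split one big job into
    # independent chunks, each of which could be shipped to a volunteer's
    # machine; the server then combines the returned results.
    def split_into_chunks(data, chunk_size):
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def crunch(chunk):  # the part each volunteer machine would run
        return sum(x * x for x in chunk)

    chunks = split_into_chunks(list(range(1_000_000)), 10_000)  # 100 "work units"
    total = sum(crunch(c) for c in chunks)  # combine results server-side
    print(len(chunks), total)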
Folding@Home running on an Nvidia GPU.
Folding@Home is a program created by Stanford to study the folding of proteins. Proteins are essential to life because they carry out many of the body’s functions, from defending the body against foreign invaders and speeding up chemical reactions to providing structural support, like the keratin found in nails and hair. These macromolecules are built from chains of amino acids, and depending on the sequence and chain length, each has a different function and folded shape. Each one is specifically designed to complete a certain task within the body, and to do that, the protein folds itself into a particular shape. Think about enzymes that speed up chemical reactions: there is a certain groove (the active site) for the molecule (the substrate) to sit in. This fit is often described as a lock-and-key mechanism. When proteins misfold, scientists believe they can cause Alzheimer’s, Huntington’s, and many types of cancer that are aggregation-related, meaning the misfolded proteins clump together.
Rosetta@Home
Rosetta@Home, created by the Baker Laboratory, is a program similar to Folding@Home but takes a different approach to the same problem. It mainly focuses on predicting the unknown structure of a protein given its amino acid sequence, whereas Folding@Home tries to understand how proteins fold over time. An example I can think of is baking a cake: Rosetta is only interested in predicting the end product, the cake, while Folding wants to understand each step of how it’s created.
Seti@Home, created by UC Berkeley, is dedicated to detecting signs of intelligent life in outer space. One approach is to use radio telescopes to listen for radio signals from space. Since I’m not interested in extraterrestrials and don’t know much about Seti in particular, I will try to do a quick run-through. From what I gathered, huge volumes of data are recorded from a telescope in Puerto Rico, filling about 35 gigabytes per day. The tape containing all the data is then sent to Berkeley, where the data is divided into 0.25-megabyte chunks. From there, they are sent from the Seti@Home server over the Internet to people around the world, who analyze them using their computers.
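Those two figures imply an enormous number of chunks per day; a quick back-of-the-envelope check (assuming decimal units, 1 GB = 1,000 MB):

    # Back-of-the-envelope: how many 0.25 MB chunks come out of 35 GB a day?
    daily_data_mb = 35 * 1000        # 35 GB recorded per day, in megabytes
    chunk_mb = 0.25                  # size of one chunk sent to volunteers
    print(daily_data_mb / chunk_mb)  # 140000.0 chunks per day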
Hopefully this post gives you an idea of some of the many distributed computing projects out there. From here I will dive deeper into the specifics of Folding@Home, using my own personal experience and knowledge to guide the reader through work units, the points each WU is worth, and other small tidbits.