Obtaining a real data set for experiments and concurrent computing

Prediction, scheduling, optimization, and classification problems are becoming increasingly complex. Relying on a high-powered machine alone does not guarantee fast execution; runtime depends on many factors, such as the data set, the algorithm, and the heuristic function. Parallel computation is a promising approach to improving the runtime efficiency of these problems. Obtaining data for an experiment is a painful job, and reusing an existing data set published by others can bring further headaches because the information is often unbalanced. Developing a web crawler to automatically gather information from heterogeneous websites is another approach worth considering when collecting generic information. Why key the data in manually, or hire someone to do it? Runni
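To make the crawler idea concrete, here is a minimal sketch in Python of the two pieces such a tool needs: link extraction from fetched HTML, and concurrent fetching of many pages at once. It uses only the standard library; the `fetch` callable, the `example.com` URLs, and the `crawl` function are illustrative assumptions, not part of any specific crawler described above. The demonstration stubs out the network layer so the sketch runs offline.

```python
import concurrent.futures
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all hyperlink targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl(urls, fetch, max_workers=4):
    """Fetch several pages concurrently and map each URL to its links.

    `fetch` is any callable taking a URL and returning an HTML string,
    so the real network layer (urllib, requests, ...) can be swapped in.
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            results[futures[fut]] = extract_links(fut.result())
    return results


# Offline demonstration with a stubbed fetcher (hypothetical URLs).
sample = {
    "http://example.com/a": '<a href="/x">x</a><a href="/y">y</a>',
    "http://example.com/b": '<a href="/z">z</a>',
}
out = crawl(sample, fetch=lambda url: sample[url])
print(out["http://example.com/a"])  # ['/x', '/y']
```

Because fetching is I/O-bound, threads are a natural fit here; for the CPU-bound analysis stages mentioned above, a process pool would be the analogous choice.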