Friday, November 7, 2008

The Max Download Session Problem

Consider this scenario: a company has a large data warehouse (DW) department. The DW department's Unix machine has to download numerous files from various other departments to collect the data needed to build the warehouse.

For example, job script-1 downloads file-A, file-B, file-C, file-D & file-E from department-1 of the company, job script-2 downloads file-F, file-G, file-H, file-I & file-J from department-2, and similarly for job script-3, script-4, script-5, script-6 and so on. All jobs are stored in the job database and are kicked off by the job scheduler on a monthly, weekly or daily basis.

How is the download done? The scripts call a third-party file transfer software.

This situation raises many open questions:

Due to a licensing restriction, the third-party file transfer software allows at most 4 concurrent download sessions. If the job scheduler kicks off 6 scripts, what will happen? Can the third-party software queue the extra requests? Or will it simply reject all further requests?

Suppose, for example, that job script-1 & script-2 are monthly jobs, script-3 & script-4 are weekly jobs and script-5 & script-6 are daily jobs. The files in the monthly job scripts (e.g. script-1 and script-2) are very large and take a long time to transfer. The files in the daily job scripts (e.g. script-5 and script-6) are relatively small and need only a few hours or minutes. However, the daily job scripts have a batch window of only about 24 hours. They must finish downloading within one day, because a new daily batch starts the next day. Can the third-party file transfer software prioritize the transfer requests? What happens when all 4 download sessions are occupied by files from the monthly job scripts? Can the software let the daily job scripts jump the queue and start downloading as soon as one monthly file finishes? If the transfer software serves requests only on a first-come-first-served basis and the monthly job scripts arrive first, all the monthly file transfers will be served first. The daily job scripts may then face starvation.

What happens when the third-party software hangs? Will the job scripts be notified? Or will they simply wait forever? Will the production system support staff be alerted?

When the third-party file transfer software is down, what happens to all the downloading job scripts? Can they automatically switch to another file transfer software?

If the DW department wants to upgrade the third-party file transfer software or replace it with another product, will the job scripts be affected?

To tackle all these questions, one solution is to build a centralized file transfer request manager that acts as an agent, or middleman, between the job scripts and the third-party file transfer software.

First of all, the DW job scripts should not call the third-party file transfer software directly. Instead, they submit their file transfer requests to the file transfer request manager and then go to sleep. For example, job script-1 submits the requests for file-A, file-B, file-C, file-D & file-E to the manager and then sleeps for up to 2 days. After 2 days, if not all 5 files have been received, job script-1 should return a non-zero return code to the job scheduler to alert the production system support personnel. The same applies to script-2. Script-3 and script-4 are weekly jobs, so they sleep for only 1 day. The daily job scripts, script-5 & script-6, can sleep for only a few hours. Each job script controls its own sleeping time, based on its average file transfer time.
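The job script's side of this "submit, sleep, then check" behaviour might look roughly like the sketch below. It assumes the script can poll for arrived files. `wait_for_files`, `is_received` and the polling interval are hypothetical names and parameters chosen for illustration, not part of any real scheduler or transfer product.

```python
import time
from typing import Callable, Iterable


def wait_for_files(expected: Iterable[str],
                   is_received: Callable[[str], bool],
                   timeout_s: float,
                   poll_s: float = 60.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Sleep until every expected file has arrived or the deadline passes.

    Returns 0 when all files were received, 1 otherwise, so the value can be
    used directly as the job's return code. `clock` and `sleep` are
    injectable only to make the function easy to test.
    """
    pending = set(expected)
    deadline = clock() + timeout_s
    while True:
        # Drop every file that has arrived since the last check.
        pending = {name for name in pending if not is_received(name)}
        if not pending:
            return 0  # all files received: job succeeded
        remaining = deadline - clock()
        if remaining <= 0:
            return 1  # deadline passed: signal failure to the scheduler
        sleep(min(poll_s, remaining))
```

Under these assumptions, monthly job script-1 would call it with `timeout_s=2 * 24 * 3600` and pass the result to `sys.exit`, while a daily job would use a timeout of a few hours.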

On receiving the file transfer requests, the manager calls the third-party file transfer software to perform the actual downloads.

Normally, requests are served on a first-come-first-served basis. However, the manager can also allow daily jobs to jump the queue. Daily jobs should always have a higher priority.

The manager can also enforce the maximum number of file transfer sessions. If 4 files are already being transferred, the manager simply queues all further requests. In addition, the manager can limit each job to at most 2 file transfer sessions, so that no single job uses up all 4 sessions.
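The queueing rules above (daily jobs first, at most 4 sessions in total, at most 2 per job) can be sketched with a priority heap. This is only an illustration under assumed names: `TransferScheduler`, the `PRIORITY` values and the method names are hypothetical, and actually starting a transfer is left to the caller.

```python
import heapq
import itertools
from collections import Counter

# Assumed ordering: a lower number is served first.
PRIORITY = {"daily": 0, "weekly": 1, "monthly": 2}


class TransferScheduler:
    def __init__(self, max_sessions=4, max_per_job=2):
        self.max_sessions = max_sessions
        self.max_per_job = max_per_job
        self._queue = []                 # heap of (priority, seq, job, filename)
        self._seq = itertools.count()    # keeps first-come order within a priority
        self._active = Counter()         # job -> number of running sessions

    def submit(self, job, frequency, filename):
        entry = (PRIORITY[frequency], next(self._seq), job, filename)
        heapq.heappush(self._queue, entry)

    def start_next(self):
        """Return the (job, filename) pairs that may start right now."""
        started, deferred = [], []
        while self._queue and sum(self._active.values()) < self.max_sessions:
            entry = heapq.heappop(self._queue)
            _, _, job, filename = entry
            if self._active[job] >= self.max_per_job:
                deferred.append(entry)   # job is at its cap; try later
                continue
            self._active[job] += 1
            started.append((job, filename))
        for entry in deferred:           # put capped requests back in the queue
            heapq.heappush(self._queue, entry)
        return started

    def finish(self, job):
        """Call when one of the job's transfers ends, freeing a session."""
        if self._active[job] > 0:
            self._active[job] -= 1
```

The manager would call `start_next()` after every `submit()` and every `finish()`. Because the per-job cap keeps a monthly job from holding more than 2 sessions, a daily request that arrives later still finds a free session or is first in line for the next one.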

The manager should take full responsibility for the third-party file transfer software. If the third-party software is down, the manager should alert the production system support (PSS) staff. After investigation, if the problem cannot be fixed quickly, PSS can instruct the manager to switch to another file transfer software. This switch is transparent to the DW job scripts.
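A minimal sketch of this alert-then-switch behaviour follows, assuming each transfer product is wrapped in a plain function that raises an exception on failure. `FailoverManager`, the backend names and the `alert` callback are all hypothetical. The switch is done by an operator, not automatically, to match the description above.

```python
from typing import Callable, Dict


class FailoverManager:
    def __init__(self,
                 backends: Dict[str, Callable[[str], None]],
                 active: str,
                 alert: Callable[[str], None]):
        self._backends = dict(backends)  # name -> function that downloads one file
        self._active = active            # name of the backend currently in use
        self._alert = alert              # e.g. pages the PSS staff

    def switch_to(self, name: str) -> None:
        """Called on PSS instruction; job scripts never see this."""
        if name not in self._backends:
            raise KeyError(name)
        self._active = name

    def download(self, filename: str) -> bool:
        """Try the active backend; alert PSS and report failure if it breaks."""
        try:
            self._backends[self._active](filename)
            return True
        except Exception as exc:
            self._alert(f"{self._active} failed on {filename}: {exc}")
            return False
```

The job scripts only ever talk to the manager, so swapping the backend changes nothing on their side.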

For the handshake between the manager and the job scripts, the manager sends a signal to the job script to report whether the file transfer succeeded or failed. On receiving a success signal, the DW job script can do its post-processing.

For example, when file-A, file-B, file-C, file-D & file-E have all been downloaded successfully, the manager can alert job script-1. Another method is to let job script-1 itself count whether all 5 files have been downloaded, so that the manager does not do the counting. Either way, once all 5 files are downloaded successfully, job script-1 should wake up (it sleeps for at most 2 days). Job script-1 should then return a zero return code to the job scheduler to indicate that the job completed successfully, so that the scheduler can kick off the downstream job(s).
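The first variant, where the manager does the counting and signals the job once, could be sketched as follows. `JobTracker` and the `notify` callback are hypothetical. How the signal actually reaches the job script (a Unix signal, a flag file, a message) is left open, as in the text.

```python
from typing import Callable, Iterable


class JobTracker:
    """Tracks one job's files and signals the job exactly once."""

    def __init__(self, job: str, expected: Iterable[str],
                 notify: Callable[[str, bool], None]):
        self.job = job
        self._pending = set(expected)
        self._notify = notify      # notify(job, success)
        self._done = False

    def report(self, filename: str, ok: bool) -> None:
        """Record the outcome of one file transfer."""
        if self._done or filename not in self._pending:
            return                 # already signalled, or not one of our files
        if not ok:
            self._done = True
            self._notify(self.job, False)   # first failure fails the job
            return
        self._pending.discard(filename)
        if not self._pending:
            self._done = True
            self._notify(self.job, True)    # last file arrived: wake the job
```

In the second variant the job script would hold the pending set itself and the manager would simply forward each per-file result.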

When there is a software upgrade, only the manager is affected. The upgrade is completely transparent to the DW job scripts.

Furthermore, the manager can be enhanced to classify requests and distribute them to multiple queues. Requests from VIP DW job scripts can go to high-priority queues, which are always served first. Requests from job scripts that can wait go to lower-priority queues. Different queues could also use different third-party file transfer software. For instance, some queues could use software with a secured channel, which gives a higher security level at the cost of possibly slower performance.
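One way to picture the multi-queue idea is the small router below. It is a sketch only: `QueueRouter`, the queue names, the backend names and the `classify` function are assumptions made for illustration.

```python
from collections import deque


class QueueRouter:
    def __init__(self, queues, classify):
        # queues: list of (queue_name, backend_name), highest priority first
        self._order = [name for name, _ in queues]
        self._backend = dict(queues)
        self._pending = {name: deque() for name in self._order}
        self._classify = classify        # maps a job name to a queue name

    def submit(self, job, filename):
        """Place the request in the queue chosen by the classifier."""
        self._pending[self._classify(job)].append((job, filename))

    def next_request(self):
        """Return (backend, job, filename) from the highest non-empty queue."""
        for name in self._order:
            if self._pending[name]:
                job, filename = self._pending[name].popleft()
                return self._backend[name], job, filename
        return None                      # nothing is waiting
```

Serving queues in strict priority order keeps the VIP queue responsive, although a busy VIP queue could starve the lower ones. Whether that trade-off is acceptable depends on the batch windows involved.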

By the way, although this article talks about file downloading, everything applies to uploading as well. In other words, the file transfer request manager can handle both downloading (receiving, pulling) and uploading (sending, pushing). The manager can have both in-queues and out-queues.

