I will soon need to scale a video-processing application that runs on EC2.
So far the setup is one machine:
Backbone.js frontend
Rails 3.2
PostgreSQL
Resque
+
S3 for storage
The flow of the app is as follows:
1) Request from frontend. Upload a video.
2) Storing video
3) Querying external APIs.
4) Processing / encoding videos.
5) Post to frontend.
I can separate the backend and frontend without any problems, but when it comes to distributing the backend across several servers I am a bit puzzled. I can probably come up with a temporary solution (like just duplicating the app across several instances), but since I don't really have expertise in backend system administration, I may be making some fundamental mistakes. I would also rather have something that is scalable. I wonder if anyone can give some feedback on the following plan:
A) Frontend machine. Just the frontend, talks to the backend via a REST API of sorts.
B) Backend server (BS), main database. Gets requests from 1), posts to 2), saves uploads to 3).
C) S3 storage.
D) Server for querying APIs. Basically just Resque workers that post info back to 2).
E) Server for video encoding. Processes videos uploaded on 3) and uploads them back.
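To make the D) / E) split concrete, here is a minimal sketch of how I imagine it with Resque-style jobs, where each job class names its own queue and each server only works its own queue. It is plain Ruby with no gems: `FakeQueue` is a hypothetical in-memory stand-in for the Redis-backed queue, and the queue names (`:api_queries`, `:video_encoding`), class names, and return values are all my assumptions, not working code from the app.

```ruby
# Resque convention: a job is a class with a @queue instance variable
# and a self.perform class method. The bodies here are placeholders.
class QueryExternalApis
  @queue = :api_queries # assumed queue name, worked only by server D

  def self.queue
    @queue
  end

  def self.perform(video_id)
    # Placeholder for the real external API calls.
    { "video_id" => video_id, "preset" => "720p" }
  end
end

class EncodeVideo
  @queue = :video_encoding # assumed queue name, worked only by server E

  def self.queue
    @queue
  end

  def self.perform(video_id, preset)
    # Placeholder for the real encoding step and the upload back to S3.
    "encoded/#{video_id}-#{preset}.mp4"
  end
end

# Hypothetical in-memory stand-in for the shared queue (Redis in Resque).
class FakeQueue
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def enqueue(job_class, *args)
    @queues[job_class.queue] << [job_class, args]
  end

  def size(name)
    @queues[name].size
  end

  # Runs the next job from one named queue only, the way a worker
  # started for a single queue would. Returns nil if the queue is empty.
  def work_one(name)
    job_class, args = @queues[name].shift
    return nil if job_class.nil?
    job_class.perform(*args)
  end
end

queue = FakeQueue.new
queue.enqueue(QueryExternalApis, 42)
api_result = queue.work_one(:api_queries)          # would run on server D
queue.enqueue(EncodeVideo, api_result["video_id"], api_result["preset"])
puts queue.work_one(:video_encoding)               # would run on server E
```

With real Resque, I assume the equivalent would be starting the workers on E) with something like `QUEUE=video_encoding rake resque:work`, so that machine never picks up API jobs.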
So I will have:
        A)frontend
             \
              \
        B)MAIN_APP/DB ----- C)S3 Storage (Files)
          /        \          /
         /          \        /
D)ExternalAPI_queries   E)Video_Processing
   (redundant DB)         (redundant DB)
All this will supposedly talk to each other via HTTP requests.
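As a rough illustration of the HTTP part, this is how I picture a worker on D) posting its result back to the main app on B), using only Ruby's standard library. The host `backend.internal`, the port, and the `/videos/:id/api_results` path are hypothetical; no such endpoint exists in the app yet.

```ruby
require "net/http"
require "json"
require "uri"

# Builds (but does not send) the POST a worker would make to the main app.
# The URL layout is an assumption made for this sketch.
def build_result_callback(base_url, video_id, result)
  uri = URI.join(base_url, "/videos/#{video_id}/api_results")
  request = Net::HTTP::Post.new(uri.request_uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(result)
  [uri, request]
end

# Sends the request; this needs the main app to be reachable.
def post_result(base_url, video_id, result)
  uri, request = build_result_callback(base_url, video_id, result)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(request)
  end
end
```

A worker would then call something like `post_result("http://backend.internal:3000", 42, { "preset" => "720p" })` at the end of its job.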
My reason for this is that the video processing part is really the most resource-intensive, so on that server I would just run a barebones application that accepts requests and starts processing them.
Questions:
1) In this setup I will have the main database at B), and all other servers will communicate with it via HTTP requests (and, I guess, also store duplicate copies of the database for safety). Is this the right approach, or should I have one database that everyone connects to (and if so, how)?
2) Is it a good idea to separate the API queries from the video processing part? Logically they are very close (the processing is determined by the result of the API queries), but resource-wise video processing is way more intensive.
3) What should I use to distribute calls between backend apps based on load?