Large scale file replication with an option to "unsubscribe" from a replicated file on a given machine

Posted by Alexander Gladysh on Server Fault
Published on 2013-06-29T21:38:30Z Indexed on 2013/06/29 22:22 UTC

Filed under: ubuntu | replication | big-data

I have 100+ GB of files arriving per day on one machine. (The size of individual files is arbitrary and can be adjusted as needed.)

I have several other machines that do some work on these files.

I need to reliably deliver each incoming file to the worker machines. A worker machine should be able to free the HDD space a file occupies once it is done working with it.

Preferably, a file would be uploaded to each worker only once, processed in place, and then deleted, without being copied anywhere else, to minimize the already high HDD load. (The worker itself requires quite a bit of bandwidth.)

Please advise a solution that is not based on Java. None of the existing replication solutions that I've seen can do the "free the HDD from the file once processed" part, but maybe I'm missing something...

A preferable solution should work with files (from the POV of our business logic code) and not require the business logic to connect to some queue or similar. (Internally, the solution may use whatever technology it needs, except Java.)
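To make the desired file-based contract concrete, here is a minimal sketch in Python of what the worker side could look like. It is an illustration of the requirement only, not a proposed replication tool: `deliver` is a hypothetical local stand-in for whatever transfer mechanism is used, and the `inbox` directory layout, the `.incoming-` prefix, and both function names are assumptions. The only idea it shows is "write under a temporary name, rename into place, process in place, delete".

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def deliver(data: bytes, inbox: Path, name: str) -> Path:
    """Hypothetical stand-in for the transfer step.

    The file is written under a hidden temporary name in the inbox and
    then renamed, so the worker never sees a half-written file.
    """
    inbox.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=inbox, prefix=".incoming-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    final = inbox / name
    # A rename within one filesystem is atomic on POSIX systems.
    os.replace(tmp, final)
    return final


def process_inbox(inbox: Path, handle: Callable[[Path], None]) -> List[str]:
    """Process every completed file in place, then delete it.

    `handle` is the business logic; it only ever receives a file path.
    A file is deleted only if `handle` returns without raising.
    """
    done = []
    for path in sorted(inbox.iterdir()):
        # Skip hidden temporary files that are still being written.
        if path.name.startswith(".") or not path.is_file():
            continue
        handle(path)   # work on the file where it landed, no extra copy
        path.unlink()  # free the disk space once processing is finished
        done.append(path.name)
    return done
```

In this sketch the business logic sees nothing but paths in a directory, and each file is written to the worker's disk exactly once. Reliable delivery across machines, retries, and tracking which worker has released which file are deliberately left out.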

