distributed filesystems - Page 2

Helper library for distributed algorithms programming?

- by anon

When you code a distributed algorithm, do you use any library to model abstract things like processor, register, message, link, etc.? Is there any library that does that? I'm thinking about e.g. self-stabilizing algorithms, like self-stabilizing minimum spanning-tree algorithms.

Read the article

Distributed transactions

- by javi

Hello! I've a question regarding distributed transactions. Let's assume I have 3 transaction programs: Transaction A begin a=read(A) b=read(B) c=a+b write(C,c) commit Transaction B begin a=read(A) a=a+1 write(A,a) commit Transaction C begin c=read(C) c=c*2 write(A,c) commit So there are 5 pairs of critical operations: C2-A5, A2-B4, B4-C4, B2-C4, A2-C4. I should ensure integrity and confidentiality, do you have any idea of how to achieve it? Thank you in advance!

Read the article

What are the functionalities of Distributed File systems and Distributed Storage Systems?

- by Berkay

i'm reading cloud vendors solutions for the distributed storage systems such as Amazon Dynamo and Google Big Table. and really confused in two terms : what is Distrubuted file systems for in cloud ? what is Distributed storage systems for? what are differences of these terms and functionalities ? if i understand these terms i will create the general architecture of the cloud vendors, any good tutorial or web page will be appreciated. Thanks

Read the article

Distributed file systems

- by Neeraj

I need to implement a distributed storage system for a set of nodes(devices) connected in a mesh network. So what basically my design goals are: The storage system should be capable of handling dynamic entry and exit of nodes. Replication (for fault tolerance). For this i am thinking of using a Distributed file system. Every node could access data in the other nodes in a transparent manner. Are there some simple, easily pluggable opensource implementations? Thanks for your thoughts!

Read the article

Distributed filesystem for automated offline data mirroring

- by Petr Pudlák

I'd like to achieve the following setup: Every time I connect my laptop to a local network, my partition gets automatically mirrored to a partition on my local server. I only want to mirror what has changed from the last time. (I understand that it is not a proper backup solution since there is no history of the changes, it'd be more like a non-persistent network RAID.) Is there a distributed file system that allows such a setup? I've done some searching and it seems to me that most distributed file-systems are focused on data availability and distribution, not duplicating them. I'd be thankful for suggestions. Edit: Sorry, I forgot to mention: I'm using Linux.

Read the article

Optimistic work sharing on sparsely distributed systems

- by Asti

What would a system like BOINC look like if it were written today? At the time BOINC was written, databases were the primary choice for maintaining a shared state and concurrency among nodes. Since then, many approaches have been developed for tasking with optimistic concurrency (OT, partial synchronization primitives, shared iterators etc.) Is there an optimal paradigm for optimistically distributing units of work on sparsely distributing systems which communicate through message passing? Sorry if this is a bit vague. P.S. The concept of Tuple-spaces is great, but locking is inherent to its definition. Edit: I already have a federation system which works very well. I have a reactive OT system is implemented on top of it. I'm looking to extend it to get clients to do units of work.

Read the article

Distributed Transaction Framework across webservices

- by John Petrak

I am designing a new system that has one central web service and several site web services which are spread across the country and some overseas. It has some data that must be identical on all sites. So my plan is to maintain that data in the central web service and then "sync" the data to sites. This includes inserts, edits and deletes. I see a problem when deleting, if one site has used the record, then I need to undo the delete that has happened on the other servers. This lead me to idea that I need some sort of transaction system that can work across different web servers. Before I design one from scratch, I would like to know if anyone has come across this sort of problem and if there are any frame works or even design patterns that might aid me?

Read the article

Interconnect nodes in a Java distributed infrastructure for tweet processing

- by David Moreno García

I'm working in a new version of an old project that I used to download and process user statuses from Twitter. The main problem of that project was its infrastructure. I used multiple instances of a java application (trackers) to download from Twitter given an specific task (basically terms to search for), connected with a central node (a web application) that had to process all tweets once per day and generate a new task for each trackers once each 15 minutes. The central node also had to monitor all trackers and enable/disable them under user petition. This, as I said, was too slow because I had multiple bottlenecks, so in this new version I want to improve the infrastructure and isolate all functionalities in specific nodes. I also need a good notification system to receive notifications for any node. So, in the next diagram I show the components that I'll need in this new version: As you can see, there are more nodes. Here are some notes about them: Dashboard: Controls trackers statuses and send a single task to each of them (under user request). The trackers will use this task until replaced with a new one (if done, not each 15 minutes like before). Search engine: I need to store all the tweets. They are firstly stored in a local database for each tracker but after that I'm thinking on using something like Elasticsearch to be able to do fast searches. Tweet processor: Just and isolated component with its own database (maybe something like the search engine to have fast access to info generated by the module). In the future more could be added. Application UI: A web application with a shared database with the Dashboard (mainly to store users information and preferences). Indeed, both could be merged into a single web. The main difference with the previous version of the project is that now they will be isolated and they will only show information and send requests. I will not do any heavy task in them (like process tweets as I did before). So, having this components, my main headache is how to structure all to not have to rewrite a lot of code every time I need to access any new data. Another headache is how can I interconnect nodes. I could use sockets but that is a pain in the ass. Maybe a REST layer? And finally, if all the nodes are isolated, how could I generate notifications for each user which info is only in the database used by the Application UI? I'm programming this using Java and Spring (at least I used them in the last version) but I have no problems with changing the language if I can take advantage of a tool/library/engine to make my life easier and have a better platform. Any comment will be appreciated.

Read the article

I've totally missed the point of distributed vcs [closed]

- by NimChimpsky

I thought the major benefit of it was that each developers code gets stored within each others repository. My impression was that each developer has their working directory, their own repository, and then a copy of the other developers repository. Removing the need for central server, as you have as many backups as you have developers/repositories Turns out this is nto the case, and your code is only backed up (somewhere other than locally) when you push, the same as a commit in subversions. I am bit disappointed ... hopefully I will be pleasantly surprised when it handles merges better and there are less conflicts ?

Read the article

Distributed filesystem across a slow link

- by Jeff Ferland

I have an image in my head where a link is too slow to realize the real-time transfer of files, but fast enough to catch up every day. What I'd like to see is a master <- master setup where when I write a file to Server A, the metadata will transfer to Server B immediately and the file will transfer at idle or immediately when Server B's client tries to read the file before Server A has sent it. It seems that there are many filesystems which can perform well over fast links, but I don't know of any that do well with a big bottle neck and a few hours of latency.

Read the article

Building a Redundant / Distrubuted Application

- by MattW

This is more of a "point me in the right direction" question. I (and my team of 3) have built a hosted web app that queues and routes customer chat requests to available customer service agents (It does other things as well, but this is enough background to illustrate the issue). The basic dev architecture today is: a single page ajax web UI (ASP.NET MVC) with floating chat windows (think Gmail) a backend Windows service to queue and route the chat requests this service also logs the chats, calculates service levels, etc a Comet server product that routes data between the web frontend and the backend Windows service this also helps us detect which Agents are still connected (online) And our hardware architecture today is: 2 servers to host the web UI portion of the application a load balancer to route requests to the 2 different web app servers a third server to host the SQL Server DB and the backend Windows service responsible for queuing / delivering chats So as it stands today, one of the web app servers could go down and we would be ok. However, if something would happen to the SQL Server / Windows Service server we would be boned. My question - how can I make this backend Windows service logic be able to be spread across multiple machines (distributed)? The Windows service is written to accept requests from the Comet server, check for available Agents, and route the chat to those agents. How can I make this more distributed? How can I make it so that I can distribute the work of the backend Windows service can be spread across multiple machines for redundancy and uptime purposes? Will I need to re-write it with distributed computing in mind? I should also note that I am hosting all of this on Rackspace Cloud instances - so maybe it is something I should be less concerned about? Thanks in advance for any help!

Read the article

Geographically distributed file system with preferred locality

- by dpb

Hi All -- I'm building a application that needs to distribute a standard file server across a few sites over a WAN. Basically, each site needs to write a lot of misc files of varying size (some in the 100s MB range, but most small), and the application is written such that collisions aren't a problem. I'd like to have a system set up that meets the following qualifications: Each site can store files in a shared "namespace". That is, all the files would show up in the same filesystem. Each site would not send data over the WAN unless necessary. I.e., there would be local storage on each side of the WAN that would be "merged" into the same logical filesystem. Linux & Free ($$$) is a must. Basically, something like a central NFS share would meet most of the requirements, however it would not allow the locally written data to stay local. All data from remote sides of the WAN would be copied locally all the time. I have looked into Lustre, and have run some successful tests with it, however, it appears to distribute files fairly uniformly across the distributed storage. I have dug through the documentation and have not found anything that automatically will "prefer" local storage over remote storage. Even something that went with the lowest latency storage would be fine. It would work most of the time, which would meet this application's requirements. Any ideas?

Read the article

Distributed File Systems.

- by GruffTech

So, I've been reading several articles around ServerFault as well as google. (For Example, this link) My Requirements are very similar to the link above, however i'd like to also have dynamic or at least resizeable file volumes, so if necessary i can add 4-5 servers to the pool, and then expand the volume. Any Distributed File systems that support that, to save me some time? Thanks! LustreFS will be my next test cluster to build. GlusterFS I've build a 3-machine test GlusterFS cluster, However i quickly became aware of several of its limitations that it doesn't seem to make clearly public. One, i can't seem to resize a volume. Once a volume is created, its done. Which seems retarded, why have a fully scalable file system if i can't scale a volume? So maybe i'm doing something wrong. I'm not sure. AmazonS3 while gives the cheapest startup adds too much cost when broken down to per client per month, so its out. Building my own system when prorated over several years with no bandwidth costs makes it significantly cheaper. MogileFS isn't an option as we'd like this server to be a SAN-Replacement, for storing tons of media from a multitude of systems, which for us means it needs to be POSIX compliant so it can be remotely mounted via NFS or CIFS.

Read the article

open source gossip-based membership protocol?

- by Aaron

I am looking for a library which I can plug into a distributed application which implements any gossip-based membership protocol. Such a library would allow me to send/receive membership lists, merge received membership lists, etc... Even better would be if the library implemented a protocol with performance O(logn) performance guarantees. Does anyone know of any open source library like this? It doesn't need to meet all of the aforementioned requirements; even something partially implemented would be helpful.

Read the article

Cross-platform distributed fault-tolerant (disconnected operation/local cache) filesystem

- by Adrian Frühwirth

We are facing a design "challenge" where we are required to set up a storage solution with the following properties: What we need HA a scalable storage backend offline/disconnected operation on the client to account for network outages cross-platform access client-side access from certainly Windows (probably XP upwards), possibly Linux backend integrates with AD/LDAP (permission management (user/group management, ...)) should work reasonably well over slow WAN-links Another problem is that we don't really know all possible use cases here, if people need to be able to have concurrent access to shared files or if they will only be accessing their own files, so a possible solution needs to account for concurrent access and how conflict management would look in this case from a user's point of view. This two years old blog posts sums up the impression that I have been getting during the last couple of days of research, that there are lots of current übercool projects implementing (non-Windows) clustered petabyte-capable blob-storage solutions but that there is none that supports disconnected operation nicely and natively, but I am hoping that we have missed an obvious solution. What we have tried OpenAFS We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it: it's a real pain to set up there are no official RHEL/CentOS packages the package of the current stable version 1.6.5.1 from elrepo randomly kernel panics on fresh installs, this is an absolute no-go Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8, the current client for the 1.7 does but it just randomly crashes. After that experience we didn't even bother testing on XP and Windows 7. Suffice to say, we couldn't get it working and the whole setup has been so unstable and complicated to setup that it's just not an option for production. Samba + Unison Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages: Samba integrates with ADs; it's a pain but can be done. Samba solves the problem of remotely accessing the storage from Windows but introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but that means we need a HA Samba setup on top of that to maintain HA which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before and I could not silently failover between servers. Even when online, you are working with local files which will result in more conflicts than would be necessary if a local cache were only touched when disconnected It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows task scheduler, but you cannot really do it in a satisfactory way. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well or even at all. Samba + "Offline Files" After that we became a little desparate and gave Windows "offline files" a chance. We figured that having something that is inbuilt into the OS would reduce administrative efforts, helps blaming someone else when it's not working properly and should just work since people have been using this for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with (silent! there is only a tiny notification in Windows explorer in the statusbar, which doesn't even open Sync Center if you click on it!) undeletable files on the server (!) and conflicts that should not even be conflicts. In the end, we had one successful sync of a tiny text file, everything else just exploded horribly. Beyond that, there are other problems: Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all which would mean those files become unavailable if the connection drop In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions. Summary Unless there is another fault-tolerant DFS that supports Windows natively I assume that stacking a HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies allow fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?

Read the article

GlusterFS vs Ceph, which is better for production use for the moment?

- by Mickey Shine

I am evaluating GlusterFS and Ceph, seems Gluster is FUSE based which means it may be not as fast as Ceph. But looks like Gluster got a very friendly control panel and is ease to use. Ceph was merged into linux kernel a few days ago and this indicates that it has much more potential energy and may be a good choice in the future. I am wondering which(even out of the two?) is a better choice for production use? It would be nice if you could share your practical experiences

Read the article

How stable is POHMELFS?

- by Ztyx

I am currently looking into POHMELFS because of its ability to scale reads. Does anyone have it in production and could tell me how stable it is?

Read the article

Distributed website server redundancy

- by Keith Lion

Assume a website infrastructure is very complicated and is fully distributed (probably like most large web companies). Am I right in thinking that although there are all these extra web servers to handle multiple client requests, there is still a single "machine" whereby users must enter? I am guessing this machine will be the one physically associated to the IP address? I ask because I need to know whether, in places where distributed systems exist, there is still a single point of failure- usually the control node or, in this example, the machine connected to the public internet? Surely there cannot be two machines connected to the internet, as they would have to have different IP addresses? This "machine" may not be a server per se, but maybe it is a piece of cisco equipment. I just need to know whether, in the real world, these distributed systems still have a particular section where they depend on the integrity of one electronic device?

Read the article

How do filesystems handle concurrent read/write?

- by kemp

User A asks the system to read file foo and at the same time user B wants to save his or her data onto the same file. How is this situation handled on the filesystem level?

Read the article

Distributed Database Services?

- by Cameron

I'm working on a database-driven web service with clients in the US and Australia. We're currently hosted in the US, however our Australian clients are experiencing lag. The lag is primarily due to the fact that the pages launch AJAX queries which require some db work to be done on our database in the US and these take a while to perform a round trip. Ideally, we're looking for some kind of distributed database system which replicates our main US database in Australia (and possibly other locations if we choose to expand later on). Does anyone have any suggestions for services which offer something like this? Something like a CDN (CacheFly etc), which is web-based, simple to set up etc but for databases instead of static files. Ideally it would be completely transparent to the application and abstract away all the distributed database management, syncs etc.

Read the article

Do any filesystems support multiple forks / streams on directories?

- by hippietrail

Apple's HFS+ supports multiple forks such as the old data and resource forks. NTFS supports alternate data streams. I believe some *nix filesystems also have some support for multiple file forks or streams. Given that directories (folders) are just a kind of file at the filesystem level, I'm wondering if any of the filesystems which support this feature support it for dirs as well as files? (Or indeed directories in the alternate forks / streams?) I'm mostly asking out of curiosity rather than wanting to use such a feature. But one use it would have would be additional metadata for directories, which seems to be the most common use for these streams for files currently.

Read the article

Report: Linux 2.6.34 Kernel Debuts With New Filesystems

The newest Linux kernel release continues its rapid development pace, adding a pair of new filesystems and boosting virtualization performance.

Read the article

distributed, fault-tolerant network block device

- by gucki

I'm looking for a distributed, fault-tolerant network storage system which exposes block devices (not filesystems) on the clients. A client's block device should write simultaneously to several storage nodes A client's block device should not fail as long as not all storage nodes backing it went down The master should automatically redistribute storages' data when a storage node fails or gets added/ removed A single master (which is for metadata only) is fine So ideally the architecture would be very similar to moosefs (http://www.moosefs.org/) but instead of exposing a real filesystem mounted using a fuse client it'd expose block devices on the clients. I know of iscsi and drbd but both don't seem to offer what I'm looking for. Or am I missing something?

Read the article

Experience with MooseFS?

- by brown.2179

Anyone have any experience using MooseFS? I want an easy distributed storage platform to store static data archive of about 10 TB and serve it to 20-40 nodes. Also I want to be able to add storage as the archive grows without having to rebuild the filesystem. I don't care if it's a bit slow. I just want it to be simple and stable. Basically from what I can see for OS X it's between MooseFS and Gluster. Any other suggestions?

Read the article

Experience with MooseFS?

- by brown.2179

Anyone have any experience using MooseFS? I want an easy distributed storage platform to store static data archive of about 10 TB and serve it to 20-40 nodes. Also I want to be able to add storage as the archive grows without having to rebuild the filesystem. I don't care if it's a bit slow. I just want it to be simple and stable. Basically from what I can see for OS X it's between MooseFS and Gluster. Any other suggestions?

Search Results

Search found 2515 results on 101 pages for 'distributed filesystems'.

Page 2/101 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by anon

- by javi

- by Berkay

- by Neeraj

- by Petr Pudlák

- by Asti

- by John Petrak

- by David Moreno García

- by NimChimpsky

- by Jeff Ferland

- by MattW

- by dpb

- by GruffTech

- by Aaron

- by Adrian Frühwirth

- by Mickey Shine

- by Ztyx

- by Keith Lion

- by kemp

- by Cameron

- by hippietrail

- by gucki

- by brown.2179

- by brown.2179

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >