Search Results

Search found 3409 results on 137 pages for 'distributed computing'.

Page 4/137 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Use a custom value object or a Guid as an entity identifier in a distributed system?

    - by Kazark
    tl;dr I've been told that in domain-driven design, an identifier for an entity could be a custom value object, i.e. something other than Guid, string, int, etc. Can this really be advisable in a distributed system? Long version I will invent an situation analogous to the one I am currently facing. Say I have a distributed system in which a central concept is an egg. The system allows you to order eggs and see spending reports and inventory-centric data such as quantity on hand, usage, valuation and what have you. There area variety of services backing these behaviors. And say there is also another app which allows you to compose recipes that link to a particular egg type. Now egg type is broken down by the species—ostrich, goose, duck, chicken, quail. This is fine and dandy because it means that users don't end up with ostrich eggs when they wanted quail eggs and whatnot. However, we've been getting complaints because jumbo chicken eggs are not even close to equivalent to small ones. The price is different, and they really aren't substitutable in recipes. And here we thought we were doing users a favor by not overwhelming them with too many options. Currently each of the services (say, OrderSubmitter, EggTypeDefiner, SpendingReportsGenerator, InventoryTracker, RecipeCreator, RecipeTracker, or whatever) are identifying egg types with an industry-standard integer representation the species (let's call it speciesCode). We realize we've goofed up because this change could effect every service. There are two basic proposed solutions: Use a predefined identifier type like Guid as the eggTypeID throughout all the services, but make EggTypeDefiner the only service that knows that this maps to a speciesCode and eggSizeCode (and potentially to an isOrganic flag in the future, or whatever). Use an EggTypeID value object which is a combination of speciesCode and eggSizeCode in every service. I've proposed the first solution because I'm hoping it better encapsulates the definition of what an egg type is in the EggTypeDefiner and will be more resilient to changes, say if some people now want to differentiate eggs by whether or not they are "organic". The second solution is being suggested by some people who understand DDD better than I do in the hopes that less enrichment and lookup will be necessary that way, with the justification that in DDD using a value object as an ID is fine. Also, they are saying that EggTypeDefiner is not a domain and EggType is not an entity and as such should not have a Guid for an ID. However, I'm not sure the second solution is viable. This "value object" is going to have to be serialized into JSON and URLs for GET requests and used with a variety of technologies (C#, JavaScript...) which breaks encapsulation and thus removes any behavior of the identifier value object (is either of the fields optional? etc.) Is this a case where we want to avoid something that would normally be fine in DDD because we are trying to do DDD in a distributed fashion? Summary Can it be a good idea to use a custom value object as an identifier in a distributed system (solution #2)?

    Read the article

  • Configuration management in support of scientific computing

    - by Sharpie
    For the past few years I have been involved with developing and maintaining a system for forecasting near-shore waves. Our team has just received a significant grant for further development and as a result we are taking the opportunity to refactor many components of the old system. We will also be receiving a new server to run the model and so I am taking this opportunity to consider how we set up the system. Basically, the steps that need to happen are: Some standard packages and libraries such as compilers and databases need to be downloaded and installed. Some custom scientific models need to be downloaded and compiled from source as they are not commonly provided as packages. New users need to be created to manage the databases and run the models. A suite of scripts that manage model-database interaction needs to be checked out from source code control and installed. Crontabs need to be set up to run the scripts at regular intervals in order to generate forecasts. I have been pondering applying tools such as Puppet, Capistrano or Fabric to automate the above steps. It seems perfectly possible to implement most of the above functionality except there are a couple usage cases that I am wondering about: During my preliminary research, I have found few examples and little discussion on how to use these systems to abstract and automate the process of building custom components from source. We may have to deploy on machines that are isolated from the Internet- i.e. all configuration and set up files will have to come in on a USB key that can be inserted into a terminal that can connect to the server that will run the models. I see this as an opportunity to learn a new tool that will help me automate my workflow, but I am unsure which tool I should start with. If any member of the community could suggest a tool that would support the above workflow and the issues specific to scientific computing, I would be very grateful. Our production server will be running Linux, but support for OS X would be a bonus as it would allow the development team to setup test installations outside of VirtualBox.

    Read the article

  • Configuration management in support of scientific computing

    - by Sharpie
    For the past few years I have been involved with developing and maintaining a system for forecasting near-shore waves. Our team has just received a significant grant for further development and as a result we are taking the opportunity to refactor many components of the old system. We will also be receiving a new server to run the model and so I am taking this opportunity to consider how we set up the system. Basically, the steps that need to happen are: Some standard packages and libraries such as compilers and databases need to be downloaded and installed. Some custom scientific models need to be downloaded and compiled from source as they are not commonly provided as packages. New users need to be created to manage the databases and run the models. A suite of scripts that manage model-database interaction needs to be checked out from source code control and installed. Crontabs need to be set up to run the scripts at regular intervals in order to generate forecasts. I have been pondering applying tools such as Puppet, Capistrano or Fabric to automate the above steps. It seems perfectly possible to implement most of the above functionality except there are a couple usage cases that I am wondering about: During my preliminary research, I have found few examples and little discussion on how to use these systems to abstract and automate the process of building custom components from source. We may have to deploy on machines that are isolated from the Internet- i.e. all configuration and set up files will have to come in on a USB key that can be inserted into a terminal that can connect to the server that will run the models. I see this as an opportunity to learn a new tool that will help me automate my workflow, but I am unsure which tool I should start with. If any member of the community could suggest a tool that would support the above workflow and the issues specific to scientific computing, I would be very grateful. Our production server will be running Linux, but support for OS X would be a bonus as it would allow the development team to setup test installations outside of VirtualBox.

    Read the article

  • Geographically distributed file system with preferred locality

    - by dpb
    Hi All -- I'm building a application that needs to distribute a standard file server across a few sites over a WAN. Basically, each site needs to write a lot of misc files of varying size (some in the 100s MB range, but most small), and the application is written such that collisions aren't a problem. I'd like to have a system set up that meets the following qualifications: Each site can store files in a shared "namespace". That is, all the files would show up in the same filesystem. Each site would not send data over the WAN unless necessary. I.e., there would be local storage on each side of the WAN that would be "merged" into the same logical filesystem. Linux & Free ($$$) is a must. Basically, something like a central NFS share would meet most of the requirements, however it would not allow the locally written data to stay local. All data from remote sides of the WAN would be copied locally all the time. I have looked into Lustre, and have run some successful tests with it, however, it appears to distribute files fairly uniformly across the distributed storage. I have dug through the documentation and have not found anything that automatically will "prefer" local storage over remote storage. Even something that went with the lowest latency storage would be fine. It would work most of the time, which would meet this application's requirements. Any ideas?

    Read the article

  • Distributed File Systems.

    - by GruffTech
    So, I've been reading several articles around ServerFault as well as google. (For Example, this link) My Requirements are very similar to the link above, however i'd like to also have dynamic or at least resizeable file volumes, so if necessary i can add 4-5 servers to the pool, and then expand the volume. Any Distributed File systems that support that, to save me some time? Thanks! LustreFS will be my next test cluster to build. GlusterFS I've build a 3-machine test GlusterFS cluster, However i quickly became aware of several of its limitations that it doesn't seem to make clearly public. One, i can't seem to resize a volume. Once a volume is created, its done. Which seems retarded, why have a fully scalable file system if i can't scale a volume? So maybe i'm doing something wrong. I'm not sure. AmazonS3 while gives the cheapest startup adds too much cost when broken down to per client per month, so its out. Building my own system when prorated over several years with no bandwidth costs makes it significantly cheaper. MogileFS isn't an option as we'd like this server to be a SAN-Replacement, for storing tons of media from a multitude of systems, which for us means it needs to be POSIX compliant so it can be remotely mounted via NFS or CIFS.

    Read the article

  • Using Openfire for distributed XMPP-based video-chat

    - by Yitzhak
    I have been tasked with setting up a distributed video-chat system built on XMPP. Currently my setup looks like this: Openfire (XMPP server) + JingleNodes plugin for video chat OpenLDAP (LDAP server) for storing user information and allowing directory queries Kerberos server for authentication and passwords In testing with one set of machines (i.e. only three), everything works as expected: I can log in to Openfire and it looks up the user information in the OpenLDAP database, which in turn authenticates my user with Kerberos. Now, I want to have several clusters, so that there is a cluster on each continent. A typical cluster will probably contain 2-5 servers. Users logging in will be directed to the closest cluster based on geographical location. Something that concerns me particularly is the dynamic maintenance of contact lists. If a user is using a machine in Asia, for example, how would contact lists be updated around the world to reflect the current server he is using? How would that work with LDAP? Specific questions: How do I direct users based on geographical location? What is the best architecture for a cluster? -- would all traffic need to come into a load-balancer on each one, for example? How do I manage the update of contact lists across all these servers? In general, how do I go about setting this up? What are the pitfalls in doing this? I am inexperienced in this area, so any advice and suggestions would be appreciated.

    Read the article

  • Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service.  Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia:   Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence  they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Cloud Computing Forces Better Design Practices

    - by Herve Roggero
    Is cloud computing simply different than on premise development, or is cloud computing actually forcing you to create better applications than you normally would? In other words, is cloud computing merely imposing different design principles, or forcing better design principles?  A little while back I got into a discussion with a developer in which I was arguing that cloud computing, and specifically Windows Azure in his case, was forcing developers to adopt better design principles. His opinion was that cloud computing was not yielding better systems; just different systems. In this blog, I will argue that cloud computing does force developers to use better design practices, and hence better applications. So the first thing to define, of course, is the word “better”, in the context of application development. Looking at a few definitions online, better means “superior quality”. As it relates to this discussion then, I stipulate that cloud computing can yield higher quality applications in terms of scalability, everything else being equal. Before going further I need to also outline the difference between performance and scalability. Performance and scalability are two related concepts, but they don’t mean the same thing. Scalability is the measure of system performance given various loads. So when developers design for performance, they usually give higher priority to a given load and tend to optimize for the given load. When developers design for scalability, the actual performance at a given load is not as important; the ability to ensure reasonable performance regardless of the load becomes the objective. This can lead to very different design choices. For example, if your objective is to obtains the fastest response time possible for a service you are building, you may choose the implement a TCP connection that never closes until the client chooses to close the connection (in other words, a tightly coupled service from a connectivity standpoint), and on which a connection session is established for faster processing on the next request (like SQL Server or other database systems for example). If you objective is to scale, you may implement a service that answers to requests without keeping session state, so that server resources are released as quickly as possible, like a REST service for example. This alternate design would likely have a slower response time than the TCP service for any given load, but would continue to function at very large loads because of its inherently loosely coupled design. An example of a REST service is the NO-SQL implementation in the Microsoft cloud called Azure Tables. Now, back to cloud computing… Cloud computing is designed to help you scale your applications, specifically when you use Platform as a Service (PaaS) offerings. However it’s not automatic. You can design a tightly-coupled TCP service as discussed above, and as you can imagine, it probably won’t scale even if you place the service in the cloud because it isn’t using a connection pattern that will allow it to scale [note: I am not implying that all TCP systems do not scale; I am just illustrating the scalability concepts with an imaginary TCP service that isn’t designed to scale for the purpose of this discussion]. The other service, using REST, will have a better chance to scale because, by design, it minimizes resource consumption for individual requests and doesn’t tie a client connection to a specific endpoint (which means you can easily deploy this service to hundreds of machines without much trouble, as long as your pockets are deep enough). The TCP and REST services discussed above are both valid designs; the TCP service is faster and the REST service scales better. So is it fair to say that one service is fundamentally better than the other? No; not unless you need to scale. And if you don’t need to scale, then you don’t need the cloud in the first place. However, it is interesting to note that if you do need to scale, then a loosely coupled system becomes a better design because it can almost always scale better than a tightly-coupled system. And because most applications grow overtime, with an increasing user base, new functional requirements, increased data and so forth, most applications eventually do need to scale. So in my humble opinion, I conclude that a loosely coupled system is not just different than a tightly coupled system; it is a better design, because it will stand the test of time. And in my book, if a system stands the test of time better than another, it is of superior quality. Because cloud computing demands loosely coupled systems so that its underlying service architecture can be leveraged, developers ultimately have no choice but to design loosely coupled systems for the cloud. And because loosely coupled systems are better… … the cloud forces better design practices. My 2 cents.

    Read the article

  • Cloud Computing = Elasticity * Availability

    - by Herve Roggero
    What is cloud computing? Is hosting the same thing as cloud computing? Are you running a cloud if you already use virtual machines? What is the difference between Infrastructure as a Service (IaaS) and a cloud provider? And the list goes on… these questions keep coming up and all try to fundamentally explain what “cloud” means relative to other concepts. At the risk of over simplification, answering these questions becomes simpler once you understand the primary foundations of cloud computing: Elasticity and Availability.   Elasticity The basic value proposition of cloud computing is to pay as you go, and to pay for what you use. This implies that an application can expand and contract on demand, across all its tiers (presentation layer, services, database, security…).  This also implies that application components can grow independently from each other. So if you need more storage for your database, you should be able to grow that tier without affecting, reconfiguring or changing the other tiers. Basically, cloud applications behave like a sponge; when you add water to a sponge, it grows in size; in the application world, the more customers you add, the more it grows. Pure IaaS providers will provide certain benefits, specifically in terms of operating costs, but an IaaS provider will not help you in making your applications elastic; neither will Virtual Machines. The smallest elasticity unit of an IaaS provider and a Virtual Machine environment is a server (physical or virtual). While adding servers in a datacenter helps in achieving scale, it is hardly enough. The application has yet to use this hardware.  If the process of adding computing resources is not transparent to the application, the application is not elastic.   As you can see from the above description, designing for the cloud is not about more servers; it is about designing an application for elasticity regardless of the underlying server farm.   Availability The fact of the matter is that making applications highly available is hard. It requires highly specialized tools and trained staff. On top of it, it's expensive. Many companies are required to run multiple data centers due to high availability requirements. In some organizations, some data centers are simply on standby, waiting to be used in a case of a failover. Other organizations are able to achieve a certain level of success with active/active data centers, in which all available data centers serve incoming user requests. While achieving high availability for services is relatively simple, establishing a highly available database farm is far more complex. In fact it is so complex that many companies establish yearly tests to validate failover procedures.   To a certain degree certain IaaS provides can assist with complex disaster recovery planning and setting up data centers that can achieve successful failover. However the burden is still on the corporation to manage and maintain such an environment, including regular hardware and software upgrades. Cloud computing on the other hand removes most of the disaster recovery requirements by hiding many of the underlying complexities.   Cloud Providers A cloud provider is an infrastructure provider offering additional tools to achieve application elasticity and availability that are not usually available on-premise. For example Microsoft Azure provides a simple configuration screen that makes it possible to run 1 or 100 web sites by clicking a button or two on a screen (simplifying provisioning), and soon SQL Azure will offer Data Federation to allow database sharding (which allows you to scale the database tier seamlessly and automatically). Other cloud providers offer certain features that are not available on-premise as well, such as the Amazon SC3 (Simple Storage Service) which gives you virtually unlimited storage capabilities for simple data stores, which is somewhat equivalent to the Microsoft Azure Table offering (offering a server-independent data storage model). Unlike IaaS providers, cloud providers give you the necessary tools to adopt elasticity as part of your application architecture.    Some cloud providers offer built-in high availability that get you out of the business of configuring clustered solutions, or running multiple data centers. Some cloud providers will give you more control (which puts some of that burden back on the customers' shoulder) and others will tend to make high availability totally transparent. For example, SQL Azure provides high availability automatically which would be very difficult to achieve (and very costly) on premise.   Keep in mind that each cloud provider has its strengths and weaknesses; some are better at achieving transparent scalability and server independence than others.    Not for Everyone Note however that it is up to you to leverage the elasticity capabilities of a cloud provider, as discussed previously; if you build a website that does not need to scale, for which elasticity is not important, then you can use a traditional host provider unless you also need high availability. Leveraging the technologies of cloud providers can be difficult and can become a journey for companies that build their solutions in a scale up fashion. Cloud computing promises to address cost containment and scalability of applications with built-in high availability. If your application does not need to scale or you do not need high availability, then cloud computing may not be for you. In fact, you may pay a premium to run your applications with cloud providers due to the underlying technologies built specifically for scalability and availability requirements. And as such, the cloud is not for everyone.   Consistent Customer Experience, Predictable Cost With all its complexities, buzz and foggy definition, cloud computing boils down to a simple objective: consistent customer experience at a predictable cost.  The objective of a cloud solution is to provide the same user experience to your last customer than the first, while keeping your operating costs directly proportional to the number of customers you have. Making your applications elastic and highly available across all its tiers, with as much automation as possible, achieves the first objective of a consistent customer experience. And the ability to expand and contract the infrastructure footprint of your application dynamically achieves the cost containment objectives.     Herve Roggero is a SQL Azure MVP and co-author of Pro SQL Azure (APress).  He is the co-founder of Blue Syntax Consulting (www.bluesyntax.net), a company focusing on cloud computing technologies helping customers understand and adopt cloud computing technologies. For more information contact herve at hroggero @ bluesyntax.net .

    Read the article

  • Distributed and/or Parallel SSIS processing

    - by Jeff
    Background: Our company hosts SaaS DSS applications, where clients provide us data Daily and/or Weekly, which we process & merge into their existing database. During business hours, load in the servers are pretty minimal as it's mostly users running simple pre-defined queries via the website, or running drill-through reports that mostly hit the SSAS OLAP cube. I manage the IT Operations Team, and so far this has presented an interesting "scaling" issue for us. For our daily-refreshed clients, the server is only "busy" for about 4-6 hrs at night. For our weekly-refresh clients, the server is only "busy" for maybe 8-10 hrs per week! We've done our best to use some simple methods of distributing the load by spreading the daily clients evenly among the servers such that we're not trying to process daily clients back-to-back over night. But long-term this scaling strategy creates two notable issues. First, it's going to consume a pretty immense amount of hardware that sits idle for large periods of time. Second, it takes significant Production Support over-head to basically "schedule" the ETL such that they don't over-lap, and move clients/schedules around if they out-grow the resources on a particular server or allocated time-slot. As the title would imply, one option we've tried is running multiple SSIS packages in parallel, but in most cases this has yielded VERY inconsistent results. The most common failures are DTExec, SQL, and SSAS fighting for physical memory and throwing out-of-memory errors, and ETLs running 3,4,5x longer than expected. So from my practical experience thus far, it seems like running multiple ETL packages on the same hardware isn't a good idea, but I can't be the first person that doesn't want to scale multiple ETLs around manual scheduling, and sequential processing. One option we've considered is virtualizing the servers, which obviously doesn't give you any additional resources, but moves the resource contention onto the hypervisor, which (from my experience) seems to manage simultaneous CPU/RAM/Disk I/O a little more gracefully than letting DTExec, SQL, and SSAS battle it out within Windows. Question to the forum: So my question to the forum is, are we missing something obvious here? Are there tools out there that can help manage running multiple SSIS packages on the same hardware? Would it be more "efficient" in terms of parallel execution if instead of running DTExec, SQL, and SSAS same machine (with every machine running that configuration), we run in pairs of three machines with SSIS running on one machine, SQL on another, and SSAS on a third? Obviously that would only make sense if we could process more than the three ETL we were able to process on the machine independently. Another option we've considered is completely re-architecting our SSIS package to have one "master" package for all clients that attempts to intelligently chose a server based off how "busy" it already is in terms of CPU/Memory/Disk utilization, but that would be a herculean effort, and seems like we're trying to reinvent something that you would think someone would sell (although I haven't had any luck finding it). So in summary, are we missing an obvious solution for this, and does anyone know if any tools (for free or for purchase, doesn't matter) that facilitate running multiple SSIS ETL packages in parallel and on multiple servers? (What I would call a "queue & node based" system, but that's not an official term). Ultimately VMWare's Distributed Resource Scheduler addresses this as you simply run a consistent number of clients per VM that you know will never conflict scheduleing-wise, then leave it up to VMWare to move the VMs around to balance out hardware usage. I'm definitely not against using VMWare to do this, but since we're a 100% Microsoft app stack, it seems like -someone- out there would have solved this problem at the application layer instead of the hypervisor layer by checking on resource utilization at the OS, SQL, SSAS levels. I'm open to ANY discussion on this, and remember no suggestion is too crazy or radical! :-) Right now, VMWare is the only option we've found to get away from "manually" balancing our resources, so any suggestions that leave us on a pure Microsoft stack would be great. Thanks guys, Jeff

    Read the article

  • Cross-platform distributed fault-tolerant (disconnected operation/local cache) filesystem

    - by Adrian Frühwirth
    We are facing a design "challenge" where we are required to set up a storage solution with the following properties: What we need HA a scalable storage backend offline/disconnected operation on the client to account for network outages cross-platform access client-side access from certainly Windows (probably XP upwards), possibly Linux backend integrates with AD/LDAP (permission management (user/group management, ...)) should work reasonably well over slow WAN-links Another problem is that we don't really know all possible use cases here, if people need to be able to have concurrent access to shared files or if they will only be accessing their own files, so a possible solution needs to account for concurrent access and how conflict management would look in this case from a user's point of view. This two years old blog posts sums up the impression that I have been getting during the last couple of days of research, that there are lots of current übercool projects implementing (non-Windows) clustered petabyte-capable blob-storage solutions but that there is none that supports disconnected operation nicely and natively, but I am hoping that we have missed an obvious solution. What we have tried OpenAFS We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it: it's a real pain to set up there are no official RHEL/CentOS packages the package of the current stable version 1.6.5.1 from elrepo randomly kernel panics on fresh installs, this is an absolute no-go Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8, the current client for the 1.7 does but it just randomly crashes. After that experience we didn't even bother testing on XP and Windows 7. Suffice to say, we couldn't get it working and the whole setup has been so unstable and complicated to setup that it's just not an option for production. Samba + Unison Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages: Samba integrates with ADs; it's a pain but can be done. Samba solves the problem of remotely accessing the storage from Windows but introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but that means we need a HA Samba setup on top of that to maintain HA which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before and I could not silently failover between servers. Even when online, you are working with local files which will result in more conflicts than would be necessary if a local cache were only touched when disconnected It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows task scheduler, but you cannot really do it in a satisfactory way. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well or even at all. Samba + "Offline Files" After that we became a little desparate and gave Windows "offline files" a chance. We figured that having something that is inbuilt into the OS would reduce administrative efforts, helps blaming someone else when it's not working properly and should just work since people have been using this for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with (silent! there is only a tiny notification in Windows explorer in the statusbar, which doesn't even open Sync Center if you click on it!) undeletable files on the server (!) and conflicts that should not even be conflicts. In the end, we had one successful sync of a tiny text file, everything else just exploded horribly. Beyond that, there are other problems: Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all which would mean those files become unavailable if the connection drop In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions. Summary Unless there is another fault-tolerant DFS that supports Windows natively I assume that stacking a HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies allow fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?

    Read the article

  • Alternative to distributed caching

    - by Chen
    Hi, There is a technical requirement to scale a new system easily. This new system consists of three tiered applications (as a batch processors). Each tier will contains at least 2 servers with the same application resides on each server. So, when one of the tier reaches peak performance, we could extend the scalability easily by adding a new server and the same application to off-load some of the processing loads. The problem is that one or two of the three tiers require heavy caching (about 3 million records and increasing). I'm thinking of using distributed caching system to overcome this problem but the new distributed caching system will means an additional point of failure as applications now need to interact with additional caching systems for processing. I'm currently looking at ncache but just wondering if there is an alternatives to this problem? or is there any other comparable distributed caching system that maybe similar or better than ncache and provide enterprise supports too? Thanks, Chen

    Read the article

  • Is there a secure p2p distributed database?

    - by p2pgirl
    I'm looking for a distributed hash table to store and retrieve values securely. These are my requirements: It must use an existing popular p2p network (I must guarantee my key/value will be stored and kept in multiple peers). None but myself should be able to edit or delete the key/value. Ideally an encryption key that only I have access to would be required to edit my key value. All peers would be able to read the key value (read-only access, only the key holder would be able to edit the value) Is there such p2p distributed hash table? Would the bittorrent distributed hash table meet my requirements?' Where could I find documentation?

    Read the article

  • Grid computing projects similar to NGrid (thread based)

    - by DivdeAndConquer
    Hello there, first time poster. This is a great place for reading about programming problems. I've been looking at some grid computing projects for .Net/Mono and stumbled upon NGrid. NGrid seems really appealing for grid computing because you simply pass threads to it and there is very little modification you have to make to your code. However, I see that NGrid (http://ngrid.sourceforge.net/?page=overview) is still at version 0.7 and hasn't been updated since May 2008. So, I'm wondering if there are any other grid computing projects that use a similar thread-passing architecture and if anyone has had success using NGrid.

    Read the article

  • Understanding: cloud-server, cloud-hosting, cloud-computing, the cloud

    - by Abel
    There's a lot of buzz about these subjects and there seems little consensus on the terms. Is that just me not understanding the subject, or is there a clear meaning for each of these terms? Are there more elaborate terms or descriptions that describe what a cloud provider has, is or offers? EDIT: rewritten question, apparently it was unclear, partially due to the bloat I added.

    Read the article

  • How are distributed services better than distributed objects?

    - by Gabriel Šcerbák
    I am not interested in the technology e.g. CORBA vs Web Services, I am interested in principles. When we are doing OOP, why should we have something so procedural at higher level? Is not it the same as with OOP and relational databases? Often services are supported through code generation, apart from boilerplate, I think it is because we new SOM - service object mapper. So again, what are the reasons for wervices rather than objects?

    Read the article

  • Web Site Serving, Cloud-Computing, oh, my

    - by Frank
    I'm planning a software based service. To give it a bit of context (type of traffic), assume it similar to facebook in nature (with a little GitHub thrown in). I've been trying to understand my different hosting options. I've been using a shared host with GoDaddy for years just fine. I currently host a Wordpress web site there and I've not had any problems. Quite frankly, they've taken good care of me. However, the nature of a shared hosting environment is limited in nature. For example, I can't do anything but host a web site there. For example, I can not run a Mercurial server. Last time I attempted to build a web application with the intention of eventually launching it via GoDaddy, I ran in to all sorts of troubles because it was shared-hosted. Assembly issues, etc. At the time, the cost and time sank my project. (The lack of direct access was also frustrating.) (to be fair to godaddy, this was over 3 years ago) I've been looking at Rackspace or Amazon as a possible cloud solution but it seems to be just processing power and bandwidth (and an OS). From what I understand, I'd need to get Apache and MySQL Working on my own. The way cloud hosting is priced, however, seems appealing. I figure my final option might be to use a virtual private host. I think this would be more flexible than a shared-host site but less scalable than a cloud based server. So, I guess my question is what is an appropriate solution for someone who intends to build a web application service? I figure that I need to establish a hosting environment now rather than later so I can plan to effectively use the environment. I'd prefer to be fairly economical to start out with. I really can't afford to pay $999 (or even $99) while I build up the site and get the core functionality online but at the same time, I'd like to have the selected environment grow as needed. Thank you.

    Read the article

  • Distributed, Parallel, Fault-tolerant File System

    - by Eddified
    There are so many choices that it's hard to know where to start. My requirements are these: Runs on Linux Most of the files will be between 5-9 MB in size. There will also be a significant number of small-ish jpgs (100px x 100px). All of the files need to be available over http. Redundancy -- ideally it would provide the space efficiency similar to RAID 5 of 75% (in RAID 5 this would be calculated thus: with 4 identical disks, 25% of the space is used for parity = 75% efficent) Must support several petabytes of data scalable runs on commodity hardware In addition, I look for these qualities, though they are not "requirements": Stable, mature file system Lots of momentum and support etc I would like some input as to which file system works best for the given requirements. Some people at my organization are leaning towards MogileFS, but I'm not convinced of the stability and momentum of that project. GlusterFS and Lustre, based on my limited research, appear to be better supported... Thoughts?

    Read the article

  • Web application design with distributed servers

    - by Bonn
    I want to build a web application/server with this structure: main-server sub-server transaction-server (create, update, delete) view-server (view, search) authentication-server documents-server reporting-server library-server e-learning-server The main-server acts as host server for sub-server. I can add many sub-servers and connect it to main-server (via plug-play interface maybe), then it can begin querying data from another sub-servers (which has been connected to the main-server). The sub-servers can be anywhere as long as connected to internet. The main-server can manage all sub-servers which are connected to it (query data, setting permission between sub-servers, etc). The purpose is simple, the web application will be huge as the company grows, so I want to distribute it into small connected plug-able servers. My question is, does the structure above already have a standardized method? or are there any different views? what are the technologies needed? I need a lot of researches before the execution plan begin. thanks a lot.

    Read the article

  • How secure is cloud computing?

    - by Rhubarb
    By secure, I don't mean the machines itself and access to it from the network. I mean, and I suppose this could be applied to any kind of hosting service, when you put all your intellectual property onto a hosted provider, what happens to the hard disks as they cycle through them? Say I've invested million into my software, and the information and data that I have is valuable, how can I be sure it isn't read off old disks as they're recycled? Is there some kind of standard to look for that ensures a provider is going to use the strictest form of intellectual property protection? Is SAS70 applicable here?

    Read the article

  • Distributed filesystem across a slow link

    - by Jeff Ferland
    I have an image in my head where a link is too slow to realize the real-time transfer of files, but fast enough to catch up every day. What I'd like to see is a master <- master setup where when I write a file to Server A, the metadata will transfer to Server B immediately and the file will transfer at idle or immediately when Server B's client tries to read the file before Server A has sent it. It seems that there are many filesystems which can perform well over fast links, but I don't know of any that do well with a big bottle neck and a few hours of latency.

    Read the article

  • Can Foswiki be used as a distributed Redmine replacement? [closed]

    - by Tobias Kienzler
    I am quite familiar with and love using git, among other reasons due to its distributed nature. Now I'd like to set up some similarly distributed (FOSS) Project Management software with features similar to what Redmine offers, such as Issue & time tracking, milestones Gantt charts, calendar git integration, maybe some automatic linking of commits and issues Wiki (preferably with Mathjax support) Forum, news, notifications Multiple Projects However, I am looking for a solution that does not require a permanently accesible server, i.e. like in git, each user should have their own copy which can be easily synchronized with others. However it should be possible to not have a copy of every Project on every machine. Since trac uses multiple instances for multiple projects anyway, I was considering using that, but I neither know how well it adapts to simply giting the database itself (which would be be easiest way to handle the distribution due to git being used anyway), nor does it include all of Redmine's feature. After checking http://www.wikimatrix.org for Wikis with integrated tracking system and RCS support, and filtering out seemingly stale project, the choices basically boil down to Foswiki, TWiki and Ikiwiki. The latter doesn't seem to offer as many usability features, and in the TWiki vs Foswiki issue I tend to the latter. Finally, there is Fossil, which starts from the other end by attempting to replace git entirely and tracking itself. I am however not too comfortable with the thought of replacing git, and Fossil's non-SCM features don't seem to be as developed. Now before I invest too much time when someone else might already have tried this, I basically have two questions: Are there crucial features of Project Management software like Redmine that Foswiki does not provide even with all the extensions available? How to set Foswiki up to use git instead of the perl RcsLite?

    Read the article

  • Cloud computing?

    - by Shawn H
    I'm an analyst and intermediate programmer working for a consulting company. Sometimes we are doing some intensive computing in Excel which can be frustrating because we have slow computers. My company does not have enough money to buy everyone new computers right now. Is there a cloud computing service that allows me to login to a high performance virtual computer from remote desktop? We are not that technical so preferrably the computer is running Windows and I can run Excel and other applications from this computer. Thanks

    Read the article

  • cloud computing in .net 4.0

    - by HotTester
    Since the launch of .net 4.0 the buzz word has been cloud computing. But very little is said and discussed about it in perspective of .net technologies. Further is it really the worth to invest or do we have sufficient current technologies that can handle what cloud computing offers ? Can you please describe it and an example would be quite helpful ! Thanks in advance.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >