Search Results

Search found 59554 results on 2383 pages for 'distributed data mining'.

Page 16/2383 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >

E-Book on big data (featuring Analysts, Customers and more)

- by Jean-Pierre Dijcks

As we are gearing up for Openworld, here is a nice E-book on big data to start paging through. It contains Gartner's take on big data, customer and partner interviews and a lot more good info. Enjoy the read so you come prepared for Openworld!! Read the E-Book here. For those coming to Oracle Openworld (or the Americas Cup races around the same time), you can find big data sessions via this URL. Enjoy!!

Read the article
Choices in Architecture, Design, Algorithms, Data Structures for effective RDF Reasoning and Querying in a Big Data Environment [on hold]

- by user2891213

As part of my academic project I would like to know what choices in Architecture, Design, Algorithms, Data Structures do we need in order to provide effective and efficient RDF Reasoning and Querying in a Big Data Environment. Basically I want to get info regarding below points: What are the Systems and Software to get appropriate Architecture? What kind of API layer(s) would we need on top of the Big Data stores, to make this possible? The Indexing structures we will need. The appropriate Algorithms, and appropriate Algorithms for Query Planning across Big Data stores. The Performance Analysis and Cost Models we will need to justify the design decisions we have made along the way. Can anyone please provide pointers.. Thanks, David

Read the article
Uploading your data using the Maps Data API just got easier

The Google Maps Data API is a great way to host your geographic data on Google’s scalable, high-performance servers, making your data accessible across platforms using HTTP or...

Read the article
MSDTC attempts to enlist client machine in a distributed transaction

- by Ken

Hi there We're seeing the following intermittent warning logged by MSDTC: A caller has attempted to propagate a transaction to a remote system, but MSDTC network DTC access is currently disabled on machine 'X'. Please review the MS DTC configuration settings. However, MSDTC is disabled on machine X by design - it's a client machine, and has no business being enlisted in the transaction! Several windows service endpoints hosting WCF services over TCP Single SQL Server 2005 instance beneath Linq to Sql Remote client receives event callbacks over WCF/TCP The issue is tricky to reproduce - usually following restart of services. We suspect a callback to the client machine is occurring within the context of a transaction. Just wondering if anyone has seen similar issues?? Ken

Read the article
running a python script where dependencies are not avail: distributed computing

- by sadhu_

Hi, I have access to a grid (running condor) that would (potentially) allow to very substantially reduce how long by nltk based nlp tasks take. unfortunately, i dont have root access on the cluster so cannot install new packages, only run whatever is available on the linux boxes. python is of course available, but nltk isnt - i was wondering however, if there might be a way around this somehow ? is there a way i can somehow still distribute the task in a self-contained 'package' of some sort? Thanks for your hel

Read the article
Sphinx search distributed index tuning

- by Andriy Bohdan

I'm deciding how to split 3 large sphinx indexes between 3 servers. Each of the 3 indexes is searched separately. What's more effective: to host each index on separate machine Example machine1 - index1 machine2 - index2 machine3 - index3 or to split each index into 3 parts and host each part of the same index on separate machine. Example machine1 - index1_chunk1, index2_chunk1, index3_chunk1 machine2 - index1_chunk2, index2_chunk2, index3_chunk2 machine3 - index1_chunk3, index2_chunk3, index3_chunk3 ?

Read the article
Best Work Queue service for distributed clusters

- by onewheelgood

Hi there. I require a simple work queue type system for asynchronous task management. I have looked at both beanstalkd and gearman. However, both these seem to assume that the client and the queue server are on the same network, and therefore that there will always be a reliable network between them. I need one that can support the client and server being in different places in the world, and be able to manage temporary loss of network connection between clusters. Ideally, this would work in such a way where I post a job to a local proxy that attempts to send it to the main queue server. If there is no network connection, it would try again later, however it would not lose the job or delay the client. Any recommendations?

Read the article
Estimating the boundary of arbitrarily distributed data

- by Dave

I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it. Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch. My initial attempt is very inelegant: I place a fine grid over the data, and where data is found in a cell, a square matplotlib patch is created of that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey region are the cells containing data, black where no data exists. OK, problem solved - why am I still here? Well.... I'd like a more "elegant" solution, or at least one that is faster (ie. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach - eg: from xmin to xmax, at y=ymin, check if data boundary crossed in intervals dx y=ymin+dy, do 1 do 1-2, but now sample in y An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments. Both would produce a set of (x,y) points, but then how do I order/link neighbouring points them to create the boundary? A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon. The solution perhaps comes from solving an area maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme - is this erring into topology now? Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really! Cheers, David

Read the article
Understanding omission failure in distributed systems

- by karthik A

The following text says this which I'm not able to quite agree : client C sends a request R to server S. The time taken by a communication link to transport R over the link is D. P is the maximum time needed by S to recieve , process and reply to R. If omission failure is assumed ; then if no reply to R is received within 2(D+P) , then C will never recieve a reply to R . Why is the time here 2(D+P). As I understand shouldn't it be 2D+P ?

Read the article
High performance distributed asynchronous RPC in java

- by unludo

I would like to do RPC to a list of clients with the following requirements: the server does not know the clients (implies a kind of broker?) and the cleints do not know the server there may be several clients - they share the load to treat the RPC The RPC is asynchronous very fast (round-trip < 1ms) optional : offers a fail-over mechanism. It can be done with underlying tools which are not really intended for that (Hazelcast is an example). What would you use for such requirements? Thanks!

Read the article
Generic DRM (Distributed resource management) wrapper

- by Pavel Bernshtam

I need to write a software, which launches DRM jobs in a customer environment and monitors those jobs status. It should work with various customer environments and DRMs - like LSF, Sun Grid and others. Can you recommend some 3rd party library, which hides DRM differences from me and has API like "launch job", "get list of jobs", "get job status" etc. ? Both Java and native libraries are good for me.

Read the article
Data Structure Behind Amazon S3s Keys (Filtering Data Structure)

- by dimo414

I'd like to implement a data structure similar to the lookup functionality of Amazon S3. For those of you who don't know what I'm taking about, Amazon S3 stores all files at the root, but allows you to look up groups of files by common prefixes in their names, therefore replicating the power of a directory tree without the complexity of it. The catch is, both lookup and filter operations are O(1) (or close enough that even on very large buckets - S3's disk equivalents - both operations might as well be O(1))). So in short, I'm looking for a data structure that functions like a hash map, with the added benefit of efficient (at the very least not O(n)) filtering. The best I can come up with is extending HashMap so that it also contains a (sorted) list of contents, and doing a binary search for the range that matches the prefix, and returning that set. This seems slow to me, but I can't think of any other way to do it. Does anyone know either how Amazon does it, or a better way to implement this data structure?

Read the article
Best Data Structure For Time Series Data

- by TriParkinson

Hi all, I wonder if someone could take a minute out of their day to give their two cents on my problem. I would like some suggestions on what would be the best data structure for representing, on disk, a large data set of time series data. The main priority is speed of insertion, with other priorities in decreasing order; speed of retrieval, size on disk, size in memory, speed of removal. I have seen that B+ trees are often used in database because of their fast search times, but how about for fast insertion times? Is a linked list really the way to go? Thanks in advance for your time, Tri

Read the article
MySQL: Blank row in table after LOAD DATA INFILE

- by Tom

Hi, I'm uploading a large amount of data from a CSV (I'm doing it via MySQL Workbench): LOAD DATA INFILE 'C:/development/mydoc.csv' INTO TABLE mydatabase.mytable CHARACTER SET utf8 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r'; However, I'm noticing that it keeps adding an empty line full of nulls/zeros after the last record. I'm guessing it's because of the "LINES TERMINATED" command. However, I need that to load the data in correctly. Is there some way around this / some better SQL to avoid the blank row in the table? Thanks

Read the article
Update tableview instantly as data pushed in core data iphone

- by user336685

I need to update the tableview as soon as the content is pushed in core data database. for this AppDelegate.m contains following code NSManagedObjectContext *moc = [self managedObjectContext]; NSFetchRequest *request = [[NSFetchRequest alloc] init]; [request setEntity:[NSEntityDescription entityForName:@"FeedItem" inManagedObjectContext:moc]]; //for loop // push data in code data & then save context [moc save:&error]; ZAssert(error == nil, @"Error saving context: %@", [error localizedDescription]); //for loop ends This code triggers following code from RootviewController.m - (void)controllerWillChangeContent:(NSFetchedResultsController*)controller { [[self tableView] beginUpdates]; } But this updates the tableview only at the end of the for loop ,the table does not get updated after immediate push in db. I tried following code but that didn't work - (void)controllerDidChangeContent:(NSFetchedResultsController *)controller { // In the simplest, most efficient, case, reload the table view. [self.tableView reloadData]; } I have been stuck with this problem for several days.Please help.Thanks in advance for solution.

Read the article
Core Data data type for just the date - not including time

- by Jason

I am new at Core Data, and it seems like it is a great way to manage the data store. However I am also very memory-conscious due to the fact that the iPhone doesn't have that much of it. I was a little surprised to see that the data types are so limited - eg. there is a Date type which includes also the time, but no Date type for just the date! All the time information takes up precious bytes of memory, if I just wanted an attribute with the date (e.g. 2/15/2010 rather than 2/15/2010 02:34:48), how could I do this? Is it possible?

Read the article
Versioning millions of files with distributed SCM

- by C. Lawrence Wenham

I'm looking into the feasibility of using off-the-shelf distributed SCMs such as Git or Mercurial to manage millions of XML files. Each file would be a commercial transaction, such as a purchase order, that would be updated perhaps 10 times during the lifecycle of the transaction until it is "done" and changes no more. And by "manage", I mean that the SCM would be used to not just version the files, but also to replicate them to other machines for redundancy and transfer of IP. Lets suppose, for the sake of example, that a goal is to provide good performance if it was handling the volume of orders that Amazon.com claimed to have at its peak in December 2010: about 150,000 orders per minute. We're expecting the system to be distributed over many servers in order to get reasonable performance. We're also planning to use solid-state drives exclusively. There is a reason why we don't want to use an RDBMS for primary storage, but it's a bit beyond the scope of this question. Does anyone have first-hand experience with the performance of distributed SCMs under such a load, and what strategies were used? Open-source preferred, since the final product is to be FOSS, too.

Read the article
Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Clever ways of implementing different data structures in C & data structures that should be used mor

- by Yktula

What are some clever (not ordinary) ways of implementing data structures in C, and what are some data structures that should be used more often? For example, what is the most effective way (generating minimal overhead) to implement a directed and cyclic graph with weighted edges in C? I know that we can store the distances in an array as is done here, but what other ways are there to implement this kind of a graph?

Read the article
Using GameKit to transfer CoreData data between iPhones, via NSDictionary

- by OscarTheGrouch

I have an application where I would like to exchange information, managed via Core Data, between two iPhones. First turning the Core Data object to an NSDictionary (something very simple that gets turned into NSData to be transferred). My CoreData has 3 string attributes, 2 image attributes that are transformables. I have looked through the NSDictionary API but have not had any luck with it, creating or adding the CoreData information to it. Any help or sample code regarding this would be greatly appreciated.

Read the article
Good practices for multiple language data in Core Data

- by Luca Bartoletti

Hi, i need a multilingual coredata db in my iphone app. I could create different database for each language but i hope that in iphone sdk exist an automatically way to manage data in different language core data like for resources and string. Someone have some hints?

Read the article
core-data relationships and data structure.

- by Boaz

What is the right way to build iPhone core data for this SMS like app (with location)? - I want to represent an entity of conversation with "profile1" "profile2" that heritage from a profile entity, and a message entity with: "to" "from" "body" where the "to" and "from" are equal to "profile1" and/or "profile2" in the conversation entity. How can I make such a relationships? is there a better way to represent the data (other structure)? Thanks

Read the article
convert unstructured data to structured data?

- by Codeguru

How to convert unstructured data into structured data?For example email,contacts to a structured format. Are there any algorithms to do this??

Read the article
When is the Data Vault model the right model for a data-warehouse?

- by Stephan Eggermont

I recently found a reference to 'Data Vault modeling' as a model for data-warehouses. The models I've seen before are Inmon and Kimball. The author refers to possible performance problems due to the joins needed. It looks like a nice model, but I wonder about the gotcha's. Are there any experience reports on-line?

Read the article
[Visual C++]Forcing memory alignment of variables/data-structures

- by John

I'm looking at using SSE and I gather aligning data on 16byte boundaries is recommended. There are two cases to consider: float data[4]; struct myystruct { float x,y,z,w; }; I'm not sure the first case can be done explicitly, though there's perhaps a compiler option I could use? In the second case I remember being able to control packing in old versions of GCC several years back, is this still possible?

Read the article

< Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >