Search Results

Search found 65558 results on 2623 pages for 'very large data'.


  • Storage for large gridded datasets

    - by nullglob
    I am looking for a good storage format for large, gridded datasets. The application is meteorology, and we would prefer a format that is common within this field (to help exchange data with others). I don't need to deal with special data structures, and there should be a Fortran API. I am currently considering HDF5, GRIB2 and NetCDF4. How do these formats compare in terms of data compression? What are their main limitations? How steep is the learning curve? Are there any other storage formats worth investigating? I have not found a great deal of material outlining the differences and pros/cons of these formats (there is one relevant SO thread, and a presentation comparing GRIB and NetCDF).
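
    Not an answer to the format comparison, but for orientation, here is a minimal sketch of what writing a gridded field looks like through the netCDF4-python bindings (NetCDF-4 sits on top of HDF5 and offers built-in deflate compression; a Fortran API also exists for NetCDF, HDF5 and GRIB). The file name, grid and variable below are made up for illustration:

    ```python
    import numpy as np
    from netCDF4 import Dataset

    nc = Dataset("analysis.nc", "w", format="NETCDF4")     # NetCDF-4 files are HDF5 underneath
    nc.createDimension("time", None)                       # unlimited dimension
    nc.createDimension("lat", 181)
    nc.createDimension("lon", 360)

    lat = nc.createVariable("lat", "f4", ("lat",))
    lon = nc.createVariable("lon", "f4", ("lon",))
    temp = nc.createVariable("temperature", "f4", ("time", "lat", "lon"),
                             zlib=True, complevel=4)       # built-in deflate compression
    temp.units = "K"

    lat[:] = np.linspace(-90.0, 90.0, 181)
    lon[:] = np.linspace(0.0, 359.0, 360)
    temp[0, :, :] = 288.0 + np.random.randn(181, 360)      # one made-up analysis field
    nc.close()
    ```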

    Read the article

  • JavaScript Data Binding Frameworks

    - by dwahlin
    Data binding is where it’s at nowadays when it comes to building client-centric Web applications. Developers experienced with desktop frameworks like WPF or web frameworks like ASP.NET, Silverlight, or others are used to being able to take model objects containing data and bind them to UI controls quickly and easily. When moving to client-side Web development the data binding story hasn’t been great since neither HTML nor JavaScript natively supports data binding. This means that you have to write code to place data in a control and write code to extract it. Although it’s certainly feasible to do it from scratch (many of us have done it this way for years), it’s definitely tedious and not exactly the best solution when it comes to maintenance and re-use. Over the last few years several different script libraries have been released to simplify the process of binding data to HTML controls. In fact, the subject of data binding is becoming so popular that it seems like a new script library is being released nearly every week. Many of the libraries provide MVC/MVVM pattern support in client-side JavaScript apps and some even integrate directly with server frameworks like Node.js. Here’s a quick list of a few of the available libraries that support data binding (if you like any others please add a comment and I’ll try to keep the list updated): AngularJS: MVC framework for data binding (although it closely follows the MVVM pattern). Backbone.js: MVC framework with support for models, key/value binding, custom events, and more. Derby: provides a real-time environment that runs in the browser and in Node.js; the library supports data binding and templates. Ember: provides support for templates that automatically update as data changes. JsViews: data binding framework that provides “interactive data-driven views built on top of JsRender templates”. jQXB Expression Binder: lightweight jQuery plugin that supports bi-directional data binding. KnockoutJS: MVVM framework with robust support for data binding; for an excellent look at using KnockoutJS check out John Papa’s course on Pluralsight. Meteor: end-to-end framework that uses Node.js on the server and provides support for data binding on the client. Simpli5: JavaScript framework that provides support for two-way data binding. WinRT with HTML5/JavaScript: if you’re building Windows 8 applications using HTML5 and JavaScript there’s built-in support for data binding in the WinJS library. I won’t have time to write about each of these frameworks, but in the next post I’m going to talk about my (current) favorite when it comes to client-side JavaScript data binding libraries, which is AngularJS. AngularJS provides an extremely clean way, in my opinion, to extend HTML syntax to support data binding while keeping model objects (the objects that hold the data) free from custom framework method calls or other weirdness. While I’m writing up the next post, feel free to visit the AngularJS developer guide if you’d like additional details about the API and want to get started using it.

    Read the article

  • Storing large data in HTTP Session (Java Application)

    - by Umesh Awasthi
    I am asking this question in continuation of http-session-or-database-approach. I am planning to follow this approach: when a user adds a product to the cart, create a Cart model, add the items to it and save it to the DB. Convert the Cart model to cart data and save that to the HTTP session. On any update or edit, update the underlying cart in the DB and refresh the data snapshot in the session. When the user opens the view-cart page, just pick the cart data from the session and display it to the customer. I have the following queries regarding the HTTP session: How good an idea is it to store large data (a shopping cart) in the session? How scalable can this approach be (with respect to the session)? Won't my application end up demanding a lot of memory? Is my approach fine, or do I need to consider other points while designing this? Although we can control which parts of the cart data are stored in the session, we still need to keep certain cart information there.
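
    The question is about a Java web application, but the "snapshot in session" idea is language-agnostic. Here is a minimal Python sketch of keeping only a slim, display-oriented view of the cart in the session while the full cart stays in the DB; the db, session and cart objects are hypothetical stand-ins:

    ```python
    def to_session_snapshot(cart):
        """Reduce the full Cart model (persisted in the DB) to the minimum the
        view-cart page needs: item ids, names, quantities and totals."""
        return {
            "cart_id": cart.id,
            "items": [{"sku": i.sku, "name": i.name, "qty": i.qty, "total": i.qty * i.price}
                      for i in cart.items],
            "grand_total": sum(i.qty * i.price for i in cart.items),
        }

    def add_to_cart(session, db, cart_id, sku, qty):
        cart = db.load_cart(cart_id)                  # the DB stays the source of truth
        cart.add(sku, qty)
        db.save(cart)                                 # persist the change first...
        session["cart"] = to_session_snapshot(cart)   # ...then refresh the small session snapshot
    ```

    The memory cost per user is then bounded by the snapshot, not by the full cart graph, which also keeps the session easier to replicate across nodes.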

    Read the article

  • Protect Data and Save Money? Learn How Best-in-Class Organizations do Both

    - by roxana.bradescu
    Databases contain nearly two-thirds of the sensitive information that must be protected as part of any organization's overall approach to security, risk management, and compliance. Solutions for protecting data housed in databases vary from encrypting data at the application level to defense-in-depth protection of the database itself. So is there a difference? Absolutely! According to new research from the Aberdeen Group, Best-in-Class organizations experience fewer data breaches and audit deficiencies, at lower cost, by deploying database security solutions. And the results are dramatic: Aberdeen found that organizations encrypting data within their databases achieved 30% fewer data breaches and 15% greater audit efficiency at 34% less total cost when compared to organizations encrypting data within applications. Join us for a live webcast with Derek Brink, Vice President and Research Fellow at the Aberdeen Group, next week to learn how your organization can become Best-in-Class.

    Read the article

  • Bad Data is Really the Monster

    - by Dain C. Hansen
    “Bad Data Is Really the Monster” is an article written by Bikram Sinha, from whom I borrowed the title and the inspiration for this blog. Sinha writes: “Bad or missing data makes application systems fail when they process order-level data. One of the key items in the supply-chain industry is the product (aka SKU). Therefore, it becomes the most important data element to tie up multiple merchandising processes including purchase order allocation, stock movement, shipping notifications, and inventory details… Bad data can cause huge operational failures and cost millions of dollars in terms of time, resources, and money to clean up and validate data across multiple participating systems.” Yes, bad data really is the monster, so what do we do about it? Close our eyes and hope it stays in the closet? We’ve tackled this problem for some years now at Oracle, and our latest introduction of Oracle Enterprise Data Quality, along with our integrated Oracle Master Data Management products, provides a complete, best-in-class answer to the bad data monster. What’s unique about it? Oracle Enterprise Data Quality combines powerful data profiling, cleansing, matching, and monitoring capabilities while offering unparalleled ease of use. What makes it unique is that it has dedicated capabilities to address the distinct challenges of both customer and product data quality [different monsters have different needs, of course!]. The ability to profile data is just as important, to identify and measure poor-quality data and to identify new rules and requirements. Included are semantic and pattern-based recognition to accurately parse and standardize data that is poorly structured. Finally, all of the data quality components are integrated with Oracle Master Data Management, including Oracle Customer Hub and Oracle Product Hub, as well as Oracle Data Integrator Enterprise Edition and Oracle CRM. Want to learn more? On Tuesday, Nov 15th, I invite you to listen to our webcast, Reduce ERP Consolidation Risks with Oracle Master Data Management. I'll be joined by our partner iGate Patni, and we'll be talking about one specific way to deal with the bad data monster, specifically around ERP consolidation. Look forward to seeing you there!

    Read the article

  • CPU Usage in Very Large Coherence Clusters

    - by jpurdy
    When sizing Coherence installations, one of the complicating factors is that these installations (by their very nature) tend to be application-specific, with some being large, memory-intensive caches, with others acting as I/O-intensive transaction-processing platforms, and still others performing CPU-intensive calculations across the data grid. Regardless of the primary resource requirements, Coherence sizing calculations are inherently empirical, in that there are so many permutations that a simple spreadsheet approach to sizing is rarely optimal (though it can provide a good starting estimate). So we typically recommend measuring actual resource usage (primarily CPU cycles, network bandwidth and memory) at a given load, and then extrapolating from those measurements. Of course there may be multiple types of load, and these may have varying degrees of correlation -- for example, an increased request rate may drive up the number of objects "pinned" in memory at any point, but the increase may be less than linear if those objects are naturally shared by concurrent requests. But for most reasonably-designed applications, a linear resource model will be reasonably accurate for most levels of scale. However, at extreme scale, sizing becomes a bit more complicated as certain cluster management operations -- while very infrequent -- become increasingly critical. This is because certain operations do not naturally tend to scale out. In a small cluster, sizing is primarily driven by the request rate, required cache size, or other application-driven metrics. In larger clusters (e.g. those with hundreds of cluster members), certain infrastructure tasks become intensive, in particular those related to members joining and leaving the cluster, such as introducing new cluster members to the rest of the cluster, or publishing the location of partitions during rebalancing. These tasks have a strong tendency to require all updates to be routed via a single member for the sake of cluster stability and data integrity. Fortunately that member is dynamically assigned in Coherence, so it is not a single point of failure, but it may still become a single point of bottleneck (until the cluster finishes its reconfiguration, at which point this member will have a similar load to the rest of the members). The most common cause of scaling issues in large clusters is disabling multicast (by configuring well-known addresses, aka WKA). This obviously impacts network usage, but it also has a large impact on CPU usage, primarily since the senior member must directly communicate certain messages with every other cluster member, and this communication requires significant CPU time. In particular, the need to notify the rest of the cluster about membership changes and corresponding partition reassignments adds stress to the senior member. Given that portions of the network stack may tend to be single-threaded (both in Coherence and the underlying OS), this may be even more problematic on servers with poor single-threaded performance. As a result of this, some extremely large clusters may be configured with a smaller number of partitions than ideal. This results in the size of each partition being increased. When a cache server fails, the other servers will use their fractional backups to recover the state of that server (and take over responsibility for their backed-up portion of that state). 
The finest granularity of this recovery is a single partition, and the single service thread cannot accept new requests during this recovery. Ordinarily, recovery is practically instantaneous (it is roughly equivalent to the time required to iterate over a set of backup backing map entries and move them to the primary backing map in the same JVM). But certain factors can increase this duration drastically (to several seconds): large partitions, sufficiently slow single-threaded CPU performance, many or expensive indexes to rebuild, etc. The solution, of course, is to mitigate each of those factors, but in many cases this may be challenging. Larger clusters also lead to the temptation to place more load on the available hardware resources, spreading CPU resources thin. As an example, while we've long been aware of how garbage collection can cause significant pauses, it usually isn't viewed as a major consumer of CPU (in terms of overall system throughput). Typically, the use of a concurrent collector allows greater responsiveness by minimizing pause times, at the cost of reducing system throughput. However, at a recent engagement, we were forced to turn off the concurrent collector and use a traditional parallel "stop the world" collector to reduce CPU usage to an acceptable level. In summary, there are some less obvious factors that may result in excessive CPU consumption in a larger cluster, so it is even more critical to test at full scale, even though allocating sufficient hardware may often be much more difficult for these large clusters.

    Read the article

  • large product image structured data and visibility

    - by Mark Resølved
    On an eCommerce site we have two images for a product: one medium-sized image shown at the top of the page and one large photo shown on click in an overlay. We use http://schema.org/Product microdata on the page. We'd like the large, initially hidden photo to be the main image for the product, as it's the better-looking one, so it's also referenced in the XML sitemap as <image:image>, and we also put the itemprop="image" attribute on the hidden large image. But I'm wondering: is it a bad idea to use a microdata attribute on a hidden style="display:none;" element? Is there a better way to embed the main image in terms of SEO, without showing it initially?

    Read the article

  • Where can I locate business data to use in my application?

    - by Aaron McIver
    This question talks about any and all free public raw data, which appeared to have valuable pieces but nothing that really provides what I am looking for. Instead of using a socially defined listing of businesses (Foursquare), I would like a business listing data set of registered businesses and associated addresses that could then be searched by location (coordinates). The critical need is that the data set should be filterable on varying criteria (give me all restaurants, coffee shops, etc.). If the data is free, that is great, but anywhere that sells this type of data would also suffice. Infochimps looked like a possibility, but perhaps something a bit more extensive exists. Where can I find a free or for-fee data set of registered businesses that is filterable by type of business and location?

    Read the article

  • How to create a large level game?

    - by Siddharth
    I want to know how to create a large game that has more than one level, with the levels loaded from an XML file. In my game each level has many objects that I have to load when the user clicks on that level. At present, for example, my game contains 20 levels and I am loading the graphic objects for all 20 levels up front. The correct way would be to load only the graphics of the particular level being played, but I don't know how to do that, so please explain it, ideally with a game example. At the moment I create a class for each of my game object images by extending Sprite; I know that is not a suitable approach, so please give me guidance on that too. Basically, I want to know how to create large games in AndEngine. Help here will also benefit other community members, because AndEngine does not have proper documentation for developers learning how to manage a large game.
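
    The question targets AndEngine (Java), but the lazy-loading idea itself is framework-independent. Here is a minimal Python sketch of parsing a levels XML file once and loading the objects of a single level only when it is opened; the XML layout and attribute names are assumptions for illustration:

    ```python
    import xml.etree.ElementTree as ET

    class LevelLoader:
        """Parse levels.xml once, but load a level's objects only when it is opened.
        Assumed layout: <levels><level id="1"><object texture="rock.png" x="10" y="20"/>
        ...</level>...</levels>"""

        def __init__(self, path):
            self.root = ET.parse(path).getroot()
            self.loaded = {}                              # level id -> list of objects

        def load_level(self, level_id):
            if level_id in self.loaded:
                return self.loaded[level_id]              # already in memory
            node = self.root.find(f"level[@id='{level_id}']")
            objects = [{"texture": o.get("texture"),
                        "x": int(o.get("x")),
                        "y": int(o.get("y"))}
                       for o in node.findall("object")]
            # In AndEngine this is the point where you would create the texture
            # regions and sprites for just this level.
            self.loaded[level_id] = objects
            return objects

        def unload_level(self, level_id):
            self.loaded.pop(level_id, None)               # free memory when the level is left
    ```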

    Read the article

  • Display large amount of data to client through pagination

    - by ebram tharwat
    I have a web application in which I need to show a large number of records to clients. I'll use pagination, but I was wondering: should I load all the data at once, so that pagination, sorting and searching will be easy (but it takes a long time; against a local DB it takes up to 9 seconds)? Or, each time I show a new page, should I make a new request to the server and then a new request to the DB to get the next records? But then, what if the client clicks the Prev button? I'll make a new request for data I already had. Should I cache data that has been loaded before, and if that's a good technique, how? So: load all the data once, or make a new request every time I need data that may already have been loaded? I'm using ASP.NET MVC SPA with durandaljs and knockoutjs.
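
    For what it's worth, the usual middle ground is to page on the server and cache recently viewed pages, accepting that cached pages can go slightly stale. A language-agnostic sketch of that idea in Python; run_query is a stand-in for the real data access call:

    ```python
    from functools import lru_cache

    PAGE_SIZE = 50

    def run_query(sql):
        """Stand-in for the real data access call made server-side."""
        ...

    @lru_cache(maxsize=32)                   # remembers the last 32 pages the client viewed
    def fetch_page(page_number):
        """One small round trip per page; 'Prev' is free when the page is still cached."""
        offset = (page_number - 1) * PAGE_SIZE
        return run_query(
            f"SELECT * FROM records ORDER BY id "
            f"OFFSET {offset} ROWS FETCH NEXT {PAGE_SIZE} ROWS ONLY")

    fetch_page(3)          # hits the server/DB
    fetch_page(2)          # hits the server/DB
    fetch_page(2)          # served from the cache
    ```

    The trade-off is staleness: keep the cache small or expire it after a short interval so edits made elsewhere still show up.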

    Read the article

  • How to gain understanding of large systems? [closed]

    - by vonolsson
    Possible Duplicate: How do you dive into large code bases? I have worked as a developer developing C/C++ applications for mobile platforms (Windows Mobile and Symbian) for about six years. About a year ago, however, I changed job and currently work with large(!) enterprise systems with high security and availability requirements, all developed in Java. My problem is that I am having a hard time getting a grip on the architecture of the systems, even after a year working with them, and understanding systems other people have built has never been my strong side. The fact that I haven't worked with enterprise systems before doesn't exactly help. Does anyone have a good approach on how to learn and understand large systems? Are there any particular techniques and/or patterns I should read up on?

    Read the article

  • Visualizing the SiteMap of a large (page number) website

    - by Michael
    I'm looking for a tool or service that can spider a web domain with a large number of pages, create a sitemap, and then visualize that map in a way that will help me see, understand and group the content (I'm new to the site). Something like a tree view or another standard sitemap visualization would be great. I have so far been unable to find a tool that does this (I've found plenty of things that spider the site and create an XML file, but nothing to visualize it). Thanks!
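
    If no ready-made visualizer turns up, the XML file those spidering tools produce can be turned into a rough tree view with a short script. A small sketch using only the Python standard library; the sitemap URL is a placeholder, and the script simply groups URLs by path segment:

    ```python
    import urllib.request
    import xml.etree.ElementTree as ET
    from urllib.parse import urlparse

    SITEMAP = "https://example.com/sitemap.xml"       # placeholder: point at the generated sitemap
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    with urllib.request.urlopen(SITEMAP) as resp:
        root = ET.fromstring(resp.read())

    tree = {}                                         # nested dict keyed by path segment
    for loc in root.findall(".//sm:loc", NS):
        parts = [p for p in urlparse(loc.text.strip()).path.split("/") if p]
        node = tree
        for part in parts:
            node = node.setdefault(part, {})

    def show(node, depth=0):
        for name, children in sorted(node.items()):
            print("  " * depth + name)                # indented tree view of the site's sections
            show(children, depth + 1)

    show(tree)
    ```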

    Read the article

  • How to write a large number of nested records in JSON with Python

    - by jamesmcm
    I want to produce a JSON file, containing some initial parameters and then records of data like this: { "measurement" : 15000, "imi" : 0.5, "times" : 30, "recalibrate" : false, { "colorlist" : [234, 431, 134] "speclist" : [0.34, 0.42, 0.45, 0.34, 0.78] } { "colorlist" : [214, 451, 114] "speclist" : [0.44, 0.32, 0.45, 0.37, 0.53] } ... } How can this be achieved using the Python json module? The data records cannot be added by hand as there are very many.
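
    As a side note, the structure shown above is not quite valid JSON: the per-measurement objects need to sit inside an array and their members need commas. A minimal sketch with the standard json module that builds the records in a list and writes everything in one call; measurements is a stand-in for wherever the records actually come from:

    ```python
    import json

    # Stand-in data source using the values from the question; replace with the real one.
    measurements = [
        ([234, 431, 134], [0.34, 0.42, 0.45, 0.34, 0.78]),
        ([214, 451, 114], [0.44, 0.32, 0.45, 0.37, 0.53]),
    ]

    output = {
        "measurement": 15000,
        "imi": 0.5,
        "times": 30,
        "recalibrate": False,
        "records": [],                        # the per-measurement objects live in an array
    }

    for colorlist, speclist in measurements:
        output["records"].append({"colorlist": colorlist, "speclist": speclist})

    with open("results.json", "w") as f:
        json.dump(output, f, indent=2)        # writes the whole structure at once
    ```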

    Read the article

  • Sybase PowerDesigner Change Many (Find/Replace/Convert) Data Item's Data Types

    - by Andy
    Hello, I have a relatively large Conceptual Data Model in PowerDesigner. After generating a Physical Data Model and seeing the DBMS data types, I need to update all of the data types (NUMBER/TEXT) for each data item. I'd like to either do a find/replace within the Conceptual Data Model or somehow map to different data types when creating the Physical Data Model. For example, change the automatic conversion of Text to CLOB into Text to NVARCHAR(20). Thanks!

    Read the article

  • Isolate large amounts of data w/ jQuery

    - by Bry4n
    I work for a transit agency and I have large amounts of data (mostly times), and I need a way to filter the data using two textboxes (To and From). I found jQuery quick search, but it seems to only work with one textbox. If anyone has any ideas via jQuery or some other client side library, that would be fantastic.

    Read the article

  • are there any useful datasets available on the web for data mining?

    - by niko
    Hi, does anyone know a good resource where example (real) data can be downloaded for experimenting with statistics and machine learning techniques such as decision trees? Currently I am studying machine learning techniques and it would be very helpful to have real data for evaluating the accuracy of various tools. If anyone knows a good resource (perhaps CSV or XLS files, or any other format) I would be very thankful for a suggestion.
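
    The UCI Machine Learning Repository and Kaggle both host real datasets in plain CSV form. If you work in Python, scikit-learn also bundles a few small real datasets, which makes a quick accuracy check easy; a minimal sketch, assuming scikit-learn is installed:

    ```python
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    data = load_iris()                       # 150 real flower measurements, 4 features, 3 classes
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=0)

    clf = DecisionTreeClassifier().fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    ```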

    Read the article

  • Big Data: Size isn’t everything

    - by Simon Elliston Ball
    Big Data has a big problem; it’s the word “Big”. These days, a quick Google search will uncover terabytes of negative opinion about the futility of relying on huge volumes of data to produce magical, meaningful insight. There are also many clichéd but correct assertions about the difficulties of correlation versus causation, in massive data sets. In reading some of these pieces, I begin to understand how climatologists must feel when people complain ironically about “global warming” during snowfall. Big Data has a name problem. There is a lot more to it than size. Shape, Speed, and…err…Veracity are also key elements (now I understand why Gartner and the gang went with V’s instead of S’s). The need to handle data of different shapes (Variety) is not new. Data developers have always had to mold strange-shaped data into our reporting systems, integrating with semi-structured sources, and even straying into full-text searching. However, what we lacked was an easy way to add semi-structured and unstructured data to our arsenal. New “Big Data” tools such as MongoDB, and other NoSQL (Not Only SQL) databases, or a graph database like Neo4J, fill this gap. Still, to many, they simply introduce noise to the clean signal that is their sensibly normalized data structures. What about speed (Velocity)? It’s not just high frequency trading that generates data faster than a single system can handle. Many other applications need to make trade-offs that traditional databases won’t, in order to cope with high data insert speeds, or to extract quickly the required information from data streams. Unfortunately, many people equate Big Data with the Hadoop platform, whose batch driven queries and job processing queues have little to do with “velocity”. StreamInsight, Esper and Tibco BusinessEvents are examples of Big Data tools designed to handle high-velocity data streams. Again, the name doesn’t do the discipline of Big Data any favors. Ultimately, though, does analyzing fast moving data produce insights as useful as the ones we get through a more considered approach, enabled by traditional BI? Finally, we have Veracity and Value. In many ways, these additions to the classic Volume, Velocity and Variety trio acknowledge the criticism that without high-quality data and genuinely valuable outputs then data, big or otherwise, is worthless. As a discipline, Big Data has recognized this, and data quality and cleaning tools are starting to appear to support it. Rather than simply decrying the irrelevance of Volume, we need as a profession to focus how to improve Veracity and Value. Perhaps we should just declare the ‘Big’ silent, embrace these new data tools and help develop better practices for their use, just as we did the good old RDBMS? What does Big Data mean to you? Which V gives your business the most pain, or the most value? Do you see these new tools as a useful addition to the BI toolbox, or are they just enabling a dangerous trend to find ghosts in the noise?

    Read the article

  • Know your Data Lineage

    - by Simon Elliston Ball
    An academic paper without the footnotes isn’t an academic paper. Journalists wouldn’t base a news article on facts that they can’t verify. So why would anyone publish reports without being able to say where the data has come from and be confident of its quality, in other words, without knowing its lineage. (sometimes referred to as ‘provenance’ or ‘pedigree’) The number and variety of data sources, both traditional and new, increases inexorably. Data comes clean or dirty, processed or raw, unimpeachable or entirely fabricated. On its journey to our report, from its source, the data can travel through a network of interconnected pipes, passing through numerous distinct systems, each managed by different people. At each point along the pipeline, it can be changed, filtered, aggregated and combined. When the data finally emerges, how can we be sure that it is right? How can we be certain that no part of the data collection was based on incorrect assumptions, that key data points haven’t been left out, or that the sources are good? Even when we’re using data science to give us an approximate or probable answer, we cannot have any confidence in the results without confidence in the data from which it came. You need to know what has been done to your data, where it came from, and who is responsible for each stage of the analysis. This information represents your data lineage; it is your stack-trace. If you’re an analyst, suspicious of a number, it tells you why the number is there and how it got there. If you’re a developer, working on a pipeline, it provides the context you need to track down the bug. If you’re a manager, or an auditor, it lets you know the right things are being done. Lineage tracking is part of good data governance. Most audit and lineage systems require you to buy into their whole structure. If you are using Hadoop for your data storage and processing, then tools like Falcon allow you to track lineage, as long as you are using Falcon to write and run the pipeline. It can mean learning a new way of running your jobs (or using some sort of proxy), and even a distinct way of writing your queries. Other Hadoop tools provide a lot of operational and audit information, spread throughout the many logs produced by Hive, Sqoop, MapReduce and all the various moving parts that make up the eco-system. To get a full picture of what’s going on in your Hadoop system you need to capture both Falcon lineage and the data-exhaust of other tools that Falcon can’t orchestrate. However, the problem is bigger even that that. Often, Hadoop is just one piece in a larger processing workflow. The next step of the challenge is how you bind together the lineage metadata describing what happened before and after Hadoop, where ‘after’ could be  a data analysis environment like R, an application, or even directly into an end-user tool such as Tableau or Excel. One possibility is to push as much as you can of your key analytics into Hadoop, but would you give up the power, and familiarity of your existing tools in return for a reliable way of tracking lineage? Lineage and auditing should work consistently, automatically and quietly, allowing users to access their data with any tool they require to use. 
The real solution, therefore, is to create a consistent method by which to bring lineage data from these various disparate sources into the data analysis platform that you use, rather than being forced to use the tool that manages the pipeline for the lineage and a different tool for the data analysis. The key is to keep your logs and your audit data from every source, bring them together, and use the data analysis tools to trace the paths from raw data to the answer that data analysis provides.

    Read the article

  • NSURLConnection receives data even if no data was thrown back

    - by Anna Fortuna
    Let me explain my situation. Currently, I am experimenting with long-polling using NSURLConnection. I found this and I decided to try it. What I do is send a request to the server with a timeout interval of 300 secs. (or 5 mins.) Here is a code snippet: NSURL *url = [NSURL URLWithString:urlString]; NSURLRequest *request = [NSURLRequest requestWithURL:url cachePolicy:NSURLCacheStorageAllowedInMemoryOnly timeoutInterval:300]; NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:&resp error:&err]; Now I want to test if the connection will "hold" the request if no data was thrown back from the server, so what I did was this: if (data != nil) [self performSelectorOnMainThread:@selector(dataReceived:) withObject:data waitUntilDone:YES]; And the function dataReceived: looks like this: - (void)dataReceived:(NSData *)data { NSLog(@"DATA RECEIVED!"); NSString *string = [NSString stringWithUTF8String:[data bytes]]; NSLog(@"THE DATA: %@", string); } Server-side, I created a function that will return data once it fits the arguments and returns nothing if nothing fits. Here is a snippet of the PHP function: function retrieveMessages($vardata) { if (!empty($vardata)) { $result = check_data($vardata); /* check_data is the function which returns 1 if $vardata fits the arguments, and 0 if it fails to fit */ if ($result == 1) { $jsonArray = array('Data' => $vardata); echo json_encode($jsonArray); } } } As you can see, the function will only return data if $result is equal to 1. However, even if the function returns nothing, NSURLConnection will still perform the function dataReceived:, meaning NSURLConnection still receives data, albeit an empty one. So can anyone help me here? How will I perform long-polling using NSURLConnection? Basically, I want to maintain the connection as long as no data is returned. So how will I do it? NOTE: I am new to PHP, so if my code is wrong, please point it out so I can correct it.

    Read the article

  • How to maintain an ordered table with Core Data (or SQL) with insertions/deletions?

    - by Jean-Denis Muys
    This question is in the context of Core Data, but if I am not mistaken, it applies equally well to a more general SQL case. I want to maintain an ordered table using Core Data, with the possibility for the user to: reorder rows insert new lines anywhere delete any existing line What's the best data model to do that? I can see two ways: 1) Model it as an array: I add an int position property to my entity 2) Model it as a linked list: I add two one-to-one relations, next and previous from my entity to itself 1) makes it easy to sort, but painful to insert or delete as you then have to update the position of all objects that come after 2) makes it easy to insert or delete, but very difficult to sort. In fact, I don't think I know how to express a Sort Descriptor (SQL ORDER BY clause) for that case. Now I can imagine a variation on 1): 3) add an int ordering property to the entity, but instead of having it count one-by-one, have it count 100 by 100 (for example). Then inserting is as simple as finding any number between the ordering of the previous and next existing objects. The expensive renumbering only has to occur when the 100 holes have been filled. Making that property a float rather than an int makes it even better: it's almost always possible to find a new float midway between two floats. Am I on the right track with solution 3), or is there something smarter?
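
    A minimal, framework-free sketch of option 3, using a float ordering attribute with gaps; the names are illustrative, not Core Data API. A single sort descriptor (or ORDER BY) on the ordering attribute then gives the display order, and an insert or move only touches the moved row until the gaps run out:

    ```python
    GAP = 100.0

    class Row:                                   # stand-in for a managed object / table row
        def __init__(self, ordering):
            self.ordering = ordering

    def position_between(prev_row, next_row, all_rows):
        """Ordering value for a row dropped between two neighbours (either may be None)."""
        if prev_row is None and next_row is None:
            return GAP                           # first row in an empty list
        if prev_row is None:
            return next_row.ordering - GAP       # insert at the top
        if next_row is None:
            return prev_row.ordering + GAP       # append at the bottom
        mid = (prev_row.ordering + next_row.ordering) / 2.0
        if mid in (prev_row.ordering, next_row.ordering):
            # Floats between the neighbours are exhausted: renumber everything once,
            # then retry. all_rows must be in display order here.
            for i, row in enumerate(all_rows, start=1):
                row.ordering = i * GAP
            return position_between(prev_row, next_row, all_rows)
        return mid

    rows = [Row(100.0), Row(200.0), Row(300.0)]
    print(position_between(rows[0], rows[1], rows))   # 150.0 -> new row slots in between
    ```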

    Read the article

  • Compact data structure for storing a large set of integral values

    - by Odrade
    I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).
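
    The question is about .NET types, but the arithmetic is language-independent: one bit per possible id over the range 0..50,000,000 costs about 6.25 MB no matter how many ids are present, versus 4 bytes per item in a plain list. A minimal Python sketch of the idea (a BitArray sized to the full key range plays the same role in .NET):

    ```python
    class IdBitSet:
        """Membership bitmap for ids in the fixed range [0, max_id]."""

        def __init__(self, max_id):
            self.bits = bytearray(max_id // 8 + 1)   # ~6.25 MB for max_id = 50,000,000

        def add(self, id_):
            self.bits[id_ >> 3] |= 1 << (id_ & 7)

        def __iter__(self):
            for byte_index, byte in enumerate(self.bits):
                if byte:                              # skip runs of empty bytes quickly
                    base = byte_index << 3
                    for bit in range(8):
                        if byte & (1 << bit):
                            yield base + bit

    ids = IdBitSet(50_000_000)
    ids.add(42)
    ids.add(49_999_999)
    print(list(ids))                                  # [42, 49999999]
    ```

    At the low end of the stated range (around 1,000,000 ids) a sorted int array (~4 MB) or a compressed bitmap such as a roaring bitmap is somewhat smaller; towards 50,000,000 ids the flat bitmap wins easily for add-and-iterate.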

    Read the article

  • How can I scrape specific data from a website

    - by Stoney
    I'm trying to scrape data from a website for research. The urls are nicely organized in an example.com/x format, with x as an ascending number and all of the pages are structured in the same way. I just need to grab certain headings and a few numbers which are always in the same locations. I'll then need to get this data into structured form for analysis in Excel. I have used wget before to download pages, but I can't figure out how to grab specific lines of text. Excel has a feature to grab data from the web (Data-From Web) but from what I can see it only allows me to download tables. Unfortunately, the data I need is not in tables.
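
    Since the pages share one URL pattern and one template, a short script is often easier than post-processing wget output. A hedged sketch using only the Python standard library; the URL pattern, the h1 heading and the class="figure" selector are placeholders to replace with whatever the pages actually use:

    ```python
    import csv
    import re
    import urllib.request

    rows = []
    for x in range(1, 101):                              # whichever page numbers you need
        url = f"http://example.com/{x}"                  # placeholder for the real URL pattern
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        # The pages share one template, so simple patterns are enough here; for anything
        # messier, an HTML parser such as BeautifulSoup is the safer choice.
        heading = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S)
        numbers = re.findall(r'<span class="figure">([\d.,]+)</span>', html)  # placeholder selector
        rows.append([x, heading.group(1).strip() if heading else ""] + numbers)

    with open("scraped.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["page", "heading", "value1", "value2"])
        writer.writerows(rows)                           # open scraped.csv directly in Excel
    ```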

    Read the article

  • How should I architect my Model and Data Access layer objects in my website?

    - by Robin Winslow
    I've been tasked with designing the data layer for a website at work, and I am very interested in architecting the code for the best flexibility, maintainability and readability. I am generally acutely aware of the value in completely separating out my actual Models from the Data Access layer, so that the Models are completely naive when it comes to Data Access. And in this case it's particularly useful to do this as the Models may be built from the Database or may be built from a Soap web service. So it seems to me to make sense to have Factories in my data access layer which create Model objects. So here's what I have so far (in my made-up pseudocode): class DataAccess.ProductsFromXml extends DataAccess.ProductFactory {} class DataAccess.ProductsFromDatabase extends DataAccess.ProductFactory {} These then get used in the controller in a fashion similar to the following: var xmlProductCreator = DataAccess.ProductsFromXml(xmlDataProvider); var databaseProductCreator = DataAccess.ProductsFromDatabase(databaseDataProvider); // Returns array of Product model objects var XmlProducts = xmlProductCreator.Products(); // Returns array of Product model objects var DbProducts = databaseProductCreator.Products(); So my question is, is this a good structure for my Data Access layer? Is it a good idea to use a Factory for building my Model objects from the data? Do you think I've misunderstood something? And are there any general patterns I should read up on for how to write my data access objects to create my Model objects?
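
    For what it's worth, a rough Python rendering of the same shape, with the models kept naive and each factory owning its own data source; the xml_provider and db objects are hypothetical stand-ins:

    ```python
    class Product:                                        # plain model: knows nothing about data access
        def __init__(self, sku, name, price):
            self.sku, self.name, self.price = sku, name, price

    class ProductFactory:
        def products(self):
            raise NotImplementedError

    class ProductsFromXml(ProductFactory):
        def __init__(self, xml_provider):                 # e.g. a SOAP/XML client (stand-in)
            self.xml_provider = xml_provider

        def products(self):
            return [Product(n["sku"], n["name"], n["price"])
                    for n in self.xml_provider.product_nodes()]

    class ProductsFromDatabase(ProductFactory):
        def __init__(self, db):                           # e.g. a DB connection wrapper (stand-in)
            self.db = db

        def products(self):
            return [Product(*row)
                    for row in self.db.query("SELECT sku, name, price FROM products")]

    # The controller depends only on the common interface and can swap sources freely:
    #   products = ProductsFromXml(soap_client).products()
    #   products = ProductsFromDatabase(connection).products()
    ```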

    Read the article

  • If OOP makes problems with large projects, what doesn't?

    - by osca
    I learned Python OOP at school. My (good in theory, bad in practice) informatics teacher told us how good OOP was for any purpose, even (and especially) for large projects. I don't have any experience with teamwork in software development (what a pity, I'd like to program in a team) and I don't know anything about scaling and large projects either. For some time now I've been reading more and more that object-oriented programming has (many) disadvantages when it comes to really big and important projects/systems. That confused me a bit, as I always thought that OOP helped you keep large amounts of code clean and structured. So why should OOP be problematic in large projects? If it is, what would be better? Functional, declarative/imperative?

    Read the article
