Search Results

Search found 356 results on 15 pages for 'datasets'.

Page 6/15

GDD-BR 2010 [2F] Storage, Bigquery and Prediction APIs

Speaker: Patrick Chanezon. Track: Cloud Computing. Time slot: F [15:30 - 16:15]. Room: 2. Level: 101. Google is expanding its storage products by introducing Google Storage for Developers, which offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as its advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available to developers on a limited sign-up basis: (1) BigQuery, for interactive analysis of very large datasets, and (2) the Prediction API, for making informed predictions from your data. We will demonstrate their use and give instructions on how to get access. From: GoogleDevelopers. Time: 39:27.

links for 2010-04-07

- by Bob Rhubart

James McGovern: Enterprise Architecture and Social CRM "With a few exceptions, the vast majority of enterprise architects I know spend an awful lot of time focused on internal issues whether it is rationalization, the cloud, storage governance, data center consolidation, creation of reference architectures, portfolio management and other considerations that aren’t even visible to customers. One should ask whether IT can be truly successful if we are busy listening to the business but otherwise are blissfully ignorant towards the customers they serve." -- James McGovern (tags: enterprisearchitecture crm socialcomputing)

WRF Benchmark: X6275 Beats Power6 - BestPerf "Oracle's Sun Blade X6275 cluster is 28% faster than the IBM POWER6 cluster on Weather Research and Forecasting (WRF) continental United States (CONUS) benchmark datasets. The Sun Blade X6275 cluster used a Quad Data Rate (QDR) InfiniBand connection along with Intel compilers and MPI." (tags: oracle sun x6275 benchmarks)

Design review for application facing memory issues

- by Mr Moose

I apologise in advance for the length of this post, but I want to paint an accurate picture of the problems my app is facing before posing some questions below. I am trying to address some self-inflicted design pain that is now causing my application to crash with out-of-memory errors. An abridged description of the problem domain is as follows. The application takes in a “dataset” that consists of numerous text files containing related data. An individual text file within the dataset usually contains approximately 20 “headers” holding metadata about the data it contains. It also contains a large tab-delimited section whose data is related to data in one of the other text files in the dataset. The number of columns per file varies widely, from 2 to 256+. The original application was written to let users load a dataset and map certain columns of each file, essentially indicating the key information that shows how the files are related, as well as identifying a few expected column names. Once this is done, a validation process enforces various rules and ensures that all the relationships between the files are valid. The data is then imported into a SQL Server database. The database design is an EAV (Entity-Attribute-Value) model, used to cater for the variable number of columns per file. I know EAV has its detractors, but in this case I feel it was a reasonable choice given the disparate data and the variable number of columns submitted in each dataset.
The memory problem: Given that the combined size of all the text files was at most about 5 MB, and in an effort to reduce database transaction time, it was decided to read ALL the data from the files into memory and then: (1) perform all the validation whilst the data was in memory; (2) relate it using an object model; (3) start a DB transaction and write the key columns row by row, noting the Id of each written row (all tables in the database use identity columns), then apply the Id of the newly written row to all related data; (4) once all related data had been updated with the key information to which it relates, write these records using SqlBulkCopy (due to our EAV model, we essentially have x columns by y rows to write, where x can be 256+ and rows often run into the tens of thousands); (5) once all the data is written without error (which can take several minutes for large datasets), commit the transaction.

The problem now is that we are receiving individual files containing over 30 MB of data, and a dataset can contain any number of files. We’ve started seeing datasets of around 100 MB, and I expect them only to get bigger from here on. With files of this size, the data can’t even be read into memory without the app falling over, let alone be validated and imported. I anticipate having to modify large chunks of the code so that validation parses files line by line, and I haven’t yet decided how to handle the import and transactions.

Potential improvements: I’ve wondered about using GUIDs to relate the data rather than relying on identity fields. This would allow data to be related before it is written to the database, although it would certainly increase the storage required, especially in an EAV design. Do you think this is a reasonable thing to try, or should I simply persist with identity fields? (Natural keys can’t be trusted to be unique across all submitters.)
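The line-by-line validation and the GUID idea from this post can be combined. Below is a minimal Python sketch, not the poster's code: it assumes each file begins with `Key: value` header lines ended by a blank line, followed by a tab-delimited section whose first line names the columns (the post only says "approx 20 headers", so this layout is hypothetical, as are the function names). Rows are yielded one at a time, so memory use is governed by a single row rather than the whole file, and each row receives a `uuid.uuid4()` key before anything reaches the database.

```python
import io
import uuid
from typing import Dict, Iterator, TextIO, Tuple


def read_headers(stream: TextIO) -> Dict[str, str]:
    """Consume 'Key: value' metadata lines up to the first blank line (assumed layout)."""
    headers: Dict[str, str] = {}
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            break
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return headers


def stream_rows(stream: TextIO, expected: Tuple[str, ...]) -> Iterator[Dict[str, str]]:
    """Check and yield one tab-delimited row at a time, keyed by a client-side GUID."""
    columns = stream.readline().rstrip("\r\n").split("\t")
    missing = [name for name in expected if name not in columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row_no, line in enumerate(stream, start=1):
        values = line.rstrip("\r\n").split("\t")
        if len(values) != len(columns):
            raise ValueError(
                f"data row {row_no}: expected {len(columns)} fields, got {len(values)}"
            )
        row = dict(zip(columns, values))
        # The key exists before any INSERT, so related rows can be linked to it
        # in the application without a round trip for an identity value.
        row["_id"] = str(uuid.uuid4())
        yield row


if __name__ == "__main__":
    sample = io.StringIO("Submitter: ACME\nVersion: 2\n\nSampleId\tValue\nS1\t0.5\nS2\t0.7\n")
    print(read_headers(sample))
    for row in stream_rows(sample, expected=("SampleId",)):
        print(row["SampleId"], row["Value"], row["_id"])
```

Because `stream_rows` is a generator, the same loop can feed validation and a batched writer without ever holding the file in memory. The storage trade-off the poster raises still applies: a `uniqueidentifier` takes 16 bytes in SQL Server (36 characters as text), against 4 bytes for an `int` identity.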
Another option is to use staging tables to get the data into the database, and only perform the transaction when copying data from the staging area to the actual destination tables.

Questions: For systems like this that import large quantities of data, how do you go about keeping transactions small? I’ve kept them as small as possible in the current design, but they are still active for several minutes and write hundreds of thousands of records in one transaction. Is there a better solution? The tab-delimited data section is read into a DataTable to be viewed in a grid. I don’t need the full functionality of a DataTable, so I suspect it is overkill. Is there any way to turn off various features of DataTables to make them more lightweight? Are there any other obvious things you would do in this situation to minimise the memory footprint of the application described above? Thanks for your kind attention.
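The staging-table option can be sketched as two phases: many small committed batches into a staging table, then one short set-based transaction that copies a whole batch into the destination table. This is a minimal Python sketch under stated assumptions: `sqlite3` stands in for SQL Server, the table names `staging_value` and `entity_value` and the `batch_id` column are hypothetical, and rows are assumed to be already in EAV (entity, attribute, value) form.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (entity_id, attribute, value) in EAV form


def batched(items: Iterable[Triple], size: int) -> Iterator[List[Triple]]:
    """Split any iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_via_staging(conn: sqlite3.Connection, batch_id: str,
                     triples: Iterable[Triple], batch_size: int = 10_000) -> int:
    """Stage rows in small commits, then publish them in one short transaction."""
    # Phase 1: each chunk is its own short transaction into the staging table.
    for chunk in batched(triples, batch_size):
        with conn:  # commits this chunk only (rolls it back on error)
            conn.executemany(
                "INSERT INTO staging_value (batch_id, entity_id, attribute, value) "
                "VALUES (?, ?, ?, ?)",
                [(batch_id, *t) for t in chunk],
            )
    # Phase 2: one set-based statement moves the validated batch to its destination.
    with conn:
        cur = conn.execute(
            "INSERT INTO entity_value (entity_id, attribute, value) "
            "SELECT entity_id, attribute, value FROM staging_value WHERE batch_id = ?",
            (batch_id,),
        )
        moved = cur.rowcount
        conn.execute("DELETE FROM staging_value WHERE batch_id = ?", (batch_id,))
    return moved
```

Because phase 1 commits as it goes, a failed import leaves rows behind in staging; deleting by `batch_id` cleans them up without touching the destination tables, which only change inside the final short transaction.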

Google I/O 2010 - Tips and tricks for Google Earth API and KML

Google I/O 2010 - Mapping in 3D: Tips and tricks for Google Earth API and KML. Geo 201. Speakers: Josh Livni, Mano Marks. Google Earth and the Earth API can handle a tremendous amount of data, but you always have more. We will talk about integrating large datasets efficiently, coding for optimal performance, and taking advantage of advanced features in KML and the Earth API. For all I/O 2010 sessions, please go to code.google.com. From: GoogleDevelopers. Time: 01:01:18.

Interactive manifest editing with the Automated Installer Manifest Wizard

- by Glynn Foster

Oracle Solaris 11.2 adds a new Automated Installer (AI) Manifest Wizard that makes it easier for administrators to create AI manifests for provisioning new client systems in the data center. The AI Manifest Wizard is a web-based interface that steps administrators through the basics of the AI manifest: target disk and layout selection, additional ZFS pools and datasets, IPS publisher and package selection, and the creation of any Oracle Solaris Zone virtual environments. The end result is an AI manifest produced without having to edit XML directly, which can then be associated with an appropriate AI service. To get started, check out How To Create an Automated Installer Manifest with an Interactive Wizard.

Google I/O 2010 - BigQuery and Prediction APIs

Google I/O 2010 - BigQuery and Prediction APIs. App Engine 101. Speakers: Amit Agarwal, Max Lin, Gideon Mann, Siddartha Naidu. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available to developers on a limited sign-up basis: (1) BigQuery, for interactive analysis of very large datasets, and (2) the Prediction API, for making informed predictions from your data. We will demonstrate their use and give instructions on how to get access. For all I/O 2010 sessions, please go to code.google.com. From: GoogleDevelopers. Time: 57:48.

How do I correctly start building database classes in C#?

- by e4rthdog

I am new to C# programming and OOP. I need to dive into web applications for my company, and I need to do it quickly and correctly. So even though I know ASP.NET MVC is the way to go, I want to start with some simple applications in ASP.NET Web Forms and then advance to MVC. Also, regarding my DB classes: I plan to create common database classes so that I can use them from either WinForms or ASP.NET applications. I also know that the way to go is to learn about ORMs and EF, BUT I want to start where I feel comfortable, and that is the traditional ADO.NET way. So, about my Data Access Layer classes: Should I return my results in DataSets or ArrayLists/Lists? Should my methods connect to and disconnect from the DB themselves, or should I have separate methods and let the application maintain the connection?
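One common answer to both questions is to open a connection per method call, dispose of it immediately, and return plain typed objects rather than a DataSet; in ADO.NET, connection pooling keeps the repeated opens cheap. The sketch below shows that shape in Python with `sqlite3` as a stand-in database; `Customer`, `CustomerRepository` and the `customer` table are hypothetical names.

```python
import sqlite3
from contextlib import closing
from dataclasses import dataclass
from typing import List


@dataclass
class Customer:
    id: int
    name: str


class CustomerRepository:
    """Hypothetical data-access class: one short-lived connection per call."""

    def __init__(self, database: str) -> None:
        self.database = database

    def _connect(self) -> sqlite3.Connection:
        return sqlite3.connect(self.database)

    def list_customers(self) -> List[Customer]:
        # closing() guarantees the connection is released even on error.
        with closing(self._connect()) as conn:
            rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
        # Plain objects, not a DataSet: callers never see the connection.
        return [Customer(id=r[0], name=r[1]) for r in rows]

    def add_customer(self, name: str) -> int:
        # The second context manager commits on success and rolls back on error.
        with closing(self._connect()) as conn, conn:
            cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
            return cur.lastrowid
```

Callers such as a WinForms form or an ASP.NET page only ever see `Customer` objects, so the same class can serve both front ends.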

Is there a modern (eg NoSQL) web analytics solution based on log files?

- by Martin

I have been using AWStats for many years to process my log files, but I am missing many possibilities (like cross-domain reports) and I hate being stuck with extra fields I created years ago. Anyway, I am not going to continue using this script. Is there a modern Apache log analytics solution based on modern storage technologies like NoSQL, or at least one that is ready to cope with large datasets efficiently? I am primarily looking for something that generates nice sortable and searchable output focused on web analytics, before having to write my own front ends (so Graylog2 is not an option). This question is purely about log-file-based solutions.

Which programming language could I use for Natural Language Processing to extract clinical words?

- by MACEE

I am going to do entity extraction (like Named Entity Recognition) from clinical free text (unstructured raw text such as discharge summaries), and these entities could be any medical problems, medical tests or treatments. I am going to use one of the i2b2 datasets (https://www.i2b2.org/), in case you are familiar with them. I am new to the NLP (Natural Language Processing) field, and I need a programming language that supports NLP tasks and also connects easily to the available libraries of machine learning algorithms like CRF. I don't know much Java, and I have heard about Python, Perl and Scala, but I am not sure which one would be the best option for this task.

Automatically kill a process if it exceeds a given amount of RAM

- by chrisamiller

I work on large-scale datasets. When testing new software, a script will sometimes sneak up on me, quickly grab all available RAM, and render my desktop unusable. I'd like a way to set a RAM limit for a process so that if it exceeds that amount, it will be killed automatically. A language-specific solution probably won't work, as I use all sorts of different tools (R, Perl, Python, Bash, etc). So is there some sort of process-monitor that will let me set a threshold amount of RAM and automatically kill a process if it uses more?
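On Linux, one way to get this without a language-specific solution is an address-space limit set by the OS, the same mechanism as `ulimit -v` in Bash. It does not literally kill the process: allocations beyond the cap fail, and most programs then exit with an out-of-memory error (Python raises `MemoryError`, for example). A minimal Python wrapper, assuming Linux; `run_with_memory_cap` is a hypothetical helper name. Note that `RLIMIT_AS` counts virtual memory, not resident RAM, so the cap needs some headroom.

```python
import resource
import subprocess
from typing import List


def run_with_memory_cap(cmd: List[str], max_bytes: int) -> int:
    """Run cmd with its virtual address space capped (Linux); return its exit status."""

    def apply_limit() -> None:
        # Runs in the child just before exec. Processes the command spawns
        # inherit the same per-process limit.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode
```

For example, `run_with_memory_cap(["Rscript", "analysis.R"], 8 * 1024**3)` would run a (hypothetical) R script with an 8 GiB address-space limit.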

ASP.NET Combo Box and List Box Performance Improvements - v2010 vol 1

Check out this great new performance feature of our ASP.NET combo box and list box controls for the DXperience v2010.1 release. You can now manually populate lists with items based on the currently applied filter criteria. This means that you can significantly decrease web server workload by loading only a subset of all items when working with large datasets. For instance, when using a large data source, you can request only the few records that are visible on the screen. The rest of the items can...
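The idea described here, returning only the items that match the current filter and only one screenful of them, can be illustrated independently of the DevExpress controls. This is an illustrative Python sketch, not the DXperience API; `items_for_filter`, `begin` and `count` are hypothetical names, and in a real application the filter would normally run in the database query rather than over an in-memory list.

```python
from itertools import islice
from typing import List, Sequence


def items_for_filter(source: Sequence[str], text: str,
                     begin: int = 0, count: int = 10) -> List[str]:
    """Return one page of the items whose text starts with the typed filter."""
    needle = text.casefold()
    matches = (item for item in source if item.casefold().startswith(needle))
    # islice stops the scan as soon as the requested page is filled.
    return list(islice(matches, begin, begin + count))
```

Scrolling further in the list would call the same function with a larger `begin`, so the server never sends more than `count` items per request.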

ETL Software Research Question

- by WernerCD

Where I work, we use a homegrown ETL solution that has been around for 5-10 years. I'm still new to my data analysis job, but I was wondering about the ETL tools that are out there; this is a new area for me. My job basically consists of digging into a set of databases (DB2, SQL2005, Citrix, an ancient COBOL database with a SQL wrapper on top, MySQL, etc.), gathering the desired information, combining the different datasets into one set, outputting it to a file format of choice (CSV, tab-separated, pipe-separated, XLS, etc.) and FTPing it to the customer. I guess my real question is: given my job, what are some good ETL suites that I can look at and compare with our in-house tools? This is mostly to research other options. Ultimately, I'd either suggest a new solution or get ideas to improve our current app.
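For comparison with off-the-shelf suites, the core of the in-house workflow described here (combine extracts, then write them in a chosen delimited format) is small. A minimal Python sketch, assuming each source has already been extracted into dictionaries mapping column name to value; the format names in `DELIMITERS` are hypothetical, and XLS output and the FTP step are left out.

```python
import csv
from typing import Dict, Iterable, List, TextIO

DELIMITERS = {"csv": ",", "tab": "\t", "pipe": "|"}


def combine(sources: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Concatenate rows from several extracts into one set."""
    return [row for source in sources for row in source]


def write_output(rows: List[Dict[str, str]], fmt: str, out: TextIO) -> None:
    """Write rows with the union of all columns, in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=DELIMITERS[fmt],
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Columns missing from one source are written as empty fields here; that mapping decision is the kind of thing to compare against what a dedicated ETL tool offers.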

SqlBulkCopy using SQL CE

- by harrisonmeister

Is it possible to use SqlBulkCopy with SQL Server Compact Edition (*.sdf) files? I know it works with SQL Server 2000 and up, but wanted to check CE compatibility. If it doesn't, does anyone know the fastest way of getting a CSV-type file into SQL Server CE without using DataSets (puke here)?
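Whatever the target, the usual DataSet-free approach is to stream the CSV and insert rows with one parameterized statement inside a single transaction. A Python sketch of that pattern, with `sqlite3` standing in for SQL Server CE (it does not show SqlBulkCopy or any SQL CE API); `import_csv` and `quote_ident` are hypothetical helpers, and the target table is assumed to exist with columns matching the CSV header.

```python
import csv
import sqlite3
from typing import TextIO


def quote_ident(name: str) -> str:
    """Quote a table or column name for SQL, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn: sqlite3.Connection, table: str, source: TextIO) -> int:
    """Stream CSV rows into an existing table; return the number inserted."""
    reader = csv.reader(source)
    header = next(reader)
    cols = ", ".join(quote_ident(c) for c in header)
    marks = ", ".join("?" for _ in header)
    sql = f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({marks})"
    with conn:  # one transaction for the whole file
        # The reader is consumed lazily, so the file is never fully in memory.
        cur = conn.executemany(sql, reader)
    return cur.rowcount
```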

Prevent scrollbars with WPF WebBrowser displaying content

- by Kevin Montrose

I'm using the WPF WebBrowser component to display some very simple HTML content. However, since I don't know the content size in advance, I'm currently getting scrollbars on the control when I load certain datasets. Basically, how can I force (or otherwise effect the equivalent of forcing) the WebBrowser to expand in size so that all content is displayed without the need for scrollbars?

Test data generators / quickest route to generating solid, non-repetitive, but not-real database sam

- by Jamo

I need to build a quick feasibility test / proof-of-concept of a remote database for a client, which will be populated with mostly typical Company and People data (names, addresses, etc.); 150K records or so. The sample databases mentioned here were helpful: http://stackoverflow.com/questions/57068/good-databases-with-sample-data ...but I'd like to be able to generate sample data like this easily for less-typical datasets as well. Does anyone have recommendations for off-the-shelf (or off-the-web) solutions?
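For less-typical schemas, a small seeded generator is often enough for a proof of concept. A minimal Python sketch: the name and street lists are made-up placeholders, `fake_people` is a hypothetical helper, and seeding `random.Random` makes runs reproducible while the row number keeps each email unique.

```python
import random
from typing import Dict, Iterator

FIRST = ["Ada", "Ben", "Chloe", "Dev", "Elena", "Farid", "Grace", "Hiro"]
LAST = ["Okafor", "Lindqvist", "Moreau", "Tanaka", "Silva", "Novak", "Reyes", "Byrne"]
STREETS = ["Oak", "Mill", "Harbor", "Station", "Maple", "Church"]
SUFFIXES = ["St", "Ave", "Rd", "Ln"]


def fake_people(n: int, seed: int = 42) -> Iterator[Dict[str, str]]:
    """Yield n made-up person records; the same n and seed give the same data."""
    rng = random.Random(seed)  # private generator, independent of global state
    for i in range(n):
        first, last = rng.choice(FIRST), rng.choice(LAST)
        yield {
            "id": str(i + 1),
            "name": f"{first} {last}",
            # The row number makes every email unique even when names repeat.
            "email": f"{first}.{last}{i + 1}@example.com".lower(),
            "address": f"{rng.randint(1, 9999)} {rng.choice(STREETS)} {rng.choice(SUFFIXES)}",
        }
```

Because it is a generator, `fake_people(150_000)` can be fed straight into a batched insert without building the whole list first.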

Mongoid or MongoMapper?

- by PanosJee

I have tried MongoMapper and it is feature complete (offering almost all AR functionality), but I was not very happy with its performance on large datasets. Has anyone compared it with Mongoid? Any performance gains?

How to do WPF data validation with ADO.NET

- by biju

How can I use data validation mechanisms with ADO.NET DataTables or DataSets? I have an input form that I am binding to a DataTable, and now I want to add input validation. How can I do that? I have tried using ValidationRules, but I can't bind parameters to them. I also tried IDataErrorInfo but couldn't figure it out. Can someone provide some input?

Merits of .NET ORM data access methods: Entity Framework vs. NHibernate vs. Subsonic vs. ADO.NET Datas

- by Lloyd

I have recently heard "fanboys" of different .NET ORM methodologies express strong, if not outlandish, opinions of other ORM methodologies, and frankly I feel a bit in the dark. Could you please explain the key merits of each of these .NET ORM solutions? (1) Entity Framework, (2) NHibernate, (3) Subsonic, (4) ADO.NET DataSets. I have a good understanding of 1 and 4, and a cursory understanding of 2 and 3, but apparently not enough to understand the implied cultural perceptions of one towards the other.

Multiple Connection Types for one Designer Generated TableAdapter

- by Tim

I have a Windows Forms application with a DataSet (.xsd) that is currently set to connect to a Sql Ce database. Compact Edition is being used so the users can use this application in the field without an internet connection, and then sync their data at day's end. I have been given a new project to create a supplemental web interface for displaying some of the same reports as the Windows Forms application, so certain users can obtain reports without installing the Windows app. What I've done so far is create a new Web Project and add it to my current Solution. I have split both the reports (.rdlc) and DataSets out of the Windows Forms project into their own projects so they can be accessed by both the Windows and Web applications. So far, this is working fine.

Here's my dilemma: As I said before, the DataSets are currently set up to connect to a local Sql Ce database file. This is correct for the Windows app, but for the Web application I would like to use these same TableAdapters and queries to connect to the Sql Server 2005 database. I have found that the designer-generated, strongly-typed TableAdapter classes have a ConnectionModifier property that allows you to make the TableAdapter's Connection public. This exposes the Connection property and allows me to set it; however, it is strongly typed as a SqlCeConnection, whereas I would like to set it to a SqlConnection for my Web project. I'm assuming the DataSet Designer strongly types the Connection, Command, and DataAdapter objects based on the Provider of the ConnectionString as indicated in the app.config file. Is there any way I can use some generic provider so that the DataSet Designer will use object types that can connect to both a Sql Ce database file AND the actual Sql Server 2005 database? I know that SqlCeConnection and SqlConnection both inherit from DbConnection, which implements IDbConnection. Likewise, the same goes for SqlCeCommand/SqlCommand:DbCommand:IDbCommand.
It would be nice if I could just figure out a way for the designer to use the interface types rather than the strong types, but I doubt that is possible. I hope my problem and question are clear. Any help is much appreciated. Let me know if there's anything I can clarify.

Use cases for NoSQL

- by seengee

NoSQL has been getting a lot of attention in the industry recently. I'm really interested in people's thoughts on the best use cases for it over relational database storage. What should trigger a developer into thinking that particular datasets are more suited to a NoSQL solution like Redis/CouchDB/MongoDB/Cassandra etc.? I would also be really interested to hear what people have ported from relational DBs to NoSQL and what improvements they have seen.

Visualize Classifier Error Weka

- by user1780592

Hi there, I have a dataset that I tested in Weka with the J48 classifier. It gave me an accuracy of 87.2611% (total instances = 157, correctly classified instances = 137, incorrectly classified instances = 20). Then I ran "Visualize classifier errors" on my data. However, my result decreased to 85.4015% (correctly classified instances = 117, incorrectly classified instances = 20, total instances = 137). Is there any reason for that? Should my result become much better after I visualize the classifier errors?
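The two percentages follow directly from the counts. Weka reports accuracy as correctly classified instances divided by total instances, so when the total fell from 157 to 137 while 20 instances were still misclassified, the error rate rose from about 12.74% to about 14.60%. A quick check in Python (`accuracy_pct` is a hypothetical helper):

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


before = accuracy_pct(137, 157)  # 20 of 157 misclassified
after = accuracy_pct(117, 137)   # still 20 misclassified, now out of 137
print(f"{before:.4f}% -> {after:.4f}%")  # prints 87.2611% -> 85.4015%
```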

Open existing Reporting Services reports in Business Intelligence Development Studio

- by Uwe

Hello, I've lost my Reporting Services project file for Business Intelligence Development Studio. How can I open the existing reports in a new project? Can I "open" the whole project like in SSAS? I want to get the datasets and data sources into the project again as well.

Transport Database Query through Web Services without using Dataset

- by Marty Trenouth

Given that passing DataSets through web services is rather heavy (and nearly, if not completely, unconsumable by non-.NET clients), what is the best way to prepare database query results that don't map to known types for transport through web services in C#?
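One common alternative to a DataSet is to send column names plus rows as plain JSON, which any client can consume. A minimal Python sketch, with `sqlite3` standing in for the poster's database; `query_to_json` is a hypothetical helper and the payload shape (`columns` and `rows`) is an illustrative choice, not a standard.

```python
import json
import sqlite3
from typing import Any, Dict, List


def query_to_json(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Run a query and serialize its result without any predefined row type."""
    cur = conn.execute(sql, params)
    # cursor.description supplies the column names at run time.
    columns = [d[0] for d in cur.description]
    rows: List[Dict[str, Any]] = [dict(zip(columns, r)) for r in cur.fetchall()]
    return json.dumps({"columns": columns, "rows": rows})
```

The separate `columns` list preserves column order for clients that need it, since not every JSON parser keeps object keys in order.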

two UserControls, one page, need to notify each other of updates

- by jeriley

I've got a user control that's used twice on the same page, and each instance can be updated (a dropdown list gets a new item). I'm not sure what the best way to handle this is. One concern: this is an older system (~4+ years, DataSets, .NET 2) and it is amazingly brittle. I did manage to get it running on 3.5 with no problems, but I've had a few run-ins with the JavaScript validation (~300 lines per page) throwing up all over the place when I change/add/modify controls in the parent.

Mainframe : JCL DISP parameters

- by Manasi

Hi, I would like to know the difference between KEEP and UNCATLG. As far as I know, both will remove the catalog entry for the datasets. Thanks and regards, Manasi

