Search Results

Search found 356 results on 15 pages for 'datasets'.

Page 1/15 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Large public datasets?

    - by Jason
    I am looking for some large public datasets, in particular: Large sample web server logs that have been anonymized. Datasets used for database performance benchmarking. Any other links to large public datasets would be appreciated. I already know about Amazon's public datasets at: http://aws.amazon.com/publicdatasets/

    Read the article

  • Merging datasets with 2 different time variables in SAS

    - by John
    Hey guys, for those regularly browsing this site, sorry for yet another question (I did solve my last one myself, though!). I have another problem with merging datasets; it seems that accounting for time in datasets is a real pain in the ass. I successfully managed to merge on months in my previous datasets, however my final dataset only has quarter as a time count variable. So where all my normal databases have month 1-xxx as an indicator of time, this database has quarter as an indicator of time. I still want to add the variables of this last database, let's call it TVOL, into my WORK database. Quick summary: Quarter 0 = JAN1996-MAR1996, Month 0 = JAN1996.
    Example TVOL:
      TVOL    Ticker   Quarter
      1500    AA       -1
      52546   BB       15
    Example WORK:
      BETA    Ticker   Month
      1.52    AA       2
      1.54    BB       3
    Example merged:
      BETA    TVOL   Ticker   Month
      1.52    500    AA       2
    I now want to merge these two tables using the following relationship: if the month is in quarter 1, the data of quarter 0 has to be used. So if I have an observation in WORK with date 2FEB1996, the TVOL of quarter -1 has to be put behind this observation. Something like: IF month is in quarter i, use data from quarter i-1. Also, as TVOL is measured quarterly and I have to put it in monthly, I have to take the average, so (TVOL/3) should be added as a variable. Thanks!

    Read the article

  • ASP.NET - dynamically creating database connection in datasets

    - by Foo
    I am currently using datasets for my data access layer and store the connection string in the web.config file. I need the ability to change the connection to another database before any queries are processed. Is there an event that is triggered, or a base class that can be modified, to intercept the process of retrieving the connection string from the web.config file? Is there any way to handle multiple database connections using the same code base while still taking advantage of connection pooling? I understand the best method is to get rid of datasets and use custom data objects. Any ideas?
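
    One common workaround, offered here only as a sketch: the designer-generated TableAdapters are partial classes, so you can add a method that swaps the adapter's connection before any Fill/GetData call. SqlClient pools connections per connection string, so pooling still works for each database. The namespace and adapter name below are hypothetical; by default the generated Connection property is internal (the designer's ConnectionModifier setting controls this), so the partial class must live in the same project as the dataset.

      // A minimal sketch, assuming a typed dataset whose generated TableAdapter
      // is named OrdersTableAdapter (hypothetical). TableAdapters are partial
      // classes, so the connection can be swapped without touching designer code.
      namespace MyApp.MyDataSetTableAdapters            // hypothetical generated namespace
      {
          public partial class OrdersTableAdapter
          {
              // Point the adapter at a different database before any query runs.
              public void UseConnectionString(string connectionString)
              {
                  this.Connection = new System.Data.SqlClient.SqlConnection(connectionString);
              }
          }
      }

    Usage would be: create the adapter, call UseConnectionString with whichever connection string you resolve at runtime, then call Fill or GetData as usual.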

    Read the article

  • Managing Data Prefetching and Dependencies with .NET Typed Datasets

    - by Derek Morrison
    I'm using .NET typed datasets on a project, and I often get into situations where I prefetch data from several tables into a dataset and then pass that dataset to several methods for processing. It seems cleaner to let each method decide exactly which data it needs and then load the data itself. However, several of the methods work with the same data, and I want the performance benefit of loading data in the beginning only once. My problem is that I don't know of a good way or pattern to use for managing dependencies (I want to be sure I load all the data that I'm going to need for each class/method that will use the dataset). Currently, I just end up looking through the code for the various classes that will use the dataset to make sure I'm loading everything appropriately. What are good approaches or patterns to use in this situation? Am I doing something fundamentally wrong? Although I'm using typed datasets, this seems like it would be a common situation where prefetching data is used. Thanks!
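
    One way to keep the single prefetch while making the dependencies explicit, sketched below under the assumption that each consumer can name the tables it needs (the interface and loader names are made up for illustration), is to have every consumer declare its required tables and let one prefetcher load the union of them before handing the dataset around. The sketch uses the plain DataSet base class; a typed dataset works the same way since it derives from DataSet.

      using System;
      using System.Collections.Generic;
      using System.Data;
      using System.Linq;

      // Each consumer declares what it needs instead of loading data itself.
      public interface IDataSetConsumer
      {
          IEnumerable<string> RequiredTables { get; }
          void Process(DataSet data);
      }

      public class DataSetPrefetcher
      {
          // Maps a table name to the code that fills that table in the dataset.
          private readonly IDictionary<string, Action<DataSet>> _loaders;

          public DataSetPrefetcher(IDictionary<string, Action<DataSet>> loaders)
          {
              _loaders = loaders;
          }

          // Loads each required table exactly once, then hands the dataset to everyone.
          public void Run(DataSet data, IEnumerable<IDataSetConsumer> consumers)
          {
              var consumerList = consumers.ToList();
              var neededTables = consumerList.SelectMany(c => c.RequiredTables).Distinct();

              foreach (string table in neededTables)
                  _loaders[table](data);

              foreach (IDataSetConsumer consumer in consumerList)
                  consumer.Process(data);
          }
      }

    The point is that the "what do I need" question lives next to the code that uses the data, while the "load it once" decision stays in one place.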

    Read the article

  • Datasets do not regenerate code behind.

    - by Nick
    So I was lucky enough to inherit a project where someone decided to use datasets as a model. The problem is that a column has been added to a table in the database. Using the dataset designer I added the column to the table and tried to run the 'Custom Tool'. That tool is doing absolutely nothing from what I can tell. So is there a way to make the generated dataset code actually reflect the changes that are made in the designer? Now I know why I have never used a dataset. :)

    Read the article

  • Selecting one row when working with typed datasets.

    - by Wodzu
    I have a typed dataset in my project. I would like to populate a datatable with only one row instead of all rows. The selected row must be based on the primary key column. I know I could modify the designer code to achieve this, however if I change the code in the designer I risk that this code will be deleted when I update my dataset via the designer in the future. So I wanted to alter the SelectCommand not in the designer but just before firing the MyTypedTableAdapter.Fill method. The strange thing is that the designer does not create a SelectCommand! It creates all the other commands but not this one. If it created a SelectCommand I could alter it this way:
      this.operatorzyTableAdapter.Adapter.SelectCommand.CommandText += " WHERE MyColumn = 1";
    It is far from perfect, but at least I would not have to modify the designer's work. Unfortunately, as I said earlier, the SelectCommand is not created. Instead the designer creates something like this:
      [global::System.Diagnostics.DebuggerNonUserCodeAttribute()]
      private void InitCommandCollection() {
          this._commandCollection = new global::System.Data.SqlClient.SqlCommand[1];
          this._commandCollection[0] = new global::System.Data.SqlClient.SqlCommand();
          this._commandCollection[0].Connection = this.Connection;
          this._commandCollection[0].CommandText = "SELECT Ope_OpeID, Ope_Kod, Ope_Haslo, Ope_Imie, Ope_Nazwisko FROM dbo.Operatorzy";
          this._commandCollection[0].CommandType = global::System.Data.CommandType.Text;
      }
    It doesn't make sense in my opinion. Why create UpdateCommand, InsertCommand and DeleteCommand but not SelectCommand? I could live with this, but this._commandCollection is private, so I cannot access it outside of the class code. I don't know how to get into this collection without changing the designer's code. The idea I have is to expose the collection via a partial class definition. However, I want to introduce many typed datasets and I really don't want to create a partial class definition for each of them. Please note that I am using .NET 3.5. I've found an article about accessing private properties, but it concerns .NET 4.0. Thanks for your time.
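
    Since a partial class per adapter is exactly what the question wants to avoid, here is a hedged sketch of a single reflection-based helper instead. It assumes the generated adapters follow the usual pattern suggested by the InitCommandCollection snippet above, i.e. the command array is surfaced through a non-public CommandCollection property; the adapter and column names in the usage comment come from the question.

      using System.Data.SqlClient;
      using System.Reflection;

      public static class TableAdapterHelper
      {
          // Reads the non-public CommandCollection property that designer-generated
          // TableAdapters expose, so the SELECT text can be tweaked before Fill
          // without writing a partial class for every adapter. This is a
          // convention-based hack: it relies on the generated property name.
          public static SqlCommand[] GetCommands(object tableAdapter)
          {
              PropertyInfo property = tableAdapter.GetType().GetProperty(
                  "CommandCollection", BindingFlags.NonPublic | BindingFlags.Instance);
              return (SqlCommand[])property.GetValue(tableAdapter, null);
          }
      }

      // Usage sketch:
      // SqlCommand[] commands = TableAdapterHelper.GetCommands(operatorzyTableAdapter);
      // commands[0].CommandText += " WHERE Ope_OpeID = @id";
      // commands[0].Parameters.AddWithValue("@id", 1);
      // operatorzyTableAdapter.Fill(myDataSet.Operatorzy);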

    Read the article

  • BigQuery - Best Practices for Running Queries on Massive Datasets

    Join Michael Manoochehri and Ryan Boyd from the big data Developer Relations team on Friday, September 21st, at 10am PDT, as they discuss best practices for answering questions about massive datasets with Google BigQuery. They'll explore interesting Big Data use cases with some of our public datasets, using BigQuery's SQL-like language to return query results in seconds. They will also cover some of BigQuery's unique query functions. For a general overview of BigQuery, watch our overview video: youtu.be Please use the moderator (goo.gl) to ask your questions, which will be answered live! More info here: developers.google.com From: GoogleDevelopers

    Read the article

  • Moving DataSets through BizTalk

    - by EltonStoneman
    [Source: http://geekswithblogs.net/EltonStoneman] Yuck. But sometimes you have to, so here are a couple of things to bear in mind:
    Schemas
    Point a codegen tool at a WCF endpoint which exposes a DataSet and it will generate an XSD which describes the DataSet like this:
      <xs:element minOccurs="0" name="GetDataSetResult" nillable="true">
        <xs:complexType>
          <xs:annotation>
            <xs:appinfo>
              <ActualType Name="DataSet"
                          Namespace="http://schemas.datacontract.org/2004/07/System.Data"
                          xmlns="http://schemas.microsoft.com/2003/10/Serialization/" />
            </xs:appinfo>
          </xs:annotation>
          <xs:sequence>
            <xs:element ref="xs:schema" />
            <xs:any />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    In a serialized instance, the element of type xs:schema contains a full schema which describes the structure of the DataSet – tables, columns etc. The second element, of type xs:any, contains the actual content of the DataSet, expressed as a DiffGram:
      <GetDataSetResult>
        <xs:schema id="NewDataSet" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
          <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
            <xs:complexType>
              <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element name="Table1">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name="Id" type="xs:string" minOccurs="0" />
                      <xs:element name="Name" type="xs:string" minOccurs="0" />
                      <xs:element name="Date" type="xs:string" minOccurs="0" />
                    </xs:sequence>
                  </xs:complexType>
                </xs:element>
              </xs:choice>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        <diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
          <NewDataSet xmlns="">
            <Table1 diffgr:id="Table11" msdata:rowOrder="0" diffgr:hasChanges="inserted">
              <Id>377fdf8d-cfd1-4975-a167-2ddb41265def</Id>
              <Name>157bc287-f09b-435f-a81f-2a3b23aff8c4</Name>
              <Date>a5d78d83-6c9a-46ca-8277-f2be8d4658bf</Date>
            </Table1>
          </NewDataSet>
        </diffgr:diffgram>
      </GetDataSetResult>
    Put the XSD into a BizTalk schema and it will fail to compile, giving you the error: The 'http://www.w3.org/2001/XMLSchema:schema' element is not declared. You should be able to work around that, but I've had no luck in BizTalk Server 2006 R2 – instead you can safely change that xs:schema element to be another xs:any type:
      <xs:element minOccurs="0" name="GetDataSetResult" nillable="true">
        <xs:complexType>
          <xs:sequence>
            <xs:any />
            <xs:any />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    (This snippet omits the annotation, but you can leave it in the schema.) For an XML instance to pass validation through the schema, you'll also need to flag the xs:any elements so they can contain any namespace and skip validation:
      <xs:element minOccurs="0" name="GetDataSetResult" nillable="true">
        <xs:complexType>
          <xs:sequence>
            <xs:any namespace="##any" processContents="skip" />
            <xs:any namespace="##any" processContents="skip" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    You should now have a compiling schema which can be successfully tested against a serialised DataSet.
    Transforms
    If you're mapping a DataSet element between schemas, you'll need to use the Mass Copy Functoid to populate the target node from the contents of both the xs:any type elements on the source node. This should give you a compiled map which you can test against a serialized instance. And if you have a .NET consumer on the other side of the mapped BizTalk output, it will correctly deserialize the response into a DataSet.

    Read the article

  • DataSets and XML - The Simplistic Approach

    One of the first ways I learned how to read XML data from external data sources was by using a DataSet's ReadXml function. This function takes a file path (or URL) for an XML document and loads it into a DataSet. This functionality is great when you need a simple way to process an XML document. In addition, the DataSet object also offers a simple way to save data in XML format by using the WriteXml function, which saves the current data in the DataSet to an XML file to be used later.
      DataSet ds = new DataSet();
      string filePath = "http://www.yourdomain.com/someData.xml";
      string fileSavePath = @"C:\Temp\Test.xml";
      // Read the file from this location
      ds.ReadXml(filePath);
      // Save the file to this location
      ds.WriteXml(fileSavePath);
    I have used the ReadXml function before when consuming data from external RSS feeds to display on one of my sites. It allows me to quickly pull in data from external sites with little to no processing. Example site: MyCreditTech.com

    Read the article

  • Shared Datasets in SQL Server 2008 R2

    This article leverages the examples and concepts explained in Parts I through IV of the spatial data series, which develops a "BI-Satellite" app. Overview In the spatial data series we ... [Read Full Article]

    Read the article

  • Returning multiple datasets from a stored proc in VB

    - by Ryan Stephens
    I have an encrypted SQL Server stored proc that I run with (or the VB.NET equivalent code of):
      declare @p4 nvarchar(100)
      set @p4=NULL
      declare @p5 bigint
      set @p5=NULL
      exec AA_PAY_BACS_EXPORT_RETRIEVE_S @PS_UserId=N'ADMN', @PS_Department=N'', @PS_PayFrequency=2, @PS_ErrorDescription=@p4 output
      select @p4, @p5
    This returns two datasets and the output parameters for the results. The datasets are made up of various table joins etc.; one holds the header record and one the detail records. I need to get the two datasets into a structure in VB.NET (e.g. LINQ, SqlDataReader, typed datasets) so that I can code with them. I don't know what tables any of this comes from and there are a lot of them. Whooopeee!!! I came close using LINQ to SQL and IMultipleResults but got frustrated when I had to recode it every time I made a change to the designer file. My feeling is that there must be a simple way to do this, any ideas?
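
    For what it's worth, one simple way to land both result sets without knowing the underlying tables is to let SqlDataAdapter.Fill create one DataTable per result set. The sketch below is C# (it translates to VB.NET directly) and assumes the parameter values shown in the question; the class name and connection string handling are made up for illustration.

      using System.Data;
      using System.Data.SqlClient;

      public static class BacsExportRepository
      {
          // Fill adds one DataTable per result set ("Table", "Table1", ...),
          // so the header rows land in ds.Tables[0] and the detail rows in ds.Tables[1].
          public static DataSet Retrieve(string connectionString)
          {
              var ds = new DataSet();
              using (var conn = new SqlConnection(connectionString))
              using (var cmd = new SqlCommand("AA_PAY_BACS_EXPORT_RETRIEVE_S", conn))
              {
                  cmd.CommandType = CommandType.StoredProcedure;
                  cmd.Parameters.AddWithValue("@PS_UserId", "ADMN");
                  cmd.Parameters.AddWithValue("@PS_Department", "");
                  cmd.Parameters.AddWithValue("@PS_PayFrequency", 2);
                  cmd.Parameters.Add("@PS_ErrorDescription", SqlDbType.NVarChar, 100)
                     .Direction = ParameterDirection.Output;

                  var adapter = new SqlDataAdapter(cmd);
                  adapter.Fill(ds);   // opens and closes the connection itself

                  // The output parameter is populated once Fill has finished.
                  string error = cmd.Parameters["@PS_ErrorDescription"].Value as string;
                  if (!string.IsNullOrEmpty(error))
                      throw new DataException(error);
              }
              return ds;
          }
      }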

    Read the article

  • DataSets to POCOs - an inquiry regarding DAL architecture

    - by alexsome
    Hello all, I have to develop a fairly large ASP.NET MVC project very quickly and I would like to get some opinions on my DAL design to make sure nothing will come back to bite me, since the BL is likely to get pretty complex. A bit of background: I am working with an Oracle backend so the built-in LINQ to SQL is out; I also need to use production-level libraries so the Oracle EF provider project is out; finally, I am unable to use any GPL or LGPL code (Apache, MS-PL, BSD are okay) so NHibernate/Castle Project are out. I would prefer, if at all possible, to avoid dishing out money, but I am more concerned about implementing the right solution. To summarize, these are my requirements: Oracle backend, rapid development, (L)GPL-free, free. I'm reasonably happy with DataSets but I would benefit from using POCOs as an intermediary between DataSets and views. Who knows, maybe at some point another DAL solution will show up and I will get the time to switch it out (yeah, right). So, while I could use LINQ to convert my DataSets to IQueryable, I would like to have a generic solution so I don't have to write a custom query for each class. I'm tinkering with reflection right now, but in the meantime I have two questions: Are there any problems I overlooked with this solution? Are there any other approaches you would recommend for converting DataSets to POCOs? Thanks in advance.
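
    Since the question mentions tinkering with reflection, here is a hedged sketch of the kind of generic DataTable-to-POCO mapper that approach usually ends up as. The main assumption is that writable property names match column names; anything more exotic (nested objects, custom conversions) needs extra handling.

      using System;
      using System.Collections.Generic;
      using System.Data;
      using System.Reflection;

      public static class DataTableMapper
      {
          // Maps each DataRow to a new T by matching writable property names to
          // column names (DataColumnCollection lookups are case-insensitive).
          // Columns without a matching property are ignored; DBNull leaves the
          // property at its default value.
          public static List<T> ToList<T>(DataTable table) where T : new()
          {
              PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
              var result = new List<T>(table.Rows.Count);

              foreach (DataRow row in table.Rows)
              {
                  var item = new T();
                  foreach (PropertyInfo prop in props)
                  {
                      if (!prop.CanWrite || !table.Columns.Contains(prop.Name))
                          continue;

                      object value = row[prop.Name];
                      if (value == DBNull.Value)
                          continue;

                      Type targetType = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
                      prop.SetValue(item, Convert.ChangeType(value, targetType), null);
                  }
                  result.Add(item);
              }
              return result;
          }
      }

    A call would look like DataTableMapper.ToList<Customer>(myDataSet.Tables[0]) for some hypothetical Customer POCO; caching the PropertyInfo lookups per type is the usual next step if this lands on a hot path.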

    Read the article

  • sql server 2005 reporting services-- how to use multiple datasets in report

    - by larryq
    Hi everyone, I'm new to SQL Server Reporting Services and am trying to decipher an existing report. It's nothing too bad, but I notice it does have two report datasets defined (they are generated via separate stored procedures). I'm trying to figure out where and how the report datasets are linked together so the Fields collection has both sets of columns available and the report has a single rowset to traverse. Is there a section in the report layout where a joining of datasets is defined? I'm using Visual Studio 2005 to design and preview the report, fwiw. Thanks for your help!

    Read the article

  • How to move from untyped DataSets to POCO\LINQ2SQL in legacy application

    - by artvolk
    Good day! I have a legacy application where the data access layer consists of classes in which queries are built using SqlConnection/SqlCommand and results are passed to the upper layers wrapped in untyped DataSets/DataTables. Now I'm working on integrating this application into a newer one written in ASP.NET MVC 2, where LINQ2SQL is used for data access. I don't want to rewrite the fancy logic that generates the complex queries passed to SqlConnection/SqlCommand in LINQ2SQL (and don't have permission to do so), but I'd like to have the results of these queries as strongly-typed object collections instead of untyped DataSets/DataTables. The basic idea is to wrap the old data access code in something that looks like a proper "Model" from ASP.NET MVC. What is the fast/easy way of doing this?
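
    One low-friction option, offered as a sketch rather than the canonical answer, is to keep the legacy code returning DataTables and project them into POCOs at the model boundary with the LINQ-to-DataSet extensions (AsEnumerable/Field<T> from System.Data.DataSetExtensions). The Customer type and column names below are made up for illustration.

      // Requires a project reference to System.Data.DataSetExtensions.
      using System.Collections.Generic;
      using System.Data;
      using System.Linq;

      public class Customer          // hypothetical POCO consumed by the MVC views
      {
          public int Id { get; set; }
          public string Name { get; set; }
      }

      public static class LegacyDalAdapter
      {
          // Wraps the untyped DataTable coming out of the old DAL in a typed list.
          public static IList<Customer> ToCustomers(DataTable table)
          {
              return table.AsEnumerable()
                          .Select(row => new Customer
                          {
                              Id = row.Field<int>("CustomerId"),        // assumed column names
                              Name = row.Field<string>("CustomerName")
                          })
                          .ToList();
          }
      }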

    Read the article

  • MongoDB and datasets that don't fit in RAM no matter how hard you shove

    - by sysadmin1138
    This is very system dependent, but chances are near certain we'll scale past some arbitrary cliff and get into Real Trouble. I'm curious what kind of rules-of-thumb exist for a good RAM to Disk-space ratio. We're planning our next round of systems, and need to make some choices regarding RAM, SSDs, and how much of each the new nodes will get. But now for some performance details! During normal workflow of a single project-run, MongoDB is hit with a very high percentage of writes (70-80%). Once the second stage of the processing pipeline hits, it's extremely high read as it needs to deduplicate records identified in the first half of processing. This is the workflow that "keep your working set in RAM" is made for, and we're designing around that assumption. The entire dataset is continually hit with random queries from end-user derived sources; though the frequency is irregular, the size is usually pretty small (groups of 10 documents). Since this is user-facing, the replies need to be under the "bored-now" threshold of 3 seconds. This access pattern is much less likely to be in cache, so it will be very likely to incur disk hits. A secondary processing workflow is high read of previous processing runs that may be days, weeks, or even months old, and is run infrequently but still needs to be zippy. Up to 100% of the documents in the previous processing run will be accessed. No amount of cache-warming can help with this, I suspect. Finished document sizes vary widely, but the median size is about 8K. The high-read portion of the normal project processing strongly suggests the use of Replicas to help distribute the Read traffic. I have read elsewhere that a 1:10 RAM-GB to HD-GB ratio is a good rule-of-thumb for slow disks. As we are seriously considering using much faster SSDs, I'd like to know if there is a similar rule of thumb for fast disks. I know we're using Mongo in a way where cache-everything really isn't going to fly, which is why I'm looking at ways to engineer a system that can survive such usage. The entire dataset will likely be most of a TB within half a year and keep growing.

    Read the article

  • ArcObjects - enumerating feature classes and datasets within a geodatabase

    - by Tom
    I'm trying to enumerate the contents (feature classes and feature datasets; I'm not interested in tables, etc.) of a file geodatabase using VBA/ArcObjects. I have the file GDB set as an IGxDatabase object, but can't find a way of getting further in. I've had a look at the geodatabase object model and tried using IFeatureClass and IFeatureDataset, but neither seems to return useful results. Thanks in advance for any assistance.

    Read the article

  • Binary serialization of datasets parameters in web services

    - by Someone
    In a system with both ends (client and server) in .NET, is it possible to use the binary serialization provided by the DataSet class in ADO.NET 2.0 when the datasets are exposed as WebMethod parameters? Is it OK to use something like the following just before the dataset is returned:
      someDataSet.RemotingFormat = SerializationFormat.Binary;

    Read the article

  • Interpolating Large Datasets On the Fly

    - by Karl
    I have a large data set of about 0.5 million records representing the exchange rate between the USD / GBP over the course of a given day. I have an application that wants to be able to graph this data, or maybe a subset of it. For obvious reasons I do not want to plot 0.5 million points on my graph. What I need is a smaller data set (100 points or so) which represents the given data as accurately as possible. Does anyone know of any interesting and performant ways this can be achieved? Cheers, Karl
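
    One simple, fast option, offered as a sketch rather than the most visually faithful method, is bucket averaging: split the series into roughly 100 equal-sized buckets and plot the mean of each. Keeping the min/max per bucket (or using an algorithm such as largest-triangle-three-buckets) preserves spikes better if that matters for the chart.

      using System;
      using System.Collections.Generic;

      public static class Downsampler
      {
          // Reduces 'values' to roughly 'targetPoints' points by averaging
          // fixed-size buckets. Spikes inside a bucket are smoothed away; keep
          // the bucket min/max as well if extremes matter for the chart.
          public static List<double> BucketAverage(IList<double> values, int targetPoints)
          {
              if (values.Count <= targetPoints)
                  return new List<double>(values);

              int bucketSize = (int)Math.Ceiling(values.Count / (double)targetPoints);
              var result = new List<double>(targetPoints);

              for (int start = 0; start < values.Count; start += bucketSize)
              {
                  int end = Math.Min(start + bucketSize, values.Count);
                  double sum = 0;
                  for (int i = start; i < end; i++)
                      sum += values[i];
                  result.Add(sum / (end - start));
              }
              return result;
          }
      }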

    Read the article

  • Free Large datasets to experiment with Hadoop

    - by Sundar
    Do you know of any large datasets to experiment with Hadoop that are free/low cost? Any related pointers/links are appreciated. Preference: at least one GB of data; production log data from a web server. A few that I have found so far: http://dumps.wikimedia.org/enwiki/20100130/ http://wiki.freebase.com/wiki/Data_dumps http://aws.amazon.com/publicdatasets/ Also, can we run our own crawler to gather data from sites, e.g. Wikipedia? Any pointers on how to do this are appreciated as well.

    Read the article

  • Importing large datasets on iPhone using CoreData

    - by Matthes
    Hi there, I'm facing a very annoying problem. My iPhone app loads its data from a network server. Data are sent as a plist and, once parsed, need to be stored to an SQLite db using CoreData. The issue is that in some cases those datasets are too big (5000+ records) and the import takes way too long. On top of that, when the iPhone tries to suspend the screen, the watchdog kills the app because it's still processing the import and does not respond for up to 5 seconds, so the import is never finished. I used all the recommended techniques according to the article "Efficiently Importing Data" http://developer.apple.com/mac/library/DOCUMENTATION/Cocoa/Conceptual/CoreData/Articles/cdImporting.html and other docs concerning this, but it's still awfully slow. The solution I'm looking for is to let the app suspend but let the import run in the background (the better option), or to prevent attempts to suspend the app at all. Any better idea is welcome too. Any tips on how to overcome these issues are highly appreciated! Thanks

    Read the article

  • Storage for large gridded datasets

    - by nullglob
    I am looking for a good storage format for large, gridded datasets. The application is meteorology, and we would prefer a format that is common within this field (to help exchange data with others). I don't need to deal with special data structures, and there should be a Fortran API. I am currently considering HDF5, GRIB2 and NetCDF4. How do these formats compare in terms of data compression? What are their main limitations? How steep is the learning curve? Are there any other storage formats worth investigating? I have not found a great deal of material outlining the differences and pros/cons of these formats (there is one relevant SO thread, and a presentation comparing GRIB and NetCDF).

    Read the article

  • Using report viewer, how do I pull from two seperate Datasets

    - by Robert
    I have two datasets I need to pull from: a base dataset that both reports use, and then a separate one that only one report pulls from. I get the error: Error 12 The Value expression for the text box 'Textbox9' refers to the field 'Name'. Report item expressions can only refer to fields within the current dataset scope or, if inside an aggregate, the specified dataset scope. My best guess is that I have to associate them with the correct dataset, but I have not been able to find any documentation on this. Can someone please tell me where in the rdlc document I need to code something like name.value, "dataset1" or something similar?

    Read the article

  • PostgreSQL - Why are some queries on large datasets so incredibly slow

    - by Brad Mathews
    Hello, I have two types of queries I run often on two large datasets. They run much slower than I would expect them to. The first type is a sequential scan updating all records:
      Update rcra_sites Set street = regexp_replace(street,'/','','i')
    rcra_sites has 700,000 records. It takes 22 minutes from pgAdmin! I wrote a vb.net function that loops through each record and sends an update query for each record (yes, 700,000 update queries!) and it runs in less than half the time. Hmmm.... The second type is a simple update with a relation and then a sequential scan:
      Update rcra_sites as sites Set violations='No'
      From narcra_monitoring as v
      Where sites.agencyid=v.agencyid and v.found_violation_flag='N'
    narcra_monitoring has 1,700,000 records. This takes 8 minutes. The query planner refuses to use my indexes. The query runs much faster if I start with set enable_seqscan = false;. I would prefer the query planner to do its job. I have appropriate indexes, and I have vacuumed and analyzed. I optimized my shared_buffers and effective_cache_size as best I know how, to use more memory, since I have 4GB. My hardware is pretty darn good. I am running v8.4 on Windows 7. Is PostgreSQL just this slow? Or am I still missing something? Thanks! Brad

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >