discriminated union - Page 15

strict aliasing and alignment

- by cooky451

I need a safe way to alias between arbitrary POD types, conforming to ISO-C++11 explicitly considering 3.10/10 and 3.11 of n3242 or later. There are a lot of questions about strict aliasing here, most of them regarding C and not C++. I found a "solution" for C which uses unions, probably using this section union type that includes one of the aforementioned types among its elements or nonstatic data members From that I built this. #include <iostream> template <typename T, typename U> T& access_as(U* p) { union dummy_union { U dummy; T destination; }; dummy_union* u = (dummy_union*)p; return u->destination; } struct test { short s; int i; }; int main() { int buf[2]; static_assert(sizeof(buf) >= sizeof(double), ""); static_assert(sizeof(buf) >= sizeof(test), ""); access_as<double>(buf) = 42.1337; std::cout << access_as<double>(buf) << '\n'; access_as<test>(buf).s = 42; access_as<test>(buf).i = 1234; std::cout << access_as<test>(buf).s << '\n'; std::cout << access_as<test>(buf).i << '\n'; } My question is, just to be sure, is this program legal according to the standard?* It doesn't give any warnings whatsoever and works fine when compiling with MinGW/GCC 4.6.2 using: g++ -std=c++0x -Wall -Wextra -O3 -fstrict-aliasing -o alias.exe alias.cpp * Edit: And if not, how could one modify this to be legal?

Read the article

Will the following NHibernate interface mapping work?

- by Ben Aston

I'd like to program against interfaces when working with NHibernate due to type dependency issues within the solution I am working with. SO questions such as this indicate it is possible. I have an ILocation interface and a concrete Location type. Will the following work? HBM mapping: <class name="ILocation" abstract="true" table="ILocation"> <id name="Id" type="System.Guid" unsaved-value="00000000-0000-0000-0000-000000000000"> <column name="LocationId" /> <generator class="guid" /> </id> <union-subclass table="Location" name="Location"> <property name="Name" type="System.String"/> </union-subclass> </class> Detached criteria usage using the interface: var criteria = DetachedCriteria.For<ILocation>().Add(Restrictions.Eq("Name", "blah")); var locations = criteria.GetExecutableCriteria(UoW.Session).List<ILocation>(); Are there any issues with not using the hilo ID generator and/or with this approach in general?

Read the article

Examples of monoids/semigroups in programming

- by jkff

It is well-known that monoids are stunningly ubiquitous in programing. They are so ubiquitous and so useful that I, as a 'hobby project', am working on a system that is completely based on their properties (distributed data aggregation). To make the system useful I need useful monoids :) I already know of these: Numeric or matrix sum Numeric or matrix product Minimum or maximum under a total order with a top or bottom element (more generally, join or meet in a bounded lattice, or even more generally, product or coproduct in a category) Set union Map union where conflicting values are joined using a monoid Intersection of subsets of a finite set (or just set intersection if we speak about semigroups) Intersection of maps with a bounded key domain (same here) Merge of sorted sequences, perhaps with joining key-equal values in a different monoid/semigroup Bounded merge of sorted lists (same as above, but we take the top N of the result) Cartesian product of two monoids or semigroups List concatenation Endomorphism composition. Now, let us define a quasi-property of an operation as a property that holds up to an equivalence relation. For example, list concatenation is quasi-commutative if we consider lists of equal length or with identical contents up to permutation to be equivalent. Here are some quasi-monoids and quasi-commutative monoids and semigroups: Any (a+b = a or b, if we consider all elements of the carrier set to be equivalent) Any satisfying predicate (a+b = the one of a and b that is non-null and satisfies some predicate P, if none does then null; if we consider all elements satisfying P equivalent) Bounded mixture of random samples (xs+ys = a random sample of size N from the concatenation of xs and ys; if we consider any two samples with the same distribution as the whole dataset to be equivalent) Bounded mixture of weighted random samples Which others do exist?

Read the article

Counting computers for each lab

- by Irvin

Alright I have a problem with having to count PCs, and Macs from different labs. In each lab I need to display how many PC and Macs there is available. The data is coming from a SQL server, right am trying sub queries and the use of union, this the closest I can get to what I need. The query below shows me the number of PCs, and Macs in two different columns, but of course, the PCs will be in one row and the Macs on another right below it. Having the lab come up twice. EX: LabName -- PC / MAC Lab1 -- 5 / 0 Lab1 -- 0 / 2 Query SELECT Labs.LabName, COUNT(*),0 AS Mac FROM HardWare INNER JOIN Labs ON HardWare.LabID = Labs.LabID WHERE ComputerStatus = 'AVAILABLE' GROUP BY Labs.LabName UNION SELECT Labs.LabName, COUNT(*), (SELECT COUNT(Manufacturer)) AS Mac FROM HardWare INNER JOIN Labs ON HardWare.LabID = Labs.LabID WHERE ComputerStatus = 'AVAILABLE' AND Manufacturer = 'Apple' GROUP BY Labs.LabName ORDER BY Labs.LabName So is there any way to get them together in one row as in Lab1 -- 5 / 2 or is there a different way to write the query? anything will be a big help, am pretty much stuck here. Cheers

Read the article

What is a data structure for quickly finding non-empty intersections of a list of sets?

- by Andrey Fedorov

I have a set of N items, which are sets of integers, let's assume it's ordered and call it I[1..N]. Given a candidate set, I need to find the subset of I which have non-empty intersections with the candidate. So, for example, if: I = [{1,2}, {2,3}, {4,5}] I'm looking to define valid_items(items, candidate), such that: valid_items(I, {1}) == {1} valid_items(I, {2}) == {1, 2} valid_items(I, {3,4}) == {2, 3} I'm trying to optimize for one given set I and a variable candidate sets. Currently I am doing this by caching items_containing[n] = {the sets which contain n}. In the above example, that would be: items_containing = [{}, {1}, {1,2}, {2}, {3}, {3}] That is, 0 is contained in no items, 1 is contained in item 1, 2 is contained in itmes 1 and 2, 2 is contained in item 2, 3 is contained in item 2, and 4 and 5 are contained in item 3. That way, I can define valid_items(I, candidate) = union(items_containing[n] for n in candidate). Is there any more efficient data structure (of a reasonable size) for caching the result of this union? The obvious example of space 2^N is not acceptable, but N or N*log(N) would be.

Read the article

Compiling and using NTL c++ library for Windows

- by Martin Lauridsen

Hi there, I have compiled the NTL inifite precision integer arithmetic library for c++, using Microsoft Visual Studio 2008. I did as explained, on this site, using the Visual Studio interface, rather than from the command prompt. Actually I would rather do it from the command prompt, but I was not sure how to. Anyhow, I got the library compiled, and I now want to compile a program using the library, from the command prompt. The program I am trying to compile, has been tested on a linux system, where I compile it with the following c++ -I/appl/htopopt/Linux_x86_64/NTL-5.4.2/include mpqs.cpp main.cpp -o main -L/appl/htopopt/Linux_x86_64/NTL-5.4.2/lib -lntl -L/appl/htopopt/Linux_x86_64/gmp-4.2.1/lib -lgmp -lm Nevermind the gmp stuff, I dont have that installed on Windows. It is purely an optional thing that will make the NTL run faster. Anyhow, this works fine on linux. Now on Windows I write the following cl /EHsc /I D:\Downloads\WinNTL-5_5_2\include mpqs.cpp main.cpp /link /LIBPATH:"D:\Documents\Visual Studio 2008\Projects\ntl\Debug" But this results in the following errors: mpqs.cpp mpqs.cpp(38) : error C2039: 'find_smooth_vals' : is not a member of 'QS' d:\desktop\qs\mpqs.h(12) : see declaration of 'QS' mpqs.cpp(41) : error C2065: 'M' : undeclared identifier mpqs.cpp(41) : error C2065: 'n' : undeclared identifier mpqs.cpp(42) : error C2065: 'sieve_table' : undeclared identifier mpqs.cpp(42) : error C2228: left of '.size' must have class/struct/union type is ''unknown-type'' mpqs.cpp(43) : error C2065: 'sieve_table' : undeclared identifier mpqs.cpp(44) : error C2065: 'qx_table' : undeclared identifier mpqs.cpp(44) : error C3861: 'test_smoothness': identifier not found mpqs.cpp(45) : error C2065: 'smooth_indices' : undeclared identifier mpqs.cpp(45) : error C2228: left of '.push_back' must have class/struct/union type is ''unknown-type'' main.cpp Generating Code... It is as if, my mpqs.h file is not included into the compilation process? Also I dont understand why it complains about .push_back() for a vector type? Help is much appreciated!

Read the article

SQL: How do I INSERT primary key values from two tables INTO a master table.

- by Stefan

Hello, I would appreciate some help with an SQL statement I really can't get my head around. What I want to do is fairly simple, I need to take the values from two different tables and copy them into an master table when a new row is inserted into one of the two tables. The problem is perhaps best explained like this: I have three tables, productcategories, regioncategories and mastertable. --------------------------- TABLE: PRODUCTCATEGORIES --------------------------- COLUMNS: CODE | DESCRIPTION --------------------------- VALUES: BOOKS | Books --------------------------- --------------------------- TABLE: REGIONCATEGORIES --------------------------- COLUMNS: CODE | DESCRIPTION --------------------------- VALUES: EU | European Union --------------------------- --------------------------- TABLE: MASTERTABLE --------------------------- COLUMNS: REGION | PRODUCT --------------------------- VALUES: EU | BOOKS --------------------------- I want the values to be inserted like this when a new row is created in either productcategories or regioncategories. New row is created. --------------------------- TABLE: PRODUCTCATEGORIES --------------------------- COLUMNS: CODE | DESCRIPTION --------------------------- VALUES: BOOKS | Books --------------------------- VALUES: DVD | DVDs --------------------------- And a SQL statement copies the new values into the mastertable. --------------------------- TABLE: MASTERTABLE --------------------------- COLUMNS: REGION | PRODUCT --------------------------- VALUES: EU | BOOKS --------------------------- VALUES: EU | DVD --------------------------- The same goes if a row is created in the regioncategories. New row. --------------------------- TABLE: REGIONCATEGORIES --------------------------- COLUMNS: CODE | DESCRIPTION --------------------------- VALUES: EU | European Union --------------------------- VALUES: US | United States --------------------------- Copied to the mastertable. --------------------------- TABLE: MASTERTABLE --------------------------- COLUMNS: REGION | PRODUCT --------------------------- VALUES: EU | BOOKS --------------------------- VALUES: EU | DVD --------------------------- VALUES: US | BOOKS --------------------------- VALUES: US | DVD --------------------------- I hope it makes sense. Thanks, Stefan

Read the article

Oracle command hangs when using view for "WHILE x IN..." subquery

- by Calvin Fisher

I'm working on a web service that fetches data from an oracle data source in chunks and passes it back to an indexing/search tool in XML format. I'm the C#/.NET guy, and am kind of fuzzy on parts of Oracle. Our Oracle team gave us the following script to run, and it works well: SELECT ROWID, [columns] FROM [table] WHERE ROWID IN ( SELECT ROWID FROM ( SELECT ROWID FROM [table] WHERE ROWID > '[previous_batch_last_rowid]' ORDER BY ROWID ) WHERE ROWNUM <= 10000 ) ORDER BY ROWID 10,000 rows is an arbitrary but reasonable chunk size and ROWID is sufficiently unique for our purposes to use as a UID since each indexing run hits only one table at a time. Bracketed values are filled in programmatically by the web service. Now we're going to start adding views to the indexing, each of which will union a few separate tables. Since ROWID would no longer function as a unique identifier, they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union. But this script does not work, even though it follows the same form as the previous one: SELECT VIEW_UNIQUE_ID, [columns] FROM [view] WHERE VIEW_UNIQUE_ID IN ( SELECT VIEW_UNIQUE_ID FROM ( SELECT VIEW_UNIQUE_ID FROM [view] WHERE ROWID > '[previous_batch_last_view_unique_id]' ORDER BY VIEW_UNIQUE_ID ) WHERE ROWNUM <= 10000 ) ORDER BY VIEW_UNIQUE_ID It hangs indefinitely with no response from the Oracle server. I've waited 20+ minutes and the SQLTools dialog box indicating a running query remains the same, with no progress or updates. I've tested each subquery independently and each works fine and takes a very short amount of time (<= 1 second), so the view itself is sound. But as soon as the inner two SELECT queries are added with "WHERE VIEW_UNIQUE_ID IN...", it hangs. Why doesn't this query work for views? In what important way are they not interchangeable here?

Read the article

postgres stored procedure problem

- by easyrider

Hi all, Ich have a problem in postgres function: CREATE OR REPLACE FUNCTION getVar(id bigint) RETURNS TABLE (repoid bigint, suf VARCHAR, nam VARCHAR) AS $$ declare rec record; BEGIN FOR rec IN (WITH RECURSIVE children(repoobjectid,variant_of_object_fk, suffix, variantname) AS ( SELECT repoobjectid, variant_of_object_fk, '' as suffix,variantname FROM b2m.repoobject_tab WHERE repoobjectid = id UNION ALL SELECT repo.repoobjectid, repo.variant_of_object_fk, suffix || '..' , repo.variantname FROM b2m.repoobject_tab repo, children WHERE children.repoobjectid = repo.variant_of_object_fk) SELECT repoobjectid,suffix,variantname FROM children) LOOP RETURN next; END LOOP; RETURN; END; It can be compiled, but if y try to call it select * from getVar(18) I got 8 empty rows with 3 columns. If i execute the following part of procedure with hard-coded id parameter: WITH RECURSIVE children(repoobjectid,variant_of_object_fk, suffix, variantname) AS ( SELECT repoobjectid, variant_of_object_fk, '' as suffix,variantname FROM b2m.repoobject_tab WHERE repoobjectid = 18 UNION ALL SELECT repo.repoobjectid, repo.variant_of_object_fk, suffix || '..' , repo.variantname FROM b2m.repoobject_tab repo, children WHERE children.repoobjectid = repo.variant_of_object_fk) SELECT repoobjectid,suffix,variantname FROM children I got exactly, what i need 8 rows with data: repoobjectid suffix variantname 18 19 .. for IPhone 22 .. for Nokia 23 .... OS 1.0 and so on. What is going wrong ? Please help. Thanx in advance

Read the article

gcc, strict-aliasing, and horror stories

- by Joseph Quinsey

In http://stackoverflow.com/questions/2906365/gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No. This question is broader: do you have any horror stories about gcc and strict-aliasing? Background: Quoting from AndreyT's answer in http://stackoverflow.com/questions/2771023/c99-strict-aliasing-rules-in-c-gcc/2771041#2771041: "Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it." Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html. Some other less-relevant links: http://stackoverflow.com/questions/1225741/performance-impact-of-fno-strict-aliasing http://stackoverflow.com/questions/754929/strict-aliasing http://stackoverflow.com/questions/262379/when-is-char-safe-for-strict-pointer-aliasing http://stackoverflow.com/questions/725138/how-to-detect-strict-aliasing-at-compile-time So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.

Read the article

process semaphores linux - wait

- by coubeatczech

Hi, I try to code a simple program that starts and waits on the system semaphore until it gets terminated by signal. union semun { int val; struct semid_ds *buf; unsigned short int *array; struct seminfo *__buf; }; int main(){ int semaphores = semget(IPC_PRIVATE,1,IPC_CREAT | 0666); union semun arg; arg.val = 0; semctl(semaphores,0,SETVAL,arg); struct sembuf operations[1]; operations[0].sem_num = 0; operations[0].sem_op = -1; operations[0].sem_flg = 0; semop(semaphores,operations,1); fprintf(stderr,"Why?\n"); return 0; } I expect, that everytime this program gets executed, nothing actually happens and it waits on the semaphore, but everytime it goes through the semaphore and writes Why?. Why?

Read the article

Can the Subversion client (svn) derefence symbolic links as if they were files?

- by Ryan B. Lynch

I have a directory on a Linux system that mostly contains symlinks to files on a different filesystem. I'd like to add the directory to a Subversion repository, dereferencing the symlinks in the process (treating them as the files they point to, rather than links). Generally, I'd like to be able to handle any working-copy operations with this behavior, but the 'svn add' command is where it starts, I think. The SVN client utility doesn't appear to have any options related to symlink dereferencing in the working copy. I didn't find any references to this in the manual (http://svnbook.red-bean.com/en/1.5/index.html), either. I found a poster on the SVN users mailing list who asked the same question but never received an answer, here: http://markmail.org/message/ngchfnzlmm43yj7h (That poster ended up using hard links instead of symlinks. That technique is not an option, in my case, because the real underlying files reside on a separate filesystem.) I'm using Subversion v1.6.1 on Fedora 11. For what it's worth, I know that there are alternative tools/techniques that could help approximate this behavior, but which I have to discard for various reasons. I've already considered [and dust-binned] these possibilities: - a "union" mount, merging all of the the directories containing the real files, with the SVN working-copy directory as the "top" layer in the union; - copying/moving the real files to the same filesystem as the SVN working-copy, and using hardlinks instead of symlinks; - non-SVN version control systems. These were all neat ideas, and I'm sure they are good solutions to other problems, but they won't work given the constraints of this environment and situation.

Read the article

Best way to dynamically get column names from oracle tables

- by MNC

Hi, We are using an extractor application that will export data from the database to csv files. Based on some condition variable it extracts data from different tables, and for some conditions we have to use UNION ALL as the data has to be extracted from more than one table. So to satisfy the UNION ALL condition we are using nulls to match the number of columns. Right now all the queries in the system are pre-built based on the condition variable. The problem is whenever there is change in the table projection (i.e new column added, existing column modified, column dropped) we have to manually change the code in the application. Can you please give some suggestions how to extract the column names dynamically so that any changes in the table structure do not require change in the code? My concern is the condition that decides which table to query. The variable condition is like if the condition is A, then load from TableX if the condition is B then load from TableA and TableY. We must know from which table we need to get data. Once we know the table it is straightforward to query the column names from the data dictionary. But there is one more condition, which is that some columns need to be excluded, and these columns are different for each table. I am trying to solve the problem only for dynamically generating the list columns. But my manager told me to make solution on the conceptual level rather than just fixing. This is a very big system with providers and consumers constantly loading and consuming data. So he wanted solution that can be general. So what is the best way for storing condition, tablename, excluded columns? One way is storing in database. Are there any other ways? If yes what is the best? As I have to give at least a couple of ideas before finalizing. Thanks,

Read the article

MySQL SELECT combining 3 SELECTs INTO 1

- by Martin Tóth

Consider following tables in MySQL database: entries: creator_id INT entry TEXT is_expired BOOL other: creator_id INT entry TEXT userdata: creator_id INT name VARCHAR etc... In entries and other, there can be multiple entries by 1 creator. userdata table is read only for me (placed in other database). I'd like to achieve a following SELECT result: +------------+---------+---------+-------+ | creator_id | entries | expired | other | +------------+---------+---------+-------+ | 10951 | 59 | 55 | 39 | | 70887 | 41 | 34 | 108 | | 88309 | 38 | 20 | 102 | | 94732 | 0 | 0 | 86 | ... where entries is equal to SELECT COUNT(entry) FROM entries GROUP BY creator_id, expired is equal to SELECT COUNT(entry) FROM entries WHERE is_expired = 0 GROUP BY creator_id and other is equal to SELECT COUNT(entry) FROM other GROUP BY creator_id. I need this structure because after doing this SELECT, I need to look for user data in the "userdata" table, which I planned to do with INNER JOIN and select desired columns. I solved this problem with selecting "NULL" into column which does not apply for given SELECT: SELECT creator_id, COUNT(any_entry) as entries, COUNT(expired_entry) as expired, COUNT(other_entry) as other FROM ( SELECT creator_id, entry AS any_entry, NULL AS expired_entry, NULL AS other_enry FROM entries UNION SELECT creator_id, NULL AS any_entry, entry AS expired_entry, NULL AS other_enry FROM entries WHERE is_expired = 1 UNION SELECT creator_id, NULL AS any_entry, NULL AS expired_entry, entry AS other_enry FROM other ) AS tTemp GROUP BY creator_id ORDER BY entries DESC, expired DESC, other DESC ; I've left out the INNER JOIN and selecting other columns from userdata table on purpose (my question being about combining 3 SELECTs into 1). Is my idea valid? = Am I trying to use the right "construction" for this? Are these kind of SELECTs possible without creating an "empty" column? (some kind of JOIN) Should I do it "outside the DB": make 3 SELECTs, make some order in it (let's say python lists/dicts) and then do the additional SELECTs for userdata? Solution for a similar question does not return rows where entries and expired are 0. Thank you for your time.

Read the article

How do I send floats in window messages.

- by yngvedh

Hi, What is the best way to send a float in a windows message using c++ casting operators? The reason I ask is that the approach which first occurred to me did not work. For the record I'm using the standard win32 function to send messages: PostWindowMessage(UINT nMsg, WPARAM wParam, LPARAM lParam) What does not work: Using static_cast<WPARAM>() does not work since WPARAM is typedef'ed to UINT_PTR and will do a numeric conversion from float to int, effectively truncating the value of the float. Using reinterpret_cast<WPARAM>() does not work since it is meant for use with pointers and fails with a compilation error. I can think of two workarounds at the moment: Using reinterpret_cast in conjunction with the address of operator: float f = 42.0f; ::PostWindowMessage(WM_SOME_MESSAGE, *reinterpret_cast<WPARAM*>(&f), 0); Using an union: union { WPARAM wParam, float f }; f = 42.0f; ::PostWindowMessage(WM_SOME_MESSAGE, wParam, 0); Which of these are preffered? Are there any other more elegant way of accomplishing this?

Read the article

Need help tuning a SQL statement

- by jeffself

I've got a table that has two fields (custno and custno2) that need to be searched from a query. I didn't design this table, so don't scream at me. :-) I need to find all records where either the custno or custno2 matches the value returned from a query on the same table based on a titleno. In other words, the user types in 1234 for the titleno. My query searches the table to find the custno associated with the titleno. It also looks for the custno2 for that titleno. Then it needs to do a search on the same table for all other records that have either the custno or custno2 returned in the previous search in the custno or custno2 fields for those other records. Here is what I've come up with: SELECT BILLYR, BILLNO, TITLENO, VINID, TAXPAID, DUEDATE, DATEPIF, PROPDESC FROM TRCDBA.BILLSPAID WHERE CUSTNO IN (select custno from trcdba.billspaid where titleno = '1234' union select custno2 from trcdba.billspaid where titleno = '1234' and custno2 != '') OR CUSTNO2 IN (select custno from trcdba.billspaid where titleno = '1234' union select custno2 from trcdba.billspaid where titleno = '1234' and custno2 != '') The query takes about 5-10 seconds to return data. Can it be rewritten to work faster?

Read the article

SSE (SIMD extensions) support in gcc

- by goldenmean

Hi, I see a code as below: include "stdio.h" #define VECTOR_SIZE 4 typedef float v4sf __attribute__ ((vector_size(sizeof(float)*VECTOR_SIZE))); // vector of four single floats typedef union f4vector { v4sf v; float f[VECTOR_SIZE]; } f4vector; void print_vector (f4vector *v) { printf("%f,%f,%f,%f\n", v->f[0], v->f[1], v->f[2], v->f[3]); } int main() { union f4vector a, b, c; a.v = (v4sf){1.2, 2.3, 3.4, 4.5}; b.v = (v4sf){5., 6., 7., 8.}; c.v = a.v + b.v; print_vector(&a); print_vector(&b); print_vector(&c); } This code builds fine and works expectedly using gcc (it's inbuild SSE / MMX extensions and vector data types. this code is doing a SIMD vector addition using 4 single floats. I want to understand in detail what does each keyword/function call on this typedef line means and does: typedef float v4sf __attribute__ ((vector_size(sizeof(float)*VECTOR_SIZE))); What is the vector_size() function return; What is the __attribute__ keyword for Here is the float data type being type defined to vfsf type? I understand the rest part. thanks, -AD

Read the article

Efficiently select top row for each category in the set

- by VladV

I need to select a top row for each category from a known set (somewhat similar to this question). The problem is, how to make this query efficient on the large number of rows. For example, let's create a table that stores temperature recording in several places. CREATE TABLE #t ( placeId int, ts datetime, temp int, PRIMARY KEY (ts, placeId) ) -- insert some sample data SET NOCOUNT ON DECLARE @n int, @ts datetime SELECT @n = 1000, @ts = '2000-01-01' WHILE (@n>0) BEGIN INSERT INTO #t VALUES (@n % 10, @ts, @n % 37) IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts) SET @n = @n - 1 END Now I need to get the latest recording for each of the places 1, 2, 3. This way is efficient, but doesn't scale well (and looks dirty). SELECT * FROM ( SELECT TOP 1 placeId, temp FROM #t WHERE placeId = 1 ORDER BY ts DESC ) t1 UNION ALL SELECT * FROM ( SELECT TOP 1 placeId, temp FROM #t WHERE placeId = 2 ORDER BY ts DESC ) t2 UNION ALL SELECT * FROM ( SELECT TOP 1 placeId, temp FROM #t WHERE placeId = 3 ORDER BY ts DESC ) t3 The following looks better but works much less efficiently (30% vs 70% according to the optimizer). SELECT placeId, ts, temp FROM ( SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum FROM #t WHERE placeId IN (1, 2, 3) ) t WHERE rownum = 1 The problem is, during the latter query execution plan a clustered index scan is performed on #t and 300 rows are retrieved, sorted, numbered, and then filtered, leaving only 3 rows. For the former query three times one row is fetched. Is there a way to perform the query efficiently without lots of unions?

Read the article

workaround for ORA-03113: end-of-file on communication channel

- by Jefferstone

The call to TEST_FUNCTION below fails with "ORA-03113: end-of-file on communication channel". A workaround is presented in TEST_FUNCTION2. I boiled down the code as my actual function is far more complex. Tested on Oracle 11G. Anyone have any idea why the first function fails? CREATE OR REPLACE TYPE "EMPLOYEE" AS OBJECT ( employee_id NUMBER(38), hire_date DATE ); CREATE OR REPLACE TYPE "EMPLOYEE_TABLE" AS TABLE OF EMPLOYEE; CREATE OR REPLACE FUNCTION TEST_FUNCTION RETURN EMPLOYEE_TABLE IS table1 EMPLOYEE_TABLE; table2 EMPLOYEE_TABLE; return_table EMPLOYEE_TABLE; BEGIN SELECT CAST(MULTISET ( SELECT user_id, created FROM all_users WHERE LOWER(username) < 'm' ) AS EMPLOYEE_TABLE) INTO table1 FROM dual; SELECT CAST(MULTISET ( SELECT user_id, created FROM all_users WHERE LOWER(username) >= 'm' ) AS EMPLOYEE_TABLE) INTO table2 FROM dual; SELECT CAST(MULTISET ( SELECT employee_id, hire_date FROM TABLE(table1) UNION SELECT employee_id, hire_date FROM TABLE(table2) ) AS EMPLOYEE_TABLE) INTO return_table FROM dual; RETURN return_table; END TEST_FUNCTION; CREATE OR REPLACE FUNCTION TEST_FUNCTION2 RETURN EMPLOYEE_TABLE IS table1 EMPLOYEE_TABLE; table2 EMPLOYEE_TABLE; return_table EMPLOYEE_TABLE; BEGIN SELECT CAST(MULTISET ( SELECT user_id, created FROM all_users WHERE LOWER(username) < 'm' ) AS EMPLOYEE_TABLE) INTO table1 FROM dual; SELECT CAST(MULTISET ( SELECT user_id, created FROM all_users WHERE LOWER(username) >= 'm' ) AS EMPLOYEE_TABLE) INTO table2 FROM dual; WITH combined AS ( SELECT employee_id, hire_date FROM TABLE(table1) UNION SELECT employee_id, hire_date FROM TABLE(table2) ) SELECT CAST(MULTISET ( SELECT * FROM combined ) AS EMPLOYEE_TABLE) INTO return_table FROM dual; RETURN return_table; END TEST_FUNCTION2; SELECT * FROM TABLE (TEST_FUNCTION()); -- Throws exception ORA-03113. SELECT * FROM TABLE (TEST_FUNCTION2()); -- Works

Read the article

F#: Can't hide a type abbreviation in a signature? Why not?

- by Nels Beckman

In F#, I'd like to have what I see as a fairly standard Abstract Datatype: // in ADT.fsi module ADT type my_Type // in ADT.fs module ADT type my_Type = int In other words, code inside the module knows that my_Type is an int, but code outside does not. However, F# seems to have a restriction where type abbreviations specifically cannot be hidden by a signature. This code gives a compiler error, and the restriction is described here. If my_Type were instead a discriminated union, then there is no compiler error. My question is, why the restriction? I seem to remember being able to do this in SML and Ocaml, and furthermore, isn't this a pretty standard thing to do when creating an abstract datatype? Thanks

Read the article

What's the best way to get a bunch of rows from MySQL if you have an array of integer primary keys?

- by Evan P.

I have a MySQL table with an auto-incremented integer primary key. I want to get a bunch of rows from the table based on an array of integers I have in memory in my program. The array ranges from a handful to about 1000 items. What's the most efficient query syntax to get the rows? I can think of a few: "SELECT * FROM thetable WHERE id IN (1, 2, 3, 4, 5)" (this is what I do now) "SELECT * FROM thetable where id = 1 OR id = 2 OR id = 3" Multiple queries of the form "SELECT * FROM thetable WHERE id = 1". Probably the most friendly to the query cache, but expensive due to having lots of query parsing. A union, like "SELECT * FROM thetable WHERE id = 1 UNION SELECT * FROM thetable WHERE id = 2 ..." I'm not sure if MySQL caches the results of each query; it's also the most verbose format. I think using the NoSQL interface in MySQL 5.6+ would be the most efficient way to do this, but I'm not yet up to MySQL 5.6.

Read the article

merging two tables, while applying aggregates on the duplicates (max,min and sum)

- by cloudraven

I have a table (let's call it log) with a few millions of records. Among the fields I have Id, Count, FirstHit, LastHit. Id - The record id Count - number of times this Id has been reported FirstHit - earliest timestamp with which this Id was reported LastHit - latest timestamp with which this Id was reported This table only has one record for any given Id Everyday I get into another table (let's call it feed) with around half a million records with these fields among many others: Id Timestamp - Entry date and time. This table can have many records for the same id What I want to do is to update log in the following way. Count - log count value, plus the count() of records for that id found in feed FirstHit - the earliest of the current value in log or the minimum value in feed for that id LastHit - the latest of the current value in log or the maximum value in feed for that id. It should be noticed that many of the ids in feed are already in log. The simple thing that worked is to create a temporary table and insert into it the union of both as in Select Id, Min(Timestamp) As FirstHit, MAX(Timestamp) as LastHit, Count(*) as Count FROM feed GROUP BY Id UNION ALL Select Id, FirstHit,LastHit,Count FROM log; From that temporary table I do a select that aggregates Min(firsthit), max(lasthit) and sum(Count) Select Id, Min(FirstHit),Max(LastHit),Sum(Count) FROM @temp GROUP BY Id; and that gives me the end result. I could then delete everything from log and replace it with everything with temp, or craft an update for the common records and insert the new ones. However, I think both are highly inefficient. Is there a more efficient way of doing this. Perhaps doing the update in place in the log table?

Read the article

Python: Behavior of object in set operations

- by Josh Arenberg

I'm trying to create a custom object that behaves properly in set operations. I've generally got it working, but I want to make sure I fully understand the implications. In particular, I'm interested in the behavior when there is additional data in the object that is not included in the equal / hash methods. It seems that in the 'intersection' operation, it returns the set of objects that are being compared to, where the 'union' operations returns the set of objects that are being compared. To illustrate: class MyObject: def __init__(self,value,meta): self.value = value self.meta = meta def __eq__(self,other): if self.value == other.value: return True else: return False def __hash__(self): return hash(self.value) a = MyObject('1','left') b = MyObject('1','right') c = MyObject('2','left') d = MyObject('2','right') e = MyObject('3','left') print a == b # True print a == c # False for i in set([a,c,e]).intersection(set([b,d])): print "%s %s" % (i.value,i.meta) #returns: #1 right #2 right for i in set([a,c,e]).union(set([b,d])): print "%s %s" % (i.value,i.meta) #returns: #1 left #3 left #2 left Is this behavior documented somewhere and deterministic? If so, what is the governing principle?

Read the article

Inheritance Mapping Strategies with Entity Framework Code First CTP5: Part 3 – Table per Concrete Type (TPC) and Choosing Strategy Guidelines

- by mortezam

This is the third (and last) post in a series that explains different approaches to map an inheritance hierarchy with EF Code First. I've described these strategies in previous posts: Part 1 – Table per Hierarchy (TPH) Part 2 – Table per Type (TPT)In today’s blog post I am going to discuss Table per Concrete Type (TPC) which completes the inheritance mapping strategies supported by EF Code First. At the end of this post I will provide some guidelines to choose an inheritance strategy mainly based on what we've learned in this series. TPC and Entity Framework in the Past Table per Concrete type is somehow the simplest approach suggested, yet using TPC with EF is one of those concepts that has not been covered very well so far and I've seen in some resources that it was even discouraged. The reason for that is just because Entity Data Model Designer in VS2010 doesn't support TPC (even though the EF runtime does). That basically means if you are following EF's Database-First or Model-First approaches then configuring TPC requires manually writing XML in the EDMX file which is not considered to be a fun practice. Well, no more. You'll see that with Code First, creating TPC is perfectly possible with fluent API just like other strategies and you don't need to avoid TPC due to the lack of designer support as you would probably do in other EF approaches. Table per Concrete Type (TPC)In Table per Concrete type (aka Table per Concrete class) we use exactly one table for each (nonabstract) class. All properties of a class, including inherited properties, can be mapped to columns of this table, as shown in the following figure: As you can see, the SQL schema is not aware of the inheritance; effectively, we’ve mapped two unrelated tables to a more expressive class structure. If the base class was concrete, then an additional table would be needed to hold instances of that class. I have to emphasize that there is no relationship between the database tables, except for the fact that they share some similar columns. TPC Implementation in Code First Just like the TPT implementation, we need to specify a separate table for each of the subclasses. We also need to tell Code First that we want all of the inherited properties to be mapped as part of this table. In CTP5, there is a new helper method on EntityMappingConfiguration class called MapInheritedProperties that exactly does this for us. Here is the complete object model as well as the fluent API to create a TPC mapping: public abstract class BillingDetail { public int BillingDetailId { get; set; } public string Owner { get; set; } public string Number { get; set; } } public class BankAccount : BillingDetail { public string BankName { get; set; } public string Swift { get; set; } } public class CreditCard : BillingDetail { public int CardType { get; set; } public string ExpiryMonth { get; set; } public string ExpiryYear { get; set; } } public class InheritanceMappingContext : DbContext { public DbSet<BillingDetail> BillingDetails { get; set; } protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<BankAccount>().Map(m => { m.MapInheritedProperties(); m.ToTable("BankAccounts"); }); modelBuilder.Entity<CreditCard>().Map(m => { m.MapInheritedProperties(); m.ToTable("CreditCards"); }); } } The Importance of EntityMappingConfiguration ClassAs a side note, it worth mentioning that EntityMappingConfiguration class turns out to be a key type for inheritance mapping in Code First. Here is an snapshot of this class: namespace System.Data.Entity.ModelConfiguration.Configuration.Mapping { public class EntityMappingConfiguration<TEntityType> where TEntityType : class { public ValueConditionConfiguration Requires(string discriminator); public void ToTable(string tableName); public void MapInheritedProperties(); } } As you have seen so far, we used its Requires method to customize TPH. We also used its ToTable method to create a TPT and now we are using its MapInheritedProperties along with ToTable method to create our TPC mapping. TPC Configuration is Not Done Yet!We are not quite done with our TPC configuration and there is more into this story even though the fluent API we saw perfectly created a TPC mapping for us in the database. To see why, let's start working with our object model. For example, the following code creates two new objects of BankAccount and CreditCard types and tries to add them to the database: using (var context = new InheritanceMappingContext()) { BankAccount bankAccount = new BankAccount(); CreditCard creditCard = new CreditCard() { CardType = 1 }; context.BillingDetails.Add(bankAccount); context.BillingDetails.Add(creditCard); context.SaveChanges(); } Running this code throws an InvalidOperationException with this message: The changes to the database were committed successfully, but an error occurred while updating the object context. The ObjectContext might be in an inconsistent state. Inner exception message: AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges. The reason we got this exception is because DbContext.SaveChanges() internally invokes SaveChanges method of its internal ObjectContext. ObjectContext's SaveChanges method on its turn by default calls AcceptAllChanges after it has performed the database modifications. AcceptAllChanges method merely iterates over all entries in ObjectStateManager and invokes AcceptChanges on each of them. Since the entities are in Added state, AcceptChanges method replaces their temporary EntityKey with a regular EntityKey based on the primary key values (i.e. BillingDetailId) that come back from the database and that's where the problem occurs since both the entities have been assigned the same value for their primary key by the database (i.e. on both BillingDetailId = 1) and the problem is that ObjectStateManager cannot track objects of the same type (i.e. BillingDetail) with the same EntityKey value hence it throws. If you take a closer look at the TPC's SQL schema above, you'll see why the database generated the same values for the primary keys: the BillingDetailId column in both BankAccounts and CreditCards table has been marked as identity. How to Solve The Identity Problem in TPC As you saw, using SQL Server’s int identity columns doesn't work very well together with TPC since there will be duplicate entity keys when inserting in subclasses tables with all having the same identity seed. Therefore, to solve this, either a spread seed (where each table has its own initial seed value) will be needed, or a mechanism other than SQL Server’s int identity should be used. Some other RDBMSes have other mechanisms allowing a sequence (identity) to be shared by multiple tables, and something similar can be achieved with GUID keys in SQL Server. While using GUID keys, or int identity keys with different starting seeds will solve the problem but yet another solution would be to completely switch off identity on the primary key property. As a result, we need to take the responsibility of providing unique keys when inserting records to the database. We will go with this solution since it works regardless of which database engine is used. Switching Off Identity in Code First We can switch off identity simply by placing DatabaseGenerated attribute on the primary key property and pass DatabaseGenerationOption.None to its constructor. DatabaseGenerated attribute is a new data annotation which has been added to System.ComponentModel.DataAnnotations namespace in CTP5: public abstract class BillingDetail { [DatabaseGenerated(DatabaseGenerationOption.None)] public int BillingDetailId { get; set; } public string Owner { get; set; } public string Number { get; set; } } As always, we can achieve the same result by using fluent API, if you prefer that: modelBuilder.Entity<BillingDetail>() .Property(p => p.BillingDetailId) .HasDatabaseGenerationOption(DatabaseGenerationOption.None); Working With The Object Model Our TPC mapping is ready and we can try adding new records to the database. But, like I said, now we need to take care of providing unique keys when creating new objects: using (var context = new InheritanceMappingContext()) { BankAccount bankAccount = new BankAccount() { BillingDetailId = 1 }; CreditCard creditCard = new CreditCard() { BillingDetailId = 2, CardType = 1 }; context.BillingDetails.Add(bankAccount); context.BillingDetails.Add(creditCard); context.SaveChanges(); } Polymorphic Associations with TPC is Problematic The main problem with this approach is that it doesn’t support Polymorphic Associations very well. After all, in the database, associations are represented as foreign key relationships and in TPC, the subclasses are all mapped to different tables so a polymorphic association to their base class (abstract BillingDetail in our example) cannot be represented as a simple foreign key relationship. For example, consider the the domain model we introduced here where User has a polymorphic association with BillingDetail. This would be problematic in our TPC Schema, because if User has a many-to-one relationship with BillingDetail, the Users table would need a single foreign key column, which would have to refer both concrete subclass tables. This isn’t possible with regular foreign key constraints. Schema Evolution with TPC is Complex A further conceptual problem with this mapping strategy is that several different columns, of different tables, share exactly the same semantics. This makes schema evolution more complex. For example, a change to a base class property results in changes to multiple columns. It also makes it much more difficult to implement database integrity constraints that apply to all subclasses. Generated SQLLet's examine SQL output for polymorphic queries in TPC mapping. For example, consider this polymorphic query for all BillingDetails and the resulting SQL statements that being executed in the database: var query = from b in context.BillingDetails select b; Just like the SQL query generated by TPT mapping, the CASE statements that you see in the beginning of the query is merely to ensure columns that are irrelevant for a particular row have NULL values in the returning flattened table. (e.g. BankName for a row that represents a CreditCard type). TPC's SQL Queries are Union Based As you can see in the above screenshot, the first SELECT uses a FROM-clause subquery (which is selected with a red rectangle) to retrieve all instances of BillingDetails from all concrete class tables. The tables are combined with a UNION operator, and a literal (in this case, 0 and 1) is inserted into the intermediate result; (look at the lines highlighted in yellow.) EF reads this to instantiate the correct class given the data from a particular row. A union requires that the queries that are combined, project over the same columns; hence, EF has to pad and fill up nonexistent columns with NULL. This query will really perform well since here we can let the database optimizer find the best execution plan to combine rows from several tables. There is also no Joins involved so it has a better performance than the SQL queries generated by TPT where a Join is required between the base and subclasses tables. Choosing Strategy GuidelinesBefore we get into this discussion, I want to emphasize that there is no one single "best strategy fits all scenarios" exists. As you saw, each of the approaches have their own advantages and drawbacks. Here are some rules of thumb to identify the best strategy in a particular scenario: If you don’t require polymorphic associations or queries, lean toward TPC—in other words, if you never or rarely query for BillingDetails and you have no class that has an association to BillingDetail base class. I recommend TPC (only) for the top level of your class hierarchy, where polymorphism isn’t usually required, and when modification of the base class in the future is unlikely. If you do require polymorphic associations or queries, and subclasses declare relatively few properties (particularly if the main difference between subclasses is in their behavior), lean toward TPH. Your goal is to minimize the number of nullable columns and to convince yourself (and your DBA) that a denormalized schema won’t create problems in the long run. If you do require polymorphic associations or queries, and subclasses declare many properties (subclasses differ mainly by the data they hold), lean toward TPT. Or, depending on the width and depth of your inheritance hierarchy and the possible cost of joins versus unions, use TPC. By default, choose TPH only for simple problems. For more complex cases (or when you’re overruled by a data modeler insisting on the importance of nullability constraints and normalization), you should consider the TPT strategy. But at that point, ask yourself whether it may not be better to remodel inheritance as delegation in the object model (delegation is a way of making composition as powerful for reuse as inheritance). Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. EF acts as a buffer between the domain and relational models, but that doesn’t mean you can ignore persistence concerns when designing your classes. SummaryIn this series, we focused on one of the main structural aspect of the object/relational paradigm mismatch which is inheritance and discussed how EF solve this problem as an ORM solution. We learned about the three well-known inheritance mapping strategies and their implementations in EF Code First. Hopefully it gives you a better insight about the mapping of inheritance hierarchies as well as choosing the best strategy for your particular scenario. Happy New Year and Happy Code-Firsting! References ADO.NET team blog Java Persistence with Hibernate book a { color: #5A99FF; } a:visited { color: #5A99FF; } .title { padding-bottom: 5px; font-family: Segoe UI; font-size: 11pt; font-weight: bold; padding-top: 15px; } .code, .typeName { font-family: consolas; } .typeName { color: #2b91af; } .padTop5 { padding-top: 5px; } .padTop10 { padding-top: 10px; } .exception { background-color: #f0f0f0; font-style: italic; padding-bottom: 5px; padding-left: 5px; padding-top: 5px; padding-right: 5px; }

Read the article

C#/.NET Little Wonders: The Useful But Overlooked Sets

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>. Even though most people think of sets as mathematical constructs, they are actually very useful classes that can be used to help make your application more performant if used appropriately. A Background From Math In mathematical terms, a set is an unordered collection of unique items. In other words, the set {2,3,5} is identical to the set {3,5,2}. In addition, the set {2, 2, 4, 1} would be invalid because it would have a duplicate item (2). In addition, you can perform set arithmetic on sets such as: Intersections: The intersection of two sets is the collection of elements common to both. Example: The intersection of {1,2,5} and {2,4,9} is the set {2}. Unions: The union of two sets is the collection of unique items present in either or both set. Example: The union of {1,2,5} and {2,4,9} is {1,2,4,5,9}. Differences: The difference of two sets is the removal of all items from the first set that are common between the sets. Example: The difference of {1,2,5} and {2,4,9} is {1,5}. Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: The set {1,2,5} is a superset of {1,5}. Subsets: One set is a subset of a second set if all the elements of that set are contained in the first set. Example: The set {1,5} is a subset of {1,2,5}. If We’re Not Doing Math, Why Do We Care? Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation? The answer is simple: they are extremely efficient ways to determine ownership in a collection. For example, let’s say you are designing an order system that tracks the price of a particular equity, and once it reaches a certain point will trigger an order. Now, since there’s tens of thousands of equities on the markets, you don’t want to track market data for every ticker as that would be a waste of time and processing power for symbols you don’t have orders for. Thus, we just want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to. Every time a new order comes in, we will check the list of subscriptions to see if the new order’s stock symbol is in that list. If it is, great, we already have that market data feed! If not, then and only then should we subscribe to the feed for that symbol. So far so good, we have a collection of symbols and we want to see if a symbol is present in that collection and if not, add it. This really is the essence of set processing, but for the sake of comparison, let’s say you do a list instead: 1: // class that handles are order processing service 2: public sealed class OrderProcessor 3: { 4: // contains list of all symbols we are currently subscribed to 5: private readonly List<string> _subscriptions = new List<string>(); 6: 7: ... 8: } Now whenever you are adding a new order, it would look something like: 1: public PlaceOrderResponse PlaceOrder(Order newOrder) 2: { 3: // do some validation, of course... 4: 5: // check to see if already subscribed, if not add a subscription 6: if (!_subscriptions.Contains(newOrder.Symbol)) 7: { 8: // add the symbol to the list 9: _subscriptions.Add(newOrder.Symbol); 10: 11: // do whatever magic is needed to start a subscription for the symbol 12: } 13: 14: // place the order logic! 15: } What’s wrong with this? In short: performance! Finding an item inside a List<T> is a linear - O(n) – operation, which is not a very performant way to find if an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately begin to see eyes glossing over as if it was pure, useless theory that would not apply in the real world, but I did and still do believe it is something worth understanding well to make the best choices in computer science). Let’s think about this: a linear operation means that as the number of items increases, the time that it takes to perform the operation tends to increase in a linear fashion. Put crudely, this means if you double the collection size, you might expect the operation to take something like the order of twice as long. Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially “visit” every item in the collection. Consider finding an item in a List<T>: if you want to see if the list has an item, you must potentially check every item in the list before you find it or determine it’s not found. Now, we could of course sort our list and then perform a binary search on it, but sorting is typically a linear-logarithmic complexity – O(n * log n) - and could involve temporary storage. So performing a sort after each add would probably add more time. As an alternative, we could use a SortedList<TKey, TValue> which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and value, and in our case the key is the value. This is why sets tend to be the best choice for this type of processing: they don’t rely on separate keys and values for ordering – so they save space – and they typically don’t care about ordering – so they tend to be extremely performant. The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface. As of .NET 4.0, HashSet<T> implements ISet<T> and a new set, the SortedSet<T> was added that gives you a set with ordering. HashSet<T> – For Unordered Storage of Sets When used right, HashSet<T> is a beautiful collection, you can think of it as a simplified Dictionary<T,T>. That is, a Dictionary where the TKey and TValue refer to the same object. This is really an oversimplification, but logically it makes sense. I’ve actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that’s just inefficient because of the extra storage to hold both the key and the value. As it’s name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning fast lookups! Compare the times below between HashSet<T> and List<T>: Operation HashSet<T> List<T> Add() O(1) O(1) at end O(n) in middle Remove() O(1) O(n) Contains() O(1) O(n) Now, these times are amortized and represent the typical case. In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and HashSet so that’s a less of an issue when comparing the two. The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains! This means that no matter how large the collection is, it takes roughly the exact same amount of time to find an item or determine if it’s not in the collection. Compare this to the List where almost any add or remove must rearrange potentially all the elements! And to find an item in the list (if unsorted) you must search every item in the List. So as you can see, if you want to create an unordered collection and have very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well. All we have to do to switch from a List<T> to a HashSet<T> is change our declaration. Since List and HashSet support many of the same members, chances are we won’t need to change much else. 1: public sealed class OrderProcessor 2: { 3: private readonly HashSet<string> _subscriptions = new HashSet<string>(); 4: 5: // ... 6: 7: public PlaceOrderResponse PlaceOrder(Order newOrder) 8: { 9: // do some validation, of course... 10: 11: // check to see if already subscribed, if not add a subscription 12: if (!_subscriptions.Contains(newOrder.Symbol)) 13: { 14: // add the symbol to the list 15: _subscriptions.Add(newOrder.Symbol); 16: 17: // do whatever magic is needed to start a subscription for the symbol 18: } 19: 20: // place the order logic! 21: } 22: 23: // ... 24: } 25: Notice, we didn’t change any code other than the declaration for _subscriptions to be a HashSet<T>. Thus, we can pick up the performance improvements in this case with minimal code changes. SortedSet<T> – Ordered Storage of Sets Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you want to maintain that collection in sorted order. Now, this is not necessarily mathematically relevant, but if your collection needs do include order, this is the set to use. So the SortedSet seems to be implemented as a binary tree (possibly a red-black tree) internally. Since binary trees are dynamic structures and non-contiguous (unlike List and SortedList) this means that inserts and deletes do not involve rearranging elements, or changing the linking of the nodes. There is some overhead in keeping the nodes in order, but it is much smaller than a contiguous storage collection like a List<T>. Let’s compare the three: Operation HashSet<T> SortedSet<T> List<T> Add() O(1) O(log n) O(1) at end O(n) in middle Remove() O(1) O(log n) O(n) Contains() O(1) O(log n) O(n) The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this seems to be inconsistent with its implementation and seems to be a documentation error. There’s actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity. Let’s put it in layman’s terms: logarithmic means you can double the collection size and typically you only add a single extra “visit” to an item in the collection. Take that in contrast to List<T>’s linear operation where if you double the size of the collection you double the “visits” to items in the collection. This is very good performance! It’s still not as performant as HashSet<T> where it always just visits one item (amortized), but for the addition of sorting this is a good thing. Consider the following table, now this is just illustrative data of the relative complexities, but it’s enough to get the point: Collection Size O(1) Visits O(log n) Visits O(n) Visits 1 1 1 1 10 1 4 10 100 1 7 100 1000 1 10 1000 Notice that the logarithmic – O(log n) – visit count goes up very slowly compare to the linear – O(n) – visit count. This is because since the list is sorted, it can do one check in the middle of the list, determine which half of the collection the data is in, and discard the other half (binary search). So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit, but it’s still faster than a List<T>. Unique Set Operations Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which play back towards the mathematical set operations described before: IntersectWith() – Performs the set intersection of two sets. Modifies the current set so that it only contains elements also in the second set. UnionWith() – Performs a set union of two sets. Modifies the current set so it contains all elements present both in the current set and the second set. ExceptWith() – Performs a set difference of two sets. Modifies the current set so that it removes all elements present in the second set. IsSupersetOf() – Checks if the current set is a superset of the second set. IsSubsetOf() – Checks if the current set is a subset of the second set. For more information on the set operations themselves, see the MSDN description of ISet<T> (here). What Sets Don’t Do Don’t get me wrong, sets are not silver bullets. You don’t really want to use a set when you want separate key to value lookups, that’s what the IDictionary implementations are best for. Also sets don’t store temporal add-order. That is, if you are adding items to the end of a list all the time, your list is ordered in terms of when items were added to it. This is something the sets don’t do naturally (though you could use a SortedSet with an IComparer with a DateTime but that’s overkill) but List<T> can. Also, List<T> allows indexing which is a blazingly fast way to iterate through items in the collection. Iterating over all the items in a List<T> is generally much, much faster than iterating over a set. Summary Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value. In addition, if you have need for the mathematical set operations, the C# sets support those as well. The HashSet<T> is the set of choice if you want the fastest possible lookups but don’t care about order. In contrast the SortedSet<T> will give you a sorted collection at a slight reduction in performance. Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet

Search Results

Search found 765 results on 31 pages for 'discriminated union'.

Page 15/31 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >

- by cooky451

- by Ben Aston

- by jkff

- by Irvin

- by Andrey Fedorov

- by Martin Lauridsen

- by Stefan

- by Calvin Fisher

- by easyrider

- by Joseph Quinsey

- by coubeatczech

- by Ryan B. Lynch

- by MNC

- by Martin Tóth

- by yngvedh

- by jeffself

- by goldenmean

- by VladV

- by Jefferstone

- by Nels Beckman

- by Evan P.

- by cloudraven

- by Josh Arenberg

- by mortezam

- by James Michael Hare

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >