Search Results

Search found 132 results on 6 pages for 'granularity'.

Page 4/6 | < Previous Page | 1 2 3 4 5 6 | Next Page >

How granular should a command be in a CQ[R]S model?

- by Aaronaught

I'm considering a project to migrate part of our WCF-based SOA over to a service bus model (probably nServiceBus) and using some basic pub-sub to achieve Command-Query Separation. I'm not new to SOA, or even to service bus models, but I confess that until recently my concept of "separation" was limited to run-of-the-mill database mirroring and replication. Still, I'm attracted to the idea because it seems to provide all the benefits of an eventually-consistent system while sidestepping many of the obvious drawbacks (most notably the lack of proper transactional support). I've read a lot on the subject from Udi Dahan who is basically the guru on ESB architectures (at least in the Microsoft world), but one thing he says really puzzles me: As we get larger entities with more fields on them, we also get more actors working with those same entities, and the higher the likelihood that something will touch some attribute of them at any given time, increasing the number of concurrency conflicts. [...] A core element of CQRS is rethinking the design of the user interface to enable us to capture our users’ intent such that making a customer preferred is a different unit of work for the user than indicating that the customer has moved or that they’ve gotten married. Using an Excel-like UI for data changes doesn’t capture intent, as we saw above. -- Udi Dahan, Clarified CQRS From the perspective described in the quotation, it's hard to argue with that logic. But it seems to go against the grain with respect to SOAs. An SOA (and really services in general) are supposed to deal with coarse-grained messages so as to minimize network chatter - among many other benefits. I realize that network chatter is less of an issue when you've got highly-distributed systems with good message queuing and none of the baggage of RPC, but it doesn't seem wise to dismiss the issue entirely. Udi almost seems to be saying that every attribute change (i.e. field update) ought to be its own command, which is hard to imagine in the context of one user potentially updating hundreds or thousands of combined entities and attributes as it often is with a traditional web service. One batch update in SQL Server may take a fraction of a second given a good highly-parameterized query, table-valued parameter or bulk insert to a staging table; processing all of these updates one at a time is slow, slow, slow, and OLTP database hardware is the most expensive of all to scale up/out. Is there some way to reconcile these competing concerns? Am I thinking about it the wrong way? Does this problem have a well-known solution in the CQS/ESB world? If not, then how does one decide what the "right level" of granularity in a Command should be? Is there some "standard" one can use as a starting point - sort of like 3NF in databases - and only deviate when careful profiling suggests a potentially significant performance benefit? Or is this possibly one of those things that, despite several strong opinions being expressed by various experts, is really just a matter of opinion?

Read the article
What's the best way to cache a growing database table for html generation?

- by McLeopold

I've got a database table which will grow in size by about 5000 rows a hour. For a key that I would be querying by, the query will grow in size by about 1 row every hour. I would like a web page to show the latest rows for a key, 50 at a time (this is configurable). I would like to try and implement memcache to keep database activity low for reads. If I run a query and create a cache result for each page of 50 results, that would work until a new entry is added. At that time, the page of latest results gets new result and the oldest results drops off. This cascades down the list of cached pages causing me to update every cache result. It seems like a poor design. I could build the cache pages backwards, then for each page requested I should get the latest 2 pages and truncate to the proper length of 50. I'm not sure if this is good or bad? Ideally, the mechanism I use to insert a new row would also know how to invalidate the proper cache results. Has someone already solved this problem in a widely acceptable way? What's the best method of doing this? EDIT: If my understanding of the MYSQL query cache is correct, it has table level granularity in invalidation. Given the fact that I have about 5000 updates before a query on a key should need to be invalidated, it seems that the database query cache would not be used. MS SQL caches execution plans and frequently accessed data pages, so it may do better in this scenario. My query is not against a single table with TOP N. One version has joins to several tables and another has sub-selects. Also, since I want to cache the html generated table, I'm wondering if a cache at the web server level would be appropriate? Is there really no benefit to any type of caching? Is the best advice really to just allow a website site query to go through all the layers and hit the database every request?

Read the article
Database types for customer analytics

- by Drewdavid

I am exploring a paid solution to start providing better embedded, dashboard-style analytics information to our website customers/account holders, but would like to also offer an in-house development option to our team. The more equipped I am with specifics (such as the subject of this question), the better the adoption rate from the team (or so I have found), regardless of the path we choose Would anyone care to summarize a couple of options for a fast and scalable database type through which we would provide the following: • Daily pageviews to a users account pages (users have between 1 and 1000 pages) • Some calculated/compounded metrics (such as conversion rate, i.e. certain page type viewed to contact form thank you page ratio) • We have about 1,500 members (will need room to grow); the number of concurrently logged in users will for the question's sake be 50 I ask because our developer has balked at providing this level of "over time" granularity (i.e. daily) due to the number of space it would take up in a MYSQL database To avoid a downvote I have asked specifically for more than one option, realizing that different people will have different solutions. I will make amendments to my question if so guided by answering parties Thank you for sharing your valued answers :)

Read the article
Multiple *NIX Accounts with Identical UID

- by Tim

I am curious whether there is a standard expected behavior and whether it is considered bad practice when creating more than one account on Linux/Unix that have the same UID. I've done some testing on RHEL5 with this and it behaved as I expected, but I don't know if I'm tempting fate using this trick. As an example, let's say I have two accounts with the same IDs: a1:$1$4zIl1:5000:5000::/home/a1:/bin/bash a2:$1$bmh92:5000:5000::/home/a2:/bin/bash What this means is: I can log in to each account using its own password. Files I create will have the same UID. Tools such as "ls -l" will list the UID as the first entry in the file (a1 in this case). I avoid any permissions or ownership problems between the two accounts because they are really the same user. I get login auditing for each account, so I have better granularity into tracking what is happening on the system. So my questions are: Is this ability designed or is it just the way it happens to work? Is this going to be consistent across *nix variants? Is this accepted practice? Are there unintended consequences to this practice? Note, the idea here is to use this for system accounts and not normal user accounts.

Read the article
De-duplicating backup tool on a block basis? [closed]

- by SST

I am looking for an (ideally free as in speech or beer) backup tool for Unix-like OS which can store deduplicated backups, i.e. only nonredundant content takes up additional space. I already looked at dirvish (my first candidate) and rsnapshot which use hardlinks to achieve deduplication on a per-file level. However, as I want to back up large files (Thunderbird mailboxes 3GB, VMware images 10GB), such file are stored again entirely even if just a few bytes change. Then there are rsync-based tools like rdiff-backup which only store deltas and a current mirror. However, as the deltas are generated against each previous mirror, it is difficult to fine-tune the retention granularity (only keep one backup after a week, etc.) because the deltas would have to be re-evaluated. Another approach is to partition content into blocks and store each block only if it is not stored yet, otherwise just linking it to the first occurrence. The only tool I know of that does this by now is obnam (http://liw.fi/obnam), and it even supports zlib-compression and gpg-encryption -- nice! But it is very slow, AFAICT. Does any one know any other, solid backup software which supports deduplication on a sub-file level, ideally with at least some management options (show/select/delete generations...)?

Read the article
Problems setting up VLC Sever/client streaming

- by Ayos

I'm trying to set up a Linux machine as the server and a Windows XP machine as the client. Both machines are connected to the same local network via a Wi-Fi router. I setup the stream with the following properties : http stream port 8080 play locally And not much else. No firewall on the windows client(Windows firewall is disabled) When I try to open network stream via the client machine(Using VLC or Windows Media Player) I get the following errors: Media Player error code : 0xC00D11B3: Encountered a network Problem. VLC Console: main warning: connection timed out access_mms error: cannot connect to 192.168.1.3:8080 main debug: no access module matching "http" could be loaded main debug: TIMER module_need() : 12625.810 ms - Total 12625.810 ms / 1 intvls (Avg 12625.809 ms) main error: open of `http://192.168.1.3:8080' failed main debug: dead input main debug: repeating item main debug: starting playback of the new playlist item main debug: resyncing on http://192.168.1.3:8080 main debug: http://192.168.1.3:8080 is at 0 main debug: creating new input thread main debug: Creating an input for 'http://192.168.1.3:8080' main debug: using timeshift granularity of 50 MiB, in path 'C:\DOCUME~1\Accer\LOCALS~1\Temp' main debug: `http://192.168.1.3:8080' gives access `http' demux `' path `192.168.1.3:8080' main debug: creating demux: access='http' demux='' location='192.168.1.3:8080' file='\\192.168.1.3:8080' main debug: looking for access_demux module: 0 candidates main debug: no access_demux module matched "http" main debug: TIMER module_need() : 0.461 ms - Total 0.461 ms / 1 intvls (Avg 0.461 ms) main debug: creating access 'http' location='192.168.1.3:8080', path='\\192.168.1.3:8080' main debug: looking for access module: 2 candidates access_http debug: http: server='192.168.1.3' port=8080 file='' main debug: net: connecting to 192.168.1.3 port 8080 qt4 debug: IM: Deleting the input main debug: TIMER input launching for 'http://192.168.1.3:8080' : 13397.979 ms - Total 13397.979 ms / 1 intvls (Avg 13397.978 ms) qt4 debug: IM: Setting an input Need Help. Thanks in advance.

Read the article
What's up with LDoms: Part 9 - Direct IO

- by Stefan Hinker

In the last article of this series, we discussed the most general of all physical IO options available for LDoms, root domains. Now, let's have a short look at the next level of granularity: Virtualizing individual PCIe slots. In the LDoms terminology, this feature is called "Direct IO" or DIO. It is very similar to root domains, but instead of reassigning ownership of a complete root complex, it only moves a single PCIe slot or endpoint device to a different domain. Let's look again at hardware available to mars in the original configuration: root@sun:~# ldm ls-io NAME TYPE BUS DOMAIN STATUS ---- ---- --- ------ ------ pci_0 BUS pci_0 primary pci_1 BUS pci_1 primary pci_2 BUS pci_2 primary pci_3 BUS pci_3 primary /SYS/MB/PCIE1 PCIE pci_0 primary EMP /SYS/MB/SASHBA0 PCIE pci_0 primary OCC /SYS/MB/NET0 PCIE pci_0 primary OCC /SYS/MB/PCIE5 PCIE pci_1 primary EMP /SYS/MB/PCIE6 PCIE pci_1 primary EMP /SYS/MB/PCIE7 PCIE pci_1 primary EMP /SYS/MB/PCIE2 PCIE pci_2 primary EMP /SYS/MB/PCIE3 PCIE pci_2 primary OCC /SYS/MB/PCIE4 PCIE pci_2 primary EMP /SYS/MB/PCIE8 PCIE pci_3 primary EMP /SYS/MB/SASHBA1 PCIE pci_3 primary OCC /SYS/MB/NET2 PCIE pci_3 primary OCC /SYS/MB/NET0/IOVNET.PF0 PF pci_0 primary /SYS/MB/NET0/IOVNET.PF1 PF pci_0 primary /SYS/MB/NET2/IOVNET.PF0 PF pci_3 primary /SYS/MB/NET2/IOVNET.PF1 PF pci_3 primary All of the "PCIE" type devices are available for SDIO, with a few limitations. If the device is a slot, the card in that slot must support the DIO feature. The documentation lists all such cards. Moving a slot to a different domain works just like moving a PCI root complex. Again, this is not a dynamic process and includes reboots of the affected domains. The resulting configuration is nicely shown in a diagram in the Admin Guide: There are several important things to note and consider here: The domain receiving the slot/endpoint device turns into an IO domain in LDoms terminology, because it now owns some physical IO hardware. Solaris will create nodes for this hardware under /devices. This includes entries for the virtual PCI root complex (pci_0 in the diagram) and anything between it and the actual endpoint device. It is very important to understand that all of this PCIe infrastructure is virtual only! Only the actual endpoint devices are true physical hardware. There is an implicit dependency between the guest owning the endpoint device and the root domain owning the real PCIe infrastructure: Only if the root domain is up and running, will the guest domain have access to the endpoint device. The root domain is still responsible for resetting and configuring the PCIe infrastructure (root complex, PCIe level configurations, error handling etc.) because it owns this part of the physical infrastructure. This also means that if the root domain needs to reset the PCIe root complex for any reason (typically a reboot of the root domain) it will reset and thus disrupt the operation of the endpoint device owned by the guest domain. The result in the guest is not predictable. I recommend to configure the resulting behaviour of the guest using domain dependencies as described in the Admin Guide in Chapter "Configuring Domain Dependencies". Please consult the Admin Guide in Section "Creating an I/O Domain by Assigning PCIe Endpoint Devices" for all the details! As you can see, there are several restrictions for this feature. It was introduced in LDoms 2.0, mainly to allow the configuration of guest domains that need access to tape devices. Today, with the higher number of PCIe root complexes and the availability of SR-IOV, the need to use this feature is declining. I personally do not recommend to use it, mainly because of the drawbacks of the depencies on the root domain and because it can be replaced with SR-IOV (although then with similar limitations). This was a rather short entry, more for completeness. I believe that DIO can usually be replaced by SR-IOV, which is much more flexible. I will cover SR-IOV in the next section of this blog series.

Read the article
Math with Timestamp

- by Knut Vatsendvik

table.sql { border-width: 1px; border-spacing: 2px; border-style: dashed; border-color: #0023ff; border-collapse: separate; background-color: white; } table.sql th { border-width: 1px; padding: 1px; border-style: none; border-color: gray; background-color: white; -moz-border-radius: 0px 0px 0px 0px; } table.sql td { border-width: 1px; padding: 3px; border-style: none; border-color: gray; background-color: white; -moz-border-radius: 0px 0px 0px 0px; } .sql-keyword { color: #0000cd; background-color: inherit; } .sql-result { color: #458b74; background-color: inherit; } Got this little SQL quiz from a colleague. How to add or subtract exactly 1 second from a Timestamp? Sounded simple enough at first blink, but was a bit trickier than expected. If the data type had been a Date, we knew that we could add or subtract days, minutes or seconds using + or – sysdate + 1 to add one day sysdate - (1 / 24) to subtract one hour sysdate + (1 / 86400) to add one second Would the same arithmetic work with Timestamp as with Date? Let’s test it out with the following query SELECT systimestamp , systimestamp + (1 / 86400) FROM dual; ---------- 03.05.2010 22.11.50,240887 +02:00 03.05.2010 The first result line shows us the system time down to fractions of seconds. The second result line shows the result as Date (as used for date calculation) meaning now that the granularity is reduced down to a second. By using the PL/SQL dump() function, we can confirm this with the following query SELECT dump(systimestamp) , dump(systimestamp + (1 / 86400)) FROM dual; ---------- Typ=188 Len=20: 218,7,5,4,8,53,9,0,200,46,89,20,2,0,5,0,0,0,0,0 Typ=13 Len=8: 218,7,5,4,10,53,10,0 Where typ=13 is a runtime representation for Date. So how can we increase the precision to include fractions of second? After investigating it a bit, we found out that the interval data type INTERVAL DAY TO SECOND could be used with the result of addition or subtraction being a Timestamp. Let’s try again our first query again, now using the interval data type. SELECT systimestamp, systimestamp + INTERVAL '0 00:00:01.0' DAY TO SECOND(1) FROM dual; ---------- 03.05.2010 22.58.32,723659000 +02:00 03.05.2010 22.58.33,723659000 +02:00 Yes, it worked! To finish the story, here is one example showing how to specify an interval of 2 days, 6 hours, 30 minutes, 4 seconds and 111 thousands of a second. INTERVAL ‘2 6:30:4.111’ DAY TO SECOND(3)

Read the article
Impressions of Pivotal Tracker

Pivotal Tracker is a free, online agile project management system. Ive been using it recently to better communicate to customers about the current state of our project. In Pivotal Tracker, the unit of work is a story and stories are arranged into iterations or delivery cycles. Stories can be any level of granularity you want, but the idea is to use stories to communicate clearly to customers, so you dont want to write a novel. You especially dont want to write a list of detailed programming tasks. A good story for a point of sale system might be: Allow managers to override the price of an item while ringing up a customer. A less useful story: Script out the process of adding a manager flag to the user table and stage that script into the deploy directory. Stories are estimated using a point scale, by default 1, 2 or 3. Iterations are then automatically laid out by combining enough tasks to fill the point total for that period of time. You have to start with a guess on how many points your team can do in an iteration, then adjust with real data as you complete iterations. This is basic agile methodology, but where Pivotal Tracker adds value is that it automatically and graphically lays out iterations for you on your project site. This makes communication and planning easy. Compiling release notes is no longer painful as it has been clear from the outset what work is going on. While I much prefer Pivotal Trackers customer facing interface over what we used previously (TFS), I see a couple of gaps. First, I have not able to make much headway with the reporting tools. Despite my complaints about TFS, it can produce some nice reports. Second, its not clear where if at all, Id keep track of purely internal tasks. Im talking about server maintenance, cleaning up source control, checking back on some code which you never quite felt right about. Theres no purpose in cluttering up an iteration backlog with these items, but if you dont track them, you lose them. Im not sure what a good answer for that is. One gap I thought Id see, which I dont, is more granular dev tasks. If Im implementing a story, Ill write out the steps and track my progress, but really, those steps arent useful to anybody but me. The only time Ive found that level of detail really useful is when my tasks are defined at too high a level anyway or when Im working with someone who needs more coaching and might not be able to finish a story in time without some scaffolding to get them going. You can learn more about Pivotal Tracker at: http://www.pivotaltracker.com/learnmore. --- Relevant Links --- A good intro to stories: http://www.agilemodeling.com/artifacts/userStory.htmDid you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
New Analytic settings for the new code

- by Steve Tunstall

If you have upgraded to the new 2011.1.3.0 code, you may find some very useful settings for the Analytics. If you didn't already know, the analytic datasets have the potential to fill up your OS hard drives. The more datasets you use and create, that faster this can happen. Since they take a measurement every second, forever, some of these metrics can get in the multiple GB size in a matter of weeks. The traditional 'fix' was that you had to go into Analytics -> Datasets about once a month and clean up the largest datasets. You did this by deleting them. Ouch. Now you lost all of that historical data that you might have wanted to check out many months from now. Or, you had to export each metric individually to a CSV file first. Not very easy or fun. You could also suspend a dataset, and have it not collect data at all. Well, that fixed the problem, didn't it? of course you now had no data to go look at. Hmmmm.... All of this is no longer a concern. Check out the new Settings tab under Analytics... Now, I can tell the ZFSSA to keep every second of data for, say, 2 weeks, and then average those 60 seconds of each minute into a single 'minute' value. I can go even further and ask it to average those 60 minutes of data into a single 'hour' value. This allows me to effectively shrink my older datasets by a factor of 1/3600 !!! Very cool. I can now allow my datasets to go forever, and really never have to worry about them filling up my OS drives. That's great going forward, but what about those huge datasets you already have? No problem. Another new feature in 2011.1.3.0 is the ability to shrink the older datasets in the same way. Check this out. I have here a dataset called "Disk: I/O opps per second" that is about 6.32M on disk (You need not worry so much about the "In Core" value, as that is in RAM, and it fluctuates all the time. Once you stop viewing a particular metric, you will see that shrink over time, just relax). When one clicks on the trash can icon to the right of the dataset, it used to delete the whole thing, and you would have to re-create it from scratch to get the data collecting again. Now, however, it gives you this prompt: As you can see, this allows you to once again shrink the dataset by averaging the second data into minutes or hours. Here is my new dataset size after I do this. So it shrank from 6.32MB down to 2.87MB, but i can still see my metrics going back to the time I began the dataset. Now, you do understand that once you do this, as you look back in time to the minute or hour data metrics, that you are going to see much larger time values, right? You will need to decide what size of granularity you can live with, and for how long. Check this out. Here is my Disk: Percent utilized from 5-21-2012 2:42 pm to 4:22 pm: After I went through the delete process to change everything older than 1 week to "Minutes", the same date and time looks like this: Just understand what this will do and how you want to use it. Right now, I'm thinking of keeping the last 6 weeks of data as "seconds", and then the last 3 months as "Minutes", and then "Hours" forever after that. I'll check back in six months and see how the sizes look. Steve

Read the article
Investigating on xVelocity (VertiPaq) column size

- by Marco Russo (SQLBI)

In January I published an article about how to optimize high cardinality columns in VertiPaq. In the meantime, VertiPaq has been rebranded to xVelocity: the official name is now “xVelocity in-memory analytics engine (VertiPaq)” but using xVelocity and VertiPaq when we talk about Analysis Services has the same meaning. In this post I’ll show how to investigate on columns size of an existing Tabular database so that you can find the most important columns to be optimized. A first approach can be looking in the DataDir of Analysis Services and look for the folder containing the database. Then, look for the biggest files in all subfolders and you will find the name of a file that contains the name of the most expensive column. However, this heuristic process is not very optimized. A better approach is using a DMV that provides the exact information. For example, by using the following query (open SSMS, open an MDX query on the database you are interested to and execute it) you will see all database objects sorted by used size in a descending way. SELECT * FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS ORDER BY used_size DESC You can look at the first rows in order to understand what are the most expensive columns in your tabular model. The interesting data provided are: TABLE_ID: it is the name of the object – it can be also a dictionary or an index COLUMN_ID: it is the column name the object belongs to – you can also see ID_TO_POS and POS_TO_ID in case they refer to internal indexes RECORDS_COUNT: it is the number of rows in the column USED_SIZE: it is the used memory for the object By looking at the ration between USED_SIZE and RECORDS_COUNT you can understand what you can do in order to optimize your tabular model. Your options are: Remove the column. Yes, if it contains data you will never use in a query, simply remove the column from the tabular model Change granularity. If you are tracking time and you included milliseconds but seconds would be enough, round the data source column to the nearest second. If you have a floating point number but two decimals are good enough (i.e. the temperature), round the number to the nearest decimal is relevant to you. Split the column. Create two or more columns that have to be combined together in order to produce the original value. This technique is described in VertiPaq optimization article. Sort the table by that column. When you read the data source, you might consider sorting data by this column, so that the compression will be more efficient. However, this technique works better on columns that don’t have too many distinct values and you will probably move the problem to another column. Sorting data starting from the lower density columns (those with a few number of distinct values) and going to higher density columns (those with high cardinality) is the technique that provides the best compression ratio. After the optimization you should be able to reduce the used size and improve the count/size ration you measured before. If you are interested in a longer discussion about internal storage in VertiPaq and you want understand why this approach can save you space (and time), you can attend my 24 Hours of PASS session “VertiPaq Under the Hood” on March 21 at 08:00 GMT.

Read the article
Investigating on xVelocity (VertiPaq) column size

- by Marco Russo (SQLBI)

In January I published an article about how to optimize high cardinality columns in VertiPaq. In the meantime, VertiPaq has been rebranded to xVelocity: the official name is now “xVelocity in-memory analytics engine (VertiPaq)” but using xVelocity and VertiPaq when we talk about Analysis Services has the same meaning. In this post I’ll show how to investigate on columns size of an existing Tabular database so that you can find the most important columns to be optimized. A first approach can be looking in the DataDir of Analysis Services and look for the folder containing the database. Then, look for the biggest files in all subfolders and you will find the name of a file that contains the name of the most expensive column. However, this heuristic process is not very optimized. A better approach is using a DMV that provides the exact information. For example, by using the following query (open SSMS, open an MDX query on the database you are interested to and execute it) you will see all database objects sorted by used size in a descending way. SELECT * FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS ORDER BY used_size DESC You can look at the first rows in order to understand what are the most expensive columns in your tabular model. The interesting data provided are: TABLE_ID: it is the name of the object – it can be also a dictionary or an index COLUMN_ID: it is the column name the object belongs to – you can also see ID_TO_POS and POS_TO_ID in case they refer to internal indexes RECORDS_COUNT: it is the number of rows in the column USED_SIZE: it is the used memory for the object By looking at the ration between USED_SIZE and RECORDS_COUNT you can understand what you can do in order to optimize your tabular model. Your options are: Remove the column. Yes, if it contains data you will never use in a query, simply remove the column from the tabular model Change granularity. If you are tracking time and you included milliseconds but seconds would be enough, round the data source column to the nearest second. If you have a floating point number but two decimals are good enough (i.e. the temperature), round the number to the nearest decimal is relevant to you. Split the column. Create two or more columns that have to be combined together in order to produce the original value. This technique is described in VertiPaq optimization article. Sort the table by that column. When you read the data source, you might consider sorting data by this column, so that the compression will be more efficient. However, this technique works better on columns that don’t have too many distinct values and you will probably move the problem to another column. Sorting data starting from the lower density columns (those with a few number of distinct values) and going to higher density columns (those with high cardinality) is the technique that provides the best compression ratio. After the optimization you should be able to reduce the used size and improve the count/size ration you measured before. If you are interested in a longer discussion about internal storage in VertiPaq and you want understand why this approach can save you space (and time), you can attend my 24 Hours of PASS session “VertiPaq Under the Hood” on March 21 at 08:00 GMT.

Read the article
Webcast On-Demand: Building Java EE Apps That Scale

- by jeckels

With some awesome work by one of our architects, Randy Stafford, we recently completed a webcast on scaling Java EE apps efficiently. Did you miss it? No problem. We have a replay available on-demand for you. Just hit the '+' sign drop-down for access.Topics include: Domain object caching Service response caching Session state caching JSR-107 HotCache and more! Further, we had several interesting questions asked by our audience, and we thought we'd share a sampling of those here for you - just in case you had the same queries yourself. Enjoy! What is the largest Coherence deployment out there? We have seen deployments with over 500 JVMs in the Coherence cluster, and deployments with over 1000 JVMs using the Coherence jar file, in one system. On the management side there is an ecosystem of monitoring tools from Oracle and third parties with dashboards graphing values from Coherence's JMX instrumentation. For lifecycle management we have seen a lot of custom scripting over the years, but we've also integrated closely with WebLogic to leverage its management ecosystem for deploying Coherence-based applications and managing process life cycles. That integration introduces a new Java EE archive type, the Grid Archive or GAR, which embeds in an EAR and can be seen by a WAR in WebLogic. That integration also doesn't require any extra WebLogic licensing if Coherence is licensed. How is Coherence different from a NoSQL Database like MongoDB? Coherence can be considered a NoSQL technology. It pre-dates the NoSQL movement, having been first released in 2001 whereas the term "NoSQL" was coined in 2009. Coherence has a key-value data model primarily but can also be used for document data models. Coherence manages data in memory currently, though disk persistence is in a future release currently in beta testing. Where the data is managed yields a few differences from the most well-known NoSQL products: access latency is faster with Coherence, though well-known NoSQL databases can manage more data. Coherence also has features that well-known NoSQL database lack, such as grid computing, eventing, and data source integration. Finally Coherence has had 15 years of maturation and hardening from usage in mission-critical systems across a variety of industries, particularly financial services. Can I use Coherence for local caching? Yes, you get additional features beyond just a java.util.Map: you get expiration capabilities, size-limitation capabilities, eventing capabilites, etc. Are there APIs available for GoldenGate HotCache? It's mostly a black box. You configure it, and it just puts objects into your caches. However you can treat it as a glass box, and use Coherence event interceptors to enhance its behavior - and there are use cases for that. Are Coherence caches updated transactionally? Coherence provides several mechanisms for concurrency control. If a project insists on full-blown JTA / XA distributed transactions, Coherence caches can participate as resources. But nobody does that because it's a performance and scalability anti-pattern. At finer granularity, Coherence guarantees strict ordering of all operations (reads and writes) against a single cache key if the operations are done using Coherence's "EntryProcessor" feature. And Coherence has a unique feature called "partition-level transactions" which guarantees atomic writes of multiple cache entries (even in different caches) without requiring JTA / XA distributed transaction semantics.

Read the article
Version control of software refactoring

- by Muhammad Alkarouri

What is the best way of doing version control of large scale refactoring? My typical style of programming (actually of writing documents as well) is getting something out as quickly as possible and then refactoring it. Typically, refactoring takes place at the same time as adding other functionality. In addition to standard refactoring of classes and functions, functions may move from one file to another, files get split and merged or just reordered. For the time being, I am using version control as a lone user, so there is no issue of interaction with other developers at this stage. Still, version control gives me two aspects: Backup and ability to revert to a good version "in case". Looking at the history tells me how the project progressed and the flow of ideas. I am using mercurial on windows using TortoiseHg which enables selections of hunks to commit. The reason I mention this is that I would like advice on the granularity of a commit in refactoring. Should I split refactoring from functionality added always in committing? I have looked at the answers of http://stackoverflow.com/questions/68459/refactoring-and-source-control-how-to but it doesn't answer my question. That question focuses on collaboration with a team. This one concentrates on having a history that is understandable in future (assuming I don't rewrite history as some VCS seem to allow).

Read the article
MDX equivalent to SQL subqueries with aggregation

- by James Lampe

I'm new to MDX and trying to solve the following problem. Investigated calculated members, subselects, scope statements, etc but can't quite get it to do what I want. Let's say I'm trying to come up with the MDX equivalent to the following SQL query: SELECT SUM(netMarketValue) net, SUM(CASE WHEN netMarketValue > 0 THEN netMarketValue ELSE 0 END) assets, SUM(CASE WHEN netMarketValue < 0 THEN netMarketValue ELSE 0 END) liabilities, SUM(ABS(netMarketValue)) gross someEntity1 FROM ( SELECT SUM(marketValue) netMarketValue, someEntity1, someEntity2 FROM <some set of tables> GROUP BY someEntity1, someEntity2) t GROUP BY someEntity1 In other words, I have an account ledger where I hide internal offsetting transactions (within someEntity2), then calculate assets & liabilities after aggregating them by someEntity2. Then I want to see the grand total of those assets & liabilities aggregated by the bigger entity, someEntity1. In my MDX schema I'd presumably have a cube with dimensions for someEntity1 & someEntity2, and marketValue would be my fact table/measure. I suppose i could create another DSV that did what my subquery does (calculating net), and simply create a cube with that as my measure dimension, but I wonder if there is a better way. I'd rather not have 2 cubes (one for these net calculations and another to go to a lower level of granularity for other use cases), since it will be a lot of duplicate info in my database. These will be very large cubes.

Read the article
Is there any danger in calling free() or delete instead of delete[]? [closed]

- by Matt Joiner

Possible Duplicate: ( POD )freeing memory : is delete[] equal to delete ? Does delete deallocate the elements beyond the first in an array? char *s = new char[n]; delete s; Does it matter in the above case seeing as all the elements of s are allocated contiguously, and it shouldn't be possible to delete only a portion of the array? For more complex types, would delete call the destructor of objects beyond the first one? Object *p = new Object[n]; delete p; How can delete[] deduce the number of Objects beyond the first, wouldn't this mean it must know the size of the allocated memory region? What if the memory region was allocated with some overhang for performance reasons? For example one could assume that not all allocators would provide a granularity of a single byte. Then any particular allocation could exceed the required size for each element by a whole element or more. For primitive types, such as char, int, is there any difference between: int *p = new int[n]; delete p; delete[] p; free p; Except for the routes taken by the respective calls through the delete-free deallocation machinery?

Read the article
Internal bug tracking tickets - Redmine, Trac, or JIRA

- by Tai Squared

I've been looking at setting up Redmine, Trac, or JIRA to track issues. I want to be able to have my development team create internal tickets that are never seen by clients, while clients can create/edit tickets that are seen by the internal team. From the Trac documentation, you can set permissions to create or view tickets, but it doesn't seem to allow for viewing only certain tickets. It may be possible with Trac Fine Grained Permissions, but doesn't appear so. The Redmine documentation mentions: Define your own roles and set their permissions in a click but doesn't appear to have the level of granularity. From the JIRA documentation: At the moment JIRA is only able to support security at a project level or issue level. Currently there is no field level security available. According to this question, Redmine doesn't support internal tickets, so you would have to use multiple projects. I don't want a situation where I would have to create multiple projects - one internal, one external and have the external tickets brought into the internal repository. It seems as this would lead to unnecessary overhead and inevitably, the projects wouldn't be in sync. Is there any way with any of these products (possibly through a plug-in if not in the core product itself) to specify these permissions, or simplify having two projects with different users and permissions that must still share information?

Read the article
File-level filesystem change notification in Mac OS X

- by Paul J. Lucas

I want my code to be notified when any file under (either directly or indirectly) a given directory is modified. By "modified", I mead I want my code to be notified whenever a file's contents are altered, it's renamed, or it's deleted; or if a new file is added. For my application, there can be thousands of files. I looked as FSEvents, but its Technology Overview says, in part: The important point to take away is that the granularity of notifications is at a directory level. It tells you only that something in the directory has changed, but does not tell you what changed. It also says: The file system events API is also not designed for finding out when a particular file changes. For such purposes, the kqueues mechanism is more appropriate. However, in order to use kqueue on a given file, one has to open the file to obtain a file descriptor. It's impractical to manage thousands of file descriptors (and would probably exceed the maximum allowable number of open file descriptors anyway). Curiously, under Windows, I can use the ReadDirectoryChangesW() function and it does precisely what I want. So how can one do what I want under Mac OS X? Or, asked another way: how would one go about writing the equivalent of ReadDirectoryChangesW() for Mac OS X in user-space (and do so very efficiently)?

Read the article
What is best practice (and implications) for packaging projects into JAR's?

- by user245510

What is considered best practice deciding how to define the set of JAR's for a project (for example a Swing GUI)? There are many possible groupings: JAR per layer (presentation, business, data) JAR per (significant?) GUI panel. For significant system, this results in a large number of JAR's, but the JAR's are (should be) more re-usable - fine-grained granularity JAR per "project" (in the sense of an IDE project); "common.jar", "resources.jar", "gui.jar", etc I am an experienced developer; I know the mechanics of creating JAR's, I'm just looking for wisdom on best-practice. Personally, I like the idea of a JAR per component (e.g. a panel), as I am mad-keen on encapsulation, and the holy-grail of re-use accross projects. I am concerned, however, that on a practical, performance level, the JVM would struggle class loading over dozens, maybe hundreds of small JAR's. Each JAR would contain; the GUI panel code, necessary resources (i.e. not centralised) so each panel can stand alone. Does anyone have wisdom to share?

Read the article
Webbased data modelling and management tool

- by pixeldude

Is there a web-based tool available, where I am able to... ...define data models (like in a database admin tool) ...fill in data (in custom web forms, not too generic) with basic features like completion ...import data from CSV oder Excel Sheets ...export data to CSV or SQL ...create snapshots of my data models (versions, diff, etc.) ...share my data models ...discuss/collaborate with other people about my data models Well, I can develop something like this in PHP or with Ruby or whatever. But this is such a common task, where the application support could be a lot better. And it would be language and database independent. This would help to maintain data models in different versions and you can maybe share your data models with others, extend it with your team members, etc. There is a website called FreeBase, which allows you to define a data entity model and fill in data, which also has export features, but I need to define my own data model with my own granularity and structure. And it should not be shared in public if I don't want to. How do you solve problems like this yourself?

Read the article
database design suggestion needed

- by JMSA

I need to design a table for daily sales of pharmaceutical products. There are hundreds of types of products available {Name, code}. Thousands of sales-persons are employed to sell those products{name, code}. They collect products from different depots{name, code}. They work in different Areas - Zones - Markets - Outlets, etc. {All have names and codes} Each product has various types of prices {Production Price, Trade Price, Business Price, Discount Price, etc.}. And, sales-persons are free to choose from those combination to estimate the sales price. The problem is, daily sales requires huge amount of data-entry. Within couple of years there may be gigabytes of data (if not terabytes). If I need to show daily, weekly, monthly, quarterly and yearly sales reports there will be various types of sql queries I shall need. This is my initial design: Product {ID, Code, Name, IsActive} ProductXYZPriceHistory {ID, ProductID, Date, EffectDate, Price, IsCurrent} SalesPerson {ID, Code, Name, JoinDate, and so on..., IsActive} SalesPersonSalesAraeaHistory {ID, SalesPersonID, SalesAreaID, IsCurrent} Depot {ID, Code, Name, IsActive} Outlet {ID, Code, Name, AreaID, IsActive} AreaHierarchy {ID, Code, Name, PrentID, AreaLevel, IsActive} DailySales {ID, ProductID, SalesPersonID, OutletID, Date, PriceID, SalesPrice, Discount, etc...} Now, apart from indexing, how can I normalize my DailySales table to have a fine grained design that I shall not need to change for years to come? Please show me a sample design of only the DailySales data-entry table (from which all types of reports would be queried) on the basis of above information. I don't need a detailed design advice. I just need an advice regarding only the DailySales table. Is there any way to break this particular table to achieve granularity?

Read the article
How can I strip Python logging calls without commenting them out?

- by cdleary

Today I was thinking about a Python project I wrote about a year back where I used logging pretty extensively. I remember having to comment out a lot of logging calls in inner-loop-like scenarios (the 90% code) because of the overhead (hotshot indicated it was one of my biggest bottlenecks). I wonder now if there's some canonical way to programmatically strip out logging calls in Python applications without commenting and uncommenting all the time. I'd think you could use inspection/recompilation or bytecode manipulation to do something like this and target only the code objects that are causing bottlenecks. This way, you could add a manipulator as a post-compilation step and use a centralized configuration file, like so: [Leave ERROR and above] my_module.SomeClass.method_with_lots_of_warn_calls [Leave WARN and above] my_module.SomeOtherClass.method_with_lots_of_info_calls [Leave INFO and above] my_module.SomeWeirdClass.method_with_lots_of_debug_calls Of course, you'd want to use it sparingly and probably with per-function granularity -- only for code objects that have shown logging to be a bottleneck. Anybody know of anything like this? Note: There are a few things that make this more difficult to do in a performant manner because of dynamic typing and late binding. For example, any calls to a method named debug may have to be wrapped with an if not isinstance(log, Logger). In any case, I'm assuming all of the minor details can be overcome, either by a gentleman's agreement or some run-time checking. :-)

Read the article
Design patterns for Agent / Actor based concurrent design.

- by nso1

Recently i have been getting into alternative languages that support an actor/agent/shared nothing architecture - ie. scala, clojure etc (clojure also supports shared state). So far most of the documentation that I have read focus around the intro level. What I am looking for is more advanced documentation along the gang of four but instead shared nothing based. Why ? It helps to grok the change in design thinking. Simple examples are easy, but in a real world java application (single threaded) you can have object graphs with 1000's of members with complex relationships. But with agent based concurrency development it introduces a whole new set of ideas to comprehend when designing large systems. ie. Agent granularity - how much state should one agent manage - implications on performance etc or are their good patterns for mapping shared state object graphs to agent based system. tips on mapping domain models to design. Discussions not on the technology but more on how to BEST use the technology in design (real world "complex" examples would be great).

Read the article
C++: defining maximum/minimum limits for a class

- by Luis

Basically what the title says... I have created a class that models time slots in a variable-granularity daily schedule (where for example the first time slot is 30 minutes, but the second time slot can be 40 minutes); the first available slot starts at (a value comparable to) 1. What I want to do now is to define somehow the maximum and minimum allowable values that this class takes and I have two practical questions in order to do so: 1.- does it make sense to define absolute minimum and maximum in such a way for a custom class? Or better, does it suffice that a value always compares as lower-than any other possible value of the type, given the class's defined relational operators, to be defined the min? (and analogusly for the max) 2.- assuming the previous question has an answer modeled after "yes" (or "yes but ..."), how do I define such max/min? I know that there is std::numeric_limits<> but from what I read it is intended for "numeric types". Do I interpret that as meaning "represented as a number" or can I make a broader assumption like "represented with numbers" or "having a correspondence to integers"? After all, it would make sense to define the minimum and maximum for a date class, and maybe for a dictionary class, but numeric_limits may not be intended for those uses (I don't have much experience with it). Plus, numeric_limits has a lot of extra members and information that I don't know what to make with. If I don't use numeric_limits, what other well-known / widely-used mechanism does C++ offer to indicate the available range of values for a class?

Read the article
how to let the parser print help message rather than error and exit

- by fluter

Hi, I am using argparse to handle cmd args, I wanna if there is no args specified, then print the help message, but now the parse will output a error, and then exit. my code is: def main(): print "in abing/start/main" parser = argparse.ArgumentParser(prog="abing")#, usage="%(prog)s <command> [args] [--help]") parser.add_argument("-v", "--verbose", action="store_true", default=False, help="show verbose output") subparsers = parser.add_subparsers(title="commands") bkr_subparser = subparsers.add_parser("beaker", help="beaker inspection") bkr_subparser.set_defaults(command=beaker_command) bkr_subparser.add_argument("-m", "--max", action="store", default=3, type=int, help="max resubmit count") bkr_subparser.add_argument("-g", "--grain", action="store", default="J", choices=["J", "RS", "R", "T", "job", "recipeset", "recipe", "task"], type=str, help="resubmit selection granularity") bkr_subparser.add_argument("job_ids", nargs=1, action="store", help="list of job id to be monitored") et_subparser = subparsers.add_parser("errata", help="errata inspection") et_subparser.set_defaults(command=errata_command) et_subparser.add_argument("-w", "--workflows", action="store_true", help="generate workflows for the erratum") et_subparser.add_argument("-r", "--run", action="store_true", help="generate workflows, and run for the erratum") et_subparser.add_argument("-s", "--start-monitor", action="store_true", help="start monitor the errata system") et_subparser.add_argument("-d", "--daemon", action="store_true", help="run monitor into daemon mode") et_subparser.add_argument("erratum", action="store", nargs=1, metavar="ERRATUM", help="erratum id") if len(sys.argv) == 1: parser.print_help() return args = parser.parse_args() args.command(args) return how can I do that? thanks.

Read the article

Search Results

Search found 132 results on 6 pages for 'granularity'.

Page 4/6 | < Previous Page | 1 2 3 4 5 6 | Next Page >

- by Aaronaught

- by McLeopold

- by Drewdavid

- by Tim

- by SST

- by Ayos

- by Stefan Hinker

- by Knut Vatsendvik

- by Steve Tunstall

- by Marco Russo (SQLBI)

- by Marco Russo (SQLBI)

- by jeckels

- by Muhammad Alkarouri

- by James Lampe

- by Matt Joiner

- by Tai Squared

- by Paul J. Lucas

- by user245510

- by pixeldude

- by JMSA

- by cdleary

- by nso1

- by Luis

- by fluter

< Previous Page | 1 2 3 4 5 6 | Next Page >