ids - Page 18 - Developer IT

Google Chrome Extensions: Identity, Signing and Auto Update

Google Chrome Extensions: Identity, Signing and Auto Update Antony Sargent, a software engineer at Google discusses topics related to ids, packaging and distribution of extensions in the Google Chrome Extension system. To get more information, visit code.google.com/chrome/extensions From: GoogleDevelopers Views: 27337 54 ratings Time: 04:08 More in Science & Technology

Read the article

It's called College.

- by jeffreyabecker

Today I saw yet another 'GUID vs int as your primary key' article. Like most of the ones I've read this was filled with technical misrepresentations and out-right fallices. Chef's famous line that "There's a time and a place for everything children" applies here. GUIDs have distinct advantages and disadvantages which should be considered when choosing a data type for the primary key. Fallacy 1: "Its easier" An integer data type(tinyint, smallint, int, bigint) is a better artifical key than a GUID because its easier to remember. I'm a firm believer that your artifical primary keys should be opaque gibberish. PK's are an implementation detail which should never be exposed to the user or relied on for business logic. If you want things to come back in an order, add and ORDER BY clause and SortOrder fields. If you want a human-usable look-up add a business key with a unique constraint. If you want to know what order things were inserted into a table add a timestamp. Fallacy 2: "Size Matters" For many applications, the size of the artifical primary key is going to be irrelevant. The particular article which kicked this post off stated repeatedly that joining against an int has better performance than joining against a GUID. In computer science the performance of your algorithm is always a function of the number of data points. This still holds true for databases. Unless your table is very large, the performance difference between an int and a guid probably isnt going to be mesurable let alone noticeable. My personal experience is that the performance becomes an issue when you start having billions of rows in the table. At this point, you should probably start looking to move from int to bigint so the effective space/performance gain isnt as much as you'd think. GUID Advantages: Insert-ability / Mergeability: You can reliably insert guids into tables without key collisions. Database Independence: Saving entities to the database often requires knowing ids. With identity based ids the id must be selected back after every insert. GUIDs can be generated application-side allowing much faster inserts. GUID Disadvantages: Generatability: You can calculate the next id for an integer pk pretty easily in your head but will need a program to generate GUIDs. Solution: "Select top 100 newid() from sysobjects" Fragmentation: most GUID generation algorithms generate pseudo random GUIDs. This can cause inserts into the middle of your clustered index. Solutions: add a default of newsequentialid() or use GuidComb in NHibernate.

Read the article

Useful Extensions for SecurityToken Handling - Convert a SecurityToken to Claims

- by Your DisplayName here!

That’s a very common one: public static IClaimsPrincipal ToClaimsPrincipal( this SecurityToken token, X509Certificate2 signingCertificate) { var configuration = CreateStandardConfiguration(signingCertificate); return token.ToClaimsPrincipal(configuration.CreateDefaultHandlerCollection()); } public static IClaimsPrincipal ToClaimsPrincipal(this SecurityToken token, X509Certificate2 signingCertificate, string audienceUri) { var configuration = CreateStandardConfiguration(signingCertificate); configuration.AudienceRestriction.AudienceMode = AudienceUriMode.Always; configuration.AudienceRestriction.AllowedAudienceUris.Add(new Uri(audienceUri)); return token.ToClaimsPrincipal(configuration.CreateDefaultHandlerCollection()); } public static IClaimsPrincipal ToClaimsPrincipal( this SecurityToken token, SecurityTokenHandlerCollection handler) { var ids = handler.ValidateToken(token); return ClaimsPrincipal.CreateFromIdentities(ids); } private static SecurityTokenHandlerConfiguration CreateStandardConfiguration( X509Certificate2 signingCertificate) { var configuration = new SecurityTokenHandlerConfiguration(); configuration.AudienceRestriction.AudienceMode = AudienceUriMode.Never; configuration.IssuerNameRegistry = signingCertificate.CreateIssuerNameRegistry(); configuration.IssuerTokenResolver = signingCertificate.CreateSecurityTokenResolver(); configuration.SaveBootstrapTokens = true; return configuration; } private static IssuerNameRegistry CreateIssuerNameRegistry(this X509Certificate2 certificate) { var registry = new ConfigurationBasedIssuerNameRegistry(); registry.AddTrustedIssuer(certificate.Thumbprint, certificate.Subject); return registry; } private static SecurityTokenResolver CreateSecurityTokenResolver( this X509Certificate2 certificate) { var tokens = new List<SecurityToken> { new X509SecurityToken(certificate) }; return SecurityTokenResolver.CreateDefaultSecurityTokenResolver(tokens.AsReadOnly(), true); } private static SecurityTokenHandlerCollection CreateDefaultHandlerCollection( this SecurityTokenHandlerConfiguration configuration) { return SecurityTokenHandlerCollection.CreateDefaultSecurityTokenHandlerCollection(configuration); }

Read the article

Commnunity Technology Update (CTU) 2011

- by Aman Garg

Spoke at the session on Webforms in CTU 2011 (Community Technology Update) in Singapore. Had a good interaction with the Developer community here in Singapore. I covered the following topics during the session: *Dynamic Data *Routing *Web Form Additions *Predictable Client IDs *Programmable Meta Data *Better control over ViewState *Persist selected rows *Web Deployment The Slide Deck used can be accessed using the following URL: http://www.slideshare.net/amangarg516/web-forms-im-still-alive

Read the article

Syncing client and server CRUD operations using json and php

- by Justin

I'm working on some code to sync the state of models between client (being a javascript application) and server. Often I end up writing redundant code to track the client and server objects so I can map the client supplied data to the server models. Below is some code I am thinking about implementing to help. What I don't like about the below code is that this method won't handle nested relationships very well, I would have to create multiple object trackers. One work around is for each server model after creating or loading, simply do $model->clientId = $clientId; IMO this is a nasty hack and I want to avoid it. Adding a setCientId method to all my model object would be another way to make it less hacky, but this seems like overkill to me. Really clientIds are only good for inserting/updating data in some scenarios. I could go with a decorator pattern but auto generating a proxy class seems a bit involved. I could use a generic proxy class that uses a __call function to allow for original object data to be accessed, but this seems wrong too. Any thoughts or comments? $clientData = '[{name: "Bob", action: "update", id: 1, clientId: 200}, {name:"Susan", action:"create", clientId: 131} ]'; $jsonObjs = json_decode($clientData); $objectTracker = new ObjectTracker(); $objectTracker->trackClientObjs($jsonObjs); $query = $this->em->createQuery("SELECT x FROM Application_Model_User x WHERE x.id IN (:ids)"); $query->setParameters("ids",$objectTracker->getClientSpecifiedServerIds()); $models = $query->getResults(); //Apply client data to server model foreach ($models as $model) { $clientModel = $objectTracker->getClientJsonObj($model->getId()); ... } //Create new models and persist foreach($objectTracker->getNewClientObjs() as $newClientObj) { $model = new Application_Model_User(); .... $em->persist($model); $objectTracker->trackServerObj($model); } $em->flush(); $resourceResponse = $objectTracker->createResourceResponse(); //Id mappings will be an associtave array representing server id resources with client side // id. //This method Dosen't seem to flexible if we want to return additional data with each resource... //Would have to modify the returned data structure, seems like tight coupling... //Ex return value: //[{clientId: 200, id:1} , {clientId: 131, id: 33}];

Read the article

How can I manage multiple administrators with juju?

- by Jorge Castro

I manage some deployments with juju. However I am not an island, I have coworkers who also want to manage shared environments. I know I can use the following stanza in ~/.juju/environments.yaml to give people access to my juju environment: authorized-keys: [and then put their ssh IDs in here] What other best practices are available to manage multiple environments with multiple system administrators?

Read the article

Where should I define constants in scripts?

- by bshacklett

When writing scripts using a modern scripting language, e.g. Powershell or JavaScript, where should I define constants? Should I make all constants global for readability and ease of use, or does it make sense to define constants as close to their scopes as possible (in a function, for instance, if it's not needed elsewhere)? I'm thinking mostly of error messages, error IDs, paths to resources or configuration options.

Read the article

How can I track hits to areas of my web application?

- by Tyson

We have a growing web application, and we currently use Google Analytics and Chartbeat to track usage and engagement (although we're open to alternatives). Unfortunately, both are geared towards content-based sites where everything is about the URL. Our URLs contain object IDs, making them less useful independently, and causing us to grow beyond Google Analytics' 50,000 unique URLs per day. How can we track hits to areas of our web application, essentially ignoring parts of the URLs?

Read the article

Reminder: Oracle OpenWorld Presentations Available for Download

- by Di Seghposs

A little over 30 days ago 45,000 Oracle OpenWorld conference attendees stormed San Francisco to battle the heat in search of Oracle solutions to address their business needs. With so much activity from Keynotes, General Sessions, and Conference Panels to Meet the Experts and the Exhibit Halls, perhaps there was some Oracle Financials sessions you missed! Keynotes and sessions are now available for download on the OpenWorld site. For a complete list of sessions and session IDs, view any of the Focus on Documents located on the OpenWorld site.

Read the article

Is it good practice to keep 2 related tables (using auto_increment PK) to have the same Max of auto_increment ID when table1 got modified?

- by Tum

This question is about good design practice in programming. Let see this example, we have 2 interrelated tables: Table1 textID - text 1 - love.. 2 - men... ... Table2 rID - textID 1 - 1 2 - 2 ... Note: In Table1: textID is auto_increment primary key In Table2: rID is auto_increment primary key & textID is foreign key The relationship is that 1 rID will have 1 and only 1 textID but 1 textID can have a few rID. So, when table1 got modification then table2 should be updated accordingly. Ok, here is a fictitious example. You build a very complicated system. When you modify 1 record in table1, you need to keep track of the related record in table2. To keep track, you can do like this: Option 1: When you modify a record in table1, you will try to modify a related record in table 2. This could be quite hard in term of programming expecially for a very very complicated system. Option 2: instead of modifying a related record in table2, you decided to delete old record in table 2 & insert new one. This is easier for you to program. For example, suppose you are using option2, then when you modify record 1,2,3,....,100 in table1, the table2 will look like this: Table2 rID - textID 101 - 1 102 - 2 ... 200 - 100 This means the Max of auto_increment IDs in table1 is still the same (100) but the Max of auto_increment IDs in table2 already reached 200. what if the user modify many times? if they do then the table2 may run out of records? we can use BigInt but that make the app run slower? Note: If you spend time to program to modify records in table2 when table1 got modified then it will be very hard & thus it will be error prone. But if you just clear the old record & insert new records into table2 then it is much easy to program & thus your program is simpler & less error prone. So, is it good practice to keep 2 related tables (using auto_increment PK) to have the same Max of auto_increment ID when table1 got modified?

Read the article

??????????

- by user762552

?????????????????????????????????????????????????????????????????????11/9???12/7?2??????????????????????????????????????????????????4???????3??????1??????2???????·?????IDS/IPS???????? ????????????????(JNSA)???????·??/??WG????????????????????????????????????????????2???????????????????????2????(90?×2)???????????????????????????????????···????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????(?????)????????????AKB48????????????????????????????(?)??????IP????????????ACL???????????????????????????????????????????????????????????????????????????????????????????????????????????

Read the article

How to mount vfat drive on Linux with ownership other than root?

- by Norman Ramsey

I'm running into trouble mounting an iPod on a newly upgraded Debian Squeeze. I suspect either a protocol has changed or I've tickled a bug, which I don't know where to report. I'm trying to mount the iPod so that I have permission to read and write it. But my efforts come to nothing: $ sudo mount -v -t vfat -o uid=32074,gid=6202 /dev/sde2 /mnt /dev/sde2 on /mnt type vfat (rw,uid=32074,gid=6202) $ ls -l /mnt total 80 drwxr-xr-x 2 root root 16384 Jan 1 2000 Calendars drwxr-xr-x 2 root root 16384 Jan 1 2000 Contacts drwxr-xr-x 2 root root 16384 Jan 1 2000 Notes drwxr-xr-x 3 root root 16384 Jun 23 2007 Photos drwxr-xr-x 6 root root 16384 Jun 19 2007 iPod_Control $ sudo umount /mnt $ sudo mount -v -t vfat -o uid=nr,gid=nr /dev/sde2 /mnt /dev/sde2 on /mnt type vfat (rw,uid=32074,gid=6202) $ ls -l /mnt total 80 drwxr-xr-x 2 root root 16384 Jan 1 2000 Calendars drwxr-xr-x 2 root root 16384 Jan 1 2000 Contacts drwxr-xr-x 2 root root 16384 Jan 1 2000 Notes drwxr-xr-x 3 root root 16384 Jun 23 2007 Photos drwxr-xr-x 6 root root 16384 Jun 19 2007 iPod_Control As you see, I've tried both symbolic and numberic IDs, but the files persist in being owned by root (and only writable by root). The IDs are really mine; I've had the UID since 1993. $ id uid=32074(nr) gid=6202(nr) groups=6202(nr),0(root),2(bin),4(adm),... I've put an strace at http://pastebin.com/Xue2u9FZ, and the mount(2) call looks good: mount("/dev/sde2", "/mnt", "vfat", MS_MGC_VAL, "uid=32074,gid=6202") = 0 Finally, here's my kernel version from uname -a: Linux homedog 2.6.32-5-686 #1 SMP Mon Jun 13 04:13:06 UTC 2011 i686 GNU/Linux Does anyone know if I should be doing something different, or If there is a workaround, or If this is a bug, where it should be reported?

Read the article

Wireless Activity Monitoring for PCI DSS Compliance

- by dkusleika

In an effort to be PCI DSS compliant, I took a trustkeeper.net questionnaire. I failed the question that asks Is the presence of wireless access points tested for by using a wireless analyzer at least quarterly or by deploying a wireless IDS/IPS to identify all wireless devices in use? (SAQ #11.1) My only wireless access point is outside my firewall, so even if you cracked my wireless you couldn't get inside my domain (unless you crack that too). My firewall doesn't have IPS and I couldn't tell if it had IDS. I looked around for a wireless analyzer, but what I found was $500, which is a little pricey for my size business. And even if I got it, I'm not sure I would understand what it tells me. Surely there are smaller/less sophisticated businesses that take credit cards and have solved this. My questions are: What are the risks if someone were to crack my wireless? (Could they read all internet traffic? Just wireless traffic? Just use my internet connection?) And what is the best/cheapest way to test my connection point quarterly? Should I buy the $500 analyzer? Domain is Windows Server 2000. Firewall is Sonicwall Pro 2040. Router is 8 port D-link.

Read the article

Optimal file system type and mount options for an rsnapshot dedicated drive

- by Nimmy Lebby

We have an external USB 2 drive that we are using as a backup drive for our configuration. We use rsnapshot for the backups. It uses a few standard commands for managing snapshots: rm -rf: deletes expired snapshots mv: moves older snapshots down a slot cp -al: duplicates last snapshot to new slot rsync -a --delete --numeric-ids --relative: synchronizes new snapshot As you could see by the log below, the majority of the time is spent on the rm -rf and the cp -al steps: [25/Dec/2010:14:00:02] rsnapshot hourly: started [25/Dec/2010:14:00:02] echo 21012 > /var/run/rsnapshot.pid [25/Dec/2010:14:00:02] rm -rf /mnt/extdrive/snapshots/hourly.5/ [25/Dec/2010:14:15:48] mv /mnt/extdrive/snapshots/hourly.4/ /mnt/extdrive/snapshots/hourly.5/ [25/Dec/2010:14:15:48] mv /mnt/extdrive/snapshots/hourly.3/ /mnt/extdrive/snapshots/hourly.4/ [25/Dec/2010:14:15:48] mv /mnt/extdrive/snapshots/hourly.2/ /mnt/extdrive/snapshots/hourly.3/ [25/Dec/2010:14:15:48] mv /mnt/extdrive/snapshots/hourly.1/ /mnt/extdrive/snapshots/hourly.2/ [25/Dec/2010:14:15:48] cp -al /mnt/extdrive/snapshots/hourly.0 /mnt/extdrive/snapshots/hourly.1 [25/Dec/2010:14:23:32] rsync -a --delete --numeric-ids --relative /etc /mnt/extdrive/snapshots/hourly.0/sm4/ [25/Dec/2010:14:23:52] touch /mnt/extdrive/snapshots/hourly.0/ [25/Dec/2010:14:23:52] rm -f /var/run/rsnapshot.pid [25/Dec/2010:14:23:52] rsnapshot hourly: completed successfully My questions: I'm currently using ext4 for the filesystem. Maybe this is not the best choice from those available in Red Hat. Anyone have any recommendations that would speed up the process? The partition's mount options are sync,dirsync 1 2. Is there a way to optimize this since it's solely used for rsnapshot? Of course, reasoning would be greatly appreciated.

Read the article

ipmi - can't ping or remotely connect

- by Fidel

I've tried configuring the IPMI controller to accept remote connections, but I can't even ping it. Here is it status: #/usr/local/bin/ipmitool lan print 2 Set in Progress : Set Complete Auth Type Support : NONE PASSWORD Auth Type Enable : Callback : : User : NONE PASSWORD : Operator : PASSWORD : Admin : PASSWORD : OEM : IP Address Source : Static Address IP Address : 192.168.1.112 Subnet Mask : 255.255.255.0 MAC Address : 00:a0:a5:67:45:25 IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10 BMC ARP Control : ARP Responses Enabled, Gratuitous ARP Enabled Gratituous ARP Intrvl : 8.0 seconds Default Gateway IP : 192.168.1.1 Default Gateway MAC : 00:00:00:00:00:00 802.1q VLAN ID : Disabled 802.1q VLAN Priority : 0 RMCP+ Cipher Suites : 0,1,2,3 Cipher Suite Priv Max : uaaaXXXXXXXXXXX : X=Cipher Suite Unused : c=CALLBACK : u=USER : o=OPERATOR : a=ADMIN : O=OEM # /usr/local/bin/ipmitool user list 2 ID Name Enabled Callin Link Auth IPMI Msg Channel Priv Limit 1 true false true true USER 2 admin true false true true ADMINISTRATOR # /usr/local/bin/ipmitool channel getaccess 2 2 Maximum User IDs : 5 Enabled User IDs : 2 User ID : 2 User Name : admin Fixed Name : No Access Available : callback Link Authentication : enabled IPMI Messaging : enabled Privilege Level : ADMINISTRATOR # /usr/local/bin/ipmitool channel info 2 Channel 0x2 info: Channel Medium Type : 802.3 LAN Channel Protocol Type : IPMB-1.0 Session Support : multi-session Active Session Count : 0 Protocol Vendor ID : 7154 Volatile(active) Settings Alerting : disabled Per-message Auth : disabled User Level Auth : disabled Access Mode : always available Non-Volatile Settings Alerting : disabled Per-message Auth : disabled User Level Auth : disabled Access Mode : always available # /usr/local/bin/ipmitool chassis status System Power : on Power Overload : false Power Interlock : inactive Main Power Fault : false Power Control Fault : false Power Restore Policy : unknown Last Power Event : Chassis Intrusion : inactive Front-Panel Lockout : inactive Drive Fault : false Cooling/Fan Fault : false # arp Address HWtype HWaddress Flags Mask Iface 192.168.1.112 ether 00:A0:A5:67:45:25 C bond0 # /usr/local/bin/ipmitool -I lan -H 192.168.1.112 -U admin -P admin chassis power status Error: Unable to establish LAN session Unable to get Chassis Power Status In summary. It exists on the ARP list so arp's are being broadcast. I can't ping it and can't connect to it. Can anyone spot any glaring mistakes in the configuration? Many thanks, Fidel

Read the article

Why does this rsnapshot exclude not work?

- by bstpierre

Rsnapshot passes excludes directly to rsync, but rsync's behavior appears inconsistent. I've simplified my rsnapshot backup test to the following directory tree (this tree will be backed up): gorilla:~# find /tmp/snaptest -exec file {} \; /tmp/snaptest: directory /tmp/snaptest/SKIPTHIS: directory /tmp/snaptest/SKIPTHIS/xyz: directory /tmp/snaptest/SKIPTHIS/xyz/testing: ASCII text /tmp/snaptest/SKIPTHIS/bar: ASCII text /tmp/snaptest/SKIPTHIS/foo: ASCII text /tmp/snaptest/SKIPTHIS.txt: ASCII text My config file: config_version 1.2 snapshot_root /tmp/backup-media no_create_root 1 cmd_cp /bin/cp cmd_rm /bin/rm cmd_rsync /usr/bin/rsync cmd_ssh /usr/bin/ssh cmd_logger /usr/bin/logger cmd_du /usr/bin/du interval hourly 6 interval daily 7 interval weekly 4 interval monthly 3 verbose 3 loglevel 3 logfile /media/maxtor-one-touch/rsnapshot.log lockfile /media/maxtor-one-touch/backups/.rsnapshot.pid rsync_short_args -a rsync_long_args --delete --numeric-ids --relative --delete-excluded exclude "SKIPTHIS/**" link_dest 1 backup /tmp/snaptest snaptest The result: gorilla:~# rsnapshot -c /tmp/snaptest.conf hourly echo 12638 > /media/maxtor-one-touch/backups/.rsnapshot.pid mkdir -m 0755 -p /tmp/backup-media/hourly.0/ /usr/bin/rsync -a --delete --numeric-ids --relative --delete-excluded \ --exclude="SKIPTHIS/**" /tmp/snaptest \ /tmp/backup-media/hourly.0/snaptest touch /tmp/backup-media/hourly.0/ rm -f /media/maxtor-one-touch/backups/.rsnapshot.pid gorilla:~# find /tmp/backup-media/ -exec file {} \; /tmp/backup-media/: directory /tmp/backup-media/hourly.0: directory /tmp/backup-media/hourly.0/snaptest: directory /tmp/backup-media/hourly.0/snaptest/tmp: sticky directory /tmp/backup-media/hourly.0/snaptest/tmp/snaptest: directory /tmp/backup-media/hourly.0/snaptest/tmp/snaptest/SKIPTHIS: directory /tmp/backup-media/hourly.0/snaptest/tmp/snaptest/SKIPTHIS/xyz: directory /tmp/backup-media/hourly.0/snaptest/tmp/snaptest/SKIPTHIS/xyz/testing: ASCII text /tmp/backup-media/hourly.0/snaptest/tmp/snaptest/SKIPTHIS/bar: ASCII text /tmp/backup-media/hourly.0/snaptest/tmp/snaptest/SKIPTHIS/foo: ASCII text /tmp/backup-media/hourly.0/snaptest/tmp/snaptest/SKIPTHIS.txt: ASCII text My confusion stems from the fact that if I copy-paste the rsync command echoed by rsnapshot, the SKIPTHIS directory is excluded! (I've tested with various other SKIPTHIS patterns with the same results.) Any idea what's going on?

Read the article

Importing long numerical identifiers into Excel

- by Niels Basjes

I have some data in a database that uses ids that have the form of 16 digit numbers. In some situations i need to export the data in such a way that it can be manipulated in excel. So i export the data into a file and import it into excel. I've tried several file formats and I'm stuck. The problem I'm facing is that when reading a file into excel that has a cell that looks like a number then excel treats it as a number. The catch is that (as far as i can tell) all numerical values in excel are double precision floating point which have a precision of less than 16 digits. So my ids are changed: very often the last digit its changed to a 0. So far I've only been able to convince excel to keep the Id unchanged by breaking it myself: by adding a letter or symbol to the Id. This however means that in order to use the value again it must be "unbroken". Is there a way to create a file where i can specify that excel must treat the value as a text without changing the value? Or its there a way to let excel treat the value as a long (64bit integer)?

Read the article

UIDs for service users in Mac OS X

- by LaC

Some third-party servers should be run under a special user for security reasons (eg, PostgreSQL is typically run by "postgres"). Of course, these service users should not show up in the Mac OS X login windows. I know how to create hidden users using dscl or dsimport, but I'm wondering what the best policy is for assigning UIDs (and matching GIDs). Apple's documentation states that UIDs from 0 to 100 are reserved (pg. 69), but OS X comes with several special users and groups outside that range. I used to use ids from 401 onwards for services, but I noticed that OS X 10.6 has started using that range for groups created by the Sharing pane in System Preferences. What is the recommended ID range to use for third-party services, then? Perhaps I should just use IDs in the 500 range, since all that is needed to hide a user in Snow Leopard is setting his password to "*"? Also, most of Apple's services have names starting with an underscore, with an alias sans underscore; eg, _sandbox and sandbox. Is there any special significance to this? Should I do the same for my services?

Read the article

How to optimize a postgreSQL server for a "write once, read many"-type infrastructure ?

- by mhu

Greetings, I am working on a piece of software that logs entries (and related tagging) in a PostgreSQL database for storage and retrieval. We never update any data once it has been inserted; we might remove it when the entry gets too old, but this is done at most once a day. Stored entries can be retrieved by users. The insertion of new entries can happen rather fast and regularly, thus the database will commonly hold several millions elements. The tables used are pretty simple : one table for ids, raw content and insertion date; and one table storing tags and their values associated to an id. User search mostly concern tags values, so SELECTs usually consist of JOIN queries on ids on the two tables. To sum it up : 2 tables Lots of INSERT no UPDATE some DELETE, once a day at most some user-generated SELECT with JOIN huge data set What would an optimal server configuration (software and hardware, I assume for example that RAID10 could help) be for my PostgreSQL server, given these requirements ? By optimal, I mean one that allows SELECT queries taking a reasonably little amount of time. I can provide more information about the current setup (like tables, indexes ...) if needed.

Read the article

Windows 7 product key, which is the valid one - in registry or on a sticker?

- by me how

I am not too familiar with the software licensing and how this all works, but I have a question regarding Windows 7 and partially Office - generally Microsoft products. I have been asked to assist our IT guy who wants to collect all the product IDs for Windows 7 and Office. I haven't been given much details how to go about it and how to collect it. After a bit of research I have decided to use a freeware that pulls the software licenses out of the registry. I thought that was the easiest and would provide the most accurate product IDs. I've used Belrac Avisor to obtain all the informations. It turns out that about 25 machines use the same product key. I have asked if the company has bought a commercial license or something but there isn't anyone available at the moment who could answer my question. I have told the IT guy that there are 25 machines using the same product key and asked if that is alright. He told me to go around and write the product keys from the sticker(label) on each machine. I am just not quite sure if that's the right approach specially that the numbers do not match.... So, now I see that the numbers aren't matching and my question is in terms of software licensing which is the VALID and correct product key to provide if ever questioned about software license? Is it the number on the sticker or is it the number stored in the registry?

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

CentOS 5.4 NFS v4 client file permissions differ from original files & NFS Share file contents

- by p4guru

Having a strange problem with NFS share and file permissions on the 1 out of the 2 NFS clients, web1 has file permissions issues but web2 is fine. web1 and web2 are load balanced web servers. So questions are: how do I ensure NFS share file contents retain the same permissions for user/group as the original files on web1 server like they do on web2 server ? how do I reverse what I did on web1, i tried unmount command and said command not found ? Information: I'm using 3 dedicated server setup. All 3 servers CentOS 5.4 64bit based. servers are as follows: web1 - nfs client with file permissions issues web2 - nfs client file permissions are OKAY db1 - nfs share at /nfsroot web2 nfs client was setup by my web host, while web1 was setup by me. I did the following commands on web1 and it worked with updating db1 nfsroot share at /nfsroot/site_css with latest files on web1 but the file permissions don't stick even if i use tar with -p command to perserve file permissions ? cd /home/username/public_html/forums/script/ tar -zcp site_css/ > site_css.tar.gz mount -t nfs4 nfsshareipaddress:/site_css /home/username/public_html/forums/scripts/site_css/ -o rw,soft cd /home/username/public_html/forums/script/ tar -zxf site_css.tar.gz But checking on web1 file permissions no longer username user/group but owned by nobody ? but web2 file permissions correct ? This is only a problem for web1 while web2 is correct ? Looks like numeric ids aren't the same ? Not sure how to correct this ? web1 with incorrect user/group of nobody ls -alh /home/username/public_html/forums/scripts/site_css total 48K drwxrwxrwx 2 nobody nobody 4.0K Feb 22 02:37 ./ drwxr-xr-x 3 username username 4.0K Feb 22 02:43 ../ -rw-r--r-- 1 nobody nobody 1 Nov 30 2006 index.html -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-cc2f96c9-00011.css web1 numeric ids ls -n /home/username/public_html/forums/scripts/site_css total 48 drwxrwxrwx 2 99 99 4096 Feb 22 02:37 ./ drwxr-xr-x 3 503 500 4096 Feb 22 02:43 ../ -rw-r--r-- 1 99 99 1 Nov 30 2006 index.html -rw-r--r-- 1 99 99 5876 Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 99 99 5877 Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 99 99 5877 Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 99 99 5876 Feb 18 05:37 style-cc2f96c9-00011.css web2 correct username user/group permissions ls -alh /home/username/public_html/forums/scripts/site_css total 48K drwxrwxrwx 2 root root 4.0K Feb 22 02:37 ./ drwxr-xr-x 3 username username 4.0K Dec 2 14:51 ../ -rw-r--r-- 1 username username 1 Nov 30 2006 index.html -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-cc2f96c9-00011.css web2 numeric ids ls -n /home/username/public_html/forums/scripts/site_css total 48 drwxrwxrwx 2 503 500 4096 Feb 22 02:37 ./ drwxr-xr-x 3 503 500 4096 Dec 2 14:51 ../ -rw-r--r-- 1 503 500 1 Nov 30 2006 index.html -rw-r--r-- 1 503 500 5876 Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 503 500 5877 Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 503 500 5877 Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 503 500 5876 Feb 18 05:37 style-cc2f96c9-00011.css I checked db1 /nfsroot/site_css and user/group ownership was incorrect for newer files dated feb22 owned by root and not username ? on db1 originally incorrect root assigned user/group for new feb22 dated files ls -alh /nfsroot/site_css total 44K drwxrwxrwx 2 root root 4.0K Feb 22 02:37 . drwxr-xr-x 17 root root 4.0K Feb 17 12:06 .. -rw-r--r-- 1 root root 1 Nov 30 2006 index.html -rw-r--r-- 1 root root 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 root root 5.8K Feb 22 02:37 style-95001864-00002.css -rw------- 1 username nfs 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw------- 1 username nfs 5.8K Feb 18 05:37 style-cc2f96c9-00011.css Then I chmod them all on db1 and chown to set to right ownership on db1 so it looks like below on db1 once corrected the newer feb22 dated files ls -alh /nfsroot/site_css total 44K drwxrwxrwx 2 root root 4.0K Feb 22 02:37 . drwxr-xr-x 17 root root 4.0K Feb 17 12:06 .. -rw-r--r-- 1 username username 1 Nov 30 2006 index.html -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-cc2f96c9-00011.css but still web1 shows owned by nobody ? while web2 shows correct permissions ? web1 still with incorrect user/group of nobody not matching what web2 and db1 are set to ? ls -alh /home/username/public_html/forums/scripts/site_css total 48K drwxrwxrwx 2 nobody nobody 4.0K Feb 22 02:37 ./ drwxr-xr-x 3 username username 4.0K Feb 22 02:43 ../ -rw-r--r-- 1 nobody nobody 1 Nov 30 2006 index.html -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-cc2f96c9-00011.css Just so confusing so any help is very very much appreciated! thanks

Read the article

CentOS 5.4 NFS v4 client file permissions differ from original files & NFS Share file contents

- by p4guru

Having a strange problem with NFS share and file permissions on the 1 out of the 2 NFS clients, web1 has file permissions issues but web2 is fine. web1 and web2 are load balanced web servers. So questions are: how do I ensure NFS share file contents retain the same permissions for user/group as the original files on web1 server like they do on web2 server ? how do I reverse what I did on web1, i tried unmount command and said command not found ? Information: I'm using 3 dedicated server setup. All 3 servers CentOS 5.4 64bit based. servers are as follows: web1 - nfs client with file permissions issues web2 - nfs client file permissions are OKAY db1 - nfs share at /nfsroot web2 nfs client was setup by my web host, while web1 was setup by me. I did the following commands on web1 and it worked with updating db1 nfsroot share at /nfsroot/site_css with latest files on web1 but the file permissions don't stick even if i use tar with -p command to perserve file permissions ? cd /home/username/public_html/forums/script/ tar -zcp site_css/ > site_css.tar.gz mount -t nfs4 nfsshareipaddress:/site_css /home/username/public_html/forums/scripts/site_css/ -o rw,soft cd /home/username/public_html/forums/script/ tar -zxf site_css.tar.gz But checking on web1 file permissions no longer username user/group but owned by nobody ? but web2 file permissions correct ? This is only a problem for web1 while web2 is correct ? Looks like numeric ids aren't the same ? Not sure how to correct this ? web1 with incorrect user/group of nobody ls -alh /home/username/public_html/forums/scripts/site_css total 48K drwxrwxrwx 2 nobody nobody 4.0K Feb 22 02:37 ./ drwxr-xr-x 3 username username 4.0K Feb 22 02:43 ../ -rw-r--r-- 1 nobody nobody 1 Nov 30 2006 index.html -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-cc2f96c9-00011.css web1 numeric ids ls -n /home/username/public_html/forums/scripts/site_css total 48 drwxrwxrwx 2 99 99 4096 Feb 22 02:37 ./ drwxr-xr-x 3 503 500 4096 Feb 22 02:43 ../ -rw-r--r-- 1 99 99 1 Nov 30 2006 index.html -rw-r--r-- 1 99 99 5876 Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 99 99 5877 Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 99 99 5877 Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 99 99 5876 Feb 18 05:37 style-cc2f96c9-00011.css web2 correct username user/group permissions ls -alh /home/username/public_html/forums/scripts/site_css total 48K drwxrwxrwx 2 root root 4.0K Feb 22 02:37 ./ drwxr-xr-x 3 username username 4.0K Dec 2 14:51 ../ -rw-r--r-- 1 username username 1 Nov 30 2006 index.html -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-cc2f96c9-00011.css web2 numeric ids ls -n /home/username/public_html/forums/scripts/site_css total 48 drwxrwxrwx 2 503 500 4096 Feb 22 02:37 ./ drwxr-xr-x 3 503 500 4096 Dec 2 14:51 ../ -rw-r--r-- 1 503 500 1 Nov 30 2006 index.html -rw-r--r-- 1 503 500 5876 Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 503 500 5877 Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 503 500 5877 Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 503 500 5876 Feb 18 05:37 style-cc2f96c9-00011.css I checked db1 /nfsroot/site_css and user/group ownership was incorrect for newer files dated feb22 owned by root and not username ? on db1 originally incorrect root assigned user/group for new feb22 dated files ls -alh /nfsroot/site_css total 44K drwxrwxrwx 2 root root 4.0K Feb 22 02:37 . drwxr-xr-x 17 root root 4.0K Feb 17 12:06 .. -rw-r--r-- 1 root root 1 Nov 30 2006 index.html -rw-r--r-- 1 root root 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 root root 5.8K Feb 22 02:37 style-95001864-00002.css -rw------- 1 username nfs 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw------- 1 username nfs 5.8K Feb 18 05:37 style-cc2f96c9-00011.css Then I chmod them all on db1 and chown to set to right ownership on db1 so it looks like below on db1 once corrected the newer feb22 dated files ls -alh /nfsroot/site_css total 44K drwxrwxrwx 2 root root 4.0K Feb 22 02:37 . drwxr-xr-x 17 root root 4.0K Feb 17 12:06 .. -rw-r--r-- 1 username username 1 Nov 30 2006 index.html -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 username username 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 username username 5.8K Feb 18 05:37 style-cc2f96c9-00011.css but still web1 shows owned by nobody ? while web2 shows correct permissions ? web1 still with incorrect user/group of nobody not matching what web2 and db1 are set to ? ls -alh /home/username/public_html/forums/scripts/site_css total 48K drwxrwxrwx 2 nobody nobody 4.0K Feb 22 02:37 ./ drwxr-xr-x 3 username username 4.0K Feb 22 02:43 ../ -rw-r--r-- 1 nobody nobody 1 Nov 30 2006 index.html -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-057c3df0-00011.css -rw-r--r-- 1 nobody nobody 5.8K Feb 22 02:37 style-95001864-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-b1879ba7-00002.css -rw-r--r-- 1 nobody nobody 5.8K Feb 18 05:37 style-cc2f96c9-00011.css Just so confusing so any help is very very much appreciated! thanks

Read the article

Commit in SQL

- by PRajkumar

SQL Transaction Control Language Commands (TCL) (COMMIT) Commit Transaction As a SQL language we use transaction control language very frequently. Committing a transaction means making permanent the changes performed by the SQL statements within the transaction. A transaction is a sequence of SQL statements that Oracle Database treats as a single unit. This statement also erases all save points in the transaction and releases transaction locks. Oracle Database issues an implicit COMMIT before and after any data definition language (DDL) statement. Oracle recommends that you explicitly end every transaction in your application programs with a COMMIT or ROLLBACK statement, including the last transaction, before disconnecting from Oracle Database. If you do not explicitly commit the transaction and the program terminates abnormally, then the last uncommitted transaction is automatically rolled back. Until you commit a transaction: · You can see any changes you have made during the transaction by querying the modified tables, but other users cannot see the changes. After you commit the transaction, the changes are visible to other users' statements that execute after the commit · You can roll back (undo) any changes made during the transaction with the ROLLBACK statement Note: Most of the people think that when we type commit data or changes of what you have made has been written to data files, but this is wrong when you type commit it means that you are saying that your job has been completed and respective verification will be done by oracle engine that means it checks whether your transaction achieved consistency when it finds ok it sends a commit message to the user from log buffer but not from data buffer, so after writing data in log buffer it insists data buffer to write data in to data files, this is how it works. Before a transaction that modifies data is committed, the following has occurred: · Oracle has generated undo information. The undo information contains the old data values changed by the SQL statements of the transaction · Oracle has generated redo log entries in the redo log buffer of the System Global Area (SGA). The redo log record contains the change to the data block and the change to the rollback block. These changes may go to disk before a transaction is committed · The changes have been made to the database buffers of the SGA. These changes may go to disk before a transaction is committed Note: The data changes for a committed transaction, stored in the database buffers of the SGA, are not necessarily written immediately to the data files by the database writer (DBWn) background process. This writing takes place when it is most efficient for the database to do so. It can happen before the transaction commits or, alternatively, it can happen some times after the transaction commits. When a transaction is committed, the following occurs: 1. The internal transaction table for the associated undo table space records that the transaction has committed, and the corresponding unique system change number (SCN) of the transaction is assigned and recorded in the table 2. The log writer process (LGWR) writes redo log entries in the SGA's redo log buffers to the redo log file. It also writes the transaction's SCN to the redo log file. This atomic event constitutes the commit of the transaction 3. Oracle releases locks held on rows and tables 4. Oracle marks the transaction complete Note: The default behavior is for LGWR to write redo to the online redo log files synchronously and for transactions to wait for the redo to go to disk before returning a commit to the user. However, for lower transaction commit latency application developers can specify that redo be written asynchronously and that transaction do not need to wait for the redo to be on disk. The syntax of Commit Statement is COMMIT [WORK] [COMMENT ‘your comment’]; · WORK is optional. The WORK keyword is supported for compliance with standard SQL. The statements COMMIT and COMMIT WORK are equivalent. Examples Committing an Insert INSERT INTO table_name VALUES (val1, val2); COMMIT WORK; · COMMENT Comment is also optional. This clause is supported for backward compatibility. Oracle recommends that you used named transactions instead of commit comments. Specify a comment to be associated with the current transaction. The 'text' is a quoted literal of up to 255 bytes that Oracle Database stores in the data dictionary view DBA_2PC_PENDING along with the transaction ID if a distributed transaction becomes in doubt. This comment can help you diagnose the failure of a distributed transaction. Examples The following statement commits the current transaction and associates a comment with it: COMMIT COMMENT 'In-doubt transaction Code 36, Call (415) 555-2637'; · WRITE Clause Use this clause to specify the priority with which the redo information generated by the commit operation is written to the redo log. This clause can improve performance by reducing latency, thus eliminating the wait for an I/O to the redo log. Use this clause to improve response time in environments with stringent response time requirements where the following conditions apply: The volume of update transactions is large, requiring that the redo log be written to disk frequently. The application can tolerate the loss of an asynchronously committed transaction. The latency contributed by waiting for the redo log write to occur contributes significantly to overall response time. You can specify the WAIT | NOWAIT and IMMEDIATE | BATCH clauses in any order. Examples To commit the same insert operation and instruct the database to buffer the change to the redo log, without initiating disk I/O, use the following COMMIT statement: COMMIT WRITE BATCH; Note: If you omit this clause, then the behavior of the commit operation is controlled by the COMMIT_WRITE initialization parameter, if it has been set. The default value of the parameter is the same as the default for this clause. Therefore, if the parameter has not been set and you omit this clause, then commit records are written to disk before control is returned to the user. WAIT | NOWAIT Use these clauses to specify when control returns to the user. The WAIT parameter ensures that the commit will return only after the corresponding redo is persistent in the online redo log. Whether in BATCH or IMMEDIATE mode, when the client receives a successful return from this COMMIT statement, the transaction has been committed to durable media. A crash occurring after a successful write to the log can prevent the success message from returning to the client. In this case the client cannot tell whether or not the transaction committed. The NOWAIT parameter causes the commit to return to the client whether or not the write to the redo log has completed. This behavior can increase transaction throughput. With the WAIT parameter, if the commit message is received, then you can be sure that no data has been lost. Caution: With NOWAIT, a crash occurring after the commit message is received, but before the redo log record(s) are written, can falsely indicate to a transaction that its changes are persistent. If you omit this clause, then the transaction commits with the WAIT behavior. IMMEDIATE | BATCH Use these clauses to specify when the redo is written to the log. The IMMEDIATE parameter causes the log writer process (LGWR) to write the transaction's redo information to the log. This operation option forces a disk I/O, so it can reduce transaction throughput. The BATCH parameter causes the redo to be buffered to the redo log, along with other concurrently executing transactions. When sufficient redo information is collected, a disk write of the redo log is initiated. This behavior is called "group commit", as redo for multiple transactions is written to the log in a single I/O operation. If you omit this clause, then the transaction commits with the IMMEDIATE behavior. · FORCE Clause Use this clause to manually commit an in-doubt distributed transaction or a corrupt transaction. · In a distributed database system, the FORCE string [, integer] clause lets you manually commit an in-doubt distributed transaction. The transaction is identified by the 'string' containing its local or global transaction ID. To find the IDs of such transactions, query the data dictionary view DBA_2PC_PENDING. You can use integer to specifically assign the transaction a system change number (SCN). If you omit integer, then the transaction is committed using the current SCN. · The FORCE CORRUPT_XID 'string' clause lets you manually commit a single corrupt transaction, where string is the ID of the corrupt transaction. Query the V$CORRUPT_XID_LIST data dictionary view to find the transaction IDs of corrupt transactions. You must have DBA privileges to view the V$CORRUPT_XID_LIST and to specify this clause. · Specify FORCE CORRUPT_XID_ALL to manually commit all corrupt transactions. You must have DBA privileges to specify this clause. Examples Forcing an in doubt transaction. Example The following statement manually commits a hypothetical in-doubt distributed transaction. Query the V$CORRUPT_XID_LIST data dictionary view to find the transaction IDs of corrupt transactions. You must have DBA privileges to view the V$CORRUPT_XID_LIST and to issue this statement. COMMIT FORCE '22.57.53';

Read the article

Performance Enhancement in Full-Text Search Query

- by Calvin Sun

Ever since its first release, we are continuing consolidating and developing InnoDB Full-Text Search feature. There is one recent improvement that worth blogging about. It is an effort with MySQL Optimizer team that simplifies some common queries’ Query Plans and dramatically shorted the query time. I will describe the issue, our solution and the end result by some performance numbers to demonstrate our efforts in continuing enhancement the Full-Text Search capability. The Issue: As we had discussed in previous Blogs, InnoDB implements Full-Text index as reversed auxiliary tables. The query once parsed will be reinterpreted into several queries into related auxiliary tables and then results are merged and consolidated to come up with the final result. So at the end of the query, we’ll have all matching records on hand, sorted by their ranking or by their Doc IDs. Unfortunately, MySQL’s optimizer and query processing had been initially designed for MyISAM Full-Text index, and sometimes did not fully utilize the complete result package from InnoDB. Here are a couple examples: Case 1: Query result ordered by Rank with only top N results: mysql> SELECT FTS_DOC_ID, MATCH (title, body) AGAINST ('database') AS SCORE FROM articles ORDER BY score DESC LIMIT 1; In this query, user tries to retrieve a single record with highest ranking. It should have a quick answer once we have all the matching documents on hand, especially if there are ranked. However, before this change, MySQL would almost retrieve rankings for almost every row in the table, sort them and them come with the top rank result. This whole retrieve and sort is quite unnecessary given the InnoDB already have the answer. In a real life case, user could have millions of rows, so in the old scheme, it would retrieve millions of rows' ranking and sort them, even if our FTS already found there are two 3 matched rows. Apparently, the million ranking retrieve is done in vain. In above case, it should just ask for 3 matched rows' ranking, all other rows' ranking are 0. If it want the top ranking, then it can just get the first record from our already sorted result. Case 2: Select Count(*) on matching records: mysql> SELECT COUNT(*) FROM articles WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE); In this case, InnoDB search can find matching rows quickly and will have all matching rows. However, before our change, in the old scheme, every row in the table was requested by MySQL one by one, just to check whether its ranking is larger than 0, and later comes up a count. In fact, there is no need for MySQL to fetch all rows, instead InnoDB already had all the matching records. The only thing need is to call an InnoDB API to retrieve the count The difference can be huge. Following query output shows how big the difference can be: mysql> select count(*) from searchindex_inno where match(si_title, si_text) against ('people') +----------+ | count(*) | +----------+ | 666877 | +----------+ 1 row in set (16 min 17.37 sec) So the query took almost 16 minutes. Let’s see how long the InnoDB can come up the result. In InnoDB, you can obtain extra diagnostic printout by turning on “innodb_ft_enable_diag_print”, this will print out extra query info: Error log: keynr=2, 'people' NL search Total docs: 10954826 Total words: 0 UNION: Searching: 'people' Processing time: 2 secs: row(s) 666877: error: 10 ft_init() ft_init_ext() keynr=2, 'people' NL search Total docs: 10954826 Total words: 0 UNION: Searching: 'people' Processing time: 3 secs: row(s) 666877: error: 10 Output shows it only took InnoDB only 3 seconds to get the result, while the whole query took 16 minutes to finish. So large amount of time has been wasted on the un-needed row fetching. The Solution: The solution is obvious. MySQL can skip some of its steps, optimize its plan and obtain useful information directly from InnoDB. Some of savings from doing this include: 1) Avoid redundant sorting. Since InnoDB already sorted the result according to ranking. MySQL Query Processing layer does not need to sort to get top matching results. 2) Avoid row by row fetching to get the matching count. InnoDB provides all the matching records. All those not in the result list should all have ranking of 0, and no need to be retrieved. And InnoDB has a count of total matching records on hand. No need to recount. 3) Covered index scan. InnoDB results always contains the matching records' Document ID and their ranking. So if only the Document ID and ranking is needed, there is no need to go to user table to fetch the record itself. 4) Narrow the search result early, reduce the user table access. If the user wants to get top N matching records, we do not need to fetch all matching records from user table. We should be able to first select TOP N matching DOC IDs, and then only fetch corresponding records with these Doc IDs. Performance Results and comparison with MyISAM The result by this change is very obvious. I includes six testing result performed by Alexander Rubin just to demonstrate how fast the InnoDB query now becomes when comparing MyISAM Full-Text Search. These tests are base on the English Wikipedia data of 5.4 Million rows and approximately 16G table. The test was performed on a machine with 1 CPU Dual Core, SSD drive, 8G of RAM and InnoDB_buffer_pool is set to 8 GB. Table 1: SELECT with LIMIT CLAUSE mysql> SELECT si_title, match(si_title, si_text) against('family') as rel FROM si WHERE match(si_title, si_text) against('family') ORDER BY rel desc LIMIT 10; InnoDB MyISAM Times Faster Time for the query 1.63 sec 3 min 26.31 sec 127 You can see for this particular query (retrieve top 10 records), InnoDB Full-Text Search is now approximately 127 times faster than MyISAM. Table 2: SELECT COUNT QUERY mysql>select count(*) from si where match(si_title, si_text) against('family‘); +----------+ | count(*) | +----------+ | 293955 | +----------+ InnoDB MyISAM Times Faster Time for the query 1.35 sec 28 min 59.59 sec 1289 In this particular case, where there are 293k matching results, InnoDB took only 1.35 second to get all of them, while take MyISAM almost half an hour, that is about 1289 times faster!. Table 3: SELECT ID with ORDER BY and LIMIT CLAUSE for selected terms mysql> SELECT <ID>, match(si_title, si_text) against(<TERM>) as rel FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) ORDER BY rel desc LIMIT 10; Term InnoDB (time to execute) MyISAM(time to execute) Times Faster family 0.5 sec 5.05 sec 10.1 family film 0.95 sec 25.39 sec 26.7 Pizza restaurant orange county California 0.93 sec 32.03 sec 34.4 President united states of America 2.5 sec 36.98 sec 14.8 Table 4: SELECT title and text with ORDER BY and LIMIT CLAUSE for selected terms mysql> SELECT <ID>, si_title, si_text, ... as rel FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) ORDER BY rel desc LIMIT 10; Term InnoDB (time to execute) MyISAM(time to execute) Times Faster family 0.61 sec 41.65 sec 68.3 family film 1.15 sec 47.17 sec 41.0 Pizza restaurant orange county california 1.03 sec 48.2 sec 46.8 President united states of america 2.49 sec 44.61 sec 17.9 Table 5: SELECT ID with ORDER BY and LIMIT CLAUSE for selected terms mysql> SELECT <ID>, match(si_title, si_text) against(<TERM>) as rel FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) ORDER BY rel desc LIMIT 10; Term InnoDB (time to execute) MyISAM(time to execute) Times Faster family 0.5 sec 5.05 sec 10.1 family film 0.95 sec 25.39 sec 26.7 Pizza restaurant orange county califormia 0.93 sec 32.03 sec 34.4 President united states of america 2.5 sec 36.98 sec 14.8 Table 6: SELECT COUNT(*) mysql> SELECT count(*) FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>) LIMIT 10; Term InnoDB (time to execute) MyISAM(time to execute) Times Faster family 0.47 sec 82 sec 174.5 family film 0.83 sec 131 sec 157.8 Pizza restaurant orange county califormia 0.74 sec 106 sec 143.2 President united states of america 1.96 sec 220 sec 112.2 Again, table 3 to table 6 all showing InnoDB consistently outperform MyISAM in these queries by a large margin. It becomes obvious the InnoDB has great advantage over MyISAM in handling large data search. Summary: These results demonstrate the great performance we could achieve by making MySQL optimizer and InnoDB Full-Text Search more tightly coupled. I think there are still many cases that InnoDB’s result info have not been fully taken advantage of, which means we still have great room to improve. And we will continuously explore the area, and get more dramatic results for InnoDB full-text searches. Jimmy Yang, September 29, 2012

Search Results

Search found 1637 results on 66 pages for 'ids'.

Page 18/66 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

- by jeffreyabecker

- by Your DisplayName here!

- by Aman Garg

- by Justin

- by Jorge Castro

- by bshacklett

- by Tyson

- by Di Seghposs

- by Tum

- by user762552

- by Norman Ramsey

- by dkusleika

- by Nimmy Lebby

- by Fidel

- by bstpierre

- by Niels Basjes

- by LaC

- by mhu

- by me how

- by Shaun

- by p4guru

- by p4guru

- by PRajkumar

- by Calvin Sun

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >