Search Results

Search found 14692 results on 588 pages for 'android nested fragment'.

Page 448/588 | < Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455  | Next Page >

  • Native mobile app development - how do I structure my user stories?

    - by richsage
    I'm about to start on a project which will involve developing prototype native mobile apps (iOS and Android initially) as well as a web-based admin interface and an API for these apps to communicate with. We've got a list of stories already drafted up, however a lot of them are in the format: As a mobile user I want to be able to view a login screen so that I can sign into the app If this were targeted for a single platform, I wouldn't see a problem. However, since we're targeting multiple platforms, I'm not sure whether these should now be duplicated eg "As an Android user" or similar. This seems like duplication, but it's work that will need to be completed separately for each platform. This is the first mobile project we've gone native on - previously it was Phonegap and we lumped all stories in under "As a mobile user". Since essentially this was a web-based app wrapped in native code, this didn't present too much of an issue, but I'm conscious that wholly-native apps are a different ballgame!

    Read the article

  • How To Sideload Modern Apps on Windows 8

    - by Chris Hoffman
    The average Windows 8 user can only download apps that Microsoft has approved from the Windows Store. Windows 8 offers two ways to sideload unapproved apps, which are intended for developers and businesses with internal apps. These methods cannot be used by the average geek to install unapproved apps from the web. Windows 8’s new interface takes the Apple iOS approach of forbidding unapproved software, not the Android approach of allowing all users to enable sideloading. Note: This only applies to Modern apps in the new Windows 8 interface, not on the desktop. Windows desktop applications can be installed normally. However, you can’t install any desktop applications on devices running Windows RT. Why Does 64-Bit Windows Need a Separate “Program Files (x86)” Folder? Why Your Android Phone Isn’t Getting Operating System Updates and What You Can Do About It How To Delete, Move, or Rename Locked Files in Windows

    Read the article

  • Adobe jette l'éponge sur l'iPhone, «Puisqu'on ne veut pas de nous, on va voir ailleurs» déclare le r

    Mise à jour du 21.04.2009 par Katleen Adobe jette l'éponge sur l'iPhone, «Puisqu'on ne veut pas de nous, on va voir ailleurs» déclare le responsable de Flash Suite à la situation qui s'envenime entre Adobe et Apple, Mike Chambers, le responsable du produit Flash pour la firme a décidé de s'exprimre publiquement dans un long billet, publié sur son blog. Pour contrer la compagnie de Jobs, il expose ses projets avec sa rivale de Mountain View. «Heureusement, Apple n'est pas le seul acteur. Les téléphones sous Android connaissent un succès croissant et de nombreuses tablettes Android doivent sortir cette année. Nous travaillons main dans la main avec Google pour amener Flash Player et Adobe Air s...

    Read the article

  • Naming convention: field starting with "m" or "s"

    - by Noya
    Hope this question hasn't posted yet... I saw lot of code (for example some Android source code) where fields name start with a "m" while static fields start with "s" Example (taken from Android View class source): private SparseArray<Object> mKeyedTags; private static int sNextAccessibilityViewId; I was wondering what "m" and "s" stand for... maybe is "m" mutable and "s" static? Since it seems that is a largely adopted pattern do you know if there some literature about this kind of naming convention?

    Read the article

  • HTG Explains: What are Shadow Copies and How Can I Use Them to Copy or Backup Locked Files?

    - by Jason Faulkner
    When trying to create simple file copy backups in Windows, a common problem is locked files which can trip up the operation. Whether the file is currently opened by the user or locked by the OS itself, certain files have to be completely unused in order to be copied. Thankfully, there is a simple solution: Shadow Copies. Using our simple tool, you can easily access shadow copies which allows access to point-in-time copies of the currently locked files as created by Windows Restore. Image credit: Best Backup Services How To Use USB Drives With the Nexus 7 and Other Android Devices Why Does 64-Bit Windows Need a Separate “Program Files (x86)” Folder? Why Your Android Phone Isn’t Getting Operating System Updates and What You Can Do About It

    Read the article

  • Ask the Readers: How Do You Stay Productive Working from Home?

    - by Jason Fitzpatrick
    Roughly 20% of the global workforce telecommutes on a permanent or part-time basis; if you’re one of the many laptop-toting and home-office working telecommuters we want to hear all about how you stay productive outside the walls of a traditional office. Whether you have a dedicated home office or an attache that unfolds into a mobile workstation, we want to hear your tips, tricks, and productivity-focusing methods for getting things done when you’re working from home. Sound off in the comments with your tips and then check back in on Friday for the What You Said Roundup. How To Use USB Drives With the Nexus 7 and Other Android Devices Why Does 64-Bit Windows Need a Separate “Program Files (x86)” Folder? Why Your Android Phone Isn’t Getting Operating System Updates and What You Can Do About It

    Read the article

  • How can a single database work for website, mobile apps?

    - by user1696497
    We have developed a job-portal where users can view jobs and and also post jobs. We have used Php and MySQL. We hosted this on web faction. Now we want to develop the mobile app of the job portal for android, ios and windows. As the database should be synchronous and aligned dynamically with apps and website database. As the back-end code has to be changed to Java in android and c# in windows, how to manage a single synchronous database?

    Read the article

  • Google s'intéresse de près au bureau en 3D en rachetant une start-up canadienne : bientôt dans Chrom

    Google s'intéresse de près au bureau en 3D En rachetant une start-up canadienne : bientôt dans Chrome et Android ? BumpTop vient d'être rachetée par Google. Cette start-up canadienne a mis au point une interface 3D destinée à remplacer le bureau de Windows ou de Mac OS. Visiblement très intéressé par ce produit - et très satisfait de son état d'avancement - Google a décidé de faire l'acquisition de la société pour un montant resté confidentiel. Un des atouts de BumpTop est de supporter le multi-touch. Une fonctionnalité qui en fait un client idéal pour intégrer Android, l'OS de Google pour les smartphones (ou les terminaux tels que les télévisions), voire pou...

    Read the article

  • La Chine concurrence Apple avec l'iPed, une tablette 5 fois moins chère que l'iPad tournant sous And

    La Chine concurrence Apple avec l'iPed, une tablette 5 fois moins chère que l'iPad tournant sous Android Au salon Computex, qui a actuellement lieu à Taïwan, divers constructeurs présentent leurs nouveaux produits. Parmi les nouveaux appareils, plusieurs Tablet PC. Et, parmi ces tablettes... l'iPed. Ce gadget chinois est commercialisé par plusieurs revendeurs chinois. Non content d'emprunter le nom de la tablette d'Apple à une lettre près, il en reprend aussi le design et les icônes (et même l'emballage). Se voulant bon marché et abordable, l'iPed ne coûte que 105 dollars (86 euros, soit cinq fois moins cher que l'original). Pour ce prix, l'appareil est équipé d'un processeur Intel et de l'OS de Google : Android. Mai...

    Read the article

  • Adding UL to jQuery UI Tabs

    - by Dave Kiss
    It seems like whenever I try to add a UL inside of the containers when using jQuery UI - Tabs, it breaks the javascript. Is there a way I can use a UL inside of these containers that I am missing? Thanks <div id="tabs"> <div id="fragment-1"> <h4>Pre-Press Requirements</h4> ? SAMPLE of final artwork with noted sizes. ? NATIVE FILES & High Resolution PDF preferred. Files must be created or saved to the listed accepted file formats. ? FONTS used in files need to be supplied in a separate folder marked "fonts". Please ensure all families (screen & printer) fonts are supplied for the job. ? IMAGES must be saved as CMYK and no less than 300dpi. NO RGB FILES! We prefer images to be TIF or EPS formats. If you are submitting artwork for spot color printing vector artwork is preferred. Your images should be supplied in a separate folder marked "links". This will ensure proper reproduction of your artwork. ? COLORS need to be clearly specified. Pantone (PMS) colors preferred. Please specify if job is to be printed CMYK, spot color, etc. ? BLEEDS should be no less than .25" ? PDF's should be High Resolution. Please include any spot colors or CMYK format for full color printing. NO RGB FILES! All bleeds should be included with trim marks. All fonts must be embedded or outlined. No "layered" PDF files. </div> <div id="fragment-2"> <p>Our presses are all capable of sizes up to 11" x 17" using spot color or full color.</p> </div> <div id="fragment-3"> <p>USA Quickprint has complete in house bindery to finish each job to meet your needs.</p> <ul> <li>Folding</li> <li>Scoring</li> <li>Perforation</li> <li>Drilling</li> <li>Shrink Wrap</li> <li>Trimming</li> <li>Collating</li> <li>Spiral Binding</li> <li>GBC Binding</li> <li>Padding</li> <li>Stapling</li> <li>Numbering</li> <li>Lamination</li> </ul> </div>

    Read the article

  • problem with send me log

    - by Lynnooi
    Hi, I had try to implement the send me log feature into my apps but I can't get it right. Can anyone please help me with it? In the logcat, it shows the errors: 03-29 21:23:37.636: ERROR/AndroidRuntime(820): Uncaught handler: thread AsyncTask #1 exiting due to uncaught exception 03-29 21:23:37.726: ERROR/AndroidRuntime(820): java.lang.RuntimeException: An error occured while executing doInBackground() 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at android.os.AsyncTask$3.done(AsyncTask.java:200) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.util.concurrent.FutureTask$Sync.innerSetException(FutureTask.java:234) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:258) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.util.concurrent.FutureTask.run(FutureTask.java:122) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:648) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:673) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.lang.Thread.run(Thread.java:1058) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): Caused by: java.lang.NullPointerException 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at resonet.android.androidgallery.helloAndroid$CheckForceCloseTask.doInBackground(helloAndroid.java:1565) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at resonet.android.androidgallery.helloAndroid$CheckForceCloseTask.doInBackground(helloAndroid.java:1) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at android.os.AsyncTask$2.call(AsyncTask.java:185) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:256) 03-29 21:23:37.726: ERROR/AndroidRuntime(820): ... 4 more Thanks. Here is my code: public class helloAndroid extends Activity implements OnClickListener { public static final int DIALOG_SEND_LOG = 345350; protected static final int DIALOG_PROGRESS_COLLECTING_LOG = 3255; protected static final int DIALOG_FAILED_TO_COLLECT_LOGS = 3535122; private static final int DIALOG_REPORT_FORCE_CLOSE = 3535788; private LogCollector mLogCollector; public void onCreate(Bundle savedInstanceState) { requestWindowFeature(Window.FEATURE_NO_TITLE); Bundle b = this.getIntent().getExtras(); s = b.getString("specialValue").trim(); String xmlURL = ""; CheckForceCloseTask task = new CheckForceCloseTask(); task.execute(); } private void throwException() { throw new NullPointerException(); } @Override protected Dialog onCreateDialog(int id) { Dialog dialog = null; switch (id) { case DIALOG_SEND_LOG: case DIALOG_REPORT_FORCE_CLOSE: Builder builder = new AlertDialog.Builder(this); String message; if (id==DIALOG_SEND_LOG) message = "Do you want to send me your logs?"; else message = "It appears this app has been force-closed, do you want to report it to me?"; builder.setTitle("Warning") .setIcon(android.R.drawable.ic_dialog_alert) .setMessage(message) .setPositiveButton("Yes", this) .setNegativeButton("No", this); dialog = builder.create(); break; case DIALOG_PROGRESS_COLLECTING_LOG: ProgressDialog pd = new ProgressDialog(this); pd.setTitle("Progress"); pd.setMessage("Collecting logs..."); pd.setIndeterminate(true); dialog = pd; break; case DIALOG_FAILED_TO_COLLECT_LOGS: builder = new AlertDialog.Builder(this); builder.setTitle("Error") .setMessage("Failed to collect logs.") .setNegativeButton("OK", null); dialog = builder.create(); } return dialog; } class CheckForceCloseTask extends AsyncTask { @Override protected Boolean doInBackground(Void... params) { return mLogCollector.hasForceCloseHappened(); } @Override protected void onPostExecute(Boolean result) { if (result) { showDialog(DIALOG_REPORT_FORCE_CLOSE); } else Toast.makeText(getApplicationContext(), "No force close detected.", Toast.LENGTH_LONG).show(); } } public void onClick(DialogInterface dialog, int which) { switch (which) { case DialogInterface.BUTTON_POSITIVE: new AsyncTask() { @Override protected Boolean doInBackground(Void... params) { return mLogCollector.collect(); } @Override protected void onPreExecute() { showDialog(DIALOG_PROGRESS_COLLECTING_LOG); } @Override protected void onPostExecute(Boolean result) { dismissDialog(DIALOG_PROGRESS_COLLECTING_LOG); if (result) mLogCollector.sendLog("[email protected]", "Error Log", "Preface\nPreface line 2"); else showDialog(DIALOG_FAILED_TO_COLLECT_LOGS); } }.execute(); } dialog.dismiss(); } }

    Read the article

  • what does the asterisk mean after a filename if you do ls -l

    - by James.Elsey
    I've done an ls -l inside a directory, and my files are displaying like this : james@nevada:~/development/tools/android-sdk-linux_86/tools$ ll total 9512 drwxr-xr-x 3 james james 4096 2010-05-07 19:48 ./ drwxr-xr-x 6 james james 4096 2010-08-21 20:43 ../ -rwxr-xr-x 1 james james 341773 2010-05-07 19:47 adb* -rwxr-xr-x 1 james james 3636 2010-05-07 19:47 android* -rwxr-xr-x 1 james james 2382 2010-05-07 19:47 apkbuilder* -rwxr-xr-x 1 james james 3265 2010-05-07 19:47 ddms* -rwxr-xr-x 1 james james 89032 2010-05-07 19:47 dmtracedump* -rwxr-xr-x 1 james james 1940 2010-05-07 19:47 draw9patch* -rwxr-xr-x 1 james james 6886136 2010-05-07 19:47 emulator* -rwxr-xr-x 1 james james 478199 2010-05-07 19:47 etc1tool* -rwxr-xr-x 1 james james 1987 2010-05-07 19:47 hierarchyviewer* -rwxr-xr-x 1 james james 23044 2010-05-07 19:47 hprof-conv* -rwxr-xr-x 1 james james 1939 2010-05-07 19:47 layoutopt* drwxr-xr-x 4 james james 4096 2010-05-07 19:48 lib/ -rwxr-xr-x 1 james james 16550 2010-05-07 19:47 mksdcard* -rw-r--r-- 1 james james 205851 2010-05-07 19:48 NOTICE.txt -rw-r--r-- 1 james james 33 2010-05-07 19:47 source.properties -rwxr-xr-x 1 james james 1447936 2010-05-07 19:47 sqlite3* -rwxr-xr-x 1 james james 3044 2010-05-07 19:47 traceview* -rwxr-xr-x 1 james james 187965 2010-05-07 19:47 zipalign* What does that asterisk mean? I'm also unable to run a particular file, as follows : james@nevada:~/development/tools/android-sdk-linux_86/tools$ ./emulator bash: ./emulator: No such file or directory EDIT : I'm trying to get Eclipse to use emulator, but it keeps complaining the files does not exist, yet it is here?

    Read the article

  • Eclipse grinds to a halt when building workspace

    - by Chris Thompson
    Hi all, This is a bit of a vague question because, frankly, I don't even know where to begin diagnosing the issue. My eclipse (Galileo) installation grinds to a complete halt when it's building the workspace -- to the point where I can't even type. I know the Android SDK I have installed is a major culprit because I can watch the memory usage go through the roof (through the built-in heap monitor) when the Android SDK content loader starts up. Every time I save a file though, the program just stops. The message at the bottom of the screen says Building workspace (74%) and sits there for about 30 or so seconds before completing and returning the performance to normal. I have a few other plugins installed (Maven, SVN, etc) but I'm assuming the main issue is Android. Has anybody had similar issues or any luck correcting this sort of problem? If there's anymore information you think would be helpful, just let me know...I didn't want to do a core dump on this question... I'm running it on Windows 7 64-bit for what it's worth. Thanks! Chris

    Read the article

  • Unzipping archives, preserving folder hierarchy

    - by Hydrangea
    I've got a problem and am not sure what it is, but hope someone can help me think this through because this has me stumped. Backstory: I wrote a Java app (Android) that unzips some zip files downloaded from the network. Until now, this was working great. Then, this week, the archives that I'm creating on my pc (in Ubuntu 12.04) unzip on the Android phone into a flat hierarchy instead of preserving the folders. I'm creating the archives the same way (right-click on folder compress) but even though my old archives (created in 10.04) still unzip as expected, the new ones don't. On Ubuntu, the new zip files look the same to me as the old ones. When unzipped on my pc the folders in these new archives are restored the same as the old ones... it's the Android app that extracts the old ones fine and the new ones flat. What I really want to know, though, is what the difference between the archives is. Question: How could one determine why one zip archive would be extracted with folder hierarchy preserved, when an identical one (to all appearances on Ubuntu 12.04) is extracted with no hierarchy? Are there different ways in which a .zip file can "have" folders, but Ubuntu doesn't distinguish between them?

    Read the article

  • How to set up Windows 7 Professional as a NAS

    - by Enyalius
    I searched and didn't find any answers, so please forgive me if this is a repeat. Anyway, I have an older computer that I'm using as an HTPC, and I was hoping that I could use it as a NAS/multimedia server, as well. My primary uses would include accessing content on my PS3 (same LAN), accessing content from other computers on my home network and (if I can) accessing content from my Android phone over the internet. I have used SubSonic to stream music to my Android phone and other computers before, but I would really like to find a way to do this natively if possible. I know that I can buy external hard disk cases that can plug in the USB port of my router, that I can get a Drobo or other network storage solution, but I would really just rather not spend the money (especially considering that I already have a computer that I should be able to use). Hardware involved: Apple AirPort Extreme base station router (most recent revision) Home Theater Personal Computer: Core 2 Duo @ 2.4GHz, 8GB DDR2 RAM, ~3.5TB hard drive space Sony Playstaiton 3 Thin 120GB HTC Thunderbolt (I have 4G coverage) rooted and running Android 2.2.1 Various Apple laptops Various Windows 7 desktops/laptops Thanks in advance! Note- I have looked at open source NAS software but I would like to preserve the Windows Media Center functionality in Windows 7, so other NAS software is not an option for me currently. .

    Read the article

  • Upgrade or replace?

    - by Felix
    My current PC is about four years old, although I have made upgrades to it throughout its existence. The current specs are: (old) Intel Pentium D 2.80Ghz (32K L1 / 2M L2), Gigabyte 945GCMX-S2 motherboard (old) 2.5GB DDR2 (slot0: 512MB @ 533Mhz; slot1: 2GB @ 667Mhz) (new) HIS Radeon HD 4670 - I think this is limited by the motherboard not supporting PCIe 2.0 (?) (old) WD Caviar 160GB - pretty slow (new) WD Caviar Black 640GB (if any more specs are relevant, let me know and I'll add them) Now, on to my question. I've been having performance issues lately, both in video games and in intensive applications. A couple of examples: Android application development (running Eclipse and the Android emulator) is painfully slow (on Linux). I only realized this when, at my new job as an Android dev, both tools are MUCH quicker. (I'm not sure what CPU I have there) The guys at my new job got me NFS Hot Pursuit, in which I barely get like 5-10FPS, even with graphics options turned all the way down My guess is that the bottleneck in my system is my CPU, so I'm thinking of upgrading to a Quad Core i5 + new motherboard + 4GB DDR3 (or more, 'cause I know you'll all jump and say 8GB minimum). Now: Is that a good idea? Is my CPU really a bottleneck, or is the whole system too old and I should replace it? I run Windows 7 on the old, 160GB HDD (which is on IDE, by the way). Could this slow down games as well? Should I get a new drive for Windows if I want to play new games? I know nothing about power supplies. Could that be a problem / will it be a problem if I upgrade to an i5? How come DiRT2 works on full graphics settings (pretty amazing graphics by the way) and NFS Hot Pursuit pulls only 5-10FPS?

    Read the article

  • 12.04 Unity 3D Not working where as Unity 2D works fine

    - by Stephen Martin
    I updated from 11.10 to 12.04 using the distribution upgrade, after that I couldn't log into using the Unity 3D desktop after logging in I would either never get unity's launcher or I would get the launcher and once I tried to do anything the windows lost their decoration and nothing would respond. If I use Unity 2D it works fine and in fact I'm using it to type this. I managed to get some info out of dmesg that looks like it's the route of whats happening. dmesg: [ 109.160165] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 109.160180] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state [ 109.167587] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1226 at 1218, next 1227) [ 109.672273] [drm:i915_reset] *ERROR* Failed to reset chip. output of lspci | grep vga 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) output of /usr/lib/nux/unity_support_test -p IN 12.04 OpenGL vendor string: VMware, Inc. OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 0x300) OpenGL version string: 2.1 Mesa 8.0.2 Not software rendered: no Not blacklisted: yes GLX fbconfig: yes GLX texture from pixmap: yes GL npot or rect textures: yes GL vertex program: yes GL fragment program: yes GL vertex buffer object: yes GL framebuffer object: yes GL version is 1.4+: yes Unity 3D supported: no The same command in 11.10: stephenm@mcr-ubu1:~$ /usr/lib/nux/unity_support_test -p OpenGL vendor string: Tungsten Graphics, Inc OpenGL renderer string: Mesa DRI Mobile Intel® GM45 Express Chipset OpenGL version string: 2.1 Mesa 7.11 Not software rendered: yes Not blacklisted: yes GLX fbconfig: yes GLX texture from pixmap: yes GL npot or rect textures: yes GL vertex program: yes GL fragment program: yes GL vertex buffer object: yes GL framebuffer object: yes GL version is 1.4+: yes Unity 3D supported: yes stephenm@mcr-ubu1:~$ output of /var/log/Xorg.0.log [ 11.971] (II) intel(0): EDID vendor "LPL", prod id 307 [ 11.971] (II) intel(0): Printing DDC gathered Modelines: [ 11.971] (II) intel(0): Modeline "1280x800"x0.0 69.30 1280 1328 1360 1405 800 803 809 822 -hsync -vsync (49.3 kHz) [ 12.770] (II) intel(0): Allocated new frame buffer 2176x800 stride 8704, tiled [ 15.087] (II) intel(0): EDID vendor "LPL", prod id 307 [ 15.087] (II) intel(0): Printing DDC gathered Modelines: [ 15.087] (II) intel(0): Modeline "1280x800"x0.0 69.30 1280 1328 1360 1405 800 803 809 822 -hsync -vsync (49.3 kHz) [ 33.310] (II) XKB: reuse xkmfile /var/lib/xkb/server-93A39E9580D1D5B855D779F4595485C2CC66E0CF.xkm [ 34.900] (WW) intel(0): flip queue failed: Invalid argument [ 34.900] (WW) intel(0): Page flip failed: Invalid argument [ 34.900] (WW) intel(0): flip queue failed: Invalid argument [ 34.900] (WW) intel(0): Page flip failed: Invalid argument [ 34.913] (WW) intel(0): flip queue failed: Invalid argument [ 34.913] (WW) intel(0): Page flip failed: Invalid argument [ 34.913] (WW) intel(0): flip queue failed: Invalid argument [ 34.913] (WW) intel(0): Page flip failed: Invalid argument [ 34.926] (WW) intel(0): flip queue failed: Invalid argument [ 34.926] (WW) intel(0): Page flip failed: Invalid argument [ 34.926] (WW) intel(0): flip queue failed: Invalid argument [ 34.926] (WW) intel(0): Page flip failed: Invalid argument [ 35.501] (WW) intel(0): flip queue failed: Invalid argument [ 35.501] (WW) intel(0): Page flip failed: Invalid argument [ 35.501] (WW) intel(0): flip queue failed: Invalid argument [ 35.501] (WW) intel(0): Page flip failed: Invalid argument [ 41.519] [mi] Increasing EQ size to 512 to prevent dropped events. [ 42.079] (EE) intel(0): Detected a hung GPU, disabling acceleration. [ 42.079] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg. [ 42.598] (II) intel(0): EDID vendor "LPL", prod id 307 [ 42.598] (II) intel(0): Printing DDC gathered Modelines: [ 42.598] (II) intel(0): Modeline "1280x800"x0.0 69.30 1280 1328 1360 1405 800 803 809 822 -hsync -vsync (49.3 kHz) [ 51.052] (II) AIGLX: Suspending AIGLX clients for VT switch I know im using the beta version so I'm not expecting it to work but does any one know what the problem may be or even why they Unity compatibility test is describing my video card as vmware when its an intel i915 and Ubuntu is running on the metal not virtualised. Unity 3D worked fine in 11.10

    Read the article

  • Unity 3D Not working where as Unity 2D works fine [closed]

    - by Stephen Martin
    I updated from 11.10 to 12.04 using the distribution upgrade, after that I couldn't log into using the Unity 3D desktop after logging in I would either never get unity's launcher or I would get the launcher and once I tried to do anything the windows lost their decoration and nothing would respond. If I use Unity 2D it works fine and in fact I'm using it to type this. I managed to get some info out of dmesg that looks like it's the route of whats happening. dmesg: [ 109.160165] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 109.160180] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state [ 109.167587] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1226 at 1218, next 1227) [ 109.672273] [drm:i915_reset] *ERROR* Failed to reset chip. output of lspci | grep vga 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) output of /usr/lib/nux/unity_support_test -p IN 12.04 OpenGL vendor string: VMware, Inc. OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 0x300) OpenGL version string: 2.1 Mesa 8.0.2 Not software rendered: no Not blacklisted: yes GLX fbconfig: yes GLX texture from pixmap: yes GL npot or rect textures: yes GL vertex program: yes GL fragment program: yes GL vertex buffer object: yes GL framebuffer object: yes GL version is 1.4+: yes Unity 3D supported: no The same command in 11.10: stephenm@mcr-ubu1:~$ /usr/lib/nux/unity_support_test -p OpenGL vendor string: Tungsten Graphics, Inc OpenGL renderer string: Mesa DRI Mobile Intel® GM45 Express Chipset OpenGL version string: 2.1 Mesa 7.11 Not software rendered: yes Not blacklisted: yes GLX fbconfig: yes GLX texture from pixmap: yes GL npot or rect textures: yes GL vertex program: yes GL fragment program: yes GL vertex buffer object: yes GL framebuffer object: yes GL version is 1.4+: yes Unity 3D supported: yes stephenm@mcr-ubu1:~$ output of /var/log/Xorg.0.log [ 11.971] (II) intel(0): EDID vendor "LPL", prod id 307 [ 11.971] (II) intel(0): Printing DDC gathered Modelines: [ 11.971] (II) intel(0): Modeline "1280x800"x0.0 69.30 1280 1328 1360 1405 800 803 809 822 -hsync -vsync (49.3 kHz) [ 12.770] (II) intel(0): Allocated new frame buffer 2176x800 stride 8704, tiled [ 15.087] (II) intel(0): EDID vendor "LPL", prod id 307 [ 15.087] (II) intel(0): Printing DDC gathered Modelines: [ 15.087] (II) intel(0): Modeline "1280x800"x0.0 69.30 1280 1328 1360 1405 800 803 809 822 -hsync -vsync (49.3 kHz) [ 33.310] (II) XKB: reuse xkmfile /var/lib/xkb/server-93A39E9580D1D5B855D779F4595485C2CC66E0CF.xkm [ 34.900] (WW) intel(0): flip queue failed: Invalid argument [ 34.900] (WW) intel(0): Page flip failed: Invalid argument [ 34.900] (WW) intel(0): flip queue failed: Invalid argument [ 34.900] (WW) intel(0): Page flip failed: Invalid argument [ 34.913] (WW) intel(0): flip queue failed: Invalid argument [ 34.913] (WW) intel(0): Page flip failed: Invalid argument [ 34.913] (WW) intel(0): flip queue failed: Invalid argument [ 34.913] (WW) intel(0): Page flip failed: Invalid argument [ 34.926] (WW) intel(0): flip queue failed: Invalid argument [ 34.926] (WW) intel(0): Page flip failed: Invalid argument [ 34.926] (WW) intel(0): flip queue failed: Invalid argument [ 34.926] (WW) intel(0): Page flip failed: Invalid argument [ 35.501] (WW) intel(0): flip queue failed: Invalid argument [ 35.501] (WW) intel(0): Page flip failed: Invalid argument [ 35.501] (WW) intel(0): flip queue failed: Invalid argument [ 35.501] (WW) intel(0): Page flip failed: Invalid argument [ 41.519] [mi] Increasing EQ size to 512 to prevent dropped events. [ 42.079] (EE) intel(0): Detected a hung GPU, disabling acceleration. [ 42.079] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg. [ 42.598] (II) intel(0): EDID vendor "LPL", prod id 307 [ 42.598] (II) intel(0): Printing DDC gathered Modelines: [ 42.598] (II) intel(0): Modeline "1280x800"x0.0 69.30 1280 1328 1360 1405 800 803 809 822 -hsync -vsync (49.3 kHz) [ 51.052] (II) AIGLX: Suspending AIGLX clients for VT switch I know im using the beta version so I'm not expecting it to work but does any one know what the problem may be or even why they Unity compatibility test is describing my video card as vmware when its an intel i915 and Ubuntu is running on the metal not virtualised. Unity 3D worked fine in 11.10

    Read the article

  • July, the 31 Days of SQL Server DMO’s – Day 22 (sys.dm_db_index_physical_stats)

    - by Tamarick Hill
    The sys.dm_db_index_physical_stats Dynamic Management Function is used to return information about the fragmentation levels, page counts, depth, number of levels, record counts, etc. about the indexes on your database instance. One row is returned for each level in a given index, which we will discuss more later. The function takes a total of 5 input parameters which are (1) database_id, (2) object_id, (3) index_id, (4) partition_number, and (5) the mode of the scan level that you would like to run. Let’s use this function with our AdventureWorks2012 database to better illustrate the information it provides. SELECT * FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks2012'), NULL, NULL, NULL, NULL) As you can see from the result set, there is a lot of beneficial information returned from this DMF. The first couple of columns in the result set (database_id, object_id, index_id, partition_number, index_type_desc, alloc_unit_type_desc) are either self-explanatory or have been explained in our previous blog sessions so I will not go into detail about these at this time. The next column in the result set is the index_depth which represents how deep the index goes. For example, If we have a large index that contains 1 root page, 3 intermediate levels, and 1 leaf level, our index depth would be 5. The next column is the index_level which refers to what level (of the depth) a particular row is referring to. Next is probably one of the most beneficial columns in this result set, which is the avg_fragmentation_in_percent. This column shows you how fragmented a particular level of an index may be. Many people use this column within their index maintenance jobs to dynamically determine whether they should do REORG’s or full REBUILD’s of a given index. The fragment count represents the number of fragments in a leaf level while the avg_fragment_size_in_pages represents the number of pages in a fragment. The page_count column tells you how many pages are in a particular index level. From my result set above, you see the the remaining columns all have NULL values. This is because I did not specify a ‘mode’ in my query and as a result it used the ‘LIMITED’ mode by default. The LIMITED mode is meant to be lightweight so it does collect information for every column in the result set. I will re-run my query again using the ‘DETAILED’ mode and you will see we now have results for these rows. SELECT * FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks2012'), NULL, NULL, NULL, ‘DETAILED’)   From the remaining columns, you see we get even more detailed information such as how many records are in a particular index level (record_count). We have a column for ghost_record_count which represents the number of records that have been marked for deletion, but have not physically been removed by the background ghost cleanup process. We later see information on the MIN, MAX, and AVG record size in bytes. The forwarded_record_count column refers to records that have been updated and now no longer fit within the row on the page anymore and thus have to be moved. A forwarded record is left in the original location with a pointer to the new location. The last column in the result set is the compressed_page_count column which tells you how many pages in your index have been compressed. This is a very powerful DMF that returns good information about the current indexes in your system. However, based on the mode you select, it could be a very resource intensive function so be careful with how you use it. For more information on this Dynamic Management Function, please see the below Books Online link: http://msdn.microsoft.com/en-us/library/ms188917.aspx Follow me on Twitter @PrimeTimeDBA

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • glsl shader to allow color change of skydome ogre3d

    - by Tim
    I'm still very new to all this but learning a lot. I'm putting together an application using Ogre3d as the rendering engine. So far I've got it running, with a simple scene, a day/night cycle system which is working okay. I'm now moving on to looking at changing the color of the skydome material based on the time of day. What I've done so far is to create a struct to hold the ColourValues for the different aspects of the scene. struct todColors { Ogre::ColourValue sky; Ogre::ColourValue ambient; Ogre::ColourValue sun; }; I created an array to store all the colours todColors sceneColours [4]; I populated the array with the colours I want to use for the various times of the day. For instance DayTime (when the sun is high in the sky) sceneColours[2].sky = Ogre::ColourValue(135/255, 206/255, 235/255, 255); sceneColours[2].ambient = Ogre::ColourValue(135/255, 206/255, 235/255, 255); sceneColours[2].sun = Ogre::ColourValue(135/255, 206/255, 235/255, 255); I've got code to work out the time of the day using a float currentHours to store the current hour of the day 10.5 = 10:30 am. This updates constantly and updates the sun as required. I am then calculating the appropriate colours for the time of day when relevant using else if( currentHour >= 4 && currentHour < 7) { // Lerp from night to morning Ogre::ColourValue lerp = Ogre::Math::lerp<Ogre::ColourValue, float>(sceneColours[GT_TOD_NIGHT].sky , sceneColours[GT_TOD_MORNING].sky, (currentHour - 4) / (7 - 4)); } My original attempt to get this to work was to dynamically generate a material with the new colour and apply that material to the skydome. This, as you can probably guess... didn't go well. I know it's possible to use shaders where you can pass information such as colour to the shader from the code but I am unsure if there is an existing simple shader to change a colour like this or if I need to create one. What is involved in creating a shader and material definition that would allow me to change the colour of a material without the overheads of dynamically generating materials all the time? EDIT : I've created a glsl vertex and fragment shaders as follows. Vertex uniform vec4 newColor; void main() { gl_FrontColor = newColor; gl_Position = ftransform(); } Fragment void main() { gl_FragColor = gl_Color; } I can pass a colour to it using ShaderDesigner and it seems to work. I now need to investigate how to use it within Ogre as a material. EDIT : I created a material file like this : vertex_program colour_vs_test glsl { source test.vert default_params { param_named newColor float4 0.0 0.0 0.0 1 } } fragment_program colour_fs_glsl glsl { source test.frag } material Test/SkyColor { technique { pass { lighting off fragment_program_ref colour_fs_glsl { } vertex_program_ref colour_vs_test { } } } } In the code I have tried : Ogre::MaterialPtr material = Ogre::MaterialManager::getSingleton().getByName("Test/SkyColor"); Ogre::GpuProgramParametersSharedPtr params = material->getTechnique(0)->getPass(0)->getVertexProgramParameters(); params->setNamedConstant("newcolor", Ogre::Vector4(0.7, 0.5, 0.3, 1)); I've set that as the Skydome material which seems to work initially. I am doing the same with the code that is attempting to lerp between colours, but when I include it there, it all goes black. Seems like there is now a problem with my colour lerping.

    Read the article

  • OpenGL/GLSL: Render to cube map?

    - by BobDole
    I'm trying to figure out how to render my scene to a cube map. I've been stuck on this for a bit and figured I would ask you guys for some help. I'm new to OpenGL and this is the first time I'm using a FBO. I currently have a working example of using a cubemap bmp file, and the samplerCube sample type in the fragment shader is attached to GL_TEXTURE1. I'm not changing the shader code at all. I'm just changing the fact that I wont be calling the function that was loading the cubemap bmp file and trying to use the below code to render to a cubemap. You can see below that I'm also attaching the texture again to GL_TEXTURE1. This is so when I set the uniform: glUniform1i(getUniLoc(myProg, "Cubemap"), 1); it can access it in my fragment shader via uniform samplerCube Cubemap. I'm calling the below function like so: cubeMapTexture = renderToCubeMap(150, GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE); Now, I realize in the draw loop below that I'm not changing the view direction to look down the +x, -x, +y, -y, +z, -z axis. I really was just wanting to see something working first before implemented that. I figured I should at least see something on my object the way the code is now. I'm not seeing anything, just straight black. I've made my background white still the object is black. I've removed lighting, and coloring to just sample the cubemap texture and still black. I'm thinking the problem might be the format types when setting my texture which is GL_RGB8, GL_RGBA but I've also tried: GL_RGBA, GL_RGBA GL_RGB, GL_RGB I thought this would be standard since we are rendering to a texture attached to a framebuffer, but I've seen different examples that use different enum values. I've also tried binding the cube map texture in every draw call that I'm wanting to use the cube map: glBindTexture(GL_TEXTURE_CUBE_MAP, cubeMapTexture); Also, I'm not creating a depth buffer for the FBO which I saw in most examples, because I'm only wanting the color buffer for my cube map. I actually added one to see if that was the problem and still got the same results. I could of fudged that up when I tried. Any help that can point me in the right direction would be appreciated. GLuint renderToCubeMap(int size, GLenum InternalFormat, GLenum Format, GLenum Type) { // color cube map GLuint textureObject; int face; GLenum status; //glEnable(GL_TEXTURE_2D); glActiveTexture(GL_TEXTURE1); glGenTextures(1, &textureObject); glBindTexture(GL_TEXTURE_CUBE_MAP, textureObject); glTexParameterf(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR); glTexParameterf(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_LINEAR); glTexParameterf(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE); glTexParameterf(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE); glTexParameterf(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE); for (face = 0; face < 6; face++) { glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0, InternalFormat, size, size, 0, Format, Type, NULL); } // framebuffer object glGenFramebuffers(1, &fbo); glBindFramebuffer(GL_FRAMEBUFFER, fbo); glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_CUBE_MAP_POSITIVE_X, textureObject, 0); status = glCheckFramebufferStatus(GL_FRAMEBUFFER); printf("%d\"\n", status); printf("%d\n", GL_FRAMEBUFFER_COMPLETE); glViewport(0,0,size, size); for (face = 1; face < 6; face++) { drawSpheres(); glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, textureObject, 0); } //Bind 0, which means render to back buffer, as a result, fb is unbound glBindFramebuffer(GL_FRAMEBUFFER, 0); return textureObject; }

    Read the article

  • GLSL subroutine not being used

    - by amoffat
    I'm using a gaussian blur fragment shader. In it, I thought it would be concise to include 2 subroutines: one for selecting the horizontal texture coordinate offsets, and another for the vertical texture coordinate offsets. This way, I just have one gaussian blur shader to manage. Here is the code for my shader. The {{NAME}} bits are template placeholders that I substitute in at shader compile time: #version 420 subroutine vec2 sample_coord_type(int i); subroutine uniform sample_coord_type sample_coord; in vec2 texcoord; out vec3 color; uniform sampler2D tex; uniform int texture_size; const float offsets[{{NUM_SAMPLES}}] = float[]({{SAMPLE_OFFSETS}}); const float weights[{{NUM_SAMPLES}}] = float[]({{SAMPLE_WEIGHTS}}); subroutine(sample_coord_type) vec2 vertical_coord(int i) { return vec2(0.0, offsets[i] / texture_size); } subroutine(sample_coord_type) vec2 horizontal_coord(int i) { //return vec2(offsets[i] / texture_size, 0.0); return vec2(0.0, 0.0); // just for testing if this subroutine gets used } void main(void) { color = vec3(0.0); for (int i=0; i<{{NUM_SAMPLES}}; i++) { color += texture(tex, texcoord + sample_coord(i)).rgb * weights[i]; color += texture(tex, texcoord - sample_coord(i)).rgb * weights[i]; } } Here is my code for selecting the subroutine: blur_program->start(); blur_program->set_subroutine("sample_coord", "vertical_coord", GL_FRAGMENT_SHADER); blur_program->set_int("texture_size", width); blur_program->set_texture("tex", *deferred_output); blur_program->draw(); // draws a quad for the fragment shader to run on and: void ShaderProgram::set_subroutine(constr name, constr routine, GLenum target) { GLuint routine_index = glGetSubroutineIndex(id, target, routine.c_str()); GLuint uniform_index = glGetSubroutineUniformLocation(id, target, name.c_str()); glUniformSubroutinesuiv(target, 1, &routine_index); // debugging int num_subs; glGetActiveSubroutineUniformiv(id, target, uniform_index, GL_NUM_COMPATIBLE_SUBROUTINES, &num_subs); std::cout << uniform_index << " " << routine_index << " " << num_subs << "\n"; } I've checked for errors, and there are none. When I pass in vertical_coord as the routine to use, my scene is blurred vertically, as it should be. The routine_index variable is also 1 (which is weird, because vertical_coord subroutine is the first listed in the shader code...but no matter, maybe the compiler is switching things around) However, when I pass in horizontal_coord, my scene is STILL blurred vertically, even though the value of routine_index is 0, suggesting that a different subroutine is being used. Yet the horizontal_coord subroutine explicitly does not blur. What's more is, whichever subroutine comes first in the shader, is the subroutine that the shader uses permanently. Right now, vertical_coord comes first, so the shader blurs vertically always. If I put horizontal_coord first, the scene is unblurred, as expected, but then I cannot select the vertical_coord subroutine! :) Also, the value of num_subs is 2, suggesting that there are 2 subroutines compatible with my sample_coord subroutine uniform. Just to re-iterate, all of my return values are fine, and there are no glGetError() errors happening. Any ideas?

    Read the article

< Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455  | Next Page >