Search Results

Search found 30 results on 2 pages for 'tbb'.

Page 2/2 | < Previous Page | 1 2 

  • implement SIMD in C++

    - by Hristo
    I'm working on a bit of code and I'm trying to optimize it as much as possible, basically get it running under a certain time limit. The following makes the call... static affinity_partitioner ap; parallel_for(blocked_range<size_t>(0, T), LoopBody(score), ap); ... and the following is what is executed. void operator()(const blocked_range<size_t> &r) const { int temp; int i; int j; size_t k; size_t begin = r.begin(); size_t end = r.end(); for(k = begin; k != end; ++k) { // for each trainee temp = 0; for(i = 0; i < N; ++i) { // for each sample int trr = trRating[k][i]; int ei = E[i]; for(j = 0; j < ei; ++j) { // for each expert temp += delta(i, trr, exRating[j][i]); } } myscore[k] = temp; } } I'm using Intel's TBB to optimize this. But I've also been reading about SIMD and SSE2 and things along that nature. So my question is, how do I store the variables (i,j,k) in registers so that they can be accessed faster by the CPU? I think the answer has to do with implementing SSE2 or some variation of it, but I have no idea how to do that. Any ideas? Thanks, Hristo

    Read the article

  • unroll nested for loops in C++

    - by Hristo
    How would I unroll the following nested loops? for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } I tried the following, but my output isn't the same, and it should be: for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i+4 < N; i+=4) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]); array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]); array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]); } if (i < N) { for (; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } } I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this is finished running, another function prints out what is in array[] and right now, with my unrolling, the output isn't identical. Any help is appreciated. Thanks, Hristo

    Read the article

  • crash when using stl vector at instead of operator[]

    - by Jamie Cook
    I have a method as follows (from a class than implements TBB task interface - not currently multithreading though) My problem is that two ways of accessing a vector are causing quite different behaviour - one works and the other causes the entire program to bomb out quite spectacularly (this is a plugin and normally a crash will be caught by the host - but this one takes out the host program as well! As I said quite spectacular) void PtBranchAndBoundIterationOriginRunner::runOrigin(int origin, int time) const // NOTE: const method { BOOST_FOREACH(int accessMode, m_props->GetAccessModes()) { // get a const reference to appropriate vector from member variable // map<int, vector<double>> m_rowTotalsByAccessMode; const vector<double>& rowTotalsForAccessMode = m_rowTotalsByAccessMode.find(accessMode)->second; if (origin != 129) continue; // Additional debug constrain: I know that the vector only has one non-zero element at index 129 m_job->Write("size: " + ToString(rowTotalsForAccessMode.size())); try { // check for early return... i.e. nothing to do for this origin if (!rowTotalsForAccessMode[origin]) continue; // <- this works if (!rowTotalsForAccessMode.at(origin)) continue; // <- this crashes } catch (...) { m_job->Write("Caught an exception"); // but its not an exception } // do some other stuff } } I hate not putting in well defined questions but at the moment my best phrasing is : "WTF?" I'm compiling this with Intel C++ 11.0.074 [IA-32] using Microsoft (R) Visual Studio Version 9.0.21022.8 and my implementation of vector has const_reference operator[](size_type _Pos) const { // subscript nonmutable sequence #if _HAS_ITERATOR_DEBUGGING if (size() <= _Pos) { _DEBUG_ERROR("vector subscript out of range"); _SCL_SECURE_OUT_OF_RANGE; } #endif /* _HAS_ITERATOR_DEBUGGING */ _SCL_SECURE_VALIDATE_RANGE(_Pos < size()); return (*(_Myfirst + _Pos)); } (Iterator debugging is off - I'm pretty sure) and const_reference at(size_type _Pos) const { // subscript nonmutable sequence with checking if (size() <= _Pos) _Xran(); return (*(begin() + _Pos)); } So the only difference I can see is that at calls begin instead of simply using _Myfirst - but how could that possibly be causing such a huge difference in behaviour?

    Read the article

  • C++ Optimize if/else condition

    - by Heye
    I have a single line of code, that consumes 25% - 30% of the runtime of my application. It is a less-than comparator for an std::set (the set is implemented with a Red-Black-Tree). It is called about 180 Million times within 52 seconds. struct Entry { const float _cost; const long _id; // some other vars Entry(float cost, float id) : _cost(cost), _id(id) { } }; template<class T> struct lt_entry: public binary_function <T, T, bool> { bool operator()(const T &l, const T &r) const { // Most readable shape if(l._cost != r._cost) { return r._cost < l._cost; } else { return l._id < r._id; } } }; The entries should be sorted by cost and if the cost is the same by their id. I have many insertions for each extraction of the minimum. I thought about using Fibonacci-Heaps, but I have been told that they are theoretically nice, but suffer from high constants and are pretty complicated to implement. And since insert is in O(log(n)) the runtime increase is nearly constant with large n. So I think its okay to stick to the set. To improve performance I tried to express it in different shapes: return l._cost < r._cost || r._cost > l._cost || l._id < r._id; return l._cost < r._cost || (l._cost == r._cost && l._id < r._id); Even this: typedef union { float _f; int _i; } flint; //... flint diff; diff._f = (l._cost - r._cost); return (diff._i && diff._i >> 31) || l._id < r._id; But the compiler seems to be smart enough already, because I haven't been able to improve the runtime. I also thought about SSE but this problem is really not very applicable for SSE... The assembly looks somewhat like this: movss (%rbx),%xmm1 mov $0x1,%r8d movss 0x20(%rdx),%xmm0 ucomiss %xmm1,%xmm0 ja 0x410600 <_ZNSt8_Rb_tree[..]+96> ucomiss %xmm0,%xmm1 jp 0x4105fd <_ZNSt8_Rb_[..]_+93> jne 0x4105fd <_ZNSt8_Rb_[..]_+93> mov 0x28(%rdx),%rax cmp %rax,0x8(%rbx) jb 0x410600 <_ZNSt8_Rb_[..]_+96> xor %r8d,%r8d I have a very tiny bit experience with assembly language, but not really much. I thought it would be the best (only?) point to squeeze out some performance, but is it really worth the effort? Can you see any shortcuts that could save some cycles? The platform the code will run on is an ubuntu 12 with gcc 4.6 (-stl=c++0x) on a many-core intel machine. Only libraries available are boost, openmp and tbb. I am really stuck on this one, it seems so simple, but takes that much time. I have been crunching my head since days thinking how I could improve this line... Can you give me a suggestion how to improve this part, or is it already at its best?

    Read the article

  • CodePlex Daily Summary for Thursday, November 24, 2011

    CodePlex Daily Summary for Thursday, November 24, 2011Popular ReleasesASP.NET Comet Ajax Library (Reverse Ajax - Server Push): ASP.NET Reverse Ajax Samples: Chat, MVC Razor, DesktopClient, Reverse Ajax for VB.NET and C#Windows Azure SDK for PHP: Windows Azure SDK for PHP v4.0.5: INSTALLATION Windows Azure SDK for PHP requires no special installation steps. Simply download the SDK, extract it to the folder you would like to keep it in, and add the library directory to your PHP include_path. INSTALLATION VIA PEAR Maarten Balliauw provides an unofficial PEAR channel via http://www.pearplex.net. Here's how to use it: New installation: pear channel-discover pear.pearplex.net pear install pearplex/PHPAzure Or if you've already installed PHPAzure before: pear upgrade p...Anno 2070 Assistant: Beta v1.0 (STABLE): Anno 2070 Assistant Beta v1.0 Released! Features Included: Complete Building Layouts for Ecos, Tycoons & Techs Complete Production Chains for Ecos, Tycoons & Techs Completed Credits Screen Known Issues: Not all production chains and building layouts may be on the lists because they have not yet been discovered. However, data is still 99.9% complete. Currently the Supply & Demand, including Calculator screen are disabled until version 1.1.Minemapper: Minemapper v0.1.7: Including updated Minecraft Biome Extractor and mcmap to support the new Minecraft 1.0.0 release (new block types, etc).Metro Pandora: Metro Pandora SDK V1: For more information on this release please see Metro Pandora SDK Introduction. Supported platforms: Windows Phone 7 / Silverlight Windows 8 .Net 4.0, WPF, WinformsVisual Leak Detector for Visual C++ 2008/2010: v2.2.1: Enhancements: * strdup and _wcsdup functions support added. * Preliminary support for VS 11 added. Bugs Fixed: * Low performance after upgrading from VLD v2.1. * Memory leaks with static linking fixed (disabled calloc support). * Runtime error R6002 fixed because of wrong memory dump format. * version.h fixed in installer. * Some PVS studio warning fixed.NetSqlAzMan - .NET SQL Authorization Manager: 3.6.0.10: 3.6.0.10 22-Nov-2011 Update: Removed PreEmptive Platform integration (PreEmptive analytics) Removed all PreEmptive attributes Removed PreEmptive.dll assembly references from all projects Added first support to ADAM/AD LDS Thanks to PatBea. Work Item 9775: http://netsqlazman.codeplex.com/workitem/9775Developer Team Article System Management: DTASM v1.3: ?? ??? ???? 3 ????? ???? ???? ????? ??? : - ????? ?????? ????? ???? ?? ??? ???? ????? ?? ??? ? ?? ???? ?????? ???? ?? ???? ????? ?? . - ??? ?? ???? ????? ???? ????? ???? ???? ?? ????? , ?????? ????? ????? ?? ??? . - ??? ??????? ??? ??? ???? ?? ????? ????? ????? .VideoLan DotNet for WinForm, WPF & Silverlight 5: VideoLan DotNet for WinForm, WPF, SL5 - 2011.11.22: The new version contains Silverlight 5 library: Vlc.DotNet.Silverlight. A sample could be tested here The new version add and correct many features : Correction : Reinitialize some variables Deprecate : Logging API, since VLC 1.2 (08/20/2011) Add subitem in LocationMedia (for Youtube videos, ...) Update Wpf sample to use Youtube videos Many others correctionsSharePoint 2010 FBA Pack: SharePoint 2010 FBA Pack 1.2.0: Web parts are now fully customizable via html templates (Issue #323) FBA Pack is now completely localizable using resource files. Thank you David Chen for submitting the code as well as Chinese translations of the FBA Pack! The membership request web part now gives the option of having the user enter the password and removing the captcha (Issue # 447) The FBA Pack will now work in a zone that does not have FBA enabled (Another zone must have FBA enabled, and the zone must contain the me...SharePoint 2010 Education Demo Project: Release SharePoint SP1 for Education Solutions: This release includes updates to the Content Packs for SharePoint SP1. All Content Packs have been updated to install successfully under SharePoint SP1SQL Monitor - managing sql server performance: SQLMon 4.1 alpha 6: 1. improved support for schema 2. added find reference when right click on object list 3. added object rename supportBugNET Issue Tracker: BugNET 0.9.126: First stable release of version 0.9. Upgrades from 0.8 are fully supported and upgrades to future releases will also be supported. This release is now compiled against the .NET 4.0 framework and is a requirement. Because of this the web.config has significantly changed. After upgrading, you will need to configure the authentication settings for user registration and anonymous access again. Please see our installation / upgrade instructions for more details: http://wiki.bugnetproject.c...Free SharePoint 2010 Sites Templates: SharePoint Server 2010 Sites Templates: here is the list of sites templates to be downloadedVsTortoise - a TortoiseSVN add-in for Microsoft Visual Studio: VsTortoise Build 30 Beta: Note: This release does not work with custom VsTortoise toolbars. These get removed every time when you shutdown Visual Studio. (#7940) Build 30 (beta)New: Support for TortoiseSVN 1.7 added. (the download contains both setups, for TortoiseSVN 1.6 and 1.7) New: OpenModifiedDocumentDialog displays conflicted files now. New: OpenModifiedDocument allows to group items by changelist now. Fix: OpenModifiedDocumentDialog caused Visual Studio 2010 to freeze sometimes. Fix: The installer didn...nopCommerce. Open source shopping cart (ASP.NET MVC): nopcommerce 2.30: Highlight features & improvements: • Performance optimization. • Back in stock notifications. • Product special price support. • Catalog mode (based on customer role) To see the full list of fixes and changes please visit the release notes page (http://www.nopCommerce.com/releasenotes.aspx).WPF Converters: WPF Converters V1.2.0.0: support for enumerations, value types, and reference types in the expression converter's equality operators the expression converter now handles DependencyProperty.UnsetValue as argument values correctly (#4062) StyleCop conformance (more or less)Json.NET: Json.NET 4.0 Release 4: Change - JsonTextReader.Culture is now CultureInfo.InvariantCulture by default Change - KeyValurPairConverter no longer cares about the order of the key and value properties Change - Time zone conversions now use new TimeZoneInfo instead of TimeZone Fix - Fixed boolean values sometimes being capitalized when converting to XML Fix - Fixed error when deserializing ConcurrentDictionary Fix - Fixed serializing some Uris returning the incorrect value Fix - Fixed occasional error when...Media Companion: MC 3.423b Weekly: Ensure .NET 4.0 Full Framework is installed. (Available from http://www.microsoft.com/download/en/details.aspx?id=17718) Ensure the NFO ID fix is applied when transitioning from versions prior to 3.416b. (Details here) Replaced 'Rebuild' with 'Refresh' throughout entire code. Rebuild will now be known as Refresh. mc_com.exe has been fully updated TV Show Resolutions... Resolved issue #206 - having to hit save twice when updating runtime manually Shrunk cache size and lowered loading times f...ASP.net Awesome jQuery Ajax Controls Samples and Tutorials: 1.0 samples: Demos and Tutorials for ASP.net Awesome VS2008 are in .NET 3.5 VS2010 are in .NET 4.0 (demos for the ASP.net Awesome jQuery Ajax Controls)New Projects"My" Search SharePoint 2010 WebParts: Solution consists of two SharePoint 2010 web parts inheriting from core results web part. The "My Core Search Results" webpart and the "Sortable Results" web part.Alerta Mensagens: Projeto de alerta de mensagens.Analyzer GT3000: VU MIF PS1 "Utopine Amnezija" PSI projektasAutoReservation: Miniprojekt an der HSR im Fach MsTechContestMeter: Program for measuring participant scores in programming contests.CountriesWFA: oby ostatniDotNet.Framework.Common: <DotNet.Framework.Common> ASP.NET?????? Esaan Windows Phone 7 Application Compitition: Resource for Esaan Windows Phone 7 Application Excel add-in to enable users View Azure Storage Tables: An Excel 2007 add-in using Azure Storage REST APIs (no Azure libraries required) and Excel custom task pane. Uses of such Office 2007 add-in: 1. Read data from Azure storage, populate that in excel and use it by sorting (not directly available in Azure storage) and so on. 2. Write Sorted data back (in sorted order) back to Azure storage. 3. Have data used frequently (such as dictionary of legal terms, or math formulae, and so on) on Azure storage and let your Office Excel users use th...fastconv: fastconv, a fast charset encoding convertor.gaosu: this is a gaosu projectGems: Gossip-enabled monitoring system.GridIT: GridIT is a Puzzle written in Visual Basic.Habanero Faces: A set of user interface libraries for use with Habanero Core used to build Windows Forms or Visual WebGUI user interfaces for your business objects. HomeSite: HomeSiteHTML to docx Converter: This converts HTML into Word documents (docx format). The code is written in PHP and works with PHPWord.Image Curator: image curation programjLemon: jLemon is an LALR(1) parser generator. Lemon is similar to the much more famous programs "YACC", "BISON" and "LEMON". jLemon is a pure Java program and compatible with "LEMON".jngsoftware: JNG Computer ServicesKinectoid: Kinectoid is a Kinect based pong game based on Neat Game Engine.Linq Accelerator: coming soon!Linq to SSRS (SQL Server Reporting Services): Linq to SSRS allows you as developer generate objects and map them to the repots on SQL Server Reporting Services in linq to sql fashion using lambda expressions and linq notation.MicroBitmap - Image Compressor: MicroBitmap is a image compressor which is different from all other compressors This compressor is focussed at compressing the bitmap and making it so small as possible so fast as possible You can find more information in the source codeOMX_AL_test_environment: OMX application layer simulation/testing environmentShuriken Plugins: Shuriken Plugins is a project for developing plugins for Shuriken, the free launcher app on Windows. Simple Command Line Backup Tool: Simple Command Line Backup Tool automates the backup of files from the command line for use in batch files. Includes file type filters/ recursive/non recursive and number days to keep backups for. Right now this is simply a quick app I created for a client of mine that needed a simple backup utility to go with a product I delivered. .NET 2.0 C# Console Application Hopefully this will morph into a useful utility with features not common in other command line backup utilities.SomethingSpacial: Initially designed via Interact Designer Ariel (http://www.facingblend.com/) Something Spacial as a design was used to help give some look and feel for the Silverlight User Group Starter Kit and then finally used as part of the Seattle Silverlight User Group Community Site.SqlServer Packer: SqlServer???????????????????,?????????。TVDBMetaData: Construct TV Series and Episode XML metadata from TheTvDB database.Ukázkové projekty: Obsahuje ukázkové projekty uživatele TenCoKaciStromy.Uzing Inklude: UI = Uzing + Inklude Uzing : Utility classes for .Net written in C# Inklude : Utility classes written in native C++ It supports very low level communications, but in very simple way. Current Capability - TCP, UDP, Thread in C# & C++ - String in C++ - Shared memory in C# - Communication between C# and C++ via shared memory Dependency: - TBB (http://threadingbuildingblocks.org/) - Boost (http://www.boost.org/) Even though it depends on such heavy libs, end developer does...Visual Studio Project Converter: This project is inspired by the original source code provided by Emmet Gray to convert Visual Studio solution and project files from one version to another (both backwards and forwards). This project has been converted to C# and updated to support new features.WarrantyPrint: WarrantyPrintWordPress???? on Windows Azure: WordPress?????Windows Azure Platform????????。 ???????SQL Azure??????。WPFResumeVideo: Play video on any computer, and resume where you left offWyvern's Depot: Personal code repository.

    Read the article

< Previous Page | 1 2