Search Results

Search found 10536 results on 422 pages for 'cpu usage'.

Page 80/422 | < Previous Page | 76 77 78 79 80 81 82 83 84 85 86 87 | Next Page >

Gigabit network limited to 25MB/s by CPU. How to make it faster?

- by netvope

I have a Acer Aspire R1600-U910H with a nForce gigabit network adapter. The maximum TCP throughput of it is about 25MB/s, and apparently it is limited by the single core Intel Atom 230; when the maximum throughput is reached, the CPU usage is about 50%-60%, which corresponds to full utilization considering this is a Hyper-threading enabled CPU. The same problem occurs on both Windows XP and on Ubuntu 8.04. On Windows, I have installed the latest nForce chipset driver, disabled power saving features, and enabled checksum offload. On Linux, the default driver has checksum offload enabled. There is no Linux driver available on Nvidia's website. ethtool -k eth0 shows that checksum offload is enabled: Offload parameters for eth0: rx-checksumming: on tx-checksumming: on scatter-gather: on tcp segmentation offload: on udp fragmentation offload: off generic segmentation offload: off The following is the output of powertop when the network is idle: Wakeups-from-idle per second : 61.9 interval: 10.0s no ACPI power usage estimate available Top causes for wakeups: 90.9% (101.3) <interrupt> : eth0 4.5% ( 5.0) iftop : schedule_timeout (process_timeout) 1.8% ( 2.0) <kernel core> : clocksource_register (clocksource_watchdog) 0.9% ( 1.0) dhcdbd : schedule_timeout (process_timeout) 0.5% ( 0.6) <kernel core> : neigh_table_init_no_netlink (neigh_periodic_timer) And when the maximum throughput of about 25MB/s is reached: Wakeups-from-idle per second : 11175.5 interval: 10.0s no ACPI power usage estimate available Top causes for wakeups: 99.9% (22097.4) <interrupt> : eth0 0.0% ( 5.0) iftop : schedule_timeout (process_timeout) 0.0% ( 2.0) <kernel core> : clocksource_register (clocksource_watchdog) 0.0% ( 1.0) dhcdbd : schedule_timeout (process_timeout) 0.0% ( 0.6) <kernel core> : neigh_table_init_no_netlink (neigh_periodic_timer) Notice the 20000 interrupts per second. Could this be the cause for the high CPU usage and low throughput? If so, how can I improve the situation? The other computers in the network can usually transfer at 50+MB/s without problems. And a minor question: How can I find out what is the driver in use for eth0?

Read the article
Performance of std::pow - cache misses???

- by Eamon Nerbonne

I've been trying to optimize a numeric program of mine, and have run into something of a mystery. I'm looping over code that performs thousands of floating point operations, and just 1 call to pow nevertheless, that call takes 5% of the time... That's not necessarily a critical issue, but it is odd, so I'd like to understand what's happening. When I profiled for cache misses, VS.NET 2010RC's profiler reports that virtually all cache misses are occurring in std::pow... so... what's up with that? Is there a faster alternative? I tried powf, but that's only slightly faster; it's still responsible for an abnormal number of cache misses. Why would a basic function like pow cause cache-misses?

Read the article
Looking for a good book on microprocessor internals

- by David Holm

I'm looking for a good book on how modern microprocessors are designed and work as I would like to increase my understanding of what makes them tick. Something that covers pipelines, superscalar architectures, caches etc. A book that is suitable for a programmer with several years of experience and has done and understands assembly programming and machine language, so basically not "CPUs for Dummies" or anything such. What books do people who design today's processors read for instance?

Read the article
branch prediction

- by Alexander

Consider the following sequence of actual outcomes for a single static branch. T means the branch is taken. N means the branch is not taken. For this question, assume that this is the only branch in the program. T T T N T N T T T N T N T T T N T N Assume a two-level branch predictor that uses one bit of branch history—i.e., a one-bit BHR. Since there is only one branch in the program, it does not matter how the BHR is concatenated with the branch PC to index the BHT. Assume that the BHT uses one-bit counters and that, again, all entries are initialized to N. Which of the branches in this sequence would be mis-predicted? Use the table below. Now I am not asking answers to this question, rather than guides and pointers on this. What does a two level branch predictor means and how does it works? What does the BHR and BHT stands for?

Read the article
Best CPUs for speeding up compiling times of C++ w/ DistGCC

- by Jay

I'm putting together a distributed build farm with DistGCC to speed up our teams compile times and just looking for thoughts on which processors to use in the hosts. Are we going to get a noticeable decrease in time using 8 cores vs. 4-hyperthreaded cores? Big difference in time between i7 and Xeon? etc, etc. Just need advice from people who've put together kick-a build clusters. We've got a majority of the normal things to speed up builds in place (pre-compiled headers, ccache, local gigabit connections between them, tons of ram, etc) so please just give advice on the best processor to use. And money is a factor, but anythings doable if the performance increase is noticeable. Thanks. Jay

Read the article
cache memory performance

- by Krewie

Hello, i just have a general question about cache memory. How would a program perform badly on a cache based system ? , since cache memory stores adresses from main memory that is requested, aswell as adresses that ranges around the same adress as the one copied from the main memory.

Read the article
Why is the JVM stack-based and the DalvikVM register based?

- by aioobe

I'm curious, why did Sun decide to make the JVM stack-based and Google decide to make the DalvikVM register based? I suppose the JVM can't really assume that a certain number of registers are available on the target platform, since it is supposed to be platform independent. Therefor it just postpones the register-allocation etc, to the JIT compiler. (Correct me if I'm wrong.) So the Android guys thought, "hey, that's inefficient, let's go for a register based vm right away..."? But wait, there are multiple different android devices, what number of registers did the Dalvik target? Are the Dalvik opcodes hardcoded for a certain number of registers? Do all current Android devices on the market have about the same number of registers? Or, is there a register re-allocation performed during dex-loading? How does all this fit together?

Read the article
P6 Architecture - Register renaming aside, does the limited user registers result in more ops spent

- by mrjoltcola

I'm studying JIT design with regard to dynamic languages VM implementation. I haven't done much Assembly since the 8086/8088 days, just a little here or there, so be nice if I'm out of sorts. As I understand it, the x86 (IA-32) architecture still has the same basic limited register set today that it always did, but the internal register count has grown tremendously, but these internal registers are not generally available and are used with register renaming to achieve parallel pipelining of code that otherwise could not be parallelizable. I understand this optimization pretty well, but my feeling is, while these optimizations help in overall throughput and for parallel algorithms, the limited register set we are still stuck with results in more register spilling overhead such that if x86 had double, or quadruple the registers available to us, there may be significantly less push/pop opcodes in a typical instruction stream? Or are there other processor optmizations that also optimize this away that I am unaware of? Basically if I've a unit of code that has 4 registers to work with for integer work, but my unit has a dozen variables, I've got potentially a push/pop for every 2 or so instructions. Any references to studies, or better yet, personal experiences?

Read the article
Could this code damage my processor??!!

- by Osama Gamal

A friend sent me that code and alleges that it could damage the processor. Is that true? void damage_processor() { while (true) { // Assembly code that sets the five control registers bits to ones which causes a bunch of exceptions in the system and then damages the processor Asm( "mov cr0, 0xffffffff \n\t" "mov cr1, 0xffffffff \n\t" "mov cr2, 0xffffffff \n\t" "mov cr3, 0xffffffff \n\t" "mov cr4, 0xffffffff \n\t" ) } } Is that true?

Read the article
Working with ieee format numbers in ARM

- by Jake Sellers

I'm trying to write an ARM program that will convert an ieee number to a TNS format number. TNS is a format used by some super computers, and is similar to ieee but different. I'm trying to use several masks to place the three different "part" of the ieee number in separate registers so I can move them around accordingly. Here is my unpack subroutine: UnpackIEEE LDR r1, SMASK ;load the sign bit mask into r1 LDR r2, EMASK ;load the exponent mask into r2 LDR r3, GMASK ;load the significand mask into r3 AND r4, r0, r1 ;apply sign mask to IEEE and save into r4 AND r5, r0, r2 ;apply exponent mask to IEEE and save into r5 AND r6, r0, r3 ;apply significand mask to IEEE and save into r6 MOV pc, r14 ;return And here are the masks and number declarations so you can understand: IEEE DCD 0x40300000 ;2.75 decimal or 01000000001100000000000000000000 binary SMASK DCD 0x80000000 ;Sign bit mask EMASK DCD 0x7F800000 ;Exponent mask GMASK DCD 0x007FFFFF ;Significand mask When I step through with the debugger, the results I get are not what I expect after working through it on paper. EDIT: What I mean, is that after the subroutine runs, registers 4, 5, and 6 all remain 0. I can't figure out why the masks are not working. I think I do not fully understand how the number is being stored in the register or using the masks wrong. Any help appreciated. If you need more info just ask. EDIT: entry point: Very simple, just trying to get these subroutines working. ENTRY LDR r1, IEEE ;load IEEE num into r1 BL UnpackIEEE ;call unpack sub SWI SWI_Exit ;finish

Read the article
Looking for a micro programmable FPGA + machine

- by ~konman

Hi! I'm looking for a FPGA + machine. It should be entry level pricing (e.g no more than $200). EDIT: I want to make an ASM chart and program the FPGA to act like I specified in the chart

Read the article
Unary NOT/Integersize of the architecture

- by sid_com

From "Mastering Perl/Chapter 16/Bit Operators/Unary NOT,~": The unary NOT operator (sometimes called the complement operator), ~, returns the bitwise negation, or 1's complement, of the value, based on integer size of the architecture Why does the following script output two different values? #!/usr/local/bin/perl use warnings; use 5.012; use Config; my $int_size = $Config{intsize} * 8; my $value = 0b1111_1111; my $complement = ~ $value; say length sprintf "%${int_size}b", $value; say length sprintf "%${int_size}b", $complement; Output: 32 64

Read the article
nested function call faster or not ?

- by KaluSingh Gabbar

I have this silly argument with a friend and need a authoritative word on it. I have these two snippet and want to know which one is faster ? [A or B] (assuming that compiler does not optimize anything) [A] if ( foo () ); [B] int t = foo (); if ( t )

Read the article
Why don't stacks grow upwards (for security)?

- by AshleysBrain

This is related to the question 'Why do stacks typically grow downwards?', but more from a security point of view. I'm generally referring to x86. It strikes me as odd that the stack would grow downwards, when buffers are usually written to upwards in memory. For example a typical C++ string has its end at a higher memory address than the beginning. This means that if there's a buffer overflow you're overwriting further up the call stack, which I understand is a security risk, since it opens the possibility of changing return addresses and local variable contents. If the stack grew upwards in memory, wouldn't buffer overflows simply run in to dead memory? Would this improve security? If so, why hasn't it been done? What about x64, do those stacks grow upwards and if not why not?

Read the article
Is there any .Net JIT Support from chip vendors?

- by NoMoreZealots

I know that ARM actually has some support for Java and SUN obviously, but I haven't really references seen any chip vendor supporting a .Net JIT compiler. I know IBM and Intel both support C compilers, as well as TI and many of the embedded chip vendors. When you think of it, all a JIT compiler is, is the last stages of compilation and optimization which you would think would be a good match for a chip vendor's expertize. Perhaps a standardized Plug In compilation engine for the VM would make sense. Microsoft is targeting .Net to embedded Windows platforms as well, so they are fair game. Pete

Read the article
Mips Data layout calculation

- by Michael Z

I am self studying computer architecture offer at Michigan university. I do not understand why the memory layout for d . Maybe I did not understanding the here well.

Read the article
How to get number of cores in C?

- by Alan

I'm writing a program in C on windows that needs to run as many threads as available cores. But I dont know how to get the number of cores. Any ideas?

Read the article
Is recursion preferred compare to iteration in multicore era?

- by prM

Or say, do multicore CPUs process recursion faster than iteration? Or it simply depends on how one language runs on the machine? like c executes function calls with large cost, comparing to doing simple iterations. I had this question because one day I told one of my friend that recursion isn't any amazing magic that can speed up programs, and he told me that with multicore CPUs recursion can be faster than iteration. EDIT: If we consider the most recursion-loved situation (data structure, function call), is it even possible for recursion to be faster?

Read the article
How is external memory, internal memory, and cache organized?

- by goldenmean

Consider a system as follows:= A hardware board having say ARM Cortex-A8 and Neon Vector coprocessor, and Embedded Linux OS running on Cortex-A8. On this environment, if there is some application - say, a video decoder is executing - then: How is it decided that which buffers would be in external memory, which ones would be allocated in internal SRAM, etc. When one says calloc/malloc on such system/code, the pointer returned is from which memory: internal or external? Can a user make buffers to be allocated to the memories of his choice (internal/external)? In ARM architectures, there is another memory called as Tightly coupled memory (TCM). What is that and how can user enable and use it? Can I declare buffers in this memory? Do I need to see the memory map (if any) of the hardware board to understand about all these different physical memories present in a typical hardware board? How much of a role does the OS play in distinguishing these different memories? Sorry for multiple questions, but i think they all are interlinked.

Read the article
Determine target architecture of binary file in Linux (library or executable)

- by Fernando Miguélez

We have an issue related to a Java application running under a (rather old) FC3 on a Advantech POS board with a Via C3 processor. The java application has several compiled shared libs that are accessed via JNI. Via C3 processor is suppossed to be i686 compatible. Some time ago after installing Ubuntu 6.10 on a MiniItx board with the same processor I found out that the previous statement is not 100% true. The Ubuntu kernel hanged on startup due to the lack of some specific and optional instructions of the i686 set in the C3 processor. These instructions missing in C3 implementation of i686 set are used by default by GCC compiler when using i686 optimizations. The solution in this case was to go with a i386 compiled version of Ubuntu distribution. The base problem with the Java application is that the FC3 distribution was installed on the HD by cloning from an image of the HD of another PC, this time an Intel P4. Afterwards the distribution needed some hacking to have it running such as replacing some packages (such as the kernel one) with the i383 compiled version. The problem is that after working for a while the system completely hangs without a trace. I am afraid that some i686 code is left somewhere in the system and could be executed randomly at any time (for example after recovering from suspend mode or something like that). My question is: Is there any tool or way to find out at what specific architecture is an binary file (executable or library) aimed provided that "file" does not give so much information?

Read the article
Is it possible to access 32-bit registers in C ?

- by C4theWin

Is it possible to access 32-bit registers in C ? If it is, how ? And if not, then is there any way to embed Assembly code in C ? I`m using the MinGW compiler, by the way. Thanks in advance!

Read the article
How can I dual boot my iphone or ipad to run a very simple custom os?

- by Jim98

I am an experienced C/C++ programmer and have worked with assembly and many other programing language and I want to start a project as a learning process. I want to try and run a simple custom os on the iphone or ipad. What knowledge would I need to do this, and how does the iphone or ipad bootloader load the os and how could I modify it to load a custom os? Im not really sure what to ask here so I really just need to get as much information as possible so I could ask some more informed questions to get my project started Thanks

Read the article
Grep without storing search to the "/ register in Vim

- by Phro

In my .vimrc I have a mapping that makes a line of text 'title capitalized': noremap <Leader>at :s/\v<(.)(\w{2,})/\u\1\L\2/g<CR> However, whenever I run this function, it highlights every word that is at least three characters long in my entire document. Of course I could get this behaviour to stop simply by appending :nohlsearch<CR> to the end of the mapping, but this is more of an awkward hack that still avoids a bigger problem: The last search has been replaced by \v<(.)(\w{2,}). Is there any way to use the search commands in Vim without storing the last search in the "/ register; a 'silent' search of sorts? That way, after running this title-making command, I can still use my previous search to navigate the document using n, N, etc. Edit Using @brettanomyces' answer, I found that simply setting the mapping: noremap <Leader>at :call setline(line('.'),substitute(getline('.'), '\v<(.)(\w{2,})', '\u\1\L\2', 'g'))<CR> will successfully perform the substitution without storing the searched text into the / register.

Read the article
What are good/great books on Logic Design ?

- by mudge

I want a couple good books that cover the subject of logic design, making computer circuits. There seems to be a lot of expensive books on logic design but it is unclear which ones are good.

Read the article
What's the purpose of the rotate instructions (ROL, RCL on x86) ?

- by lgratian

I always wondered what's the purpose of the rotate instructions some CPUs have (ROL, RCL on x86, for example). What kind of software makes use of these instructions? I first thought they may be used for encryption/computing hash codes, but these libraries are written usually in C, which doesn't have operators that map to these instructions. Has anybody found an use for them? Why where they added to the instructions set?

Read the article

< Previous Page | 76 77 78 79 80 81 82 83 84 85 86 87 | Next Page >