Search Results

Search found 19555 results on 783 pages for 'job performance'.

Page 140/783 | < Previous Page | 136 137 138 139 140 141 142 143 144 145 146 147 | Next Page >

Fastest SFTP client

- by Stan

Protocol: SFTP (port 22) I've tested CuteFtp, FileZilla, SecureCRT and several others. Looks like CuteFTP has the best throughput, usually 200%-400% than others. I've read something about SecureFtp may have slower rate from here. Can anyone explain why CuteFtp has better throughput? And, is there any other FTP client even faster than CuteFtp? Thanks a lot!

Read the article
SPARC T4-4 Beats 8-CPU IBM POWER7 on TPC-H @3000GB Benchmark

- by Brian

Oracle's SPARC T4-4 server delivered a world record TPC-H @3000GB benchmark result for systems with four processors. This result beats eight processor results from IBM (POWER7) and HP (x86). The SPARC T4-4 server also delivered better performance per core than these eight processor systems from IBM and HP. Comparisons below are based upon system to system comparisons, highlighting Oracle's complete software and hardware solution. This database world record result used Oracle's Sun Storage 2540-M2 arrays (rotating disk) connected to a SPARC T4-4 server running Oracle Solaris 11 and Oracle Database 11g Release 2 demonstrating the power of Oracle's integrated hardware and software solution. The SPARC T4-4 server based configuration achieved a TPC-H scale factor 3000 world record for four processor systems of 205,792 QphH@3000GB with price/performance of $4.10/QphH@3000GB. The SPARC T4-4 server with four SPARC T4 processors (total of 32 cores) is 7% faster than the IBM Power 780 server with eight POWER7 processors (total of 32 cores) on the TPC-H @3000GB benchmark. The SPARC T4-4 server is 36% better in price performance compared to the IBM Power 780 server on the TPC-H @3000GB Benchmark. The SPARC T4-4 server is 29% faster than the IBM Power 780 for data loading. The SPARC T4-4 server is up to 3.4 times faster than the IBM Power 780 server for the Refresh Function. The SPARC T4-4 server with four SPARC T4 processors is 27% faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark. The SPARC T4-4 server is 52% faster than the HP ProLiant DL980 G7 server for data loading. The SPARC T4-4 server is up to 3.2 times faster than the HP ProLiant DL980 G7 for the Refresh Function. The SPARC T4-4 server achieved a peak IO rate from the Oracle database of 17 GB/sec. This rate was independent of the storage used, as demonstrated by the TPC-H @3000TB benchmark which used twelve Sun Storage 2540-M2 arrays (rotating disk) and the TPC-H @1000TB benchmark which used four Sun Storage F5100 Flash Array devices (flash storage). [*] The SPARC T4-4 server showed linear scaling from TPC-H @1000GB to TPC-H @3000GB. This demonstrates that the SPARC T4-4 server can handle the increasingly larger databases required of DSS systems. [*] The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase. [*] The TPC believes that comparisons of results published with different scale factors are misleading and discourages such comparisons. Performance Landscape The table lists the leading TPC-H @3000GB results for non-clustered systems. TPC-H @3000GB, Non-Clustered Systems System Processor P/C/T – Memory Composite(QphH) $/perf($/QphH) Power(QppH) Throughput(QthH) Database Available SPARC Enterprise M9000 3.0 GHz SPARC64 VII+ 64/256/256 – 1024 GB 386,478.3 $18.19 316,835.8 471,428.6 Oracle 11g R2 09/22/11 SPARC T4-4 3.0 GHz SPARC T4 4/32/256 – 1024 GB 205,792.0 $4.10 190,325.1 222,515.9 Oracle 11g R2 05/31/12 SPARC Enterprise M9000 2.88 GHz SPARC64 VII 32/128/256 – 512 GB 198,907.5 $15.27 182,350.7 216,967.7 Oracle 11g R2 12/09/10 IBM Power 780 4.1 GHz POWER7 8/32/128 – 1024 GB 192,001.1 $6.37 210,368.4 175,237.4 Sybase 15.4 11/30/11 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64/128 – 512 GB 162,601.7 $2.68 185,297.7 142,685.6 SQL Server 2008 10/13/10 P/C/T = Processors, Cores, Threads QphH = the Composite Metric (bigger is better) $/QphH = the Price/Performance metric in USD (smaller is better) QppH = the Power Numerical Quantity QthH = the Throughput Numerical Quantity The following table lists data load times and refresh function times during the power run. TPC-H @3000GB, Non-Clustered Systems Database Load & Database Refresh System Processor Data Loading(h:m:s) T4Advan RF1(sec) T4Advan RF2(sec) T4Advan SPARC T4-4 3.0 GHz SPARC T4 04:08:29 1.0x 67.1 1.0x 39.5 1.0x IBM Power 780 4.1 GHz POWER7 05:51:50 1.5x 147.3 2.2x 133.2 3.4x HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 08:35:17 2.1x 173.0 2.6x 126.3 3.2x Data Loading = database load time RF1 = power test first refresh transaction RF2 = power test second refresh transaction T4 Advan = the ratio of time to T4 time Complete benchmark results found at the TPC benchmark website http://www.tpc.org. Configuration Summary and Results Hardware Configuration: SPARC T4-4 server 4 x SPARC T4 3.0 GHz processors (total of 32 cores, 128 threads) 1024 GB memory 8 x internal SAS (8 x 300 GB) disk drives External Storage: 12 x Sun Storage 2540-M2 array storage, each with 12 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache Software Configuration: Oracle Solaris 11 11/11 Oracle Database 11g Release 2 Enterprise Edition Audited Results: Database Size: 3000 GB (Scale Factor 3000) TPC-H Composite: 205,792.0 QphH@3000GB Price/performance: $4.10/QphH@3000GB Available: 05/31/2012 Total 3 year Cost: $843,656 TPC-H Power: 190,325.1 TPC-H Throughput: 222,515.9 Database Load Time: 4:08:29 Benchmark Description The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC. TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system. The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor. Key Points and Best Practices Twelve Sun Storage 2540-M2 arrays were used for the benchmark. Each Sun Storage 2540-M2 array contains 12 15K RPM drives and is connected to a single dual port 8Gb FC HBA using 2 ports. Each Sun Storage 2540-M2 array showed 1.5 GB/sec for sequential read operations and showed linear scaling, achieving 18 GB/sec with twelve Sun Storage 2540-M2 arrays. These were stand alone IO tests. The peak IO rate measured from the Oracle database was 17 GB/sec. Oracle Solaris 11 11/11 required very little system tuning. Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric in which to compare systems. The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle Database parallel processes. Six Sun Storage 2540-M2 arrays were mirrored to another six Sun Storage 2540-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays. The TPC-H Refresh Function (RF) simulates periodical refresh portion of Data Warehouse by adding new sales and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are a key for this function and the SPARC T4-4 server outperformed both the IBM POWER7 server and HP ProLiant DL980 G7 server. (See the RF columns above.) See Also Transaction Processing Performance Council (TPC) Home Page Ideas International Benchmark Page SPARC T4-4 Server oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Sun Storage 2540-M2 Array oracle.com OTN Disclosure Statement TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads.

Read the article
Computer specs for a large database

- by SpeksETC

What sort of computer specs (CPU, RAM, disk speed) should I use for running queries on a database of 200+ million records? The queries are for a research project, so there is only one "user" and only one query will be running at a time. I tried it on my own laptop with SQL Server with an i3 processor, 2GB RAM, 5400 RPM disk and a simple query didn't finish even after 8+ hours. I have an option to connect a SSD via eSata and upgrade to 4GB RAM, but not sure if this will be enough... Thanks! Edit: The database is about 25 GB and the indexes are not setup properly. When I tried to add an index, I let it run for about 8 hours and it still hadn't finished so I gave up. Should I have more patience :)? In general, the queries will run once in a while and its ok even if it takes a couple hours to complete.... Also, the queries will produce probably about 10 million records which I need to process using Stata/Matlab and I'm concerned that my current laptop is not strong enough, but unsure of the bottleneck....

Read the article
Cleaning up temp files in Mac OS X

- by deddebme

I was a Windows person for more than 10 years. Around 4 months ago, I switched to Mac, and I have never looked back. But there is one thing that bothers me, which is my Mac partition volume is losing space slowly and gradually. I am pretty sure there are a lot of orphaned temporary files laying around in the volume. I know where to find the obsoleted temp files in my Windows partition, how about in Mac OSX?

Read the article
How to use most of memory available on MySQL

- by Zilvinas

I've got a MySQL server which has both InnoDB and MyISAM tables. InnoDB tablespace is quite small under 4 GB. MyISAM is big ~250 GB in total of which 50 GB is for indexes. Our server has 32 GB of RAM but it usually uses only ~8GB. Our key_buffer_size is only 2GB. But our key cache hit ratio is ~95%. I find it hard to believe.. Here's our key statistics: | Key_blocks_not_flushed | 1868 | | Key_blocks_unused | 109806 | | Key_blocks_used | 1714736 | | Key_read_requests | 19224818713 | | Key_reads | 60742294 | | Key_write_requests | 1607946768 | | Key_writes | 64788819 | key_cache_block_size is default at 1024. We have 52 GB's of index data and 2GB key cache is enough to get a 95% hit ratio. Is that possible? On the other side data set is 200GB and since MyISAM uses OS (Centos) caching I would expect it to use a lot more memory to cache accessed myisam data. But at this stage I see that key_buffer is completely used, our buffer pool size for innodb is 4gb and is also completely used that adds up to 6GB. Which means data is cached using just 1 GB? My question is how could I check where all the free memory could be used? How could I check if MyISAM hits OS cache for data reads instead of disk?

Read the article
Know any good network monitoring tool for OSX?

- by sa125

Hi - I'm trying to keep tabs on my home wireless network. Is there a good, GUI based tool that can show connected machines (make sure I don't have 'guests'), traffic stats, etc. Thanks.

Read the article
Just to not to be ignorant.

- by atch

Could anyone explain to me why is it that producers of processors claim that their processor can perform so many thousands (or millions) operations per second and yet typical program (Word, VS etc.) on my machine with 4GB, 3500hz starts with no less than 10 sec. Have to mention that I've just formatted disk and tick any necessary boxes to optimize my machine. So if for example outlook starts in 10 sec I wonder how many millions of operations have to be performed to run such program? Thanks

Read the article
Is Splitting IDE hdds between Primary and Secondary faster?

- by earlz

Hello, I'm doing RAID 0 on two IDE harddrives (yes, this is old hardware). Will the harddrives be faster if I attach them separately so that one is on the Primary IDE controller and the other is on the Secondary IDE controller? Or would it just be as good as having them both on the Primary IDE as master and slave?

Read the article
TIME_WAIT connections not being cleaned up after timeout period expires

- by Mark Dawson

I am stress testing one of my servers by hitting it with a constant stream of new network connections, the tcp_fin_timeout is set to 60, so if I send a constant stream of something like 100 requests per second, I would expect to see a rolling average of 6000 (60 * 100) connections in a TIME_WAIT state, this is happening, but looking in netstat (using -o) to see the timers, I see connections like: TIME_WAIT timewait (0.00/0/0) where their timeout has expired but the connection is still hanging around, I then eventually run out of connections. Anyone know why these connections don't get cleaned up? If I stop creating new connections they do eventually disappear but while I am constantly creating new connections they don't, seems like the kernel isn't getting chance to clean them up? Is there some other config options I need to set to remove the connections as soon as they have expired? The server is running Ubuntu and my web server is nginx. Also it has iptables with connection tracking, not sure if that would cause these TIME_WAIT connections to live on. Thanks Mark.

Read the article
windows is unresponsive but works well in safe mode

- by kacalapy

I was installing a few utility apps and things when my laptop running windows 7 stopped responding. I tried to reboot and still very unresponsive... the mouse moves but i am unable to click on icons or open the start menu. eventually things catch up after over a minute. when i start in safe mode with networking support i am able to operate the laptop at lightening speed and web browsing is a dream. I ran msconfig and shut off all non windows services and still get the same thing. after no results i then got and ran spybot, malwarebytes, and other such tools but got no results. they all show nothing wrong. now what?

Read the article
How to speed up a HP M9517C

- by Jen

I bought a system with 8GB RAM, 1TB HD, Quad-Core AMD Phenom 9550, Nvidia Geforce 9300GE, 64-bit Windows Vista Machine. Bought it primarily because it was cheap and came with 25.5 inch screen. Problem: It's slow - if you can believe it. My Dell laptop 1525 is faster and more stable! I tried installing and dual-booting Linux Mint and ran into video and audio troubles. I need fast and stable and I'm going for awesome. Anyone have some suggestions on making this thing smoking hot? Vista is fine, but slows over time - suspect virus/spyware/etc.. But I need to use Photoshop, Fireworks, Dreamweaver, Illustrator. I've tried the alternatives and I just don't like them. When you've got deadlines looming you want to work with what you know. Also use Skype (and I had audio problems with it in Linux), gotomeeting, gotowebinar. Don't need MS Office. Tried VMWare, Virtualbox and again - I keep getting audio/video problems. I'd love someone's input on THEIR setup and how they got there. I'm sure I need to upgrade my video card, but what should I go to?

Read the article
Max. Bandwidth Cannot be Reached

- by Poyraz Sagtekin

I have a question regarding the bandwidth usage. I don't think it's a very technical question but I really want to know the reason behind the problem. I'm having my education in one of my university's campuses. There is a 200mbps bandwidth availability for approx. 2100 students. Our main campus has 25,000 students and 2gbps bandwidth and there is no problem with the internet connection speed in main campus. However, in my campus, we are having difficulties with the internet connection speed. I went to IT department of my university and they've showed me some graphics about the bandwidth usage. The usage of the bandwidth has never reached 200mbps and it generally is around 130-140mbps. According to this information, maximum bandwidth is never made by the users but the browsing speed is not very convincing. What can be the problem? Maybe this is a silly question but I really want to know the reason. They told me that they've reached 180mbps before.

Read the article
Send mail on event log error trigger safe check frequency

- by Zeb Rawnsley

I want to use powershell to alert me when an error occurs in the event viewer on my new Win2k12 Standard Server, I was thinking I could have the script execute every 10mins but don't want to put any strain on the server just for event log checking, here is the powershell script I want to use: $SystemErrors = Get-EventLog System | Where-Object { $_.EntryType -eq "Error" } If ($SystemErrors.Length -gt 0) { Send-MailMessage -To "[email protected]" -From $env:COMPUTERNAME + @company.co.nz" -Subject $env:COMPUTERNAME + " System Errors" -SmtpServer "smtp.company.co.nz" -Priority High } What is a safe frequency I can run this script at without hurting my server? Hardware: Intel Xeon E5410 @ 2.33GHz x2 32GB RAM 3x 7200RPM S-ATA 1TB (2x RAID1) Edit: With the help of Mathias R. Jessen's answer, I ended up attaching an event to the application & system log with the following script: Param( [string]$LogName ) $ComputerName = $env:COMPUTERNAME; $To = "[email protected]" $From = $ComputerName + "@company.co.nz"; $Subject = $ComputerName + " " + $LogName + " Error"; $SmtpServer = "smtp.company.co.nz"; $AppErrorEvent = Get-EventLog $LogName -Newest 1 | Where-Object { $_.EntryType -eq "Error" }; If ($AppErrorEvent.Length -eq 1) { $AppErrorEventString = $AppErrorEvent | Format-List | Out-String; Send-MailMessage -To $To -From $From -Subject $Subject -Body $AppErrorEventString -SmtpServer $SmtpServer -Priority High; };

Read the article
What's the fastest filesystem for developer builds?

- by Dan Fabulich

I'm putting together a Linux box that will act as a continuous integration build server; we'll mostly build Java stuff, but I think this question applies to any compiled language. What filesystem and configuration settings should I use? (For example, I know I won't need atime for this!) The build server will spend a lot of time reading and writing small files, and scanning directories to see which files have been modified.

Read the article
Looking for the best ec2 setup for 3 sites totaling in 1.5 mil in traffic monthly

- by john h.

I am looking to consolidate our current aws setup of 2 Large ubuntu ec2 servers and 2 large RDS server for our 3 websites that have a total of about 1.5 million hits a month and increasing every month with the majority of traffic (1 mil) to one forum site in the group and the rest of traffic to an ecommerce site and a small wordpress site. So here is my question/thought? Would it be better for us to combine the two ec2 large servers to just one and same with the 2 RDS servers so we run all three sites off one large ec2 and one RDS. -or- Should we setup maybe 2-3 smaller ec2 servers load balenced and a single RDS. -or- Something completely different setup? One concern is that if one site crashes it takes with it the others. It happened in the past but I am pretty sure its because of the forum software and not the server setup. -john

Read the article
Zabbix - Some of the monitored items don't refreh

- by Niro

I'm experiencing a strange issue with Zabbix monitoring a MySQL server. Most of the data from the server such as MySQL queries per second and MySQL uptime , Buffers memory etc. update nicely while some data like CPU iowait time (avg1) , Host local time ,MySQL number of threads and other items which were monitored in the past has last check time of about a week ago. I can't find any logic in this, for example Mysql number of threads and Mysql queries per second are obtained in a similar way so it does not make sense one of them is monitored and one is not. Please help- how can I fix this? Update - I used zabbix_get from the zabbix server to check one of the items on the zabbix client and it works so the problem must be on the zabbix server side

Read the article
My dedicated server keeps getting very slow that it fails to load the application

- by server

I have an application running on Windows Server 2008, running IIS 7.5, SQL Server 2008, 4GB RAM from brinkster. The problem is, every couple of days I get the same 10,000 calls that the system is very slow, and its not operating properly, then after 30 minutes of that it just fails to load. I try to access the server from the remote desktop connection but I can't access it. The only way it I can get it working again is to call the support at brinkster and have them do a manual reboot of the server. After that it works well for some time, and the it re-crashes after some time. Support over there, are not helping a lot.

Read the article
PHP + MYSQL site perfomance

- by Diego

I have to manage a site which wasn't developed by me. It is in PHP using a mysql database, which is located in the web server. The site, sometimes (when the visitors increase too much) stops responding, or respond too slow. I have developed some sites in PHP but never took care of the management so really don't know where to start. The server (the hard) seems to be fine, when the web stops responding the cpu is being used at about 55% and has a lot of memory. I'm not asking someone to solve this issue. I only would really like if someone could give me a few tips about where can I find logs and how should I read and interpret them. So, that way I would be able to know if its the net traffic, the database (which queries), or what. Thanks! Update: Forgot to say: it is a Windows Server 2003. Note: I've recorded about a day with Jet Profiler. I don't really understand all the information it provides but there is one query which it marks as really slow. It makes sense because it is a select with a where clause which has three like condition. Initially I didn't include this in my question because when I run the query from MySQL Query Browser it doesn't take any long. It is under 0.01 seconds.

Read the article
Building Simple Workflows in Oozie

- by dan.mcclary

Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in a handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me flexibility of Hive's query language when I want it, but let's me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called weather_train.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions of Workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My pig script registers the Weka Jar and a chunk of jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the pig script, because pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding Jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath pieces is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of Jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: Validate our workflow.xml Copy our working directory to HDFS Submit our job to the Oozie server Run our workflow Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run." This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

Read the article
Is 30 calls / second a lot for one IIS server?

- by Lieven Cardoen

We have a RIA application that 300 clients concurrently use in an intranet environment. Together they make 30 calls / second to IIS (asp.net) (actually it's 60 but calls are loadbalanced over two IIS servers). Half of the calls is getting an asset (Caching Profile is used so most of the time cache is hit), the other half is saving data to a sql server. Retrieving an asset is done with a aspx page. Saving the data happens via WebORB, asp.net and Sql Server. So some processing is needed by WebORB (amf decoding, GZIP, ...). We also use Spring.NET, and some of the container objects have a request scope (not a lot). IIS servers -- Virtual machines, 4 CPU, 2 gb RAM. They are based on Windows 2008 x64 SP2 Enterprise Edition. Sql Server 2008 is used. Apparently CPU of both IIS serers is constantly around 60-70%. Now, my question, is the load of 60-70% acceptable and how could we possible bring that down to less % (maybe using only one IIS server)? + Is 2 gb RAM enough? Assets can be up to 20mb, but on average, they are about 30kb. (the load of 60-70% is achieved with assets around 30kb). The data that gets saved with weborb is very small (2kb) and is just one object.

Read the article
RDMA architecture - do you need adapters on both ends?

- by Bobb

I know Linux can use RDMA NICs like Solarflare... I just found Intel has something like that NetEffect cards. But Intel is talking all about clusters.. Can someone please explain. If I want low-latency networking and install RDMA NIC on my server. Is there limitation on where the cable can go? Is there a specific device expected on the other end? Is it special RDMA switch or RDMA adapter before switch or what? Why is this cluster talk? What if I want a single server with Windows (I can install HPC Windows or Windows 2008 R2)?

Read the article
AMD 24 core server memory bandwidth

- by ntherning

I need some help to determine whether the memory bandwidth I'm seeing under Linux on my server is normal or not. Here's the server spec: HP ProLiant DL165 G7 2x AMD Opteron 6164 HE 12-Core 40 GB RAM (10 x 4GB DDR1333) Debian 6.0 Using mbw on this server I get the following numbers: foo1:~# mbw -n 3 1024 Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory. Using 262144 bytes as blocks for memcpy block copy test. Getting down to business... Doing 3 runs per test. 0 Method: MEMCPY Elapsed: 0.58047 MiB: 1024.00000 Copy: 1764.082 MiB/s 1 Method: MEMCPY Elapsed: 0.58012 MiB: 1024.00000 Copy: 1765.152 MiB/s 2 Method: MEMCPY Elapsed: 0.58010 MiB: 1024.00000 Copy: 1765.201 MiB/s AVG Method: MEMCPY Elapsed: 0.58023 MiB: 1024.00000 Copy: 1764.811 MiB/s 0 Method: DUMB Elapsed: 0.36174 MiB: 1024.00000 Copy: 2830.778 MiB/s 1 Method: DUMB Elapsed: 0.35869 MiB: 1024.00000 Copy: 2854.817 MiB/s 2 Method: DUMB Elapsed: 0.35848 MiB: 1024.00000 Copy: 2856.481 MiB/s AVG Method: DUMB Elapsed: 0.35964 MiB: 1024.00000 Copy: 2847.310 MiB/s 0 Method: MCBLOCK Elapsed: 0.23546 MiB: 1024.00000 Copy: 4348.860 MiB/s 1 Method: MCBLOCK Elapsed: 0.23544 MiB: 1024.00000 Copy: 4349.230 MiB/s 2 Method: MCBLOCK Elapsed: 0.23544 MiB: 1024.00000 Copy: 4349.359 MiB/s AVG Method: MCBLOCK Elapsed: 0.23545 MiB: 1024.00000 Copy: 4349.149 MiB/s On one of my other servers (based on Intel Xeon E3-1270): foo2:~# mbw -n 3 1024 Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory. Using 262144 bytes as blocks for memcpy block copy test. Getting down to business... Doing 3 runs per test. 0 Method: MEMCPY Elapsed: 0.18960 MiB: 1024.00000 Copy: 5400.901 MiB/s 1 Method: MEMCPY Elapsed: 0.18922 MiB: 1024.00000 Copy: 5411.690 MiB/s 2 Method: MEMCPY Elapsed: 0.18944 MiB: 1024.00000 Copy: 5405.491 MiB/s AVG Method: MEMCPY Elapsed: 0.18942 MiB: 1024.00000 Copy: 5406.024 MiB/s 0 Method: DUMB Elapsed: 0.14838 MiB: 1024.00000 Copy: 6901.200 MiB/s 1 Method: DUMB Elapsed: 0.14818 MiB: 1024.00000 Copy: 6910.561 MiB/s 2 Method: DUMB Elapsed: 0.14820 MiB: 1024.00000 Copy: 6909.628 MiB/s AVG Method: DUMB Elapsed: 0.14825 MiB: 1024.00000 Copy: 6907.127 MiB/s 0 Method: MCBLOCK Elapsed: 0.04362 MiB: 1024.00000 Copy: 23477.623 MiB/s 1 Method: MCBLOCK Elapsed: 0.04262 MiB: 1024.00000 Copy: 24025.151 MiB/s 2 Method: MCBLOCK Elapsed: 0.04258 MiB: 1024.00000 Copy: 24048.849 MiB/s AVG Method: MCBLOCK Elapsed: 0.04294 MiB: 1024.00000 Copy: 23847.599 MiB/s For reference here's what I get on my Intel based laptop: laptop:~$ mbw -n 3 1024 Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory. Using 262144 bytes as blocks for memcpy block copy test. Getting down to business... Doing 3 runs per test. 0 Method: MEMCPY Elapsed: 0.40566 MiB: 1024.00000 Copy: 2524.269 MiB/s 1 Method: MEMCPY Elapsed: 0.38458 MiB: 1024.00000 Copy: 2662.638 MiB/s 2 Method: MEMCPY Elapsed: 0.38876 MiB: 1024.00000 Copy: 2634.043 MiB/s AVG Method: MEMCPY Elapsed: 0.39300 MiB: 1024.00000 Copy: 2605.600 MiB/s 0 Method: DUMB Elapsed: 0.30707 MiB: 1024.00000 Copy: 3334.745 MiB/s 1 Method: DUMB Elapsed: 0.30425 MiB: 1024.00000 Copy: 3365.653 MiB/s 2 Method: DUMB Elapsed: 0.30342 MiB: 1024.00000 Copy: 3374.849 MiB/s AVG Method: DUMB Elapsed: 0.30491 MiB: 1024.00000 Copy: 3358.328 MiB/s 0 Method: MCBLOCK Elapsed: 0.07875 MiB: 1024.00000 Copy: 13003.670 MiB/s 1 Method: MCBLOCK Elapsed: 0.08374 MiB: 1024.00000 Copy: 12228.034 MiB/s 2 Method: MCBLOCK Elapsed: 0.07635 MiB: 1024.00000 Copy: 13411.216 MiB/s AVG Method: MCBLOCK Elapsed: 0.07961 MiB: 1024.00000 Copy: 12862.006 MiB/s So according to mbw my laptop is 3 times faster than the server!!! Please help me explain this. I've also tried to mount a ram disk and use dd to benchmark it and I get similar differences so I don't think mbw is to blame. I've checked the BIOS settings and the memory seem to be running at full speed. According to the hosting company the modules are all OK. Could this have something to do with NUMA? It seems like Node Interleaving is disabled on this server. Will enabling it (thus turning off NUMA) make a difference? foo1:~# numactl --hardware available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 node 0 size: 8190 MB node 0 free: 7898 MB node 1 cpus: 6 7 8 9 10 11 node 1 size: 12288 MB node 1 free: 12073 MB node 2 cpus: 18 19 20 21 22 23 node 2 size: 12288 MB node 2 free: 12034 MB node 3 cpus: 12 13 14 15 16 17 node 3 size: 8192 MB node 3 free: 8032 MB node distances: node 0 1 2 3 0: 10 20 20 20 1: 20 10 20 20 2: 20 20 10 20 3: 20 20 20 10

Read the article
SQL server queries are really slow only on first run

- by JoelFan

Somewhat strange problem... when I start my .NET app for the first time after rebooting my machine, the SQL Server queries are really slow... when I pause the debugger, I notice that it's hanging on getting the response from the query. This only happens when connecting to a remote SQL server (2008)... if I connect to one on my local machine, it's fine. Also, if I restart the app, it works fast, even off the remote SQL server, and subsequent runs are also fine. The only problem is when I connect to a remote SQL server for the first time after rebooting my machine. What's more, I have even noticed this same exact behavior with a 3rd party app (also .NET) that also connects to a remote SQL server. Another piece of info... this has only started hapenning since I upgraded my machine from XP to Win7 (64 bit). Also, other developers on my team who upgraded to Win7 are seeing the same behavior (both with the app we're developing and the 3rd party .NET app). (copied from http://stackoverflow.com/questions/2014814/sql-server-queries-are-really-slow-only-on-first-run )

Read the article
Slow Windows Explorer on Windows 7

- by MadBoy

I have Laptop with i7 (4 cores), 8GB ram and SSD OCZ Vertex 3 MaxIOPS which in testing that I did just now does 400mb/s+ read/write. However the responsiveness of Windows Explorer is far from being perfect. Opening up Computer, Documents, going into folders is very slow (1-5seconds). I don't have any viruses or spyware and I have tried changing properties to optimize view for General Items. I tried disabling Search Indexer but it made search in Outlook 2010 crawl and didn't bring any other effect. Even double clicking on file takes some time to open things up (like clicking a Word document). I don't have any drives mapped, my computer is not joined to domain. I have multiple VPN connections that I connect to but they all have disabled default gateways. I tried using CC Cleaner or some Windows 7 Tweaks app to disable some things. I am power user using Visual Studio, Tortoise SVN and other developer/administration apps. Any non obvious ideas? Edit: So I've been trying to pinpoint where the issue comes from and it seems that straight after reboot Windows Explorer opens very fast, when I load 3-4 programs (Royal TS, Visual Studio, Outlook) it's noticeably slower and the more programs I have it gets worse. After I start closing programs it starts working better and if I leave 2 open it's fast again. I tried doing some research with DiskMon and other programs from sysinternals but couldn't find anything suspicious. Below are stats during normal usage with a lots of programs open: - Ram usage with a lot of programs open and no swap file (i disabled it for testing): 6.95GB - CPU usage: 15%, none of the cores takes more then 50% (I have VS 2010 open x 4) HD Tune Pro: OCZ-VERTEX3 MI Benchmark Test capacity: full Read transfer rate Transfer Rate Minimum : 363.9 MB/s Transfer Rate Maximum : 505.5 MB/s Transfer Rate Average : Access Time : Burst Rate : CPU Usage : HD Tune Pro: OCZ-VERTEX3 MI File Benchmark Drive C: Transfer rate test File Size: 500 MB Sequential read 484102 KB/s Sequential write 444714 KB/s Random read 7779 IOPS Random write 16888 IOPS Random read (queue depth = 32) 73007 IOPS Random write (queue depth = 32) 69790 IOPS HD Tune Pro: OCZ-VERTEX3 MI Random Access Test capacity: full Read test Transfer size operations / sec avg. access time max. access time avg. speed 512 bytes 3260 IOPS 0.306 ms 2.106 ms 1.592 MB/s 4 KB 4161 IOPS 0.240 ms 2.006 ms 16.256 MB/s 64 KB 2382 IOPS 0.419 ms 2.367 ms 148.934 MB/s 1 MB 449 IOPS 2.225 ms 4.197 ms 449.407 MB/s Random 809 IOPS 1.235 ms 6.551 ms 410.527 MB/s HD Tune Pro: OCZ-VERTEX3 MI Extra Tests Test capacity: full Random seek 3975 IOPS 0.252 ms 1.941 MB/s Random seek 4 KB 4245 IOPS 0.236 ms 16.583 MB/s Butterfly seek 4086 IOPS 0.245 ms 1.995 MB/s Random seek / size 64 KB 3812 IOPS 0.262 ms 58.606 MB/s Random seek / size 8 MB 120 IOPS 8.348 ms 485.737 MB/s Sequential outer 4524 IOPS 0.221 ms 282.721 MB/s Sequential middle 4429 IOPS 0.226 ms 276.818 MB/s Sequential inner 5504 IOPS 0.182 ms 344.000 MB/s Burst rate 4472 IOPS 0.224 ms 279.475 MB/s

Read the article
Best way to monitor a Grid of computers?

- by marc.riera

I've installed Sun Grid Engine on 10 nodes, and one virtual master host. Now I have to monitor all the resources prior to launching it into production, but I don't know which is the best way. I've tried using xml-qstat, but it seems unstable. Any tips or suggestions? Anyone got experience on this? thanks.

Read the article

< Previous Page | 136 137 138 139 140 141 142 143 144 145 146 147 | Next Page >