dave jarvis - Page 27 - Developer IT

Random Page Cost and Planning

- by Dave Jarvis

A query (see below) that extracts climate data from weather stations within a given radius of a city using the dates for which those weather stations actually have data. The query uses the table's only index, rather effectively: CREATE UNIQUE INDEX measurement_001_stc_idx ON climate.measurement_001 USING btree (station_id, taken, category_id); Reducing the server's configuration value for random_page_cost from 2.0 to 1.1 had a massive performance improvement for the given range (nearly an order of magnitude) because it suggested to PostgreSQL that it should use the index. While the results now return in 5 seconds (down from ~85 seconds), problematic lines remain. Bumping the query's end date by a single year causes a full table scan: sc.taken_start >= '1900-01-01'::date AND sc.taken_end <= '1997-12-31'::date AND How do I persuade PostgreSQL to use the indexes regardless of years between the two dates? (A full table scan against 43 million rows is probably not the best plan.) Find the EXPLAIN ANALYSE results below the query. Thank you! Query SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id = 5182 AND earth_distance( ll_to_earth(c.latitude_decimal,c.longitude_decimal), ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <= 30 AND s.elevation BETWEEN 0 AND 3000 AND s.applicable = TRUE AND sc.station_id = s.id AND sc.category_id = 1 AND sc.taken_start >= '1900-01-01'::date AND sc.taken_end <= '1996-12-31'::date AND m.station_id = s.id AND m.taken BETWEEN sc.taken_start AND sc.taken_end AND m.category_id = sc.category_id GROUP BY extract(YEAR FROM m.taken) ORDER BY extract(YEAR FROM m.taken) 1900 to 1996: Index "Sort (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)" " Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))" " Sort Method: quicksort Memory: 32kB" " -> HashAggregate (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)" " -> Nested Loop (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)" " Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))" " -> Nested Loop (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)" " Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)" " -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)" " Index Cond: (id = 5182)" " -> Nested Loop (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)" " -> Seq Scan on station_category sc (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)" " Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))" " -> Index Scan using station_pkey1 on station s (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)" " Index Cond: (s.id = sc.station_id)" " Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))" " -> Append (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)" " -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)" " Filter: (m.category_id = 1)" " -> Bitmap Heap Scan on measurement_001 m (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)" " Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))" " -> Bitmap Index Scan on measurement_001_stc_idx (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)" " Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))" "Total runtime: 2269.264 ms" 1900 to 1997: Full Table Scan "Sort (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)" " Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))" " Sort Method: quicksort Memory: 32kB" " -> HashAggregate (cost=1370181.12..1370184.62 rows=200 width=12) (actual time=86165.654..86165.736 rows=94 loops=1)" " -> Hash Join (cost=4293.60..1366355.81 rows=765061 width=12) (actual time=534.786..85920.007 rows=139721 loops=1)" " Hash Cond: (m.station_id = sc.station_id)" " Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end))" " -> Append (cost=0.00..867005.80 rows=43670150 width=18) (actual time=0.009..79202.329 rows=43670079 loops=1)" " -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.001..0.001 rows=0 loops=1)" " Filter: (category_id = 1)" " -> Seq Scan on measurement_001 m (cost=0.00..866980.80 rows=43670144 width=18) (actual time=0.008..73312.008 rows=43670079 loops=1)" " Filter: (category_id = 1)" " -> Hash (cost=4277.93..4277.93 rows=1253 width=18) (actual time=534.704..534.704 rows=25 loops=1)" " -> Nested Loop (cost=847.87..4277.93 rows=1253 width=18) (actual time=415.837..534.682 rows=25 loops=1)" " Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)" " -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.012..0.014 rows=1 loops=1)" " Index Cond: (id = 5182)" " -> Hash Join (cost=847.87..1352.07 rows=3760 width=34) (actual time=6.427..35.107 rows=3552 loops=1)" " Hash Cond: (s.id = sc.station_id)" " -> Seq Scan on station s (cost=0.00..367.25 rows=7948 width=20) (actual time=0.004..23.529 rows=7949 loops=1)" " Filter: (applicable AND (elevation >= 0) AND (elevation <= 3000))" " -> Hash (cost=800.87..800.87 rows=3760 width=14) (actual time=6.416..6.416 rows=3552 loops=1)" " -> Bitmap Heap Scan on station_category sc (cost=430.29..800.87 rows=3760 width=14) (actual time=2.316..5.353 rows=3552 loops=1)" " Recheck Cond: (category_id = 1)" " Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1997-12-31'::date))" " -> Bitmap Index Scan on station_category_station_category_idx (cost=0.00..429.35 rows=6376 width=0) (actual time=2.268..2.268 rows=6339 loops=1)" " Index Cond: (category_id = 1)" "Total runtime: 86165.936 ms"

Read the article

Averaging initial values for rolling series

- by Dave Jarvis

Question Given a maximum sliding window size of 40 (i.e., the set of numbers in the list cannot exceed 40), what is the calculation to ensure a smooth averaging transition as the set size grows from 1 to 40? Problem Description Creating a trend line for a set of data has skewed initial values. The complete set of values is unknown at runtime: they are provided one at a time. It seems like a reverse-weighted average is required so that the initial values are averaged differently. In the image below the leftmost data for the trend line are incorrectly averaged. Current Solution Created a new type of ArrayList subclass that calculates the appropriate values and ensures its size never goes beyond the bounds of the sliding window: /** * A list of Double values that has a maximum capacity enforced by a sliding * window. Can calculate the average of its values. */ public class AveragingList extends ArrayList<Double> { private float slidingWindowSize = 0.0f; /** * The initial capacity is used for the sliding window size. * @param slidingWindowSize */ public AveragingList( int slidingWindowSize ) { super( slidingWindowSize ); setSlidingWindowSize( ( float )slidingWindowSize ); } public boolean add( Double d ) { boolean result = super.add( d ); // Prevent the list from exceeding the maximum sliding window size. // if( size() > getSlidingWindowSize() ) { remove( 0 ); } return result; } /** * Calculate the average. * * @return The average of the values stored in this list. */ public double average() { double result = 0.0; int size = size(); for( Double d: this ) { result += d.doubleValue(); } return (double)result / (double)size; } /** * Changes the maximum number of numbers stored in this list. * * @param slidingWindowSize New maximum number of values to remember. */ public void setSlidingWindowSize( float slidingWindowSize ) { this.slidingWindowSize = slidingWindowSize; } /** * Returns the number used to determine the maximum values this list can * store before it removes the first entry upon adding another value. * @return The maximum number of numbers stored in this list. */ public float getSlidingWindowSize() { return slidingWindowSize; } } Resulting Image Example Input The data comes into the function one value at a time. For example, data points (Data) and calculated averages (Avg) typically look as follows: Data: 17.0 Avg : 17.0 Data: 17.0 Avg : 17.0 Data: 5.0 Avg : 13.0 Data: 5.0 Avg : 11.0 Related Sites The following pages describe moving averages, but typically when all (or sufficient) data is known: http://www.cs.princeton.edu/introcs/15inout/MovingAverage.java.html http://stackoverflow.com/questions/2161815/r-zoo-series-sliding-window-calculation http://taragana.blogspot.com/ http://www.dreamincode.net/forums/index.php?showtopic=92508 http://blogs.sun.com/nickstephen/entry/dtrace_and_moving_rolling_averages

Read the article

AJAX, PHP, XML, and cascading drop-down lists

- by Dave Jarvis

What PHP libraries would you recommend to implement the following: Three dependent drop-down lists Three XML data sources AJAX-based Essentially, I'd like to create an XML database and wire up a form that allows the user to select three different dependent parameters: User clicks Region User clicks District (filtered by Region) User clicks Station (filtered by District) Even though I would like to use PHP and XML, the general problem is: One XHTML form Three dependent, cascading drop-down lists Three flat files (no relational database) for the list data The solution must be efficient, simple, reliable, and cross-browser. What technologies would you recommend to solve the problem? Thank you!

Read the article

Write binary stream to browser using PHP

- by Dave Jarvis

Background Trying to stream a PDF report written using iReport through PHP to the browser. The general problem is: how do you write binary data to the browser using PHP? Working Code The following code does the job, but (for many reasons) it is not as efficient as it should be (the code writes a file then sends the file contents the browser). // Load the MySQL database driver. // java( 'java.lang.Class' )->forName( 'com.mysql.jdbc.Driver' ); // Attempt a database connection. // $conn = java( 'java.sql.DriverManager' )->getConnection( "jdbc:mysql://localhost:3306/climate?user=$user&password=$password" ); // Extract parameters. // $params = new java('java.util.HashMap'); $params->put('DistrictCode', '101'); $params->put('StationCode', '0066'); $params->put('CategoryCode', '010'); // Use the fill manager to produce the report. // $fm = java('net.sf.jasperreports.engine.JasperFillManager'); $pm = $fm->fillReport($report, $params, $conn); header('Cache-Control: no-cache private'); header('Content-Description: File Transfer'); header('Content-Disposition: attachment, filename=climate-report.pdf'); header('Content-Type: application/pdf'); header('Content-Transfer-Encoding: binary'); header('Content-Length: ' . strlen( $result ) ); $path = realpath( "." ) . "/output.pdf"; $em = java('net.sf.jasperreports.engine.JasperExportManager'); $result = $em->exportReportToPdfFile($pm,$path); readfile( $path ); $conn->close(); Non-working Code To remove the slight redundancy (i.e., write directly to the browser), the following code looks like it should work, but it does not: $em = java('net.sf.jasperreports.engine.JasperExportManager'); $result = $em->exportReportToPdf($pm); header('Content-Length: ' . strlen( $result ) ); echo $result; Content is sent to the browser, but the file is corrupt (it begins with the PDF header) and cannot be read by any PDF reader. Question How can I take out the middle step of writing to the file and write directly to the browser so that the PDF is not corrupted? Thank you!

Read the article

Most efficient way to check if a date falls between two dates?

- by Dave Jarvis

Given: Start Month & Start Day End Month & End Day Any Year What SQL statement results in TRUE if a date lands between the Start and End days? 1st example: Start Date = 11-22 End Date = 01-17 Year = 2009 Specific Date = 2010-01-14 TRUE 2nd example: Start Date = 11-22 End Date = 11-16 Year = 2009 Specific Date = 2010-11-20 FALSE 3rd example: Start Date = 02-25 End Date = 03-19 Year = 2004 Specific Date = 2004-02-29 TRUE I was thinking of using the MySQL functions datediff and sign plus a CASE condition to determine whether the year wraps, but it seems rather expensive. Am looking for a simple, efficient calculation. Update The problem is the end date cannot simply use the year. The year must be increased if the end month/day combination happens before the start date. The start date is easy: Start Date = date( concat_ws( '-', year, Start Month, Start Day ) The end date is not so simple. Thank you!

Read the article

Time Saving Minutiae

- by Dave Jarvis

What little tricks do you use to save time when developing? For example: In JDeveloper, change the default editor to Source view from Design view for JSP files. Tools » Preferences » File Types » Default Editors » JSP Tag File By default, JDeveloper opens JSP files in Design view, which can take 10 to 15 seconds to begin editing a file. Opening files in Source view only takes 2 seconds.

Read the article

Multiple outliers for two variable linear regression

- by Dave Jarvis

Problem Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious: Question Given: T - Set of all temperatures Y - Set of all years ST - Sum of temperatures. SY - Sum of years. N - Number of elements T(n) - Temperature of the nth element in the temperature set How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.) Related Sites I am slowly working through these sites to get a better understanding of the problem: Multiple Outliers Detection Procedures in Linear Regression M-estimator Measure of Surprise for Outlier Detection Ordinary Least Squares Linear Regression Many thanks!

Read the article

Create date efficiently

- by Dave Jarvis

On Pavel's page is the following function: CREATE OR REPLACE FUNCTION makedate(year int, dayofyear int) RETURNS date AS $$ SELECT (date '0001-01-01' + ($1 - 1) * interval '1 year' + ($2 - 1) * interval '1 day'):: date $$ LANGUAGE sql; I have the following code: makedate(y.year,1) What is the fastest way in PostgreSQL to create a date for January 1st of a given year? Pavel's function would lead me to believe it is: date '0001-01-01' + y.year * interval '1 year' + interval '1 day'; My thought would be more like: to_date( y.year||'-1-1', 'YYYY-MM-DD'); Am looking for the fastest way using PostgreSQL 8.4. (The query that uses the date function can select between 100,000 and 1 million records, so it needs speed.) Thank you!

Read the article

Corporate Wiki Organization - Technical Documentation

- by Dave Jarvis

Corporations have documents describing various aspects of their technical systems, including: Custom Applications Custom Development Frameworks Third Party Applications Accounting Bug Tracking Network Management How To Guides User Manuals Software Tools Web Browsers Development IDEs Graphics GIMP xv Text Editing File Transfer ncFTP WinSCP Hardware Servers Web Database Exchange File Network Devices Printers What other items are missing from the list, and how would you organize it? (For example, would Software Tools make more sense under Third Party Applications?) Try to think about where you, a software developer, would expect to find the information by browsing (not searching). A few constraints: The structure should not go beyond three levels deep. Avoid the word "and" in favour of two different categories. Keep the structure general: it should appy as broadly as possible. Target audience is primarily technical.

Read the article

How does this code break the Law of Demeter?

- by Dave Jarvis

The following code breaks the Law of Demeter: public class Student extends Person { private Grades grades; public Student() { } /** Must never return null; throw an appropriately named exception, instead. */ private synchronized Grades getGrades() throws GradesException { if( this.grades == null ) { this.grades = createGrades(); } return this.grades; } /** Create a new instance of grades for this student. */ protected Grades createGrades() throws GradesException { // Reads the grades from the database, if needed. // return new Grades(); } /** Answers if this student was graded by a teacher with the given name. */ public boolean isTeacher( int year, String name ) throws GradesException, TeacherException { // The method only knows about Teacher instances. // return getTeacher( year ).nameEquals( name ); } private Grades getGradesForYear( int year ) throws GradesException { // The method only knows about Grades instances. // return getGrades().getForYear( year ); } private Teacher getTeacher( int year ) throws GradesException, TeacherException { // This method knows about Grades and Teacher instances. A mistake? // return getGradesForYear( year ).getTeacher(); } } public class Teacher extends Person { public Teacher() { } /** * This method will take into consideration first name, * last name, middle initial, case sensitivity, and * eventually it could answer true to wild cards and * regular expressions. */ public boolean nameEquals( String name ) { return getName().equalsIgnoreCase( name ); } /** Never returns null. */ private synchronized String getName() { if( this.name == null ) { this.name == ""; } return this.name; } } Questions How is the LoD broken? Where is the code breaking the LoD? How should the code be written to uphold the LoD?

Read the article

Revert scan state when prompting for variable values

- by Dave Jarvis

How do you detect the current SCAN state and revert it after changing it? An exerpt from the script in question: SET SCAN OFF SET ECHO ON SET SQLBLANKLINES ON SET SCAN ON UPDATE TABLE_NAME SET CREATED_BY = &&created_by; SET SCAN OFF The problem is that if the script doesn't have the first line (SET SCAN OFF), then the code to prompt the user should not turn the SCAN state off. In pseudocode, we'd like to do the following: SET SCAN OFF SET ECHO ON SET SQLBLANKLINES ON PUSH SCAN STATE SET SCAN ON UPDATE TABLE_NAME SET CREATED_BY = &&created_by; POP SCAN STATE The psudeocode PUSH SCAN STATE remembers the state so that if the code is altered by removing the first line, the rest of the script still works as expected.

Read the article

Read line comments in INI file

- by Dave Jarvis

The parse_ini_file function removes comments in configuration files. What would you do to keep the comments that are associated with the next line? For example: [email] ; Verify that the email's domain has a mail exchange (MX) record. validate_domain = true Am thinking of using X(HT)ML and XSLT to transform the content into an INI file (so that the documentation and options can be single sourced). For example: <h1>email</h1> <p>Verify that the email's domain has a mail exchange (MX) record.</p> <dl> <dt>validate_domain</dt> <dd>true</dd> </dl> Any other ideas?

Read the article

Date arithmetic using integer values

- by Dave Jarvis

Problem String concatenation is slowing down a query: date(extract(YEAR FROM m.taken)||'-1-1') d1, date(extract(YEAR FROM m.taken)||'-1-31') d2 This is realized in code as part of a string, which follows (where the p_ variables are integers): date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1, date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2 This part of the query runs in 3.2 seconds with the dates, and 1.5 seconds without, leading me to believe there is ample room for improvement. Question What is a better way to create the date (presumably without concatenation)? Many thanks!

Read the article

OO implementation of RFC 2445

- by Dave Jarvis

Question What PHP library would you recommend for attaching meeting notices to email? Preference given to: Swift Mailer integration Simple Object-Oriented library Application Scheduling of appointments for medical clinics. After the user books an appointment, it would be great to send the meeting notice to their email address (upon request, of course). Thank you!

Read the article

Update a PDF to include an encrypted, hidden, unique identifier?

- by Dave Jarvis

Background The idea is this: Person provides contact information for online book purchase Book, as a PDF, is marked with a unique hash Person downloads book PDF passwords are annoying and extremely easy to circumvent. The ideal process would be something like: Generate hash based on contact information Store contact information and hash in database Acquire book lock Update an "include" file with hash text Generate book as PDF (using pdflatex) Apply hash to book Release book lock Send email with book download link Technologies The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host): C, Java, PHP LaTeX files PDF files Linux Question What programming techniques (or open source software) should I investigate to: Embed a unique hash (or other mark) to a PDF Create a collusion-attack resistant mark Develop a non-fragile (e.g., PDF -> EPS -> PDF still contains the mark) solution Research I have looked at the following possibilities: Steganography Natural Language Processing (NLP) Convert blank pages in PDF to images; mark those images; reassemble PDF LaTeX watermark package ImageMagick Steganograhy requires keeping a master copy of the images, and I'm not sure if the watermark would survive PDF -> EPS -> PDF, or other types of conversion. LaTeX creates an image cache, so any steganographic process would have to intercept that process somehow. NLP introduces grammatical errors. Inserting blank pages as images is immediately suspect; it is easy to replace suspicious blank pages. The LaTeX watermark package draws visible marks. ImageMagick draws visible marks. What other solutions are possible? Related Links http://www.tcpdf.org/ invisible watermarks in images Thank you!

Read the article

Simulate mouse movement in Ubuntu

- by Dave Jarvis

Problem Am looking to automatically move the mouse cursor and simulate mouse button clicks from the command-line using an external script. Am not looking to: Record mouse movement and playback (e.g., xnee, xmacro) Instantly move the mouse from one location to another (e.g., xdotool, Python's warp_pointer) Ideal Solution What I'd like to do is the following: Edit a simple script file (e.g., mouse-script.txt). Add a list of coordinates, movement speeds, delays, and button clicks. For example: (x, y, rate) = (500, 500, 50) sleep = 5 click = left Run the script: xsim < mouse-script.txt. Question How do you automate mouse movement so that it transitions from its current location to another spot on the screen, at a specific velocity? For example: xdotool mousemove 500 500 --rate 50 The --rate 50 doesn't exist with xdotool. I could write a script that uses xdotool to get the current mouse coordinates then move it a pixel at a time to the destination with a suitable sleep interval; what automated testing tool already does this? Thank you.

Read the article

Cast integer to real

- by Dave Jarvis

Question How do you cast an INTEGER value as a REAL value? Attempts CAST( Y.YEAR AS REAL), but that failed (the documentation indicates you cannot CAST or CONVERT values to REALs. Y.YEAR + 0.0, but that failed, too. Error Message Using udf_slope fails due to: Can't initialize function 'slope'; slope() requires a real as parameter 2 Code SELECT D.AMOUNT, Y.YEAR, slope(D.AMOUNT, Y.YEAR + 0.0) as SLOPE, intercept(D.AMOUNT, Y.YEAR + 0.0) as INTERCEPT FROM YEAR_REF Y, DAILY D Here, D.AMOUNT is a FLOAT and Y.YEAR is an INTEGER. Thank you!

Read the article

Bulk Insert of hundreds of millions of records

- by Dave Jarvis

What is the fastest way to insert 237 million records into a table that has rules (for distributing the data across 84 child tables)? First I tried inserts. No go. Then I tried inserts with BEGIN/COMMIT. Not nearly fast enough. Next, I tried COPY FROM, but then noticed the documentation states that the rules are ignored. (And it was having difficulties with the column order and date format -- it said that '1984-07-1' was not a valid integer; true, but a bit unexpected.) Some example data: station_id,taken,amount,category_id,flag 1,'1984-07-1',0,4, 1,'1984-07-2',0,4, 1,'1984-07-3',0,4, 1,'1984-07-4',0,4,T Here is the table structure (with one rule included): CREATE TABLE climate.measurement ( id bigserial NOT NULL, station_id integer NOT NULL, taken date NOT NULL, amount numeric(8,2) NOT NULL, category_id smallint NOT NULL, flag character varying(1) NOT NULL DEFAULT ' '::character varying ) WITH ( OIDS=FALSE ); ALTER TABLE climate.measurement OWNER TO postgres; CREATE OR REPLACE RULE i_measurement_01_001 AS ON INSERT TO climate.measurement WHERE date_part('month'::text, new.taken)::integer = 1 AND new.category_id = 1 DO INSTEAD INSERT INTO climate.measurement_01_001 (id, station_id, taken, amount, category_id, flag) VALUES (new.id, new.station_id, new.taken, new.amount, new.category_id, new.flag); I can generate the data into any format. Am looking for something that won't take four days. I originally had the data in MySQL (still do), but am hoping to get a performance increase by switching to PostgreSQL and am eager to use its PL/R extensions for stats. I was also thinking about using: http://pgbulkload.projects.postgresql.org/ Any help, tips, or guidance would be greatly appreciated. Thank you!

Read the article

Round date to fiscal year

- by Dave Jarvis

The following database view rounds the date back to the closest fiscal year (April 1st): CREATE OR REPLACE VIEW FISCAL_YEAR_VW AS SELECT CASE WHEN to_number(to_char( SYSDATE, 'MM' )) < 4 THEN to_date('1-APR-'||to_char(add_months(SYSDATE, -12), 'YYYY'), 'dd-MON-yyyy') ELSE to_date('1-APR-'||to_char(SYSDATE, 'YYYY'), 'dd-MON-yyyy') END AS fiscal_year FROM dual; This allows us to calculate the current fiscal year based on today's date. How can this calculation be simplified or optimized?

Read the article

Automated Oracle Schema Migration Tool

- by Dave Jarvis

What are some tools (commercial or OSS) that provide a GUI-based mechanism for creating schema upgrade scripts? To be clear, here are the tool responsibilities: Obtain connection to recent schema version (called "source"). Obtain connection to previous schema version (called "target"). Compare all schema objects between source and target. Create a script to make the target schema equivalent to the source schema ("upgrade script"). Create a rollback script to revert the source schema, used if the upgrade script fails (at any point). Create individual files for schema objects. The software must: Use ALTER TABLE instead of DROP and CREATE for renamed columns. Work with Oracle 10g or greater. Create scripts that can be batch executed (via command-line). Trivial installation process. (Bonus) Create scripts that can be executed with SQL*Plus. Here are some examples (from StackOverflow, ServerFault, and Google searches): Change Manager Oracle SQL Developer Software that does not meet the criteria, or cannot be evaluated, includes: TOAD PL/SQL Developer - Invalid SQL*Plus statements. Does not produce ALTER statements. SQL Fairy - No installer. Complex installation process. Poorly documented. DBDiff - Crippled data set evaluation, poor customer support. OrbitDB - Crippled data set evaluation. SchemaCrawler - No easily identifiable download version for Oracle databases. SQL Compare - SQL Server, not Oracle. LiquiBase - Requires changing the development process. No installer. Manually edit config files. Does not recognize its own baseUrl parameter. The only acceptable crippling of the evaluation version is by time. Crippling by restricting the number of tables and views hides possible bugs that are only visible in the software during the attempt to migrate hundreds of tables and views.

Read the article

Optimal two variable linear regression SQL statement (censoring outliers)

- by Dave Jarvis

Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 15 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) <15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here (with five outliers highlighted): Questions How do I return the y value against all rows without repeating the same query to collect and collate the data? That is, how do I "reuse" the list of t values? How would you change the query to eliminate outliers (at an 85% confidence interval)? The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? (1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Thank you!

Read the article

Optimize date query for large child tables: GiST or GIN?

- by Dave Jarvis

Problem 72 child tables, each having a year index and a station index, are defined as follows: CREATE TABLE climate.measurement_12_013 ( -- Inherited from table climate.measurement_12_013: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass), -- Inherited from table climate.measurement_12_013: station_id integer NOT NULL, -- Inherited from table climate.measurement_12_013: taken date NOT NULL, -- Inherited from table climate.measurement_12_013: amount numeric(8,2) NOT NULL, -- Inherited from table climate.measurement_12_013: category_id smallint NOT NULL, -- Inherited from table climate.measurement_12_013: flag character varying(1) NOT NULL DEFAULT ' '::character varying, CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7), CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12) ) INHERITS (climate.measurement) CREATE INDEX measurement_12_013_s_idx ON climate.measurement_12_013 USING btree (station_id); CREATE INDEX measurement_12_013_y_idx ON climate.measurement_12_013 USING btree (date_part('year'::text, taken)); (Foreign key constraints to be added later.) The following query runs abysmally slow due to a full table scan: SELECT count(1) AS measurements, avg(m.amount) AS amount FROM climate.measurement m WHERE m.station_id IN ( SELECT s.id FROM climate.station s, climate.city c WHERE -- For one city ... -- c.id = 5182 AND -- Where stations are within an elevation range ... -- s.elevation BETWEEN 0 AND 3000 AND 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 ) AND -- -- Begin extracting the data from the database. -- -- The data before 1900 is shaky; insufficient after 2009. -- extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND -- Whittled down by category ... -- m.category_id = 1 AND m.taken BETWEEN -- Start date. (extract( YEAR FROM m.taken )||'-01-01')::date AND -- End date. Calculated by checking to see if the end date wraps -- into the next year. If it does, then add 1 to the current year. -- (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date GROUP BY extract( YEAR FROM m.taken ) The sluggishness comes from this part of the query: m.taken BETWEEN /* Start date. */ (extract( YEAR FROM m.taken )||'-01-01')::date AND /* End date. Calculated by checking to see if the end date wraps into the next year. If it does, then add 1 to the current year. */ (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date The HashAggregate from the plan shows a cost of 10006220141.11, which is, I suspect, on the astronomically huge side. There is a full table scan on the measurement table (itself having neither data nor indexes) being performed. The table aggregates 237 million rows from its child tables. Question What is the proper way to index the dates to avoid full table scans? Options I have considered: GIN GiST Rewrite the WHERE clause Separate year_taken, month_taken, and day_taken columns to the tables What are your thoughts? Thank you!

Read the article

Java bytecode compiler benchmarks

- by Dave Jarvis

Q.1. What free compiler produces the fastest executable Java bytecode? Q.2. What free virtual machine executes Java bytecode the fastest (on 64-bit multi-core CPUs)? Q.3. What other (currently active) compiler projects are missing from this list: http://www.ibm.com/developerworks/java/jdk/ http://gcc.gnu.org/java/ http://openjdk.java.net/groups/compiler/ http://java.sun.com/javase/downloads/ http://download.eclipse.org/eclipse/downloads/ Q.4. What performance improvements can compilers do that JITs cannot (or do not)? Q.5. Where are some recent benchmarks, comparisons, or shoot-outs (for Q1 or Q2)? Thank you!

Read the article

Master report and subreport communication

- by Dave Jarvis

How do you determine whether the content for the detail page being displayed is filled by a specific subreport? For example, consider two subreports on the same master report, using different detail bands: Detail 1 Detail 2 The header on the master report should show certain fields on pages where the subreport in the Detail 2 band is printed. A Return Value from the subreport in the Detail 2 band only seems to be set after the report has finished, even though the Reset type for the variable (in both the master and the subreport) is set to Page. (Setting it to Report also does not help.)

Read the article

Optimal two variable linear regression SQL statement

- by Dave Jarvis

Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL Code SELECT ((sum(t.YEAR) * sum(t.AMOUNT)) - (count(1) * sum(t.YEAR * t.AMOUNT))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as SLOPE, ((sum( t.YEAR ) * sum( t.YEAR * t.AMOUNT )) - (sum( t.AMOUNT ) * sum(power(t.YEAR, 2)))) / (power(sum(t.YEAR), 2) - count(1) * sum(power(t.YEAR, 2))) as INTERCEPT FROM (SELECT D.AMOUNT, Y.YEAR FROM CITY C, STATION S, YEAR_REF Y, MONTH_REF M, DAILY D WHERE -- For a specific city ... -- C.ID = 8590 AND -- Find all the stations within a 5 unit radius ... -- SQRT( POW( C.LATITUDE - S.LATITUDE, 2 ) + POW( C.LONGITUDE - S.LONGITUDE, 2 ) ) <15 AND -- Gather all known years for that station ... -- S.STATION_DISTRICT_ID = Y.STATION_DISTRICT_ID AND -- The data before 1900 is shaky; and insufficient after 2009. -- Y.YEAR BETWEEN 1900 AND 2009 AND -- Filtered by all known months ... -- M.YEAR_REF_ID = Y.ID AND -- Whittled down by category ... -- M.CATEGORY_ID = '001' AND -- Into the valid daily climate data. -- M.ID = D.MONTH_REF_ID AND D.DAILY_FLAG_ID <> 'M' GROUP BY Y.YEAR ORDER BY Y.YEAR ) t Data The data is visualized here: Questions How do I return the y value against all rows without repeating the same query to collect and collate the data? That is, how do I "reuse" the list of t values? How would you change the query to eliminate outliers (at an 85% confidence interval)? The following results (to calculate the start and end points of the line) appear incorrect. Why are the results off by ~10 degrees (e.g., outliers skewing the data)? (1900 * 0.0276653965651912) + (-57.2338357550468) = -4.66958228 (2009 * 0.0276653965651912) + (-57.2338357550468) = -1.65405406 I would have expected the 1900 result to be around 10 (not -4.67) and the 2009 result to be around 11.50 (not -1.65). Thank you!

Search Results

Search found 2480 results on 100 pages for 'dave jarvis'.

Page 27/100 | < Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34 | Next Page >

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

- by Dave Jarvis

< Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34 | Next Page >