I'm trying to use a multidimensional array in VHDL and I'm having a lot of trouble getting it to work properly. I have an array of 17 blocks, each holding 16 vectors of a given size. What I want is 17 registers, each one an array of 16 std_logic_vector words of 32 bits (16 * 32 = 512 bits, which is my b). So on each register instantiation I'm trying to pass something to input and output that tells the compiler/synthesizer I'm passing 512 bits' worth of data. It's similar to this in C:
int var[COLS][ROWS][ELEMENTS];
memcpy(&var[3].. // I'm talking about 3rd COL here, passing in memory that is ROWS*ELEMENTS long
(My actual declaration is here:)
type partial_pipeline_registers_type is array (0 to 16, 0 to 15) of std_logic_vector(iw - 1 downto 0);
   signal h_blk_pipelined_input : partial_pipeline_registers_type;
I tried simply using h_blk_pipelined_input(0) up to h_blk_pipelined_input(16), but this doesn't work. I get the following error, which tells me I need two indices into the array:
ERROR:HDLParsers:821 - (at the register) Wrong index type for h_blk_pipelined_input.
So then I tried what's below, and I get this error:
ERROR:HDLParsers:164 - (at the register code). parse error, unexpected TO, expecting COMMA or CLOSEPAR
  instantiate_h_pipelined_reg : regn
   generic map ( N=> b, init => bzeros )
   port map ( clk => clk , rst => '0', en => '1',
      input => h_blk_pipelined_input((i - 1), 0 to 15),
      output=> h_blk_pipelined_input((i),     0 to 15));
-- Changing 0 to 15 to (0 to 15) has no effect...
I'm using XST, and from their documentation (http://www.xilinx.com/itp/xilinx6/books/data/docs/xst/xst0067_9.html), the above should have worked:
...declaration:
subtype MATRIX15 is array(4 downto 0, 2 downto 0)
   of STD_LOGIC_VECTOR (7 downto 0);
 A multi-dimensional array signal or variable can be completely used:
Just a slice of one row can be specified:
MATRIX15 (4,4 downto 1) <= TAB_B (3 downto 0);
One alternative is to create more registers that are 16 times smaller: instead of passing all of '0 to 15' at once, I would instantiate one register per element, 16 per stage. However, I think this may lead to inefficiency in synthesis and I don't feel like this is the right solution.
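For reference, the per-element alternative could be sketched roughly as below. This is only a fragment meant to sit in the same architecture as the declarations above. It assumes regn's input and output ports are std_logic_vector(N - 1 downto 0), which isn't shown here, and iwzeros is a hypothetical iw-wide zero constant that would have to be declared.

```vhdl
-- Sketch only, not tested with XST.
-- Assumption: regn ports are std_logic_vector(N - 1 downto 0).
-- Assumption: iwzeros is a hypothetical constant, e.g.
--   constant iwzeros : std_logic_vector(iw - 1 downto 0) := (others => '0');
gen_stage : for i in 1 to 16 generate
   gen_word : for j in 0 to 15 generate
      -- One iw-bit register per element, 16 per pipeline stage.
      word_reg : regn
         generic map ( N => iw, init => iwzeros )
         port map ( clk => clk, rst => '0', en => '1',
            input  => h_blk_pipelined_input(i - 1, j),
            output => h_blk_pipelined_input(i, j));
   end generate gen_word;
end generate gen_stage;
```

Each element (i, j) is indexed with both indices, so no slice of the two-dimensional array is needed.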
EDIT:
I tried what Ben said:
instantiate_h_m_qa_pipeline_registers: for i in 1 to 16 generate
   instantiate_h_pipelined_reg : regn
      generic map ( N=> b, init => bzeros )
      port map ( clk => clk , rst => '0', en => '1',
         input => h_blk_pipelined_input(i - 1),
         output=> h_blk_pipelined_input(i));
end generate instantiate_h_m_qa_pipeline_registers;
The signals are now defined as:
type std_logic_block is array (0 to 15) of std_logic_vector(iw - 1 downto 0) ;
type partial_pipeline_registers_type is array (0 to 16) of std_logic_block;
signal h_blk_pipelined_input : partial_pipeline_registers_type;
And the error I get from XST is:
ERROR:HDLParsers:800 - ((where the register part is)) Type of input is incompatible with type of h_blk_pipelined_input.
I can still do everything I could before, using ()() indexing instead of ( , ), so I haven't lost anything by going this way, but it still doesn't solve my problem.
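The HDLParsers:800 error would be consistent with regn's input port being a plain std_logic_vector(N - 1 downto 0) while h_blk_pipelined_input(i - 1) is a std_logic_block: VHDL is strongly typed, so the two would not match even though both hold 512 bits. Assuming that is the cause (regn's declaration isn't shown here), one possible workaround is a pair of conversion functions that flatten a block into a vector and back. The package name blk_pack and the function names to_slv and to_block are hypothetical, and iw is fixed at 32 only to match the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: hypothetical helper package, not tested with XST.
package blk_pack is
   constant iw : integer := 32;       -- assumed word width
   constant b  : integer := 16 * iw;  -- 512 bits per block

   type std_logic_block is array (0 to 15) of std_logic_vector(iw - 1 downto 0);

   -- Pack the 16 words into one b-bit vector; word 0 lands in the low bits.
   function to_slv (blk : std_logic_block) return std_logic_vector;
   -- Inverse of to_slv.
   function to_block (v : std_logic_vector(b - 1 downto 0)) return std_logic_block;
end package blk_pack;

package body blk_pack is

   function to_slv (blk : std_logic_block) return std_logic_vector is
      variable v : std_logic_vector(b - 1 downto 0);
   begin
      for j in 0 to 15 loop
         v((j + 1) * iw - 1 downto j * iw) := blk(j);
      end loop;
      return v;
   end function to_slv;

   function to_block (v : std_logic_vector(b - 1 downto 0)) return std_logic_block is
      variable blk : std_logic_block;
   begin
      for j in 0 to 15 loop
         blk(j) := v((j + 1) * iw - 1 downto j * iw);
      end loop;
      return blk;
   end function to_block;

end package body blk_pack;
```

With something like this, each stage could use two intermediate std_logic_vector(b - 1 downto 0) signals: one assigned from to_slv(h_blk_pipelined_input(i - 1)) and connected to input, the other connected to output and converted back with h_blk_pipelined_input(i) <= to_block(...). Going through intermediate signals avoids relying on conversion functions inside the port map, which the tool may or may not accept.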