Hive performance increase

Posted by Sagar Nikam on Stack Overflow
Published on 2012-11-01T08:17:16Z

I am working with a 2.5 GB database whose tables range from only 40 rows to 9 million rows. Any query against a large table takes a long time, and I want results faster.

Even a small query on a table with only 90 rows is slow:

hive> select count(*) from cidade; 
Time taken: 50.172 seconds

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

<property>
<name>dfs.block.size</name>
<value>131072</value>
<description>Default block size for new files, in bytes.
</property>
</configuration>
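
For scale, dfs.block.size is specified in bytes, so the 131072 in the configuration above is 128 KB. The sketch below is rough arithmetic only: it compares how many blocks 2.5 GB of data would occupy at that size and at 128 MB. It assumes the data is one contiguous file and reads "GB" as GiB. Both are simplifications, and the 128 MB figure is chosen purely for comparison. It does not by itself show that block size is the cause of the slow query.

```python
# Rough arithmetic only: treats the 2.5 GB of data as one contiguous file
# (an assumption; real Hive tables are stored as many separate files).

DATA_BYTES = int(2.5 * 1024**3)   # 2.5 GB, read here as GiB (assumption)
CONFIGURED_BLOCK = 131072         # dfs.block.size from hdfs-site.xml, in bytes
LARGER_BLOCK = 128 * 1024**2      # 128 MB, picked only for comparison


def block_count(data_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to hold data_bytes (integer ceiling division)."""
    return -(-data_bytes // block_bytes)


print(CONFIGURED_BLOCK // 1024, "KB per block as configured")
print(block_count(DATA_BYTES, CONFIGURED_BLOCK), "blocks at the configured size")
print(block_count(DATA_BYTES, LARGER_BLOCK), "blocks at 128 MB")
```

Under these assumptions the configured size yields 20480 blocks, against 20 at 128 MB.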

Do these settings (dfs.replication=3, dfs.block.size=131072) affect the performance of Hive?

Can I set them from the Hive prompt, like this:

hive> set dfs.replication=5;

Does this value apply only to that particular session?

Or is it better to change it in the .xml file?
