Hive performance increase
- by Sagar Nikam
I am working with a 2.5 GB database whose tables range from only 40 rows to 9 million rows.
Any query against a large table takes a long time.
I would like to get results faster.
Even a small query on a table that has only 90 rows is slow:
hive> select count(*) from cidade;
Time taken: 50.172 seconds
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.block.size</name>
<value>131072</value>
<description>The default block size for new files, in bytes.
</description>
</property>
</configuration>
Do these settings affect the performance of Hive?
dfs.replication=3
dfs.block.size=131072
Can I set it from the Hive prompt, like this?
hive> set dfs.replication=5;
Does this value last for that particular session only?
Or is it better to change it in the .xml file?
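To make the question concrete, this is roughly the session I have in mind. It is only a sketch: I am assuming that the Hive CLI `set key=value;` form also accepts dfs.* properties, and that `set key;` with no value prints the setting currently in effect.

```sql
-- Sketch only: assumes dfs.* properties can be overridden from the Hive CLI.
-- Override the replication factor for the current session.
set dfs.replication=5;
-- Print the value now in effect, to confirm the override was accepted.
set dfs.replication;
-- Rerun the small query from above to compare the time taken.
select count(*) from cidade;
```

If the value is session-only, I would expect a new Hive session to show the value from hdfs-site.xml again.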