Hive configuration file hive-site.xml: parameter descriptions and usage

Parameter description

Parameter name	Default value	Usage
hive.metastore.uris	–	Thrift URI of the remote Hive metastore that clients connect to.
hive.metastore.client.socket.timeout	600	Metastore client socket timeout, in seconds.
hive.metastore.warehouse.dir	/user/hive/warehouse	Hive data warehouse directory.
hive.warehouse.subdir.inherit.perms	true	Whether table and partition directories inherit the permissions of their parent directory.
hive.auto.convert.join	true	Whether to automatically convert common joins to map joins based on input size.
hive.auto.convert.join.noconditionaltask.size	10000000	Size threshold in bytes below which a join is converted directly to a map join without a conditional task.
hive.optimize.bucketmapjoin.sortedmerge	false	Whether to try sort-merge bucket (SMB) map joins.
hive.smbjoin.cache.rows	10000	Number of rows with the same key cached in memory per table in an SMB join.
hive.server2.logging.operation.enabled	false	Whether HiveServer2 saves operation logs and makes them available to clients.
hive.server2.logging.operation.log.location	${system:java.io.tmpdir}/${system:user.name}/operation_logs	Directory where HiveServer2 stores operation logs.
mapred.reduce.tasks	–	The number of Reduce tasks for the MapReduce job.
hive.exec.reducers.bytes.per.reducer	67108864	Amount of input data per reducer, in bytes.
hive.exec.copyfile.maxsize	33554432	Maximum size in bytes of a file copied with a single HDFS copy; larger files are copied with distcp.
hive.exec.reducers.max	-1	Maximum number of reducers used for a job.
hive.vectorized.groupby.checkinterval	100000	Check interval for Vectorized Group By operation.
hive.vectorized.groupby.flush.percent	0.1	The Flush proportion of the Vectorized Group By operation.
hive.compute.query.using.stats	true	Whether to answer queries such as count(1), min and max purely from statistics stored in the metastore.
hive.vectorized.execution.enabled	false	Whether to enable the vectorized execution engine.
hive.vectorized.execution.reduce.enabled	false	Whether to enable vectorized execution in the Reduce phase.
hive.vectorized.use.vectorized.input.format	false	Whether to use vectorized input format.
hive.vectorized.use.checked.expressions	false	Whether to use overflow-checked vector expressions when available.
hive.vectorized.use.vector.serde.deserialize	false	Whether to vectorize rows using vector deserialization.
hive.vectorized.adaptor.usage.mode	off	How far the VectorUDFAdaptor is used for UDFs that have no vectorized implementation.
hive.vectorized.input.format.excludes	–	List of excluded vectorized input formats.
hive.merge.mapfiles	true	Whether to merge the small files output by Map.
hive.merge.mapredfiles	false	Whether to merge the small files output by MapReduce.
hive.cbo.enable	false	Whether to enable CBO optimization.
hive.fetch.task.conversion	none	Fetch task conversion level.
hive.fetch.task.conversion.threshold	-1	The data volume threshold that triggers Fetch task conversion.
hive.limit.pushdown.memory.usage	0.1	Fraction of memory used to buffer rows for the LIMIT pushdown (top-K) optimization.
hive.merge.sparkfiles	false	Whether to merge the small files output by the Spark task.
hive.merge.smallfiles.avgsize	-1	Average output file size in bytes below which Hive runs an extra job to merge the files.
hive.merge.size.per.task	-1	Target size in bytes of the merged files.
hive.optimize.reducededuplication	true	Whether to remove redundant shuffle stages when the data is already clustered by the required key.
hive.optimize.reducededuplication.min.reducer	4	Minimum number of reducers for reduce deduplication to apply.
hive.map.aggr	false	Whether to enable map-side aggregation in GROUP BY queries.
hive.map.aggr.hash.percentmemory	0.5	Fraction of memory used by the map-side aggregation hash table.
hive.optimize.sort.dynamic.partition	false	Whether to sort rows by the dynamic partition columns so that each reducer keeps one writer open at a time.
hive.execution.engine	mr	Hive execution engine type.
spark.executor.memory	1g	The memory size of Spark Executor.
spark.driver.memory	1g	The memory size of Spark Driver.
spark.executor.cores	1	The number of cores for each Spark Executor.
spark.yarn.driver.memoryOverhead	384	Memory overhead added to the Spark driver container on YARN, in MiB.
spark.yarn.executor.memoryOverhead	384	Memory overhead added to each Spark executor container on YARN, in MiB.
spark.dynamicAllocation.enabled	false	Whether to enable dynamic resource allocation.
spark.dynamicAllocation.initialExecutors	-1	The initial number of Executors for dynamic resource allocation.
spark.dynamicAllocation.minExecutors	-1	The minimum number of Executors for dynamic resource allocation.
spark.dynamicAllocation.maxExecutors	-1	The maximum number of Executors for dynamic resource allocation.
hive.metastore.execute.setugi	false	Whether the metastore, in unsecured mode, performs DFS operations as the client's reported user and group.
hive.support.concurrency	true	Whether Hive supports concurrency (table and partition locking).
hive.zookeeper.quorum	–	Comma-separated list of ZooKeeper servers.
hive.zookeeper.client.port	–	ZooKeeper client port number.
hive.zookeeper.namespace	default	The ZooKeeper namespace used by Hive.
hive.cluster.delegation.token.store.class	org.apache.hadoop.hive.thrift.MemoryTokenStore	Delegation token store implementation class.
hive.server2.enable.doAs	false	Whether HiveServer2 runs queries as the connected user (impersonation).
hive.metastore.sasl.enabled	false	Whether the metastore Thrift interface is secured with SASL.
hive.server2.authentication	NONE	HiveServer2 authentication method.
hive.metastore.kerberos.principal	–	Kerberos principal of the Hive metastore.
hive.server2.authentication.kerberos.principal	–	Kerberos principal of HiveServer2.
spark.shuffle.service.enabled	true	Whether to enable the Spark Shuffle service.
hive.strict.checks.orderby.no.limit	true	Whether to reject ORDER BY queries that have no LIMIT clause.
hive.strict.checks.no.partition.filter	true	Whether to reject queries on partitioned tables that have no partition filter.
hive.strict.checks.type.safety	true	Whether to reject unsafe comparisons such as bigint with string or double.
hive.strict.checks.cartesian.product	false	Whether to reject Cartesian product joins.
hive.strict.checks.bucketing	true	Whether to reject loading data into bucketed tables.
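
The three reducer settings in the table interact, and a small model makes that easier to see. The sketch below is a simplification of how Hive picks a reducer count for a MapReduce job, not Hive's actual code: a positive mapred.reduce.tasks fixes the count, otherwise the input size is divided by hive.exec.reducers.bytes.per.reducer and capped at hive.exec.reducers.max. The function name and the default arguments (taken from the example configuration in this article) are assumptions made for illustration.

```python
import math


def estimate_reducers(input_bytes, bytes_per_reducer=67108864,
                      max_reducers=1099, fixed_reducers=-1):
    """Simplified model of Hive's reducer estimate (illustrative only)."""
    # A positive mapred.reduce.tasks value overrides the estimate.
    if fixed_reducers > 0:
        return fixed_reducers
    # One reducer per bytes_per_reducer of input, rounded up.
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    # Never fewer than one reducer, never more than the configured maximum.
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    gib = 1024 ** 3
    print(estimate_reducers(10 * gib))    # 10 GiB of input at 64 MiB per reducer
    print(estimate_reducers(100 * gib))   # large input, capped by max_reducers
```

Lowering hive.exec.reducers.bytes.per.reducer raises the estimate, while hive.exec.reducers.max bounds it regardless of input size.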

Parameter examples

<configuration>
  <!-- Thrift URI of the Hive metastore -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://myhost:9083</value>
  </property>

  <!-- Hive metastore client socket timeout (in seconds) -->
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>

  <!-- Hive data warehouse directory -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>

  <!-- Whether subdirectories inherit permissions -->
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>

  <!-- Whether to automatically convert common joins to map joins -->
  <property>
    <name>hive.auto.convert.join</name>
    <value>true</value>
  </property>

  <!-- Size threshold (in bytes) below which a join is converted directly to a map join without a conditional task -->
  <property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>20971520</value>
  </property>

  <!-- Whether to optimize Sorted Merge of Bucket Map Join -->
  <property>
    <name>hive.optimize.bucketmapjoin.sortedmerge</name>
    <value>false</value>
  </property>

  <!-- Number of rows cached for SMB Join operation -->
  <property>
    <name>hive.smbjoin.cache.rows</name>
    <value>10000</value>
  </property>

  <!-- Whether HiveServer2 saves operation logs -->
  <property>
    <name>hive.server2.logging.operation.enabled</name>
    <value>true</value>
  </property>

  <!-- Storage location of Hive Server2 operation log -->
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/var/log/hive/operation_logs</value>
  </property>

  <!-- Number of Reduce tasks of MapReduce job -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
  </property>

  <!-- The amount of data for each Reduce task (in bytes) -->
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>67108864</value>
  </property>

  <!-- Maximum size of files allowed to be copied (in bytes) -->
  <property>
    <name>hive.exec.copyfile.maxsize</name>
    <value>33554432</value>
  </property>

  <!-- Maximum number of reducers used for a job -->
  <property>
    <name>hive.exec.reducers.max</name>
    <value>1099</value>
  </property>

  <!-- Check interval for Vectorized Group By operation -->
  <property>
    <name>hive.vectorized.groupby.checkinterval</name>
    <value>4096</value>
  </property>

  <!-- Flush ratio of Vectorized Group By operation -->
  <property>
    <name>hive.vectorized.groupby.flush.percent</name>
    <value>0.1</value>
  </property>

  <!-- Whether to answer queries such as count(1), min and max from metastore statistics -->
  <property>
    <name>hive.compute.query.using.stats</name>
    <value>false</value>
  </property>

  <!-- Whether to enable vectorized execution engine -->
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>true</value>
  </property>

  <!-- Whether to enable vectorized execution in the Reduce phase -->
  <property>
    <name>hive.vectorized.execution.reduce.enabled</name>
    <value>true</value>
  </property>

  <!-- Whether to use vectorized input format -->
  <property>
    <name>hive.vectorized.use.vectorized.input.format</name>
    <value>true</value>
  </property>

  <!-- Whether to use overflow-checked vector expressions -->
  <property>
    <name>hive.vectorized.use.checked.expressions</name>
    <value>true</value>
  </property>

  <!-- Whether to vectorize rows using vector deserialization -->
  <property>
    <name>hive.vectorized.use.vector.serde.deserialize</name>
    <value>false</value>
  </property>

  <!-- Usage mode of the VectorUDFAdaptor (none, chosen, all) -->
  <property>
    <name>hive.vectorized.adaptor.usage.mode</name>
    <value>chosen</value>
  </property>

  <!-- List of excluded vectorized input formats -->
  <property>
    <name>hive.vectorized.input.format.excludes</name>
    <value>org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat</value>
  </property>

  <!-- Whether to merge small files output by Map -->
  <property>
    <name>hive.merge.mapfiles</name>
    <value>true</value>
  </property>

  <!-- Whether to merge small files output by MapReduce -->
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>false</value>
  </property>

  <!-- Whether to enable CBO optimization -->
  <property>
    <name>hive.cbo.enable</name>
    <value>false</value>
  </property>

  <!-- Fetch task conversion level -->
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>minimal</value>
  </property>

  <!-- Data volume threshold that triggers Fetch task conversion (in bytes) -->
  <property>
    <name>hive.fetch.task.conversion.threshold</name>
    <value>268435456</value>
  </property>

  <!-- Fraction of memory used for the LIMIT pushdown (top-K) optimization -->
  <property>
    <name>hive.limit.pushdown.memory.usage</name>
    <value>0.1</value>
  </property>

  <!-- Whether to merge small files output by Spark tasks -->
  <property>
    <name>hive.merge.sparkfiles</name>
    <value>true</value>
  </property>

  <!-- Average output file size (in bytes) below which small files are merged -->
  <property>
    <name>hive.merge.smallfiles.avgsize</name>
    <value>16777216</value>
  </property>

  <!-- Target size of merged files (in bytes) -->
  <property>
    <name>hive.merge.size.per.task</name>
    <value>268435456</value>
  </property>

  <!-- Whether to enable reduce deduplication (removing redundant shuffle stages) -->
  <property>
    <name>hive.optimize.reducededuplication</name>
    <value>true</value>
  </property>

  <!-- Minimum number of reducers for reduce deduplication to apply -->
  <property>
    <name>hive.optimize.reducededuplication.min.reducer</name>
    <value>4</value>
  </property>

  <!-- Whether to enable Map-side aggregation -->
  <property>
    <name>hive.map.aggr</name>
    <value>true</value>
  </property>

  <!-- Fraction of memory used by the map-side aggregation hash table -->
  <property>
    <name>hive.map.aggr.hash.percentmemory</name>
    <value>0.5</value>
  </property>

  <!-- Whether to optimize dynamic partition sorting -->
  <property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
  </property>

  <!-- Hive execution engine type (mr, tez, spark) -->
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>

  <!-- Memory size of Spark Executor -->
  <property>
    <name>spark.executor.memory</name>
    <value>2572261785b</value>
  </property>

  <!-- Memory size of Spark Driver -->
  <property>
    <name>spark.driver.memory</name>
    <value>3865470566b</value>
  </property>

  <!-- Number of cores for each Spark Executor -->
  <property>
    <name>spark.executor.cores</name>
    <value>4</value>
  </property>

  <!-- Memory overhead of the Spark driver container on YARN -->
  <property>
    <name>spark.yarn.driver.memoryOverhead</name>
    <value>409m</value>
  </property>

  <!-- Memory overhead of each Spark executor container on YARN -->
  <property>
    <name>spark.yarn.executor.memoryOverhead</name>
    <value>432m</value>
  </property>

  <!-- Whether to enable dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.enabled</name>
    <value>true</value>
  </property>

  <!-- The initial number of Executors for dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.initialExecutors</name>
    <value>1</value>
  </property>

  <!-- Minimum number of Executors for dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.minExecutors</name>
    <value>1</value>
  </property>

  <!-- The maximum number of Executors for dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.maxExecutors</name>
    <value>2147483647</value>
  </property>

  <!-- Whether the metastore performs DFS operations as the client's reported user and group -->
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>

  <!-- Whether to support concurrent operations -->
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>

  <!-- ZooKeeper server list -->
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>myhost04,myhost03,myhost02</value>
  </property>

  <!-- ZooKeeper client port number -->
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>

  <!-- ZooKeeper namespace used by Hive -->
  <property>
    <name>hive.zookeeper.namespace</name>
    <value>hive_zookeeper_namespace_hive</value>
  </property>

  <!-- Cluster delegation token storage class -->
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
  </property>

  <!-- Whether HiveServer2 runs queries as the connected user (impersonation) -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <!-- Whether to secure the Hive metastore Thrift interface with SASL -->
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>

  <!-- Hive Server2 authentication method -->
  <property>
    <name>hive.server2.authentication</name>
    <value>kerberos</value>
  </property>

  <!-- Kerberos principal of the Hive metastore -->
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/[email protected]</value>
  </property>

  <!-- Kerberos principal name of Hive Server2 -->
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/[email protected]</value>
  </property>

  <!-- Whether to enable Spark Shuffle service -->
  <property>
    <name>spark.shuffle.service.enabled</name>
    <value>true</value>
  </property>

  <!-- Whether to reject ORDER BY queries that have no LIMIT clause -->
  <property>
    <name>hive.strict.checks.orderby.no.limit</name>
    <value>false</value>
  </property>

  <!-- Whether to reject queries on partitioned tables that have no partition filter -->
  <property>
    <name>hive.strict.checks.no.partition.filter</name>
    <value>false</value>
  </property>

  <!-- Whether to perform strict type safety checks -->
  <property>
    <name>hive.strict.checks.type.safety</name>
    <value>true</value>
  </property>

  <!-- Whether to reject Cartesian product joins -->
  <property>
    <name>hive.strict.checks.cartesian.product</name>
    <value>false</value>
  </property>

  <!-- Whether to reject loading data into bucketed tables -->
  <property>
    <name>hive.strict.checks.bucketing</name>
    <value>true</value>
  </property>
</configuration>
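
A file in this name/value layout is easy to inspect programmatically, which helps when comparing settings between clusters. The following is a minimal sketch using only Python's standard library; the function name `parse_hive_site` and the two-property `SAMPLE` string are made up for illustration, and in practice you would read the text from your own hive-site.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the same layout as the example configuration.
SAMPLE = """<configuration>
  <!-- Thrift URI of the Hive metastore -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://myhost:9083</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
</configuration>"""


def parse_hive_site(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        # In this sketch a later duplicate simply overwrites an earlier one.
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    for name, value in sorted(parse_hive_site(SAMPLE).items()):
        print(f"{name} = {value}")
```

XML comments are skipped by the parser, so only the property names and values end up in the dictionary.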

Specific use

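hive.metastore.uris: Thrift URI of the remote Hive metastore, for example thrift://myhost:9083. Several URIs can be given as a comma-separated list.
hive.metastore.client.socket.timeout: How long the metastore client waits on its socket before timing out, in seconds.
hive.metastore.warehouse.dir: Location of the default database in the warehouse. Managed tables are stored under this directory by default.
hive.warehouse.subdir.inherit.perms: Whether table and partition directories created under the warehouse inherit the permissions of their parent directory.
hive.auto.convert.join: Whether Hive automatically converts a common (shuffle) join into a map join when the input sizes allow it.
hive.auto.convert.join.noconditionaltask.size: Threshold in bytes for direct map-join conversion. If the combined size of n-1 of the tables in an n-way join is below this value, the join is converted to a map join without generating a conditional task.
hive.optimize.bucketmapjoin.sortedmerge: Whether to try a sort-merge bucket (SMB) map join.
hive.smbjoin.cache.rows: Number of rows with the same key value cached in memory for each table in an SMB join.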
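hive.server2.logging.operation.enabled: Whether HiveServer2 saves per-operation logs and makes them available to clients.
hive.server2.logging.operation.log.location: Top-level directory where HiveServer2 stores operation logs when operation logging is enabled.
mapred.reduce.tasks: Default number of reduce tasks per job. A value of -1 lets Hive work out the number of reducers itself.
hive.exec.reducers.bytes.per.reducer: Amount of input data handled by each reducer, in bytes. Hive divides the input size by this value to estimate the number of reducers.
hive.exec.copyfile.maxsize: Largest file, in bytes, that Hive copies between directories with a single HDFS copy. Larger files are copied with distributed copy (distcp).
hive.exec.reducers.max: Upper limit on the number of reducers Hive uses for a job when the count is not fixed explicitly.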
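hive.vectorized.groupby.checkinterval: Number of entries added to the vectorized GROUP BY aggregation hash table before the average entry size is recomputed.
hive.vectorized.groupby.flush.percent: Fraction of entries flushed from the vectorized GROUP BY aggregation hash table when the memory threshold is exceeded.
hive.compute.query.using.stats: Whether Hive answers queries such as count(1), min and max purely from statistics stored in the metastore, without scanning the data.
hive.vectorized.execution.enabled: Whether to enable vectorized query execution, which processes rows in batches instead of one at a time.
hive.vectorized.execution.reduce.enabled: Whether to enable vectorized execution on the reduce side.
hive.vectorized.use.vectorized.input.format: Whether to use an input format's own vectorized reader when one is available.
hive.vectorized.use.checked.expressions: Whether to use overflow-checked vector expressions when available, so that arithmetic that can overflow gives the same result as non-vectorized execution.
hive.vectorized.use.vector.serde.deserialize: Whether to vectorize rows using vector deserialization.
hive.vectorized.adaptor.usage.mode: How far the VectorUDFAdaptor is used for UDFs that have no vectorized implementation: none, chosen (a small set of UDFs selected for good performance) or all.
hive.vectorized.input.format.excludes: Comma-separated list of input format class names for which the vectorized input format is not used.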
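hive.merge.mapfiles: Whether to merge small output files at the end of a map-only job.
hive.merge.mapredfiles: Whether to merge small output files at the end of a MapReduce job.
hive.cbo.enable: Whether to enable the cost-based optimizer (CBO), which uses the Calcite framework.
hive.fetch.task.conversion: How aggressively simple SELECT queries are converted to a single fetch task instead of a full job: none, minimal or more.
hive.fetch.task.conversion.threshold: Input size threshold, in bytes, up to which fetch task conversion is applied.
hive.limit.pushdown.memory.usage: Fraction of memory used to buffer rows for the LIMIT pushdown (top-K) optimization.
hive.merge.sparkfiles: Whether to merge small output files at the end of a Spark job.
hive.merge.smallfiles.avgsize: When the average output file size of a job, in bytes, is below this value, Hive starts an additional job to merge the output files into larger ones.
hive.merge.size.per.task: Target size, in bytes, of the merged files.
hive.optimize.reducededuplication: Whether to remove redundant shuffle stages when the data is already clustered by the key that is needed again.
hive.optimize.reducededuplication.min.reducer: Reduce deduplication is disabled when the number of reducers is below this value, which avoids funnelling a merged stage through very few reducers.
hive.map.aggr: Whether to enable map-side aggregation in GROUP BY queries.
hive.map.aggr.hash.percentmemory: Fraction of total memory used by the map-side aggregation hash table.
hive.optimize.sort.dynamic.partition: Whether to sort rows by the dynamic partition columns so that each reducer keeps only one record writer open at a time, which reduces memory pressure.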
hive.execution.engine: Execution engine used to run queries: mr, tez or spark.
spark.executor.memory: Amount of memory for each Spark executor process.
spark.driver.memory: Amount of memory for the Spark driver process.
spark.executor.cores: Number of cores for each Spark executor.
spark.yarn.driver.memoryOverhead: Extra memory added to the driver's YARN container on top of spark.driver.memory, in MiB unless a unit is given.
spark.yarn.executor.memoryOverhead: Extra memory added to each executor's YARN container on top of spark.executor.memory, in MiB unless a unit is given.
spark.dynamicAllocation.enabled: Whether Spark scales the number of executors up and down with the workload.
spark.dynamicAllocation.initialExecutors: Number of executors to start with when dynamic allocation is enabled.
spark.dynamicAllocation.minExecutors: Lower bound on the number of executors when dynamic allocation is enabled.
spark.dynamicAllocation.maxExecutors: Upper bound on the number of executors when dynamic allocation is enabled.
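hive.metastore.execute.setugi: In unsecured mode, whether the metastore performs DFS operations using the client's reported user and group permissions.
hive.support.concurrency: Whether Hive supports concurrency. The default lock manager needs a running ZooKeeper instance to provide read-write locks.
hive.zookeeper.quorum: Comma-separated list of ZooKeeper servers.
hive.zookeeper.client.port: Port on which the ZooKeeper servers accept client connections.
hive.zookeeper.namespace: Parent node in ZooKeeper under which Hive creates its nodes.
hive.cluster.delegation.token.store.class: Class that implements the delegation token store.
hive.server2.enable.doAs: Whether HiveServer2 runs queries as the connected user (impersonation) instead of as the user that runs the HiveServer2 process.
hive.metastore.sasl.enabled: Whether the metastore Thrift interface is secured with SASL. When enabled, clients must authenticate with Kerberos.
hive.server2.authentication: HiveServer2 client authentication method, such as NONE, LDAP, KERBEROS, CUSTOM, PAM or NOSASL.
hive.metastore.kerberos.principal: Kerberos service principal of the Hive metastore.
hive.server2.authentication.kerberos.principal: Kerberos service principal of HiveServer2.
spark.shuffle.service.enabled: Whether to enable the external Spark shuffle service, which preserves shuffle files when executors are removed.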
hive.strict.checks.orderby.no.limit: When true, ORDER BY queries without a LIMIT clause are rejected.
hive.strict.checks.no.partition.filter: When true, queries on partitioned tables that have no partition filter are rejected.
hive.strict.checks.type.safety: When true, unsafe comparisons such as bigint with string or bigint with double are rejected.
hive.strict.checks.cartesian.product: When true, queries that produce a Cartesian product are rejected.
hive.strict.checks.bucketing: When true, loading data into bucketed tables is rejected.
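
For Hive on Spark running on YARN, each executor container has to hold the executor memory plus its memory overhead, so the two settings are best sized together. The sketch below adds them up for the values in the example configuration. It is an illustration under stated assumptions: the helper names are made up, only the single-letter suffixes b, k, m, g and t are handled, values without a suffix are treated as MiB, and any rounding that YARN applies to the request is ignored.

```python
# Binary multipliers for the single-letter suffixes handled by this sketch.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def to_bytes(size, default_unit="m"):
    """Convert a size string such as '432m' or '2572261785b' to bytes."""
    text = size.strip().lower()
    if text[-1].isdigit():
        # No suffix: fall back to the default unit (MiB in this sketch).
        return int(text) * UNITS[default_unit]
    return int(text[:-1]) * UNITS[text[-1]]


def container_bytes(memory, overhead):
    """Memory plus overhead requested for one container, before any rounding."""
    return to_bytes(memory) + to_bytes(overhead)


if __name__ == "__main__":
    # spark.executor.memory and spark.yarn.executor.memoryOverhead
    # as set in the example configuration.
    total = container_bytes("2572261785b", "432m")
    print(f"executor container: {total} bytes ({total / 1024 ** 3:.2f} GiB)")
```

If the total exceeds the largest container YARN is allowed to grant, the executors cannot be scheduled, so this sum is worth checking whenever either value changes.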

Adjust these values to suit your own Hive and Spark environment. Default values differ between Hive versions and distributions, so check them against the release you are running.