Parameter description and usage in the Hive configuration file hive-site.xml
Parameter description
Parameter name | Default value | Usage |
---|---|---|
hive.metastore.uris | – | Thrift URI(s) of the remote Hive metastore. |
hive.metastore.client.socket.timeout | 600 | Metastore client socket timeout, in seconds. |
hive.metastore.warehouse.dir | /user/hive/warehouse | Hive data warehouse directory. |
hive.warehouse.subdir.inherit.perms | true | Whether subdirectories inherit permissions. |
hive.auto.convert.join | true | Whether to automatically convert a common join into a map join based on input file size. |
hive.auto.convert.join.noconditionaltask.size | 10000000 | Maximum combined size (in bytes) of the small tables for which a join is converted directly to a map join, without a conditional task. |
hive.optimize.bucketmapjoin.sortedmerge | false | Whether to optimize the Sorted Merge of Bucket Map Join. |
hive.smbjoin.cache.rows | 10000 | The number of rows cached by the SMB Join operation. |
hive.server2.logging.operation.enabled | false | Whether HiveServer2 saves operation logs and makes them available to clients. |
hive.server2.logging.operation.log.location | ${system:java.io.tmpdir}/${system:user.name}/operation_logs | The storage location of HiveServer2 operation logs. |
mapred.reduce.tasks | – | The number of Reduce tasks for the MapReduce job. |
hive.exec.reducers.bytes.per.reducer | 67108864 | The amount of input data handled by each reducer, in bytes. |
hive.exec.copyfile.maxsize | 33554432 | The maximum size (in bytes) of a file that Hive copies in a single HDFS copy; larger files are copied with a distributed copy. |
hive.exec.reducers.max | -1 | The maximum number of reducers that Hive uses for a job. |
hive.vectorized.groupby.checkinterval | 100000 | Check interval for Vectorized Group By operation. |
hive.vectorized.groupby.flush.percent | 0.1 | The Flush proportion of the Vectorized Group By operation. |
hive.compute.query.using.stats | true | Whether to use statistical information to optimize query plans. |
hive.vectorized.execution.enabled | false | Whether to enable the vectorized execution engine. |
hive.vectorized.execution.reduce.enabled | false | Whether to enable vectorized execution in the Reduce phase. |
hive.vectorized.use.vectorized.input.format | false | Whether to use vectorized input format. |
hive.vectorized.use.checked.expressions | false | Whether to use overflow-checked vectorized expressions where available. |
hive.vectorized.use.vector.serde.deserialize | false | Whether to use vectorized serialization and deserialization. |
hive.vectorized.adaptor.usage.mode | off | The usage mode of the vectorized adapter. |
hive.vectorized.input.format.excludes | – | List of excluded vectorized input formats. |
hive.merge.mapfiles | true | Whether to merge the small files output by Map. |
hive.merge.mapredfiles | false | Whether to merge the small files output by MapReduce. |
hive.cbo.enable | false | Whether to enable CBO optimization. |
hive.fetch.task.conversion | none | Fetch task conversion level. |
hive.fetch.task.conversion.threshold | -1 | The data volume threshold that triggers Fetch task conversion. |
hive.limit.pushdown.memory.usage | 0.1 | The memory usage percentage of Limit operation. |
hive.merge.sparkfiles | false | Whether to merge the small files output by the Spark task. |
hive.merge.smallfiles.avgsize | -1 | The average size when merging small files. |
hive.merge.size.per.task | -1 | The amount of data merged by each task. |
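hive.optimize.reducededuplication | true | Whether to enable reduce deduplication, which removes extra MapReduce stages when the data is already clustered by the required key. |
hive.optimize.reducededuplication.min.reducer | 4 | The minimum number of reducers required for reduce deduplication to apply; below this value the optimization is disabled. |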
hive.map.aggr | false | Whether to enable Map-side aggregation. |
hive.map.aggr.hash.percentmemory | 0.5 | Hash table memory proportion aggregated on the Map side. |
hive.optimize.sort.dynamic.partition | false | Whether to optimize dynamic partition sorting. |
hive.execution.engine | mr | Hive execution engine type. |
spark.executor.memory | 1g | The memory size of Spark Executor. |
spark.driver.memory | 1g | The memory size of Spark Driver. |
spark.executor.cores | 1 | The number of cores for each Spark Executor. |
spark.yarn.driver.memoryOverhead | 384 | Spark Driver’s memory Overhead. |
spark.yarn.executor.memoryOverhead | 384 | The memory Overhead of Spark Executor. |
spark.dynamicAllocation.enabled | false | Whether to enable dynamic resource allocation. |
spark.dynamicAllocation.initialExecutors | -1 | The initial number of Executors for dynamic resource allocation. |
spark.dynamicAllocation.minExecutors | -1 | The minimum number of Executors for dynamic resource allocation. |
spark.dynamicAllocation.maxExecutors | -1 | The maximum number of Executors for dynamic resource allocation. |
hive.metastore.execute.setugi | false | Whether the metastore performs file system operations as the client's reported user and group (non-secure mode). |
hive.support.concurrency | true | Whether to support concurrent operations. |
hive.zookeeper.quorum | – | ZooKeeper server list. |
hive.zookeeper.client.port | – | ZooKeeper client port number. |
hive.zookeeper.namespace | default | The ZooKeeper namespace used by Hive. |
hive.cluster.delegation.token.store.class | org.apache.hadoop.hive.thrift.MemoryTokenStore | Cluster delegation token storage class. |
hive.server2.enable.doAs | false | Whether HiveServer2 runs queries as the connected user (impersonation). |
hive.metastore.sasl.enabled | false | Whether to enable SASL authentication for the Hive metastore. |
hive.server2.authentication | NONE | Hive Server2 authentication method. |
hive.metastore.kerberos.principal | – | The Kerberos principal name of the Hive metastore. |
hive.server2.authentication.kerberos.principal | – | The Kerberos principal name of Hive Server2. |
spark.shuffle.service.enabled | true | Whether to enable the Spark Shuffle service. |
hive.strict.checks.orderby.no.limit | true | Whether to reject ORDER BY queries that have no LIMIT clause. |
hive.strict.checks.no.partition.filter | true | Whether to reject queries on partitioned tables that have no partition filter. |
hive.strict.checks.type.safety | true | Whether to perform strict type safety checks. |
hive.strict.checks.cartesian.product | false | Whether to perform strict Cartesian product checking. |
hive.strict.checks.bucketing | true | Whether to enforce strict bucketing checks, which disallow loading data directly into bucketed tables. |
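Three of the parameters above (mapred.reduce.tasks, hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max) together decide how many reducers a MapReduce job gets. The following is a simplified sketch of that interaction, not Hive's actual code: the function name `estimate_reducers` is hypothetical, and the default arguments are taken from the example configuration in this article rather than from any particular Hive release.

```python
def estimate_reducers(total_input_bytes, bytes_per_reducer=67108864,
                      max_reducers=1099, fixed_reducers=-1):
    """Simplified estimate of the reducer count for one job.

    bytes_per_reducer -> hive.exec.reducers.bytes.per.reducer
    max_reducers      -> hive.exec.reducers.max (assumed positive here)
    fixed_reducers    -> mapred.reduce.tasks (-1 means "let Hive decide")
    """
    # A positive mapred.reduce.tasks value overrides the estimate.
    if fixed_reducers > 0:
        return fixed_reducers
    # One reducer per bytes_per_reducer of input, rounded up, at least one.
    estimated = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    estimated = max(1, estimated)
    # Never exceed the configured cap.
    return min(max_reducers, estimated)


# 10 GiB of input at 64 MiB per reducer
print(estimate_reducers(10 * 1024**3))  # 160
```

Lowering hive.exec.reducers.bytes.per.reducer raises parallelism for the same input, while hive.exec.reducers.max keeps a very large input from requesting an unreasonable number of reducers.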
Parameter examples
<configuration>
  <!-- Thrift URI of the Hive metastore -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://myhost:9083</value>
  </property>
  <!-- Metastore client socket timeout (in seconds) -->
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>
  <!-- Hive data warehouse directory -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Whether subdirectories inherit permissions -->
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>
  <!-- Whether to automatically convert common joins into map joins -->
  <property>
    <name>hive.auto.convert.join</name>
    <value>true</value>
  </property>
  <!-- Maximum combined size (in bytes) of the small tables for direct map join conversion without a conditional task -->
  <property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>20971520</value>
  </property>
  <!-- Whether to optimize Sorted Merge of Bucket Map Join -->
  <property>
    <name>hive.optimize.bucketmapjoin.sortedmerge</name>
    <value>false</value>
  </property>
  <!-- Number of rows cached for SMB Join operation -->
  <property>
    <name>hive.smbjoin.cache.rows</name>
    <value>10000</value>
  </property>
  <!-- Whether HiveServer2 saves operation logs -->
  <property>
    <name>hive.server2.logging.operation.enabled</name>
    <value>true</value>
  </property>
  <!-- Storage location of HiveServer2 operation logs -->
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/var/log/hive/operation_logs</value>
  </property>
  <!-- Number of Reduce tasks of MapReduce job -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
  </property>
  <!-- The amount of data for each Reduce task (in bytes) -->
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>67108864</value>
  </property>
  <!-- Maximum size of files allowed to be copied (in bytes) -->
  <property>
    <name>hive.exec.copyfile.maxsize</name>
    <value>33554432</value>
  </property>
  <!-- The maximum number of reducers used for a job -->
  <property>
    <name>hive.exec.reducers.max</name>
    <value>1099</value>
  </property>
  <!-- Check interval for Vectorized Group By operation -->
  <property>
    <name>hive.vectorized.groupby.checkinterval</name>
    <value>4096</value>
  </property>
  <!-- Flush ratio of Vectorized Group By operation -->
  <property>
    <name>hive.vectorized.groupby.flush.percent</name>
    <value>0.1</value>
  </property>
  <!-- Whether to use statistics to optimize query plans -->
  <property>
    <name>hive.compute.query.using.stats</name>
    <value>false</value>
  </property>
  <!-- Whether to enable vectorized execution engine -->
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>true</value>
  </property>
  <!-- Whether to enable vectorized execution in the Reduce phase -->
  <property>
    <name>hive.vectorized.execution.reduce.enabled</name>
    <value>true</value>
  </property>
  <!-- Whether to use vectorized input format -->
  <property>
    <name>hive.vectorized.use.vectorized.input.format</name>
    <value>true</value>
  </property>
  <!-- Whether to use overflow-checked vectorized expressions -->
  <property>
    <name>hive.vectorized.use.checked.expressions</name>
    <value>true</value>
  </property>
  <!-- Whether to use vectorized serialization and deserialization -->
  <property>
    <name>hive.vectorized.use.vector.serde.deserialize</name>
    <value>false</value>
  </property>
  <!-- Usage mode of vectorized adapter -->
  <property>
    <name>hive.vectorized.adaptor.usage.mode</name>
    <value>chosen</value>
  </property>
  <!-- List of excluded vectorized input formats -->
  <property>
    <name>hive.vectorized.input.format.excludes</name>
    <value>org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat</value>
  </property>
  <!-- Whether to merge small files output by Map -->
  <property>
    <name>hive.merge.mapfiles</name>
    <value>true</value>
  </property>
  <!-- Whether to merge small files output by MapReduce -->
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>false</value>
  </property>
  <!-- Whether to enable CBO optimization -->
  <property>
    <name>hive.cbo.enable</name>
    <value>false</value>
  </property>
  <!-- Fetch task conversion level -->
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>minimal</value>
  </property>
  <!-- Data volume threshold that triggers Fetch task conversion (in bytes) -->
  <property>
    <name>hive.fetch.task.conversion.threshold</name>
    <value>268435456</value>
  </property>
  <!-- Memory usage percentage of Limit operation -->
  <property>
    <name>hive.limit.pushdown.memory.usage</name>
    <value>0.1</value>
  </property>
  <!-- Whether to merge small files output by Spark tasks -->
  <property>
    <name>hive.merge.sparkfiles</name>
    <value>true</value>
  </property>
  <!-- Average size (in bytes) when merging small files -->
  <property>
    <name>hive.merge.smallfiles.avgsize</name>
    <value>16777216</value>
  </property>
  <!-- The amount of data merged per task (in bytes) -->
  <property>
    <name>hive.merge.size.per.task</name>
    <value>268435456</value>
  </property>
  <!-- Whether to enable reduce deduplication -->
  <property>
    <name>hive.optimize.reducededuplication</name>
    <value>true</value>
  </property>
  <!-- Minimum number of reducers for reduce deduplication to apply -->
  <property>
    <name>hive.optimize.reducededuplication.min.reducer</name>
    <value>4</value>
  </property>
  <!-- Whether to enable Map-side aggregation -->
  <property>
    <name>hive.map.aggr</name>
    <value>true</value>
  </property>
  <!-- Hash table memory ratio aggregated on Map side -->
  <property>
    <name>hive.map.aggr.hash.percentmemory</name>
    <value>0.5</value>
  </property>
  <!-- Whether to optimize dynamic partition sorting -->
  <property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
  </property>
  <!-- Hive execution engine type (mr, tez, spark) -->
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
  <!-- Memory size of Spark Executor -->
  <property>
    <name>spark.executor.memory</name>
    <value>2572261785b</value>
  </property>
  <!-- Memory size of Spark Driver -->
  <property>
    <name>spark.driver.memory</name>
    <value>3865470566b</value>
  </property>
  <!-- Number of cores for each Spark Executor -->
  <property>
    <name>spark.executor.cores</name>
    <value>4</value>
  </property>
  <!-- Memory overhead of Spark Driver -->
  <property>
    <name>spark.yarn.driver.memoryOverhead</name>
    <value>409m</value>
  </property>
  <!-- Memory overhead of Spark Executor -->
  <property>
    <name>spark.yarn.executor.memoryOverhead</name>
    <value>432m</value>
  </property>
  <!-- Whether to enable dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.enabled</name>
    <value>true</value>
  </property>
  <!-- The initial number of Executors for dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.initialExecutors</name>
    <value>1</value>
  </property>
  <!-- Minimum number of Executors for dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.minExecutors</name>
    <value>1</value>
  </property>
  <!-- The maximum number of Executors for dynamic resource allocation -->
  <property>
    <name>spark.dynamicAllocation.maxExecutors</name>
    <value>2147483647</value>
  </property>
  <!-- Whether the metastore acts as the client's reported user and group -->
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
  <!-- Whether to support concurrent operations -->
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper server list -->
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>myhost04,myhost03,myhost02</value>
  </property>
  <!-- ZooKeeper client port number -->
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>
  <!-- ZooKeeper namespace used by Hive -->
  <property>
    <name>hive.zookeeper.namespace</name>
    <value>hive_zookeeper_namespace_hive</value>
  </property>
  <!-- Cluster delegation token storage class -->
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
  </property>
  <!-- Whether HiveServer2 runs queries as the connected user (impersonation) -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
  <!-- Whether to enable SASL authentication for the Hive metastore -->
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <!-- HiveServer2 authentication method -->
  <property>
    <name>hive.server2.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Kerberos principal name of the Hive metastore -->
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/[email protected]</value>
  </property>
  <!-- Kerberos principal name of HiveServer2 -->
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/[email protected]</value>
  </property>
  <!-- Whether to enable Spark Shuffle service -->
  <property>
    <name>spark.shuffle.service.enabled</name>
    <value>true</value>
  </property>
  <!-- Whether to reject ORDER BY queries that have no LIMIT clause -->
  <property>
    <name>hive.strict.checks.orderby.no.limit</name>
    <value>false</value>
  </property>
  <!-- Whether to reject queries on partitioned tables that have no partition filter -->
  <property>
    <name>hive.strict.checks.no.partition.filter</name>
    <value>false</value>
  </property>
  <!-- Whether to perform strict type safety checks -->
  <property>
    <name>hive.strict.checks.type.safety</name>
    <value>true</value>
  </property>
  <!-- Whether to perform strict Cartesian product checking -->
  <property>
    <name>hive.strict.checks.cartesian.product</name>
    <value>false</value>
  </property>
  <!-- Whether to enforce strict bucketing checks -->
  <property>
    <name>hive.strict.checks.bucketing</name>
    <value>true</value>
  </property>
</configuration>
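To check which values a file like the one above actually sets, the properties can be read into a dictionary. The snippet below is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `load_properties` and the two-property sample document are illustrative assumptions, not part of Hive.

```python
import xml.etree.ElementTree as ET

# A shortened stand-in for a real hive-site.xml (illustrative only).
SAMPLE = """<configuration>
  <!-- Hive execution engine type (mr, tez, spark) -->
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>1099</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return a dict mapping each <name> to its <value> text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries without a <name>
        # A missing or empty <value> becomes an empty string.
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


props = load_properties(SAMPLE)
print(props["hive.execution.engine"])  # mr
```

All values come back as strings, so numeric settings such as hive.exec.reducers.max need an explicit `int(...)` conversion before they are compared.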
Specific use:
- hive.metastore.uris: Thrift URI(s) of the remote Hive metastore.
- hive.metastore.client.socket.timeout: Metastore client socket timeout, in seconds.
- hive.metastore.warehouse.dir: Hive data warehouse directory.
- hive.warehouse.subdir.inherit.perms: Whether subdirectories inherit permissions.
- hive.auto.convert.join: Whether to automatically convert a common join into a map join based on input file size.
- hive.auto.convert.join.noconditionaltask.size: Maximum combined size (in bytes) of the small tables for which a join is converted directly to a map join, without a conditional task.
- hive.optimize.bucketmapjoin.sortedmerge: Whether to optimize the Sorted Merge of Bucket Map Join.
- hive.smbjoin.cache.rows: The number of rows cached by the SMB Join operation.
- hive.server2.logging.operation.enabled: Whether HiveServer2 saves operation logs and makes them available to clients.
- hive.server2.logging.operation.log.location: The storage location of Hive Server2 operation logs.
- mapred.reduce.tasks: The number of Reduce tasks for the MapReduce job.
- hive.exec.reducers.bytes.per.reducer: The amount of input data handled by each reducer, in bytes.
- hive.exec.copyfile.maxsize: The maximum size (in bytes) of a file that Hive copies in a single HDFS copy; larger files are copied with a distributed copy.
- hive.exec.reducers.max: The maximum number of reducers that Hive uses for a job.
- hive.vectorized.groupby.checkinterval: Check interval for Vectorized Group By operation.
- hive.vectorized.groupby.flush.percent: Flush proportion of Vectorized Group By operation.
- hive.compute.query.using.stats: Whether to use statistics to optimize query plans.
- hive.vectorized.execution.enabled: Whether to enable the vectorized execution engine.
- hive.vectorized.execution.reduce.enabled: Whether to enable vectorized execution in the Reduce phase.
- hive.vectorized.use.vectorized.input.format: Whether to use vectorized input format.
- hive.vectorized.use.checked.expressions: Whether to use overflow-checked vectorized expressions where available.
- hive.vectorized.use.vector.serde.deserialize: Whether to use vectorized serialization and deserialization.
- hive.vectorized.adaptor.usage.mode: Usage mode of vectorized adapter.
- hive.vectorized.input.format.excludes: List of excluded vectorized input formats.
- hive.merge.mapfiles: Whether to merge small files output by Map.
- hive.merge.mapredfiles: Whether to merge small files output by MapReduce.
- hive.cbo.enable: Whether to enable CBO optimization.
- hive.fetch.task.conversion: Fetch task conversion level.
- hive.fetch.task.conversion.threshold: The data volume threshold that triggers Fetch task conversion.
- hive.limit.pushdown.memory.usage: Memory usage percentage of Limit operation.
- hive.merge.sparkfiles: Whether to merge small files output by Spark tasks.
- hive.merge.smallfiles.avgsize: The average size when merging small files.
- hive.merge.size.per.task: The amount of data merged by each task.
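- hive.optimize.reducededuplication: Whether to enable reduce deduplication, which removes extra MapReduce stages when the data is already clustered by the required key.
- hive.optimize.reducededuplication.min.reducer: The minimum number of reducers required for reduce deduplication to apply; below this value the optimization is disabled.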
- hive.map.aggr: Whether to enable Map-side aggregation.
- hive.map.aggr.hash.percentmemory: The proportion of hash table memory aggregated on the Map side.
- hive.optimize.sort.dynamic.partition: Whether to optimize dynamic partition sorting.
- hive.execution.engine: Hive execution engine type.
- spark.executor.memory: The memory size of Spark Executor.
- spark.driver.memory: Spark Driver memory size.
- spark.executor.cores: The number of cores for each Spark Executor.
- spark.yarn.driver.memoryOverhead: Spark Driver’s memory Overhead.
- spark.yarn.executor.memoryOverhead: Spark Executor’s memory Overhead.
- spark.dynamicAllocation.enabled: Whether to enable dynamic resource allocation.
- spark.dynamicAllocation.initialExecutors: The initial number of Executors for dynamic resource allocation.
- spark.dynamicAllocation.minExecutors: The minimum number of Executors for dynamic resource allocation.
- spark.dynamicAllocation.maxExecutors: The maximum number of Executors for dynamic resource allocation.
- hive.metastore.execute.setugi: Whether the metastore performs file system operations as the client's reported user and group (non-secure mode).
- hive.support.concurrency: Whether to support concurrent operations.
- hive.zookeeper.quorum: ZooKeeper server list.
- hive.zookeeper.client.port: ZooKeeper client port number.
- hive.zookeeper.namespace: ZooKeeper namespace used by Hive.
- hive.cluster.delegation.token.store.class: Cluster delegation token storage class.
- hive.server2.enable.doAs: Whether HiveServer2 runs queries as the connected user (impersonation).
- hive.metastore.sasl.enabled: Whether to enable SASL authentication for the Hive metastore.
- hive.server2.authentication: Hive Server2 authentication method.
- hive.metastore.kerberos.principal: Kerberos principal name of the Hive metastore.
- hive.server2.authentication.kerberos.principal: Kerberos principal name of Hive Server2.
- spark.shuffle.service.enabled: Whether to enable the Spark Shuffle service.
- hive.strict.checks.orderby.no.limit: Whether to reject ORDER BY queries that have no LIMIT clause.
- hive.strict.checks.no.partition.filter: Whether to reject queries on partitioned tables that have no partition filter.
- hive.strict.checks.type.safety: Whether to perform strict type safety checks.
- hive.strict.checks.cartesian.product: Whether to perform strict Cartesian product checking.
- hive.strict.checks.bucketing: Whether to enforce strict bucketing checks, which disallow loading data directly into bucketed tables.
The specific values of these parameters can be modified and configured according to actual needs to meet the requirements of your Hive and Spark environments.
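As one illustration of how such values affect behavior, consider the small-file merge settings. When hive.merge.mapfiles, hive.merge.mapredfiles or hive.merge.sparkfiles is enabled, Hive compares the average size of a job's output files with hive.merge.smallfiles.avgsize and launches an extra merge step if the average is smaller. The sketch below mirrors only that comparison; the function name `needs_merge` is hypothetical, and the default threshold is the value from the example configuration in this article.

```python
def needs_merge(output_file_sizes, avg_size_threshold=16777216):
    """Return True when the average output file size is below the threshold.

    avg_size_threshold -> hive.merge.smallfiles.avgsize (in bytes)
    """
    if not output_file_sizes:
        return False  # nothing was written, so there is nothing to merge
    average = sum(output_file_sizes) / len(output_file_sizes)
    return average < avg_size_threshold


# 200 output files of 1 MiB each: average is far below 16 MiB
print(needs_merge([1024 * 1024] * 200))  # True
# A single 64 MiB file: no merge needed
print(needs_merge([64 * 1024 * 1024]))   # False
```

Raising hive.merge.smallfiles.avgsize makes merging trigger more often, at the cost of an additional job after the query finishes.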