Spark MLlib Params source code analysis

Spark MLlib Params source code analysis

Article directory

  • Spark MLlib Params source code analysis
    • 1. Source code applicable scenarios
    • 2. Source code summary
    • 3. Usage and examples
    • 3. Chinese source code
    • 5. Official link

1. Source code applicable scenarios

Params is a trait used when parameters are required in components. It also provides an internal parameter map to store parameter values attached to the instance.

2. Source code summary

This code defines a Params trait and its companion object Params.

The Params trait is a component that accepts parameters and provides an internal parameter map to store parameter values associated with the instance. It has the following main members:

  • params: Array[Param[_]]: Returns an array of all parameters sorted by name.
  • explainParam(param: Param[_]): String: Explain a parameter, returning a string containing the parameter name, documentation, and default and user-supplied values.
  • explainParams(): String: Explain all parameters of this instance.
  • isSet(param: Param[_]): Boolean: Checks whether the parameter has been explicitly set.
  • isDefined(param: Param[_]): Boolean: Checks whether the parameter has been explicitly set or has a default value.
  • hasParam(paramName: String): Boolean: Tests whether this instance contains a parameter with the given name.
  • getParam(paramName: String): Param[Any]: Get parameters based on parameter names.
  • set[T](param: Param[T], value: T): this.type: Set parameters in the embedded parameter map.
  • get[T](param: Param[T]): Option[T]: Optionally returns the user-supplied value of the parameter.
  • clear(param: Param[_]): this.type: Clear the user-supplied value of the parameter.
  • getOrDefault[T](param: Param[T]): T: Get the value of the parameter or its default value.
  • setDefault[T](param: Param[T], value: T): this.type: Set the default value for the parameter.
  • setDefault(paramPairs: ParamPair[_]*): this.type: Set default values for a group of parameters.
  • getDefault[T](param: Param[T]): Option[T]: Get the default value of the parameter.
  • hasDefault[T](param: Param[T]): Boolean: Test whether the input parameter has a default value.
  • copy(extra: ParamMap): Params: Creates a copy of this instance with the same UID and extra parameters.
  • extractParamMap(extra: ParamMap): ParamMap: Extracts embedded default parameter values and user-supplied values and merges them with extra values into a flat parameter map.

The Params companion object contains some auxiliary methods, such as setDefault, for setting default parameter values.

By using the Params trait and the Params companion object, we can define, set and get parameters in the component, as well as interpret and copy parameters.

3. Usage and examples

Get all parameters sorted by name:

val params: Array[Param[_]] = this.params

Explain the meaning of individual parameters:

val param: Param[_] = ...
val explanation: String = this.explainParam(param)

Explain the meaning of all parameters:

val explanations: String = this.explainParams()

Check if the parameters are set:

val param: Param[_] = ...
val isSet: Boolean = this.isSet(param)

Check if the parameter is set or has a default value:

val param: Param[_] = ...
val isDefined: Boolean = this.isDefined(param)

Check if a parameter with a given name exists:

val paramName: String = ...
val hasParam: Boolean = this.hasParam(paramName)

Get parameters based on parameter name:

val paramName: String = ...
val param: Param[Any] = this.getParam(paramName)

Set parameter values:

val param: Param[T] = ...
val value: T = ...
this.set(param, value)

Get the parameter value (if set):

val param: Param[T] = ...
val valueOpt: Option[T] = this.get(param)

Clear parameter values:

val param: Param[_] = ...
this.clear(param)

Get parameter value or default value:

val param: Param[T] = ...
val value: T = this.getOrDefault(param)

Create a copy with the same UID and extra parameters:

val extra: ParamMap = ...
val copy: Params = this.copy(extra)

Extract parameter mapping:

val paramMap: ParamMap = this.extractParamMap()

3. Chinese source code

/**
 * :: DeveloperApi ::
 * This interface is used by components to accept parameters. This also provides an internal parameter map for storing parameter values attached to the instance.
 */
@DeveloperApi
trait Params extends Identifiable with Serializable {<!-- -->

  /**
   * Returns all parameters sorted by name. The default implementation uses Java reflection to list all public methods that have no parameters and return [[Param]].
   *
   * @note Developers should not use this method in the constructor because we cannot guarantee that this variable is initialized before other parameters.
   */
  lazy val params: Array[Param[_]] = {<!-- -->
    val methods = this.getClass.getMethods
    methods.filter {<!-- --> m =>
        Modifier.isPublic(m.getModifiers) & amp; & amp;
          classOf[Param[_]].isAssignableFrom(m.getReturnType) & amp; & amp;
          m.getParameterTypes.isEmpty
      }.sortBy(_.getName)
      .map(m => m.invoke(this).asInstanceOf[Param[_]])
  }

  /**
   * Explain a parameter.
   * @param param input parameter, must belong to this instance.
   * @return A string containing the input parameter name, documentation, and optional default and user-supplied values
   */
  def explainParam(param: Param[_]): String = {<!-- -->
    shouldOwn(param)
    val valueStr = if (isDefined(param)) {<!-- -->
      val defaultValueStr = getDefault(param).map("Default value: " + _)
      val currentValueStr = get(param).map("Current value: " + _)
      (defaultValueStr + + currentValueStr).mkString("(", ", ", ")")
    } else {<!-- -->
      "(undefined)"
    }
    s"${<!-- -->param.name}: ${<!-- -->param.doc} $valueStr"
  }

  /**
   * Explain all parameters of this instance. See `explainParam()`.
   */
  def explainParams(): String = {<!-- -->
    params.map(explainParam).mkString("\\
")
  }

  /** Checks whether the parameter has been set explicitly. */
  final def isSet(param: Param[_]): Boolean = {<!-- -->
    shouldOwn(param)
    paramMap.contains(param)
  }

  /** Checks whether a parameter has been set explicitly or has a default value. */
  final def isDefined(param: Param[_]): Boolean = {<!-- -->
    shouldOwn(param)
    defaultParamMap.contains(param) || paramMap.contains(param)
  }

  /** Tests whether this instance contains a parameter with the given name. */
  def hasParam(paramName: String): Boolean = {<!-- -->
    params.exists(_.name == paramName)
  }

  /** Get parameters based on parameter names. */
  def getParam(paramName: String): Param[Any] = {<!-- -->
    params.find(_.name == paramName).getOrElse {<!-- -->
      throw new NoSuchElementException(s"The parameter $paramName does not exist.")
    }.asInstanceOf[Param[Any]]
  }

  /**
   * Set parameters in the embedded parameter map.
   */
  final def set[T](param: Param[T], value: T): this.type = {<!-- -->
    set(param -> value)
  }

  /**
   * Set parameters (by name) in embedded parameter maps.
   */
  protected final def set(param: String, value: Any): this.type = {<!-- -->
    set(getParam(param), value)
  }

  /**
   * Set parameters in the embedded parameter map.
   */
  protected final def set(paramPair: ParamPair[_]): this.type = {<!-- -->
    shouldOwn(paramPair.param)
    paramMap.put(paramPair)
    this
  }

  /**
   * Optionally returns the user-supplied value of the parameter.
   */
  final def get[T](param: Param[T]): Option[T] = {<!-- -->
    shouldOwn(param)
    paramMap.get(param)
  }

  /**
   * Clear user-supplied values for input parameters.
   */
  final def clear(param: Param[_]): this.type = {<!-- -->
    shouldOwn(param)
    paramMap.remove(param)
    this
  }

  /**
   * Get the parameter value or its default value in the embedded parameter map. If neither is set, an exception is thrown.
   */
  final def getOrDefault[T](param: Param[T]): T = {<!-- -->
    shouldOwn(param)
    get(param).orElse(getDefault(param)).getOrElse(
      throw new NoSuchElementException(s"Default value for parameter ${<!-- -->param.name} not found"))
  }

  /**
   * Alias for `getOrDefault()`.
   */
  protected final def $[T](param: Param[T]): T = getOrDefault(param)

  /**
   * Set default values for parameters.
   * @param param The parameter to set the default value. Make sure to initialize this parameter before calling this method.
   * @param value default value
   */
  protected final def setDefault[T](param: Param[T], value: T): this.type = {<!-- -->
    defaultParamMap.put(param -> value)
    this
  }

  /**
   * Set default values for a set of parameters.
   *
   * Note: Java developers should use the single-parameter `setDefault`. In the Scala compiler, using the varargs annotation may cause compilation errors, which are caused by bugs in the Scala compiler.
   * See SPARK-9268.
   *
   * @param paramPairs Specifies the parameter whose default value is to be set and a set of parameter pairs with its default value. Make sure to initialize these parameters before calling this method.
   */
  protected final def setDefault(paramPairs: ParamPair[_]*): this.type = {<!-- -->
    paramPairs.foreach {<!-- --> p =>
      setDefault(p.param.asInstanceOf[Param[Any]], p.value)
    }
    this
  }

  /**
   * Get the default value of the parameter.
   */
  final def getDefault[T](param: Param[T]): Option[T] = {<!-- -->
    shouldOwn(param)
    defaultParamMap.get(param)
  }

  /**
   * Test whether the input parameter has a default value.
   */
  final def hasDefault[T](param: Param[T]): Boolean = {<!-- -->
    shouldOwn(param)
    defaultParamMap.contains(param)
  }

  /**
   * Create a copy of this instance with the same UID and some extra parameters.
   * Subclasses should implement this method and set the return type correctly. See `defaultCopy()`.
   */
  def copy(extra: ParamMap): Params

  /**
   * Default implementation of copying with extra parameters.
   * Try to create a new instance with the same UID.
   * Then copies the embedding and extra parameters to a new instance, and returns that new instance.
   */
  protected final def defaultCopy[T <: Params](extra: ParamMap): T = {<!-- -->
    val that = this.getClass.getConstructor(classOf[String]).newInstance(uid)
    copyValues(that, extra).asInstanceOf[T]
  }

  /**
   * Extracts embedded default parameter values and user-supplied values and merges them with extra values from the input into a flat parameter map,
   * where the following value (if there is a conflict) will be used, i.e.: default parameter value < user-supplied value < extra value.
   */
  final def extractParamMap(extra: ParamMap): ParamMap = {<!-- -->
    defaultParamMap + + paramMap + + extra
  }

  /**
   * `extractParamMap` without extra values.
   */
  final def extractParamMap(): ParamMap = {<!-- -->
    extractParamMap(ParamMap.empty)
  }

  /** Internal parameter map for user-supplied values. */
  private[ml] val paramMap: ParamMap = ParamMap.empty

  /** Internal parameter map for default values. */
  private[ml] val defaultParamMap: ParamMap = ParamMap.empty

  /** Verify that the input parameter belongs to this instance. */
  private def shouldOwn(param: Param[_]): Unit = {<!-- -->
    require(param.parent == uid & amp; & amp; hasParam(param.name), s"Parameter $param does not belong to $this.")
  }

  /**
   * Copies parameter values from this instance to another instance so that they share parameters.
   *
   * This handles default parameters and explicitly set parameters separately.
   * Default parameters are copied from `defaultParamMap` to `paramMap`, while explicitly set parameters are copied from `paramMap` to `defaultParamMap`.
   * Warning: This assumes that this [[Params]] instance and the target instance share the same default parameter set.
   *
   * @param to target instance, which should share the same default parameter set as this source instance
   * @param extra Extra parameters to copy to the target's `paramMap`
   * @return The target instance with copied parameter value
   */
  protected def copyValues[T <: Params](to: T, extra: ParamMap = ParamMap.empty): T = {<!-- -->
    val map = paramMap + + extra
    params.foreach {<!-- --> param =>
      //Copy default parameters
      if (defaultParamMap.contains(param) & amp; & amp; to.hasParam(param.name)) {<!-- -->
        to.defaultParamMap.put(to.getParam(param.name), defaultParamMap(param))
      }
      //Copy explicitly set parameters
      if (map.contains(param) & amp; & amp; to.hasParam(param.name)) {<!-- -->
        to.set(param.name, map(param))
      }
    }
    to
  }
}

private[ml] object Params {<!-- -->
  /**
   * Set default parameter values for `Params`.
   */
  private[ml] final def setDefault[T](params: Params, param: Param[T], value: T): Unit = {<!-- -->
    params.defaultParamMap.put(param -> value)
  }
}

5. Official link

  • SparkMLlib – Params