
1. Preface

Java is a cross-platform language, and the JDK achieves this by shipping a separate native implementation for each operating system. This article traces one very simple API call, FileInputStream.read, to show how that works.

2. Source code analysis

In FileInputStream.java, read delegates to readBytes, which is declared as a native method:

/**
     * Reads a subarray as a sequence of bytes.
     * @param b the data to be written
     * @param off the start offset in the data
     * @param len the number of bytes that are written
     * @exception IOException If an I/O error has occurred.
     */
    private native int readBytes(byte b[], int off, int len) throws IOException; // native call

    /**
     * Reads up to <code>b.length</code> bytes of data from this input
     * stream into an array of bytes. This method blocks until some input
     * is available.
     *
     * @param b the buffer into which the data is read.
     * @return the total number of bytes read into the buffer, or
     * <code>-1</code> if there is no more data because the end of
     * the file has been reached.
     * @exception IOException if an I/O error occurs.
     */
    public int read(byte b[]) throws IOException {
        return readBytes(b, 0, b.length);
    }

In the JDK source tree, FileInputStream.c (under /jdk/src/share/native/java/io) defines the C functions that implement the native methods of FileInputStream.java.

// FileInputStream.c

JNIEXPORT jint JNICALL
Java_java_io_FileInputStream_readBytes(JNIEnv *env, jobject this,
        jbyteArray bytes, jint off, jint len) {
    return readBytes(env, this, bytes, off, len, fis_fd);
}
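The long function name above is not arbitrary: the JVM locates the C function for a native method by a naming rule (the prefix Java_, the fully qualified class name, an underscore, then the method name). The following is a minimal sketch of that rule, assuming only the two simplest escapes ('.' or '/' becomes '_', and a literal '_' becomes "_1"). The function names jni_symbol_name and append_mangled are hypothetical, and the remaining escapes and the signature suffix used for overloaded native methods are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in out, applying two JNI escapes:
 * '.' and '/' become '_', and a literal '_' becomes "_1".
 * Returns 0 on success, -1 if out (capacity cap) is too small.
 */
static int append_mangled(char *out, size_t cap, const char *src)
{
    size_t n = strlen(out);
    for (; *src != '\0'; src++) {
        size_t need = (*src == '_') ? 2 : 1;
        if (n + need >= cap) {
            return -1;              /* no room for the characters plus NUL */
        }
        if (*src == '_') {
            out[n++] = '_';
            out[n++] = '1';
        } else {
            out[n++] = (*src == '.' || *src == '/') ? '_' : *src;
        }
        out[n] = '\0';              /* keep the string terminated at every step */
    }
    return 0;
}

/*
 * Build the C symbol name for a non-overloaded native method.
 * Hypothetical helper for illustration; not part of the JDK.
 */
int jni_symbol_name(char *out, size_t cap,
                    const char *class_name, const char *method)
{
    size_t n;

    assert(out != NULL && class_name != NULL && method != NULL);
    if (cap < sizeof "Java_") {
        return -1;
    }
    strcpy(out, "Java_");
    if (append_mangled(out, cap, class_name) != 0) {
        return -1;
    }
    n = strlen(out);
    if (n + 1 >= cap) {
        return -1;
    }
    out[n] = '_';                   /* separator between class and method */
    out[n + 1] = '\0';
    return append_mangled(out, cap, method);
}
```

Called with "java.io.FileInputStream" and "readBytes", the sketch produces the same name that FileInputStream.c uses for its entry point.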

Looking at the directories under /jdk/src, we can see that Java provides dedicated implementations for four typical Unix-like systems (bsd, linux, macosx, solaris) and for windows. The share directory holds the code common to all platforms.

First, initIDs caches the ID of the fd field, which readBytes later uses to reach the file descriptor (on Unix-like systems, a non-negative integer that identifies the open file):

// FileInputStream.c

JNIEXPORT void JNICALL
Java_java_io_FileInputStream_initIDs(JNIEnv *env, jclass fdClass) {
    fis_fd = (*env)->GetFieldID(env, fdClass, "fd", "Ljava/io/FileDescriptor;"); /* fd field, later used to get fd */
}

The JNI entry point then delegates to the shared readBytes function:

// io_util.c

jint
readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
          jint off, jint len, jfieldID fid)
{
    jint nread;
    char stackBuf[BUF_SIZE];
    char *buf = NULL;
    FD fd;

    if (IS_NULL(bytes)) {
        JNU_ThrowNullPointerException(env, NULL);
        return -1;
    }

    if (outOfBounds(env, off, len, bytes)) { /* bounds check on off and len */
        JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", NULL);
        return -1;
    }

    if (len == 0) {
        return 0;
    } else if (len > BUF_SIZE) {
        buf = malloc(len); /* stack buffer too small: allocate on the heap */
        if (buf == NULL) {
            JNU_ThrowOutOfMemoryError(env, NULL);
            return 0;
        }
    } else {
        buf = stackBuf;
    }

    fd = GET_FD(this, fid); /* get fd */
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        nread = -1;
    } else {
        nread = IO_Read(fd, buf, len); /* execute read, system call */
        if (nread > 0) {
            (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf);
        } else if (nread == -1) {
            JNU_ThrowIOExceptionWithLastError(env, "Read error");
        } else { /* EOF */
            nread = -1;
        }
    }

    if (buf != stackBuf) {
        free(buf); /* release the heap buffer if one was allocated */
    }
    return nread;
}
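The buffer handling in readBytes (a fixed stack buffer for small reads, malloc only when the request is larger) can be tried out without JNI. The sketch below is an illustration under stated assumptions, not JDK code: read_into and SKETCH_BUF_SIZE are hypothetical names, plain read(2) stands in for IO_Read, and memcpy stands in for SetByteArrayRegion. The threshold is deliberately tiny so that both paths are easy to exercise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical threshold for this sketch only. */
#define SKETCH_BUF_SIZE 16

/*
 * Read up to len bytes from fd and copy them to dst + off.
 * Returns the number of bytes read, 0 if len is 0, and -1 on EOF or error
 * (this sketch does not distinguish the two).
 */
int read_into(int fd, char *dst, size_t off, size_t len)
{
    char stack_buf[SKETCH_BUF_SIZE];
    char *buf;
    ssize_t nread;

    assert(dst != NULL);
    if (len == 0) {
        return 0;
    }
    if (len > SKETCH_BUF_SIZE) {
        buf = malloc(len);          /* request too large for the stack buffer */
        if (buf == NULL) {
            return -1;
        }
    } else {
        buf = stack_buf;            /* common case: no allocation at all */
    }

    nread = read(fd, buf, len);
    if (nread > 0) {
        memcpy(dst + off, buf, (size_t)nread);  /* copy out of the temporary buffer */
    } else {
        nread = -1;
    }

    if (buf != stack_buf) {
        free(buf);                  /* only the heap buffer needs freeing */
    }
    return (int)nread;
}
```

The design choice is that small reads, the common case, cost no heap allocation, while large reads still work at the price of one malloc and free per call.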

Next, let’s look at IO_Read, which is a macro that expands to handleRead:

#define IO_Read handleRead

handleRead has two implementations, one for Unix-like systems and one for Windows.

Solaris implementation:

// /jdk/src/solaris/native/java/io/io_util_md.c

ssize_t
handleRead(FD fd, void *buf, jint len)
{
    ssize_t result;
    RESTARTABLE(read(fd, buf, len), result);
    return result;
}

/*
 * Retry the operation if it is interrupted by a signal (errno == EINTR),
 * so that an interrupted call is not reported as a read error.
 */
#define RESTARTABLE(_cmd, _result) do { \
    do { \
        _result = _cmd; \
    } while((_result == -1) && (errno == EINTR)); \
} while(0)

For the read system call itself, see the Unix man page read(2).
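To see the retry loop in action without sending real signals, here is a small sketch. The macro has the same shape as the JDK one; everything else is hypothetical: flaky_call simulates a system call that fails twice with EINTR before succeeding, and read_restartable wraps read(2) the way handleRead does.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Same shape as the JDK macro: repeat the call while it fails with EINTR. */
#define RESTARTABLE(_cmd, _result) do { \
    do { \
        _result = _cmd; \
    } while ((_result == -1) && (errno == EINTR)); \
} while (0)

/* Bookkeeping for the simulated call below (hypothetical, for illustration). */
int interruptions_left = 2;
int attempts = 0;

/* Simulates a system call that is interrupted twice before succeeding. */
ssize_t flaky_call(void)
{
    attempts++;
    if (interruptions_left > 0) {
        interruptions_left--;
        errno = EINTR;
        return -1;
    }
    return 42;
}

/* Runs the simulated call under the retry loop. */
int call_with_retry(void)
{
    ssize_t result;
    RESTARTABLE(flaky_call(), result);
    return (int)result;
}

/* read(2) wrapped in the retry loop, in the spirit of handleRead. */
ssize_t read_restartable(int fd, void *buf, size_t len)
{
    ssize_t result;
    assert(buf != NULL);
    RESTARTABLE(read(fd, buf, len), result);
    return result;
}
```

With the initial values above, call_with_retry makes three attempts in total: two simulated interruptions followed by one success. Errors other than EINTR are not retried and come back to the caller as -1.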

Windows implementation:

// jdk/src/windows/native/java/io/io_util_md.c

JNIEXPORT
jint
handleRead(FD fd, void *buf, jint len)
{
    DWORD read = 0;
    BOOL result = 0;
    HANDLE h = (HANDLE)fd;
    if (h == INVALID_HANDLE_VALUE) {
        return -1;
    }
    result = ReadFile(h, /* File handle to read */
                      buf, /* address to put data */
                      len, /* number of bytes to read */
                      &read, /* number of bytes read */
                      NULL); /* no overlapped struct */
    if (result == 0) {
        int error = GetLastError();
        if (error == ERROR_BROKEN_PIPE) {
            return 0; /* EOF */
        }
        return -1;
    }
    return (jint) read;
}

3. A first look at Java exceptions

// jdk/src/share/native/common/jni_util.c

/**
 * Throw a Java exception by name. Similar to SignalError.
 */
JNIEXPORT void JNICALL
JNU_ThrowByName(JNIEnv *env, const char *name, const char *msg)
{
    jclass cls = (*env)->FindClass(env, name);

    if (cls != 0) /* Otherwise an exception has already been thrown */
        (*env)->ThrowNew(env, cls, msg); /* call JNI interface*/
}

/* JNU_Throw common exceptions */

JNIEXPORT void JNICALL
JNU_ThrowNullPointerException(JNIEnv *env, const char *msg)
{
    JNU_ThrowByName(env, "java/lang/NullPointerException", msg);
}

JNU_ThrowByName ends in a call to the JNI function ThrowNew, which is declared in jni.h:

// hotspot/src/share/vm/prims/jni.h

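/* C++ convenience wrapper, a member of struct JNIEnv_ */
jint ThrowNew(jclass clazz, const char *msg) {
    return functions->ThrowNew(this, clazz, msg);
}

/* Function-table entry, a member of struct JNINativeInterface_ */
jint (JNICALL *ThrowNew)
  (JNIEnv *env, jclass clazz, const char *msg);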
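Two points are easy to miss here. First, (*env)->ThrowNew(env, ...) is an ordinary call through a table of function pointers. Second, ThrowNew does not unwind the C stack: it only records a pending exception, which is why readBytes still has to return after each JNU_Throw call. The sketch below mimics both points in plain C. It is not real JNI; SketchEnv, SketchInterface, sketch_env, and the pending_* variables are hypothetical, and a real VM tracks the pending exception per thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct SketchInterface;

/* As in JNI's C binding, the env type is a pointer to the function table. */
typedef const struct SketchInterface *SketchEnv;

struct SketchInterface {
    int (*ThrowNew)(SketchEnv *env, const char *class_name, const char *msg);
};

/* Hypothetical "pending exception" slot, global here for simplicity. */
const char *pending_class = NULL;
const char *pending_msg = NULL;

/* Table entry: records the exception and returns normally to the caller. */
static int sketch_throw_new(SketchEnv *env, const char *class_name,
                            const char *msg)
{
    (void)env;
    pending_class = class_name;
    pending_msg = msg;
    return 0;
}

static const struct SketchInterface sketch_table = { sketch_throw_new };

/* Hands out an env pointer, as the VM does for every native call. */
SketchEnv *sketch_env(void)
{
    static SketchEnv env = &sketch_table;
    return &env;
}

/* Analogue of JNU_ThrowByName: dispatch through the table. */
void sketch_throw_by_name(SketchEnv *env, const char *name, const char *msg)
{
    assert(env != NULL && *env != NULL);
    (*env)->ThrowNew(env, name, msg);
}
```

After sketch_throw_by_name returns, control is still in the calling C function; the caller decides what to return, and in real JNI the Java exception surfaces only once the native method itself has returned.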

4. Summary

High-level languages offer different programming paradigms, but for operations such as file I/O they all end up in system calls reached through C code, and that C layer has room for low-level optimizations (the stack buffer in readBytes is one example). Understanding these low-level calls makes it easier to get to the root of a problem.

This article does not analyze JNI in depth; a later article will continue from here.