[Solved] python3 running spark program error: ModuleNotFoundError: No module named ‘_ctypes’

Table of Contents

1. Project Scenario:

2. Problem description:

3. Reason analysis:

4. Solution:


1. Project Scenario:

After installing Spark and Anaconda, I was going to use Python for Spark programming, but running the program with the python command raised this error. Here is how to solve it.

2. Problem description:

The Spark program I wrote:

from pyspark import SparkContext

# Count how many lines of Spark's bundled README contain 'a' or 'b'.
sc = SparkContext('local', 'test')
logFile = "file:///usr/local/spark/README.md"
logData = sc.textFile(logFile, 2).cache()
numAs = logData.filter(lambda line: 'a' in line).count()
numBs = logData.filter(lambda line: 'b' in line).count()
print('Lines with a: %s, Lines with b: %s' % (numAs, numBs))
sc.stop()

Running it with the python3 command fails with the error from the title:

ModuleNotFoundError: No module named '_ctypes'

3. Reason analysis:

After checking around, I found that the CentOS 7 system was missing the development package for the foreign function library (libffi). Python's standard-library ctypes module is a thin wrapper around the _ctypes C extension, which is built on top of libffi; the interpreter could not provide _ctypes, and the Spark program needs it at runtime, so this error was raised.
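A quick way to confirm the diagnosis, independent of Spark: ctypes imports the _ctypes C extension as soon as it is loaded, so a bare import reproduces the same error on an affected interpreter. A minimal sketch; the file name check_ctypes.py is just an illustration:

# check_ctypes.py -- run with the same python3 that failed above.
# ctypes is a pure-Python wrapper over the _ctypes C extension, which is
# compiled against libffi; on a broken build this import raises
# ModuleNotFoundError: No module named '_ctypes'.
import ctypes

print("ctypes loaded from:", ctypes.__file__)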

Reference article: link

4. Solution:

1. Install the development package for the foreign function library (libffi)

yum install libffi-devel -y

2. Reinstall Anaconda3. The environment variables that have already been configured do not need to be changed; just delete the anaconda3 directory and install it again.

2.1. Delete

rm -rf anaconda3    # run this from the directory that contains anaconda3, typically your home directory

2.2. Install

bash ...    # replace ... with your own Anaconda installation package (an Anaconda3-...-Linux-x86_64.sh file)
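After the reinstall, it is worth verifying that the new interpreter can actually use ctypes before rerunning the Spark job. A minimal sketch, assuming a Linux system with glibc (verify_fix.py is a hypothetical name):

# verify_fix.py -- run with the freshly installed Anaconda python3.
import ctypes

# Loading libc and calling time(NULL) exercises the libffi-backed
# _ctypes extension end to end.
libc = ctypes.CDLL("libc.so.6")
libc.time.restype = ctypes.c_long
print("ctypes works, time() =", libc.time(None))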

Run the Python file again; this time it runs successfully.
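The success screenshot from the original post is not reproduced here, but given the print statement in the program, the output has this form (the actual counts depend on the README.md that ships with your Spark version):

Lines with a: <count>, Lines with b: <count>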