Use Python to batch modify PPT fonts and extract all text to word

Directory

    • 1. Modify the font of each page in PPT
      • Python modification
      • VBA modification, modify the text box and chart font to Microsoft Yahei (commonly used)
      • VBA code base
    • 2. Put all the words in the text box into word
    • 3. Insert pictures and modify positions in PPT
      • 1. Insert pictures into PPT
      • 2. Modify the position of pictures in PPT

It is a common requirement to unify the font, size, and boldness of each page of a PPT. In particular, font unification is a high-frequency and hot demand. There is a bug in python-pptx, a common library for controlling PPT in python. Modification of fonts can only modify numbers and English letters, but not Chinese characters. Right now
The run.font.namet property can only modify English and numbers, and
run.font.name also recognizes English and numeric names. If the English and numbers in the text box are ‘Arial’ and the Chinese characters are in Song style, ‘Arial’ will be returned. Because this package does not have an API for Chinese characters, and this package has not been updated for a long time, the developer provided a solution by modifying the underlying xml of the office file to implement it, and modifying the typeface attribute of a:ea in the xml. Some people on the Internet have already used pptx_ea_font This package implements this functionality.

First install the corresponding package
The packages of pptx and docx are, please note not pptx and docx

pip install python-pptx
pip install python-docx

The installation method of pptx_ea_font is

pip install pptx_ea_font

Import the corresponding module

from pptx import Presentation
import pptx_ea_font
from docx import Document
from pptx.util import Cm, Pt

1. Modify the font of each page in the PPT

Python modification

1. You can modify the font, size, and whether to bold
2. The Chinese characters of graphics, charts, and tables cannot be modified yet. This function needs to be added in the next step.

The function is as follows:

def change_ppt_font(ppt_file, new_font,new_size=None,bold=None,line_spacing=None):
    #Open PPT file
    presentation = Presentation(ppt_file)

    # Loop through each slide
    for slide in presentation.slides:
        # Loop through each shape in the slide
        for shape in slide.shapes:
            # Check if the shape type is a text box
            if shape.has_text_frame:
                # Get the text in the text box
                text_frame = shape.text_frame
                for paragraph in text_frame.paragraphs:
                    if line_spacing is not None:
                        paragraph.line_spacing = line_spacing
                    for run in paragraph.runs:
                        #Modify font
                        pptx_ea_font.set_font(run,new_font)
                        #The following methods can only modify numbers and English
                        #run.font.name = new_font
                        if new_size :
                            run.font.size = Pt(new_size)
                        if bold is not None:
                            run.font.bold = bold

    # Save the modified PPT file
    new_ppt_file = ppt_file.replace(".pptx", "_new.pptx")
    presentation.save(new_ppt_file)

    print("Font modification completed!")

The above code can only modify the text box because the graphic is an msogroup object. If you want to modify the fonts in graphics, you need to use VBA. alt + F11 to insert the module, copy the following code and press F5
The code comes from the integration of existing codes on the Internet.
Note: The following code still cannot modify the text in the chart.

VBA modification, change the text box and chart font to Microsoft Yahei (commonly used)

Sub SetTextFontToYahei()

    Dim sld As Slide
    Dim shp As Shape, chd As Shape
    Dim i & amp;, j & amp;
    
    For Each sld In ActivePresentation.Slides
        i = i + 1
        Debug.Print "Slide " & amp; i
        
        For Each shp In sld.Shapes
            j=j+1
            Debug.Print vbTab & "Shape " & j
        
            If shp.Type = msoGroup Then
                For Each chd In shp.GroupItems
                    If chd.HasTextFrame Then
                        chd.TextFrame.TextRange.Font.Name = "Microsoft Yahei"
                        chd.TextFrame.TextRange.Font.NameFarEast = "Microsoft Yahei"
                    End If
                Next
            ElseIf shp.HasTextFrame Then
                shp.TextFrame.TextRange.Font.Name = "Microsoft Yahei"
                shp.TextFrame.TextRange.Font.NameFarEast = "Microsoft Yahei"
            End If
        Next
    Next
            
End Sub

Sub ChangeTableFontInAllSlides()
Dim oSlide As Slide
Dim oShape As Shape
Dim oTable As Table
Dim oRow As Row
Dim oCell As Cell
Dim oTxtRange As TextRange

On Error Resume Next
For Each oSlide In ActivePresentation.Slides
For Each oShape In oSlide.Shapes
If oShape.HasTable Then ' Process text in the table
Set oTable = oShape.Table
For Each oRow In oTable.Rows
For Each oCell In oRow.Cells
If oCell.Shape.HasTextFrame Then
Set oTxtRange = oCell.Shape.TextFrame.TextRange

With oTxtRange.Font
'─────────────────────────────
'Change to the font name you need
.Name = "Microsoft Yahei"
'.Size = 20 Modify to the font size you need
'.Color.RGB = RGB(255, 0, 0) Modify to the font color you need
'.Bold = True Modify it to your desired boldness or not
'.Italic = False Modify it to your desired preference
'.Underline = False Modify to your needs whether there is an underline
'─────────────────────────────
End With

End If
Next oCell
Next oRow
End If
Next oShape
Next oSlide
MsgBox "Text box and chart modification completed"
End Sub

VBA code library

The following code is more comprehensive and can modify tables and graphics, but cannot modify the text in charts and graphics aggregated by text boxes.

Sub ChangeFontInAllSlides()

Dim oSlide As Slide

Dim oShape As Shape

Dim oTable As Table

Dim oRow As Row

Dim oCell As Cell

Dim oTxtRange As TextRange

On Error Resume Next

For Each oSlide In ActivePresentation.Slides

For Each oShape In oSlide.Shapes

If oShape.HasTextFrame Then ' Process the text in the text box

Set oTxtRange = oShape.TextFrame.TextRange

With oTxtRange.Font

'─────────────────────────────

.Name = "Arial" ' Change to the font name you need

.Size = 20 ' Modify to your desired font size

.Color.RGB = RGB(255, 0, 0) ' Modify to the font color you need

.Bold = True ' Modify to your desired boldness or not

.Italic = False ' Modify to your desired whether it is italic or not

.Underline = False ' Modify to what you need whether there is an underline

'─────────────────────────────

End With

End If

If oShape.HasTable Then ' Process text in the table

Set oTable = oShape.Table

For Each oRow In oTable.Rows

For Each oCell In oRow.Cells

If oCell.Shape.HasTextFrame Then

Set oTxtRange = oCell.Shape.TextFrame.TextRange

With oTxtRange.Font

'─────────────────────────────

.Name = "Arial" ' Change to the font name you need

.Size = 20 ' Modify to your desired font size

.Color.RGB = RGB(255, 0, 0) ' Modify to the font color you need

.Bold = True ' Modify to your desired boldness or not

.Italic = False ' Modify to your desired whether it is italic or not

.Underline = False ' Modify to what you need whether there is an underline

'─────────────────────────────

End With

End If

Next oCell

Next oRow

End If

Next oShape

Next oSlide

End Sub

2. Put all the words in the text box into word

def extract_text_from_ppt(ppt_file, word_file):
    #Open PPT file
    presentation = Presentation(ppt_file)

    #Create a new Word document
    word_doc = Document()

    # Loop through each slide
    for slide in presentation.slides:
        # Loop through each shape in the slide
        for shape in slide.shapes:
            # Check if the shape type is a text box
            if shape.has_text_frame:
                # Get the text in the text box
                text_frame = shape.text_frame
                for paragraph in text_frame.paragraphs:
                    # Extract text into Word
                    word_doc.add_paragraph(paragraph.text)

    # Save Word document
    word_doc.save(word_file)

    print("Text extraction completed!")

3. Insert pictures and modify positions in PPT

1. Insert pictures into PPT

Sub InsertPicturesToPPT()
  Dim sld As Slide
  Dim shp As Shape
  Dim i, count, numPicturePerSlide, curSlide As Long
  Dim slideWidth, slideHeight As Single
  Dim autoAddSlide As Boolean

  curSlide = ActiveWindow.View.Slide.SlideIndex

  'Use this variable to set the number of pictures to be inserted into each PPT page
  numPicturePerSlide = 1
  'Use this variable to set whether to automatically add a new PPT page to insert all selected pictures when the number of pages is insufficient. Set it to False to cancel this function.
  autoAddSlide = True

  fd = Split(FileDialogOpen, vbLf)
  If Left(fd(0), 1) = "-" Then
    Debug.Print "Canceled"
    Exit Sub
  End If

  slideWidth = ActivePresentation.PageSetup.slideWidth
  slideHeight = ActivePresentation.PageSetup.slideHeight

  If autoAddSlide Then
    If (ActivePresentation.Slides.count - curSlide + 1) * numPicturePerSlide < UBound(fd) - LBound(fd) + 1 Then
      total = Int((UBound(fd) - LBound(fd) + 1) / numPicturePerSlide - ActivePresentation.Slides.count + curSlide - 1 + 0.5)
      For i = ActivePresentation.Slides.count + 1 To ActivePresentation.Slides.count + total
        ' Add a blank page at the end
        'ActivePresentation.Slides.Add i, ppLayoutBlank
        ' Add a blank page after the current page
        ' ActivePresentation.Slides.Add curSlide, ppLayoutBlank
        ' ActivePresentation.Slides(curSldie - 1).Select
        ' Copies the current page after the current page
        ActivePresentation.Slides(curSlide).Duplicate
      Next i
    End If
  End If

  count = 0
  For Each sld In ActivePresentation.Slides
    ' Skip hidden PPT pages and skip pages before the current page
    Debug.Print sld.SlideIndex & amp; " >= " & amp; curSlide
    If sld.SlideShowTransition.Hidden = msoFalse And sld.SlideIndex >= curSlide Then
      If count + LBound(fd) > UBound(fd) Then
        ' No picture to insert
        Exit For
      End If

      For i = 1 To numPicturePerSlide
        If count + LBound(fd) <= UBound(fd) Then
          Set shp = sld.Shapes.AddPicture(_
            FileName:=fd(count + LBound(fd)), _
            LinkToFile:=msoFalse, _
            SaveWithDocument:=msoTrue, _
            Left:=0, _
            Top:=0, _
            Width:=-1, _
            Height:=-1 _
          )
          With shp
            .LockAspectRatio = msoTrue ' Lock aspect ratio
            '.ScaleHeight 0.75, msoTrue
            .Left = slideWidth / numPicturePerSlide * i - .Width / 2
            .Top = (slideHeight - .Height) / 2
            '.ZOrder msoSendToBack ' Set the picture to the bottom layer
          End With
          count = count + 1
        Else
          Exit For
        End If
      Next i
    End If
  Next sld

  'MsgBox "Processing finished. Inserted (" & amp; count & amp; ") pictures in total"
  MsgBox "Inserting pictures is completed, a total of " & amp; count & amp; " pictures"

End Sub

Function FileDialogOpen() As String

  #If Mac Then
    ' default path
    mypath = MacScript("return (path to desktop folder) as String")

    sMacScript = "set applescript's text item delimiters to "","" " & amp; vbNewLine & amp; _
    "try " & amp; vbNewLine & amp; _
    "set theFiles to (choose file of type {""png"", ""jpg"", ""jpeg"", ""svg"" , ""tiff"", ""gif""}" & amp; _
    "with prompt ""Please select one or more pictures to insert"" default location alias """ & amp; _
    mypath & amp; """ multiple selections allowed true)" & amp; vbNewLine & amp; _
    "set applescript's text item delimiters to """" " & amp; vbNewLine & amp; _
    "on error errStr number errorNumber" & amp; vbNewLine & amp; _
    "return errorNumber " & amp; vbNewLine & amp; _
    "end try " & amp; vbNewLine & amp; _
    "repeat with i from 1 to length of theFiles" & amp; vbNewLine & amp; _
    "if i = 1 then" & amp; vbNewLine & amp; _
    "set fpath to POSIX path of item i of theFiles" & amp; vbNewLine & amp; _
    "else" & amp; vbNewLine & amp; _
    "set fpath to fpath & amp; """ & amp; vbNewLine & amp; _
    """ & amp; POSIX path of item i of theFiles" & amp; vbNewLine & amp; _
    "end if" & amp; vbNewLine & amp; _
    "end repeat" & amp; vbNewLine & amp; _
    "return fpath"

    FileDialogOpen = MacScript(sMacScript)

  #Else
    With Application.FileDialog(msoFileDialogOpen)
      .AllowMultiSelect = True
      .Title = "Please select one or more pictures to insert"
      .Filters.Add "picture", "*.png; *.jpg; *.jpeg; *.svg; *.tiff; *.gif", 1
      If .Show = -1 Then
        FileDialogOpen = ""
        For i = 1 To .SelectedItems.count
          If i = 1 Then
            FileDialogOpen = .SelectedItems.Item(i)
          Else
            FileDialogOpen = FileDialogOpen & amp; vbLf & .SelectedItems.Item(i)
          End If
        Next
      Else
        FileDialogOpen = "-"
      End If
    End With

  #EndIf
End Function

2. Modify the position of pictures in PPT

Sub test()
'Get all ppt pages
For Each currentSlide In ActivePresentation.Slides 'Loop through each page
    For Each shp In currentSlide.Shapes
      'type = 13 is a picture and 17 is a text box
      If shp.Type = 13 Then
       shp.Top = 10 'Set the top position
       shp.Left = 10 'Set the left position
        shp.Height = 10000 'Set the image height position
        shp.Width = 600
        End If
    Next shp
 
Next currentSlide

End Sub