2

So far I have close to working code that parses the document and gets heading, title and text between two titles. The content I am trying to extract has bullets, line break, etc and I would like to maintain the format when I paste it into a cell. Have been looking around and reading a lot of forums but unable to figure out how to keep the formatting intact. I looked into PasteSpecial but that pastes the content across multiple cells plus I would like to avoid copy/paste if possible.

Below's a very early code I have (has bugs that I am debugging/fixing):

Sub GetTextFromWord()

Dim Paragraph As Object, WordApp As Object, WordDoc As Object
Dim para As Object
Dim paraText As String
Dim outlineLevel As Integer
Dim title As String
Dim body As String
Dim myRange As Object
Dim documentText As String
Dim startPos As Long
Dim stopPos As Long
Dim file As String
Dim i As Long
Dim category As String

startPos = -1
i = 2

Application.ScreenUpdating = True
Application.DisplayAlerts = False


file = "C:\Sample.doc"
Set WordApp = CreateObject("Word.Application")
WordApp.Visible = True
Set WordDoc = WordApp.Documents.Open(file)

Set myRange = WordDoc.Range
documentText = myRange.Text

For Each para In ActiveDocument.Paragraphs
    ' Get the current outline level.
    outlineLevel = para.outlineLevel

    ' Cateogry/Header begins outline level 1, and ends at the next outline level 1.
    If outlineLevel = wdOutlineLevel1 Then 'e.g., 1 Header
        category = para.Range.Text
    End If

    ' Set category as value for cells in Column A
    Application.ActiveWorkbook.Worksheets("Sheet1").Cells(i - 1, 1).Value = category

    ' Title begins outline level 1, and ends at the next outline level 1.
    If outlineLevel = wdOutlineLevel2 Then ' e.g., 1.1
        ' Get the title and update cells in Column B
        title = para.Range.Text
        Application.ActiveWorkbook.Worksheets("Sheet1").Cells(i, 2).Value = title

        startPos = InStr(nextPosition, documentText, title, vbTextCompare)

        If startPos <> stopPos Then
            ' this is text between the two titles
            body = Mid$(documentText, startPos, stopPos)
            ActiveSheet.Cells(i - 1, 3).Value = body
        End If

        stopPos = startPos
        i = i + 1

    End If


Next para


WordDoc.Close
WordApp.Quit
Set WordDoc = Nothing
Set WordApp = Nothing
End Sub

Link to Sample Doc

3
  • the best way to keep formatting is to...copy & paste, unfortunately. So, first try to fully explore this direction. Obviously it's not the only option but the other one will double-triple your code (or even more). Link to your files is not working, asking for log-in :( Commented Aug 29, 2013 at 9:06
  • Thanks for your response. I tried copy/paste but problem I encountered was that the text is spreading across multiple cells. In excel, I want everything between 1.1 and 1.2 into one cell with some amount of formatting preserved (atleast the line breaks if nothing). Below link to the Word Doc should work without requiring to sign-in: docs.google.com/file/d/0B_UNDFf6UzJHZHk3VC0xelFnV0U/… Commented Aug 29, 2013 at 20:35
  • Do you know there is a maximum length of text you can store in an Excel Cell? e.g. 32767 characters in Excel 2007. Commented Sep 6, 2013 at 7:04

1 Answer 1

1

You probably found a solution by now, but what I would do is open excel, start the macro recording, then select a cell, click on the icon to expand the cell entry field, then paste some formatted text. Then stop the macro and view the code. The key is the pasting into the cell field at the top. Grab the bit of code that you need for your word macro. Hope this helps.

Sign up to request clarification or add additional context in comments.

1 Comment

This is probably more appropriate as a comment.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.