Create complex Word (.docx) documents programmatically with docx4j


A couple of months ago I needed to create a dynamic Word document with a number of tables and paragraphs. In the past I’ve used Apache POI for this, but I found it hard to use, and it didn’t work that well for me when creating more complex documents. So for this project, after some searching around, I decided to use docx4j. According to its site:

"docx4j is a Java library for creating and manipulating Microsoft Open XML (Word docx, Powerpoint pptx, and Excel xlsx) files. It is similar to Microsoft's OpenXML SDK, but for Java. "

In this article I’ll show you a couple of examples you can use to generate content for Word documents. More specifically we’ll look at the following three examples:

  • Load in a template word document to add content to and save as new document
  • Add paragraphs to this template document
  • Add tables to this template document

The general approach here is to first create a Word document that contains the layout and main styles of your final document. In this document you’ll need to add placeholders (simple strings) that we’ll use to search for and replace with real content.

A very basic template looks like this, for instance: template.docx-1.png

In this article we’ll show you how to fill in this template to get this: template-out.docx (Compatibiliteitsmodus).png

Load in a template word document to add content to and save as new document

First things first. Let’s create a simple Word document that we can use as a template. For this just open Word, create a new document and save it as template.docx. This is the Word template we’ll add content to. The first thing we need to do is load this document with docx4j. You can do this with the following piece of Java code:

	private WordprocessingMLPackage getTemplate(String name) throws Docx4JException, FileNotFoundException {
		// load(File) avoids leaving an unclosed FileInputStream behind
		WordprocessingMLPackage template = WordprocessingMLPackage.load(new File(name));
		return template;
	}

This returns a Java object representing the complete document, which at this moment is still empty. We can now use the docx4j API to add, delete and modify content in this Word document. Docx4j has a number of helper classes you can use to traverse the document. I did write a couple of helpers myself, though, that make it really easy to find specific placeholders and replace them with real content. Let’s look at one of them. This operation is a wrapper around a couple of JAXB operations that lets you search a specific element and all its children for a certain class. You can, for instance, use it to get all the tables in the document, all the rows within a table, and so on.

	private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
		List<Object> result = new ArrayList<Object>();
		if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();
		
		if (obj.getClass().equals(toSearch))
			result.add(obj);
		else if (obj instanceof ContentAccessor) {
			List<?> children = ((ContentAccessor) obj).getContent();
			for (Object child : children) {
				result.addAll(getAllElementFromObject(child, toSearch));
			}

		}
		return result;
	}
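To see the recursion in isolation, here is a minimal sketch that runs without docx4j. The classes are plain-Java stand-ins that I made up for illustration: Container plays the role of ContentAccessor, Leaf the role of a Text element, and findAll mirrors the structure of getAllElementFromObject (minus the JAXBElement unwrapping).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-ins, NOT docx4j classes (hypothetical names).
class TreeSearchSketch {

    // Stand-in for ContentAccessor: anything that has child elements.
    interface Container {
        List<Object> getContent();
    }

    // Stand-in for a Text element.
    static class Leaf {
        final String value;
        Leaf(String value) { this.value = value; }
    }

    // Stand-in for a paragraph, table, row, ...
    static class Box implements Container {
        private final List<Object> content = new ArrayList<>();
        Box(Object... children) { content.addAll(Arrays.asList(children)); }
        public List<Object> getContent() { return content; }
    }

    // Same shape as getAllElementFromObject: an exact class match is collected,
    // anything else that has children is searched recursively.
    static <T> List<T> findAll(Object obj, Class<T> toSearch) {
        List<T> result = new ArrayList<>();
        if (obj.getClass().equals(toSearch)) {
            result.add(toSearch.cast(obj));
        } else if (obj instanceof Container) {
            for (Object child : ((Container) obj).getContent()) {
                result.addAll(findAll(child, toSearch));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Box doc = new Box(new Box(new Leaf("title")), new Box(new Box(new Leaf("SJ_EX1"))));
        for (Leaf leaf : findAll(doc, Leaf.class)) {
            System.out.println(leaf.value);
        }
    }
}
```

One consequence of the if/else structure, which also applies to the helper above: once an element matches, its children are not searched any further. A table nested inside another table would therefore not show up when you search for tables.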

Nothing too complex, but really helpful. Let’s see how we can use this operation. For this example we’ll just replace a simple text placeholder with a different value. This is something you’d use, for instance, to dynamically set the title of a document. First, though, add a custom placeholder to the Word template you created. I’ll use SJ_EX1 for this. We’ll replace this value with our name. The basic text elements in docx4j are represented by the org.docx4j.wml.Text class. To replace this simple placeholder all we have to do is call this method:

	private void replacePlaceholder(WordprocessingMLPackage template, String name, String placeholder ) {
		List<Object> texts = getAllElementFromObject(template.getMainDocumentPart(), Text.class);

		for (Object text : texts) {
			Text textElement = (Text) text;
			if (textElement.getValue().equals(placeholder)) {
				textElement.setValue(name);
			}
		}
	}

This looks up all the Text elements in the document, and each one whose value equals the placeholder exactly gets the value we specify. Now all we need to do is write the document back to a file.

	private void writeDocxToStream(WordprocessingMLPackage template, String target) throws IOException, Docx4JException {
		File f = new File(target);
		template.save(f);
	}

Not that hard as you can see.
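One thing to keep in mind is that replacePlaceholder only matches a Text element whose entire value equals the placeholder. If the placeholder is part of a longer piece of text, a substring replacement is needed instead. The sketch below shows that idea with plain strings standing in for the Text values; the class and the fill method are hypothetical names, not part of docx4j.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain strings stand in for the values of docx4j Text elements (assumption).
class PlaceholderSketch {

    // Returns a new list in which every occurrence of the placeholder is replaced,
    // whether it is the whole string or only a part of it.
    static List<String> fill(List<String> textValues, String placeholder, String value) {
        List<String> result = new ArrayList<>();
        for (String text : textValues) {
            // String.replace does a literal (non-regex) replacement
            result.add(text.replace(placeholder, value));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("SJ_EX1", "Dear SJ_EX1,", "unrelated");
        System.out.println(fill(texts, "SJ_EX1", "Jos"));
    }
}
```

In replacePlaceholder the same effect could probably be had by calling setValue with getValue().replace(placeholder, name) on each Text element. Neither variant helps when Word has split the placeholder over several runs, which can happen when you edit the placeholder after typing it, so it is worth checking the XML of the template if a placeholder is not found.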

With this setup we can also add more complex content to our Word documents. The easiest way to determine how to add specific content is by looking at the XML source of the Word document. That’ll tell you which wrappers are needed and how Word marshals the XML. For the next example we’ll look at how to add a complete paragraph.

Add paragraphs to this template document

You might wonder why we need to be able to add paragraphs. We can already add text, and isn’t a paragraph just a large piece of text? Well, yes and no. A paragraph does look like a big piece of text, but what you need to take into account are the line breaks. If you add a Text element, like we did earlier, and put line breaks in the text, they won’t show up. When you want line breaks, you’ll need to create a new paragraph. Luckily, though, this is also very easy to do with docx4j. We’ll do this by taking the following steps:

  1. Find the paragraph to replace from the template
  2. Split the input text into separate lines
  3. For each line create a new paragraph based on the paragraph from the template
  4. Remove the original paragraph

Shouldn’t be too hard with the helper methods we already have.

	private void replaceParagraph(String placeholder, String textToAdd, WordprocessingMLPackage template, ContentAccessor addTo) {
		// 1. get the paragraph
		List<Object> paragraphs = getAllElementFromObject(template.getMainDocumentPart(), P.class);

		P toReplace = null;
		for (Object p : paragraphs) {
			List<Object> texts = getAllElementFromObject(p, Text.class);
			for (Object t : texts) {
				Text content = (Text) t;
				if (content.getValue().equals(placeholder)) {
					toReplace = (P) p;
					break;
				}
			}
		}
		
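		// we now have the paragraph that contains our placeholder: toReplace
		// (stop here if it was not found, to avoid a NullPointerException below)
		if (toReplace == null) {
			return;
		}

		// 2. split into separate lines
		String as[] = StringUtils.splitPreserveAllTokens(textToAdd, '\n');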

		for (int i = 0; i < as.length; i++) {
			String ptext = as[i];
			
			// 3. copy the found paragraph to keep styling correct
			P copy = (P) XmlUtils.deepCopy(toReplace);
			
			// replace the text elements from the copy
			List<?> texts = getAllElementFromObject(copy, Text.class);
			if (texts.size() > 0) {
				Text textToReplace = (Text) texts.get(0);
				textToReplace.setValue(ptext);
			}
			
			// add the paragraph to the document
			addTo.getContent().add(copy);
		}
		
		// 4. remove the original one
		((ContentAccessor)toReplace.getParent()).getContent().remove(toReplace);
		
	}

In this method we copy the template paragraph once for each line of the supplied text, set the text of each copy, and add the new paragraphs to the element passed in as addTo. You can call it like this:

		String placeholder = "SJ_EX1";
		String toAdd = "jos\ndirksen";
		
		replaceParagraph(placeholder, toAdd, template, template.getMainDocumentPart());

If you run this with more content in your Word template you’ll notice that the paragraphs appear at the bottom of your document. The reason is that the paragraphs are added at the end of the main document. If you want your paragraphs to be added at a specific place in your document (which is usually what you want) you can wrap the placeholder in a 1x1 borderless table. The table cell is then the parent of the paragraph, and the new paragraphs can be added there.
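An alternative to the table trick, which is not what the code above does, is to look up the position of the template paragraph in its parent and insert the new paragraphs at that position. The sketch below shows the idea on a plain list of strings that stands in for the content list of the parent; replaceInPlace is a hypothetical helper. It uses String.split with a limit of -1, which for this input behaves like the splitPreserveAllTokens call used earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A list of strings stands in for the content list of the parent element (assumption).
class InPlaceSketch {

    // Replaces the first element equal to the placeholder with one element per
    // line of textToAdd, keeping the position of the original element.
    static void replaceInPlace(List<String> content, String placeholder, String textToAdd) {
        int index = content.indexOf(placeholder);
        if (index < 0) {
            return; // placeholder not present, nothing to do
        }
        // a limit of -1 keeps trailing empty lines
        String[] lines = textToAdd.split("\n", -1);
        content.remove(index);
        content.addAll(index, Arrays.asList(lines));
    }

    public static void main(String[] args) {
        List<String> content = new ArrayList<>(Arrays.asList("intro", "SJ_EX1", "outro"));
        replaceInPlace(content, "SJ_EX1", "jos\ndirksen");
        System.out.println(content);
    }
}
```

With docx4j the same idea would presumably work on the list returned by getContent() of the paragraph’s parent, assuming the paragraph is stored in that list directly. I haven’t used it that way in this article, so treat it as a starting point.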

Add tables to this template document

The final example I’d like to show is how to add tables to a Word template. A better description would actually be: how to fill predefined tables in your Word template. Just as we did for simple text and paragraphs, we’ll replace placeholders. For this example add a simple table to your Word document (which you can style as you like). Add one dummy row to this table that serves as the template for the content. In the code we’ll look for that row, copy it, and fill the copies with new content from Java code like this:

  1. find the table that contains one of our keywords
  2. copy the row that serves as row template
  3. for each row of data add a row to the table based on the row template
  4. remove the original template row

This is the same approach we used for the paragraphs. First, though, let’s look at how we’ll provide the replacement data. For this example I just supply a list of hash maps, one per row, that contain the name of each placeholder to replace and the value to replace it with. I also provide the replacement tokens that can be found in the table row.

		Map<String,String> repl1 = new HashMap<String, String>();
		repl1.put("SJ_FUNCTION", "function1");
		repl1.put("SJ_DESC", "desc1");
		repl1.put("SJ_PERIOD", "period1");

		Map<String,String> repl2 = new HashMap<String, String>();
		repl2.put("SJ_FUNCTION", "function2");
		repl2.put("SJ_DESC", "desc2");
		repl2.put("SJ_PERIOD", "period2");
		
		Map<String,String> repl3 = new HashMap<String, String>();
		repl3.put("SJ_FUNCTION", "function3");
		repl3.put("SJ_DESC", "desc3");
		repl3.put("SJ_PERIOD", "period3");
		
		replaceTable(new String[]{"SJ_FUNCTION","SJ_DESC","SJ_PERIOD"}, Arrays.asList(repl1,repl2,repl3), template);
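The three maps above repeat the same keys, so with more rows you may want to build them from the placeholder array. Here is a small sketch of that; the class and the row helper are hypothetical names I introduce only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowDataSketch {

    // Pairs each placeholder with the value at the same position.
    static Map<String, String> row(String[] placeholders, String... values) {
        if (placeholders.length != values.length) {
            throw new IllegalArgumentException("expected " + placeholders.length + " values");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < placeholders.length; i++) {
            result.put(placeholders[i], values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] placeholders = {"SJ_FUNCTION", "SJ_DESC", "SJ_PERIOD"};
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            rows.add(row(placeholders, "function" + i, "desc" + i, "period" + i));
        }
        System.out.println(rows);
    }
}
```

The resulting list has the same shape as Arrays.asList(repl1,repl2,repl3) and could be passed to replaceTable in the same way.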

Now what does this replaceTable method look like?

	private void replaceTable(String[] placeholders, List<Map<String, String>> textToAdd,
			WordprocessingMLPackage template) throws Docx4JException, JAXBException {
		List<Object> tables = getAllElementFromObject(template.getMainDocumentPart(), Tbl.class);

		// 1. find the table (null when no table contains the placeholder)
		Tbl tempTable = getTemplateTable(tables, placeholders[0]);
		if (tempTable == null) {
			return;
		}
		List<Object> rows = getAllElementFromObject(tempTable, Tr.class);

		// first row is header, second row is content
		if (rows.size() == 2) {
			// this is our template row
			Tr templateRow = (Tr) rows.get(1);

			for (Map<String, String> replacements : textToAdd) {
				// 2 and 3 are done in this method
				addRowToTable(tempTable, templateRow, replacements);
			}

			// 4. remove the template row
			tempTable.getContent().remove(templateRow);
		}
	}

This method finds the table, takes the second row as the template row (the first row is the header), and adds a new row to the table for each supplied map. Before returning it removes the template row. This method uses two helpers: addRowToTable and getTemplateTable. We’ll first look at the last one:

	private Tbl getTemplateTable(List<Object> tables, String templateKey) throws Docx4JException, JAXBException {
		for (Iterator<Object> iterator = tables.iterator(); iterator.hasNext();) {
			Object tbl = iterator.next();
			List<?> textElements = getAllElementFromObject(tbl, Text.class);
			for (Object text : textElements) {
				Text textElement = (Text) text;
				if (textElement.getValue() != null && textElement.getValue().equals(templateKey))
					return (Tbl) tbl;
			}
		}
		return null;
	}

This function just checks whether a table contains the given placeholder. If so, that table is returned. The addRowToTable operation is also very simple.

	private static void addRowToTable(Tbl reviewtable, Tr templateRow, Map<String, String> replacements) {
		Tr workingRow = (Tr) XmlUtils.deepCopy(templateRow);
		List<?> textElements = getAllElementFromObject(workingRow, Text.class);
		for (Object object : textElements) {
			Text text = (Text) object;
			String replacementValue = (String) replacements.get(text.getValue());
			if (replacementValue != null)
				text.setValue(replacementValue);
		}

		reviewtable.getContent().add(workingRow);
	}

This method copies our template row and replaces the placeholders in the copy with the provided values. The copy is then added to the table. And that’s it. With this piece of code we can fill arbitrary tables in our Word document, while preserving table layout and styling.
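To make the flow of replaceTable and addRowToTable easier to follow, here is the same copy-replace-remove sequence on plain collections, without docx4j. The table is a list of rows and each row a list of cell texts; the class name, the fill method and the header texts are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Lists of strings stand in for the Tbl and Tr elements (assumption).
class TableFillSketch {

    // Row 0 is the header, row 1 the template row holding the placeholders.
    static void fill(List<List<String>> table, List<Map<String, String>> data) {
        if (table.size() != 2) {
            return; // same guard as replaceTable: header plus one template row
        }
        List<String> templateRow = table.get(1);
        for (Map<String, String> replacements : data) {
            // copy the template row, then swap placeholders for values
            List<String> workingRow = new ArrayList<>(templateRow);
            for (int i = 0; i < workingRow.size(); i++) {
                String replacement = replacements.get(workingRow.get(i));
                if (replacement != null) {
                    workingRow.set(i, replacement);
                }
            }
            table.add(workingRow);
        }
        // remove the template row, which is still at index 1
        table.remove(1);
    }

    public static void main(String[] args) {
        List<List<String>> table = new ArrayList<>();
        table.add(Arrays.asList("Function", "Description", "Period"));
        table.add(Arrays.asList("SJ_FUNCTION", "SJ_DESC", "SJ_PERIOD"));

        Map<String, String> row1 = new HashMap<>();
        row1.put("SJ_FUNCTION", "function1");
        row1.put("SJ_DESC", "desc1");
        row1.put("SJ_PERIOD", "period1");

        fill(table, Arrays.asList(row1));
        System.out.println(table);
    }
}
```

Cells without a matching placeholder are left as they are, just like Text elements without a replacement value in addRowToTable.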

That’s it for this article. With paragraphs and tables you can create many different kinds of documents, which covers most of the documents that typically get generated. The same approach can also be used to add other types of content to Word documents.
