Document Information Extraction is a service provided on BTP. It uses machine learning to extract information from business documents that you upload, such as invoices and purchase orders.
The purpose of this blog post is to demonstrate how to integrate Document Information Extraction with a UI5 application. We will upload an invoice and display the extracted information in the app.
The code is available on GitHub as always.
Application behavior
When you upload an invoice PDF, the app posts the file to Document Information Extraction. Next, press the “Refresh” button until the extraction job finishes. Finally, the extracted data is displayed on the screen.
You can download sample invoices from the following tutorial page.
Use Machine Learning to Extract Information from Documents with Document Information Extraction Trial UI
Prerequisites for running the application
- An instance of Document Information Extraction and its service key (you can run the booster to create them automatically)
- A destination pointing to the Document Information Extraction API
Property | Value |
---|---|
Name | doc-info-extraction |
Type | HTTP |
URL | “url” in the service key + “/v1” |
Proxy Type | Internet |
Authentication | OAuth2ClientCredentials |
Client ID | “uaa.clientid” in the service key |
Client Secret | “uaa.clientsecret” in the service key |
Token Service URL | “uaa.url” in the service key + “/oauth/token” |
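To make the mapping in the table concrete, here is a minimal sketch that derives the destination values from a service key object. The `ServiceKey` interface and the `toDestinationProperties` function are hypothetical names for illustration, the keys of the returned object simply mirror the "Property" column of the table (they are not an official export format), and the example URLs are placeholders.

```typescript
// Hypothetical shape: only the service key fields named in the table above.
interface ServiceKey {
  url: string;
  uaa: { clientid: string; clientsecret: string; url: string };
}

// Remove trailing slashes so that appending a path does not produce "//".
function stripTrailingSlash(value: string): string {
  return value.replace(/\/+$/, "");
}

// Build the destination values; keys mirror the table's "Property" column.
function toDestinationProperties(key: ServiceKey): Record<string, string> {
  return {
    "Name": "doc-info-extraction",
    "Type": "HTTP",
    "URL": stripTrailingSlash(key.url) + "/v1",
    "Proxy Type": "Internet",
    "Authentication": "OAuth2ClientCredentials",
    "Client ID": key.uaa.clientid,
    "Client Secret": key.uaa.clientsecret,
    "Token Service URL": stripTrailingSlash(key.uaa.url) + "/oauth/token"
  };
}

// Placeholder values, not a real service key.
const exampleKey: ServiceKey = {
  url: "https://die.example.invalid/",
  uaa: {
    clientid: "my-client-id",
    clientsecret: "my-client-secret",
    url: "https://auth.example.invalid"
  }
};
const exampleDestination = toDestinationProperties(exampleKey);
console.log(exampleDestination["URL"]);
console.log(exampleDestination["Token Service URL"]);
```

A helper like this is only a convenience for checking the values before typing them into the destination editor.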
API used for the application
The application uses two API endpoints of Document Information Extraction. You can find API documentation here.
POST /document/jobs
This endpoint is used to upload a file along with options that tell the service what type of document you are uploading and which fields you want to get back. Instead of passing exact fields, you can also specify a template (you need to define it beforehand). For more information, please refer to the documentation.
The following screenshot shows a request executed from Postman. The Content-Type header is set to multipart/form-data.
GET /document/jobs/{id}
As you can see in the picture above, the POST request returns an id which you can use to retrieve the extraction results. At first the status may be “RUNNING”.
After some time (say, 10 seconds), the status will become “DONE” and you will get extraction results.
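The status change described above lends itself to polling. The following is a minimal sketch, not part of the sample app: `pollUntilDone`, `StatusFetcher`, the interval, and the attempt limit are all assumptions chosen for illustration. It treats every status other than “DONE” as "not finished yet" and gives up after a fixed number of attempts.

```typescript
// Minimal shape of the job response; only "status" is taken from the text above.
interface JobStatus {
  status: string;
  [key: string]: unknown;
}

// Any function that fetches the current job state, e.g. a GET to /document/jobs/{id}.
type StatusFetcher = () => Promise<JobStatus>;

// Call fetchStatus repeatedly until the job reports "DONE" or the attempts run out.
async function pollUntilDone(
  fetchStatus: StatusFetcher,
  intervalMs = 2000,
  maxAttempts = 15
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === "DONE") {
      return job;
    }
    if (attempt < maxAttempts) {
      // Wait before asking again.
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job not finished after ${maxAttempts} attempts`);
}
```

The status fetcher is passed in as a function so that the helper does not depend on how the request is made and can be tried out with a fake.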
UI5 code
The key parts are as follows.
- Uploading a file to Document Information Extraction
- Retrieving extraction results
I used the ts-app (TypeScript) template of generator-ui5.
Please note that the app needs to be deployed to BTP to function.
1. Uploading a file to Document Information Extraction
When you press the “Upload” button, the app gets the selected file and posts it to the /document/jobs endpoint. After a successful upload, you get an id which you’ll use later to fetch the extraction results.
public async handleUploadPress(): Promise<void> {
    if (this._jobId) {
        MessageBox.confirm((this.getResourceBundle() as ResourceBundle).getText("confirmText"), {
            onClose: async (oAction: string) => {
                if (oAction === "OK") {
                    this._resetData()
                    await this._uploadImage()
                }
            }
        })
    } else {
        await this._uploadImage()
    }
}

private async _uploadImage(): Promise<void> {
    // prepare form data
    const oFileUploader = this.byId("fileUploader") as FileUploader
    const oUploadedFile = oFileUploader.oFileUpload.files[0] as File
    const blob = new Blob([oUploadedFile], { type: oUploadedFile.type })
    const formData = new FormData()
    formData.append("file", blob, oUploadedFile.name)
    const options = (this.getOwnerComponent().getModel("options") as JSONModel).getData() as Options
    formData.append('options', JSON.stringify(options))
    // call die
    const response = await this._postToDie(formData)
    this._jobId = response.id;
    // enable refresh button
    (this.getView().getModel("viewModel") as JSONModel).setProperty("/refreshEnabled", true)
}

private async _postToDie(formData:FormData): Promise<Response> {
    const dieUrl = this._getbaseUrl() + "/document/jobs"
    const response = await fetch(dieUrl, {
        method: 'POST',
        body: formData
    })
    return response.json()
}

private _getbaseUrl(): string {
    const appId = this.getOwnerComponent().getManifestEntry("/sap.app/id")
    const appPath = appId.replaceAll(".", "/")
    const appModulePath = jQuery.sap.getModulePath(appPath) as string
    return appModulePath + "/doc-info-extraction"
}
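The `_postToDie` method above parses the response body without looking at the HTTP status. As a sketch of a more defensive variant, the function below rejects non-successful responses and checks that an id came back. `postJob`, `FetchLike`, and `HttpResponseLike` are hypothetical names introduced here, and the assumption that the id is a string is mine; the fetch function is passed in so the logic can be exercised without a browser.

```typescript
// Minimal subset of a fetch response that this sketch relies on.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Minimal subset of the fetch signature that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; body?: unknown }
) => Promise<HttpResponseLike>;

// Post the form data to /document/jobs and return the job id.
async function postJob(baseUrl: string, body: unknown, fetchFn: FetchLike): Promise<string> {
  const response = await fetchFn(baseUrl + "/document/jobs", { method: "POST", body });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP status ${response.status}`);
  }
  const payload = (await response.json()) as { id?: unknown };
  // Assumption: the service returns the job id as a string.
  if (typeof payload.id !== "string") {
    throw new Error("Response did not contain a job id");
  }
  return payload.id;
}
```

In the controller, the browser's global `fetch` could be wrapped in a small adapter and passed as `fetchFn`, with the result assigned to `this._jobId`.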
To post a job to Document Information Extraction, an “options” object is required, as described in the “API used for the application” section. For this sample app, the options are configured as below.
{
    "clientId": "default",
    "extraction": {
        "headerFields": [
            "documentNumber",
            "purchaseOrderNumber",
            "documentDate",
            "dueDate",
            "grossAmount",
            "currencyCode"
        ],
        "lineItemFields": [
            "description",
            "quantity",
            "unitOfMeasure",
            "unitPrice",
            "netAmount"
        ]
    },
    "documentType": "invoice"
}
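The upload code casts the model data to an `Options` type that is not shown in this post. One possible shape, inferred only from the JSON above, could look like the sketch below. The interface layout and the `serializeOptions` helper are assumptions for illustration, not the definitions used in the repository.

```typescript
// Possible shape of the options object, inferred from the JSON configuration above.
interface Options {
  clientId: string;
  documentType: string;
  extraction: {
    headerFields: string[];
    lineItemFields: string[];
  };
}

// The same values as in the JSON configuration above.
const invoiceOptions: Options = {
  clientId: "default",
  extraction: {
    headerFields: [
      "documentNumber",
      "purchaseOrderNumber",
      "documentDate",
      "dueDate",
      "grossAmount",
      "currencyCode"
    ],
    lineItemFields: ["description", "quantity", "unitOfMeasure", "unitPrice", "netAmount"]
  },
  documentType: "invoice"
};

// Serialize the options for the "options" form field, refusing an empty field selection.
function serializeOptions(options: Options): string {
  const { headerFields, lineItemFields } = options.extraction;
  if (headerFields.length === 0 && lineItemFields.length === 0) {
    throw new Error("At least one header field or line item field must be requested");
  }
  return JSON.stringify(options);
}

console.log(serializeOptions(invoiceOptions));
```

Typing the options this way lets the compiler catch a misspelled property name before the request is sent.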
2. Retrieving extraction results
When you press the “Refresh” button on the screen, the app fetches the extraction status from the /document/jobs/{id} endpoint. If the job is done, the extracted fields are stored in the “invoice” model and displayed on the UI.
* In a real-world scenario, it would be preferable to retrieve the results automatically, rather than having the user refresh the page.
public async onRefresh(): Promise<void> {
    const response = await this._getStatus()
    if (response.status === "DONE") {
        this._setInvoiceData(response.extraction)
        const viewModel = this.getView().getModel("viewModel") as JSONModel
        viewModel.setProperty("/refreshEnabled", false)
        viewModel.setProperty("/editable", true)
    } else if (response.status === "PENDING") {
        MessageToast.show((this.getResourceBundle() as ResourceBundle).getText("pendingText"))
    }
}

private async _getStatus(): Promise<any> {
    const dieUrl = this._getbaseUrl() + "/document/jobs" + "/" + this._jobId
    const response = await fetch(dieUrl, {
        method: 'GET'
    })
    return response.json()
}

private _setInvoiceData(extractedData: any): void {
    const invoice = {}
    // set header
    const invoiceHeader = (extractedData.headerFields as Item[]).reduce((acc, curr) => {
        acc[curr.name] = curr.value
        return acc
    }, {})
    // set items
    const invoiceItems = (extractedData.lineItems as Item[][]).reduce((acc, item) => {
        const lineItem = item.reduce((acc, curr) => {
            acc[curr.name] = curr.value
            return acc
        }, {})
        acc.push(lineItem)
        return acc
    }, [])
    invoice["header"] = invoiceHeader;
    invoice["items"] = invoiceItems;
    (this.getView().getModel("invoice") as JSONModel).setData(invoice)
}
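To make the transformation in `_setInvoiceData` easier to follow, here is a standalone, typed sketch of the same idea applied to a small made-up payload. The `Item` shape (a `name` and a `value`) is inferred from the code above; `toRecord`, `toInvoice`, and the sample values are hypothetical and only illustrate how name/value pairs become plain objects that can be bound in a view.

```typescript
// Inferred from the controller code: each extracted field has a name and a value.
interface Item {
  name: string;
  value: unknown;
}

interface ExtractionResult {
  headerFields: Item[];
  lineItems: Item[][];
}

type FlatRecord = Record<string, unknown>;

// Turn a list of name/value pairs into one object keyed by field name.
function toRecord(items: Item[]): FlatRecord {
  const record: FlatRecord = {};
  for (const item of items) {
    record[item.name] = item.value;
  }
  return record;
}

// Build the structure that the "invoice" model receives: one header object and a list of items.
function toInvoice(extraction: ExtractionResult): { header: FlatRecord; items: FlatRecord[] } {
  return {
    header: toRecord(extraction.headerFields),
    items: extraction.lineItems.map((fields) => toRecord(fields))
  };
}

// Made-up sample data, not a real service response.
const sampleExtraction: ExtractionResult = {
  headerFields: [
    { name: "documentNumber", value: "INV-001" },
    { name: "grossAmount", value: 120.5 }
  ],
  lineItems: [
    [
      { name: "description", value: "Widget" },
      { name: "quantity", value: 2 }
    ]
  ]
};

console.log(JSON.stringify(toInvoice(sampleExtraction)));
// {"header":{"documentNumber":"INV-001","grossAmount":120.5},"items":[{"description":"Widget","quantity":2}]}
```

With this structure, a header field can be bound with a path such as `/header/documentNumber` and the line items with `/items`.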
Closing
In this blog post, I have demonstrated how to upload a document to the Document Information Extraction service using UI5. I hope this post will help you implement your own scenarios, such as using custom document types.