
Gini Health API Documentation v2.0

Introduction

Gini provides an information extraction system for analyzing documents in the health domain, such as doctors' invoices. The system can extract fields such as the document sender or the amount to pay, as well as line items and many other types of specific information, from a wide variety of invoice formats.

The API supports PDF, GIF, JPEG, PNG, and TIFF documents.

Information extraction starts when a document is sent to the extraction system. There, the document is first verified and then classified as either native or scanned.

There is a difference between native and scanned PDF files. Native PDFs are generated directly from digital content by software such as Microsoft Word, Excel or Illustrator. Scanned PDFs are created by scanning devices from actual paper documents.

Native PDF documents already contain their text in machine-readable form and are processed accordingly. Scanned documents, however, contain only images of the pages and therefore do not directly provide information that the system can read. For these documents, the extraction system has to apply Optical Character Recognition (OCR) and various computer vision techniques to obtain the document contents.

Once the layout and the textual contents of the uploaded document are available, the system starts extracting semantic information, such as the document sender (name, address), and meta information, such as the document type (invoice, contract).

The system may sometimes be unable to extract the information correctly. This most likely happens because of OCR errors caused by the poor quality of a scanned document, incomplete textual data or a very specific document layout. In such cases you can still correct the extractions, for example by manually selecting the correct amount to pay on the document and submitting it back to the API. The extraction system receives this feedback and uses it to improve its self-learning algorithms over time.

If you have any questions about the Gini Health API and the functionality it provides, please contact us via api@gini.net.

Getting started

Welcome! To process your first document with the Gini Health API, perform the following steps:

  1. register your application
  2. obtain an access token
  3. upload a document
  4. check the document status information
  5. retrieve the extractions
  6. send feedback

For general information about the Gini Health API, see the overview.

Register Your Application

Before you can use the Gini Health API in your application, you need a valid client ID and a client secret. If you don't have the client ID and the client secret already, please contact your sales representative.

Obtain an Access Token

obtain an access token

curl -v -X POST --data-urlencode 'username=random@example.org' \
        --data-urlencode 'password=geheim' \
        -H 'Content-Type: application/x-www-form-urlencoded' \
        -H 'Accept: application/json' \
        -u 'client-id:client-secret' \
        'https://user.gini.net/oauth/token?grant_type=password'

the JSON response will look similar to

{
  "access_token":"6c470ffa-abf1-41aa-b866-cd3be0ee84f4",
  "token_type":"bearer",
  "expires_in":3599
}

6c470ffa-abf1-41aa-b866-cd3be0ee84f4 is the access token which can be used for API requests.

All requests to the Gini Health API are made on behalf of the user authorized by the access token. For now, let's assume that you've already created an anonymous user. If not, see Direct Communication between Client Devices and the Gini Health API for details on how to do so.

To get an access token for the Gini account, run the example command above (don't forget to replace random@example.org with your username, geheim with your password, client-id with your client ID and client-secret with your client secret).
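When scripting these steps, you usually want only the token value rather than the whole JSON response. The sketch below assumes the flat response shape shown above and extracts access_token with sed; the helper names and the GINI_* environment variables are hypothetical, and a real JSON parser such as jq is a more robust choice.

```shell
#!/bin/sh
# Print the access_token value from a token endpoint JSON response.
# Naive sed extraction: assumes a flat response like the example above.
extract_access_token() {
  printf '%s\n' "$1" |
    sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: request a user token and print only the token.
# GINI_USERNAME, GINI_PASSWORD, GINI_CLIENT_ID and GINI_CLIENT_SECRET are
# placeholder environment variables, not names defined by the API.
gini_user_token() {
  response=$(curl -s -X POST \
    --data-urlencode "username=$GINI_USERNAME" \
    --data-urlencode "password=$GINI_PASSWORD" \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -H 'Accept: application/json' \
    -u "$GINI_CLIENT_ID:$GINI_CLIENT_SECRET" \
    'https://user.gini.net/oauth/token?grant_type=password') || return 1
  extract_access_token "$response"
}
```

For example, TOKEN=$(gini_user_token) stores the token for the Authorization header of later requests. Keep in mind that the token expires after expires_in seconds.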

Upload a Document

upload a document

curl -v -X POST --data-binary '@/path/to/your/document.pdf' \
     -H 'Accept: application/vnd.gini.v2+json' \
     -H 'Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4' \
     'https://health-api.gini.net/documents'

Now that you have the access token, you can upload your first document by sending an API request. The request must specify the corresponding Gini Health API version in the Accept header. For our first document we will use version v2 of the Gini Health API; the command above sends the request against that version of the API.

the response (in case the document was accepted)

HTTP/1.1 201 Created
X-Request-Id: 7b5a7f79-ae7c-4040-b6cf-25cde58ad937
Location: https://health-api.gini.net/documents/b4bd3e80-7bd1-11e4-95ab-000000000000
Content-Type: application/vnd.gini.v2+json

If the file was accepted by the Gini Health API (i.e. its file format is supported), the extraction system automatically starts processing the document and responds with HTTP status code 201 and the document location URL in the Location header.
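The document URL is returned in the Location response header, not in the body. A minimal sketch, assuming the headers are dumped with curl's -D option; extract_location and gini_upload are hypothetical helper names and TOKEN is a placeholder for the access token.

```shell
#!/bin/sh
# Print the Location header value from a raw HTTP header block.
# The header name is matched case-insensitively; carriage returns are removed.
extract_location() {
  printf '%s\n' "$1" | tr -d '\r' |
    sed -n 's/^[Ll][Oo][Cc][Aa][Tt][Ii][Oo][Nn]:[[:space:]]*//p' |
    head -n 1
}

# Hypothetical helper: upload a file and print the new document URL.
# -D - writes the response headers to stdout, -o /dev/null drops the body.
gini_upload() {
  headers=$(curl -s -o /dev/null -D - -X POST \
    --data-binary "@$1" \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H "Authorization: BEARER $TOKEN" \
    'https://health-api.gini.net/documents') || return 1
  extract_location "$headers"
}
```

For example, DOCUMENT_URL=$(gini_upload /path/to/your/document.pdf) keeps the URL for the status checks described next.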

Check the Document Status Information

check the document status information

curl -v -H 'Accept: application/vnd.gini.v2+json' \
        -H 'Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4' \
        'https://health-api.gini.net/documents/b4bd3e80-7bd1-11e4-95ab-000000000000'

the response body will look similar to

{
  "_links": {
    "processed": "https:\/\/health-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/processed",
    "layout": "https:\/\/health-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/layout",
    "extractions": "https:\/\/health-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/extractions",
    "document": "https:\/\/health-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000"
  },
  "sourceClassification": "NATIVE",
  "origin": "UPLOAD",
  "progress": "PENDING",
  "creationDate": 1417710133864,
  "pages": [
    {
      "images": {
        "1280x1810": "https:\/\/health-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/pages\/1\/1280x1810",
        "750x900": "https:\/\/health-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/pages\/1\/750x900"
      },
      "pageNumber": 1
    }
  ],
  "pageCount": 1,
  "name": "Document",
  "id": "b4bd3e80-7bd1-11e4-95ab-000000000000"
}

Document processing takes a bit of time, so to get the extractions you need to check the status of the document periodically. The status (the progress field) can have the value PENDING, which means that the document is still being analyzed, COMPLETED, which means that the analysis is complete and the extractions are ready for retrieval, or ERROR, which means that processing failed. Check the current document status by sending a GET request to the URL that you received in the Location header when the document was uploaded. Once the status changes to COMPLETED, you can retrieve the extractions.
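The polling described above can be sketched as a loop that stops on COMPLETED or ERROR. This is an illustration under assumptions: the helper names, the POLL_INTERVAL and POLL_ATTEMPTS variables and their defaults are hypothetical, and the progress field is read with sed, which is only adequate for simple responses like the ones shown here.

```shell
#!/bin/sh
# Print the value of the "progress" field from a document JSON response.
extract_progress() {
  printf '%s\n' "$1" |
    sed -n 's/.*"progress"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p' |
    head -n 1
}

# Run a fetch command repeatedly until progress is COMPLETED or ERROR.
# Returns 0 on COMPLETED, 2 on ERROR, 1 if still PENDING after all attempts.
wait_for_document() {
  fetch_cmd=$1
  interval=${POLL_INTERVAL:-2}   # seconds between checks (illustrative)
  attempts=${POLL_ATTEMPTS:-30}  # maximum number of checks (illustrative)
  while [ "$attempts" -gt 0 ]; do
    progress=$(extract_progress "$($fetch_cmd)")
    case "$progress" in
      COMPLETED) return 0 ;;     # extractions are ready
      ERROR) return 2 ;;         # processing failed; stop polling
    esac
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1
}

# Hypothetical fetch command; DOCUMENT_URL and TOKEN are placeholders.
fetch_document() {
  curl -s -H 'Accept: application/vnd.gini.v2+json' \
    -H "Authorization: BEARER $TOKEN" "$DOCUMENT_URL"
}
```

Usage could look like wait_for_document fetch_document, after setting DOCUMENT_URL to the Location value from the upload. Return code 0 means the extractions can be fetched, 2 means processing failed and 1 means the document was still PENDING when the loop gave up.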

Retrieve the Extractions

retrieve the extractions

curl -v -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4' \
    'https://health-api.gini.net/documents/b4bd3e80-7bd1-11e4-95ab-000000000000/extractions'

The document extractions represent the document contents that the extraction system was able to understand and retrieve. To get all the extractions, send the request shown above (note the API version v2 in the Accept header).

example response

{
  "extractions": {
    "docType": {
      "value": "Invoice",
      "entity": "doctype",
      "confidence": 0.923
    },
    "amountToPay": {
      "candidates": "amounts",
      "box": {
        "page": 1,
        "height": 9.0,
        "width": 30.870000000000005,
        "left": 524.13,
        "top": 357.89
      },
      "value": "12.00:EUR",
      "entity": "amount"
    },
    "customerId": {
      "candidates": "customerIds",
      "box": {
        "page": 1,
        "height": 7.0,
        "width": 31.139999999999986,
        "left": 470.0,
        "top": 152.89
      },
      "confidence": 0.821,
      "value": "20980000",
      "entity": "customerid"
    },
    "invoiceId": {
      "candidates": "invoiceIds",
      "box": {
        "page": 1,
        "height": 7.0,
        "width": 38.920000000000016,
        "left": 470.0,
        "top": 143.89
      },
      "confidence": 0.971,
      "value": "3113805926",
      "entity": "invoiceid"
    },
    "senderName": {
      "candidates": "senderNames",
      "box": {
        "page": 1,
        "height": 7.0,
        "width": 52.56000000000001,
        "left": 41.87,
        "top": 88.84
      },
      "value": "Deutsche Post AG",
      "entity": "companyname"
    }
  },
  "compoundExtractions": {
    "lineItems": [
      {
        "sumNet": {
          "entity": "amount",
          "value": "12.00:EUR",
          "box": {
            "top": 355.83,
            "left": 525.17,
            "width": 38.92000000000007,
            "height": 10.0,
            "page": 1
          }
        },
        "taxRate": {
          "entity": "text",
          "value": "19 %",
          "box": {
            "top": 355.83,
            "left": 388.18,
            "width": 20.00999999999999,
            "height": 10.0,
            "page": 1
          }
        }
      },
      {
        "artNumber": {
          "entity": "text",
          "value": "10101",
          "box": {
            "top": 388.43,
            "left": 82.05,
            "width": 20.0,
            "height": 10.0,
            "page": 1
          }
        }
      }
    ]
  },
  "candidates": {
    "amounts": [
      {
        "box": {
          "page": 1,
          "height": 9.0,
          "width": 30.870000000000005,
          "left": 524.13,
          "top": 357.89
        },
        "value": "12.00:EUR",
        "entity": "amount"
      },
      {
        "box": {
          "page": 1,
          "height": 12.0,
          "width": 40.89999999999998,
          "left": 138.02,
          "top": 413.09
        },
        "value": "12.00:EUR",
        "entity": "amount"
      }
    ],
    "senderNames": [
      {
        "box": {
          "page": 1,
          "height": 7.0,
          "width": 52.56000000000001,
          "left": 41.87,
          "top": 88.84
        },
        "value": "Deutsche Post AG",
        "entity": "companyname"
      }
    ],
    "customerIds": [
      {
        "box": {
          "page": 1,
          "height": 7.0,
          "width": 31.139999999999986,
          "left": 470.0,
          "top": 152.89
        },
        "value": "20980000",
        "entity": "customerid"
      }
    ],
    "invoiceIds": [
      {
        "box": {
          "page": 1,
          "height": 7.0,
          "width": 38.920000000000016,
          "left": 470.0,
          "top": 143.89
        },
        "value": "3113805926",
        "entity": "invoiceid"
      }
    ]
  }
}

The returned object contains specific extractions (a value with some specific semantic property), compound extractions (a group of values with some specific semantic property) as well as candidates (a list of values for some semantic property).

The example response (shortened, above) is an invoice (see docType) issued by Deutsche Post AG with invoice number 3113805926 (see invoiceId). The receiver of the invoice has to pay 12€ (see amountToPay). Its line items (see lineItems) include an article number 10101, a tax rate of 19% and a net amount of 12€ (see sumNet).
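Amount extractions such as amountToPay are encoded as a single string combining the amount and the currency code, e.g. 12.00:EUR. Assuming that amount:currency layout (inferred from the examples in this document), the two parts can be separated with plain shell parameter expansion; the function names are hypothetical.

```shell
#!/bin/sh
# Numeric part of an amount value such as "12.00:EUR" (text before the colon).
amount_value() {
  printf '%s\n' "${1%%:*}"
}

# Currency part (text after the colon); prints an empty line if there is none.
amount_currency() {
  case "$1" in
    *:*) printf '%s\n' "${1#*:}" ;;
    *)   printf '\n' ;;
  esac
}

# Human-readable form, e.g. "12.00 EUR".
format_amount() {
  printf '%s %s\n' "$(amount_value "$1")" "$(amount_currency "$1")"
}
```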

Send Feedback and Get Even Better Extractions Next Time

Feedback is an API request containing the correct extractions, which you send to improve the future extraction accuracy of the system. Your application should always send at least some feedback. The more complete and accurate the feedback is, the sooner the extraction system learns what is correct and what is not. Feedback is critical to us and important to you, because it is the only way for us to know in real time whether the extraction system is delivering the best possible quality for your application.

To tell the system whether an extraction was correct or incorrect, send back the correct value in the feedback request. The value must be exactly as it appears on the actual document (not calculated or inferred). Once the feedback is received, it is compared to the extracted value; the result is used in reports and fed into the self-learning mechanism of the Gini extraction system.

Overview of the Gini Health API

This section provides general information about the Gini Health API. If you want a step-by-step guide on how to upload your first document and retrieve its semantic content, have a look at the Getting started guide.

IPv6 Compatibility

IPv6 compatibility example

    $ host health-api.gini.net
    health-api.gini.net has address 46.245.182.123
    health-api.gini.net has IPv6 address 2a00:14e0:600:1500:d0c5::7

    $ host user.gini.net
    user.gini.net has address 46.245.182.124
    user.gini.net has IPv6 address 2a00:14e0:600:1500:d0c5::2

Gini Health API and User Center are accessible from legacy IPv4 and IPv6 networks. The protocol precedence depends on your operating system and configuration if both protocols are enabled.

Media Types

the media types consumed and produced by the Gini Health API look like this

    application/vnd.gini.<version>+json

Custom media types are used in the API to let consumers choose the version of the data format they wish to receive. This is done by adding one or more of the following media types to the Accept header of a request. Media types are specific to resources, allowing them to change independently and to support formats that other resources don't.

API Versions

Currently there is one stable version of the Gini Health API. Future versions can be requested using a specific Accept header. This is primarily for testing new extractions but may affect other parts of the API as well.

Version 2 (v2)

v2 media type

    Accept: application/vnd.gini.v2+json

Gini Health API v2 is stable and will remain backwards compatible. Please contact us via api@gini.net if you have any problems.

Developers are strongly encouraged to explicitly specify the required version of the Gini Health API using the HTTP/1.1 Accept header (see the example) because by default the requests are treated as requests to version 1 (v1) of the API.

Authentication

Only authenticated users are allowed to make API requests. The Gini API uses the OAuth 2.0 protocol with bearer tokens for authentication.

In order to use the API in your application, you first have to register your application with Gini. Afterwards your application should request an access token from the Gini Authorization Server and use it to access the Gini Health API.

Security

The Gini Health API is only accessible over HTTPS. Please make sure that your application validates the relevant X.509 certificates (e.g. common name matches hostname, issuing CA is trusted, etc.).

Client Errors

HTTP response codes

The API uses idiomatic HTTP status codes to indicate if a request was successful or not and whether it should be repeated.

Code  Description
2xx   The request was successful.
4xx   The request was not successful. See the response body for details. Retrying with the same arguments will not work.
5xx   Some error occurred while processing the request. Please try again.
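The rules in the table translate into a simple client policy: accept 2xx, fix the request on 4xx, and retry 5xx after a delay. The sketch below encodes that policy; the function names, the MAX_ATTEMPTS variable and the exponential backoff are illustrative assumptions, not limits defined by the API.

```shell
#!/bin/sh
# Map an HTTP status code to the action suggested by the table above.
classify_status() {
  case "$1" in
    2??) echo success ;;        # request succeeded
    4??) echo client_error ;;   # fix the request; retrying unchanged will not help
    5??) echo retry ;;          # server-side error; try again later
    *)   echo unexpected ;;     # e.g. 000 when curl could not connect
  esac
}

# Hypothetical wrapper: run curl with the given arguments, retrying on 5xx
# with exponential backoff (1 s, 2 s, 4 s, ...). Prints the final status
# code; returns 0 only on 2xx. MAX_ATTEMPTS defaults to an arbitrary 5.
gini_request_with_retry() {
  attempt=1
  delay=1
  max_attempts=${MAX_ATTEMPTS:-5}
  while :; do
    status=$(curl -s -o /dev/null -w '%{http_code}' "$@")
    case "$(classify_status "$status")" in
      success)
        echo "$status"; return 0 ;;
      retry)
        if [ "$attempt" -ge "$max_attempts" ]; then
          echo "$status"; return 1
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2)) ;;
      *)
        echo "$status"; return 1 ;;
    esac
  done
}
```

Because this wrapper discards the response body, it only reports the final status code; in a real client you would keep the body so that the message and requestId of an error entity can be logged.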

Error Entity

error entity response

{
  "message": "Validation of the request entity failed",
  "requestId": "8896f9dc-260d-4133-9848-c54e5715270f"
}

In case of an error, the Gini Health API always returns a JSON object with further information about the error. The JSON object has the following properties:

Name       Type    Description
message    string  Human-readable error description (not intended for application end users).
requestId  string  Unique ID identifying the request. Please provide it when contacting our support.

Managing Anonymous Gini Accounts

To achieve the best results, the Gini Health API must be able to trace requests back to individual users for the following reasons:

Gini offers various ways to perform requests on behalf of the individual users without requiring physical interaction. Depending on your use case and your product's architecture, the following API authentication methods are available:

Communicating with the Gini Health API via Backend / Gateway

request the list of user1's documents

curl -v -H 'Accept: application/vnd.gini.v2+json' \
    -u 'client-id:client-secret' \
    -H 'X-User-Identifier: user1' \
    'https://health-api.gini.net/documents'

This authentication scheme is based on HTTP Basic Authentication: your application authenticates itself with the Gini Health API using its client ID and client secret. In the same request, an additional header called X-User-Identifier is sent together with the Authorization header. The API uses this header to identify individual users. Your application is free to choose any value for the header, as long as the following constraints are met:

Direct Communication between Client Devices and the Gini Health API

Gini offers the User Center API (UC API) to manage Gini users. Here is a quick step-by-step guide that outlines how to create and use a new anonymous Gini account. Each step links to the corresponding section of the UC API documentation, where you can read more details about it.

  1. obtain the client token
  2. create a new user
  3. log in as a new user
  4. make API requests with the access token

Authenticate the Client

obtain the client token

curl -v -H 'Accept: application/json' \
    -u 'client-id:client-secret' \
    'https://user.gini.net/oauth/token?grant_type=client_credentials'

the successful response will have HTTP status 200 and the client access token 1eb7ca49-d99f-40cb-b86d-8dd689ca2345 will be returned

{
  "access_token":"1eb7ca49-d99f-40cb-b86d-8dd689ca2345",
  "token_type":"bearer",
  "expires_in":43199,
  "scope":"read"
}

Before you are able to use the UC API, you need to obtain a client access token. The client access token authorizes your client (i.e. your application) against the UC API and allows you to create a new user.

At this point it is assumed that you already have the client ID client-id and the client secret client-secret. These authorize your client (with HTTP Basic Authentication) to obtain the client access token; see the example above.

For more details see the corresponding UC API section.

Create a New User

create a new user

curl -v -X POST --data '{"email":"random@example.org", "password":"geheim"}' \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -H 'Authorization: BEARER 1eb7ca49-d99f-40cb-b86d-8dd689ca2345' \
    'https://user.gini.net/api/users'

the above command creates a new user random@example.org with the password geheim; the request is authorized by the client access token. If the creation was successful, the HTTP response has status 201 and contains a Location header pointing to the new user.

Once the client access token has been obtained, it's time to create a new user. To do so, two more values are required: a username and a password. The username must be a valid email address whose domain part is easily linkable to your application. For example, if your company is called Example Inc., then app.example.org would be a good domain name to use for your application's user accounts.

For more details see the corresponding UC API section.
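Anonymous accounts need a username and password that your application generates and stores itself. A minimal sketch, assuming random hexadecimal strings are acceptable both as the local part of the email address and as the password (the password rules are not described here); app.example.org is the placeholder domain from the text and the function names are hypothetical.

```shell
#!/bin/sh
# Print N random bytes from /dev/urandom as lowercase hex (2*N characters).
random_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Username: 32 random hex characters at your application's domain.
# The first argument overrides the placeholder domain app.example.org.
new_anonymous_username() {
  printf '%s@%s\n' "$(random_hex 16)" "${1:-app.example.org}"
}

# Password: 48 random hex characters (assumed to satisfy the password rules).
new_anonymous_password() {
  random_hex 24
  printf '\n'
}
```

Because the generated values contain only hexadecimal characters and the domain, they can be inserted into the JSON body of the create-user request without escaping. Store the credentials securely, as they are needed to log in as this user later.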

Authenticate on behalf of a New User

log in as a new user

curl -v -X POST --data-urlencode 'username=random@example.org' \
    --data-urlencode 'password=geheim' \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -H 'Accept: application/json' -u 'client-id:client-secret' \
    'https://user.gini.net/oauth/token?grant_type=password'

After the new user is created, you can log in. Note that the login request uses HTTP Basic Authentication with the client ID as the username and the client secret as the password; it does not require a client access token. The response contains an access token that can be used to make API requests on behalf of the new user.

For more details see the corresponding UC API section.

Make API Requests with the Access Token

use the access token you obtained to make API requests

GET /documents HTTP/1.1
Host: health-api.gini.net
Authorization: BEARER 760822cb-2dec-4275-8da8-fa8f5680e8d4
Accept: application/vnd.gini.v2+json
Connection: close

In order to make API requests, send the access token as a bearer token in the Authorization request header.

Documents

As the key aspect of the Gini Health API is to provide information extraction for analyzing documents, the API is mainly built around the concept of documents. A document can be any written representation of information such as invoices, reminders, contracts and so on.

The main idea is that you submit a document in the form of an electronic file to Gini. After the document has been analyzed by Gini you can get the information that is extracted from the document by querying the API.

The following documentation explains those actions in more detail.

Submitting Files

documents can be submitted by doing a POST request to the /documents resource.

    POST /documents

In order to extract document information, the document source file must be first submitted to Gini.

Submitting documents is as easy as sending a POST request to the /documents resource path. After successful submission the location of the new document is returned in the Location header.

The Gini Health API currently supports two different variants of uploads, one optimized for web applications running in a web browser and the other for all other types of clients.

The first variant optimized for web applications expects the documents to be uploaded using a multipart/form-data encoding method.

The second variant for all other clients simply uses the request body (independent of the request Content-Type) as the document.

Supported File Formats

Gini currently supports document files in PDF, GIF (non-animated), PNG, JPEG and TIFF formats as well as plain text. You can use native documents (PDF only) as well as scanned documents (all other supported formats).

Note that there are certain limitations though:

The above applies both to single page documents as well as to each page in a multi-page document.

Document Type Hints

In many cases the type of a document is known to the client application. If you provide the doctype parameter with a valid type, Gini can optimize the processing of the document in various ways.

Document Uploading Schemes

The Gini API allows you to upload a document as a single file or in parts, page by page.

Upload a Single File Document

This is the standard way of uploading a document to the Gini extraction system. A PDF document can contain one or multiple pages; JPEG and PNG documents are also accepted. The uploaded file is processed by the system as is, without any adjustments to its structure.

Request

upload a document

variant for web applications running in a web browser with access token:

curl -H 'Authorization: BEARER <token>' \
    --form 'file=@file.pdf' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -i 'https://health-api.gini.net/documents'

variant for all other types of applications:

curl -H 'Authorization: BEARER <token>' \
    --data-binary '@file.pdf' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -i 'https://health-api.gini.net/documents?filename=file.pdf'

or with X-User-Identifier (see how to set up X-User-Identifier):

curl -H 'X-User-Identifier: user1' \
    --form 'file=@file.pdf' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -u 'client-id:client-secret' \
    -i 'https://health-api.gini.net/documents'

Headers

Header        Value
Content-Type  multipart/form-data; boundary=... (web applications) or */* (all other clients)
Accept        application/vnd.gini.v2+json

Request Query Parameters

If the upload is performed without multipart/form-data, you can optionally provide a file name and a document type for the submitted document as query parameters:

Name      Type    Description
filename  string  (Optional) File name of the submitted document.
doctype   string  (Optional) Type of the submitted document. See document types for possible values.

Body

Only in case of Content-Type: multipart/form-data (applications running in a web browser):

Key                  Description
Content-Disposition  form-data
file                 File contents of the document.

Response

Headers

Status Code    Description
201 (Created)  Operation is successful.

Header        Value
Content-Type  application/vnd.gini.v2+json
Location      Absolute URI (created document URI). Can be used to check progress and receive document information.

Errors

Status Code                   Description
400 (Bad Request)             Returned when a file is sent in an invalid format.
401 (Unauthorized)            Authorization credentials are either missing, wrong or outdated.
415 (Unsupported Media Type)  Content type is not supported.
503 (Service Unavailable)     Service unavailable. Please retry later.

Upload a Document Page by Page

A page-by-page upload is performed in two steps: first each page is uploaded as a partial document, then a composite document referencing these pages is uploaded. Keep in mind that you must complete Step 1 before moving on to Step 2.

Step 1 (Upload Each Page as a Partial Document)

Pages that are part of the document are referred to as partial documents. To upload a page (or a picture of a page) that belongs to the document, set the Content-Type request header to application/vnd.gini.v2.partial+png, or to application/vnd.gini.v2.partial+pdf for a PDF page.
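Choosing between the two partial content types can be automated from the file name. The sketch below assumes that any file ending in .pdf is a PDF page and everything else is a page picture; upload_partial is a hypothetical helper that uses the non-browser upload variant, assumes the file name needs no URL encoding, and prints the Location of the new partial document.

```shell
#!/bin/sh
# Pick the partial-document Content-Type for a page file.
# Assumption: a .pdf extension (any letter case) means a PDF page;
# every other file is treated as a page picture.
partial_content_type() {
  case "$1" in
    *.[Pp][Dd][Ff]) echo 'application/vnd.gini.v2.partial+pdf' ;;
    *)              echo 'application/vnd.gini.v2.partial+png' ;;
  esac
}

# Hypothetical helper: upload one page as a partial document and print the
# Location header of the created partial document. TOKEN is a placeholder.
upload_partial() {
  curl -s -o /dev/null -D - \
    --data-binary "@$1" \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H "Content-Type: $(partial_content_type "$1")" \
    -H "Authorization: BEARER $TOKEN" \
    "https://health-api.gini.net/documents?filename=$(basename "$1")" |
    tr -d '\r' | sed -n 's/^[Ll]ocation:[[:space:]]*//p'
}
```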

Request

upload a partial document

variant for web applications running in a web browser with access token:

curl -H 'Authorization: BEARER <token>' \
    --form 'file=@file.JPEG' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Content-Type: application/vnd.gini.v2.partial+png' \
    -i 'https://health-api.gini.net/documents'

variant for other types of applications with access token:

curl -H 'Authorization: BEARER <token>' \
    --data-binary '@file.JPEG' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Content-Type: application/vnd.gini.v2.partial+png' \
    -i 'https://health-api.gini.net/documents?filename=file.JPEG'

or with X-User-Identifier (see how to set up X-User-Identifier):

curl -H 'X-User-Identifier: user1' \
    --form 'file=@file.JPEG' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Content-Type: application/vnd.gini.v2.partial+png' \
    -u 'client-id:client-secret' \
    -i 'https://health-api.gini.net/documents'

In order to upload a page picture or a partial document, specify a different content type:

Headers

Header        Value
Content-Type  application/vnd.gini.v2.partial+png or application/vnd.gini.v2.partial+pdf
Accept        application/vnd.gini.v2+json

Request Query Parameters

Name      Type    Description
filename  string  (Optional) File name of the submitted document.
doctype   string  (Optional) Type of the submitted document. See document types for possible values.

Body

File contents of the document.

Response

Headers

Status Code    Description
201 (Created)  Operation is successful.

Header        Value
Content-Type  application/vnd.gini.v2+json
Location      Absolute URI (created partial document URI), which is to be referenced by a composite document.

Errors

Status Code                   Description
400 (Bad Request)             Returned when the sent file has an invalid format.
401 (Unauthorized)            Authorization credentials are either missing, wrong or outdated.
415 (Unsupported Media Type)  Content type is not supported.
503 (Service Unavailable)     Service unavailable. Please retry later.

Step 2 (Upload JSON as a Composite Document)

After successfully uploading all pages as partial documents, you should announce their locations to the extraction system. This is done by uploading a composite document which is a simple JSON file with corresponding locations of partial documents.

Request

upload a composite document

variant for web applications running in a web browser with access token:

curl -H 'Authorization: BEARER <token>' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Content-Type: application/vnd.gini.v2.composite+json' \
    -X POST -d '{...}' \
    -i 'https://health-api.gini.net/documents'

or

curl -H 'Authorization: BEARER <token>' \
    --data-binary '@data.json' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Content-Type: application/vnd.gini.v2.composite+json' \
    -i 'https://health-api.gini.net/documents'

or with X-User-Identifier (see how to set up X-User-Identifier):

curl -H 'X-User-Identifier: user1' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -H 'Content-Type: application/vnd.gini.v2.composite+json' \
    -u 'client-id:client-secret' \
    -X POST -d '{...}' \
    -i 'https://health-api.gini.net/documents'

with post body of

{
  "partialDocuments": [
    {
      "rotationDelta": 0,
      "document": "location of partial doc 1"
    },
    {
      "rotationDelta": 0,
      "document": "location of partial doc 2"
    },
    ...
  ]
}

each location has the form https://health-api.gini.net/documents/e8606210-56ed-11ea-b823-b351b84ae4b3

In order to upload a composite document which aggregates one or multiple partial documents, a different content type needs to be specified.
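Because the composite body is plain JSON, it can be assembled from the Location values collected in Step 1. A minimal sketch, assuming the locations are URLs that need no JSON escaping and that every page keeps rotationDelta 0; build_composite_body is a hypothetical name.

```shell
#!/bin/sh
# Build the composite-document JSON body from partial document locations
# passed as arguments, in page order. rotationDelta is 0 for every page.
build_composite_body() {
  printf '{"partialDocuments":['
  sep=''
  for location in "$@"; do
    printf '%s{"rotationDelta":0,"document":"%s"}' "$sep" "$location"
    sep=','
  done
  printf ']}\n'
}
```

The output can be piped into the composite upload request by passing --data-binary @- to curl together with the headers shown above.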

Headers

Header        Value
Content-Type  application/vnd.gini.v2.composite+json
Accept        application/vnd.gini.v2+json

Request Query Parameters

If the upload is performed without multipart/form-data, you can optionally provide a file name for the submitted document with a query parameter:

Name      Type    Description
filename  string  (Optional) File name of the submitted document.

Body

Raw bytes of the composite JSON.

Key               Description
partialDocuments  A list of partial documents (the location of each is returned after the partial document is successfully uploaded).

Response

Headers

Status Code    Description
201 (Created)  Operation is successful.

Header        Value
Content-Type  application/vnd.gini.v2+json
Location      Absolute URI of the created document (document URI), used to check progress and get document information.

Errors

Status Code                   Description
400 (Bad Request)             Returned when a file in an invalid format is sent.
401 (Unauthorized)            Authorization credentials are either missing, wrong or outdated.
415 (Unsupported Media Type)  Content type is not supported.
503 (Service Unavailable)     Service unavailable. Please retry later.

Checking Processing Status and Getting Document Information

document information can be retrieved by sending a GET request to the document URI.

    GET /documents/{id}

After a document has been submitted, you can check its processing status by examining the document information, which is retrieved with a GET request to the document URI. Once the document has been processed, you can retrieve its extractions and layout.

Request

get document processing status

curl -H 'Authorization: BEARER <token>' \
    -X GET -H 'Accept: application/vnd.gini.v2+json' \
    -i 'https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000'

Headers

Header  Value
Accept  application/vnd.gini.v2+json

Response

request response

{
  "id": "626626a0-749f-11e2-bfd6-000000000000",
  "creationDate": 1360623867402,
  "name": "scanned.jpg",
  "progress": "COMPLETED",
  "origin": "UPLOAD",
  "sourceClassification": "SCANNED",
  "pageCount": 1,
  "_links": {
    "extractions": "https://health-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000/extractions",
    "layout": "https://health-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000/layout",
    "document": "https://health-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000",
    "processed": "https://health-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000/processed"
  }
}
Headers
Status Code Description
200 (OK) Operation is successful.
Header Value
Content-Type application/vnd.gini.v2+.json
Body (application/vnd.gini.v2+json)
Key Child Key Type Description
id string Document qnique identifier (such as UUID Version 1).
name string Document name (as stated in upload).
pageCount number Number of pages.
creationDate number Document creation unix timestamp (in milliseconds).
origin string Document source channel: UPLOAD (if uploaded via Gini Health API) or UNKNOWN.
progress string Document processing status: PENDING, COMPLETED or ERROR.
sourceClassification string Classification of the source file: SCANNED, SANDWICH, NATIVE or TEXT.
pageNumber number Document page number.
images object Pre-rendered page image URIs.
_links object Links to related resources, e.g. extractions or document layout.
extractions string Document extractions URI.
layout string Document layout URI.
processed string Processed document URI.
document string Document URI.

Errors

Status Code Description
404 (Not Found) Returned when no document can be found under the specified URI.
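Checking the status can be scripted by polling the document resource until progress is no longer PENDING. The following is a minimal sketch rather than an official client: it assumes the access token is stored in a hypothetical GINI_TOKEN environment variable, reads the progress field with sed instead of a JSON parser, and uses an arbitrary polling interval and attempt limit.

```shell
#!/bin/sh
# Sketch: poll a document until its processing status leaves PENDING.
# GINI_TOKEN and GINI_API are hypothetical variable names for this example.
GINI_API="${GINI_API:-https://health-api.gini.net}"

# Read document JSON on stdin and print the value of the "progress" key.
progress_of() {
  sed -n 's/.*"progress" *: *"\([A-Z]*\)".*/\1/p'
}

# Usage: wait_for_document <document-id>
# Prints COMPLETED or ERROR once processing ends; returns 1 after
# 30 attempts (the 2-second interval and attempt limit are arbitrary).
wait_for_document() {
  : "${GINI_TOKEN:?GINI_TOKEN is not set}"
  attempt=0
  while [ "$attempt" -lt 30 ]; do
    status=$(curl -s \
      -H "Authorization: BEARER $GINI_TOKEN" \
      -H 'Accept: application/vnd.gini.v2+json' \
      "$GINI_API/documents/$1" | progress_of)
    if [ -n "$status" ] && [ "$status" != "PENDING" ]; then
      printf '%s\n' "$status"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 2
  done
  return 1
}
```

For example, `[ "$(wait_for_document c292af40-d06a-11e2-9a2f-000000000000)" = COMPLETED ]` succeeds only when extractions and layout are ready; an ERROR result is printed as well and should be handled separately.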

Retrieving Extractions

Extractions can be retrieved by sending a GET request to the extractions URI:

    GET /documents/{id}/extractions

Once the document is processed, the document extractions become available for retrieval. See document extractions for more details.

Request

get extractions

curl -H 'Authorization: BEARER <token>' \
    -X GET -H 'Accept: application/vnd.gini.v2+json' \
    -i https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/extractions
Headers
Header Value
Accept application/vnd.gini.v2+json

Response

example response

{
  "extractions": {
    "amountToPay": {
      "box": {
        "height": 9.0,
        "left": 516.0,
        "page": 1,
        "top": 588.0,
        "width": 42.0
      },
      "confidence": 0.715,
      "entity": "amount",
      "value": "24.99:EUR",
      "candidates": "amounts"
    }
  },
  "candidates": {
    "amounts": [
      {
        "box": {
          "height": 9.0,
          "left": 516.0,
          "page": 1,
          "top": 588.0,
          "width": 42.0
        },
        "entity": "amount",
        "value": "24.99:EUR"
      },
      {
        "box": {
          "height": 9.0,
          "left": 241.0,
          "page": 1,
          "top": 588.0,
          "width": 42.0
        },
        "entity": "amount",
        "value": "21.0:EUR"
      }
    ]
    ...
  }
}
Headers
Status Code Description
200 (OK) Operation is successful.
Header Value
Content-Type application/vnd.gini.v2+json
Body (application/vnd.gini.v2+json)

A detailed explanation of the response format can be found in the Document Extractions section.

Name Type Description
extractions object A mapping of labels to extractions (i.e. specific extractions).
candidates object A mapping of labels to a list of extraction-candidates.

Errors

Status Code Description
404 (Not Found) Requested entity couldn't be found.

Submitting Feedback on Extractions

You should always submit feedback on extractions to help the system improve its extraction quality.

Gini employs various machine learning techniques to learn from feedback automatically. Therefore, it is equally important for Gini to receive feedback on both correct and incorrect extractions. There are currently two ways to submit feedback. The first and most common is to submit the complete feedback in one request. This is the easiest way if your frontend (application) displays the extractions on screen in an editable form: the user can modify the extractions before pressing a confirmation button. The second (a rare use case) applies when no final approval signal, such as a button click, is available. In that case you can send the feedback on one label per request.

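There are three different types of feedback: positive feedback confirms a correct extraction unchanged, negative feedback corrects an incorrect extraction, and complementary feedback adds a value that the system did not extract.

See the detailed examples below.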

Submitting Feedback on Multiple Extractions

submitting feedback on extractions


    PUT /documents/{id}/extractions

    POST /documents/{id}/extractions

The Gini Health API allows you to submit feedback on multiple extractions for a single document with a single request. We strongly recommend submitting feedback this way, for two reasons. First, the total number of round trips is reduced to one and the feedback is handled internally as a batch, so updating multiple extractions is more efficient than submitting the feedback in separate requests (see single feedback). Second, Gini's machine learning techniques benefit from feedback on multiple extractions, because Gini is aware that the individual parts of the submitted feedback belong together.

Request

Example

The following, more elaborate example illustrates the different types of feedback. In this scenario, the user uploads a document from which the labels amountToPay, paymentReference and iban are extracted. Unfortunately, the label paymentRecipient could not be extracted. The response to the extractions request is as follows:

{
    "candidates": {
    },
    "extractions": {
        "amountToPay": {
            "box": {
                "height": 8.0,
                "left": 545.0,
                "page": 1,
                "top": 586.0,
                "width": 17.0
            },
            "candidates": "amounts",
            "entity": "amount",
            "value": "5.60:EUR"
        },
        "iban": {
            "box": {
                "height": 7.0,
                "left": 447.0,
                "page": 1,
                "top": 746.0,
                "width": 100.0
            },
            "candidates": "ibans",
            "entity": "iban",
            "value": "DE68130300000017850360"
        },
        "paymentReference": {
            "entity": "reference",
            "value": "ReNr 123, KdNr 32"
        }
    },
    "compoundExtractions": {
        "lineItems": [
            {
              "artNumber": {
                "value": "10101",
                "entity": "text",
                "box": {
                    "height": 7.0,
                    "left": 55.0,
                    "page": 1,
                    "top": 546.0,
                    "width": 100.0
                }
              },
              "quantity": {
                "value": "12",
                "entity": "numeric"
              }
            },
            {
              "artNumber": {
                "value": "10103",
                "entity": "text"
              },
              "quantity": {
                "value": "3",
                "entity": "numeric"
              }
            }
        ],
        "taxItems": [
            {
              "taxRate": {
                "value": "19.0%",
                "entity": "text"
              },
              "taxAmount": {
                "value": "30.00:EUR",
                "entity": "amount"
              }
            }
        ]
    }
}

The user adds the missing paymentRecipient value (complementary feedback), corrects the paymentReference to "ReNr 1735, KdNr 37" (negative feedback) and corrects the quantity of the first line item from 12 to 17. The iban, amountToPay, taxItems and the remaining lineItems values are correct (positive feedback). Since the document is not displayed, the boxes can be omitted. The resulting feedback request is as follows:

{
   "extractions": {
       "amountToPay": {
           "value": "5.60:EUR"
       },
       "iban": {
           "value": "DE68130300000017850360"
       },
       "paymentReference": {
           "value": "ReNr 1735, KdNr 37"
       },
       "paymentRecipient": {
           "value": "Zalando SE"
       }
   },
 "compoundExtractions": {
        "lineItems": [
            {
              "artNumber": {
                "value": "10101"
              },
              "quantity": {
                "value": "17"
              }
            },
            {
              "artNumber": {
                "value": "10103"
              },
              "quantity": {
                "value": "3"
              }
            }
        ],
        "taxItems": [
            {
              "taxRate": {
                "value": "19.0%",
                "entity": "text"
              },
              "taxAmount": {
                "value": "30.00:EUR",
                "entity": "amount"
              }
            }
        ]
    }
}

To correct or confirm multiple labeled extractions, send a single PUT or POST request to the document extractions URI.

The labels must correspond to the names of the extraction types (e.g. amountToPay). See available specific extractions for possible values.

Headers
Header Value
Content-Type application/vnd.gini.v2+json
Body
Key Type Description
extractions object Feedback on atomic extractions
compoundExtractions object Feedback on compound extractions

Response

Status Code Description
204 (No Content) The feedback was successfully processed.
404 (Not Found) The document or the label could not be found.
422 (Unprocessable Entity) At least one value was not valid regarding entity validation rules of the label.
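The feedback request above can be sent with curl. This is a sketch under stated assumptions: GINI_TOKEN is a hypothetical variable holding the access token, the helper feedback_body is a hypothetical convenience that builds only the extractions part (not compoundExtractions) and does not JSON-escape values, and send_feedback uses PUT, although POST is documented as well.

```shell
#!/bin/sh
# Sketch: submit feedback on multiple extractions in one request.
GINI_API="${GINI_API:-https://health-api.gini.net}"

# Hypothetical helper: build {"extractions":{...}} from label=value pairs.
# Values are not JSON-escaped, so they must not contain " or \.
feedback_body() {
  sep=''
  printf '{"extractions":{'
  for pair in "$@"; do
    label=${pair%%=*}   # text before the first "="
    value=${pair#*=}    # text after the first "="
    printf '%s"%s":{"value":"%s"}' "$sep" "$label" "$value"
    sep=','
  done
  printf '}}\n'
}

# Usage: send_feedback <document-id> <json-file>
# Prints the HTTP status code (204 on success, 404 or 422 on errors).
send_feedback() {
  : "${GINI_TOKEN:?GINI_TOKEN is not set}"
  curl -s -o /dev/null -w '%{http_code}\n' -X PUT \
    -H "Authorization: BEARER $GINI_TOKEN" \
    -H 'Content-Type: application/vnd.gini.v2+json' \
    --data-binary @"$2" \
    "$GINI_API/documents/$1/extractions"
}
```

For the atomic part of the example scenario: `feedback_body amountToPay=5.60:EUR iban=DE68130300000017850360 'paymentReference=ReNr 1735, KdNr 37' 'paymentRecipient=Zalando SE' > feedback.json`, then `send_feedback <document-id> feedback.json`. Including the unchanged labels is what turns them into positive feedback.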

Submitting Feedback on Invalid Extractions

submitting feedback on invalid extractions

    DELETE /documents/{id}/extractions/{label}

Request

In case an extraction was erroneously found (i.e. it is not present in the source document), you can delete it by sending a DELETE request to the extraction URI, /documents/{id}/extractions/{label}.

Response

Status Code Description
204 (No Content) Label removal was successful.
404 (Not Found) The document or the label could not be found.
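A DELETE request on the extraction URI can be sent as follows. This is a sketch: the GINI_TOKEN variable and the helper names are hypothetical, and curl's -w option is used only to print the returned status code.

```shell
#!/bin/sh
# Sketch: remove an erroneously found extraction by deleting its label.
GINI_API="${GINI_API:-https://health-api.gini.net}"

# Print the extraction URI for a document ID and a label.
extraction_uri() {
  printf '%s/documents/%s/extractions/%s\n' "$GINI_API" "$1" "$2"
}

# Usage: delete_extraction <document-id> <label>
# Prints the HTTP status code (204 on success, 404 if not found).
delete_extraction() {
  : "${GINI_TOKEN:?GINI_TOKEN is not set}"
  curl -s -o /dev/null -w '%{http_code}\n' -X DELETE \
    -H "Authorization: BEARER $GINI_TOKEN" \
    "$(extraction_uri "$1" "$2")"
}
```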

Retrieving Document Pages

retrieving document pages

    GET /documents/{id}/pages

The Gini Health API renders preview images of the document pages. To retrieve the list of pages of a document, send a GET request to the pages sub-resource of the document.

Request

Path Parameters

retrieve document pages

curl -H 'Authorization: BEARER <token>' \
    -X GET -H 'Accept: application/vnd.gini.v2+json' \
    -i https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages
Name Value
id Document ID
Headers
Header Value
Accept application/vnd.gini.v2+json

Response

The response is a list of pages.

[
  {
    "images" : {
      "1280x1810" : "https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/1/1280x1810",
      "750x900" : "https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/1/750x900"
    },
    "pageNumber" : 1
  },
  {
    "pageNumber" : 2,
    "images" : {
      "1280x1810" : "https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/2/1280x1810",
      "750x900" : "https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/2/750x900"
    }
  }
]
Headers
Status Code Description
200 (OK) The request was successful.
404 (Not Found) The requested document does not exist.
Body
Name Type Description
pages array All pages in the current result page.

A page is an entity with the following fields:

Key Child key Type Description
documentId string UUID of the document which the page belongs to.
pagenum number Page number.
_links object Links to related resources.
document string Link to the document to which the page belongs.
pages string Link to the pages of the document.
_images object Links to pre-rendered page images in different resolutions.
image resolution in pixels string Link to a pre-rendered image of the page.
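The page image URIs in the example response above follow a regular pattern, so the previews can be downloaded in a loop. This sketch assumes that the pattern holds for all documents, that the 750x900 resolution from the example is available (the images object lists the actual resolutions), that image requests accept the same Authorization header, and that the page count is taken from pageCount in the document information. GINI_TOKEN and the helper names are hypothetical. The image file format is not documented here, so files are saved without an extension.

```shell
#!/bin/sh
# Sketch: download a preview image of each page of a document.
GINI_API="${GINI_API:-https://health-api.gini.net}"

# URI of a pre-rendered page image, following the example response:
# /documents/{id}/pages/{pageNumber}/{resolution}
page_image_uri() {
  printf '%s/documents/%s/pages/%s/%s\n' "$GINI_API" "$1" "$2" "$3"
}

# Usage: download_pages <document-id> <page-count> [resolution]
# Saves page-1, page-2, ... in the current directory; stops on HTTP errors.
download_pages() {
  : "${GINI_TOKEN:?GINI_TOKEN is not set}"
  resolution="${3:-750x900}"
  page=1
  while [ "$page" -le "$2" ]; do
    curl -s -f -o "page-$page" \
      -H "Authorization: BEARER $GINI_TOKEN" \
      "$(page_image_uri "$1" "$page" "$resolution")" || return 1
    page=$((page + 1))
  done
}
```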

Retrieving the Layout of a Document

The layout of the document describes the textual content of a document with positional information, based on the processed document.

Request

retrieving a layout of the document


    GET /documents/{id}/layout

Example

curl -H 'Authorization: BEARER <token>' \
    -X GET -H 'Accept: application/vnd.gini.v2+json' \
    -i https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/layout

The layout of the document can be retrieved with a GET request to the layout URI.

Headers
Header Value
Accept application/vnd.gini.v2+json

Response

layout example

{
  "pages": [
    {
      "number": 1,
      "sizeX": 595.3,
      "sizeY": 841.9,
      "textZones": [
        {
          "paragraphs": [
            {
              "l": 54.0,
              "t": 158.76,
              "w": 190.1,
              "h": 36.55000000000001,
              "lines": [
                {
                  "l": 54.0,
                  "t": 158.76,
                  "w": 190.1,
                  "h": 10.810000000000002,
                  "wds": [
                    {
                      "l": 54.0,
                      "t": 158.76,
                      "w": 18.129999999999995,
                      "h": 9.900000000000006,
                      "fontSize": 9.9,
                      "fontFamily": "Arial-BoldMT",
                      "bold":false,
                      "text": "Ihre"
                    },
                    {
                      "l": 74.86,
                      "t": 158.76,
                      "w": 83.91000000000001,
                      "h": 9.900000000000006,
                      "fontSize": 9.9,
                      "fontFamily": "Arial-BoldMT",
                      "bold":false,
                      "text": "Vorgangsnummer"
                    },
                    {
                      "l": 158.76,
                      "t": 158.76,
                      "w": 3.3000000000000114,
                      "h": 9.900000000000006,
                      "fontSize": 9.9,
                      "fontFamily": "Arial-BoldMT",
                      "bold":false,
                      "text": ":"
                    },
                    [...]
                  ]
                },
                [...]
              ]
            }
          ]
        }
      ],
      "regions": [
        {
          "l": 20.0,
          "t": 240.1,
          "w": 190.0,
          "h": 150.3,
          "type": "RemittanceSlip"
        },
        [...]
      ]
    },
    [...]
  ]
}
Headers
Status Code Description
200 (OK) Operation is successful.
Header Value
Content-Type application/vnd.gini.v2+json
Body (application/vnd.gini.v2+json)
Key Type Description
pages array Array of page objects.
Page Object
Key Type Description
number number Number of the page starting with 1.
sizeX number Width of the page.
sizeY number Height of the page.
textZones array Array of textzone objects.
regions array Array of region objects.
TextZone Object
Key Type Description
paragraphs array Array of paragraph objects.
Paragraph Object
Key Type Description
w number Width of the paragraph.
h number Height of the paragraph.
t number Distance of the paragraph from the upper edge of the page.
l number Distance of the paragraph from the left edge of the page.
lines array Array of line objects.
Line Object
Key Type Description
w number Width of the line.
h number Height of the line.
t number Distance of the line from the upper edge of the page.
l number Distance of the line from the left edge of the page.
wds array Array of word objects.
Word Object
Key Type Description
h number Height of the word.
w number Width of the word.
l number Distance of the word from the left edge of the page.
t number Distance of the word from the upper edge of the page.
fontSize number Font size of the word in points.
fontFamily string Name of the font family of the word.
bold boolean Indicates bold font style.
text string Text of word.
Region Object
Key Type Description
h number Height of the region of interest.
w number Width of the region of interest.
l number Distance of the region from the left edge of the page.
t number Distance of the region from the upper edge of the page.
type string Type of the region of interest, e.g. RemittanceSlip.

Errors

Status Code Description

404 (Not Found) The requested layout is invalid.
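For quick inspection, the text of all words can be pulled out of a layout response with standard tools. This sketch relies on the tables above, where only word objects carry a text key; it assumes word texts contain no escaped double quotes and uses grep -o, which GNU and BSD grep support but POSIX does not require. A JSON parser such as jq is the more robust choice; the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the text of every word in a layout response, one per line.
# Reads layout JSON on stdin; works for compact and pretty-printed JSON.
layout_words() {
  grep -o '"text": *"[^"]*"' | sed 's/^"text": *"\(.*\)"$/\1/'
}
```

For example, piping `curl -s -H "Authorization: BEARER $GINI_TOKEN" -H 'Accept: application/vnd.gini.v2+json' https://health-api.gini.net/documents/<id>/layout` into `layout_words` prints the words in the order they appear in the response.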

Retrieving the Processed Document

Request

retrieve the processed document

    GET /documents/{id}/processed

Before extracting information, Gini preprocesses the document, e.g. by deskewing pages and applying homography transformations. The processed document can be retrieved with a GET request to the processed URI.

Path parameters

Name Value
id Document ID

Response

Headers

Status Code Description
200 (OK) Operation is successful.

Body

The version of the uploaded document file after preprocessing (color corrected, deskewed) which has been used for all layout and semantic extractions. In case of native PDF documents it is identical to the original document file.

Errors

Status Code Description
404 (Not Found) The requested document does not exist.
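Because the processed document is a file rather than JSON, it is convenient to write it straight to disk. This sketch omits the Accept header because its expected value is not documented in this section; GINI_TOKEN and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: save the processed version of a document to a local file.
GINI_API="${GINI_API:-https://health-api.gini.net}"

processed_uri() {
  printf '%s/documents/%s/processed\n' "$GINI_API" "$1"
}

# Usage: download_processed <document-id> <output-file>
# curl -f makes the function fail on HTTP errors such as 404.
download_processed() {
  : "${GINI_TOKEN:?GINI_TOKEN is not set}"
  curl -s -f -o "$2" \
    -H "Authorization: BEARER $GINI_TOKEN" \
    "$(processed_uri "$1")"
}
```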

Deleting Documents

delete documents


    DELETE /documents/{id}

To delete a document, send a DELETE request to the document URI. When a document is deleted, all associated resources (extractions, layout) are deleted as well.

Request

delete request

curl -H 'Authorization: BEARER <token>' \
    -X DELETE -i https://health-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000


Response

Headers

Status Code Description
204 (No Content) Operation is successful.

Errors

Status Code Description
404 (Not Found) Returned when no document can be found under the specific URI.

Getting a List of All Documents

get a list of all documents

    GET /documents

To get the list of all documents, send a GET request to the /documents resource. The response contains a paginated list of documents.

Request Query Parameters

example request

curl -H 'Authorization: BEARER <token>' \
    -H 'Accept: application/vnd.gini.v2+json' \
    -X GET -i 'https://health-api.gini.net/documents?limit=50'
Name Type Description
limit number (Optional) Maximum number of documents to return (default 20).
offset number (Optional) Starting offset (default 0).

Response

example response

{
  "documents": [
    {...},
    {...},
    ...
  ]
}


Headers
Status Code Description
200 (OK) Operation is successful.
Body

The response entity has the following fields:

Name Type Description
documents array All documents of the current result page.
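The limit and offset parameters can be combined to walk through all documents page by page. This sketch treats a result page without any "id" key as the end of the list, which is a heuristic, since this section does not document a total count; GINI_TOKEN and the helper names are hypothetical. The URL is passed to curl as a single quoted argument so that the shell does not interpret ? and &.

```shell
#!/bin/sh
# Sketch: page through /documents using limit and offset.
GINI_API="${GINI_API:-https://health-api.gini.net}"

documents_page_uri() {
  printf '%s/documents?limit=%s&offset=%s\n' "$GINI_API" "$1" "$2"
}

# Usage: fetch_all_documents [limit]
# Saves each result page as documents-<offset>.json in the current directory.
fetch_all_documents() {
  : "${GINI_TOKEN:?GINI_TOKEN is not set}"
  limit="${1:-20}"
  offset=0
  while :; do
    file="documents-$offset.json"
    curl -s -f -o "$file" \
      -H "Authorization: BEARER $GINI_TOKEN" \
      -H 'Accept: application/vnd.gini.v2+json' \
      "$(documents_page_uri "$limit" "$offset")" || return 1
    # Heuristic: a page without any "id" key is treated as empty.
    if ! grep -q '"id"' "$file"; then
      rm -f "$file"
      return 0
    fi
    offset=$((offset + limit))
  done
}
```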

Document Extractions

Structured documents contain a lot of valuable information. For instance, invoices or remittance slips contain amounts to be paid, bank data, receiver, sender, address etc. Doctor prescriptions contain drug amounts, names and descriptions as well as other sensitive information. Receipts contain line-items with corresponding prices and tax values.

The Gini extraction system is able to extract this information and provide it in a structured form through its Health API. From now on we shall refer to such data as the document extractions. Some extractions are common to many documents (e.g. amount to pay, sender, date, line items), while others are specific to particular documents (e.g. medical treatment, time of receipt issue, invoice ID).

However, the extraction system does more than just extract information. For example, some invoices do not state the due date as a specific date value or tag, but explain in a block of text when the invoice is supposed to be paid. The Gini system is capable of inferring the due date from such text and converting it into an actual date value. We shall refer to this kind of information as specific extractions.

Additionally, the Gini system groups semantically related terms into compound extractions. For example, IBAN and BIC belong to the compound bankData, tax rate and tax amount belong to taxItems, and a group of purchased items comprises lineItems.

Extractions

Extraction

{
  "entity": "date",
  "value": "2012-06-20",
  "box": { ... },
  "confidence": 0.997
}

An extraction contains an entity, which describes the general semantic type of the extraction (e.g. a date, an address, an amount). The entity also determines the format of the value, which holds the textual information. An optional box element, which we refer to as the bounding box, describes the position of the extraction value on the document. In most cases, extractions without a bounding box are meta information such as doctype. Additionally, we provide a confidence value that estimates how likely the extraction is to be correct.

Name Type Description
entity string Key (primary identification) of an entity type (e.g. banknumber). See available extraction entities for possible values.
value string A normalized textual representation of the extracted information (e.g. a bank number without spaces between the digits).
box bounding-box (Optional) Bounding box containing the position of the extraction value on the document.
confidence float Confidence of the extraction being correct.

Specific Extractions

specific extractions

{
  "paymentDueDate": {
      "entity": "date",
      "value": "2012-06-20",
      "box": { ... },
      "candidates": "dates"
  }
}

A specific extraction assigns a semantic property, identified by its label (e.g. paymentDueDate), to an extraction. It also has an additional candidates field:

Name Type Description
candidates string (Optional) A reference to extraction candidates. See available extraction candidates for possible values.

Available Specific Extractions

Name Description Entity Candidates
amountToPay The amount that is yet to be paid. amount amounts
bankAccountNumber The account number of a payment recipient. bankaccount bankAccountNumbers
bankNumber The bank number of a payment recipient. banknumber bankNumbers
bic The bic of a payment recipient. bic bics
branchId The branch id of a receipt. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. text n/a
companyRegisterId The Commercial Registry number of a document sender. companyregisterid companyRegisterIds
customerId The customer Id of a document recipient. customerid customerIds
docType The document type of a given document. doctype n/a
documentDate The document date. date dates
documentTime The document time. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. time times
documentDomain The domain of the document. documentdomain n/a
email The most probable email address of a sender. email emails
grossAmount The invoiced amount (tax included). amount n/a
iban The IBAN of a document sender. iban ibans
invoiceId The invoice Id of a given document. invoiceid invoiceIds
netAmount The net amount of an invoice. amount n/a
paymentDueDate The calculated payment due date (e.g. of an invoice). date dates
paymentMethod The payment method of a receipt. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. text n/a
paymentPurpose The extra payment purpose text when the payment reference is not available. Note: Currently only available for clients in Austria. text n/a
paymentRecipient The payment recipient, i.e. the beneficiary of a money transfer. companyname senderNames
paymentReference The payment reference. reference n/a
paymentState If a document has yet to be paid or is paid already. paymentstate n/a
phoneNumber The first found phoneNumber in a given document. phonenumber phoneNumbers
receiptNumber The number of the receipt. Note that this extraction is only available if the document was uploaded with doctype hint Receipt. See Document Type Hints for details. invoiceid receiptNumbers
recipient The document’s recipient. (Deprecated: use individual recipient subfields) recipient n/a
recipientName The document’s recipient name. text n/a
recipientNameAddition The document’s recipient name addition. text n/a
recipientStreet The document’s recipient street address. street n/a
recipientCity The document’s recipient city. city n/a
recipientPostalCode The document’s recipient postal code. zipcode n/a
recipientPoBox The document’s recipient postal box. poboxnumber n/a
referenceId The first found reference id in a given document. text referenceIds
senderCity The sender city. city n/a
senderName The sender name. companyname senderNames
senderNameAddition The sender name addition. companynameaddition n/a
senderPoBox The sender post-office box. poboxnumber n/a
senderPostalCode The sender’s postal code. zipcode n/a
senderStreet The sender’s street with house number. street n/a
taxNumber The tax number of a document sender. taxnumber taxNumbers
templateId (Optional) The template id when the layout of the document matches a certain template (available to clients who choose the template option). text n/a
transactionId The transaction id of a receipt when it is paid per card. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. text n/a
vatRegNumber The VAT number of a document sender. vat vatRegNumbers
website The most probable web address of a sender. url websites

Compound Extractions

A compound extraction describes a group of related extractions.

Available Compound Extractions

compound extractions example

{
 "compoundExtractions": {
     "lineItems": [
       {
         "sumNet": {
           "entity": "amount",
           "value": "172.48:EUR",
           "box": {
             "top": 355.83,
             "left": 525.17,
             "width": 38.92000000000007,
             "height": 10.0,
             "page": 1
           }
         },
         "description": {
           "entity": "text",
           "value": "Zählerstand : 539 kWh 04.03.2019 - 05.04.2019",
           "box": {...
           }
         },
         "taxRate": {
           "entity": "text",
           "value": "19 %",
           "box": {...
           }
         }
       },
       {
         "artNumber": {
           "entity": "text",
           "value": "kWh",
           "box": {...
           }
         },
         "sumNet": {
           "entity": "amount",
           "value": "128.64:EUR",
           "box": {...
           }
         },
         "description": {
           "entity": "text",
           "value": "Zählerstand : 402 kWh 04.03.2019 - 05.04.2019",
           "box": {...
           }
         }
       }
     ],
     "taxItems": [
       {
         "taxRate": {
           "entity": "text",
           "value": "19.0 %"
         },
         "taxAmount": {
           "entity": "amount",
           "value": "57.21:EUR",
           "box": {...
           }
         }
       }
     ]
   }
}
Name Description Children Children Description Entity
lineItems Invoice line items describe the details of purchased items. artNumber article number text
baseGross gross amount of 1 unit amount
baseNet net amount of 1 unit amount
deliveryDate delivery date text
description description of the item text
position position of the item text
quantity quantity in the units of the item numeric
sumGross gross amount of all the units of the item amount
sumNet net amount of all the units of the item amount
taxAmount tax amount of all the units of the item amount
taxRate tax rate of the item text
unit unit of the item text
taxItems Taxes sum and their corresponding rates. taxRate tax rate (in percentage) text
taxAmount tax amount of the rate amount
bankData iban and bic iban iban of one entity iban
bic bic of the same entity bic

Extraction Candidates

Extraction candidates represent a list of suggestions for an appropriate extraction.

Available Extraction Candidates

extraction candidates

{
    "dates": [
      {"entity": "date","value": "2012-06-20","box": { ... } },
      {"entity": "date","value": "2012-05-10","box": { ... } },
      ...
    ]
}
Name Description Entity
amounts All amounts of a given document. amount
bankAccountNumbers All account numbers of a given document. bankaccount
bankNumbers All bank numbers of a given document. banknumber
bics All BICs of a given document. bic
companyRegisterIds All alphanumeric strings (of a similar structure as a German company register id) of a given document. companyregisterid
customerIds All alphanumeric strings (of a similar structure as an identifier) of a given document. customerid
dates All dates of a given document. date
emails All emails of a given document. email
ibans All IBANs of a given document. iban
invoiceIds All alphanumeric strings (of similar structure as an identifier) of a given document. invoiceid
phoneNumbers All phone numbers of a given document. phonenumber
receiptNumbers All potential receipt numbers of a given document. invoiceid
referenceIds All potential reference id numbers of a given document. text
senderNames All possible sender names of a given document. companyname
taxNumbers All strings of digits (of a similar structure as a German tax number) of a given document. taxnumber
times All times of a given document. time
vatRegNumbers All alphanumeric strings (of a similar structure as an identifier) of a given document. vat
websites All links found in a given document. url

Extraction Entities

The available extraction entities are described in the Entity Reference section.

Bounding Box

Bounding Box

{
    "box": {
      "page": 2,
      "left": 483.0,
      "top": 450.0,
      "width": 51.0,
      "height": 9.0
    }
}

A bounding box creates a direct relation between an extraction and a document. The box describes the page and the position where the extraction originates from.

Name Type Description
left number The distance from the left edge of the page
top number The distance from the top edge of the page
width number The horizontal dimension of a box
height number The vertical dimension of a box
page number The page on which the box can be found, starting with 1

Coordinate System

The origin of the coordinate system is the upper-left corner of the page. The coordinate system uses the DTP point as its unit: 1 pt = 1 inch / 72 = 25.4 mm / 72 ≈ 0.3528 mm.
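Converting bounding-box coordinates to millimetres is a direct application of the conversion above. The sketch below uses awk for the floating-point arithmetic and rounds to two decimals; the helper names are illustrative. As a sanity check, the page size from the layout example (595.3 x 841.9 pt) comes out at 210.01 x 297.00 mm, i.e. A4.

```shell
#!/bin/sh
# Convert a length in DTP points (1 pt = 1/72 inch) to millimetres,
# rounded to two decimals. LC_ALL=C keeps "." as the decimal separator.
pt_to_mm() {
  LC_ALL=C awk -v pt="$1" 'BEGIN { printf "%.2f\n", pt * 25.4 / 72 }'
}

# Usage: box_to_mm <left> <top> <width> <height>   (all in points)
box_to_mm() {
  printf 'left=%s mm top=%s mm width=%s mm height=%s mm\n' \
    "$(pt_to_mm "$1")" "$(pt_to_mm "$2")" \
    "$(pt_to_mm "$3")" "$(pt_to_mm "$4")"
}
```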

Extraction Confidence

confidence example

{
    "entity": "recipient",
    "value": "Max Mustermann Musterstrasse 1 Musterstadt",
    "box": {
        "top": 379.0,
        "left": 68.0,
        "width": 244.0,
        "height": 10.0,
        "page": 1
    },
    "confidence": 0.955
}

We believe it is important to provide information about how confident our system is in its extractions. Therefore, we implemented a mechanism that predicts the confidence of document extractions. The confidence prediction algorithm estimates the chance of delivering the correct extraction based on previous extractions and your feedback. The result is returned in the optional JSON field confidence of each document extraction.

Due to the nature of the algorithm and the amount of feedback we receive, the system cannot deliver a confidence value for every single extraction. Keep in mind that some document extractions will not have a confidence field. However, if the field exists, we estimate its reliability at 98% to 100%. For example, if the system returns netAmount with "confidence": 0.95, there is very little chance that the extraction is incorrect and you can save resources on proofreading.
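The rule stated above (an extraction that carries a confidence field rarely needs proofreading) can be expressed as a small decision helper. This is a sketch: needs_review is a hypothetical name, and its optional threshold argument is a hypothetical, stricter client-side policy, not a value defined by the API. Without a threshold, any present confidence value skips review.

```shell
#!/bin/sh
# Usage: needs_review <confidence> [threshold]
# Returns 0 (review needed) if the confidence field is missing (empty
# argument) or below the optional threshold; returns 1 (skip) otherwise.
needs_review() {
  [ -z "$1" ] && return 0
  # awk exits 0 when confidence < threshold (default threshold 0).
  LC_ALL=C awk -v c="$1" -v t="${2:-0}" 'BEGIN { exit !(c < t) }'
}
```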

Entity Reference

amount

amount

{
    "entity": "amount",
    "value": "33.78:EUR",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes an amount of money with a specific currency in the format <Amount>:<Currency Code>, where <Amount> is a decimal number with "." as decimal separator and ":" as delimiter between <Amount> and <Currency Code>.

The currency code must be given according to the list specified in ISO 4217.

Format
Name Type Description
entity string Must be amount.
value string Amount in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
<Number>:<Currency Code/Symbol> 12.3:EUR; 12,4:USD; 12.98:USD
<Number> <Currency Code/Symbol> (1-space-separation) 12,3 EUR; 12,4 USD; 12 €
<Currency Code/Symbol> <Number> (1-space-separation) EUR 12.3; $ 12.4
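Because ":" separates the amount from the currency code, an amount value returned by the API can be split with plain shell parameter expansion. The helper names below are illustrative, and the sketch assumes a well-formed <Amount>:<Currency Code> value, not one of the looser feedback forms listed above.

```shell
#!/bin/sh
# Print the <Amount> part of an amount value such as "24.99:EUR".
amount_number() {
  printf '%s\n' "${1%%:*}"   # strip everything from the first ":"
}

# Print the <Currency Code> part of an amount value such as "24.99:EUR".
amount_currency() {
  printf '%s\n' "${1##*:}"   # strip everything up to the last ":"
}
```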

bankaccount

bankaccount

{
    "entity": "bankaccount",
    "value": "1597880",
    "box": {
        "page": 1,
        "left": 506.0,
        "top": 777.0,
        "width": 53.0,
        "height": 6.0
    }
}

Describes a bank account number.

Format
Name Type Description
entity string Must be bankaccount.
value string Bank account number in the normalized form (without spaces between digits).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
At least 3 digits 1597880

If the string has fewer than 3 digits, it will be rejected.
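A minimal client-side sketch of the bank account rules, assuming bash: it removes whitespace to reach the normalized form and rejects anything that is not at least 3 digits. The function name normalize_bankaccount is hypothetical, and the server may apply further checks.

```shell
#!/usr/bin/env bash
# normalize_bankaccount RAW
# Removes whitespace and prints the digits-only account number; returns 1
# if the result is not made up of at least 3 digits.
normalize_bankaccount() {
  local digits=${1//[[:space:]]/}   # strip spaces between digits
  local re='^[0-9]{3,}$'
  [[ $digits =~ $re ]] || return 1
  printf '%s\n' "$digits"
}

normalize_bankaccount "159 7880"   # prints: 1597880
```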

banknumber

{
    "entity": "banknumber",
    "value": "70250150",
    "box": {
        "page": 1,
        "left": 147.0,
        "top": 427.0,
        "width": 52.0,
        "height": 8.0
    }
}

Describes a bank number.

Format
Name Type Description
entity string Must be banknumber.
value string Bank number in the normalized form (without spaces between digits).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
8 digits 70250150

bic

Describes a BIC number.

{
    "entity": "bic",
    "value": "GENODEF1HH2",
    "box": {
        "page": 1,
        "left": 506.0,
        "top": 777.0,
        "width": 53.0,
        "height": 6.0
    }
}
Format
Name Type Description
entity string Must be bic.
value string BIC number in the normalized form (without spaces between digits and letters).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
String matching BIC format GENODEF1HH2
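The exact BIC format the API checks is not spelled out here. The bash sketch below assumes the common ISO 9362 layout: a 4-letter bank code, a 2-letter country code, a 2-character location code and an optional 3-character branch code. The function name is_bic is hypothetical, and the server's own check may differ.

```shell
#!/usr/bin/env bash
# is_bic VALUE
# Succeeds if VALUE (spaces ignored) has the ISO 9362 shape: 4 letters
# (bank code), 2 letters (country code), 2 letters/digits (location code)
# and an optional 3 letters/digits (branch code).
is_bic() {
  local bic=${1//[[:space:]]/}
  local re='^[[:upper:]]{6}[[:upper:][:digit:]]{2}([[:upper:][:digit:]]{3})?$'
  [[ $bic =~ $re ]]
}

is_bic "GENODEF1HH2" && echo "valid BIC"
```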

city

{
    "entity": "city",
    "value": "München",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes a city.

Format
Name Type Description
entity string Must be city.
value string The city name.
box bounding-box Bounding box of the occurrence including the page number.

companyname

{
    "entity": "companyname",
    "value": "Weinquelle Lühmann",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes a (sender) company name.

Format
Name Type Description
entity string Must be companyname.
value string The company name.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
Random string with at least 2 letter/digit characters O2, BMW, ABC GmbH

A string with a single letter/digit character will be rejected.
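The company name rule can be checked before submitting feedback. A minimal sketch, assuming bash; the function name is_valid_companyname is hypothetical and only mirrors the documented "at least 2 letter/digit characters" rule.

```shell
#!/usr/bin/env bash
# is_valid_companyname FEEDBACK
# Succeeds if FEEDBACK contains at least 2 letter/digit characters.
is_valid_companyname() {
  local alnum=${1//[^[:alnum:]]/}   # drop everything except letters and digits
  (( ${#alnum} >= 2 ))
}

is_valid_companyname "O2" && echo "accepted"
```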

companynameaddition

{
    "entity": "companynameaddition",
    "value": "Kundenservice",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes a (sender) company name addition (e.g. Kundenservice).

Format
Name Type Description
entity string Must be companynameaddition.
value string The company name addition.
box bounding-box Bounding box of the occurrence including the page number.

companyregisterid

{
    "entity": "companyregisterid",
    "value": "HRB:108514:München",
    "box": {
        "page": 1,
        "left": 525.0,
        "top": 805.0,
        "width": 34.0,
        "height": 6.0
    }
}

Describes a German commercial register number in the format <Area of the Commercial Registry>:<Number>:<Office of the Registry> with ":" as a delimiter between the components.

Format
Name Type Description
entity string Must be companyregisterid.
value string Register number in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
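A minimal sketch, assuming bash, that splits a companyregisterid value into its three documented components. The function name split_register_id is hypothetical; it returns 1 when any component is missing.

```shell
#!/usr/bin/env bash
# split_register_id VALUE
# Prints the components of a value such as "HRB:108514:München", one per
# line: area of the commercial registry, number, office of the registry.
split_register_id() {
  local area number office
  IFS=: read -r area number office <<< "$1"
  [[ -n $area && -n $number && -n $office ]] || return 1
  printf '%s\n' "$area" "$number" "$office"
}

split_register_id "HRB:108514:München"
```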

currency

{
    "entity": "currency",
    "value": "EUR",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes the currency of the document.

The currency code must be given according to the list specified in ISO 4217.

Format
Name Type Description
entity string Must be currency.
value string Currency in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
Euro currency symbol €
Euro currency name/code EUR, EURO

Since only German documents are supported at the moment, the following strings are allowed: €, EUR, EURO
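Since only the three euro spellings are accepted, a client can map them to the ISO 4217 code before comparing values. A minimal sketch, assuming bash and case-sensitive matching (the documentation does not say whether lower-case input is accepted); the function name normalize_currency is hypothetical.

```shell
#!/usr/bin/env bash
# normalize_currency FEEDBACK
# Maps the accepted euro spellings to the ISO 4217 code "EUR";
# returns 1 for anything else.
normalize_currency() {
  case $1 in
    '€'|EUR|EURO) printf 'EUR\n' ;;
    *) return 1 ;;
  esac
}

normalize_currency "€"   # prints: EUR
```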

customerid

{
    "entity": "customerid",
    "value": "M500721563",
    "box": {
        "page": 1,
        "left": 317.0,
        "top": 123.0,
        "width": 158.0,
        "height": 8.0
    }
}

Describes a customer ID.

Format
Name Type Description
entity string Must be customerid.
value string Customer ID in the normalized form (without spaces between digits and letters).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
A string with length >= 1 and at least 1 digit 12345, KD678
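A minimal sketch, assuming bash, that produces the normalized customer ID (no spaces) and enforces the "at least 1 digit" rule. The function name normalize_customerid is hypothetical.

```shell
#!/usr/bin/env bash
# normalize_customerid FEEDBACK
# Removes whitespace and prints the ID; returns 1 if it has no digit.
normalize_customerid() {
  local id=${1//[[:space:]]/}        # normalized form has no spaces
  [[ $id == *[0-9]* ]] || return 1   # at least one digit is required
  printf '%s\n' "$id"
}

normalize_customerid "KD 678"   # prints: KD678
```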

date

{
    "entity": "date",
    "value": "2012-11-16",
    "box": {
        "page": 1,
        "left": 429.0,
        "top": 143.0,
        "width": 40.0,
        "height": 8.0
    }
}

Describes a date in the format <Year>-<Month>-<Day> with "-" as a delimiter between the date components.

Format
Name Type Description
entity string Must be date
value string Date in the defined format
box bounding-box Bounding box of the occurrence including the page number
Valid Feedback
Form Example
yyyy-mm-dd 2015-10-05
German style date 05.10.2015, 05-10-2015, 05 Okt 2015, 05 Oktober 2015
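To compare feedback with extracted dates, a client can convert the German styles into the <Year>-<Month>-<Day> format. A minimal sketch, assuming bash; the function name to_iso_date is hypothetical, month names are matched by German prefix (Jan, Feb, Mär/Mrz, ...), and day and month ranges are not validated.

```shell
#!/usr/bin/env bash
# to_iso_date FEEDBACK
# Prints FEEDBACK as YYYY-MM-DD if it is an ISO date or one of the German
# styles DD.MM.YYYY, DD-MM-YYYY, "DD Okt YYYY" or "DD Oktober YYYY".
# Returns 1 for anything else.
to_iso_date() {
  local input=$1 day month year
  local iso='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
  local numeric='^([0-9]{2})[.-]([0-9]{2})[.-]([0-9]{4})$'
  local named='^([0-9]{2}) ([^ ]+) ([0-9]{4})$'
  if [[ $input =~ $iso ]]; then
    printf '%s\n' "$input"
    return 0
  elif [[ $input =~ $numeric ]]; then
    day=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} year=${BASH_REMATCH[3]}
  elif [[ $input =~ $named ]]; then
    day=${BASH_REMATCH[1]} year=${BASH_REMATCH[3]}
    # Match German month names and their abbreviations by prefix.
    case ${BASH_REMATCH[2]} in
      Jan*) month=01 ;; Feb*) month=02 ;; Mär*|Mrz*) month=03 ;;
      Apr*) month=04 ;; Mai*) month=05 ;; Jun*) month=06 ;;
      Jul*) month=07 ;; Aug*) month=08 ;; Sep*) month=09 ;;
      Okt*) month=10 ;; Nov*) month=11 ;; Dez*) month=12 ;;
      *) return 1 ;;
    esac
  else
    return 1
  fi
  printf '%s-%s-%s\n' "$year" "$month" "$day"
}

to_iso_date "05 Oktober 2015"   # prints: 2015-10-05
```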

Document Types

doctype

{
    "entity": "doctype",
    "value": "Invoice"
}

Describes a document type. A list of supported document types:

Format
Name Type Description
entity string Must be doctype.
value string The document type.
Valid Feedback
Form Example
One of the above listed values Invoice, Reminder

documentdomain

{
    "entity": "documentdomain",
    "value": "TeleCommunication"
}

Describes a document domain. A list of supported values:

Format
Name Type Description
entity string Must be documentdomain.
value string The document domain.
Valid Feedback
Form Example
One of the above listed values Travel, HealthInsurance

email

{
    "entity": "email",
    "value": "info@t-online.de",
    "box": {
        "page": 1,
        "left": 189.0,
        "top": 820.0,
        "width": 73.0,
        "height": 7.0
    }
}

Describes an email in the format <Name>@<Domain> with "@" as a delimiter between email components.

Format
Name Type Description
entity string Must be email.
value string Email in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
Valid email address hello@gini.net

iban

{
    "entity": "iban",
    "value": "DE74700500000000028273",
    "box": {
        "page": 1,
        "left": 425.0,
        "top": 770.0,
        "width": 83.0,
        "height": 6.0
    }
}

Describes an IBAN.

Format
Name Type Description
entity string Must be iban.
value string IBAN in the normalized form (without spaces between digits and letters).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
Valid IBAN DE68700202700667302269

Invalid IBANs will be rejected.
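IBAN validity is defined by the ISO 13616 mod-97 check: move the first four characters to the end, replace each letter with two digits (A = 10, ..., Z = 35) and verify that the resulting number modulo 97 equals 1. A minimal bash sketch of that check follows, so invalid IBANs can be caught before feedback is sent. The function name is_valid_iban is hypothetical, and the server may apply additional rules (for example country-specific lengths).

```shell
#!/usr/bin/env bash
# is_valid_iban VALUE
# Succeeds if VALUE (spaces ignored, any case) has the IBAN structure and
# passes the mod-97 check.
is_valid_iban() {
  local iban
  iban=$(printf '%s' "${1//[[:space:]]/}" | tr '[:lower:]' '[:upper:]')
  local re='^[[:upper:]]{2}[0-9]{2}[[:upper:][:digit:]]{11,30}$'
  [[ $iban =~ $re ]] || return 1

  # Move country code and check digits to the end.
  local rearranged=${iban:4}${iban:0:4}
  local remainder=0 i ch code
  # Fold the number in one character at a time so it never overflows.
  for (( i = 0; i < ${#rearranged}; i++ )); do
    ch=${rearranged:i:1}
    if [[ $ch == [0-9] ]]; then
      remainder=$(( (remainder * 10 + ch) % 97 ))
    else
      # Letters become two digits: A=10, B=11, ..., Z=35 (ASCII code - 55).
      printf -v code '%d' "'$ch"
      remainder=$(( (remainder * 100 + code - 55) % 97 ))
    fi
  done
  (( remainder == 1 ))
}

is_valid_iban "DE68 7002 0270 0667 3022 69" && echo "valid IBAN"
```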

invoiceid

{
    "entity": "invoiceid",
    "value": "201210124056",
    "box": {
        "page": 1,
        "left": 429.0,
        "top": 133.0,
        "width": 53.0,
        "height": 8.0
    }
}

Describes an invoice ID.

Format
Name Type Description
entity string Must be invoiceId.
value string Invoice ID in the normalized form (without spaces between digits and letters).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
A string with length >= 1 and at least 1 digit 12345, RE 67890

numeric

{
    "entity": "numeric",
    "value": "12.5",
    "box": {
        "page": 1,
        "left": 429.0,
        "top": 133.0,
        "width": 53.0,
        "height": 8.0
    }
}

Describes a numeric value (integer/float).

Format
Name Type Description
entity string Must be numeric.
value string The numeric value as a string.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
An integer or float string 1, 1.5, 1.234

paymentstate

{
    "entity": "paymentState",
    "value": "Paid"
}

Describes a payment state as one of the following values:

Format
Name Type Description
entity string Must be paymentState.
value string The payment state.
Valid Feedback
Form Example
One of the above listed values Paid, ToBePaid

phonenumber

{
    "entity": "phonenumber",
    "value": "08923508270",
    "box": {
        "page": 1,
        "left": 425.0,
        "top": 770.0,
        "width": 83.0,
        "height": 6.0
    }
}

Describes a phone number in one of two formats: <CountryCode> <Number>, with " " as the delimiter, or <Number> alone, without a country code. All punctuation marks (e.g. "/", "-"), spaces and "(0)" (as in +49(0)61957746361) are removed.

Format
Name Type Description
entity string Must be phonenumber.
value string The phone number in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
Phone-number-like string +49 89 1234 567

Brackets, spaces, leading '+', and '-' are allowed.
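A minimal pre-submission check, assuming bash, that accepts only digits plus the characters listed as allowed (brackets, spaces, "-" and a leading "+"). Treating every other character as invalid is an assumption, and the function name is_phone_feedback is hypothetical.

```shell
#!/usr/bin/env bash
# is_phone_feedback FEEDBACK
# Succeeds if FEEDBACK contains at least one digit and otherwise only
# brackets, spaces, "-" and an optional leading "+".
is_phone_feedback() {
  local re='^\+?[0-9() -]+$'
  [[ $1 =~ $re && $1 == *[0-9]* ]]
}

is_phone_feedback "+49 (0)89 1234-567" && echo "accepted"
```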

poboxnumber

{
    "entity": "poboxnumber",
    "value": "22087",
    "box": {
        "page": 1,
        "left": 223.0,
        "top": 125.0,
        "width": 16.0,
        "height": 6.0
    }
}

Describes a post-office box.

Format
Name Type Description
entity string Must be poboxnumber.
value string The post-office box number.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
4-6 digits 123456
<Keyword> <4-6 digits> Postfach 123456, PF 123456, Brieffach 123456

The keyword can be one of the following: Postfach, PF, Brieffach.
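A minimal sketch, assuming bash, that accepts both post-office box feedback forms and prints the bare 4-6 digit number. The function name pobox_digits is hypothetical, and the keyword match is assumed to be case-sensitive.

```shell
#!/usr/bin/env bash
# pobox_digits FEEDBACK
# Accepts "123456" or "<Keyword> 123456" (Keyword: Postfach, PF, Brieffach)
# and prints the 4-6 digit box number; returns 1 otherwise.
pobox_digits() {
  local re='^((Postfach|PF|Brieffach) )?([0-9]{4,6})$'
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[3]}"   # group 3 holds the digits
}

pobox_digits "Postfach 123456"   # prints: 123456
```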

recipient

{
    "entity": "recipient",
    "value": "Max Mustermann Musterstrasse 1 Musterstadt",
    "box": {
        "top": 379.0,
        "left": 68.0,
        "width": 244.0,
        "height": 10.0,
        "page": 1
    }
}

Represents the recipient of a letter.

Format
Name Type Description
entity string Must be recipient.
value string The recipient.
box bounding-box Bounding box of the occurrence including the page number.

reference

{
    "entity": "reference",
    "value": "K19218331",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes a payment reference.

Format
Name Type Description
entity string Must be reference.
value string The payment reference with ", " as delimiter between reference parts.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
A string with at least 5 non-space characters This is a reference.

A string with fewer than 5 non-space characters will be rejected.
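Because the reference rule counts non-space characters, whitespace has to be removed before measuring the length. A minimal sketch, assuming bash; the function name is_valid_reference is hypothetical.

```shell
#!/usr/bin/env bash
# is_valid_reference FEEDBACK
# Succeeds if FEEDBACK has at least 5 non-space characters.
is_valid_reference() {
  local compact=${1//[[:space:]]/}   # spaces do not count
  (( ${#compact} >= 5 ))
}

is_valid_reference "K19218331" && echo "accepted"
```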

street

{
    "entity": "street",
    "value": "Emmy-Noether-Straße:2a",
    "box": {
        "page": 1,
        "left": 162.0,
        "top": 125.0,
        "width": 55.0,
        "height": 6.0
    }
}

Describes a street in the format <Street name>:<House number> with ":" as a delimiter between components. All abbreviations (e.g. "str.") are replaced with the German word "Straße".

Format
Name Type Description
entity string Must be street.
value string Street in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
<Streetname>:<Housenumber> ABC Str:1a
<Streetname> <Housenumber> ABC Straße 1a
<Streetname> (without house number) ABC Straße
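A minimal client-side sketch, assuming bash, that turns the accepted feedback forms into the <Street name>:<House number> shape used in responses. It assumes the house number is the last space-separated token and starts with a digit (ranges such as "12-14" are not handled), and it deliberately leaves abbreviation expansion ("str." to "Straße") to the server. The function name to_street_value is hypothetical.

```shell
#!/usr/bin/env bash
# to_street_value FEEDBACK
# "ABC Straße 1a" -> "ABC Straße:1a"; values that already contain ":" and
# street names without a house number are printed unchanged.
to_street_value() {
  local input=$1
  local re='^(.+) ([0-9][[:alnum:]]*)$'
  if [[ $input == *:* ]]; then
    printf '%s\n' "$input"      # already <Streetname>:<Housenumber>
  elif [[ $input =~ $re ]]; then
    printf '%s:%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$input"      # street name without house number
  fi
}

to_street_value "Emmy-Noether-Straße 2a"   # prints: Emmy-Noether-Straße:2a
```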

taxnumber

{
    "entity": "taxnumber",
    "value": "143/163/40289",
    "box": {
        "page": 1,
        "left": 501.0,
        "top": 812.0,
        "width": 58.0,
        "height": 6.0
    }
}

Describes a German tax number in the format <taxOfficeNumber>/<taxOfficeAreaNumber>/<personalNumber><checkDigit> with "/" as delimiter between the first 3 components.

Format
Name Type Description
entity string Must be taxnumber.
value string Tax number in the defined format.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
A string containing 11-13 digits 143/163/4028/9

All kinds of delimiters that are common for tax numbers are allowed (e.g. '/', '-'). A string without delimiters is also allowed.
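A minimal sketch, assuming bash, of the tax number feedback rule: only digits and common delimiters are allowed, and after removing the delimiters 11 to 13 digits must remain. Restricting the delimiters to "/", "-" and spaces is an assumption, and the function name is_taxnumber_feedback is hypothetical.

```shell
#!/usr/bin/env bash
# is_taxnumber_feedback FEEDBACK
# Succeeds if FEEDBACK consists only of digits and the delimiters "/", "-"
# or spaces, and contains 11 to 13 digits in total.
is_taxnumber_feedback() {
  local re='^[0-9/ -]+$'
  [[ $1 =~ $re ]] || return 1
  local digits=${1//[^0-9]/}   # drop the delimiters
  (( ${#digits} >= 11 && ${#digits} <= 13 ))
}

is_taxnumber_feedback "143/163/4028/9" && echo "accepted"
```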

text

{
    "entity": "text",
    "value": "Aktenzeichen: K19218331",
    "box": {
        "page": 1,
        "left": 535.0,
        "top": 395.0,
        "width": 25.0,
        "height": 10.0
    }
}

Describes a plain text entity.

Format
Name Type Description
entity string Must be text.
value string Plain text.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback

In general, the text entity accepts any text as feedback, except for some specific fields to which extra rules apply:

Label Form Example
branchId, transactionId digit sequence 12345, 678901234
paymentMethod one of the allowed payment methods Cash, Card, Contactless Card, Girocard, Contactless Girocard, Contactless Visa, Contactless Mastercard
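A minimal sketch, assuming bash, of the per-label text feedback rules. It assumes that transactionId follows the same digit-sequence rule as branchId and that payment methods are matched exactly (case-sensitive); the names is_valid_text_feedback and PAYMENT_METHODS are hypothetical.

```shell
#!/usr/bin/env bash
# Allowed payment methods, copied from the table.
PAYMENT_METHODS=("Cash" "Card" "Contactless Card" "Girocard"
  "Contactless Girocard" "Contactless Visa" "Contactless Mastercard")

# is_valid_text_feedback LABEL VALUE
# Applies the extra rules for branchId/transactionId and paymentMethod;
# any other label accepts any text.
is_valid_text_feedback() {
  local label=$1 value=$2 m
  case $label in
    branchId|transactionId)
      # Non-empty and digits only.
      [[ -n $value && $value != *[!0-9]* ]] ;;
    paymentMethod)
      for m in "${PAYMENT_METHODS[@]}"; do
        [[ $value == "$m" ]] && return 0
      done
      return 1 ;;
    *)
      return 0 ;;
  esac
}

is_valid_text_feedback paymentMethod "Contactless Visa" && echo "accepted"
```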

time

{
    "entity": "time",
    "value": "12:13:14",
    "box": {
        "page": 1,
        "left": 429.0,
        "top": 143.0,
        "width": 40.0,
        "height": 8.0
    }
}

Describes a time in the format <hour>:<minute>:<second> with ":" as a delimiter between the time components.

Format
Name Type Description
entity string Must be time.
value string Time in the defined format.
box bounding-box Bounding box of the occurrence including the page number.

url

{
    "entity": "url",
    "value": "www.m-net.de",
    "box": {
        "page": 1,
        "left": 444.0,
        "top": 553.0,
        "width": 50.0,
        "height": 8.0
    }
}

Describes the host part of a URI as defined in RFC 3986. http:// is implicitly assumed as the URI scheme.

Format
Name Type Description
entity string Must be url.
value string The host part of a URI.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
A valid url string www.gini.net

vat

{
    "entity": "vat",
    "value": "DE188796931",
    "box": {
        "page": 1,
        "left": 453.0,
        "top": 812.0,
        "width": 43.0,
        "height": 6.0
    }
}

Describes an EU VAT number.

Format
Name Type Description
entity string Must be vat.
value string European Union VAT number in normalized form (without spaces between the digits and letters).
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
VAT string in valid form DE188796931

Currently, only VAT numbers from DE, GB, FR and AT are allowed.
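A minimal sketch, assuming bash, that rejects VAT numbers from other countries before submitting feedback. The per-country patterns (DE: 9 digits; AT: "U" plus 8 digits; FR: 2 characters plus 9 digits; GB: 9 or 12 digits, or GD/HA plus 3 digits) are the commonly published structural formats, not the API's own check; no checksum is verified, and the function name is_allowed_vat is hypothetical.

```shell
#!/usr/bin/env bash
# is_allowed_vat VALUE
# Succeeds if VALUE (spaces ignored) is a structurally plausible DE, AT,
# FR or GB VAT number; all other country prefixes are rejected.
is_allowed_vat() {
  local vat=${1//[[:space:]]/} re
  case $vat in
    DE*) re='^DE[0-9]{9}$' ;;
    AT*) re='^ATU[0-9]{8}$' ;;
    FR*) re='^FR[[:upper:][:digit:]]{2}[0-9]{9}$' ;;
    GB*) re='^GB([0-9]{9}|[0-9]{12}|(GD|HA)[0-9]{3})$' ;;
    *) return 1 ;;
  esac
  [[ $vat =~ $re ]]
}

is_allowed_vat "DE188796931" && echo "accepted"
```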

zipcode

{
    "entity": "zipCode",
    "value": "18337",
    "box": {
        "page": 1,
        "left": 62.0,
        "top": 25.0,
        "width": 55.0,
        "height": 6.0
    }
}

Describes a ZIP code.

Format
Name Type Description
entity string Must be zipCode.
value string The ZIP code.
box bounding-box Bounding box of the occurrence including the page number.
Valid Feedback
Form Example
4-5 digits 80809

User Center API

Gini's User Center offers an API to programmatically create new Gini accounts and to make API requests on behalf of the created user.

Client Authentication

All access to the User Center API requires client authentication. A client can authenticate itself with the Client Credentials Grant described in RFC 6749. In short, the client exchanges its client ID and client secret for an access token.

Request

get a client access token

curl -v -H 'Accept: application/json' \
    -u 'client-id:secret' \
    'https://user.gini.net/oauth/token?grant_type=client_credentials'

GET /oauth/token?grant_type=client_credentials HTTP/1.1
Authorization: Basic Y2xpZW50LWlkOnNlY3JldA==
Host: user.gini.net
Accept: application/json

example response

{
  "access_token":"74c1e7fe-e464-451f-a6eb-8f0998c46ff6",
  "token_type":"bearer",
  "expires_in":3599
}

In order to get a client access token, send a GET request to /oauth/token?grant_type=client_credentials. The request must contain a basic HTTP access authorization header with the client ID as a username and the client secret as a password.
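The Basic header in the raw request is the Base64 encoding of client-id:secret, which curl -u builds for you. A minimal bash sketch, assuming curl, base64, tr and sed are available; the function names are hypothetical, and the sed extraction only suits the flat token response shown here (a real JSON parser such as jq is safer).

```shell
#!/usr/bin/env bash
# basic_auth_header CLIENT_ID CLIENT_SECRET
# Builds the header that `curl -u 'client-id:secret'` sends.
basic_auth_header() {
  # tr removes the line wrapping some base64 implementations add.
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# extract_access_token
# Reads a token response on stdin and prints the access_token value.
extract_access_token() {
  sed -n 's/.*"access_token" *: *"\([^"]*\)".*/\1/p'
}

# get_client_token CLIENT_ID CLIENT_SECRET
# Client Credentials Grant: requests a client access token and prints it.
get_client_token() {
  curl -sS -H 'Accept: application/json' -u "$1:$2" \
    'https://user.gini.net/oauth/token?grant_type=client_credentials' |
    extract_access_token
}

basic_auth_header 'client-id' 'secret'
# prints: Authorization: Basic Y2xpZW50LWlkOnNlY3JldA==
```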

The client can now use the returned access token to make requests to the User Center API by sending the token as a bearer token in the Authorization request header:

GET /api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Accept: application/json

Authenticating on behalf of a User

authenticating on behalf of a user

curl -v -X POST --data-urlencode 'username=some_user@example.com' \
    --data-urlencode 'password=supersecret' \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -H 'Accept: application/json' \
    -u 'client-id:secret' 'https://user.gini.net/oauth/token?grant_type=password'

POST /oauth/token?grant_type=password HTTP/1.1
Authorization: Basic Y2xpZW50LWlkOnNlY3JldA==
Host: user.gini.net
Accept: application/json
Content-Type: application/x-www-form-urlencoded

username=some_user@example.com&password=supersecret

example response

{
  "access_token":"6c470ffa-abf1-41aa-b866-cd3be0ee84f4",
  "token_type":"bearer",
  "expires_in":3599
}

The returned access token can now be used to make requests to the Gini Health API on behalf of the user. To do so, send the access token as a bearer token in the Authorization request header:

GET /documents HTTP/1.1
Host: health-api.gini.net
Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4
Accept: application/vnd.gini.v2+json
Connection: close

The Resource Owner Password Credentials Grant can be used to exchange a user's email address and password for an access token. The access token can then be used to make requests to the Gini API on behalf of the user.

Request
Key Description
username The user's email address.
password The user's password.

Note that the client must authenticate itself using basic HTTP access authentication with its ID as a username and its secret as a password.
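A minimal bash sketch of the password grant, plus a helper that uses the expires_in value (the token lifetime in seconds, per RFC 6749) to decide when to request a new token. The function names and the 60-second safety margin are hypothetical choices.

```shell
#!/usr/bin/env bash
# get_user_token CLIENT_ID CLIENT_SECRET USERNAME PASSWORD
# Password grant; prints the raw JSON response. --data-urlencode encodes
# each field and makes curl send a form-encoded POST.
get_user_token() {
  curl -sS -u "$1:$2" \
    -H 'Accept: application/json' \
    --data-urlencode "username=$3" \
    --data-urlencode "password=$4" \
    'https://user.gini.net/oauth/token?grant_type=password'
}

# token_is_expired ISSUED_AT EXPIRES_IN [NOW] [MARGIN]
# ISSUED_AT and NOW are Unix timestamps in seconds, EXPIRES_IN is the
# lifetime from the response. MARGIN (hypothetical default: 60 s) treats
# the token as expired slightly early so it can be renewed in time.
token_is_expired() {
  local issued_at=$1 expires_in=$2 now=${3:-$(date +%s)} margin=${4:-60}
  (( now >= issued_at + expires_in - margin ))
}

# A token issued at t=1000 with "expires_in":3599 is renewed from t=4539 on.
token_is_expired 1000 3599 4539 && echo "renew the token"
```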

Creating a New User

creating a new user

curl -v -X POST --data '{"email":"some_user@example.com", "password":"supersecret"}' \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -H 'Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6' \
    'https://user.gini.net/api/users'

POST /api/users HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Content-Type: application/json

{"email":"some_user@example.com","password":"supersecret"}

example response

HTTP/1.1 201 Created
Location: https://user.gini.net/api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e
Content-Length: 0

In order to create a new user, submit a POST request to /api/users.

Request
Key Description
email The new user's email address (will be used as login username).
password The new user's password (must be at least 6 characters long).

If the request body is invalid (e.g. missing fields or a password shorter than 6 characters), or if a user with that email address already exists, the API responds with 400 Bad Request.
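The new user's ID is the last path segment of the Location header in the 201 response. A minimal bash sketch, assuming curl and awk; it checks the documented 6-character password minimum before calling the API. The function names are hypothetical, and the email and password are assumed not to need JSON escaping.

```shell
#!/usr/bin/env bash
# user_id_from_headers
# Reads HTTP response headers on stdin and prints the last path segment
# of the Location header, i.e. the new user's ID.
user_id_from_headers() {
  awk 'tolower($1) == "location:" {
         sub(/\r$/, "", $2)   # HTTP header lines end with CR LF
         n = split($2, parts, "/")
         print parts[n]
       }'
}

# create_user ACCESS_TOKEN EMAIL PASSWORD
# Creates a user with a client access token and prints the new user ID.
create_user() {
  local token=$1 email=$2 password=$3
  # The API rejects passwords shorter than 6 characters; fail early.
  if (( ${#password} < 6 )); then
    echo "password must be at least 6 characters long" >&2
    return 1
  fi
  # Assumption: EMAIL and PASSWORD contain no characters that need JSON
  # escaping (", \ or control characters); they are inserted verbatim.
  curl -sS -o /dev/null -D - \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -H "Authorization: BEARER $token" \
    --data "{\"email\":\"$email\",\"password\":\"$password\"}" \
    'https://user.gini.net/api/users' |
    user_id_from_headers
}
```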

Retrieving User Information

retrieving user information

GET /api/users/88a28076-18e8-4275-b39c-eaacc240d406 HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Accept: application/json

Response

{
  "id":"88a28076-18e8-4275-b39c-eaacc240d406",
  "email":"some_user@example.com"
}

Information about a user can be retrieved with a GET request to /api/users/{userId}.

Response
Key Description
id Unique User ID.
email The user's email address.

Changing a User's Password and/or Email

change a user's password and/or email

PUT /api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Content-Type: application/json

with one of the following request bodies:

{"oldPassword":"supersecret","password":"anothersecret"}

or

{"oldEmail":"old@email.com","email":"my.new@email.com"}

or

{
  "oldPassword":"supersecret",
  "password":"anothersecret",
  "oldEmail":"old@email.com",
  "email":"my.new@email.com"
}

A user's password and/or email can be changed with a PUT request to /api/users/{userId}. In order to update a user's password and/or email, the current password/email must be provided.

Request
Key Description
oldPassword The user's current password.
password The new password.
oldEmail The user's current email address.
email The new email address.
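Building these bodies by hand breaks as soon as a password contains a quote or backslash. A minimal bash sketch that escapes those two characters before sending the PUT request; control characters are not handled, and all function names are hypothetical.

```shell
#!/usr/bin/env bash
# json_escape STRING
# Escapes backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# password_change_body OLD_PASSWORD NEW_PASSWORD
password_change_body() {
  printf '{"oldPassword":"%s","password":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# email_change_body OLD_EMAIL NEW_EMAIL
email_change_body() {
  printf '{"oldEmail":"%s","email":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# update_user ACCESS_TOKEN USER_ID BODY
# Sends BODY as a PUT request to /api/users/{userId}.
update_user() {
  curl -sS -X PUT \
    -H 'Content-Type: application/json' \
    -H "Authorization: BEARER $1" \
    --data "$3" \
    "https://user.gini.net/api/users/$2"
}

password_change_body 'supersecret' 'anothersecret'
# prints: {"oldPassword":"supersecret","password":"anothersecret"}
```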

Deleting a User

delete a user

DELETE /api/users/16aecc72-8032-4df6-9686-eaf4ec9532b8 HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Content-Type: application/json

An existing user can be deleted with a DELETE request to /api/users/{userId}. This also deletes all data associated with that user (e.g. access tokens, documents and extractions).

Troubleshooting

If you have trouble using the Gini Health API and need to contact support, please always provide the following information so that we can help you quickly and efficiently.

X-Request-Id

The request id is generated for every request against the Gini Health API and tracked through the whole system. It is included in every response you receive from the Gini Health API as the HTTP header X-Request-Id. Please refer to it when you contact our support.
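To have the request ID at hand when contacting support, a client can capture it from the response headers. A minimal bash sketch, assuming curl and awk; the function name request_id_from_headers is hypothetical, and header names are matched case-insensitively as HTTP requires.

```shell
#!/usr/bin/env bash
# request_id_from_headers
# Reads HTTP response headers on stdin and prints the X-Request-Id value.
request_id_from_headers() {
  awk 'tolower($1) == "x-request-id:" { sub(/\r$/, "", $2); print $2 }'
}

# Usage sketch (needs a valid user access token in $TOKEN):
#   curl -sS -o /dev/null -D - \
#     -H "Authorization: BEARER $TOKEN" \
#     -H 'Accept: application/vnd.gini.v2+json' \
#     'https://health-api.gini.net/documents' | request_id_from_headers
```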

Document Id

The document id is generated for every accepted upload. It is included in the Location HTTP header which is part of the response to a successful upload. Please refer to the document id if you have questions related to the specific document upload.