Document Understanding Activities

Last updated Apr 22, 2026

Digitize Document

UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument

Description

Digitizes a document, extracting its Document Object Model (DOM) and its text, and storing them in their corresponding variable types.

Note:

You must assign an OCR engine to this activity by dragging it into the body of the activity. The chosen OCR engine is used only if the incoming documents require OCR processing. Visit OCR Engines to check the available OCR engines. The Digitize Document activity automatically sets the input and output parameters of the selected OCR engine.

Project compatibility

Windows-Legacy | Windows

Configuration

Properties panel

Common

  • DisplayName - The display name of the activity.

Input

  • ApplyOcrOnPdf - Specifies whether OCR should be applied to PDF documents. If set to Yes, OCR is applied to all PDF pages of the document. If set to No, only digitally typed text is extracted. The default value is Auto, which decides whether OCR is needed based on the input document.

  • DegreeOfParallelism - Specifies how many pages, if any, are analyzed in parallel. A value of -1 uses the number of cores on the machine minus 1, meaning the activity tries to process that many pages in parallel. A positive value uses that specific number of logical processors. By default, this property is set to -1.

    This property accepts any value that is not greater than LogicalProcessorCount - 1.

  • DetectCheckboxes - Detects the checkboxes available in the document while digitizing it. The default value is True.

  • DocumentPath - The file path of the document you want to digitize. This field supports only strings and String variables.

    Note:
    • Set the ApplyOcrOnPdf property to Yes for native PDF documents that contain logos, hidden images, or other elements that corrupt the digitization output and might lead to suboptimal extractions and/or classifications.
    • Text extraction from PDF files has been upgraded. The extraction process is now optimized so that both native and scanned text are retrieved at the same time, and OCR is applied only to the images identified in the PDF file. This improvement is available only when the ApplyOcrOnPdf option is set to Auto.
    Note:

    The supported file types for this property field are .png, .jpe, .jpg, .jpeg, .tiff, .tif, and .pdf.
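
The Input rules above (the supported file types for DocumentPath and the DegreeOfParallelism limit) can be sketched as a pre-flight check that runs outside the activity. This is only an illustrative Python sketch and not part of the UiPath API: the function names are hypothetical, the case-insensitive extension match is an assumption, and rejecting out-of-range values (rather than clamping them) is an assumption, since the page does not say how the activity reacts.

```python
from pathlib import Path

# Extensions listed in the note on DocumentPath.
SUPPORTED_EXTENSIONS = {".png", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".pdf"}


def is_supported_document(document_path: str) -> bool:
    """Return True if the file extension is one of the documented types."""
    # Assumption: the comparison ignores letter case.
    return Path(document_path).suffix.lower() in SUPPORTED_EXTENSIONS


def effective_parallelism(degree: int, logical_processors: int) -> int:
    """Resolve a DegreeOfParallelism value into a number of parallel pages."""
    limit = logical_processors - 1  # documented upper bound
    if degree == -1:
        # Documented default: "number of cores - 1".
        # Assumption: never go below one page at a time.
        return max(1, limit)
    if degree < 1:
        # Assumption: only -1 and positive integers are handled here.
        raise ValueError("degree must be -1 or a positive integer")
    if degree > limit:
        # The limit is documented; raising instead of clamping is an assumption.
        raise ValueError(f"degree must not exceed {limit}")
    return degree
```

For example, on a machine with 8 logical processors the default of -1 resolves to 7 pages in parallel, while an explicit value of 8 is refused by this sketch.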

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.

Output

  • DocumentObjectModel - The Document Object Model (DOM) of the file, stored in a Document variable. This field supports only Document variables.
  • DocumentText - The text extracted from the specified document. This variable can be subsequently used in the Present Validation Station activity. This field supports only String variables.
    Note:

    Starting with UiPath.IntelligentOCR.Activities package v6.3.0-preview, the Digitize Document activity comes with a default preselected OCR engine, the UiPath® Document OCR engine.

Both output variables, paired because they depend on each other, can be used further in document processing throughout the document processing framework (classification, data extraction, human validation, and so on).

Important

If you have updated the UiPath.IntelligentOCR.Activities package to v5.1.0, the ForceApplyOCR parameter has been replaced by ApplyOcrOnPdf. The old values map to the new ones as follows:

  • ForceApplyOCR = True is replaced by ApplyOcrOnPdf = Yes;
  • ForceApplyOCR = False is replaced by ApplyOcrOnPdf = Auto;
  • ForceApplyOCR = Empty is replaced by ApplyOcrOnPdf = Auto;
  • ForceApplyOCR = Your defined variable is replaced by ApplyOcrOnPdf = Auto.
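
The compatibility list above reduces to a single rule: only a literal True becomes Yes, and every other case becomes Auto. A minimal Python sketch of that mapping follows. The function name is hypothetical, and representing an empty value as None and a variable binding as any other object are assumptions made purely for illustration.

```python
def migrate_force_apply_ocr(old_value: object) -> str:
    """Map an old ForceApplyOCR setting to the new ApplyOcrOnPdf value.

    Assumed encoding of the old setting (illustrative only):
      True / False -> a literal Boolean typed into the property
      None         -> the property was left empty
      anything else -> the property was bound to a user-defined variable
    """
    # Per the compatibility list, only a literal True maps to "Yes".
    # False, empty, and variable-bound settings all map to "Auto".
    return "Yes" if old_value is True else "Auto"
```

Note that a workflow that used a variable evaluating to True at run time ends up on Auto after the upgrade, so it may be worth reviewing such workflows and setting Yes explicitly where forced OCR is still wanted.
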
Note:

The Digitize Document activity extracts the text from a PDF file and, for complex documents, it applies pre-processing and post-processing algorithms. This activity can be used together with other Document Understanding activities.

Document Object Model

The Document Object Model is captured in a proprietary object. Visit Document Class for more information.

Tip:

To successfully digitize and process your documents, consider the following advice:

  • For an image to be digitized/processed successfully, its width and height must each be between 50 and 10,000 pixels. Any image below or above this range is rejected with an exception message. An image that passes this check but has a total size larger than 14 MP is scaled down to 14 MP while keeping its aspect ratio (the ratio of width to height).
  • The best results are obtained by keeping the skew angle within +/- 20 degrees.
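
The image size rule above can be worked through with a small calculation. The Python sketch below is not the activity's implementation: the function name is hypothetical, and it assumes that the 50 to 10,000 pixel range is inclusive, that 14 MP means 14,000,000 pixels, and that scaled dimensions are rounded down.

```python
import math

MIN_SIDE = 50            # documented minimum width/height in pixels
MAX_SIDE = 10_000        # documented maximum width/height in pixels
MAX_PIXELS = 14_000_000  # assumption: 14 MP taken as 14,000,000 pixels


def plan_image_size(width: int, height: int) -> tuple[int, int]:
    """Return the size an image would be processed at, or raise if rejected."""
    for side in (width, height):
        # Assumption: the bounds are inclusive.
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"side of {side} px is outside {MIN_SIDE}-{MAX_SIDE} px")
    total = width * height
    if total <= MAX_PIXELS:
        return width, height  # small enough, no scaling needed
    # Scale both sides by the same factor so the aspect ratio is kept.
    factor = math.sqrt(MAX_PIXELS / total)
    # Rounding down (an assumption) keeps the result at or under the pixel cap.
    return math.floor(width * factor), math.floor(height * factor)
```

Under these assumptions, a 10,000 x 10,000 pixel scan (100 MP) is accepted by the dimension check and then scaled down by a factor of about 0.374 on each side, while a 40 x 100 pixel thumbnail is rejected outright.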

Example of using the Digitize Document activity

Visit Manual validation for digitize documents to check how the Digitize Document activity is used in an example that incorporates multiple activities.
