File Structure and Naming Conventions

Userpilot exports data in a structured manner to your cloud storage, typically organized by date and possibly event type, to facilitate easier management and querying.
  • Directory Structure: Data files are usually organized hierarchically. A common pattern is by date:
[your_path_prefix]/userpilot_datasync_{SYNC_ID}_{APP_TOKEN}/YYYY-MM-DD/
  • For example, data synced on May 10, 2025, might be found in a path like userpilot_events/userpilot_datasync_13214_NX-123454/2025-05-10/.
  • File Naming Conventions: Files within these directories often include timestamps or unique identifiers to prevent overwrites and indicate the batch of data they contain.
  • Data Granularity per File: Each file typically contains data for a specific time window (e.g., one hour or one day, depending on your sync frequency).
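Because the layout is date-based, daily folder paths can be built programmatically. A minimal sketch in Python, using the illustrative sync ID and app token from the example above (substitute your own values):

```python
from datetime import date

def daily_sync_prefix(path_prefix: str, sync_id: str, app_token: str, day: date) -> str:
    """Build the expected daily folder prefix for a Data Sync export.

    The sync_id and app_token arguments are placeholders; use the values
    from your own sync configuration.
    """
    return f"{path_prefix}/userpilot_datasync_{sync_id}_{app_token}/{day:%Y-%m-%d}/"

# The May 10, 2025 folder from the example above:
prefix = daily_sync_prefix("userpilot_events", "13214", "NX-123454", date(2025, 5, 10))
print(prefix)  # userpilot_events/userpilot_datasync_13214_NX-123454/2025-05-10/
```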

Data Format Details

Userpilot exports data in a well-defined format to ensure consistency and ease of parsing. Raw event data is available in several formats, including JSON (NDJSON), CSV, Parquet, and Apache Avro, each described below.

JSON

  • Data is typically provided as one JSON object per line (NDJSON/JSON Lines).
  • Each line represents a single event.
  • Use Cases: Easy to read and parse by many systems, widely supported in data pipelines and analytics tools.
  • Example:
{
  "app_token": "NX-123ASVR",
  "event_type": "track",
  "event_name": "Subscription Purchased",
  "source": "web-client",
  "user_id": "user-001",
  "company_id": "company-001",
  "inserted_at": "2025-05-11T07:45:30.123456Z",
  "hostname": "app.example.com",
  "pathname": "/features/subscription",
  "screen_width": 1920,
  "screen_height": 1080,
  "operating_system": "Windows",
  "browser": "Chrome",
  "browser_language": "en-US",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",
  "device_type": "desktop",
  "country_code": "US",
  "metadata": {
    "subscription_id": "sub-001",
    "subscription_type": "monthly",
    "subscription_price": 10,
    "subscription_currency": "USD"
  }
}
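Since NDJSON puts one complete JSON object on each line, parsing it needs nothing beyond the standard library. A minimal sketch (the sample line is a trimmed-down version of the event above):

```python
import json

def iter_events(ndjson_text: str):
    """Yield one event dict per non-empty NDJSON line."""
    for line in ndjson_text.splitlines():
        if line.strip():
            yield json.loads(line)

sample = '{"event_type": "track", "event_name": "Subscription Purchased", "user_id": "user-001"}\n'
events = list(iter_events(sample))
print(events[0]["event_name"])  # Subscription Purchased
```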

CSV

  • Data is provided as plain text, with each line representing a record and fields separated by commas. The first line is a header row with column names.
  • Use Cases: Easily imported into spreadsheets, relational databases, and many data analysis tools. Best for flat data structures where all records have the same set of fields.
  • Example:
app_token,event_type,event_name,source,user_id,company_id,hostname,pathname,country_code,screen_width,screen_height,operating_system,browser,browser_language,user_agent,device_type,inserted_at
NX-APP-TOKEN,page_view,,web-client,user-123,company-456,example.com,/,US,1920,1080,macOS,Chrome,en-US,Mozilla/5.0...,desktop,2025-05-06T12:30:00.123Z
Note: The metadata field may be stringified as JSON, omitted, or flattened depending on implementation.
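If your export stringifies the metadata field as JSON (one of the possibilities noted above), it can be decoded after the row is read. A sketch using only the standard library, with a hypothetical two-column sample row:

```python
import csv
import io
import json

csv_text = (
    "event_type,user_id,metadata\n"
    'track,user-123,"{""subscription_type"": ""monthly""}"\n'
)

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # If metadata arrives as stringified JSON, decode it into a dict;
    # if it is omitted or flattened instead, leave the row as-is.
    if row.get("metadata", "").startswith("{"):
        row["metadata"] = json.loads(row["metadata"])
    rows.append(row)

print(rows[0]["metadata"]["subscription_type"])  # monthly
```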

Parquet

  • Parquet is a columnar storage file format optimized for use with big data processing frameworks.
  • Use Cases: Ideal for large-scale analytics, efficient storage, and fast querying in data lakes and warehouses. Supported by tools like Apache Spark, Pandas (Python), and most modern data platforms.
  • Note: You will need tools or libraries that can read this format (e.g., Apache Spark, or Pandas in Python with pyarrow or fastparquet).

Apache Avro

  • Avro is a row-oriented data serialization framework that stores data in a compact binary format, with the schema included alongside the data.
  • Use Cases: Common in Apache Kafka, Hadoop, and for long-term data archival where schema evolution is important. Supports adding/removing fields over time without breaking downstream consumers.
  • Note: Data is stored in binary format. Use Avro libraries in your programming language of choice (Java, Python, etc.) to read and write Avro files. The schema is used to interpret the binary data.

Folder Structure

Userpilot Data Sync organizes your exported data in a clear, hierarchical folder structure to make it easy to locate, process, and analyze your data. This structure is designed to support both granular event analysis and high-level reporting. Below is an overview of the folder structure you will find in your storage destination:
userpilot_datasync_{Sync_ID}_{APP_TOKEN}/
├── all_companies/
│   └── all_companies.avro
├── all_users/
│   └── all_users.avro
└── {Date}/
    ├── all_events.avro
    ├── feature_tags/
    │   ├── matched_events/
    │   │   └── track_feature_{ID}.avro
    │   ├── feature_tags_breakdown.avro
    │   └── feature_tags_definitions.avro
    ├── interaction/
    │   └── matched_events/
    │       └── interaction_{Type}_{Id}.avro
    ├── labeled_events/
    │   ├── matched_events/
    │   │   └── track_labeled_{ID}.avro
    │   ├── labeled_events_breakdown.avro
    │   └── labeled_events_definitions.avro
    ├── tagged_pages/
    │   ├── matched_events/
    │   │   └── page_view_{ID}.avro
    │   ├── tagged_pages_breakdown.avro
    │   └── tagged_pages_definitions.avro
    ├── trackable_events/
    │   ├── matched_events/
    │   │   └── track_event_{NAME}.avro
    │   ├── trackable_events_breakdown.avro
    │   └── trackable_events_definitions.avro
    ├── users/
    │   └── identify_user.avro
    └── companies/
        └── identify_company.avro

Folder & File Descriptions

  • all_companies/ and all_users/
    • Contain a snapshot of all identified companies and users, respectively, as of the latest sync. Each file (all_companies.avro, all_users.avro) includes the most recent state and all auto-captured and custom properties for each entity.
  • {Date}/
    • Each date folder contains all data synced for that specific day. This allows for easy partitioning and historical analysis.
    • all_events.avro: All raw events captured on that date, across all event types.
    • feature_tags/, labeled_events/, tagged_pages/, trackable_events/, interaction/:
      • Each of these folders contains:
        • matched_events/: Raw events for each identified/tagged/labeled event (e.g., track_feature_{ID}.avro, track_labeled_{ID}.avro, etc.).
        • Breakdown files (e.g., feature_tags_breakdown.avro): Aggregated counts and engagement metrics for that day.
        • Definitions files (e.g., feature_tags_definitions.avro): Metadata such as name, description, and category, allowing you to map event IDs to human-readable definitions.
    • users/ and companies/: Contain user and company identification events for that day (identify_user.avro and identify_company.avro).

How to Use This Structure

  • Use the all_companies and all_users files to get the latest state and properties for all entities in your app.
  • Use the {Date} folders to analyze daily event activity, engagement, and breakdowns.
  • The matched_events folders provide access to all raw events for each feature, label, or tag, enabling deep-dive analysis.
  • The breakdown and definitions files make it easy to join raw event data with descriptive metadata for reporting and analytics.
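The definitions-to-breakdown join described above can be sketched in plain Python. The field names (id, name, count) are illustrative, not the actual Avro schema; check the definitions and breakdown files in your own export:

```python
# Hypothetical daily records, as they might look after decoding the Avro files.
definitions = [
    {"id": "ft-1", "name": "Export Button", "category": "Reports"},
]
breakdown = [
    {"id": "ft-1", "count": 42},
]

# Index definitions by ID, then attach a human-readable name to each
# breakdown row for reporting.
defs_by_id = {d["id"]: d for d in definitions}
report = [
    {**row, "name": defs_by_id.get(row["id"], {}).get("name", "(unknown)")}
    for row in breakdown
]

print(report[0]["name"], report[0]["count"])  # Export Button 42
```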
Planning for Data Ingestion: A clear understanding of the file structure and naming conventions is vital for setting up automated ETL/ELT pipelines that load this data into your data warehouse or data lake. Your ingestion scripts will rely on these patterns to discover new files.
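A hypothetical discovery helper for such a pipeline: it lists date-named folders under the sync root that have not yet been ingested. The example below runs against a temporary local directory; adapt the listing call to your storage SDK (S3, GCS, etc.) if the data lives in cloud storage.

```python
import os
import re
import tempfile

DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def new_date_folders(sync_root: str, already_ingested: set) -> list:
    """Return date-named folders under sync_root not yet loaded, oldest first."""
    return sorted(
        name for name in os.listdir(sync_root)
        if DATE_DIR.match(name) and name not in already_ingested
    )

# Simulate a sync root with two date folders plus the snapshot folders.
with tempfile.TemporaryDirectory() as root:
    for name in ("2025-05-10", "2025-05-11", "all_users", "all_companies"):
        os.mkdir(os.path.join(root, name))
    todo = new_date_folders(root, {"2025-05-10"})

print(todo)  # ['2025-05-11']
```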