Configure Cloud Access

Import your dataset from cloud storage.

📘

Cloud Imports Only

This page covers importing dataset files from cloud storage providers. If you want to upload dataset files from your local computer, skip this page and go to the Register Dataset page.

To register your cloud-stored datasets, MarkovML requires access to them. Simply create access credentials with your cloud service provider and add them to your MarkovML workspace, where they will be stored securely. Follow the steps below to add your credentials.

Register Cloud Access Credentials

🚧

AWS Support Only

Currently, MarkovML only supports importing datasets from AWS S3. Support for other cloud storage providers is on our roadmap.

With MarkovML, you can analyze your datasets stored in AWS S3. Provide the S3 location of your dataset or, if the dataset is segmented, the S3 location of each segment.
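To illustrate what an S3 location looks like, the sketch below checks that each segment's location is a well-formed `s3://bucket/key` URI before it is passed on. The helper `parse_s3_uri`, the bucket name, and the segment names are hypothetical and are not part of the MarkovML SDK; only the Python standard library is used.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(uri)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError(f"Not a valid S3 object URI: {uri!r}")
    return bucket, key


# Hypothetical locations for a segmented dataset (one file per segment).
segment_locations = {
    "train": "s3://example-bucket/hatespeech/train.csv",
    "test": "s3://example-bucket/hatespeech/test.csv",
}

# Validate every segment location up front so a typo fails early.
parsed_segments = {name: parse_s3_uri(uri) for name, uri in segment_locations.items()}
```

Validating locations locally is optional; it simply surfaces malformed paths before any call to MarkovML is made.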

You can register your S3 bucket credentials, such as the S3 ACCESS_KEY and ACCESS_SECRET, with MarkovML once and reuse them to access other datasets in the future.

Register Access Credentials Using SDK

📘

Note

If you have already registered your AWS S3 credentials via the MarkovML UI, there's no need to register them again using the Markov SDK.

Create your S3 credentials using S3Credentials() with the details listed below, then register them with the MarkovML CredentialManager using the register_s3_cred() method. Finally, check that the registration succeeded and that a credential_id was returned.

Provide the following details when calling S3Credentials():

  1. name: Name of the dataset for which you are adding these credentials.
  2. details: Notes for future reference. This could include details about the dataset or any specific considerations related to the credentials.
  3. access_key: Input your S3 access key. This key is required to access your S3 storage and retrieve the dataset.
  4. access_secret: Input your S3 access secret. Similar to the access key, this secret is also needed for authentication when accessing your S3 storage.

The code example below illustrates registering an S3 credential with MarkovML using the Python SDK.

Sample Code

from markov.api.credentials.credential import S3Credentials, CredentialManager

# Create S3 credentials
s3_cred = S3Credentials(name='hatespeech',
                        details='Access credentials for the HateSpeech dataset',
                        access_key='<YOUR_S3_ACCESS_KEY>',
                        access_secret='<YOUR_S3_ACCESS_SECRET>')

# Register the credentials with MarkovML
cred_response = CredentialManager().register_s3_cred(s3_cred)

# Check whether registration succeeded and keep the returned credential_id
if cred_response.is_ok():
    credential_id = cred_response.credential_id
else:
    # Raise a real exception; raising a bare string is a TypeError in Python 3
    raise RuntimeError(f'Unable to register credentials. Error: {cred_response.message}')
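To keep the access key and secret out of source code, one option is to read them from environment variables and pass them to S3Credentials(). The sketch below is one way to do that, not a MarkovML feature: the helper `load_s3_keys` is hypothetical, and it assumes the keys are stored under the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

```python
import os


def load_s3_keys(environ=os.environ) -> dict:
    """Return access_key/access_secret keyword arguments read from the environment."""
    try:
        return {
            "access_key": environ["AWS_ACCESS_KEY_ID"],
            "access_secret": environ["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        # Fail with a clear message instead of passing empty credentials along.
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Assumed usage with the SDK call shown above (not executed here):
# s3_cred = S3Credentials(name='hatespeech',
#                         details='Access credentials for the HateSpeech dataset',
#                         **load_s3_keys())
```

The `environ` parameter defaults to the process environment and accepts any mapping, which makes the helper easy to exercise with test values.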

🔒

Security Note

For security reasons, the SDK does not permit direct retrieval of the original cloud credentials. Instead, use the returned credential_id for authentication.

Retrieve Existing Credentials

You can retrieve any existing credential by name from the CredentialManager using the CredentialManager.find_with_name() method, as shown below:

Sample Code

from markov.api.credentials.credential import CredentialManager

cred_response = CredentialManager.find_with_name('hatespeech') # HateSpeech dataset
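Because credentials are registered once and then reused, a lookup-then-register pattern can avoid duplicate registrations. The sketch below shows that pattern in isolation. The function `get_or_register` and its `find` and `register` callables are hypothetical: in real use they would wrap CredentialManager.find_with_name() and register_s3_cred(), and how a credential_id is read from the lookup response is an assumption, since this page does not show that response's fields. Dictionary-backed stand-ins are used so the example runs without the SDK.

```python
from typing import Callable, Optional


def get_or_register(name: str,
                    find: Callable[[str], Optional[str]],
                    register: Callable[[str], str]) -> str:
    """Return the credential_id for `name`, registering only when none is found."""
    credential_id = find(name)
    if credential_id is None:
        credential_id = register(name)
    return credential_id


# Stand-in store; the IDs are placeholders, not real MarkovML values.
registered = {"hatespeech": "cred-123"}


def fake_register(name: str) -> str:
    """Pretend to register credentials and return a placeholder ID."""
    registered[name] = f"cred-for-{name}"
    return registered[name]


existing_id = get_or_register("hatespeech", registered.get, fake_register)  # found, reused
new_id = get_or_register("imdb", registered.get, fake_register)             # not found, registered
```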

Register Access Credentials Using Web UI

To add access credentials while registering your dataset through the MarkovML Web UI, follow these steps:

  1. Access Dataset Page: Go to the left sidebar and click on Dataset. Then, click the Add new dataset button at the top right corner of the Dataset page.
  2. Choose Cloud Upload: If you are uploading from cloud storage, select the third option labeled My dataset is stored in cloud.
  3. Add Access Credentials: By default, Cloud Storage should already be selected. Click on Add new in the dropdown menu under Access Credentials. This will prompt you to add new AWS S3 bucket credentials.
  4. Enter Access Information: Specify the cloud storage type and provide the necessary access details. Give the credential a unique name and optionally add a brief description. Click Save when finished.

You can add new credentials using the steps above or use the existing ones to register your dataset.