Hot questions for Using Azure in azure data lake

Question:

I am writing a test application to read a file from Azure Data Lake. I have created the account and the resource, and uploaded the file. I am trying to create a client using the following code (as described in the documentation https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-java-sdk). Where exactly do I get those values from? Thanks

String clientId = "FILL-IN-HERE";
String authTokenEndpoint = "FILL-IN-HERE";
String clientKey = "FILL-IN-HERE";

AccessTokenProvider provider = new ClientCredsTokenProvider(authTokenEndpoint, clientId, clientKey);
// full account FQDN, not just the account name
String accountFQDN = "FILL-IN-HERE";
ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, provider);

Answer:

It seems that you are using Azure Active Directory authentication with Azure Data Lake.

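Log in to the Azure portal, click Azure Active Directory, click App registrations, then find your application (or create a new one). The values come from that application's pages:

clientId: the Application (client) ID shown on the application's Overview page.

clientKey: click Certificates & secrets, click New client secret, then click Add. The client secret value is the clientKey.

authTokenEndpoint: click Endpoints and copy the OAuth 2.0 token endpoint (v1).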

Refer to this document for more details.
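
As a rough illustration of where the URL-shaped values end up, the sketch below assembles them from identifiers copied out of the portal. The class and method names are hypothetical. It assumes the v1 token endpoint format and the default public-cloud domain suffix for Data Lake Store accounts (azuredatalakestore.net), so check the Endpoints blade and your account's Overview page for the actual values.

```java
// Hypothetical helper: builds the two URL-shaped values the client code expects.
final class AdlsSettings {
    private AdlsSettings() {}

    // Assumes the Azure AD v1 "OAuth 2.0 token endpoint" format listed under Endpoints.
    static String authTokenEndpoint(String tenantId) {
        return "https://login.microsoftonline.com/" + tenantId + "/oauth2/token";
    }

    // Assumes the default public-cloud suffix for Data Lake Store accounts.
    static String accountFqdn(String accountName) {
        return accountName + ".azuredatalakestore.net";
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not real ones.
        System.out.println(authTokenEndpoint("my-tenant-id"));
        System.out.println(accountFqdn("myaccount"));
    }
}
```

The two results would be passed as authTokenEndpoint and accountFQDN in the question's code, next to the clientId and clientKey copied from the app registration.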

Question:

I log in to ADLS Gen2 with this POST request:

https://login.microsoftonline.com//oauth2/v2.0/token

Request body:

grant_type:client_credentials

client_id: my_client_id from App registrations -> Owned applications -> My application

client_secret: my_client_secret from App registrations -> Owned applications -> My application

scope: https://storage.azure.com/.default

provider_type: org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider

And I get a successful response with code 200:

{
    "token_type": "Bearer",
    "expires_in": 3599,
    "ext_expires_in": 3599,
    "access_token": <token>
}

Then I tried to create a filesystem using the following PUT request: https://dbmiadlsgen2.dfs.core.windows.net/mydata?resource=filesystem

Headers:

Authorization - Bearer
Content-Type - text/plain
x-ms-version - 2018-11-09

And I get the following error:

{
    "error": {
        "code": "AuthorizationPermissionMismatch",
        "message": "This request is not authorized to perform this operation using this permission.\nRequestId:bcb4c0d3-901f-00cc-0722-2b7f0c000000\nTime:2019-06-25T06:54:57.3437434Z"
    }
}

I granted my user the Storage Blob Data Contributor role in the Azure portal, but it did not help.

Which role should I use? Are any of the request body or header parameters incorrect? Thank you.


Answer:

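It's not enough for the app and the account to be added as owners. Go to your storage account > Access control (IAM) > Add role assignment and assign the Storage Blob Data Contributor role, which grants the data permission this type of request needs. Because the token is obtained with client credentials, it represents the application, so the role has to be assigned to the application (its service principal) and not only to your user.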

For further reference please visit:

https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app

Hope it helps.
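
Once the role assignment is in place (it may take a few minutes to take effect), the create-filesystem call from the question could be built roughly as follows. This is a minimal sketch using the Java 11 HttpClient API; the class name is hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the "create filesystem" request.
final class CreateFilesystemRequest {
    private CreateFilesystemRequest() {}

    static HttpRequest build(String accountName, String filesystem, String accessToken) {
        URI uri = URI.create("https://" + accountName + ".dfs.core.windows.net/"
                + filesystem + "?resource=filesystem");
        return HttpRequest.newBuilder(uri)
                // The token identifies the app registration, so the role must be assigned to it.
                .header("Authorization", "Bearer " + accessToken)
                .header("x-ms-version", "2018-11-09")
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; a real access token comes from the token endpoint.
        HttpRequest request = build("myaccount", "mydata", "<token>");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it would be a matter of passing the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and checking the status code.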

Question:

I'm using Azure Data Lake Store as the storage service for my Java app. Sometimes I need to compress multiple files. Currently I copy all the files to the server, compress them locally, and then send the zip to Azure. This works, but it takes a lot of time. Is there a way to compress files directly on Azure? I checked the data-lake-store-SDK, but it has no such functionality.


Answer:

Unfortunately, at the moment there is no option to do that sort of compression.

There is an open feature request, "HTTP compression support for Azure Storage Services (via Accept-Encoding/Content-Encoding fields)", that discusses uploading compressed files to Azure Storage, but there is no estimate of when this feature might be released.

The only option for you is to implement such a mechanism on your own (using an Azure Function for example).

Hope it helps!
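
If you do build this yourself, one possible shape is to stream the files straight into a zip archive instead of staging them on local disk. The sketch below uses only java.util.zip and works on plain InputStream/OutputStream pairs; the class name is hypothetical. The SDK's getReadStream and createFile calls should be able to supply such streams, but that wiring is an assumption and is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: zips several source streams into one target stream.
final class StreamingZip {
    private StreamingZip() {}

    // Writes each named source as one zip entry; closes the sources and the target.
    static void zip(Map<String, InputStream> sources, OutputStream target) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            for (Map.Entry<String, InputStream> source : sources.entrySet()) {
                zip.putNextEntry(new ZipEntry(source.getKey()));
                try (InputStream in = source.getValue()) {
                    // Copies in chunks, so a whole file is never held in memory.
                    in.transferTo(zip);
                }
                zip.closeEntry();
            }
        }
    }
}
```

The bytes still pass through whatever machine runs this code, so the gain comes from running it close to the storage account (for example in an Azure Function) rather than on a remote server.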

Question:

I am trying to access a file system in Azure Data Lake Storage Gen2 via the REST API using Java. This is how I am building my request:

public static void main(String[] args) throws Exception {
    String urlString = "https://" + account + ".dfs.core.windows.net/sterisfiles?resource=filesystem";
    HttpURLConnection connection = (HttpURLConnection)(new URL(urlString)).openConnection();
    getFileRequest(connection, account, key);
    connection.connect();
    System.out.println("Response message : "+connection.getResponseMessage());
}


public static void getFileRequest(HttpURLConnection request, String account, String key) throws Exception{
    SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss");
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    String date = fmt.format(Calendar.getInstance().getTime()) + " GMT";
    String stringToSign =  "GET\n"
            + "\n" // content encoding
            + "\n" // content language
            + "\n" // content length
            + "\n" // content md5
            + "\n" // content type
            + "\n" // date
            + "\n" // if modified since
            + "\n" // if match
            + "\n" // if none match
            + "\n" // if unmodified since
            + "\n" // range
            + "x-ms-date:" + date + "\n"
            + "x-ms-version:2014-02-14\n" //headers
            + "/"+account + request.getURL().getPath();
    String auth = getAuthenticationString(stringToSign);
    request.setRequestMethod("GET");
    request.setRequestProperty("x-ms-date", date);
    request.setRequestProperty("x-ms-version", "2014-02-14");
    request.setRequestProperty("Authorization", auth);
}

private static String getAuthenticationString(String stringToSign) throws Exception{
    Base64 base64 = new Base64();
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(base64.decode(key), "HmacSHA256"));
    String authKey = new String(base64.encode(mac.doFinal(stringToSign.getBytes("UTF-8"))));
    String auth = "SharedKey " + account + ":" + authKey;
    return auth;
}

This throws a 403 error with the message: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

Are my request headers incorrect?


Answer:

According to my test, we can use Azure AD authentication to call the Azure Data Lake Storage Gen2 REST API. For more details, please refer to https://social.msdn.microsoft.com/Forums/en-US/45be0931-379d-4252-9d20-164261cc64c5/error-while-calling-adls-gen-2-rest-api-to-create-file?forum=AzureDataLake.

  1. Create an Azure AD service principal and assign an RBAC role to it. For further information, please refer to https://docs.microsoft.com/en-us/azure/storage/common/storage-auth-aad.

    az ad sp create-for-rbac -n 'your sp name' --role 'Storage Blob Data Owner' --scope 'your scope such as your storage account scope'

  2. Get an access token.

    Method: POST
    URL: https://login.microsoftonline.com/<your Azure AD tenant domain>/oauth2/token
    Body:
        grant_type=client_credentials
        client_id=<the appid you copy>
        client_secret=<the password you copy>
        resource=https://storage.azure.com

  3. Call the REST API.

    a. Create File system

    PUT https://{accountName}.{dnsSuffix}/{filesystem}?resource=filesystem

    b. List File system

    GET https://{accountName}.{dnsSuffix}/?resource=account
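
The token request in the second step can be sketched in Java as below. The class and method names are hypothetical; the field names and the resource value are the ones listed above. The body is sent as application/x-www-form-urlencoded, so each value is URL-encoded before joining.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the URL and form body of the token request.
final class TokenRequest {
    private TokenRequest() {}

    static String tokenUrl(String tenantDomain) {
        return "https://login.microsoftonline.com/" + tenantDomain + "/oauth2/token";
    }

    // Returns the form-encoded body; LinkedHashMap keeps the fields in insertion order.
    static String formBody(String clientId, String clientSecret) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("grant_type", "client_credentials");
        fields.put("client_id", clientId);
        fields.put("client_secret", clientSecret);
        fields.put("resource", "https://storage.azure.com");
        return fields.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

The access_token field of the JSON response is then sent as "Authorization: Bearer <token>" on the calls in the third step.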
    

Question:

I have a project using Camel (Java). I am retrieving data from one source and sending it to one endpoint (using .to()). I also need to send it to Azure Data Lake. How would I go about this? I do not see any Camel components for Data Lake. Would I have to make my own component?


Answer:

Yes, today you would have to create your own Camel component.

There isn't currently a Camel component for Azure Data Lake Store.

If this is something you'd like to see officially supported, you can give feedback on the UserVoice page for Data Lake.