Aug 30, 2017

XSLT - basics, tips & tricks

This post is based on the Pluralsight course:
XSLT 2.0 and 1.0 Foundations by Dimitre Novatchev

You can "develop" XSLT in Visual Studio. Create or open XSLT as simple file and execute Start debugging from Debug menu.

You can't learn XSLT without XPath.

XSLT is built around the concept of templates.

XSLT can use one or more XML documents as input data. A template can be viewed as a piece of code that targets some elements of the XML input using XPath.

Ideally you write templates that cover all of the input data. You can do this with one template or any number of templates. Each template may nest another, which mirrors the hierarchical structure of XML.

Furthermore, be aware that priority and precedence apply to templates. They determine the order of execution and resolve conflicts when two templates match the same data.

Sample:

<xsl:template match="persons">

<!-- Render any data related to all persons -->

   <!-- Here nesting happens. XSLT processor will search for templates in rest of XSLT document that match XML data from this contextual point/node -->

    <xsl:apply-templates/>

</xsl:template>

The following template matches person and renders it at the point of the <xsl:apply-templates/> above:

<xsl:template match="person">

<xsl:value-of select="first-name"/>

<xsl:value-of select="last-name"/>

</xsl:template>


Built-in templates

By default the XSLT processor will catch and render all data from the source XML that hasn't been processed by your templates. Of course, this is rarely the desired behavior.

Dimitre Novatchev has written templates that override these built-in templates and output a descriptive warning about missing matches for elements, text nodes and attributes, which is very helpful.

<!-- Built-in templates: Explicit  -->
  <xsl:template match="*" mode="#all" priority="-999">
    <xsl:message>
      Unmatched element <xsl:value-of select="name()"/>
    </xsl:message>
    <xsl:apply-templates mode="#current"/>
  </xsl:template>

  <xsl:template match="text()" mode="#all">
    <xsl:message>Unmatched text node</xsl:message>
    "<xsl:value-of select="string(.)"/>"
  </xsl:template>

  <xsl:template match="@*" mode="#all">
    <xsl:message>
      Unmatched attribute <xsl:value-of select="name()"/>
    </xsl:message>
    <xsl:value-of select="string(.)"/>
  </xsl:template>

Overriding templates

If you explicitly don't want to render some data, just write an empty template:

<xsl:template match="person" />


Pull vs Push style

This:

<xsl:apply-templates />

; is push style. We didn't specify which templates should apply here, so matching nodes get pushed to templates in document order.
When following template gets executed:

 <xsl:template match="name|age|profession">
   <td><xsl:value-of select="."/></td>
 </xsl:template>

; we can't force the order in which name, age or profession are rendered. The order is determined by the input XML. To change this, use pull style.

For XSLT 1.0 use:

     <xsl:apply-templates select="name"/>
     <xsl:apply-templates select="age"/>
     <xsl:apply-templates select="profession"/>

This way we explicitly tie each place in the output to the code that renders it, via the select attribute.

Variables

They follow the naming rules and expression syntax you should know from XPath.

<xsl:variable name="personsCount" select="count(person)"/>

<xsl:value-of select="$personsCount"/>

When defined as a child of the top-level <xsl:stylesheet> element, a variable is global.
Here is an example of a local one:

<xsl:variable name="fullName" select="concat(name,' ',age)"/>
<xsl:value-of select="$fullName"/>

Modularity

As in procedural languages, you can place your XSLT code in separate files or modules.
You reference these modules using:
- <xsl:include>
- <xsl:import>

Code pulled in with include has higher importance: it is treated as if it were written in the primary stylesheet. If by mistake you include the same code twice, you'll create a conflict. The include instruction may be placed anywhere.
Import, on the other hand, can't create a conflict with the primary stylesheet. Its content is always treated with lower precedence and priority than the content of the primary stylesheet. You must write it as the very first child of <xsl:stylesheet>.
Import precedence is determined by the order of the import instructions; the last one wins.
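
A minimal sketch of both in a primary stylesheet (the module file names here are hypothetical):

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- import must be the very first child; imported templates get lower precedence -->
  <xsl:import href="base-rules.xsl"/>
  <!-- include may appear anywhere; treated as if pasted into this file -->
  <xsl:include href="shared-helpers.xsl"/>
  <!-- local templates here override anything imported -->
</xsl:stylesheet>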

Precedence

When two or more templates match the same XML content, the XSLT processor must choose a winner.
This is usually the case for complex XSLT composed of multiple imports/includes.
In such cases the same match may occur in different XSLT modules.
One solution is to play with the priority attribute of the template instruction:

<xsl:template match="name|age|profession">
   <td><xsl:value-of select="."/></td>
 </xsl:template>

  <xsl:template match="name|age|profession">
    <td style="color:red">
      <xsl:value-of select="."/>
    </td>
  </xsl:template>

Observe in the above sample that both templates match the same data. There will be no error, and the LAST template in document order will be the winner.
Alternatively, we could change the first template like this:

<xsl:template match="name|age|profession" priority="2">

This way the first template would win.

The same rule goes for variables. A variable from an imported module has LOWER precedence than the same one defined in the primary stylesheet.

Data types

<xsl:variable name="salary" select="xs:decimal(0.1)"/>

For-each

The loop is very similar to apply-templates.
There is support for sorting:

      <xsl:for-each select="person">
         <xsl:sort data-type ="text" select="age" order="descending"/>
         <p>
             <xsl:value-of select="name"/>
           <xsl:value-of select="age"/>
           <xsl:value-of select="profession"/>
         </p>          
       </xsl:for-each>

Sorting

You can perform sorting on both the for-each and apply-templates instructions in the same way:

       <xsl:apply-templates>
           <xsl:sort data-type ="text" select="age" order="descending"/>
         </xsl:apply-templates>

Apply-templates vs for-each

Dimitre Novatchev made a great example for this:

XML document:

<creatures>
  <animal species="dog"/>
  <animal species="cat"/>
</creatures>

XSLT document:

<xsl:template match="creatures">
   <xsl:apply-templates mode="say"/>
 </xsl:template>

 <xsl:template match="animal[@species='dog']" mode="say">
   Dog says: "Bow" ...
 </xsl:template>
 <xsl:template match="animal[@species='cat']" mode="say">
   Cat says: "Meauu" ...
 </xsl:template>
<xsl:template match="text()" mode="say"/>


for-each is functionally equivalent to foreach in C#. apply-templates can be viewed as a virtual method.
In the above example the first template resembles an abstract class: it defines the generic behavior of the Say functionality for animals. Then more specialized templates handle the specific case for each animal; we could say this resembles overriding virtual methods.
Observe the last template: it gets executed when nothing else matches. Do note the use of the mode attribute.

The bottom line is to use for-each for simple tasks and apply-templates for complex problems that are easier to solve with code spread over more than one template.

Aug 29, 2017

XPath - Examples

// equals descendant-or-self::node()/

Watch it! // is expensive, since the whole tree has to be traversed!

Names of nodes, elements and attributes are case sensitive!

//Person   is not equal   //person

Everything inside [ ] is an expression. When you need another expression inside [ ], you can sometimes write:

//person[address/city]
; here / is used inside [ ] instead of nesting predicates:
//person[address[city]]

Relative and absolute path

Example of relative path:

//person[1]  

; this will return EVERY person that is the FIRST person child of its parent! In the sample this includes the "Oliver child person" and John Sullivan.

Another relative:

//*/*/*/*/name()

; every element that has 4 ancestors (e.g. city -> address -> person -> persons -> document root):

String='city'
String='person'
String='city'

Absolute path:

/persons/person[1]

; or use :

(//person)[1]

; this will return only the FIRST person element in the document!

Sample XML:

<?xml-stylesheet type="text/xml" href="persons.xsl"?>
<persons>
  <person>
    <id>1</id>
    <age>20</age>
    <first-name>
      John
    </first-name>
    <last-name>
      Sullivan
    </last-name>
    <salary>200.00</salary>
  </person>
  <person>
    <id>2</id>
    <age>25</age>
    <first-name>
      Mark
    </first-name>
    <last-name>
      Johny
    </last-name>
  </person>
  <person>
    <address>
      <city state="France">Paris</city>
    </address>
    <id>3</id>
    <age>30</age>
    <first-name>
      Mark
    </first-name>
    <last-name>
      Johny
    </last-name>
  </person>
  <person>
    <address>
      <person>
        Oliver child person
      </person>
      <city state="UK">London</city>
    </address>
    <id>4</id>
    <age>40</age>
    <first-name>
      Oliver
    </first-name>
    <last-name>
      Johny
    </last-name>
    <salary>300.00</salary>
  </person>
  <person>
    <id>5</id>
    <age>60</age>
    <first-name>
      Kris
    </first-name>
    <last-name>
      Sullivan
    </last-name>
    <salary>300.00</salary>
  </person>
</persons>

1. Get all persons:

//person

2. Get all persons text:

//person/string()

3. Get second person:

//person[2]

;returns: Mark, Johny, Id=2

4. Get fourth person:

//person[position() eq 4]

;returns: Oliver, Johny, Id=4

5. Get first two persons:

//person[position() lt 3]

6. Get persons that have element Salary:

//person[salary]

7. Get every person whose salary equals the salary of a following person sibling in document order:

//person[salary = following-sibling::person/salary]

8. Get persons whose age is between 20 and 60 and whose position before filtering is greater than 3:

//person[age gt "20" and age lt "60" and position() gt 3]

9. Find persons whose salary is greater than the salary of the first person that has salary info:

//person[salary gt //person[salary][1]/salary/text() ]

10. Look for persons that have city info; from the city navigate back up to the parent person and return its id:

//person/address/city/../../id

11. Find cities whose attribute State is France:

//person/address/city[@state="France"]

Example of sequences introduced in XPath 2.0:

12. Iterate through all persons and output first-name value:

for $n in /persons/person return $n/first-name

13. Iterate over all persons that have a salary value. On each iteration, calculate a second sequence by multiplying the current person's id element value from the first sequence by 2.

for $n in /persons/person[salary], $id in $n/id * 2 return $id

14. Is there any person that has a salary greater than $299.00:

some $p in /persons/person/salary satisfies $p gt "299.00"

;returns true

15. Do all persons older than 30 have a salary element?

every $adult in /persons/person[age gt "30"] satisfies $adult[salary]

;returns true

16. Does at least one person that has city info also have salary info:

some $personwcity in /persons/person/address/city satisfies $personwcity/../..[salary]

;returns true

17. Show last Person from Persons node:

/persons/person[last()]

18. Show the first-name of persons whose id is greater than 3. Observe the use of dot as a self reference.

/persons/person/id[. gt "3"]/../first-name

19. Show first and last person.

(//person)[1] union (//person)[last()]

20. Show only persons with city info and salary info

Version 1:
//person[address/city] intersect //person[salary]

Version 2:
//person[address/city and salary]

21. Show persons that DON'T have salary:

//person except //person[salary]

22. Show all persons ONLY if there are 2 or more persons with address info:

//person[count(/persons/person/address) ge 2]

Aug 21, 2017

T-SQL - sample data script - creating random data

If you want more data, just add more sys tables to the cross joins.
Be careful, since the row count multiplies with every added table.

CREATE TABLE Department (
    DepartmentID int primary key not null identity,
    [Name] nvarchar(4000)
)
CREATE TABLE Employee (
    EmployeeID int primary key not null identity,
    DepartmentID int,
    constraint fk_employee_department foreign key (DepartmentID) references Department(DepartmentID),
    [Name] nvarchar(4000)
)
--Creating roughly 100000 dummy rows
INSERT INTO Department ([Name])
SELECT sys.objects.[Name]
FROM sys.objects, sys.indexes, sys.tables
--Creating roughly 600000 dummy rows
INSERT INTO Employee (DepartmentID, [Name])
SELECT
    ASCII(LEFT(newid(), 1)) AS DepartmentID,
    sys.columns.[Name]
FROM sys.objects, sys.columns, sys.tables

Aug 18, 2017

jquery / CSS selector - multiple classes delimited with space, comma and greater than

When selecting by more than one CSS class in jQuery or a CSS stylesheet, one must understand the subtle differences: when to use space, comma and greater than.

Let's take a look at this HTML:

<div id="level1" class="parent">
<span id="spanL1">L1</span>
<div id="level2">
<span id="spanL2">L2</span>
<div id="level3" class="child">
<span id="spanL3">L3</span>
Dummy
</div>
</div>
</div>
<div id="level11" class="sibling">
<span id="spanL11" class="child">L11</span>
<div id="level22">
<span id="spanL22">L22</span>
<div id="level33">
<span id="spanL33" class="child">L33</span>
Dummy
</div>
</div>
</div>

Here are jQuery selectors and their effects:

//Hides  L3
$('.parent .child').hide();
//Hides L3 and L33
$('.child').hide();
//Hides L1, L2, L3 & L11 & L33 - makes no sense since class child is contained by class parent
$('.parent, .child').hide();
// Nothing! Since class child is not direct child of class parent
$('.parent > .child').hide();
//Hides L11 since L11 is a direct child of .sibling and L33 is not
$('.sibling > .child').hide();

Bottom line:
- space - descendant       ( parent child )
- comma - enumeration   (selector1, selector2)
- greater than - direct child  (parent > directChild)

Example of descendant selectors:
<div class="login2">
<div class="form-group">
        <input type="submit" value="@Lsr("Users.Login.Login")" class="btn btn-primary" />
        <input type="submit" name="ForgotPassword" class="btn btn-link" />

    </div>
</div>

I want to override the first CSS selector with the second:

.login2 INPUT[type="submit"]:hover {
    background-color: #4B86B7
}

/* DOES NOT OVERRIDE! */
.login2 .btn .btn-link INPUT[type="submit"]:hover {
    background-color: white
}

/* OK! */
.login2 .form-group INPUT[type="submit"]:hover {
    background-color: white
}

The winning CSS selector is decided by specificity, but notice WHY the first attempt fails: .btn and .btn-link are classes of the input itself, not of its ancestors, so the descendant selector never matches. The working version wins by adding a real ancestor (.form-group) to the parent->child path, not by piling more classes onto one level.

JQuery find() vs children() vs filter()

For HTML:

<div id="level1">
<span id="spanL1">L1</span>
<div id="level2">
<span id="spanL2">L2</span>
<div id="level3">
<span id="spanL3">L3</span>
Dummy
</div>
</div>
</div>
; this code:

$('#level1').on('click',function(event){
console.log("level1 click");
$(this).children('span').hide();
});

; will hide only L1, since spanL1 is the only span that is a direct child of the level1 div.

If instead we use:

$(this).find('span').hide();

; all spans will be hidden.

filter(), on the other hand, is applied to an already selected set of elements.

For example:

$(this).filter('span').hide();

; won't match anything, since the filter is applied to the single DIV element itself.
This will work:

$(this).find('span').filter(':first').hide();

An alternative, and probably more common, use is passing a function to filter (though for pure iteration each() is the idiomatic choice):

$(this).find('span').filter(function(index){
console.log($(this).text());
});

Javascript - JQuery - DOM event propagation

For HTML:

<div id="level1">
<div id="level2">
<div id="level3">
Dummy
</div>
</div>
</div>

; and event handlers for all div's:

$('#level1').on('click',function(event){
console.log("level1");
});
$('#level2').on('click',function(event){
console.log("level2");
});
$('#level3').on('click',function(event){
console.log("level3");
});

; the click event will first be handled in the innermost level3 and then bubble up to its parents:

level3
level2
level1

To prevent event bubbling we use event.stopPropagation();
When placed in level3, the other levels' handlers won't get executed:

level3

Modern browsers support an alternative propagation model - capturing. It must be explicitly requested when registering the event handler. Capturing works in the opposite direction from bubbling: it first runs handlers on the outermost elements and then proceeds down through the children.
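
jQuery's on() registers handlers for the bubbling phase only; to use capturing you need the plain DOM addEventListener with its third argument set to true. A sketch for the div's above:

// Capture-phase handler: runs on the way DOWN, before inner handlers
document.getElementById('level1').addEventListener('click', function(event){
console.log("level1 (capturing)");
}, true); // true = register for the capture phase

With such handlers on all three levels, the output order reverses to level1, level2, level3.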

For years I believed that event.preventDefault() prevents event bubbling. It doesn't.
For this href:

<a href="www.google.com">Google</a>

; and event handler:

$('a').on('click',function(event){
event.preventDefault();
console.log("google.com");
})

; it prevents the standard browser action for the HREF so no redirect happens - but the event still bubbles up.

Aug 8, 2017

MS SQL - query optimization tips

General hints

For execution plan graph:

- Find operators with a high estimated cost within the query.
- Unless the query genuinely has no filtering, a table or index SCAN is generally bad. Maybe you need to provide a clustered index.
- Thick data lines in the execution plan may be bad, especially if they flow all the way to the result.
- Missing index warnings

Table scan - bad
Clustered index scan - better -> occurs when there is no usable WHERE predicate
Clustered index seek - best -> occurs when the WHERE predicate can use the clustered index

Non-clustered indexes are the same idea: scan - bad -> seek - good.
When creating composite indexes, sometimes merely swapping the order of columns in the index will let SQL Server use the non-clustered index and perform a seek.

Observe logical reads. With a seek they should drop dramatically, since the B+ tree is navigated directly instead of reading every page.

The bottom line is to study the WHERE expression and experiment with indexes.

Key lookup is bad -> use a covering index or an index with included columns

Use SARGable predicates (in WHERE or ON expressions etc.) - don't wrap columns in UDFs and avoid LIKE with a leading wildcard. If you're not sure, take a look at the execution plan: if you have an index on the columns and an Index Seek is not used, then you are NOT using a SARGable predicate. Also look for mention of a RESIDUAL predicate; if it appears in JOIN operators or elsewhere, you are not using a SARGable predicate. The optimizer will always try to "push" your predicates down to the index level and perform a seek - these then become seek predicates.
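
A sketch of the difference (table and column names are made up); both queries return the same rows, but only the second predicate can become a seek predicate:

-- NOT SARGable: the column is wrapped in a function, so an index on OrderDate can't be seeked
SELECT OrderID FROM Orders WHERE YEAR(OrderDate) = 2017

-- SARGable: bare column compared to a range; an index seek is possible
SELECT OrderID FROM Orders
WHERE OrderDate >= '20170101' AND OrderDate < '20180101'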

Execution plan - conversions

When you observe any kind of conversion in the "Predicate" section of an operator's info (Scan, Seek, Join), you should investigate. The key problem is that the optimizer will NOT use a SEEK on the existing index but a SCAN.
Either you used a function explicitly, or the query optimizer concluded it must convert something implicitly.
For example: SELECT Title FROM Books WHERE Id = '112'
Here the literal gets converted to INT, which is harmless; the dangerous direction is when the COLUMN side must be converted (e.g. a VARCHAR column compared with an INT value), because then the index can't be used for a seek.
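
A sketch of both directions (Accounts and CardNumber are invented names, assuming CardNumber is VARCHAR):

-- Column side converted: INT has higher type precedence, so CardNumber
-- is converted to INT and the index on it is scanned, not seeked
SELECT AccountID FROM Accounts WHERE CardNumber = 12345

-- Literal matches the column type: the index can be seeked
SELECT AccountID FROM Accounts WHERE CardNumber = '12345'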

UDF functions

Use of a system or custom UDF in a predicate brings performance down. One solution is to create a computed column that uses the UDF and put a nonclustered index on it.
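
A sketch on the Employee table from the earlier sample-data script, with a hypothetical scalar UDF dbo.NameInitial (to be indexable, the UDF must be deterministic and created WITH SCHEMABINDING):

-- dbo.NameInitial is a hypothetical deterministic, schema-bound scalar UDF
ALTER TABLE Employee ADD NameInitial AS dbo.NameInitial([Name]) PERSISTED

CREATE NONCLUSTERED INDEX IX_Employee_NameInitial ON Employee (NameInitial)

-- Predicates on NameInitial can now seek the index instead of
-- evaluating the UDF row by row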

Use the most selective column as the first column in an index!

Dynamic SQL


sp_executesql - execution plan is cached and parameterized
exec with a concatenated string - plan reused only for the exact same text, open to SQL injection
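
A sketch against the Employee table from the earlier sample-data script:

DECLARE @dep int = 42

-- Parameterized: one cached plan is reused for any @dep value; injection-safe
EXEC sp_executesql
    N'SELECT EmployeeID, [Name] FROM Employee WHERE DepartmentID = @dep',
    N'@dep int',
    @dep = @dep

-- Concatenated: a distinct plan per distinct string; open to SQL injection
EXEC ('SELECT EmployeeID, [Name] FROM Employee WHERE DepartmentID = ' + CAST(@dep AS nvarchar(12)))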

Don't use * in Views !

CREATE VIEW ProductVIEW
AS
SELECT * FROM Product

ALTER TABLE PRODUCT
ADD DUMMY VARCHAR(50)

--Dummy is missing !
SELECT * From ProductVIEW
--Must explicitly refresh view
EXEC sp_refreshview 'ProductVIEW'


SQL Server performance

Activity monitor -> waits & recent expensive queries

Look for blocking and deadlocks.

Columnstore indexes

Introduced in SQL Server 2012 only as nonclustered; 2014 added support for clustered.

Use them for scenarios with heavy READ and scan operations; they carry a penalty for CRUD. In 2012 a nonclustered columnstore makes its table read-only, so it must be dropped and recreated around data modifications.
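
A minimal sketch on the Employee table from the earlier script (index names invented):

-- SQL Server 2012+: nonclustered columnstore (read-only table in 2012)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Employee
ON Employee (EmployeeID, DepartmentID, [Name])

-- SQL Server 2014+: clustered columnstore replaces the row store entirely
-- CREATE CLUSTERED COLUMNSTORE INDEX CCI_Employee ON Employee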


Logical reads vs physical reads

Logical reads access pages in memory, while physical reads come from disk and are MUCH more expensive. Every logical read reads one complete page.


Use these two to get the exact time and resources used to execute a query:
SET STATISTICS IO ON/OFF

SET STATISTICS TIME ON/OFF

; primarily we use these to measure logical reads.


SET SHOWPLAN_XML ON - useful if you need to search the plan for some text, since this dumps it as XML.


A useful tool for SSMS to analyze execution plans:
https://www.sentryone.com/plan-explorer


To get estimated and actual rows returned you can use this:

SET STATISTICS PROFILE ON
....
SET STATISTICS PROFILE OFF

Estimate vs Actual

You want to compare estimated to actual rows returned in order to pinpoint bad statistics on a certain database object; bad estimates result in a poor execution plan and poor performance.
The frequency of updating statistics can be adjusted to resolve this issue.


UDFs can't be estimated by the query optimizer! They also spoil the reported time and relative batch cost, so use SET STATISTICS IO ON instead.

Memory grant

For some operators data has to be cached, and extra memory is required. The optimizer tries to estimate and grant memory for the operator; Hash Match and Sort are examples. You'll see this info on the SELECT (leftmost) operator. Neither under-estimates nor over-estimates are desirable: an under-estimate for memory can result in spilling to TEMPDB and destroy performance, while an over-estimate takes too many resources and degrades concurrency.

Hash Match, Sort - the usual culprits for memory consumption

Helper to list all existing indexes on table:  exec sp_helpindex 'TABLENAME'

Join operator "Nested loop" 

The top input (upper on the graph) is the outer table. Each row of the outer input is matched against every element of the inner (lower) input. The optimizer will choose the optimal (smaller) rowset as the outer table; the written order of the JOINs is IGNORED! There are hints to force the order, but it's best to avoid them.
This type of join is generally expensive. It happens when the optimizer doesn't have optimized, sorted input for the JOIN, so it decides to iterate. Look at both the outer and inner tables and see if you can create an index to cover the join.
Watch for an index scan on the inner table, especially with a CE under-estimate.

Join operator "Merge join"

Generally this is the optimal type of join, and it's light on memory. The optimizer found indexed, sorted inputs for the outer and inner tables. Sometimes the optimizer may inject a sort on the fly to support the merge join. If statistics are OK, it's probably the best way to join. If we know better, we could force a nested loop; but the better approach is to look at stats and indexes and help the optimizer rather than forcing a join operator.
Watch for spills to TEMPDB during an injected sort.

Cardinality estimation (CE)

In short, CE is the logic used to build the query plan based on statistics. CE is one of the major differences between SQL Server versions: using a different compatibility level will probably result in different CE behavior and performance.

https://docs.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server

In a practical sense, as the above article elaborates, we need to make sure that CE works well. Put simply: if the estimated rows or memory grant in the estimated query plan are way off from the actual rows and memory used, you have a problem. The solution usually lies in dealing with statistics.

Statistics

SQL Server auto-updates stats when CRUD operations happen, in these cases:

As data changes in your tables, the statistics - all the statistics - will be updated based on the following thresholds:

When a table with no rows gets a row
When 500 rows are changed in a table that has less than 500 rows
When 20% + 500 rows are changed in a table with more than 500 rows

Here is how you can show statistics for an index:
exec sp_helpindex 'employee'
go
dbcc show_statistics (employee, 'PK__Employee__7AD04FF17E197905')
go

Here is how you can force full stats rebuild:

update statistics Employee with fullscan

Watch it! Fullscan takes time and resources, and it is automatically executed during an index rebuild.
Never update stats right AFTER an index rebuild, since a sampled update would degrade the fullscan statistics just produced by the rebuild.

You can turn auto stats off, or change their frequency and sampling size.
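
The database-level switches, for reference (MyDb is a placeholder name):

-- Turn automatic stats updates off entirely
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS OFF
-- ...or keep them on but let them run in the background
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS_ASYNC ON
-- Sampling size is chosen per statistics update, e.g.:
UPDATE STATISTICS Employee WITH SAMPLE 50 PERCENT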

Good FAQ on stats:

https://www.red-gate.com/simple-talk/sql/performance/sql-server-statistics-questions-we-were-too-shy-to-ask/#1



Sort operator

The query optimizer will try to avoid explicit sorting, so a Sort operator usually points to a problem. Check whether the ORDER BY is really required and whether you can create or update an existing index to support the sort.
ORDER BY can be very expensive!

Join operator "Hash join"

Memory consuming; a so-called stop & go operation. Selected for unsorted inputs. The optimizer assumes that injecting a sort and falling back to a nested loop won't help the overall cost, so it resorts to a hash match. In the first phase, hash keys are created from the row values of the outer table; then these hash keys are matched one by one against hashes created on the fly from the inner table. The first phase is called "Build" and the second "Probe".
Watch for a "fat" outer table; the optimizer should know to choose the thinner candidate as the build input. If it fails, look into why.
Also pay attention to spills to disk. From SQL 2012 on, you'll see a warning in the execution plan.
If you use older versions, check out SQL Trace and Extended Events. A spill is a major issue since it indicates that physical reads from TEMPDB occurred. The cause of the problem lies in CE.


Parallelism

You can limit it using the MAXDOP hint (degree of parallelism). Parallelism is neither good nor bad in itself; it consumes resources.

Parameter sniffing

This relates to executing stored procedures with parameters. The optimizer creates a cached plan for the first parameter value and then reuses it when the stored proc gets executed with different values.
In some cases this results in long execution of the stored proc for certain parameter values. Why? Imagine a table with 10 million rows. If the first call of the stored proc filters down to 1,000 rows, the plan will be compiled and cached optimized for 1,000 rows. When you later pass a parameter value that should return 1 million rows, another query plan would be more suitable, but the query optimizer won't create it; it will use the existing cached plan that was optimized for 1,000 rows.
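
One common mitigation, sketched on the Employee table from the earlier script (the proc name is invented), is to request a fresh plan per execution; the linked article below covers other approaches:

CREATE PROCEDURE GetEmployeesByDepartment
    @DepartmentID int
AS
BEGIN
    SELECT EmployeeID, [Name]
    FROM Employee
    WHERE DepartmentID = @DepartmentID
    OPTION (RECOMPILE) -- compile for the actual parameter value on every call
END

The trade-off is compilation cost on every execution.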

https://www.mssqltips.com/sqlservertip/3257/different-approaches-to-correct-sql-server-parameter-sniffing/

Aug 1, 2017

Design patterns

Types:

  • Creational
  • Structural
  • Behavioral 


Singleton 

  • requires only one instance! For example, only one (single) object may access a text file or some other resource at a time.
  • create a sealed class that can't be instantiated from outside. Through one public static member (usually called Instance) provide access to the single instance of the class. Ensure thread safety using a lock on a static object.
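
A minimal C# sketch of what the bullets describe (the class name is illustrative):

public sealed class Logger
{
    private static Logger instance;
    private static readonly object padlock = new object();

    // private constructor: the class can't be instantiated from outside
    private Logger() { }

    public static Logger Instance
    {
        get
        {
            lock (padlock) // thread safety via a lock on a static object
            {
                if (instance == null)
                    instance = new Logger();
            }
            return instance;
        }
    }
}
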
Factory

  • for objects that are heavily used and frequently instantiated in various places. We expect that object creation may be affected in the future by some other parameters, so we dedicate a special class to it.
  • create a static class with a static method that handles object creation.
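
A C# sketch of the static-factory idea (the shape types are invented):

public interface IShape { }
public class Circle : IShape { }
public class Square : IShape { }

public static class ShapeFactory
{
    // Creation logic lives in one place; callers never call "new" directly,
    // so future creation parameters only affect this class
    public static IShape Create(string kind)
    {
        switch (kind)
        {
            case "circle": return new Circle();
            case "square": return new Square();
            default: throw new System.ArgumentException("Unknown shape: " + kind);
        }
    }
}
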
Unit of work
  • we need to combine many CRUD operations on different related entities into one single unit of work (a transaction). Either it all succeeds or nothing does.

Repository
Use a repository to separate the logic that retrieves the data and maps it to the entity model from the business logic that acts on the model. The business logic should be agnostic to the type of data that comprises the data source layer. For example, the data source layer can be a database, a SharePoint list, or a Web service.
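
A C# sketch of the two contracts working together (all names invented for illustration):

public class Employee { public int EmployeeID; public string Name; }
public class Department { public int DepartmentID; public string Name; }

public interface IRepository<T>
{
    T GetById(int id);
    void Add(T entity);
    void Remove(T entity);
}

public interface IUnitOfWork : System.IDisposable
{
    IRepository<Employee> Employees { get; }
    IRepository<Department> Departments { get; }

    // Flushes all pending changes in one transaction - all or nothing
    void Commit();
}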